Episode 95
E95: Understanding Data Integrity and Liability in Generative AI with Joy Butler
Are you confident in the origins of your AI training data?
In this episode, we unravel the complexities of generative AI with IP attorney Joy Butler, delving into data provenance, liability, and the transformative role of diversity in AI.
We discuss why you should verify the sources of your AI training data to avoid copyright issues, why confidential information should stay out of AI prompts and your AI use should comply with contractual obligations, and how AI companies are offering indemnification and adopting licensing models to provide legal assurances and mitigate liability.
Tune in to stay ahead in the fast-evolving AI domain!
🔍 Three Key Takeaways:
- Transparency and Traceability: Always verify the source of your AI training data to avoid copyright violations and ensure compliance.
- Best Practices for AI Use: Be cautious about inputting confidential information in AI prompts and ensure your AI usage aligns with contractual obligations.
- Indemnification and Licensing: AI companies are increasingly offering indemnification and adopting licensing models to provide users with legal assurances and mitigate liabilities.
More About Our Guest:
Joy R. Butler helps companies and investors craft innovative business models, mitigate risks, and devise strategies for ventures into new markets, technologies, and product lines in the entertainment and digital technology industries. She is a graduate of Harvard College and Harvard Law School.
Charity Mentioned: https://girlswhocode.com/
Connect with Erin to learn how to use intellectual property to increase your income and impact at hourlytoexit.com/podcast.
Erin's LinkedIn Page: https://www.linkedin.com/in/erinaustin/
Hourly to Exit is Sponsored By:
This week’s episode of Hourly to Exit is sponsored by the NDA Navigator. Non-disclosure agreements (NDAs) are the bedrock of protecting your business's confidential information. However, facing a constant stream of NDAs can be overwhelming, especially when time and budget constraints prevent you from seeking full legal review. That's where the NDA Navigator comes to your rescue. Designed specifically for entrepreneurs, consultants, and business owners with corporate clients, the NDA Navigator is your guide to understanding, negotiating, and implementing NDAs. Empower yourself with legal insights and practical tools when you don’t have the time or funds to invest in a full legal review. Get 20% off by using the coupon code “H2E”. You can find it at www.protectyourexpertise.com.
Think Beyond IP YouTube Page: https://www.youtube.com/channel/UCVztXnDYnZ83oIb-EGX9IGA/videos
Music credit: Yes She Can by Tiny Music
A Team Dklutr production
Transcript
Hello, ladies.
Speaker:Welcome to this week's episode of the Hourly to Exit podcast.
Speaker:I have a very special guest today.
Speaker:My law school classmate, Joy Butler.
Speaker:Joy, welcome.
Speaker:And thank you so much for joining us.
Speaker:Thank you, Erin.
Speaker:I am honored to have been asked to be a guest.
Speaker:Well, we're very excited to have you because AI could not be more top of mind for this audience.
Speaker:And so as someone who has written extensively and spoken about AI, I definitely wanted to have you on to go deep.
Speaker:So before we get started, would you introduce yourself to the audience?
Speaker:Sure.
Speaker:So as you already shared, I am an attorney, and in my law firm practice, I provide product counsel services.
Speaker:That essentially means I provide a combination of strategic and legal advice to companies that are going into new lines of business, launching new products or new features of existing products, or forming strategic partnerships.
Speaker:And I come by that from two areas of law where I have in-depth knowledge.
Speaker:That includes the technology side, where I have worked on and helped to structure probably, literally, over 1,000 contracts over the course of my career, covering all the contracts one would need when doing business online and in digital technology, including end user license arrangements and terms and conditions.
Speaker:The other prong of my in-depth legal knowledge concerns entertainment and copyright, and this is where you and I overlap quite a bit.
Speaker:So I work on a lot of creative content contracts, also advise companies on protecting their copyrights and trademarks, and work with companies that want to use someone else's content, doing a lot of work in the rights clearance area.
Speaker:Just to give your audience a little more of a flavor of the types of projects I might work on, most of them are in the digital technology and entertainment space.
Speaker:So a couple of projects include helping an entertainment social media network launch, and working with an e-commerce retail site that was incorporating a lot of album cover art and original artwork.
Speaker:Another was an ad-supported stock simulation game.
Speaker:And here's something that may resonate with your audience: helping a professional in the finance area take this niche financial service he was offering and convert it into an online software-as-a-service product.
Speaker:So, that is me in a nutshell.
Speaker:Awesome.
Speaker:When did you first hear about AI? I think I could even tell you the day I first heard about it.
Speaker:Where were you when you first heard about it?
Speaker:What was the context and what were your initial thoughts?
Speaker:I don't remember the first time I heard about ChatGPT.
Speaker:Right.
Speaker:That may be what you're referring to.
Speaker:Yeah.
Speaker:But actually, within my practice, I have for quite some time been experimenting with trying to take some of my knowledge and develop it into digital tools, making it more accessible to people.
Speaker:As you know, I've written a couple of books on my areas of in-depth knowledge.
Speaker:So one of the things I've been experimenting with is taking some of that knowledge and offering it in a digital format.
Speaker:One experiment I believe I shared with you was a contest and promotion tool, which asked a number of questions and then gave you kind of a checklist of the legal questions you might ask before going forward with that.
Speaker:And I've actually used a tool that a lot of attorneys use; well, it is an interview construction tool targeted to the legal space.
Speaker:It's called Docassemble.
Speaker:It's actually open source, and I spent a little bit of time tinkering around with that.
Speaker:That is a long way to answer your question: I was familiar with automation and artificial intelligence through that process.
Speaker:But when ChatGPT came to my attention, that may have been around the same time as it came to everyone else's attention.
Speaker:I kept hearing about it and...
Speaker:right.
Speaker:Right.
Speaker:I guess I'd heard about it, but it was just noise to me, kind of like blockchain or crypto.
Speaker:I don't need to know that.
Speaker:I don't want to know it, until finally I could no longer ignore it, which was during...
Speaker:Yeah.
Speaker:An MCLE where I needed to get some credits, so I wasn't delinquent.
Speaker:And so I'm listening to this one about AI, and they were talking about ChatGPT in particular, and they're describing what you could do.
Speaker:And they're showing these samples and I'm like, it can do what?
Speaker:What?
Speaker:And so while I'm still in there, you know, it was just online.
Speaker:I'm silly.
Speaker:God forbid I go someplace in person.
Speaker:And then I'm on my computer, like, doing stuff with it.
Speaker:I'm like, oh my God, this is bad.
Speaker:And that was, well, it was February 2023, and that was my initiation.
Speaker:So the last year has actually been a fire hose of information and changes.
Speaker:So I think ChatGPT is the AOL of our times.
Speaker:It's this technology that's been around for a while, but we finally have this application that has made it much more accessible and user-friendly for a much wider group of people.
Speaker:Yeah.
Speaker:I mean, I guess, you know, when you think about it, artificial intelligence has been around a while.
Speaker:Obviously we've always had autocorrect and things like that, and all those things were artificial intelligence, right?
Speaker:Things like Alexa and Siri, right?
Speaker:I mean, those were versions of it.
Speaker:We just didn't think of it the way that we think of AI now.
Speaker:Exactly.
Speaker:It's been around for a while.
Speaker:We just finally got a killer app in ChatGPT.
Speaker:Right.
Speaker:Awesome.
Speaker:So a lot of questions that I get are around, where's this data coming from?
Speaker:What is the black box of generative AI in particular, which is what we're talking about?
Speaker:And what do I need to worry about?
Speaker:Are they taking my prompts, and what are they doing with them?
Speaker:A client who is signing an agreement to utilize a contract review AI, like, what are the issues regarding using one of those?
Speaker:So everybody has questions about what happens when I use AI, what do I need to worry about, where does that data come from, and what is my exposure?
Speaker:So I would just like to start from the top.
Speaker:I think most of the audience is familiar with AI, but let's talk about what training data is.
Speaker:Like, where does it get its information from?
Speaker:How does it get in there?
Speaker:And, yeah, just start there with a general overview.
Speaker:Yeah.
Speaker:When we talk about AI models and some of the copyright and licensing issues, there are kind of two categories when we're talking about generative AI.
Speaker:The first category is the input, and the second category is the output.
Speaker:So when you mention training material, you're talking about the first category, the input.
Speaker:And there has been a lot of controversy over whether or not the training material that is required to train these models can be used without permission.
Speaker:Because what the foundation models do, and when I say foundation models, I mean the maybe eight or ten models that are around that literally have millions of pieces of content that they take into their kind of black box and analyze so that they can be general-use large language models.
Speaker:What many of these models do is source that data by getting data from anywhere that they can, including scraping the Internet for millions and millions of pieces of data.
Speaker:So there's been, as I said, a lot of controversy around whether or not permission is required for them to do that.
Speaker:And what many of these models are relying on now is an argument that their use of that material as training material qualifies as a fair use under the Copyright Act.
Speaker:I believe there are a number of factors that will gradually push these AI foundation models towards licensing that material.
Speaker:One of them is that there have been a number of lawsuits filed against them, charging them with copyright infringement and other related infractions over their use of this material.
Speaker:And while all of those suits are still pending and may take a very long time to play out, I think we're going to see progress towards more licensing prior to that.
Speaker:And that's because people are very anxious to use generative AI.
Speaker:But before they use it, they want some comfort level that their use of that material is not going to subject them to any type of copyright infringement or other claim.
Speaker:So in order to make their customers comfortable with the fact that they can use this material without taking on any legal liability, we are seeing more and more of these AI companies gradually move towards licensing the content.
Speaker:I want to follow up on that, but I'm going to step back just a second, because you said large language models, and then we have machine learning and we have generative AI.
Speaker:Are those synonyms?
Speaker:Or are they all different elements?
Speaker:Okay, I'm not the expert here, but I'll share with you my understanding.
Speaker:So the large language models, they are the general models that can produce the generated output.
Speaker:That means they take all of the input, all of that training material, and they basically analyze it to see what the relationship of each data point is to every other data point.
Speaker:So when you ask it to produce something, it is estimating, or putting forth its analysis of, what word should come next, or what should come next in this particular paragraph, which is why it needs so much training material from which to learn.
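To make that next-word idea concrete, here is a minimal, purely illustrative Python sketch. The tiny hard-coded corpus and the simple word-pair counting are assumptions for illustration only; real large language models learn far richer relationships with neural networks over vastly more data, but the basic notion of estimating what should come next from training material is the same.

```python
from collections import Counter, defaultdict

# A toy "training data set": the text the model learns from.
training_text = "the cat sat on the mat the cat ate the food"
words = training_text.split()

# Count which word follows which word in the training material.
next_word_counts = defaultdict(Counter)
for current_word, following_word in zip(words, words[1:]):
    next_word_counts[current_word][following_word] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the training text."""
    candidates = next_word_counts.get(word)
    if not candidates:
        return "<unknown>"
    return candidates.most_common(1)[0][0]

print(predict_next("the"))  # -> "cat", seen twice after "the" vs once each for "mat" and "food"
```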
Speaker:Got it.
Speaker:Okay.
Speaker:Now, going back to where you mentioned it's moving towards licensing, because users of AI want to know that they're not going to get sued if they use the output.
Speaker:What does that mean for all of the current data that has been scraped from the Internet and all these places previously?
Speaker:I mean, aren't the data sets, and our use of AI as is, almost like too big to fail?
Speaker:What could happen with these lawsuits that are happening right now if there are billions of pieces of, let's say, pirated information in, say, ChatGPT's, OpenAI's, training data set?
Speaker:What could the possible remedy be if they lose?
Speaker:Okay, so I do want to separate this into two categories again, because when we talk about infringement, there are two separate questions.
Speaker:The first question is whether just the process of the AI companies taking in data as training material and using it to train their model is copyright infringement.
Speaker:That's one question.
Speaker:And then the second question is, if you as a user of these models use them to produce generated content, is there any legal liability for you?
Speaker:Now, there are circumstances that could be imagined where it's possible for the model's training data to be considered a fair use, but maybe the way you've used it in creating output is infringing or violating in some way.
Speaker:I'm not saying that scenario has actually come up or may come up often, but one can imagine a set of circumstances where that might be true.
Speaker:So, back to your original question, where is all this going?
Speaker:What are the potential remedies?
Speaker:Well, one remedy with respect to these lawsuits is that they will settle, because the companies that have sued them have been the largest companies with the most resources, and very large organizations like the Authors Guild.
Speaker:So they may settle, come to some agreement on what a settlement fee should be.
Speaker:And it's also possible that part of their settlement might be a licensing agreement going forward.
Speaker:That resolves matters for the large organizations that have sued and the large private companies.
Speaker:If it's an organization or association representing much smaller players, it remains to be seen how much might flow to them as part of any judicial settlement.
Speaker:It may be that, as opposed to a private settlement, we might get some sort of a judicial settlement.
Speaker:I think it's perhaps less likely, but it might be one of the outcomes, and that might be a settlement like something that was proposed in Google Books.
Speaker:Now, the Google Books lawsuit, if anyone remembers this lawsuit from 2015: this is the lawsuit that came out of Google Books starting its program where it digitized millions of books and used them, and still uses them today, to give a snippet of books in response to a search.
Speaker:So that is one of the cases on which a lot of these AI model companies rely when they argue that their use of the training material is a fair use.
Speaker:For those who remember, the Google Books case initially tried to resolve itself via a judicial settlement agreement that would have permitted the snippets of those books and allowed the digitization.
Speaker:But that judicial settlement, or the private settlement that was proposed, went very much beyond just providing snippets, which is one of the reasons that it was ultimately not approved by the court.
Speaker:And the case kept going on, and the court ultimately said, okay, well, we're stripping out all this information you tried to do in the settlement.
Speaker:But, as consolation, we've decided, Google Books, that your use is a fair use.
Speaker:So it might be that some of the parties try to move in that direction of some type of a settlement that encompasses both small and larger players.
Speaker:Some of the other types of resolutions that have been thrown out include kind of a collective that would be parallel to the way we collect public performance royalties in the music industry.
Speaker:So, for example, when a song is performed on the radio, all songwriters receive some income from the songs they've written being played.
Speaker:Well, the radio station is not going out and entering into license agreements with the millions of songwriters.
Speaker:There are collectives, in the case of music, ASCAP and BMI and a couple of others, that have these collective agreements where they issue blanket licenses.
Speaker:So something like that has been proposed, potentially, for the training material space.
Speaker:That brings in both rights owners with very large catalogs and rights owners with very small catalogs.
Speaker:The Copyright Office had a comment period where it asked a bunch of industry players what they thought of this, and most of the people who commented were very much in favor of direct licensing, or perhaps even aggregated licensing.
Speaker:Now, that may be in part because that's where larger companies are going to get kind of the premium licensing, because the direct licenses we've seen to date have been between AI model companies and very large organizations for millions of dollars.
Speaker:Just like what we're talking about in the collective licensing example, those very large companies are not going to enter into license agreements with millions of small players.
Speaker:There will have to be some balancing where they, too, can participate.
Speaker:One potential example of how this might be alleviated is through aggregators.
Speaker:So one aggregator we have right now is the Copyright Clearance Center, which is aggregating scientific papers for use as training material, and that allows smaller rights owners to participate in having their material used, and being paid for their material to be used, as training material, if that's what they choose to do.
Speaker:An example in this space I've seen come forward as a startup is called Dappier, and that is a startup dedicated to giving those smaller rights owners the opportunity to participate in being a part of training material, and making that training material more accessible to both the large AI models and, you know, the smaller AI companies that might have fewer resources and not be as able to compete when license agreements are going for millions and millions of dollars.
Speaker:Yeah.
Speaker:Yeah.
Speaker:I mean, it sounds like this would all have to be prospective.
Speaker:I mean, the AI companies have been scraping the Internet for we don't know how long, and is it even able to distinguish one piece of data in the data set from another?
Speaker:I don't know, like, how would you compensate for all the information that's already in there, in order to parcel out payments, whatever fraction of a penny I might get for something?
Speaker:And going forward, if you are a small content creator, kind of your everyday content creator like the audience here, it would then be on you to make sure that your content is registered somewhere, so you'd be part of some aggregator that has a license and is getting paid by the AI companies?
Speaker:Okay.
Speaker:So several issues in that question.
Speaker:Okay.
Speaker:Let's go with that first part, where you talk about kind of the provenance: what was the source of the data?
Speaker:Is it even traceable?
Speaker:And this is one of the pain points.
Speaker:And this is also where that analysis comes in about whether your output subjects you to any type of liability.
Speaker:So, backing up for a second.
Speaker:If you are any type of content creator and you are trying to determine whether or not the content you've created is violating any rights, you need to know its source, right?
Speaker:So for the output that they have, if that provenance is not available to you using the generative AI, you can't even do that analysis.
Speaker:So that's part of the pressure on the AI model companies: not just waiting for these lawsuits to play out, but making their potential customers comfortable that you can use our AI models and they can produce output that you can then use.
Speaker:And so part of having to do that is knowing the provenance.
Speaker:Now, the extent to which they currently do that, I don't know.
Speaker:AI model companies have often been quite opaque and not very transparent about how the sausage is being made.
Speaker:On the outside, though, as another example of where the industry is going, there has cropped up another kind of startup in this space called Fairly Trained, which is offering certification for AI model companies that have produced their models relying solely on authorized data sets.
Speaker:And then, you know, the theory is, if you are a company that wants to leverage AI, you can get more comfort in knowing that you're relying on an AI model, an AI company, that is fairly trained.
Speaker:And the last time I checked, there were only a few dozen companies that had that certification, but that may grow.
Speaker:So maybe enterprise users would go for the Fairly Trained type, because they're much more concerned, frankly, than most everyday users about the quality of that output.
Speaker:It seems like if they're using it to create public-facing materials, they would want that fairly trained data set behind it.
Speaker:Do they also give reps and warranties when you go through them regarding the quality of the output?
Speaker:Do they give representations and warranties? Well, would Fairly Trained, or someone who has licensed their data from Fairly Trained, then have representations in their terms of use?
Speaker:Fairly Trained doesn't license data.
Speaker:Fairly Trained is a certification program.
Speaker:So if an AI company wants this certification to show everyone that they have relied on an authorized data set, then this is a certification that they can apply for.
Speaker:Got it.
Speaker:Okay.
Speaker:Because I believe that there are some platforms that do provide indemnification, although they have a bunch of provisos.
Speaker:Where's that going, so that users feel more comfort there, right?
Speaker:So, I mean, I think that's part of their responding to this pain point of needing to make their customers more comfortable with using their product.
Speaker:They are providing certain indemnifications; it remains to be seen how effective those indemnifications would be if a customer were actually sued.
Speaker:And as you mentioned, they do have a lot of exclusions.
Speaker:Personally, I think that is just sort of an intermediate stopgap, and they are going to be pushed more towards licensing of their data sets.
Speaker:And I would say, while we wait for this to play out, I mean, as you know, these lawsuits could take, and probably will take, a very long time.
Speaker:The Google Books case, for example, on which the AI companies are relying, took ten years before finally reaching that conclusion that Google Books' digitization was a fair use.
Speaker:So I would say, in the interim, AI companies, and those producing AI models, should look more to using authorized data sets for construction of their models, authorized data sets with a traceable provenance, so that their customers, when using the output or wanting to put the output into use, can know what the source is and do that analysis of: is this violating any copyright?
Speaker:Is this violating any right of publicity or trademark or anything else?
Speaker:I would say, for the companies that want to leverage AI, when you're looking for partners, you do want to look at partners who are using authorized data sets.
Speaker:Right now, what I see is that a lot of companies, brands, companies in the film and television industry, are actually leveraging AI, and it is being leveraged, but they're using it for a first draft or a proof of concept, for things that are iterative and need to be turned around very quickly.
Speaker:But they're not using it as part of the final consumer-facing output, just due to those copyright reasons: both the reasons we just discussed, fear of having any type of legal liability, but also because there are limitations on the degree to which you can protect output that's generated by artificial intelligence.
Speaker:Right, so when they have an authorized data set, does the output come with footnotes?
Speaker:What does that look like?
Speaker:Do you know?
Speaker:Have you seen what that looks like?
Speaker:Does it tell us what the sources are?
Speaker:Like, does it identify them?
Speaker:Yeah.
Speaker:Oh, oh, oh, I see. When it is authorized?
Speaker:Right.
Speaker:To my knowledge, it is not coming with anything.
Speaker:And you're talking about the Fairly Trained component, right?
Speaker:Yeah, it is not coming with anything, but that does need to be a path toward which we're traveling.
Speaker:And there has been a lot of conversation about that in this space, that it needs to be watermarked, needs to be traced in terms of: what was the source?
Speaker:What did you rely on to do that?
Speaker:Yeah.
Speaker:As far as the magic that happens inside of a generative AI platform, do we know what that is?
Speaker:Or is that kind of the trade secret of each company?
Speaker:Or is there general technology that makes the magic happen?
Speaker:I am not the expert on the technology inside of the AI models, but I can share what I know.
Speaker:In part, it does depend on the approach that they've used, whether it's supervised learning or unsupervised learning.
Speaker:Which, to make it very simple, depends on how much you assisted the machine: like, did you mark things and tell it, this is a dog and this is a cat, or did you just give it kind of millions of pictures and let it figure it out?
Speaker:When you let it figure it out, when it's unsupervised, it is more of a black box in terms of how it got to that answer, which brings up all sorts of other societal issues, right?
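To illustrate the supervised versus unsupervised distinction Joy describes, here is a small, purely illustrative Python sketch. The toy cat/dog measurements, the nearest-example classifier, and the naive two-group clustering are assumptions made up for illustration; they are not any real system, just the smallest possible contrast between "you gave it labels" and "you let it figure it out."

```python
# Supervised: each example comes with a human-provided label ("cat" or "dog").
# Features are made-up (weight_kg, ear_length_cm) pairs.
labelled_examples = [
    ((4.0, 6.5), "cat"),
    ((30.0, 12.0), "dog"),
    ((3.5, 7.0), "cat"),
    ((25.0, 11.0), "dog"),
]

# Unsupervised: the same measurements, but with no labels at all.
unlabelled_examples = [features for features, _ in labelled_examples]

def supervised_predict(features):
    """Label a new example by copying the label of the closest labelled example."""
    closest = min(labelled_examples,
                  key=lambda ex: sum((a - b) ** 2 for a, b in zip(features, ex[0])))
    return closest[1]

def naive_two_clusters(points):
    """Group unlabelled points around the smallest and largest example.
    A stand-in for real clustering algorithms such as k-means."""
    lo, hi = min(points), max(points)
    groups = {lo: [], hi: []}
    for p in points:
        dist_lo = sum((a - b) ** 2 for a, b in zip(p, lo))
        dist_hi = sum((a - b) ** 2 for a, b in zip(p, hi))
        groups[lo if dist_lo <= dist_hi else hi].append(p)
    return groups

print(supervised_predict((5.0, 6.0)))        # -> "cat", because the labels guided the answer
print(naive_two_clusters(unlabelled_examples))
# The machine finds two groups on its own, but nothing in the data says which
# group is "cat" and which is "dog"; that interpretation stays with the humans.
```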
Speaker:I think that's a whole other topic to play out.
Speaker:Ah, interesting.
Speaker:Can we wrap up with some best practices just for your everyday kind of ChatGPT, or whatever the Google one is called, user?
Speaker:For this audience, the expertise-based business, maybe they're using it to create first drafts or to help them with social media posts or something like that.
Speaker:Just some general best practices.
Speaker:Sure. You want to be circumspect about any confidential or proprietary information you include in a prompt; you may want to anonymize it.
Speaker:You need to keep in mind that whatever output you get from the AI model may not be eligible for copyright protection.
Speaker:If this is material or output that you are passing on to a client or to a customer, you may need to disclose that use.
Speaker:And you have to make sure of how you're using the output. Depending on the extent to which you're using it, are you using it just for a little bit of assistance in modifying a few sentences?
Speaker:Or are you actually producing images with it or producing an entire report with it?
Speaker:You're going to want to make sure that your procedures for using generative AI are consistent with the contract that you have with your customer.
Speaker:If you want to know whether or not your material, your prompts, are being incorporated into the training data and being used to further train that AI model, take a look at the terms and conditions.
Speaker:To give you an example for ChatGPT: if you're using the free model and it is recording your history of prompts, then the prompts that you put in there are subject to being included as part of future training data.
Speaker:Yeah.
Speaker:So, if along the left-hand side there, you can scroll through and see all your prompts, which I have, that means it is going into the training data.
Speaker:That is excellent.
Speaker:I can't say with certainty that it is going into the training data, but I would say it is susceptible to being used.
Speaker:It's like, they have not provided you, ChatGPT has not provided you, any representation that they will not use it for training.
Speaker:Right.
Speaker:Very good.
Speaker:Thank you for making that distinction.
Speaker:Thank you for this.
Speaker:This podcast is here to help create a society, and an economy, that works for more of us.
Speaker:So I love to ask my guests: if there is an organization or a person who is doing the good and hard work to help make an economy that works for more of us, is there one that you'd like to share with us?
Speaker:Sure, I really like organizations whose mission it is to bridge the digital divide.
Speaker:And one of my favorites is Girls Who Code, which has as part of its mission introducing more women into the technology field.
Speaker:And that's very apropos to our conversation today, because as part of making AI beneficial for all humankind, we really do need a diverse perspective.
Speaker:Yeah, I mean, we know that just when we talk about the training data sets, like what data is going in there, obviously the output is only as diverse as the input, right, and how it's being trained.
Speaker:And I know that has come up in a number of controversial ways as well, whether something's leaning this way or that way, but we definitely want to make sure everyone has a voice in the future.
Speaker:Thank you for that one.
Speaker:And we will put that in the show notes along with how people can reach you.
Speaker:where do you hang out, Joy?
Speaker:And how can people get in touch with you to find out more?
Speaker:Sure, so I'm always happy to chat with people doing innovative things with technology, especially in the digital technology, online, and entertainment space, so they can find me through my website, which is www.joybutler.com.
Speaker:And I'm also on LinkedIn as Joy Butler.
Speaker:Awesome.
Speaker:Well, thank you so much.
Speaker:And yes, everyone, please follow Joy, and let us know if you have any other questions about AI.
Speaker:I know it's constantly evolving.
Speaker:There's always going to be something new and we can continue
Speaker:this conversation in the future.
Speaker:Thanks again, Joy.
Speaker:Thank you.