Oh, the Places I Go...on the backs of public data sources!

July 25, 2023

When I was first shown OpenAI's conversational chat API Playground in August of last year, I thought I'd found the neatest little internet peculiarity there was. I pushed it to give longer and more detailed answers to question after question and giggled away at the relative lack of boundaries on its capabilities.

When I began to write a blog publishing my conversations with one particular instance of the API in November, I was bursting with ideas. I could start my posts with quotes of Playground's responses, ask questions the way a UX researcher would to tease out the factors that determined its responses, and explore philosophical questions on the nature of preference, self-identity, etc.

And for a time, it was good. Around then I had also been introduced to DALL-E2, so I was creating cover art and thumbnails to accompany my experimentations with impunity. I say "impunity" because half of the content of my blog was being created by entities that owed their entire knowledge bases to uncredited sources.

Playground and later ChatGPT developed (and continue to develop) their knowledge bases from publicly available information sources like newspapers, encyclopedias, scientific papers and "other publicly available text sources." DALL-E2 was trained on images gathered from the web along with the text that accompanies them, so that its deep learning algorithms can discern meaning from a user who wants to see a chipmunk riding a horse painted by Botticelli.

Sure, not every picture of a chipmunk is owned by someone. And the works of great Renaissance painters are often public domain, to an extent. Some countries do prohibit recreations of specific works for private gain. But therein lies the rub. DALL-E2 is selling image renders for money, not the images it was trained on. They run 2 cents apiece for a 1024x1024 image, last I checked.

So there are three factors pointing to a "fair use" argument for allowing OpenAI's APIs to go about their work. They only view information anybody can see. They use the information for research purposes. And well, they don't charge all that much, do they? After all, an artist drawing caricatures at a carnival doesn't owe a royalty to Annibale Carracci any more than the young couple does when they ask the artist to draw them with humungous ears.

So where are the flaws? I explored this concept with a friend and colleague who teaches at a Chicagoland high school during a mutual friend's birthday party in a brewery at 11pm. The science hour. Being a public figure in the age of absolutely no privacy, they've asked I use a fictitious name when referring to them.

To provide some context, Madelyn has been instrumental at her school in developing AP curricula and innovative teaching strategies in multiple disciplines. A major tool in her arsenal has been the use of comic-style lesson materials in class. Complex concepts such as the Supreme Court and the Eighth Amendment to the Constitution pare down well into comic frames, it seems.

Madelyn first raised a common refrain: worker displacement. "I go to C2E2 every year, and I weep for the artists who will see their freelance work dry up by next year because the people who hired them now just have someone in house screwing around on DALL-E2 all day. I think about all of the creative work they'll never produce because they have to do something else and it's hard to handle."

The second, more alarming hole she poked concerned intellectual property rights. Madelyn shared a story with me about an artist who came across a work by LensaAI that bore a seemingly innocuous squiggle of lines near the bottom right of the already muddled composition. The imperfection wouldn't be that alarming if it had not borne an uncanny resemblance to the artist's own signature.

Read the article in ARTNews here.

Heading into this conversation I had been riding a wave of "ain't this neat" excitement over OpenAI's achievements and their effects on my recent work as a writer and UX Designer. And for the sake of any conversation where there is something to learn, I tend to take a devil's advocate role to find out exactly how much I really believe the opinions I started with once they are weighed against the arguments of others.

So my first counterpoint was about inspiration and owning what came before. If I see Justin Timberlake tearing it up at the Super Bowl, get motivated to become a dancing superstar, study a bunch of other prolific performers of my time, work hard and make it big, that's my achievement. Right?

Madelyn argued that Timberlake signed up to be an influence. He was paid for that performance and signed off on the rights that made it the NFL and Fox's property to exploit for financial gain. An artist who posts their work on Etsy for sale did not agree to feed DALL-E2 data for free so that it could one day supplant them.

My follow-up to that point reminded me almost immediately of Napster and the cavalcade of other "file-sharing" platforms that erupted into the music/tech sector in the early 2000s. I pointed out that there were dozens of musicians and performing artists I've seen in concert whom I would never have sought out with $15-20 in my pocket. But now I've seen Wu-Tang 3 times in concert, own collector's versions of their albums and follow the work of many of their contemporaries in the same fashion.

Besides, no Napster, no Pandora, no Spotify, right? We now have a new paradigm of media sales that removes almost all barriers to the market for artists by leveraging the technology of digital media. Artists make pennies on the play, but their influence can spread into the public sphere as vigorously as the market demands. The marketplace of ideas is reinforced and all is right with the world.

Madelyn would certainly have more to say to that, starting with the fact that Spotify, SoundCloud and the rest at least credit the artists by name. And asking a room full of musicians how happy they are with their Spotify royalties is likely akin to being a long-tailed cat in a room full of rocking chairs.

This would also pique my cousin Corey, who is a working musician performing in multiple pit bands for major musicals in Chicago. He's related stories to me of instruments trained on human performers being used to cut pit bands' sizes by 20-30% on the regular. But my time is up, so those arguments will have to wait until the next installment.

Until then, keep making that data, everybody.

Brian is Not a Robot
