Howdy, y'all.

This week: a journalist at The Atlantic found four databases with 21 million copyrighted songs sitting in the open, being passed around the AI development community like a shared hard drive at a college dorm. The scale of what's been fed into these music generators is now documented — not alleged. Plus, Lionel Richie just filed four trademark applications on his own voice, following Taylor Swift and Matthew McConaughey. A strategy is beginning to take shape. Here's how it works and why it's available to more than just celebrities.

Let's get into it.

AI-yi-yi

21 Million Songs. Four Databases. All Searchable. This Is What AI Music Was Built On.

Until now, the AI music copyright fight has been a battle over allegations. Record labels said their songs were used to train AI models without permission. AI companies said their training data was proprietary — none of your business, trust us. Courts were being asked to resolve a factual dispute that nobody outside the AI industry could verify.

That changed on June 16, when Atlantic staff writer Alex Reisner published an investigation that's been turning heads all week.

What Reisner found

Reisner discovered four large datasets of songs being actively shared within the AI development community — not locked away, but circulating on AI data-sharing sites, cited in published research papers, and downloaded thousands of times. Together they contain roughly 21.2 million tracks. One dataset alone holds 12 million songs and would take 91 years to listen to straight through. Another has 9 million. Both are searchable.

The artists whose recordings appear in these datasets include Bad Bunny, Nirvana, Taylor Swift, Billie Eilish, Pearl Jam, Elvis Costello, Sheryl Crow, the Beatles, Miles Davis, and tens of thousands of lesser-known artists across every genre. The New Radicals' "You Get What You Give" — the song whose lyrics turned up in an AI-generated figure skating routine last November — appears in two of the four datasets.

Three of the four datasets are distributed as lists of links to songs on YouTube and Spotify. Developers download the actual audio using automated tools — some of which are specifically designed to bypass logins, ads, and monetization mechanisms. That's not incidental: it means the tools used to build these datasets were engineered to circumvent the systems that would otherwise generate revenue for creators.

Google has acknowledged using one of the datasets — a collection of more than 100,000 songs from the Free Music Archive, a site that allows free personal listening but requires licensing for commercial use — to train AI models. Stability AI has used songs from the same dataset. Who used the other three remains unknown, because the industry won't say.

The legal stakes just got concrete

The reason this matters beyond the headline number is what it does to the ongoing lawsuits.

The major labels (Sony, Universal, and Warner) are currently suing Suno over AI-generated tracks that closely imitate specific copyrighted songs. The Sony v. Suno hearing is scheduled for July. The labels are seeking up to $150,000 per infringed work, the maximum statutory damages the Copyright Act allows for willful infringement. Applied to 21.2 million songs, the theoretical ceiling is about $3.18 trillion (21.2 million × $150,000). That figure is a ceiling, not a forecast: the labels don't own every song in these datasets, and statutory damages are only available for registered works. But exposure on that scale is part of why these cases are far more likely to settle than to go to verdict.

But here's what the Atlantic investigation actually changes. The plaintiffs no longer have to argue "these companies must have used our music." They can point out that the companies' own research papers cite datasets containing that music, and that those datasets are downloadable by anyone right now. That's a much stronger evidentiary position.

Suno, for its part, has said it uses "safeguards to protect against unauthorized distribution, impersonation and manipulations" and that reproductions of training data "should not happen." Those safeguards have not prevented the documented output similarities to songs like "Thriller," "Shape of You," and "Johnny B. Goode" that appear in the labels' complaints and that Reisner links to directly in his piece.

What this means if you're not a major label

The investigation focuses on music, but the pattern is the same across text, images, and video: large quantities of creator content, scraped without permission, used to build commercial products, with the defense that it's either licensed (Google's position, covered here in Issue #18) or fair use (everyone else's position).

If you are a creator whose work lives on public platforms — music on YouTube or Spotify, writing on Substack or a public blog, art on social media — the realistic answer is that your work may already be in one of these datasets. You didn't consent to that. You may not be able to stop it retroactively.

What you can do: register your copyrights. The statutory damages that make these lawsuits financially meaningful (up to $150,000 per work for willful infringement, plus the possibility of recovering attorney's fees) are only available if the work was registered before the infringement began, or within three months of the work's first publication. Otherwise, you're limited to actual damages and the infringer's profits, which are much harder to prove and usually much smaller. If you're a working creator who hasn't registered your catalog, this week's investigation is the argument for starting.

Cover Your Assets

Lionel Richie Filed to Trademark His Voice. So Did Taylor Swift. Maybe YOU Should.

On June 11, Lionel Richie filed four trademark applications with the USPTO, each covering audio of him saying a phrase from one of his most recognizable songs:

  • "Hello, is it me you're looking for?"

  • "Say you, say me"

  • "Easy like Sunday morning"

  • "All night long"

All four were filed on an intent-to-use basis, meaning Richie isn't currently using these audio clips as trademarks in commerce, but he's establishing his claim to do so.

Richie is the latest in a string of high-profile voice trademark filings over roughly 14 months. Taylor Swift filed in April to register her voice saying "Hey, it's Taylor" and "Hey, it's Taylor Swift," along with her likeness. Matthew McConaughey has secured trademark protection for "Alright, alright, alright," his line from Dazed and Confused, and Jimmy Kimmel has filed similar applications this year. Four performers in about 14 months, all responding to the same underlying threat: AI voice cloning.

Why trademark instead of copyright or right of publicity

Copyright doesn't protect a voice. It protects specific works, such as the particular recording of "Hello" on Richie's 1983 album Can't Slow Down. A cloned AI version of Richie's voice singing a new song isn't a copy of that recording; it's a simulation, and copyright in a sound recording doesn't extend to new recordings that merely imitate it.

Right of publicity — the state law right to control commercial use of your name, image, and likeness — does protect voice in most states, but it varies significantly from state to state, it's hard to enforce across jurisdictions, and it doesn't create a federal registration that puts the world on notice.

Trademark offers something different: a federal registration, a public record, and a cause of action that doesn't depend on proving which state law applies. A registered sound mark gives the owner grounds to challenge sounds that aren't exact copies but are close enough to confuse listeners about who's behind them, which is exactly what AI voice cloning produces. That can reach situations that copyright and publicity rights don't.

The catch is that trademark protection requires the sound to function as a "source identifier": listeners must tie it to a specific product or service, not just recognize it as a famous lyric. The USPTO will expect evidence of that, and approval won't be automatic. Whether Richie's applications ultimately succeed is genuinely uncertain. But if they do, they'll be an important signal of how trademark law is adapting to the AI era.

The NO FAKES Act is still pending — trademark is available now

Congress has been trying to pass the NO FAKES Act, which would establish a federal intellectual property right in a person's voice and likeness for the first time, since 2023. It was introduced for the third time in May, after failing to advance out of committee twice before. Don't wait for it.

The trademark strategy these artists are pursuing doesn't require new legislation. Sound marks are already registrable under existing law. The question is whether your voice, phrase, or sonic identity is distinctive enough to function as a source identifier — and whether you can demonstrate that in front of the USPTO.

For household-name celebrities with instantly recognizable voices, the answer is probably yes. For working creators, the analysis is more nuanced, but not foreclosed. A podcaster with a recognizable sign-off, a creator with a catchphrase their audience associates specifically with them, a musician whose vocal style is their brand — these are people for whom a sound mark application is worth at least a conversation with a trademark attorney.

The practical takeaway from watching Swift, McConaughey, Richie, and Kimmel all file in the same 14-month window isn't just that famous people are scared of AI. It's that they're reaching for a legal tool that already exists, filing now, and building a record — because the alternative is waiting for federal legislation that has already failed twice, while AI voice cloning gets cheaper and more accessible every month.

The strategy is available. The window to use it — before your voice is cloned and someone else is profiting from it — is open right now.

See you next time,

Hank

P.S. If someone forwarded this to you and you find it useful, subscribe at newsletter.creatoripacademy.com. No spam, no filler — just IP news that actually matters.

About Hank's IP Brew

Creator IP Academy helps creators understand and protect their intellectual property. Got a question? Reply to this email.
