I recently produced (and was the Major-General in) The Pirates of Penzance. The opera was first performed in 1879, and published contemporaneously. Its composer, Arthur Sullivan, died in 1900, and its librettist, W. S. Gilbert, in 1911. So when I uploaded it to YouTube, why was my video tagged as a copyright violation? It wasn’t that kind of piracy!

(As an aside: Pirates premiered in earnest on Broadway rather than in England in order to prevent copyright pirates from taking advantage of the lack of copyright relations at that time between Britain and the United States, as they had with H.M.S. Pinafore. The subject matter of the show was perhaps chosen in part as a tongue-in-cheek reference to copyright piracy, which is especially humorous in the context of this post.)

This isn’t the first time this has happened to me. I’m sure it has happened to many other people too, most of whom probably don’t understand the Content ID system, and many of whom probably never checked after uploading whether ad revenue from their videos was going to fraudulent claimants. I’ll break down the claims I got (all of which I’ve disputed) and what YouTube could do to alleviate this problem.

I want to clarify that this only has to do with the Content ID system, which is of Google’s design, not the result of legislation (like the DMCA). If the actions taken by YouTube were legally required, that would be one thing, and there are certainly issues with the DMCA takedown process. But that’s not what I’m talking about here today. There are improvements that Google can make to prevent what happened to me from happening to others, without really tipping the scales against any claimants except fraudulent ones.

It’s important to distinguish between the different types of content claims that Content ID allows a claimant to make. Claimants can enter a recording (audio or video) into the matching system, and copies of that recording, or videos containing it, are automatically tagged by Content ID. Many YouTubers using clips under fair use provisions of copyright law have discussed how Content ID tags these recordings without being able to consider critical context. That is a reasonable complaint, but in those cases the system is working correctly, recognizing the registered recordings. Sometimes, though, Content ID has failed, erroneously “recognizing” a recording of a work as one of these registered recordings when it was not, as when YouTube recognized a performance by Valentina Lisitsa as one by Glenn Gould. The system isn’t perfect.

Content ID may also flag recordings which are in the public domain in one part of the world but not another. Ulrich Kaiser complained about copyright claims on his uploads of recordings which are in the public domain in his home country of Germany but which, due to how sound recordings are treated in U.S. copyright law, are still protected in the United States.

But all of those complaints have to do with Content ID’s system for recognizing recordings. My issues have all been with its composition tagging system. In music, there are separate copyrights on the composition and on any given recording. For instance, while the musical composition The Pirates of Penzance is in the public domain (as is Mozart’s Requiem mass), individual recordings of these pieces have separate copyrights, many of which are still active, as both pieces have been recorded many times. The same concept exists in contemporary popular music: when an artist covers a song, there is a new copyright in the recording, but there is no new compositional copyright.

So after my video finished uploading, I looked to see if there were any claims, and sure enough, there were!

Copyright claims: Overture, No Em Deixis Sola Aquí, Ets Falsa Per Què M'enganyes, Made You, Anem, Anem!, Arrancant-se la Camisa, The Wedding, 3. Oh, Better Far To Live And Die

All of these claims are of the type “musical composition”, meaning that Content ID correctly recognized my video was a new recording of an existing piece. The AI is working! But the database is flawed, because none of these claims were legitimate.

To go through them:

  • Overture (claimed by APRA_CS, MUST_CS, ICE_CS) — Someone's claimed a copyright on the overture to Pirates. There's no legitimate claim here.
  • No Em Deixis Sola Aquí (Muserk Rights Management) — This is a claim on "Ah, leave me not to pine". The composition I'm supposedly copying is just the same number from a Catalan translation of the same opera. The Catalan lyrics are copyrighted, but Sullivan's original music isn't. The same is the case for Ets Falsa Per Què M'enganyes ("Oh, false one, you have deceived me!"), Anem, anem! ("Away, away! My heart's on fire!") and Arrancant-se la Camisa ("Sighing softly to the river").
  • Made You (SOLAR Music Rights Management, Sony ATV Publishing) — This is a claim on "When a felon's not engaged in his employment". I cannot find the piece I'm supposedly copying.
  • The Wedding (Concord Music Publishing, Kobalt Music Publishing) — This is a claim on "Poor wandering one", and, as in the previous case, I cannot find the piece I'm supposedly copying.
  • Oh, Better Far To Live And Die (UMPG Publishing) — This is the title of the Pirate King's song, and correctly identified. But it's part of the original opera and not copyrighted.

So, we have eight claims: two are obviously fraudulent claims on the melodies of the original opera; four are still clearly fraudulent, as they claim that the melody of a version with translated lyrics is somehow new; and two might (let’s be charitable) be failures of Content ID, as I cannot find the pieces that I’m supposedly copying.

I do want to address one point that sometimes comes up for Pirates, which is that there are arrangements which are under copyright. Most famously, Music Theatre International will happily sell you a license for the reorchestration by William Elliott used in Joseph Papp’s 1980s revival, but there are also new orchestrations by Jim Newby and other copyrighted arrangements. A Dublin production was once nabbed for unlicensed use of Elliott’s orchestration (we played from public domain scores). But YouTube’s system only claims to recognize melodies, and the arrangements, while containing copyrightable new material, do not contain substantially new melodies, and Content ID does not claim to be able to distinguish an arrangement from a public domain source composition. So, by Content ID’s own rules, new arrangements should not be entered.

How to fix this? Well, I can’t speak for “Made You” or “The Wedding”, but the majority of these cases are clearly the AI doing its job correctly while working from a database containing fraudulent claims. If you’re familiar with copyright registrations for musical compositions (as they’re done in the United States), you’d know that the typical form records several details for this kind of work, generally including the type of work (say, “Music”) and the claimant (say, “Stephen Sondheim”). For Sweeney Todd, the Demon Barber of Fleet Street, there are a number of registrations: an “Other” registration for a new dramatization (the libretto, claimant Hugh Wheeler, registration PA0000058328), and “Music” registrations for a number of individual numbers as well as the full orchestration (listing as authors Stephen Sondheim for the music and words, and Jonathan Tunick for the orchestrations, PA0000041126).

I’m not saying that the exact format used in US copyright registrations is what needs to be associated with any piece registration in Content ID, but something along these lines would be best. If you are claiming a melody, you should be able to state who the legal authors are and (as applicable, in US law especially) the registration date. If that composer is someone like Stephen Sondheim (whose works are under copyright), I have no issue with the entry. But why allow claimants to register pieces in the Content ID melody-matching scheme which are obviously composed by people whose works are in the public domain, such as Arthur Sullivan?
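As a rough illustration of that idea, here is a minimal Python sketch of a check that a registration system could run before accepting a melody claim. Everything in it is hypothetical: the `Author` and `MelodyClaim` types, the `accept_claim` function, and the "Example Publisher" claimant are invented for this example and say nothing about how Content ID actually works. It also assumes a flat "life plus 70 years" term, which is a simplification; real copyright terms vary by country and by publication date.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: a flat "life + 70 years" term. Real terms differ by
# jurisdiction and publication date, so this is illustrative only.
TERM_YEARS = 70


@dataclass
class Author:
    name: str
    death_year: Optional[int]  # None means still living or unknown


@dataclass
class MelodyClaim:
    title: str
    claimant: str
    composers: List[Author]  # who wrote the melody being claimed


def is_public_domain(author: Author, current_year: int) -> bool:
    """True if the author died more than TERM_YEARS before current_year."""
    if author.death_year is None:
        return False
    return current_year - author.death_year > TERM_YEARS


def accept_claim(claim: MelodyClaim, current_year: int) -> bool:
    """Reject a claim that names no composer, or only public domain composers."""
    if not claim.composers:
        return False
    return not all(is_public_domain(a, current_year) for a in claim.composers)


sullivan = Author("Arthur Sullivan", 1900)
sondheim = Author("Stephen Sondheim", 2021)

overture = MelodyClaim("Overture", "APRA_CS", [sullivan])
# A translated-lyrics version still has Sullivan as the composer of the melody.
catalan = MelodyClaim("No Em Deixis Sola Aquí", "Muserk Rights Management", [sullivan])
# Hypothetical claimant name, used only for contrast.
sweeney = MelodyClaim("Sweeney Todd", "Example Publisher", [sondheim])

for claim in (overture, catalan, sweeney):
    print(claim.title, accept_claim(claim, 2024))
```

Under these assumptions, the two Sullivan claims are rejected and the Sondheim claim is accepted. The point is that the claimant has to name the composer of the melody, so a translation with new lyrics cannot be passed off as a new composition.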

Right now, even if you dispute and win, the entries are not removed from Content ID — the claimants can simply choose to back off on the claim on your video, which, in my experience with fraudulent claims, usually happens when the claimants allow your counter-claim to go ahead at the end of the thirty-day appeal period. But when the next person uploads their production of Pirates, if nothing is changed, these fraudulent claimants will probably get that recording tagged with their claim too. And if the next uploader isn’t wise enough, ad revenues on those videos will be going to the fraudsters. Not a bad business, claiming copyright on public domain pieces, huh?