Will a certification that an AI model doesn’t infringe copyright solve one of the fundamental problems of this powerful technology?

As we have reported previously in a series of articles on the topic[1], whilst AI Large Language Models (LLMs) such as ChatGPT and image generators such as DALL·E 2 can be powerful tools with a wide range of applications, they have a number of flaws. One of these is the way they are trained and the data used to train them.

Indeed, training AI models on copyrighted data has been a charged issue in generative AI, and a number of artists and authors have sued several AI companies for copyright infringement. The New York Times has also filed a lawsuit against OpenAI and Microsoft, alleging that they infringed its copyright when training their GPT models. Some proposed legislation currently making its way through Congress would require AI companies to disclose where they obtained their training data. That information could then be used to let copyright holders know whether their work had been used without their permission.

To address this issue pre-emptively, an organisation called Fairly Trained, founded by Ed Newton-Rex, former Vice President of Audio at Stability AI, has developed a certification label that companies can use to show that they obtained permission to use copyright-protected training data.

Its first accreditation, which it calls the “Licensed Model certification”, will be awarded to companies that license protected data to train their models. Fairly Trained said in a blog post that it has already certified nine generative AI companies working in image, music and voice generation: Beatoven.ai, Boomy, BRIA.ai, Endel, LifeScore, Rightsify, SOMMS.AI, Soundful and Tuney.

In that post, Fairly Trained explains that it will not issue the certification to developers that rely on the fair use argument to train their models, and that:

“as our first certification, we don’t expect it to solve all the issues for creators that generative AI training raises. But we hope that it highlights that there is a meaningful difference between generative AI companies that license training data and those that use data without consent.”

Whether this certification takes off and becomes widely used remains to be seen; however, the author is pleased to see that the problem is not only being recognised but also addressed.


[1] https://www.scintilla-ip.com/chatgpt-the-good-the-bad-and-the-ip-ugly/

At Scintilla, we help innovative companies get a grip on their intellectual property. Our unique commercial approach combines registration of patents and trade marks with strategic input so that IP can be a springboard for business growth. If you would like to discuss your IP needs, do contact us or book a free initial consultation!