The AI Code Laundromat: From Technical Optimization to License Piracy


Between “Vibe-coding” and license laundering, generative AI is shaking the very foundations of Open Source. From the Chardet case study to the Ship of Theseus paradox, this is an analysis of an ecosystem where automated rewriting threatens to break the digital social contract and permanently erase the contributor’s footprint.

Executive Brief

The Open Source Era of “Vibe-coding” and License Erosion.

I. Current State: The Breach of the Social Contract

The emergence of Generative AI is transforming software development into a task of supervision rather than writing, a shift captured by the concept of “vibe-coding.” This computational power now allows for the total rewriting of complex libraries in just a few days. It shatters the natural barrier of effort that, until now, protected Copyleft licenses such as the GPL or LGPL.

The Chardet Case Study: A total AI-driven rewrite in five days, followed by a pivot from a strict Copyleft license (LGPL) to a permissive one (MIT). This effectively removes the obligation to “give back to the community.”

II. Strategic Risks and Threats

  1. AI Washing & License Laundering: Using LLMs as “translation machines” to scrub original legal restrictions under the guise of syntactic novelty.
  2. The “Payload Gap”: A study reveals that 94.25% of applications omit the mandatory copyright notices of the models or datasets they use. This lack of documentation creates an invisible infringement risk for end-user companies.
  3. The Legal Impasse (Thaler v. Perlmutter): Code generated without direct human intervention may belong to the public domain. In a legal absurdity, maintainers are applying licenses to rights they do not theoretically possess.

III. Philosophical Perspective: The Ship of Theseus

The debate is not about syntax; it is about identity. If AI replaces every “part” (the code) but retains the “blueprint” (the logic and function), the work remains a derivative work. To claim otherwise is to validate appropriation without contribution, to the detriment of the original intellectual authorship.

IV. Recommendations for the Enterprise

  1. Radical Provenance Audits: Move beyond surface-level analysis to track the “genetic” origin of the code using next-generation SCA (Software Composition Analysis) tools.
  2. Adopting “AI-Proof” Standards: Support the implementation of AI-specific SBOMs (Software Bill of Materials) to guarantee the traceability of training sources.
  3. Prioritizing “FOSS-respecting” Models: Direct technological choices toward transparent and auditable models. This ensures genuine digital sovereignty and long-term legal security.

Conclusion

AI must not become the enemy of Open Source. The permanence of our collective software heritage depends on our ability to intelligently implement rigorous traceability in the face of automated plagiarism.


The AI Code Laundromat: Introduction

Anyone familiar with the Open Source world will have noticed that Artificial Intelligence is disrupting the serenity and trust that the developer community has long enjoyed. AI is becoming an industrial-scale tool for circumventing the licenses that, until now, protected the work of authors. This automation directly clashes with the vision of Linus Torvalds, for whom the strength of Open Source rests on a contract of reciprocity: the right to use the work of others is inseparable from the duty to share one’s own improvements. Using AI to “launder” a license and appropriate logic without giving back to the community breaks the virtuous circle of collaboration that allowed the emergence of our greatest technological standards. Gen AI tools make it possible to generate syntactically new versions of source code while preserving its business logic, to the detriment of the original authors and contributors. This automation undermines the rules and customs of intellectual property and unsettles the legal footing of the entire ecosystem.

The Chardet library, an essential encoding detection tool downloaded over 130 million times per month, has become a textbook case in this regard. In just five days, this project was entirely rewritten using Claude. The new maintainers then took the opportunity to pivot from a strict license (LGPL) to an extremely permissive one (MIT).

To fully grasp the stakes of this legal shift, one must distinguish between the philosophies of these two licenses. The LGPL is a so-called “Copyleft” license that allows the use of code but mandates that any modification or improvement to the library must also be shared under the same license. In contrast, the MIT license is “permissive.” It allows for the reuse, modification, and even integration of the code into closed proprietary software, with the only real obligation being to preserve the original copyright notice and license text. Moving from one to the other effectively removes the obligation to “give back to the community” what was borrowed.

This practice reveals a growing phenomenon known as “AI Washing,” which currently manifests in two distinct ways that are deeply corrosive to the ecosystem. On one hand, there is license laundering, which involves using Large Language Models (LLMs) as literal “translation machines” for code. The goal is to scrub the original legal restrictions under the guise of technical rewriting. On the other hand, we see permissive washing, the deceptive labeling of AI artifacts as “free.” In these cases, models or datasets are presented as “open” even though indispensable legal documents, such as license texts and copyright notices, are entirely missing from the repositories. This lack of a legal payload, the physical absence of the license text within the code or model in favor of a mere label, makes these tools legally toxic for developers and companies. It turns a promise of openness into a true compliance nightmare.

The Technological Aspect

For a new generation of technicians, Generative AI can seem like a magic wand capable of liquidating years of technical debt in record time. We are witnessing the emergence of “vibe-coding.” This is a practice where the developer no longer writes code, but instead supervises the massive and instantaneous production of lines of code by a machine. The Chardet case illustrates this power perfectly. Where a manual refactor would have required months of work, AI made it possible to deliver a version in just a few days, boasting a staggering 48x increase in execution speed.

The technological prowess displayed by AI tools is dissolving the major obstacles that once stood in the way: human cost, time, and the intellectual effort required to rewrite complex code. Not long ago, any desire for a total refactor hit the natural wall of the task’s sheer scale. A lack of resources also immediately discouraged any attempt at license circumvention. Today, that barrier has fallen. With AI, the effort required to radically transform a software’s syntax while preserving its underlying logic has become virtually non-existent.

It is necessary to clear up a major misunderstanding to fully grasp what is at stake. AI does not, in any way, perform what could be considered “Clean Room Design.” Historically, this method relied on an absolute wall between two groups of engineers. A “Dirty Room” team would analyze the original code to extract pure functional specifications, while a “Clean Room” team would write the new code based solely on those instructions, without ever having been exposed to a single line of the source software. This procedure was designed to prove in court that any final similarities were the result of technical necessity rather than plagiarism.

However, an LLM is not an isolated entity, as it carries within it the footprint of its training data, which often includes the very source code one is trying to rewrite. When a developer submits a block of code to an AI for improvement, the algorithm invents nothing. In reality, it acts as a high-level translator. The original logic, structure, and inventiveness persist, even if they appear in a new syntactic form. It is clear, then, that AI does not create. In a sense, it camouflages. Ultimately, changing the words is not enough to erase the invention. Under the varnish of a new syntax, the code remains a derivative work that still belongs to its original author.

This technical ease creates a void in terms of traceability. If the prompt used contains all or part of the original source code, the AI technically cannot guarantee that its output is not simply plagiarism by reformulation. For the community, this means a project could appear visually new while being, in reality, a form of automated infringement. Worse still, provenance auditing would become nearly impossible without extremely sophisticated similarity analysis tools. This computational power does not merely serve technical performance. It must be understood as the engine of a deliberate legal circumvention strategy.
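
To make the audit problem concrete, here is a minimal Python sketch of why a rename-only rewrite defeats a textual diff but not a structural comparison. It is an illustration under stated assumptions, not a description of how any SCA product works: the function names (structural_fingerprint, structural_similarity) and the sample snippets are hypothetical, and the technique only neutralizes identifier renaming. A rewrite that reorders or restructures the logic, as an LLM typically does, would call for far more sophisticated analysis.

```python
import ast
import difflib


class _Normalizer(ast.NodeTransformer):
    """Replace every identifier with a placeholder (v0, v1, ...)."""

    def __init__(self):
        self.names = {}

    def _alias(self, name):
        # The same original name always maps to the same placeholder,
        # numbered by order of first appearance.
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._alias(node.id)
        return node


def structural_fingerprint(source: str) -> str:
    """Dump the syntax tree with all identifiers normalized away."""
    tree = _Normalizer().visit(ast.parse(source))
    return ast.dump(tree)


def structural_similarity(a: str, b: str) -> float:
    """Similarity ratio between the normalized trees of two sources."""
    return difflib.SequenceMatcher(
        None, structural_fingerprint(a), structural_fingerprint(b)
    ).ratio()


# Hypothetical "original" and "rewritten" snippets: same logic, new names.
ORIGINAL = """
def count_high_bytes(data):
    total = 0
    for byte in data:
        if byte > 127:
            total += 1
    return total
"""

REWRITTEN = """
def tally_non_ascii(buf):
    n = 0
    for b in buf:
        if b > 127:
            n += 1
    return n
"""

UNRELATED = "def double(x):\n    return x * 2\n"

text_ratio = difflib.SequenceMatcher(None, ORIGINAL, REWRITTEN).ratio()
struct_ratio = structural_similarity(ORIGINAL, REWRITTEN)
print(f"textual similarity:    {text_ratio:.2f}")
print(f"structural similarity: {struct_ratio:.2f}")
```

In this toy case the two snippets differ as text, yet their normalized syntax trees are identical, so the structural ratio is 1.0. That gap is the kind of signal a provenance audit would look for, keeping in mind that real rewrites are rarely this simple.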

Strategic and Economic Drivers of “Laundering”

While switching to a permissive license via AI certainly serves a technical purpose, a closer look reveals it is equally a deliberate strategy to disengage from Copyleft obligations. The primary lever for this practice is the desire to neutralize the recursive nature of Copyleft licenses, which ensure the persistence of authorship rights. For a corporation, restrictive licenses like the GPL are perceived as a heavy burden because they mandate the release of the source code for any derivative work that is distributed. By using AI to “wash” the original code and re-license it under MIT or Apache, organizations can bypass this risk. In doing so, they grant themselves permission to privatize entire segments of collective intelligence, appropriating them into proprietary products or SaaS offerings to generate profit without ever having to redistribute their own improvements.

Beyond the commercial stakes, AI offers a genuinely new administrative shortcut. Historically, changing the license of a legacy project required the unanimous consent of every contributor. In practice, this process was often impossible to carry out. Here, the machine allows developers to evade this requirement for consensus through a complete syntactic rewrite. Thus, new maintainers claim to start from a technical “blank slate” to avoid the legal burden of recognizing past authors. This practice makes it possible to resolve technical debt at a negligible cost. As illustrated by Chardet’s spectacular performance leap (recall that its speed increased 48-fold), AI does more than just change a legal label. It makes a full refactor nearly instantaneous. What once required months of manual labor is now liquidated in a matter of days. For Big Tech, the stakes are colossal. The goal is to transform nearly the entire Open Source heritage into a raw data set, stripped of its copyright notices, to fuel proprietary models.

Redefining Intellectual Property

The situation borders on the absurd when these practices are confronted with current case law, specifically the Thaler v. Perlmutter case. In this instance, the United States Copyright Office maintains that works generated without direct human intervention cannot be protected by copyright. This results in a total legal impasse. If AI-produced code belongs to the public domain by default due to the lack of a human author, new maintainers theoretically lack the standing to apply an MIT license to it. We are entering a reality where it becomes possible to grant or restrict rights that one does not technically possess.

Beyond these theoretical debates, the reality on the ground reveals a massive erosion of legal safeguards. A study from Queen’s University highlights an alarming phenomenon where 96.5% of datasets and 95.8% of models labeled as “permissive” omit the mandatory license text. As a reminder, “permissive” refers to licenses (such as MIT or Apache) that allow for broad reuse, including commercial use, with the primary requirement being the obligation to credit the original author and include the license text. Unfortunately, we see here that the law is fading behind a simple marketing metadata tag with no real legal value. For the end user, this “Payload Gap” creates a false sense of security. They believe they are handling an open tool, when in fact they are exposing themselves to infringement risks due to an inability to prove the chain of title.
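
As a rough illustration of what checking for the “Payload Gap” could look like in practice, the Python sketch below compares a declared license label with what is physically present in a repository directory. Everything in it is an assumption made for the example: the list of file names, the copyright regular expression, and the helper names audit_payload and has_payload_gap are hypothetical, and the check says nothing about whether the license text actually matches the label.

```python
import re
import tempfile
from pathlib import Path

# Assumed conventional file names; real projects may use others.
LICENSE_FILENAMES = ("LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING", "NOTICE")
# Loose heuristic for a copyright line such as "Copyright (c) 2024 ...".
COPYRIGHT_PATTERN = re.compile(r"copyright\s+(\(c\)|©|\d{4})", re.IGNORECASE)


def audit_payload(repo_dir, declared_license):
    """Report whether the declared label is backed by actual legal text."""
    root = Path(repo_dir)
    license_files = [
        root / name for name in LICENSE_FILENAMES if (root / name).is_file()
    ]
    texts = [
        path.read_text(encoding="utf-8", errors="replace")
        for path in license_files
    ]
    return {
        "declared_license": declared_license,
        "has_license_text": bool(license_files),
        "has_copyright_notice": any(COPYRIGHT_PATTERN.search(t) for t in texts),
    }


def has_payload_gap(report):
    """A gap exists when a label is declared without its legal payload."""
    complete = report["has_license_text"] and report["has_copyright_notice"]
    return bool(report["declared_license"]) and not complete


# Demo on a throwaway directory: first with a bare label, then with a file.
with tempfile.TemporaryDirectory() as tmp:
    before = audit_payload(tmp, "MIT")
    (Path(tmp) / "LICENSE").write_text(
        "MIT License\n\nCopyright (c) 2024 Example Author\n", encoding="utf-8"
    )
    after = audit_payload(tmp, "MIT")

print(has_payload_gap(before), has_payload_gap(after))  # True False
```

A label on its own passes no test here: only the presence of the license file and a copyright line closes the gap, which mirrors the distinction the study draws between a metadata tag and an actual legal payload.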

Toward “AI-Proof” Licenses?

To counter this erosion of legal protections, the open-source community is beginning to build new ramparts capable of restoring a form of legal guarantee. These new types of licenses, dubbed “AI-proof,” aim to close the loopholes exploited by automated laundering.

One of the primary levers is to legally neutralize the “Clean Room” myth. New clauses could explicitly stipulate that any code generated or translated by an AI model exposed to the original source code during its training constitutes, by nature, a derivative work. Such a provision would prohibit the use of an LLM as a mere “syntactic engine” to bypass Copyleft obligations. In this scenario, we would simply restore the legal lineage that the AI sought to erase by linking the output to the training source.

We are witnessing a major strategic shift. This change is significant because the focus is no longer just on who has “the right to copy the code”, but rather on “what they are allowed to do with it”. This is the core purpose of RAIL (Responsible AI Licenses). Unlike traditional licenses that govern how code is copied, RAIL licenses introduce behavioral rules. They impose limits on how the model can be used (for instance, prohibiting surveillance or disinformation) and require these constraints to propagate to all downstream applications. It is a compelling way to regain control over the technology’s ultimate purpose rather than just its text.

A technical counter-offensive is also necessary. To combat the “Payload Gap,” traceability standards such as AI-adapted SBOMs (Software Bill of Materials) are currently under development. The goal is to force models to embed an unforgeable digital identity of their sources. In this scenario, a tool’s inability to provide this proof of provenance would render any commercial exploitation of the generated code legally void. In this context, traceability becomes a sine qua non for legality.
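
What a provenance entry in such a bill of materials might contain can be sketched in a few lines of Python. This is a hypothetical record layout invented for illustration, not the schema of SPDX, CycloneDX, or any standard under development: the field names and the helpers provenance_entry, verify_entry, and missing_fields are assumptions. Note also that a bare content hash only makes tampering detectable; an identity that is truly hard to forge would additionally require a cryptographic signature, which is left out here.

```python
import hashlib
import json


def provenance_entry(name, content: bytes, license_id, copyright_notice):
    """Build a hypothetical provenance record for one source artifact."""
    return {
        "name": name,
        "sha256": hashlib.sha256(content).hexdigest(),
        "license": license_id,
        "copyright": copyright_notice,
    }


def verify_entry(entry, content: bytes) -> bool:
    """Check that the content still matches the recorded fingerprint."""
    return hashlib.sha256(content).hexdigest() == entry["sha256"]


def missing_fields(entry):
    """List the legally relevant fields that are absent or empty."""
    return [key for key in ("sha256", "license", "copyright") if not entry.get(key)]


# Hypothetical source artifact and its record.
SOURCE = b"def detect(data):\n    return 'utf-8'\n"
entry = provenance_entry(
    "example-lib/detect.py", SOURCE, "LGPL-2.1-or-later", "Copyright (c) Example Author"
)

print(json.dumps(entry, indent=2))
print("intact:", verify_entry(entry, SOURCE))
print("after edit:", verify_entry(entry, SOURCE + b"# changed"))
print("missing:", missing_fields(entry))
```

The design choice worth noting is that the license and copyright fields travel with the fingerprint: a record that carries a hash but no notice would itself be flagged as incomplete, which is the traceability requirement described above expressed as a simple check.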

Strategic Risks and Shifts for the Enterprise

For decision-makers and legal counsel, integrating AI-“laundered” code is no longer a simple matter of technical monitoring, but a major operational risk. Incorporating a component whose license has been altered by a syntactic engine is equivalent to injecting an invisible vulnerability into the heart of the software supply chain. During a deep audit, for example ahead of a fundraising round, a merger and acquisition (M&A), or an initial public offering (IPO), the inability to prove the legitimacy of a license could invalidate the very value of a technological asset. It is therefore becoming vital to use next-generation Software Composition Analysis (SCA) tools, capable of tracking not just library names, but the “genetic” origin of the code itself.

This threat extends beyond the corporate world to touch the very essence of Free Software. According to Bruce Perens, widely regarded as one of the founding fathers of Open Source, this capacity for industrial-scale cloning by AI could well signal the death knell of the current model. In his keynote “What comes after open source?”, he ironically states: “We’re an excellent welfare program for corporations. Our users are the richest companies in the world.” We must acknowledge that if any actor can appropriate a Copyleft project and “turn” it proprietary by running it through an LLM, then licenses like the GPL lose their protective power. In this case, the social contract ensuring that everything taken from the community must be returned to it literally shatters. Open Source becomes a free reservoir for proprietary software.

Yet, amid this chaos, an opportunity is emerging for the most visionary entrepreneurs. Since lines of code are becoming an abundant commodity, easy to generate, the value of software may no longer reside in the code itself, but rather in the quality of its architecture, its maintenance, and above all, its production ethics. The future likely belongs to “FOSS-respecting” AI models. Unlike “opaque box” models, a “FOSS-respecting” approach does more than just comply with the law (such as the European AI Act). It goes further by guaranteeing true digital sovereignty. This specifically involves using models and datasets whose provenance is auditable. The idea here is to verify that no copyrights were violated during training. For an enterprise, choosing these tools becomes a major strategic asset. It represents a commitment to transparent technology. It establishes that compliance is community-verifiable, which eliminates any hidden legal liabilities linked to code “laundering.” This becomes a selling point. The company can market a clean, transparent technology, free from any concealed legal baggage.

Ethical Perspective

Beyond lines of code and contractual clauses, AI raises a fundamental ethical question that brings us back to a millennia-old inquiry. This is the philosophical thought experiment concerning the notion of identity, known as the Ship of Theseus. If every single plank of Theseus’s ship is replaced one by one until none of the original parts remain, is it still the same ship?

When applied to code, the dilemma is striking. We realize that AI can rewrite every line of syntax, optimize every function, and rename every variable, effectively replacing the linguistic “planks” of the original text. Yet, if the logical structure and functional architecture remain those of the initial work, can we truly claim to have created a new legal object? 

This inquiry finds a direct echo in recent research, notably the study “A Ship of Theseus: Curious Cases of Paraphrasing in LLM-Generated Texts,” which highlights a disconnect between form and substance. When an AI paraphrases, we observe a prioritization of style over content. In other words, while the original author’s stylistic signature fades in favor of a generic imprint unique to the Gen AI model, the semantic content itself often remains intact. This phenomenon questions the “continuity of identity” of the work. According to this theory, as long as the fundamental attributes persist, whether it be the logical design or the function of the code, the object retains its original identity. We can see an analogy here with Da Vinci’s Mona Lisa. If a contemporary artist were to offer a version using fluorescent pigments or spray-paint techniques, the Renaissance “touch” would disappear in favor of a modern style. Yet, the work would still be recognized as Leonardo da Vinci’s. Why? Because the identity of the painting does not reside solely in the type of paint used, but in its geometric composition, the precise tilt of the gaze, the pyramidal structure of the subject, and so on. In this view, according to the study, authorship should remain with the initial creator as long as their unique concepts are preserved. Therefore, AI rewriting is merely a syntactic mask. It follows that one cannot justify a change in licensing or ownership based on such a transformation.

On the other hand, if one considers that the active act of transformation is what defines an author, then AI could be seen as the true creator of this new version. This ambiguity underscores that traceability has become a sine qua non for determining whether we are facing a mere writing aid or a true substitution of identity. For defenders of Open Source ethics, the answer is clear. They claim that changing the planks (the syntax) does not change the ship’s blueprints (the intellectual property), just as a Mona Lisa painted in pink remains the work of Da Vinci. In this light, AI facilitates appropriation without contribution, allowing one to claim ownership of a “new” vessel when they have merely masked the work of another. Thus, AI automates reformulation, offering the possibility to plunder a project’s intelligence without ever nourishing the ecosystem. This effortless “translation” dehumanizes source code, its authorship, and intellectual property. It transforms code into a raw commodity that can be appropriated through a simple prompt, ultimately shattering the principle of reciprocity that lies at the very heart of Open Source.

This techno-legal drift is giving rise to a crisis of attribution. The figures from the study “Permissive-Washing in the Open AI Supply Chain: A Large-Scale Audit of License Integrity” speak for themselves. According to the researchers, today, barely 5.75% of applications preserve the copyright notices of the models or datasets used. By erasing these mentions, we are not merely violating a legal rule, we are breaking the chain of recognition upon which the careers and reputations of developers depend. Now, code is being generated on such a massive scale that the original author tends to be drowned in the depths of machines and models. This systemic erasure risks discouraging the contributors of tomorrow, who may legitimately ask themselves: why offer their work to the community if an AI is to eventually digest it and spit it back out under an anonymous or, worse, proprietary banner?

Open Source DNA at the Core of Models

It is worth recalling a fundamental truth, often brushed aside in the marketing discourse of AI labs. By its very nature, Open Source Software forms the backbone of every LLM. Since these models have been trained by blindly “scraping”, without authorization and even less remuneration, everything accessible online, open code is not merely a source of inspiration; it is, in fact, embedded in the very structure of the system. This means that, in essence, Open Source is an integral part of the model, regardless of the denials or semantic gymnastics of AI developers. As long as a developer cannot provide irrefutable proof of a completely “clean” training dataset (a certification that, in practice, will never happen!), the model remains fundamentally tied to the community work it has absorbed.

Conclusion

“AI Washing”, whether through license laundering or permissive washing, is not merely a technical issue or a debate for legal experts. It is a profound breach of the digital social contract that has bound creators and users together since the dawn of Open Source. By enabling the automation of appropriation without contribution, Generative AI risks transforming one of the finest examples of human collaboration into an anonymous data reservoir for proprietary software. Faced with this phenomenon, the question arises: are we forced to resign ourselves to a state of affairs where the notion of authorship must bow to the power of Gen AI? Or rather, will we be able to impose tangible traceability through “AI-Proof” mechanisms? The answer to this question will determine whether AI becomes a vehicle for sharing and innovation, or whether it acts as the catalyst for the end of Open Source by hijacking human contribution. For companies and developers alike, choosing transparent and respectful (FOSS-respecting) tools is no longer an ethical option. It is becoming a strategic necessity to preserve trust and the long-term viability of our software heritage.

The Ship of Theseus experiment leaves us with our backs against the wall. We must ask ourselves: if an entire ship is rebuilt using the planks removed from the original, which one is the true ship? By using AI to “launder” syntax, we create nothing, we simply displace ownership. If we accept that the machine can erase the origin of the wood to preserve only its shape, the very concept of sharing is called into question.

A sustainable ecosystem cannot be built on “laundered” code. By separating logic from its license, AI does not liberate software, it dispossesses it of its identity and its rights. The true danger is not that the machine will replace the developer, but that it will serve as a screen for unapologetic appropriation within a “shadow legal system.” Protecting attribution is not about slowing down progress, quite the opposite, it is about ensuring that the reservoir of common knowledge is not drained by the very people who refuse to replenish it.

Sources & References: