I think Congress and the new Administration should craft a modern statutory prong of fair use explicitly tailored to AI training data access, focused specifically on models that provide free or open-access tiers or are open source. This targeted approach recognizes that existing copyright frameworks, rooted in decades-old standards, fail to address a transformative technology that is critical to economic growth, societal advances, and the greater good. While these models use copyrighted works in their training, they do so in ways that are highly innovative and do not directly compete with the original works. The carve-out could be part of a broader "new deal on data" that pairs expanded fair use protection with new obligations around transparency and limited licensing requirements – creating a balanced framework that promotes continued advancement in this essential field while providing appropriate safeguards and accountability.
Interested in learning more and considering this? I asked OpenAI Deep Research to help me flesh this out and am generally pleased with the results below.
Artificial Intelligence (AI) is increasingly central to economic growth, technological progress, and societal advancement. Modern AI systems – especially large language models and other generative AI – rely on vast quantities of training data, much of which is drawn from copyrighted texts, images, and media. Yet U.S. copyright law has not been updated to clearly accommodate these new uses. This gap has led to lawsuits and uncertainty as AI developers push the boundaries of “fair use” in training their models (Training Generative AI Models on Copyrighted Works Is Fair Use — Association of Research Libraries) (AI, copyright and fair use: What you need to know | TechTarget). Current fair use doctrine, while flexible, was not explicitly designed for the mass ingestion of copyrighted works by machine learning algorithms. Without legal clarity, America’s AI innovators face a chilling effect: the risk of litigation or the need to negotiate thousands of licenses could stunt AI development at a pivotal moment (Training Generative AI Models on Copyrighted Works Is Fair Use — Association of Research Libraries).
Updating U.S. copyright law with a statutory fair use carve-out for AI training data is a timely solution. This carve-out would explicitly permit the use of copyrighted materials in AI model training (sometimes called text and data mining or TDM) under defined conditions, removing ambiguity and legal risk. The payoff of such reform would extend far beyond the tech sector. Strong AI capabilities can drive breakthroughs in medicine, education, energy, and more – areas of broad societal importance. Ensuring AI developers can train on comprehensive datasets (which inevitably include some copyrighted works) is essential for AI systems to be accurate, unbiased, and useful across domains. In short, data is the fuel of the AI economy, and fair use is the legal engine that can allow that fuel to power innovation.
Leading democracies have begun to recognize this. For example, Japan amended its copyright law in 2019 to permit using a work "in any way… necessary" for information analysis (such as AI training), so long as the use is not for human enjoyment of the work's expression and does not "unreasonably prejudice the interests of the copyright owner" (AI Training and Copyright Infringement: Solutions from Asia | TechPolicy.Press). Singapore introduced a similar exception in 2021 to support its "Smart Nation" AI ambitions (AI Training and Copyright Infringement: Solutions from Asia | TechPolicy.Press). These reforms explicitly balance innovation needs with reasonable safeguards for rights holders. The United States now has the opportunity to craft its own "fair use for AI" exception – setting a model for global adoption consistent with American values and economic interests. Doing so would cement U.S. leadership in AI while upholding a tradition of flexible copyright exceptions that fuel new technologies (just as fair use enabled search engines and web indexing in the past (Training Generative AI Models on Copyrighted Works Is Fair Use — Association of Research Libraries)).
This brief outlines why a statutory update for AI training data is urgently needed and how it can be accomplished. We examine the economic and strategic stakes, the key stakeholders involved, and the lessons from recent global AI policy shifts. We then evaluate the legal trade-offs and propose a “New Deal on Data” – a policy framework that marries AI progress with fair treatment of content creators. Finally, we recommend concrete safe-harbor provisions and actions for U.S. lawmakers and the Administration to realize this vision by 2026, positioning the U.S. as the world’s AI leader in a manner that other nations can follow.
The economic upside of AI innovation is enormous – and the United States must ensure it captures a leading share. Multiple analyses predict AI will boost global GDP by tens of trillions of dollars over the coming decade. PwC, for instance, estimates that AI could contribute up to $15.7 trillion to the global economy by 2030, more than the current output of China and India combined (PwC's Global Artificial Intelligence Study | PwC). North America alone is forecast to see a 14.5% increase in GDP by 2030 due to AI adoption (roughly an additional $3–4 trillion in output), accounting for a significant portion of total global gains (PwC's Global Artificial Intelligence Study | PwC). McKinsey research similarly finds that AI could deliver around $13 trillion in additional output by 2030, and that countries leading in AI could capture 20–25% more economic benefits than today, while laggards may realize only half of that potential. In short, leadership in AI is poised to confer a major competitive advantage in terms of productivity, innovation, and growth.
America's historical edge in technology is on the line. The global AI race is already underway, with intense competition from China and the European Union (How the world stopped worrying and learned to love AI – POLITICO). The past two years have illustrated this vividly. The U.S. is mobilizing massive investment – for example, a recently announced $500 billion private-sector plan, unveiled at the White House, to build out AI computing infrastructure, signaling determination to win the AI race (How the world stopped worrying and learned to love AI – POLITICO). Yet rivals are moving just as fast: Chinese firms have demonstrated they can develop advanced AI models at surprisingly low cost, jolting markets and challenging the notion of an insurmountable U.S. lead (How the world stopped worrying and learned to love AI – POLITICO). The European Union, initially focused on regulating AI risks, is now pivoting to an innovation-friendly stance, announcing tens of billions of euros for AI development and pledging to "cut red tape" to spur AI deployment (How the world stopped worrying and learned to love AI – POLITICO). As European Commission President Ursula von der Leyen put it, "Global AI leadership is still up for grabs… Europe is open for AI and for business" (How the world stopped worrying and learned to love AI – POLITICO). In other words, our allies and competitors alike recognize that whoever best enables AI growth will shape the future economy.
The United States cannot afford to lag in creating a policy environment that unleashes AI innovation. A statutory fair use carve-out for AI training data would directly support America's AI sector by reducing legal friction and costs. By lowering barriers to entry (so startups and researchers can train AI systems without fear of crippling lawsuits), it would democratize AI development rather than benefit only the tech giants. This translates into more domestic AI R&D, more startups, and more talent gravitating to the U.S. ecosystem. In turn, stronger AI capabilities will boost industries from manufacturing to healthcare. For example, better AI trained on comprehensive data could lead to medical discoveries, smarter infrastructure, personalized education tools, and productivity gains across the board – each carrying significant economic benefits.
There is also a strategic national interest at stake. Advanced AI has implications for national security, defense, and global influence. U.S. leadership in AI means having the best tools for intelligence, cybersecurity, and military decision-making, as well as setting global norms for AI ethics and governance. If the U.S. imposes overly restrictive data usage rules while other countries allow freer innovation, we risk ceding ground in both economic and strategic domains. As one analysis framed it, "The race is already on among companies and countries" to harness AI, and managing the trade-offs properly will determine who captures AI's full potential. In this race, agile and innovation-friendly policy (including intellectual property rules) is as important as investment levels.
In summary, the economic imperative for a fair use carve-out is clear: it would help the U.S. maximize a multi-trillion-dollar opportunity and maintain its competitive edge. By embracing an updated legal framework that reflects AI’s realities, Washington can signal that the U.S. intends to remain the global hub for AI development. Just as importantly, this move would encourage other nations to follow suit, promoting an open global data environment rather than a patchwork of restrictive regimes. American leadership on this issue can ensure that AI innovation and creativity thrive together, fueling prosperity and cementing U.S. leadership in the digital economy.
Achieving a statutory fair use carve-out for AI training data will require aligning a diverse array of stakeholders in both the public and private sectors. Each has distinct interests and influences in this policy domain:
- Congress (Legislative Branch) – Key Role: Enacting the carve-out into law. Both the House and Senate (especially the Judiciary Committees responsible for copyright law) are crucial players. Recently, Congress has shown heightened interest in AI and copyright issues: in 2024, lawmakers introduced bills like the Generative AI Copyright Disclosure Act (Rep. Adam Schiff) to mandate transparency about training data (Copyright in Congress: 2024 Year in Review | Copyright Alliance), and the TRAIN Act (Sen. Peter Welch) to allow subpoenas for identifying copyrighted works used in AI training (Copyright in Congress: 2024 Year in Review | Copyright Alliance). These efforts indicate Congress's engagement, although they leaned toward protecting rightsholders' interests. To pass a carve-out, Congress must be convinced that enabling AI innovation is a national priority that can be balanced with reasonable safeguards. Key individuals may include members of the intellectual property subcommittees, tech-forward legislators, and those from states with strong tech industry presence. Bipartisan support will be important – framing the issue as one of American innovation and competitiveness (rather than weakening copyright) can help achieve that.
- Executive Branch & Administration – Key Role: Setting the policy agenda and implementing supportive measures. The White House can champion the cause by incorporating the AI fair use exception into its tech policy platform and national AI strategy. Administration officials (e.g., in the Office of Science and Technology Policy and the National Economic Council) have influence in highlighting AI's importance to the economy. The Department of Commerce (including agencies like the National Institute of Standards and Technology and the National Telecommunications and Information Administration) and the U.S. Trade Representative also have stakes: Commerce can guide industry standards and best practices for AI, while USTR will want to ensure any U.S. legal approach can be exported in trade agreements and is compatible with international obligations. Additionally, the U.S. Copyright Office – although housed in the legislative branch – works closely with the Administration on policy studies and could provide guidance. (The Copyright Office launched a comprehensive initiative in 2023–24 to examine AI's impact on copyright (Looking Forward: The U.S. Copyright Office's AI Initiative in 2024) and is expected to report its findings, which can inform the Administration's stance.) A coordinated executive stance in favor of a balanced carve-out will be critical, both to push Congress and to manage implementation details (such as directing agencies to develop guidelines or regulations under the new law).
- Judiciary (Courts) – Key Role: Interpreting copyright law in the interim and after any reform. In the absence of a new statute, federal courts are effectively deciding whether AI training is fair use on a case-by-case basis. Authors, artists, and media companies have already filed multiple lawsuits against AI developers, alleging copyright infringement for unauthorized training on their works (AI, copyright and fair use: What you need to know | TechTarget). For example, The New York Times sued OpenAI and Microsoft in late 2023 for using millions of its articles to train an AI model, fearing the AI would replicate and displace its content (AI, copyright and fair use: What you need to know | TechTarget). In April 2024, eight major newspapers (including the Chicago Tribune and New York Daily News) jointly filed a lawsuit claiming OpenAI "misused" their reporting to train its systems (AI, copyright and fair use: What you need to know | TechTarget). These cases are now winding through the courts. While some legal precedents (such as Authors Guild v. Google) suggest that transformative uses like search indexing or analysis are fair use (Training Generative AI Models on Copyrighted Works Is Fair Use — Association of Research Libraries), it remains uncertain how courts will rule on AI training specifically. The judiciary is a stakeholder in that its decisions could either reinforce or undermine the need for a statutory carve-out. A clear law from Congress would guide the courts and reduce long-term litigation. Until then, however, judges effectively have to balance innovation vs. copyright on their own – making it all the more important for the legislative and executive branches to step in with policy guidance.
- AI Industry and Developers – Key Role: Driving innovation and advocating for legal clarity. This group includes both the tech giants (OpenAI, Google, Microsoft, Meta, Amazon, IBM, and others investing heavily in AI) and startups and research labs developing AI models. Their interest is in having access to large datasets to train AI without constant fear of lawsuits or exorbitant licensing costs. Notably, leading AI firms have argued that using publicly available data for training is a transformative fair use. OpenAI's stance (as stated in its legal filings and blog posts) is that "training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents" (Training Generative AI Models on Copyrighted Works Is Fair Use — Association of Research Libraries). The industry has a strong incentive to push for a statutory safe harbor that codifies this interpretation. Big AI firms are likely to be influential in lobbying for such an exception, given their resources and the strategic importance of the issue. At the same time, they must navigate public relations and ethical concerns – appearing to steamroll content creators could backfire. Thus, many AI developers express willingness to work on reasonable compromises (such as attribution systems, opt-out mechanisms, or sharing certain benefits of AI) in exchange for legal certainty. The AI industry's voice in this debate will highlight how overly restrictive copyright rules could hamstring not only their products but also the broader competitiveness of the U.S. tech sector.
- Content Creators and Copyright Holders – Key Role: Ensuring their rights and interests are respected. This camp is diverse, including book authors, journalists and news organizations, visual artists, photographers, musicians, academic publishers, and media companies. They are the providers of the content that AI models may ingest. Their primary concern is that a broad exception for AI training could undermine their copyright protections and potential licensing revenues. Many fear a scenario where AI systems draw freely from their work to generate output that substitutes for the original, without compensation or credit. This stakeholder group has been vocal: organizations like the Authors Guild, News Media Alliance, music industry groups, and the Copyright Alliance have pushed back on unrestricted data mining. The New York Times' lawsuit is one high-profile example, and authors like Sarah Silverman joined class-action suits claiming that AI models are effectively derivative works based on their books (AI, copyright and fair use: What you need to know | TechTarget). The creative industries have advocated for measures such as opt-out rights, licensing frameworks, or at least transparency requirements so they know if their work was used. They generally support legislation like the Schiff disclosure bill and the TRAIN Act, which give them tools to track or challenge AI training on their content (Copyright in Congress: 2024 Year in Review | Copyright Alliance). Any "new deal" on data will need to bring this group on board by assuring that the carve-out isn't a free pass for wholesale exploitation. Many content stakeholders may accept an AI fair use exception if it comes with fair reciprocity – for instance, guarantees that AI outputs won't just regurgitate their works, or mechanisms where they could benefit from the AI (through attribution, access, or revenue-sharing opportunities). Engaging constructively with this group is crucial to avoid a political backlash that could stall reform.
- Consumers, Researchers, and Society at Large – Key Role: Ultimate beneficiaries of AI innovation and a balanced copyright system. While not always represented by formal lobbyists, the public interest is a key stakeholder. This includes academic researchers and librarians (who have advocated strongly for keeping AI training uses within fair use (Training Generative AI Models on Copyrighted Works Is Fair Use — Association of Research Libraries)), educators, startup entrepreneurs, and everyday users of AI-powered products. Their interest lies in both the advancement of AI capabilities and the continued production of quality creative content. From their perspective, the ideal outcome is an ecosystem where AI can learn from a wide array of knowledge (for more accurate and unbiased results) while human creators still have incentives to create new works. Notably, the Library Copyright Alliance (representing libraries and universities) has argued that treating AI training as fair use is "essential to protecting research", warning that if text/data mining were limited only to public domain works, studies of contemporary culture and society would be severely hampered (Training Generative AI Models on Copyrighted Works Is Fair Use — Association of Research Libraries). These voices highlight that overly strict copyright controls could hurt knowledge advancement and innovation. On the flip side, consumers also value the fruits of creative industries (books, journalism, art) and would not want them to be undermined. Thus, public interest groups tend to support a balanced solution – one that enables AI-driven progress in science, education, and services, while also fostering a healthy creative economy. Their perspective reinforces the need for a nuanced policy rather than absolutist positions.
In sum, forging a statutory fair use carve-out for AI training data requires a coalition approach. Congress and the Administration must work in tandem, informed by court developments and stakeholder input, to craft a compromise that the AI industry, rightsholders, and public interest groups can all live with. The next section examines how recent global developments underscore the possibility – and necessity – of recalibrating policy in favor of innovation, which sets the stage for finding common ground at home.
Global attitudes toward AI policy have undergone a dramatic pivot in the past two years. Nowhere is this clearer than in the contrasting atmospheres of two high-profile international gatherings: the UK's AI Safety Summit at Bletchley Park (November 2023) and the AI Action Summit in Paris (February 2025). The former embodied a cautious, regulation-heavy approach, while the latter showcased a pro-market optimism about AI's potential. This shift in tone and priorities provides important context – it reflects a growing consensus that innovation and economic opportunity now demand as much attention as risk mitigation. U.S. policymakers can draw on this changing global mindset to bolster the case for pro-innovation reforms like a fair use carve-out for AI training.
Bletchley Park, 2023 – A Cautionary Approach: Hosted by the United Kingdom, the Bletchley Park summit was the world’s first major gathering to address AI risks at the highest levels. Its focus was almost exclusively on AI safety, ethics, and governance. Government officials, academics, and tech company leaders convened to discuss concerns like AI alignment, existential risks of advanced AI, and frameworks to prevent misuse. The tone was one of precaution – effectively putting guardrails before acceleration. Major tech companies in attendance publicly “committed to establishing safety frameworks” for AI development (How the world stopped worrying and learned to love AI – POLITICO). The resulting Bletchley Declaration emphasized principles for responsible AI, voluntary commitments by firms to external testing, and further international cooperation on AI safety research. Notably, issues of data access or intellectual property were not central to Bletchley’s agenda – the declaration did not mention IP at all (AI Training and Copyright Infringement: Solutions from Asia | TechPolicy.Press). One critique later raised was that the summit was “100 percent about safety” with little to say about how to actually maximize AI’s benefits (How the world stopped worrying and learned to love AI – POLITICO). In effect, Bletchley Park symbolized the initial global impulse to “pause and regulate” – ensuring that AI wouldn’t run ahead of our ability to manage its risks.
Paris, 2025 – An Action-Oriented, Pro-Innovation Stance: Fast forward a year and a few months, and the narrative had evolved. In early 2025, France hosted the Paris AI Action Summit, deliberately branding it an "Action" summit to signal a departure from the cautious tone of Bletchley (How the world stopped worrying and learned to love AI – POLITICO). Leaders from around the world, including tech-positive voices like U.S. Vice President J.D. Vance and French President Emmanuel Macron, struck an upbeat note. Safety concerns took a backseat to optimism – as Politico reported, "Safety fears, top of mind at two previous AI summits in the U.K. and South Korea, barely featured" in Paris, and the final declaration mentioned "safety" only three times (How the world stopped worrying and learned to love AI – POLITICO). Instead, the focus was on investment, deployment, and light-touch regulation. Politicians touted massive funding pledges: Macron unveiled a plan to invest €109 billion in AI in the coming years, and the EU announced a multi-billion euro AI investment package (How the world stopped worrying and learned to love AI – POLITICO). The prevailing message was captured by Vice President Vance's remark: "I'm not here to talk about AI safety, I'm here to talk about AI opportunity." (How the world stopped worrying and learned to love AI – POLITICO) This encapsulated the summit's pro-market optimism.
The Paris summit’s outcomes underscored a strategic pivot: from viewing AI primarily as a potential threat to be contained, to viewing it as a competitive opportunity to be seized. World leaders highlighted the need for agility and avoiding over-regulation. The U.S. and U.K. notably declined to sign the Paris summit’s final declaration – reportedly because it glossed over safety – yet they were fully engaged in the proceedings (How the world stopped worrying and learned to love AI – POLITICO). Indeed, behind the scenes the U.S. and like-minded countries had influenced the agenda to emphasize innovation. European officials, too, signaled a change in priorities; EU President von der Leyen used the event to declare that deploying AI for growth is now a top priority, pivoting from the EU’s earlier focus on strict regulation (How the world stopped worrying and learned to love AI – POLITICO). She assured that Europe would become more “innovation-friendly” and “open for AI and for business” (How the world stopped worrying and learned to love AI – POLITICO), aligning more closely with the U.S. stance that regulations should not “strangle the burgeoning industry.” (How the world stopped worrying and learned to love AI – POLITICO) In short, Paris represented a balancing of the scales: acknowledging safety still matters, but action and optimism must drive the next phase of global AI development.
This whiplash from Bletchley to Paris demonstrates a global shift in AI policy priorities – one that favors the economic and strategic importance of AI innovation. For U.S. policymakers, this shift is both validating and instructive. It validates the idea that fostering AI growth (through steps like easier access to data) is now seen as crucial by our allies and competitors. The world is moving toward a consensus that while we manage AI’s risks, we must also aggressively pursue its rewards. Being overly cautious carries its own risk: falling behind in technology and competitiveness. The Paris summit’s optimistic tone provides political cover and momentum for pro-innovation policies such as a fair use carve-out. If France, the EU, and others are loosening their regulatory grip to encourage AI, the U.S. should certainly ensure its domestic policies aren’t the bottleneck impeding AI advancement.
At the same time, the contrast highlights the need for balance. The journey from Bletchley’s caution to Paris’s enthusiasm suggests that the ideal approach lies in between: enable AI innovation while still upholding essential protections. The U.K.’s delegate in Paris noted that after focusing on safety, it was time to “put that safety to work” for growth (How the world stopped worrying and learned to love AI – POLITICO). The U.S. can embody that balanced philosophy in its copyright policy: craft a carve-out that unlocks AI’s potential (in line with Paris’s spirit) but with guardrails to prevent clear harms (addressing Bletchley-style concerns).
In conclusion, the world’s mood on AI has shifted toward opportunity – a trend the U.S. should lead, not resist. Embracing a statutory fair use exception for AI training data is a logical step in this direction, signaling that the U.S. intends to champion innovation on the global stage. The next sections will delve into how to execute this in practice, examining the trade-offs involved and proposing a framework that secures both innovation and the interests of creators – effectively bridging the gap between Bletchley Park and Paris in our domestic policy.
Designing a fair use carve-out for AI training data requires navigating several legal and policy trade-offs. Lawmakers must balance the benefits of accelerating AI innovation against the risks of undermining copyright protections, all while considering various policy mechanisms to achieve an optimal outcome. Below, we examine the key trade-offs and options:
Benefits of a Fair Use Carve-Out:
- Accelerated Innovation and Competition: Removing legal uncertainty around data mining would supercharge AI development. Companies and researchers could train models on diverse datasets without months of legal vetting or fear of injunctions. This particularly benefits smaller AI startups and academic projects that lack the resources to negotiate licenses for large corpora. The result would be a more competitive, dynamic AI sector – not dominated only by those who can afford legal battles. Moreover, it enables open-source AI development: if data use is clearly lawful, more models and training code can be released openly, spurring community-driven innovation. In essence, the carve-out would function like a permissive "research and experimentation" zone within copyright, analogous to how fair use has historically allowed new technologies to flourish (for instance, enabling search engines to crawl and index the web, which courts found transformative and legal (Training Generative AI Models on Copyrighted Works Is Fair Use — Association of Research Libraries)).
- Advancing Knowledge and Societal Benefits: AI systems trained on comprehensive data can lead to breakthroughs in medicine, science, education, and more. If text and data mining is hampered, we risk missing out on lifesaving insights (say, an AI that reads all medical literature to find new drug interactions) or tools that improve everyday life (like AI tutors trained on a vast array of texts). Librarians and scholars underscore that treating AI training as fair use is "essential to protecting research", because limiting training to only public domain works would exclude modern knowledge and skew what AI systems can learn (Training Generative AI Models on Copyrighted Works Is Fair Use — Association of Research Libraries). A carve-out thus benefits the public by enabling AI to be as knowledgeable and up-to-date as possible. Crucially, doing so does not deprive human creators of their own ability to profit from or control their works in traditional markets – instead, it repurposes those works for a new, socially beneficial use. As one explainer put it, fair use in the AI context "enables developers to build on previous works without taking away existing owners' rights to control and benefit from their original works" (AI, copyright and fair use: What you need to know | TechTarget). In other words, the original material remains intact and marketable; it is simply being analyzed by a machine for insights, not replacing the author's creations.
- Maintaining U.S. Leadership and Alignment with Allies: Legally enabling AI training would harmonize the U.S. with other tech-forward nations that have adopted similar exceptions. Japan and Singapore's broad data mining allowances explicitly support their AI industries (AI Training and Copyright Infringement: Solutions from Asia | TechPolicy.Press). The EU, while more cautious, has also introduced text-and-data-mining exceptions (albeit with opt-out provisions for rights holders). If the U.S. fails to adapt, AI development might flow to jurisdictions with more permissive regimes – a loss for American competitiveness. A U.S. carve-out would keep AI R&D at home and allow American companies to collaborate internationally on equal legal footing. It also gives the U.S. moral authority to advocate open data policies globally, shaping a more innovation-friendly international IP system. Given the global pro-innovation shift highlighted above, the timing is right for the U.S. to lead on this front.
Risks and Concerns:
- Impact on Creators' Incentives and Markets: The biggest concern is that allowing free use of copyrighted content for AI training could harm the economic incentives for creators. If AI systems can generate summaries or substitutes for original works, consumers might rely on AI outputs instead of purchasing or viewing the original, affecting markets. For example, an AI that can answer questions using information from The New York Times might reduce traffic to the Times' website or even draw away advertising revenue if not managed properly. Content industries worry that they will effectively be supplying the raw material for free to technologies that then compete with them. This is especially sensitive for industries already disrupted by digital platforms. From the legal standpoint, this relates to the fourth fair use factor (effect on the market for the work). If an AI model's use of a work significantly undercuts the market for that work, courts might deem it not a fair use. Crafting the carve-out will need to consider ways to minimize direct market substitution. Notably, training use is intermediate and non-expressive – the AI developer isn't publishing the raw work – which means the original can still be consumed in the usual ways. But indirect market harm (via AI outputs) is a gray area that policy must address (for instance, by allowing rightsholders to complain if a model is clearly displacing their product, see safe harbor discussion below).
- Potential for Copyright Abuse or Misuse: Without some guardrails, a broad exception might conceivably be misused. For instance, one could imagine a bad actor scraping entire books, claiming to be "training an AI," but really just republishing the content in a new form. True AI training is transformative – it analyzes and abstracts from the input – but distinguishing a bona fide AI process from a sham could be challenging legally. Another scenario: if models are not well-designed, they might regurgitate large chunks of copyrighted text verbatim (as early generative models sometimes did), which essentially becomes a new form of unauthorized distribution. These risks can be mitigated by policy conditions (e.g., the carve-out could exclude cases where the AI outputs are substantively the same as the inputs, meaning those outputs would still infringe if they substitute for the original). Nonetheless, lawmakers must be aware that simply declaring "all AI training is fair use" without nuance could open loopholes. The goal is to protect good-faith transformative uses, not to bless content theft under an AI label.
- Compliance with International Law (Berne Convention): The U.S. is party to the Berne Convention/TRIPS, which requires that copyright exceptions meet the "three-step test" – they must be for certain special cases, not conflict with a normal exploitation of the work, and not unreasonably prejudice the legitimate interests of authors. A poorly crafted carve-out could be challenged by trade partners as violating these criteria. For instance, if the exception was too broad and allowed uses that supplant normal licensing markets (conflicting with normal exploitation), it might raise red flags. However, many countries (as noted) have implemented TDM exceptions they deem compliant with Berne by tailoring the scope and adding conditions like "no unreasonable prejudice to rightsholders" (language we can echo in U.S. law (AI Training and Copyright Infringement: Solutions from Asia | TechPolicy.Press)). The U.S. would need to articulate the carve-out as a specific, limited exception aimed at AI training (a "special case"), which does not facilitate direct consumption of the copyrighted work and thus doesn't encroach on the core market for the work. Ensuring the exception is seen as compatible with international norms is important for both legal and diplomatic reasons.
- Litigation and Transition Costs: Any significant change in copyright law can trigger litigation as stakeholders test its boundaries. There is a risk that even after passing a carve-out, there will be lawsuits challenging particular uses or the law's interpretation. In the short term, that means more legal uncertainty until case law clarifies the new statute. Courts might differ on what counts as AI "training" or whether a given output is protected by the carve-out or not. Policymakers should try to define terms clearly (what is "AI training data use"? what obligations must the AI developer meet?) to minimize this. Additionally, industries will face transition costs in adjusting to the new regime – e.g., content companies might invest in new watermarking or opt-out systems, and AI firms might implement new data handling protocols to comply with safe harbor conditions. While these costs are worth the long-term benefit, they should be considered in the implementation timeline (perhaps with a phase-in period for compliance measures).
Policy Options Considered:
- Status Quo – Case-by-Case Fair Use Litigation: One option is to leave things to the courts under existing fair use law. AI developers would argue fair use as a defense, and over time precedents would accumulate. Indeed, some current lawsuits (like those by authors and newspapers) will likely set important benchmarks if they proceed. The advantage of this path is flexibility – fair use can evolve organically, and perhaps courts will indeed find most AI training to be fair (there is strong precedent in favor, such as the Google Books case where digitizing entire books for a search tool was fair use (Training Generative AI Models on Copyrighted Works Is Fair Use — Association of Research Libraries)). However, the downsides are significant: it could take years of legal battles and possibly inconsistent decisions (different circuits might rule differently, prompting a Supreme Court showdown). During that time, uncertainty prevails – potentially discouraging investment, or conversely, leading to unbridled use followed by a harsh court ruling that then forces costly model retraining (as The New York Times suit foreshadowed: a court could order a trained model to be discarded (AI, copyright and fair use: What you need to know | TechTarget)). Relying on litigation is a slow and expensive way to formulate what is essentially a broad policy question. It also leaves important nuance (like conditions on use) to be hashed out in adversarial settings, rather than crafted proactively.
- Voluntary Agreements and Licensing Market Development: Another path is to encourage voluntary solutions between AI developers and content owners. This could include bilateral licensing deals (some news organizations have already started negotiating with AI firms for access to their archives), or industry-wide frameworks (perhaps a collective licensing body where AI companies pay fees that are distributed to creators, similar to how radio stations pay music royalties). Voluntary opt-out systems might also emerge (e.g., a standard metadata tag publishers can use to signal "do not scrape"). The U.S. government could stay hands-off and let the market sort it out. While market-driven solutions are appealing in theory, in practice this approach has big drawbacks. The transaction costs of licensing with thousands of rights holders are enormous – something that fair use historically avoids by eliminating the need for permission in socially beneficial cases. Collective licensing could mitigate that, but setting up a new collective management system is itself a heavy lift and usually requires some legal mandate or antitrust exemption. Moreover, big incumbents might strike deals, but smaller creators and smaller AI firms could be left out (leading to a less competitive AI market, and some creators still uncompensated). We could end up with a patchwork where only content from those who opt in is used, possibly biasing AI systems to whatever content is easiest to license, not necessarily the most representative or highest-quality. In short, pure market solutions likely won't scale in the timeframe needed for AI development – and they risk entrenching the powerful (big tech and big media can deal, but startups and independent creators lose out).
- Legislative Carve-Out with Conditions (Recommended): The optimal path is a legislative solution: explicitly carve out AI training uses as non-infringing, but build in conditions and safeguards to address the above risks. This approach combines clarity with balance. By writing the rules into statute, Congress can delineate the scope (what types of works/uses are covered), ensure it passes the three-step test, and incorporate protections for rightsholders (like requiring lawful access to data, no direct commercial redistribution of the content, etc.). The carve-out could be added as an amendment to Section 107 (fair use) or as a new section in the Copyright Act that functions as a specific exception. Importantly, to make it politically and practically viable, it should not be an unconditional blank check to AI developers. Instead, Congress can attach safe harbor provisions that condition the liability protection on certain behaviors by AI developers – effectively a quid pro quo that incentivizes good practices (this is detailed in the next section). The legislation could also clarify that the exception covers only the input/training phase and does not immunize outputs that amount to infringement (so if an AI spouts a verbatim protected poem, the usual rules of infringement apply to that output). By doing so, the law would mirror the approach taken in jurisdictions like Japan (broad allowance for analysis with a caveat against prejudicing authors' interests) and the evolving consensus that intermediate copies for machine learning are a special case deserving of leniency (AI Training and Copyright Infringement: Solutions from Asia | TechPolicy.Press). This route provides certainty to innovators and protection to creators, achieving a better long-term equilibrium than protracted court fights or slow-moving private deals.
- Partial Measures or Delays: There are, of course, other variations – such as a temporary moratorium on litigation (not very feasible), or an exception only for non-commercial research (letting companies fend for themselves). The EU essentially did a version of this: a mandatory exception for non-commercial text and data mining (so researchers are safe) and a separate optional exception for commercial TDM where rightsholders can opt out. The U.K. at one point considered a broad exception for all TDM with an opt-out for rightsholders, though it backtracked amid rightsholder pushback. A U.S. law could similarly limit the carve-out initially – for example, say it applies to nonprofits and startups below a certain revenue, but larger commercial uses still require fair use analysis or licensing. This might be more palatable to content industries as a stepping stone. However, such partial measures reduce the benefit and clarity of the reform. AI innovation often happens in the commercial sector, and even large players like OpenAI started as small labs. Drawing a line between "good" (non-commercial) and "bad" (commercial) uses is tricky and can hamper scaling. Moreover, limiting to non-commercial doesn't fully solve the research problem: many important AI developments come from industry, and excluding them leaves the U.S. at a disadvantage. While a phased approach could be contemplated (e.g., first secure non-commercial exceptions, then broaden), this brief advocates aiming directly for a broad but balanced carve-out covering commercial uses as well, because that is where U.S. competitive interests most strongly lie (AI Training and Copyright Infringement: Solutions from Asia | TechPolicy.Press) (and as other countries like Japan/Singapore have shown, it is workable).
In weighing these options, the trade-off boils down to: how do we maximize the upside of AI (innovation, economic growth, global leadership) while minimizing potential downsides to creators and conforming to international norms? The recommended path of a conditioned carve-out seeks to do just that. It treats AI training as generally fair use (recognizing the transformative, non-expressive nature of the use (AI, copyright and fair use: What you need to know | TechTarget)), but doesn’t leave creators empty-handed or exposed to abuse. The next section outlines the specifics of this approach – the “New Deal on Data” – which attempts to give both AI developers and rightsholders a win-win scenario and clear guidance moving forward.
To reconcile the interests at stake and secure America’s AI future, we propose a “New Deal on Data” – a comprehensive legislative and policy package that ensures AI can progress unhindered, while giving content creators fair treatment and new opportunities in the AI-driven economy. This New Deal on Data has several key components:
1. Statutory Fair Use Carve-Out for AI Training: At its core is a new provision in U.S. copyright law explicitly permitting the use of lawfully acquired data (including copyrighted works) for purposes of machine learning and algorithmic analysis. This could be structured as an amendment to the fair use section or a standalone exception. The language should affirm that making copies or extracts of works as necessary to train an AI model (or to develop AI algorithms) is not an infringement, provided the use is truly for analysis/training and not for consuming the expressive content itself. In effect, the law will distinguish between using a work to enjoy its creativity (which remains protected) and using a work to learn from it (which becomes allowed). By doing so, it codifies what many argue is already the case: that AI training is a transformative, intermediate use akin to other accepted fair uses (Training Generative AI Models on Copyrighted Works Is Fair Use — Association of Research Libraries) (AI, copyright and fair use: What you need to know | TechTarget). This carve-out should apply to all types of media (text, images, audio, etc.) because AI spans modalities. It would immediately remove the legal cloud over practices like web scraping or dataset creation for AI, as long as the data is accessed legitimately (no hacking) and used for model development.
2. Guardrails to Protect Rightsholders: The carve-out will include built-in limitations to ensure it does not eviscerate the rights or markets of content creators. Key guardrails might include:
- No Enjoyment/Consumption Clause: Mirroring Japan's approach, specify that the exception applies only when the use of the work is not for the purpose of human consumption of its expressive content (AI Training and Copyright Infringement: Solutions from Asia | TechPolicy.Press). In plain terms, if you're using the work to entertain or inform people with its content, that's not covered; if you're using it to teach an AI system, it is covered. This helps draw a clear line distinguishing AI training from traditional exploitations of a work.
- Non-Prejudice to Markets: Include a clause (consistent with the Berne three-step test) that the use should not "unreasonably prejudice the legitimate interests of the copyright owner" (AI Training and Copyright Infringement: Solutions from Asia | TechPolicy.Press). While somewhat general, this serves as a safeguard: if a particular use of data can be shown to cause tangible harm beyond the abstract (for instance, if an AI system was explicitly designed to replicate and replace a proprietary database that the rightsholder licenses for a fee), then that might fall outside the safe harbor. Most AI training, which uses broad eclectic datasets, would easily meet this standard, but it prevents extreme cases of market substitution from hiding behind the exception.
- Output Reservation: Make it clear that the carve-out does not cover the outputs of AI systems to the extent they themselves infringe copyright. If an AI model generates a verbatim excerpt or a very close derivative of a copyrighted work, the normal rules of infringement apply to that output and to the act of distributing it. (In practice, this means a rightsholder could still challenge an AI application that is effectively publishing protected content – for example, an AI storytelling app that spits out whole chapters of existing novels on request would not be shielded by the training exception). By separating input from output, the law can allow the learning process but still give creators recourse if an AI is used as a tool of piracy or plagiarism. This addresses the common worry: "What if the AI just reproduces my work?" – the answer being, that reproduction is not protected by the carve-out. Only the learning phase is.
- Lawful Access Requirement: The exception should apply only to works that have been lawfully obtained or accessed by the AI developer. This is similar to the condition in Singapore's law (AI Training and Copyright Infringement: Solutions from Asia | TechPolicy.Press). If a work is behind a paywall or confidential, an AI developer should not be able to hack or steal it and claim safe harbor. But if a work is publicly available on the internet, or the developer has bought a copy/subscription, then using it for training is permitted. This encourages respect for access controls and contracts (it doesn't override anti-piracy laws or privacy laws), while ensuring that publicly accessible information remains available for knowledge building. In practice, "lawful access" means AI companies can scrape public web pages or ingest purchased content (e.g., an e-book they bought) for training, but they couldn't, say, use content from a leaked database or from a site they scraped in violation of a clear prohibition that has legal weight.
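To make the lawful-access guardrail concrete, here is a minimal Python sketch of how a training-data ingestion pipeline might gate what it collects. It is illustrative only: the TrainingBot user agent and the "noai" directive are assumed conventions rather than established standards, and robots.txt compliance is used simply as a stand-in for respecting publishers' access controls.

```python
# Illustrative sketch only: one way an ingestion pipeline might check public
# availability and a publisher opt-out signal before adding a page to a
# training set. The "noai" directive and the TrainingBot user agent are
# hypothetical conventions, not established standards.
import urllib.robotparser
from urllib.parse import urlparse

import requests

USER_AGENT = "TrainingBot/0.1"  # hypothetical crawler identity


def may_ingest(url: str) -> bool:
    """Return True only if the page is publicly reachable and not opted out."""
    parts = urlparse(url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"

    # 1. Respect robots.txt (a proxy for respecting the site's access rules).
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(robots_url)
    try:
        rp.read()
    except OSError:
        return False  # unreachable host: skip rather than guess
    if not rp.can_fetch(USER_AGENT, url):
        return False

    # 2. Fetch without credentials; paywalled or restricted content is skipped.
    resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
    if resp.status_code != 200:
        return False

    # 3. Honor an (assumed) opt-out signal in response headers or meta tags.
    x_robots = resp.headers.get("X-Robots-Tag", "").lower()
    if "noai" in x_robots or 'content="noai"' in resp.text.lower():
        return False

    return True


if __name__ == "__main__":
    print(may_ingest("https://example.com/article"))
```

The point of the sketch is simply that "lawful access" can be operationalized as mechanical checks at ingestion time, which is the kind of behavior a statute or implementing regulations could reference.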
3. Safe Harbor Provisions (Incentivizing Responsible AI Practices): As part of the New Deal, Congress can create conditional safe harbors that limit liability for AI developers who adhere to certain good practices. This is akin to how the DMCA offers online services safe harbor if they follow notice-and-takedown procedures. Here, the idea is to set up a system where AI developers get extra protection (or presumptions in their favor) if they do things that provide reciprocal benefits to the public and rightsholders. For example:
- An Open Source/Open Access Safe Harbor: If an AI model (or significant portions of its code and weights) is released openly for others to use and inspect, the developer would enjoy an enhanced liability shield for its training data use. Rationale: open-source AI models contribute to the public good (education, research, transparency) and are less likely to pose secret commercial harm to any given content owner. The law could stipulate that models released under an open license, or made freely available for non-commercial use, are conclusively deemed fair use in their training, regardless of the data involved. This encourages companies to open up models (or at least a free version), spreading AI benefits. It also allows the public (including creators) to see how the model behaves, reducing fear of the unknown.
- A Free Tier or Public Benefit Requirement: Similarly, AI services that offer a meaningful free tier to users – rather than being entirely paywalled – could be granted safe harbor. If a company ensures that the general public can access and benefit from the AI (for example, a free basic version of a chatbot or image generator), then the training of that AI on varied data is seen as contributing to societal benefit, justifying limited liability. Conversely, if a model is used in completely closed, high-priced enterprise settings only, lawmakers might feel less urgency to protect that use via statute (those players might then negotiate licenses). Tying safe harbor to a free or open aspect essentially pushes the AI industry toward more open access, which expands the innovation ecosystem and addresses equity concerns (so AI’s benefits aren’t confined to those who pay).
- Transparency and Opt-Out Mechanism: As a condition of safe harbor, developers could be required to disclose information about their training datasets (at least in general terms, or specific works if known) and honor requests from rightsholders to opt their works out of future training. A lighter version of the Schiff disclosure bill (Copyright in Congress: 2024 Year in Review | Copyright Alliance) could be integrated: rather than an onerous registry of every work used, perhaps require AI developers to publish summary statistics of their data sources (e.g., percentage of data from books vs. websites, list of top 100 domains scraped, etc.), and to provide a way for authors to flag their content for removal in the next model update (a minimal sketch of such a disclosure and opt-out workflow follows this list). If a developer complies with such practices, they gain safe harbor; if they refuse basic transparency or ignore opt-outs, they might lose the statutory protection (and thus face potential infringement liability for those works). This approach builds trust – creators feel they have some control and knowledge, even if the initial training was allowed. It is a compromise: the AI developer doesn't need to get permission first (which is impractical), but the creator can say after the fact "please don't use my works again" and the developer must comply going forward. Most creators might not opt out if the AI proves harmless or even beneficial, but having the choice matters.
- Non-Competition Pledge for Creative Sectors: Another reciprocal idea is having AI firms pledge (or be required) not to develop AI products that are direct substitutes for certain types of creative works without a license. For instance, an AI that generates music indistinguishable from popular artists might be seen as directly competing with those artists. The law could carve out certain domains from the safe harbor unless additional agreements are in place. While hard to enforce, even a stated principle or requirement for companies to “avoid enabling the commercial replication of an individual author’s unique expression” without permission could serve as a guideline. In practice, this might mean if an AI was explicitly trained to mimic a specific living artist’s style in detail, that training might not qualify (unless that artist consented). However, broad general training on thousands of artists (none of whom is fully replicated) would be fine. This is a nuanced area, but the goal is to reassure creators that the carve-out isn’t meant to create sanctioned copycats of them; it’s about general learning.
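As referenced in the transparency item above, the disclosure-and-opt-out obligation could be very lightweight in practice. Below is a minimal Python sketch under stated assumptions: the dataset manifest format, the opt-out registry, and the field names are all hypothetical, chosen only to illustrate the idea of publishing summary statistics and honoring removal requests before the next training run.

```python
# Illustrative sketch only: generating a summary disclosure from a hypothetical
# dataset manifest, and dropping opted-out sources before the next training
# run. The manifest format and opt-out registry are assumptions, not a
# prescribed standard.
from collections import Counter
from urllib.parse import urlparse

# Hypothetical manifest: one record per training document.
manifest = [
    {"source_type": "website", "url": "https://example.com/post/1"},
    {"source_type": "website", "url": "https://news.example.org/story/2"},
    {"source_type": "book", "url": None},
]

# Hypothetical registry of web domains whose owners have opted out.
opted_out_domains = {"news.example.org"}


def summary_disclosure(records, top_n=100):
    """Summarize data sources by type and list the most-used web domains."""
    by_type = Counter(r["source_type"] for r in records)
    total = sum(by_type.values())
    shares = {t: round(100 * n / total, 1) for t, n in by_type.items()}
    domains = Counter(urlparse(r["url"]).netloc for r in records if r["url"])
    return {"percent_by_source_type": shares,
            "top_domains": domains.most_common(top_n)}


def apply_opt_outs(records, blocked_domains):
    """Drop documents from opted-out domains before the next model update."""
    return [
        r for r in records
        if not (r["url"] and urlparse(r["url"]).netloc in blocked_domains)
    ]


print(summary_disclosure(manifest))
next_training_set = apply_opt_outs(manifest, opted_out_domains)
print(len(next_training_set), "documents retained for the next training run")
```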
The safe harbor provisions essentially say to AI developers: if you contribute back to the community (through openness, free access, transparency, respect for opt-outs), then the law will give you a high degree of certainty and protection for your use of data. If you choose to be entirely proprietary and uncooperative, you might still have the base carve-out to argue, but you could be more exposed to lawsuits if what you did falls outside strict lines. This creates incentives for responsible AI development in line with public values – a win-win outcome of the New Deal on Data.
4. Collaboration and Compensation Mechanisms: In addition to the legal carve-out, the New Deal on Data could include soft measures to bring stakeholders together. For example, establish a government-supported AI-Creator Council where AI companies and content industry reps regularly meet to discuss concerns and share best practices. This can ensure the relationship is cooperative, not adversarial, as the technology evolves. The council could oversee things like the aforementioned opt-out registry or develop standards for data attribution in AI (perhaps finding ways for models to credit sources when generating content, where feasible).
Another possible piece is exploring compensation models that don’t impede training. One idea floated by some experts is a “training levy” or fund: AI firms might collectively contribute to a fund that supports journalism, arts, and other creative endeavors, acknowledging that their outputs draw from the creative commons. This could be voluntary or structured via tax incentives. It wouldn’t be a per-work license cost (which is unworkable at scale) but rather a broader give-back to the creative ecosystem. Such a fund could finance grants for authors or music programs, etc. This concept is analogous to private copying levies or the way cable TV had compulsory licenses – not paying each rightsholder directly for every snippet used, but supporting the sector as a whole. Implementing this would require further study and likely industry buy-in, so the immediate legislative focus should remain on the exception itself. But including a mandate for an exploratory commission on creative remuneration in the AI era could signal to rightsholders that their livelihood is being considered.
5. Government Leadership by Example: The New Deal on Data can also involve the U.S. government setting an example in utilizing and sharing data for AI. For instance, the government could expand initiatives to open up public datasets (like federal archives, libraries, and databases) for AI training, fully in the public domain or under open licenses. It can also ensure that AI models developed with government funding follow the safe harbor best practices (perhaps requiring open release if feasible). By doing so, the government bolsters the pool of training data that is free of restriction, reducing reliance on copyrighted material over time. Simultaneously, it shows that good policy allows AI to flourish without trampling author rights – since the government can demonstrate responsible AI development (e.g., an AI trained on government-funded research that’s openly released, benefiting all).
6. International Alignment: Finally, this proposal would position the U.S. to shape international norms. The legislation could include a directive for the U.S. Trade Representative and State Department to promote similar exceptions abroad and to ensure that trade agreements explicitly permit such AI-related copyright exceptions. This global outreach is crucial, as AI development is borderless. If the U.S. has a clear rule, it can advocate for that rule in WIPO discussions or bilateral talks, pushing for a global “AI training exception” standard. That way, AI companies aren’t tripped up by varying laws when operating globally, and U.S. companies can share models or collaborate overseas with less friction. It also helps prevent a scenario where content owners in one country try to enforce their stricter laws against a U.S. AI developer – a common understanding can mitigate conflict. By framing the U.S. carve-out as a model for others, we reinforce U.S. leadership not just in technology but in forward-thinking policy.
In summary, the New Deal on Data is about balance and reciprocity. It says: we will free AI to learn from the world’s information (because that benefits everyone), but in return AI developers will share the benefits and respect certain boundaries to uphold creative value. It is a modern social contract between Silicon Valley and content creators, brokered by sensible legislation. By implementing this deal, Congress and the Administration would ensure the U.S. remains the best place to build transformative AI systems, while also affirming to creators that society values their contributions and will continue to protect them from unfair appropriation.
The final piece of this brief offers concrete recommendations for safe-harbor provisions to implement the above and a roadmap for turning this proposal into reality by 2026.
To operationalize the fair use carve-out and New Deal on Data, we recommend specific safe-harbor provisions that tie limited copyright liability to proactive steps by AI developers. These safe harbors will create clear rules of the road and encourage behaviors that benefit the public and rightsholders. Lawmakers can incorporate these as clauses in the legislation or empower the Copyright Office to issue regulations detailing them. Below are the key safe-harbor options and how they would work:
- Safe Harbor 1: Open Source Model Release – “Open Model, Open Use.” AI developers who release their trained models (and/or training code) under an open-source license, or make them freely available for use, would receive a safe harbor protecting them from copyright infringement claims related to the training data. By open-sourcing, they demonstrate that the model is a public good, not a proprietary product built secretly on others’ content. This transparency greatly reduces perceived risk and allows anyone (including authors) to examine how the model behaves. Recommendation: Enact a rule that if a model is published under a permissive license (or made available for non-commercial public use without charge), then any intermediate copies made to develop that model are deemed non-infringing. This incentivizes companies to share at least some models openly. It could be applied on a sliding scale – e.g., full open-sourcing grants the complete safe harbor, while a partial open release (such as one allowing academic use) grants a partial presumption of fair use. The goal is to align the company’s interest (legal safety) with a societal benefit (open AI research).
- Safe Harbor 2: Free Public Access Tier – “Benefit the public, earn protection.” Many AI services are commercial, but they can still offer value to the public through a free tier (for example, a chatbot that anyone can use up to a certain limit per day, or an image generator with some free credits each month). We recommend a safe harbor provision under which AI services providing open access (free of charge and easily available to the public) get a stronger shield against infringement claims. This recognizes that such services are effectively disseminating AI benefits broadly, not just monetizing closed systems. Recommendation: Define criteria for a qualifying “free tier” (for instance, it must be available to all with basic functionality, not just as a time-limited trial). If an AI model is deployed with such a free public access component, then any lawsuit alleging its training infringed copyright would face a raised bar – for instance, plaintiffs would have to show willful exclusion of opt-outs or egregious misuse to overcome the safe harbor. This condition pushes companies to keep AI accessible. It’s a kind of public service requirement in exchange for use of public data. One can analogize it to broadcasters using public airwaves but providing public interest programming. Here, AI uses public data (largely gleaned from the internet commons) and in return should serve the public interest by not being entirely paywalled.
- Safe Harbor 3: Data Transparency & Registry Compliance – “Be transparent and respect opt-outs.” Under this safe harbor, AI developers who adhere to transparency measures about their training data and comply with a formal opt-out process gain protection. Concretely, this means that if the developer registers their training dataset (for instance, depositing a hashed copy or a list of sources with a neutral body or the Copyright Office) and provides a mechanism for copyright owners to identify and exclude their works, then the developer cannot be held liable for using works that were not opted out at the time of training. Recommendation: Establish an “AI Training Registry” (perhaps managed by the Copyright Office or a nonprofit consortium). Developers who want the safe harbor can submit a description of their dataset, or the dataset itself in secure form. Rights holders could query the registry to see whether their works are included (through automated matching) and then file an opt-out notice for future training; a minimal sketch of how such matching might work appears after this list. The law would say that registry-compliant developers are shielded from monetary damages for past training of any work unless the developer failed to honor a prior opt-out from that rights holder. This essentially gives creators forward-looking control while not penalizing AI firms for past or unknowing use. It also addresses the current information asymmetry: creators today often don’t even know whether their work was used. With transparency, trust can be built. The safe harbor ensures that as long as companies play by these rules, they won’t face lawsuits for, say, inadvertently including a particular novel in their 2022 training run – the author can only opt out for the future, not sue for the past. Over time, this balances interests: most creators will leave their works in (especially if they see the AI isn’t harming them), but those who strongly object can remove theirs, and companies adjust accordingly.
- Safe Harbor 4: Anti-Memorization & Output Filtering – “Prevent leakage, avoid liability.” This safe harbor ties liability protection to technical measures that prevent an AI model from outputting large verbatim excerpts of training data. Essentially, if a developer implements reasonable content filtering or anti-memorization techniques to ensure the model doesn’t reproduce someone’s entire song lyrics or a full chapter of a book on request, then the developer is protected on the training side. If they do not implement such measures and the model frequently regurgitates protected works, they lose the safe harbor (and could be liable for contributory infringement for facilitating distribution of those works). Recommendation: Require AI developers to certify that their model has been tested for memorization of training data and that they have put safeguards in place (such as limiting the length of verbatim text in outputs, or refusing to output detected copyrighted text beyond a few lines); a minimal sketch of such an output filter appears after this list. If a plaintiff can show the developer willfully failed to do this (and the model is effectively a repository of pirated content), then the safe harbor is forfeited for that model. On the flip side, compliance gives the developer a strong defense. This approach directly targets the core legitimate worry of rights holders (the AI as an unwitting re-publisher) and gives developers a blueprint to avoid that outcome. Many AI companies are already attempting this; OpenAI, for example, has said it filters ChatGPT to avoid reproducing copyrighted song lyrics at length. This safe harbor would formalize it: protecting those who make good-faith efforts to minimize output infringement, and isolating those who negligently let their AI become a tool for piracy.
- Safe Harbor 5: Non-Commercial & Research Use Automatic Exception – While our goal is to include commercial uses, it’s worth explicitly affirming that non-commercial, research, or educational AI training is unequivocally protected. This could be framed as a safe harbor or simply part of the statute’s scope. It ensures universities, libraries, and independent researchers have full freedom to experiment with AI using whatever data they have access to. Many scholars argue this is already fair use (Training Generative AI Models on Copyrighted Works Is Fair Use — Association of Research Libraries), but codifying it removes any doubt. This could be done by stating that if the AI development is not for direct monetary gain (or is being done by a nonprofit/educational institution), it is per se a fair use to employ copyrighted works in training. While commercial developers would still need the above safe harbors, this carve-out within a carve-out gives breathing room to academic and hobbyist innovation, which historically has been vital for breakthroughs (remember that some of the most influential early AI models came from university labs or open communities).
- Safe Harbor 6: Voluntary License and Partnership Credits – If an AI developer does choose to enter licensing agreements or partnerships with content creators (for example, a deal with a news organization to get priority access to its content), that should not be seen as undermining the fair use claim for other content. A concern AI companies might have is: “If we license from some, will others claim that shows we needed a license from everyone?” To alleviate this, a safe harbor could clarify that entering voluntary licenses doesn’t waive the protection for unlicensed uses. In fact, we can encourage voluntary deals by saying if an AI firm licenses a certain collection, any use of that licensed data is of course non-infringing (it’s licensed), and it doesn’t affect their safe harbor for the rest. Basically, companies should be free to exceed the baseline (by paying some willing rightsholders) without fear of losing the safe harbor for those who weren’t part of a deal. This way, mutually beneficial arrangements can happen in parallel to the legal exception. It’s not so much a safe harbor as a clarification: partial licensing won’t be held against the claim of fair use for other works.
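To make the registry compliance idea in Safe Harbor 3 concrete, here is a minimal sketch of the kind of opt-out check a developer might run before a training job. It is illustrative only: the JSON export format, the hashing scheme, and the function names (such as filter_training_corpus) are assumptions for this sketch, not an existing registry standard or API.

```python
# Hypothetical sketch of a registry opt-out check; not an existing standard.
import hashlib
import json
from pathlib import Path


def fingerprint(text: str) -> str:
    """Normalize a document and return a stable content hash."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def load_opt_outs(registry_export: Path) -> set[str]:
    """Load opted-out content hashes from a registry export.

    Assumes a JSON file shaped like [{"hash": "..."}, ...], which is an
    assumption for this sketch rather than a real registry format.
    """
    with registry_export.open() as f:
        return {entry["hash"] for entry in json.load(f)}


def filter_training_corpus(documents: list[str], registry_export: Path) -> list[str]:
    """Drop any document whose fingerprint matches an opt-out record."""
    opted_out = load_opt_outs(registry_export)
    kept = []
    excluded = 0
    for doc in documents:
        if fingerprint(doc) in opted_out:
            excluded += 1  # honor the opt-out: exclude from this training run
        else:
            kept.append(doc)
    print(f"Excluded {excluded} opted-out documents out of {len(documents)}.")
    return kept
```

Exact hashing would miss near-duplicates, so a real registry would likely pair something like this with fuzzy matching; the point is simply that opt-out compliance can be automated, logged, and audited.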
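Similarly, the anti-memorization duty in Safe Harbor 4 can be approximated with a verbatim-overlap filter applied to model outputs before they reach users. The sketch below is a toy illustration under stated assumptions (a 12-word shingle window, a small threshold, and an in-memory blocklist of protected passages); it is not any vendor’s actual filter, and production systems would use more scalable and nuanced techniques.

```python
# Toy output filter; window size, threshold, and blocklist format are assumptions.

def word_ngrams(text: str, n: int = 12) -> set[tuple[str, ...]]:
    """Return the set of n-word shingles in a text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(max(0, len(words) - n + 1))}


def build_blocklist(protected_texts: list[str], n: int = 12) -> set[tuple[str, ...]]:
    """Pre-compute shingles for protected works the developer knows about."""
    shingles: set[tuple[str, ...]] = set()
    for text in protected_texts:
        shingles |= word_ngrams(text, n)
    return shingles


def filter_output(candidate: str, blocklist: set[tuple[str, ...]],
                  n: int = 12, max_overlap: int = 2) -> str:
    """Withhold output that reproduces long verbatim runs of a protected work."""
    overlap = word_ngrams(candidate, n) & blocklist
    if len(overlap) > max_overlap:
        return "[Output withheld: the response closely matched protected source text.]"
    return candidate
```

A certification regime under Safe Harbor 4 would care less about the specific mechanism than about whether the developer tested for memorization and documented mitigations of this kind.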
Each of these safe harbor provisions ties into the broader legislative scheme. They can be mixed and matched – for instance, the law might automatically exempt non-commercial uses, and for commercial uses offer a menu: comply with A, B, and C (e.g. transparency, filtering, free access) and you are safe. The idea isn’t to make it onerous, but to set best practices that the major, responsible AI players can adhere to (many likely would do these things anyway to maintain public trust).
By adopting these safe harbors, Congress would effectively create a framework of accountability in exchange for flexibility. AI developers get what they most need – clarity and reduced liability – and in exchange they give creators and the public some combination of transparency, access, and assurance against worst-case scenarios. It is a classic compromise that can leave all sides better off than the status quo of uncertainty and conflict.
Time is of the essence. The United States stands at a crossroads in the global AI race, and the decisions we make on intellectual property in the next year will help determine whether we lead or lag in this critical domain. The analysis above makes clear that updating copyright law to enable AI training is not a niche technical tweak – it is a strategic imperative for U.S. economic competitiveness, technological leadership, and even upholding our democratic values in the AI era. The current copyright framework, last overhauled in 1976, is straining to accommodate technologies undreamt of 50 years ago. Indeed, legal scholars have noted that comprehensive copyright reform tends to occur roughly every half-century, meaning the 1976 Act is due for reform by 2026 (The Evolving Role of Commercial Use in Copyright Law). The rapid rise of AI is precisely the kind of paradigm shift that compels such reform; “reform is in the air,” as one commentator observed (The Evolving Role of Commercial Use in Copyright Law). We must not let outdated laws inadvertently hamper the next wave of American innovation.
The call to action is therefore clear: Congress and the Administration should work hand-in-hand to enact a statutory fair use carve-out for AI training data, with appropriate safeguards, as part of a 2025–2026 intellectual property reform agenda. This initiative can be the centerpiece of a broader “Next Great Copyright Act” for the digital age. It aligns with what forward-looking policymakers from both parties want – spurring innovation, growing the economy, and outpacing authoritarian competitors like China in technological capability – while also respecting the property rights and contributions of American creators. In short, it’s a quintessential American solution: leveraging our openness and creativity to drive progress.
Recommended Roadmap (2024–2026):
- Executive Leadership in 2024: The Administration should publicly endorse the concept of an AI training fair use exception. For example, the President (or Vice President) could incorporate it into major tech policy speeches, framing it as essential to “winning the AI future” and ensuring AI reflects democratic values (since a freer data environment means AI isn’t starved of diverse perspectives). The White House could establish a high-level Task Force on AI and Intellectual Property, bringing together the U.S. Copyright Office, USPTO, OSTP, and key industry and creator representatives to refine the proposal. This task force can build on the Copyright Office’s ongoing AI study and any recommendations coming from it (Training Generative AI Models on Copyrighted Works Is Fair Use — Association of Research Libraries). By late 2024, the Administration should issue principles or even draft bill language signaling what it wants in legislation (much like how administrations often send Congress draft texts for consideration).
- Stakeholder Convenings and Coalition-Building: Throughout 2024, Congress (through committees) and the Administration should convene roundtables with stakeholders – tech companies, publishers, authors guilds, libraries, etc. – to discuss the New Deal on Data framework. The aim is to hash out concerns and get input on safe harbor conditions in an inclusive forum rather than via lawsuits. Simultaneously, build a coalition of support: large AI firms and startups can join forces with research institutions and consumer groups to advocate for the carve-out, while reasonable voices in the creative community (those open to compromise) are brought on board with assurances (perhaps a commitment to some of the safe harbor provisions like transparency or an industry fund, as discussed). By early 2025, this coalition should produce a public report or open letter making the case to Congress for action, demonstrating unity between innovation and creation stakeholders.
- Legislative Action in 2025: Leverage the momentum from the pro-innovation message of the Paris Summit and the domestic coalition to introduce a bill in the new Congress. Ideally, identify bipartisan co-sponsors in the House and Senate – for example, a tech-savvy Republican and a Democrat from a state with both creative and tech industries. The bill – call it the AI Innovation and Fair Use Act of 2025 – would encapsulate the statutory carve-out and the safe harbor provisions. It should be referred to the Judiciary Committees. Given the complexity, hearings are essential: experts should testify on the economic stakes (citing numbers like the trillions in AI-driven GDP growth (PwC's Global Artificial Intelligence Study | PwC)), the legal rationale (fair use precedent and other countries’ exceptions (Training Generative AI Models on Copyrighted Works Is Fair Use — Association of Research Libraries) (AI Training and Copyright Infringement: Solutions from Asia | TechPolicy.Press)), and the safeguards for rightsholders (how the proposal still preserves creative markets). Lawmakers can question both AI company reps and creator reps to ensure the balance is right. The goal of these hearings is to educate members and staff, dispel myths (e.g., that this would destroy copyright or, on the flip side, that AI companies want to steal artists’ work – neither is true given the balanced approach), and refine the bill. Lessons from the EU and other jurisdictions can be presented to show this is globally reasonable.
- Amendment and Consensus-Building: After hearings, the bill may be tweaked – perhaps adding an opt-out provision, or clarifying a term – to address valid concerns raised. It’s crucial to maintain the core (broad permission for AI training) while being flexible on implementation details. During this stage, high-level engagement by the Administration can help – e.g., the Commerce Secretary or U.S. CTO can hold private meetings with key senators to shore up support, emphasizing how this aligns with national competitiveness goals. By mid-to-late 2025, the aim is to have a version of the bill that has broad committee support. This could potentially be rolled into a larger legislative package if one emerges (for instance, if there’s a 2025 tech innovation omnibus, or as part of the anticipated 2026 copyright reform package). It might also move on its own if urgency is felt.
- Passage by 2026: The target should be to pass the reform by late 2025 or early 2026. This timing would sync with the 50-year mark since the last major copyright act – a symbolic milestone to enact the “Copyright Act of 2026” updated for AI and digital realities. To ensure passage, proponents should keep the narrative focused on U.S. leadership and job creation (e.g., “This reform will help create the next ChatGPT or DeepMind here in America rather than overseas, securing high-tech jobs and economic growth”). It’s also wise to highlight support from educational and research communities, reinforcing that this is about knowledge advancement. By addressing reasonable accommodations for creators (through safe harbors and possibly a creation-support fund or tax credit), the bill can garner enough political goodwill to overcome opposition from any hardline copyright maximalists. The final votes, ideally, will show a bipartisan consensus that America’s copyright law can adapt to enable new technology – just as it has in past eras (e.g., accommodating VCRs, MP3 players, and search engines through fair use or other exceptions).
- Implementation and Follow-up (2026 and beyond): Once enacted, the Administration should ensure smooth implementation. The Copyright Office may need to develop the opt-out registry or issue guidelines for compliance – this should be adequately funded and expedited. The DOJ and courts should be informed of the new policy so that any ongoing lawsuits can be viewed in light of the updated law (many may be mooted or see claims drop if the law applies retroactively or going forward). The Administration should also take the lead in international forums to announce and explain this new U.S. approach, encouraging allies to adopt similar measures. In trade negotiations or at WIPO, the U.S. can propose an “AI exception” template based on our law. Domestically, a mechanism to review the impact of the carve-out after, say, 5 years could be included – e.g., the Copyright Office and OSTP might report to Congress by 2030 on how the exception affected AI innovation and creative industries, to empirically assess the outcomes and tweak if necessary.
In conclusion, embracing a statutory fair use carve-out for AI training data is a forward-looking policy that will pay dividends in U.S. economic dynamism and global standing. It is a policy that says we choose innovation over fear, that we trust our longstanding principles (like fair use and the public interest in knowledge) to carry us through technological change. By enacting the New Deal on Data, Congress would ensure that American AI developers have the freedom to learn from the wealth of our digital age – from books to websites to scientific journals – without endless legal roadblocks. And by coupling that freedom with fair safeguards, we ensure that authors, artists, and industries continue to thrive alongside AI, not be trampled by it.
The world is watching for who will set the rules in the AI century. If the United States seizes this moment, we can lead by example: establishing a model that other nations adopt, creating a global environment where AI innovation is robust and responsibly integrated with respect for human creativity. This is America’s opportunity to write the next chapter of innovation policy – one that secures our prosperity and leadership for decades to come. We must act now, with confidence and foresight, to update our laws and unleash the full potential of AI for the benefit of all. The future will belong to those who innovate; let us ensure that future is made in America, under rules that reflect our best values of ingenuity, openness, and fairness.
Congress and the Administration should move swiftly – the 2026 horizon for copyright reform is fast approaching, and with it, the chance to usher in a new era of AI-empowered growth. This policy brief has outlined the path forward. The case is compelling, the coalition is forming, and the stakes are too high to delay. Now is the time for a bold “fair use for AI” reform – so that the United States not only keeps pace with the global AI revolution, but leads it, on our own terms.