Futurescapes

The Case for Advance Market Commitments for AI Training Data: How Guaranteed Markets Could Democratize AI

Apr 25, 2025

In the race to develop increasingly sophisticated artificial intelligence models, one factor remains paramount: the quality and diversity of training data. The most powerful AI systems today are trained on vast datasets scraped from the internet—but these datasets disproportionately represent wealthy, English-speaking populations while leaving billions of people underrepresented. This isn't just a minor oversight; it's a fundamental barrier to equitable AI development that could deepen existing global inequalities.

Consider this: while ChatGPT can craft eloquent essays in English, it performs noticeably worse in languages like Swahili or Bengali, which together have hundreds of millions of speakers. Image recognition systems excel at identifying objects common in Western contexts but falter when presented with items unique to other parts of the world. These are less technical limitations than data limitations. And unlike many technological challenges, they won't naturally resolve themselves through market forces alone.

The problem runs deeper than just language. Facial recognition systems have notoriously higher error rates for darker-skinned faces and women—a direct result of training datasets that underrepresent these groups. Medical AI tools trained primarily on data from Western populations may fail to account for different disease presentations or treatment responses in other populations. As AI becomes increasingly embedded in critical infrastructure, these gaps represent not just inconveniences but potential harm to billions of people.

What if there were a proven mechanism that could help solve this problem? A tool already tested in global health that could be repurposed to democratize AI development? Advance Market Commitments (AMCs) offer precisely such a solution, and there's already a successful blueprint for how they work.

When Markets Fail: The Innovation Gap

Before diving into solutions, we need to understand why some innovations never materialize despite their potential value. The answer lies in basic market economics: companies invest in developing products only when they anticipate sufficient return on investment.

Consider the case of vaccines for tropical diseases. Malaria, tuberculosis, and neglected tropical diseases afflict primarily low-income populations. Pharmaceutical companies, driven by profit motives, historically underinvested in these vaccines despite their potential to save millions of lives. As Michael Kremer and colleagues explain in their research on AMCs, the problem wasn't technological impossibility but insufficient economic incentives.

"Mechanisms such as patents and prizes that stimulate research and development for products sold in high-income markets may fall short in low-income markets," note Kremer, Levin, and Snyder in their analysis of AMCs for vaccines. "Patents generate deadweight loss along with the monopoly rents intended to incentivize investment; furthermore, the monopoly rents may be limited in countries with mostly poor consumers."

The Malaria Vaccine Tragedy: A Case Study in Market Failure

The story of malaria vaccine development illustrates this market failure with painful clarity. Despite malaria killing hundreds of thousands of people annually, mostly children under five in Africa, it took until 2021 for the World Health Organization to recommend the first malaria vaccine, RTS,S (Mosquirix), for widespread use. This wasn't because the scientific challenges were insurmountable; it was largely because the economic incentives were misaligned.

By some widely repeated but contested estimates, the disease has been responsible for as many as half of all human deaths since the Stone Age, around 50 billion people throughout history. Yet research funding for malaria vaccines has been persistently inadequate compared to diseases that predominantly affect wealthy countries.

The first promising malaria vaccine candidate emerged in the 1980s, followed by clinical trials in the 1990s. However, development proceeded at a glacial pace. Why? The populations most affected by malaria couldn't afford to pay premium prices for vaccines, and pharmaceutical companies couldn't see a path to profitability. Without a guaranteed market, promising research languished.

Even when GlaxoSmithKline (GSK) finally brought Mosquirix to approval, it had taken over 30 years of development and substantial non-profit funding. The ultimate cost was estimated at around $700 million, much of it coming from the Bill & Melinda Gates Foundation rather than traditional market-driven investment. And despite its importance, the vaccine's modest efficacy (about 30-40% reduction in severe malaria) may partly reflect chronic underinvestment compared to vaccines for diseases common in wealthy countries.

The malaria vaccine story isn't an exception; it's a pattern repeated across diseases that primarily affect low-income populations. The same pattern extends beyond health to other technologies, potentially including AI systems and datasets for underrepresented groups.

A similar market failure plagues AI development for underrepresented populations. Creating high-quality datasets for languages spoken primarily in lower-income regions requires significant investment with uncertain returns. Why would a profit-maximizing company invest millions in developing Yoruba language datasets when the immediate commercial applications seem limited?

The lack of diverse training data creates a vicious cycle: without data, AI applications don't work well for certain populations; without working applications, there's little market incentive to collect more data. Breaking this cycle requires intervention—a mechanism to create markets where they otherwise wouldn't exist.
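
The investment logic described in this section can be made concrete with a small expected-profit calculation. The sketch below is illustrative only: the function name, the cost, the success probability, and the revenue figures are all hypothetical assumptions, not numbers from any real dataset project.

```python
def expected_profit(cost, success_prob, market_revenue, guaranteed_payment=0.0):
    """Expected profit of a dataset project, in the same unit as the inputs.

    The cost is paid up front; market revenue and any guaranteed payment
    arrive only if the project succeeds.
    """
    return success_prob * (market_revenue + guaranteed_payment) - cost


# Hypothetical figures, in millions of dollars, chosen only for illustration.
cost = 4.0             # up-front cost of building the dataset
success_prob = 0.8     # chance the dataset meets its quality bar
market_revenue = 1.0   # commercial revenue expected if it succeeds

without_commitment = expected_profit(cost, success_prob, market_revenue)
with_commitment = expected_profit(cost, success_prob, market_revenue,
                                  guaranteed_payment=5.0)

print(f"without commitment: {without_commitment:+.1f}M")
print(f"with commitment:    {with_commitment:+.1f}M")
```

Under these assumed numbers the project loses money in expectation on commercial revenue alone and becomes worthwhile once a buyer is guaranteed, which is the gap an intervention would have to close.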

Advance Market Commitments: Creating Markets Where None Exist

Advance Market Commitments (AMCs) offer a compelling solution to this market failure. At their core, AMCs are a commitment by donors, typically governments or philanthropic organizations, to purchase a product, or to subsidize its purchase, at a pre-specified price once it meets certain criteria. This guaranteed market provides the incentive companies need to make investments they otherwise wouldn't.

The mechanism is elegant in its simplicity. As Kremer and colleagues describe it: "In an AMC, donors commit to a fund from which a specified subsidy is paid per unit purchased by low-income countries until the fund is exhausted, strengthening suppliers' incentives to invest in research, development, and capacity."
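
The per-unit subsidy described in that quotation can be sketched in a few lines. This is a simplified model under stated assumptions: the `AMCFund` class, its field names, and the numbers are hypothetical, and real AMC contracts carry additional terms that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class AMCFund:
    """A donor fund that pays a fixed subsidy per unit until it runs out."""
    balance: float           # money remaining in the fund
    subsidy_per_unit: float  # top-up paid to the supplier for each unit

    def subsidize(self, units: int) -> float:
        """Pay the subsidy on `units`, capped at the remaining balance.

        Returns the amount paid out. Once the fund is exhausted,
        further units receive no subsidy.
        """
        if units < 0:
            raise ValueError("units must be non-negative")
        payout = min(units * self.subsidy_per_unit, self.balance)
        self.balance -= payout
        return payout


# Hypothetical numbers for illustration only.
fund = AMCFund(balance=1_000.0, subsidy_per_unit=3.0)
first = fund.subsidize(200)    # fully covered: 200 * 3.0 = 600
second = fund.subsidize(200)   # only 400 remains, so 400 is paid
third = fund.subsidize(50)     # fund exhausted, nothing paid
print(first, second, third, fund.balance)
```

The cap is what makes the donors' total exposure known in advance: suppliers earn the subsidy only on units actually purchased, and payments stop once the committed amount has been spent.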

Unlike traditional grants that fund research directly, AMCs are results-based. Companies only receive payment when they deliver a product that meets specified standards. This aligns incentives and minimizes risk for donors, who pay only for success.

The beauty of AMCs lies in their dual nature: they combine the market's efficiency in producing innovations with a social mission to address needs the market alone wouldn't meet. They're not charity; they're market creation.

The GAVI Case Study: Proof of Concept

The theoretical appeal of AMCs found real-world validation in 2007 when the Gates Foundation and several countries pledged $1.5 billion toward a pilot AMC for pneumococcal vaccines, administered through Gavi, the Vaccine Alliance. Pneumococcal disease was killing over 700,000 children under five each year, predominantly in low-income countries.

The pneumococcal AMC didn't fund research from scratch—effective vaccines already existed but weren't reaching children in developing countries. The challenge was incentivizing pharmaceutical companies to scale up production capacity and ensure affordable pricing for low-income countries.

The results speak for themselves. As of 2020, more than 225 million children had been vaccinated against pneumococcal disease through the program, preventing an estimated 700,000 deaths. The AMC accelerated vaccine introduction in developing countries by years and secured pricing nearly 90% below previous levels.

What makes this case study particularly relevant is that it addressed a problem structurally similar to the AI training data gap: a technically feasible innovation with enormous social value but insufficient market incentives.

Reimagining AMCs for AI Training Data

How might Advance Market Commitments work for AI training data? Let's envision a concrete example:

A coalition of donors—perhaps including multilateral institutions like the World Bank, philanthropic organizations like the Gates Foundation, and forward-thinking governments—could establish an AMC fund specifically for developing high-quality, ethically collected training datasets for underrepresented languages and contexts.

The fund would specify clear criteria: dataset size, quality standards, diversity requirements, ethical collection methods, and validation procedures. It would then commit to purchasing datasets meeting these criteria at a price sufficient to incentivize development.

For instance, the AMC might commit to paying $10 million for a comprehensive, high-quality text dataset in Swahili that meets specific benchmarks. This guaranteed market would enable companies or research institutions to justify the investment required to create such datasets.
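
One way to picture the "pay only if the criteria are met" rule is as a checklist applied to each submitted dataset. The sketch below is hypothetical throughout: the criteria fields, thresholds, and function names are illustrative assumptions, not a proposed standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    """Hypothetical acceptance criteria published by the AMC fund."""
    min_tokens: int
    min_quality_score: float   # e.g. share of audited samples passing review
    min_dialects: int
    requires_consent_records: bool


@dataclass(frozen=True)
class Submission:
    """A dataset offered to the fund, described by audited measurements."""
    tokens: int
    quality_score: float
    dialects: int
    has_consent_records: bool


def unmet_criteria(sub: Submission, crit: Criteria) -> list[str]:
    """Return the names of the criteria the submission fails (empty if none)."""
    failures = []
    if sub.tokens < crit.min_tokens:
        failures.append("size")
    if sub.quality_score < crit.min_quality_score:
        failures.append("quality")
    if sub.dialects < crit.min_dialects:
        failures.append("diversity")
    if crit.requires_consent_records and not sub.has_consent_records:
        failures.append("ethical collection")
    return failures


def payment_due(sub: Submission, crit: Criteria, committed_price: float) -> float:
    """Pay the committed price only if every criterion is met."""
    return committed_price if not unmet_criteria(sub, crit) else 0.0


# Hypothetical Swahili text dataset and thresholds.
criteria = Criteria(min_tokens=500_000_000, min_quality_score=0.95,
                    min_dialects=3, requires_consent_records=True)
good = Submission(tokens=620_000_000, quality_score=0.97,
                  dialects=4, has_consent_records=True)
weak = Submission(tokens=620_000_000, quality_score=0.90,
                  dialects=2, has_consent_records=True)

print(payment_due(good, criteria, 10_000_000.0))
print(unmet_criteria(weak, criteria))
```

Publishing the criteria up front, and reporting exactly which ones a rejected submission missed, gives producers a clear target to invest toward.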

Importantly, these datasets could be made available under licensing terms that balance commercial viability with broad access. Some might become public goods, while others could use tiered pricing models where commercial entities pay more than academic or public-interest users.
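
A tiered pricing model of this kind reduces to a lookup from user type to price. In the minimal sketch below, the tier names and multipliers are hypothetical assumptions chosen for illustration.

```python
# Hypothetical license tiers: the share of the list price each user type pays.
TIER_MULTIPLIERS = {
    "commercial": 1.0,
    "academic": 0.1,
    "public_interest": 0.0,
}


def license_fee(list_price: float, user_type: str) -> float:
    """Return the fee owed by a user of the given type."""
    try:
        multiplier = TIER_MULTIPLIERS[user_type]
    except KeyError:
        raise ValueError(f"unknown user type: {user_type!r}") from None
    return list_price * multiplier


for user_type in TIER_MULTIPLIERS:
    print(user_type, license_fee(50_000.0, user_type))
```

A multiplier of zero corresponds to the public-good case, while the commercial tier preserves a revenue stream for the dataset's producer.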

The AMC could be structured to drive competition by committing to purchase from multiple providers, potentially improving quality and reducing costs. It might also include supply commitments from dataset creators, ensuring sustained data collection over time rather than one-time efforts.

Beyond Languages: Other Applications for Data AMCs

While language data presents the most obvious application, Advance Market Commitments could address numerous other gaps in AI training data:

  1. Medical data from diverse populations: AMCs could fund the creation of ethically collected, privacy-preserving medical datasets from underrepresented regions, enabling the development of more universally effective healthcare AI.
  2. Cultural and contextual image data: Image recognition systems often perform poorly on objects, clothing, architecture, and scenes from non-Western contexts. AMCs could fund diverse image datasets that close these gaps.
  3. Multimodal data for accessibility applications: AI systems to assist people with disabilities require specialized training data. AMCs could fund datasets specifically designed to improve accessibility tools.
  4. Agricultural data from diverse ecological contexts: To build AI tools that help farmers in various regions, we need training data that captures diverse agricultural practices, crops, pests, and environmental conditions.
  5. Climate and environmental data from vulnerable regions: Many regions most vulnerable to climate change lack sufficient data to build effective AI models for climate adaptation and mitigation.

Each of these applications represents an area where commercial incentives alone are insufficient to generate the necessary data, but where the social value of having such data is enormous.

Challenges and Considerations

Advance Market Commitments aren't without challenges. Getting the design right matters tremendously. As Kremer and colleagues note from their analysis of the pneumococcal vaccine AMC, "subtle changes in AMC design can have large effects." Several considerations would be critical for AI training data AMCs:

Quality verification: Unlike vaccine efficacy, which can be tested in clinical trials, dataset quality is hard to measure. AMCs would need robust validation mechanisms to ensure data quality meets specifications before payment.
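
One possible validation mechanism, offered here only as an assumption and not as an established AMC practice, is to review a random sample of records and accept the dataset only when a conservative estimate of its pass rate clears the required threshold. The sketch uses the lower end of the Wilson score interval; `passes_review` is a hypothetical stand-in for whatever human or automated review the fund would specify.

```python
import math
import random


def wilson_lower_bound(passes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a pass rate (z=1.96 ~ 95%)."""
    if n == 0:
        return 0.0
    p = passes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def audit(records, passes_review, sample_size, required_rate, seed=0):
    """Review a random sample; accept only if the pass rate is credibly high.

    Returns (accepted, lower_bound).
    """
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    passes = sum(1 for record in sample if passes_review(record))
    bound = wilson_lower_bound(passes, len(sample))
    return bound >= required_rate, bound


# Hypothetical dataset in which every record passes review.
records = [{"ok": True} for _ in range(1000)]
accepted, bound = audit(records, lambda r: r["ok"],
                        sample_size=400, required_rate=0.95)
print(accepted, round(bound, 4))
```

Using a lower confidence bound instead of the raw sample pass rate means a small or lucky sample is not enough to trigger payment; larger samples are needed to certify higher quality thresholds.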

Ethical data collection: Many past datasets have faced criticism for privacy violations or exploitative collection practices. AMCs would need to enforce stringent ethical standards, including informed consent and fair compensation for data contributors.

IP and access rights: The right balance of intellectual property protection and public access would be crucial. Too much restriction could limit social benefit; too little could undermine commercial participation.

Coordination across stakeholders: Effective AMCs require cooperation among funders, producers, and eventual users. Harmonizing diverse interests presents governance challenges.

Dynamic specifications: AI advances rapidly. AMC criteria would need to evolve to remain relevant as technology progresses.

Despite these challenges, the potential benefits are substantial. By creating markets for diverse training data, AMCs could help ensure that AI's benefits extend to currently underserved populations.

The Broader Vision: Democratizing AI Development

Advance Market Commitments for AI training data represent more than just a technical fix: they embody a vision for more equitable technological development. By ensuring diverse data availability, AMCs could enable innovators worldwide to build AI applications suited to local needs rather than depending on systems designed primarily for wealthy markets.

Imagine language models that work as seamlessly in Amharic or Tagalog as they do in English. Picture healthcare AI that recognizes disease presentations across diverse populations. Envision agricultural advisory systems trained on farming practices from various ecological contexts.

These applications wouldn't just benefit underrepresented populations; they would enrich the global AI ecosystem. Diverse data leads to more robust, generalizable models—benefiting everyone.

The timing for such an initiative is ideal. We're still early in the AI revolution, with time to shape its direction. By establishing mechanisms like AMCs now, we can help ensure that AI development follows a more inclusive path than previous technological revolutions.

As we've seen with vaccines, waiting for market forces alone to address underserved populations can mean decades of unnecessary suffering. With AI poised to transform everything from healthcare to education to economic opportunity, we can't afford such delays.

Advance Market Commitments offer a proven approach to bridge this gap, creating markets where they otherwise wouldn't exist and ensuring technological benefits reach those who need them most. The blueprint exists. The need is clear. The question is whether we have the vision and commitment to make it happen.

Written by

Abhilash Mishra

Founder and Chief Science Officer

Equitech Futures
