India's AI Cannot Be Built on Someone Else's Language: The Bold Mission of Sarvam AI
Picture a farmer in Tamil Nadu asking a voice assistant for advice on crop disease. The assistant responds in English. Picture an elderly woman in Odisha trying to access a government scheme through a chatbot. The bot understands her typed input poorly and her spoken words not at all. Picture a small business owner in Rajasthan trying to use an AI tool to draft a customer message in Marwari. The tool has never heard of Marwari.
This is not a hypothetical. This is the reality of AI built almost entirely on English-language data, trained on Western internet content, and optimised for users in the Global North.
India has 1.4 billion people. It has 22 officially recognised languages. It has hundreds of dialects, scripts, and regional variations of expression. And for most of those languages, the world's most powerful AI models, trained predominantly on English text, simply do not work well.

Two engineers looked at that problem and made a decision that was either visionary or audacious, depending on when you asked them: they would build AI from scratch, in India, for India, in India's languages, and rooted in India's data.
That decision became Sarvam AI.
The Researchers Who Were Already on This Path
Sarvam did not emerge from a eureka moment in a garage. It emerged from decades of accumulated conviction about what Indian AI needed to look like.
Dr. Pratyush Kumar holds a BTech in electrical and electronics engineering from IIT Bombay, followed by a PhD in computer engineering. His career took him through deep research roles at IBM and Microsoft, and then back to India — to IIT Madras, where he served as a faculty member and, critically, co-founded AI4Bharat, a pioneering open-source initiative dedicated to advancing artificial intelligence for Indian languages. He also co-founded One Fourth Labs, an applied AI education and research organisation. By the time he co-founded Sarvam AI, Pratyush Kumar had spent years building the intellectual and technical foundation for exactly this kind of work.
Dr. Vivek Raghavan's path was different but equally distinctive. A graduate of IIT Delhi with a PhD in electrical and computer engineering from Carnegie Mellon University, Vivek had spent nearly 12 years volunteering with the Unique Identification Authority of India — building the biometric systems that powered Aadhaar, the world's largest identity programme. He also contributed to EkStep Foundation and Khosla Labs, building large-scale digital platforms that served hundreds of millions of Indians. Vivek Raghavan understood, from the inside, what it meant to build technology at India's scale — not for a niche, not for the top of the pyramid, but for everyone.
When the global AI wave surged after the release of ChatGPT in late 2022, these two men saw not a threat but an opening. They saw a window in which India, if it moved fast enough and thought clearly enough, could build something genuinely its own.
In August 2023, they co-founded Sarvam AI in Bengaluru. The name itself was deliberate — Sarvam means "everything" in Sanskrit. The ambition the name encoded was equally deliberate: AI that works for all of India, across all its languages and contexts.
Forty-One Million Dollars in Five Months
The velocity with which the world responded to Sarvam AI's founding was itself a signal.
In December 2023 — just five months after the company was incorporated — Sarvam AI announced a funding round of $41 million. It was led by Lightspeed Venture Partners, with participation from Peak XV Partners and Khosla Ventures. Three of the world's most credible technology investors had backed an Indian AI company that had barely begun operations.
The investment was not just a vote of confidence in Pratyush Kumar and Vivek Raghavan, though their combined track records were formidable. It was a statement that the market believed in the thesis: India's linguistic and cultural diversity was not a niche problem. It was a massive, underserved market that no existing AI company was adequately addressing.
Total funding raised by Sarvam AI has reached $53.6 million.
Building the Full Stack, Not Just the Model
Sarvam AI did not set out to be a single-product company. From the beginning, its stated ambition was to build a full-stack generative AI platform — covering foundational models, speech systems, document intelligence, and developer APIs — all optimised specifically for Indian use cases.
The product portfolio that has emerged reflects exactly that ambition.
Saaras is Sarvam's automatic speech recognition system — supporting streaming and batch recognition across all 22 official Indian languages plus English, trained on over one million hours of curated multilingual audio data. For a country where hundreds of millions of people are more comfortable speaking than typing, and where voice is often the most natural interface with technology, this is foundational infrastructure.
Bulbul is Sarvam's text-to-speech model — converting written text into natural-sounding spoken audio across 11 Indian languages, with 39 distinct speaker voices that capture regional accents and tonal variations authentically. Bulbul V3, released in February 2026, supports over 30 voices across 11 Indian languages and Hinglish. For a farmer receiving agricultural advice through a phone call, or a citizen navigating a government service through a voice interface, Bulbul makes AI sound like someone who actually understands where they come from.
Sarvam Vision is a document intelligence model — capable of extracting text, tables, charts, and structured data from documents and images, including support for OCR across Indian language scripts, handwritten documents, low-resolution scans, and stamped records that would confuse any model trained only on clean digital text.
Sarvam-1, released in October 2024, is a 2-billion-parameter language model optimised specifically for Indian languages, prioritising token efficiency and faster inference. In May 2025, Sarvam unveiled Sarvam-M — a 24-billion-parameter hybrid model focused on reasoning and multilingual Indic tasks, trained to understand and respond in 10 Indian languages including Hindi, Tamil, Bengali, and Marathi.
None of these products is a global model with Indian-language support bolted on. Each is built from the ground up for Indian reality.
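The token-efficiency point made about Sarvam-1 can be made concrete with a rough sketch. The snippet below is not Sarvam's tokenizer or method. It assumes a worst-case byte-level tokenizer with no learned merges, where every UTF-8 byte counts as one token, and compares the cost per word of an English and a Hindi sentence. The function name utf8_fertility and both sample sentences are hypothetical, chosen only for illustration.

```python
def utf8_fertility(text: str) -> float:
    """Average number of UTF-8 bytes per whitespace-separated word.

    Assumption: each byte stands in for one token, as in a byte-level
    tokenizer with no learned merges. Real tokenizers do better than this.
    """
    words = text.split()
    if not words:
        raise ValueError("text must contain at least one word")
    # Count bytes word by word so that spaces are not included.
    n_bytes = sum(len(word.encode("utf-8")) for word in words)
    return n_bytes / len(words)


# Hypothetical sample sentences, chosen only for illustration.
english = "the farmer asks about crop disease"
hindi = "किसान फसल रोग के बारे में पूछता है"

for label, sentence in (("English", english), ("Hindi", hindi)):
    print(f"{label}: {utf8_fertility(sentence):.1f} byte-tokens per word")
```

Each Devanagari code point takes three bytes in UTF-8, while each basic Latin letter takes one, so under this assumption the Hindi sentence costs noticeably more tokens per word. A tokenizer whose vocabulary is built with Indic text in mind narrows that gap, which is broadly what "token efficiency" refers to: fewer tokens per word means less compute per sentence and faster inference.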
The Government Mandate That Changed Everything
In April 2025, the Ministry of Electronics and Information Technology selected Sarvam AI to build India's first sovereign large language model under the IndiaAI Mission — a government initiative backed by an outlay of ₹10,372 crore to build a domestic AI ecosystem.
The IndiaAI Mission had been launched in March 2024 with an explicit goal: to ensure India had the compute infrastructure, the data resources, the research capability, and the foundational models to build and govern its own AI future — without depending entirely on models built and controlled by foreign companies.
Sarvam was among 12 organisations tasked with building AI models on Indian datasets. It was then selected to lead the sovereign LLM programme. The model it is building is expected to have 120 billion parameters, which would make it the largest AI model built natively in India, with 15 to 20 percent of its training corpus drawn from Indian data.
Compute infrastructure for the project is being provided under the IndiaAI Mission, with data centre support from Yotta and technical support from Nvidia. The model is planned for open-source release following its development — ensuring that what India builds, India can share.
Ashwini Vaishnaw, the Union Minister for Electronics and Information Technology, expressed the government's confidence clearly: "We are confident that Sarvam's models will be competitive with global models."
The Marketing Strategy: Open, Aggressive, and Built on Technical Credibility
Sarvam AI's approach to marketing is unusual in the Indian startup ecosystem, and deliberately so.
Open-source as trust-building. From early in its journey, Sarvam embraced open-source publishing of its models and research. Making models available for the developer community to download, evaluate, and use independently is a form of marketing that cannot be faked: if the model is good, developers know. If it is not, they also know. Sarvam's commitment to open-source — including the announced intention to open-source the sovereign LLM it builds under the IndiaAI Mission — signals a confidence in its work that is itself a marketing statement.
"14 Days, 14 Launches." Ahead of the India AI Impact Summit in early 2026, Sarvam executed one of the boldest product marketing campaigns by any Indian AI company: 14 consecutive days of new product and model launches, building public anticipation and media coverage in a deliberate, rolling cadence. The campaign culminated in the unveiling of two foundational LLMs — Sarvam-30B and Sarvam-105B. The strategy drew explicit comparisons to OpenAI's rapid release cadence in late 2024 — a deliberate positioning of Sarvam as a company operating at the same velocity as the world's leading AI labs, just with India at the centre.
Benchmark-led credibility. Rather than relying on brand advertising, Sarvam has consistently positioned its products through performance benchmarks — publishing results that show how its models perform against global alternatives on Indian language tasks. When the claim is that Sarvam Vision outperforms leading global models on Indian-script OCR, the supporting evidence is technical, measurable, and reproducible.
Sovereign AI as a national narrative. Sarvam has consistently aligned its messaging with something larger than commercial success: the idea of India's strategic AI autonomy. This is not cynical positioning — it is a genuine extension of both founders' careers, built around public digital infrastructure for India. But it is also powerful marketing, because it connects the company's work to a national conversation that has political, economic, and cultural dimensions.
A Name That Means Everything — For Good Reason
There is a particular kind of ambition that does not announce itself with noise. It announces itself with precision — with a carefully chosen name, a clearly articulated thesis, and the willingness to build something slow and foundational when the world is chasing fast and fashionable.
Sarvam AI has been building since August 2023. In less than three years, it raised over $53 million, built a portfolio of speech, language, and vision AI systems, earned the Indian government's mandate to build the country's sovereign AI model, and put Indian AI on the global map with a product launch strategy that drew worldwide attention.
The work ahead is enormous. Building a 120-billion-parameter model. Making it work across 22 Indian languages. Ensuring it is trusted by governments, enterprises, and citizens alike. Open-sourcing it so that every Indian developer and researcher can build on it.
But the founders have been preparing for exactly this work, in various forms, for decades. And India — a country that invented the concept of zero, that built the world's largest biometric identity system, that sends missions to the Moon and Mars — is not a country short of the ambition to build its own AI.
Sarvam is simply making sure the ambition has a home.
Founded August 2023. Funded in five months. Selected for India's sovereign AI in 2025. Built for a billion people, in their own languages.


