
Flipkart's Voice Search for Vernacular Users

  • Mar 4
  • 11 min read

The Bharat Internet Wave: India's Second Internet Population

Between 2016 and 2020, India experienced a structural transformation in its internet user base. The primary catalyst was Reliance Jio's entry into telecom in September 2016 with dramatically subsidised data tariffs, which brought hundreds of millions of first-time internet users online, predominantly from Tier II and III cities and rural India. A KPMG study on Indian Languages published in May 2018, cited across multiple Flipkart press communications and industry analyses, established the defining characteristic of this incoming cohort: 90% of new internet users in India were native language speakers, for whom English-language interfaces created a fundamental access barrier. The same study projected that Indian language internet users would grow at a compound annual growth rate of 18% to reach 536 million by 2021, and that the Hindi internet user base would outgrow English by 2021.

For e-commerce platforms, this demographic shift posed both an opportunity and a product design crisis. India's existing e-commerce infrastructure (search bars, product descriptions, checkout flows, customer support) was almost entirely architected for English-literate, urban users. The Bain & Company–Flipkart joint study of 2020 provided sharper commercial context: online shoppers from Tier II cities already accounted for nearly half of all e-retail shoppers and contributed to three out of every five orders on leading platforms. The "next 200 million" was not a future projection; it was an already-arriving market segment that existing platform design was systematically excluding.

The competitive response to this insight varied by platform. Amazon India had added Hindi to its apps in 2018 and launched a Hindi customer service chatbot by 2019. It added Bengali and Marathi voice shopping via Alexa in late 2021. Google Assistant extended banking services by voice in Hindi. However, unlike Amazon's reliance on the Alexa ecosystem, which required separate hardware investment, and unlike generic voice assistants designed for global markets, Flipkart pursued a vertically integrated, India-first voice architecture built entirely in-house. The strategic decision to build proprietary vernacular AI rather than adopt global voice infrastructure became the distinguishing element of Flipkart's approach.
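
The growth projection cited in this section (an 18% compound annual growth rate reaching 536 million Indian language internet users by 2021) can be sanity-checked with simple compounding arithmetic. The sketch below is illustrative only: the five-year horizon with a 2016 baseline is an assumption, since this article does not state the study's base year, and the helper names are hypothetical.

```python
def project(base: float, rate: float, years: int) -> float:
    """Compound a starting value forward at a fixed annual growth rate."""
    return base * (1 + rate) ** years


def implied_base(target: float, rate: float, years: int) -> float:
    """Invert the compounding to recover the starting value a projection implies."""
    return target / (1 + rate) ** years


TARGET_USERS_M = 536.0  # millions of users by 2021, as cited in the article
CAGR = 0.18             # 18% per year, as cited in the article
YEARS = 5               # ASSUMPTION: 2016 baseline; not stated in the article

base = implied_base(TARGET_USERS_M, CAGR, YEARS)
print(f"Implied 2016 base: {base:.0f} million")

# Year-by-year path from the implied base to the 2021 target.
for offset in range(YEARS + 1):
    print(2016 + offset, f"{project(base, CAGR, offset):.0f} million")
```

Under that assumed horizon, the projection implies a starting base of a little under 235 million users, which means the forecast amounts to the user base more than doubling in five years.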



A Platform Built for Urban India, Facing a Rural Internet Revolution

By 2018, Flipkart was India's largest e-commerce marketplace by market share, with reported GMV of $7.5 billion and net sales of $4.6 billion in FY 2017–18, representing 50% year-on-year growth. In August 2018, Walmart completed its acquisition of a 77% stake in Flipkart for $16 billion, the largest e-commerce acquisition in history at the time. Despite its market leadership, Flipkart's product experience remained predominantly English-language and text-dependent. This created a structural ceiling on its addressable market precisely as the broadest wave of new internet adoption in Indian history was underway. The language barrier was not merely a UX inconvenience; it was a full access barrier for users who were literate in their native language but could not read or type in English.

Flipkart's own CTO had articulated the challenge publicly in a 2016 interview: "The real challenge we're going to have to solve is bigger than Alexa. The problems that we see [in the U.S.] are not what we see in India." This framing acknowledged that no global voice platform, whether Amazon's Alexa or Google Assistant, was engineered for the specific linguistic complexity of Indian commerce. India's 22 languages recognised in the Eighth Schedule of the Constitution, spanning multiple scripts, vocabularies, colloquial registers, and regional accents, required a purpose-built solution.

The acquisition of Liv.ai in August 2018 was Flipkart's first publicly documented strategic investment in resolving this gap. It was also, notably, Flipkart's first acquisition following the Walmart investment, a signal of organisational priority. Kalyan Krishnamurthy, CEO of Flipkart Group, stated in the official acquisition press release: "Ultimately, we want to give our customers a conversational e-commerce experience and believe that with the voice interface the opportunities are endless including discovery, search, engagement, transactions etc."


Democratising E-Commerce: The "3V" Platform Thesis

The "3V" thesis rested on three pillars: Vernacular, Voice, and Video. Vernacular addressed the language barrier in reading, browsing, and product discovery for users uncomfortable with English text. Voice addressed the input barrier (the difficulty of typing, especially in Indian language scripts on mobile keyboards) by enabling spoken commands as the primary search interface. Video addressed the trust and comprehension barrier, enabling product demonstrations and reviews in regional languages for users who might not trust static text descriptions of unfamiliar product categories.

Voice search, as a product initiative, was thus not a standalone feature but the second pillar of a structurally coherent market expansion strategy. As Flipkart Group CEO Kalyan Krishnamurthy stated in the September 2019 press announcement for the Hindi interface: "We have deployed around 80-90% of our resources towards solving for Bharat, with our Hindi interface being one of the biggest catalysts in this transition." This resource allocation statement, made in the context of India's largest e-commerce platform, was an unusually direct articulation of strategic priority, signalling that growth in urban, English-literate markets was secondary to the company's medium-term agenda.


A Four-Phase Rollout: From Acquisition to Platform-Wide Integration

Flipkart's vernacular voice strategy was not a single product launch but a sequenced, four-phase rollout documented across official press communications and credible media coverage between 2018 and 2022.


Phase 1: Capability Acquisition, Liv.ai (August 2018)

Flipkart acquired Liv.ai, an AI startup founded by IIT Kharagpur alumni, for a reported ~$40 million. Liv.ai was the first Indian company to build speech-to-text APIs for 10 Indian languages with low-latency conversion using deep neural network architecture. Its technology handled multiple accents and performed well in noisy environments, a critical design requirement for Tier II and III Indian contexts. Post-acquisition, Liv.ai became Flipkart's "Centre of Excellence for Voice Solutions." Flipkart did not white-label any global voice platform; all voice infrastructure was built or owned in-house.


Phase 2: Vernacular Interface Launch, Hindi (September 2019)

After months of intensive ethnographic research, described by Flipkart as field visits to users' homes across Tier II and III cities, Flipkart launched a full Hindi interface on its platform. Users could now browse, search, and read product information in Hindi. The launch was timed ahead of the festive sale season, reflecting a deliberate commercial strategy: acquiring first-time Hindi-speaking buyers before the highest-demand period of the Indian e-commerce calendar. The interface was built on Flipkart's in-house "Localisation and Translation Platform," which would later handle 5.4 million translated and transliterated words across languages.


Phase 3: Voice Assistant for Grocery and South Indian Language Interfaces (June 2020)

In June 2020, Flipkart launched a voice-first conversational AI platform on its grocery store, Supermart, supporting Hindi and English. An ethnographic study over five months across multiple towns and cities informed this launch. The same month, Flipkart expanded its vernacular interface to Tamil, Telugu, and Kannada, targeting South Indian states that accounted for, in Flipkart's own words, "a significant proportion of Flipkart's growing user base" and showed higher native-language script adoption. Native language adoption on the platform grew 2.5 times from the pre-COVID period to the festive season of 2020.


Phase 4: Platform-Wide Voice Search Rollout (March 2021)

Having tested voice in grocery, Flipkart extended the feature platform-wide in March 2021, supporting Hindi and English (and Hinglish, a blend of both). The technical stack of Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), and Text-to-Speech (TTS) was built entirely by Flipkart's in-house engineering and data science teams. The feature handled colloquial search commands such as "kaala joota dikhana" (show me a black shoe) and "sabzi kaatne waala dena" (give me a vegetable cutter), specifically designed to mirror how non-digitally-native users would naturally describe products. Following a rolling launch that began in January 2021, the platform reported more than 5 million voice queries per day by March 2021.

In February 2022, Flipkart extended voice search to its B2B arm, Flipkart Wholesale, for Kirana store owners, businesses that primarily serve semi-urban and rural markets. At this stage, Flipkart reported approximately 3 million daily voice queries on the wholesale platform, with more than half originating from towns with fewer than 50,000 people. By September 2022, Flipkart added Marathi to bring the total to six supported languages (English, Hindi, Marathi, Tamil, Telugu, Kannada), with the company reporting a 95% continued-usage rate among users who had adopted regional language interfaces since 2019.
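
To make the ASR, NLU, and TTS chain described in Phase 4 more concrete, here is a minimal Python sketch of how the three stages hand off to one another for a colloquial Hinglish query. It is not Flipkart's implementation. Every name in it is hypothetical, the lexicon is a toy built from the two example queries in this article, and the speech stages are stubs that pass text through instead of processing audio.

```python
from dataclasses import dataclass

# HYPOTHETICAL lexicon: romanised Hindi tokens -> English catalogue terms.
# "kaatne" is mapped loosely to "cutter" purely for illustration.
LEXICON = {
    "kaala": "black",
    "joota": "shoe",
    "sabzi": "vegetable",
    "kaatne": "cutter",
}

# HYPOTHETICAL filler words that signal a request but name no product.
STOPWORDS = {"dikhana", "dikhao", "dena", "waala", "wala", "show", "me", "a"}


@dataclass
class ParsedQuery:
    intent: str
    search_terms: list[str]


def asr_stub(audio_transcript: str) -> str:
    """Stand-in for speech recognition: assumes the audio is already transcribed."""
    return audio_transcript.lower().strip()


def nlu(text: str) -> ParsedQuery:
    """Drop filler words and map each remaining token to a catalogue term.

    Tokens missing from the lexicon (for example plain English words)
    pass through unchanged, which is what lets mixed queries work.
    """
    terms = [LEXICON.get(tok, tok) for tok in text.split() if tok not in STOPWORDS]
    return ParsedQuery(intent="product_search", search_terms=terms)


def tts_stub(parsed: ParsedQuery) -> str:
    """Stand-in for speech synthesis: returns the text that would be spoken."""
    return "Showing results for " + " ".join(parsed.search_terms)


def handle_voice_query(audio_transcript: str) -> str:
    """Run the three stages in order: ASR -> NLU -> TTS."""
    return tts_stub(nlu(asr_stub(audio_transcript)))


print(handle_voice_query("kaala joota dikhana"))
print(handle_voice_query("sabzi kaatne waala dena"))
print(handle_voice_query("black joota dikhao"))
```

The third call mixes an English word with a romanised Hindi one and resolves to the same search terms as the first, which is the behaviour a Hinglish-capable system needs. A production system would replace the lookup table with trained models, so this shows only the shape of the pipeline.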


The Ethnographic Foundation: Research Before Engineering

What distinguishes Flipkart's vernacular voice strategy from a purely technology-driven product decision is the depth of consumer research that preceded each product phase. Flipkart publicly documented conducting ethnographic studies (field research in consumers' homes and surrounding environments) prior to both the Hindi interface launch (September 2019) and the grocery voice assistant launch (June 2020). The grocery voice assistant launch was specifically preceded by a "detailed ethnographic study for over five months, in multiple towns and cities," according to Flipkart's official press communication carried by YourStory.

The consumer insights surfaced by this research, as disclosed through official Flipkart communications, were specific and actionable. The Business Today report on the vernacular interface launch noted three findings from Flipkart's ethnographic study: users showed a preference for using hybrid words that mixed English with regional dialects (validating the "Hinglish" search capability); users employed a mix of translation and transliteration in the same sentence without realising it; and native language platform usage for socialising was higher than for commerce, suggesting that language comfort in informal contexts had not yet transferred to transactional contexts, creating a design opportunity. These insights directly shaped the technical design choices: the voice search system was built to handle colloquial, hybrid-language commands rather than grammatically precise queries.

The strategic consumer insight underlying the entire initiative is best understood through the Jobs-to-be-Done (JTBD) framework. The "job" a first-time e-commerce user from Tier III India hired a platform to do was not merely "find a product"; it was "shop with confidence in a context that feels familiar and safe." An English-language, text-input interface failed this job entirely for a user whose reading language was Hindi and whose natural expression was spoken. Voice search in the user's own language and dialect completed this job. The 95% continued-usage rate among regional language interface adopters, disclosed by Flipkart in September 2022, is the most direct available evidence that the product had successfully completed the job for which it was designed.
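
One practical consequence of the finding that users mix registers within a single sentence is that query logs cannot be assumed to be in one script or one language. As a small, hedged illustration (not something the article attributes to Flipkart), the Python sketch below labels each token of a typed or transcribed query by Unicode script using the standard `unicodedata` module. The function names are hypothetical.

```python
import unicodedata


def script_of(token: str) -> str:
    """Label a token as 'devanagari', 'latin', 'mixed', or 'none'.

    Relies on Unicode character names, which begin with the script name
    (e.g. 'DEVANAGARI LETTER KA', 'LATIN SMALL LETTER A').
    Digits and punctuation outside these two scripts are ignored.
    """
    scripts = set()
    for ch in token:
        name = unicodedata.name(ch, "")
        if name.startswith("DEVANAGARI"):
            scripts.add("devanagari")
        elif name.startswith("LATIN"):
            scripts.add("latin")
    if not scripts:
        return "none"
    return scripts.pop() if len(scripts) == 1 else "mixed"


def is_script_mixed(query: str) -> bool:
    """True if the query contains both Devanagari and Latin script."""
    labels = {script_of(tok) for tok in query.split()}
    return "mixed" in labels or {"devanagari", "latin"} <= labels


for query in ["काला shoes dikhao", "kaala joota dikhana", "काला जूता"]:
    tagged = [(tok, script_of(tok)) for tok in query.split()]
    print(tagged, "script-mixed:", is_script_mixed(query))
```

Script detection only catches part of the pattern. A fully romanised query such as "kaala joota dikhana" is entirely Latin script yet entirely Hindi, so separating transliterated Hindi from English would need a lexicon or a language-identification model on top of this check.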


Product as Market Entry: No Above-the-Line Required

Flipkart's go-to-market strategy for its vernacular voice features was structurally different from a conventional product launch campaign. The platform did not deploy or publicly disclose a paid media strategy for these feature launches. Instead, the channel strategy was embedded in the product itself and its distribution through the Flipkart app ecosystem — available on the Google Play Store across the entirety of Android India's smartphone base. The timing of key launches was, however, clearly calibrated to commercial objectives. The Hindi interface was launched in September 2019 — specifically ahead of Flipkart's Big Billion Days festive sale, India's highest e-commerce volume period. As Business Standard reported at the time, Flipkart explicitly framed the Hindi launch as preparation for festive season customer acquisition from Tier II and III markets. This "product as acquisition tool" approach treats the feature itself as the marketing event, generating press coverage, word-of-mouth among new user segments, and direct acquisition of first-time buyers who had previously been excluded by language barriers. The B2B extension to Flipkart Wholesale (February 2022) introduced a separate channel dimension: Kirana store owners as a distribution surface. Kirana stores serve as the primary retail touchpoint for hundreds of millions of Indians, particularly in semi-urban and rural markets. By making voice-enabled procurement accessible to Kirana owners — business operators who were often vernacular-primary themselves — Flipkart extended its vernacular strategy from B2C customer acquisition to B2B supply chain integration, deepening platform dependence among a commercially critical user segment. No verified public information is available on above-the-line advertising spend, specific digital marketing campaigns, or agency partnerships used to promote Flipkart's vernacular voice features.


What the Flipkart Vernacular Strategy Teaches Brand and Product Strategists


In access-barrier markets, product design is the most powerful marketing tool. Flipkart's vernacular voice strategy generated 5 million daily voice queries without a documented paid media campaign to promote the voice feature specifically. This outcome illustrates a principle critical for brands targeting first-generation internet users: when a product genuinely removes a structural access barrier, the product experience itself drives adoption more powerfully than advertising. The implication for brand strategists is that in emerging market contexts — whether defined by language, digital literacy, income, or connectivity — investment in product localisation frequently delivers stronger acquisition outcomes than equivalent investment in media spend.

Ethnographic consumer research is a competitive advantage, not a process step. Flipkart's documented commitment to multi-month field research before each product phase — studying users in their homes, in their natural environments, in their actual language — produced consumer insights that were not available from quantitative data analysis alone. The discovery that Indian users naturally mix translation and transliteration in the same sentence, and prefer hybrid language registers over grammatically pure vernacular, directly shaped the "Hinglish" capability of the voice search system. Competitors relying on global voice platforms were unable to incorporate this insight into their products because their AI models were trained on different corpora. Consumer research translated into product specifications that competitors could not replicate.

Vertical integration in AI is a structural moat in localised markets. Flipkart's decision to acquire Liv.ai and build its voice stack entirely in-house, rather than licensing global voice APIs, created a proprietary technology asset that served as a competitive moat. Global voice APIs — from Google, Amazon, or Microsoft — were trained predominantly on English data and could not handle the noise profiles, accent diversity, or hybrid language patterns of Indian vernacular speech with comparable accuracy. By owning the technology, Flipkart could iterate on its models using its own catalogue data, continuously improving voice search quality in a way that platform-dependent competitors could not match. This is a template for any brand competing in markets with distinctive linguistic, cultural, or contextual characteristics: owning the AI layer, rather than licensing it, creates advantages that deepen over time.

Sequenced rollouts reduce risk and generate institutional learning. Flipkart's phased approach, beginning with the grocery category before expanding voice search platform-wide, is a textbook application of risk-managed innovation. Grocery is a high-frequency, low-complexity product category: users purchase familiar items repeatedly, making it an ideal testing environment for voice search accuracy and usability. The learning from grocery voice interaction, which launched in June 2020 and so ran for roughly nine months before the March 2021 platform-wide rollout, informed the platform-wide architecture. This sequencing is consistent with the build-measure-learn loop of product-led growth, applied to a feature with significant technical complexity and a user segment with limited e-commerce experience.

The 95% retention signal and its strategic implications. The retention figure of 95% continued usage among regional language interface adopters — if taken at face value from Flipkart's disclosed data — has significant strategic implications beyond the vernacular feature itself. It suggests that language-native onboarding creates a fundamentally different user relationship with the platform than English-language onboarding. A user who has been onboarded in Hindi or Tamil is, effectively, onboarded into a Hindi or Tamil version of Flipkart — one that competitors would need to replicate in its entirety, not just match on price or assortment, to compete for the same user. Language-native onboarding may therefore function as a platform lock-in mechanism, making vernacular investment a long-term retention strategy in addition to a near-term acquisition tool.


Discussion Questions

  1. Flipkart's ethnographic research revealed that Indian users naturally employ hybrid language registers — mixing Hindi and English in the same search query — and that this behaviour is unconscious rather than deliberate. How should product strategists approach the challenge of designing AI systems for linguistically heterogeneous markets? What are the implications for data collection, model training, and quality assessment in markets like India, where standardised language datasets may not capture actual usage patterns?

  2. Flipkart's CEO publicly stated that 80-90% of the company's resources were directed toward the "Bharat" user segment at the time of the Hindi interface launch. Evaluate this resource allocation decision from a portfolio strategy perspective. What are the trade-offs between investing heavily in new, lower-monetisation user segments versus optimising for the existing, higher-ARPU urban base? Under what conditions is this trade-off justified?

  3. The 95% continued-usage rate among vernacular interface adopters — if accurate — suggests that language-native onboarding creates platform stickiness that English-language onboarding does not. Design a research study that would allow you to test whether this retention differential is driven by language comfort, product familiarity, absence of alternatives, or some combination of factors. How would your findings shape investment decisions in vernacular product capabilities?

  4. Flipkart chose to acquire Liv.ai and build its voice infrastructure entirely in-house rather than licensing global voice APIs from Google or Amazon. Amazon India, by contrast, leveraged Alexa. Analyse the build-vs-buy decision in the context of vernacular AI for India. What criteria should determine whether a company builds proprietary AI capabilities versus licensing third-party solutions, and how do market maturity, data availability, and competitive dynamics affect this decision?

  5. Flipkart's vernacular voice strategy was framed explicitly as a tool to "democratise e-commerce" and bring the "next 200 million consumers online." Assess the strategic tension between this social impact framing and the commercial imperative of profitability. To what extent should a platform's market expansion into lower-income, lower-ARPU demographics be evaluated on commercial returns versus ecosystem growth metrics? How would you structure the business case for such an investment for the Walmart board?




© MarkHub24. Made with ❤ for Marketers
