Flipkart Product Search & Browse
Purpose
Search Flipkart (flipkart.com — India's largest e-commerce marketplace) for products by free-text query and return the ranked result set: title, brand, current and struck-off price, discount, star rating, rating/review counts, stock state, product id (pid), and canonical product URL. Optionally navigate to any result's detail page. Read-only — never adds to cart, logs in, or checks out.
When to Use
- Price / availability monitoring for a product or category on Flipkart.
- Bulk extraction of search results (title, price, rating, URL) across queries or pages.
- Resolving a search query to a canonical product pid + detail URL for downstream navigation.
- Anywhere you'd otherwise scrape rendered Flipkart HTML: the data is already in the server-rendered page as JSON, so a single HTTP fetch beats driving a browser.
Workflow
Flipkart's search page is server-rendered: the complete result set (all ~24–40 product cards, pagination, sort options, breadcrumbs) is embedded in the HTML as a window.__INITIAL_STATE__ = {...} JSON blob. There is no separate public search API — the page makes zero XHR/fetch calls to retrieve results (confirmed via CDP network trace: 0 Fetch/XHR requests to any flipkart host on the search page). So the optimal path is one HTTP GET + JSON parse — no JS execution, no browser, ~100× cheaper than scripted browsing. Use a residential proxy for reliability (datacenter IPs occasionally return a transient 500; see gotchas).
Recommended: fetch + parse __INITIAL_STATE__
1. Build the search URL: https://www.flipkart.com/search?q={url-encoded query}&page={N}&sort={sort}
   - q: the search query (wireless+earbuds, running+shoes, laptop, …).
   - page: 1-indexed; optional, defaults to 1. The total page count is in the response (often hundreds).
   - sort: optional, one of relevance (default), popularity, price_asc, price_desc, recency_desc, discount.
2. Fetch the HTML through a residential proxy: browse cloud fetch "https://www.flipkart.com/search?q=wireless+earbuds&page=1" --proxies
   This returns a JSON envelope { statusCode, headers, content, ... }; the HTML is in content. Expect statusCode: 200.
3. Extract the embedded state. Slice the substring after window.__INITIAL_STATE__ = up to the closing </script, strip a trailing ;, and JSON.parse it.
4. Walk to the product widgets. Products live under state.pageDataV4.page.data. Iterate every slot key; the search results are in the (repeated) widgets whose widget.type === "PRODUCT_SUMMARY", each holding a widget.data.products[] array. Concatenate across all PRODUCT_SUMMARY widgets and dedupe by productInfo.value.id (cards repeat across slots).
5. Decode each product from product.productInfo.value (v):
   - v.id: the pid (e.g. ACCH7KPDFXMWQ6XN).
   - v.titles.title / v.titles.newTitle: product name; v.titles.superTitle: brand; v.titles.subtitle: variant (e.g. "Black, True Wireless").
   - v.pricing.prices[]: array of {value, strikeOff}. Current price = the entry with strikeOff: false; original (MRP) = the entry with strikeOff: true. v.pricing.totalDiscount is the discount percentage (integer); v.pricing.discountAmount is the absolute rupee discount. Prices are integer INR (₹), no decimals.
   - v.rating.average (out of v.rating.base, = 5), v.rating.count (total ratings), v.rating.reviewCount (text reviews), v.rating.roundOffCount (display string like "64.6K+"). v.rating may be absent for unrated products.
   - v.availability.displayState: IN_STOCK, etc.
   - Canonical URL = https://www.flipkart.com + v.baseUrl (baseUrl already includes the ?pid= query and the /p/itm… item id).
6. Metadata (sibling widgets under the same page.data): PAGINATION_BAR widget → data.totalPages, data.currentPage; FILTER_SORT_OPTIONS widget → data.query, data.productStartIndex / productEndIndex, data.breadCrumbs[].title (the category path Flipkart auto-mapped the query into), and data.sortOptions[] (each .action.params.value is the sort URL value).
7. Navigate to a product (optional): GET the canonical URL the same way; the detail page also embeds a window.__INITIAL_STATE__ blob (product-detail-shaped, with productPage / productInfo top-level keys) for richer specs.
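
The URL-building, envelope-unwrapping, and state-extraction steps above can be sketched roughly as follows. This is a minimal sketch, not the bundled implementation: the function names (buildSearchUrl, htmlFromEnvelope, extractInitialState) are hypothetical, it assumes the fetch envelope is available to your code as a JSON string, and it assumes the blob never contains a literal </script inside a string value.

```typescript
// Hypothetical helpers for building the URL, unwrapping the envelope, and slicing out the state.
type SortKey = "relevance" | "popularity" | "price_asc" | "price_desc" | "recency_desc" | "discount";

function buildSearchUrl(query: string, page = 1, sort?: SortKey): string {
  // encodeURIComponent emits %20 for spaces; swap to "+" to match the URLs shown above.
  const q = encodeURIComponent(query.trim()).replace(/%20/g, "+");
  let url = `https://www.flipkart.com/search?q=${q}&page=${page}`;
  if (sort) url += `&sort=${sort}`;
  return url;
}

// Assumption: the fetch result is a JSON string shaped { statusCode, headers, content, ... }.
function htmlFromEnvelope(envelopeJson: string): string {
  const env = JSON.parse(envelopeJson) as { statusCode?: number; content?: string };
  if (env.statusCode !== 200 || typeof env.content !== "string") {
    throw new Error(`unexpected fetch result: statusCode=${env.statusCode}`);
  }
  return env.content;
}

// Slice from the marker to the next "</script", drop a trailing ";", then parse.
function extractInitialState(html: string): unknown {
  const marker = "window.__INITIAL_STATE__ =";
  const start = html.indexOf(marker);
  if (start === -1) throw new Error("__INITIAL_STATE__ not found in HTML");
  const from = start + marker.length;
  const end = html.indexOf("</script", from);
  if (end === -1) throw new Error("closing </script not found after __INITIAL_STATE__");
  const raw = html.slice(from, end).trim().replace(/;$/, "");
  return JSON.parse(raw);
}
```

Throwing on a missing marker, rather than returning an empty object, makes a blocked or error page fail loudly instead of looking like a search with zero results.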
Browser fallback
Only needed if the fetch path is rate-limited/blocked (not observed in testing). The search page renders cleanly on a Browserbase session with --verified --proxies. Critical: do NOT read window.__INITIAL_STATE__ or page.content() from the live page — the React app deletes the global and removes the inline is_script element after hydration, so the blob is gone from the live DOM (verified: page.content() does not contain __INITIAL_STATE__ after load). Instead, grab the raw navigation-response body, which still contains it:
- Playwright/CDP: const resp = await page.goto(url); const html = await resp.text(); then brace-match window.__INITIAL_STATE__ = {…} out of html and parse exactly as in the fetch path (steps 3–6). This is what the bundled playwright.ts script does and is verified working.
- browse CLI: the rendered product cards are present in the DOM (~125 a[href*="/p/itm"] anchors, and browse get markdown body lists them with prices/ratings), so DOM scraping of visible cards is a last resort. Reading the navigation response body and parsing the JSON is far more complete (all ~38 products, structured) and more reliable.
For read-only extraction you never need to click anything; ignore the login-modal overlay.
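
As an illustration of the brace-matching approach mentioned for the Playwright path, here is a minimal sketch. It is not the bundled playwright.ts script. PageLike and NavResponseLike are hypothetical structural stand-ins for the two Playwright methods used (page.goto and response.text), so the block compiles without importing Playwright, and the function names are illustrative.

```typescript
// Structural stand-ins (assumption): a real Playwright page satisfies this shape,
// since page.goto resolves to a Response or null and Response.text() resolves to a string.
interface NavResponseLike { text(): Promise<string>; }
interface PageLike { goto(url: string): Promise<NavResponseLike | null>; }

// Return the balanced {...} literal that follows the marker, ignoring braces inside JSON strings.
function braceMatchState(html: string): unknown {
  const at = html.indexOf("window.__INITIAL_STATE__");
  if (at === -1) throw new Error("__INITIAL_STATE__ marker not found");
  const open = html.indexOf("{", at);
  if (open === -1) throw new Error("no object literal after marker");
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = open; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      if (escaped) escaped = false;          // character after a backslash is literal
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false; // closing quote
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) {
      return JSON.parse(html.slice(open, i + 1));
    }
  }
  throw new Error("unbalanced braces in __INITIAL_STATE__");
}

// Read the raw navigation response body, not the hydrated DOM.
async function stateFromNavigation(page: PageLike, url: string): Promise<unknown> {
  const resp = await page.goto(url);
  if (!resp) throw new Error("navigation returned no response");
  return braceMatchState(await resp.text());
}
```

Tracking string and escape state matters because product titles and URLs inside the blob can contain brace characters, which would otherwise end the match early.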
Site-Specific Gotchas
- No standalone search JSON API. Flipkart server-renders results into window.__INITIAL_STATE__; the search page issues 0 XHR/fetch calls for result data (verified by CDP network trace). Don't hunt for an /api/ endpoint; parse the embedded state. (This differs from per-product async widgets on detail pages, which do lazy-load.)
- __INITIAL_STATE__ is consumed and deleted on hydration; only the raw HTTP response has it. The React app reads window.__INITIAL_STATE__ and then removes both the global and the inline is_script element. So in a live browser, window.__INITIAL_STATE__, document.querySelector('#is_script'), and page.content() all come up empty (verified). The blob survives only in the raw HTTP response body, i.e. exactly what browse cloud fetch returns, or Playwright's (await page.goto(url)).text(). A browser script that scrapes page.content() for the blob will silently return zero products. This is the reason the fetch path is recommended and why the browser fallback reads the navigation response, not the DOM.
- The results slot under page.data is an ARRAY of widgets, not a single widget. Slot 10003 (the main PRODUCT_SUMMARY carrier) is an array of repeated PRODUCT_SUMMARY widgets, each holding a products[] chunk. When walking Object.values(page.data), flatten one level (spread array-valued slots) before filtering on widget.type; otherwise you skip every product. (Naïve Object.values(...).filter(w => w.widget.type === 'PRODUCT_SUMMARY') finds nothing in these slots: w is an array there, so w.widget is undefined and the property access throws unless guarded with ?.)
- JSON-LD is not a usable fallback here. The page carries exactly one <script type="application/ld+json">, but in testing it parsed to an object with no @type/keys (empty/placeholder). Do not rely on a schema.org ItemList for extraction. The navigation-response __INITIAL_STATE__ is the only complete source.
- Proxies recommended, not strictly required. browse cloud fetch ... --proxies returned 200 consistently; the same fetch without --proxies succeeded on one of two attempts and returned a transient 500 Internal Server Error on the other. Use --proxies for reliable, repeatable extraction. Stealth (--verified) only matters for the browser fallback.
- Prices are an array, not a field. v.pricing.prices holds both the live price (strikeOff: false) and the MRP (strikeOff: true). Never assume prices[0] is the current price (observed order is [MRP, current]); filter by the strikeOff flag.
- totalDiscount is a percentage; discountAmount is rupees. Don't conflate them (e.g. totalDiscount: 75 means 75% off, discountAmount: 4500 means ₹4500 off).
- Cards repeat across slots. The same product appears in multiple PRODUCT_SUMMARY widgets; always dedupe by v.id. A single search page yields ~24–40 distinct products after dedupe.
- rating can be missing for new/unrated products; guard before reading v.rating.average.
- Sponsored/ad products carry a populated product.adInfo object and may rank first; filter on adInfo presence if you need organic-only results.
- Query auto-maps to a category. Flipkart resolves the free-text query into a category tree (breadCrumbs), which scopes results. Broad queries ("laptop") return fewer cards per page (~24) than category-rich ones ("wireless earbuds", ~38) because of card layout, not result scarcity. The overall size of the result set is indicated by the PAGINATION_BAR widget's data.totalPages (e.g. 558 pages for "wireless earbuds").
- baseUrl is already absolute-path + query-complete. It contains /p/itm…?pid=…; just prefix the origin. Don't re-append ?pid=, it's already there.
- Prices/availability are India-region (INR). Flipkart serves only India; there is no locale switch. Values are ₹.
- Currency has no symbol in the JSON. value is a bare integer; render the ₹ yourself.
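
The array-valued slots and repeated cards described in the gotchas above could be handled along these lines. This is a hedged sketch: the interfaces model only the fields this document names, and collectCards is a hypothetical helper name, not something from the bundled script.

```typescript
// Minimal shapes for the fields named above; everything else in the state is ignored.
interface Card {
  productInfo?: { value?: { id?: string } };
  adInfo?: object | null;
}
interface SlotItem {
  widget?: { type?: string; data?: { products?: Card[] } };
}
// A slot under page.data may hold one widget wrapper or an array of them.
type PageData = Record<string, SlotItem | SlotItem[]>;

function collectCards(pageData: PageData): Card[] {
  // Flatten one level so array-valued slots (e.g. the PRODUCT_SUMMARY carrier) are not skipped.
  let items: SlotItem[] = [];
  for (const slot of Object.values(pageData)) {
    items = items.concat(Array.isArray(slot) ? slot : [slot]);
  }

  const seen = new Set<string>();
  const cards: Card[] = [];
  for (const item of items) {
    const w = item ? item.widget : undefined;
    if (!w || w.type !== "PRODUCT_SUMMARY") continue;
    const products = (w.data && w.data.products) || [];
    for (const card of products) {
      const id = card.productInfo?.value?.id;
      if (!id || seen.has(id)) continue; // dedupe repeated cards by pid
      seen.add(id);
      cards.push(card);
    }
  }
  return cards;
}
```

Called as collectCards(state.pageDataV4.page.data), this keeps the first occurrence of each pid, so the original ranking order is preserved after dedupe.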
Expected Output
{
"query": "wireless earbuds",
"page": 1,
"sort": "relevance",
"total_pages": 558,
"category_path": ["Home", "Audio & Video", "Headset", "Earphones"],
"result_count": 38,
"products": [
{
"id": "ACCH7KPDFXMWQ6XN",
"title": "GOBOULT Mustang Torq 60Hrs, App Support, 4Mic ENC, Breathable LED, 5.4v Bluetooth",
"brand": "GOBOULT",
"subtitle": "Yellow, True Wireless",
"current_price": 1499,
"original_price": 5999,
"discount_pct": 75,
"currency": "INR",
"rating": 4.2,
"rating_count": 64627,
"review_count": 4820,
"availability": "IN_STOCK",
"sponsored": false,
"url": "https://www.flipkart.com/goboult-mustang-torq-60hrs-app-support-4mic-enc-breathable-led-5-4v-bluetooth/p/itm74a6b52a73f95?pid=ACCH7KPDFXMWQ6XN"
}
]
}
For an unrated product the rating, rating_count, and review_count fields are null. If the query returns no matches, result_count is 0 and products is [].
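
To make the mapping from a raw card to one products[] entry concrete, here is a minimal sketch. The interface and function names (RawCard, OutputProduct, toOutputProduct) are hypothetical. Preferring titles.title over titles.newTitle, and emitting null when a price entry is missing, are assumptions rather than documented behaviour.

```typescript
interface PriceEntry { value: number; strikeOff: boolean }
interface RawCard {
  productInfo: {
    value: {
      id: string;
      titles: { title?: string; newTitle?: string; superTitle?: string; subtitle?: string };
      pricing: { prices: PriceEntry[]; totalDiscount?: number };
      rating?: { average?: number; count?: number; reviewCount?: number };
      availability?: { displayState?: string };
      baseUrl: string;
    };
  };
  adInfo?: object | null;
}
interface OutputProduct {
  id: string; title: string | null; brand: string | null; subtitle: string | null;
  current_price: number | null; original_price: number | null; discount_pct: number | null;
  currency: "INR";
  rating: number | null; rating_count: number | null; review_count: number | null;
  availability: string | null; sponsored: boolean; url: string;
}

function toOutputProduct(card: RawCard): OutputProduct {
  const v = card.productInfo.value;
  // Select by the strikeOff flag, never by array position.
  const current = v.pricing.prices.find((p) => !p.strikeOff);
  const original = v.pricing.prices.find((p) => p.strikeOff);
  return {
    id: v.id,
    title: v.titles.title ?? v.titles.newTitle ?? null, // assumption: title preferred over newTitle
    brand: v.titles.superTitle ?? null,
    subtitle: v.titles.subtitle ?? null,
    current_price: current ? current.value : null,
    original_price: original ? original.value : null,
    discount_pct: v.pricing.totalDiscount ?? null,
    currency: "INR", // the JSON carries bare integers; the currency is implied
    rating: v.rating?.average ?? null, // rating may be absent for unrated products
    rating_count: v.rating?.count ?? null,
    review_count: v.rating?.reviewCount ?? null,
    availability: v.availability?.displayState ?? null,
    sponsored: card.adInfo != null && Object.keys(card.adInfo).length > 0,
    url: "https://www.flipkart.com" + v.baseUrl, // baseUrl already carries ?pid=
  };
}
```

Mapping every optional field to null rather than leaving it undefined keeps the record shape stable across rated and unrated products, which simplifies downstream CSV or database writes.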