Taobao Product Search
Purpose
Search Taobao (China consumer marketplace) for products by keyword in Chinese or English and return structured product results — title, price in CNY/RMB, sold count, shop name, rating, Tmall flag, and canonical item URL — with optional filters for price range, ship-from location, shipping options, and sort by sales / price / rating. Read-only — never adds to cart, never posts a comment, never logs in.
When to Use
- Cross-border price comparison: pulling CNY price points for a product before bargaining with a 1688 / Yiwu sourcing agent.
- Aggregating sold counts and Tmall vs. Taobao shop split for a SKU across multiple keywords (market sizing).
- Surfacing the cheapest mainland-China-shipped variant of a Chinese-language product name for an end user.
- Any read-only crawl that needs {title, price, sold, shopName, rating, isTmall, url} per result row.
Workflow
Taobao's PC search page (s.taobao.com/search) is a thin React shell that fetches results from the internal mtop API at h5api.m.taobao.com/h5/mtop.taobao.wsearch.h5search/. The API requires (a) an appKey, (b) a per-session _m_h5_tk token freshly minted by visiting www.taobao.com, (c) a request sign derived from the body + token + timestamp, and (d) the cookie2 + t cookies set on the first page hit. There is no public unsigned API, so the browser path is the only honest option. Be aware, however, that on Browserbase IP ranges Alibaba's Baxia anti-bot system serves a flat cloud_ip_bl deny (no captcha to solve) and the API XHR never returns. Verified across 4 sessions on 2026-05-20, up to and including --verified --proxies --solve-captchas, in both us-east-1 and ap-southeast-1: all hit the same wall. Treat this as a candidate skill until Browserbase IP reputation changes or a logged-in context-id is supplied.
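To make requirement (c) concrete, here is a sketch of how mtop H5 clients are commonly reported (in public reverse-engineering write-ups, not in this skill's own runs) to compute sign: an MD5 hex digest of token&t&appKey&data, where token is the part of the _m_h5_tk cookie before the first underscore. The formula, the helper names tokenFromMh5tk / mtopSign, and the sample values are all assumptions. It is shown only to explain why a bare request cannot be forged without a valid token; even a well-formed request is still served Baxia HTML from Browserbase IPs.

```javascript
const crypto = require("crypto");

// Hypothetical helper: the token half of an _m_h5_tk cookie value.
// "<token>_<expiryMillis>" is the commonly reported layout (an assumption).
function tokenFromMh5tk(cookieValue) {
  return String(cookieValue).split("_")[0];
}

// Hypothetical helper: sign = md5(token & t & appKey & data), hex-encoded.
// The formula is taken from public write-ups and is NOT verified here.
function mtopSign({ mh5tk, t, appKey, data }) {
  const token = tokenFromMh5tk(mh5tk);
  const payload = typeof data === "string" ? data : JSON.stringify(data);
  return crypto
    .createHash("md5")
    .update(`${token}&${t}&${appKey}&${payload}`, "utf8")
    .digest("hex");
}

// Placeholder inputs only: not a real token or appKey.
const t = String(Date.now());
const sign = mtopSign({
  mh5tk: "0123456789abcdef_1700000000000",
  t,
  appKey: "APPKEY_PLACEHOLDER",
  data: { q: "手机", page: 1 },
});
console.log(sign); // a 32-character hex string
```

Because the token rotates server-side and is only issued after the guardEntry round-trip passes Baxia, computing the hash correctly does not get around the IP-level deny.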
1. Create the stealthiest session available:

   ```bash
   SID=$(browse cloud sessions create --keep-alive --verified --proxies --solve-captchas \
     --region ap-southeast-1 \
     | node -e "let s='';process.stdin.on('data',c=>s+=c).on('end',()=>process.stdout.write(JSON.parse(s).id))")
   export BROWSE_SESSION="$SID"
   ```

   ap-southeast-1 is the closest Browserbase region to mainland China and gives the residential proxy a marginally better reputation profile than US regions, but in our 2026-05-20 run it still got cloud_ip_bl. If you have a Browserbase context with logged-in Taobao cookies (cookie2, _tb_token_, t, _m_h5_tk) baked in, pass --context-id <ID>: that bypasses the Baxia IP gate because the API trusts a warm session.

2. Warm up cookies on the homepage first (don't go straight to /search?q=):

   ```bash
   browse open "https://www.taobao.com/" --remote
   browse wait timeout 3000 --remote
   ```

   This sets the thw=cn, cna, and tfstk cookies that the wsearch endpoint validates. Skipping this step doubles the chance of an immediate Baxia 405-then-deny redirect.

3. Encode the query. Chinese keywords MUST be percent-encoded (UTF-8); English passes through. Construct:

   ```text
   https://s.taobao.com/search?q={urlenc-query}
     &sort={_coefp|_sale-desc|price-asc|price-desc|_ratesum-desc}
     &loc={location-cn-name}
     &start_price={n}&end_price={n}
     &filter={tag1};{tag2}
     &tab={all|mall|taobao|shop|bizz}
     &style=list
     &page={1..100}
   ```

   URL parameters verified from the page's window.__last_search_params (iter-2):
   - q: the query (URL-encoded UTF-8 for Chinese; e.g. %E6%89%8B%E6%9C%BA for "手机", "mobile phone").
   - sort: _coefp (default, "综合" = overall relevance), _sale-desc (sales descending; a sort=sale-desc URL param is rewritten to this), price-asc / price-desc, _ratesum-desc (rating).
   - tab: all (default, "所有宝贝" = all items), mall (Tmall only; the same tab a list.tmall.com redirect lands on), taobao (Taobao only, excludes Tmall), shop (店铺, shop results), bizz (企业购, B2B).
   - loc: ship-from location; accepts Chinese names such as 广东 (Guangdong), 上海 (Shanghai), 北京 (Beijing), 浙江 (Zhejiang), 江苏 (Jiangsu). Setting loc to a Chinese province implicitly restricts results to mainland China. To search anywhere in mainland China, omit the param; note that 国际 (international) cross-border listings then also appear unless filter=service:tmall is set.
   - start_price / end_price: integer CNY. A maximum (~999999) is enforced server-side.
   - filter: semicolon-joined service flags. Known: service:tmall (Tmall only), service:cod (cash on delivery), service:freeshipping, service:postFee=0, service:24h (ships within 24 h), service:7day (7-day no-reason returns).
   - style=list returns the dense list view (easier to parse than style=card).
   - page: 1-indexed, hard cap 100. pageSize is fixed at 48 and cannot be raised.

   totalResults and totalPage in __last_search_params are placeholders that read 4800/100 BEFORE the API call returns; treat them as ground truth only after the product cards have rendered (see step 6).

4. Open the constructed URL:

   ```bash
   browse open "$URL" --remote
   browse wait load --remote
   browse wait timeout 8000 --remote   # the wsearch XHR completes 2-6 s after `load` fires
   ```

5. Branch on Baxia state BEFORE trying to parse cards. The page renders the skeleton, nav, "1/100" pagination, and sort tabs (综合/销量/价格, i.e. relevance/sales/price) regardless of whether the API succeeded, so DOM presence is not proof of data:

   ```bash
   browse eval "(() => ({
     dialog: !!document.querySelector('.baxia-dialog'),
     items: document.querySelectorAll('a[href*=\"item.taobao.com\"], a[href*=\"detail.tmall.com\"]').length,
     bodyTextEnd: document.body.innerText.slice(-50)
   }))()" --remote
   ```

   - dialog: true AND items: 0 AND bodyTextEnd ending in 加载中... ("loading..."): Baxia denied. Stop and emit {success: false, reason: "anti_bot_blocked"}.
   - dialog: false AND items >= 30: the API returned. Proceed to step 6.
   - dialog: false AND items: 0: wait another 4 s, then re-check. If still 0, treat as no_results only after also confirming __last_search_params.totalResults === 0.

6. Extract one record per item card. Each card is a div[data-name="item"] (or div[data-name="itemNT"] for "new" Tmall cards). Per card:
   - Title: [class*="title"] span[title]; getAttribute('title') gives the full text (the visible text is truncated with …). Tmall cards prefix the title with a red "Tmall" badge; exclude that badge from the title string.
   - Price (CNY): concatenate the dominant [class*="priceInt"] + [class*="priceFloat"] pair, e.g. "1899" + ".00" → 1899.00. Some Tmall cards use a single [class*="priceWrapper"] span with an embedded ¥ glyph; strip ¥ and parse.
   - Sold count: [class*="realSales"] text, e.g. "1万+人付款" ("10,000+ people paid") → 10000. Patterns: \d+人付款 (literal count), \d+万\+人付款 (×10000, lower bound). Some card variants use [class*="salesText"] instead.
   - Shop name: [class*="shopNameText"] (same for Tmall and Taobao).
   - Tmall flag: the card root has data-name="itemNT", OR the detail link starts with https://detail.tmall.com/, OR the card root's class contains tmall / mallStyle.
   - Rating: NOT shown on the search results page in 2024+; Taobao deprecated card-level rating display around 2022. Listed for completeness but always null in output; it is only available on the item detail page.
   - Canonical URL: the card root's a[href]. Tmall items: https://detail.tmall.com/item.htm?id={itemId}; Taobao items: https://item.taobao.com/item.htm?id={itemId}. Strip all query params except id.
   - Ship-from location: [class*="procity"] text, e.g. "广东 深圳" (province then city: Guangdong, Shenzhen).
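The URL construction and per-card field rules above can be sketched as plain functions. This is a minimal sketch assuming Node 18+ (for the global URL class); buildSearchUrl, parsePrice, canonicalItemUrl, and isTmallCard are hypothetical names, and the parsing only restates the text formats described for each selector. Reading the selectors off each card is left to the browse eval call.

```javascript
// Hypothetical helper: build an s.taobao.com search URL from the documented params.
// encodeURIComponent always emits UTF-8 percent-encoding (never GBK).
function buildSearchUrl({ q, sort, loc, startPrice, endPrice, filters = [], tab, style = "list", page } = {}) {
  if (!q) throw new TypeError("q is required");
  const parts = [`q=${encodeURIComponent(q)}`];
  if (sort) parts.push(`sort=${encodeURIComponent(sort)}`);
  if (loc) parts.push(`loc=${encodeURIComponent(loc)}`);
  if (startPrice != null) parts.push(`start_price=${Math.trunc(startPrice)}`);
  if (endPrice != null) parts.push(`end_price=${Math.trunc(endPrice)}`);
  // Each tag is percent-encoded (":" -> %3A, "=" -> %3D), then joined with a literal ";".
  if (filters.length) parts.push(`filter=${filters.map(encodeURIComponent).join(";")}`);
  if (tab) parts.push(`tab=${encodeURIComponent(tab)}`);
  if (style) parts.push(`style=${encodeURIComponent(style)}`);
  if (page != null) {
    // page=101 silently returns page 1, so reject out-of-range pages up front.
    if (!Number.isInteger(page) || page < 1 || page > 100) {
      throw new RangeError("page must be an integer in 1..100");
    }
    parts.push(`page=${page}`);
  }
  return `https://s.taobao.com/search?${parts.join("&")}`;
}

// Price: "1899" + ".00" -> 1899, or a single "¥7999.00" string from a priceWrapper span.
function parsePrice(intPart, floatPart = "") {
  const text = `${intPart ?? ""}${floatPart ?? ""}`.replace(/[¥￥,\s]/g, "");
  const value = Number.parseFloat(text);
  return Number.isFinite(value) ? value : null;
}

// Canonical URL: keep only ?id=, and only for the two documented item hosts.
const ITEM_HOSTS = new Set(["detail.tmall.com", "item.taobao.com"]);
function canonicalItemUrl(href) {
  let url;
  try {
    url = new URL(href, "https://s.taobao.com/"); // also resolves protocol-relative "//..." hrefs
  } catch {
    return null;
  }
  const id = url.searchParams.get("id");
  if (!ITEM_HOSTS.has(url.hostname) || !id) return null;
  return `https://${url.hostname}/item.htm?id=${encodeURIComponent(id)}`;
}

// Tmall flag: any one of the three signals listed above is enough.
function isTmallCard({ dataName, canonicalUrl, className = "" }) {
  return (
    dataName === "itemNT" ||
    String(canonicalUrl ?? "").startsWith("https://detail.tmall.com/") ||
    /tmall|mallStyle/.test(className)
  );
}

console.log(buildSearchUrl({ q: "手机", sort: "_sale-desc", loc: "上海", filters: ["service:tmall"], page: 2 }));
```

Comparing the hostname against a fixed set, rather than substring-matching the href, keeps ad or redirect links from being counted as items. Whether Taobao also accepts the filter tags percent-encoded rather than literal has not been tested.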
Browser fallback (degenerate / no-data case)
If the Baxia wall is up (the dominant case for Browserbase IPs as of 2026-05-20), there is no page-context fallback that returns real product data. Honest options:
- Return {success: false, reason: "anti_bot_blocked"} with the URL we attempted.
- Surface __last_search_params (the search query echo) and the totalResults / totalPage placeholder values so the caller can confirm the URL parameters were accepted, even though data was withheld.
- Capture a screenshot of the Baxia "请输入验证码" ("please enter the verification code") dialog as evidence; it is useful when escalating to a human reviewer.
Do not try to scrape s.m.taobao.com/h5?q=... as a fallback — it is also fully JS-rendered and depends on the same blocked mtop API. Do not try list.tmall.com/search_product.htm?q=... — it 302s to s.taobao.com/search?fromTmallRedirect=true&tab=mall (same blocked page). Do not try world.taobao.com/search/search.htm?q=... — it also 302s to s.taobao.com/search. All three were verified dead ends in iter-3.
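The Baxia branch logic (dialog / items / bodyTextEnd) and the blocked-result fallback can be sketched as pure functions over the object the browse eval probe returns. A minimal sketch: the 30-item threshold, the 加载中... suffix, and the outcome names come from the workflow; classifyProbe, buildBlockedResult, the "retry" and "inconclusive" outcomes (the latter for states the documented branches do not cover, such as 1 to 29 item links), and the denyCode parameter are assumptions.

```javascript
// Hypothetical classifier over the probe object { dialog, items, bodyTextEnd }.
// `retried` marks the second probe taken after the extra 4 s wait.
function classifyProbe(probe, { retried = false, lastSearchParams = null } = {}) {
  const { dialog = false, items = 0, bodyTextEnd = "" } = probe;
  const stillLoading = /加载中(\.{3}|…)\s*$/.test(bodyTextEnd);
  if (dialog && items === 0 && stillLoading) return "anti_bot_blocked";
  if (!dialog && items >= 30) return "extract";
  if (!dialog && items === 0) {
    if (!retried) return "retry";
    // Zero item links alone is not enough: the echo must also report zero results.
    if (lastSearchParams && lastSearchParams.totalResults === 0) return "no_results";
  }
  // Any combination the documented branches do not cover; escalate instead of guessing.
  return "inconclusive";
}

// Blocked-case result mirroring the "Anti-bot wall" output shape.
// denyCode defaults to the only code observed so far; pass another if the punish URL differs.
function buildBlockedResult(url, lastSearchParams = {}, denyCode = "cloud_ip_bl") {
  const { q = null, sort = null, totalResults = null, totalPage = null } = lastSearchParams ?? {};
  return {
    success: false,
    reason: "anti_bot_blocked",
    antiBotSystem: "baxia",
    denyCode,
    url,
    lastSearchParamsEcho: { q, sort, totalResults, totalPage },
  };
}

const probe = { dialog: true, items: 0, bodyTextEnd: "…加载中..." };
if (classifyProbe(probe) === "anti_bot_blocked") {
  const echo = { q: "phone", totalResults: 4800, totalPage: 100 };
  console.log(JSON.stringify(buildBlockedResult("https://s.taobao.com/search?q=phone", echo)));
}
```

Returning "inconclusive" rather than folding partial states into success or failure keeps the 4800/100 placeholders from ever being reported as real counts.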
Site-Specific Gotchas
- READ-ONLY. Never click a "立即购买" (buy now) / "加入购物车" (add to cart) button; that starts a checkout flow.
- Baxia (Alibaba's anti-bot system) flat-denies all Browserbase IP ranges with cloud_ip_bl. Verified across 4 sessions on 2026-05-20 spanning bare, --verified, --verified --proxies, and --verified --proxies --solve-captchas in two regions (us-east-1, ap-southeast-1). The deny is served as an iframe https://bixi.alicdn.com/punish/punish:resource:template:baba:default_*.html?qrcode=...|cloud_ip_bl|0&action=deny. Note action=deny (not action=challenge): there is no captcha for --solve-captchas to solve. The only known workarounds are supplying a Browserbase context-id with a logged-in Taobao session, OR running from a non-Browserbase residential IP. Do not waste wall time iterating without one of those.
- The login link is rendered on every page: 亲,请登录 ("Hi, please log in") redirects to login.taobao.com/member/login.jhtml?redirectURL=.... The presence of this link does NOT mean login is required for the search itself; it is rendered for all anonymous sessions. Login is only required if you want to (a) save a search, (b) see "personalized" rankings, or (c) view shop-only sections.
- __last_search_params.totalResults = 4800 and totalPage = 100 are placeholders, not API responses. They are populated from URL params + defaults before the wsearch XHR fires. Verify they have changed (or that items.length > 0 in the DOM) before trusting them.
- Page cap is a hard 100. page=101 silently returns page-1 data. To enumerate beyond 4,800 items (100 pages × 48 per page), you must narrow with category, price, or shop filters and re-search.
- Sort-param translation: URL sort=sale-desc is rewritten to internal _sale-desc; sort=price-asc → price-asc (no underscore prefix); the underscore prefix appears to be vestigial. Default (no sort) is _coefp ("综合排序" = relevance).
- Tmall is part of Taobao. Tmall items appear in tab=all results with a red badge. To restrict to Tmall, use tab=mall OR filter=service:tmall. To exclude Tmall, use tab=taobao. There is no native "exclude-Tmall" param if you also want shop results; you must filter client-side on data-name="item" (Taobao) vs. data-name="itemNT" (Tmall).
- Rating is NOT exposed on the search page anymore (deprecated post-2022). The skill's rating field is always null. To get a rating, follow the canonical URL to item.htm?id=... and read DSR (Detail Seller Rating) from the shop info widget; that is a separate skill.
- Sold count format is non-numeric Chinese text: 1万+人付款 = 10,000+, 5000+人付款 = 5,000+, 已售10万+件 = 100,000+. Parse 万 as ×10000 and + as a lower bound. Some new cards show monthSales ("月销量 1.2万+", monthly sales) instead of cumulative realSales; they are different metrics, so document which you extracted.
- Mainland-China-shipped enforcement: setting loc= to a Chinese province name silently excludes Hong Kong, Macau, Taiwan, and overseas listings. To include those, omit loc AND avoid filter=service:tmall (which is mainland-only). The marketplace flag for cross-border listings is 属性->发货地 (attributes → ship-from) containing 海外 (overseas), visible only if not filtered out.
- q must be UTF-8 percent-encoded for Chinese. GBK-encoded queries silently return zero results (Taobao's URL rewriter assumes UTF-8 since 2018).
- --solve-captchas is a no-op here. Baxia's cloud_ip_bl deny path does not present a captcha; it serves a "请稍后再试" ("please try again later") / "请输入验证码" QR-code-only dialog with no human-solvable challenge. Browserbase's solver looks for a reCAPTCHA / hCaptcha widget and finds nothing.
- Don't waste wall time on h5api.m.taobao.com/h5/mtop.taobao.wsearch.h5search/. Direct curl returns Baxia HTML even with proxies, spoofed cookies, and Referer: https://s.taobao.com/. The endpoint validates a sign hash over (appKey + t + body + _m_h5_tk prefix), and the _m_h5_tk token is rotated server-side every ~30 minutes, gated behind a successful mtop.alibaba.acl.guard.guardEntry round-trip that itself requires the same anti-bot pass. Verified blocked in iter-2.
- Recommendation-feed iframes are blocked even when search itself works. The "猜你喜欢" ("you may also like") sidebar fetches mtop.relationrecommend.wirelessrecommend.recommend, the endpoint consistently observed serving Baxia denies even when wsearch succeeds. Iframes showing "Access denied — We have detected unusual traffic" are NOT proof that the search itself failed; check items.length on the main grid.
- browse snapshot works but is heavy: the PC search page DOM is 5,000+ refs when fully populated. Prefer browse eval to extract just the fields you need rather than parsing the full a11y tree.
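The sold-count formats listed above can be normalized with a small parser. A minimal sketch: parseSoldCount is a hypothetical name; it multiplies by 10,000 when 万 is present, treats a trailing + as a lower bound, flags 月销 (monthly) text, and keeps the raw string so the caller can record which metric was extracted.

```javascript
// Hypothetical parser for sold-count text such as
// "1万+人付款", "5000+人付款", "已售10万+件", "月销量 1.2万+".
function parseSoldCount(raw) {
  const text = String(raw ?? "").trim();
  const match = text.match(/(\d+(?:\.\d+)?)\s*(万)?\s*(\+)?/);
  if (!match) return { sold: null, lowerBound: false, monthly: false, soldRaw: text };
  const [, digits, wan, plus] = match;
  // 万 = 10,000; round to drop floating-point residue from values like 1.2万.
  const sold = Math.round(Number(digits) * (wan ? 10000 : 1));
  return {
    sold,
    lowerBound: Boolean(plus), // "+" means "at least this many"
    monthly: text.includes("月销"), // monthSales text, not cumulative realSales
    soldRaw: text,
  };
}

for (const raw of ["1万+人付款", "5000+人付款", "已售10万+件", "月销量 1.2万+"]) {
  console.log(raw, parseSoldCount(raw));
}
```

Keeping soldRaw alongside the number matches the soldRaw field in the expected output and lets a downstream consumer tell a lower bound from an exact count.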
Expected Output
Three distinct outcome shapes:
```jsonc
// Search succeeded — products extracted
{
  "success": true,
  "query": "phone",
  "sort": "_coefp",
  "filters": { "loc": null, "start_price": null, "end_price": null, "filter": null, "tab": "all" },
  "totalResults": 4800,
  "page": 1,
  "results": [
    {
      "title": "Apple/苹果 iPhone 15 Pro Max 256G 全网通5G手机",
      "price": 7999.00,
      "currency": "CNY",
      "sold": 50000,
      "soldRaw": "已售5万+件",
      "shopName": "Apple Store官方旗舰店",
      "rating": null,
      "isTmall": true,
      "shipFrom": "上海",
      "url": "https://detail.tmall.com/item.htm?id=735810829485"
    }
  ]
}
// Anti-bot wall — Baxia denied the wsearch API
{
  "success": false,
  "reason": "anti_bot_blocked",
  "antiBotSystem": "baxia",
  "denyCode": "cloud_ip_bl",
  "url": "https://s.taobao.com/search?q=phone&sort=sale-desc",
  "lastSearchParamsEcho": { "q": "phone", "sort": "_sale-desc", "totalResults": 4800, "totalPage": 100 }
}
// Zero results (rare — usually a Chinese-encoding bug or over-narrow filter)
{
  "success": true,
  "query": "...",
  "totalResults": 0,
  "results": []
}
```