taobao.com

search-products

Installation

Adds this website's skill for your agents

 

Summary

Search Taobao (China consumer marketplace) by keyword and return structured product results — title, CNY price, sold count, shop name, Tmall flag, and canonical URL — with filters for price range, ship-from location, shipping, and sort by sales / price / rating. Read-only.

FIG. 01
FIG. 02
FIG. 03
FIG. 04
FIG. 05
SKILL.md
200 lines

Taobao Product Search

Purpose

Search Taobao (China consumer marketplace) for products by keyword in Chinese or English and return structured product results — title, price in CNY/RMB, sold count, shop name, rating, Tmall flag, and canonical item URL — with optional filters for price range, ship-from location, shipping options, and sort by sales / price / rating. Read-only — never adds to cart, never posts a comment, never logs in.

When to Use

  • Cross-border price comparison: pulling CNY price points for a product before bargaining with a 1688 / Yiwu sourcing agent.
  • Aggregating sold counts and Tmall vs. Taobao shop split for a SKU across multiple keywords (market sizing).
  • Surfacing the cheapest mainland-China-shipped variant of a Chinese-language product name for an end user.
  • Any read-only crawl that needs {title, price, sold, shopName, rating, isTmall, url} per result row.

Workflow

Taobao's PC search page (s.taobao.com/search) is a thin React shell that fetches results from the internal mtop API at h5api.m.taobao.com/h5/mtop.taobao.wsearch.h5search/. The API requires (a) an appKey, (b) a per-session _m_h5_tk token freshly minted by visiting www.taobao.com, (c) a request sign derived from the body + token + timestamp, and (d) the cookie2 + t cookies set on first page hit. There is no public unsigned API. The browser path is the only honest option — but be aware that on Browserbase IP ranges Alibaba's Baxia anti-bot system serves a flat cloud_ip_bl deny (no captcha to solve) and the API XHR never returns. Verified across 4 iterations 2026-05-20: --verified --proxies --solve-captchas in us-east-1 and ap-southeast-1 all hit the same wall. Document this as a candidate skill until Browserbase IP rep changes or a logged-in context-id is supplied.

  1. Create the most-stealthy session available:

    SID=$(browse cloud sessions create --keep-alive --verified --proxies --solve-captchas \
      --region ap-southeast-1 \
      | node -e "let s='';process.stdin.on('data',c=>s+=c).on('end',()=>process.stdout.write(JSON.parse(s).id))")
    export BROWSE_SESSION="$SID"
    

    ap-southeast-1 is the closest Browserbase region to mainland China and gives the residential proxy a marginally better reputation profile than US regions, but in our 2026-05-20 run it still got cloud_ip_bl. If you have a Browserbase context with logged-in Taobao cookies (cookie2, _tb_token_, t, _m_h5_tk) baked in, pass --context-id <ID> — that bypasses the Baxia IP gate because the API trusts a warm session.

  2. Warm up cookies on the homepage first (don't go straight to /search?q=):

    browse open "https://www.taobao.com/" --remote
    browse wait timeout 3000 --remote
    

    This sets the thw=cn, cna, and tfstk cookies that the wsearch endpoint validates. Skipping this step doubles the chance of an immediate Baxia 405-then-deny redirect.

  3. Encode the query — Chinese keywords MUST be percent-encoded (UTF-8). English passes through. Construct:

    https://s.taobao.com/search?q={urlenc-query}
        &sort={_coefp|_sale-desc|price-asc|price-desc|_ratesum-desc}
        &loc={location-cn-name}
        &start_price={n}&end_price={n}
        &filter={tag1};{tag2}
        &tab={all|mall|taobao|shop|bizz}
        &style=list
        &page={1..100}
    

    Verified URL parameters from the page's window.__last_search_params (iter-2):

    • q — query (URL-encoded UTF-8 for Chinese; e.g. %E6%89%8B%E6%9C%BA for "手机").
    • sort_coefp (default "综合"), _sale-desc (sales desc, the default for sort=sale-desc URL param), price-asc / price-desc, _ratesum-desc (rating).
    • taball (默认 "所有宝贝"), mall (Tmall only — same as tab=mall from list.tmall.com redirect), taobao (淘宝-only excludes Tmall), shop (店铺), bizz (企业购 / B2B).
    • loc — ship-from location, accepts Chinese names: 广东, 上海, 北京, 浙江, 江苏. Mainland-China-only is implicit when loc is set to a Chinese province; for "anywhere in mainland" omit the param (国际 cross-border listings only appear if loc is left blank AND no service=tmall filter is set).
    • start_price / end_price — integer CNY. Max value enforced server-side (~999999).
    • filter — semicolon-joined service flags. Known: service:tmall (Tmall only), service:cod (cash-on-delivery), service:freeshipping, service:postFee=0, service:24h (24h ship), service:7day (7-day-no-reason-return).
    • style=list returns the dense list view (more parseable than style=card).
    • page — 1-indexed, hard cap 100. pageSize is fixed at 48 (cannot be raised).
    • totalResults and totalPage in __last_search_params are placeholders that read 4800/100 BEFORE the API call returns — only treat them as ground truth after the product cards have rendered (see step 6).
  4. Open the constructed URL:

    browse open "$URL" --remote
    browse wait load --remote
    browse wait timeout 8000 --remote   # the wsearch XHR completes 2-6s after `load` fires
    
  5. Branch on Baxia state BEFORE trying to parse cards — the page renders the skeleton + nav + pagination "1/100" + sort tabs (综合/销量/价格) regardless of whether the API succeeded, so DOM presence is not proof of data:

    browse eval "(() => ({
      dialog: !!document.querySelector('.baxia-dialog'),
      items: document.querySelectorAll('a[href*=\"item.taobao.com\"], a[href*=\"detail.tmall.com\"]').length,
      bodyTextEnd: document.body.innerText.slice(-50)
    }))()" --remote
    
    • dialog: true AND items: 0 AND bodyText ends in 加载中... — Baxia denied. Stop. Emit {success: false, reason: "anti_bot_blocked"}.
    • dialog: false AND items >= 30 — API returned. Proceed to step 6.
    • dialog: false AND items: 0 — wait another 4 s, then re-check. If still 0, treat as no_results only after also confirming __last_search_params.totalResults === 0.
  6. Extract one record per [data-name="item"] card. Each card is a div[data-name="item"] (or div[data-name="itemNT"] for "new" Tmall cards). Per card:

    • Title: [class*="title"] span[title]getAttribute('title') gives the full text (otherwise truncated by ). Tmall cards prefix the title with a red "Tmall" badge — exclude that badge from the title string.
    • Price (CNY): the dominant [class*="priceInt"] + [class*="priceFloat"] pair concatenated → e.g. "1899" + ".00"1899.00. Some Tmall cards use a single [class*="priceWrapper"] span with embedded ¥ glyph — strip ¥ and parse.
    • Sold count: [class*="realSales"] text, e.g. "1万+人付款" → 10000. Patterns: \d+人付款 (literal), \d+万\+人付款 (×10000+). For card variants: class*=salesText.
    • Shop name: [class*="shopNameText"] (Tmall+Taobao unified).
    • Tmall flag: card root has data-name="itemNT", OR detail link starts with https://detail.tmall.com/, OR card root has class containing tmall/mallStyle.
    • Rating: NOT shown on the search results page in 2024+. Taobao deprecated card-level rating display ~2022. Listed for completeness but always null in output. Only available on the item detail page.
    • Canonical URL: card root a[href] — Tmall items: https://detail.tmall.com/item.htm?id={itemId}; Taobao items: https://item.taobao.com/item.htm?id={itemId}. Strip all query params except id for canonicalization.
    • Ship-from location: [class*="procity"] text — e.g. "广东 深圳" (province city).

Browser fallback (degenerate / no-data case)

If the Baxia wall is up (the dominant case for Browserbase IPs as of 2026-05-20), there is no page-context fallback that returns real product data. Honest options:

  • Return {success: false, reason: "anti_bot_blocked"} with the URL we attempted.
  • Surface __last_search_params (the search query echo) and totalResults / totalPage placeholder values so the caller can confirm the URL parameters were accepted, even if data was withheld.
  • Capture a screenshot of the Baxia "请输入验证码" dialog as evidence — useful when escalating to a human reviewer.

Do not try to scrape s.m.taobao.com/h5?q=... as a fallback — it is also fully JS-rendered and depends on the same blocked mtop API. Do not try list.tmall.com/search_product.htm?q=... — it 302s to s.taobao.com/search?fromTmallRedirect=true&tab=mall (same blocked page). Do not try world.taobao.com/search/search.htm?q=... — it also 302s to s.taobao.com/search. All three were verified dead ends in iter-3.

Site-Specific Gotchas

  • READ-ONLY. Never click a "立即购买" / "加入购物车" button — that starts a checkout flow.
  • Baxia (Alibaba's anti-bot system) flat-denies all Browserbase IP ranges with cloud_ip_bl — verified across 4 sessions 2026-05-20 spanning bare, --verified, --verified --proxies, and --verified --proxies --solve-captchas in two regions (us-east-1, ap-southeast-1). The deny is served as an iframe https://bixi.alicdn.com/punish/punish:resource:template:baba:default_*.html?qrcode=...|cloud_ip_bl|0&action=deny — note action=deny (not action=challenge); there is no captcha for --solve-captchas to solve. The only known workaround is supplying a Browserbase context-id with a logged-in Taobao session, OR running from a non-Browserbase residential IP. Do not waste wall time iterating without one of those.
  • Login link is rendered on every page亲,请登录 redirects to login.taobao.com/member/login.jhtml?redirectURL=.... The presence of this link does NOT mean login is required for the search itself; it is rendered for all anonymous sessions. Login is only required if you want to (a) save a search, (b) see "personalized" rankings, or (c) view shop-only sections.
  • __last_search_params.totalResults = 4800 and totalPage = 100 are placeholders, not API responses. They are populated from URL params + defaults before the wsearch XHR fires. Verify they have changed (or that items.length > 0 in the DOM) before trusting them.
  • Page cap is hard 100. page=101 returns page=1 data silently. To enumerate beyond 4,800 items you must narrow with category, price, or shop filters and re-search.
  • Sort-param translation: URL sort=sale-desc is rewritten to internal _sale-desc; sort=price-ascprice-asc (no underscore prefix); the underscore prefix appears to be vestigial. Default (no sort) is _coefp ("综合排序" = relevance).
  • Tmall is part of Taobao. Tmall items appear in tab=all results with a red badge. To filter Tmall-only, use tab=mall OR filter=service:tmall. To exclude Tmall, use tab=taobao. There is no native "exclude-Tmall" param if you also want shop-results — you must filter client-side on data-name="item" (Taobao) vs. data-name="itemNT" (Tmall).
  • Rating is NOT exposed on the search page anymore (deprecated post-2022). The skill's rating field is always null. To get rating, follow the canonical URL to item.htm?id=... and read DSR (Detail Seller Rating) from the shop info widget — that is a separate skill.
  • Sold count format is non-numeric Chinese text: 1万+人付款 = 10,000+, 5000+人付款 = 5,000+, 已售10万+件 = 100,000+. Parse as ×10000, + as a lower bound. Some new cards show monthSales ("月销量 1.2万+") instead of cumulative realSales — they are different metrics; document which you extracted.
  • Mainland-China-shipped enforcement: loc= set to a Chinese province name silently excludes Hong Kong, Macau, Taiwan, and overseas listings. To include those, omit loc AND avoid filter=service:tmall (which is mainland-only). The marketplace flag for cross-border listings is 属性->发货地 containing 海外 (overseas) — visible only if not filtered out.
  • q must be UTF-8 percent-encoded for Chinese. GBK-encoded queries silently return zero results (Taobao's URL rewriter assumes UTF-8 since 2018).
  • --solve-captchas is a no-op here. Baxia's cloud_ip_bl deny path does not present a captcha — it serves a "请稍后再试" / "请输入验证码" QR-code-only dialog with no human-solvable challenge. Browserbase's solver looks for a Recaptcha / hCaptcha widget and finds nothing.
  • Don't waste wall time on h5api.m.taobao.com/h5/mtop.taobao.wsearch.h5search/. Direct curl returns Baxia HTML even with proxies + spoofed cookies + Referer: https://s.taobao.com/. The endpoint validates a sign HMAC over (appKey + t + body + _m_h5_tk-prefix) and the _m_h5_tk token is rotated server-side every ~30 minutes, gated behind a successful mtop.alibaba.acl.guard.guardEntry round-trip that itself requires the same anti-bot pass. Verified blocked in iter-2.
  • Recommendation feed iframes are always blocked even when search itself works. The "猜你喜欢" sidebar fetches mtop.relationrecommend.wirelessrecommend.recommend — that endpoint is the one consistently observed serving Baxia denies even when wsearch succeeds. Iframes showing Access denied — We have detected unusual traffic are NOT proof that the search itself failed; check items.length on the main grid.
  • browse snapshot is fine but heavy. The PC search page DOM is 5,000+ refs when fully populated. Prefer browse eval to extract just the fields you need rather than parsing the full a11y tree.

Expected Output

Three distinct outcome shapes:

// Search succeeded — products extracted
{
  "success": true,
  "query": "phone",
  "sort": "_coefp",
  "filters": { "loc": null, "start_price": null, "end_price": null, "filter": null, "tab": "all" },
  "totalResults": 4800,
  "page": 1,
  "results": [
    {
      "title": "Apple/苹果 iPhone 15 Pro Max 256G 全网通5G手机",
      "price": 7999.00,
      "currency": "CNY",
      "sold": 50000,
      "soldRaw": "已售5万+件",
      "shopName": "Apple Store官方旗舰店",
      "rating": null,
      "isTmall": true,
      "shipFrom": "上海",
      "url": "https://detail.tmall.com/item.htm?id=735810829485"
    }
  ]
}

// Anti-bot wall — Baxia denied the wsearch API
{
  "success": false,
  "reason": "anti_bot_blocked",
  "antiBotSystem": "baxia",
  "denyCode": "cloud_ip_bl",
  "url": "https://s.taobao.com/search?q=phone&sort=sale-desc",
  "lastSearchParamsEcho": { "q": "phone", "sort": "_coefp", "totalResults": 4800, "totalPage": 100 }
}

// Zero results (rare — usually a Chinese-encoding bug or over-narrow filter)
{
  "success": true,
  "query": "...",
  "totalResults": 0,
  "results": []
}