Bay Club Location Info Scraper
Purpose
Aggregate structured, per-location gym information from bayclubs.com for a wellness/health platform: club name, full street address (city, state, ZIP), phone, email, weekly operating hours (Sun–Sat, including "closed" days), and the club-specific amenity list. Bay Club is a ~33-location chain of athletic/fitness/golf clubs across California, Washington, and Oregon. The site is a statically-rendered Webflow site — all of this data is present in the initial HTML, so a residential-proxy HTTP fetch + parse is the fast, cheap, reliable path. Read-only; never submits the "Visit a club" lead form.
When to Use
- Building or refreshing a directory of Bay Club locations with addresses, hours, and offered amenities.
- Comparing amenities across locations (e.g. "which clubs have a pool / spa / pickleball?").
- Periodic re-sync of hours and contact info into an aggregator.
- Any flow that would otherwise scrape Bay Club club pages — the HTML is static, so skip a full browser unless your HTTP path is blocked.
Workflow
The recommended method is HTTP fetch through a residential proxy + HTML parse — no JavaScript execution is required. Every field lives in the server-rendered HTML. A bare (non-proxy) request intermittently gets a Cloudflare "Just a moment" interstitial (HTTP 200 but challenge body, no real content), so route the fetch through a residential proxy (browse cloud fetch <url> --proxies).
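Because the challenge page also returns HTTP 200, the status code alone cannot tell a real page from an interstitial. A minimal sketch of a body check follows; it assumes the challenge page carries "Just a moment" in its <title>, and the function name looks_like_challenge is hypothetical.

```python
import re

# Assumption: the Cloudflare interstitial puts this phrase in its <title>.
CHALLENGE_MARKER = "just a moment"
TITLE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def looks_like_challenge(body: str) -> bool:
    """Return True when the response body looks like a challenge page."""
    match = TITLE.search(body)
    title_text = match.group(1) if match else ""
    return CHALLENGE_MARKER in title_text.lower()
```

A caller could retry through the proxy, or fall back to the browser path, whenever this returns True.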
1. Enumerate all locations
Fetch the locations index and harvest every club URL:
GET https://bayclubs.com/locations (via residential proxy)
Extract club detail links with the regex href="(/clubs/[^"]+)" and de-dupe. This yields 33 locations. Treat /locations as the authoritative list: it is a superset of /sitemap.xml, which omits pro-club-seattle and griffinclub, so use the sitemap only as a secondary enumeration source.
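The enumeration step can be sketched as below, using the regex quoted above. The function name extract_club_urls and the base parameter are illustrative, and the input is assumed to be the already-fetched /locations HTML.

```python
import re

# Regex taken from the workflow text: club detail links on /locations.
CLUB_LINK = re.compile(r'href="(/clubs/[^"]+)"')


def extract_club_urls(index_html: str, base: str = "https://bayclubs.com") -> list[str]:
    """Return absolute club URLs, de-duplicated, in first-seen order."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for path in CLUB_LINK.findall(index_html):
        seen.setdefault(path.rstrip("/"), None)
    return [base + path for path in seen]
```

Keeping first-seen order makes repeated runs easier to diff against each other.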
2. Fetch each club page
GET https://bayclubs.com/clubs/{slug} (via residential proxy)
3. Parse the "HOURS & LOCATION INFO" block
Locate the literal heading text HOURS & LOCATION INFO. Immediately after it:
- Weekly hours: seven lines, Sun:, Mon:, … Sat:. Each value is either H:MM am - H:MM pm or the literal closed. The colon may be followed or preceded by whitespace, so match Day\s*:\s*(closed|\d{1,2}:\d{2}\s*[ap]m\s*-\s*\d{1,2}:\d{2}\s*[ap]m) (e.g. Griffin Club renders Sun : 6:00 am - 9:00 pm). Invisible zero-width joiner characters separate the lines; strip them.
- Address: the block between the Sat: hours value and the literal Phone:, in the form {Club Name} {street} {City}, {ST} {ZIP}. The club name prefix duplicates the page <title>; strip it to get the clean street+city line. There is no comma between street and city, so don't try to split them; keep the line whole and parse state/zip from the , {ST} {ZIP} tail.
- Phone: match Phone:\s*([0-9().\- ]{7,20}).
- Email: match Email:\s*([\w.%+\-]+@[\w.\-]+\.\w{2,}).
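The hours, phone, and email patterns above can be combined into one pass over the block. This is a sketch under stated assumptions: the input is the page text with tags already removed, the set of zero-width code points to strip is a guess that covers the common ones, and the names normalize and parse_info_block are hypothetical.

```python
import re

# Assumption: these cover the invisible separators seen between hours lines.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
DAYS = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
TIME = r"\d{1,2}:\d{2}\s*[ap]m"
HOURS_VALUE = rf"(closed|{TIME}\s*-\s*{TIME})"
PHONE = re.compile(r"Phone:\s*([0-9().\- ]{7,20})")
EMAIL = re.compile(r"Email:\s*([\w.%+\-]+@[\w.\-]+\.\w{2,})")


def normalize(text: str) -> str:
    """Turn zero-width characters into spaces, then collapse whitespace."""
    return re.sub(r"\s+", " ", ZERO_WIDTH.sub(" ", text)).strip()


def parse_info_block(text: str) -> dict:
    """Extract weekly hours, phone, and email from the info block text."""
    clean = normalize(text)
    hours = {}
    for day in DAYS:
        # \s* on both sides of the colon tolerates "Sun :" as well as "Sun:".
        match = re.search(rf"\b{day}\s*:\s*{HOURS_VALUE}", clean)
        hours[day] = match.group(1) if match else None
    phone = PHONE.search(clean)
    email = EMAIL.search(clean)
    return {
        "hours": hours,
        "phone": phone.group(1).strip() if phone else None,
        "email": email.group(1) if email else None,
    }
```

Zero-width characters are replaced with a space rather than deleted so that adjacent lines do not fuse into one token.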
4. Parse the "CLUB AMENITIES" grid (club-specific)
The amenity list that is specific to that club is a Webflow CMS collection. Each card is an anchor:
<a class="clubamenities_linkblock ..." data-category="Fitness Centers" href="/amenities/fitness-center">…</a>
Collect the data-category attribute (the display name) and the href (/amenities/{slug}) from every clubamenities_linkblock anchor. De-dupe by name. Amenity counts vary by club (observed 4–29), confirming these are club-specific.
Do NOT derive amenities from /amenity/{slug} (singular) hrefs on the page — those are an identical global footer/nav block present on every club page (always the same ~25 links) and are NOT the club's actual offering. The club-specific signal is the clubamenities_linkblock / data-category cards, whose target hrefs use the plural /amenities/{slug}.
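One way to collect only the club-specific cards is to walk the anchors with the standard-library HTML parser and keep those whose class list includes clubamenities_linkblock. This is an illustrative sketch; AmenityParser and parse_amenities are hypothetical names, and the extra check that the href starts with /amenities/ is an added safeguard, not something the site guarantees.

```python
from html.parser import HTMLParser

BASE = "https://bayclubs.com"


class AmenityParser(HTMLParser):
    """Collect club-specific amenity cards, de-duplicated by display name."""

    def __init__(self) -> None:
        super().__init__()
        self.amenities: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        name = (attr.get("data-category") or "").strip()
        href = attr.get("href") or ""
        # Ignore the global nav/footer links, which use the singular /amenity/ path.
        if "clubamenities_linkblock" in classes and name and href.startswith("/amenities/"):
            self.amenities.setdefault(name, BASE + href)


def parse_amenities(html: str) -> list[dict]:
    parser = AmenityParser()
    parser.feed(html)
    return [{"name": name, "url": url} for name, url in parser.amenities.items()]
```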
5. Emit one record per club
See Expected Output. Cross-validated: parsing 33 club pages yielded fully structured records for 31; see Gotchas for the 2 exceptions.
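Record assembly might look like the sketch below. It assumes the hours/contact fields, the address fields, and the amenity list have already been parsed into plain dicts and lists; the function names and parameter shapes are illustrative, while the output keys follow the record shape in Expected Output.

```python
BASE = "https://bayclubs.com"


def club_record(slug: str, name: str, contact: dict, address: dict, amenities: list) -> dict:
    """Build a success record. `address` is assumed to hold the keys
    address, address_raw, state, zip; `contact` holds hours, phone, email."""
    return {
        "success": True,
        "slug": slug,
        "name": name,
        "url": f"{BASE}/clubs/{slug}",
        **address,
        "phone": contact.get("phone"),
        "email": contact.get("email"),
        "hours": contact.get("hours", {}),
        "amenities": amenities,
        "amenities_count": len(amenities),
        "error_reasoning": None,
    }


def offsite_record(slug: str, redirect_to: str) -> dict:
    """Build a failure record for a club page that 301-redirects off-domain."""
    return {
        "success": False,
        "slug": slug,
        "reason": "offsite_redirect",
        "redirect_to": redirect_to,
        "error_reasoning": "Club detail is hosted on proclub.com, not bayclubs.com",
    }


def build_output(records: list) -> dict:
    """Wrap the per-club records in the recommended top-level shape."""
    return {"source": f"{BASE}/locations", "count": len(records), "clubs": records}
```

Failed clubs stay in the clubs array, so count matches the number of locations enumerated.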
Browser fallback
If the proxied HTTP path is unavailable, a headless browser produces the identical data (cross-validated to the exact same 21 amenities + hours + address for sanfrancisco):
- browse open https://bayclubs.com/clubs/{slug} --remote (a default Browserbase remote session passed Cloudflare without --proxies in testing; add --proxies if challenged).
- Dismiss the Usercentrics cookie banner (click the OK button) if it overlays content.
- browse snapshot: the page exposes a full accessibility tree (~458 refs). Read the HOURS & LOCATION INFO text and the CLUB AMENITIES grid labels (ALL-CAPS in the snapshot; title-case them when emitting).
This costs ~100× the HTTP path per page for zero extra data — use only as a fallback.
Site-Specific Gotchas
- Residential proxy for HTTP. A bare HTTP GET returns 200 but sometimes serves a Cloudflare "Just a moment" challenge body (no real content). browse cloud fetch --proxies was reliable across all 33 pages. (A pre-run probe flagged cloudflare + recaptcha; reCAPTCHA was never actually triggered on read-only GETs.)
- Two PRO Club locations live off-domain. https://bayclubs.com/clubs/pro-club-seattle and /clubs/proclub-bellevue return HTTP 301, redirecting to https://www.proclub.com/club/.... They have no usable data on bayclubs.com: either follow the redirect and parse proclub.com separately, or flag them success:false, reason:"offsite_redirect". The other 31 clubs render fully on bayclubs.com.
- Hours colon spacing is inconsistent. Most clubs render Sun: but at least one (Griffin Club) renders Sun : (space before the colon). Use a whitespace-tolerant regex or you'll silently drop all 7 days for that club.
- Zero-width joiner noise. The hours lines are separated by zero-width joiner characters. \s does not match a zero-width joiner, so replace those characters explicitly, then normalize whitespace (\s+ → " ") before regex-matching, or day boundaries get fuzzy.
- ZIP ≠ first 5-digit number. Several addresses begin with a 5-digit street number (e.g. Fremont 46650 Landing Parkway, Portland 18120 SW…). Parse ZIP from the , {ST} {ZIP} tail, not the first \d{5} in the block, or you'll capture the street number.
- One club spells out the state. crowcanyon renders Danville, California (full state name, no ZIP in the tail) instead of , CA 94526, so the 2-letter state/zip extraction returns null there. address_raw is still captured intact; fall back to it. All other CA/WA/OR clubs use the 2-letter form.
- Amenities: plural vs singular path is the whole ballgame. Club-specific amenities = clubamenities_linkblock cards → data-category name + /amenities/{slug} (plural) href. The /amenity/{slug} (singular) and /new-amenities/{slug} links are global nav/footer and are identical on every page; using them gives every club the same bogus 25-item list.
- No JSON-LD / structured-data block. The pages carry no application/ld+json. /page-json and /faq-json are not JSON endpoints; they're ordinary Webflow HTML pages titled "page-json"/"faq-json". Don't waste time trying to hit them as APIs.
- Hosting fingerprint. Webflow behind Cloudflare (X-Wf-Region, Surrogate-Key: pageId:…, cdn.prod.website-files.com). Content is fully pre-rendered; there is no client-side data API to discover.
- Read-only. Each club page embeds a "Visit a club" lead-capture form (First/Last name, email, phone, club picker). Never fill or submit it.
- Embedded Google Map may error. The map iframe sometimes shows "Oops! Something went wrong" — irrelevant to data extraction; the textual address is the source of truth.
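The ZIP and spelled-out-state gotchas can be handled by anchoring the state/ZIP match to the end of the address line. The sketch below assumes the raw address text and the club name (from the page title) are already available; parse_address is a hypothetical name, and accepting an optional ZIP+4 suffix is an added tolerance that the source pages may never need.

```python
import re

# Anchored to the END of the line so a leading 5-digit street number is never
# mistaken for the ZIP. The optional "-1234" suffix is an assumption.
TAIL = re.compile(r",\s*([A-Z]{2})\s+(\d{5})(?:-\d{4})?\s*$")


def parse_address(address_raw: str, club_name: str) -> dict:
    """Strip the club-name prefix and read state/zip from the line's tail."""
    raw = " ".join(address_raw.split())
    address = raw[len(club_name):].strip() if raw.startswith(club_name) else raw
    tail = TAIL.search(address)
    return {
        "address": address,
        "address_raw": raw,
        # Both stay None when the state is spelled out (the crowcanyon case);
        # callers can fall back to address_raw.
        "state": tail.group(1) if tail else None,
        "zip": tail.group(2) if tail else None,
    }
```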
Expected Output
One record per club. Recommended top-level shape is { "source": "...", "count": N, "clubs": [ ... ] }.
{
  "source": "https://bayclubs.com/locations",
  "count": 33,
  "clubs": [
    {
      "success": true,
      "slug": "sanfrancisco",
      "name": "Bay Club San Francisco",
      "url": "https://bayclubs.com/clubs/sanfrancisco",
      "address": "150 Greenwich Street San Francisco, CA 94111",
      "address_raw": "Bay Club San Francisco 150 Greenwich Street San Francisco, CA 94111",
      "state": "CA",
      "zip": "94111",
      "phone": "(415) 433-2200",
      "email": "info.bcsf@bayclubs.com",
      "hours": {
        "Sun": "7:00 am - 7:00 pm",
        "Mon": "5:00 am - 10:00 pm",
        "Tue": "5:00 am - 10:00 pm",
        "Wed": "5:00 am - 10:00 pm",
        "Thu": "5:00 am - 10:00 pm",
        "Fri": "5:00 am - 9:00 pm",
        "Sat": "7:00 am - 7:00 pm"
      },
      "amenities": [
        { "name": "After School Programs", "url": "https://bayclubs.com/amenities/after-school-programs" },
        { "name": "Fitness Centers", "url": "https://bayclubs.com/amenities/fitness-center" },
        { "name": "Pilates", "url": "https://bayclubs.com/amenities/pilates" }
      ],
      "amenities_count": 21,
      "error_reasoning": null
    }
  ]
}
Edge-case record shapes:
// Golf-only club with a "closed" day (StoneTree)
{ "success": true, "slug": "stonetree", "name": "StoneTree Golf Club",
"hours": { "Mon": "closed", "Tue": "7:00 am - 5:00 pm", "...": "..." },
"amenities_count": 9, "error_reasoning": null }
// Off-domain PRO Club location (301 -> proclub.com)
{ "success": false, "slug": "pro-club-seattle",
"reason": "offsite_redirect",
"redirect_to": "https://www.proclub.com/club/locations/seattle",
"error_reasoning": "Club detail is hosted on proclub.com, not bayclubs.com" }
// State spelled out, ZIP not in 2-letter tail (crowcanyon) — address_raw still valid
{ "success": true, "slug": "crowcanyon", "name": "Bay Club Crow Canyon Country Club",
"address": "711 Silver Lake Drive Danville, California",
"state": null, "zip": null,
"address_raw": "Bay Club Crow Canyon Country Club 711 Silver Lake Drive Danville, California",
"amenities_count": 11, "error_reasoning": null }