Scrape Blockchain Intelligence Team Data
Purpose
Collect the complete team/staff roster of the Blockchain Intelligence Professionals Association (BIPA, blockchaintelligence.com) and return one structured record per member: full name, role/job title, profile URL, photo URL, and full biography. This is a read-only extraction. The site is a WordPress install, so the fast path is its REST API for the roster plus a single plain-HTTP GET per member for the bio — no scripted browsing required.
When to Use
- "Scrape / collect the team (or staff, leadership, members) of blockchaintelligence.com."
- Building a contact/people dataset for BIPA (names, titles, bios, headshots).
- Monitoring the team page for added/removed members or role changes.
- Any task needing the canonical list of certified specialists / chairman listed under
/meet-our-team/.
Workflow
The optimal method is HTTP, not browser automation. Every relevant page is static, server-rendered HTML behind Cloudflare that serves 200 to plain GETs (no JS challenge, no proxy needed for these paths). Use browse cloud fetch <url> (or any HTTP client) — do not script a browser unless the HTTP path starts returning challenges.
-
Get the roster in one call (WP REST API). The "Meet our team" page is WordPress page id
9769. Fetch its rendered content:GET https://blockchaintelligence.com/wp-json/wp/v2/pages?slug=meet-our-teamParse the JSON, take
result[0].content.rendered. It contains one<div class="team-block …">per member. From each block extract:- profile_url —
href="https://blockchaintelligence.com/team/{slug}/" - photo_url — the
<img src="…wp-content/uploads/….jpg"> - name — text of
<div class="team-name"><a …>NAME</a></div> - role — text of
<div class="team-job">ROLE</div>
As of the last run this yields 7 members: Bogdan VACUSTA (Chairman), Aurel HUSTEA, Vasile LUPU, Diana PATRUT, Alina Gabriela POPESCU, Catalin VREME, Claudiu ZBIRCEA.
- profile_url —
-
Enrich each member with their bio (per-member GET). Bios are NOT in the REST API (see Gotchas). For each
profile_urlfrom step 1:GET https://blockchaintelligence.com/team/{slug}/From the returned HTML extract:
- name —
<div class="team-name …">NAME</div>(confirms the roster value) - role —
<div class="team-job">ROLE</div> - bio — concatenate the
<p>…</p>paragraphs inside<div class="team-content …">. If that selector misses, fall back to the<meta property="og:description">value (a truncated one-paragraph summary). - photo —
<meta property="og:image">(same image as the roster).
- name —
-
Decode HTML entities in every text field:
&→&,’→',–→–,'→', →space, then strip residual tags and collapse whitespace. -
Emit JSON — one object per member plus a top-level count and the roster endpoint used (see Expected Output).
Browser fallback
Only if the HTTP path is ever Cloudflare-challenged:
browse open https://blockchaintelligence.com/meet-our-team/ --remote(add--proxiesif challenged).browse wait load, then dismiss the "We value your privacy" cookie banner (click Accept All) if it overlaps content.browse get markdown bodyorbrowse get text bodyto pull the roster, then open each/team/{slug}/for bios. Same.team-name/.team-job/.team-contentstructure renders in the DOM.
Site-Specific Gotchas
- WordPress site (LiteSpeed + Cloudflare). Response headers advertise
Link: <…/wp-json/>; rel="https://api.w.org/"andX-Tec-Api-Root— confirming the REST API is live and open (no auth required for reads). - The
teampost type is NOT REST-registered.GET /wp-json/wp/v2/teamreturns404 rest_no_route, andteamdoes not appear in/wp-json/wp/v2/types. Don't waste time hunting for a/wp/v2/teamcollection — it doesn't exist. The roster only lives inside the Elementor-builtmeet-our-teampage content, and bios only on the/team/{slug}/HTML pages. - Roster is rendered by the GavickPro "gva-teams" Elementor widget. Member blocks use stable classes:
.team-block,.team-image,.team-name,.team-job. Each member link appears twice per block (image link + name link) — de-duplicate by slug. - Proxies / verified NOT required for the data path. The homepage shows a Cloudflare + hCaptcha posture (pre-run probe flagged
likelyNeedsProxies: truefor/), but the WP REST API endpoint and every/team/{slug}/and/meet-our-team/page returned200over a plainbrowse cloud fetchwith no--proxiesand no--verified. Only add proxies if you later get a Cloudflare interstitial. - Sitemaps don't help enumerate the team.
/sitemap.xml(Google Sitemap Generator) splits intopost-sitemap.xml/page-sitemap.xml/sitemap-misc.xml— there is no dedicated team sitemap, and/wp-sitemap.xmlreturns an empty urlset./sitemap_index.xml404s. Use the roster page, not the sitemap, as the source of truth for the member list. - Cookie consent banner ("We value your privacy") overlays the lower-right of pages in a real browser. It does not affect HTTP fetches; only matters for the browser fallback / screenshots.
- Detail pages are heavy (~130 KB HTML) because of the full Elementor theme; the
.team-contentbio block is a small slice. Parse, don't render. - Member count is small and can change (7 at last capture). Always re-derive the count from the roster rather than hardcoding.
Expected Output
{
"source": "blockchaintelligence.com",
"roster_endpoint": "https://blockchaintelligence.com/wp-json/wp/v2/pages?slug=meet-our-team",
"team_count": 7,
"team": [
{
"name": "Bogdan VACUSTA",
"role": "Chairman",
"slug": "bogdan-vacusta",
"profile_url": "https://blockchaintelligence.com/team/bogdan-vacusta/",
"photo_url": "https://blockchaintelligence.com/wp-content/uploads/bogdan2.jpg",
"bio": "Bogdan has over 20 years' experience in intelligence, compliance, audit & investigations and now serves as a strategy consultant on crypto-assets for the National Bank of Romania.\n\nAn Accredited Counter Fraud Specialist, Accredited Counter Fraud Manager, Certified DLT & Blockchain Manager, Certified Crypto-Assets Investigator & Compliance Specialist, Certified OSINT Practitioner …"
},
{
"name": "Aurel HUSTEA",
"role": "Crypto-Assets Investigator & Compliance Specialist",
"slug": "aurel-hustea",
"profile_url": "https://blockchaintelligence.com/team/aurel-hustea/",
"photo_url": "https://blockchaintelligence.com/wp-content/uploads/aurel2.jpg",
"bio": "…"
}
]
}
Per-member schema:
{
"name": "string — full name as displayed (surname is usually UPPERCASE)",
"role": "string — job title from .team-job",
"slug": "string — URL slug under /team/",
"profile_url": "string — absolute /team/{slug}/ URL",
"photo_url": "string — absolute wp-content/uploads image URL",
"bio": "string — paragraphs from .team-content joined by \\n\\n; falls back to og:description (truncated) if the block is absent"
}
Empty / failure shape (roster endpoint unreachable or returns no blocks):
{ "source": "blockchaintelligence.com", "team_count": 0, "team": [], "error_reasoning": "roster page returned <status> or no .team-block elements found" }