Explore the Thinking Machines Lab Blog (Connectionism)
Purpose
Enumerate the research blog posts published on thinkingmachines.ai — the blog is
branded "Connectionism" and lives at /blog/ — and return a structured list of
every post with its title, canonical URL, publication date, and author/attribution.
Optionally enrich each post with its description, full article text, and word count.
Read-only — this skill only reads public pages; it never submits forms, applies to
jobs, or clicks "Join us".
When to Use
- "What's on the Thinking Machines Lab blog?" / "List their latest research posts."
- Monitoring Connectionism for new posts (poll the index or the RSS feed on a schedule).
- Building a feed/digest of titles + dates + authors + links.
- Pulling the full text of a specific post (e.g. "LoRA Without Regret") for summarization.
- Any flow that would otherwise scrape the blog HTML —
fetchis faster, cheaper, and more reliable than driving a browser here.
Workflow
thinkingmachines.ai is a Hugo static site, fully server-rendered. Every blog page —
the index, each post, the RSS feed, and the sitemap — returns complete HTML/XML on a plain
HTTP GET. There is no client-side rendering and no content JSON API (the only XHR the
page fires is a Cloudflare RUM beacon at /cdn-cgi/rum). Cloudflare fronts the site but
does not challenge simple fetches: a bare browse cloud fetch (no proxy, no stealth)
returns HTTP 200. So the optimal path is a direct fetch + HTML/XML parse — driving a real
browser is unnecessary and ~100× more expensive.
1. Enumerate posts from the blog index (primary)
GET https://thinkingmachines.ai/blog/
Each post is a list item shaped like:
<a class="post-item-link" href="/blog/{slug}/">
<time class="desktop-time">May 11, 2026</time>
<div class="post-info">
<div class="post-title">Interaction Models: A Scalable Approach to Human-AI Collaboration</div>
<div class="author-date"> Thinking Machines </div>
<time class="mobile-time">May 11, 2026</time>
</div>
</a>
Extract per post:
- url — the
href(relative/blog/{slug}/); prependhttps://thinkingmachines.aifor the absolute URL. - title — text of
.post-title. - published — text of the
timeelement (e.g."May 11, 2026"). Note there are two<time>nodes per item (desktop-time+mobile-time) with identical text — dedupe. - author — text of
.author-date(e.g."Thinking Machines","John Schulman in collaboration with others at Thinking Machines"); trim whitespace.
The index lists newest-first and currently shows all 5 posts on one page — there is no pagination.
2. (Optional) Get full content / descriptions via RSS
GET https://thinkingmachines.ai/index.xml
A standard RSS 2.0 feed. Each <item> carries <title>, <link>, <guid>, <pubDate>
(RFC-822), and <description> containing the full HTML body of the post (the feed is
large — ~420 KB — because it inlines complete articles). One request yields every post's
full text. Filter <item>s whose <link> contains /blog/ — /index.xml is the
site-wide "recent content" feed and may include non-blog (/news/) items in the future;
today it happens to contain only the 5 blog posts.
3. (Optional) Enrich a single post
GET https://thinkingmachines.ai/blog/{slug}/
Each post page is server-rendered and exposes clean metadata in <head>:
<title>—"{Post Title} - Thinking Machines Lab"<meta name="description" content="...">— one-line summary<meta itemprop="datePublished" content="2025-09-29T00:00:00+00:00">— ISO-8601 date<meta itemprop="wordCount" content="5784"><article>...</article>— the full rendered body (math is KaTeX, rendered client-side, but the raw$...$LaTeX source is present in the server HTML).
4. (Optional) Discover all URLs via sitemap
GET https://thinkingmachines.ai/sitemap.xml
24 <loc> entries covering the whole site; exactly the 5 /blog/{slug}/ URLs plus the
/blog/ index are the blog surface. Useful as a cross-check that the index didn't miss a post.
Browser fallback
Only needed if the fetch endpoints ever start returning a Cloudflare interstitial (not observed). Drive a Browserbase session:
sid=$(browse cloud sessions create --keep-alive --proxies | node -e "let s='';process.stdin.on('data',c=>s+=c).on('end',()=>{s=s.slice(s.indexOf('{'));process.stdout.write(JSON.parse(s).id)})")
browse open "https://thinkingmachines.ai/blog/" --remote --session "$sid"
browse wait load --remote --session "$sid"
browse snapshot --remote --session "$sid" # ~60 a11y refs; post titles/dates/authors + the href map are all present
The accessibility snapshot exposes every post's title, date, author, and link, so a single
snapshot after wait load is enough — no scrolling or clicking required.
Site-Specific Gotchas
- The blog is named "Connectionism" and lives at
/blog/. The top-nav label is "Connectionism", not "Blog". Don't confuse it with/news/, which is a separate section for company announcements (Tinker GA, NVIDIA partnership, grants, etc.) — not research blog posts. - No content API. The site is static Hugo HTML; the only XHR is the Cloudflare RUM
beacon (
/cdn-cgi/rum). Don't waste time hunting for a JSON endpoint — parse the HTML or RSS. - Proxies / stealth are NOT required. Cloudflare fronts the site (
Server: cloudflare,cf-cache-status: DYNAMIC) but does not challenge GETs — barebrowse cloud fetchreturns HTTP 200 on the homepage,/blog/, every post,/index.xml, and/sitemap.xml. The pre-run probe predictedlikelyNeedsProxies: true, but direct testing showed proxies are unnecessary for the fetch path. (The browser-fallback validation happened to run on a--proxiessession, but that was belt-and-suspenders, not a requirement.) browse cloud fetchprints an "Update available: 0.7.2 -> 0.8.3" banner to stdout before the JSON envelope. Strip it before parsing: slice the output from the first{(e.g.s = s.slice(s.indexOf('{')); JSON.parse(s)).jqis not installed in the sandbox./index.xmlis site-wide "recent content", not blog-only. Today it contains exactly the 5 blog items, but to stay correct over time, filter<item>s by<link>containing/blog/. For a guaranteed blog-only feed,/blog/index.xmlalso exists.- Two
<time>nodes per index item (desktop-timeandmobile-time) carry identical text — dedupe so you don't double-count the date. - Author strings vary in shape — from a bare
"Thinking Machines"to"John Schulman in collaboration with others at Thinking Machines". Treat the whole.author-datestring as the attribution; don't try to split out a single name. - Date formats differ by surface: index =
"May 11, 2026"; RSS<pubDate>= RFC-822 ("Mon, 11 May 2026 00:00:00 +0000"); post-pagedatePublished= ISO-8601. Normalize if you need a canonical date. - Post math is KaTeX. Article bodies contain LaTeX (
$...$,$$...$$) rendered client-side by KaTeX. The raw LaTeX source is in the server HTML, so a fetch captures it; a browser screenshot captures the rendered math. - Browser-fallback quirk: inside a session already attached in CDP mode (e.g. with a
trace client),
browse open <url> --remoteerrors with "already running in cdp mode" — use plainbrowse open <url> --session <sid>instead.
Expected Output
Primary shape — the blog index enumeration:
{
"success": true,
"blog_name": "Connectionism",
"blog_url": "https://thinkingmachines.ai/blog/",
"post_count": 5,
"posts": [
{
"title": "Interaction Models: A Scalable Approach to Human-AI Collaboration",
"url": "https://thinkingmachines.ai/blog/interaction-models/",
"published": "May 11, 2026",
"author": "Thinking Machines"
},
{
"title": "On-Policy Distillation",
"url": "https://thinkingmachines.ai/blog/on-policy-distillation/",
"published": "Oct 27, 2025",
"author": "Kevin Lu in collaboration with others at Thinking Machines"
},
{
"title": "LoRA Without Regret",
"url": "https://thinkingmachines.ai/blog/lora/",
"published": "Sep 29, 2025",
"author": "John Schulman in collaboration with others at Thinking Machines"
},
{
"title": "Modular Manifolds",
"url": "https://thinkingmachines.ai/blog/modular-manifolds/",
"published": "Sep 26, 2025",
"author": "Jeremy Bernstein"
},
{
"title": "Defeating Nondeterminism in LLM Inference",
"url": "https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/",
"published": "Sep 10, 2025",
"author": "Horace He in collaboration with others at Thinking Machines"
}
],
"error_reasoning": null
}
Optional enriched per-post shape (when fetching an individual post page or RSS item):
{
"title": "LoRA Without Regret",
"url": "https://thinkingmachines.ai/blog/lora/",
"published": "2025-09-29T00:00:00+00:00",
"author": "John Schulman in collaboration with others at Thinking Machines",
"description": "How LoRA matches full training performance more broadly than expected.",
"word_count": 5784
}