SEC EDGAR Full-Text Filing Search
Purpose
Search the full body text of every SEC EDGAR filing accepted electronically since 2001 — 10-K, 10-Q, 8-K, S-1, S-3, 424B*, DEF 14A, SC 13D, SC 13G, 13F-HR, Forms 3/4/5, N-PX, and dozens of others — for a phrase or boolean expression, and return a structured list of matching filings with filer identity, form type, filing date, period of report, business + incorporation state, SIC code, the canonical filing-index URL, and the direct URL to the matching exhibit. Read-only.
This is distinct from EDGAR's filing-metadata browse surface (/cgi-bin/browse-edgar), which lists filings by company without looking inside them. This skill searches the body text.
When to Use
- "Find every 10-K from Q1 2024 that mentions 'climate risk'."
- "Which 8-Ks from Apple in the last year contain 'material weakness'?"
- "List all filings citing a specific Treasury regulation across the entire EDGAR corpus."
- "Find SC 13D/G filings disclosing a stake above N% in {company}."
- "Show all Form 4 trades by a named officer across every company they're an insider at."
- Any flow where the question is "which filings mention X?" rather than "what filings did company Y submit?"
Workflow
EDGAR's full-text search is backed by a single public JSON endpoint: https://efts.sec.gov/LATEST/search-index (an AWS API Gateway in front of an OpenSearch/Elasticsearch cluster; CORS open with Access-Control-Allow-Origin: *, no auth, no cookies, no captcha). The consumer SPA at https://www.sec.gov/edgar/search/ is a thin jQuery wrapper that calls this exact endpoint and renders the same metadata fields. Always use the JSON endpoint directly: the browser path costs ~50× more agent turns and returns no extra data.
1. Build the query URL
GET https://efts.sec.gov/LATEST/search-index
?q=<phrase-or-boolean>
[&forms=<csv>]
[&dateRange=custom&startdt=YYYY-MM-DD&enddt=YYYY-MM-DD]
[&ciks=<csv-of-10-digit-zero-padded>]
[&entityName=<text>]
[&locationCodes=<csv-of-state-codes>]
[&locationType=located|incorporated]
[&sics=<csv-of-4-digit-codes>]
[&sort=desc|asc]
[&from=N]
Headers:
User-Agent: <YourOrg> <contact@example.com> # SEC fair-access courtesy
Accept: application/json
The q parameter accepts:
- A bare word: q=catastrophe
- An exact phrase, URL-encoded with quotes: q=%22climate+risk%22 ("climate risk")
- Boolean operators on plain or quoted terms: q=%22going+concern%22+AND+NOT+%22substantial+doubt%22. Supported operators: AND, OR, NOT, parentheses. Default operator between bare terms is AND.
Filter parameters and their semantics (every one verified live against the API on 2026-05-18):
| Param | Type | Meaning |
|---|---|---|
| forms | csv | Form codes (10-K, 10-Q, 8-K, S-1, DEF 14A, SC 13D, SC 13G, 13F-HR, 4, 424B, N-PX, …). Matches the root form: forms=10-K returns both 10-K and 10-K/A (verified). Multi-form: forms=10-K,8-K. |
| dateRange | enum | Must be custom for startdt/enddt to take effect. |
| startdt / enddt | YYYY-MM-DD | Bounds on file_date (the date the filing landed at EDGAR, not the period of report). Both inclusive. |
| ciks | csv | 10-digit zero-padded CIK (e.g. 0000320193 for Apple, not 320193). Multi: 0000320193,0001652044. Filters to filings where the CIK appears in _source.ciks[], i.e. as filer, co-filer, or insider-subject (Form 4). |
| entityName | string | Fuzzy text match on the company/individual-name index. Multi-select in the UI; comma-separate to OR them. Useful when you only know the name. The API returns CIKs in the results; cache them and switch to ciks= for repeat queries (more precise). |
| locationCodes | csv | 2-letter state code (CA, NY, …) or 2-char EDGAR foreign code (e.g. X0 = England). Plural form required: the singular locationCode= is silently ignored. |
| locationType | enum | located (the default; matches _source.biz_states[], the filer's business-address state) or incorporated (matches _source.inc_states[], the state of incorporation). |
| sics | csv | 4-digit Standard Industrial Classification codes (e.g. 2834 pharma, 6022 state commercial banks, 7372 software). Multi-select supported. |
| sort | enum | desc = file_date newest first, asc = oldest first. Omit for default relevance (_score) sort. The variants sortBy=date and order=date are silently ignored; only sort=desc and sort=asc take effect. |
| from | int | Pagination offset. Page size is fixed at 100 when from is set. ES max_result_window caps from + size ≤ 10000, so the effective ceiling is from=9900. See gotcha below for deeper paging. |
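The parameter rules above can be collected into a small URL builder. This is a minimal sketch using only the Python standard library; the function name build_search_url and its keyword arguments are illustrative choices, not part of the EDGAR API. It assumes the caller passes form codes and state codes as lists of strings.

```python
from urllib.parse import urlencode

BASE_URL = "https://efts.sec.gov/LATEST/search-index"


def build_search_url(q, forms=None, startdt=None, enddt=None, ciks=None,
                     location_codes=None, location_type=None, sics=None,
                     sort=None, from_=None):
    """Assemble a search-index URL from the filter parameters (hypothetical helper)."""
    params = {"q": q}
    if forms:
        params["forms"] = ",".join(forms)
    if startdt or enddt:
        # startdt/enddt only take effect when dateRange=custom is present.
        params["dateRange"] = "custom"
        if startdt:
            params["startdt"] = startdt
        if enddt:
            params["enddt"] = enddt
    if ciks:
        # CIKs must be 10-digit zero-padded; accept ints or strings.
        params["ciks"] = ",".join(str(c).zfill(10) for c in ciks)
    if location_codes:
        params["locationCodes"] = ",".join(location_codes)  # plural spelling
    if location_type:
        params["locationType"] = location_type
    if sics:
        params["sics"] = ",".join(str(s) for s in sics)
    if sort is not None:
        if sort not in ("asc", "desc"):
            # Other spellings are silently ignored by the API, so fail early here.
            raise ValueError("sort must be 'asc' or 'desc'")
        params["sort"] = sort
    if from_ is not None:
        params["from"] = int(from_)
    # quote_plus encoding turns '"climate risk"' into %22climate+risk%22;
    # safe="," keeps the csv separators readable.
    return BASE_URL + "?" + urlencode(params, safe=",")
```

Calling build_search_url('"climate risk"', forms=["10-K"], startdt="2024-01-01", enddt="2024-03-31", from_=0) reproduces the example URL used in the next step.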
2. Send the request
The endpoint accepts any modern User-Agent; the SEC's fair-access policy requests (but does not enforce at this endpoint) a UA identifying your org plus a contact email. A bare Chrome UA returns 200 fine. Stay under 10 req/s aggregate to www.sec.gov + efts.sec.gov.
Through Browserbase's server-side Fetch API:
browse cloud fetch \
"https://efts.sec.gov/LATEST/search-index?q=%22climate+risk%22&forms=10-K&dateRange=custom&startdt=2024-01-01&enddt=2024-03-31&from=0" \
--output /tmp/edgar.json
Or any HTTP client with Accept: application/json.
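For a plain HTTP client, a minimal sketch with the Python standard library might look like the following. The function name fetch_search_json, the retry count, and the backoff values are assumptions for illustration; the retry exists because of the transient 5xx responses noted under Site-Specific Gotchas. The opener and sleep parameters are injectable only so the function can be exercised without network access.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_search_json(url, user_agent, opener=urllib.request.urlopen,
                      attempts=3, backoff_s=0.5, sleep=time.sleep):
    """GET `url` and decode the JSON body, retrying 5xx responses (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    request = urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Accept": "application/json"},
    )
    for attempt in range(attempts):
        try:
            with opener(request, timeout=30) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # Retry only server-side errors, and give up after the last attempt.
            if err.code < 500 or attempt == attempts - 1:
                raise
            # Exponential backoff: 0.5 s, 1 s, 2 s, ... with the defaults.
            sleep(backoff_s * (2 ** attempt))
```

Note that an un-padded CIK also produces a 500, so a request that keeps failing after the retries is more likely a malformed parameter than a transient fault.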
3. Parse the response
{
"took": 1975,
"timed_out": false,
"_shards": { "total": 50, "successful": 50, "skipped": 0, "failed": 0 },
"hits": {
"total": { "value": 281, "relation": "eq" },
"max_score": 7.69,
"hits": [ /* one entry per filing — see below */ ]
},
"aggregations": {
"entity_filter": { "buckets": [ { "key": "...", "doc_count": 2 }, ... ] },
"sic_filter": { "buckets": [ { "key": "6022", "doc_count": 57 }, ... ] },
"biz_states_filter": { "buckets": [ { "key": "NY", "doc_count": 51 }, ... ] },
"form_filter": { "buckets": [ { "key": "10-K", "doc_count": 281 } ] }
},
"query": { /* the ES query echo (useful for debugging) */ }
}
- hits.total.value is the true total. relation: "eq" means exact; "gte" appears when the count is capped (typically at 10,000). When you see gte, narrow your filters to get an exact count.
- hits.hits[] is the page of matches (length ≤ 100 per request).
- aggregations gives top-30 facet buckets for entity / sic / biz_states / form, useful for refining a too-wide query.
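Reading the envelope can be sketched as below. The helper names summarize_page and facet_counts are hypothetical, and the 100-per-page size and 10,000 window are taken from the parameter notes above rather than from anything the response itself reports.

```python
def summarize_page(response, from_=0, page_size=100, window=10000):
    """Pull the totals out of one response and work out the next offset, if any."""
    hits = response["hits"]
    total = hits["total"]["value"]
    returned = len(hits["hits"])
    next_from = from_ + page_size
    # Another page exists only if this page was full, the total is not yet
    # reached, and the next request would still fit inside the result window.
    has_more = (returned == page_size
                and next_from < total
                and next_from + page_size <= window)
    return {
        "total_results": total,
        "total_is_capped": hits["total"]["relation"] != "eq",
        "returned": returned,
        "from": from_,
        "next_from": next_from if has_more else None,
    }


def facet_counts(response, name):
    """Return [(key, doc_count), ...] for one aggregation, e.g. 'sic_filter'."""
    buckets = response.get("aggregations", {}).get(name, {}).get("buckets", [])
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]
```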
Each hit:
{
"_index": "edgar_file",
"_id": "0000815097-24-000011:ccl-20231130.htm",
"_score": 7.69,
"_source": {
"adsh": "0000815097-24-000011",
"ciks": ["0000815097", "0001125259"],
"display_names": ["CARNIVAL CORP (CCL) (CIK 0000815097)", "CARNIVAL PLC (CUK, CUKPF) (CIK 0001125259)"],
"form": "10-K",
"root_forms": ["10-K"],
"file_type": "10-K",
"file_description":"10-K",
"file_date": "2024-01-26",
"period_ending": "2023-11-30",
"biz_states": ["FL", "X0"],
"inc_states": ["DE"],
"biz_locations": ["Miami, FL", "Southampton So15 1st, X0"],
"sics": ["4400", "4400"],
"file_num": ["001-09610", "001-15136"],
"film_num": ["24564723", "24564724"],
"items": [],
"sequence": 1,
"xsl": null
}
}
Field notes:
- _id has the structural format {adsh}:{filename}; split on the first colon. The accession number is on the left; the filename of the exhibit that matched is on the right.
- adsh is the accession number with hyphens. Drop them to get the directory name for the URL.
- ciks[] is always zero-padded 10-digit. Multi-CIK hits arise for joint filings (co-filers in a 10-K) and for Form 4/3/5 hits where the filer is the reporting officer and the second CIK is the issuer. The first CIK in the array is the primary filer for URL purposes.
- display_names[] is the human-readable filer label EDGAR shows in the UI (Company (TICKER) (CIK xxx)); each entry pairs 1-to-1 with ciks[].
- form is the exact form code (with /A amendment suffix when applicable). root_forms[] is the parent (10-K covers 10-K/A, 8-K covers 8-K/A). The forms= query filter matches against root_forms[].
- file_date ≠ period_ending. Filing date is when EDGAR accepted the filing; period of report is the as-of date (e.g. fiscal year-end for a 10-K). For 8-Ks, period_ending is the event date.
- biz_states[] / inc_states[] use 2-letter US state codes plus EDGAR's foreign codes: X0 = England, X1 = … (one of the international codes), A0 / D0 = …, XX / ZZ = other. Treat any code that's not in the US 50+DC+territories list as "foreign / unknown" for downstream consumers.
- sics[] is the 4-digit SIC code per filer CIK. Multi-filer hits have one SIC entry per CIK.
- items[] is populated only for 8-Ks: the list of 8-K event item codes ("2.02", "5.02", etc.). Empty for other forms.
- _score is the relevance score from the underlying OpenSearch query. Default sort is _score desc.
4. Build the canonical filing-index URL and the direct document URL
EDGAR's filing archive uses a deterministic directory layout under https://www.sec.gov/Archives/edgar/data/{cik_int}/{adsh_no_dashes}/:
source = hit["_source"]                  # hit = one parsed entry of hits.hits[]
cik_int = int(source["ciks"][0])         # strip leading zeros: "0000815097" → 815097
adsh = source["adsh"]                    # accession number with hyphens
adsh_no_dashes = adsh.replace("-", "")   # "0000815097-24-000011" → "000081509724000011"
filename = hit["_id"].split(":", 1)[1]   # the part after the colon in the hit's _id
filing_index_url = f"https://www.sec.gov/Archives/edgar/data/{cik_int}/{adsh_no_dashes}/{adsh}-index.htm"
document_url = f"https://www.sec.gov/Archives/edgar/data/{cik_int}/{adsh_no_dashes}/{filename}"
Both URLs verified live (HTTP 200) against https://www.sec.gov/Archives/edgar/data/815097/000081509724000011/0000815097-24-000011-index.htm and .../ccl-20231130.htm.
The filing-index page is small (~20 KB) and lists every exhibit in the filing. The direct document URL is the exact .htm or .txt that contained the matching phrase — typically the primary exhibit (the 10-K body, not Exhibit 21 subsidiaries unless that's where the term appeared).
If you need the filer's full filing history (not just this one hit), use the metadata-browse URL — that's a separate skill, but for completeness: https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={cik_int}&type={form_code}&dateb=&owner=include&count=40.
5. Paginate
Append &from=100, &from=200, … to walk pages of 100 hits. Stop when you've collected hits.total.value results, or when len(hits.hits) < 100 (last page).
For result sets larger than 10,000:
- ES rejects from + size > 10000 (verified: returns 200 with errorType: ResponseError, errorMessage: "search_phase_execution_exception: ... Result window is too large, from + size must be less than or equal to: [10000]").
- There is no documented scroll/search_after surface on this endpoint.
- The workaround: narrow by dateRange (or forms / locationCodes / sics) and walk the narrower buckets sequentially. With sort=asc + a tight date window per page, you can sweep an unbounded corpus.
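The date-window sweep can be sketched as two small generators of request parameters. The names date_windows and page_offsets are hypothetical, and the window length is a tuning knob the caller has to choose. Because startdt and enddt are both inclusive, the windows are built so that they never share a day, which avoids fetching the same filing twice.

```python
from datetime import date, timedelta


def date_windows(start, end, days):
    """Yield inclusive, non-overlapping (startdt, enddt) ISO pairs covering start..end."""
    if days < 1:
        raise ValueError("days must be at least 1")
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=days - 1), end)
        yield cursor.isoformat(), window_end.isoformat()
        # The next window starts the day after this one ends.
        cursor = window_end + timedelta(days=1)


def page_offsets(total, page_size=100, window=10000):
    """List the from= offsets to request for `total` hits, stopping at the window."""
    return list(range(0, min(total, window), page_size))
```

A sweep would request each window with sort=asc, read hits.total.value from the first page, and then walk page_offsets(total). If a window still reports relation "gte", it is too wide and should be split into shorter windows.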
6. Return the structured result
For each hits.hits[i], surface:
{
"accession_number": "_source.adsh",
"filer_cik": "_source.ciks[0]",
"filer_name": "_source.display_names[0]",
"co_filer_ciks": "_source.ciks[1:]",
"co_filer_names": "_source.display_names[1:]",
"form": "_source.form",
"root_form": "_source.root_forms[0]",
"file_date": "_source.file_date",
"period_ending": "_source.period_ending",
"items_8k": "_source.items",
"filer_biz_state": "_source.biz_states[0]",
"filer_inc_state": "_source.inc_states[0]",
"filer_sic": "_source.sics[0]",
"filer_biz_location": "_source.biz_locations[0]",
"matched_file": "_id.split(':', 1)[1]",
"score": "_score",
"filing_index_url": "(derived per §4)",
"document_url": "(derived per §4)"
}
Plus, at the result-set level: total_results: hits.total.value, total_is_capped: hits.total.relation != 'eq', and (optionally, when the caller wants to refine) the four aggregations.* buckets verbatim.
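The per-hit mapping, including the URL derivation from §4, could be written roughly as follows. This is a sketch: to_result and first are hypothetical helper names, and it assumes every hit carries at least one CIK, as the field notes state.

```python
ARCHIVE_ROOT = "https://www.sec.gov/Archives/edgar/data"


def first(values):
    """Return the first element of a list, or None when it is empty or missing."""
    return values[0] if values else None


def to_result(hit):
    """Flatten one hits.hits[] entry into the structured record described above."""
    source = hit["_source"]
    adsh = source["adsh"]
    ciks = source["ciks"]
    names = source.get("display_names", [])
    # _id is "{adsh}:{filename}"; keep everything after the first colon.
    matched_file = hit["_id"].split(":", 1)[1]
    # int() strips the zero padding for the archive directory name.
    base = f"{ARCHIVE_ROOT}/{int(ciks[0])}/{adsh.replace('-', '')}"
    return {
        "accession_number": adsh,
        "filer_cik": ciks[0],
        "filer_name": first(names),
        "co_filer_ciks": ciks[1:],
        "co_filer_names": names[1:],
        "form": source.get("form"),
        "root_form": first(source.get("root_forms", [])),
        "file_date": source.get("file_date"),
        "period_ending": source.get("period_ending"),
        "items_8k": source.get("items", []),
        "filer_biz_state": first(source.get("biz_states", [])),
        "filer_inc_state": first(source.get("inc_states", [])),
        "filer_sic": first(source.get("sics", [])),
        "filer_biz_location": first(source.get("biz_locations", [])),
        "matched_file": matched_file,
        "score": hit.get("_score"),
        "filing_index_url": f"{base}/{adsh}-index.htm",
        "document_url": f"{base}/{matched_file}",
    }
```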
Browser fallback
If the JSON endpoint is unreachable but www.sec.gov is:
- Open https://www.sec.gov/edgar/search/#/q=<URL-encoded-query>&forms=<csv>&dateRange=custom&startdt=...&enddt=... The SPA is hash-routed (uses hasher.min.js); query parameters live after the #/, not after ?. The page loads with the form pre-populated and auto-submits.
- Wait for .divResultsContainer .result-section to render (the SPA injects results via jQuery after the same efts.sec.gov AJAX call you'd otherwise make directly).
- Each .result-section contains a <td class="filetype">{form}</td>, <td class="filed">{file_date}</td>, anchor <a class="preview-file" data-adsh="{adsh}" data-file-name="{filename}" href="...">, and a CIK column. Read data-adsh and data-file-name from the anchor and reconstruct the URLs per §4.
- Pagination is a server-side Next link in the SPA; clicking it triggers another AJAX call with from=. Same 10,000-cap applies; same Akamai protection applies to www.sec.gov (the SPA wrapper page itself), so a Verified + residential-proxy session is recommended for the SPA path. The efts.sec.gov API host has no Akamai/Cloudflare in front of it; a bare BB cloud-fetch from any IP returns 200 reliably.
The browser fallback offers no additional data versus the API — the SPA renders the exact same metadata fields the JSON returns. Use it only as a last resort.
Site-Specific Gotchas
- CIKs must be 10-digit zero-padded. ciks=320193 returns HTTP 500 {"message":"Internal server error"}; ciks=0000320193 works. Always str(cik).zfill(10) before sending. Multi-CIK: ciks=0000320193,0001652044.
- locationCodes is plural; singular locationCode is silently ignored. Confirmed: locationCode=CA returned the unfiltered 281 hits for the climate-risk Q1 2024 query; locationCodes=CA returned 31 hits, all with biz_states={"CA"}. The consumer SPA's internal locationCode is rewritten to locationCodes before sending (verified in https://www.sec.gov/edgar/search/js/edgar_full_text_search.js ~line 290: if(searchParams.locationCode && searchParams.locationCode!='all') searchParams.locationCodes = searchParams.locationCode;).
- locationType defaults to located (business state), not incorporated. A naive locationCodes=DE query returns 2 hits (filings where the filer's office is in Delaware, very rare); add locationType=incorporated to switch to the much more populous inc_states (Delaware-incorporated entities: 111 hits on the same query). When locationType=incorporated is set, the result set's inc_states distribution often includes Maryland (MD) too, likely from REITs and entities cross-tagged via subsidiary CIKs; treat the filter as "all hits where at least one CIK has inc_states containing your code", not strict equality.
- forms=10-K matches 10-K/A too (and 8-K matches 8-K/A). The filter is on root_forms. If you specifically need to exclude amendments, post-filter on _source.form for an exact match.
- No in-document snippets are returned. Despite the match_phrase query on a hidden doc_text field, the response does not include ES highlight blocks (verified: tried highlight=true, hl=on, snippet=true; none take effect; the underlying query has _source.exclude: ["doc_text"] and no highlight clause). The consumer SPA itself does not display snippets; it shows filing metadata and a link out. If the caller asks for "the exact text that matched", you must fetch the document URL separately and grep client-side for the query phrase. Note that 10-K bodies routinely run several MB and exceed Browserbase's 1 MB cloud fetch cap; use a streaming HTTP client or open the document in a session for the text extraction step.
- hits.total.relation: "gte" means the count is capped at 10,000. When you see this, your query is broader than 10k. Narrow with dateRange / forms / locationCodes / sics to get an exact count.
- Deep pagination is bounded at from + size ≤ 10,000. With the fixed 100-per-page size, that's max from=9900. Beyond that, the API returns a 200 envelope with an OpenSearch error JSON: search_phase_execution_exception: [illegal_argument_exception] Reason: Result window is too large, from + size must be less than or equal to: [10000] but was [10090]. There is no search_after or scroll API exposed. Workaround: narrow with a filter (most commonly a tighter dateRange), and sweep using sort=asc to walk forward in time inside each window.
- sort=desc|asc only; other variants are silently ignored. sortBy=date, order=date, etc. all fall back to default relevance sort without an error. Verify your sort is taking effect by inspecting the first 2-3 file_date values in the result.
- q= is required-ish: empty q=&ciks=... works (browses filings for a CIK), but the endpoint expects either q or some other selector. q= with no other filters returns relevance-sorted results across the full 10,000+ corpus.
- Boolean operators must be uppercase: AND, OR, NOT. Lowercase variants are treated as plain tokens. Parentheses are supported: q=(foo+OR+bar)+AND+%22exact+phrase%22.
- entityName is fuzzy and multi-match. entityName=Apple returns 88 hits across ~10 distinct "Apple"-named entities (Apple Inc., Apple Hospitality REIT, Apple Green Holding, …). For precision, resolve to a specific CIK once and use ciks= thereafter.
- sics= filtering is exact on 4-digit codes. Comma-separated multi works (sics=2834,2835,2836 for biotech-adjacent). The result set's hits will only contain those SICs in _source.sics[] (verified).
- X0 and other 2-char codes in biz_states[] / inc_states[] are EDGAR's foreign country codes, not US states. X0 is England, D0 is West Germany (legacy), A0 is Alberta, etc. The full table is on EDGAR's company-search help page; treat unknown codes as "international" rather than a state.
- Transient 5xx (~36-byte error JSON) under load. Observed once during iter-1: a sics=2834 request returned a 36-byte {"message":"Internal server error"} body, then succeeded immediately on retry with the identical URL. Implement a retry-with-backoff of 2-3 attempts before propagating the error.
- The API has CORS open (Access-Control-Allow-Origin: *) and no auth. A browser-extension or any cross-origin web client can call it directly.
- No Akamai/Cloudflare on efts.sec.gov; Akamai IS present on www.sec.gov. A bare HTTP client gets efts.sec.gov JSON reliably. The consumer SPA host (www.sec.gov/edgar/search/) injects an Akamai detection script (/akam/13/{id} + bazadebezolkohpepadr variable); under aggressive scraping this can challenge. The download URLs under www.sec.gov/Archives/edgar/data/... are not Akamai-protected and stream fine.
- SEC fair-access policy: 10 req/s aggregate across all sec.gov subdomains. Stay under it. Identifying User-Agent: <YourOrg> <contact-email> is requested by policy but not enforced at the efts.sec.gov endpoint (a default Chrome UA returns 200).
- _id filename can collide across different accessions in rare cases. Always carry the full _id (with the {adsh}: prefix) or the adsh field through downstream pipelines, never the filename alone, since two filings can have the same exhibit filename.
- Index coverage starts 2001-05-04. A query with sort=asc and no date filter for "climate risk" on 10-Ks returned 2001-05-04 as the earliest hit, matching the SEC's published "full-text search since 2001" coverage statement. Older filings exist in EDGAR but are not in this index.
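Two of the gotchas above call for client-side post-processing: excluding amendments, and locating the matched text once the document has been downloaded. A minimal sketch of both follows. The helper names exact_form_only and find_snippets are hypothetical, and find_snippets assumes the document has already been reduced to plain text (HTML tags inside a phrase would defeat the match).

```python
import re


def exact_form_only(hits, form):
    """Keep hits whose _source.form equals `form` exactly, dropping e.g. 10-K/A."""
    return [hit for hit in hits if hit["_source"]["form"] == form]


def find_snippets(text, phrase, width=40):
    """Return a context window around each case-insensitive occurrence of `phrase`."""
    words = phrase.split()
    if not words:
        return []
    # Allow any run of whitespace between words, since filing text often
    # wraps lines in the middle of a phrase.
    pattern = r"\s+".join(re.escape(word) for word in words)
    snippets = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        # Collapse internal whitespace so each snippet is a single line.
        snippets.append(" ".join(text[start:end].split()))
    return snippets
```

This only covers simple phrase queries; a boolean q expression would need each quoted phrase searched separately.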
Expected Output
A search returning at least one hit:
{
"query": "\"climate risk\"",
"forms": ["10-K"],
"date_range": { "start": "2024-01-01", "end": "2024-03-31" },
"sort": "relevance",
"total_results": 281,
"total_is_capped": false,
"returned": 100,
"from": 0,
"next_from": 100,
"filings": [
{
"accession_number": "0000815097-24-000011",
"filer_cik": "0000815097",
"filer_name": "CARNIVAL CORP (CCL) (CIK 0000815097)",
"co_filer_ciks": ["0001125259"],
"co_filer_names": ["CARNIVAL PLC (CUK, CUKPF) (CIK 0001125259)"],
"form": "10-K",
"root_form": "10-K",
"file_date": "2024-01-26",
"period_ending": "2023-11-30",
"items_8k": [],
"filer_biz_state": "FL",
"filer_inc_state": "DE",
"filer_sic": "4400",
"filer_biz_location": "Miami, FL",
"matched_file": "ccl-20231130.htm",
"score": 7.69,
"filing_index_url": "https://www.sec.gov/Archives/edgar/data/815097/000081509724000011/0000815097-24-000011-index.htm",
"document_url": "https://www.sec.gov/Archives/edgar/data/815097/000081509724000011/ccl-20231130.htm",
"matched_snippet": null
}
],
"aggregations": {
"top_filers": [{ "name": "Carlyle Group Inc. (CG, CGABL) (CIK 0001527166)", "count": 2 }],
"top_sics": [{ "code": "6022", "count": 57 }, { "code": "6021", "count": 41 }],
"top_states": [{ "code": "NY", "count": 51 }, { "code": "CA", "count": 31 }],
"top_forms": [{ "code": "10-K", "count": 281 }]
}
}
A search returning zero hits:
{
"query": "\"material weakness\"",
"forms": ["8-K"],
"ciks": ["0000320193"],
"total_results": 0,
"total_is_capped": false,
"returned": 0,
"filings": []
}
A search where the total count is capped (the query is too broad; narrow with filters to get an exact count):
{
"query": "the",
"forms": ["10-K", "8-K"],
"total_results": 10000,
"total_is_capped": true,
"returned": 100,
"from": 0,
"next_from": 100,
"filings": [ /* first 100 hits */ ],
"note": "Result set exceeds 10,000. Narrow with dateRange / locationCodes / sics and re-query to get an exact count."
}
A search where the requested page is past the ES window:
{
"query": "the",
"from": 10000,
"error": "result_window_exceeded",
"message": "from + size must be less than or equal to 10000. Narrow filters and re-paginate within the narrower set."
}
A search where the CIK was passed un-padded (the input must be fixed before retrying):
{
"error": "invalid_cik",
"message": "ciks must be 10-digit zero-padded; pad with leading zeros and retry. Received: '320193' → should be '0000320193'."
}