SEC EDGAR Full-Text Filing Search
Purpose
Search the body text of SEC filings (10-K, 10-Q, 8-K, S-1, 424B, DEF 14A, SC 13D/G, 13F-HR, Forms 3/4/5, N-PX, etc., back to 2001) and return matching filings as structured JSON: accession number, filer name + CIK, form type, filing date, period of report, SIC + state of incorporation, the matching exhibit filename, and the canonical filing-index + direct-document URLs. Distinct from EDGAR's filing-metadata browse — this searches the full document text. Read-only.
EDGAR full-text search is backed by a public JSON endpoint (https://efts.sec.gov/LATEST/search-index) with no auth, no cookies, and no captcha. Lead with that endpoint; the human UI at https://www.sec.gov/edgar/search/ is a thin React client over the same API and is only a fallback.
When to Use
- Find every filing that mentions a phrase ("material weakness", "going concern", "force majeure", "climate risk disclosure") across all filers, optionally narrowed by form type, date range, filer, location, or SIC.
- Monitor new filings matching a query (e.g. all 8-Ks mentioning a topic in the last 30 days).
- Pull the matching exhibit URL so a downstream step can fetch the document and extract the surrounding context.
- Anywhere you'd otherwise scrape the EDGAR full-text search results page — the JSON API is faster, cheaper, and structurally reliable.
Workflow
Hit the JSON endpoint directly with an HTTP GET. SEC's fair-access policy requires a descriptive User-Agent header that identifies the requester (User-Agent: Sample Company contact@example.com) — a missing or generic UA gets a 403. No auth, cookies, or proxy are needed. Rate-limit to ≤ 10 requests/second.
-
Build the request URL:
GET https://efts.sec.gov/LATEST/search-index ?q=<query> &forms=<f1,f2,...> &dateRange=custom&startdt=YYYY-MM-DD&enddt=YYYY-MM-DD &ciks=<0000320193,...> # optional &entityName=<company name> # optional &locationCodes=<CA,NY,...> # optional &locationType=incorporated # optional (default = located-in) &from=<offset> # optional, pagination Header: User-Agent: Sample Company contact@example.comqsyntax:- Bare phrase → exact phrase match (
q="climate risk"runsmatch_phraseondoc_text). - Wrap in escaped quotes for a literal exact phrase:
q=%22force majeure%22. - Boolean operators are supported:
AND,OR,NOT, and parentheses.
Filters:
forms— comma-separated EDGAR root form codes (10-K,10-Q,8-K,S-1,424B,DEF 14A,SC 13D,SC 13G,13F-HR,4,3,5,N-PX). Filters onroot_forms, so10-Kalso captures10-K/Aamendments.dateRange=custom+startdt/enddt(YYYY-MM-DD). The API applies thefile_daterange purely fromstartdt/enddt; named UI ranges (last30d,last1y, …) are just shortcuts the UI converts to explicit dates.ciks— comma-separated, zero-padded 10-digit CIKs.entityName— free-text company/person name (resolves against the entity index).locationCodes— comma-separated 2-letter state/country codes; default semantics = principal-office located-in. AddlocationType=incorporatedto switch to state of incorporation. Note the param islocationCodes(plural) —locationCode(singular) is silently ignored.
- Bare phrase → exact phrase match (
-
Parse the response (Elasticsearch-shaped JSON):
hits.total.value— total result count, capped at 10000 (checkhits.total.relation:eq= exact,gte= at least). The returned slice is partial whenever this exceeds what you fetched.hits.hits[]— up to 100 hits per request (the page size is server-fixed at 100; the UI shows 10, but the raw API returns 100). Each hit:_id="{accession}:{filename}"— split on:to get the accession number and the matching exhibit filename._source.adsh— accession number (e.g.0000815097-24-000011)._source.ciks[],_source.display_names[]— arrays (co-registrants produce multiple);display_nameslook like"CARNIVAL CORP (CCL) (CIK 0000815097)"— take the substring before(for the clean filer name._source.form,_source.file_date,_source.period_ending,_source.sics[],_source.inc_states[],_source.biz_states[],_source.biz_locations[],_source.items[](8-K item codes).
aggregations— facet bucket counts (form_filter,entity_filter,sic_filter,biz_states_filter, top 30 each) for building a faceted summary without extra requests.
-
Construct the URLs (not returned by the API — derive them):
cikInt = parseInt(_source.ciks[0])(strip leading zeros) →815097accNoDash = _source.adsh.replace(/-/g, "")→000081509724000011filename= part after:in_id- Direct document URL:
https://www.sec.gov/Archives/edgar/data/{cikInt}/{accNoDash}/{filename} - Filing index URL:
https://www.sec.gov/Archives/edgar/data/{cikInt}/{accNoDash}/{adsh}-index.htm
-
Snippet (if the caller needs the matching excerpt): the API does not return it (see gotcha). Fetch the direct document URL with the same
User-Agentand locate the query terms in the document text yourself, wrapping them in your own highlight markers. -
Paginate if
hits.total.valueexceeds 100: re-request withfrom=100,from=200, … up tofrom=9900(the 10000-result Elasticsearch window cap). Beyond that, narrow the query (tighter date range / form filter) instead.
Browser fallback
Only if the JSON endpoint is unreachable. The human UI is a hash-routed React app:
https://www.sec.gov/edgar/search/#/q=%22climate%20risk%22&forms=10-K&dateRange=custom&startdt=2024-01-01&enddt=2024-03-31
browse open the hash URL, browse wait timeout 3000 (results render after an XHR to the same efts.sec.gov endpoint), dismiss the occasional "We'd welcome your feedback" survey modal (click No thanks), then read the results table. This is strictly slower and yields the same data as the API — prefer the API.
Site-Specific Gotchas
- The API never returns the matching text snippet. The server-side query hard-codes
_source: { exclude: ["doc_text"] }and configures no highlighter, so there is nohighlightfield and no excerpt in any response — only filing metadata + the matching exhibit filename (in_id). To get the actual matched text you must fetch the document URL and search it client-side. Passing your own&_source=...param does not override this — it is ignored. - A descriptive
User-Agentis mandatory. SEC's fair-access policy rejects empty/default UAs with403. UseUser-Agent: Your Company you@example.com. (browse cloud fetchand a real browser session send an acceptable UA automatically; a barefetch/requests/curlwith no UA gets blocked.) - Transient
{"message": "Internal server error"}(HTTP 500). Observed intermittently on otherwise-valid requests (hit during testing and during an autobrowse iteration). It clears on an immediate retry of the identical URL — build in one retry before treating a 500 as fatal. - Page size is 100, not 10. The UI paginates by 10, but the raw endpoint returns up to 100 hits per call with no size param. Don't assume 10.
- Total count caps at 10000 with
relation: "gte". You cannot page pastfrom=9900; narrow the query to see more. locationCodesis plural.locationCode(singular) is silently dropped (returns unfiltered results). Default = located-in;locationType=incorporatedswitches to state of incorporation.- Arrays for co-registrants.
ciks,display_names,biz_states,sics,biz_locationsare arrays — a single filing can list multiple filers (e.g. Carnival Corp + Carnival plc). Use index[0]for the primary filer unless you need all. display_namesneeds parsing. Format is"NAME (TICKER) (CIK 0000123456)"(double-spaced). Split on(for the clean name; the trailing(CIK …)already gives you the zero-padded CIK if you'd rather regex it out than readciks[].formsmatches root form types.forms=10-Kalso returns10-K/Aamendments;forms=4returns Form 4 ownership filings. There is no exact-form-string filter — filter_source.formclient-side if you need to exclude amendments.- No SIC query param. SIC appears only as an aggregation facet (
sic_filter) and in each hit's_source.sics[]. To filter by SIC, filter the returned hits client-side on_source.sics. - Coverage starts 2001. Full-text indexing does not cover filings before 2001; older filings won't appear regardless of date range.
- No anti-bot. No Akamai, captcha, login, or proxy requirement observed — a bare (non-stealth, non-proxy) request succeeds. Don't waste a residential proxy on this.
Expected Output
{
"success": true,
"query": "\"climate risk\"",
"forms": ["10-K"],
"date_range": { "start": "2024-01-01", "end": "2024-03-31" },
"total_results": 281,
"total_relation": "eq",
"returned": 100,
"from": 0,
"facets": {
"by_form": [{ "form": "10-K", "count": 281 }],
"by_state": [{ "state": "NY", "count": 51 }, { "state": "CA", "count": 31 }],
"by_sic": [{ "sic": "6022", "count": 57 }]
},
"results": [
{
"accession_number": "0000815097-24-000011",
"filer_name": "CARNIVAL CORP",
"filer_cik": "0000815097",
"all_filers": ["CARNIVAL CORP (CCL) (CIK 0000815097)", "CARNIVAL PLC (CUK, CUKPF) (CIK 0001125259)"],
"form_type": "10-K",
"filing_date": "2024-01-26",
"period_of_report": "2023-11-30",
"sic": "4400",
"state_of_incorporation": "DE",
"business_state": "FL",
"matching_file": "ccl-20231130.htm",
"document_url": "https://www.sec.gov/Archives/edgar/data/815097/000081509724000011/ccl-20231130.htm",
"filing_index_url": "https://www.sec.gov/Archives/edgar/data/815097/000081509724000011/0000815097-24-000011-index.htm",
"snippet": null
}
]
}
snippet is null from the API alone; populate it only if a follow-up document fetch was performed (see Workflow step 4).
No results:
{ "success": true, "query": "\"asdfqwerzxcv nonexistent phrase\"", "total_results": 0, "returned": 0, "results": [] }
Transient upstream error (after the one retry also fails):
{ "success": false, "error_reasoning": "EDGAR returned HTTP 500 {\"message\":\"Internal server error\"} on two consecutive attempts" }