sec.gov

search-edgar-fulltext

Installation

Adds this website's skill for your agents

browse skills add sec.gov/search-edgar-fulltext-dpk6r2

Summary

Search the full body text of SEC EDGAR filings (10-K, 10-Q, 8-K, S-1, DEF 14A, 13F, 13D/G, Form 4, etc., 2001-present) via the public efts.sec.gov JSON API, with filters for form type, filer (CIK or name), filer location, SIC code, and date range. Returns structured filing metadata plus canonical filing-index and document URLs.

SEC EDGAR Full-Text Filing Search

Purpose

Search the full body text of every SEC EDGAR filing accepted electronically since 2001 (10-K, 10-Q, 8-K, S-1, S-3, 424B*, DEF 14A, SC 13D, SC 13G, 13F-HR, Forms 3/4/5, N-PX, and dozens of others) for a phrase or boolean expression. Returns a structured list of matching filings with filer identity, form type, filing date, period of report, business and incorporation states, SIC code, the canonical filing-index URL, and the direct URL to the matching exhibit. Read-only.

This is distinct from EDGAR's filing-metadata browse surface (/cgi-bin/browse-edgar), which lists filings by company without looking inside them. This skill searches the body text.

When to Use

  • "Find every 10-K from Q1 2024 that mentions 'climate risk'."
  • "Which 8-Ks from Apple in the last year contain 'material weakness'?"
  • "List all filings citing a specific Treasury regulation across the entire EDGAR corpus."
  • "Find SC 13D/G filings disclosing a stake above N% in {company}."
  • "Show all Form 4 trades by a named officer across every company they're an insider at."
  • Any flow where the question is "which filings mention X?" rather than "what filings did company Y submit?"

Workflow

EDGAR's full-text search is backed by a single public JSON endpoint: https://efts.sec.gov/LATEST/search-index (an AWS API Gateway in front of an OpenSearch/Elasticsearch cluster; CORS open with Access-Control-Allow-Origin: *, no auth, no cookies, no captcha). The consumer SPA at https://www.sec.gov/edgar/search/ is a thin jQuery wrapper that calls this exact endpoint and renders the same metadata fields. Always use the JSON endpoint directly: the browser path costs ~50× more agent turns and returns no extra data.

1. Build the query URL

GET https://efts.sec.gov/LATEST/search-index
    ?q=<phrase-or-boolean>
    [&forms=<csv>]
    [&dateRange=custom&startdt=YYYY-MM-DD&enddt=YYYY-MM-DD]
    [&ciks=<csv-of-10-digit-zero-padded>]
    [&entityName=<text>]
    [&locationCodes=<csv-of-state-codes>]
    [&locationType=located|incorporated]
    [&sics=<csv-of-4-digit-codes>]
    [&sort=desc|asc]
    [&from=N]

Headers:
  User-Agent: <YourOrg> <contact@example.com>   # SEC fair-access courtesy
  Accept: application/json

The q parameter accepts:

  • A bare word: q=catastrophe
  • An exact phrase, URL-encoded with quotes: q=%22climate+risk%22 ("climate risk")
  • Boolean operators on plain or quoted terms: q=%22going+concern%22+AND+NOT+%22substantial+doubt%22. Supported operators: AND, OR, NOT, parentheses. Default operator between bare terms is AND.

Filter parameters and their semantics (every one verified live against the API on 2026-05-18):

| Param | Type | Meaning |
| --- | --- | --- |
| forms | csv | Form codes (10-K, 10-Q, 8-K, S-1, DEF 14A, SC 13D, SC 13G, 13F-HR, 4, 424B, N-PX, …). Matches the root form: forms=10-K returns both 10-K and 10-K/A (verified). Multi-form: forms=10-K,8-K. |
| dateRange | enum | Must be custom for startdt/enddt to take effect. |
| startdt / enddt | YYYY-MM-DD | Bounds on file_date (the date the filing landed at EDGAR, not the period of report). Both inclusive. |
| ciks | csv | 10-digit zero-padded CIK (e.g. 0000320193 for Apple, not 320193). Multi: 0000320193,0001652044. Filters to filings where the CIK appears in _source.ciks[], i.e. as filer, co-filer, or insider-subject (Form 4). |
| entityName | string | Fuzzy text match on the company/individual-name index. Multi-select in the UI; comma-separate to OR them. Useful when you only know the name. The API returns CIKs in the results; cache them and switch to ciks= for repeat queries (more precise). |
| locationCodes | csv | 2-letter state code (CA, NY, …) or 2-char EDGAR foreign code (e.g. X0 = England). Plural form required: the singular locationCode= is silently ignored. |
| locationType | enum | located (default; matches _source.biz_states[], the filer's business-address state) or incorporated (matches _source.inc_states[], the state of incorporation). |
| sics | csv | 4-digit Standard Industrial Classification codes (e.g. 2834 pharma, 6022 state commercial banks, 7372 software). Multi-select supported. |
| sort | enum | desc = file_date newest first, asc = oldest first. Omit for default relevance (_score) sort. The variants sortBy=date and order=date are silently ignored; only sort=desc or sort=asc takes effect. |
| from | int | Pagination offset. Page size is fixed at 100 when from is set. ES max_result_window caps from + size ≤ 10000, so the effective ceiling is from=9900. See the deep-pagination gotcha below for larger result sets. |
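The parameter rules above can be folded into a small URL builder. This is a minimal sketch, not an official client: the function name `build_search_url` and its keyword arguments are hypothetical, and only the query-string keys come from the table.

```python
from urllib.parse import urlencode

BASE_URL = "https://efts.sec.gov/LATEST/search-index"


def build_search_url(q, forms=None, start=None, end=None, ciks=None,
                     location_codes=None, location_type=None, sics=None,
                     sort=None, offset=None):
    """Assemble a search-index URL (hypothetical helper, not part of any SDK)."""
    params = {"q": q}
    if forms:
        params["forms"] = ",".join(forms)
    if start and end:
        # dateRange=custom is what makes startdt/enddt take effect.
        params.update({"dateRange": "custom", "startdt": start, "enddt": end})
    if ciks:
        # CIKs must be 10-digit zero-padded before sending.
        params["ciks"] = ",".join(str(c).zfill(10) for c in ciks)
    if location_codes:
        # Plural key: the singular locationCode is silently ignored.
        params["locationCodes"] = ",".join(location_codes)
    if location_type:
        params["locationType"] = location_type
    if sics:
        params["sics"] = ",".join(str(s) for s in sics)
    if sort in ("asc", "desc"):
        # Any other value is dropped, leaving the default relevance sort.
        params["sort"] = sort
    if offset is not None:
        params["from"] = offset
    # safe="," keeps csv values readable; spaces become "+", quotes become %22.
    return BASE_URL + "?" + urlencode(params, safe=",")
```

Calling `build_search_url('"climate risk"', forms=["10-K"], start="2024-01-01", end="2024-03-31", offset=0)` reproduces the example URL used in the next step.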

2. Send the request

The endpoint accepts any modern User-Agent; the SEC's fair-access policy requests (but does not enforce at this endpoint) a UA identifying your org plus a contact email. A bare Chrome UA returns 200 fine. Stay under 10 req/s aggregate to www.sec.gov + efts.sec.gov.

Through Browserbase's server-side Fetch API:

browse cloud fetch \
  "https://efts.sec.gov/LATEST/search-index?q=%22climate+risk%22&forms=10-K&dateRange=custom&startdt=2024-01-01&enddt=2024-03-31&from=0" \
  --output /tmp/edgar.json

Or any HTTP client with Accept: application/json.
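For the "any HTTP client" route, a standard-library sketch might look like the following. The `fetch_json` name, the placeholder User-Agent string, the 30-second timeout, and the retry counts are all assumptions; the `opener` and `sleep` parameters exist only so the function can be exercised without network access.

```python
import json
import time
import urllib.error
import urllib.request

# Placeholder identity; substitute your own org and contact address.
USER_AGENT = "ExampleOrg contact@example.com"


def fetch_json(url, attempts=3, backoff=1.0,
               opener=urllib.request.urlopen, sleep=time.sleep):
    """GET a URL and decode the JSON body, retrying on 5xx responses."""
    request = urllib.request.Request(
        url, headers={"User-Agent": USER_AGENT, "Accept": "application/json"}
    )
    for attempt in range(attempts):
        try:
            with opener(request, timeout=30) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            # Re-raise 4xx immediately, and 5xx once the attempts are used up.
            if err.code < 500 or attempt == attempts - 1:
                raise
        # Exponential backoff between attempts: 1s, 2s, 4s, ...
        sleep(backoff * 2 ** attempt)
```

Retrying only on 5xx reflects the transient server errors described under Site-Specific Gotchas; a 5xx caused by an unpadded CIK will still fail after the retries, so validate inputs before sending.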

3. Parse the response

{
  "took": 1975,
  "timed_out": false,
  "_shards": { "total": 50, "successful": 50, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 281, "relation": "eq" },
    "max_score": 7.69,
    "hits": [ /* one entry per filing — see below */ ]
  },
  "aggregations": {
    "entity_filter":     { "buckets": [ { "key": "...", "doc_count": 2 }, ... ] },
    "sic_filter":        { "buckets": [ { "key": "6022", "doc_count": 57 }, ... ] },
    "biz_states_filter": { "buckets": [ { "key": "NY", "doc_count": 51 }, ... ] },
    "form_filter":       { "buckets": [ { "key": "10-K", "doc_count": 281 } ] }
  },
  "query": { /* the ES query echo (useful for debugging) */ }
}

  • hits.total.value is the total match count. relation: "eq" means the count is exact; "gte" appears when the count is capped (typically at 10,000), in which case value is only a lower bound. When you see gte, narrow your filters to get an exact count.
  • hits.hits[] is the page of matches (length ≤ 100 per request).
  • aggregations gives top-30 facet buckets for entity / sic / biz_states / form — useful for refining a too-wide query.
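As a rough illustration of reading the envelope, the sketch below pulls out the total, the capped flag, and the facet buckets. `summarize_envelope` is a hypothetical helper; it assumes the response has already been decoded into a Python dict with the shape shown above.

```python
def summarize_envelope(envelope):
    """Extract counts and facet buckets from a decoded search response."""
    total = envelope["hits"]["total"]
    facets = {
        # e.g. "sic_filter" -> [("6022", 57), ...]
        name: [(bucket["key"], bucket["doc_count"])
               for bucket in body.get("buckets", [])]
        for name, body in envelope.get("aggregations", {}).items()
    }
    return {
        "total_results": total["value"],
        # Anything other than "eq" means value is a lower bound.
        "total_is_capped": total["relation"] != "eq",
        "returned": len(envelope["hits"]["hits"]),
        "facets": facets,
    }
```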

Each hit:

{
  "_index": "edgar_file",
  "_id": "0000815097-24-000011:ccl-20231130.htm",
  "_score": 7.69,
  "_source": {
    "adsh":            "0000815097-24-000011",
    "ciks":            ["0000815097", "0001125259"],
    "display_names":   ["CARNIVAL CORP  (CCL)  (CIK 0000815097)", "CARNIVAL PLC  (CUK, CUKPF)  (CIK 0001125259)"],
    "form":            "10-K",
    "root_forms":      ["10-K"],
    "file_type":       "10-K",
    "file_description":"10-K",
    "file_date":       "2024-01-26",
    "period_ending":   "2023-11-30",
    "biz_states":      ["FL", "X0"],
    "inc_states":      ["DE"],
    "biz_locations":   ["Miami, FL", "Southampton So15 1st, X0"],
    "sics":            ["4400", "4400"],
    "file_num":        ["001-09610", "001-15136"],
    "film_num":        ["24564723", "24564724"],
    "items":           [],
    "sequence":        1,
    "xsl":             null
  }
}

Field notes:

  • _id has the structural format {adsh}:{filename} — split on the first :. The accession number is on the left; the filename of the exhibit that matched is on the right.
  • adsh is the accession number with hyphens. Drop them to get the directory name for the URL.
  • ciks[] is always zero-padded 10-digit. Multi-CIK hits arise for joint filings (co-filers in a 10-K) and for Form 4/3/5 hits where the filer is the reporting officer and the second CIK is the issuer. The first CIK in the array is the primary filer for URL purposes.
  • display_names[] is the human-readable filer label EDGAR shows in the UI (Company (TICKER) (CIK xxx)); each entry pairs 1-to-1 with ciks[].
  • form is the exact form code (with /A amendment suffix when applicable). root_forms[] is the parent (10-K covers 10-K/A, 8-K covers 8-K/A). forms= query filter matches against root_forms[].
  • file_date is not period_ending. Filing date is when EDGAR accepted the filing; period of report is the as-of date (e.g. fiscal year-end for a 10-K). For 8-Ks, period_ending is the event date.
  • biz_states[] / inc_states[] use 2-letter US state codes plus EDGAR's foreign codes: X0=England, X1=… (one of the international codes), A0/D0=…, XX/ZZ=other. Treat any code that's not in the US 50+DC+territories list as "foreign / unknown" for downstream consumers.
  • sics[] is the 4-digit SIC code per filer CIK. Multi-filer hits have one SIC entry per CIK.
  • items[] is populated only for 8-Ks — the list of 8-K event item codes ("2.02", "5.02", etc.). Empty for other forms.
  • _score is the relevance score from the underlying OpenSearch query. Default sort is _score desc.
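Two of the field notes lend themselves to small helpers: splitting `_id` and unpacking a `display_names[]` label. The sketch below assumes labels follow the `Company (TICKER) (CIK xxx)` shape described above and treats the ticker group as optional; the function names and the regular expression are illustrative, not something the API defines.

```python
import re

# Assumed label shape: "Company  (TICKER[, TICKER...])  (CIK 0000000000)".
DISPLAY_NAME = re.compile(
    r"^(?P<name>.*?)\s*(?:\((?P<tickers>[^()]*)\)\s*)?\(CIK (?P<cik>\d{10})\)$"
)


def split_hit_id(hit_id):
    """Split "{adsh}:{filename}" on the first colon."""
    adsh, _, filename = hit_id.partition(":")
    return adsh, filename


def parse_display_name(label):
    """Break a display label into name, tickers, and CIK."""
    match = DISPLAY_NAME.match(label.strip())
    if match is None:
        # Unknown shape: return the raw label rather than guessing.
        return {"name": label.strip(), "tickers": [], "cik": None}
    tickers = match.group("tickers")
    return {
        "name": match.group("name"),
        "tickers": [t.strip() for t in tickers.split(",")] if tickers else [],
        "cik": match.group("cik"),
    }
```

Because `display_names[]` pairs 1-to-1 with `ciks[]`, the parsed CIK can be cross-checked against the array entry at the same position.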

4. Build the canonical filing-index URL and the direct document URL

EDGAR's filing archive uses a deterministic directory layout under https://www.sec.gov/Archives/edgar/data/{cik_int}/{adsh_no_dashes}/:

source         = hit["_source"]                  # hit is one entry of hits.hits[]
adsh           = source["adsh"]                  # accession number with hyphens: "0000815097-24-000011"
cik_int        = int(source["ciks"][0])          # strip leading zeros: "0000815097" → 815097
adsh_no_dashes = adsh.replace("-", "")           # "0000815097-24-000011" → "000081509724000011"
filename       = hit["_id"].split(":", 1)[1]     # the part after the colon in the hit's _id

filing_index_url = f"https://www.sec.gov/Archives/edgar/data/{cik_int}/{adsh_no_dashes}/{adsh}-index.htm"
document_url     = f"https://www.sec.gov/Archives/edgar/data/{cik_int}/{adsh_no_dashes}/{filename}"

Both URLs verified live (HTTP 200) against https://www.sec.gov/Archives/edgar/data/815097/000081509724000011/0000815097-24-000011-index.htm and .../ccl-20231130.htm.

The filing-index page is small (~20 KB) and lists every exhibit in the filing. The direct document URL is the exact .htm or .txt that contained the matching phrase — typically the primary exhibit (the 10-K body, not Exhibit 21 subsidiaries unless that's where the term appeared).

If you need the filer's full filing history (not just this one hit), use the metadata-browse URL — that's a separate skill, but for completeness: https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={cik_int}&type={form_code}&dateb=&owner=include&count=40.

5. Paginate

Append &from=100, &from=200, … to walk pages of 100 hits. Stop when you've collected hits.total.value results, or when len(hits.hits) < 100 (last page).
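The paging loop can be sketched as a generator. `iter_hits` and its `fetch_page` callback are hypothetical: the callback is assumed to take a `from` offset and return the decoded response envelope for that page.

```python
def iter_hits(fetch_page, page_size=100, window=10_000):
    """Yield hits page by page until the result set or the ES window ends."""
    offset = 0
    # from + size may not exceed the window, so the last legal offset is 9900.
    while offset + page_size <= window:
        envelope = fetch_page(offset)
        hits = envelope["hits"]["hits"]
        yield from hits
        total = envelope["hits"]["total"]["value"]
        # Stop on a short page or once every reported hit has been seen.
        if len(hits) < page_size or offset + len(hits) >= total:
            return
        offset += page_size
```

Checking the running count against `hits.total.value` avoids one empty request when the total is an exact multiple of 100.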

For result sets larger than 10,000:

  • ES rejects from + size > 10000 (verified: returns 200 with errorType: ResponseError, errorMessage: "search_phase_execution_exception: ... Result window is too large, from + size must be less than or equal to: [10000]").
  • There is no documented scroll/search_after surface on this endpoint.
  • The workaround: narrow by dateRange (or forms / locationCodes / sics) and walk the narrower buckets sequentially. With sort=asc + a tight date window per page, you can sweep an unbounded corpus.
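One way to generate the narrower buckets is to cut the overall date range into consecutive, non-overlapping windows and query each with `sort=asc`. The helper below is a sketch; `date_windows` and the default 30-day width are assumptions, and a window that still reports `relation: "gte"` would need to be split further.

```python
from datetime import date, timedelta


def date_windows(start, end, days=30):
    """Yield (startdt, enddt) ISO strings covering [start, end] without overlap."""
    cursor = start
    while cursor <= end:
        # Both bounds are inclusive, so a window of N days ends N-1 days later.
        window_end = min(cursor + timedelta(days=days - 1), end)
        yield cursor.isoformat(), window_end.isoformat()
        cursor = window_end + timedelta(days=1)
```

Each pair maps directly onto `startdt` / `enddt`; because both bounds are inclusive on the API side, starting the next window one day after the previous end keeps filings from being counted twice.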

6. Return the structured result

For each hits.hits[i], surface:

{
  "accession_number":   "_source.adsh",
  "filer_cik":          "_source.ciks[0]",
  "filer_name":         "_source.display_names[0]",
  "co_filer_ciks":      "_source.ciks[1:]",
  "co_filer_names":     "_source.display_names[1:]",
  "form":               "_source.form",
  "root_form":          "_source.root_forms[0]",
  "file_date":          "_source.file_date",
  "period_ending":      "_source.period_ending",
  "items_8k":           "_source.items",
  "filer_biz_state":    "_source.biz_states[0]",
  "filer_inc_state":    "_source.inc_states[0]",
  "filer_sic":          "_source.sics[0]",
  "filer_biz_location": "_source.biz_locations[0]",
  "matched_file":       "_id.split(':', 1)[1]",
  "score":              "_score",
  "filing_index_url":   "(derived per §4)",
  "document_url":       "(derived per §4)"
}

Plus, at the result-set level: total_results: hits.total.value, total_is_capped: hits.total.relation != 'eq', and (optionally, when the caller wants to refine) the four aggregations.* buckets verbatim.
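The per-hit mapping above, including the URLs from §4, could be implemented roughly as follows. `flatten_hit` and `first` are hypothetical names; the sketch assumes `hit` is one decoded entry of `hits.hits[]` and falls back to `None` when an array field is empty or missing.

```python
ARCHIVE = "https://www.sec.gov/Archives/edgar/data"


def first(values):
    """Return the first element, or None for an empty or missing list."""
    return values[0] if values else None


def flatten_hit(hit):
    """Map one raw hit onto the structured record described above."""
    source = hit["_source"]
    adsh = source["adsh"]
    ciks = source["ciks"]
    names = source.get("display_names", [])
    filename = hit["_id"].split(":", 1)[1]
    # Directory layout: integer CIK, then the accession number without hyphens.
    folder = f"{ARCHIVE}/{int(ciks[0])}/{adsh.replace('-', '')}"
    return {
        "accession_number": adsh,
        "filer_cik": ciks[0],
        "filer_name": first(names),
        "co_filer_ciks": ciks[1:],
        "co_filer_names": names[1:],
        "form": source.get("form"),
        "root_form": first(source.get("root_forms")),
        "file_date": source.get("file_date"),
        "period_ending": source.get("period_ending"),
        "items_8k": source.get("items", []),
        "filer_biz_state": first(source.get("biz_states")),
        "filer_inc_state": first(source.get("inc_states")),
        "filer_sic": first(source.get("sics")),
        "filer_biz_location": first(source.get("biz_locations")),
        "matched_file": filename,
        "score": hit.get("_score"),
        "filing_index_url": f"{folder}/{adsh}-index.htm",
        "document_url": f"{folder}/{filename}",
    }
```

To exclude amendments, a caller can post-filter the flattened records on `form` (for example, keeping only records where `form == "10-K"`), since the `forms=` filter matches on the root form.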

Browser fallback

If the JSON endpoint is unreachable but www.sec.gov is:

  1. Open https://www.sec.gov/edgar/search/#/q=<URL-encoded-query>&forms=<csv>&dateRange=custom&startdt=...&enddt=... — the SPA is hash-routed (uses hasher.min.js); query parameters live after the #/, not after ?. The page loads with the form pre-populated and auto-submits.
  2. Wait for .divResultsContainer .result-section to render (the SPA injects results via jQuery after the same efts.sec.gov AJAX call you'd otherwise make directly).
  3. Each .result-section contains a <td class="filetype">{form}</td>, <td class="filed">{file_date}</td>, anchor <a class="preview-file" data-adsh="{adsh}" data-file-name="{filename}" href="...">, and a CIK column. Read data-adsh and data-file-name from the anchor and reconstruct the URLs per §4.
  4. Pagination uses the Next link in the SPA; clicking it triggers another AJAX call with from=. The same 10,000-result cap applies. Akamai protection applies to www.sec.gov (the SPA wrapper page itself), so a Verified + residential-proxy session is recommended for the SPA path. The efts.sec.gov API host has no Akamai/Cloudflare in front of it; a bare Browserbase cloud fetch from any IP returns 200 reliably.

The browser fallback offers no additional data versus the API — the SPA renders the exact same metadata fields the JSON returns. Use it only as a last resort.
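If the fallback is needed, the hash-routed URL from step 1 can be assembled as in the sketch below. `build_spa_url` is a hypothetical helper, and percent-encoding spaces as `%20` (rather than `+`) inside the fragment is an assumption, chosen because fragment handling of `+` is not described here.

```python
from urllib.parse import quote, urlencode

SPA_BASE = "https://www.sec.gov/edgar/search/#/"


def build_spa_url(q, forms=None, start=None, end=None):
    """Build the SPA URL; parameters live after '#/', not after '?'."""
    params = {"q": q}
    if forms:
        params["forms"] = ",".join(forms)
    if start and end:
        params.update({"dateRange": "custom", "startdt": start, "enddt": end})
    # quote (not quote_plus) encodes spaces as %20; commas are left as-is.
    return SPA_BASE + urlencode(params, safe=",", quote_via=quote)
```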

Site-Specific Gotchas

  • CIKs must be 10-digit zero-padded. ciks=320193 returns HTTP 500 {"message":"Internal server error"}; ciks=0000320193 works. Always str(cik).zfill(10) before sending. Multi-CIK: ciks=0000320193,0001652044.

  • locationCodes is plural — singular locationCode is silently ignored. Confirmed: locationCode=CA returned the unfiltered 281 hits for the climate-risk Q1 2024 query; locationCodes=CA returned 31 hits all with biz_states={"CA"}. The consumer SPA's internal locationCode is rewritten to locationCodes before sending (verified in https://www.sec.gov/edgar/search/js/edgar_full_text_search.js ~line 290: if(searchParams.locationCode && searchParams.locationCode!='all') searchParams.locationCodes = searchParams.locationCode;).

  • locationType defaults to located (business state), not incorporated. A naive locationCodes=DE query returns 2 hits (filings where the filer's office is in Delaware, very rare); add locationType=incorporated to switch to the much more populous inc_states (Delaware-incorporated entities — 111 hits on the same query). When locationType=incorporated is set, the result set's inc_states distribution often includes Maryland (MD) too — likely from REITs and entities cross-tagged via subsidiary CIKs; treat the filter as "all hits where at least one CIK has inc_states containing your code", not strict equality.

  • forms=10-K matches 10-K/A too (and 8-K matches 8-K/A). The filter is on root_forms. If you specifically need to exclude amendments, post-filter on _source.form for an exact match.

  • No in-document snippets are returned. Despite the match_phrase query on a hidden doc_text field, the response does not include ES highlight blocks (verified: tried highlight=true, hl=on, snippet=true — none take effect; the underlying query has _source.exclude: ["doc_text"] and no highlight clause). The consumer SPA itself does not display snippets — it shows filing metadata and a link out. If the caller asks for "the exact text that matched", you must fetch the document URL separately and grep client-side for the query phrase. Note that 10-K bodies routinely run several MB and exceed Browserbase's 1 MB cloud fetch cap — use a streaming HTTP client or open the document in a session for the text extraction step.

  • hits.total.relation: "gte" means the count is capped at 10,000. When you see this, your query is broader than 10k. Narrow with dateRange / forms / locationCodes / sics to get an exact count.

  • Deep pagination is bounded at from + size ≤ 10,000. With the fixed 100-per-page size, that's max from=9900. Beyond that, the API returns a 200 envelope with an OpenSearch error JSON: search_phase_execution_exception: [illegal_argument_exception] Reason: Result window is too large, from + size must be less than or equal to: [10000] but was [10090]. There is no search_after or scroll API exposed. Workaround: narrow with a filter (most commonly a tighter dateRange), and sweep using sort=asc to walk forward in time inside each window.

  • sort=desc|asc only — other variants are silently ignored. sortBy=date, order=date, etc. all fall back to default relevance sort without an error. Verify your sort is taking effect by inspecting the first 2-3 file_date values in the result.

  • q= is required-ish — empty q=&ciks=... works (browses filings for a CIK), but the endpoint expects either q or some other selector. q= with no other filters returns relevance-sorted results across the full 10,000+ corpus.

  • Boolean operators must be uppercase. AND, OR, NOT. Lowercase variants are treated as plain tokens. Parentheses are supported: q=(foo+OR+bar)+AND+%22exact+phrase%22.

  • entityName is fuzzy and multi-match. entityName=Apple returns 88 hits across ~10 distinct "Apple"-named entities (Apple Inc., Apple Hospitality REIT, Apple Green Holding, …). For precision, resolve to a specific CIK once and use ciks= thereafter.

  • sics= filtering is exact on 4-digit codes. Comma-separated multi works (sics=2834,2835,2836 for biotech-adjacent). The result set's hits will only contain those SICs in _source.sics[] — verified.

  • X0 and other 2-char codes in biz_states[] / inc_states[] are EDGAR's foreign location codes, not US states. X0 is England, D0 is Bermuda, A0 is Alberta, etc. The full table is on EDGAR's company-search help page; treat unknown codes as "international" rather than a state.

  • Transient 5xx (~36-byte error JSON) under load. Observed once during testing: a sics=2834 request returned a 36-byte {"message":"Internal server error"} body, then succeeded immediately on retry with the identical URL. Implement a retry-with-backoff of 2-3 attempts before propagating the error.

  • The API has CORS open (Access-Control-Allow-Origin: *) and no auth. A browser-extension or any cross-origin web client can call it directly.

  • No Akamai/Cloudflare on efts.sec.gov; Akamai IS present on www.sec.gov. A bare HTTP client gets efts.sec.gov JSON reliably. The consumer SPA host (www.sec.gov/edgar/search/) injects an Akamai detection script (/akam/13/{id} + bazadebezolkohpepadr variable) — under aggressive scraping this can challenge. The download URLs under www.sec.gov/Archives/edgar/data/... are not Akamai-protected and stream fine.

  • SEC fair-access policy: 10 req/s aggregate across all sec.gov subdomains. Stay under it. Identifying User-Agent: <YourOrg> <contact-email> is requested by policy but not enforced at the efts.sec.gov endpoint (a default Chrome UA returns 200).

  • _id filename can collide across different accessions in rare cases. Always carry the full _id (with the {adsh}: prefix) or the adsh field through downstream pipelines — never the filename alone — since two filings can have the same exhibit filename.

  • Index coverage starts 2001-05-04. A query with sort=asc and no date filter for "climate risk" on 10-Ks returned 2001-05-04 as the earliest hit, matching the SEC's published "full-text search since 2001" coverage statement. Older filings exist in EDGAR but are not in this index.
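Since the API returns no snippets, the "grep client-side" step from the gotchas above has to be done by the caller. The sketch below is one rough way to do it on an already-downloaded document: it strips tags with a regular expression (crude, but enough for locating a phrase), collapses whitespace, and returns a little context around each match. `find_snippets` and its `context` / `limit` parameters are hypothetical.

```python
import html
import re


def find_snippets(document_html, phrase, context=80, limit=5):
    """Return up to `limit` text windows around case-insensitive phrase matches."""
    # Replace tags with spaces so words split by inline markup stay separate.
    text = re.sub(r"<[^>]+>", " ", document_html)
    text = html.unescape(text)
    text = re.sub(r"\s+", " ", text)
    # Allow any run of whitespace between the words of the phrase.
    pattern = re.compile(
        r"\s+".join(re.escape(word) for word in phrase.split()), re.IGNORECASE
    )
    snippets = []
    for match in pattern.finditer(text):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(text))
        snippets.append(text[start:end].strip())
        if len(snippets) == limit:
            break
    return snippets
```

This matches the phrase literally, so it approximates an exact-phrase query only; boolean expressions would need each operand checked separately.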

Expected Output

A search returning at least one hit:

{
  "query":         "\"climate risk\"",
  "forms":         ["10-K"],
  "date_range":    { "start": "2024-01-01", "end": "2024-03-31" },
  "sort":          "relevance",
  "total_results": 281,
  "total_is_capped": false,
  "returned":      100,
  "from":          0,
  "next_from":     100,
  "filings": [
    {
      "accession_number":    "0000815097-24-000011",
      "filer_cik":           "0000815097",
      "filer_name":          "CARNIVAL CORP  (CCL)  (CIK 0000815097)",
      "co_filer_ciks":       ["0001125259"],
      "co_filer_names":      ["CARNIVAL PLC  (CUK, CUKPF)  (CIK 0001125259)"],
      "form":                "10-K",
      "root_form":           "10-K",
      "file_date":           "2024-01-26",
      "period_ending":       "2023-11-30",
      "items_8k":            [],
      "filer_biz_state":     "FL",
      "filer_inc_state":     "DE",
      "filer_sic":           "4400",
      "filer_biz_location":  "Miami, FL",
      "matched_file":        "ccl-20231130.htm",
      "score":               7.69,
      "filing_index_url":    "https://www.sec.gov/Archives/edgar/data/815097/000081509724000011/0000815097-24-000011-index.htm",
      "document_url":        "https://www.sec.gov/Archives/edgar/data/815097/000081509724000011/ccl-20231130.htm",
      "matched_snippet":     null
    }
  ],
  "aggregations": {
    "top_filers":   [{ "name": "Carlyle Group Inc.  (CG, CGABL)  (CIK 0001527166)", "count": 2 }],
    "top_sics":     [{ "code": "6022", "count": 57 }, { "code": "6021", "count": 41 }],
    "top_states":   [{ "code": "NY", "count": 51 }, { "code": "CA", "count": 31 }],
    "top_forms":    [{ "code": "10-K", "count": 281 }]
  }
}

A search returning zero hits:

{
  "query": "\"material weakness\"",
  "forms": ["8-K"],
  "ciks":  ["0000320193"],
  "total_results": 0,
  "total_is_capped": false,
  "returned": 0,
  "filings": []
}

A search where the total count is capped (narrow with filters to get an exact count):

{
  "query": "the",
  "forms": ["10-K", "8-K"],
  "total_results": 10000,
  "total_is_capped": true,
  "returned": 100,
  "from": 0,
  "next_from": 100,
  "filings": [ /* first 100 hits */ ],
  "note": "Result set exceeds 10,000. Narrow with dateRange / locationCodes / sics and re-query to get an exact count."
}

A search where the requested page is past the ES window:

{
  "query": "the",
  "from":  10000,
  "error": "result_window_exceeded",
  "message": "from + size must be less than or equal to 10000. Narrow filters and re-paginate within the narrower set."
}

A search where the CIK was passed un-padded (the input must be fixed before retrying):

{
  "error":   "invalid_cik",
  "message": "ciks must be 10-digit zero-padded; pad with leading zeros and retry. Received: '320193' → should be '0000320193'."
}
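The two error shapes above can be produced by a pre-flight CIK check and a check on the decoded 200 response. This is a sketch under stated assumptions: `precheck_ciks` and `classify_envelope` are hypothetical helpers, the `errorType` / `errorMessage` keys are the ones reported for the window error in §5, and the `upstream_error` label is invented here for any other error envelope.

```python
import re


def precheck_ciks(ciks):
    """Return an invalid_cik error for the first unpadded CIK, else None."""
    for cik in ciks:
        if not re.fullmatch(r"\d{10}", str(cik)):
            padded = str(cik).zfill(10)
            return {
                "error": "invalid_cik",
                "message": (
                    "ciks must be 10-digit zero-padded; pad with leading zeros "
                    f"and retry. Received: '{cik}' → should be '{padded}'."
                ),
            }
    return None


def classify_envelope(body):
    """Map an error envelope returned with HTTP 200 onto an output error."""
    if "errorType" not in body:
        return None  # Normal search response.
    message = body.get("errorMessage", "")
    if "Result window is too large" in message:
        return {
            "error": "result_window_exceeded",
            "message": (
                "from + size must be less than or equal to 10000. "
                "Narrow filters and re-paginate within the narrower set."
            ),
        }
    # Hypothetical catch-all label for any other error envelope.
    return {"error": "upstream_error", "message": message}
```

Running `precheck_ciks` before the request matters because an unpadded CIK and a transient server fault both surface as the same HTTP 500 body.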