Methodology
How SecFilingDex works
SecFilingDex turns the U.S. Securities and Exchange Commission’s EDGAR system, a sprawling, hard-to-search trove of public-domain corporate filings, into a programmatic, citation-grade database. This page gives the full account of how that pipeline works, where the data comes from, what we add on top of EDGAR, and where the editorial line sits.
Data sources
Every filing on SecFilingDex is sourced from the SEC’s EDGAR system via the public bulk-data APIs:
- EDGAR submissions JSON (data.sec.gov/submissions/CIK*.json) for filer-level metadata: company name, CIK, SIC code, fiscal-year-end, recent filing history.
- EDGAR full-text search (efts.sec.gov/LATEST/search-index) for full filing discovery and incremental indexing.
- EDGAR form indexes for form-type taxonomy (10-K, 10-Q, 8-K, 13F, 13D/G, S-1, Proxy DEF 14A, Form 4, 20-F, 6-K).
- SIC code dictionary from the SEC’s Division of Corporation Finance for industry classification.
SEC EDGAR is U.S. federal-government public-domain data; republishing it in a structured form is explicitly allowed under the SEC’s fair-access policy, provided requesters identify themselves and stay within rate limits. We comply with both.
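As a rough illustration of what "identify themselves and stay within rate limits" can look like for a client of the submissions endpoint, here is a minimal sketch. It is not SecFilingDex's actual fetcher. The User-Agent string, the rate ceiling, and the names submissions_url, Throttle, and build_request are all assumptions made for the example; the only detail taken from EDGAR itself is that the CIK in the URL is zero-padded to 10 digits.

```python
import time
import urllib.request

# Assumption: replace with your own organisation name and contact address.
USER_AGENT = "ExampleOrg research-bot admin@example.com"
# Assumption: a deliberately conservative ceiling; check the SEC's
# currently published limit before choosing a real value.
MAX_REQUESTS_PER_SECOND = 5


def submissions_url(cik: int) -> str:
    """Build the submissions URL; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{cik:010d}.json"


class Throttle:
    """Spaces calls so that at most `rate` of them start per second."""

    def __init__(self, rate: float) -> None:
        self.min_interval = 1.0 / rate
        self._last = float("-inf")  # no previous call yet

    def wait(self) -> None:
        delay = self._last + self.min_interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()


def build_request(cik: int) -> urllib.request.Request:
    """Prepare (but do not send) a self-identifying request."""
    return urllib.request.Request(
        submissions_url(cik), headers={"User-Agent": USER_AGENT}
    )
```

To actually fetch, a caller would create one Throttle(MAX_REQUESTS_PER_SECOND), call its wait() before each request, and pass build_request(cik) to urllib.request.urlopen. Keeping request construction separate from sending makes the identification header easy to check without touching the network.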
Indexing cadence
SecFilingDex refreshes the underlying filing dataset on a rolling schedule: new filings are picked up within hours of appearing on EDGAR; filer metadata refreshes weekly; SIC taxonomy refreshes monthly. Each filing page’s schema.org/dateModified + the corresponding sitemap lastmod reflect the last time SecFilingDex verified the underlying EDGAR record, not the filer’s original filing date (which is preserved separately as filedAt).
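The cadence and the dateModified / filedAt split described above can be sketched roughly as follows. This is an illustrative model only: the weekly and monthly intervals come from the text, while the one-hour interval for new filings, the 30-day month, and the names REFRESH_INTERVALS, FilingRecord, is_due, and mark_verified are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Weekly and monthly come from the cadence described above; the one-hour
# figure for new filings and the 30-day month are assumptions.
REFRESH_INTERVALS = {
    "new_filings": timedelta(hours=1),
    "filer_metadata": timedelta(weeks=1),
    "sic_taxonomy": timedelta(days=30),
}


@dataclass(frozen=True)
class FilingRecord:
    accession: str
    filed_at: datetime       # the filer's original filing date
    date_modified: datetime  # last time the record was verified against EDGAR


def is_due(kind: str, last_run: datetime, now: datetime) -> bool:
    """True when the given refresh job has waited at least its interval."""
    return now - last_run >= REFRESH_INTERVALS[kind]


def mark_verified(record: FilingRecord, now: datetime) -> FilingRecord:
    """Return a copy with date_modified bumped and filed_at left untouched."""
    return FilingRecord(record.accession, record.filed_at, now)
```

The point of the frozen dataclass is that re-verification produces a new record rather than mutating the old one, so the original filing date cannot be overwritten by accident.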
For applications that need real-time filing alerts, EDGAR itself is the canonical source: we don’t claim sub-second freshness, and we don’t want to be the source of record for trading decisions. Our value is in the structured surface, the cross-filer relationship graph, and the citation-grade markup, not in a real-time race to publish.
Taxonomy decisions
Every filing on SecFilingDex is categorised along four orthogonal axes, each of which gets its own indexable surface:
- Form type (10-K, 10-Q, 8-K, 13F-HR, etc.): the kind of filing it is. Each form type has a plain-English explainer at /learn/[form-type] and a per-form aggregator at /form/[formType]/.
- Filer (CIK + company name + ticker if any). Per-filer indexes at /filer/[cik]/ show every filing we have for that filer.
- Industry (SIC code). Per-industry indexes at /industry/[sicCode]/ show every filer in that SIC sector and their cumulative filing history.
- Date (filed timestamp). Recency-weighted ranking surfaces recent filings on aggregator pages.
The choice of four axes (form / filer / industry / date) is deliberate — it’s the minimum that lets users find a filing without knowing the accession number, and the maximum that doesn’t collapse into a confusing matrix of redundant pages.
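To make the four axes concrete, the sketch below maps one filing onto the URL patterns listed above and orders filings newest first. It is a guess at the shape, not the site's routing code. In particular, lower-casing the form type into a slug, the function names index_paths and newest_first, and treating "recency-weighted" as a plain sort by filed date are all assumptions.

```python
from datetime import date


def index_paths(form_type: str, cik: int, sic_code: str) -> dict[str, str]:
    """Return the index surfaces a single filing would appear under."""
    # Assumption: slugs are the lower-cased form type with spaces as hyphens.
    form_slug = form_type.lower().replace(" ", "-")
    return {
        "form": f"/form/{form_slug}/",
        "filer": f"/filer/{cik}/",
        "industry": f"/industry/{sic_code}/",
        "explainer": f"/learn/{form_slug}",
    }


def newest_first(filings: list[tuple[str, date]]) -> list[tuple[str, date]]:
    """Order (accession, filed date) pairs by filed date, most recent first."""
    return sorted(filings, key=lambda item: item[1], reverse=True)
```

A call such as index_paths("10-K", 320193, "3571") yields one path per axis, which is what lets a reader reach the filing by form, filer, or industry without knowing its accession number. The CIK and SIC values here are sample inputs only.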
Structured-data approach
Every page on SecFilingDex emits JSON-LD structured data:
- Filing pages emit Article + Dataset schema with full filer identity, form type, filed date, accession number, and a link to the EDGAR source. The Article author is SecFilingDex (we wrote the page); the underlying filing authorship is the filer (the company that filed it) and we mark that explicitly via the about field.
- Filer pages emit Organization + Dataset schema describing the filer’s identity and filing history.
- Industry pages emit CollectionPage + Dataset schema with the SIC code, sector name, and constituent filer count.
- Form-type explainer pages (/learn/[form-type]) emit Article + DefinedTerm schema for citation by AI agents looking up form-type definitions.
- Twin JSON endpoints at /api/filing/[accession].json mirror every filing page in machine-readable form (Aleyda Solis “Extractable” LLM-citation characteristic).
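For the filing-page case, the markup described above might look roughly like the output of the sketch below. The property choices are assumptions rather than the site's actual schema: the text only confirms the Article + Dataset types, SecFilingDex as author, the filer in the about field, and a citation field pointing at EDGAR. Using identifier for the accession number and dateCreated for the filed date, along with the names filing_jsonld and twin_endpoint, is hypothetical.

```python
import json


def filing_jsonld(accession: str, form_type: str, filed_date: str,
                  filer_name: str, cik: int, edgar_url: str) -> dict:
    """Build an illustrative JSON-LD document for one filing page."""
    return {
        "@context": "https://schema.org",
        "@type": ["Article", "Dataset"],
        "name": f"{filer_name} {form_type} ({accession})",
        "identifier": accession,      # assumption: accession as identifier
        "dateCreated": filed_date,    # assumption: where the filed date goes
        # The page itself is written by SecFilingDex...
        "author": {"@type": "Organization", "name": "SecFilingDex"},
        # ...while the filing's own authorship (the filer) sits in `about`.
        "about": {
            "@type": "Organization",
            "name": filer_name,
            "identifier": str(cik),
        },
        "citation": edgar_url,        # link back to the EDGAR source
    }


def twin_endpoint(accession: str) -> str:
    """Path of the machine-readable twin for a filing page."""
    return f"/api/filing/{accession}.json"


def to_script_body(doc: dict) -> str:
    """Serialise for embedding in a script tag of type application/ld+json."""
    return json.dumps(doc, indent=2, ensure_ascii=False)
```

Separating author from about is the part worth copying: it keeps "who wrote this page" and "whose filing this is" as two distinct, machine-readable claims.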
Source attribution — we don’t hide where this came from
Every filing page on SecFilingDex links back to the original EDGAR source in the page header, the body, the JSON-LD citation field, and the API endpoint. SecFilingDex does NOT claim to be the source of record for SEC filings — EDGAR is, and we treat that as the authoritative reference. Our value is in the surface (search, taxonomy, cross-references, structured-data twins), not in re-issuing the filings.
When this matters: if a discrepancy appears between SecFilingDex and EDGAR (e.g., a filer’s name updates on EDGAR before our next scheduled refresh catches it), EDGAR wins. Our pages may lag the source by a few hours for new filings and by up to a week for filer metadata; users who need millisecond-current SEC data should consume EDGAR’s API directly.
Editorial vs. data-display boundary
SecFilingDex publishes raw filing data + plain-English form-type explainers + taxonomic indexes. We do NOT:
- Issue verdict labels on filers (no “Buy AAPL on the basis of this 13F”, no “Sell” recommendations — we don’t hold licensed investment-advisor credentials).
- Predict price movements from filing patterns.
- Make investment recommendations of any kind.
- Aggregate filings into “insider trading signals” or similar actionable claims.
Our editorial work is the form-type explainers at /learn/ (what a 10-K is, what 13F deadlines mean, why 8-K matters in M&A), the taxonomy decisions above, and the per-page structured-data wiring. Everything else is mechanical republishing of public-domain government data, marked as such in the schema.
Corrections + takedowns
Spot a factual error on any SecFilingDex page? Email [email protected] with the URL of the page and a description of the issue. Corrections post within 5 business days; the page’s dateModified updates accordingly so search engines and LLMs see the freshness signal.
SEC EDGAR filings are in the public domain. SecFilingDex republishes them under that classification with full attribution. If you are an EDGAR filer and want a page on SecFilingDex amended (e.g., your filer name updated after a corporate name change), email us; we update within 48 hours of verification. SecFilingDex does not field takedown requests for filings on the basis of substantive content; that is between the filer and the SEC.