Authority page

CalcFi Open Data.

A single landing page for every open distribution of CalcFi data: Kaggle, Hugging Face, datahub.io, data.world, DoltHub, MotherDuck, GitHub, PyPI, Anaconda, Read the Docs, Datasette, Streamlit, plus the BigQuery Analytics Hub listing. Everything ships under Creative Commons Attribution 4.0 International.

Last reviewed: 2026-06-04 · Canonical DOI: 10.6084/m9.figshare.32332290 · License: CC BY 4.0

What is CalcFi Open Data?

CalcFi Open Data is a curated, free, license-clean bundle of 34 financial and macro time series mirrored from US federal primary sources (FRED, BLS, Treasury, Freddie Mac, SSA) into a single consistent schema. The bundle ships as a CSV pack plus a Parquet pack, with companion clients in Python, JavaScript, and Julia for typed access. It is the data layer behind a large fraction of CalcFi calculators on the live site, exposed publicly so that journalists, researchers, students, and downstream developers can use the same numbers without rebuilding the ingest pipeline.

Why publish it as a separate dataset instead of leaving it locked inside the calculators? Because a site is a closed surface and a dataset is an open one. A reader can verify a CalcFi calculator result against the underlying series in one click, an academic can reuse the data under CC BY 4.0 without scraping, and a future product can build on the same source without re-implementing the ingest. The cost is small (publish once, mirror everywhere). The payoff is a credible, citable, long-lived data surface that anchors CalcFi as a source rather than just another tool.

The dataset is intentionally narrow. CalcFi ingests far more than 34 series for the live calculators (state-level salary data, city rent-vs-home comparisons, county-level cost-of-living data, IRS reference tables, SSA bend points, the full Treasury yield curve). The 34 series in the open bundle are the highest-reuse macro layer; the rest stays inside the application boundary. Future releases will widen the bundle as bandwidth allows.
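Because the bundle ships as plain CSV, it can be read without any CalcFi-specific tooling. The sketch below uses only the Python standard library and assumes a long layout with `series_id`, `date`, and `value` columns. That layout, the two series codes, and the sample values are illustrative assumptions, not the published schema; check the methodology docs for the real column names.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical sample in an ASSUMED long layout (series_id, date, value).
# The values are made up for illustration and are not real observations.
SAMPLE_CSV = """series_id,date,value
DGS10,2024-01-02,4.00
DGS10,2024-01-03,4.10
DGS2,2024-01-02,4.30
DGS2,2024-01-03,
"""


def load_series(text):
    """Group rows by series code, skipping rows whose value cell is empty."""
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        if row["value"] == "":
            continue  # treat an empty cell as a missing observation
        series[row["series_id"]].append(
            (date.fromisoformat(row["date"]), float(row["value"]))
        )
    return dict(series)


bundle = load_series(SAMPLE_CSV)
print(sorted(bundle))        # series codes found in the sample
print(len(bundle["DGS2"]))   # observations kept after dropping the empty row
```

To read a downloaded file instead of the inline sample, pass the file's text to `load_series`.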

Distribution mirrors

14 active mirrors across the major open-data registries. Each entry links to the live distribution page where you can download or install.

CalcFi Open Data (Hugging Face) Hugging Face
Canonical Hugging Face mirror of the CalcFi Open Data CSV + Parquet bundle. Loadable via the `datasets` library.
License: CC BY 4.0
datahub.io/calcfi — 20 dataset hub datahub.io
Twenty individually curated datasets on datahub.io. Each ships with datapackage.json metadata and a downloadable CSV.
License: CC BY 4.0
data.world/jerehere/calcfi-open-data data.world
data.world dataset with a queryable SQL workbook. Includes API access for journalists via the data.world REST API.
License: CC BY 4.0
DoltHub — git-for-data mirror DoltHub
Versioned Dolt repository (git-for-data). Diff-able row history across every published refresh.
License: CC BY 4.0
MotherDuck — cloud DuckDB share MotherDuck
Cloud-hosted DuckDB share. ATTACH from any DuckDB client for sub-second analytical queries.
License: CC BY 4.0
GitHub — calcfi-open-data (canonical source repo) GitHub
Source repository the other mirrors are derived from. CSV + Parquet under `data/`, methodology under `docs/`.
License: CC BY 4.0
GitHub — dbt-calcfi-open-data GitHub
dbt project that materializes the CalcFi Open Data series as warehouse tables (BigQuery, Snowflake, DuckDB).
License: CC BY 4.0
PyPI — calcfidata package PyPI
Python package. `pip install calcfidata`. Returns pandas DataFrames keyed by FRED-style series codes.
License: CC BY 4.0
Anaconda — calcfidata channel Anaconda
Conda channel for the calcfidata Python package. Installable via `conda install -c jeresalmisto calcfidata`.
License: CC BY 4.0
Read the Docs — calcfidata.readthedocs.io Read the Docs
Sphinx-built reference documentation. Series catalog, refresh cadence, methodology, citation guide.
License: CC BY 4.0
Datasette — live SQL queries Datasette (Vercel)
Live Datasette instance. Five saved queries (yield curve, mortgage rate, CPI vs PCE, etc.) with JSON / CSV export.
License: CC BY 4.0
Streamlit — yield-curve recession indicator Streamlit Cloud
Interactive Streamlit app rendering the 2s/10s spread with recession-band shading.
License: CC BY 4.0
Streamlit — mortgage rate today Streamlit Cloud
Live 30-year fixed mortgage rate visualizer backed by the Freddie Mac PMMS series.
License: CC BY 4.0
Streamlit — CPI / PCE inflation tracker Streamlit Cloud
Side-by-side CPI and PCE inflation chart with the Fed 2% target overlay.
License: CC BY 4.0
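For readers who want to reproduce the quantity behind the yield-curve app, the 2s/10s spread is conventionally the 10-year Treasury yield minus the 2-year yield, and a negative value means the curve is inverted. The sketch below shows that arithmetic only; the function names are hypothetical and the yields are made-up placeholders, not values from the dataset.

```python
def spread_2s10s(two_year, ten_year):
    """Return the 10-year minus 2-year yield spread, in percentage points."""
    return round(ten_year - two_year, 4)


def inverted_dates(observations):
    """Return the dates on which the spread is negative (curve inverted)."""
    return [
        day
        for day, two_year, ten_year in observations
        if spread_2s10s(two_year, ten_year) < 0
    ]


# (date, 2-year yield %, 10-year yield %): illustrative numbers only.
observations = [
    ("2024-01-02", 4.30, 4.00),
    ("2024-01-03", 4.00, 4.10),
]

for day, two_year, ten_year in observations:
    print(day, spread_2s10s(two_year, ten_year))

print(inverted_dates(observations))
```

The app's recession-band shading needs a separate list of recession dates, which this sketch does not cover.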

Per-series datasets on Kaggle

24 individually citable Kaggle datasets, each covering a single series. Useful when you only need one time series and want to cite the per-series Kaggle DOI rather than the bundle.

Interactive apps (Hugging Face Spaces)

10 Gradio apps backed by the dataset. Each is open source and reproducible.

Source code repositories

Pipeline code is open source. Two repositories cover the canonical data bundle and the dbt warehouse models that downstream pipelines can subscribe to.

calcfi-open-data (GitHub)
Canonical source repository with CSV + Parquet under `data/`, methodology under `docs/`, ingest under `scripts/`. CI auto-mints a Zenodo Software DOI on each version tag.
dbt-calcfi-open-data (GitHub)
dbt project that materializes the CalcFi Open Data series as warehouse tables. Targets BigQuery, Snowflake, Redshift, and DuckDB out of the box.

BigQuery Analytics Hub listing

The dataset is listed on Google Cloud BigQuery Analytics Hub for enterprise subscribers that prefer an in-warehouse subscription model. Search for “CalcFi Open Data” inside Analytics Hub or contact hello@calcfi.app for the listing URL.

FAQ

What license covers CalcFi Open Data?

Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, adapt, redistribute, and reuse the dataset for any purpose including commercial use, provided you credit the original author, link to the license, and indicate any changes you made. Recommended attribution: "Salmisto, J. (2026). CalcFi Open Data. Figshare. DOI 10.6084/m9.figshare.32332290".
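The three attribution requirements (credit, license link, indication of changes) can be bundled into one reusable string. The helper below is a minimal sketch: the function name and output wording are hypothetical, and only the citation text, the DOI, and the license URL come from the sources named above.

```python
LICENSE_URL = "https://creativecommons.org/licenses/by/4.0/"
DOI = "10.6084/m9.figshare.32332290"
CITATION = f"Salmisto, J. (2026). CalcFi Open Data. Figshare. DOI {DOI}."


def attribution(changes=None):
    """Build an attribution line: credit, license link, and change note."""
    parts = [CITATION, f"Licensed under CC BY 4.0 ({LICENSE_URL})."]
    if changes:
        parts.append(f"Changes: {changes}.")
    else:
        parts.append("No changes made.")
    return " ".join(parts)


print(attribution())
print(attribution("resampled to monthly averages"))
```

Placing the resulting line in a chart footer or a README keeps the credit next to the reused data.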

How do I install the Python package?

Run `pip install calcfidata`. The package exposes pandas DataFrames keyed by FRED-style series codes. The same package is published to the Anaconda channel under jeresalmisto/calcfidata for conda-managed environments. Documentation is on Read the Docs at calcfidata.readthedocs.io.
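After installing, a quick way to confirm that the package is visible to the current interpreter is to query its distribution metadata. This sketch uses only the standard library's `importlib.metadata` and deliberately does not call any `calcfidata` functions, since the package's own API is documented on Read the Docs rather than here. The helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="calcfidata"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("calcfidata is not installed in this environment")
else:
    print(f"calcfidata {found} is installed")
```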

Which mirror should I use?

For research papers, cite the Figshare canonical DOI 10.6084/m9.figshare.32332290 and download from whichever mirror is most convenient. For data-science workflows, Kaggle and Hugging Face are the most integrated. For SQL analytics, MotherDuck (cloud DuckDB) gives sub-second queries; data.world has a SQL workbook. For git-style versioned data, use DoltHub. For static CSV downloads with datapackage.json, use datahub.io. All mirrors carry identical content under the same license.

How often does the dataset refresh?

The pipeline pulls from primary sources on each source agency's native cadence: nightly for Treasury and FRED, weekly for Freddie Mac PMMS, monthly for BLS CPI and CES, quarterly for BEA national accounts, and annually for IRS and SSA reference tables. Refreshes propagate to the canonical Figshare DOI on each release cycle; downstream mirrors are re-synced shortly after. See the data sources page for the refresh schedule per agency.
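Downstream users who cache a mirror may want a simple freshness check against these cadences. The sketch below is one way to express it: the source keys and the day tolerances are assumptions chosen for illustration (with slack for weekends and publication lag), not thresholds published by CalcFi.

```python
from datetime import date

# Assumed maximum age in days before a cached series counts as stale.
# Cadences follow the FAQ text; the exact tolerances are hypothetical.
MAX_AGE_DAYS = {
    "treasury": 4,           # nightly, allowing for weekends and holidays
    "fred": 4,               # nightly
    "freddie_mac_pmms": 10,  # weekly
    "bls": 45,               # monthly
    "bea": 120,              # quarterly
    "irs": 400,              # annual
    "ssa": 400,              # annual
}


def is_stale(source, last_observation, today):
    """Return True if the latest observation is older than the tolerance."""
    age_days = (today - last_observation).days
    return age_days > MAX_AGE_DAYS[source]


today = date(2024, 1, 10)
print(is_stale("freddie_mac_pmms", date(2024, 1, 4), today))  # 6 days old
print(is_stale("fred", date(2024, 1, 2), today))              # 8 days old
```

A stale result is a prompt to re-download from a mirror, not proof that the upstream agency has published new data.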

Can I embed CalcFi data in a commercial product?

Yes. CC BY 4.0 explicitly permits commercial reuse. The only requirements are attribution back to the canonical source and an indication of any modifications. For embedded use cases (dashboards, internal reports, customer-facing widgets), the API or the PyPI package is the lowest-friction integration; the bulk CSV download is best for offline analysis.