FR-Databricks · Delta Sharing · v1

Filings in your lakehouse,
via Delta Sharing.

The full FinancialReports corpus delivered through the open Delta Sharing protocol. Read directly from Databricks, Spark, pandas, or any compatible client — no data duplication, no ETL to build.

Protocol: Delta Sharing (open)
Refresh: Daily · incremental
Coverage: G20 + Europe · 66 sources
Auth: Bearer token · per-client share
Formats: Delta tables · Parquet-backed
Access: Enterprise · per agreement
01 · TL;DR

What Delta Sharing delivery is, in three lines.

Same data as the API and S3 bucket. Delivered via the open Delta Sharing protocol — works with Databricks Unity Catalog, Apache Spark, pandas, and any Delta Sharing-compatible client.

What you get

A share profile and a bearer token.

Delta tables for filings, companies, and reference data. In Databricks, they appear as a catalog in Unity Catalog. Outside Databricks, use the open-source delta-sharing Python client or any compatible connector.

Who it's for

Teams running Spark or lakehouse architectures.

ML engineers training models on filing text. Data engineers building medallion-architecture pipelines. Quant teams running PySpark notebooks. If your compute is Spark-native, Delta Sharing is the zero-copy path.

How it's billed

Annual subscription. You pay your compute.

The data subscription is flat. You pay Databricks (or your Spark vendor) for compute. Delta Sharing reads are serverless on the provider side — no warehouse to spin up on our end.

02 · S3 vs Snowflake vs Databricks

Three bulk channels, one dataset.

Choose by infrastructure fit. All three share the same filing IDs and schema — you can mix channels or migrate between them without data re-mapping.

| Dimension | S3 bulk | Snowflake | Databricks |
|---|---|---|---|
| Setup effort | IAM role + bucket policy. ~1 hour. | Accept Marketplace listing. ~5 min. | Low: import share profile. ~5 min. |
| Query engine | Flexible: DuckDB, Athena, Spark, pandas. | Snowflake SQL only. | Flexible: Spark, pandas, R, any Delta client. |
| Best for ML / LLM training | Best: raw Markdown files, direct S3 read. | Metadata yes. Blob export is awkward. | Strong: PySpark to model pipeline. Native. |
| Lakehouse integration | Mount or external table. Manual. | Not lakehouse-native. | Best: Unity Catalog. First-class citizen. |
| Cross-platform | Any S3-compatible client. | Snowflake accounts only. | Open: open protocol. Not Databricks-only. |
| Latency | Daily batch (~24 h). | Hourly: ~1–2 h via Snowpipe. | Daily batch (~24 h). |
03 · Quickstart

From zero to your first query.

Four steps. Works with Databricks Unity Catalog or the open-source delta-sharing Python package. No cloud-specific setup required on your side.

Step 1 · Request access

Contact us or email [email protected]. We'll provision a Delta Sharing share scoped to your contract and send you a share profile file (.share).

Step 2 · Import the share profile

Databricks: Add the share as a provider in Unity Catalog via Data → Delta Sharing → Providers. Open-source: Save the .share file and point the Delta Sharing client at it.

Step 3 · Create a catalog (Databricks) or load tables (OSS)

Databricks: Create a catalog from the share — tables appear in Unity Catalog alongside your own data. OSS: Use delta_sharing.load_as_spark() or delta_sharing.load_as_pandas().

Step 4 · Query

The tables are live: new filings appear daily. Query with SQL, PySpark, or pandas. See the examples below.

Databricks · SQL
```sql
-- After creating a catalog from the share:

-- Latest German annual reports
SELECT
  filing_id,
  company_name,
  title,
  release_datetime,
  fiscal_year
FROM financialreports.default.filings
WHERE country_code = 'DE'
  AND filing_type_code = '10-K'
  AND release_datetime >= DATE_SUB(CURRENT_DATE(), 365)
ORDER BY release_datetime DESC
LIMIT 25;

-- Cross-catalog join to your own tables
SELECT
  h.portfolio,
  f.company_name,
  f.title,
  f.release_datetime
FROM financialreports.default.filings f
JOIN my_catalog.default.holdings h
  ON f.company_id = h.fr_company_id
WHERE f.release_datetime >= DATE_SUB(CURRENT_DATE(), 7)
ORDER BY f.release_datetime DESC;
```
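Step 4 says new filings land daily, so a downstream job usually wants only the rows it has not processed yet. Here is a minimal sketch of one way to do that. It keeps a high-water mark on `release_datetime` plus the `filing_id`s already handled at that exact instant, so rows sharing a timestamp are neither skipped nor duplicated. `SyncState` and `next_batch` are hypothetical helpers, not part of any FinancialReports or Delta Sharing API. They work on plain dicts so the logic runs anywhere, and they assume no filing ever arrives with a `release_datetime` earlier than one already processed.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, List, Set, Tuple


@dataclass
class SyncState:
    watermark: datetime                                      # latest release_datetime processed
    ids_at_watermark: Set[int] = field(default_factory=set)  # filing_ids already seen at that instant


def next_batch(rows: Iterable[dict], state: SyncState) -> Tuple[List[dict], SyncState]:
    """Return unprocessed filings in release order, plus the advanced state."""
    fresh = [
        r for r in rows
        if r["release_datetime"] > state.watermark
        or (r["release_datetime"] == state.watermark
            and r["filing_id"] not in state.ids_at_watermark)
    ]
    fresh.sort(key=lambda r: r["release_datetime"])  # stable sort: ties keep input order
    if not fresh:
        return fresh, state
    top = fresh[-1]["release_datetime"]
    ids = {r["filing_id"] for r in fresh if r["release_datetime"] == top}
    if top == state.watermark:
        ids |= state.ids_at_watermark  # same instant as before: remember earlier IDs too
    return fresh, SyncState(top, ids)
```

In Spark, you would first push the coarse filter down, for example `filings.filter(filings.release_datetime >= state.watermark)`, and then apply the tie handling to that much smaller result.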
04 · Querying

Three clients, same data.

Delta Sharing is an open protocol. Use it from Databricks SQL, PySpark notebooks, or any machine with the delta-sharing Python package. No Databricks account required for the open-source client.

Databricks SQL · top issuers by disclosure volume, YTD
```sql
-- Unity Catalog: shared tables are first-class
SELECT
  company_name,
  country_code,
  COUNT(*) AS n_filings,
  MAX(release_datetime) AS latest
FROM financialreports.default.filings
WHERE release_datetime >= DATE_TRUNC('YEAR', CURRENT_DATE())  -- year to date
  AND country_code IN ('DE', 'FR', 'GB')
GROUP BY company_name, country_code
ORDER BY n_filings DESC
LIMIT 20;
```
PySpark · notebook workflow
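```python
# In a Databricks notebook with a catalog created from the share.
# Outside Databricks, load the table with delta_sharing.load_as_spark() instead.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # returns the notebook's existing session
filings = spark.table("financialreports.default.filings")

# Filter to European annual reports for fiscal year 2025
eu_annuals = (filings
  .filter("country_code IN ('DE','FR','GB','IT','ES')")
  .filter("filing_type_code = '10-K'")
  .filter("fiscal_year = 2025"))

# Your own holdings table, keyed by FinancialReports company ID
holdings = spark.table("my_catalog.default.holdings")

(eu_annuals.join(holdings, eu_annuals["company_id"] == holdings["fr_company_id"], "inner")
           .select(eu_annuals["company_name"], eu_annuals["title"], eu_annuals["release_datetime"])
           .orderBy("release_datetime", ascending=False)
           .show(20, truncate=False))
```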
Python · delta-sharing open-source client
```python
import delta_sharing

# Works anywhere — no Databricks account needed
profile = "financialreports.share"

# List available tables
client = delta_sharing.SharingClient(profile)
tables = client.list_all_tables()
for t in tables:
    print(f"{t.share}.{t.schema}.{t.name}")

# Load filings into pandas
table_url = f"{profile}#financialreports.default.filings"
df = delta_sharing.load_as_pandas(table_url)

# Filter locally
de_annuals = df[
    (df["country_code"] == "DE") &
    (df["filing_type_code"] == "10-K")
].sort_values("release_datetime", ascending=False).head(25)
```
05 · Delta Sharing protocol

Open standard. No vendor lock-in.

Delta Sharing is an open protocol from the Delta Lake project, hosted by the Linux Foundation. The data is Parquet-backed, the API is REST, and the client libraries are open source. You can read FinancialReports data from any compatible tool, not just Databricks.

Standard

Linux Foundation, Apache 2.0.

The protocol spec, reference server, and all client libraries are open source under the Linux Foundation's Delta Lake project. No proprietary extensions required.

Clients

Python, Spark, pandas, R, Rust.

The open-source delta-sharing Python package works anywhere: laptops, CI/CD, cloud VMs. Databricks Unity Catalog provides native integration. Apache Spark reads shares through the open-source Delta Sharing Spark connector.

Under the hood

REST API with pre-signed Parquet URLs.

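The sharing server returns pre-signed URLs to Parquet files. Your client downloads directly from object storage, so no data passes through a proxy. The protocol lets clients send predicate and limit hints that the server can use to skip files, and the columnar Parquet format lets clients read only the columns they need.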
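To make the pre-signed-URL flow concrete, here is a minimal sketch of the two halves of a table query, based on our reading of the open Delta Sharing protocol specification. The first half builds the authenticated `POST {endpoint}/shares/{share}/schemas/{schema}/tables/{table}/query` request, and the second extracts file URLs from the newline-delimited JSON response. `build_query_request` and `file_urls` are illustrative names, the endpoint used in testing is a placeholder, and no network call is made. In practice, the official clients do all of this for you.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import quote


def build_query_request(profile: dict, share: str, schema: str, table: str,
                        json_predicate: Optional[dict] = None,
                        limit_hint: Optional[int] = None) -> Tuple[str, dict, bytes]:
    """Return (url, headers, body) for a Delta Sharing table query (POST)."""
    endpoint = profile["endpoint"].rstrip("/")
    url = (f"{endpoint}/shares/{quote(share)}/schemas/{quote(schema)}"
           f"/tables/{quote(table)}/query")
    headers = {
        "Authorization": f"Bearer {profile['bearerToken']}",
        "Content-Type": "application/json; charset=utf-8",
    }
    body = {}
    if json_predicate is not None:
        # Sent as a JSON-encoded string. The server may use it to skip files
        # but is not required to, so clients still filter rows themselves.
        body["jsonPredicateHints"] = json.dumps(json_predicate)
    if limit_hint is not None:
        body["limitHint"] = limit_hint
    return url, headers, json.dumps(body).encode("utf-8")


def file_urls(ndjson: str) -> List[str]:
    """Collect pre-signed Parquet URLs from a newline-delimited JSON response.

    Each line is one action: protocol, metaData, then one line per file.
    """
    urls = []
    for line in ndjson.splitlines():
        if line.strip():
            action = json.loads(line)
            if "file" in action:
                urls.append(action["file"]["url"])
    return urls
```

Any HTTP client can send the request. Each returned URL is then read with a Parquet reader before it expires, and choosing columns at that point is where column pruning happens.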

06 · Security

Token-scoped, encrypted, audit-logged.

Every share is scoped to the tables and columns in your contract. Bearer tokens are rotatable, and all access is logged.

Authentication

Bearer token per share profile.

Each client gets a unique .share profile with a bearer token. Tokens are rotatable — request a new one anytime without disrupting queries in flight.
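Because every client authenticates with its own `.share` profile, a scheduled job can check the profile before querying. That way it fails with a clear message instead of an authentication error partway through a run. Below is a minimal standard-library sketch. It assumes the profile uses the open protocol's JSON fields (`shareCredentialsVersion`, `endpoint`, `bearerToken`, and the optional `expirationTime`). `load_share_profile` is a hypothetical helper; the official delta-sharing client parses profiles on its own.

```python
import json
import re
from datetime import datetime, timezone
from typing import Optional

# Field names follow the open Delta Sharing profile file format.
REQUIRED_FIELDS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def _parse_iso8601(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2027-01-01T00:00:00.0Z."""
    ts = ts.replace("Z", "+00:00")
    # Normalize fractional seconds to 6 digits so Python < 3.11 accepts them.
    ts = re.sub(r"\.(\d+)", lambda m: "." + (m.group(1) + "000000")[:6], ts)
    parsed = datetime.fromisoformat(ts)
    # Treat a timestamp without an offset as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def load_share_profile(text: str, now: Optional[datetime] = None) -> dict:
    """Parse .share profile JSON and fail fast if it is incomplete or expired."""
    profile = json.loads(text)
    missing = [k for k in REQUIRED_FIELDS if k not in profile]
    if missing:
        raise ValueError(f"profile is missing fields: {missing}")
    expires = profile.get("expirationTime")  # optional in the profile format
    if expires:
        now = now or datetime.now(timezone.utc)
        if _parse_iso8601(expires) <= now:
            raise ValueError(f"bearer token expired at {expires}; request a rotated profile")
    return profile
```

Typical use is `load_share_profile(open("financialreports.share").read())` at the start of a job. After a token rotation, the check passes again as soon as the new profile file replaces the old one.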

Scope

Table and column level access.

Your share exposes only the tables and columns in your contract. Geography filters are applied at the share definition — not at query time. You can't access data outside your scope.

Transport

TLS everywhere. Parquet at rest.

The sharing server API and all pre-signed Parquet URLs use TLS. Data at rest is encrypted with SSE-S3. Pre-signed URLs expire after a short window.

07 · FAQ

Questions every prospect asks.

If yours isn't here, email [email protected].

Do I need a Databricks account?

No. Delta Sharing is an open protocol. The delta-sharing Python package works on any machine — pip install delta-sharing, point it at the .share profile, and load tables into pandas or Spark. Databricks just makes it easier with Unity Catalog integration.

How does latency compare to Snowflake delivery?

Snowflake delivery runs hourly (~1–2 h latency). Delta Sharing delivery runs daily (~24 h latency). If you need sub-day latency in a lakehouse, pair Delta Sharing with the API or webhooks for real-time event triggers.

Can I get just EU data?

Yes. Your share is scoped to the geography in your contract — EU-only, global, single-country, or any combination. Adjusting scope later is a share configuration update on our side.

What tables are included?

The same six tables as Snowflake: filings, companies, sources, filing_types, filing_categories, and languages. Schema is identical across all delivery channels.

Can I access the original filing documents (PDFs)?

The Delta table includes raw_document_s3_key — the S3 path to the original file. We provision a cross-account IAM grant for document access, same as S3 bulk delivery. See S3 security docs.
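The answer above does not say whether `raw_document_s3_key` holds a full `s3://bucket/key` URI or a bare key, so this sketch accepts both. `split_document_path` and `download_document` are hypothetical helpers, and the bucket names in the examples are placeholders; the real bucket comes with your cross-account IAM grant. `download_document` assumes boto3 is installed and that your credentials carry that grant.

```python
from typing import Optional, Tuple


def split_document_path(raw_document_s3_key: str,
                        default_bucket: Optional[str] = None) -> Tuple[str, str]:
    """Split a raw_document_s3_key value into (bucket, key)."""
    if raw_document_s3_key.startswith("s3://"):
        # Partition manually: urlparse would treat '#' or '?' inside a key specially.
        bucket, _, key = raw_document_s3_key[len("s3://"):].partition("/")
        if not bucket or not key:
            raise ValueError(f"not a full s3://bucket/key path: {raw_document_s3_key!r}")
        return bucket, key
    if default_bucket is None:
        raise ValueError("bare key: pass the bucket named in your IAM grant")
    return default_bucket, raw_document_s3_key.lstrip("/")


def download_document(raw_document_s3_key: str, dest_path: str,
                      default_bucket: Optional[str] = None) -> None:
    """Download one original filing document to dest_path (requires boto3)."""
    import boto3  # lazy import so the path helper works without boto3 installed
    bucket, key = split_document_path(raw_document_s3_key, default_bucket)
    boto3.client("s3").download_file(bucket, key, dest_path)
```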

Is this the same data as the API and S3?

Exactly the same. Filing IDs match across all channels. A filing's filing_id in the Delta table is the same as the API's id field and the S3 Parquet's filing_id column. Mix and match channels freely.
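Because a Delta row's `filing_id` equals the API's `id`, joining the two channels needs no mapping table. Here is a minimal standard-library sketch of that inner join. `attach_api_records` is a hypothetical helper, and any API fields other than `id` that you see in examples are placeholders, not the real API schema.

```python
from typing import Any, Dict, Iterable, List


def attach_api_records(delta_rows: Iterable[dict],
                       api_records: Iterable[dict]) -> List[dict]:
    """Inner-join Delta rows to API records on filing_id == id."""
    by_id: Dict[Any, dict] = {rec["id"]: rec for rec in api_records}
    joined = []
    for row in delta_rows:
        rec = by_id.get(row["filing_id"])
        if rec is not None:  # keep only filings present in both channels
            joined.append({**row, "api": rec})
    return joined
```

With pandas, the equivalent is `df.merge(api_df, left_on="filing_id", right_on="id")`, which also performs an inner join by default.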

How does S3 bulk differ from Delta Sharing?

S3 bulk gives you raw Parquet files and Markdown documents on S3, and you choose the query engine. Delta Sharing provides a structured table abstraction: the protocol supports server-side file skipping via predicate hints, and clients read only the Parquet columns they need. If you already use Databricks or Spark, Delta Sharing is simpler. If you use DuckDB or Athena, S3 is more natural.

Ready to add regulatory filings to your lakehouse?

Tell us your Databricks workspace URL (or just that you want Delta Sharing) and the geography you need. We'll provision a share profile within a business day.