Grounded in peer-reviewed literature. Structured. Ready.

The data layer between scientific literature and your model. Starting with soil GHG.

Built on

The infrastructure underneath.

Solum sits between scientific literature and your work — built like the systems your engineers would build if they had the time. Schemas, validators, versioned snapshots, and an API your team can actually integrate with.

What Solum offers

Figure & table-level provenance Proprietary figure-extraction pipeline solum-torch PyTorch adapter Semantic search JSON Schema · 70+ entity types 3-validator QA pipeline Versioned snapshots Shareable report links One-click PDF export DuckDB + REST API 4 controlled vocabularies

Explorer UI · Browse & filter

Search and filter 1000s of treatments by management practice, GHG type, soil texture, region — directly in the browser.

Provenance · Figure-level trace

Click any value and see the exact figure or table it was extracted from — source PDF rendered alongside.

solum-torch · ML adapter

Drop-in PyTorch Dataset with covariates joined at query time. pip install and train.

Why you need Solum

Four cases. Same payoff.

"Audit on Friday. Every number has to defend itself."

Provenance down to the source figure and page. Every value chains paper → study → treatment → observation → source figure or table. The API serves the figure raster on demand. Citation bundles export the full trail.

"Pulling an effect-size table for cover crops × N₂O across Midwest sites."

Treatment-level resolution across study designs and observation methods — field experiments, incubations, remote sensing, modeling intercomparisons. Filter by practice, GHG, soil texture, region, design, and method. The exact subset you need in minutes.

"Building a DNDC calibration set. The data has to match the model."

Calibration sets queryable by model system — DNDC, SALUS, ecosys — with CMIP5 climate and NASA POWER weather covariates joined at query time from spatial raster tiles.

"Training a neural network. The data has to be real, not 30 hand-curated papers."

pip install solum-torch. Drop-in PyTorch Dataset class — covariates joined at query time, observation sequences pre-batched. Real experimental data, ready to train.

An asymmetric advantage.

Three roles. The same quiet edge.

Some teams are working from spreadsheets and memory. Some are working from a structured, peer-reviewed evidence layer with full provenance. The difference shows up in deadlines, audits, and bid pipelines.

For

The Researcher / Consultant

It's Friday. You've been staring at 47 PDFs since Tuesday. Your director needs the report by EOD.

Before

Three weeks of paper-by-paper extraction. Numbers you can't fully defend if anyone pushes back.

After

Done by Friday. Every number cited. Every claim defended.

What you tell your director

"I used a structured database of 1000s of peer-reviewed studies. Every number traces back to the exact figure or table it came from — paper, study, site, and source location. If anyone audits us, the citation bundle is one click away."

For

The Modeler / ML Engineer

Your DNDC calibration is overdue, or you're writing data scrapers instead of model code. Either way, data is the bottleneck.

Before

Calibration takes longer than the model run. Training sets built from 30 hand-curated papers. Provenance lives in a spreadsheet built from memory.

After

Pull a calibration set or training corpus in an afternoon. Real experimental data, joined with covariates, ready for the run or the loop.

What you tell the room

"Our model — process-based or neural — runs on the most comprehensive structured experimental dataset in the field. Every observation traces to a named paper, figure, and table. Competitors are working from synthetic data and spreadsheets."

For

The Practice Lead

Q3 review tomorrow. Your VP wants to know why utilization is at 60% and why the bid pipeline isn't bigger.

Before

Half of every project disappears into background research. Margins thin, headcount maxed, growth stalled.

After

Three more projects this quarter without adding headcount. Every project ships with a defensible literature foundation, in hours.

What you tell your VP

"We bought infrastructure. Solum gives every project a peer-reviewed evidence layer in hours instead of weeks. Utilization is up, margin recovered itself, and we doubled the pipeline."

Where Solum is going

Four stops on one road.

The data layer is the foundation. Three more destinations turn it into something a researcher, a model, or another AI can actually use. Each stop below is tagged honestly — what's shipping, what's committed, what we're exploring.

The Data Layer

The foundation. Peer-reviewed papers, structured at the treatment level, with full provenance.

Now

1000s of papers indexed in soil GHG Figure & table-level provenance Versioned snapshots Effect-size aggregation views

Continuous ingestion — scaling to 5,000+ papers Methodology paper in Scientific Data Custom simulation-input file generator — DNDC · SALUS · ecosys Calibration data exporter — formatted for your model Human-in-the-loop expert review

Later

Figure-extraction accuracy 95%+ Domain expansion: water quality, biodiversity

Solum Intelligence

A domain-aware analysis copilot. Ask in natural language — Solum runs the query, drafts the memo, cites the source.

Now

Semantic search over the corpus

Ask Solum — natural-language analyst Agronomy-aware semantic layer (treatments, units, replicates) Auto-generated plots & effect-size tables

Later

Code-on-demand for you Report templates: PDD appendix · brief · client memo

Bring Your Own AI

Your agent. Your stack. Solum exposed as a first-class tool to whatever AI your team already uses.

Now

REST API + OpenAPI spec

Solum MCP server Tiered API keys: free academic, metered commercial pip install solum-torch on PyPI

Later

R package · Hugging Face dataset cards Shareable agent workspaces

Models as a Service

A single composite prediction — process modeling, neural networks, and remote-sensing data fused through data assimilation. Not a one-off calibrated run; a continuously updated estimate. Available in the UI, via API, or to your AI agent.

Run predictions in the UI, API, or via your AI agent Above- and below-ground outputs — biomass, yield, soil water, SOC, N₂O Data assimilation — process models, neural networks & field observations fused into one estimate Multi-source integration — remote sensing, weather, soil rasters joined live Every prediction cites its calibration papers Calibration-set builder for your own model runs

What's on this list moves forward when there's pull, not push. If a milestone matters to your team, tell us — we ship for customers, not for slides.

Tiers & Access

Pick the door that matches how you work.

Tier 1

Explorer

Researchers & students

Free

Forever

· 50-paper UI sample
· Search & filter
· CSV export

Get Started

Custom

Enterprise

MRV providers, carbon-tech, ML groups

Let's talk

Pricing tailored to your team

· Full database + REST API
· Audit-ready provenance bundles
· solum-torch & custom extraction
· Dedicated support

The auditor question

"Where did your calibration
data come from?"

A spreadsheet built from memory is not an answer that survives an audit. A database of 1000s of peer-reviewed studies with full provenance is. Solum exports a citation bundle for every calibration set — every data point traces back to a named paper, a named study, a named site.

Request Early Access Book a 20-min call

Methodology paper to be published in Scientific Data (Nature portfolio) later this year.

Grounded in peer-reviewed literature. Structured. Ready.

The infrastructure underneath.

Four cases. Same payoff.

"Audit on Friday. Every number has to defend itself."

"Pulling an effect-size table for cover crops × N₂O across Midwest sites."

"Building a DNDC calibration set. The data has to match the model."

"Training a neural network. The data has to be real, not 30 hand-curated papers."

Three roles. The same quiet edge.

The Researcher / Consultant

The Modeler / ML Engineer

The Practice Lead

Four stops on one road.

The Data Layer

Solum Intelligence

Bring Your Own AI

Models as a Service

Pick the door that matches how you work.

Explorer

Enterprise

"Where did your calibrationdata come from?"

"Where did your calibration
data come from?"