Grounded in peer-reviewed literature. Structured. Ready.

The data layer between scientific literature and your model. Starting with soil GHG.

The infrastructure underneath.

Solum sits between scientific literature and your work — built like the systems your engineers would build if they had the time. Schemas, validators, versioned snapshots, and an API your team can actually integrate with.

What Solum offers

• Figure & table-level provenance
• Proprietary figure-extraction pipeline
• solum-torch PyTorch adapter
• Semantic search
• JSON Schema · 70+ entity types
• 3-validator QA pipeline
• Versioned snapshots
• Shareable report links
• One-click PDF export
• DuckDB + REST API
• 4 controlled vocabularies
Explorer UI · Browse & filter

Search and filter thousands of treatments by management practice, GHG type, soil texture, region — directly in the browser.

Provenance · Figure-level trace

Click any value and see the exact figure or table it was extracted from — source PDF rendered alongside.

solum-torch · ML adapter

Drop-in PyTorch Dataset with covariates joined at query time. pip install and train.


Why you need Solum

Four cases. Same payoff.

01

"Audit on Friday. Every number has to defend itself."

Provenance down to the source figure and page. Every value chains paper → study → treatment → observation → source figure or table. The API serves the figure raster on demand. Citation bundles export the full trail.
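That chain can be pictured as nested records. A minimal sketch in Python — every field name, value, and the DOI here is a hypothetical placeholder, not Solum's actual schema:

```python
# Sketch of the provenance chain: paper -> study -> treatment ->
# observation -> source figure/table. All fields are hypothetical.

observation = {
    "value": 2.4,
    "unit": "kg N2O-N/ha/yr",
    "treatment": {
        "name": "cover crop + reduced till",
        "study": {
            "site": "Rosemount, MN",
            "paper": {"doi": "10.0000/example", "title": "Example et al. 2020"},
        },
    },
    "source": {"kind": "figure", "label": "Fig. 3b", "page": 7},
}

def citation_trail(obs: dict) -> str:
    """Flatten one observation's provenance into an audit-ready citation line."""
    paper = obs["treatment"]["study"]["paper"]
    src = obs["source"]
    return f'{paper["title"]} ({paper["doi"]}), {src["label"]}, p. {src["page"]}'
```

On this shape, exporting a citation bundle amounts to mapping `citation_trail` over every observation in a result set.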

02

"Pulling an effect-size table for cover crops × N₂O across Midwest sites."

Treatment-level resolution across study designs and observation methods — field experiments, incubations, remote sensing, modeling intercomparisons. Filter by practice, GHG, soil texture, region, design, and method. The exact subset you need in minutes.

03

"Building a DNDC calibration set. The data has to match the model."

Calibration sets queryable by model system — DNDC, SALUS, ecosys — with CMIP5 climate and NASA POWER weather covariates joined at query time from spatial raster tiles.

04

"Training a neural network. The data has to be real, not 30 hand-curated papers."

pip install solum-torch. Drop-in PyTorch Dataset class — covariates joined at query time, observation sequences pre-batched. Real experimental data, ready to train.
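To make the adapter pattern concrete, here is a minimal stand-in: a map-style dataset that joins covariates at access time over mock records. The class name, field names, and values are illustrative assumptions, not solum-torch's real API. PyTorch's DataLoader accepts any object implementing `__len__` and `__getitem__`, so an adapter shaped like this slots straight into a training loop.

```python
# Illustrative sketch only; solum-torch's actual API may differ.
# A map-style dataset (the __len__/__getitem__ contract PyTorch's
# DataLoader consumes) that joins site covariates at access time.

class ObservationDataset:
    def __init__(self, observations, covariates):
        self._obs = observations   # treatment-level observations
        self._cov = covariates     # covariates keyed by site_id

    def __len__(self):
        return len(self._obs)

    def __getitem__(self, idx):
        record = self._obs[idx]
        # Join covariates at query time, as the adapter is described to do
        features = {**record, **self._cov.get(record["site_id"], {})}
        target = features.pop("n2o_flux")   # supervised target
        return features, target

mock_obs = [
    {"site_id": "s1", "practice": "cover_crop", "n2o_flux": 1.8},
    {"site_id": "s2", "practice": "no_till", "n2o_flux": 0.9},
]
mock_cov = {"s1": {"precip_mm": 880.0}, "s2": {"precip_mm": 610.0}}

dataset = ObservationDataset(mock_obs, mock_cov)
features, target = dataset[0]   # covariates already joined
```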

An asymmetric advantage.

Three roles. The same quiet edge.

Some teams are working from spreadsheets and memory. Some are working from a structured, peer-reviewed evidence layer with full provenance. The difference shows up in deadlines, audits, and bid pipelines.

For

The Researcher / Consultant

It's Friday. You've been staring at 47 PDFs since Tuesday. Your director needs the report by EOD.

Before

Three weeks of paper-by-paper extraction. Numbers you can't fully defend if anyone pushes back.

After

Done by Friday. Every number cited. Every claim defended.

What you tell your director

"I used a structured database of thousands of peer-reviewed studies. Every number traces back to the exact figure or table it came from — paper, study, site, and source location. If anyone audits us, the citation bundle is one click away."

For

The Modeler / ML Engineer

Your DNDC calibration is overdue, or you're writing data scrapers instead of model code. Either way, data is the bottleneck.

Before

Calibration takes longer than the model run. Training sets built from 30 hand-curated papers. Provenance lives in a spreadsheet built from memory.

After

Pull a calibration set or training corpus in an afternoon. Real experimental data, joined with covariates, ready for the run or the loop.

What you tell the room

"Our model — process-based or neural — runs on the most comprehensive structured experimental dataset in the field. Every observation traces to a named paper, figure, and table. Competitors are working from synthetic data and spreadsheets."

For

The Practice Lead

Q3 review tomorrow. Your VP wants to know why utilization is at 60% and why the bid pipeline isn't bigger.

Before

Half of every project disappears into background research. Margins thin, headcount maxed, growth stalled.

After

Three more projects this quarter without adding headcount. Every project ships with a defensible literature foundation, in hours.

What you tell your VP

"We bought infrastructure. Solum gives every project a peer-reviewed evidence layer in hours instead of weeks. Utilization is up, margins have recovered, and we doubled the pipeline."

Where Solum is going

Four stops on one road.

The data layer is the foundation. Three more destinations turn it into something a researcher, a model, or another AI can actually use. Each stop below is tagged honestly — what's shipping, what's committed, what we're exploring.

01

The Data Layer

The foundation. Peer-reviewed papers, structured at the treatment level, with full provenance.

Now
• Thousands of papers indexed in soil GHG
• Figure & table-level provenance
• Versioned snapshots
• Effect-size aggregation views
Next
• Continuous ingestion — scaling to 5,000+ papers
• Methodology paper in Scientific Data
• Custom simulation-input file generator — DNDC · SALUS · ecosys
• Calibration data exporter — formatted for your model
• Human-in-the-loop expert review
Later
• Figure-extraction accuracy 95%+
• Domain expansion: water quality, biodiversity
02

Solum Intelligence

A domain-aware analysis copilot. Ask in natural language — Solum runs the query, drafts the memo, cites the source.

Now
Semantic search over the corpus
Next
• Ask Solum — natural-language analyst
• Agronomy-aware semantic layer (treatments, units, replicates)
• Auto-generated plots & effect-size tables
Later
• Code-on-demand for you
• Report templates: PDD appendix · brief · client memo
03

Bring Your Own AI

Your agent. Your stack. Solum exposed as a first-class tool to whatever AI your team already uses.

Now
REST API + OpenAPI spec
Next
• Solum MCP server
• Tiered API keys: free academic, metered commercial
• pip install solum-torch on PyPI
Later
• R package · Hugging Face dataset cards
• Shareable agent workspaces
04

Models as a Service

A single composite prediction — process modeling, neural networks, and remote-sensing data fused through data assimilation. Not a one-off calibrated run; a continuously updated estimate. Available in the UI, via API, or to your AI agent.

Next
• Run predictions in the UI, API, or via your AI agent
• Above- and below-ground outputs — biomass, yield, soil water, SOC, N₂O
• Data assimilation — process models, neural networks & field observations fused into one estimate
• Multi-source integration — remote sensing, weather, soil rasters joined live
• Every prediction cites its calibration papers
• Calibration-set builder for your own model runs

What's on this list moves forward when there's pull, not push. If a milestone matters to your team, tell us — we ship for customers, not for slides.

Tiers & Access

Pick the door that matches how you work.

Tier 1 · Explorer

Researchers & students

Free, forever

  • 50-paper UI sample
  • Search & filter
  • CSV export

Custom · Enterprise

MRV providers, carbon-tech, ML groups

Let's talk. Pricing tailored to your team.

  • Full database + REST API
  • Audit-ready provenance bundles
  • solum-torch & custom extraction
  • Dedicated support

The auditor question

"Where did your calibration data come from?"

A spreadsheet built from memory is not an answer that survives an audit. A database of thousands of peer-reviewed studies with full provenance is. Solum exports a citation bundle for every calibration set — every data point traces back to a named paper, a named study, a named site.

Methodology paper to be published in Scientific Data (Nature portfolio) later this year.