Building Virtual Knowledge Graphs for Life Sciences with Ontop-VKG
SWAT4HCLS 2026 Tutorial
Abstract
This tutorial introduces Virtual Knowledge Graphs (VKGs), a technique for exposing relational data as RDF through a live SPARQL endpoint without copying or converting the data. The mapping is purely virtual: every SPARQL query is translated to SQL on the fly, so no RDF is materialized.
The tutorial is built around the Ontop VKG framework and uses the Ontology-Based Data Access (OBDA) mapping language. Starting from a solar system dataset, you will learn how to convert tabular data to RDF, write OBDA mappings, and launch SPARQL endpoints over relational databases. The core concepts are then applied to real-world use cases in clinical data (OMOP CDM), biodiversity (GBIF), and bioimaging (OMERO).
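To make the "virtual" idea concrete, here is a minimal Python sketch that uses only the standard library's `sqlite3` module. It is not Ontop and it does not translate SPARQL. It only illustrates that a mapping (a source SQL query plus IRI templates) can produce triples on demand from rows that stay in the database. The `planet` table, the `http://example.org/` namespace, the `MAPPING` dictionary, and the `virtual_triples` function are all hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical namespace and schema, chosen only for illustration.
BASE = "http://example.org/"

# A toy "mapping": a source SQL query plus templates for the target triple.
MAPPING = {
    "source": "SELECT name, radius_km FROM planet ORDER BY name",
    "subject": BASE + "planet/{name}",
    "predicate": BASE + "radiusKm",
    "object": "{radius_km}",
}


def virtual_triples(conn, mapping):
    """Yield (subject, predicate, object) tuples computed from live rows.

    Nothing is stored: every call re-runs the SQL query.
    """
    cursor = conn.execute(mapping["source"])
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        values = dict(zip(columns, row))
        yield (
            mapping["subject"].format(**values),
            mapping["predicate"],
            mapping["object"].format(**values),
        )


# An in-memory database stands in for the relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE planet (name TEXT, radius_km REAL)")
conn.executemany(
    "INSERT INTO planet VALUES (?, ?)",
    [("Earth", 6371.0), ("Mars", 3389.5)],
)

triples = list(virtual_triples(conn, MAPPING))

# A later change to the table is visible on the next call, with no re-export.
conn.execute("INSERT INTO planet VALUES ('Venus', 6051.8)")
triples_after_insert = list(virtual_triples(conn, MAPPING))
```

Because the triples are derived at query time, the relational database stays the single source of truth. A real VKG system goes further by rewriting each SPARQL query into SQL instead of enumerating every triple.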
Tutorial structure
The tutorial is organised in three parts:
Part I: Core concepts
Warmup — Convert a CSV table of planets to RDF using Python and rdflib. Understand the relationship between wide/long tables and RDF triples.
OBDA Mappings — Learn the Ontop mapping language by virtualising a solar system PostgreSQL database (planets and moons).
Installation — Set up Java, Ontop CLI, DuckDB, and optionally PostgreSQL and GraphDB Desktop.
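The relationship between a wide table and RDF triples can be sketched in a few lines. The warmup itself uses rdflib; the version below deliberately swaps in the standard library's `csv` module and writes N-Triples lines by hand so that it runs without extra packages. The column names, the `http://example.org/` namespace, and the `rows_to_ntriples` function are hypothetical.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace

# Hypothetical wide table: one row per planet, one column per property.
CSV_TEXT = """name,mass_earths,moons
Earth,1.0,1
Mars,0.107,2
"""


def rows_to_ntriples(csv_text, key_column="name"):
    """Turn each non-key cell into one N-Triples line (the "long" form).

    Assumes keys and values contain no characters that need escaping
    in an IRI or a literal.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}planet/{row[key_column]}>"
        for column, value in row.items():
            if column == key_column:
                continue
            predicate = f"<{BASE}{column}>"
            # Values are emitted as plain string literals; no datatypes are inferred.
            lines.append(f'{subject} {predicate} "{value}" .')
    return lines


ntriples = rows_to_ntriples(CSV_TEXT)
```

Each row contributes one triple per non-key column, so a wide table with two rows and two property columns yields four triples: the row key becomes the subject, the column name the predicate, and the cell the object.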
Part II: Use case demonstrators
OMOP CDM — Expose clinical OMOP data stored as Parquet files via DuckDB and Ontop.
GBIF Biodiversity — Query GBIF occurrence records from S3 Parquet snapshots through a SPARQL endpoint.
Part III: Advanced application
OMERO Bioimaging — Apply Ontop to an OMERO PostgreSQL database to virtualise microscopy image metadata.
IDR: A Billion-Triple Knowledge Graph — Deploy a SPARQL endpoint over the entire Image Data Resource (14 million images, 1 billion triples) with resolvable links and RDF materialization.
Architecture
The general VKG pipeline:
Relational data → Ontop (OBDA mappings + ontology) → SPARQL endpoint
Ontop supports any JDBC-accessible database. This tutorial demonstrates three backend configurations:
| Backend | Use case | Data format |
|---|---|---|
| PostgreSQL | Solar system, OMERO, IDR | Tables in a SQL database |
| DuckDB | OMOP CDM | Parquet files (embedded) |
| DuckDB | GBIF | Parquet files (from S3) |
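Whatever the backend, the client side looks the same: a standard SPARQL request sent over HTTP. The sketch below builds such a request with Python's standard library, following the SPARQL 1.1 Protocol (query passed via HTTP GET, JSON results requested). The endpoint URL `http://localhost:8080/sparql` is an assumption about a local deployment, and `build_sparql_request` is a hypothetical helper; adjust both to your setup.

```python
import urllib.parse
import urllib.request

# Assumed local endpoint URL; replace with the URL of your own endpoint.
ENDPOINT = "http://localhost:8080/sparql"

# A generic query that works against any RDF graph.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
"""


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol query request (HTTP GET, JSON results)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
    )


request = build_sparql_request(ENDPOINT, QUERY)

# Sending the request needs a running endpoint, so it is left commented out:
# import json
# with urllib.request.urlopen(request) as response:
#     bindings = json.load(response)["results"]["bindings"]
```

Since the request is plain SPARQL over HTTP, the same client code should work whether the triples behind the endpoint are virtual or materialized.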
Source repositories
This tutorial: AmsterdamUMC/ontop-vkg-tutorial
SWAT4HCLS materials: mpievolbio-scicomp/swat4hcls2026-ontop-tutorial
OMOP source: AmsterdamUMC/Ontop4OMOP
OMERO mappings: German-BioImaging/omero-ontop-mappings