Building Virtual Knowledge Graphs for Life Sciences with Ontop-VKG

SWAT4HCLS 2026 Tutorial

Abstract

This tutorial introduces Virtual Knowledge Graphs (VKGs) using the Ontop framework. Starting from a solar system dataset, you will learn how to convert tabular data to RDF, write OBDA mappings, and launch SPARQL endpoints over relational databases — without materializing any RDF. The core concepts are then applied to real-world use cases in clinical data (OMOP CDM), biodiversity (GBIF), and bioimaging (OMERO).

Keywords: Virtual Knowledge Graph, Ontop, OBDA, SPARQL, RDF, OMOP CDM, GBIF, OMERO, DuckDB, PostgreSQL, FAIR data, Linked Data

A Virtual Knowledge Graph (VKG) exposes relational data as RDF through a live SPARQL endpoint, without copying or converting the data. The mapping is purely virtual: every SPARQL query is translated to SQL on the fly and executed against the source database.

The tutorial is built around the Ontop VKG framework and writes its mappings in Ontop's OBDA (Ontology-Based Data Access) mapping language.

Tutorial structure

The tutorial is organised in three parts:

Part I — Core concepts

  1. Warmup — Convert a CSV table of planets to RDF using Python and rdflib. Understand the relationship between wide/long tables and RDF triples.

  2. OBDA Mappings — Learn the Ontop mapping language by virtualising a solar system PostgreSQL database (planets and moons).

  3. Installation — Set up Java, Ontop CLI, DuckDB, and optionally PostgreSQL and GraphDB Desktop.
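To make the wide-table-to-triples idea from the warmup concrete, here is a minimal sketch. It deliberately swaps rdflib for plain string formatting so it runs with the standard library only; the warmup itself uses rdflib. The sample rows, the `http://example.org/` namespaces, and the column names are assumptions made for illustration, not the tutorial's actual dataset.

```python
import csv
import io

# Hypothetical sample data and namespaces; the tutorial's planet table may differ.
CSV_TEXT = """name,radius_km,moons
Earth,6371,1
Mars,3390,2
"""
BASE = "http://example.org/planet/"
VOCAB = "http://example.org/vocab#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def row_to_triples(row):
    """Turn one wide-table row into one N-Triples line per column."""
    name = row["name"]
    subject = f"<{BASE}{name}>"
    # The key column becomes both the subject IRI and a plain string literal.
    triples = [f'{subject} <{VOCAB}name> "{name}" .']
    # Every remaining cell becomes its own (subject, predicate, object) triple.
    for column in ("radius_km", "moons"):
        value = row[column]
        triples.append(f'{subject} <{VOCAB}{column}> "{value}"^^<{XSD}integer> .')
    return triples


def csv_to_ntriples(text):
    """Read CSV text and return a flat list of N-Triples lines."""
    reader = csv.DictReader(io.StringIO(text))
    return [triple for row in reader for triple in row_to_triples(row)]


triples = csv_to_ntriples(CSV_TEXT)
for line in triples:
    print(line)
```

Each row of the wide table yields one triple per column, which is the same reshaping as going from a wide table to a long (entity, attribute, value) table.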

Part II — Use case demonstrators

  1. OMOP CDM — Expose clinical OMOP data stored as Parquet files via DuckDB and Ontop.

  2. GBIF Biodiversity — Query GBIF occurrence records from S3 Parquet snapshots through a SPARQL endpoint.

Part III — Advanced application

  1. OMERO Bioimaging — Apply Ontop to an OMERO PostgreSQL database to virtualise microscopy image metadata.

  2. IDR: A Billion-Triple Knowledge Graph — Deploy a SPARQL endpoint over the entire Image Data Resource (14 million images, 1 billion triples) with resolvable links and RDF materialization.

Architecture

The general VKG pipeline:

Relational data  →  Ontop (OBDA mappings + ontology)  →  SPARQL endpoint
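The pipeline can be illustrated with a toy model of query rewriting. The sketch below is not Ontop's algorithm: it answers a single triple pattern of the form `?s <predicate> ?o` by looking up a hand-written mapping and generating SQL, using an in-memory SQLite database as a stand-in for the relational source. The table, the vocabulary IRI, and the `MAPPINGS` structure are all hypothetical.

```python
import sqlite3

# Hypothetical relational source standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE planet (name TEXT PRIMARY KEY, radius_km INTEGER)")
conn.executemany(
    "INSERT INTO planet VALUES (?, ?)",
    [("Earth", 6371), ("Mars", 3390)],
)

# A toy "mapping": each predicate IRI names the table and columns
# that would produce its triples. Nothing is materialized as RDF.
MAPPINGS = {
    "http://example.org/vocab#radius": {
        "table": "planet",
        "key": "name",
        "value": "radius_km",
        "subject_template": "http://example.org/planet/{}",
    },
}


def answer_pattern(predicate):
    """Answer the pattern `?s <predicate> ?o` by rewriting it to SQL."""
    m = MAPPINGS[predicate]
    sql = f'SELECT {m["key"]}, {m["value"]} FROM {m["table"]} ORDER BY {m["key"]}'
    rows = conn.execute(sql).fetchall()
    # Subject IRIs are built at query time from the key column.
    bindings = [(m["subject_template"].format(key), value) for key, value in rows]
    return sql, bindings


sql, bindings = answer_pattern("http://example.org/vocab#radius")
print(sql)
print(bindings)
```

The point of the sketch is that RDF terms exist only in the query answer: the data stays in the table, and the IRIs are constructed from column values when the query runs.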

Ontop supports any JDBC-accessible database. This tutorial demonstrates three backends:

Backend      Use case                   Data format
PostgreSQL   Solar system, OMERO, IDR   Tables in a SQL database
DuckDB       OMOP CDM                   Parquet files (embedded)
DuckDB       GBIF                       Parquet files (from S3)
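Switching backends mostly comes down to changing the JDBC connection settings that Ontop reads from a properties file. The sketch below renders such settings for the two backends in the table. The `jdbc.*` keys and driver class names reflect my understanding of Ontop and the respective JDBC drivers, so check them against the installation chapter; the host, database name, and credentials are placeholders.

```python
# JDBC settings per backend. Host, database name, and credentials are placeholders.
BACKENDS = {
    "postgresql": {
        "jdbc.url": "jdbc:postgresql://localhost:5432/solar_system",
        "jdbc.driver": "org.postgresql.Driver",
        "jdbc.user": "postgres",
        "jdbc.password": "changeme",
    },
    "duckdb": {
        # An empty path after the prefix is assumed to mean an in-memory DuckDB database.
        "jdbc.url": "jdbc:duckdb:",
        "jdbc.driver": "org.duckdb.DuckDBDriver",
    },
}


def render_properties(backend):
    """Render one backend's settings as key=value lines, one per line."""
    settings = BACKENDS[backend]
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


postgres_props = render_properties("postgresql")
duckdb_props = render_properties("duckdb")
print(postgres_props)
print(duckdb_props)
```

The mappings and ontology stay conceptually the same across backends; only the connection settings and the SQL dialect used inside the mappings change.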

Source repositories