Usage Guide
OntoCheck provides two interfaces: a command-line tool for quick assessments and a Python API for programmatic use and integration into data pipelines.
All assessments require an ontology serialized in Turtle (.ttl) format.
The input file must be free of syntax errors. Some assessments (see Modes 3
and 4 below) also require a set of SPARQL queries derived from competency
questions.
Assessment Modes
OntoCheck supports four assessment modes, controlled by a declarative
configuration C = (O, Q, M, G) where O specifies ontologies, Q is a
set of competency queries, M refers to the metrics, and G is a knowledge
graph.
- Mode 1 – Task-agnostic
(O, empty, M, empty) Run structural, labeling, accessibility, and naming-convention metrics on a single ontology.
- Mode 2 – Task-specific Web ontology
(O, Q, M, G) Validate an ontology against KGQA benchmark queries over a knowledge graph (e.g., LC-QuAD / DBpedia).
- Mode 3 – Task-based Scientific
(O, Q, M, empty) Assess a domain ontology against competency questions encoded as SPARQL queries.
- Mode 4 – Cross-Domain
(O[], Q, M, empty) Merge multiple ontologies and assess the combined vocabulary against cross-domain competency questions.
Command-Line Interface
The ontocheck command accepts a --mode flag (default: 1) that
selects which assessment to run.
ontocheck <ttl_files...> --mode {1,2,3,4} [options]
Mode 1: Task-Agnostic (default)
Run one or more task-agnostic metrics on a single ontology.
# Run two specific metrics
ontocheck my_ontology.ttl --metrics altLabelCheck definitionCheck
# Run all available metrics with custom output paths
ontocheck my_ontology.ttl --metrics all --log-file results.log --csv-file results.csv
Mode 2: Task-Specific Web Ontology
Validate against a KGQA benchmark over a knowledge graph.
ontocheck dbpedia_ontology.ttl \
--mode 2 \
--questions lcquad_queries.json \
--domain-prefixes dbo \
--knowledge-graph dbpedia_kg.ttl
Mode 3: Task-Based Scientific
Assess a domain ontology against competency questions.
ontocheck my_ontology.ttl \
--mode 3 \
--questions competency_questions.json \
--domain-prefixes mds \
--domain-ns-fragments cwrusdle.bitbucket.io/mds
Optionally run task-agnostic metrics alongside the task-based assessment:
ontocheck my_ontology.ttl \
--mode 3 \
--questions competency_questions.json \
--domain-prefixes mds \
--metrics checkLabel definitionCheck
Mode 4: Cross-Domain
Merge multiple ontologies and evaluate against cross-domain queries.
ontocheck xrd_ontology.ttl capacitors_ontology.ttl \
--mode 4 \
--questions cross_domain_questions.json \
--domain-prefixes mds
CLI Arguments
ttl_filesOne or more paths to Turtle ontology files. Mode 4 requires at least two.
--modeAssessment mode:
1(default),2,3, or4.--metricsTask-agnostic metric names, or
all. Required for Mode 1; optional for Modes 2–4.--questionsPath to a JSON or Markdown file containing SPARQL queries. Required for Modes 2, 3, and 4.
--domain-prefixesNamespace prefixes used in the SPARQL queries (e.g.,
mds,dbo). Required for Modes 2, 3, and 4.--domain-ns-fragmentsNamespace URI fragments for filtering domain terms. Optional.
--knowledge-graphPath to a knowledge-graph file. Required for Mode 2.
--log-filePath for the output log file (default:
assessment.log).--csv-filePath for the output CSV results file (default:
assessment_scores.csv).
Python API
Mode 1: Task-Agnostic Assessment
The run_ontology_assessment() function runs one or
more task-agnostic metrics on a Turtle file.
from ontocheck import run_ontology_assessment
# Run selected metrics
run_ontology_assessment(
ttl_file="my_ontology.ttl",
metrics=["altLabelCheck", "isolatedElements", "semanticConnection"],
output_log_file="assessment.log",
output_csv_file="scores.csv",
)
# Run all metrics
run_ontology_assessment(
ttl_file="my_ontology.ttl",
metrics="all",
)
Mode 2: Task-Specific Web Ontology Assessment
The run_web_ontology_assessment() function validates
an ontology against KGQA benchmark queries over a knowledge graph.
from ontocheck import run_web_ontology_assessment
result = run_web_ontology_assessment(
ttl_file="dbpedia_ontology.ttl",
questions="lcquad_queries.json",
domain_prefixes=["dbo"],
knowledge_graph="dbpedia_kg.ttl",
)
print(f"Relevance: {result['relevance']:.2%}")
print(f"Accuracy: {result['accuracy']:.2%}")
Mode 3: Task-Based Scientific Assessment
The run_task_based_assessment() function evaluates a
domain ontology against competency questions encoded as SPARQL queries.
from ontocheck import run_task_based_assessment
result = run_task_based_assessment(
ttl_files="my_ontology.ttl",
questions="competency_questions.json",
domain_prefixes=["mds"],
domain_ns_fragments=["cwrusdle.bitbucket.io/mds"],
)
print(f"Relevance: {result['relevance']:.2%}")
print(f"Accuracy: {result['accuracy']:.2%}")
print(f"Ontology terms (T_o): {result['T_o_count']}")
print(f"Task terms (T_a): {result['T_a_count']}")
To also run task-agnostic metrics alongside the task-based assessment:
result = run_task_based_assessment(
ttl_files="my_ontology.ttl",
questions="competency_questions.json",
domain_prefixes=["mds"],
metrics=["checkLabel", "definitionCheck"],
)
Mode 4: Cross-Domain Assessment
Pass multiple ontology files to
run_task_based_assessment() to merge them and
evaluate cross-domain competency questions.
from ontocheck import run_task_based_assessment
result = run_task_based_assessment(
ttl_files=["xrd_ontology.ttl", "capacitors_ontology.ttl"],
questions="cross_domain_questions.json",
domain_prefixes=["mds"],
)
print(f"Relevance: {result['relevance']:.2%}")
print(f"Accuracy: {result['accuracy']:.2%}")
Underlying Task-Based Metric
For direct access to the task-based Relevance/Accuracy computation without
logging or CSV output, use
task_based_metric_v_0_0_1():
from ontocheck import task_based_metric_v_0_0_1
result = task_based_metric_v_0_0_1(
ttl_file="my_ontology.ttl",
questions="competency_questions.json",
domain_prefixes=["mds"],
domain_ns_fragments=["cwrusdle.bitbucket.io/mds"],
)
print(f"Relevance: {result['relevance']:.2%}")
print(f"Accuracy: {result['accuracy']:.2%}")
Supplying questions. The questions parameter accepts three formats:
JSON file path (ending in
.json): Each element of the JSON array should contain asparql_querykey with a SPARQL query string.Markdown file path (ending in
.md): SPARQL queries are extracted from fenced code blocks marked withsparql.List of strings: Raw SPARQL query strings passed directly.
# From a list of SPARQL query strings
queries = [
"SELECT ?x WHERE { ?x a mds:Sample }",
"SELECT ?x WHERE { ?x mds:hasMaterialComposition ?comp }",
]
result = task_based_metric_v_0_0_1(
ttl_file="my_ontology.ttl",
questions=queries,
domain_prefixes=["mds"],
)
Inspecting gaps. The returned dictionary includes diagnostic sets for identifying vocabulary gaps:
# Terms the tasks require but the ontology lacks
for term in sorted(result['missing_from_onto']):
print(f" Missing: {term}")
# Ontology terms not exercised by any task query
print(f"Unused ontology terms: {len(result['unused_in_onto'])}")
Available Metric Names
The following metric names can be passed to the --metrics CLI argument
or the metrics parameter of run_ontology_assessment:
Labeling:
checkLabel, altLabelCheck, definitionCheck
Structural:
isolatedElements, classConnections, missingDomainRange,
leafNodeCheck, semanticConnection
Accessibility:
sparqlEndpoint, rdfDump, humanLicense, externalLinks
Naming Convention:
classCapitalCheck, classSpaceCheck, spellCheck,
duplicateLabels, searchClass
OntoCheck is Built for the Community
OntoCheck is conceived as a community resource: we actively encourage collaboration, contribution of new metrics, and submission of domain competency question sets, in the shared interest of building robust, reusable semantic infrastructure for FAIR scientific data.