OCA_training_pathway

Semantic Engine

Semantics is a branch of linguistics and logic concerned with meaning.

The Semantic Engine is a set of tools being developed at Agri-food Data Canada to help researchers generate machine-accessible meaning for their data.

One of the first tools we are developing will help researchers more easily create and use data schemas.

The benefits of better data schemas

Data must be structured to be understood, and a schema describes that structure.

For example, a schema can describe what information is contained within the columns of a dataset. Researchers can tune the detail of descriptions in their data schemas depending on their needs. We can represent a schema in a table format like the example below:

A table representation of the schema of an associated data table.
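As a rough illustration of the idea, a schema can be held as one record per column and printed as a table. The sketch below is hypothetical: the attribute names, types, units, and the `render_table` helper are invented for this example and are not taken from the Semantic Engine.

```python
# Hypothetical schema for an illustrative dataset. Every attribute name,
# type, unit, and description here is made up for the example.
schema = [
    {"attribute": "animal_id", "type": "Text", "unit": None,
     "description": "Unique identifier of the animal"},
    {"attribute": "weight", "type": "Numeric", "unit": "kg",
     "description": "Body weight at sampling"},
    {"attribute": "sample_date", "type": "DateTime", "unit": None,
     "description": "Date the sample was taken"},
]


def render_table(rows):
    """Return the schema as a plain-text table, one row per attribute."""
    headers = list(rows[0].keys())
    # Missing values (None) are shown as empty cells.
    cells = [[str(row[h]) if row[h] is not None else "" for h in headers]
             for row in rows]
    # Each column is as wide as its longest entry, header included.
    widths = [max(len(h), *(len(r[i]) for r in cells))
              for i, h in enumerate(headers)]
    lines = ["  ".join(h.ljust(w) for h, w in zip(headers, widths))]
    lines += ["  ".join(c.ljust(w) for c, w in zip(r, widths)) for r in cells]
    return "\n".join(lines)


print(render_table(schema))
```

Adding or removing keys such as "unit" is one way to tune how much detail the schema carries.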

The better a schema is, the more value it adds to the associated datasets. With a good schema, researchers and their collaborators can document data more thoroughly and avoid misinterpretations that lead to false assumptions in their research.

Better data schemas aid researchers in sharing data with the research community. Better documentation enables researchers to effectively communicate the context of the data to other users, ensuring that the information is used accurately. This is especially valuable in cross-disciplinary research where other users are less familiar with the conventions of a particular discipline.

How to easily write better schemas

Effective data schemas can be challenging to write because they require specialized knowledge and time. Agri-food Data Canada is creating the Semantic Engine to help researchers write better data schemas with less effort, and is developing it with input from researchers to ensure that it meets their needs.

To create the Semantic Engine, Agri-food Data Canada is partnering with the Human Colossus Foundation (HCF) to adopt the HCF’s work on Overlays Capture Architecture (OCA) as the underlying schema standard. OCA is an extensible, flexible, international, open, and machine-accessible standard for schemas.

The Human Colossus Foundation has developed Overlays Capture Architecture (OCA), an open, international standard for data schemas. Agri-food Data Canada is adopting and adapting OCA in partnership with the Human Colossus Foundation.

Starting from a table representation of a schema, OCA splits each feature into a separate layer. Each layer is a separate file, written in a machine-readable format, that references the capture base: the foundation of the schema describing the dataset.

The different features of the data schema can be expressed as layers (or overlays) of the capture base. This is the overlays capture architecture, which can be expressed in a machine-readable format.

Capture Base. This is the most basic, foundational structure of the schema. The capture base contains an identifier that every overlay references. A capture base is most valuable when it rarely changes. Details are added to the schema through overlays rather than by changing the capture base, which keeps the schema structure consistent and supports interoperability.

The Capture Base of a schema contains the basic structure and minimal elements of the schema.
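One way to picture a stable capture base with its own identifier is to derive that identifier from the content itself. The sketch below is an assumption-laden illustration, not the OCA file format: the field names, the `make_capture_base` helper, and the use of a SHA-256 hex digest are stand-ins chosen for this example, and OCA defines its own identifier scheme.

```python
import hashlib
import json


def make_capture_base(attributes):
    """Build a minimal capture-base-like dict with a content-derived identifier.

    Hypothetical layout for illustration only; this is not the OCA format.
    """
    body = {"type": "capture_base", "attributes": attributes}
    # Serialise with sorted keys and fixed separators so the same content
    # always produces the same text, and therefore the same digest.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "id": digest}


# Illustrative attribute names and types, invented for this example.
capture_base = make_capture_base(
    {"animal_id": "Text", "weight": "Numeric", "sample_date": "DateTime"}
)
print(capture_base["id"])
```

With this approach, any change to the attributes produces a different identifier, which is one reason to keep the capture base minimal and put evolving detail in overlays.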

Overlays. Layers are added to the schema to provide more detail, making it easier to understand and use data collected and structured according to the associated schema. Layers can be more than just descriptions and labels. For example, a user can add a data transformation layer that contains instructions for transforming data from another schema into their format. This helps users who want to work with data presented in an unfamiliar format, and it makes data collected with two different capture bases interoperable.

The Overlays of a schema provide more detail and build upon the capture base.
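The relationship between overlays and the capture base can be sketched as separate records that each carry the capture base identifier. Everything below is hypothetical: the overlay field names, the placeholder identifier, and the `overlays_for` and `transform_record` helpers are invented for illustration and do not reproduce the OCA specification.

```python
# Placeholder identifier, not a real OCA identifier.
CAPTURE_BASE_ID = "example-capture-base-id"

capture_base = {
    "id": CAPTURE_BASE_ID,
    "attributes": {"animal_id": "Text", "weight": "Numeric"},
}

# Each overlay adds one kind of detail and points back at the capture base.
label_overlay = {
    "type": "label",
    "capture_base": CAPTURE_BASE_ID,
    "language": "en",
    "attribute_labels": {"animal_id": "Animal ID", "weight": "Weight"},
}
unit_overlay = {
    "type": "unit",
    "capture_base": CAPTURE_BASE_ID,
    "attribute_units": {"weight": "kg"},
}
# Hypothetical transformation overlay: maps the column names used by
# another schema onto the attribute names of this capture base.
transformation_overlay = {
    "type": "transformation",
    "capture_base": CAPTURE_BASE_ID,
    "attribute_mapping": {"ID": "animal_id", "Wt_kg": "weight"},
}


def overlays_for(base, overlays):
    """Return only the overlays that reference the given capture base."""
    return [o for o in overlays if o["capture_base"] == base["id"]]


def transform_record(record, overlay):
    """Rename the keys of a foreign record; unmapped keys are dropped."""
    mapping = overlay["attribute_mapping"]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


foreign_record = {"ID": "A17", "Wt_kg": 412.5, "Notes": "not in the schema"}
print(transform_record(foreign_record, transformation_overlay))
# {'animal_id': 'A17', 'weight': 412.5}
```

Because the capture base is untouched, a new overlay (for example, labels in another language) can be added without affecting data already collected against the schema.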

The many benefits of the OCA-layered schema architecture include the following:

After schemas have been created, they can be published as separate research objects. This makes it easier for others to adopt and adapt existing schemas rather than recreating the work. The result is data that is more interoperable and easier for users to understand.

Schemas can be published separately in a repository and used by many datasets in different data repositories.

The Semantic Engine that Agri-food Data Canada is creating in partnership with the Human Colossus Foundation lets researchers create, use, and export schemas using the flexible and extensible OCA standard. The Semantic Engine will help researchers generate meaningful data.