Vocabulary#
Vocabularies are collections of identifiers for (generally) related terms.
All data is associated with identifiers. That is, when you read data, you identify the data you want to read. Similarly, when you write data, you identify the data that you are writing.
In the Solid ecosystem, all data identifiers are IRIs:
IRIs are globally unique identifiers. Being globally unique prevents name clashes and allows for different interpretations of common concepts.
For example, the concept of a
Person
is identified in Schema.org with the identifierhttps://schema.org/Person
, and Schema.org’s interpretation of the Person concept is described as “A person (alive, dead, undead, or fictional).”.The Person Core Ontology, however, identifies the Person concept with the identifier
https://www.w3.org/ns/person#Person
, and their interpretation of the Person concept is described as “An individual person who may be dead or alive, but not imaginary.”Using IRIs allows for the unambiguous differentiation between slightly different interpretations of common concepts, whereas simply using ‘Person’ as the identifier would lead to confusion when attempting to interoperate.
IRIs are dereferenceable, i.e., they can be looked up easily, such as by pasting them into the address bar of any browser.
Providing meaningful descriptive information at an IRI helps with discovering and understanding the data identified by that IRI.
Pre-existing Vocabularies#
Many vocabularies (i.e., collections of terms identified with IRIs)
already exist to identify various concepts (e.g., Organization
,
Person
) and properties (e.g., address
or the starting time of
an event
). The concepts and properties being identified may be
general or highly specialized. For a list of some existing
vocabularies, see Existing Vocabularies.
When possible, rather than creating your own vocabulary of terms/identifiers, choose from existing ones. This helps promote the use of shared/common terms, and therefore, interoperability.
Using Terms from Vocabularies#
To define your data entities, you can use terms from any combination of vocabularies. That is, to save data for a person, you could use:
http://schema.org/familyName
as the identifier for the last name andhttp://xmlns.com/foaf/0.1/firstName
as the identifier for the first name.
However, in practice, you are more likely to use the first and last name terms from the same vocabulary; e.g.,
http://schema.org/familyName
andhttp://schema.org/givenName
orhttp://xmlns.com/foaf/0.1/lastName
andhttp://xmlns.com/foaf/0.1/firstName
.
Nevertheless, as previously mentioned, you can use terms from any combination of vocabularies.
For example, the following code snippet uses the solid-client
function getStringNoLocale to return specific
data items (identified by their IRI strings) from a data entity
retrievedPerson
.
// ...
import {
getStringNoLocale,
} from "@inrupt/solid-client";
// ...
const lastName = getStringNoLocale(retrievedPerson, "http://schema.org/familyName");
const fname = getStringNoLocale(retrievedPerson, "http://xmlns.com/foaf/0.1/firstName");
Using Convenience Objects#
To simplify the usage of pre-existing vocabularies, Inrupt’s vocabulary libraries provide convenience objects for many (but not all) common terms/identifiers you can use in your data entities:
For some common RDF-related vocabularies like RDFS, FOAF, LDP or OWL. |
|
For Solid-related vocabularies like Solid Terms and Workspace. |
|
For Inrupt specific vocabularies. |
Convenience objects contain static constants for common identifiers used across Solid. Importing these classes obviates the need for developers to hard-code these identifiers in their code. Although you can use the IRI strings instead of the convenience objects, these objects represent many of the ideas and concepts that are useful in Solid itself as well as in Solid applications.
The convenience objects include the (IRI) values for each term so you don’t have to remember them or mistype them. The getStringNoLocale can accept either (IRI) strings or the convenience objects. As such, the previous example can be rewritten as follows:
// ...
import {
getStringNoLocale,
} from "@inrupt/solid-client";
import { FOAF, SCHEMA_INRUPT, VCARD } from "@inrupt/vocab-common-rdf";
// ...
const lastName = getStringNoLocale(retrievedPerson, SCHEMA_INRUPT.familyName);
const fname = getStringNoLocale(retrievedPerson, FOAF.firstName);
const role = getStringNoLocale(retrievedPerson, VCARD.role);
FOAF
provides convenience objects for the Friend of a Friend Vocabulary. For example, theFOAF.firstName
is a convenience object that includes thehttp://xmlns.com/foaf/0.1/firstName
IRI.
SCHEMA_INRUPT
is Inrupt’s extension of the schema.org Vocabulary. It provides convenience objects for a subset of terms from the schema.org Vocabulary, adding language tags/translations to labels and comments if missing from schema.org.By limiting the number of terms,
SCHEMA_INRUPT
aims to make working with select terms from Schema.org easier. Schema.org currently defines over 2,500 terms (see Organisation of Schema.org), whereas most applications (including Solid itself) only require specialized subsets of those terms. SCHEMA_INRUPT, which consists of a small set of generally applicable terms, reduces noise, clutter and bundle sizes.If you require a Schema.org term not in SCHEMA_INRUPT, you can use the term’s IRI string directly in your own code, create your own extension vocabulary, or request that Inrupt add that term to SCHEMA_INRUPT.
VCARD
provides convenience objects for the vCard Vocabulary. For example, theVCARD.role
is a convenience object that includes thehttp://www.w3.org/2006/vcard/ns#role
IRI.
See also:
Interoperability#
Consider an example where you are saving your address and a property of
the address is a zipcode or a postal code. If you save data for this
property as zipcode
, then applications must use zipcode
when
accessing this data. If others use zip
, postalCode
, or
postcode
, etc. as the identifier when storing their data, then
applications that use the zipcode
identifier cannot access their
data.
The use of different identifiers for the same data can hinder interoperability. That is, to use an application that retrieves the same data from multiple data sources, the application must be updated to keep track of the various identifiers in order to access this data. Otherwise, the application would not be able to access the data if the data source is not using the expected identifier.
Rather than having to keep track of the varying identifiers across data sources, the use of the same identifier for the same data can help promote interoperability. This idea of coming to broad agreement on common identifiers is perhaps epitomized by Schema.org (from Google, Microsoft and Yahoo!), and is becoming increasingly common in more specialized fields, like biomedicine (e.g. BioPortal https://bioportal.bioontology.org/ontologies) and finance (e.g. FIBO https://spec.edmcouncil.org/fibo/).
See also:
Vocabularies vs. Data Schemas#
Vocabularies provide terms that can be used to identify data. Vocabularies are not data schemas (or in RDF-parlance, “shapes”). That is, unlike data schemas (e.g., JSON Schema, relational database schemas (i.e. Data Definition Language (DDL) or XML Schema) which enforce what properties must appear and can appear for a data entity, vocabularies impose no such restrictions.
Consider an example where you are storing data entities that represent
people. To describe these data entities, you decide to use the
http://schema.org/Person
identifier from Schema.org vocabulary.
That is, the data entity has a property RDF.type
set to
http://schema.org/Person
.
Identifying the data entities as being of RDF.type
http://schema.org/Person
imposes no conditions about the data
properties saved about a person. That is, although
http://schema.org/Person
lists properties/identifiers that are
categorized/grouped under it, these place no restrictions on how you
should or could describe a person; i.e.,
The properties listed under
http://schema.org/Person
can be used to identify non-http://schema.org/Person
data.Your data entity does not need to include all the properties under
http://schema.org/Person
. In fact, your entity does not need to include any of the properties listed underhttp://schema.org/Person
. That is, you can identify the Person’s properties with non-http://schema.org/Person
properties, even from other vocabularies. For example, you could decide that you want to define a person as a data entity with the following properties (from Semantic Arts gist vocabulary) only:https://ontologies.semanticarts.com/gist/name
https://ontologies.semanticarts.com/gist/isIdentifiedBy
Someone else may also identify their data entities as
http://schema.org/Person
but with completely different properties, e.g.:https://schema.org/familyName
,https://schema.org/givenName
, and`https://ontologies.semanticarts.com/gist/hasCommunicationAddress
.
Shapes#
Shapes define what properties must and can appear for a data entity; i.e., shapes, not vocabularies, constrain the data.
Similar to using a common vocabulary, using shared Shapes for data also promotes interoperability. For example, consider multiple applications that read and write data entities that represent people.
One application’s expected “shape” of a person includes the following properties:
an
RDF.type
ofhttp://schema.org/Person
https://schema.org/familyName
,https://schema.org/givenName
,https://schema.org/email
, andhttps://schema.org/telephone
.
Another application’s expected “shape” of a person includes the following properties:
an
RDF.type
ofhttps://ontologies.semanticarts.com/gist/Person
https://ontologies.semanticarts.com/gist/name
https://ontologies.semanticarts.com/gist/isIdentifiedBy
https://ontologies.semanticarts.com/gist/hasCommunicationAddress
.
The two applications are not interoperable. That is, they cannot act upon the other’s data. But, if both applications used a common “shape”, which would also result in the use of the same vocabularies, then although developed separately, they would be able to act upon each other’s data.
For additional information on Shapes, see: