Talking About Metadata and Application Profiles

Authors: DCMI Application Profile Interest Group

Date: June 9, 2022

Introduction

It is hard to talk about application profiles because they are two degrees of abstraction removed from reality. Application profiles are about metadata. Metadata, in turn, is about things in the world that are being described.

Today, many technologies used for metadata describe themselves in terms that may sound superficially similar while being based on subtly different concepts. Consider the many meanings of "class" or "type" or of our favorite: "schema" (see "How to confuse yourself and everyone else" below).

This style guide presents the terminology we used in designing the DC Tabular Application Profiles model [DCTAP]. Every term here is used with multiple meanings elsewhere, but following this pattern consistently can help avoid (or at least resolve) potential confusion.

This terminology largely follows the Resource Description Framework (RDF) because RDF is the most widely recognized language for "graph metadata", a modern approach to metadata that favors the ability to link or combine data from different sources on the basis of a generically interoperable model [RDF Primer]. Most metadata in the world is not natively expressed in RDF but can, in principle, be converted into RDF for the purpose of interoperability. Expression in RDF requires clarity about the things being described and their characteristics. The modeling discipline required by RDF makes for well-designed, interoperable metadata, and thus provides a good foundation for DC TAP.

Orwell's rule 6 applies: "break any of these rules sooner than say anything outright barbarous".

The things being described

Entities exist in the real or imaginary world; they may be material, digital or purely conceptual. RDF calls them resources.

Entities have characteristics and relationships to other entities.

Entities may be grouped, by shared characteristic or relationship or by enumeration, into types or classes of entity.

In RDF, entities and types of entity are identified with IRIs.

The metadata

Metadata instances describe entities. In the case of a conceptual entity, the metadata may act as a definition. Metadata instances are composed of statements.

A statement in a metadata instance asserts a value for a single characteristic of a single entity or one relationship between it and another entity.

In RDF, as in natural languages, a statement has a subject, a predicate and an object.

The subject is an identifier for the entity being described.

The predicate is an identifier for the characteristic or relationship in the statement.

The object is a description of the characteristic or an identifier for the related entity.

An object in one statement may be the subject or object of other statements, so that RDF metadata may be visualized as a network or graph of nodes connected by predicates. The connecting predicates are often called edges.
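The subject, predicate and object pattern, and the way statements join into a graph, can be sketched in a few lines of Python. This is only an illustration using plain tuples rather than an RDF library. The `example.org` IRIs, the literal values and the helper name `build_graph` are hypothetical; the Dublin Core Terms and FOAF namespaces are used only as familiar sources of predicates.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str    # identifier for the entity being described
    predicate: str  # identifier for the characteristic or relationship
    object: str     # a literal description or the identifier of a related entity


# Hypothetical entity identifiers, invented for this example.
BOOK = "http://example.org/book/1"
AUTHOR = "http://example.org/person/7"

DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

statements = [
    Statement(BOOK, DCT + "title", "A Sample Title"),
    Statement(BOOK, DCT + "creator", AUTHOR),
    Statement(AUTHOR, FOAF + "name", "A. Writer"),
]


def build_graph(stmts):
    """Index statements by subject: each node maps to its outgoing edges."""
    graph = defaultdict(list)
    for s in stmts:
        graph[s.subject].append((s.predicate, s.object))
    return dict(graph)


graph = build_graph(statements)

# Objects that are themselves subjects of other statements link the graph together.
linked = [s.object for s in statements if s.object in graph]
```

Here the object of the `creator` statement is also the subject of the `name` statement, which is what turns a flat list of statements into a connected graph.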

An arrangement of nodes and edges forms a shape.

Other metadata frameworks have elements, or attributes and values, or key-value pairs. For example, XML and JSON are hierarchical, tree-like structures. XML documents are structured as nested elements, which may have attributes.

In the TAP, we often refer to the object as the value of the property in a metadata instance.

The vocabularies and models

Vocabularies and data models are the raw materials of metadata. These may be published as community standards or as ad hoc specifications. Metadata standards and specifications may define usage rules. Sometimes the combination of a vocabulary and a model is called a schema; however, the term schema is used in very different ways in different metadata frameworks (see below).

One type of vocabulary comprises descriptive terms, such as properties and classes, and relationships between the terms. Another type of vocabulary may be a list of values which can be used in describing the characteristics of an entity.

Vocabularies in RDF identify their terms with IRIs. In RDF instance data, predicates are identified as properties; that is, they describe a characteristic or relationship that may be asserted. An RDF class identifies a type or class of entity.

Other metadata frameworks follow a similar pattern of a vocabulary (sometimes called data elements or terms) and a model.

The application profile, including TAPs

An application profile describes, explains, and defines additional rules for how existing vocabularies and models should be used in a metadata instance.

An application profile comprises a set of templates for metadata statements in the instance data. The templates define and describe local choices for how statements are constructed, which may include constraints and explanatory information such as labels and notes. Examples of constraints are cardinality of statements, type of the value of a property, and specific rules for the property values.

The DC TAP specification defines a tabular format for application profiles in which statement templates are the rows and the individual elements of those templates form the columns.
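As a rough illustration of the rows-and-columns idea, the sketch below reads a tiny profile from CSV text with Python's standard `csv` module. The column names (`shapeID`, `propertyID`, `propertyLabel`, `mandatory`, `repeatable`, `valueDataType`) follow our understanding of the element names in the DC TAP primer and should be checked against [DCTAP]; the two rows themselves are invented for the example.

```python
import csv
import io

# A hypothetical two-row profile: each row is one statement template,
# each column is one element of that template.
TAP_CSV = """shapeID,propertyID,propertyLabel,mandatory,repeatable,valueDataType
book,dct:title,Title,true,false,xsd:string
book,dct:creator,Author,false,true,
"""

# DictReader uses the header line as keys, so every row becomes a
# dictionary keyed by template element name.
rows = list(csv.DictReader(io.StringIO(TAP_CSV)))

for row in rows:
    print(row["shapeID"], row["propertyID"], row["propertyLabel"])
```

An empty cell, such as the missing `valueDataType` on the second row, simply means that the template says nothing about that element.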

A set of statement templates that applies to a single entity or concept defines a shape. A shape comprises statement templates for a node in the metadata that meets some criterion or criteria, for example all nodes belonging to a given class or that are an object of a given property. Shapes in the profile may be the same as the structures defined in the metadata model, or they may be defined in the profile as a derived view over the metadata.
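One way to picture a shape is as the group of statement templates that share a shape identifier, which can then be compared with the statements made about a node. The sketch below is only an assumption about how such a check might look; DC TAP describes profiles and does not itself prescribe a validation procedure. The template data, the prefixed names and the helpers `shapes_from` and `check_node` are all hypothetical, and only the cardinality constraints are checked.

```python
from collections import Counter

# Hypothetical statement templates, already parsed into dictionaries.
templates = [
    {"shapeID": "book", "propertyID": "dct:title", "mandatory": True, "repeatable": False},
    {"shapeID": "book", "propertyID": "dct:creator", "mandatory": False, "repeatable": True},
    {"shapeID": "author", "propertyID": "foaf:name", "mandatory": True, "repeatable": False},
]


def shapes_from(templates):
    """Group statement templates by shape identifier."""
    shapes = {}
    for t in templates:
        shapes.setdefault(t["shapeID"], []).append(t)
    return shapes


def check_node(statements, shape):
    """Compare one node's (predicate, value) pairs with a shape's cardinality rules."""
    counts = Counter(predicate for predicate, _ in statements)
    problems = []
    for t in shape:
        n = counts[t["propertyID"]]
        if t["mandatory"] and n == 0:
            problems.append(f"missing {t['propertyID']}")
        if not t["repeatable"] and n > 1:
            problems.append(f"repeated {t['propertyID']}")
    return problems


shapes = shapes_from(templates)

good = [("dct:title", "A Sample Title"), ("dct:creator", "ex:person7")]
bad = [("dct:creator", "ex:person7"), ("dct:creator", "ex:person8")]

good_problems = check_node(good, shapes["book"])
bad_problems = check_node(bad, shapes["book"])
```

The first node satisfies the `book` shape, while the second lacks the mandatory title; its two creator statements are allowed because that template is repeatable.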

WARNING: How to confuse yourself and everyone else

Since entities in the world include digital and conceptual things, it follows that metadata, vocabularies, models and application profiles, properties, statements and all the rest are entities, and that you can have metadata about any of these things. Likewise, metadata vocabularies and application profiles describe and/or define concepts and so are forms of metadata. However, pursuing this line of thinking will only serve to confuse.

We have tried to avoid using the word "schema". Originating in philosophy, and then used in psychology to mean something like a mental model or framework by which the world is interpreted, it was co-opted by computer science to mean some sort of data model. Unfortunately we now find ourselves in a world where we have relational database schemas, XML Schema Definitions (XSD), RDF-Schema (RDFS), JSON-Schema, schema.org, as well as informal use of the word to mean a data standard. There is also potential for use as a malapropism for the words 'scheme' and 'schematic [diagram]'. If used without qualification as to which of these is intended, the word on its own is hopelessly ambiguous. We have used data model when we mean the broad computer science sense of the word schema.

In summary

For real world things: use entity, characteristic, relationship and type or class of entity.

For metadata: use instance (data); and in RDF metadata: statement, subject, predicate, object, value, node, graph and edge.

For vocabularies and data models: use term, property and (metadata) class.

For application profiles (especially TAPs): use statement template, constraint, element and shape.

References

[DCTAP] Karen Coyle (ed.), DC Tabular Application Profiles (DC TAP) - Primer. DCMI working draft for comment. URL://www.voudr.com/groups/application_profiles_ig/dctap_primer/ (This reference should be updated when the final version is published. Current work can be found on the DCTAP GitHub repository.)

[RDF Primer] Guus Schreiber and Yves Raimond (eds.), RDF 1.1 Primer. W3C Working Group Note 24 June 2014. URL: https://www.w3.org/TR/rdf11-primer/