Dublin Core™ Tags Applied to XML-Data Schemas, for Description and Classification

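Creator: Andrew Layman
Date Issued: 1999-05-20
Latest Version: //www.voudr.com/specifications/dublin-core/dc-xml-data-schemas/
Release History: //www.voudr.com/specifications/dublin-core/dc-xml-data-schemas/release_history/
Description: The Dublin Core Metadata Element Set is a set of 15 elements designed by librarians for classifying and cataloging documents. The elements are quite general and are well suited to classifying and describing XML-Data schemas. This paper presents a schema based on the Dublin Core elements and gives guidance on its application in XML-Data schemas.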

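The Dublin Core™ Metadata Element Set is a set of 15 elements designed by librarians for classifying and cataloging documents. The elements are quite general and are well suited to classifying and describing XML-Data schemas. This paper presents a schema based on the Dublin Core™ elements and gives guidelines for applying it in XML-Data schemas.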

But first, an example. A trivial schema classified according to the elements described here might look like this:

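  <title>My trivial schema</title>
  <creator>
    <freeText>Andrew Layman</freeText>
    <personRef>mailto:[email protected]</personRef>
  </creator>
  <subject>
    <subjectRef>urn:electrocommerce-org/taxonomy/teapot</subjectRef>
    <keyword>teapot</keyword>
  </subject>
  <subject>
    <subjectRef>urn:electrocommerce-org/taxonomy/coffee</subjectRef>
    <keyword>coffee</keyword>
    <keyword>caff</keyword>
  </subject>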

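The Schema
This defines a small set of tags, each based on the corresponding generic Dublin Core™ element shown in Appendix A, but specialized here for the purpose of cataloging schemas.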

<!-- Schema for cataloging schemas, version 1, based on Dublin Core, generated 5/13/99 by AJL. -->

Schema description: This defines a small set of tags, each based on the corresponding generic Dublin Core™ element, but here specialized for the purpose of cataloging schemas. See http://purl.org/dc for more information on Dublin Core™.

freeText: Mixed text and markup. If markup, it must be well-formed. Attribute: xml:lang.

keyword: Keywords for classification, having human-language meaning, but not drawn from a controlled vocabulary identified by a URI. We recommend using only lowercase text. Attribute: xml:lang.

title: A descriptive title of this schema. Attribute: xml:lang.

creator: The person or organization primarily responsible for creating the intellectual content of this schema. Content: personReference, freeText.

subject: The topic of the schema. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the schema. The use of controlled vocabularies and formal classification schemes is encouraged.

description: A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.

publisher: The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.

contributor: A person or organization not specified in a Creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a Creator element (for example, editor, transcriber, and illustrator).

identifier: A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names, are also candidates for this element.

source: Information about a second resource from which the present resource is derived. While it is generally recommended that elements contain information about the present resource only, this element may contain a date, creator, format, identifier, or other metadata for the second resource when it is considered important for discovery of the present resource; recommended best practice is to use the Relation element instead. For example, it is possible to use a Source date of 1603 in a description of a 1996 film adaptation of a Shakespearean play, but it is preferred instead to use Relation "IsBasedOn" with a reference to a separate resource whose description contains a Date of 1603. Source is not applicable if the present resource is in its original form.

language: The language of the intellectual content of the resource. When used, the content of this field must coincide with RFC 1766 [Tags for the Identification of Languages, http://ds.internic.net/rfc/rfc1766.txt]; examples include en, de, es, fi, fr, ja, th, and zh.

rights: A rights management statement, an identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource.

Top-level container: A small set of tags, each based on the corresponding generic Dublin Core™ element, but here specialized for the purpose of cataloging schemas. See http://purl.org/dc for more information on Dublin Core™. Many tags may be repeated at this level, and they also allow multiple occurrences of their subelements. The intended usage is that distinct items (for example, distinct creators) should be expressed with separate elements, while alternative forms of reference to the same item (for example, several ways of referring to the same creator) should be expressed as alternate subelements.
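The language element above requires values that follow RFC 1766. As a minimal sketch, the Python below checks only the syntax that RFC 1766 defines (a primary tag and optional subtags, each 1 to 8 letters, joined by hyphens). The function name is hypothetical, and the check does not confirm that a tag is actually a registered language code.

```python
import re

# RFC 1766 grammar: Primary-tag *( "-" Subtag ), each part 1*8ALPHA.
_RFC1766_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")

def looks_like_rfc1766_tag(value):
    """Return True if value has the syntactic shape of an RFC 1766 tag."""
    return _RFC1766_TAG.fullmatch(value) is not None

# Tags named in the schema description, plus two malformed inputs.
checks = {tag: looks_like_rfc1766_tag(tag)
          for tag in ["en", "zh", "en-US", "", "toolongprimarytag"]}
print(checks)
```

A syntactic check like this is cheap to run at cataloging time; confirming that a code is actually registered would need a separate lookup table.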

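How to Use the Schema
The key to understanding how to use the schema is to first understand the role of several URI-based references, such as personReference, subjectReference, and resourceReference. These occur in elements whose content model in Dublin Core™ is very flexible. For example, in DC, the Creator element may contain free text, or it may refer to a specific company or individual through some well-known identification system. Controlled sets of names, such as D-U-N-S numbers, make good identifiers. We pair these with the Universal Resource Identifier specification and recommend that companies and organizations that control identifiers name their identifier sets with URIs, which allows us to use the datatype "URI" wherever a controlled identifier is needed.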

For example, suppose that Dun and Bradstreet gave every number they issue a URI beginning with "urn:www-dnb-com/dunsno". A creator element might then look like

<creator><personRef>urn:www-dnb-com/dunsno/123456789012345</personRef></creator>
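As a minimal sketch of how a consumer might tell the two kinds of creator content apart, the Python below parses one creator element and separates URI references from free text. The element names personRef and freeText follow this document; the helper name split_creator and the free-text value "Example Corporation" are hypothetical.

```python
import xml.etree.ElementTree as ET

def split_creator(xml_text):
    """Return (uri_references, free_text_names) for one creator element."""
    creator = ET.fromstring(xml_text)
    refs = [e.text.strip() for e in creator.findall("personRef") if e.text]
    names = [e.text.strip() for e in creator.findall("freeText") if e.text]
    return refs, names

# Two alternative ways of referring to the same creator.
sample = (
    "<creator>"
    "<personRef>urn:www-dnb-com/dunsno/123456789012345</personRef>"
    "<freeText>Example Corporation</freeText>"
    "</creator>"
)

refs, names = split_creator(sample)
print("controlled identifiers:", refs)
print("free text:", names)
```

Keeping the controlled identifier separate from the display name lets a catalog match creators on the URI while still showing readable text.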

Similarly, subject taxonomies will reasonably be defined by many authorities. Each should have a corresponding URI namespace, with usage similar to

<subject><subjectRef>urn:electrocommerce-org/taxonomy/teapot</subjectRef></subject>

Subject classification also allows keywords from uncontrolled vocabularies, so one might see the following:

<subject><subjectRef>urn:electrocommerce-org/taxonomy/teapot</subjectRef><keyword>teapot</keyword></subject>
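The following Python is a small sketch of reading such a subject element. It collects the controlled references and the uncontrolled keywords, and flags keywords that are not lowercase, since the schema recommends lowercase keyword text. The function name read_subject and the second keyword in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

def read_subject(xml_text):
    """Return (subject_refs, keywords, keywords_not_lowercase)."""
    subject = ET.fromstring(xml_text)
    refs = [e.text.strip() for e in subject.findall("subjectRef") if e.text]
    keywords = [e.text.strip() for e in subject.findall("keyword") if e.text]
    # The schema only recommends lowercase, so report rather than reject.
    not_lower = [k for k in keywords if k != k.lower()]
    return refs, keywords, not_lower

sample = (
    "<subject>"
    "<subjectRef>urn:electrocommerce-org/taxonomy/teapot</subjectRef>"
    "<keyword>teapot</keyword>"
    "<keyword>Tea</keyword>"
    "</subject>"
)

subject_refs, keywords, flagged = read_subject(sample)
print(subject_refs, keywords, flagged)
```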

A trivial schema classified according to the elements described here might look like this:

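  <title>My trivial schema</title>
  <creator>
    <freeText>Andrew Layman</freeText>
    <personRef>mailto:[email protected]</personRef>
  </creator>
  <subject>
    <subjectRef>urn:electrocommerce-org/taxonomy/teapot</subjectRef>
    <keyword>teapot</keyword>
  </subject>
  <subject>
    <subjectRef>urn:electrocommerce-org/taxonomy/coffee</subjectRef>
    <keyword>coffee</keyword>
    <keyword>caff</keyword>
  </subject>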
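To illustrate the intended usage (distinct items in separate elements, alternative references to one item as sibling subelements), here is a hedged Python sketch that reads a catalog entry of this shape into a plain dictionary. The wrapper element name catalog, the placeholder address someone@example.org, and the helper names are assumptions made for this example only; the other element names follow this document.

```python
import xml.etree.ElementTree as ET

# "catalog" is a hypothetical wrapper so the fragment has a single root.
SAMPLE = """
<catalog>
  <title>My trivial schema</title>
  <creator>
    <freeText>Andrew Layman</freeText>
    <personRef>mailto:someone@example.org</personRef>
  </creator>
  <subject>
    <subjectRef>urn:electrocommerce-org/taxonomy/teapot</subjectRef>
    <keyword>teapot</keyword>
  </subject>
  <subject>
    <subjectRef>urn:electrocommerce-org/taxonomy/coffee</subjectRef>
    <keyword>coffee</keyword>
    <keyword>caff</keyword>
  </subject>
</catalog>
"""

def texts(parent, tag):
    """Collect the stripped text of every direct child named tag."""
    return [e.text.strip() for e in parent.findall(tag) if e.text]

def read_catalog(xml_text):
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext("title"),
        # One entry per creator element: each is a distinct creator.
        "creators": [
            {"names": texts(c, "freeText"), "refs": texts(c, "personRef")}
            for c in root.findall("creator")
        ],
        # One entry per subject element: each is a distinct subject.
        "subjects": [
            {"refs": texts(s, "subjectRef"), "keywords": texts(s, "keyword")}
            for s in root.findall("subject")
        ],
    }

record = read_catalog(SAMPLE)
print(record["title"], len(record["creators"]), len(record["subjects"]))
```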

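Appendix A: The Generic Dublin Core™ Element Set
This defines the fifteen elements in a completely unrestricted fashion. Each element may contain any content.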

<!-- Schema for Dublin Core, generated 5/13/99 4:03:15 PM by AJL. -->

Schema description: Dublin Core™ is a simple metadata element set intended to facilitate discovery of electronic resources.

Title: The name given to the resource, usually by the Creator or Publisher.

Creator: The person or organization primarily responsible for creating the intellectual content of the resource. For example, authors in the case of written documents; artists, photographers, or illustrators in the case of visual resources.

Subject: The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource. The use of controlled vocabularies and formal classification schemes is encouraged.

Description: A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.

Publisher: The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.

Contributor: A person or organization not specified in a Creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a Creator element (for example, editor, transcriber, and illustrator).

Date: A date associated with the creation or availability of the resource. Such a date is not to be confused with one belonging in the Coverage element, which would be associated with the resource only insofar as the intellectual content is somehow about that date. Recommended best practice is defined in a profile of ISO 8601 [Date and Time Formats (based on ISO8601), W3C Technical Note, http://www.w3.org/TR/NOTE-datetime] that includes (among others) dates of the forms YYYY and YYYY-MM-DD. In this scheme, for example, the date 1994-11-05 corresponds to November 5, 1994.

Type: The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. For the sake of interoperability, Type should be selected from an enumerated list that is currently under development in the workshop series.

Format: The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource. For the sake of interoperability, Format should be selected from an enumerated list that is currently under development in the workshop series.

Identifier: A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names, are also candidates for this element.

Source: Information about a second resource from which the present resource is derived. While it is generally recommended that elements contain information about the present resource only, this element may contain a date, creator, format, identifier, or other metadata for the second resource when it is considered important for discovery of the present resource; recommended best practice is to use the Relation element instead. For example, it is possible to use a Source date of 1603 in a description of a 1996 film adaptation of a Shakespearean play, but it is preferred instead to use Relation "IsBasedOn" with a reference to a separate resource whose description contains a Date of 1603. Source is not applicable if the present resource is in its original form.

Language: The language of the intellectual content of the resource. Where practical, the content of this field should coincide with RFC 1766 [Tags for the Identification of Languages, http://ds.internic.net/rfc/rfc1766.txt]; examples include en, de, es, fi, fr, ja, th, and zh.

Relation: An identifier of a second resource and its relationship to the present resource. This element permits links between related resources and resource descriptions to be indicated. Examples include an edition of a work (IsVersionOf), a translation of a work (IsBasedOn), a chapter of a book (IsPartOf), and a mechanical transformation of a dataset into an image (IsFormatOf). For the sake of interoperability, relationships should be selected from an enumerated list that is currently under development in the workshop series.

Coverage: The spatial or temporal characteristics of the intellectual content of the resource. Spatial coverage refers to a physical region (e.g., celestial sector); use coordinates (e.g., longitude and latitude) or place names that are from a controlled list or are fully spelled out. Temporal coverage refers to what the resource is about rather than when it was created or made available (the latter belonging in the Date element); use the same date/time format (often a range) [Date and Time Formats (based on ISO8601), W3C Technical Note, http://www.w3.org/TR/NOTE-datetime] as recommended for the Date element, or time periods that are from a controlled list or are fully spelled out.

Rights: A rights management statement, an identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource.
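The Date element above recommends a profile of ISO 8601 that includes the forms YYYY and YYYY-MM-DD. The Python below is a sketch of a check for just those two forms. The function name is hypothetical, and the other forms the profile allows (for example, those with times) are deliberately not handled.

```python
import re
from datetime import date

def is_simple_dc_date(value):
    """Return True for YYYY, or for YYYY-MM-DD naming a real calendar day."""
    if re.fullmatch(r"[0-9]{4}", value):
        return True
    match = re.fullmatch(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True

for candidate in ["1994-11-05", "1999", "1994-13-05", "5/13/99"]:
    print(candidate, is_simple_dc_date(candidate))
```

The schema comments in this document carry dates such as 5/13/99, which this check rejects; under the recommended profile that date would be written 1999-05-13.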