DCMI DCSV

Creators: Simon Cox
Renato Iannella
Contributors: Andy Powell
Andrew Wilson
Pete Johnston
Tom Baker
Date Issued: 2006-04-10
Latest Version: https://dublincore.org/specifications/dublin-core/dcmi-dcsv/
Release History: https://dublincore.org/specifications/dublin-core/dcmi-dcsv/release_history/
Description: This document describes a method for recording simple structured data in a text string, or structured value string. This method is referred to for historical reasons as DCSV (which originally meant "Dublin Core Structured Value").

Table of Contents

  1. Introduction
  2. Structured Value Strings
  3. DCSV syntax encoding schemes
  4. Parsing DCSV
  5. Sample Code for parsing DCSV-encoded values
  6. Glossary
  7. Acknowledgements
  8. References

1. Introduction

It is often desirable to encode or _serialise_ simple structured data within a text string. Some generic methods are in common use. Borrowing conventions from natural languages, commas (,) and semi-colons (;) are frequently used as list separators. Similarly, comma-separated values (CSV) and tab-separated values (TSV) are common export formats from spreadsheet and database software, with _line feeds_ separating rows or tuples. Dots (.) and dashes (-) are sometimes used to imply hierarchies, particularly in thesaurus applications. The eXtensible Markup Language [XML] provides a more general solution, using tags contained within angle brackets (<, >) to indicate structure.

This document describes a particular method for encoding simple structured data within a _value string_. In the DCMI Abstract Model [DCAM], a _value string_ is defined as "a simple string that represents the value of a property". _Value strings_ encoded according to the method described in this document are referred to here as _structured value strings_.

(Note that for historical reasons, the method itself is still referred to here as the DCSV Syntax, or DCSV. "DCSV" originally stood for "Dublin Core™ Structured Value", a legacy concept from circa 1997 which no longer has a place in today's DCMI Abstract Model [DCAM].)

As of 2006, when this specification was revised, the DCMI Usage Board encourages implementors to describe the value resource of a property more fully, where necessary, by making it the subject of an additional description. In terms of the DCMI Abstract Model, this means creating a "related description" within the context of a "description set".

Using a related description in this way places all of the information in a description set within the context of the DCMI Abstract Model, helping to ensure that recipients of the metadata will be able to parse and understand it. In contrast, the use of DCSV-encoded strings for the description of the value resource forces recipients of the metadata to understand both the DCMI Abstract Model and the DCSV specification described here.

Despite these limitations, implementors may want to use DCSV-encoded strings in situations where the encoding syntax chosen by the application does not support related descriptions (e.g., XHTML) or where there is a significant legacy adoption of DCSV-encoded structured value strings within a community.

2. Structured Value Strings

The DCSV Syntax allows a _structured value string_ to be parsed into a set of _components_. To represent this set of _components_, the syntax distinguishes two types of substring within the _structured value string_: _componentLabels_ and _componentValues_. A _componentLabel_ is the name of a _component_ within the structured data, and a _componentValue_ is the data itself.

Punctuation characters are used in encoding a _structured value string_ as follows:

  • equal signs (=) separate the _componentLabel_ from the _componentValue_;
  • semicolons (;) separate (optionally labelled) _components_ within a list;
  • dots (.) may be used within _componentLabels_ to indicate hierarchical or containment relationships between _components_.

The _componentLabels_ and the _componentValues_ themselves each consist of a text string. The intention is that the _componentLabel_ will be a word or code corresponding to the name of the _component_. The _componentLabels_ may be absent, in which case the entire substring delimited by semi-colons (;) or the end of the string comprises a _componentValue_.

The following patterns show how structured information about a resource may be recorded in strings using DCSV:

    "u1; u2; u3"
    "cA=v1"
    "cA=v1; cB.part1=v2; cB.part2=v3"
    "cA=v1; u2; u3"

where

  • u1, u2 and u3 are _componentValues_ of unlabelled _components_,
  • cA and cB are the _componentLabels_ for _components_,
  • part1 and part2 are _componentLabels_ for _components_ that are sub-components of the _component_ with the _componentLabel_ cB, and
  • v1, v2, and v3 are _componentValues_ of labelled _components_.
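
One way to picture the hierarchy implied by dotted _componentLabels_ is to fold label/value pairs that have already been separated into a nested mapping. The sketch below is illustrative only: the function name `nest_components` and the choice of a Python dict are assumptions rather than part of the DCSV syntax, and it assumes that no label is used both as a leaf and as a parent.

```python
def nest_components(pairs):
    """Fold (label, value) pairs into a nested dict keyed by dotted label parts.

    `pairs` is a list of (componentLabel, componentValue) tuples; an empty
    label marks an unlabelled component. Returns (tree, unlabelled).
    Assumption: no label acts as both a leaf and a parent (e.g. "cB" and "cB.part1").
    """
    tree = {}
    unlabelled = []
    for label, value in pairs:
        if not label:
            unlabelled.append(value)
            continue
        parts = label.split(".")
        node = tree
        # Walk (or create) one dict level per label part before the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return tree, unlabelled


# Pairs corresponding to the pattern "cA=v1; cB.part1=v2; cB.part2=v3"
tree, unlabelled = nest_components(
    [("cA", "v1"), ("cB.part1", "v2"), ("cB.part2", "v3")]
)
print(tree)  # prints {'cA': 'v1', 'cB': {'part1': 'v2', 'part2': 'v3'}}
```

Whether an application actually needs such a nested structure, or simply keeps the dotted labels as flat keys, is a design choice left to the implementor.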

The use of specific punctuation characters in DCSV-encoded _value strings_ means that care must be exercised if these characters are to be used directly within strings which comprise the content (either _componentLabels_ or _componentValues_) of the _components_. For DCSV, therefore, when an equal sign (=) or a semicolon (;) is required within the _componentValue_, it is escaped using a backslash character (\), appearing as \= or \;. There should be no ambiguity regarding the dot, full-stop, or period (.) within strings: when it is part of a _componentLabel_, a dot indicates some hierarchy; when part of a _componentValue_, it has the conventional meaning for the context. This method of escaping special characters largely preserves readability and the ability to enter DCSV-encoded metadata _value strings_ easily using a text editor if required. Software written to process DCSV-encoded _value strings_ must make the necessary substitutions.
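
The escaping rule can be sketched on the writing side as follows. This is a minimal illustration, not a reference implementation: the function names `escape_value` and `encode_dcsv` are hypothetical, and escaping the backslash itself is an assumption that goes beyond the text, made so that escaped strings can be read back unambiguously.

```python
def escape_value(text):
    """Backslash-escape characters that are special inside a componentValue."""
    # Assumption: a literal backslash is doubled first so it cannot be
    # mistaken for the start of an escape when the string is parsed.
    text = text.replace("\\", "\\\\")
    return text.replace("=", "\\=").replace(";", "\\;")


def encode_dcsv(pairs):
    """Join (label, value) pairs into one string; an empty label means unlabelled."""
    parts = []
    for label, value in pairs:
        escaped = escape_value(value)
        parts.append(f"{label}={escaped}" if label else escaped)
    return "; ".join(parts)


print(encode_dcsv([("name", "a=b; c"), ("", "plain")]))
# prints: name=a\=b\; c; plain
```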

Note that DCSV is only intended to be used for relatively simple structured information about resources.

3. DCSV syntax encoding schemes

Where DCSV-encoded structured value strings are used, this should be indicated by using a syntax encoding scheme. For example, the DCMI-endorsed DCMI Period encoding scheme should be used as follows in XHTML:

  

Note that "DCSV" itself should not be used as a syntax encoding scheme. Implementors should use the DCSV specification to derive application-specific syntax encoding schemes where necessary.
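
As a rough illustration of what deriving an application-specific scheme might involve, the sketch below treats a scheme as a set of permitted _componentLabels_ and checks label/value pairs against it. Everything here is hypothetical: the scheme name `ExampleOpeningHours`, its labels, the function `check_components`, and the rule that labels may not repeat are invented for the example and are not DCMI-maintained terms.

```python
# Hypothetical application-specific scheme; names and labels are invented
# for illustration only.
EXAMPLE_SCHEME = {
    "name": "ExampleOpeningHours",
    "allowed_labels": {"day", "opens", "closes"},
    "required_labels": {"day"},
}


def check_components(pairs, scheme):
    """Return a list of problems found in (label, value) pairs for a scheme."""
    problems = []
    seen = set()
    for label, _value in pairs:
        if label not in scheme["allowed_labels"]:
            problems.append(f"unexpected componentLabel: {label!r}")
        elif label in seen:
            # Assumption: this example scheme forbids repeated labels.
            problems.append(f"repeated componentLabel: {label!r}")
        seen.add(label)
    for label in sorted(scheme["required_labels"] - seen):
        problems.append(f"missing componentLabel: {label!r}")
    return problems


print(check_components([("day", "Mon"), ("opens", "09:00")], EXAMPLE_SCHEME))
# prints: []
```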

Note also that as of 2006, for the reasons outlined above, it is unlikely that the DCMI Usage Board will endorse any new DCMI-maintained terms based on the DCSV specification.

4. Parsing DCSV

A simple method can be used to parse metadata _value strings_ encoded according to the DCSV syntax. For a single DCSV-encoded _value string_:

  1. split the _value string_ into a list of substrings on any unescaped semicolons (;);
    if no semicolon is present, there is a single substring;
  2. split each substring into a _componentLabel_-_componentValue_ pair on any unescaped equal signs (=);
    if no equal sign (=) is present, the _componentLabel_ is empty;
  3. within each _componentValue_, replace the escaped characters with the actual character required.
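
The three steps above can be sketched in Python as follows. This is a hedged illustration, not a normative parser: the names `split_unescaped`, `unescape` and `parse_dcsv` are hypothetical, and several behaviours are assumptions beyond the text (splitting only on the first unescaped equal sign, trimming surrounding whitespace, skipping empty substrings such as the one after a trailing semicolon, and treating a backslash as escaping whatever character follows it).

```python
import re


def split_unescaped(text, delimiter, limit=None):
    """Split `text` on `delimiter` wherever it is not escaped by a backslash."""
    pieces, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Keep the escape pair intact; it is resolved later by unescape().
            current.append(text[i:i + 2])
            i += 2
            continue
        if ch == delimiter and (limit is None or len(pieces) < limit):
            pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    pieces.append("".join(current))
    return pieces


def unescape(text):
    """Replace each backslash-escaped character with the character itself."""
    return re.sub(r"\\(.)", r"\1", text, flags=re.DOTALL)


def parse_dcsv(value_string):
    """Return a list of (componentLabel, componentValue) pairs."""
    components = []
    # Step 1: split on unescaped semicolons.
    for substring in split_unescaped(value_string, ";"):
        if not substring.strip():
            continue  # assumption: ignore empty substrings
        # Step 2: split on the first unescaped equal sign, if any.
        parts = split_unescaped(substring, "=", limit=1)
        if len(parts) == 2:
            label, value = parts
        else:
            label, value = "", parts[0]
        # Step 3: resolve escapes in the value (whitespace trimming is an assumption).
        components.append((label.strip(), unescape(value.strip())))
    return components


print(parse_dcsv("cA=v1; cB.part1=v2; cB.part2=v3"))
# prints: [('cA', 'v1'), ('cB.part1', 'v2'), ('cB.part2', 'v3')]
```

A character-by-character scan is used for the splitting rather than a regular-expression lookbehind so that an escaped backslash followed by a real delimiter is still handled correctly.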

5. Sample Code for parsing DCSV-encoded values

The following Perl program reads a DCSV-encoded string entered on stdin and prints a formatted version of the structured result. This code is provided for demonstration purposes only and contains no error-checking.

    #!/usr/local/bin/perl
    use strict;

    print "Enter string to be parsed:\n";
    # Read everything from stdin and drop the trailing newline
    my $string = join('', <STDIN>);
    chomp $string;
    print "\nString to be parsed is [$string]\n";

    # First escape % characters
    $string =~ s/%/"%".unpack('C',"%")."%"/eg;
    # Next change \ escaped characters to %d% where d is the character's ASCII code
    $string =~ s/\\(.)/"%".unpack('C',$1)."%"/eg;
    print "\nEscaped string is [$string]\n";

    # Now split the string into components
    my @components = split(/;/, $string);
    print "\nComponents:\n";
    foreach my $component (@components) {
        my ($label, $value) = split(/=/, $component, 2);
        # if there is no = copy contents of $label into $value and empty $label
        if (!defined $value) {
            $value = $label;
            $label = '';
        }
        # strip leading and trailing whitespace from label string
        $label =~ s/^\s+|\s+$//g;
        # convert % escaped characters back in label string
        $label =~ s/%(\d+)%/pack('C',$1)/eg;
        # convert % escaped characters back in value string
        $value =~ s/%(\d+)%/pack('C',$1)/eg;
        print "Component Label [$label] has Component Value [$value]\n";
    }

6. Glossary

This document uses the following terms:

component
One of a set of parts that comprise a _structured value string_.
componentLabel
A label given to a _component_.
componentValue
Data contained in the _component_.
structured value
_Structured value_ is a term that was loosely employed in the DCMI context between 1997 and 2005 to designate a variety of multi-component entities used as "attribute values" in metadata. Strings encoded according to the syntax described in this specification were called "Dublin Core™ Structured Values", hence the acronym "DCSV".
structured value string
A _value string_ that contains machine-parsable component parts (and which has an associated _syntax encoding scheme_ indicating how the component parts are encoded within the string).
syntax encoding scheme
A _syntax encoding scheme_ indicates that the _value string_ is formatted in accordance with a formal notation, such as "2000-01-01" as the standard expression of a date.
value
A _value_ is the physical or conceptual entity that is associated with a _property_ when it is used to describe a _resource_.
value representation
A _value representation_ is a surrogate for (i.e., a representation of) the _value_.
value string
A _value string_ is a simple string that represents the _value_ of a _property_.

7. Acknowledgements

John Kunze encouraged the original authors to write up their proposal formally, resulting in the first DCSV specification of July 2000. Kim Covil wrote the Perl code. Eric Miller nagged regarding overlap with XML. Steve Tolkin convinced the original authors to switch to =.

After approval of the DCMI Abstract Model [DCAM] as a DCMI Recommendation in March 2005, the DCMI Usage Board undertook a review of this DCSV syntax specification and of the related specifications for the encoding schemes DCMI Box, DCMI Point, and DCMI Period, with the goal of revising their language for conformance with the Abstract Model [REVIEW].

8. References

[DCAM]
A. Powell, M. Nilsson, A. Naeve, P. Johnston, 2005, DCMI Abstract Model
http://dublincore.org/specifications/dublin-core/abstract-model/.

[REVIEW]
DCMI Usage Board, Revision of DCSV specifications
http://dublincore.org/usage/decisions/2006/2006-01.DCSV-revisions.html.

[XML]
Extensible Markup Language
http://www.w3.org/XML/.