DCMI DCSV

DCMI DCSV: A syntax for representing simple structured data in a text string

Creator: Simon Cox
Creator: Renato Iannella
Contributor: Andy Powell
Contributor: Andrew Wilson
Contributor: Pete Johnston
Contributor: Thomas Baker
Date Issued: 2006-04-10
Identifier: //www.voudr.com/specifications/dublin-core/dcmi-dcsv/2006-04-10/
Replaces: //www.voudr.com/specifications/dublin-core/dcmi-dcsv/2000-07-28/
Is Replaced By: Not Applicable
Latest version: //www.voudr.com/specifications/dublin-core/dcmi-dcsv/
Status of document: This is a DCMI Recommendation.
Description of document: This document describes a method for recording simple structured data in a text string, or structured value string. This method is referred to for historical reasons as DCSV (which originally meant "Dublin Core™ Structured Value").
Revision note: 2006-04-10. After approval of the DCMI Abstract Model [DCAM] as a DCMI Recommendation in March 2005, the DCMI Usage Board undertook a review of the DCSV syntax specification and of the related specifications for the encoding schemes DCMI Box, DCMI Point, and DCMI Period, with the goal of revising their language for conformance with the Abstract Model. A summary of the changes made can be found in the document "Revision of DCSV specifications". As of 2005, the DCMI Abstract Model supports the construct "related description" as a method for describing value entities such as persons or, indeed, time periods or locations in space. The DCMI Usage Board encourages implementers to consider using related descriptions as an alternative to packaging descriptive information in DCSV-encoded strings. Descriptions based on the DCMI Abstract Model are more likely to be interoperable over the longer term than descriptions using DCSV-syntax-based specifications.

Table of Contents

  1. Introduction
  2. Structured Value Strings
  3. DCSV syntax encoding schemes
  4. Parsing DCSV
  5. Sample Code for parsing DCSV-encoded values
  6. Glossary
  7. Acknowledgements
  8. References

1. Introduction

It is often desirable to encode or _serialise_ simple structured data within a text string. Some generic methods are in common use. Borrowing conventions from natural languages, commas (,) and semi-colons (;) are frequently used as list separators. Similarly, comma-separated values (CSV) and tab-separated values (TSV) are common export formats from spreadsheet and database software, with _line feeds_ separating rows or tuples. Dots (.) and dashes (-) are sometimes used to imply hierarchies, particularly in thesaurus applications. The eXtensible Markup Language [XML] provides a more general solution, using tags contained within angle brackets (<, >) to indicate structure.

This document describes a particular method for encoding simple structured data within a value string. In the DCMI Abstract Model [DCAM], a value string is defined as "a simple string that represents the value of a property". Value strings encoded according to the method described in this document are referred to here as structured value strings.

(Note that for historical reasons, the method itself is still referred to here as the DCSV Syntax, or DCSV. "DCSV" originally stood for "Dublin Core™ Structured Value", a legacy concept from circa 1997 which no longer has a place in today's DCMI Abstract Model [DCAM].)

As of 2006, when this specification was revised, the DCMI Usage Board encourages implementors to describe the value resource of a property more fully, where necessary, by making it the subject of an additional description. In terms of the DCMI Abstract Model, this means creating a "related description" within the context of a "description set".

Using a related description in this way places all of the information in a description set within the context of the DCMI Abstract Model, helping to ensure that recipients of the metadata will be able to parse and understand it. In contrast, the use of DCSV-encoded strings for the description of the value resource forces recipients of the metadata to understand both the DCMI Abstract Model and the DCSV specification described here.

Despite these limitations, implementers may want to use DCSV-encoded strings in situations where the chosen encoding syntax used by the application does not support related descriptions (e.g., XHTML) or where there is a significant legacy adoption of DCSV-encoded structured value strings within a community.

2. Structured Value Strings

The DCSV Syntax allows a _structured value string_ to be parsed into a set of components. To represent this set of components, the syntax distinguishes two types of substring within the structured value string: componentLabels and componentValues. A componentLabel is the name of a _component_ within the structured data, and a _componentValue_ is the data itself.

Punctuation characters are used in encoding a structured value string as follows:

  • equal signs (=) separate the componentLabel from the componentValue;
  • semicolons (;) separate (optionally labelled) components within a list;
  • dots (.) may be used within componentLabels to indicate hierarchical or containment relationships between components.

The componentLabels and the componentValues themselves each consist of a text string. The intention is that the _componentLabel_ will be a word or code corresponding to the name of the component. The componentLabels may be absent, in which case the entire substring delimited by semicolons (;) or by the start or end of the string comprises a componentValue.

The following patterns show how structured information about a resource may be recorded in strings using DCSV:

  "u1; u2; u3"
  "cA=v1"
  "cA=v1; cB.part1=v2; cB.part2=v3"
  "cA=v1; u2; u3"

where

  • u1, u2 and u3 are componentValues of unlabelled components,
  • cA and cB are the componentLabels for components,
  • part1 and part2 are componentLabels for components that are sub-components of the component with the componentLabel cB, and
  • v1, v2, and v3 are componentValues of labelled components.
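
The dotted labels in the third pattern can be pictured as a small tree. The following Python sketch is illustrative only and is not part of the DCSV specification: the function name nest_components and the use of nested dictionaries are assumptions, and the input is expected to be already split into (componentLabel, componentValue) pairs. It also assumes that no label is used both as a leaf and as a parent of other labels.

```python
def nest_components(pairs):
    """Group (componentLabel, componentValue) pairs into nested dicts.

    Each dot in a label is treated as one level of containment.
    Unlabelled values (empty label) are collected in a separate list.
    Hypothetical helper: the specification does not prescribe a data model.
    """
    tree = {}
    unlabelled = []
    for label, value in pairs:
        if not label:
            unlabelled.append(value)
            continue
        # "cB.part1" -> parents ["cB"], leaf "part1"
        *parents, leaf = label.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree, unlabelled


# Pairs corresponding to the pattern "cA=v1; cB.part1=v2; cB.part2=v3"
tree, rest = nest_components(
    [("cA", "v1"), ("cB.part1", "v2"), ("cB.part2", "v3")]
)
print(tree)  # {'cA': 'v1', 'cB': {'part1': 'v2', 'part2': 'v3'}}
print(rest)  # []
```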

The use of specific punctuation characters in DCSV-encoded _value strings_ means that care must be exercised if these characters are to be used directly within strings which comprise the content (either componentLabels or _componentValues_) of the components. For DCSV, therefore, when an equal sign (=) or a semicolon (;) is required within the componentValue, the characters are escaped using a backslash, appearing as \= and \;. There should be no ambiguity regarding the dot, full-stop, or period (.) within strings: when it is part of a _componentLabel_, a dot indicates some hierarchy; when part of a componentValue, it has the conventional meaning for the context. This method of escaping special characters largely preserves readability and the ability to enter DCSV-encoded metadata value strings easily using a text editor if required. Software written to process DCSV-encoded value strings must make the necessary substitutions.
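
As an illustration of the escaping rule, the following Python sketch builds a structured value string from label-value pairs. It is a minimal sketch, not a reference encoder: the names escape_value and encode_dcsv are hypothetical. Escaping the backslash itself is an assumption that the text above does not state; it is made here so that the output can be read back by a parser that treats any backslash-prefixed character as literal, as the sample Perl code in this document does.

```python
def escape_value(text):
    """Prefix '=', ';' and (by assumption) '\\' with a backslash."""
    out = []
    for ch in text:
        if ch in "=;\\":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def encode_dcsv(pairs):
    """Join (componentLabel, componentValue) pairs into one DCSV string.

    An empty label produces an unlabelled component. Labels are written
    as given; this sketch assumes they contain no special characters.
    """
    parts = []
    for label, value in pairs:
        escaped = escape_value(value)
        parts.append(f"{label}={escaped}" if label else escaped)
    return "; ".join(parts)


print(encode_dcsv([("name", "Smith; Jones"), ("note", "x=1")]))
# name=Smith\; Jones; note=x\=1
```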

Note that DCSV is only intended to be used for relatively simple structured information about resources.

3. DCSV syntax encoding schemes

Where DCSV-encoded structured value strings are used, this should be indicated by using a syntax encoding scheme. For example, the DCMI-endorsed DCMI Period encoding scheme should be used as follows in XHTML:

  

Note that "DCSV" itself should not be used as a syntax encoding scheme. Implementors should use the DCSV specification to derive application-specific syntax encoding schemes where necessary.

Note also that as of 2006, for the reasons outlined above, it is unlikely that the DCMI Usage Board will endorse any new DCMI-maintained terms based on the DCSV specification.

4. Parsing DCSV

A simple method can be used to parse metadata value strings encoded according to the DCSV syntax. For a single DCSV-encoded value string:

  1. split the value string into a list of substrings on any unescaped semicolons (;);
    if no semicolon is present, there is a single substring;
  2. split each substring into a componentLabel-componentValue pair on any unescaped equal signs (=);
    if no equal sign (=) is present, the _componentLabel_ is empty;
  3. within each componentValue, replace the escaped characters with the actual character required.
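
The three steps above can be sketched in Python as follows. This is one possible reading under stated assumptions, not a normative parser: the names split_unescaped, unescape and parse_dcsv are hypothetical. Splitting only on the first unescaped equal sign, trimming surrounding whitespace, unescaping labels as well as values, and skipping empty substrings (such as the one after a trailing semicolon) are all choices made for this sketch.

```python
import re


def split_unescaped(text, sep, maxsplit=-1):
    """Split text on sep, ignoring any sep preceded by a backslash."""
    parts, current, splits, i = [], [], 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Keep the escape sequence intact for a later unescape pass
            current.append(text[i:i + 2])
            i += 2
            continue
        if ch == sep and (maxsplit < 0 or splits < maxsplit):
            parts.append("".join(current))
            current = []
            splits += 1
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def unescape(text):
    """Replace each backslash-escaped character with the character itself."""
    return re.sub(r"\\(.)", r"\1", text, flags=re.S)


def parse_dcsv(value_string):
    """Return a list of (componentLabel, componentValue) pairs."""
    components = []
    for chunk in split_unescaped(value_string, ";"):        # step 1
        if not chunk.strip():
            continue  # assumption: ignore empty substrings
        fields = split_unescaped(chunk, "=", maxsplit=1)    # step 2
        if len(fields) == 2:
            label, value = fields
        else:
            label, value = "", fields[0]
        components.append(
            (unescape(label.strip()), unescape(value.strip()))  # step 3
        )
    return components


print(parse_dcsv("cA=v1; u2; u3"))
# [('cA', 'v1'), ('', 'u2'), ('', 'u3')]
```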

5. Sample Code for parsing DCSV-encoded values

The following Perl program reads a DCSV-encoded string entered on stdin and prints a formatted version of the structured result. This code is provided for demonstration purposes only and contains no error-checking.

    #!/usr/local/bin/perl
    use strict;

    print "Enter string to be parsed:\n";
    # Read everything from stdin into a single string
    my $string = join('', <STDIN>);
    print "\nString to be parsed is [$string]\n";

    # First escape % characters
    $string =~ s/%/"%".unpack('C',"%")."%"/eg;
    # Next change \ escaped characters to %d% where d is the character's ascii code
    $string =~ s/\\(.)/"%".unpack('C',$1)."%"/eg;
    print "\nEscaped string is [$string]\n";

    # Now split the string into components
    my @components = split(/;/, $string);
    print "\nComponents:\n";
    foreach my $component (@components) {
      my ($label, $value) = split(/=/, $component, 2);
      # if there is no = copy contents of $label into $value and empty $label
      if (!defined $value) {
        $value = $label;
        $label = '';
      }
      # strip whitespace from label string
      $label =~ s/^\s*(\S+)\s*$/$1/;
      # convert % escaped characters back in label string
      $label =~ s/%(\d+)%/pack('C',$1)/eg;
      # convert % escaped characters back in value string
      $value =~ s/%(\d+)%/pack('C',$1)/eg;
      print "Component Label [$label] has Component Value [$value]\n";
    }

6. Glossary

This document uses the following terms:

component
One of a set of parts that comprise a structured value string.
componentLabel
A label given to a component.
componentValue
Data contained in the component.
structured value
Structured value is a term that was loosely employed in the DCMI context between 1997 and 2005 to designate a variety of multi-component entities used as "attribute values" in metadata. Strings encoded according to the syntax described in this specification were called "Dublin Core™ Structured Values", hence the acronym "DCSV".
structured value string
A value string that contains machine-parsable component parts (and which has an associated syntax encoding scheme indicating how the component parts are encoded within the string).
syntax encoding scheme
A syntax encoding scheme indicates that the value string is formatted in accordance with a formal notation, such as "2000-01-01" as the standard expression of a date.
value
A value is the physical or conceptual entity that is associated with a property when it is used to describe a resource.
value representation
A value representation is a surrogate for (i.e., a representation of) the value.
value string
A value string is a simple string that represents the value of a property.

7. Acknowledgements

John Kunze encouraged the original authors to write up their proposal formally, resulting in the first DCSV specification of July 2000. Kim Covil wrote the Perl code. Eric Miller nagged regarding overlap with XML. Steve Tolkin convinced the original authors to switch to =.

After approval of the DCMI Abstract Model [DCAM] as a DCMI Recommendation in March 2005, the DCMI Usage Board undertook a review of this DCSV syntax specification and of the related specifications for the encoding schemes DCMI Box, DCMI Point, and DCMI Period, with the goal of revising their language for conformance with the Abstract Model [REVIEW].

8. References

[DCAM]
A. Powell, M. Nilsson, A. Naeve, P. Johnston, 2005, DCMI Abstract Model
//www.voudr.com/specifications/dublin-core/abstract-model/.

[REVIEW]
DCMI Usage Board, Revision of DCSV specifications
//www.voudr.com/usage/decisions/2006/2006-01.DCSV-revisions.html.

[XML]
Extensible Markup Language (XML)
http://www.w3.org/XML/.