DICOM PS3.5 2018d - Data Structures and Encoding

6.2 Value Representation (VR)

The Value Representation of a Data Element describes the data type and format of that Data Element's Value(s). PS3.6 lists the VR of each Data Element by Data Element Tag.

Values with VRs constructed of character strings, except in the case of the VR UI, shall be padded with SPACE characters (20H, in the Default Character Repertoire) when necessary to achieve even length. Values with a VR of UI shall be padded with a single trailing NULL (00H) character when necessary to achieve even length. Values with a VR of OB shall be padded with a single trailing NULL byte value (00H) when necessary to achieve even length.
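The padding rules above can be illustrated with a minimal sketch. The function name `pad_to_even` and the `STRING_VRS` set are hypothetical names introduced here for illustration; the set is assumed from the character string VRs listed in Table 6.2-1.

```python
# Character string VRs from Table 6.2-1, other than UI (assumed grouping).
STRING_VRS = {"AE", "AS", "CS", "DA", "DS", "DT", "IS", "LO", "LT",
              "PN", "SH", "ST", "TM", "UC", "UR", "UT"}


def pad_to_even(vr: str, value: bytes) -> bytes:
    """Return the encoded value, padded to even length as its VR requires."""
    if len(value) % 2 == 0:
        return value                  # already even, nothing to add
    if vr in ("UI", "OB"):
        return value + b"\x00"        # single trailing NULL (00H)
    if vr in STRING_VRS:
        return value + b"\x20"        # single trailing SPACE (20H)
    raise ValueError(f"no padding rule stated in this section for VR {vr!r}")
```

For example, `pad_to_even("UI", b"1.2.3")` appends one NULL byte, while `pad_to_even("LO", b"abc")` appends one SPACE.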

All new VRs defined in future versions of DICOM shall be of the same Data Element Structure as defined in Section 7.1.2 (i.e., following the format for VRs such as OB, OD, OF, OL, OW, SQ and UN).

Note

  1. Since all new VRs will be defined as specified in Section 7.1.2, an implementation may choose to ignore VRs not recognized by applying the rules stated in Section 7.1.2.

  2. When converting a Data Set from an Explicit VR Transfer Syntax to a different Transfer Syntax, an implementation may copy Data Elements with unrecognized VRs in the following manner:

    • If the endianness of the Transfer Syntaxes is the same, the Value of the Data Element may be copied unchanged and if the target Transfer Syntax is Explicit VR, the VR bytes copied unchanged. In practice this only applies to Little Endian Transfer Syntaxes, since there was only one Big Endian Transfer Syntax defined.

    • If the source Transfer Syntax is Little Endian and the target Transfer Syntax is the (retired) Big Endian Explicit VR Transfer Syntax, then the Value of the Data Element may be copied unchanged and the VR changed to UN, since being unrecognized, whether or not byte swapping is required is unknown. If the VR were copied unchanged, the byte order of the value might or might not be incorrect.

    • If the source Transfer Syntax is the (retired) Big Endian Explicit VR Transfer Syntax, then the Data Element cannot be copied, because whether or not byte swapping is required is unknown, and there is no equivalent of the UN VR to use when the value is big endian rather than little endian.

    The issues of whether or not the element may be copied, and what VR to use if copying, do not arise when converting a Data Set from Implicit VR Little Endian Transfer Syntax, since the VR would not be present to be unrecognized, and if the data element VR is not known from a data dictionary, then UN would be used.
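The three cases in Note 2 can be summarized as a small decision function. This is only a sketch of the rules as stated in the note; the function name, its parameters and the returned tuple are hypothetical, and "XX" in the usage note stands for any VR the implementation does not recognize.

```python
from typing import Optional, Tuple


def plan_copy_unrecognized(src_little_endian: bool,
                           dst_little_endian: bool,
                           dst_explicit_vr: bool,
                           vr: str) -> Tuple[bool, Optional[str]]:
    """Decide how to copy a Data Element whose Explicit VR is unrecognized.

    Returns (may_copy_value, vr_to_write); vr_to_write is None when the
    element cannot be copied or the target is Implicit VR.
    """
    if src_little_endian == dst_little_endian:
        # Same endianness: value copied unchanged, VR kept if target is explicit.
        return True, (vr if dst_explicit_vr else None)
    if src_little_endian:
        # Little Endian to (retired) Big Endian Explicit VR: copy value, VR -> UN.
        return True, "UN"
    # Source is Big Endian: whether byte swapping is needed is unknown.
    return False, None
```

With these assumptions, `plan_copy_unrecognized(True, False, True, "XX")` yields `(True, "UN")`, and a Big Endian source converted to a Little Endian target yields `(False, None)`.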

An individual Value, including padding, shall not exceed the Length of Value, except in the case of the last Value of a multi-valued field as specified in Section 6.4.

Note

The lengths of Value Representations for which the Character Repertoire can be extended or replaced are expressly specified in characters rather than bytes in Table 6.2-1. This is because the mapping from a character to the number of bytes used for that character's encoding may be dependent on the character set used.

Escape Sequences used for Code Extension shall not be included in the count of characters.

Table 6.2-1. DICOM Value Representations

VR Name

Definition

Character Repertoire

Length of Value

AE

Application Entity

A string of characters that identifies an Application Entity with leading and trailing spaces (20H) being non-significant. A value consisting solely of spaces shall not be used.

Default Character Repertoire excluding character code 5CH (the BACKSLASH "\" in ISO-IR 6), and all control characters.

16 bytes maximum

AS

Age String

A string of characters with one of the following formats -- nnnD, nnnW, nnnM, nnnY; where nnn shall contain the number of days for D, weeks for W, months for M, or years for Y.

Example: "018M" would represent an age of 18 months.

"0"-"9", "D", "W", "M", "Y" of Default Character Repertoire

4 bytes fixed
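As an informal illustration of the AS format, the following sketch splits a value into its count and unit. The function name and the unit labels are hypothetical and are not part of the Standard.

```python
import re

# Three digits followed by one of D, W, M, Y (4 bytes fixed).
_AS_PATTERN = re.compile(r"([0-9]{3})([DWMY])")
_AS_UNITS = {"D": "days", "W": "weeks", "M": "months", "Y": "years"}


def parse_age_string(value: str):
    """Return (count, unit) for an AS value such as '018M'."""
    match = _AS_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid AS value: {value!r}")
    return int(match.group(1)), _AS_UNITS[match.group(2)]
```

The example value from the definition, "018M", parses to a count of 18 with unit "months".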

AT

Attribute Tag

Ordered pair of 16-bit unsigned integers that is the value of a Data Element Tag.

Example: A Data Element Tag of (0018,00FF) would be encoded as a series of 4 bytes in a Little-Endian Transfer Syntax as 18H,00H,FFH,00H.

Note

The encoding of an AT value is exactly the same as the encoding of a Data Element Tag as defined in Section 7.

not applicable

4 bytes fixed
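The AT example above can be reproduced with a short sketch using Python's standard `struct` module. The function name is hypothetical; only the Little Endian case described in the example is shown.

```python
import struct


def encode_at_little_endian(group: int, element: int) -> bytes:
    """Encode an Attribute Tag as two 16-bit unsigned integers, Little Endian."""
    return struct.pack("<HH", group, element)


# (0018,00FF) is encoded as the bytes 18H, 00H, FFH, 00H.
encoded = encode_at_little_endian(0x0018, 0x00FF)
```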

CS

Code String

A string of characters identifying a controlled concept. Leading or trailing spaces (20H) are not significant.

Uppercase characters, "0"-"9", the SPACE character, and underscore "_", of the Default Character Repertoire

16 bytes maximum

DA

Date

A string of characters of the format YYYYMMDD; where YYYY shall contain year, MM shall contain the month, and DD shall contain the day, interpreted as a date of the Gregorian calendar system.

Example:

  • "19930822" would represent August 22, 1993.

Note

  1. The ACR-NEMA Standard 300 (predecessor to DICOM) supported a string of characters of the format YYYY.MM.DD for this VR. Use of this format is not compliant.

  2. See also DT VR in this table.

  3. Dates before year 1582, e.g., used for dating historical or archeological items, are interpreted as proleptic Gregorian calendar dates, unless otherwise specified.

"0"-"9" of Default Character Repertoire

In the context of a Query with range matching (see PS3.4), the character "-" is allowed, and a trailing SPACE character is allowed for padding.

8 bytes fixed

In the context of a Query with range matching (see PS3.4), the length is 18 bytes maximum.
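A minimal sketch of reading a single DA value follows. It assumes a plain (non-query) value with no range matching and no padding; the function name is hypothetical. Python's `datetime.date` uses the proleptic Gregorian calendar, which matches Note 3, although it cannot represent year 0000.

```python
from datetime import date


def parse_da(value: str) -> date:
    """Parse a DA value of the fixed form YYYYMMDD."""
    if len(value) != 8 or not value.isascii() or not value.isdigit():
        raise ValueError(f"not a valid DA value: {value!r}")
    # date() itself rejects impossible month and day numbers.
    return date(int(value[0:4]), int(value[4:6]), int(value[6:8]))
```

The retired ACR-NEMA form "1993.08.22" is rejected by the length and digit checks.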

DS

Decimal String

A string of characters representing either a fixed point number or a floating point number. A fixed point number shall contain only the characters 0-9 with an optional leading "+" or "-" and an optional "." to mark the decimal point. A floating point number shall be conveyed as defined in ANSI X3.9, with an "E" or "e" to indicate the start of the exponent. Decimal Strings may be padded with leading or trailing spaces. Embedded spaces are not allowed.

Note

Data Elements with multiple values using this VR may not be properly encoded if Explicit-VR Transfer Syntax is used and the VL of this attribute exceeds 65534 bytes.

"0"-"9", "+", "-", "E", "e", "." and the SPACE character of Default Character Repertoire

16 bytes maximum
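The DS syntax can be approximated with a regular expression. The pattern below is an assumption about the fixed point and ANSI X3.9 style floating point forms described above, not a normative grammar, and the function name is hypothetical.

```python
import re
from decimal import Decimal

# Optional padding, optional sign, digits with an optional ".", optional exponent.
_DS_PATTERN = re.compile(
    r" *[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[Ee][+-]?[0-9]+)? *"
)


def parse_ds(value: str) -> Decimal:
    """Parse one DS value (16 bytes maximum, no embedded spaces)."""
    if len(value) > 16 or _DS_PATTERN.fullmatch(value) is None:
        raise ValueError(f"not a valid DS value: {value!r}")
    return Decimal(value.strip(" "))
```

Returning a `Decimal` rather than a `float` is a design choice of this sketch, made so that the digits written in the string are preserved exactly.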

DT

Date Time

A concatenated date-time character string in the format:

YYYYMMDDHHMMSS.FFFFFF&ZZXX

The components of this string, from left to right, are YYYY = Year, MM = Month, DD = Day, HH = Hour (range "00" - "23"), MM = Minute (range "00" - "59"), SS = Second (range "00" - "60").

FFFFFF = Fractional Second contains a fractional part of a second as small as 1 millionth of a second (range "000000" - "999999").

&ZZXX is an optional suffix for offset from Coordinated Universal Time (UTC), where & = "+" or "-", and ZZ = Hours and XX = Minutes of offset.

The year, month, and day shall be interpreted as a date of the Gregorian calendar system.

A 24-hour clock is used. Midnight shall be represented by only "0000" since "2400" would violate the hour range.

The Fractional Second component, if present, shall contain 1 to 6 digits. If Fractional Second is unspecified the preceding "." shall not be included. The offset suffix, if present, shall contain 4 digits. The string may be padded with trailing SPACE characters. Leading and embedded spaces are not allowed.

A component that is omitted from the string is termed a null component. Trailing null components of Date Time indicate that the value is not precise to the precision of those components. The YYYY component shall not be null. Non-trailing null components are prohibited. The optional suffix is not considered as a component.

A Date Time value without the optional suffix is interpreted to be in the local time zone of the application creating the Data Element, unless explicitly specified by the Timezone Offset From UTC (0008,0201).

UTC offsets are calculated as "local time minus UTC". The offset for a Date Time value in UTC shall be +0000.

Note

  1. The range of the offset is -1200 to +1400. The offset for United States Eastern Standard Time is -0500. The offset for Japan Standard Time is +0900.

  2. The RFC 2822 use of -0000 as an offset to indicate local time is not allowed.

  3. A Date Time value of 195308 means August 1953, not specific to a particular day. A Date Time value of 19530827111300.0 means August 27, 1953, 11:13 a.m. accurate to 1/10th second.

  4. The Second component may have a value of 60 only for a leap second.

  5. The offset may be included regardless of null components; e.g., 2007-0500 is a legal value.

"0"-"9", "+", "-", "." and the SPACE character of Default Character Repertoire

26 bytes maximum

In the context of a Query with range matching (see PS3.4), the length is 54 bytes maximum.
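The component rules for DT (trailing null components allowed, non-trailing null components prohibited, optional offset suffix) can be sketched with nested optional groups. This is an illustrative parser for a single non-query value only; the names `_DT_PATTERN`, `_DT_RANGES` and `parse_dt` are hypothetical, and day-of-month validity per month is not checked.

```python
import re

# Each component may appear only if the one to its left is present.
_DT_PATTERN = re.compile(
    r"(?P<year>[0-9]{4})"
    r"(?:(?P<month>[0-9]{2})"
    r"(?:(?P<day>[0-9]{2})"
    r"(?:(?P<hour>[0-9]{2})"
    r"(?:(?P<minute>[0-9]{2})"
    r"(?:(?P<second>[0-9]{2})"
    r"(?:\.(?P<fraction>[0-9]{1,6}))?"
    r")?)?)?)?)?"
    r"(?P<offset>[+-][0-9]{4})?"
    r" *"
)
_DT_RANGES = {"month": (1, 12), "day": (1, 31), "hour": (0, 23),
              "minute": (0, 59), "second": (0, 60)}


def parse_dt(value: str) -> dict:
    """Split a DT value into components; null components map to None."""
    match = _DT_PATTERN.fullmatch(value)
    if len(value) > 26 or match is None:
        raise ValueError(f"not a valid DT value: {value!r}")
    result = {}
    for name, text in match.groupdict().items():
        if text is None or name in ("fraction", "offset"):
            result[name] = text       # null component, or kept as text
            continue
        number = int(text)
        low, high = _DT_RANGES.get(name, (0, 9999))
        if not low <= number <= high:
            raise ValueError(f"{name} out of range in {value!r}")
        result[name] = number
    return result
```

With this sketch, "195308" yields a year and month with all later components null, and "2007-0500" yields only a year and an offset, as in Notes 3 and 5.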

FL

Floating Point Single

Single precision binary floating point number represented in IEEE 754:1985 32-bit Floating Point Number Format.

not applicable

4 bytes fixed

FD

Floating Point Double

Double precision binary floating point number represented in IEEE 754:1985 64-bit Floating Point Number Format.

not applicable

8 bytes fixed

IS

Integer String

A string of characters representing an Integer in base-10 (decimal). It shall contain only the characters 0 - 9, with an optional leading "+" or "-". It may be padded with leading and/or trailing spaces. Embedded spaces are not allowed.

The integer, n, represented shall be in the range:

-2^31 <= n <= (2^31 - 1).

"0"-"9", "+", "-" and the SPACE character of Default Character Repertoire

12 bytes maximum
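A minimal sketch of checking an IS value follows, covering the syntax, the 12 byte limit and the signed 32-bit range (from -2^31 to 2^31 - 1). The function name is hypothetical.

```python
import re

# Optional padding, optional sign, one or more digits, optional padding.
_IS_PATTERN = re.compile(r" *[+-]?[0-9]+ *")


def parse_is(value: str) -> int:
    """Parse one IS value and check that it fits the stated range."""
    if len(value) > 12 or _IS_PATTERN.fullmatch(value) is None:
        raise ValueError(f"not a valid IS value: {value!r}")
    number = int(value.strip(" "))
    if not -2**31 <= number <= 2**31 - 1:
        raise ValueError(f"IS value out of range: {number}")
    return number
```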

LO

Long String

A character string that may be padded with leading and/or trailing spaces. The character code 5CH (the BACKSLASH "\" in ISO-IR 6) shall not be present, as it is used as the delimiter between values in multiple valued data elements. The string shall not have Control Characters except for ESC.

Default Character Repertoire and/or as defined by (0008,0005) excluding character code 5CH (the BACKSLASH "\" in ISO-IR 6), and all Control Characters except ESC when used for ISO 2022 escape sequences.

64 chars maximum (see Note in Section 6.2)

LT

Long Text

A character string that may contain one or more paragraphs. It may contain the Graphic Character set and the Control Characters, CR, LF, FF, and ESC. It may be padded with trailing spaces, which may be ignored, but leading spaces are considered to be significant. Data Elements with this VR shall not be multi-valued and therefore character code 5CH (the BACKSLASH "\" in ISO-IR 6) may be used.

Default Character Repertoire and/or as defined by (0008,0005) excluding Control Characters except TAB, LF, FF, CR (and ESC when used for ISO 2022 escape sequences).

10240 chars maximum (see Note in Section 6.2)

OB

Other Byte

An octet-stream where the encoding of the contents is specified by the negotiated Transfer Syntax. OB is a VR that is insensitive to byte ordering (see Section 7.3). The octet-stream shall be padded with a single trailing NULL byte value (00H) when necessary to achieve even length.

not applicable

see Transfer Syntax definition

OD

Other Double

A stream of 64-bit IEEE 754:1985 floating point words. OD is a VR that requires byte swapping within each 64-bit word when changing byte ordering (see Section 7.3).

not applicable

2^32-8 bytes maximum

OF

Other Float

A stream of 32-bit IEEE 754:1985 floating point words. OF is a VR that requires byte swapping within each 32-bit word when changing byte ordering (see Section 7.3).

not applicable

2^32-4 bytes maximum

OL

Other Long

A stream of 32-bit words where the encoding of the contents is specified by the negotiated Transfer Syntax. OL is a VR that requires byte swapping within each word when changing byte ordering (see Section 7.3).

not applicable

see Transfer Syntax definition

OW

Other Word

A stream of 16-bit words where the encoding of the contents is specified by the negotiated Transfer Syntax. OW is a VR that requires byte swapping within each word when changing byte ordering (see Section 7.3).

not applicable

see Transfer Syntax definition
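The byte ordering behavior described for OB, OD, OF, OL and OW can be sketched as a per-word reversal. The word sizes below are taken from the definitions above; the names `WORD_SIZE` and `swap_byte_order` are hypothetical.

```python
# Word size in bytes for each "Other" VR; OB is insensitive to byte ordering.
WORD_SIZE = {"OB": 1, "OW": 2, "OL": 4, "OF": 4, "OD": 8}


def swap_byte_order(vr: str, data: bytes) -> bytes:
    """Reverse the bytes within each word of the stream for the given VR."""
    size = WORD_SIZE[vr]
    if len(data) % size != 0:
        raise ValueError("stream length is not a whole number of words")
    swapped = bytearray(len(data))
    for start in range(0, len(data), size):
        swapped[start:start + size] = data[start:start + size][::-1]
    return bytes(swapped)
```

Applying the function twice returns the original stream, and for OB it leaves the stream unchanged.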

PN

Person Name

A character string encoded using a 5 component convention. The character code 5CH (the BACKSLASH "\" in ISO-IR 6) shall not be present, as it is used as the delimiter between values in multiple valued data elements. The string may be padded with trailing spaces. For human use, the five components in their order of occurrence are: family name complex, given name complex, middle name, name prefix, name suffix.

Note

HL7 prohibits leading spaces within a component; DICOM allows leading and trailing spaces and considers them insignificant.

Any of the five components may be an empty string. The component delimiter shall be the caret "^" character (5EH). There shall be no more than four component delimiters, i.e., none after the last component if all components are present. Delimiters are required for interior null components. Trailing null components and their delimiters may be omitted. Multiple entries are permitted in each component and are encoded as natural text strings, in the format preferred by the named person.

For veterinary use, the first two of the five components in their order of occurrence are: responsible party family name or responsible organization name, patient name. The remaining components are not used and shall not be present.

This group of five components is referred to as a Person Name component group.

For the purpose of writing names in ideographic characters and in phonetic characters, up to 3 groups of components (see Annex H, Annex I and Annex J) may be used. The delimiter for component groups shall be the equals character "=" (3DH). There shall be no more than two component group delimiters, i.e., none after the last component group if all component groups are present. The three component groups in their order of occurrence are: an alphabetic representation, an ideographic representation, and a phonetic representation.

Any component group may be absent, including the first component group. In this case, the person name may start with one or more "=" delimiters. Delimiters are required for interior null component groups. Trailing null component groups and their delimiters may be omitted.

Precise semantics are defined for each component group. See Section 6.2.1.2.

For examples and notes, see Section 6.2.1.1.

Default Character Repertoire and/or as defined by (0008,0005) excluding character code 5CH (the BACKSLASH "\" in ISO-IR 6) and all Control Characters except ESC when used for ISO 2022 escape sequences.

64 chars maximum per component group

(see Note in Section 6.2)

SH

Short String

A character string that may be padded with leading and/or trailing spaces. The character code 5CH (the BACKSLASH "\" in ISO-IR 6) shall not be present, as it is used as the delimiter between values in multiple valued data elements. The string shall not have Control Characters except ESC.

Default Character Repertoire and/or as defined by (0008,0005) excluding character code 5CH (the BACKSLASH "\" in ISO-IR 6) and all Control Characters except ESC when used for ISO 2022 escape sequences.

16 chars maximum (see Note in Section 6.2)

SL

Signed Long

Signed binary integer 32 bits long in 2's complement form.

Represents an integer, n, in the range:

-2^31 <= n <= 2^31 - 1.

not applicable

4 bytes fixed

SQ

Sequence of Items

Value is a Sequence of zero or more Items, as defined in Section 7.5.

not applicable (see Section 7.5)

not applicable (see Section 7.5)

SS

Signed Short

Signed binary integer 16 bits long in 2's complement form. Represents an integer n in the range:

-2^15 <= n <= 2^15 - 1.

not applicable

2 bytes fixed

ST

Short Text

A character string that may contain one or more paragraphs. It may contain the Graphic Character set and the Control Characters, CR, LF, FF, and ESC. It may be padded with trailing spaces, which may be ignored, but leading spaces are considered to be significant. Data Elements with this VR shall not be multi-valued and therefore character code 5CH (the BACKSLASH "\" in ISO-IR 6) may be used.

Default Character Repertoire and/or as defined by (0008,0005) excluding Control Characters except TAB, LF, FF, CR (and ESC when used for ISO 2022 escape sequences).

1024 chars maximum (see Note in Section 6.2)

TM

Time

A string of characters of the format HHMMSS.FFFFFF; where HH contains hours (range "00" - "23"), MM contains minutes (range "00" - "59"), SS contains seconds (range "00" - "60"), and FFFFFF contains a fractional part of a second as small as 1 millionth of a second (range "000000" - "999999"). A 24-hour clock is used. Midnight shall be represented by only "0000" since "2400" would violate the hour range. The string may be padded with trailing spaces. Leading and embedded spaces are not allowed.

One or more of the components MM, SS, or FFFFFF may be unspecified as long as every component to the right of an unspecified component is also unspecified, which indicates that the value is not precise to the precision of those unspecified components.

The FFFFFF component, if present, shall contain 1 to 6 digits. If FFFFFF is unspecified the preceding "." shall not be included.

Examples:

  1. "070907.0705 " represents a time of 7 hours, 9 minutes and 7.0705 seconds.

  2. "1010" represents a time of 10 hours, and 10 minutes.

  3. "021 " is an invalid value.

Note

  1. The ACR-NEMA Standard 300 (predecessor to DICOM) supported a string of characters of the format HH:MM:SS.frac for this VR. Use of this format is not compliant.

  2. See also DT VR in this table.

  3. The SS component may have a value of 60 only for a leap second.

"0"-"9", "." and the SPACE character of Default Character Repertoire

In the context of a Query with range matching (see PS3.4), the character "-" is allowed.

14 bytes maximum

In the context of a Query with range matching (see PS3.4), the length is 28 bytes maximum.
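The TM examples above can be checked with a sketch that uses the same nested-optional approach as the component rule (every component to the right of an unspecified component is also unspecified). It handles a single non-query value; the function name and the returned tuple layout are hypothetical.

```python
import re

# MM requires HH, SS requires MM, and the fraction requires SS.
_TM_PATTERN = re.compile(
    r"(?P<hour>[0-9]{2})"
    r"(?:(?P<minute>[0-9]{2})"
    r"(?:(?P<second>[0-9]{2})"
    r"(?:\.(?P<fraction>[0-9]{1,6}))?"
    r")?)?"
    r" *"
)
_TM_MAX = {"hour": 23, "minute": 59, "second": 60}


def parse_tm(value: str):
    """Return (hour, minute, second, fraction); unspecified parts are None."""
    match = _TM_PATTERN.fullmatch(value)
    if len(value) > 14 or match is None:
        raise ValueError(f"not a valid TM value: {value!r}")
    parts = []
    for name in ("hour", "minute", "second"):
        text = match.group(name)
        if text is not None and int(text) > _TM_MAX[name]:
            raise ValueError(f"{name} out of range in {value!r}")
        parts.append(None if text is None else int(text))
    return (*parts, match.group("fraction"))
```

The fraction is returned as text so that its stated precision (1 to 6 digits) is not lost.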

UC

Unlimited Characters

A character string that may be of unlimited length that may be padded with trailing spaces. The character code 5CH (the BACKSLASH "\" in ISO-IR 6) shall not be present, as it is used as the delimiter between values in multiple valued data elements. The string shall not have Control Characters except for ESC.

Default Character Repertoire and/or as defined by (0008,0005) excluding character code 5CH (the BACKSLASH "\" in ISO-IR 6), and all Control Characters except ESC when used for ISO 2022 escape sequences.

2^32-2 bytes maximum

See Note 2

UI

Unique Identifier (UID)

A character string containing a UID that is used to uniquely identify a wide variety of items. The UID is a series of numeric components separated by the period "." character. If a Value Field containing one or more UIDs is an odd number of bytes in length, the Value Field shall be padded with a single trailing NULL (00H) character to ensure that the Value Field is an even number of bytes in length. See Section 9 and Annex B for a complete specification and examples.

"0"-"9", "." of Default Character Repertoire

64 bytes maximum
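A rough plausibility check for a single UI value, based only on the description in this table (numeric components separated by periods, 64 bytes maximum, an optional single trailing NULL pad), might look like the following. It does not implement the full rules of Section 9 and Annex B, and the function name is hypothetical.

```python
import re

# One or more numeric components separated by single periods.
_UID_PATTERN = re.compile(r"[0-9]+(?:\.[0-9]+)*")


def looks_like_uid(value: str) -> bool:
    """Return True if the value has the general shape of a UID."""
    if value.endswith("\x00"):
        value = value[:-1]            # drop the single trailing NULL pad
    return len(value) <= 64 and _UID_PATTERN.fullmatch(value) is not None
```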

UL

Unsigned Long

Unsigned binary integer 32 bits long. Represents an integer n in the range:

0 <= n < 2^32.

not applicable

4 bytes fixed

UN

Unknown

An octet-stream where the encoding of the contents is unknown (see Section 6.2.2).

not applicable

Any length valid for any of the other DICOM Value Representations

UR

Universal Resource Identifier or Universal Resource Locator (URI/URL)

A string of characters that identifies a URI or a URL as defined in [RFC3986]. Leading spaces are not allowed. Trailing spaces shall be ignored. Data Elements with this VR shall not be multi-valued.

Note

Both absolute and relative URIs are permitted. If the URI is relative, then it is relative to the base URI of the object within which it is contained.

The subset of the Default Character Repertoire required for the URI as defined in IETF RFC3986 Section 2, plus the space (20H) character permitted only as trailing padding.

Characters outside the permitted character set must be "percent encoded".

Note

The Backslash (5CH) character is among those disallowed in URIs.

2^32-2 bytes maximum.

See Note 2

US

Unsigned Short

Unsigned binary integer 16 bits long. Represents integer n in the range:

0 <= n < 2^16.

not applicable

2 bytes fixed
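The four fixed-length binary integer VRs in this table (SS, US, SL and UL) map directly onto the standard-size format codes of Python's `struct` module. The sketch below is illustrative; the `_INT_FORMATS` mapping and the function name are hypothetical. `struct.pack` raises `struct.error` when a number falls outside the range stated for the VR.

```python
import struct

# struct format codes: h/H are 16-bit signed/unsigned, i/I are 32-bit.
_INT_FORMATS = {"SS": "h", "US": "H", "SL": "i", "UL": "I"}


def encode_int(vr: str, number: int, little_endian: bool = True) -> bytes:
    """Encode one SS, US, SL or UL value in the requested byte order."""
    byte_order = "<" if little_endian else ">"
    return struct.pack(byte_order + _INT_FORMATS[vr], number)
```

Signed values are encoded in 2's complement form, so `encode_int("SL", -1)` produces four FFH bytes.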

UT

Unlimited Text

A character string that may contain one or more paragraphs. It may contain the Graphic Character set and the Control Characters, CR, LF, FF, and ESC. It may be padded with trailing spaces, which may be ignored, but leading spaces are considered to be significant. Data Elements with this VR shall not be multi-valued and therefore character code 5CH (the BACKSLASH "\" in ISO-IR 6) may be used.

Default Character Repertoire and/or as defined by (0008,0005) excluding Control Characters except TAB, LF, FF, CR (and ESC when used for ISO 2022 escape sequences).

2^32-2 bytes maximum

See Note 2


Note

  1. For attributes that were present in ACR-NEMA 1.0 and 2.0 and that have been retired, the specifications of Value Representation and Value Multiplicity provided are recommendations for the purpose of interpreting their values in objects created in accordance with earlier versions of this standard. These recommendations are suggested as most appropriate for a particular attribute; however, there is no guarantee that historical objects will not violate some requirements or specified VR and/or VM.

  2. The length of the value of UC, UR and UT VRs is limited only by the size of the maximum unsigned integer representable in a 32 bit VL field minus two, since FFFFFFFFH is reserved and lengths are required to be even.

  3. In previous editions of the standard (see PS3.5 2015a), the TAB character was not listed as permitted for the ST, LT and UT VRs. It has been added for the convenience of formatting and the encoding of XML text.
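The arithmetic behind Note 2 can be written out as a short sketch. Applying the same reasoning to OF and OD (the largest whole number of 4-byte or 8-byte words below the reserved value) is an inference made here to match the lengths given in Table 6.2-1; it is not stated in the note itself. The function name is hypothetical.

```python
RESERVED_VL = 0xFFFFFFFF   # reserved value of the 32-bit VL field


def max_value_length(word_size: int) -> int:
    """Largest usable length that is even and a whole number of words."""
    limit = RESERVED_VL - 1            # FFFFFFFFH itself cannot be used
    unit = max(word_size, 2)           # lengths are required to be even
    return limit - (limit % unit)
```

Under these assumptions a word size of 1 gives 2^32 - 2 (UC, UR, UT), a word size of 4 gives 2^32 - 4 (OF), and a word size of 8 gives 2^32 - 8 (OD).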

6.2.1 Person Name (PN) Value Representation

6.2.1.1 Examples of PN VR and Notes

Examples:

  • Rev. John Robert Quincy Adams, B.A. M.Div.

    "Adams^John Robert Quincy^^Rev.^B.A. M.Div."

    [One family name; three given names; no middle name; one prefix; two suffixes.]

  • Susan Morrison-Jones, Ph.D., Chief Executive Officer

    "Morrison-Jones^Susan^^^Ph.D., Chief Executive Officer"

    [Two family names; one given name; no middle name; no prefix; two suffixes.]

  • John Doe

    "Doe^John"

    [One family name; one given name; no middle name, prefix, or suffix. Delimiters have been omitted for the three trailing null components.]

  • (for examples of the encoding of Person Names using multi-byte character sets see Annex H)

  • "Smith^Fluffy"

    [A cat, rather than a human, whose responsible party family name is Smith, and whose own name is Fluffy]

  • "ABC Farms^Running on Water"

    [A horse whose responsible organization is named ABC Farms, and whose name is "Running On Water"]

Note

  1. A similar multiple component convention is also used by the HL7 v2 XPN data type. However, the XPN data type places the suffix component before the prefix, and has a sixth component "degree" that DICOM subsumes in the name suffix. There are also differences in the manner in which name representation is identified.

  2. In typical American and European usage the first occurrence of "given name" would represent the "first name". The second and subsequent occurrences of the "given name" would typically be treated as a middle name(s). The "middle name" component is retained for the purpose of backward compatibility with existing standards.

  3. The implementer should remain mindful of earlier usage forms that represented "given names" as "first" and "middle" and that translations to and from this previous typical usage may be required.

  4. For reasons of backward compatibility with versions of this standard prior to V3.0, person names might be considered a single family name complex (single component without "^" delimiters).
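The component convention used in the examples above can be sketched as a simple split on the caret delimiter. The sketch operates on one component group that has already been decoded to text; the function name and the dictionary keys are hypothetical labels for the five components.

```python
COMPONENT_NAMES = ("family_name", "given_name", "middle_name",
                   "name_prefix", "name_suffix")


def split_components(group: str) -> dict:
    """Split one Person Name component group into its five components."""
    parts = group.split("^")
    if len(parts) > 5:
        raise ValueError("more than four component delimiters")
    parts += [""] * (5 - len(parts))   # omitted trailing components are empty
    return dict(zip(COMPONENT_NAMES, parts))
```

For "Doe^John" the three trailing components come back as empty strings, and for "Adams^John Robert Quincy^^Rev.^B.A. M.Div." the middle name is empty while the prefix and suffix are populated.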

6.2.1.2 Ideographic and Phonetic Characters in Data Elements with VR of PN

Character strings representing person names are encoded using a convention for PN value representations based on component groups with 5 components.

For the purpose of writing names in ideographic characters and in phonetic characters, up to 3 component groups may be used. The delimiter of the component group shall be the equals character "=" (3DH). The three component groups in their order of occurrence are: an alphabetic representation, an ideographic representation, and a phonetic representation.

Any component group may be absent, including the first component group. In this case, the person name may start with one or more "=" delimiters. Delimiters are also required for interior null component groups. Trailing null component groups and their delimiters may be omitted.
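The component group rules can be sketched in the same way, splitting on the "=" delimiter. The sketch assumes the value has already been decoded to text (any ISO 2022 escape sequences resolved); the function name, the dictionary keys and the sample names in the usage note are hypothetical.

```python
GROUP_NAMES = ("alphabetic", "ideographic", "phonetic")


def split_component_groups(value: str) -> dict:
    """Split a Person Name value into its (up to three) component groups."""
    groups = value.rstrip(" ").split("=")   # trailing padding is not significant
    if len(groups) > 3:
        raise ValueError("more than two component group delimiters")
    groups += [""] * (3 - len(groups))      # omitted trailing groups are empty
    return dict(zip(GROUP_NAMES, groups))
```

A value such as "=GroupTwo^Name" (a placeholder, not taken from the Standard) has an absent first group, so the alphabetic entry is empty and the ideographic entry holds "GroupTwo^Name".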

The first component group (identified by DICOM as "alphabetic") shall be encoded using the character set specified by the Attribute Specific Character Set (0008,0005), value 1. If Attribute Specific Character Set (0008,0005) is not present, the default Character Repertoire ISO-IR 6 shall be used. ISO 2022 escapes for Code Extension shall not be used in this component group. When Specific Character Set (0008,0005) value 1 specifies a multi-byte character set without Code Extension (i.e., Unicode in UTF-8, GB18030 or GBK), the characters of this component group may be encoded with multiple bytes, but shall be drawn from the code points U+0020 through U+1FFF of ISO/IEC 10646, or the following ISO/IEC 10646 code points:

  • U+3001, U+3002, U+300C, U+300D, U+3099 through U+309C, and U+30A0 through U+30FF

The second group shall be used for ideographic characters. The character sets used will usually be those from Attribute Specific Character Set (0008,0005), value 2 through n, and may use ISO 2022 escapes.

The third group shall be used for phonetic characters. The character sets used shall be those from Attribute Specific Character Set (0008,0005), value 1 through n, and may use ISO 2022 escapes.

Delimiter characters "^" and "=" are taken from the character set specified by value 1 of the Attribute Specific Character Set (0008,0005). If Attribute Specific Character Set (0008,0005), value 1 is not present, the default Character Repertoire ISO-IR 6 shall be used.

At the beginning of the value of the Person Name data element, the following initial condition is assumed: if Attribute Specific Character Set (0008,0005), value 1 is not present, the default Character Repertoire ISO-IR 6 is invoked, and if the Attribute Specific Character Set (0008,0005), value 1 is present, the character set specified by value 1 of the Attribute is invoked.

At the end of the value of the Person Name data element, and before the component delimiters "^" and "=", the character set shall be switched to the default character repertoire ISO-IR 6, if value 1 of the Attribute Specific Character Set (0008,0005) is not present. If value 1 of the Attribute Specific Character Set (0008,0005) is present, the character set shall be switched to that specified by value 1 of the Attribute.

The value length of each component group is 64 characters maximum, including the delimiter for the component group. Each combining character (e.g., diacritics or vowel marks) shall be considered a separate character for this maximum length, regardless of how an application may display such combining characters (i.e., combined into the glyph for the base character, or rendered separately).
