DICOM PS3.5 2024e - Data Structures and Encoding
The Value Representation of a Data Element describes the data type and format of that Data Element's Value(s). PS3.6 lists the VR of each Data Element by Data Element Tag.
Values with VRs constructed of character strings, except in the case of the VR UI, shall be padded with SPACE characters (20H, in the Default Character Repertoire) when necessary to achieve even length. Values with a VR of UI shall be padded with a single trailing NULL (00H) character when necessary to achieve even length. Values with a VR of OB shall be padded with a single trailing NULL byte value (00H) when necessary to achieve even length.
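The padding rule can be illustrated with a minimal Python sketch. The helper name `pad_to_even` and the split of VR codes into SPACE-padded and NULL-padded sets are assumptions drawn from the paragraph above and Table 6.2-1, not part of any toolkit API. Binary VRs other than OB are left out because their Values already have even length.

```python
"""Sketch of the even-length padding rule (hypothetical helper, not a library API)."""

# Character-string VRs are padded with SPACE (20H); UI is the exception.
SPACE_PADDED_VRS = {
    "AE", "AS", "CS", "DA", "DS", "DT", "IS", "LO", "LT",
    "PN", "SH", "ST", "TM", "UC", "UR", "UT",
}
# UI (a string) and OB (an octet-stream) are padded with a single NULL (00H).
NULL_PADDED_VRS = {"UI", "OB"}


def pad_to_even(value: bytes, vr: str) -> bytes:
    """Append one padding byte if `value` has odd length, using the VR's rule."""
    if len(value) % 2 == 0:
        return value
    if vr in SPACE_PADDED_VRS:
        return value + b"\x20"
    if vr in NULL_PADDED_VRS:
        return value + b"\x00"
    raise ValueError(f"no padding rule sketched for VR {vr!r}")


print(pad_to_even(b"1.2.840.10008.1.2.1", "UI"))  # 19 bytes -> NULL appended
print(pad_to_even(b"CT", "CS"))                   # already even -> unchanged
print(pad_to_even(b"ABC", "SH"))                  # 3 bytes -> SPACE appended
```

Raising for unlisted VRs is a deliberate choice: it makes an unexpected odd-length binary Value visible instead of silently padding it.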
All new VRs defined in future versions of DICOM shall be of the same Data Element Structure as defined in Section 7.1.2 with reserved bytes after the VR and a 32-bit unsigned integer VL (i.e., following the format for VRs such as OB or UT), and may or may not permit Undefined Length.
Since all new VRs will be defined as specified in Section 7.1.2, an implementation may choose to ignore VRs not recognized by applying the rules stated in Section 7.1.2.
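Because every future VR uses the long header form, a reader can skip an element whose VR it does not recognize. The Python sketch below reads one Explicit VR Little Endian element header. The set of short-form VRs reflects the author's reading of Section 7.1.2. The names `read_explicit_vr_le_header` and `ElementHeader` are hypothetical. Undefined Length (FFFFFFFFH) is only reported, not handled.

```python
import struct
from typing import NamedTuple

# VRs that, per Section 7.1.2, carry a 16-bit VL directly after the VR.
# Every other VR, including any VR this reader does not recognize, is
# treated as: 2 reserved bytes followed by a 32-bit unsigned VL.
SHORT_VL_VRS = frozenset({
    "AE", "AS", "AT", "CS", "DA", "DS", "DT", "FL", "FD", "IS", "LO",
    "LT", "PN", "SH", "SL", "SS", "ST", "TM", "UI", "UL", "US",
})


class ElementHeader(NamedTuple):
    group: int
    element: int
    vr: str
    length: int       # Value Length in bytes (0xFFFFFFFF would mean Undefined Length)
    header_size: int  # bytes between the start of the element and its Value


def read_explicit_vr_le_header(buf: bytes, offset: int = 0) -> ElementHeader:
    """Read one Explicit VR Little Endian Data Element header at `offset`."""
    group, element = struct.unpack_from("<HH", buf, offset)
    vr = buf[offset + 4:offset + 6].decode("latin-1")
    if vr in SHORT_VL_VRS:
        (length,) = struct.unpack_from("<H", buf, offset + 6)
        return ElementHeader(group, element, vr, length, 8)
    # Long form: bytes 6-7 are reserved, bytes 8-11 hold the 32-bit VL.
    (length,) = struct.unpack_from("<I", buf, offset + 8)
    return ElementHeader(group, element, vr, length, 12)


# An element carrying an invented VR "ZZ" can still be skipped correctly.
raw = b"\x09\x00\x10\x00ZZ\x00\x00\x04\x00\x00\x00ABCD"
hdr = read_explicit_vr_le_header(raw)
value = raw[hdr.header_size:hdr.header_size + hdr.length]
print(hdr, value)
```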
When converting a Data Set from an Explicit VR Transfer Syntax to a different Transfer Syntax, an implementation may copy Data Elements with unrecognized VRs in the following manner:
If the endianness of the Transfer Syntaxes is the same, the Value of the Data Element may be copied unchanged and if the target Transfer Syntax is Explicit VR, the VR bytes copied unchanged. In practice this only applies to Little Endian Transfer Syntaxes, since there was only one Big Endian Transfer Syntax defined.
If the source Transfer Syntax is Little Endian and the target Transfer Syntax is the (retired) Big Endian Explicit VR Transfer Syntax, then the Value of the Data Element may be copied unchanged and the VR changed to UN, since, with the VR unrecognized, it is unknown whether byte swapping is required. If the VR were copied unchanged, the byte order of the Value might or might not be incorrect.
If the source Transfer Syntax is the (retired) Big Endian Explicit VR Transfer Syntax, then the Data Element cannot be copied, because whether or not byte swapping is required is unknown, and there is no equivalent of the UN VR to use when the Value is big endian rather than little endian.
The issues of whether or not the Data Element may be copied, and what VR to use if copying, do not arise when converting a Data Set from Implicit VR Little Endian Transfer Syntax, since the VR would not be present to be unrecognized, and if the Data Element VR is not known from a data dictionary, then UN would be used.
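The copy rules for unrecognized VRs reduce to a small decision on endianness. The Python sketch below encodes them for an Explicit VR source. The enum and function names are hypothetical. The inputs assume the caller already knows the endianness and VR explicitness of both Transfer Syntaxes.

```python
from enum import Enum


class CopyAction(Enum):
    COPY_VALUE_AND_VR = "copy the Value unchanged; copy the VR bytes unchanged"
    COPY_VALUE_ONLY = "copy the Value unchanged; Implicit VR target, so no VR is written"
    COPY_VALUE_AS_UN = "copy the Value unchanged; change the VR to UN"
    DO_NOT_COPY = "the Data Element cannot be copied"


def plan_unrecognized_vr_copy(source_big_endian: bool,
                              target_big_endian: bool,
                              target_explicit_vr: bool) -> CopyAction:
    """Decide how to carry an element with an unrecognized VR from an
    Explicit VR source Transfer Syntax to a different Transfer Syntax."""
    if source_big_endian:
        # Retired Big Endian Explicit VR source: the need for byte swapping is
        # unknown, and there is no big-endian counterpart of UN.
        return CopyAction.DO_NOT_COPY
    if target_big_endian:
        # Little Endian source to the retired Big Endian Explicit VR target.
        return CopyAction.COPY_VALUE_AS_UN
    # Same (little) endianness on both sides.
    if target_explicit_vr:
        return CopyAction.COPY_VALUE_AND_VR
    return CopyAction.COPY_VALUE_ONLY


print(plan_unrecognized_vr_copy(False, True, True).value)
```

Any Big Endian source returns `DO_NOT_COPY` because only one Big Endian Transfer Syntax was ever defined, so a conversion away from it always changes endianness.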
An individual Value, including padding, shall not exceed the Length of Value, except in the case of the last Value of a multi-valued field as specified in Section 6.4.
The lengths of Value Representations for which the Character Repertoire can be extended or replaced are expressly specified in characters rather than bytes in Table 6.2-1. This is because the mapping from a character to the number of bytes used for that character's encoding may be dependent on the character set used.
Escape Sequences used for Code Extension shall not be included in the count of characters.
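To see why lengths are counted in characters, compare the byte length of an encoded Value with its character count. The Python sketch below uses the standard `iso2022_jp` and `utf-8` codecs as stand-ins for DICOM character set handling. Mapping a Specific Character Set (0008,0005) value to a Python codec is an assumption and is omitted here. Decoding consumes ISO/IEC 2022 escape sequences, so they are not counted, which matches the rule above.

```python
def length_in_characters(encoded: bytes, codec: str) -> int:
    """Length of a string Value in characters rather than bytes.

    Decoding removes ISO/IEC 2022 escape sequences, so they do not contribute
    to the count; a multi-byte character counts once.
    """
    return len(encoded.decode(codec))


jis = "山田".encode("iso2022_jp")  # escape sequences plus two 2-byte characters
utf8 = "Müller".encode("utf-8")    # "ü" occupies 2 bytes
print(len(jis), length_in_characters(jis, "iso2022_jp"))
print(len(utf8), length_in_characters(utf8, "utf-8"))
```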
Table 6.2-1. DICOM Value Representations
AE (Application Entity). A string of characters that identifies an Application Entity, with leading and trailing spaces (20H) being non-significant. A Value consisting solely of spaces shall not be used.
Character Repertoire: Default Character Repertoire excluding character code 5CH (the BACKSLASH "\" in ISO-IR 6) and all control characters.
Length of Value: 16 bytes maximum
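AS (Age String). A string of characters with one of the following formats: nnnD, nnnW, nnnM, nnnY, where nnn shall contain the number of days for D, weeks for W, months for M, or years for Y.
Character Repertoire: "0"-"9", "D", "W", "M", "Y" of Default Character Repertoire
Length of Value: 4 bytes fixed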
AT (Attribute Tag). Ordered pair of 16-bit unsigned integers that is the Value of a Data Element Tag. Example: A Data Element Tag of (0018,00FF) would be encoded as a series of 4 bytes in a Little Endian Transfer Syntax as 18H,00H,FFH,00H.
Note: The encoding of an AT Value is exactly the same as the encoding of a Data Element Tag as defined in Section 7.
Character Repertoire: not applicable
Length of Value: 4 bytes fixed
CS (Code String). A string of characters identifying a controlled concept. Leading or trailing spaces (20H) are not significant. Alternatively, in the context of a Query with Empty Value Matching (see PS3.4), a string of two QUOTATION MARK characters, representing an empty key Value.
Character Repertoire: Uppercase characters, "0"-"9", the SPACE character, and underscore "_", of the Default Character Repertoire. In the context of a Query with Empty Value Matching (see PS3.4), the QUOTATION MARK character is allowed.
Length of Value: 16 bytes maximum. In the context of a Query with Empty Value Matching (see PS3.4), the length is 2 bytes fixed.
DA (Date). A string of characters of the format YYYYMMDD, where YYYY shall contain the year, MM shall contain the month, and DD shall contain the day, interpreted as a date of the Gregorian calendar system. Alternatively, in the context of a Query with Empty Value Matching (see PS3.4), a string of two QUOTATION MARK characters, representing an empty key Value.
Character Repertoire: "0"-"9" of Default Character Repertoire. In the context of a Query with Range Matching (see PS3.4), the character "-" is allowed, and a trailing SPACE character is allowed for padding. In the context of a Query with Empty Value Matching (see PS3.4), the QUOTATION MARK character is allowed.
Length of Value: 8 bytes fixed. In the context of a Query with Range Matching (see PS3.4), the length is 18 bytes maximum. In the context of a Query with Empty Value Matching (see PS3.4), the length is 2 bytes fixed.
DS (Decimal String). A string of characters representing either a fixed point number or a floating point number. A fixed point number shall contain only the characters 0-9, with an optional leading "+" or "-" and an optional "." to mark the decimal point. A floating point number shall be conveyed as defined in ANSI X3.9, with an "E" or "e" to indicate the start of the exponent. Decimal Strings may be padded with leading or trailing spaces. Embedded spaces are not allowed.
Character Repertoire: "0"-"9", "+", "-", "E", "e", "." and the SPACE character of Default Character Repertoire
Length of Value: 16 bytes maximum
DT (Date Time). A concatenated date-time character string in the format YYYYMMDDHHMMSS.FFFFFF&ZZXX. The components of this string, from left to right, are YYYY = Year, MM = Month, DD = Day, HH = Hour (range "00" - "23"), MM = Minute (range "00" - "59"), SS = Second (range "00" - "60"), and FFFFFF = Fractional Second, which contains a fractional part of a second as small as 1 millionth of a second (range "000000" - "999999"). &ZZXX is an optional suffix for offset from Coordinated Universal Time (UTC), where & = "+" or "-", and ZZ = Hours and XX = Minutes of offset. The year, month, and day shall be interpreted as a date of the Gregorian calendar system. A 24-hour clock is used. Midnight shall be represented by only "0000" since "2400" would violate the hour range. The Fractional Second component, if present, shall contain 1 to 6 digits. If Fractional Second is unspecified, the preceding "." shall not be included. The offset suffix, if present, shall contain 4 digits. The string may be padded with trailing SPACE characters. Leading and embedded spaces are not allowed. A component that is omitted from the string is termed a null component. Trailing null components of Date Time indicate that the Value is not precise to the precision of those components. The YYYY component shall not be null. Non-trailing null components are prohibited. The optional suffix is not considered as a component. A Date Time Value without the optional suffix is interpreted to be in the local time zone of the application creating the Data Element, unless explicitly specified by the Timezone Offset From UTC (0008,0201). UTC offsets are calculated as "local time minus UTC". The offset for a Date Time Value in UTC shall be +0000. Alternatively, in the context of a Query with Empty Value Matching (see PS3.4), a string of two QUOTATION MARK characters, representing an empty key Value.
Character Repertoire: "0"-"9", "+", "-", "." and the SPACE character of Default Character Repertoire. In the context of a Query with Empty Value Matching (see PS3.4), the QUOTATION MARK character is allowed.
Length of Value: 26 bytes maximum. In the context of a Query with Range Matching (see PS3.4), the length is 54 bytes maximum. In the context of a Query with Empty Value Matching (see PS3.4), the length is 2 bytes fixed.
FL (Floating Point Single). Single precision binary floating point value represented in [IEEE 754] binary32 format. All [IEEE 754] values are permitted, including NaN (Not a Number) and infinity values.
Character Repertoire: not applicable
Length of Value: 4 bytes fixed
FD (Floating Point Double). Double precision binary floating point value represented in [IEEE 754] binary64 format. All [IEEE 754] values are permitted, including NaN (Not a Number) and infinity values.
Character Repertoire: not applicable
Length of Value: 8 bytes fixed
IS (Integer String). A string of characters representing an Integer in base-10 (decimal); it shall contain only the characters 0-9, with an optional leading "+" or "-". It may be padded with leading and/or trailing spaces. Embedded spaces are not allowed. The integer n represented shall be in the range $-2^{31} \le n \le 2^{31}-1$.
Character Repertoire: "0"-"9", "+", "-" and the SPACE character of Default Character Repertoire
Length of Value: 12 bytes maximum
LO (Long String). A character string that may be padded with leading and/or trailing spaces. The character code 5CH (the BACKSLASH "\" in ISO-IR 6) shall not be present, as it is used as the delimiter between Values in multi-valued Data Elements. The string shall not have Control Characters except for ESC.
Character Repertoire: Default Character Repertoire and/or as defined by (0008,0005) excluding character code 5CH (the BACKSLASH "\" in ISO-IR 6), and all Control Characters except ESC when used for [ISO/IEC 2022] escape sequences.
Length of Value: 64 chars maximum (see Note in Section 6.2)
LT (Long Text). A character string that may contain one or more paragraphs. It may contain the Graphic Character set and the Control Characters, CR, LF, FF, and ESC. It may be padded with trailing spaces, which may be ignored, but leading spaces are considered to be significant. Data Elements with this VR shall not be multi-valued and therefore character code 5CH (the BACKSLASH "\" in ISO-IR 6) may be used.
Character Repertoire: Default Character Repertoire and/or as defined by (0008,0005) excluding Control Characters except TAB, LF, FF, CR (and ESC when used for [ISO/IEC 2022] escape sequences).
Length of Value: 10240 chars maximum (see Note in Section 6.2)
OB (Other Byte). An octet-stream where the encoding of the contents is specified by the negotiated Transfer Syntax. OB is a VR that is insensitive to byte ordering (see Section 7.3). The octet-stream shall be padded with a single trailing NULL byte value (00H) when necessary to achieve even length.
Character Repertoire: not applicable
OD (Other Double). A stream of [IEEE 754] binary64 values. All [IEEE 754] values are permitted, including NaN (Not a Number) and infinity values. OD is a VR that requires byte swapping within each 64-bit word when changing byte ordering (see Section 7.3).
Character Repertoire: not applicable
OF (Other Float). A stream of [IEEE 754] binary32 values. All [IEEE 754] values are permitted, including NaN (Not a Number) and infinity values. OF is a VR that requires byte swapping within each 32-bit word when changing byte ordering (see Section 7.3).
Character Repertoire: not applicable
OL (Other Long). A stream of 32-bit words where the encoding of the contents is specified by the negotiated Transfer Syntax. OL is a VR that requires byte swapping within each word when changing byte ordering (see Section 7.3).
Character Repertoire: not applicable
OV (Other 64-bit Very Long). A stream of 64-bit words where the encoding of the contents is specified by the negotiated Transfer Syntax. OV is a VR that requires byte swapping within each word when changing byte ordering (see Section 7.3).
Character Repertoire: not applicable
OW (Other Word). A stream of 16-bit words where the encoding of the contents is specified by the negotiated Transfer Syntax. OW is a VR that requires byte swapping within each word when changing byte ordering (see Section 7.3).
Character Repertoire: not applicable
PN (Person Name). A character string encoded using a 5 component convention. The character code 5CH (the BACKSLASH "\" in ISO-IR 6) shall not be present, as it is used as the delimiter between Values in multi-valued Data Elements. The string may be padded with trailing spaces. For human use, the five components in their order of occurrence are: family name complex, given name complex, middle name, name prefix, name suffix.
Note: HL7 prohibits leading spaces within a component; DICOM allows leading and trailing spaces and considers them insignificant.
Any of the five components may be an empty string. The component delimiter shall be the caret "^" character (5EH). There shall be no more than four component delimiters, i.e., none after the last component if all components are present. Delimiters are required for interior null components. Trailing null components and their delimiters may be omitted. Multiple entries are permitted in each component and are encoded as natural text strings, in the format preferred by the named person.
For veterinary use, the first two of the five components in their order of occurrence are: responsible party family name or responsible organization name, patient name. The remaining components are not used and shall not be present.
This group of five components is referred to as a Person Name component group.
For the purpose of writing names in ideographic characters and in phonetic characters, up to 3 groups of components (see Annex H, Annex I and Annex J) may be used. The delimiter for component groups shall be the equals character "=" (3DH). There shall be no more than two component group delimiters, i.e., none after the last component group if all component groups are present. The three component groups, in their order of occurrence, are: an alphabetic representation, an ideographic representation, and a phonetic representation.
Any component group may be absent, including the first component group. In this case, the person name may start with one or more "=" delimiters. Delimiters are required for interior null component groups. Trailing null component groups and their delimiters may be omitted.
Precise semantics are defined for each component group. See Section 6.2.1.2. For examples and notes, see Section 6.2.1.1.
Character Repertoire: Default Character Repertoire and/or as defined by (0008,0005) excluding character code 5CH (the BACKSLASH "\" in ISO-IR 6) and all Control Characters except ESC when used for [ISO/IEC 2022] escape sequences.
Length of Value: 64 chars maximum per component group (see Note in Section 6.2)
SH (Short String). A character string that may be padded with leading and/or trailing spaces. The character code 5CH (the BACKSLASH "\" in ISO-IR 6) shall not be present, as it is used as the delimiter between Values in multi-valued Data Elements. The string shall not have Control Characters except ESC.
Character Repertoire: Default Character Repertoire and/or as defined by (0008,0005) excluding character code 5CH (the BACKSLASH "\" in ISO-IR 6) and all Control Characters except ESC when used for [ISO/IEC 2022] escape sequences.
Length of Value: 16 chars maximum (see Note in Section 6.2)
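SL (Signed Long). Signed binary integer 32 bits long in 2's complement form. Represents an integer n in the range $-2^{31} \le n \le 2^{31}-1$.
Character Repertoire: not applicable
Length of Value: 4 bytes fixed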
SQ (Sequence of Items). Value is a Sequence of zero or more Items, as defined in Section 7.5.
Character Repertoire: not applicable (see Section 7.5)
Length of Value: not applicable (see Section 7.5)
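SS (Signed Short). Signed binary integer 16 bits long in 2's complement form. Represents an integer n in the range $-2^{15} \le n \le 2^{15}-1$.
Character Repertoire: not applicable
Length of Value: 2 bytes fixed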
ST (Short Text). A character string that may contain one or more paragraphs. It may contain the Graphic Character set and the Control Characters, CR, LF, FF, and ESC. It may be padded with trailing spaces, which may be ignored, but leading spaces are considered to be significant. Data Elements with this VR shall not be multi-valued and therefore character code 5CH (the BACKSLASH "\" in ISO-IR 6) may be used.
Character Repertoire: Default Character Repertoire and/or as defined by (0008,0005) excluding Control Characters except TAB, LF, FF, CR (and ESC when used for [ISO/IEC 2022] escape sequences).
Length of Value: 1024 chars maximum (see Note in Section 6.2)
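SV (Signed 64-bit Very Long). Signed binary integer 64 bits long. Represents an integer n in the range $-2^{63} \le n \le 2^{63}-1$.
Character Repertoire: not applicable
Length of Value: 8 bytes fixed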
TM (Time). A string of characters of the format HHMMSS.FFFFFF, where HH contains hours (range "00" - "23"), MM contains minutes (range "00" - "59"), SS contains seconds (range "00" - "60"), and FFFFFF contains a fractional part of a second as small as 1 millionth of a second (range "000000" - "999999"). A 24-hour clock is used. Midnight shall be represented by only "0000" since "2400" would violate the hour range. The string may be padded with trailing spaces. Leading and embedded spaces are not allowed. One or more of the components MM, SS, or FFFFFF may be unspecified as long as every component to the right of an unspecified component is also unspecified, which indicates that the Value is not precise to the precision of those unspecified components. The FFFFFF component, if present, shall contain 1 to 6 digits. If FFFFFF is unspecified, the preceding "." shall not be included. Alternatively, in the context of a Query with Empty Value Matching (see PS3.4), a string of two QUOTATION MARK characters, representing an empty key Value.
Character Repertoire: "0"-"9", "." and the SPACE character of Default Character Repertoire. In the context of a Query with Range Matching (see PS3.4), the character "-" is allowed. In the context of a Query with Empty Value Matching (see PS3.4), the QUOTATION MARK character is allowed.
Length of Value: 14 bytes maximum. In the context of a Query with Range Matching (see PS3.4), the length is 28 bytes maximum. In the context of a Query with Empty Value Matching (see PS3.4), the length is 2 bytes fixed.
UC (Unlimited Characters). A character string that may be of unlimited length and may be padded with trailing spaces. The character code 5CH (the BACKSLASH "\" in ISO-IR 6) shall not be present, as it is used as the delimiter between Values in multi-valued Data Elements. The string shall not have Control Characters except for ESC.
Character Repertoire: Default Character Repertoire and/or as defined by (0008,0005) excluding character code 5CH (the BACKSLASH "\" in ISO-IR 6), and all Control Characters except ESC when used for [ISO/IEC 2022] escape sequences.
Length of Value: See Note 2
UI (Unique Identifier, UID). A character string containing a UID that is used to uniquely identify a wide variety of items. The UID is a series of numeric components separated by the period "." character. If a Value Field containing one or more UIDs is an odd number of bytes in length, the Value Field shall be padded with a single trailing NULL (00H) character to ensure that the Value Field is an even number of bytes in length. See Section 9 and Annex B for a complete specification and examples.
Character Repertoire: "0"-"9", "." of Default Character Repertoire
Length of Value: 64 bytes maximum
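UL (Unsigned Long). Unsigned binary integer 32 bits long. Represents an integer n in the range $0 \le n < 2^{32}$.
Character Repertoire: not applicable
Length of Value: 4 bytes fixed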
UN (Unknown). An octet-stream where the encoding of the contents is unknown (see Section 6.2.2).
Character Repertoire: not applicable
Length of Value: Any length valid for any of the other DICOM Value Representations
UR (Universal Resource Identifier or Universal Resource Locator, URI/URL). A string of characters that identifies a URI or a URL as defined in [RFC3986]. Leading spaces are not allowed. Trailing spaces shall be ignored. Data Elements with this VR shall not be multi-valued. Alternatively, in the context of a Query with Empty Value Matching (see PS3.4), a string of two QUOTATION MARK characters, representing an empty key Value.
Character Repertoire: The subset of the Default Character Repertoire required for the URI as defined in IETF RFC3986 Section 2, plus the space (20H) character permitted only as trailing padding. Characters outside the permitted character set must be "percent encoded". In the context of a Query with Empty Value Matching (see PS3.4), the QUOTATION MARK character is allowed.
Length of Value: See Note 2. In the context of a Query with Empty Value Matching (see PS3.4), the length is 2 bytes fixed.
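US (Unsigned Short). Unsigned binary integer 16 bits long. Represents an integer n in the range $0 \le n < 2^{16}$.
Character Repertoire: not applicable
Length of Value: 2 bytes fixed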
UT (Unlimited Text). A character string that may contain one or more paragraphs. It may contain the Graphic Character set and the Control Characters, CR, LF, FF, and ESC. It may be padded with trailing spaces, which may be ignored, but leading spaces are considered to be significant. Data Elements with this VR shall not be multi-valued and therefore character code 5CH (the BACKSLASH "\" in ISO-IR 6) may be used.
Character Repertoire: Default Character Repertoire and/or as defined by (0008,0005) excluding Control Characters except TAB, LF, FF, CR (and ESC when used for [ISO/IEC 2022] escape sequences).
Length of Value: See Note 2
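UV (Unsigned 64-bit Very Long). Unsigned binary integer 64 bits long. Represents an integer n in the range $0 \le n < 2^{64}$.
Character Repertoire: not applicable
Length of Value: 8 bytes fixed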
Notes:
1. For Data Elements that were present in ACR-NEMA 1.0 and 2.0 and that have been retired, the specifications of Value Representation and Value Multiplicity provided are recommendations for the purpose of interpreting their Values in objects created in accordance with earlier versions of this Standard. These recommendations are suggested as most appropriate for a particular Data Element; however, there is no guarantee that historical objects will not violate some requirements or specified VR and/or VM.
2. The length of the Value of UC, UR and UT VRs is limited only by the size of the maximum unsigned integer representable in a 32-bit VL field minus two (i.e., FFFFFFFEH), since FFFFFFFFH is reserved and lengths are required to be even.
3. In previous editions of the Standard (see PS3.5 2015a), the TAB character was not listed as permitted for the ST, LT and UT VRs. It has been added for the convenience of formatting and the encoding of XML text.
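Several string VRs in Table 6.2-1 have fixed formats that lend themselves to small validators. The Python sketch below parses DA, AS, TM and DS Values as defined in the table. Function names are hypothetical. Query-only forms (Range Matching, Empty Value Matching) are deliberately not handled. TM is returned as a tuple rather than a `datetime.time` because SS may be "60", which `datetime.time` rejects.

```python
import re
from datetime import date
from typing import Optional, Tuple

_DA = re.compile(r"[0-9]{8}")
_AS = re.compile(r"([0-9]{3})([DWMY])")
# HH[MM[SS[.F{1,6}]]]: a component may be present only if all to its left are.
_TM = re.compile(r"([0-9]{2})(?:([0-9]{2})(?:([0-9]{2})(?:\.([0-9]{1,6}))?)?)?")
_DS = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?")


def parse_da(value: str) -> date:
    """DA: exactly YYYYMMDD on the Gregorian calendar."""
    if not _DA.fullmatch(value):
        raise ValueError(f"not a DA value: {value!r}")
    return date(int(value[:4]), int(value[4:6]), int(value[6:]))


def parse_as(value: str) -> Tuple[int, str]:
    """AS: nnnD, nnnW, nnnM or nnnY -> (count, unit letter)."""
    m = _AS.fullmatch(value)
    if not m:
        raise ValueError(f"not an AS value: {value!r}")
    return int(m.group(1)), m.group(2)


def parse_tm(value: str) -> Tuple[int, Optional[int], Optional[int], Optional[str]]:
    """TM with optional trailing SPACE padding -> (hour, minute, second, fraction).

    Unspecified components are returned as None.
    """
    m = _TM.fullmatch(value.rstrip(" "))
    if not m:
        raise ValueError(f"not a TM value: {value!r}")
    hh, mm, ss, frac = m.groups()
    hour = int(hh)
    minute = int(mm) if mm is not None else None
    second = int(ss) if ss is not None else None
    if hour > 23 or (minute is not None and minute > 59) or (second is not None and second > 60):
        raise ValueError(f"TM component out of range: {value!r}")
    return hour, minute, second, frac


def parse_ds(value: str) -> float:
    """DS: fixed or floating point text; outer spaces allowed, embedded spaces not."""
    text = value.strip(" ")
    if not _DS.fullmatch(text):
        raise ValueError(f"not a DS value: {value!r}")
    return float(text)


print(parse_da("20240131"))      # 2024-01-31
print(parse_as("018M"))          # (18, 'M')
print(parse_tm("070907.0705 "))  # (7, 9, 7, '0705')
print(parse_ds(" -1.5e3"))       # -1500.0
```

The DS check runs a regular expression before `float()` because Python's `float()` also accepts forms such as "inf", "nan" and "1_000", which DS does not allow.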
Rev. John Robert Quincy Adams, B.A. M.Div.
"Adams^John Robert Quincy^^Rev.^B.A. M.Div."
[One family name; three given names; no middle name; one prefix; two suffixes.]
Susan Morrison-Jones, Ph.D., Chief Executive Officer
"Morrison-Jones^Susan^^^Ph.D., Chief Executive Officer"
[Two family names; one given name; no middle name; no prefix; two suffixes.]
[One family name; one given name; no middle name, prefix, or suffix. Delimiters have been omitted for the three trailing null components.]
(for examples of the encoding of Person Names using multi-byte character sets see Annex H)
"Smith^Fluffy"
[A cat, rather than a human, whose responsible party family name is Smith, and whose own name is Fluffy]
"ABC Farms^Running On Water"
[A horse whose responsible organization is named ABC Farms, and whose name is "Running On Water"]
A similar multiple component convention is also used by the HL7 v2 XPN data type. However, the XPN data type places the suffix component before the prefix, and has a sixth component "degree" that DICOM subsumes in the name suffix. There are also differences in the manner in which name representation is identified.
In typical American and European usage the first occurrence of "given name" would represent the "first name". The second and subsequent occurrences of the "given name" would typically be treated as middle names. The "middle name" component is retained for the purpose of backward compatibility with existing standards.
The implementer should remain mindful of earlier usage forms that represented "given names" as "first" and "middle" and that translations to and from this previous typical usage may be required.
For reasons of backward compatibility with older versions of this Standard, person names might be considered a single family name complex (single component without "^" delimiters).
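The component and component group rules can be sketched as a small splitter. This Python example assumes the Value has already been decoded to a `str`, so character set switching and escape sequences are handled elsewhere. The function name and the dictionary keys are hypothetical labels following the "human use" reading. For veterinary names, only the first two keys carry meaning.

```python
from typing import Dict, List

COMPONENTS = ("family", "given", "middle", "prefix", "suffix")


def split_person_name(value: str) -> List[Dict[str, str]]:
    """Split a decoded PN Value into 3 component groups of 5 components each.

    Absent groups and omitted trailing components come back as "".
    Leading and trailing spaces within a component are treated as insignificant.
    """
    groups = value.rstrip(" ").split("=")
    if len(groups) > 3:
        raise ValueError("more than two component group delimiters")
    groups += [""] * (3 - len(groups))
    parsed = []
    for group in groups:
        parts = group.split("^")
        if len(parts) > 5:
            raise ValueError("more than four component delimiters")
        parts += [""] * (5 - len(parts))
        parsed.append({name: part.strip(" ") for name, part in zip(COMPONENTS, parts)})
    return parsed


alphabetic, ideographic, phonetic = split_person_name(
    "Adams^John Robert Quincy^^Rev.^B.A. M.Div.")
print(alphabetic)
```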
Character strings representing person names are encoded using a convention for PN Value Representation based on component groups with 5 components.
For the purpose of writing names in ideographic characters and in phonetic characters, up to 3 component groups may be used. The delimiter of the component group shall be the equals character "=" (3DH). The three component groups in their order of occurrence are: an alphabetic representation, an ideographic representation, and a phonetic representation.
Any component group may be absent, including the first component group. In this case, the person name may start with one or more "=" delimiters. Delimiters are also required for interior null component groups. Trailing null component groups and their delimiters may be omitted.
The first component group (identified by DICOM as "alphabetic") shall be encoded using the character set specified by Specific Character Set (0008,0005), Value 1. If Attribute Specific Character Set (0008,0005) is not present, the Default Character Repertoire ISO-IR 6 shall be used. [ISO/IEC 2022] escapes for Code Extension shall not be used in this component group. When Specific Character Set (0008,0005) Value 1 specifies a multi-byte character set without Code Extension (i.e., Unicode in UTF-8, [GB 18030] or [GBK]), the characters of this component group may be encoded with multiple bytes, but shall be drawn from the code points U+0020 through U+1FFF of [ISO/IEC 10646], or the following [ISO/IEC 10646] code points:
The second group shall be used for ideographic characters. The character sets used will usually be those from Attribute Specific Character Set (0008,0005), Value 2 through n, and may use [ISO/IEC 2022] escapes.
The third group shall be used for phonetic characters. The character sets used shall be those from Attribute Specific Character Set (0008,0005), Value 1 through n, and may use [ISO/IEC 2022] escapes.
Delimiter characters "^" and "=" are taken from the character set specified by Value 1 of Specific Character Set (0008,0005). If Attribute Specific Character Set (0008,0005), Value 1 is not present, the Default Character Repertoire ISO-IR 6 shall be used.
At the beginning of the Value of the Person Name Data Element, the following initial condition is assumed: if Attribute Specific Character Set (0008,0005), Value 1 is not present, the Default Character Repertoire ISO-IR 6 is invoked, and if Specific Character Set (0008,0005), Value 1 is present, the character set specified by Value 1 of the Attribute is invoked.
At the end of the Value of the Person Name Data Element, and before the component delimiters "^" and "=", the character set shall be switched to the Default Character Repertoire ISO-IR 6, if Value 1 of Specific Character Set (0008,0005) is not present. If Value 1 of Specific Character Set (0008,0005) is present, the character set shall be switched to that specified by Value 1 of the Attribute.
The Value Length of each component group is 64 characters maximum, including the delimiter for the component group. Each combining character (e.g., diacritics or vowel marks) shall be considered a separate character for this maximum length, regardless of how an application may display such combining characters (i.e., combined into the glyph for the base character, or rendered separately).
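A length check for the 64-character limit can count code points, which counts each combining character on its own as required. The Python sketch below assumes a decoded `str`. It also counts an "=" delimiter as part of the group it follows, which is an interpretation chosen for illustration. No Unicode normalization is applied, so a precomposed "é" counts as 1 character while "e" followed by U+0301 counts as 2.

```python
from typing import List


def component_group_lengths(value: str) -> List[int]:
    """Characters per PN component group.

    Assumption for illustration: an "=" delimiter is counted in the group it
    follows. len() counts code points, so a combining mark is its own character.
    """
    groups = value.split("=")
    last = len(groups) - 1
    return [len(g) + (1 if i < last else 0) for i, g in enumerate(groups)]


def pn_within_limit(value: str, limit: int = 64) -> bool:
    """True if every component group is within `limit` characters."""
    return all(n <= limit for n in component_group_lengths(value))


print(component_group_lengths("Re\u0301my"))             # [5]
print(component_group_lengths("Yamada^Tarou=山田^太郎"))  # [13, 5]
```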