DICOM PS3.5 2024e - Data Structures and Encoding |
---|
The Unicode UTF-8 character set and the [GB 18030] character set may be used for multiple languages. Some of these languages may also be encoded using other character sets that are defined elsewhere in the DICOM Standard. As the Unicode UTF-8 and [GB 18030] encodings do not allow [ISO/IEC 2022] character set replacement, a SOP Instance that uses one of them must use it for all of its strings. This may have implications for the character set selected for the encoding of the SOP Instance.
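The single-character-set constraint above can be illustrated with a small check. This is only a sketch: the function name is hypothetical, and the Defined Terms `ISO_IR 192` (Unicode UTF-8) and `GB18030` are assumed here to be the Specific Character Set (0008,0005) values for the two character sets discussed.

```python
# Defined Terms assumed for the character sets that do not allow
# ISO/IEC 2022 character set replacement (assumption, see lead-in).
NO_REPLACEMENT_TERMS = {"ISO_IR 192", "GB18030"}


def is_consistent_character_set(values):
    """Check a list of Specific Character Set values against the rule above.

    If UTF-8 or GB 18030 is present, it has to be the only value,
    because it applies to all strings in the SOP Instance.
    """
    terms = [value.strip() for value in values]
    if any(term in NO_REPLACEMENT_TERMS for term in terms):
        return len(terms) == 1
    return True


print(is_consistent_character_set(["ISO_IR 192"]))
print(is_consistent_character_set(["ISO_IR 192", "ISO 2022 IR 87"]))
```

The first call describes a SOP Instance encoded entirely in UTF-8; the second mixes UTF-8 with a code-extension term, which the rule above does not permit.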
Since the [GBK] character set is fully code point compatible with the larger character set of [GB 18030], and the specific examples of [GB 18030] encoding in this Annex (J.3 and J.4) include only Chinese characters that fall in the coding area common to the two standards, these examples demonstrate person name and text encoding in both standards. Examples specific to [GBK] are not necessary.
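The compatibility described above can be observed with Python's built-in `gbk` and `gb18030` codecs. This is a sketch, not part of the Standard; the sample string is an illustrative assumption chosen from the common coding area, and U+20000 is an assumed example of a character outside it.

```python
# Illustrative Hanzi from the coding area common to GBK and GB 18030.
text = "王^小東"

gbk_bytes = text.encode("gbk")
gb18030_bytes = text.encode("gb18030")

# Within the common area both codecs produce identical bytes.
same_bytes = gbk_bytes == gb18030_bytes
print(same_bytes)

# A supplementary-plane character has no GBK code, while GB 18030
# represents it with a four-byte sequence.
rare = "\U00020000"
rare_len = len(rare.encode("gb18030"))
try:
    rare.encode("gbk")
    gbk_can_encode = True
except UnicodeEncodeError:
    gbk_can_encode = False
print(rare_len, gbk_can_encode)
```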
Example J.1-1. Example of Person Name Value Representation in the Chinese Language Using Unicode
Person names in the Chinese language may be written in Hanzi (ideographic characters), and/or Latin (alphabetic characters). The Latin representation may be derived using pinyin or another Romanization method, or may be a chosen "westernized" name. The two component groups should be written in the order of alphabetic, then ideographic; the phonetic component group is typically not used (see Table 6.2-1). In this example the traditional script is used.
Some healthcare information systems may encode a "westernized" name with other patient aliases in a separate Attribute, e.g., Other Patient Names (0010,1091).
Some environments using the Chinese language may use the third name component, e.g., for the Yi or Mongolian script, with or without the first name component. This would be similar to the Japanese and Korean name component usage.
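The ordering of component groups described above can be sketched as follows. The helper `build_person_name` and the sample name are hypothetical and used only for illustration. The sketch assumes that `=` delimits component groups, that `^` delimits components within a group, and that trailing empty groups and their delimiters may be omitted.

```python
def build_person_name(alphabetic="", ideographic="", phonetic=""):
    """Join component groups in the order alphabetic, ideographic, phonetic."""
    groups = [alphabetic, ideographic, phonetic]
    # Drop trailing empty groups; interior empty groups keep their delimiter.
    while groups and groups[-1] == "":
        groups.pop()
    return "=".join(groups)


# Alphabetic (pinyin) group followed by the ideographic (Hanzi) group;
# the third group is left empty. The name is an illustrative assumption.
value = build_person_name("Wang^XiaoDong", "王^小東")
encoded = value.encode("utf-8")

print(value)
print(encoded.hex(" "))
```

When only the third component group is populated, the same helper keeps the two leading delimiters, so a value of the form `==<third group>` results.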
In the example below, the Specific Character Set Attribute (0008,0005) would contain:
Character encoded representation is: