DICOM PS3.15 2024e - Security and System Management Profiles
This Annex addresses the removal and replacement of Attributes within a DICOM Dataset that may potentially result in leakage of Individually Identifiable Information (III) about the patient or other individuals or organizations associated with the data.
Use of the Attribute Confidentiality Profiles does not guarantee that all individually identifying information will be removed, i.e., de-identification of the Attributes does not imply de-identification of the Information Object. Use of these profiles does not replace a de-identification process, but should be part of it. The description of such a process is beyond the scope of DICOM, but would at least involve determining the context of the de-identification (e.g., for what purpose is the data de-identified, who are the recipients, how is the de-identified data shared), interpreting the applicable regulations, and assessing the risk of detrimental re-identification.
The Profiles are provided to address the balance between the removal of information and the need to retain information so that the Datasets remain useful for their intended purpose.
Options are used in addition to the Profiles to prevent a combinatorial expansion of different Profiles.
The Application Level Confidentiality Profile addresses the following aspects of security:
Other aspects of security not addressed by this Profile, that may be addressed elsewhere in the Standard, include:
This Profile is targeted toward creating a special purpose, de-identified version of an already-existing Data Set. It is not intended to replace the original SOP Instance from which the de-identified SOP Instance is created, nor is it intended to act as the primary representation of clinical Data Sets in image archives. The de-identified SOP Instances are useful, for example, in creating teaching or research files, performing clinical trials, or submission to registries where the identity of the patient and other individuals is required to be protected. In some cases, it is also necessary to provide a means of recovering identity by authorized personnel.
An Application may claim conformance to the Basic Application Level Confidentiality Profile and Options as a de-identifier if it protects and retains all Attributes as specified in the Profile and Options. Protection in this context is defined as the following process:
The application may create one or more instances of the Encrypted Attributes Data Set and copy Attributes to be protected into the (single) item of the Modified Attributes Sequence (0400,0550) of one or more of the Encrypted Attributes Data Set instances.
A complete reconstruction of the original Data Set may not be possible; however, Attributes (e.g., SOP Instance UID) in the Modified Attributes Sequence of an Encrypted Attributes Data Set may refer back to the original SOP Instance holding the original Data Set.
It is not required that the Encrypted Attributes Data Set be created; indeed, there may be circumstances where the de-identified Dataset is expected to be archived long enough that any contemporary encryption technology may be inadequate to provide long term protection against unauthorized recovery of identification.
Other mechanisms to assist in identity recovery or longitudinal consistency of replaced UIDs or dates and times are deprecated in favor of the Encrypted Attributes Data Set mechanism that is intended for this purpose. For example, if it is desired to include an encrypted hash of the Patient's Name, it should not be encoded in a separate Private Data Element implemented for that purpose, but should be included in the Encrypted Attributes Data Set and encoded using the standard mechanism. This allows for compatibility between different implementations and provides security based on the quality and control of the encryption keys. Note also that unencrypted hashes are considerably less secure and should be avoided, since they are vulnerable to trivial dictionary-based attacks.
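The weakness of unencrypted hashes is easy to demonstrate. The Python sketch below (standard library only) plays the attacker: given an unsalted SHA-256 digest of a Patient's Name, as it might be stored in a hypothetical Private Data Element, it hashes a list of candidate names until one matches. The function names and the candidate list are illustrative assumptions, not part of the Standard.

```python
import hashlib


def unsalted_hash(value: str) -> str:
    """SHA-256 of a value with no key or salt: the kind of pseudonym the text warns against."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, candidates):
    """Return the first candidate whose unsalted hash equals target_hash, else None."""
    for candidate in candidates:
        if unsalted_hash(candidate) == target_hash:
            return candidate
    return None


# Digest as it might appear in a hypothetical Private Data Element.
leaked = unsalted_hash("DOE^JANE")

# The attacker only needs a list of plausible names.
candidate_names = ["SMITH^JOHN", "DOE^JOHN", "DOE^JANE", "ROE^RICHARD"]
print(dictionary_attack(leaked, candidate_names))  # DOE^JANE
```

Keying or salting the hash would defeat this particular lookup, but the Profile's guidance remains to carry such values in the Encrypted Attributes Data Set rather than in a Private Data Element.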
Each Attribute to be protected shall then either be removed from the dataset, or have its value replaced by a different "replacement value" that does not allow identification of the patient.
It is the responsibility of the de-identifier to ensure that this process does not negatively affect the integrity of the Information Object Definition. For example, dummy values may be necessary for protected Type 1 Attributes, which may not be sent with zero length, so that instances whose original values are stored or exchanged in encrypted form remain valid for applications that are not aware of the security mechanism.
The Standard does not mandate the use of any particular dummy value, and indeed it may have some meaning, for example in data that may be used for teaching purposes, where the real patient identifying information is encrypted for later retrieval, but a meaningful alternative form of identification is provided. For example, a dummy Patient's Name (0010,0010) may convey the type of pathology in a teaching case. It is the responsibility of the de-identifier software or human operator to ensure that the dummy values cannot be used to identify the patient.
It is the responsibility of the de-identifier to ensure the consistency of dummy values for Attributes such as Study Instance UID (0020,000D) or Frame of Reference UID (0020,0052) if multiple related SOP Instances are protected. Indeed, all Attributes of every entity above the Instance level should remain consistent for all Instances protected, e.g., Patient ID for the Patient entity, Study ID for the Study entity, Series Number for the Series entity.
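One way to meet this consistency requirement is to cache every replacement, so that the same original value always maps to the same dummy value across all Instances processed. The Python sketch below is a hypothetical helper, not a mechanism defined by the Standard; it assumes new UIDs are generated under the 2.25 root from a random UUID, as PS3.5 permits, and that sequential pseudonyms such as ANON0001 are acceptable Patient IDs. The cache itself links dummy values back to real identities, so it would need to be protected like any other identifying data, or discarded.

```python
import uuid


class ConsistentDummies:
    """Hypothetical helper: the same original value always yields the same dummy value."""

    def __init__(self) -> None:
        self._uids: dict = {}
        self._patient_ids: dict = {}

    def uid(self, original: str) -> str:
        # "2.25." + the decimal form of a UUID is a UID that needs no
        # registered organizational root (see PS3.5).
        if original not in self._uids:
            self._uids[original] = "2.25." + str(uuid.uuid4().int)
        return self._uids[original]

    def patient_id(self, original: str) -> str:
        # Sequential pseudonyms; the "ANON0001" format is an arbitrary choice.
        if original not in self._patient_ids:
            self._patient_ids[original] = f"ANON{len(self._patient_ids) + 1:04d}"
        return self._patient_ids[original]


dummies = ConsistentDummies()
# Two Instances of the same (hypothetical) study get the same new Study Instance UID.
study_uid_1 = dummies.uid("1.2.840.99999.1")
study_uid_2 = dummies.uid("1.2.840.99999.1")
assert study_uid_1 == study_uid_2
print(dummies.patient_id("MRN-123"), dummies.patient_id("MRN-456"), dummies.patient_id("MRN-123"))
# ANON0001 ANON0002 ANON0001
```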
If an Attribute to be protected is contained in a Sequence of Items, the complete Sequence of Items may need to be protected.
The de-identifier should ensure that no identifying information is burned into the image pixel data, either because the modality does not generate such burned-in identification in the first place, or by removing it through the use of the Clean Pixel Data Option; see Section E.3. If non-pixel data graphics or overlays contain identification, the de-identifier is required to remove them, or clean them if the Clean Graphics Option is supported; see Section E.3.3. The means by which burned-in or graphic identifying information is located and removed is outside the scope of this Standard.
Each Attribute specified to be retained shall be retained. At the discretion of the de-identifier, Attributes may be added to the dataset to be protected.
If used, all instances of the Encrypted Attributes Data Set shall be encoded with a DICOM Transfer Syntax, encrypted, and stored in the dataset to be protected as an Item of the Encrypted Attributes Sequence (0400,0500). The encryption shall be done using RSA [RFC 2313] for the key transport of the content-encryption keys. A de-identifier conforming to this security Profile may use either AES or Triple-DES for content-encryption. The AES key length may be any length allowed by the RFCs. The Triple-DES key length is 168 bits as defined by [ANSI X9.52]. Encoding shall be performed according to the specifications for RSA Key Transport and Triple DES Content Encryption in [RFC 3370] and for AES Content Encryption in [RFC 3565].
Each item of the Encrypted Attributes Sequence (0400,0500) consists of two Attributes, Encrypted Content Transfer Syntax UID (0400,0510) containing the UID of the Transfer Syntax that was used to encode the instance of the Encrypted Attributes Data Set, and Encrypted Content (0400,0520) containing the block of data resulting from the encryption of the Encrypted Attributes Data Set instance.
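The following Python sketch shows only the shape of one Item of the Encrypted Attributes Sequence, using plain dicts keyed by Attribute keyword as a stand-in for a real DICOM toolkit's data model. The `encode` and `encrypt` callables are hypothetical hooks: a conforming implementation would encode with a DICOM Transfer Syntax and produce CMS output using RSA key transport with AES or Triple-DES content encryption, which this sketch does not attempt.

```python
import json
from typing import Any, Callable, Dict

# Explicit VR Little Endian.
EXPLICIT_VR_LITTLE_ENDIAN = "1.2.840.10008.1.2.1"


def build_encrypted_attributes_item(
    original_values: Dict[str, Any],
    encode: Callable[[Dict[str, Any]], bytes],
    encrypt: Callable[[bytes], bytes],
    transfer_syntax_uid: str = EXPLICIT_VR_LITTLE_ENDIAN,
) -> Dict[str, Any]:
    """Return one Item of the Encrypted Attributes Sequence (0400,0500)."""
    # Encrypted Attributes Data Set: a single-Item Modified Attributes Sequence
    # (0400,0550) holding the original values of the protected Attributes.
    encrypted_attributes_data_set = {"ModifiedAttributesSequence": [dict(original_values)]}
    return {
        # (0400,0510): Transfer Syntax used by `encode`.
        "EncryptedContentTransferSyntaxUID": transfer_syntax_uid,
        # (0400,0520): the encrypted bytes.
        "EncryptedContent": encrypt(encode(encrypted_attributes_data_set)),
    }


# Stand-ins for illustration ONLY: JSON is not a DICOM Transfer Syntax,
# and reversing bytes is not encryption.
def fake_encode(ds: Dict[str, Any]) -> bytes:
    return json.dumps(ds, sort_keys=True).encode("utf-8")


def fake_encrypt(data: bytes) -> bytes:
    return data[::-1]


item = build_encrypted_attributes_item(
    {"PatientName": "DOE^JANE", "PatientID": "MRN-123"}, fake_encode, fake_encrypt
)
dataset = {"PatientName": "ANONYMOUS", "EncryptedAttributesSequence": [item]}
```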
RSA key transport of the content-encryption keys is specified as a requirement in the European Prestandard ENV 13608-2: Health Informatics - Security for healthcare communication - Part 2: Secure data objects.
No requirements on the size of the asymmetric key pairs used for RSA key transport are defined in this confidentiality scheme.
Implementations claiming conformance to the Basic Application Level Confidentiality Profile as a de-identifier shall always protect (e.g., encrypt and replace) the SOP Instance UID (0008,0018) Attribute as well as all references to other SOP Instances, whether contained in the main dataset or embedded in an Item of a Sequence of Items, that could potentially be used by unauthorized entities to identify the patient.
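Protecting references means walking the whole Data Set, including every Item of every Sequence, and replacing each instance UID consistently with the replacement used for the referenced Instance itself. The Python sketch below assumes a simplified model (a dict keyed by Attribute keyword, with Sequences as lists of dicts) and an illustrative, incomplete set of UID Attributes; it is not a substitute for the full list in Table E.1-1.

```python
import uuid

# Illustrative subset of Attributes whose values identify Instances or entities.
# SOP Class UIDs and Transfer Syntax UIDs are well-known Standard UIDs and are kept.
INSTANCE_UID_KEYWORDS = {
    "SOPInstanceUID",
    "ReferencedSOPInstanceUID",
    "StudyInstanceUID",
    "SeriesInstanceUID",
    "FrameOfReferenceUID",
}


def remap_uids(dataset: dict, mapping: dict) -> dict:
    """Copy `dataset`, replacing instance UIDs at any depth; `mapping` is shared across Instances."""
    result = {}
    for keyword, value in dataset.items():
        if isinstance(value, list) and all(isinstance(item, dict) for item in value):
            # A Sequence of Items: apply the same rule inside every Item.
            result[keyword] = [remap_uids(item, mapping) for item in value]
        elif keyword in INSTANCE_UID_KEYWORDS:
            if value not in mapping:
                mapping[value] = "2.25." + str(uuid.uuid4().int)
            result[keyword] = mapping[value]
        else:
            result[keyword] = value
    return result


mapping: dict = {}
image = {
    "SOPClassUID": "1.2.840.10008.5.1.4.1.1.2",  # CT Image Storage, kept as is
    "SOPInstanceUID": "1.2.840.99999.3.1",        # hypothetical original UIDs
    "ReferencedImageSequence": [
        {"ReferencedSOPClassUID": "1.2.840.10008.5.1.4.1.1.2",
         "ReferencedSOPInstanceUID": "1.2.840.99999.3.2"},
    ],
}
referenced = {"SOPInstanceUID": "1.2.840.99999.3.2"}

new_image = remap_uids(image, mapping)
new_referenced = remap_uids(referenced, mapping)
# The nested reference still points at the referenced Instance, under its new UID.
assert (new_image["ReferencedImageSequence"][0]["ReferencedSOPInstanceUID"]
        == new_referenced["SOPInstanceUID"])
```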
The Attribute Patient Identity Removed (0012,0062) shall be replaced or added to the dataset with a value of YES. Additionally, one or more codes from CID 7050 “De-identification Method” corresponding to the Profile and Options used shall be added to De-identification Method Code Sequence (0012,0064), and/or a text string describing the method used shall be inserted in or added to De-identification Method (0012,0063).
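A minimal sketch of this step, again assuming a dict-of-keywords model, might look as follows. It adds only the CID 7050 code for the Basic Profile, (113100, DCM, "Basic Application Confidentiality Profile"); codes for any Options used would be appended the same way and should be taken from PS3.16. Holding De-identification Method as a list of values is a modeling choice of this sketch.

```python
BASIC_PROFILE_CODE = {
    "CodeValue": "113100",
    "CodingSchemeDesignator": "DCM",
    "CodeMeaning": "Basic Application Confidentiality Profile",
}


def mark_deidentified(dataset: dict, method_description: str) -> None:
    """Record the de-identification in `dataset` (modified in place)."""
    # Patient Identity Removed (0012,0062): replaced or added with value YES.
    dataset["PatientIdentityRemoved"] = "YES"
    # De-identification Method Code Sequence (0012,0064): one Item per Profile or
    # Option used, from CID 7050. Only the Basic Profile code is shown here.
    codes = dataset.setdefault("DeidentificationMethodCodeSequence", [])
    if BASIC_PROFILE_CODE not in codes:
        codes.append(dict(BASIC_PROFILE_CODE))
    # De-identification Method (0012,0063): held as a list of values so that a
    # description left by an earlier de-identification pass is kept.
    dataset.setdefault("DeidentificationMethod", []).append(method_description)


ds = {"PatientName": "ANONYMOUS", "PatientIdentityRemoved": "NO"}
mark_deidentified(ds, "Basic Profile; hypothetical in-house tool v1.0")
```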
If the Dataset being de-identified is being stored within a DICOM File, then the File Meta Information, including the 128-byte preamble if present, shall be replaced with a description of the de-identifying application; see PS3.10. Otherwise, there is a risk that identity information may leak through unmodified File Meta Information or preamble, including information regarding Application Entity Titles, Presentation Addresses, implementation information, and private information.
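Rather than editing the original header, a de-identifier can write a new one. The Python sketch below assumes the same dict-of-keywords model and hypothetical UIDs; it writes an all-zero preamble (the value PS3.10 specifies when the preamble is not used) followed by the DICM prefix, and builds File Meta Information that describes the de-identifying application. Actual binary encoding of the File Meta elements is left to a DICOM toolkit and is not shown.

```python
PREAMBLE = bytes(128)  # fresh all-zero preamble; nothing from the original file survives
DICM_PREFIX = b"DICM"


def replacement_file_meta(original_meta: dict, new_sop_instance_uid: str,
                          implementation_class_uid: str,
                          implementation_version_name: str) -> dict:
    """File Meta Information that describes the de-identifier, not the original writer."""
    return {
        "FileMetaInformationVersion": b"\x00\x01",
        # Describe the content, so copied (assuming the Transfer Syntax is unchanged).
        "MediaStorageSOPClassUID": original_meta["MediaStorageSOPClassUID"],
        "TransferSyntaxUID": original_meta["TransferSyntaxUID"],
        # Must equal the replaced SOP Instance UID (0008,0018) in the Data Set.
        "MediaStorageSOPInstanceUID": new_sop_instance_uid,
        # Identify the de-identifying application.
        "ImplementationClassUID": implementation_class_uid,
        "ImplementationVersionName": implementation_version_name,
        # Not copied: SourceApplicationEntityTitle, PrivateInformationCreatorUID,
        # PrivateInformation and anything else from the original header.
        # File Meta Information Group Length is left for the writer to compute.
    }


original_meta = {
    "MediaStorageSOPClassUID": "1.2.840.10008.5.1.4.1.1.2",
    "MediaStorageSOPInstanceUID": "1.2.840.99999.3.1",
    "TransferSyntaxUID": "1.2.840.10008.1.2.1",
    "ImplementationClassUID": "1.2.840.99999.9",
    "SourceApplicationEntityTitle": "HOSPITAL_CT1",
}
new_meta = replacement_file_meta(original_meta, "2.25.1234", "2.25.5678", "DEID_TOOL_1")
file_header = PREAMBLE + DICM_PREFIX  # first 132 bytes of the new file
```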
If the Dataset being de-identified is being communicated by DICOM Real-Time Video, then the File Meta Information, including the 128-byte preamble if present, shall be replaced with a description of the de-identifying application; see PS3.22. Otherwise, there is a risk that identity information may leak through unmodified File Meta Information or preamble, including information regarding Application Entity Titles, Presentation Addresses, implementation information, and private information.
The Attributes listed in Table E.1-1 for each Profile or Option are contained in Standard IODs, or may be contained in Standard Extended IODs. An implementation claiming conformance to the Basic Application Level Confidentiality Profile as a de-identifier shall protect or retain all instances of the Attributes listed in Table E.1-1, whether contained in the main dataset or embedded in an Item of a Sequence of Items. The action codes in Table E.1-1a are used in Table E.1-1.
Table E.1-1a. De-identification Action Codes
These action codes are applicable to both Sequence and non-Sequence Attributes; in the case of Sequences, the action is applicable to the Sequence and all of its contents. Cleaning a sequence ("C" action) entails changing values of Attributes within that Sequence when the meaning of the Sequence within the context of its use in the IOD is specified, or recursively applying the Profile rules to each Dataset in each Item of the Sequence otherwise. Keeping a Sequence ("K" action) requires recursively applying the Profile rules to each Dataset in each Item of the Sequence (for example, in order to remap any UIDs contained within that sequence).
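The recursion described here can be sketched as follows. The Python code assumes a simplified model in which each element is a (VR, value) pair and a Sequence value is a list of Item Data Sets, handles only the single-letter codes X, Z, D, U, K and C, and simplifies C on a Sequence to the recursive case. The dummy values, the cleaning function and the toy UID generator are hypothetical, and the action table is illustrative rather than copied from Table E.1-1.

```python
from typing import Any, Callable, Dict, Tuple

Element = Tuple[str, Any]      # (VR, value); an SQ value is a list of Item Data Sets
Dataset = Dict[str, Element]

# Hypothetical dummy values for the "D" action; the Standard mandates none.
DUMMY_BY_VR = {"PN": "ANONYMOUS", "LO": "ANONYMOUS", "SH": "ANON",
               "DA": "19000101", "TM": "000000"}


def apply_actions(ds: Dataset, actions: Dict[str, str],
                  clean: Callable[[str, Element], Element],
                  new_uid: Callable[[str], str]) -> Dataset:
    """Apply single-letter action codes, recursing into Sequences that are kept or cleaned."""
    out: Dataset = {}
    for keyword, (vr, value) in ds.items():
        action = actions.get(keyword)        # None: Attribute not listed
        if action == "X":
            continue                         # remove
        if vr == "SQ" and action in (None, "K", "C"):
            # Simplification: "C" on a Sequence is treated like "K" here,
            # i.e. the rules are applied again inside every Item.
            out[keyword] = (vr, [apply_actions(item, actions, clean, new_uid) for item in value])
        elif action == "Z":
            out[keyword] = (vr, [] if vr == "SQ" else "")   # zero length
        elif action == "D":
            out[keyword] = (vr, DUMMY_BY_VR[vr])  # KeyError if no dummy is defined for this VR
        elif action == "U":
            out[keyword] = (vr, new_uid(value))   # internally consistent new UID
        elif action == "C":
            out[keyword] = clean(keyword, (vr, value))
        else:
            out[keyword] = (vr, value)            # "K", or not listed
    return out


# Illustrative actions; not copied from Table E.1-1.
actions = {"PatientName": "Z", "PatientID": "D", "OtherPatientIDsSequence": "X",
           "ReferencedSOPInstanceUID": "U", "StudyDescription": "C"}
ds = {
    "PatientName": ("PN", "DOE^JANE"),
    "PatientID": ("LO", "MRN-123"),
    "Modality": ("CS", "CT"),
    "StudyDescription": ("LO", "CT chest, Jane Doe"),
    "OtherPatientIDsSequence": ("SQ", [{"PatientID": ("LO", "OLD-9")}]),
    "ReferencedImageSequence": ("SQ", [{"ReferencedSOPInstanceUID": ("UI", "1.2.840.99999.3.2")}]),
}
uid_map: Dict[str, str] = {}
out = apply_actions(
    ds, actions,
    clean=lambda keyword, element: (element[0], "CT CHEST"),      # hypothetical cleaner
    new_uid=lambda uid: uid_map.setdefault(uid, f"2.25.{len(uid_map) + 1}"),  # toy UIDs
)
```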
A requirement for an Option, when implemented, overrides any requirement for the underlying Profile. Depending on the Option, this makes the de-identifier retain more information, or remove more information, than the Profile alone would.
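If each Profile and Option is represented as a table from Attribute to action code, this precedence rule amounts to a dictionary merge. The Python sketch below uses illustrative tables and assumes, since this section does not say, that a later Option wins if two Options disagree.

```python
# Illustrative action tables; see Table E.1-1 for the normative entries.
BASIC_PROFILE = {"PatientAge": "X", "PatientSex": "Z", "StudyDate": "Z", "StudyDescription": "X"}
RETAIN_PATIENT_CHARACTERISTICS = {"PatientAge": "K", "PatientSex": "K"}
CLEAN_DESCRIPTORS = {"StudyDescription": "C"}


def effective_actions(profile: dict, *options: dict) -> dict:
    """Merge action tables so that an Option's entry overrides the Profile's."""
    merged = dict(profile)          # never mutate the Profile table itself
    for option in options:
        merged.update(option)       # assumption: a later Option wins a conflict
    return merged


actions = effective_actions(BASIC_PROFILE, RETAIN_PATIENT_CHARACTERISTICS, CLEAN_DESCRIPTORS)
print(actions)
# {'PatientAge': 'K', 'PatientSex': 'K', 'StudyDate': 'Z', 'StudyDescription': 'C'}
```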
The Attributes listed in Table E.1-1 may not be sufficient to guarantee confidentiality of patient identity. In particular, identifying information may be contained in Private Attributes, new Standard Attributes, Retired Standard Attributes and additional Standard Attributes not present in Standard Composite IODs (as defined in PS3.3) but used in Standard Extended SOP Classes. Table E.1-1 indicates those Attributes that are used in Standard Composite IODs as well as those Attributes that are Retired. Also included in Table E.1-1 are some Elements that are not normally found in a Dataset but are used in Commands, Directories and Meta Information Headers, and that could be misused within Private Sequences. Textual Content Items of Structured Reports, textual annotations of Presentation States, Curves and Overlays are specifically addressed. It is the responsibility of the de-identifier to ensure that all identifying information is removed.
It should be noted that conformance to the Basic Application Level Confidentiality Profile does not necessarily guarantee confidentiality. For example, if an attacker already has access to the original images, the Pixel Data could be matched, though the probability and impact of such a threat may be deemed to be negligible. If the Encrypted Attributes Sequence is used, it should be understood that any encryption scheme may be vulnerable to attack. Also, an organization's Security Policy and Key Management policy are recognized to have a much greater impact on the effectiveness of protection.
National and local regulations, which may vary, might require that additional Attributes be de-identified, though the Profile and Options have been designed to be sufficient to satisfy known regulations without compromising the usefulness of the de-identified instances for their intended purpose.
Table E.1-1 is normative, but it is subject to extension as the DICOM Standard evolves and other similar Attributes are added to IODs. De-identifiers may take this extensibility into account, for example, by handling all dates and times on the basis of their Value Representation (DA, DT or TM), rather than only the date and time Attributes listed in the table.
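Handling dates and times by VR can be sketched as below, using an assumed model in which each element is a (VR, value) pair and a Sequence value is a list of Item Data Sets. Every DA, DT or TM value is emptied wherever it occurs, including in a hypothetical private element that no attribute list would name. This only works when the VR is actually known; a private element read from an Implicit VR Little Endian file without a private dictionary may only be available as UN.

```python
from typing import Any, Dict, Tuple

TEMPORAL_VRS = {"DA", "DT", "TM"}


def blank_temporal(ds: Dict[str, Tuple[str, Any]]) -> Dict[str, Tuple[str, Any]]:
    """Zero-length every DA, DT and TM value, whatever its tag, at any Sequence depth."""
    out = {}
    for key, (vr, value) in ds.items():
        if vr == "SQ":
            out[key] = (vr, [blank_temporal(item) for item in value])
        elif vr in TEMPORAL_VRS:
            # A real de-identifier would need a dummy value instead for Type 1 Attributes.
            out[key] = (vr, "")
        else:
            out[key] = (vr, value)
    return out


ds = {
    "StudyDate": ("DA", "20240102"),
    "Modality": ("CS", "CT"),
    "(0009,1001)": ("DA", "20240102"),   # hypothetical private date, on no list
    "ScheduledProcedureStepSequence": ("SQ", [
        {"ScheduledProcedureStepStartTime": ("TM", "101500")},
    ]),
}
print(blank_temporal(ds)["(0009,1001)"])  # ('DA', '')
```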
The Profile and Options do not specify whether the design of a de-identifier should be to remove what is known to be a risk of identity leakage, or to retain only what is known to be safe. The former approach may fail when the Standard is extended, or when a vendor adds unanticipated Standard Attributes or Private Attributes, whilst the latter requires an extensive, if not complete, comparison of each instance with the Information Object Definitions in PS3.3 to avoid discarding required or useful information. Table E.1-1 defines the minimum actions required for conformance.
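The trade-off between the two designs can be seen in a few lines of Python. Both filters below are hypothetical and use deliberately tiny, illustrative lists (flat Data Sets only, no Sequences): the first (remove what is known to be risky) lets an unanticipated Private Attribute through, while the second (retain only what is known to be safe) also discards a useful Standard Attribute that nobody thought to list.

```python
# Illustrative lists only; neither is taken from Table E.1-1.
KNOWN_RISKY = {"PatientName", "PatientID", "PatientBirthDate", "InstitutionName"}
KNOWN_SAFE = {"SOPClassUID", "Modality", "Rows", "Columns", "PixelData"}


def remove_known_risky(ds: dict) -> dict:
    """Blacklist approach: drop what is known to leak identity, keep the rest."""
    return {k: v for k, v in ds.items() if k not in KNOWN_RISKY}


def retain_known_safe(ds: dict) -> dict:
    """Whitelist approach: keep only what is known to be safe, drop the rest."""
    return {k: v for k, v in ds.items() if k in KNOWN_SAFE}


ds = {
    "PatientName": "DOE^JANE",
    "Modality": "CT",
    "Rows": 512,
    "BurnedInAnnotation": "NO",          # useful Standard Attribute, not on the safe list
    "(0019,10A0)": "Jane Doe, ward 3",   # hypothetical vendor Private Attribute
}
print(sorted(remove_known_risky(ds)))  # still contains the private value
print(sorted(retain_known_safe(ds)))   # has lost BurnedInAnnotation
```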
The "C" (clean) action is specified not only for string VRs, but also for Code Sequences, since the use of private or local codes and non-standard code meanings may potentially cause identity leakage.
The Digital Signatures Sequence (FFFA,FFFA) needs to be removed because it contains the Certificate of Signer (0400,0115); theoretically the signature could be verified and the object re-signed by the de-identifier itself with its own certificate, but this is not required by the Standard.
In general, there are no CS VR Attributes in this table, since it is usually safe to assume that code strings do not contain identifying information.
In general, there are no Code Sequence Attributes in this table, since it is usually safe to assume that coded sequence entries, including private codes, do not contain identifying information. Exceptions are codes for providers and staff.
The Clean Pixel Data and Clean Recognizable Visual Features Options are not listed in this table, since they are defined by descriptions of operations on the Pixel Data itself. The Clean Pixel Data Option may be applied to the Pixel Data within the Icon Image Sequence, or more likely the Icon Image Sequence may be recreated entirely once the Pixel Data of the main Dataset has been cleaned. The Icon Image Sequence is to be removed when its Pixel Data cannot be cleaned.
The Original Attributes Sequence (0400,0561) (which in turn contains the Modified Attributes Sequence (0400,0550)) generally needs to be removed, because it may contain unencrypted copies of other Attributes that may have been modified (e.g., coerced to use local identifiers and names during import of foreign images); an alternative approach would be to selectively modify its contents. This is distinct from the use of the Modified Attributes Sequence (0400,0550) within the Encrypted Attributes Sequence (0400,0500).
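A sketch of these removals, assuming a dict-of-keywords model with Sequences as lists of dicts: the Digital Signatures Sequence and the Original Attributes Sequence are dropped at any depth, and the Icon Image Sequence is dropped unless the caller reports that its Pixel Data has been cleaned (or the icon has been recreated). The function and flag names are hypothetical.

```python
# Sequences removed outright in this sketch (keyword -> reason).
ALWAYS_REMOVE = {
    "DigitalSignaturesSequence",   # (FFFA,FFFA): holds Certificate of Signer (0400,0115)
    "OriginalAttributesSequence",  # (0400,0561): may hold unencrypted pre-coercion values
}


def strip_unsafe_sequences(ds: dict, icon_cleaned: bool) -> dict:
    """Drop Sequences that can leak identity, at any nesting depth."""
    out = {}
    for keyword, value in ds.items():
        if keyword in ALWAYS_REMOVE:
            continue
        if keyword == "IconImageSequence" and not icon_cleaned:
            continue  # its Pixel Data could not be cleaned, so remove it
        if isinstance(value, list) and all(isinstance(item, dict) for item in value):
            value = [strip_unsafe_sequences(item, icon_cleaned) for item in value]
        out[keyword] = value
    return out


ds = {
    "Modality": "CT",
    "DigitalSignaturesSequence": [{"CertificateOfSigner": b"..."}],
    "OriginalAttributesSequence": [
        {"ModifiedAttributesSequence": [{"PatientName": "DOE^JANE"}]},
    ],
    "IconImageSequence": [{"Rows": 64, "Columns": 64}],
    "ReferencedImageSequence": [
        {"ReferencedSOPInstanceUID": "1.2.840.99999.3.2", "DigitalSignaturesSequence": [{}]},
    ],
}
cleaned = strip_unsafe_sequences(ds, icon_cleaned=False)
```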
Table E.1-1 distinguishes Attributes that are in standard Composite IODs defined in PS3.3 from those that are not; some Attributes are defined in PS3.3 for other IODs, or have a specific usage other than in the top level Dataset of a Composite IOD, but are (mis)used by implementers in instances as a Standard Extended SOP Class at other levels than as defined by the Standard. Any such Attributes encountered may be removed without compromising the conformance of the instance with the standard IOD. For example, Verifying Observer Sequence (0040,A073) is only defined in structured report IODs and hence is described in Table E.1-1 as D since it is Type 1C; if encountered in an image instance, it should simply be removed (treated as X).
Using an Attribute Confidentiality Profile Option that requires the retention of information that would normally be removed potentially increases the risk of detrimental re-identification. Following de-identification rules as outlined here implies retention or non-retention of information only and does not deal with any related regulatory aspect.
Because of the varied nature of encapsulated documents (CDA, PDF, STL/OBJ, etc.), options for cleaning the content of the Encapsulated Document (0042,0011) Attribute are not specified by the Standard, and it is required to be replaced. If a De-identifier has additional knowledge of the content it may attempt to clean the Attribute, and document in its Conformance Statement how this is performed.