Various options are defined to be applicable to the Basic Application Level Confidentiality Profile. Some of these options require removal of additional information, and some of these options require retention of information that would otherwise be removed.
The following options are defined that require removal of additional information:
Clean Pixel Data Option
Clean Recognizable Visual Features Option
Clean Graphics Option
Clean Structured Content Option
Clean Descriptors Option
The following options are defined that require retention of information that would otherwise be removed but that is needed for specific uses:
Retain Longitudinal Temporal Information With Full Dates Option
Retain Longitudinal Temporal Information With Modified Dates Option
Retain Patient Characteristics Option
Retain Device Identity Option
Retain UIDs Option
Retain Safe Private Option
When the Clean Pixel Data Option is specified in addition to an Application Level Confidentiality Profile, any information burned in to the Pixel Data (7FE0,0010) corresponding to the Attribute information specified to be removed by the Profile and any other Options specified shall also be removed, as described in Table E.1-1.
This may require intervention of or approval by a human operator.
The Attribute Burned In Annotation (0028,0301) shall be added to the Dataset with a value of "NO".
This capability is called out as a specific option, since it may be extremely burdensome in practice to implement and is unnecessary for the vast majority of modalities that do not burn in such annotation in the first place. For example, CT images do not normally contain such burned in annotation, whereas Ultrasound images routinely do.
Though image processing and optical character recognition techniques can be used to detect the presence of and location of burned in text, and matching against known identifying information can be applied, deciding whether or not that text is identifying information or some other type of information may be non-trivial. Compliance with this option requires that identifying information is removed, regardless of how that is achieved. It is not required that information specified to be retained in the non-pixel data by other Options (e.g., physical characteristics, dates or descriptors) also be retained burned-in to the pixel data. Thus the most conservative approach of removing any and all burned in text would be compliant. This may involve sacrificing additional potentially useful information such as localizer posting and manual graphic annotations.
The stored pixel values are to be changed (blacked out); it is not sufficient to superimpose an overlay or graphic annotation or shutter to obscure the pixel data values, since those may not be ignored by the receiving system.
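The requirement that the stored values themselves change can be illustrated with a minimal sketch. It assumes a single-frame, single-sample image held as a list of rows, a rectangle that has already been located (by an operator or a detector), and a fill value of 0, which is only "black" for some photometric interpretations. The function name `black_out_region` and the sample data are hypothetical.

```python
def black_out_region(rows, top, left, height, width, fill=0):
    """Return a copy of `rows` with the stored values in a rectangle overwritten."""
    cleaned = [list(row) for row in rows]
    for r in range(top, min(top + height, len(cleaned))):
        for c in range(left, min(left + width, len(cleaned[r]))):
            cleaned[r][c] = fill
    return cleaned


# Hypothetical 4x6 image with "burned-in text" (value 255) in the top-left corner.
image = [
    [255, 255, 255, 10, 11, 12],
    [255, 255, 255, 13, 14, 15],
    [20, 21, 22, 23, 24, 25],
    [30, 31, 32, 33, 34, 35],
]
cleaned = black_out_region(image, top=0, left=0, height=2, width=3)

# The Option also requires Burned In Annotation (0028,0301) to be set to "NO".
header_updates = {(0x0028, 0x0301): "NO"}
```

Because the original values are overwritten rather than masked by an overlay or shutter, a receiving system that ignores presentation information still cannot recover them.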
This option is intended to apply to the Pixel Data (7FE0,0010) Attribute that occurs in the top level Dataset of an Image Storage SOP Instance. The other standard use of Pixel Data (7FE0,0010) is within Icon Image Sequence (0088,0200), which is already described in Table E.1-1 and the accompanying note as requiring removal. This option does not require the ability to manually or automatically process the pixel values of Pixel Data (7FE0,0010) occurring in any other location than the top level dataset, but it does not prohibit it. Pixel Data (7FE0,0010) occurring within private Attributes will be removed because such Attributes will not be known to be safe.
When the Clean Recognizable Visual Features Option is specified in addition to an Application Level Confidentiality Profile, if there is sufficient visual information within the Pixel Data of a set of instances to allow an individual to be recognized from the instances themselves or a reconstruction of a set of instances, then sufficient removal or distortion of the Pixel Data shall be applied to prevent recognition.
This may require intervention of or approval by a human operator.
The Attribute Recognizable Visual Features (0028,0302) shall be added to the Dataset with a value of "NO".
This capability is called out as a specific option, since it may be extremely burdensome in practice to implement and is unnecessary for the vast majority of anatomic sites and modalities.
In the case of full-face photographs, the risk of visual identification is obvious, and numerous techniques are well established for de-identification, such as applying black rectangles over the eyes, etc.
In the case of high-resolution cross-sectional imaging of the entire head and neck, it has been suggested that a 3D volume or surface rendering of the pixel data may be sufficient to allow identification (or matching against a constrained subset of individuals) under some circumstances.
Application of this option may render the pixel data unusable for the purpose for which it has been collected, and hence its use may require a compromise between de-identification and utility based on obtaining appropriate ethical approval and informed consent. Consider, for example, the case of dental images.
Instances of various Standard and Standard Extended SOP Classes, including Images, Presentation States and other Composite SOP Instances, may contain identification information encoded as graphics, text annotations or overlays. This does not include information contained in Structured Report SOP Classes.
When the Clean Graphics Option is specified in addition to an Application Level Confidentiality Profile, any information encoded in graphics, text annotations or overlays corresponding to the Attribute information specified to be removed by the Profile and any other Options specified shall also be removed, as described in Table E.1-1.
This may require intervention of a human operator.
This capability is called out as a specific option, since it may be more practical to simply remove all such graphics, text annotations or overlays (as required by the profile without this option).
As with burned-in pixel data annotation, deciding whether or not text is identifying information or some other type of information may be non-trivial. It is not required that information specified to be retained in the non-pixel data by other Options (e.g., physical characteristics, dates or descriptors) also be retained in graphics, text annotations or overlays.
Instances of Structured Report SOP Classes may contain identifiable information in a Content Sequence (0040,A730) encoded in Content Items. Instances of other SOP Classes may contain structured content encoded in a similar manner in the Acquisition Context Sequence (0040,0555) or Specimen Preparation Sequence (0040,0610).
When the Clean Structured Content Option is specified in addition to an Application Level Confidentiality Profile, any information encoded in SR Content Items or Acquisition Context or Specimen Preparation Sequence Items corresponding to the Attribute information specified to be removed by the Profile and any other Options specified shall also be removed.
For example, the "observer" responsible for a diagnostic imaging report may be explicitly identified in Observation Content related Content Items in an SR.
A de-identifier that does not implement this option creates significant risk when attempting to de-identify a Structured Report unless it is only used to de-identify instances that are known to have no identifying information in the Content Sequence.
Even though many Attributes are defined in the DICOM Standard for specific purposes, such as to describe a Study or a Series, those that contain plain text over which an operator has control may contain unstructured information that includes identities.
When the Clean Descriptors Option is specified in addition to an Application Level Confidentiality Profile, any information that is embedded in text or string Attributes corresponding to the Attribute information specified to be removed by the Profile and any other Options specified shall also be removed, as described in Table E.1-1.
For example, an operator may include a person's name or a patient's demographics or physical characteristics in the Study Description (0008,1030), perhaps because their modality user interface does not provide other fields or because other systems do not display them. E.g., the description might contain "CT chest abdomen pelvis - 55F Dr. Smith".
One approach to cleaning such text strings without human intervention is to extract and retain only values known to be useful and safe and discard all others. For example, if the string "CT chest abdomen pelvis - 55F Dr. Smith" is found in Study Description (0008,1030), then it would be feasible to detect and retain "CT chest abdomen pelvis" and discard the remainder. In an international setting, this may require an extensive dictionary of words that are safe to retain, e.g., to detect "Buik" for abdomen in Dutch or "λεκάνη" for pelvis in Greek. Another possibility is to extract such information and attempt to code the information in other Attributes (if otherwise absent or empty) such as Anatomic Region Sequence (0008,2218). However, the possibility of string values being both identifying and descriptive in different uses needs to be considered, e.g., "Dr. Hand" or "M. Genou".
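The retain-only-known-safe approach might be sketched as follows. This is one possible heuristic, not a prescribed method: it keeps only the leading run of tokens found in a vocabulary and stops at the first unknown token, so that "Hand" following "Dr." is not retained. The vocabulary `SAFE_TERMS` and the function `retain_safe_prefix` are hypothetical, and a real vocabulary would be far larger.

```python
# Hypothetical vocabulary of terms judged safe to retain (lower case).
SAFE_TERMS = {"ct", "mr", "chest", "abdomen", "pelvis", "hand", "genou", "buik", "λεκάνη"}


def retain_safe_prefix(description, safe_terms=SAFE_TERMS):
    """Keep the leading tokens that are in the vocabulary; discard the rest."""
    kept = []
    for token in description.split():
        word = token.strip(".,;:()").lower()
        if word not in safe_terms:
            break  # first unknown token: discard it and everything after it
        kept.append(token)
    return " ".join(kept)


cleaned = retain_safe_prefix("CT chest abdomen pelvis - 55F Dr. Smith")
ambiguous = retain_safe_prefix("Dr. Hand")
```

Stopping at the first unknown token is deliberately conservative, but it does not resolve every ambiguity: a description that begins with a surname that is also an anatomical term would still be retained.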
Table E.1-1 calls out specific Attributes known to be at risk, but an implementer may want to consider any Attribute that could potentially contain character data, though this Option does not require that this be done. For example, all SH, LO, ST, LT and UT Value Representations could perhaps be misused. Code strings, CS, are not generally at risk, but a check against known Defined Terms and Enumerated Values could be performed. Though extremely unusual, it is conceivable that even a DS or IS string could be misused, and a check could be made that only legal numeric characters were used. Any PN Attribute is obviously at risk. The OB VR is discussed in the Retain Safe Private Option.
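The optional checks on DS, IS and CS values could look like the sketch below. It assumes the usual encoding limits for these Value Representations (DS up to 16 bytes per value, IS up to 12, backslash as the value delimiter, space padding) and treats an empty value as harmless. The function names are hypothetical, and the set of permitted terms for a given CS Attribute has to come from the relevant Attribute definition.

```python
import re

_DS = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")
_IS = re.compile(r"[+-]?\d+")
_RULES = {"DS": (16, _DS), "IS": (12, _IS)}


def is_legal_numeric_string(vr, value):
    """True if every value of a DS or IS string is a well-formed number."""
    limit, pattern = _RULES[vr]
    for item in value.split("\\"):
        text = item.strip(" ")
        if text == "":
            continue  # empty value carries no text
        if len(item) > limit or not pattern.fullmatch(text):
            return False
    return True


def is_known_code_string(value, allowed_terms):
    """True if every value of a CS string is one of the expected terms."""
    return all(item.strip(" ") in allowed_terms for item in value.split("\\"))
```

A value that fails either check is not necessarily identifying, but it is a candidate for removal or manual review.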
This Option specifies what needs to be removed, not what needs to be retained. Depending on the application, it may be desirable to retain some information, such as technique description, but discard other information, such as diagnosis, for example because it may bias the interpretation in a clinical trial. For example, one approach is to remove all description and comment attributes except Series Description (0008,103E), since this Attribute rarely contains identifying or diagnosis information yet is typically a reliable source of useful information about the acquisition technique populated automatically from modality device protocols, though it still could be cleaned as described in Note 2.
It should be recognized that if any descriptor contains information about a particularly unusual procedure or condition, then in conjunction with other demographic information it might reduce the number of possible individuals that could be the imaging subject. However, this is to some extent true also if the condition or other unusual physical features are obvious from visual examination of the images themselves. E.g., how many conjoined twins born in a particular month in Philadelphia might there be?
The manner of cleaning shall be described in the Conformance Statement.
Dates and times are recognized as having a potential for leakage of identity because they constrain the number of possible individuals that could be the imaging subject, though only if there is access to other information about the individuals concerned to match it against.
However, there are applications that require dates and times to be present to be able to fulfill the objective. This is particularly true in therapeutic clinical trials in which the objective is to measure change in an outcome measure over time. Further, it is often necessary to correlate information from images with information from other sources, such as clinical and laboratory data, and dates and times need to be consistent.
Two options are specified to address these requirements:
Retain Longitudinal Temporal Information With Full Dates Option
Retain Longitudinal Temporal Information With Modified Dates Option
When the Retain Longitudinal Temporal Information With Full Dates Option is specified in addition to an Application Level Confidentiality Profile, any dates and times present in the Attributes shall be retained, as described in Table E.1-1. The Attribute Longitudinal Temporal Information Modified (0028,0303) shall be added to the Dataset with a value of "UNMODIFIED".
When the Retain Longitudinal Temporal Information With Modified Dates Option is specified in addition to an Application Level Confidentiality Profile, any dates and times present in the Attributes listed in Table E.1-1 shall be modified. The modification of the dates and times shall be performed in a manner that:
aggregates or transforms dates so as to reduce the possibility of matching for re-identification
preserves the gross longitudinal temporal relationships between images obtained on different dates to the extent necessary for the application
preserves the fine temporal relationships between images and real-world events to the extent necessary for analysis of the images for the application
The Attribute Longitudinal Temporal Information Modified (0028,0303) shall be added to the Dataset with a value of "MODIFIED".
Aggregation of dates may be performed by various means such as setting all dates to the first day of the month, all months to the first month of the year, etc., depending on the precision required for the application.
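Aggregation of this kind can be sketched for a date string in the eight-character YYYYMMDD form. The function name `aggregate_date` and its `precision` parameter are hypothetical; which precision is acceptable depends on the application.

```python
def aggregate_date(da, precision="month"):
    """Coarsen a YYYYMMDD date to the first day of its month or year."""
    if len(da) != 8 or not da.isdigit():
        raise ValueError("expected a date of the form YYYYMMDD")
    if precision == "month":
        return da[:6] + "01"
    if precision == "year":
        return da[:4] + "0101"
    raise ValueError("precision must be 'month' or 'year'")


by_month = aggregate_date("20240317")
by_year = aggregate_date("20240317", precision="year")
```

Coarser aggregation reduces the possibility of matching but also discards more of the longitudinal relationship between studies.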
It is possible to modify all dates and times to dummy values by shifting them relative to an arbitrary epoch, and hence retain the precise longitudinal temporal relationships amongst a set of studies, when either de-identification of the entire set is performed at the same time, or some sort of mapping or database is kept to repeat this process on separate occasions.
Transformation of dates and times should be considered together, in order to address studies that span midnight.
Any transformation of times should be performed in such a manner as to not disrupt computations needed for analysis, such as comparison of start of injection time to the acquisition time for PET SUV, or extraction of time-intensity values from dynamic contrast enhanced studies.
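Shifting by a fixed offset, with date and time handled together, might be sketched as follows. It assumes times carry at least HHMMSS with an optional fractional part, and that the same offset is applied to every date and time of a subject. The function `shift_da_tm` and the offset value are hypothetical.

```python
from datetime import datetime, timedelta


def shift_da_tm(da, tm, offset):
    """Shift a (YYYYMMDD, HHMMSS[.ffffff]) pair by `offset` as one timestamp."""
    fractional = "." in tm
    fmt = "%Y%m%d%H%M%S.%f" if fractional else "%Y%m%d%H%M%S"
    shifted = datetime.strptime(da + tm, fmt) + offset
    new_tm = shifted.strftime("%H%M%S.%f" if fractional else "%H%M%S")
    return shifted.strftime("%Y%m%d"), new_tm


# Hypothetical per-subject offset; injection and acquisition straddle midnight.
offset = timedelta(days=-1000)
injection = shift_da_tm("20240131", "235900", offset)
acquisition = shift_da_tm("20240201", "000400", offset)
```

Because the date and time are combined before the offset is applied, the interval between injection and acquisition is unchanged even though the two events fall on different dates. Shifting the date and the time independently would not guarantee this.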
The manner of date modification shall be described in the Conformance Statement.
Physical characteristics of the patient, which are descriptive rather than identifying information per se, are recognized as having a potential for leakage of identity because they constrain the number of possible individuals that could be the imaging subject, though only if there is access to other information about the individuals concerned to match it against.
However, there are applications that require such physical characteristics in order to perform the computations necessary to analyze the images to fulfill the objective. One such class of applications is those that are related to metabolic measures, such as computation of PET Standard Uptake Values (SUV) or DEXA or MRI measures of body composition, which are based on body weight, body surface area or lean body mass.
When the Retain Patient Characteristics Option is specified in addition to an Application Level Confidentiality Profile, information about age, sex, height and weight and other characteristics present in the Attributes shall be retained, as described in Table E.1-1.
The manner of cleaning of retained attributes shall be described in the Conformance Statement.
Information about the identity of the device that was used to perform the acquisition is recognized as having a potential for leakage of identity because it may constrain the number of possible individuals that could be the imaging subject, though only if there is access to other information about the individuals concerned to match it against.
However, there are applications that require such device information to perform the analysis or interpretation. The type of correction for spatial or other inhomogeneity may require knowledge of the specific device serial number. Confirmation that specific devices have been previously qualified (e.g., with phantoms) may be required. Further, there may be a need to maintain a record of the device used for regulatory or registry purposes, yet the acquisition site may not maintain an adequate electronic audit trail.
When the Retain Device Identity Option is specified in addition to an Application Level Confidentiality Profile, information about the identity of the device in the Attributes shall be retained, as described in Table E.1-1.
Though individuals do not have unique identifiers themselves, studies, series, instances and other entities in the DICOM model are assigned globally unique UIDs. Whilst these UIDs cannot be mapped directly to an individual out of context, given access to the original images, or to a database of the original images containing the UIDs, it would be possible to recover the individual's identity.
However, there are applications that require the ability to maintain an audit trail back to the original images and, though there are other mechanisms, they may not scale well or be reliably implemented. This Option is provided for use when it is judged that the risk of gaining access to the original information via the UIDs is small relative to the benefit of retaining them.
When the Retain UIDs Option is specified in addition to an Application Level Confidentiality Profile, UIDs shall be retained, as described in Table E.1-1.
A UID of a DICOM entity is not the same as a unique identifier of an individual, such as would be proscribed by some privacy regulations.
UIDs are generated using a hierarchical scheme of "roots", which may be traceable by a knowledgeable person back to the original assignee of the root, typically the device manufacturer, but sometimes the organization using the device.
When evaluating the risk of matching UIDs with the original images or PACS database, one should consider that even if the UIDs are changed, the pixel data itself presents a similar risk. Specifically, the pixel data of the de-identified image can be matched against the pixel data of the original image. Such matching can be greatly accelerated by comparing pre-computed hash values of the pixel data. Removal of burned-in identification may change the pixel data but then matching against a sub-region of the pixel data is almost certainly possible (e.g., the central region of an image). Even addition of noise to an image is not sufficient to prevent re-identification since statistical matching techniques can be used. Ultimately, if any useable pixel data is retained during de-identification, then re-identification is nearly always possible if one has access to the original images. Ergo, replacement of UIDs should not give rise to a false confidence that the images have been more thoroughly de-identified than if the UIDs are retained.
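The point about pre-computed hash values can be made concrete with a small sketch. It assumes the pixel data bytes are unchanged by de-identification and uses SHA-256 purely as an example digest. The UIDs and the dictionary layout are hypothetical.

```python
import hashlib


def pixel_digest(pixel_bytes):
    """Digest of the raw pixel data bytes, independent of any header content."""
    return hashlib.sha256(pixel_bytes).hexdigest()


# Hypothetical original and de-identified instances with identical pixel data.
original = {"sop_instance_uid": "1.2.3.4", "pixel_data": bytes(range(16))}
deidentified = {"sop_instance_uid": "2.25.999", "pixel_data": bytes(range(16))}

# An index built in advance from the original images.
index = {pixel_digest(original["pixel_data"]): original["sop_instance_uid"]}

# The replaced UID does not prevent the lookup from succeeding.
match = index.get(pixel_digest(deidentified["pixel_data"]))
```

An exact digest only matches unmodified pixel data. As described above, sub-region or statistical matching would be needed once the pixel data has been altered.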
Regardless of this option, implementers should take care not to remove UIDs that are structural and defined by the standard as opposed to those that are instance-related. E.g., one would never remove or replace the SOP Class UID for de-identification purposes.
The Implementation Class UID (0002,0012) is not included in the list of UID attributes to be retained, since it is part of the File Meta Information (see PS3.10), which is entirely replaced whenever a file is stored or modified during de-identification. See Section E.1.1.
By definition, Private Attributes contain proprietary information, in many cases the nature of which is known only to the vendor and not publicly documented.
However, some Private Attributes may be necessary for the desired application. For example, specific technique information such as CT helical span pitch, or pixel value transformation, such as PET SUV rescale factors, may only be available in Private Attributes since the information is either not defined in Standard Attributes, or was added to the DICOM Standard after the acquisition device was manufactured.
When the Retain Safe Private Option is specified in addition to an Application Level Confidentiality Profile, Private Attributes that are known by the de-identifier to be safe from identity leakage shall be retained, together with the Private Creator IDs that are required to fully define the retained Private Attributes; all other Private Attributes shall be removed.
When this Option is not specified, all Private Attributes shall be removed, as described in Table E.1-1.
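Retaining known-safe Private Attributes together with their Private Creator IDs might be sketched as below. The sketch assumes a flat dataset held as a dictionary keyed by (group, element), and relies on the private block convention in which the Private Creator at (gggg,00xx) reserves elements (gggg,xx00) to (gggg,xxFF). The name `filter_private`, the contents of `SAFE_PRIVATE` and the sample dataset are hypothetical. Standard Attributes are passed through untouched because their handling belongs to the Profile, not to this sketch.

```python
# Hypothetical safe list: (group, low byte of element, Private Creator).
SAFE_PRIVATE = {
    (0x7053, 0x00, "Philips PET Private Group"),
    (0x7053, 0x09, "Philips PET Private Group"),
    (0x01F1, 0x26, "ELSCINT1"),
}


def filter_private(dataset, safe=SAFE_PRIVATE):
    """Drop Private Attributes unless they match the safe list."""
    kept = {}
    for (group, element), value in dataset.items():
        if group % 2 == 0:
            kept[(group, element)] = value  # standard Attribute: not handled here
            continue
        block = element >> 8
        if block < 0x10:
            continue  # Private Creator IDs are re-added only when needed
        creator = str(dataset.get((group, block), "")).strip()
        if (group, element & 0xFF, creator) in safe:
            kept[(group, block)] = dataset[(group, block)]
            kept[(group, element)] = value
    return dict(sorted(kept.items()))


sample = {
    (0x0008, 0x0060): "PT",
    (0x0009, 0x0010): "ACME",                       # hypothetical creator, not on the list
    (0x0009, 0x1001): "Smith",
    (0x7053, 0x0010): "Philips PET Private Group",
    (0x7053, 0x1000): "0.000123",
    (0x7053, 0x1005): "unlisted",
}
result = filter_private(sample)
```

Matching on the Private Creator as well as the element number matters, because the same private group and element can mean different things under different creators.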
A sample list of Private Attributes thought to be safe is provided here. Vendors do not guarantee them to be safe, and do not commit to sending them in any particular software version (including future products).
Table E.3.10-1. Safe Private Attributes
Data Element | Private Creator | VR | VM | Meaning |
---|---|---|---|---|
(7053,xx00) | Philips PET Private Group | DS | 1 | SUV Factor - Multiplying stored pixel values by Rescale Slope then this factor results in SUVbw in g/l |
(7053,xx09) | Philips PET Private Group | DS | 1 | Activity Concentration Factor - Multiplying stored pixel values by Rescale Slope then this factor results in MBq/ml. |
(00E1,xx21) | ELSCINT1 | DS | 1 | DLP |
(01E1,xx26) | ELSCINT1 | CS | 1 | Phantom Type |
(01E1,xx50) | ELSCINT1 | DS | 1 | Acquisition Duration |
(01F1,xx01) | ELSCINT1 | CS | 1 | Acquisition Type |
(01F1,xx07) | ELSCINT1 | DS | 1 | Table Velocity |
(01F1,xx26) | ELSCINT1 | DS | 1 | Pitch |
(01F1,xx27) | ELSCINT1 | DS | 1 | Rotation Time |
(0019,xx23) | GEMS_ACQU_01 | DS | 1 | Table Speed [mm/rotation] |
(0019,xx24) | GEMS_ACQU_01 | DS | 1 | Mid Scan Time [sec] |
(0019,xx27) | GEMS_ACQU_01 | DS | 1 | Rotation Speed (Gantry Period) |
(0019,xx9E) | GEMS_ACQU_01 | LO | 1 | Internal Pulse Sequence Name |
(0043,xx27) | GEMS_PARM_01 | SH | 1 | Scan Pitch Ratio in the form "n.nnn:1" |
(0045,xx01) | GEMS_HELIOS_01 | SS | 1 | Number of Macro Rows in Detector |
(0045,xx02) | GEMS_HELIOS_01 | FL | 1 | Macro width at ISO Center |
(0903,xx10) | GEIIS PACS | US | 1 | Reject Image Flag |
(0903,xx11) | GEIIS PACS | US | 1 | Significant Flag |
(0903,xx12) | GEIIS PACS | US | 1 | Confidential Flag |
(2001,xx03) | Philips Imaging DD 001 | FL | 1 | Diffusion B-Factor |
(2001,xx04) | Philips Imaging DD 001 | CS | 1 | Diffusion Direction |
(0019,xx0C) | SIEMENS MR HEADER | IS | 1 | B Value |
(0019,xx0D) | SIEMENS MR HEADER | CS | 1 | Diffusion Directionality |
(0019,xx0E) | SIEMENS MR HEADER | FD | 3 | Diffusion Gradient Direction |
(0019,xx27) | SIEMENS MR HEADER | FD | 6 | B Matrix |
(0043,xx39) | GEMS_PARM_01 | IS | 4 | 1st value is B Value |
(0043,xx6F) | GEMS_PARM_01 | DS | 3-4 | Scanner Table Entry + Gradient Coil Selected |
(0025,xx07) | GEMS_SERS_01 | SL | 1 | Images in Series |
(7E01,xx01) | HOLOGIC, Inc. | LO | 1 | Codec Version |
(7E01,xx02) | HOLOGIC, Inc. | SH | 1 | Codec Content Type |
(7E01,xx10) | HOLOGIC, Inc. | SQ | 1 | High Resolution Data Sequence |
(7E01,xx11) | HOLOGIC, Inc. | SQ | 1 | Low Resolution Data Sequence |
(7E01,xx12) | HOLOGIC, Inc. | OB | 1 | Codec Content |
(0099,xx01) | NQHeader | UI | 1 | Version |
(0099,xx02) | NQHeader | UI | 1 | Analyzed Series UID |
(0099,xx04) | NQHeader | SS | 1 | Return Code |
(0099,xx05) | NQHeader | LT | 1 | Return Message |
(0099,xx10) | NQHeader | FL | 1 | MI |
(0099,xx20) | NQHeader | SH | 1 | Units |
(0099,xx21) | NQHeader | FL | 1 | ICV |
(0199,xx01) | NQLeft | FL | 1 | Left Cortical White Matter |
(0199,xx02) | NQLeft | FL | 1 | Left Cortical Gray Matter |
(0199,xx03) | NQLeft | FL | 1 | Left 3rd Ventricle |
(0199,xx04) | NQLeft | FL | 1 | Left 4th Ventricle |
(0199,xx05) | NQLeft | FL | 1 | Left 5th Ventricle |
(0199,xx06) | NQLeft | FL | 1 | Left Lateral Ventricle |
(0199,xx07) | NQLeft | FL | 1 | Left Inferior Lateral Ventricle |
(0199,xx08) | NQLeft | FL | 1 | Left Inferior CSF |
(0199,xx09) | NQLeft | FL | 1 | Left Cerebellar White Matter |
(0199,xx0a) | NQLeft | FL | 1 | Left Cerebellar Gray Matter |
(0199,xx0b) | NQLeft | FL | 1 | Left Hippocampus |
(0199,xx0c) | NQLeft | FL | 1 | Left Amygdala |
(0199,xx0d) | NQLeft | FL | 1 | Left Thalamus |
(0199,xx0e) | NQLeft | FL | 1 | Left Caudate |
(0199,xx0f) | NQLeft | FL | 1 | Left Putamen |
(0199,xx10) | NQLeft | FL | 1 | Left Pallidum |
(0199,xx11) | NQLeft | FL | 1 | Left Ventral Diencephalon |
(0199,xx12) | NQLeft | FL | 1 | Left Nucleus Accumbens |
(0199,xx13) | NQLeft | FL | 1 | Left Brain Stem |
(0199,xx14) | NQLeft | FL | 1 | Left Exterior CSF |
(0199,xx15) | NQLeft | FL | 1 | Left WM Hypo |
(0199,xx16) | NQLeft | FL | 1 | Left Other |
(0299,xx01) | NQRight | FL | 1 | Right Cortical White Matter |
(0299,xx02) | NQRight | FL | 1 | Right Cortical Gray Matter |
(0299,xx03) | NQRight | FL | 1 | Right 3rd Ventricle |
(0299,xx04) | NQRight | FL | 1 | Right 4th Ventricle |
(0299,xx05) | NQRight | FL | 1 | Right 5th Ventricle |
(0299,xx06) | NQRight | FL | 1 | Right Lateral Ventricle |
(0299,xx07) | NQRight | FL | 1 | Right Inferior Lateral Ventricle |
(0299,xx08) | NQRight | FL | 1 | Right Inferior CSF |
(0299,xx09) | NQRight | FL | 1 | Right Cerebellar White Matter |
(0299,xx0a) | NQRight | FL | 1 | Right Cerebellar Gray Matter |
(0299,xx0b) | NQRight | FL | 1 | Right Hippocampus |
(0299,xx0c) | NQRight | FL | 1 | Right Amygdala |
(0299,xx0d) | NQRight | FL | 1 | Right Thalamus |
(0299,xx0e) | NQRight | FL | 1 | Right Caudate |
(0299,xx0f) | NQRight | FL | 1 | Right Putamen |
(0299,xx10) | NQRight | FL | 1 | Right Pallidum |
(0299,xx11) | NQRight | FL | 1 | Right Ventral Diencephalon |
(0299,xx12) | NQRight | FL | 1 | Right Nucleus Accumbens |
(0299,xx13) | NQRight | FL | 1 | Right Brain Stem |
(0299,xx14) | NQRight | FL | 1 | Right Exterior CSF |
(0299,xx15) | NQRight | FL | 1 | Right WM Hypo |
(0299,xx16) | NQRight | FL | 1 | Right Other |
One approach to retaining Private Attributes safely, either when the VR is encoded explicitly or known from a data dictionary (such as may be derived from published DICOM Conformance Statements or previously encountered instances, perhaps by adaptively extending the data dictionary as new explicit VR instances are received), is to retain those Attributes that are numeric only. For example, one might retain US, SS, UL, SL, FL and FD binary values, and IS and DS string values that contain only valid numeric characters. One might assume that other string Value Representations are unsafe in the absence of definite confirmation from the vendor to the contrary; code strings (CS) may be an exception. Bulk binary data in OB Value Representations is particularly unsafe, and may often contain entire proprietary format headers in binary or text or XML form that include the patient's name and other identifying information.
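The numeric-only heuristic might be sketched as a simple triage on the VR. The sketch assumes the VR is already known, either explicitly or from a dictionary, and that string values are passed as text. The function name `triage_private_value` and the three outcome labels are hypothetical; treating CS as "review" rather than "retain" is one possible reading of the exception mentioned above.

```python
BINARY_NUMERIC_VRS = {"US", "SS", "UL", "SL", "FL", "FD"}
# Characters permitted in numeric strings, plus space padding and the value delimiter.
NUMERIC_STRING_CHARS = {"IS": "0123456789+- \\", "DS": "0123456789+-eE. \\"}


def triage_private_value(vr, value):
    """Classify a Private Attribute value as 'retain', 'review' or 'remove'."""
    if vr in BINARY_NUMERIC_VRS:
        return "retain"
    if vr in NUMERIC_STRING_CHARS:
        allowed = NUMERIC_STRING_CHARS[vr]
        return "retain" if all(ch in allowed for ch in value) else "remove"
    if vr == "CS":
        return "review"  # possible exception; needs a decision per Attribute
    return "remove"  # other strings, OB and anything unknown
```

Defaulting to "remove" for every unrecognized VR keeps the sketch on the conservative side, at the cost of discarding some Private Attributes that may in fact be harmless.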
The safe private attributes that are retained shall be described in the Conformance Statement.