DICOM PS3.22 2024c - Real-Time Communication

DICOM Standards Committee

A DICOM® publication

Table of Contents

Notice and Disclaimer
1. Scope and Field of Application
2. Normative and Informative References
3. Definitions
4. Symbols and Abbreviations
5. Conventions
6. Data Communication Requirements
6.1. Interaction
6.2. Transport
6.2.1. RTP Header
6.2.2. RTP Payload
7. DICOM Real-Time Format
7.1. RTV Meta Information
7.2. Standard SOP Classes
8. Security Considerations
9. Conformance

List of Figures

6-1a. DICOM Communication Model for Real-Time Communication
6-1. Real World diagram of DICOM-RTV
6-2. Interaction Diagram
7-1. DICOM Dataset Encapsulation Within RTP

List of Tables

7.1-1. RTV Meta Information
7.2-1. Standard SOP Classes

Notice and Disclaimer

The information in this publication was considered technically sound by the consensus of persons engaged in the development and approval of the document at the time it was developed. Consensus does not necessarily mean that there is unanimous agreement among every person participating in the development of this document.

NEMA standards and guideline publications, of which the document contained herein is one, are developed through a voluntary consensus standards development process. This process brings together volunteers and/or seeks out the views of persons who have an interest in the topic covered by this publication. While NEMA administers the process and establishes rules to promote fairness in the development of consensus, it does not write the document and it does not independently test, evaluate, or verify the accuracy or completeness of any information or the soundness of any judgments contained in its standards and guideline publications.

NEMA disclaims liability for any personal injury, property, or other damages of any nature whatsoever, whether special, indirect, consequential, or compensatory, directly or indirectly resulting from the publication, use of, application, or reliance on this document. NEMA disclaims and makes no guaranty or warranty, expressed or implied, as to the accuracy or completeness of any information published herein, and disclaims and makes no warranty that the information in this document will fulfill any of your particular purposes or needs. NEMA does not undertake to guarantee the performance of any individual manufacturer or seller's products or services by virtue of this standard or guide.

In publishing and making this document available, NEMA is not undertaking to render professional or other services for or on behalf of any person or entity, nor is NEMA undertaking to perform any duty owed by any person or entity to someone else. Anyone using this document should rely on his or her own independent judgment or, as appropriate, seek the advice of a competent professional in determining the exercise of reasonable care in any given circumstances. Information and other standards on the topic covered by this publication may be available from other sources, which the user may wish to consult for additional views or information not covered by this publication.

NEMA has no power, nor does it undertake to police or enforce compliance with the contents of this document. NEMA does not certify, test, or inspect products, designs, or installations for safety or health purposes. Any certification or other statement of compliance with any health or safety-related information in this document shall not be attributable to NEMA and is solely the responsibility of the certifier or maker of the statement.


This DICOM Standard was developed according to the procedures of the DICOM Standards Committee.

The DICOM Standard is structured as a multi-part document using the guidelines established in [ISO/IEC Directives, Part 2].

1 Scope and Field of Application

This Part of the DICOM Standard specifies a service based on SMPTE ST 2110-10, relying on RTP, for the real-time transport of DICOM metadata. It provides a mechanism for the transport of DICOM metadata associated with a video or an audio Flow based on SMPTE ST 2110-20 and SMPTE ST 2110-30, respectively.

2 Normative and Informative References

The following standards contain provisions that, through reference in this text, constitute provisions of this Standard. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this Standard are encouraged to investigate the possibilities of applying the most recent editions of the standards indicated below.

[ISO/IEC Directives, Part 2] ISO/IEC. 2016/05. 7.0. Rules for the structure and drafting of International Standards.

[EBU-SMPTE-VSF JT-NM Phase 2 Report] European Broadcasting Union (EBU), Society of Motion Picture and Television Engineers (SMPTE), and Video Services Forum (VSF). 2015. v1.0. Joint Task Force on Networked Media (JT-NM) Phase 2 Report - Reference Architecture.

[RFC 3550] IETF. 2003/07. RTP: A Transport Protocol for Real-Time Applications.

[RFC 5285] IETF. 2008/07. A General Mechanism for RTP Header Extensions.

[SMPTE ST 2110-10] Society of Motion Picture and Television Engineers (SMPTE). 2017. Professional Media over IP Networks: System Timing and Definitions.

[SMPTE ST 2110-20] Society of Motion Picture and Television Engineers (SMPTE). 2017. Professional Media over IP Networks: Uncompressed Active Video.

[SMPTE ST 2110-30] Society of Motion Picture and Television Engineers (SMPTE). 2017. Professional Media over IP Networks: PCM Digital Audio.

3 Definitions

For the purposes of this Standard the following definitions apply.

3.1 Reference Architecture Definitions:

This Part of the Standard makes use of the following terms defined in [EBU-SMPTE-VSF JT-NM Phase 2 Report]:

Essence

Video, audio or data type of source.

Flow

A sequence of Grains from a Source; a concrete representation of content emanating from the Source.

Grain

Represents an element of Essence or other data associated with a specific time, such as a frame, or a group of consecutive audio samples, or captions.

Rendition

A collection of time-synchronized Flows intended for simultaneous presentation, providing a complete experience of a Source Group.

Source

An abstract concept that represents the primary origin of a Flow or set of Flows.

3.2 DICOM Real-Time Video Definitions:

DICOM Real-Time Video

DICOM-RTV encompasses the DICOM-RTV Service, transport of related multimedia bulk data and the Real-Time IODs to which it may be applied.

DICOM-RTV Service

Real-Time transport of metadata that characterize multimedia bulk data.

4 Symbols and Abbreviations

The following symbols and abbreviations are used in this Part of the Standard.

AVP          Audio Video Profile

DICOM-RTV    DICOM Real-Time Video

NMOS         Networked Media Open Specifications

PTP          Precision Time Protocol

RTP          Real-Time Transport Protocol

SDP          Session Description Protocol

SMPTE        Society of Motion Picture and Television Engineers

5 Conventions

6 Data Communication Requirements

Figure 5-1 in PS3.1 presents the general communication model of the DICOM Standard, which spans both network (on-line) and storage media interchange (off-line) communications. Application Entities may utilize any of the following transport mechanisms:

  • the DICOM Message Service and Upper Layer Service, which provides independence from specific physical networking communication support and protocols such as TCP/IP,

  • the DICOM Web Service API and HTTP Service, which allows use of common hypertext and associated protocols for transport of DICOM services,

  • the Basic DICOM File Service, which provides access to Storage Media independently from specific physical media storage formats and file structures, or

  • DICOM Real-Time Communication, which provides real-time transport of DICOM metadata based on SMPTE and RTP.

This Part describes DICOM Real-Time Communication, which uses RTP as defined in [SMPTE ST 2110-10], as depicted in Figure 6-1a.

Figure 6-1a. DICOM Communication Model for Real-Time Communication

6.1 Interaction

As shown in Figure 6-1, a device can have multiple Sources, one for each Essence, which corresponds to the type of bulk data (video, audio or medical metadata). Each Source produces one or multiple Flows representing the same content in different formats (high definition, low definition, uncompressed, compressed with or without loss, …).

Several Sources may be grouped in a Source Group. A concrete experience of a Source Group is a Rendition, defined as a collection of time-synchronized Flows intended for simultaneous presentation (e.g., the audio channel of a surgical camera).

Figure 6-1. Real World diagram of DICOM-RTV

The DICOM Real-Time Video standard specifies the communication mechanism for metadata associated with real-time video and/or audio originating from a medical imaging device. The mechanism involves one Source and one Flow of "DICOM Video Metadata Essence" for each video Flow and one Source and one Flow of "DICOM Audio Metadata Essence" for each audio Flow. Optionally, there is one Source and one Flow for the "DICOM Rendition Metadata" associating multiple Flows produced by the same device.

The interaction shall be as shown in Figure 6-2.

Figure 6-2. Interaction Diagram

[SMPTE ST 2110-10] provides end-to-end network transport functions for applications transmitting real-time data. Content is transmitted in RTP sessions using RTP packets respecting [SMPTE ST 2110-10].

A device can provide and/or consume content. A device that provides content has one or more Sources that can be of different Essences (e.g., Video and Audio). A Source is the origin of one or more Flows. Multiple Flows coming from the same Source are representations of the same content in different resolutions and/or codings. This is a broadcast/multicast protocol, so a device provides content whether or not a consuming device is present. A device that consumes content can subscribe/unsubscribe to available Flows.
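As an illustration of the subscribe/unsubscribe step, the following minimal sketch joins and leaves an IPv4 multicast group using the standard socket API. It assumes the Flow is delivered over IPv4 multicast; the function names are hypothetical, and in practice the group address and port would be taken from the SDP object describing the Flow.

```python
import socket
import struct


def make_membership_request(group_ip: str, interface_ip: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq structure used to join or leave an IPv4 multicast group."""
    return struct.pack("4s4s", socket.inet_aton(group_ip), socket.inet_aton(interface_ip))


def subscribe(sock: socket.socket, group_ip: str) -> None:
    """Ask the network stack to start delivering packets sent to group_ip."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_membership_request(group_ip))


def unsubscribe(sock: socket.socket, group_ip: str) -> None:
    """Stop receiving packets sent to group_ip."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP,
                    make_membership_request(group_ip))
```

The sender is unaffected by either call: it keeps transmitting whether or not any receiver has joined the group.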

The context and content of a video and/or audio Flow is described by a DICOM Metadata Flow, which is associated with each Flow. However, the same DICOM Metadata Flow may be used to describe more than one Flow if their content is the same and their codings are close enough not to affect professional interpretation. A DICOM Rendition Metadata Flow may be used to associate multiple Flows provided by one device.

6.2 Transport

6.2.1 RTP Header

All Essences shall be transported with RTP according to [SMPTE ST 2110-10], which requires that each Flow is described by an SDP object which specifies its content as well as connection details enabling the receiver to join the session. In addition to mandatory information specified in [SMPTE ST 2110-10], for Audio and Video Essence, the SDP may also include the following information:

  • PTP Sync Timestamp

  • PTP Origin Timestamp

  • Source Identifier

  • Flow Identifier


This information is the best way to associate multiple Flows originating from the same device. The presence of such information in the SDP implies that it is contained in the RTP Extended Header present in the first IP packet of a Grain (video frame, audio sample, metadata set). It makes it possible to automatically associate and temporally synchronize two Flows based on their content.

By definition, all the Flows according to [SMPTE ST 2110-10] are synchronized by means of a common reference to the Universal Time, using PTP, with precision on the order of nanoseconds.
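One way a receiver might use the shared time reference is sketched below. It assumes that each received Grain has already been reduced to a pair of PTP Origin Timestamp (as an integer number of nanoseconds) and payload, and that a metadata Grain describes the video Grain carrying exactly the same Origin Timestamp. Both the representation and the exact-match rule are assumptions made for illustration, not requirements of this Part.

```python
from typing import Dict, List, Optional, Tuple

# (PTP Origin Timestamp in nanoseconds, payload); hypothetical representation
Grain = Tuple[int, bytes]


def pair_by_origin_timestamp(
    video: List[Grain], metadata: List[Grain]
) -> List[Tuple[bytes, Optional[bytes]]]:
    """Pair each video Grain with the metadata Grain that has the same Origin Timestamp.

    A video Grain without a matching metadata Grain is paired with None.
    """
    by_time: Dict[int, bytes] = {ts: payload for ts, payload in metadata}
    return [(payload, by_time.get(ts)) for ts, payload in video]
```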

The RTP Header, for video and audio Flows, shall follow [SMPTE ST 2110-20] and [SMPTE ST 2110-30], respectively.

The RTP Header, for DICOM Metadata Flows, shall follow [SMPTE ST 2110-10]. The clock rate shall be identical to the one defined in the referenced audio or video Flow. The following additional constraints apply:

extension (X) : 1 bit

Shall be set to 1.

payload type (PT)

The value of payload type is selected from the range 96-127. It is recommended to avoid the numbers frequently used for audio (97) and video (96) and to use, for example, 104 for DICOM Metadata Essence. The value shall be associated with the media type "application" and the subtype "dicom" in the SDP. For example, for DICOM Metadata on port 12345:

m=application 12345 RTP/AVP 104

a=rtpmap:104 dicom/90000
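The two SDP lines above can be generated programmatically. The following is a minimal sketch; the function name and its default values (payload type 104, clock rate 90000, taken from the example) are illustrative choices, and a complete SDP object needs the other fields required by [SMPTE ST 2110-10].

```python
def dicom_metadata_sdp(port: int, payload_type: int = 104, clock_rate: int = 90000) -> str:
    """Return the media and rtpmap lines for a DICOM Metadata Flow."""
    if not 96 <= payload_type <= 127:
        raise ValueError("payload type must be in the range 96-127")
    return (
        f"m=application {port} RTP/AVP {payload_type}\r\n"
        f"a=rtpmap:{payload_type} dicom/{clock_rate}\r\n"
    )
```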

For the DICOM Metadata Essence, the RTP Header Extension defined by NMOS shall be present, including the following information:

  • PTP Sync Timestamp

  • PTP Origin Timestamp

  • Source Identifier

  • Flow Identifier

The "defined by profile" part of the RTP Header Extension shall be set to 0xBEDE identifying that the one-byte header extension form is used, as specified in [RFC 5285].
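The header constraints above can be sketched as follows. The code builds the 12-byte RTP fixed header of [RFC 3550] with the extension bit set, followed by a header extension in the one-byte form of [RFC 5285]. The function names are hypothetical, and the mapping of extension element IDs to the NMOS fields is an assumption: in practice the IDs are declared in the SDP.

```python
import struct
from typing import Dict

RTP_VERSION = 2
ONE_BYTE_EXT_PROFILE = 0xBEDE  # "defined by profile" value for the one-byte form


def build_one_byte_extension(elements: Dict[int, bytes]) -> bytes:
    """Encode {element ID: data} as a one-byte-form RTP header extension."""
    body = b""
    for ext_id, data in elements.items():
        if not 1 <= ext_id <= 14:
            raise ValueError("the one-byte form allows IDs 1-14 only")
        if not 1 <= len(data) <= 16:
            raise ValueError("the one-byte form allows 1-16 data bytes per element")
        # 4-bit ID, then 4-bit (length - 1), then the data bytes
        body += bytes([(ext_id << 4) | (len(data) - 1)]) + data
    body += b"\x00" * (-len(body) % 4)  # pad to a 32-bit boundary
    # The length field counts 32-bit words after the 4-byte extension header.
    return struct.pack("!HH", ONE_BYTE_EXT_PROFILE, len(body) // 4) + body


def build_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                     extension: bytes, marker: bool = False) -> bytes:
    """Build the RTP fixed header with X=1, followed by the given extension."""
    if not 96 <= payload_type <= 127:
        raise ValueError("payload type must be in the range 96-127")
    byte0 = (RTP_VERSION << 6) | (1 << 4)  # V=2, P=0, X=1, CC=0
    byte1 = (int(marker) << 7) | payload_type
    fixed = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                        timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return fixed + extension
```

For example, `build_one_byte_extension({1: flow_uuid.bytes})` would carry a 16-byte Flow Identifier under the (assumed) element ID 1.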

6.2.2 RTP Payload

The RTP Payload for video and audio Flows shall follow [SMPTE ST 2110-20] and [SMPTE ST 2110-30], respectively.

The RTP Payload for DICOM Metadata Flows (audio, video and rendition) shall follow [SMPTE ST 2110-10].

The RTP Payload for DICOM Metadata Flows consists of a DICOM dataset compliant with real-time communication.

The DICOM dataset is made of three parts:

  • the RTV Meta Information part. This part shall be present in each Grain.

  • the dynamic part containing information that varies over time (e.g., Origin Timestamp of the frame, Position of a probe, circle defining the eye). When it exists, this part shall be present in each Grain. The transmission rate of the dynamic part shall be identical to the rate of the associated Flow (e.g., one dataset per frame). This part is currently not applicable to DICOM Rendition Metadata.

  • the static part containing information that does not vary over time (e.g., Patient Name, Modality, …). This part will not be present in every Grain but shall be present in at least one Grain per second.


The receiver cannot process information received from a sender until it receives DICOM Metadata including the static part, so it has to be sent at least every second in order to avoid a longer wait by the receiver when "connected" to a sender.
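A sender could decide Grain by Grain whether to include the static part with a small helper such as the one sketched below. The class name and the 0.5 second default interval are illustrative assumptions; any interval comfortably under one second satisfies the rule that the static part appears in at least one Grain per second.

```python
from typing import Optional


class StaticPartScheduler:
    """Decide, Grain by Grain, whether the static part should be included."""

    def __init__(self, max_interval_s: float = 0.5) -> None:
        self.max_interval_s = max_interval_s
        self._last_static_s: Optional[float] = None  # when the static part was last sent

    def include_static(self, now_s: float) -> bool:
        """Return True if the Grain sent at time now_s should carry the static part."""
        due = (self._last_static_s is None
               or now_s - self._last_static_s >= self.max_interval_s)
        if due:
            self._last_static_s = now_s
        return due
```

The first Grain always carries the static part, so a receiver that is already listening when the sender starts can begin processing immediately.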

The transmission rate of DICOM audio Flows will typically be in the range of 48 kHz. The transmission rate of DICOM video Flows will typically be in the range of 60 Hz. The transmission rate of the DICOM Rendition Metadata Flow shall be at least 1 Hz. It may be appropriate to use a higher frequency if there is a need for tight synchronization of associated Flows from a device (e.g., two videos of a stereo pair).

7 DICOM Real-Time Format

The DICOM Real-Time Format provides a means to encapsulate in an RTP session the Data Set representing a SOP Instance.

Figure 7-1 illustrates the encapsulation of a DICOM audio or video dataset in RTP. The byte stream of the Data Set is placed into the RTP Payload after the DICOM-RTV Meta Information. Each RTP session corresponds to a single SOP Instance.

Figure 7-1. DICOM Dataset Encapsulation Within RTP

7.1 RTV Meta Information

The RTV Meta Information includes identifying information on the encapsulated DICOM Data Set.


The group number of the RTV Meta Information attributes (0002,xxxx) is lower than the one of other attributes in order to place the RTV Meta Information at the beginning of the payload, as is done in PS3.10.

Table 7.1-1. RTV Meta Information

| Attribute Name | Tag | Attribute Description |
| --- | --- | --- |
| Header Preamble | No Tag or Length Fields | A fixed 128-byte field available for Application Profile or implementation specified use. If not used by an Application Profile or a specific implementation, all bytes shall be set to 00H. Receivers shall not rely on the content of this Preamble to determine that this payload is or is not a DICOM payload. |
| DICOM Prefix | No Tag or Length Fields | Four bytes containing the character string "DICM". This Prefix is intended to be used to recognize that this payload is or is not a DICOM payload. |
| File Meta Information Group Length | | Number of bytes following this RTV Meta Element (end of the Value field) up to and including the last RTV Meta Element of the Group 2 RTV Meta Information. |
| Transfer Syntax UID | | Uniquely identifies the Transfer Syntax used to encode the referred bulk-data Flow. This Transfer Syntax does not apply to the RTV Metadata, which is encoded using the Explicit VR Little Endian Transfer Syntax. |
| RTV Meta Information Version | | This is a two-byte field where each bit identifies a version of this RTV Meta Information header. In version 1 the first byte value is 00H and the second byte value is 01H. |
| RTV Communication SOP Class UID | | Uniquely identifies the SOP Class associated with the Data Set. SOP Class UIDs allowed for RTV Communication are specified in Section 7.2 Standard SOP Classes. |
| RTV Communication SOP Instance UID | | Uniquely identifies the SOP Instance associated with the Data Set placed in the RTP Payload and following the RTV Meta Information. |
| RTV Source Identifier | | The UUID of the RTP source that sends the RTV Metadata Flow. |
| RTV Flow Identifier | | The UUID of the RTV Metadata Flow. |
| RTV Flow RTP Sampling Rate | | The rate of the dynamic part of the RTV Metadata Flow, the same as the bulk-data Flow rate. Required if RTV Metadata Flow includes a dynamic part. |
| RTV Flow Actual Frame Duration | | Duration of image capture in milliseconds. |
| Private Information Creator UID | (0002,0100) | The UID of the creator of the private information (0002,0102). |
| Private Information | (0002,0102) | Contains Private Information placed in the RTV Meta Information. The creator shall be identified in (0002,0100). Required if Private Information Creator UID (0002,0100) is present. |
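
The Header Preamble and DICOM Prefix described in Table 7.1-1 can be illustrated with a short sketch. The function names are hypothetical. The check looks only at the four-byte Prefix and deliberately ignores the Preamble content, as the table requires of receivers.

```python
PREAMBLE_LENGTH = 128
DICOM_PREFIX = b"DICM"


def rtv_payload_start(preamble: bytes = bytes(PREAMBLE_LENGTH)) -> bytes:
    """Return the Header Preamble (all 00H by default) followed by the DICOM Prefix."""
    if len(preamble) != PREAMBLE_LENGTH:
        raise ValueError("the preamble is a fixed 128-byte field")
    return preamble + DICOM_PREFIX


def looks_like_dicom_payload(payload: bytes) -> bool:
    """Check the four-byte Prefix that follows the 128-byte Preamble."""
    return payload[PREAMBLE_LENGTH:PREAMBLE_LENGTH + len(DICOM_PREFIX)] == DICOM_PREFIX
```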

7.2 Standard SOP Classes

The SOP Classes in the Real-Time Communication Class identify the Composite IODs to be sent. Table 7.2-1 identifies Standard SOP Classes.

Table 7.2-1. Standard SOP Classes

| SOP Class Name | IOD Specification (defined in PS3.3) |
| --- | --- |
| Video Endoscopic Image Real-Time Communication | Real-Time Video Endoscopic Image IOD |
| Video Photographic Image Real-Time Communication | Real-Time Video Photographic Image IOD |
| Audio Waveform Real-Time Communication | Real-Time Audio Waveform IOD |
| Rendition Selection Document Real-Time Communication | Rendition Selection Document IOD |

8 Security Considerations

The metadata and ancillary streams usually contain Personally Identifiable Information (PII). The video and audio streams might contain protected information. The underlying SMPTE protocols do not specify any security protections to ensure confidentiality, integrity, or availability of the various data streams. DICOM does not specify any additions to the SMPTE protocols to provide such protection. Authorization and authentication of access to the DICOM-RTV Service is handled by configuration. Authentication is not re-confirmed at initiation of the underlying SMPTE protocols, and DICOM does not specify any additions to the SMPTE protocols for access control, authorization, or authentication.

The potential eavesdropping, replay, message insertion, deletion, modification, man-in-the-middle and denial of service attacks have not been analyzed. That analysis is up to the individual sites and installations.

Individual sites and installations will also need to perform their own assessments and selection of security mechanisms and add protections as necessary. The data rates and strict timing requirements for the data streams require careful analysis of any security mechanisms that are added. There do exist security mechanisms that operate at and below the IP level that can meet foreseen use cases, but there is insufficient experience or evidence to justify DICOM making a recommendation.

9 Conformance

An implementation claiming conformance to PS3.22 shall function in accordance with all its mandatory sections.

DICOM-RTV Services are used to transmit Composite SOP Instances in real time. All Composite SOP Instances transmitted shall conform to the requirements specified in other Parts of the Standard.

An implementation may conform to the DICOM-RTV Services by supporting the role of origin device or receiving device, or both, for any of the Services defined in PS3.22.

The structure of Conformance Statements is specified in PS3.2.

An implementation shall describe in its Conformance Statement the Real-World Activity associated with its use of DICOM-RTV Services, including any proxy functionality between DICOM-RTV and another service provided through a DIMSE Service or a RESTful service (i.e., storage of received video and audio with associated metadata).

In addition, the Conformance Statement document for a DICOM-RTV sending device shall specify how the receivers can get the content of the SDP objects describing the metadata and associated video and/or audio flows.