
Abstract

This paper provides an overview of the Real-time Transport Protocol (RTP) with an emphasis on security, particularly confidentiality and authenticity. The proposed procedure takes a media file as input, encrypts it, generates a message digest of the encrypted data, and sends both to the recipient. On receipt, the recipient computes the digest of the received data and compares it with the received digest. If the two match, the data is decrypted and played in real time using a player.
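The send and receive steps described above can be sketched in Python. This is a minimal illustration, not the paper's Java implementation. AES-128 is not part of Python's standard library, so the cipher is passed in as a pair of functions. The helper names `sender_side` and `receiver_side` are hypothetical. SHA-1 is used for the digest, matching the algorithm the paper selects.

```python
import hashlib
import hmac
from typing import Callable, Tuple

Cipher = Callable[[bytes], bytes]  # stand-in for AES-128 encrypt/decrypt (assumption)


def sender_side(media: bytes, encrypt: Cipher) -> Tuple[bytes, bytes]:
    """Encrypt the media first, then compute a SHA-1 digest of the ciphertext."""
    ciphertext = encrypt(media)
    digest = hashlib.sha1(ciphertext).digest()
    return ciphertext, digest


def receiver_side(ciphertext: bytes, received_digest: bytes, decrypt: Cipher) -> bytes:
    """Recompute the digest, compare it with the received one, and decrypt only on a match."""
    local_digest = hashlib.sha1(ciphertext).digest()
    # constant-time comparison, so the check does not leak timing information
    if not hmac.compare_digest(local_digest, received_digest):
        raise ValueError("digest mismatch: data was altered in transit")
    return decrypt(ciphertext)
```

For a quick test, any reversible function can stand in for the cipher, for example `toy = lambda b: bytes(x ^ 0x5A for x in b)`, which provides no security at all. An unkeyed digest can be recomputed by anyone who alters the ciphertext. A keyed variant such as `hmac.new(key, ciphertext, hashlib.sha1)` would resist that.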

The current specification, RFC 1889, addresses confidentiality and delegates authenticity to lower-layer protocols. This study investigates both authenticity and confidentiality. Authentication is provided using the MD5, SHA-1, and SHA-2 hash algorithms, while the AES-128 and Triple DES cryptographic algorithms provide confidentiality. In terms of security, SHA-2 surpasses the other hash algorithms; in terms of time efficiency, however, SHA-1 outperforms SHA-2.

AES-128 is considered superior to Triple DES in both time efficiency and security. Therefore, SHA-1 is selected for authenticity and AES-128 for confidentiality in securing RTP. The experiments were carried out using J2SDK 1.5 and cover the Real-time Transport Protocol, the transport control protocol, cryptographic algorithms, and hash algorithms.
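The time-efficiency comparison between hash algorithms can be reproduced in spirit with a small timing harness. This sketch uses Python's `hashlib`, with SHA-256 standing in for the SHA-2 family, rather than J2SDK 1.5. Its absolute numbers will therefore differ from the paper's and depend on the machine and build. The function name `time_digest` and the 1 MB random clip are assumptions.

```python
import hashlib
import os
import time


def time_digest(algorithm: str, data: bytes, repeats: int = 20) -> float:
    """Return the mean wall-clock seconds needed to hash `data` with a hashlib algorithm."""
    start = time.perf_counter()
    for _ in range(repeats):
        hashlib.new(algorithm, data).digest()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    clip = os.urandom(1_000_000)  # 1 MB of random bytes stands in for a media clip
    for name in ("md5", "sha1", "sha256"):
        print(f"{name:7s} {time_digest(name, clip) * 1e3:.3f} ms")
```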

Introduction:

Computers and the Internet are now vital aspects of human life, and there is a growing need for data that is secure, up to date, and readily available. The Real-time Transport Protocol (RTP) and its associated protocols allow real-time applications to use information delivered over the Internet efficiently. This research assesses the security measures provided by RTP and proposes a method to enhance its authenticity. It also examines RTP's position within the layers of a computer network, explores scenarios in which it is used, and highlights the significance of timing in stream transmission through RTP. The study additionally presents analytical findings.

RTP is a flexible protocol that can supply the information a given application requires. Rather than operating as a separate layer, it is usually integrated into the application's processing. RTP is modular and can be tailored to various purposes. Each specific use of RTP requires an application-specific RTP profile, which adapts the base protocol to a particular application field.

RTP profiles define how data is packaged into RTP packets by specifying the payload formats. RFC 1889 defines the fields needed to transport real-time data and introduces the Real-time Transport Control Protocol (RTCP). RTCP provides feedback on transmission quality, information about the participants in an RTP session, and basic session control. RTP is an application-level protocol designed to deliver time-sensitive content, such as audio and video, across diverse networks.

RTP supports the transmission, monitoring, reconstruction, mixing, and synchronization of data streams, providing network transport functions for applications that transmit real-time data. However, RTP deliberately omits quality-of-service mechanisms such as flow control, error control, acknowledgements, and retransmission requests. The reason is that a retransmitted packet would usually arrive too late to be useful for real-time playback. Instead of requesting retransmission, receiving applications typically conceal lost packets, for example by interpolating from neighbouring data, which is a common practice in real-time media systems.
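Loss concealment by interpolation can be sketched as follows. The sketch assumes equal-length frames of PCM samples, one per packet, with `None` marking a lost packet. Linear interpolation is only one simple concealment strategy; RTP itself does not specify it. The function name is hypothetical.

```python
from typing import List, Optional

Frame = List[int]  # PCM samples carried by one packet (assumption)


def conceal_losses(frames: List[Optional[Frame]]) -> List[Frame]:
    """Replace each lost frame (None) by interpolating between the nearest received frames."""
    received = [i for i, f in enumerate(frames) if f is not None]
    if not received:
        raise ValueError("no frames received; nothing to interpolate from")
    out: List[Frame] = []
    for k, frame in enumerate(frames):
        if frame is not None:
            out.append(list(frame))
            continue
        prev = max((i for i in received if i < k), default=None)
        nxt = min((i for i in received if i > k), default=None)
        if prev is None:        # leading gap: repeat the first received frame
            out.append(list(frames[nxt]))
        elif nxt is None:       # trailing gap: repeat the last received frame
            out.append(list(frames[prev]))
        else:                   # interior gap: blend the two neighbours linearly
            w = (k - prev) / (nxt - prev)
            out.append([round(a * (1 - w) + b * w)
                        for a, b in zip(frames[prev], frames[nxt])])
    return out
```

For example, `conceal_losses([[0, 0], None, [10, 20]])` fills the gap with `[5, 10]`.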

A companion protocol, the Real-time Transport Control Protocol (RTCP), is used to improve the performance of RTP. RTCP carries feedback on delay, jitter, bandwidth, congestion, and other network properties. It also supports the synchronization of multiple streams that may use different clocks with different drift rates. By using RTCP, the synchronization of these streams can be ensured.

RTCP also supports the use of RTP-level translators and mixers.

This paper is organized as follows. Section 2 discusses RTP use scenarios, Section 3 the position of RTP in computer networks, and Section 4 time considerations in RTP. Section 5 describes the RTP packet format, the data transfer protocol, and RTCP. Section 6 covers the hash and cryptographic algorithms used to secure RTP, Section 7 presents the results and performance analysis, and Section 8 gives the conclusions drawn from the work.

RTP Use Scenarios:

The following sections describe how RTP is used. The examples were chosen to illustrate the basic operation of applications that use RTP. In these examples, RTP is carried over IP and UDP and follows the conventions for audio and video specified in the companion Internet-Draft draft-ietf-avt-profile.

Simple Multicast Audio Conference:

A working group of the IETF meets to discuss the latest protocol draft, using the IP multicast services of the Internet for voice communications. The working group chair obtains a multicast group address and a pair of ports through an allocation mechanism. One port is used for audio data and the other for control (RTCP) packets. This address and port information is then distributed to the intended participants. If privacy is required, the data and control packets can be encrypted, in which case an encryption key must also be generated and distributed.

RTP does not specify the allocation and distribution mechanisms themselves. The audio conferencing application used by each participant sends audio data in small chunks of about 20 ms each. Each chunk is preceded by an RTP header, and the header and data together form a UDP packet. The RTP header indicates the audio encoding of each packet, such as PCM, ADPCM, or LPC. Senders can therefore change the encoding during a conference, for example to accommodate a new participant connected through a low-bandwidth link or to react to indications of network congestion. Like other packet networks, the Internet occasionally loses and reorders packets and delays them by variable amounts.

To cope with these impairments, the RTP header contains timing information and a sequence number that allow receivers to reconstruct the timing produced by the source. In this example, the audio chunks are therefore played out contiguously through the speaker every 20 ms. This timing reconstruction is performed separately for each source of RTP packets in the conference. The receiver can also use the sequence number to estimate how many packets are being lost. Since participants join and leave during the conference, it is useful to know who is participating at any moment and how well they are receiving the audio data.
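The following receiver-side sketch estimates loss for one SSRC. It is a simplified version of the usual expected-versus-received count: it extends the 16-bit sequence number across wraparounds and compares the expected packet count with the number actually received. The sketch omits probation for new sources and any handling of large sequence jumps. The function name is hypothetical.

```python
from typing import Iterable, Tuple

SEQ_MOD = 1 << 16  # the RTP sequence number is a 16-bit field


def loss_estimate(seqs: Iterable[int]) -> Tuple[int, int, int]:
    """Return (expected, received, lost) for one SSRC, given sequence numbers in arrival order."""
    seqs = list(seqs)
    if not seqs:
        return 0, 0, 0
    base = max_seq = seqs[0]
    cycles = 0  # 2**16 is added each time the counter wraps
    for seq in seqs[1:]:
        delta = (seq - max_seq) % SEQ_MOD
        if 0 < delta < SEQ_MOD // 2:   # newer packet, possibly after a gap
            if seq < max_seq:          # counter wrapped past 65535
                cycles += SEQ_MOD
            max_seq = seq
        # otherwise: a duplicate or a late, reordered packet; max_seq is unchanged
    expected = cycles + max_seq - base + 1
    received = len(seqs)
    return expected, received, expected - received
```

Duplicates are counted as received, so `lost` can come out negative.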

To this end, each instance of the audio application in the conference periodically multicasts a reception report, together with the user's name, on the RTCP (control) port. The reception report indicates how well the current speaker is being received and can be used to control adaptive encodings. Depending on bandwidth limits, other identifying information may also be included. When a site leaves the conference, it sends an RTCP BYE packet.

Audio and Video Conference:

If both audio and video are used in a conference, they are transmitted as separate RTP sessions, and RTCP packets are sent for each medium. The two sessions use two different UDP port pairs and/or multicast addresses.

There is no direct coupling between the audio and video sessions at the RTP level. The only link is that a user participating in both sessions should use the same canonical name (CNAME) in the RTCP packets of both, so that the sessions can be associated. This separation allows participants in a conference to receive only one medium if they wish. Despite the separation, synchronized playback of a source's audio and video can be achieved using the timing information carried in the RTCP packets for both sessions.
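Synchronized playback relies on each RTCP sender report pairing a wallclock time with the RTP timestamp of the same instant. The following is a minimal sketch under two assumptions: the wallclock time has already been converted to float seconds, and the media clock rates are known. 8000 Hz for telephone-quality audio and 90 kHz for video are common choices. The function name is hypothetical.

```python
def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int, sr_wallclock: float, clock_rate: int) -> float:
    """Map an RTP timestamp to sender wallclock seconds using the latest sender report."""
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF   # 32-bit timestamps wrap around
    if diff >= 1 << 31:                        # treat as a negative offset
        diff -= 1 << 32
    return sr_wallclock + diff / clock_rate


# An audio packet (8000 Hz) and a video packet (90 kHz) whose timestamps map to the
# same wallclock instant should be played out together.
audio_t = rtp_to_wallclock(16_000, 8_000, 100.0, 8_000)
video_t = rtp_to_wallclock(180_000, 90_000, 100.0, 90_000)
print(audio_t, video_t)  # 101.0 101.0
```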

Mixers and Translators:

So far it has been assumed that all sites want to receive media data in the same format, but this is not always appropriate. For example, suppose participants in one area are connected through a low-speed link while most conference participants have high-speed network access. Instead of forcing everyone to use a lower-bandwidth, reduced-quality audio encoding, an RTP-level relay called a mixer can be placed near the low-bandwidth area.

The mixer resynchronizes incoming audio packets to reconstruct the constant 20 ms spacing generated by the sender. It then mixes these reconstructed audio streams into a single stream, translates the audio encoding to a lower-bandwidth one, and forwards the resulting packet stream across the low-speed link. These packets may be unicast to a single recipient or multicast on a different address to multiple recipients. The RTP header includes a means for mixers to identify the sources that contributed to a mixed packet, so that correct talker indication can be provided at the receivers. Some participants in the audio conference may be connected with high-bandwidth links but cannot be reached directly via IP multicast. For example, they may be behind an application-level firewall that does not let any IP packets pass.
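The mixing and CSRC bookkeeping can be sketched as follows. The sketch assumes the mixer has already resynchronized its inputs into time-aligned frames of 16-bit linear PCM, keyed by each source's SSRC. Re-encoding to a lower-bandwidth format is omitted. The function name and the sorted CSRC order are illustrative choices, not requirements of RTP.

```python
from typing import Dict, List, Tuple

MAX_CSRC = 15  # the 4-bit CC field limits the CSRC list to 15 entries


def mix_frames(frames_by_ssrc: Dict[int, List[int]]) -> Tuple[List[int], List[int]]:
    """Mix time-aligned 16-bit PCM frames into one frame and build its CSRC list."""
    if not frames_by_ssrc:
        return [], []
    length = min(len(f) for f in frames_by_ssrc.values())
    mixed = []
    for i in range(length):
        total = sum(f[i] for f in frames_by_ssrc.values())
        mixed.append(max(-32768, min(32767, total)))  # clip to the 16-bit range
    csrc = sorted(frames_by_ssrc)[:MAX_CSRC]  # contributing sources' SSRC identifiers
    return mixed, csrc
```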

For these sites, mixing may not be necessary. Instead, another type of RTP-level relay, called a translator, can be used. Two translators are installed, one on either side of the firewall. The outside translator funnels all multicast packets received through a secure connection to the translator inside the firewall. The inside translator then sends them again as multicast packets to a multicast group restricted to the site's internal network. Mixers and translators can be designed for a variety of purposes. For example, a video mixer can scale the images of individual people in separate video streams and composite them into one video stream to simulate a group scene.

Other examples of translation include connecting a group of hosts that speak only IP/UDP to a group of hosts that understand only ST-II. Another example is the packet-by-packet encoding translation of video streams from individual sources without resynchronization or mixing.

Position of RTP in Computer Networks:

RTP normally runs over the User Datagram Protocol (UDP), a connectionless transport protocol, and is therefore implemented in user space. A multimedia application, which may include audio, video, text, and possibly other streams, passes them to the RTP library in user space. The library multiplexes the streams, encodes them into RTP packets, and writes the packets to a socket.

In the operating system kernel, UDP packets are generated and embedded in IP packets. If the computer is on an Ethernet, the IP packets are then placed in Ethernet frames for transmission. The protocol stack for this situation is shown in Figure 1. RTP runs in user space rather than in the kernel, and its packets are encapsulated in UDP, IP, and Ethernet, so it is hard to say which layer RTP belongs to. However, it is linked into the application program, yet it is a generic, application-independent protocol that only provides transport facilities. It can therefore be regarded as a transport protocol implemented in the application layer. The nesting of the packets is shown in Figure 2.
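The nesting in Figure 2 also determines the per-packet overhead. The following sketch makes three assumptions: IPv4 without options, a 14-octet Ethernet header (preamble and frame check sequence not counted), and a payload of 20 ms of 8-bit audio sampled at 8000 Hz. The constant and function names are hypothetical.

```python
RTP_FIXED = 12  # fixed RTP header, before any CSRC entries or extension
UDP = 8
IPV4 = 20       # IPv4 header without options
ETHERNET = 14   # MAC header only; preamble and FCS not counted


def frame_size(payload_octets: int, csrc_count: int = 0) -> int:
    """Octets handed to the Ethernet driver for one RTP packet."""
    rtp = RTP_FIXED + 4 * csrc_count  # each CSRC identifier adds 32 bits
    return payload_octets + rtp + UDP + IPV4 + ETHERNET


payload = 8000 * 20 // 1000  # 160 octets of 8-bit, 8000 Hz audio per 20 ms
total = frame_size(payload)
print(total, f"{(total - payload) / total:.0%} overhead")  # 214 25% overhead
```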

The primary goal of RTP is to ensure the real-time viability of transmitting streams. However, implementing security measures for these streams requires additional time for encryption or creating signatures, such as digests for entire movie or audio files. Therefore, security introduces overhead and affects the timeliness of RTP. This paper aims to choose algorithms and procedures that can make RTP reliable in terms of both time and security.

Time considerations play a crucial role in RTP. When accessing a video or audio file over the Internet in real-time, the network's bandwidth becomes the most critical parameter. Additionally, factors like the minimum clip size and duration, as well as the processor speed of both the server and client, are important to consider.

Ignoring security for the moment, the time needed to access real-time audio or video clips can be analysed as follows. Let oneSecFileSize be the size in bits of one second of the clip, and let each clip last cSec seconds. The upload transmission rate is uRate bits per second, and the download transmission rate is dRate bits per second.

The time to upload a clip, tUpload, is the clip size (oneSecFileSize multiplied by cSec) divided by the upload rate. Likewise, the time to download it, tDownload, is the clip size divided by the download rate:

$$\mathit{tUpload} = \frac{\mathit{oneSecFileSize} \cdot \mathit{cSec}}{\mathit{uRate}}, \qquad \mathit{tDownload} = \frac{\mathit{oneSecFileSize} \cdot \mathit{cSec}}{\mathit{dRate}}$$

If either the upload time or download time exceeds the duration of the clip, there will be a delay in playback. This situation occurs when max(tUpload, tDownload) > cSec.

For clips to play continuously, the following condition must therefore hold:

$$\max\left(\frac{1}{\mathit{uRate}}, \frac{1}{\mathit{dRate}}\right) < \frac{1}{\mathit{oneSecFileSize}} \quad\Longleftrightarrow\quad \min(\mathit{uRate}, \mathit{dRate}) > \mathit{oneSecFileSize}$$

The clip duration does not affect the waiting time between clips at the receiver. Only the size of one second of the clip matters, provided that the upload and download rates satisfy this condition. The lag between capture and playback is

$$\mathit{lag} = \mathit{cSec} + \mathit{tUpload} + \mathit{tDownload}.$$

When the condition holds, tUpload and tDownload are each less than cSec. The lag without any breaks in the feed therefore lies between cSec (with very fast links) and 3·cSec. To get a clip as close to real time as possible, cSec should be reduced.
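The timing model above translates directly into code. The sketch uses the same definitions. The class name `ClipTiming` and its attributes are hypothetical snake_case versions of oneSecFileSize, cSec, uRate, and dRate. Like the analysis, it ignores encryption overhead.

```python
from dataclasses import dataclass


@dataclass
class ClipTiming:
    one_sec_file_size: float  # bits in one second of media (oneSecFileSize)
    c_sec: float              # clip duration in seconds (cSec)
    u_rate: float             # upload rate in bit/s (uRate)
    d_rate: float             # download rate in bit/s (dRate)

    @property
    def t_upload(self) -> float:
        return self.one_sec_file_size * self.c_sec / self.u_rate

    @property
    def t_download(self) -> float:
        return self.one_sec_file_size * self.c_sec / self.d_rate

    def continuous(self) -> bool:
        """Playback never stalls when min(uRate, dRate) > oneSecFileSize."""
        return min(self.u_rate, self.d_rate) > self.one_sec_file_size

    def lag(self) -> float:
        """Delay between capture and playback: cSec + tUpload + tDownload."""
        return self.c_sec + self.t_upload + self.t_download
```

For example, `ClipTiming(19_000, 10, 20_000, 20_000).lag()` gives 29.0 seconds, just under the 3·cSec bound.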

The following scenarios apply this analysis:

Both parties have a low bandwidth modem connection:

Assume that both the sender and the receiver have a transmission rate (uRate and dRate) of 20 Kbit/s. In this case, one second of the clip must be smaller than 20 Kbits, and for a 10-second clip the maximum playback lag is 30 seconds. It has been observed that the minimum size of one second of video (without audio) is 8 Kbits, using H.263 encoding at a video size of 128x96 pixels. By contrast, one second of video with 8-bit mono audio sampled at 8000 Hz requires at least 80 Kbits, well above the 20 Kbit/s available.
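The arithmetic behind these observations can be checked directly. This small sketch takes the 20 Kbit/s and 8 Kbit figures from the text; the constant and function names are hypothetical. It shows that the 8-bit, 8000 Hz mono audio track alone needs 64 Kbit per second, which already exceeds the modem rate before any video is added.

```python
MODEM_RATE = 20_000     # bit/s for both uRate and dRate in this scenario
H263_VIDEO = 8_000      # observed bits per second of 128x96 H.263 video (from the text)
PCM_AUDIO = 8_000 * 8   # 8000 samples/s x 8 bits x 1 channel = 64,000 bit/s


def streams_continuously(one_sec_file_size: int, rate: int = MODEM_RATE) -> bool:
    """Continuous playback needs one second of media to be smaller than the link rate."""
    return one_sec_file_size < rate


def max_lag(c_sec: float) -> float:
    """Upper bound on capture-to-playback lag when both links are at the limit."""
    return 3 * c_sec


print(streams_continuously(H263_VIDEO))              # True
print(streams_continuously(H263_VIDEO + PCM_AUDIO))  # False
print(max_lag(10))                                   # 30
```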

Either of the parties has a low bandwidth connection:

Assume that one party has a rate of 20 Kbit/s while the other party has a much higher speed. In this case, one second of the clip must still be smaller than 20 Kbits. However, the maximum playback lag is only about 20 seconds for a 10-second clip, because the transfer over the fast link takes almost no time.

Both sender and receiver have high bandwidth:

Here the size of one second of a clip depends on the file format and how it is encoded. For example, one second of an MP3 file is smaller than one second of a WAV file. However, applying cryptographic algorithms to a clip adds processing time on each side. Stronger encryption algorithms require even more time on both sides, increasing the upload and download times and the lag between them. Real-time access to the data is therefore affected as well.

When providing security in RTP, the parameters to consider include the bandwidth of the network, file format of clips, upload and download of the clip, processor and memory speed, and the application of cryptographic and hash algorithms.

RTP packet format and data transfer protocol:

The RTP packet format and data transfer protocol are as follows:

RTP fixed header fields:

Every RTP packet carries a fixed header followed by the payload. The RTP header has the format shown in Figure 4. The first twelve octets are present in every RTP packet, while the list of CSRC identifiers is present only when inserted by a mixer. Version (V) is 2 bits wide and identifies the version of RTP.

The version defined by the specification is 2. Padding (P) is 1 bit wide. If the padding bit is set, the packet contains one or more additional padding octets at the end that are not part of the payload. The last octet of the padding contains a count of how many padding octets should be ignored. Padding may be needed by encryption algorithms with fixed block sizes, or for carrying several RTP packets in a lower-layer protocol data unit.
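The following sketch shows how padding could bring data up to a multiple of the 16-octet AES block size. It assumes for simplicity that only the payload needs aligning; in practice, the encrypted portion of the packet determines the target length. It also assumes the caller sets the P bit separately. The function names are hypothetical.

```python
def add_padding(data: bytes, block: int = 16) -> bytes:
    """Append RTP-style padding so that len(result) is a multiple of `block`.

    The last padding octet holds the number of padding octets, itself included.
    """
    pad = block - (len(data) % block)  # always 1..block, so the count octet fits
    return data + bytes(pad - 1) + bytes([pad])


def strip_padding(data: bytes) -> bytes:
    """Remove padding using the count stored in the last octet."""
    count = data[-1]
    if count == 0 or count > len(data):
        raise ValueError("invalid padding count")
    return data[:-count]
```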

Extension (X) is 1 bit wide. If the extension bit is set, the fixed header is followed by exactly one header extension. CSRC count (CC) is 4 bits wide and contains the number of CSRC identifiers that follow the fixed header. Marker (M) is 1 bit wide, and its interpretation is defined by a profile.

The marker is intended to allow significant events, such as frame boundaries, to be marked in the packet stream. A profile may define additional marker bits, or specify that there is no marker bit, by changing the number of bits in the payload type field. Payload type (PT) is 7 bits wide and identifies the format of the RTP payload, which determines how the application interprets it. A profile specifies a default static mapping of payload type codes to payload formats. Additional payload type codes may be defined dynamically through non-RTP means.

The companion profile Internet-Draft draft-ietf-avt-profile specifies an initial set of default mappings for audio and video, which may be extended in future editions of the Assigned Numbers RFC [9]. An RTP sender emits a single RTP payload type at any given time, because this field is not intended for multiplexing separate media streams. Sequence number is 16 bits wide and increments by one for each RTP data packet sent. The receiver can use it to detect packet loss and to restore the packet sequence.

The initial value of the sequence number is random (unpredictable) to make known-plaintext attacks on encryption more difficult. This applies even if the source itself does not encrypt, because the packets may pass through a translator that does. Timestamp is 32 bits wide and reflects the sampling instant of the first octet in the RTP data packet. The sampling instant must be derived from a clock that increments monotonically and linearly in time, to allow synchronization and jitter calculations. The clock resolution must be sufficient for the desired synchronization accuracy and for measuring packet arrival jitter; one tick per video frame is typically not sufficient.

The clock frequency depends on the format of the data carried as payload. It is specified statically in the profile or payload format specification, or dynamically for payload formats defined through non-RTP means. If RTP packets are generated periodically, the nominal sampling instant determined from the sampling clock should be used, not a reading of the system clock. For example, for fixed-rate audio, the timestamp clock would likely increment by one for each sampling period. If an audio application reads blocks covering 160 sampling periods from the input device, the timestamp would increase by 160 for each block, regardless of whether the block is transmitted or dropped. As with the sequence number, the initial value of the timestamp is random. Several consecutive RTP packets may have equal timestamps if they are logically generated at once, for example when they belong to the same video frame.
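The timestamp rule for fixed-rate audio can be sketched as follows. The sequence number advances once per transmitted packet, while the timestamp advances by 160 for every 160-sample block, whether the block is sent or dropped. The class name is hypothetical, and `secrets.randbits` supplies the random initial values.

```python
import secrets
from typing import Optional, Tuple


class AudioStamper:
    """Assigns RTP sequence numbers and timestamps to fixed-size audio blocks."""

    def __init__(self, samples_per_block: int = 160) -> None:
        self.samples_per_block = samples_per_block
        self.seq = secrets.randbits(16)  # random initial sequence number
        self.ts = secrets.randbits(32)   # random initial timestamp

    def next_block(self, send: bool) -> Optional[Tuple[int, int]]:
        """Advance past one block; return (seq, ts) if the block is sent, else None."""
        stamp = (self.seq, self.ts) if send else None
        if send:
            self.seq = (self.seq + 1) & 0xFFFF  # one step per transmitted packet
        self.ts = (self.ts + self.samples_per_block) & 0xFFFFFFFF  # one step per sample
        return stamp
```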

Consecutive RTP packets may have timestamps that are not monotonic if the data is not transmitted in the order it was sampled, as with MPEG interpolated video frames. Synchronization source (SSRC) is 32 bits wide and identifies the synchronization source. This identifier is chosen randomly, so that no two synchronization sources within the same RTP session are likely to have the same SSRC identifier. Nevertheless, all RTP implementations must be prepared to detect and resolve collisions. If a source changes its source transport address, it must also choose a new SSRC identifier to avoid being interpreted as a looped source.
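Random SSRC selection can be sketched as follows, assuming the caller tracks the identifiers already seen in the session; the function name is hypothetical. Avoiding known values does not rule out a collision with a participant not yet heard from, so collisions must still be detected and resolved later.

```python
import secrets
from typing import Set


def choose_ssrc(in_use: Set[int]) -> int:
    """Pick a random 32-bit SSRC that differs from every identifier seen so far."""
    while True:
        ssrc = secrets.randbits(32)
        if ssrc not in in_use:
            return ssrc
```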

The CSRC list is made up of 0 to 15 items, with each item being 32 bits long. This list identifies the contributing sources for the payload found in the packet. The number of identifiers is determined by the CC field. If there are more than 15 contributing sources, only 15 can be identified.

Mixers insert CSRC identifiers, using the SSRC identifiers of the contributing sources, so that the receiver can correctly indicate who is talking. For example, for audio packets, the SSRC identifiers of all the sources that were mixed together to create a packet are listed.
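The field layout described above can be checked with a small decoder. The sketch assumes the default one-bit marker and seven-bit payload type; a profile may redefine that octet. The function name and dictionary keys are hypothetical.

```python
import struct
from typing import Dict, List, Union


def parse_rtp_header(packet: bytes) -> Dict[str, Union[int, List[int]]]:
    """Decode the 12-octet fixed RTP header plus any CSRC list."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    if len(packet) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    csrc = list(struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc]))
    return {
        "version": b0 >> 6,           # 2 bits
        "padding": (b0 >> 5) & 1,     # P bit
        "extension": (b0 >> 4) & 1,   # X bit
        "csrc_count": cc,             # CC, 4 bits
        "marker": b1 >> 7,            # M bit (default layout)
        "payload_type": b1 & 0x7F,    # PT, 7 bits
        "sequence": seq,
        "timestamp": ts,
        "ssrc": ssrc,
        "csrc": csrc,
    }
```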

Multiplexing RTP Sessions:

To ensure efficient protocol processing, it is recommended to minimize the number of multiplexing points. In RTP, multiplexing is achieved through the destination transport address, which includes the network address and port number. This address defines an RTP session. For example, in a teleconference where audio and video media are separately encoded, each medium should have its own RTP session with a unique destination transport address.

It is not intended that audio and video be carried in a single RTP session and demultiplexed based on the payload type or SSRC fields. Interleaving packets with different payload types but the same SSRC would introduce several problems. If one stream changed its encoding, and therefore its payload type, during a session, there would be no general way to tell which stream had changed. An SSRC is defined to identify a single timing and sequence number space. Interleaving multiple payload types would require different timing spaces if the media clock rates differ. It would also require different sequence number spaces to tell which payload type suffered packet loss.

The RTCP sender and receiver reports can describe only one timing and sequence number space per SSRC and do not carry a payload type field. An RTP mixer would not be able to combine interleaved streams of incompatible media into one stream. Carrying multiple media in one RTP session also has further drawbacks. It precludes the use of different network paths or resource allocations where appropriate. It also prevents the reception of only a subset of the media, such as audio only when video would exceed the available bandwidth. Finally, receiver implementations that use separate processes for the different media become impossible. Using a different SSRC for each medium while sending them in the same RTP session would avoid the first three problems but not the last two.

Profile-specific Modifications To The RTP Header:

The current RTP data packet header is considered complete for common functions required in all application classes supported by RTP. However, the header can be customized through modifications or additions specified in a profile document while still allowing monitoring and recording tools that are independent of the profile to work properly.

The marker bit and payload type field carry profile-specific information. They are nevertheless allocated in the fixed header because many applications are expected to need them and would otherwise have to add another 32-bit word just to hold them. A profile may redefine the octet containing these fields to suit different requirements, for example with more or fewer marker bits. If there are any marker bits, one should be located in the most significant bit of the octet, because profile-independent monitors may be able to observe a correlation between packet loss patterns and the marker bit. Additional information required for a particular payload format, such as a video encoding, should be carried in the payload section of the packet. It might be in a header that is always present at the start of the payload section, or it might be indicated by a reserved value in the data pattern. If a particular class of applications needs additional functionality independent of payload format, the associated profile should define additional fixed fields to follow immediately after the SSRC field of the existing fixed header.

These applications can then access the additional fields quickly and directly, while profile-independent monitors or recorders can still process the RTP packets by interpreting only the first twelve octets. If additional functionality is needed in common across all profiles, a new version of RTP should be defined to make a permanent change to the fixed header.

The RTP header extension provides a mechanism that lets individual implementations experiment with new payload-format-independent functions that require additional information to be carried in the RTP data packet header. The mechanism is designed so that interoperating implementations that have not been extended can ignore the header extension.

Note that this header extension is intended only for limited use. Most potential uses of this mechanism would be better handled by the methods described in the previous section. For example, a profile-specific extension to the fixed header is cheaper to process, because it is neither conditional nor in a variable location. Additional information required for a particular payload format should not use this header extension; it should be carried in the payload section of the packet. If the X bit in the RTP header is one, a variable-length header extension is appended to the RTP header, following the CSRC list if present. The header extension has the format shown in Figure 5. It contains a 16-bit length field that counts the number of 32-bit words in the extension, excluding the four-octet extension header, so a length of zero is valid.
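Locating the header extension follows directly from these rules. In this sketch the function name is hypothetical, and the meaning of the first 16 bits is left to the profile.

```python
import struct
from typing import Tuple


def parse_header_extension(packet: bytes) -> Tuple[int, bytes, int]:
    """Return (profile-defined 16 bits, extension data, offset of the payload)."""
    if len(packet) < 12 or not packet[0] & 0x10:
        raise ValueError("no header extension: X bit is not set")
    cc = packet[0] & 0x0F
    start = 12 + 4 * cc                 # extension follows the fixed header and CSRC list
    if len(packet) < start + 4:
        raise ValueError("truncated extension header")
    profile_bits, length_words = struct.unpack("!HH", packet[start:start + 4])
    end = start + 4 + 4 * length_words  # length excludes the four-octet extension header
    if len(packet) < end:
        raise ValueError("truncated extension data")
    return profile_bits, packet[start + 4:end], end
```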

Only a single extension may be appended to the RTP data header. To let different implementations experiment independently with various header extensions, the first 16 bits of the extension are left open for distinguishing identifiers or parameters. The format of these 16 bits is defined by the profile specification in use. This RTP specification does not itself define any header extensions.

Real-time Transport Control Protocol (RTCP):

RTCP is the control protocol that accompanies the Real-time Transport Protocol (RTP) and is intended to improve its performance. It works by periodically transmitting control packets to all participants in the session, using the same distribution mechanism as the data packets. The underlying protocol must therefore provide multiplexing of the data and control packets, for example by using separate port numbers with the User Datagram Protocol (UDP).

RTCP performs four functions. The first is to provide feedback on the quality of the data distribution. This is an integral part of RTP's role as a transport protocol and is related to the flow and congestion control functions of other transport protocols. The feedback is useful for controlling adaptive encodings and for diagnosing faults in the distribution. Because reception feedback reports are sent to all participants, anyone observing problems can determine whether they are local or global. With IP multicast, network service providers can also receive the feedback information and act as third-party monitors to diagnose network problems.

This feedback function is performed by the RTCP sender and receiver reports. Second, RTCP carries a persistent transport-level identifier for an RTP source called the canonical name, or CNAME. The SSRC identifier may change if a conflict is discovered or a program is restarted, so receivers need the CNAME to keep track of each participant. Receivers also need the CNAME to associate multiple data streams from a given participant in a set of related RTP sessions, for example to synchronize audio and video. Third, the rate at which all participants send RTCP packets must be controlled for RTP to scale to a large number of participants. Because each participant sends its control packets to all the others, each can independently observe the number of participants, and this number is used to calculate the rate at which packets are sent.

A fourth, optional, function is to convey basic session control information, such as participant identification for display in the user interface. This is most useful in loosely controlled sessions without membership control or parameter negotiation. RTCP is a convenient channel for reaching all participants, but it may not meet all the communication needs of controlling an application. In such cases a higher-level session control protocol may be needed, which is beyond the scope of this document. Functions (i) to (iii) are mandatory when RTP is used in the IP multicast environment and are recommended for all environments.

RTP application designers should avoid mechanisms that work only in unicast mode and do not scale to larger numbers of participants.

RTCP Transmission Interval:

RTP is designed to adapt automatically to session sizes ranging from a few participants to thousands. In an audio conference, data traffic is inherently self-limiting, because only one or two people speak at a time. With multicast distribution, the data rate on any given link therefore stays relatively constant regardless of the number of participants. Control traffic, however, is not self-limiting. If each participant sent reception reports at a constant rate, control traffic would grow linearly with the number of participants. The RTCP transmission rate must therefore be scaled down as the session grows.

Each session is assumed to have a limit on data traffic, called the "session bandwidth", which is divided among the participants. This bandwidth might be reserved and enforced by the network, or it might simply be a reasonable share. The session bandwidth may be chosen based on cost or on prior knowledge of the network bandwidth available to the session. It is somewhat independent of the media encoding, although the choice of encoding may be limited by the session bandwidth. The session management application is expected to supply the session bandwidth parameter when it invokes a media application. Media applications may also set a default based on the single-sender data bandwidth of the chosen encoding. The application may also enforce bandwidth limits based on multicast scope rules or other criteria.

Bandwidth calculations for control and data traffic include the lower-layer transport and network protocols (for example, UDP and IP), since that is what a resource reservation system needs to know. The application can also be expected to know which of these protocols are in use. Link-level headers are not included in the calculation, because the packet will be encapsulated with different link-level headers as it travels. The control traffic should be limited to a small and known fraction of the session bandwidth. It should be small so that the primary function of the transport protocol, carrying data, is not impaired. It should be known so that it can be included in the bandwidth specification given to a resource reservation protocol, and so that each participant can independently calculate its share. It is recommended that 5% of the session bandwidth be allocated to RTCP.

The exact value of this and the other constants in the interval calculation is not critical, but all participants in the session must use the same values so that they calculate the same interval.
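The scaling rule can be sketched as follows. All members share the RTCP budget of 5% of the session bandwidth, so each member's interval grows with the number of members. This simplified version assumes the 5-second minimum interval and the randomization to between 0.5 and 1.5 times the computed value described in RFC 1889. It omits refinements such as giving active senders a separate share of the RTCP bandwidth. The function names are hypothetical.

```python
import random
from typing import Optional

RTCP_FRACTION = 0.05  # share of the session bandwidth given to RTCP (recommended 5%)
RTCP_MIN_TIME = 5.0   # seconds; minimum interval between RTCP packets


def base_interval(members: int, session_bw: float, avg_rtcp_size: float) -> float:
    """Deterministic interval in seconds; session_bw in octets/s, avg_rtcp_size in octets."""
    rtcp_bw = RTCP_FRACTION * session_bw  # RTCP budget shared by all members
    return max(RTCP_MIN_TIME, members * avg_rtcp_size / rtcp_bw)


def rtcp_interval(members: int, session_bw: float, avg_rtcp_size: float,
                  rng: Optional[random.Random] = None) -> float:
    """Randomize the interval to between 0.5 and 1.5 times the base value."""
    rng = rng or random.Random()
    return base_interval(members, session_bw, avg_rtcp_size) * rng.uniform(0.5, 1.5)
```

Consider a 64 kbit/s audio session (8000 octets/s) with 100-octet RTCP packets. Here `base_interval(10, 8000, 100)` returns the 5-second minimum, while `base_interval(1000, 8000, 100)` returns 250 seconds, keeping total RTCP traffic near 400 octets/s.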
