Understanding DASH adaptive streaming presentations


Example

DASH is one of the most widely deployed adaptive streaming technologies, used to deliver video in a wide variety of scenarios. The best way to understand DASH presentations is to observe the network activity that takes place during playback.

This example uses Fiddler to capture and analyze browser network traffic, though any similar tool will also suffice. We will use the dash.js open source player for video playback.

For our demo content, we will use the Axinom DASH test vectors, specifically the single-period 1080p variant of the "Clear" test vector.


With your network capture running, open the dash.js nightly build sample player in any modern browser and enter the URL http://media.axprod.net/TestVectors/v7-Clear/Manifest_1080p.mpd into the text box. Press Load to start playback. You will observe the following files being downloaded:

http://media.axprod.net/TestVectors/v7-Clear/Manifest_1080p.mpd
http://media.axprod.net/TestVectors/v7-Clear/2/init.mp4
http://media.axprod.net/TestVectors/v7-Clear/15/init.mp4
http://media.axprod.net/TestVectors/v7-Clear/18/init.mp4
http://media.axprod.net/TestVectors/v7-Clear/18/0001.m4s
http://media.axprod.net/TestVectors/v7-Clear/2/0001.m4s
http://media.axprod.net/TestVectors/v7-Clear/15/0001.m4s
http://media.axprod.net/TestVectors/v7-Clear/15/0002.m4s
http://media.axprod.net/TestVectors/v7-Clear/5/init.mp4
http://media.axprod.net/TestVectors/v7-Clear/5/0002.m4s
http://media.axprod.net/TestVectors/v7-Clear/18/0002.m4s
http://media.axprod.net/TestVectors/v7-Clear/5/0003.m4s
...

The first file is the presentation manifest - an XML document whose format is defined in ISO/IEC 23009-1. It describes the DASH presentation in enough depth to allow the player to understand how to play it back.

If you look inside the manifest, you will see various AdaptationSet elements, each of which describes a single adaptation of the content. For example, there is one adaptation set for the video track, three adaptation sets for three audio languages and five adaptation sets for five subtitle languages.

Inside adaptation sets are Representation elements. For the video adaptation set, there are several of these - each representation contains the same visual content encoded using a different quality level. Each audio and text adaptation set only has one representation.
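To make this hierarchy concrete, here is a minimal sketch that walks a manifest and lists the representations found in each adaptation set. The embedded manifest is a hypothetical, heavily trimmed stand-in for the real one: the representation IDs match those seen in the capture, but the bandwidth values and the choice of attributes are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# XML namespace used by DASH manifests.
NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical, heavily trimmed manifest (illustration only, not the real file).
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     profiles="urn:mpeg:dash:profile:isoff-live:2011">
  <Period>
    <AdaptationSet contentType="video">
      <Representation id="2" bandwidth="700000" width="640" height="360"/>
      <Representation id="5" bandwidth="4000000" width="1920" height="1080"/>
    </AdaptationSet>
    <AdaptationSet contentType="audio" lang="en">
      <Representation id="15" bandwidth="128000"/>
    </AdaptationSet>
    <AdaptationSet contentType="text" lang="en">
      <Representation id="18" bandwidth="1000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def list_representations(mpd_xml):
    """Return one (contentType, lang, [representation ids]) tuple per adaptation set."""
    root = ET.fromstring(mpd_xml)
    result = []
    for aset in root.findall("mpd:Period/mpd:AdaptationSet", NS):
        rep_ids = [rep.get("id") for rep in aset.findall("mpd:Representation", NS)]
        result.append((aset.get("contentType"), aset.get("lang"), rep_ids))
    return result


for content_type, lang, rep_ids in list_representations(SAMPLE_MPD):
    print(content_type, lang, rep_ids)
```

Real manifests may describe the content type through mimeType rather than contentType, so treat the attribute names here as one possible layout.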


To perform playback, a player will need to decide which adaptation sets to present to the viewer. It can make this decision based on any custom or built-in business logic that it desires (e.g. language order of preference). The adaptation sets that the content author considers primary have a Role element in the manifest declaring the "main" role.
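As a rough sketch of such selection logic, the function below prefers the viewer's languages in order and falls back to whichever adaptation set declares the "main" role. The dictionary layout and the function name are hypothetical. Real players such as dash.js use their own internal data structures and rules.

```python
def pick_adaptation_set(adaptation_sets, preferred_langs):
    """Pick one adaptation set: preferred language first, then the "main" role."""
    # 1. Honor the viewer's language preference order.
    for lang in preferred_langs:
        for aset in adaptation_sets:
            if aset["lang"] == lang:
                return aset
    # 2. Otherwise use what the content author marked as primary.
    for aset in adaptation_sets:
        if "main" in aset["roles"]:
            return aset
    # 3. Last resort: the first one listed, if any.
    return adaptation_sets[0] if adaptation_sets else None


# Hypothetical audio adaptation sets, for illustration only.
audio_sets = [
    {"lang": "en", "roles": ["main"]},
    {"lang": "de", "roles": []},
    {"lang": "fr", "roles": []},
]
print(pick_adaptation_set(audio_sets, ["es", "de"]))
```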

Furthermore, the player will need to decide which representation to present to the viewer (if an adaptation set offers multiple representations). Most players start conservatively and apply a heuristic algorithm that will attempt to present the maximum quality level that the viewer's network connection can sustain.
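One simple form of such a heuristic can be sketched as follows: pick the highest-bandwidth representation that fits within a fraction of the measured throughput, and fall back to the lowest one when nothing fits. This is an illustrative simplification, not the algorithm dash.js actually uses. The safety factor and the bandwidth figures are assumptions.

```python
def pick_representation(representations, measured_bps, safety_factor=0.8):
    """Return the best representation that the measured throughput can sustain."""
    # Leave some headroom so that small throughput dips do not stall playback.
    budget = measured_bps * safety_factor
    affordable = [r for r in representations if r["bandwidth"] <= budget]
    if affordable:
        return max(affordable, key=lambda r: r["bandwidth"])
    # Nothing fits: the lowest quality is the least bad option.
    return min(representations, key=lambda r: r["bandwidth"])


# Hypothetical bandwidth values in bits per second.
video_reps = [
    {"id": "2", "bandwidth": 700_000},    # 360p
    {"id": "5", "bandwidth": 4_000_000},  # 1080p
]
print(pick_representation(video_reps, measured_bps=1_000_000)["id"])
print(pick_representation(video_reps, measured_bps=10_000_000)["id"])
```

A player that has not yet measured any throughput has no basis for a high estimate, which is one reason playback often starts at a low quality level.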

The player is free to change the active set of representations and/or adaptation sets at any time, either in response to user action (selecting a different language) or automated logic (bandwidth heuristics result in quality level change).

The SegmentTemplate element defines the URL structure that the player can use to access the different representations. A key factor of DASH presentations is that content is split into small segments of a few seconds each (4 seconds in case of our sample movie), which are downloaded independently. Each representation also has an initialization segment, named "init.mp4" for this sample movie, which contains representation-specific decoder configuration and must therefore be loaded before any other segment from that representation can be processed.
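The sketch below shows how a player might expand such a template into concrete segment URLs. The template strings are assumptions chosen to be consistent with the URLs seen in the capture, so check the SegmentTemplate element in the manifest for the real values. Only the $RepresentationID$, $Number$ and $Bandwidth$ identifiers are handled here, along with the $$ escape.

```python
import re

# Matches $$ (escape) or $Identifier$ with an optional %0Nd width tag.
_TEMPLATE_RE = re.compile(r"\$(?:(RepresentationID|Number|Bandwidth)(?:%0(\d+)d)?)?\$")


def expand_template(template, representation_id, number=None, bandwidth=None):
    """Expand a SegmentTemplate URL pattern into a relative URL."""
    values = {
        "RepresentationID": representation_id,
        "Number": number,
        "Bandwidth": bandwidth,
    }

    def substitute(match):
        name, width = match.group(1), match.group(2)
        if name is None:
            return "$"  # "$$" is an escaped dollar sign
        value = values[name]
        if value is None:
            raise ValueError(f"no value supplied for ${name}$")
        if width is not None and name != "RepresentationID":
            return f"{int(value):0{int(width)}d}"  # zero-pad numeric values
        return str(value)

    return _TEMPLATE_RE.sub(substitute, template)


# Assumed template strings, consistent with the captured URLs.
print(expand_template("$RepresentationID$/init.mp4", "15"))
print(expand_template("$RepresentationID$/$Number%04d$.m4s", "15", number=2))
```

The resulting relative URLs are resolved against the manifest location, which is why every request in the capture shares the same base path.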

The behavior described here is accurate for the DASH Live profile, which is the most commonly used variant of DASH. There also exist other profiles with slightly different behavior, not covered here. Pay attention to the "profiles" attribute on the DASH manifest root element to ensure that this description applies to your videos!
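A quick way to perform this check is sketched below. It assumes the manifest declares the ISO base media file format Live profile using the identifier urn:mpeg:dash:profile:isoff-live:2011. The attribute may hold several comma-separated identifiers, and the sample manifests are hypothetical.

```python
import xml.etree.ElementTree as ET

LIVE_PROFILE = "urn:mpeg:dash:profile:isoff-live:2011"


def uses_live_profile(mpd_xml):
    """Return True if the manifest root element declares the Live profile."""
    root = ET.fromstring(mpd_xml)
    # The attribute value is a comma-separated list of profile identifiers.
    declared = [p.strip() for p in root.get("profiles", "").split(",")]
    return LIVE_PROFILE in declared


# Hypothetical minimal manifests, for illustration only.
live_mpd = ('<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" '
            'profiles="urn:mpeg:dash:profile:isoff-live:2011"/>')
on_demand_mpd = ('<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" '
                 'profiles="urn:mpeg:dash:profile:isoff-on-demand:2011"/>')
print(uses_live_profile(live_mpd), uses_live_profile(on_demand_mpd))
```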

As you examine the list of URLs obtained from network traffic capture and compare against the information provided by the manifest, you will conclude that the player performed the following actions after downloading the manifest:

  1. Download the initialization segments for representations 2 (360p video), 15 (English audio) and 18 (English subtitles).
  2. Download the first segments of the above three representations (0001.m4s).
  3. Download the second segment of the audio representation.
  4. From the second video segment onwards, switch to the 1080p video stream! This is indicated by downloading the initialization segment and the second segment of representation 5 (1080p video).
  5. Continue to download more segments of the active representations.
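The same conclusion can be reached mechanically by grouping the captured URLs by representation. The sketch below assumes that the second-to-last path component is the representation ID, which holds for the URLs in this capture but is not guaranteed for other content.

```python
from urllib.parse import urlparse

BASE = "http://media.axprod.net/TestVectors/v7-Clear/"

# Segment requests from the capture above, in download order (manifest omitted).
CAPTURED = [BASE + path for path in (
    "2/init.mp4", "15/init.mp4", "18/init.mp4",
    "18/0001.m4s", "2/0001.m4s", "15/0001.m4s", "15/0002.m4s",
    "5/init.mp4", "5/0002.m4s", "18/0002.m4s", "5/0003.m4s",
)]


def downloads_by_representation(urls):
    """Group downloaded file names by representation ID, preserving order."""
    grouped = {}
    for url in urls:
        parts = urlparse(url).path.split("/")
        rep_id, filename = parts[-2], parts[-1]
        grouped.setdefault(rep_id, []).append(filename)
    return grouped


for rep_id, files in downloads_by_representation(CAPTURED).items():
    print(rep_id, files)
```

In the grouped view, representation 2 stops after its first segment while representation 5 begins at segment 0002, which is the quality switch described in step 4.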

Observing network activity makes it easy to see the decisions that a DASH adaptive streaming player makes in operation. Such a player is simply a mechanism that downloads segments of various tracks and provides them consecutively to a media playback engine, switching tracks as appropriate.

The Axinom DASH test vectors also contain archive files that let you download the entire presentation for filesystem-level analysis. You will find that the files on disk are exactly as they are on the network level. This means that DASH presentations can be served by arbitrary HTTP servers, without the need for any custom server-side logic.

An aspect of Live profile DASH that complicates analysis is that the media samples are spread across a large number of segments. Most media analysis tools are unable to process individual segments, operating only on whole tracks. You can often get past this limitation by simply concatenating the segments of a single representation, starting with the initialization segment. For example, on Windows you may use the following command:

copy /b init.mp4 + 0001.m4s + 0002.m4s + 0003.m4s track.mp4

This will create a track.mp4 file that contains the first three media segments from a representation. While not identical in structure to a stand-alone MP4 file, such a file can still be analyzed by most tools (such as mp4info and FFmpeg) without significant loss of functionality.
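On other platforms, or when many segments are involved, the same byte-for-byte concatenation can be scripted. The following is a minimal sketch. The function name is hypothetical, and the file names in the commented call assume the segments of one representation sit in the current directory.

```python
import shutil


def concatenate_segments(init_path, segment_paths, output_path):
    """Write the initialization segment followed by the media segments, in order."""
    with open(output_path, "wb") as out:
        for path in [init_path, *segment_paths]:
            with open(path, "rb") as src:
                # Plain binary copy: segments are appended without modification.
                shutil.copyfileobj(src, out)


# Example call (assumes these files exist in the current directory):
# concatenate_segments("init.mp4", ["0001.m4s", "0002.m4s", "0003.m4s"], "track.mp4")
```

The initialization segment must come first and the media segments must follow in ascending order. Otherwise analysis tools may fail to parse the result.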