The GenericByteData format

Introduction

This document describes the GenericByteData format, developed and maintained by Milestone Systems. It is a proprietary data format for encapsulating the various types of media data used by the different products developed by Milestone Systems. It is thus the only common format for exchanging media data between these products and is used extensively throughout the system. The data encapsulated by the format consists of different types of media data (e.g. video data). Because the format defines how each of these types of data is encapsulated, it may also be seen as a kind of media container format. The format is generic in three senses: it can contain many different types of data, it can easily be extended and adapted with new sub-formats, and it provides a uniform way of handling the media data from the many different devices (e.g. IP cameras) that the system supports.

Device independence is one of the main strengths of the format and one of the reasons it was defined. In an IP surveillance system, each device sends media data encapsulated in some format using some protocol. Because Milestone software supports many devices from many different manufacturers, many different formats and protocols are in use for this purpose: some are proprietary and specific to a single device manufacturer, whereas others are open protocols. In every case, handling the device-specific details is left to the device driver part of the system.

The device driver takes care of all communication with the device and thus also handles the media data transport layer. As part of this, the driver strips away any protocol or similar encapsulation layers around the actual raw media data and rearranges the extracted data into the GenericByteData format. All other parts of the surveillance software then need to handle only this single generic format and know nothing about how the media data was originally delivered by the device. In general, if the system looks only at the data delivered as GenericByteData and uses no additional side information, it cannot tell which device the data came from. The key to this is that the encapsulated media data is completely raw: it complies fully with the relevant media data standard and does not include any additional data elements that are not part of that standard. Additionally, the generic definition of the different headers ensures that different parts of the system can handle the data by processing only (parts of) the header, without understanding the details of the encapsulated data. This is detailed later in the descriptions of the different sub-formats and of the codec support, beginning with the next section.

In this document the format is called GenericByteData. The name was originally a colloquial term used internally by Milestone R&D to refer to the format, but it is now more or less the only term used for it. The name refers to the generic nature of the format and to the fact that the encapsulated data can be any type of byte data and is not confined to media data. Throughout this document, a byte is the usual 8-bit octet.

Overview of the format

The GenericByteData format defines an overall format that in reality consists of a number of different sub-formats. The sub-formats define the different encapsulations offered by the format, and in general each sub-format relates to a particular type of media data. GenericByteData is thus a collective term for the range of sub-formats; the sub-formats themselves have no specific names beyond the short labels used to distinguish them from one another.

Four sub-formats are currently defined, handling the two major types of media data in an IP surveillance system, namely audio and video data. No other types of byte data have currently been defined for encapsulation. With a suitable extension of its definition, however, the format could be used for almost any type of data, including for example various kinds of metadata.

The format is basically a wrapper of raw byte data: each specifically defined chunk of raw byte data is encapsulated by adding a format header in front of it. This is illustrated in the figure below. The header differs from sub-format to sub-format, but all the headers are compact, so the header adds only minimal overhead to the data. How small that overhead is depends entirely on the amount of raw data wrapped into a single GenericByteData chunk: if a chunk contains only a small amount of raw data, the header will be relatively large by comparison. The encapsulated byte data is also referred to as the body data.
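The dependence of the overhead on the body size can be made concrete with a little arithmetic: the header's share of a chunk is header / (header + body). The sketch below assumes a hypothetical 32-byte header purely for illustration; the real header sizes are sub-format specific and are not given in this section.

```python
def header_overhead(header_len: int, body_len: int) -> float:
    """Fraction of a GenericByteData chunk occupied by the header.

    header_len is the header size in bytes. Real header sizes depend on
    the sub-format, so any value passed in here is an assumption.
    """
    if header_len < 0 or body_len < 0:
        raise ValueError("lengths must be non-negative")
    total = header_len + body_len
    return header_len / total if total else 0.0


# Hypothetical header size, used only for illustration.
HYPOTHETICAL_HEADER_LEN = 32

large = header_overhead(HYPOTHETICAL_HEADER_LEN, 64_000)  # large body, e.g. a video picture
small = header_overhead(HYPOTHETICAL_HEADER_LEN, 160)     # small body, e.g. a short audio chunk
print(f"large body: {large:.4%} overhead, small body: {small:.2%} overhead")
```

With the same header, the overhead is a small fraction of a percent for the large body but about one sixth of the chunk for the small one.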

Illustration of how the format adds a header in front of the encapsulated raw byte data.

There is no special marker, tag, or start code defined for the GenericByteData format. Instead, a chunk is identified by the first two bytes of its header. Interpreted as a single unsigned 16-bit value, these two bytes define the data type (or sub-format) of the entire GenericByteData chunk, and thereby also the format and size of the remainder of the header.
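Reading the data type therefore amounts to decoding the first two bytes as an unsigned 16-bit integer in big-endian (network) byte order, which the format requires for all header values. The following is a minimal sketch using Python's standard `struct` module; the function name `read_data_type` is illustrative only.

```python
import struct


def read_data_type(chunk: bytes) -> int:
    """Return the 16-bit data type (sub-format identifier) of a chunk.

    The first two bytes of every GenericByteData header hold this value
    as an unsigned 16-bit integer in big-endian byte order.
    """
    if len(chunk) < 2:
        raise ValueError("chunk too short to contain a data type")
    (data_type,) = struct.unpack_from(">H", chunk, 0)
    return data_type


# A chunk whose first two bytes are 0x00 0x10 carries data type 0x0010.
example = bytes([0x00, 0x10]) + b"...rest of header and body..."
print(hex(read_data_type(example)))  # prints 0x10
```

Everything after these two bytes is interpreted only once the data type is known, since the data type determines the layout and size of the rest of the header.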

The four currently defined sub-formats consist of two for video, one for audio, and one for combining multiple packets (chunks) into a single compound GenericByteData chunk. Of the video sub-formats, one handles video blocks (short sequences of video) and the other handles video packets (single pictures). The video sub-formats are described in chapter Video sub-formats, the audio sub-format in chapter Audio sub-formats, and the multi-packet sub-format separately in chapter Additional sub-formats.

The figure below illustrates how the sub-format is distinguished by the first two bytes of the format header, and the table summarizes the data type values of these two bytes for each of the currently defined sub-formats. Hexadecimal numbers are written in the common way, with 0x prefixed to the hexadecimal digits of the number.

Illustration of the first two bytes of the GenericByteData header, which define the data type (sub-format) of the full chunk of data.

The defined sub-formats (data types). In addition to the reserved range listed, all values not listed are reserved (e.g. `0x0000` is a reserved value).
| Sub-format | Data type values |
|---|---|
| Video blocks | `0x0001 - 0x000F` |
| Video stream packet | `0x0010` |
| Audio stream packet | `0x0020` |
| Metadata stream packet | `0x0030` |
| Multi-packet | `0xFEF0` |
| Reserved | `0xFF00 - 0xFFFF` |
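A receiver can dispatch on the data type by mapping it onto the rows of the table above. The sketch below is one straightforward way to do this; the function name and the returned labels are illustrative, and every value not matched by a listed sub-format is treated as reserved, as the table specifies.

```python
def sub_format_name(data_type: int) -> str:
    """Map a 16-bit data type value to its sub-format, following the table."""
    if not 0 <= data_type <= 0xFFFF:
        raise ValueError("data type must fit in 16 bits")
    if 0x0001 <= data_type <= 0x000F:
        return "Video blocks"
    if data_type == 0x0010:
        return "Video stream packet"
    if data_type == 0x0020:
        return "Audio stream packet"
    if data_type == 0x0030:
        return "Metadata stream packet"
    if data_type == 0xFEF0:
        return "Multi-packet"
    # 0xFF00-0xFFFF is explicitly reserved, and every unlisted value
    # (including 0x0000) is reserved as well.
    return "Reserved"


for value in (0x0000, 0x0003, 0x0010, 0x0020, 0xFEF0, 0xFF42):
    print(f"0x{value:04X}: {sub_format_name(value)}")
```

Note that the video block sub-format covers a whole range of values rather than a single one, so a range check is needed for it.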

It is important to note that all values in all header variants of the format are always in big-endian (network) byte order, irrespective of their bit width. The format currently defines values of 8, 16, 32, and 64 bits, and for all of these the binary representation in the header places the most significant byte first. Some sub-formats additionally define a number of bit flags, but these are logically placed inside a wider value (e.g. an unsigned 16-bit value providing 16 flag bits). The media data following the header is defined as a string of bytes; it is thus byte-oriented and otherwise follows the logical byte order of the specific media data. In general, media data is logically a byte stream as well; for example, MPEG-4 video as an elementary stream is a byte stream (bit stream). This is described later for each media data encapsulation. Furthermore, header values of any bit width are unsigned unless noted otherwise.
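The byte-order rule can be sketched as a single helper that reads unsigned big-endian values of each defined width, plus a helper that tests individual bits of a flags value. The byte layout of `header` below (a 32-bit value at offset 2 and a 16-bit flags value at offset 6) is hypothetical and does not correspond to any real sub-format, and numbering flag bits from the least significant bit is an assumption made for this sketch.

```python
import struct

# Big-endian ('>') struct codes for the unsigned widths used by the format.
_UNSIGNED_BE = {8: ">B", 16: ">H", 32: ">I", 64: ">Q"}


def read_uint_be(buf: bytes, offset: int, bits: int) -> int:
    """Read an unsigned big-endian value of 8, 16, 32 or 64 bits."""
    (value,) = struct.unpack_from(_UNSIGNED_BE[bits], buf, offset)
    return value


def flag_is_set(flags: int, bit: int) -> bool:
    """Test one bit of a flags value (bit 0 = least significant, assumed)."""
    return bool((flags >> bit) & 1)


# Hypothetical header bytes: data type, a 32-bit value, a 16-bit flags value.
header = bytes([0x00, 0x10, 0x12, 0x34, 0x56, 0x78, 0x80, 0x01])
print(hex(read_uint_be(header, 0, 16)))  # 0x10 (little-endian would give 0x1000)
print(hex(read_uint_be(header, 2, 32)))  # 0x12345678
flags = read_uint_be(header, 6, 16)      # 0x8001
print(flag_is_set(flags, 0), flag_is_set(flags, 15), flag_is_set(flags, 1))  # True True False
```

Reading the same two bytes with the wrong byte order silently yields a different, still plausible-looking number, which is why the explicit `>` prefix matters on every field.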