Overview

Introduction

Flavor originated from the need to simplify and speed up the development of software that processes coded audio-visual or general multimedia information. This includes encoders and decoders as well as applications that manipulate such information, such as editing tools, content creation tools, multimedia indexing, and search engines.

Such information is invariably encoded in a highly efficient form, to minimize the cost of storage and transmission. This source coding operation is almost always performed in a bitstream-oriented fashion: the data to be represented is converted to a sequence of binary values of arbitrary (and typically variable) lengths, according to a specified syntax. The syntax itself can have various degrees of sophistication. One of the simplest forms is the GIF87a format, which consists essentially of two headers and blocks of image data coded with Lempel-Ziv-Welch (LZW) compression. Much more complex formats include the JPEG, MPEG-1, MPEG-2, and MPEG-4 specifications, among others.

General-purpose programming languages such as C++ and Java do not provide native facilities for coping with such data. Software codec or application developers need to build their own facilities, involving two components. First, they need to develop software that deals with the bitstream-oriented nature of the data, since general-purpose microprocessors are byte-oriented. Second, they need to implement parsing and generation code that complies with the syntax of the format at hand (be it proprietary or standard). These two tasks represent a significant part of the overall development effort, and they have to be duplicated by everyone who requires access to a particular compressed representation within their application. Furthermore, they can account for a substantial percentage of the overall execution time of the application.
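
To make the first of these tasks concrete, here is a minimal sketch of the kind of bit-level reader developers end up writing by hand. It extracts fields of arbitrary width, most significant bit first, from a byte buffer. The `BitReader` class is a hypothetical illustration, not part of the Flavor run-time library.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical MSB-first bit reader over an in-memory byte buffer.
class BitReader {
public:
    explicit BitReader(std::vector<std::uint8_t> data) : buf_(std::move(data)) {}

    // Read `nbits` (1..32) bits, most significant bit first, as an unsigned value.
    std::uint32_t getBits(int nbits) {
        if (nbits < 1 || nbits > 32) throw std::invalid_argument("nbits must be 1..32");
        std::uint32_t value = 0;
        for (int i = 0; i < nbits; ++i, ++pos_) {
            if (pos_ >= buf_.size() * 8) throw std::out_of_range("read past end of buffer");
            const int shift = 7 - static_cast<int>(pos_ % 8);  // MSB of each byte comes first
            value = (value << 1) | ((buf_[pos_ / 8] >> shift) & 1u);
        }
        return value;
    }

    // Current position in the stream, in bits.
    std::size_t bitPosition() const { return pos_; }

private:
    std::vector<std::uint8_t> buf_;
    std::size_t pos_ = 0;
};
```

For the bytes 0xA5 0xF0, reading 3, 7, and 6 bits yields 5, 23, and 48. The bit-by-bit loop is deliberately simple; faster readers work on whole machine words, which is the kind of tuning effort that otherwise gets repeated in every application.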

Flavor addresses these problems in an integrated way. First, it allows the "formal" description of the bitstream syntax. Formal here means that the description is based on a well-defined grammar and, as a result, is amenable to manipulation by software tools. In the past, such descriptions relied on ad hoc conventions involving tabular data and/or pseudo-code. A second, key aspect of Flavor is that it has been developed as an extension of C++ and Java, both heavily used object-oriented languages in multimedia application development. This ensures seamless integration of Flavor code with C++ and Java code and with the overall architecture of an application.

Flavor was designed as an object-oriented language, anticipating an audio-visual world composed of audio-visual objects, both synthetic and natural, and combining this vision with well-established paradigms for software design and implementation. Its object-oriented facilities go beyond the mere duplication of C++ and Java features, and introduce several new concepts that are pertinent to bitstream-based media representation.

In order to validate the expressive power of the language, several existing bitstream formats have already been described in Flavor, including sophisticated structures such as MPEG-1 Systems, Video and Audio. See the Samples page for the list of currently available Flavor descriptions; the Flavor package includes most of the samples listed on that page as well as working C++/Java programs that use the sample descriptions. A translator has also been developed for translating Flavor code to C++ and Java, and the latest Flavor package is available for download.

History

Flavor originated from the need to simplify and speed up the development of software that processes coded bitstreams. Flavor's ancestor was a Perl script called mkvlc, originally written in 1993-94, which automatically generated C code for variable-length code table declarations (including decoding trees).

In 1995, the ideas behind mkvlc took a more concrete shape in the form of a "syntactic description language," i.e., a formal way to describe not just variable-length codes, but the entire structure of a bitstream. Such a facility was needed in the MPEG-4 standardization activity, which was moving toward flexible, even programmable, audio-visual decoding systems.

Syntactic description was originally introduced in MPEG-4 to indicate the format of coded data as represented in a serialized bitstream. The acronym MSDL was introduced by what was then called the AOE (Applications and Operating Environments) subgroup of MPEG, the group that laid down the foundations of MPEG-4. The acronym referred to the MPEG-4 Syntactic Description Language, indicating that this language would describe the format of a bitstream as it should be delivered to a decoder. This would allow the combination of the various basic MPEG-4 coding tools into complete algorithms.

A specific Syntactic Description Language was first introduced and adopted at the November 1995 MPEG meeting. It subsequently underwent a series of revisions, taking into account feedback from several people in the MPEG Systems, Audio, and Video groups. Note that the acronym SDL has no relation to the "Specification and Description Language", ITU-T Recommendation Z.100, which is used in the telecommunications field. The name Flavor (which stands for Formal Language for Audio-Visual Object Representation) was introduced in March 1996 by its designers at Columbia.

The MSDL term changed as the name and scope of the AOE group switched to "MPEG-4 Integration." The expanded scope reflected the realization that a configurable decoding terminal would have significant flexibility and complexity, increasing the level of control that a content developer would need to exercise over it. The MSDL acronym now referred to MPEG-4 System and Description Languages, which covered issues like tool APIs and scene composition, as well as the "glue" language between them (Java was being considered at the time). The syntactic description component was referred to as MSDL-S, where 'S' stood for syntax.

Allowing full programmability at the terminal was later judged too radical, as several technical issues remained unresolved. Emphasis was instead placed on a parametric representation scheme, which was more in line with prior MPEG specifications. The MPEG-4 Integration group was renamed MPEG-4 Systems, following the naming used in MPEG-1 and MPEG-2, and this remains its current name. The scope of the group, however, remained broader than that of its MPEG-1 and MPEG-2 counterparts, and includes scene description and interaction, multiplexing/synchronization, the MPEG-4 file format, and syntactic description. The term Syntactic Description Language (SDL) is now used in MPEG-4, and the SDL comprises Section 6 of the MPEG-4 Systems Committee Draft.

Technical Overview

Flavor provides a formal way to specify how data is laid out in a serialized bitstream. It is based on a principle of separation between bitstream parsing operations and other encoding/decoding operations. This separation acknowledges that the same syntax can be utilized by different tool implementations, but, more importantly, that the same tool can work unchanged with a different bitstream syntax (e.g., a different variable-length code table). For example, the number of bits used for a specific quantity can change without modifying any part of the decoding, compositing, or rendering algorithms.
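
A minimal C++ sketch of this separation, under assumptions of our own: two hypothetical syntax variants encode the same quantizer value with 5 and 8 bits respectively, both parsers fill the same in-memory structure, and the "tool" (`dequantize`, here just a multiplication) never changes. None of these names come from Flavor.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// In-memory representation consumed by the decoding "tool".
struct BlockHeader {
    std::uint32_t quantizer;
};

// Minimal MSB-first bit extraction (illustrative helper); advances `pos` (in bits).
std::uint32_t readBits(const std::vector<std::uint8_t>& buf, std::size_t& pos, int nbits) {
    std::uint32_t v = 0;
    for (int i = 0; i < nbits; ++i, ++pos)
        v = (v << 1) | ((buf.at(pos / 8) >> (7 - pos % 8)) & 1u);
    return v;
}

// Two hypothetical syntax variants: only the field width differs.
BlockHeader parseV1(const std::vector<std::uint8_t>& buf) {
    std::size_t pos = 0;
    return BlockHeader{readBits(buf, pos, 5)};   // 5-bit quantizer
}
BlockHeader parseV2(const std::vector<std::uint8_t>& buf) {
    std::size_t pos = 0;
    return BlockHeader{readBits(buf, pos, 8)};   // 8-bit quantizer
}

// The tool sees only parsed values and is unaware of either syntax.
std::uint32_t dequantize(const BlockHeader& h, std::uint32_t level) {
    return level * h.quantizer;
}
```

With the first syntax, the byte 0x50 carries a quantizer of 10 in its top 5 bits; with the second, the byte 0x0A carries the same value in all 8 bits. `dequantize` returns 30 for a level of 3 in both cases.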

Flavor builds on the C-like syntax used in MPEG-1 and MPEG-2, which, however, was informal and could not fully describe the specification. Several of the lower layers could only be handled by explanatory text, which had to be carefully crafted and tested over time for ambiguities. Other specifications (e.g., GIF and JPEG) use similar bitstream representation schemes and hence share the same limitations. Even though other facilities already exist for representing syntax (e.g., ASN.1, ISO International Standards 8824 and 8825), they cannot cope with the intricate complexities of source coding operations (variable-length coding, arithmetic coding, etc.).

Flavor was designed to be an intuitive and natural extension to the typing system of object-oriented languages, particularly C++ and Java. This means that the bitstream representation information is placed together with the data definition in a single place; in C++ and Java, this place is the class definition. Flavor has been explicitly designed to follow a declarative approach to bitstream syntax specification. In other words, the designer specifies how the data is laid out in the bitstream, rather than detailing a step-by-step procedure that parses it. Such a procedural approach would severely limit both the expressive power and the capability for automated processing and optimization, as it would eliminate the necessary level of abstraction.

A related example from traditional programming is the handling of floating-point numbers. The programmer does not have to specify how such numbers are represented or how operations on them are performed; these tasks are automatically taken care of by the compiler in coordination with the underlying hardware or run-time emulation libraries.

A further benefit of combining type declaration and bitstream representation is that the underlying object hierarchy of the base programming language (C++ or Java) naturally becomes the object hierarchy for bitstream representation purposes as well. This considerably eases application development.
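
In hand-written C++, this correspondence shows up as ordinary inheritance: a derived class's get() first reads the fields declared in its base class and then its own. The sketch below is our own approximation of such code, not actual Flavor translator output; the class names, field widths, and the `BitSource` helper are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical MSB-first bit source used by the get() methods below.
struct BitSource {
    std::vector<std::uint8_t> buf;
    std::size_t pos = 0;  // position in bits
    std::uint32_t getuint(int nbits) {
        std::uint32_t v = 0;
        for (int i = 0; i < nbits; ++i, ++pos)
            v = (v << 1) | ((buf.at(pos / 8) >> (7 - pos % 8)) & 1u);
        return v;
    }
};

// Base object: fields shared by every derived representation.
class VisualObject {
public:
    std::uint32_t objectId = 0;
    virtual ~VisualObject() = default;
    virtual void get(BitSource& bs) { objectId = bs.getuint(4); }
};

// Derived object: parses the base-class fields first, then its own.
class Rectangle : public VisualObject {
public:
    std::uint32_t width = 0, height = 0;
    void get(BitSource& bs) override {
        VisualObject::get(bs);   // base-class syntax comes first
        width = bs.getuint(6);
        height = bs.getuint(6);
    }
};
```

With the bytes 0x32 0x94, calling get() through a `VisualObject&` dispatches to `Rectangle::get()`, which reads objectId = 3, width = 10, and height = 20.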

By using C++ and Java as a basis, translation of Flavor to regular C++ and Java using automated tools becomes relatively easy and intuitive. This way the syntactic description can be directly used for application development, significantly simplifying the job of the programmer.

The following examples indicate how the integration of type and bitstream representation information is accomplished.

Consider a simple visual object called Example with just a single Data member, represented using 8 bits. Using the MPEG-1/2 methodology, this would be described as follows.

Example 1: "Example" using MPEG-1/MPEG-2

Example() {          No. of bits   Mnemonic
    Data             8             uimsbf
}

Note that the "uimsbf" mnemonic means unsigned integer, most significant bit first.

A C++ (and similarly Java) description of this object could be specified as shown in Example 2 below.

Example 2: "Example" using C++


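// Bit-level I/O functions, assumed to be provided elsewhere.
unsigned int getuint(int nbits);
void putuint(int nbits, unsigned int value);

class Example {
public:
    unsigned int Data;
    void get(void) { Data = ::getuint(8); }
    void put(void) { ::putuint(8, Data); }
};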

Here getuint() reads the specified number of bits (here 8) from the bitstream and returns them as an unsigned integer; putuint() performs the corresponding output operation, writing a value using the specified number of bits. When Example::get() is called, the bitstream is read and the resulting quantity is placed in the Data member.
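
The text does not specify how getuint() and putuint() are implemented. As a minimal sketch, assuming a single global, in-memory bitstream (the globals `g_bits`, `g_readPos`, and `g_writePos` are hypothetical; a real run-time library would manage files and buffering), they could look like this, which also makes Example 2 runnable as a write-then-read round trip:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical global bitstream state shared by getuint()/putuint().
std::vector<std::uint8_t> g_bits;   // stream contents
std::size_t g_readPos = 0;          // read position, in bits
std::size_t g_writePos = 0;         // write position, in bits

// Write the low `nbits` bits of `value`, most significant bit first.
void putuint(int nbits, unsigned int value) {
    for (int i = nbits - 1; i >= 0; --i, ++g_writePos) {
        if (g_writePos / 8 >= g_bits.size()) g_bits.push_back(0);
        if ((value >> i) & 1u)
            g_bits[g_writePos / 8] |= static_cast<std::uint8_t>(0x80u >> (g_writePos % 8));
    }
}

// Read `nbits` bits, most significant bit first, as an unsigned integer.
unsigned int getuint(int nbits) {
    unsigned int value = 0;
    for (int i = 0; i < nbits; ++i, ++g_readPos)
        value = (value << 1) | ((g_bits.at(g_readPos / 8) >> (7 - g_readPos % 8)) & 1u);
    return value;
}

class Example {
public:
    unsigned int Data = 0;
    void get() { Data = ::getuint(8); }
    void put() { ::putuint(8, Data); }
};
```

Writing an `Example` whose `Data` is 0xC3 and then reading a fresh `Example` from the same stream recovers 0xC3. The bit-by-bit loops favor clarity over speed.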

In Flavor, the same object would be described as follows:

Example 3: "Example" using Flavor


class Example {
    unsigned int(8) Data;
}

As we can see, the bitstream representation is integrated with the type declaration. The above should be read as: Data is an unsigned integer quantity represented using 8 bits in the bitstream.

These examples, although trivial, demonstrate the differences between the various approaches. In Example 1, we just have a tabulation of the various bitstream entities, grouped into syntactic units (here Example). This style is sufficient for straightforward representations, but fails when more complex structures are used (e.g., variable length codes).
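
To see why a plain tabulation falls short, consider decoding a variable-length code: how many bits to read is not known in advance but depends on the bits already read. The sketch below uses a small, made-up prefix code (not taken from any standard) and decodes one symbol by extending a candidate codeword one bit at a time until it matches a table entry. The table and the `decodeVlc` function are hypothetical; generators such as mkvlc produced decoding trees instead, which serve the same purpose more efficiently.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// A codeword is identified by its length and its bit pattern.
using Codeword = std::pair<int, std::uint32_t>;   // (length in bits, bits)

// Hypothetical prefix-free table: 1 -> 0, 01 -> 1, 001 -> 2, 000 -> 3.
const std::map<Codeword, int> kTable = {
    {{1, 0b1}, 0}, {{2, 0b01}, 1}, {{3, 0b001}, 2}, {{3, 0b000}, 3},
};

// Decode one symbol starting at bit `pos` (MSB first); advances `pos`.
int decodeVlc(const std::vector<std::uint8_t>& buf, std::size_t& pos,
              const std::map<Codeword, int>& table, int maxLen) {
    std::uint32_t bits = 0;
    for (int len = 1; len <= maxLen; ++len) {
        if (pos >= buf.size() * 8) throw std::out_of_range("truncated codeword");
        bits = (bits << 1) | ((buf[pos / 8] >> (7 - pos % 8)) & 1u);
        ++pos;
        auto it = table.find({len, bits});
        if (it != table.end()) return it->second;   // codeword complete
    }
    throw std::runtime_error("invalid codeword");
}
```

With this table, the bits 001 1 01 000 (bytes 0x34 0x00) decode to the symbols 2, 0, 1, 3, leaving the position at bit 9.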

In Example 2, the syntax is incorporated into hand-written code embedded in a get() or equivalent method. This is not really the specification of a syntax, but rather the code that implements it. One of the key objectives in the design of Flavor is to actually use the specification of Example 3 to automatically produce the code in Example 2.
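
As a toy illustration of that objective (not the actual Flavor translator, which handles the full language), the function below turns a single Flavor-style declaration of the form `unsigned int(N) Name;` into the member declaration and get/put statements of Example 2. The accepted input pattern and the `Generated`/`translateDecl` names are assumptions made for this sketch.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// C++ text generated for one parsed declaration.
struct Generated {
    std::string member;   // data member declaration
    std::string getStmt;  // statement placed in get()
    std::string putStmt;  // statement placed in put()
};

// Translate "unsigned int(N) Name;" into C++ member and get/put statements.
Generated translateDecl(const std::string& decl) {
    int nbits = 0;
    char name[64] = {0};
    // Whitespace in the format matches any amount of input whitespace, including none.
    if (std::sscanf(decl.c_str(), " unsigned int ( %d ) %63[A-Za-z0-9_] ;", &nbits, name) != 2 ||
        nbits <= 0)
        throw std::invalid_argument("unsupported declaration: " + decl);
    const std::string n(name), w = std::to_string(nbits);
    return {"unsigned int " + n + ";",
            n + " = ::getuint(" + w + ");",
            "::putuint(" + w + ", " + n + ");"};
}
```

For the declaration of Example 3 it yields `unsigned int Data;`, `Data = ::getuint(8);`, and `::putuint(8, Data);`. Anything outside this single pattern is rejected; handling the rest is the job of a real translator's grammar.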

Flavor provides a wide range of facilities to define sophisticated bitstreams, and includes all common language features of C++ and Java (declarations, statements, etc.). It also introduces a set of its own idioms in order to properly address media representation: partial arrays, bitstream polymorphism, parameter types, etc. Since Flavor is a declarative language, it does not need functions or methods; as a result, all statements are included in the data declaration part of the class. A complete description of Flavor can be found in the Flavor specification. A translator that converts Flavor to C++ and Java is also available; its manual can be viewed on-line, or the entire software package can be downloaded.
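
Of these idioms, bitstream polymorphism is perhaps the least familiar: the concrete class of an object is not known until an identifying value has been read from the bitstream. The hand-written C++ sketch below approximates that behavior with an 8-bit tag and a small factory; the tag values, class names, and byte-aligned fields are hypothetical, and this is not necessarily how the Flavor translator implements the feature.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Base class: the concrete type is only known after the tag has been read.
struct Shape {
    virtual ~Shape() = default;
    virtual void getBody(const std::vector<std::uint8_t>& buf, std::size_t& pos) = 0;
};

struct Circle : Shape {
    std::uint8_t radius = 0;
    void getBody(const std::vector<std::uint8_t>& buf, std::size_t& pos) override {
        radius = buf.at(pos++);
    }
};

struct Rect : Shape {
    std::uint8_t width = 0, height = 0;
    void getBody(const std::vector<std::uint8_t>& buf, std::size_t& pos) override {
        width = buf.at(pos++);
        height = buf.at(pos++);
    }
};

// Read an 8-bit tag, create the matching derived object, then parse its fields.
std::unique_ptr<Shape> parseShape(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    std::unique_ptr<Shape> shape;
    switch (buf.at(pos++)) {
        case 0x01: shape = std::make_unique<Circle>(); break;
        case 0x02: shape = std::make_unique<Rect>(); break;
        default: throw std::runtime_error("unknown shape tag");
    }
    shape->getBody(buf, pos);
    return shape;
}
```

For the bytes 0x02 7 9 0x01 5, two calls produce a `Rect` (7 by 9) followed by a `Circle` of radius 5. Byte alignment is a simplification; in a real bitstream the tag and fields may have arbitrary bit widths.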

Benefits of Flavor

Flavor can be used at three different levels of increasing sophistication: syntax documentation, software development, and syntax communication. While the first two are of immediate use, syntax communication is currently the subject of exploratory research.

1. Syntax Documentation

At a very basic level, Flavor can be used for syntax documentation, i.e., to describe the syntax of a particular coded audio-visual (or general multimedia) bitstream. Benefits include:

Unambiguous definition of all syntactic structures of the bitstream. No explanatory text is necessary except for the definition of Flavor itself. Note that, because Flavor relies on well-known languages (C++/Java), its specification is quite intuitive.
Concise description. Because syntactic constructs are expressed using facilities directly supported by Flavor, they invariably take much less text to describe. As the language constructs are much closer to the problem domain, they are able to express it more succinctly.

2. Software Development

At a second level of use, and as a result of the precise way in which Flavor is defined, the syntax specification can be used to automatically generate code that operates on it. This includes code that inputs bitstream objects, outputs bitstream objects, or just traces the contents of a bitstream (produces a detailed output of its contents). Such code is a key part of several applications, including decoders, encoders, editors, indexing and search engines, and bitstream validation suites, and constitutes a large part of the development effort. By automatically producing such code, a substantial and oftentimes complex part of the programmer's effort is completely eliminated.
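
For the tracing case, such code essentially reads each field and reports its position, name, width, and value as it goes. A minimal hand-written sketch, logging to any `std::ostream` (for example `std::cout` or an `std::ostringstream`); the `TraceReader` name and the output format are our own assumptions, not the format produced by the Flavor translator.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>   // std::ostream, std::ostringstream
#include <string>
#include <vector>

// Reads MSB-first fields and logs each one: bit offset, name, width, value.
// The buffer and the log stream must outlive the reader.
class TraceReader {
public:
    TraceReader(const std::vector<std::uint8_t>& buf, std::ostream& log) : buf_(buf), log_(log) {}

    std::uint32_t get(const std::string& name, int nbits) {
        const std::size_t start = pos_;
        std::uint32_t v = 0;
        for (int i = 0; i < nbits; ++i, ++pos_)
            v = (v << 1) | ((buf_.at(pos_ / 8) >> (7 - pos_ % 8)) & 1u);
        log_ << "@" << start << " " << name << " (" << nbits << " bits) = " << v << "\n";
        return v;
    }

private:
    const std::vector<std::uint8_t>& buf_;
    std::ostream& log_;
    std::size_t pos_ = 0;  // position in bits
};
```

Reading a 1-bit `flag` and a 7-bit `level` from the byte 0xA5 logs `@0 flag (1 bits) = 1` and `@1 level (7 bits) = 37`.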

A comprehensive software package for translating Flavor code to C++ or Java code is already available for download.

3. Syntax Communication

At a third level, the bitstream syntax as defined by Flavor can be converted into a binary form and transmitted together with the content. This allows the design of systems with flexible or fully configurable syntax parsing. The concept is similar to that of self-extracting archives, initially popularized on PCs by the PKZip utilities: the archive is delivered as an executable file which, when run on a specific platform (e.g., MS-DOS on Intel x86), extracts the files it contains.

Such an approach has several advantages. Since Flavor is able to capture high-level syntactic features, it provides full optimization flexibility to the implementor. In a hardware setting, binary Flavor commands can be mapped directly to hardware, thus allowing extremely fast bitstream parsing and generation. In a software setting, these commands can be interpreted by optimized software. Such optimizations can have a significant impact on the performance of both software- and hardware-based systems, since bitstream I/O is a significant part of a media application's computational budget and the corresponding operations are very costly on general-purpose, byte-oriented processors.
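
To illustrate what interpreting binary syntax commands in software might look like, here is a deliberately tiny, hypothetical command set: each command is one byte, where a value N in 1..32 means "read an N-bit unsigned field" and 0 means "end". This encoding is invented for the sketch; the text does not define a binary format for Flavor.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical one-byte syntax commands: 1..32 = read that many bits, 0 = end.
std::vector<std::uint32_t> interpret(const std::vector<std::uint8_t>& commands,
                                     const std::vector<std::uint8_t>& data) {
    std::vector<std::uint32_t> fields;
    std::size_t pos = 0;  // bit position in `data`
    for (std::uint8_t cmd : commands) {
        if (cmd == 0) break;                                  // end of syntax description
        if (cmd > 32) throw std::invalid_argument("unknown command");
        std::uint32_t v = 0;
        for (int i = 0; i < cmd; ++i, ++pos)
            v = (v << 1) | ((data.at(pos / 8) >> (7 - pos % 8)) & 1u);
        fields.push_back(v);
    }
    return fields;
}
```

With commands {4, 4, 8, 0}, the bytes 0x3C 0xFF yield the fields 3, 12, and 255; sending a different command list, such as {8, 0}, reinterprets the same data without recompiling the parser.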

The trend toward adding multimedia processing support even to general-purpose microprocessors is already evident in Sun's Visual Instruction Set for SPARC chips and Intel's MMX for the Pentium. Although these extensions address Single Instruction Multiple Data (SIMD) operations, the instruction sets could similarly be extended with bitstream I/O operations for a significant additional performance improvement.

The case of Java is particularly interesting, as it already allows executable code to be downloaded. Currently, class definitions include data members and methods that operate on them. In the case of Flavor, they would include a third component: the syntax for those data members that are parsed from the bitstream (parsable components). This syntactic information can be operated upon by automatically generated methods, and can also be optimized by extending the Java Virtual Machine with bitstream I/O instructions. Such capabilities are natural for media-intensive applications, which have to continuously operate at the bitstream level to create, access, or modify coded multimedia information. Note that the syntax information could be downloaded separately from the class, making it possible to redefine the syntax of an already downloaded (or even locally available standard) class.

Beyond optimization, a further (and perhaps less obvious) benefit of programmable syntax is that it allows both forward and backward compatibility. Given a particular class and its associated syntax, we can extend the class with additional data members or methods. Older systems that do not have the new class will still be able to operate on the new objects: the syntactic description will correctly extract the information from the bitstream, but the methods will only be able to operate on the data that existed in the older specification. This would allow one to extend, for example, the MPEG-2 specification by adding an extra flag at the macroblock level, while still allowing older systems to interoperate.
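
The following sketch shows the mechanism under the assumption that the syntax travels with the content as a list of (field name, bit width) pairs: the parser is driven entirely by the received syntax, so a newer syntax with an extra field is still parsed correctly, while the "old" application code only looks up the fields it knows. The field names and the `parseWith`/`oldApplication` functions are hypothetical, loosely modeled on the macroblock-flag example above.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Syntax = std::vector<std::pair<std::string, int>>;  // (field name, width in bits)

// Generic parser driven by a syntax description received with the content.
std::map<std::string, std::uint32_t> parseWith(const Syntax& syntax,
                                               const std::vector<std::uint8_t>& data) {
    std::map<std::string, std::uint32_t> fields;
    std::size_t pos = 0;  // bit position in `data`
    for (const auto& [name, nbits] : syntax) {
        std::uint32_t v = 0;
        for (int i = 0; i < nbits; ++i, ++pos)
            v = (v << 1) | ((data.at(pos / 8) >> (7 - pos % 8)) & 1u);
        fields[name] = v;
    }
    return fields;
}

// "Old" application logic: knows only the original fields and ignores any others.
std::uint32_t oldApplication(const std::map<std::string, std::uint32_t>& f) {
    return f.at("quantizer") * 2 + f.at("coded");
}
```

A hypothetical old syntax {coded: 1 bit, quantizer: 5 bits} and a newer one that inserts a 1-bit `extra_flag` both give the old application the same result, because it never looks up the new field.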

Flavor and XML - XFlavor

XFlavor is an enhancement of the basic Flavor package in which the translator has been engineered to support XML. In addition to the bitstream reading and writing code, now the translator can also produce the code for generating XML documents corresponding to the bitstreams described by Flavor. An advantage of using an XML representation of multimedia data is that the data is easier to access and manipulate. Once the data is in XML form, due to its self-describing nature, the semantic values of the data (e.g., the width and height of an image) are directly available; the XML document abstracts the bitstream layer. As a result, applications can easily be developed using a generic XML parser to obtain the values. In the bitstream format, such values must be extracted via bit string manipulations, according to the specified syntax.
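
As a rough illustration of the idea, code generated for the Example class could emit one XML element per parsed field, so that a generic XML parser can read `Data` without any bit manipulation. The element layout and the `toXml` helper are our own assumptions, not XFlavor's actual output format.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Serialize parsed fields as XML: one child element per field, in bitstream order.
std::string toXml(const std::string& className,
                  const std::vector<std::pair<std::string, std::uint32_t>>& fields) {
    std::ostringstream xml;
    xml << "<" << className << ">";
    for (const auto& [name, value] : fields)
        xml << "<" << name << ">" << value << "</" << name << ">";
    xml << "</" << className << ">";
    return xml.str();
}
```

For an `Example` whose `Data` is 195, this produces `<Example><Data>195</Data></Example>`.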

Another advantage is that applications are provided with generic access to multimedia data. For example, consider a search engine that uses DCT values to search for images and videos. Dedicated software modules would be needed to parse the bitstreams of the numerous available formats and obtain the relevant DCT values. We would need as many bitstream parsers as there are supported formats. However, if XML were to be used to represent the data, then one search engine with an XML parser would be sufficient.

Additionally, most binary media formats have internal structure, and exposing this in an XML form makes this structure human-readable and processable by generic XML tools (e.g., XSLT) or plain text manipulating tools (e.g., Perl). Such an idea has recently been proposed in MPEG-21 for easy and general manipulation of scalable content with the introduction of a bitstream syntax description language, BSDL.

The translator can also be used to generate a schema (using XML Schema) from a Flavor description of a bitstream syntax. Most of the currently available XML parsers support schemas, and they can be used to automatically check the "validity" of corresponding XML documents. Data checking is a very important part of a multimedia application, but it is tedious to implement. In addition, we use the generated schema to convert conforming XML documents back into the original bitstreams exactly. To realize such conversion, the translator also includes bitstream syntax information in the generated schema.

Of course, an XML representation defeats one of the main purposes of media representation: to make the data as compact as possible. Hence, although it provides very useful features for processing data, it should not be used to replace the original binary representation. For storage and transmission, the binary format should be used to minimize the associated cost. Additionally, for fast and memory-efficient processing of multimedia data, it is better to use the binary data than the expanded textual data. As a solution, we have developed bitgen (bitstream generator) as part of XFlavor, for converting XML-represented multimedia data back into its native bitstream format. There are other solutions, e.g., BiM, XMill and XMLZip; however, all these tools compress XML documents into their own binary formats. Our tool yields better compression than any of these tools, because they retain the information contained in the tags when compressing XML documents (in order to maintain the structure), whereas bitgen simply discards the tags and encodes the content of the elements according to the original bitstream syntax. Also, the original bitstream is already compressed to begin with, so any additional compression of it would be very difficult.
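
The core of that conversion can be sketched as follows: the tags are used only to locate the values, and each value is then written back with the bit width given by the syntax. Here we assume the XML has already been parsed into name/value pairs and that the widths come from the bitstream syntax; `packFields` is a hypothetical helper, not bitgen's interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Pack values (already extracted from XML elements) using the syntax's bit widths, MSB first.
std::vector<std::uint8_t> packFields(
        const std::vector<std::pair<std::string, int>>& syntax,     // (name, width in bits)
        const std::map<std::string, std::uint32_t>& valuesFromXml) {
    std::vector<std::uint8_t> out;
    std::size_t pos = 0;  // write position, in bits
    for (const auto& [name, nbits] : syntax) {
        const std::uint32_t v = valuesFromXml.at(name);   // tag text itself is discarded
        for (int i = nbits - 1; i >= 0; --i, ++pos) {
            if (pos % 8 == 0) out.push_back(0);            // start a new output byte
            if ((v >> i) & 1u) out.back() |= static_cast<std::uint8_t>(0x80u >> (pos % 8));
        }
    }
    return out;
}
```

Packing width 640 and height 480 as two 12-bit fields yields the three bytes 0x28 0x01 0xE0, whereas the XML elements `<width>640</width><height>480</height>` occupy 38 characters.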

Status of Flavor

The specification of Flavor has evolved over a period of more than four years, and has been stable for more than a year. Several well-known bitstream formats have been described in Flavor, including MPEG-1 and MPEG-2, validating its expressive power and refining its features. The specification is available on-line.

Since Version 5.0, the Flavor translator has been enhanced to support XML. This allows multimedia data to be converted back and forth between the binary and XML representations. Once in the XML form, data can be more easily and flexibly manipulated. For more information about the XML features, refer to the Flavor and XML section.

Research work is under way on the issue of syntax communication based on Java. The objective is to make media representation an integral part of an object, and allow its full programmability and reconfiguration.
