Mapping byte sequence of FFmpeg API structures

I’m seeing if it’s possible to call the FFmpeg APIs directly from Xojo. I’m using the APIs rather than calling FFmpeg from the Shell as I need to bundle FFmpeg with my Mac/Win app and don’t want to require my clients to install FFmpeg. I see I can include the shared libraries in my apps by following the FFmpeg compliance list.

First, has anyone done this before from Xojo?

As a bit of background, I have no previous programming knowledge of C, so for example it took me several hours to work out that an asterisk (*) in a declaration denotes a pointer (a Ptr in Xojo)! I haven’t been able to find a comprehensive reference on how to write declares in this forum, xDev magazine, the xDev library or the developer conference videos that I have or can access online. Don’t get me wrong, there are some very helpful posts here in this forum, but they seem to be more about calling OS APIs. If anyone knows of something else, please let me know.

To start I’m trying to call the APIs to load a stereo WAV file and split it into two mono WAV files (something I can easily do by calling FFmpeg via the shell). I’ve been able to load the WAV into an AVFormatContext, then get ptrs to the audio stream and codec. I’m able to get a sensible codec name from the codec object (ptr), so I know I’m onto something. I’m currently using DeclareLibraryMBS to interact with the APIs which all seems to be working well (many thanks for your help so far @Christian_Schmitz).

I’m coming unstuck with larger structures and knowing how to access this data, or see where the data is that I need in the byte sequence. I’m currently looking to get and set the values for an AVCodecParameters structure. Here’s the debugger showing the bytes for this structure when I load my stereo WAV:

0100 0000 0C00 0100 0100 0000 0000 0000
0000 0000 0000 0000 0000 0000 0200 0000
0028 2300 0000 0000 1800 0000 1800 0000
9DFF FFFF 9DFF FFFF 0000 0000 0000 0000
0000 0000 0100 0000 0000 0000 0000 0000
0200 0000 0200 0000 0200 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0200 0000 80BB 0000 0600 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0200 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0100 0000
0000 0000 0000 0000 0000 0000 0000 0000

The two ‘1800 0000’ values in the third row seem to be bit-depth values (24 bit), and ‘80BB 0000’ in the eighth row seems to be the sample rate (48 kHz).

I’m not sure if all audio properties are set correctly, but it seems as though I need to set ‘ch_layout’, ‘sample_rate’ and ‘bit_rate’ so I can apply these settings to the codec context and read each frame.

If the two ‘1800 0000’ values are ‘bits_per_coded_sample’ and ‘bits_per_raw_sample’, can anyone see how the ‘bit_rate’ stored just before them could come out as ‘0028 2300 0000 0000’? I’m assuming this value occupies 8 bytes, being an ‘int64_t’.

Any help would be much appreciated.

The structure is probably using a byte alignment other than 1. For example, with natural alignment on a 64-bit target, the 8-byte extradata pointer is padded out to offset 16, so the bit_rate member appears at offset 32 (as in your hex dump) instead of offset 28 as you might expect from adding up the sizes of the preceding struct members.
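To make the padding visible, here is a minimal C sketch that asks the compiler for the offsets. The struct `params_head` and both function names are hypothetical stand-ins, not part of FFmpeg; the sketch assumes the enums are `int`-sized and that the target uses ordinary 64-bit alignment rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(int) == 4, "this sketch assumes a 4-byte int");

/* Hypothetical stand-in for the first members of AVCodecParameters.
   The real header declares codec_type and codec_id as enums; they are
   written as int here on the assumption that the enums are int-sized. */
struct params_head {
    int      codec_type;
    int      codec_id;
    uint32_t codec_tag;
    uint8_t *extradata;       /* pointer: aligned to its own size */
    int      extradata_size;
    int      format;
    int64_t  bit_rate;
};

/* Sum of the sizes of the members before bit_rate, ignoring padding. */
size_t packed_offset_of_bit_rate(void) {
    return 2 * sizeof(int) + sizeof(uint32_t) + sizeof(uint8_t *)
         + 2 * sizeof(int);
}

/* Offset the compiler really uses, including alignment padding. */
size_t actual_offset_of_bit_rate(void) {
    return offsetof(struct params_head, bit_rate);
}
```

With 8-byte pointers the plain sum is 28, while the compiler places bit_rate at 32 because 4 padding bytes are inserted before the pointer. With 4-byte pointers the plain sum would be 24.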


Looking at the header, I would assign the byte offsets like this:

typedef struct AVCodecParameters

Offset  Member
0       enum AVMediaType codec_type;
4       enum AVCodecID codec_id;
8       uint32_t codec_tag;
16      uint8_t *extradata;
24      int extradata_size;
28      int format;
32      int64_t bit_rate;
40      int bits_per_coded_sample;
44      int bits_per_raw_sample;
48      int profile;
52      int level;
56      int width;
60      int height;
64      AVRational sample_aspect_ratio;
72      enum AVFieldOrder field_order;
76      enum AVColorRange color_range;
80      enum AVColorPrimaries color_primaries;
84      enum AVColorTransferCharacteristic color_trc;
88      enum AVColorSpace color_space;
92      enum AVChromaLocation chroma_location;
96      int video_delay;

bit_rate is an int64_t stored in little-endian byte order, so the bytes 00 28 23 read as 0x232800, which is 2304000 bits/second (48000 Hz × 24 bits × 2 channels).
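The byte-order step can be checked with a small C sketch. The helper `read_le` and the two byte arrays are illustrative names, not FFmpeg API; the byte values are copied from the hex dump in the first post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian unsigned value of `size` bytes (1..8) starting at p. */
uint64_t read_le(const uint8_t *p, size_t size) {
    assert(size >= 1 && size <= 8);
    uint64_t value = 0;
    for (size_t i = 0; i < size; i++) {
        /* Byte i is the i-th least significant byte. */
        value |= (uint64_t)p[i] << (8 * i);
    }
    return value;
}

/* The eight bytes at offset 32 of the dump (third row, first half). */
const uint8_t BIT_RATE_BYTES[8] = {0x00, 0x28, 0x23, 0x00,
                                   0x00, 0x00, 0x00, 0x00};

/* The `80BB 0000` bytes from the eighth row of the dump. */
const uint8_t SAMPLE_RATE_BYTES[4] = {0x80, 0xBB, 0x00, 0x00};
```

Calling `read_le(BIT_RATE_BYTES, 8)` gives 2304000 and `read_le(SAMPLE_RATE_BYTES, 4)` gives 48000, which is consistent with a 48 kHz, 24-bit, two-channel file.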

PS: And you may need to figure out if FF_API_OLD_CHANNEL_LAYOUT is set for the DLL or not for later fields.


Many thanks @Andrew_Lambert and @Christian_Schmitz. This helps me a whole lot. I’ll check it out tomorrow (getting late here now).

@Christian_Schmitz - Thanks again for your help. The info on bit_rate makes perfect sense.

I have a question on the AVCodecParameters structure list you provided. The first part of this down to bits_per_raw_sample matches the bytes in the debugger (see original post). If I then look ahead to sample_rate (value after video_delay which is the last in your list) this will have an offset of 100. However, looking at the bytes in the debugger it looks like the sample rate is stored at byte # 116 for 4 bytes. If I create a structure in Xojo using your table the sample rate value is loaded into trailing_padding which in my structure has an offset of 116. So my question: Should the byte count in the debugger match the offset values in the structure? First part of my structure:

If so, then I take it some of these values need to be changed to 8 byte data types.

I have yet to look at the AVChannelLayout structure. Many thanks.

Did you look into the C header file?

There is a FF_API_OLD_CHANNEL_LAYOUT flag, which defines whether some extra fields get included.

I’m assuming you want to know if FF_API_OLD_CHANNEL_LAYOUT is set to True. With my limited knowledge of C/C++, I’m not sure where to look for that. I’m assuming you’re asking because of line 152 of the codec parameters header file. If so, then I would say that FF_API_OLD_CHANNEL_LAYOUT is False because I see the sample_rate value in the debugger. Also, each time I run my app and check the debugger bytes as per my first post, all of these bytes appear to belong to this structure: after the last byte shown, the pattern of the data changes.

If I search FF_API_OLD_CHANNEL_LAYOUT in the Finder I’m shown 67 header and ‘.c’ files.

Well, if FF_API_OLD_CHANNEL_LAYOUT is defined, the offset for sample_rate is +12 bytes.

So depending how the library was compiled, the offset is either 100 or 112.
(if I got the numbers right)
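One way to double-check these numbers is to let a C compiler compute them. The sketch below is not the FFmpeg header: the struct names, the `rational` typedef and the `COMMON_MEMBERS` macro are hypothetical, every enum is written as `int` (an assumption), and the two deprecated members are placed between video_delay and sample_rate as discussed in this thread. It also assumes a typical 64-bit target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(int) == 4 && sizeof(void *) == 8,
              "sketch assumes 4-byte int and 8-byte pointers");

/* Stand-in for AVRational: two ints. */
typedef struct { int num; int den; } rational;

/* Members shared by both variants, up to and including video_delay.
   Every enum in the real header is written as int here (an assumption). */
#define COMMON_MEMBERS                                                    \
    int codec_type; int codec_id; uint32_t codec_tag;                     \
    uint8_t *extradata; int extradata_size; int format;                   \
    int64_t bit_rate; int bits_per_coded_sample; int bits_per_raw_sample; \
    int profile; int level; int width; int height;                        \
    rational sample_aspect_ratio;                                         \
    int field_order; int color_range; int color_primaries;                \
    int color_trc; int color_space; int chroma_location;                  \
    int video_delay;

/* Layout when the old channel layout members are compiled in. */
struct params_with_old_layout {
    COMMON_MEMBERS
    uint64_t channel_layout;  /* 8-byte aligned: 4 padding bytes precede it */
    int      channels;
    int      sample_rate;
};

/* Layout when the old channel layout members are compiled out. */
struct params_without_old_layout {
    COMMON_MEMBERS
    int sample_rate;
};

size_t sample_rate_offset_with_old_layout(void) {
    return offsetof(struct params_with_old_layout, sample_rate);
}

size_t sample_rate_offset_without_old_layout(void) {
    return offsetof(struct params_without_old_layout, sample_rate);
}
```

Under these assumptions the two members add 16 bytes rather than 12, because the uint64_t has to start on an 8-byte boundary (offset 104, not 100). That puts channels at 112 and sample_rate at 116, which agrees with the dump: ‘0200 0000’ at byte 112 and ‘80BB 0000’ at byte 116. Without the old members, sample_rate would sit at 100.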

The only reference to #define FF_API_OLD_CHANNEL_LAYOUT I can find in all the FFmpeg files on my drive is:
#define FF_API_OLD_CHANNEL_LAYOUT (LIBAVUTIL_VERSION_MAJOR < 59)
in line 111 of the libavutil version.h file. On line 81 of this file there is:
#define LIBAVUTIL_VERSION_MAJOR 58.

So from this code I take it this is ‘defined’ and so uint64_t channel_layout and int channels are included, therefore requiring +12 bytes for sample_rate. Sorry, just me trying to understand what’s going on here. Again many thanks for your help - now I know where to look for this.
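A small C sketch of how that pair of defines behaves. Strictly speaking the macro is always defined; what matters is whether its expression evaluates to nonzero in an `#if` test. The function name is hypothetical, and the sketch assumes the structure header guards the old members with `#if FF_API_OLD_CHANNEL_LAYOUT`.

```c
#include <assert.h>

/* The two defines quoted above from libavutil's version.h. */
#define LIBAVUTIL_VERSION_MAJOR 58
#define FF_API_OLD_CHANNEL_LAYOUT (LIBAVUTIL_VERSION_MAJOR < 59)

/* 58 < 59 is true, so the preprocessor sees the value 1. */
static_assert(FF_API_OLD_CHANNEL_LAYOUT == 1,
              "old channel layout members are enabled for major version 58");

/* Returns 1 when a block guarded by `#if FF_API_OLD_CHANNEL_LAYOUT`
   would be compiled in, 0 otherwise. */
int old_channel_layout_fields_present(void) {
#if FF_API_OLD_CHANNEL_LAYOUT
    return 1;
#else
    return 0;
#endif
}
```

This only describes the headers on disk. The offsets that matter are those of the shared library actually loaded at runtime, so the header version should match that library's version.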

Back to my previous question, should the byte count in the debugger match the offset values in the structure? Or is there some 32/64 bit translation issue that means I can’t check my structure offset values with the data I find in the Xojo debugger (as per my debugger bytes shown in original post)?