Audio: determining volume of music file

I have an audio challenge…

I’m trying to build a simple audio player for playing music from my iTunes library, a bit like the systems you see in a bar: very user-friendly, with simple cross-fading and simple seek functionality to add songs to the running playlist. I want to keep all management of music and playlists, all importing of music files, and all editing of properties, genre etc. in iTunes, and have a clean and simple player.

Why, you might ask? Partly because I don’t like iTunes for playing music anymore (way too bloated, and it is hard to find music and manage a running playlist) and partly because I like to program and need a new project :wink:

I have built a working prototype setup that plays mp3 and AAC files but I still have one major problem I need to take care of before I can proceed with my project…

My iTunes library currently contains about 15,000 songs and has taken me 14 years to collect. As a result, the volume varies widely across these music files. iTunes corrects this by changing the volume to a preset value while playing a file. This value is not stored in the iTunes library, so I presume it is stored in the ID3 tag? My problem is that without this correction, my music files are all over the place regarding volume. One song plays soft, the next is loud, etc.

I looked at different ways to overcome this problem, but I think the easiest way is to read the ID3 tag and see if there may be volume info in there? I don’t think it will be easy to read the “music data” to try and establish a volume level in my program?

I use sox


sox ./OldPhone.mp3 -n stat

Samples read: 442367
Length (seconds): 10.030998
Scaled by: 2147483647.0
Maximum amplitude: 0.618559
Minimum amplitude: -0.568570
Midline amplitude: 0.024994
Mean norm: 0.083623
Mean amplitude: -0.003463
RMS amplitude: 0.117218
Maximum delta: 0.628859
Minimum delta: 0.000000
Mean delta: 0.028367
RMS delta: 0.045607
Rough frequency: 2730
Volume adjustment: 1.617
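
To pull these numbers into your own program, one option is to capture the report as text and split each line at the colon. Here is a minimal sketch in Python (not Xojo, just to show the idea); the function name `parse_sox_stat` is made up for this example.

```python
def parse_sox_stat(report: str) -> dict:
    """Turn the 'label: value' lines printed by `sox <file> -n stat` into a dict of floats."""
    stats = {}
    for line in report.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # no colon, so not a statistics line
        try:
            stats[label.strip()] = float(value.strip())
        except ValueError:
            continue  # value is not numeric; skip it
    return stats
```

Note that sox prints the stat report on stderr rather than stdout, so that is the stream to capture. Also, "Volume adjustment" is based on the peak sample (here 1 / 0.618559 ≈ 1.617), so "RMS amplitude" may be the better figure for how loud a song actually sounds.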

Wow, that looks like a magnificent app, thanks! Is it possible to call Sox from within Xojo and get the data into your own program somehow?


Hmm. Tried Sox and it seems to work, even getting the output into my program, but Sox comes without mp3 support (because it’s patented, so it seems). I would need to compile my own Sox with mp3 support and I have no idea how to do that…

On Linux:

sudo apt-get install libsox-fmt-mp3

On OS X you can try brew:

brew install sox --with-lame


Technically you could also use the SoundFile or AVFoundation classes in MBS Plugins to get the samples and calculate the average value for the frames.

I wonder if ffmpeg might help here…

ffmpeg -i song.mp3 -acodec pcm_u8 -ar 22050 song.wav

Once you have the wav file, I would recommend building an amplitude histogram of all the wav samples.
Find your mean and +/- standard deviations and use those values to control your volume before you play the wav.
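
As a rough illustration of the mean / standard deviation idea, here is a small Python sketch that reads the 8-bit WAV produced by the ffmpeg command above. It skips the histogram and computes the two numbers directly. The function name `u8_wav_stats` is made up for this example, and it assumes the file really is unsigned 8-bit PCM (which is what `pcm_u8` produces).

```python
import math
import wave

def u8_wav_stats(path):
    """Return (mean, std_dev) of an unsigned 8-bit PCM WAV, samples scaled to -1.0..1.0."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 1:
            raise ValueError("expected 8-bit samples (ffmpeg -acodec pcm_u8)")
        raw = wav.readframes(wav.getnframes())
    if not raw:
        raise ValueError("no samples in file")
    # 8-bit WAV stores unsigned bytes with silence at 128.
    samples = [(b - 128) / 128.0 for b in raw]
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, math.sqrt(variance)
```

The standard deviation here is effectively the RMS level around the mean, so a quiet song gives a small value and a loud one a larger value. For stereo files the channels are interleaved and this sketch simply treats them as one stream.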

Like Brian says.
Peak and RMS Normalization

ffmpeg -i myfile.mp3 -filter:a volumedetect -f null /dev/null

Read the output values from the command-line log:

[Parsed_volumedetect_0 @ 0x7f8ba1c121a0] mean_volume: -16.0 dB
[Parsed_volumedetect_0 @ 0x7f8ba1c121a0] max_volume: -5.0 dB

Then calculate the required offset and use the volume filter.
Volume Filter

ffmpeg -i myfile.mp3 -filter:a "volume=0.5" output.wav
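
The "calculate the required offset" step could look something like the Python sketch below. It reads mean_volume and max_volume from the log text, works out how many dB are needed to reach a target mean level, and caps that gain so the peak never goes above 0 dB. The function name and the -16 dB default target are assumptions for this example, not anything ffmpeg prescribes.

```python
import re

def gain_from_volumedetect(log: str, target_mean_db: float = -16.0):
    """Return (gain_db, linear_factor) that moves mean_volume toward the target without clipping."""
    mean = re.search(r"mean_volume:\s*(-?\d+(?:\.\d+)?) dB", log)
    peak = re.search(r"max_volume:\s*(-?\d+(?:\.\d+)?) dB", log)
    if not mean or not peak:
        raise ValueError("no volumedetect output found")
    gain_db = target_mean_db - float(mean.group(1))
    # Headroom is the distance from the loudest sample to 0 dB; never exceed it.
    gain_db = min(gain_db, -float(peak.group(1)))
    # The dB value is converted to a linear amplitude factor with 10^(dB/20).
    return gain_db, 10 ** (gain_db / 20)
```

With the log above and a target of -14 dB this gives a gain of 2 dB (a factor of about 1.26). The volume filter accepts either form, for example "volume=2dB" or "volume=1.26". ffmpeg writes this log to stderr.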

Dynamic Audio Normalizer.

This filter applies a certain amount of gain to the input audio in order to bring its peak magnitude to a target level (e.g. 0 dBFS). However, in contrast to more “simple” normalization algorithms, the Dynamic Audio Normalizer dynamically re-adjusts the gain factor to the input audio. This allows for applying extra gain to the “quiet” sections of the audio while avoiding distortions or clipping the “loud” sections. In other words: The Dynamic Audio Normalizer will “even out” the volume of quiet and loud sections, in the sense that the volume of each section is brought to the same target level. Note, however, that the Dynamic Audio Normalizer achieves this goal without applying “dynamic range compressing”. It will retain 100% of the dynamic range within each section of the audio file.
More infos

The ffmpeg program only works after a lengthy analysis of the sound file and I think that will mess up the flow of my program. Dynamic Audio Normalizing was actually my “wet dream”. I have an old Windows DJ program called OtsJuke or OtsDJ that has a live audio compression and normalization system and that is really cool for the problem I have here, but creating this in my own program will be impossible, because you have to analyse and change a live audio stream.

My idea was to do the analysis “on the fly” just before a song plays. My program runs with two Movieplayer controls that are the two decks that play music. When player 1 starts playing, the next song on the playlist can be analyzed and after that can be loaded up in player 2. Volume of player 2 will be adjusted according to the results, player 2 is ready to play/fade the song in when needed.

At the moment I have Sox working… sort of. Sox works fine in my Terminal window, does everything I need it to, even with mp3 files (no m4a files, but that’s not that important). However, it will not work in my Shell from Xojo, since the Sox command is not recognized there resulting in an error. Very strange…

Maybe a PATH problem? Did you install it with brew?

It must be something like that.

I compiled and installed Sox with Brew and the strange thing is that in Terminal it works like a charm. However, in Shell there is an error saying that Sox is an unknown command (so Shell uses different path settings than Terminal does, and the Xojo manual confirms that). I don’t know where the Sox command is installed, so I can’t add the path to it when I call the program from within Shell.

Frustrating being so close and not being able to get it working… :wink:

Make sure you are using the full path for Sox in the Shell command.
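
If you want the program to look the tool up by itself, the idea can be sketched like this in Python: check the current PATH first, then fall back to a few likely install folders. The function name `find_tool` is made up, and the two fallback folders are an assumption (they are the usual Homebrew locations, yours may differ).

```python
import os
import shutil

def find_tool(name, extra_dirs=("/usr/local/bin", "/opt/homebrew/bin")):
    """Return an absolute path for `name`, checking PATH first and then extra_dirs."""
    found = shutil.which(name)
    if found:
        return os.path.abspath(found)
    # ASSUMPTION: extra_dirs lists folders where the tool might have been installed.
    for directory in extra_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Calling `find_tool("sox")` returns the full path if one is found, or `None` otherwise; that path is then what you pass to the shell instead of the bare command name.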

That’s the problem: after I “brewed” (compiled and installed) Sox I don’t know where my Mac installed the program file, so I don’t know the full path.

Use the whereis command in Terminal (which sox should work too).

Yess! Found Sox and with it the way to use it in my program. Sox is now returning the volume settings for each mp3 file and I can compensate accordingly. Sounds nice at the moment!!

Part of the reason I suggested ffmpeg was to convert the mp3 into something ‘legible’ by software, such that you might be able to stream as you process.
Maybe ffmpeg has some real-time streaming you can take advantage of.

anyways, sounds like you’ve got lots of options!
Don’t give up on your goal.

Brian, I want to kiss you :wink:

After a struggle with Sox I revisited your suggestion: ffmpeg. This package also comes with “ffprobe” and guess what? This program can read the iTunNORM tag off of an mp3 or AAC file and that was exactly what I was looking for!! This tag holds iTunes’ magical volume normalization info and that was what I needed.

This discovery immediately dropped me in another deep hole: interpreting the iTunNORM tag, since it holds 10 fields of 8 character long strings. These strings have hexadecimal values and the first two fields hold the values that are connected to the normalized volume in iTunes. After some hours of Googling I found some obscure forum with a topic that explains how to calculate from this hexadecimal value to the much needed volume info. Now I can calculate the volume in dB.
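
For anyone else landing here: the post above does not say which formula was found, so the Python sketch below uses the one that is usually quoted in community write-ups, namely gain in dB = -10 * log10(value / 1000) applied to the larger of the first two fields. Treat both the formula and the function name `itunnorm_gain_db` as assumptions to verify against your own files.

```python
import math

def itunnorm_gain_db(tag: str) -> float:
    """Estimate the playback gain in dB from the first two fields of an iTunNORM tag."""
    fields = [int(field, 16) for field in tag.split()]
    if len(fields) != 10:
        raise ValueError("expected 10 hexadecimal fields")
    # ASSUMPTION: fields 1 and 2 are per-channel adjustments on a base of 1000;
    # the larger one (the louder channel) decides the gain.
    loudest = max(fields[0], fields[1])
    if loudest == 0:
        raise ValueError("no volume information in tag")
    return -10 * math.log10(loudest / 1000)
```

Under this formula a field value of 000003E8 (1000 decimal) means no change, and larger values give a negative gain, i.e. the song gets turned down.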

Next step is to make a connection between this dB value and the volume setting of my Movieplayer control.

Anybody have any ideas about that? :slight_smile:
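
One possible way to go from dB to a player volume, sketched in Python: a gain in dB corresponds to a linear amplitude factor of 10^(dB/20), which can be multiplied onto a base volume and clamped to the player's range. This assumes the player's volume setting is a linear amplitude scale, which is not confirmed for the Movieplayer control; `base_volume` and `max_volume` are made-up parameters for this example.

```python
def db_to_player_volume(gain_db, base_volume=0.7, max_volume=1.0):
    """Scale a base volume by a gain in dB and clamp the result to 0..max_volume."""
    # ASSUMPTION: the player's volume is linear in amplitude.
    factor = 10 ** (gain_db / 20)
    return max(0.0, min(max_volume, base_volume * factor))
```

Starting from a base volume below the maximum leaves some headroom, so songs that need a positive gain can actually be turned up. If the control uses a different range (for example whole numbers), multiply the result accordingly.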

Can you copy and paste the output from ffprobe so I can have a look?