Can someone explain to me why FFmpeg seems to be the only open-source software that can do even basic things with audio?
I was looking at getting the waveform graph for a piece of audio a while ago, and not only was FFmpeg the only option I found that could do it, it was amazingly fast and free as well.
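For reference, the kind of command I ended up with (going from memory, so the filter options may need checking; the size and filenames are arbitrary):

    # Render the waveform of an audio file to a PNG
    ffmpeg -i input.mp3 -filter_complex "showwavespic=s=1024x240" -frames:v 1 waveform.png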
SoX isn’t perfect, but it’s way easier to use for audio work than FFmpeg is. I have a bunch of scripts I reuse that do basic stuff: high-pass filtering, normalizing, automatically trimming audio files, adding fade-in or fade-out, downmixing to mono, and then resampling / dithering to the right bit depth and sample rate.
It also will spit out spectrograms.
Generally when I need to record a ton of sound clips, I chop the audio up and rename it in a GUI editor similar to Audacity, and then do all the processing in SoX. I might also do a bunch of work in a DAW beforehand.
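The core of those scripts is basically a loop like this (a sketch: every filename, frequency, and level here is a placeholder):

    # High-pass, normalize, fade in, downmix to mono, resample, and dither to 16-bit
    mkdir -p processed
    for f in clips/*.wav; do
        sox "$f" -b 16 "processed/$(basename "$f")" \
            highpass 80 norm -1 fade t 0.05 remix - rate 44100 dither
    done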
The man pages are chock full of examples too, which is great because the tool does a lot. Some of the examples are really interesting, such as the delay effect showing how to synthesise a guitar chord.
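That one goes roughly like this (quoting from memory, so the exact notes and timings may differ from the man page):

    # Synthesise six plucked strings and stagger them with delay to strum a chord
    play -n synth pl G2 pl B2 pl D3 pl G3 pl D4 pl G4 \
        delay 0 .05 .1 .15 .2 .25 remix - fade 0 4 .1 norm -1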
I use an audio player built largely around sox¹, and it lets you take advantage of sox's power directly.
SoX is amazing because it does indeed make very nice spectrograms, which show visually how the audio was encoded. That makes it easy to see whether a file really is lossless FLAC or comes from a crappy 192 kbps VBR MP3 source.
Whether you can personally hear the difference is a completely different subject, of course.
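For anyone who wants to try it, the basic invocation is just the spectrogram effect with a null audio output (filenames are placeholders):

    # Render a spectrogram PNG; -n discards the audio itself
    sox suspect.flac -n spectrogram -o suspect.png

Transcodes from lossy sources typically show up as a hard ceiling in the high frequencies of the image.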
I hadn't even thought about SoX in about 10 years until your comment. And looking at the page, there hasn't been a new release since 2015.
From what I recall, it only worked on wav files back in the day, but now it supports OGG. A lot has changed in even 5 years, though: does it even support MP3 now that the patents have expired?
> From what I recall, it only worked on wav files back in the day
It depends on your build, but on my system it supports: 8svx aif aifc aiff aiffc al amb amr-nb amr-wb anb au avr awb caf cdda cdr cvs cvsd cvu dat dvms f32 f4 f64 f8 fap flac fssd gsm gsrt hcom htk ima ircam la lpc lpc10 lu mat mat4 mat5 maud mp2 mp3 nist ogg paf prc pvf raw s1 s16 s2 s24 s3 s32 s4 s8 sb sd2 sds sf sl sln smp snd sndfile sndr sndt sou sox sph sw txw u1 u16 u2 u24 u3 u32 u4 u8 ub ul uw vms voc vorbis vox w64 wav wavpcm wv wve xa xi. You can check your own with `sox --help`.
I just use SoX for processing audio data, and then pass the result to LAME if I want an MP3. Each format has so many different options for encoding and metadata anyway. It’s not like video, where the sheer amount of data discourages you from working uncompressed.
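The handoff is just a pipe, something like this (a sketch; the filenames and the effect are placeholders):

    # SoX does the processing and streams WAV to stdout; LAME handles MP3 encoding
    sox input.flac -t wav - norm -1 | lame - output.mp3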
Sure, there hasn’t been a new release since 2015… but would that be necessary? It’s not missing any features I want.
It's not important that it doesn't support MP3. That's not its purpose; it doesn't need to. The Unix philosophy: feel free to pipeline it on either side with tools that do support MP3.
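For example, MP3 in and MP3 out, with sox only ever seeing uncompressed WAV on its stdin/stdout (a sketch; any MP3-capable decoder and encoder will do):

    ffmpeg -i input.mp3 -f wav - | sox -t wav - -t wav - highpass 80 | lame - output.mp3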
according to https://github.com/chirlu/sox/commit/af261dcc91071cafd7d8305..., sox added support for Ogg Vorbis files in 2001, which is a little more than 5 years ago. since sox didn't exist until 1999 and vorbis didn't exist until 2000, that seems like pretty solid format support to me.
In addition to what others have said, there's also gstreamer and its suite of plugins. I find gstreamer a bit easier to work with, although both are very complex pieces of software and each has its own quirks.
If you're looking to do audio production work, there's Ardour, although I haven't used it myself. http://ardour.org/
Indeed it does; it's about as complex as ffmpeg, and in my opinion it has a somewhat more intuitive interface for building up complicated pipelines of processing steps. For example, decoding and playing an Ogg Vorbis file element by element looks something like this (from memory, so double-check the element names):
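    gst-launch-1.0 filesrc location=input.ogg ! oggdemux ! vorbisdec ! \
        audioconvert ! audioresample ! autoaudiosink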
You can use gst_parse_launch to create a pipeline using the launch syntax.
I've found it helpful to prototype with gst-launch-1.0 and then pull the pipeline into a separate program down the road; trying to create and link all the individual elements manually in complex pipelines got pretty hairy.
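Concretely, the workflow is something like this (a sketch; the caps and element names are just an example):

    # Resample a WAV to 16 kHz. Once this works on the CLI, the part after
    # "gst-launch-1.0" can be handed verbatim to gst_parse_launch() in your program.
    gst-launch-1.0 filesrc location=input.wav ! wavparse ! audioconvert ! \
        audioresample ! audio/x-raw,rate=16000 ! wavenc ! filesink location=output.wav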
My DAW is bash+sox+ecasound because I don't want to be distracted by visuals when working with audio. However, I just started working on a project involving about 15 hours of digital audio recorded under less-than-ideal circumstances a couple of decades ago, and I need a reliable way to analyze the data. SoX's spectrograms are insufficient for my needs, and I've had reliability issues with Audacity. So far, DFasma looks very promising.
What ever happened to Facebook's (or was it Netflix?) technology to create a new unit of time measurement to help align audio and video files? I believe it was called a "flick"...
Audacity is a great GUI for working with audio files. I would think it has a way to export a graph of the wave that it shows you when you open up an audio file.
You can install an FFmpeg plugin for Audacity if you need broader support for audio formats (either import or export).
- You don't trust this much complex logic, written in C and taking untrusted input, and want to rewrite it in Rust.
- You want to code it all again using an API that doesn't expect to get its input from a blocking read() function.
- ...
I think the main reason there isn't any alternative is that it supports soooo many formats that the task seems impossible to anybody thinking about it.
> - You want to code it all again using an API that doesn't expect to get its input from a blocking read() function.
In which real-world situation/scenario is this a problem? It's hard to think of one, but I'm probably missing something.
In any case, if that were a real show-stopper, it would probably be much wiser to go with a fork that modifies that one thing, instead of rewriting the whole project.
I could see it being an issue if you were doing a bunch of streaming transcodes, and wanted that in an event loop instead of blocking... but
a) you're probably going to want to control the number of simultaneous streams to a low enough number that you could just fork
b) the responsible thing to do when decoding streams with ffmpeg is to disable all formats except your whitelisted format, but still sandbox the heck out of it, because there have been a lot of CVEs where a crafted input allows remote code execution
Sandboxing is going to be much more complete if the ffmpeg process is only dealing with one input fd, one output fd (and maybe an error reporting fd), and no network or filesystem access: you don't want a decoder error to influence media you're encoding/decoding for another user.
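To make that concrete, a sketch of that kind of lockdown (the MP3-to-WAV case is just an example, and the actual sandboxing via seccomp, namespaces, etc. goes on top of this):

    # Build time: compile ffmpeg with everything off except the one codepath you need
    # (depending on the conversion you may also need e.g. --enable-filter=aresample)
    ./configure --disable-everything --enable-demuxer=mp3 --enable-decoder=mp3 \
        --enable-muxer=wav --enable-encoder=pcm_s16le --enable-protocol=pipe

    # Run time: force the demuxer instead of letting ffmpeg probe, and touch only fds
    ffmpeg -f mp3 -i pipe:0 -f wav pipe:1 < untrusted.mp3 > decoded.wav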