Can someone explain to me why FFmpeg seems to be the only open-source software that can do even basic things with audio?
I was looking at getting the waveform graph for a piece of audio a while ago, and not only was FFmpeg the only option I found that could do it, it was amazingly fast and free as well.
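For reference, the kind of command I ended up with (going from memory, so the filter options may need checking; the size and filenames are arbitrary):

    # Render the waveform of an audio file to a PNG
    ffmpeg -i input.mp3 -filter_complex "showwavespic=s=1024x240" -frames:v 1 waveform.png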
SoX isn’t perfect, but it’s way easier to use for audio work than FFmpeg is. I have a bunch of scripts I reuse that do basic stuff: high-pass filtering, normalizing, automatically trimming audio files, adding fade-in or fade-out, downmixing to mono, and then resampling / dithering to the right bit depth and sample rate.
It also will spit out spectrograms.
Generally when I need to record a ton of sound clips, I chop the audio up and rename it in a GUI editor similar to Audacity, and then do all the processing in SoX. I might also do a bunch of work in a DAW beforehand.
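The core of those scripts is basically a loop like this (a sketch: every filename, frequency, and level here is a placeholder):

    # High-pass, normalize, fade in, downmix to mono, resample, and dither to 16-bit
    mkdir -p processed
    for f in clips/*.wav; do
        sox "$f" -b 16 "processed/$(basename "$f")" \
            highpass 80 norm -1 fade t 0.05 remix - rate 44100 dither
    done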
The man pages are chock full of examples too, which is great because the tool does a lot. Some of the examples are really interesting, such as the delay effect showing how to synthesise a guitar chord.
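That one goes roughly like this (quoting from memory, so the exact notes and timings may differ from the man page):

    # Synthesise six plucked strings and stagger them with delay to strum a chord
    play -n synth pl G2 pl B2 pl D3 pl G3 pl D4 pl G4 \
        delay 0 .05 .1 .15 .2 .25 remix - fade 0 4 .1 norm -1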
I use an audio player built largely around sox¹, and it lets you take advantage of sox's power directly.
SoX is amazing because it does indeed make very nice spectrograms, which show visually how the audio was encoded. That makes it easy to see whether a file really is lossless FLAC or comes from a crappy 192 kbps VBR MP3 source.
Whether you can personally hear the difference is a completely different subject, of course.
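For anyone who wants to try it, the basic invocation is just the spectrogram effect with a null audio output (filenames are placeholders):

    # Render a spectrogram PNG; -n discards the audio itself
    sox suspect.flac -n spectrogram -o suspect.png

Transcodes from lossy sources typically show up as a hard ceiling in the high frequencies of the image.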
I hadn't even thought about SoX in about 10 years until your comment. And looking at the page, there hasn't been a new release since 2015.
From what I recall, it only worked on wav files back in the day, but now it supports OGG. A lot has changed in even 5 years, though: does it even support MP3 now that the patents have expired?
> From what I recall, it only worked on wav files back in the day
It depends on your build, but on my system it supports: 8svx aif aifc aiff aiffc al amb amr-nb amr-wb anb au avr awb caf cdda cdr cvs cvsd cvu dat dvms f32 f4 f64 f8 fap flac fssd gsm gsrt hcom htk ima ircam la lpc lpc10 lu mat mat4 mat5 maud mp2 mp3 nist ogg paf prc pvf raw s1 s16 s2 s24 s3 s32 s4 s8 sb sd2 sds sf sl sln smp snd sndfile sndr sndt sou sox sph sw txw u1 u16 u2 u24 u3 u32 u4 u8 ub ul uw vms voc vorbis vox w64 wav wavpcm wv wve xa xi. You can check your own with `sox --help`.
I just use SoX for processing audio data, and then pass the result to LAME if I want an MP3. Each format has so many different options for encoding and metadata anyway. It’s not like video, where the sheer amount of data discourages you from working uncompressed.
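The handoff is just a pipe, something like this (a sketch; the filenames and the effect are placeholders):

    # SoX does the processing and streams WAV to stdout; LAME handles MP3 encoding
    sox input.flac -t wav - norm -1 | lame - output.mp3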
Sure, there hasn’t been a new release since 2015… but would that be necessary? It’s not missing any features I want.
It's not important that it doesn't support MP3. That's not its purpose; it doesn't need to. The Unix philosophy: feel free to pipeline it on either side with tools that do support MP3.
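For example, MP3 in and MP3 out, with sox only ever seeing uncompressed WAV on its stdin/stdout (a sketch; any MP3-capable decoder and encoder will do):

    ffmpeg -i input.mp3 -f wav - | sox -t wav - -t wav - highpass 80 | lame - output.mp3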
according to https://github.com/chirlu/sox/commit/af261dcc91071cafd7d8305..., sox added support for Ogg Vorbis files in 2001, which is a little more than 5 years ago. since sox didn't exist until 1999 and vorbis didn't exist until 2000, that seems like pretty solid format support to me.
In addition to what others have said, there's also gstreamer and its suite of plugins. I find gstreamer a bit easier to work with, although both are very complex pieces of software and each has its own quirks.
If you're looking to do audio production work, there's Ardour, although I haven't used it myself. http://ardour.org/
Indeed it does; it's about as complex as ffmpeg, and in my opinion it has a somewhat more intuitive interface for building up complicated pipelines of processing steps. For example, decoding and playing an Ogg Vorbis file element by element looks something like this (from memory, so double-check the element names):
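    gst-launch-1.0 filesrc location=input.ogg ! oggdemux ! vorbisdec ! \
        audioconvert ! audioresample ! autoaudiosink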
You can use gst_parse_launch to create a pipeline using the launch syntax.
I've found it helpful to prototype with gst-launch-1.0 and then pull the pipeline into a separate program down the road; trying to create and link all the individual elements manually in complex pipelines got pretty hairy.
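Concretely, the workflow is something like this (a sketch; the caps and element names are just an example):

    # Resample a WAV to 16 kHz. Once this works on the CLI, the part after
    # "gst-launch-1.0" can be handed verbatim to gst_parse_launch() in your program.
    gst-launch-1.0 filesrc location=input.wav ! wavparse ! audioconvert ! \
        audioresample ! audio/x-raw,rate=16000 ! wavenc ! filesink location=output.wav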
My DAW is bash+sox+ecasound because I don't want to be distracted by visuals when working with audio. However, I just started working on a project involving about 15 hours of digital audio recorded under less-than-ideal circumstances a couple of decades ago, and I need a reliable way to analyze the data. SoX's spectrograms are insufficient for my needs, and I've had reliability issues with Audacity. So far, DFasma looks very promising.
What ever happened to Facebook's (or was it Netflix?) technology to create a new unit of time measurement to help align audio and video files? I believe it was called a "flick"...
Audacity is a great GUI for working with audio files. I would think it has a way to export a graph of the wave that it shows you when you open up an audio file.
You can install an FFmpeg plugin for Audacity if you need broader support for audio formats (either import or export).
- You don't trust this much complex logic, written in C and taking untrusted input, and want to rewrite it in Rust.
- You want to code it all again using an API that doesn't expect to get its input from a blocking read() function.
- ...
I think the main reason there isn't any alternative is that it supports soooo many formats that the task seems impossible to anybody thinking about it.
> - You want to code it all again using an API that doesn't expect to get its input from a blocking read() function.
In which real-world situation/scenario is this a problem? It's hard to think of one, but I'm probably missing something.
In any case, if that were a real show-stopper, it would probably be much wiser to go with a fork that modifies that one thing, instead of rewriting the whole project.
I could see it being an issue if you were doing a bunch of streaming transcodes, and wanted that in an event loop instead of blocking... but
a) you're probably going to want to control the number of simultaneous streams to a low enough number that you could just fork
b) the responsible thing to do when decoding streams with ffmpeg is to disable all formats except your whitelisted format, but still sandbox the heck out of it, because there have been a lot of CVEs where a crafted input allows remote code execution
Sandboxing is going to be much more complete if the ffmpeg process is only dealing with one input fd, one output fd (and maybe an error reporting fd), and no network or filesystem access: you don't want a decoder error to influence media you're encoding/decoding for another user.
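To make that concrete, a sketch of that kind of lockdown (the MP3-to-WAV case is just an example, and the actual sandboxing via seccomp, namespaces, etc. goes on top of this):

    # Build time: compile ffmpeg with everything off except the one codepath you need
    # (depending on the conversion you may also need e.g. --enable-filter=aresample)
    ./configure --disable-everything --enable-demuxer=mp3 --enable-decoder=mp3 \
        --enable-muxer=wav --enable-encoder=pcm_s16le --enable-protocol=pipe

    # Run time: force the demuxer instead of letting ffmpeg probe, and touch only fds
    ffmpeg -f mp3 -i pipe:0 -f wav pipe:1 < untrusted.mp3 > decoded.wav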