"applying a similar technique in the frequency domain", "Maybe training an image reconstructor on the short term spectrogram" - This is what I originally thought to do. However, this approach suffers from information loss whenever you transform from the frequency domain back to the time domain. Since the goal was super-resolution in the time domain, working in the time domain is more sensible.
Mathematically, the DFT is invertible, i.e., lossless; in practice there will be a small amount of loss due to the finite precision of floating-point numbers. Even though it isn't perfectly lossless, that loss should be minuscule compared to the 16 kHz -> 2 kHz loss you are trying to overcome.
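To put a number on that, here is a minimal sketch (assuming NumPy; the signal is synthetic noise standing in for audio) that round-trips a signal through the DFT and measures the floating-point reconstruction error:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)          # one second of synthetic 16 kHz "audio"

# Forward DFT, then inverse DFT; any difference is pure floating-point error.
x_roundtrip = np.fft.ifft(np.fft.fft(x)).real
print(np.max(np.abs(x - x_roundtrip)))  # typically ~1e-13, far below audible loss
```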
It's not precision loss, it's that when you DFT you choose an interval. If you choose a short interval you are less certain about frequencies, while if you choose a long interval you are less certain about time-domain changes (i.e., changes in the signal over your window).
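A hedged illustration of that trade-off using SciPy's STFT (the window sizes here are arbitrary choices of mine): a short window gives coarse frequency bins, a long window gives coarse time steps.

```python
import numpy as np
from scipy.signal import stft

fs = 16000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s of a 440 Hz tone

for nperseg in (64, 4096):
    f, t, Zxx = stft(x, fs=fs, nperseg=nperseg)
    # f[1] is the frequency resolution; t[1]-t[0] is the time step between frames.
    print(f"window={nperseg}: frequency bin = {f[1]:.1f} Hz, "
          f"hop between frames = {t[1] - t[0]:.4f} s")
```

With the short window the frequency bins are 250 Hz wide but frames are 2 ms apart; with the long window the bins shrink to about 3.9 Hz while the frames spread to 128 ms. You can move the trade-off around, but not escape it.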
When using DL, perhaps you could try a downsampling scheme that suits your DL model?
I mean, yes, it would be awesome to use your network to upsample arbitrary audio, but that is apparently hard. What about upsampling something DL-friendly and treating the reduction of the downsample size as the challenge?
Since time-domain content is the reconstruction target, wouldn't LSTMs be a better choice than CNNs? I would think the spectral content is time-variant and depends on the sequential history (rough sketch below).
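For what it's worth, a hypothetical sketch of what that could look like (PyTorch; the class name, layer sizes, and upsampling ratio are my own assumptions, not from the original post): an LSTM walks the low-rate waveform and predicts a block of high-rate samples per input step.

```python
import torch
import torch.nn as nn

class LSTMUpsampler(nn.Module):
    """Hypothetical LSTM upsampler: predicts `ratio` output samples per input sample."""
    def __init__(self, ratio=8, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, ratio)   # ratio high-rate samples per step

    def forward(self, x):                      # x: (batch, time, 1) at 2 kHz
        h, _ = self.lstm(x)                    # hidden state carries sequential history
        y = self.head(h)                       # (batch, time, ratio)
        return y.reshape(x.size(0), -1, 1)     # (batch, time * ratio, 1) at 16 kHz

x = torch.randn(4, 250, 1)                     # 0.125 s of 2 kHz input
print(LSTMUpsampler()(x).shape)                # torch.Size([4, 2000, 1])
```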
Thanks for the feedback.
"applying a similar technique in the frequency domain", "Maybe training an image reconstructor on the short term spectrogram" - This is what I originally thought to do. However, this approach suffers from information loss whenever you transform from the frequency domain back to the time domain. Since the goal was super-resolution in the time domain, working in the time domain is more sensible.