
hey, author here

Thanks for the feedback.

"applying a similar technique in the frequency domain", "Maybe training an image reconstructor on the short term spectrogram" - This is what I originally planned to do. However, this approach suffers from information loss whenever you transform from the frequency domain back to the time domain. Since the goal was super-resolution in the time domain, working directly in the time domain is more sensible.



Mathematically, the DFT is invertible, i.e. lossless, but in practice there will be a bit of loss due to the finite precision of floating-point numbers. Even so, that loss should be minuscule compared to the 16 kHz -> 2 kHz loss you are trying to overcome.
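A quick way to check the size of that floating-point loss, as a sketch assuming NumPy and a random signal standing in for real audio: run a forward and inverse FFT and compare the error against the quantization step of 16-bit audio (2/2^16 for samples scaled to [-1, 1]).

```python
import numpy as np

def roundtrip_error(x: np.ndarray) -> float:
    """Max absolute error after a forward and inverse real FFT (NumPy's DFT)."""
    X = np.fft.rfft(x)                  # forward DFT of a real signal
    x_rec = np.fft.irfft(X, n=len(x))   # inverse; n= restores the original length
    return float(np.max(np.abs(x - x_rec)))

rng = np.random.default_rng(0)
# Stand-in for one second of audio at an assumed 16 kHz sample rate, in [-1, 1].
signal = rng.uniform(-1.0, 1.0, size=16000)
err = roundtrip_error(signal)

# Quantization step of 16-bit PCM scaled to [-1, 1], for comparison.
quant_step = 2.0 / 2**16
print(f"round-trip error: {err:.2e}, 16-bit step: {quant_step:.2e}")
```

The round-trip error typically lands many orders of magnitude below the 16-bit step, which supports the point that numerical loss is not the real obstacle.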


The problem with the DFT is not whether it's lossless or not, it's that it may not be the best feature representation for a given task.

Both the DFT and the proposed model apply convolutions to the input, but in the former case, these are fixed, while in the latter, they are learned.

This is similar to how we don't use hard-coded features like SIFT, wavelets, or Gabor filters when we do image classification with a CNN.
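To make the "fixed vs. learned" contrast concrete: each DFT bin is the inner product of the input window with a fixed complex exponential, and sliding that window along the signal (as in an STFT) turns each row into a fixed convolution filter. The sketch below, assuming NumPy, builds that fixed filter bank explicitly and checks it against `np.fft.fft`; `W_learned` is a hypothetical stand-in for a trainable filter bank, randomly initialized here rather than trained.

```python
import numpy as np

def dft_matrix(n: int) -> np.ndarray:
    """Fixed DFT basis: row k is the complex exponential filter for bin k."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    t = np.arange(n).reshape(1, -1)   # time index (columns)
    return np.exp(-2j * np.pi * k * t / n)

n = 64
rng = np.random.default_rng(1)
x = rng.standard_normal(n)

W_fixed = dft_matrix(n)               # hand-designed, never updated
X = W_fixed @ x                       # each bin = inner product with one fixed filter
print(np.allclose(X, np.fft.fft(x)))  # same result as NumPy's FFT

# Hypothetical learned front end: same shape of computation, but in a real
# model these weights would be parameters updated by gradient descent.
W_learned = rng.standard_normal((n, n)) / np.sqrt(n)
features = W_learned @ x
```

The only structural difference is whether the weights are frozen at the Fourier basis or left free for the optimizer to tune to the task.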


It's not precision loss; it's that when you take a DFT you have to choose an interval (window length). If you choose a short interval, you are less certain about frequencies, while if you choose a long interval, you are less certain about changes in the time domain (i.e., changes in the signal within that interval).

Funnily enough, this is similar to Heisenberg's uncertainty principle; you can read about it here: http://fourier.eng.hmc.edu/e101/lectures/Fourier_Analysis/no...
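A rough numeric illustration of that tradeoff, as a sketch assuming a 16 kHz sample rate: for an N-sample window, the window lasts N/fs seconds and the DFT bins are fs/N Hz apart, so their product is always 1. Bin spacing is only a coarse proxy for resolution (the window shape matters too), but it shows why shrinking one side grows the other.

```python
def stft_resolution(sample_rate_hz: float, window_len: int) -> tuple[float, float]:
    """Return (window duration in s, DFT bin spacing in Hz) for a window_len-sample window."""
    duration_s = window_len / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / window_len
    return duration_s, bin_spacing_hz

fs = 16000.0  # assumed sample rate, matching the 16 kHz figure in the thread
for n in (64, 256, 1024, 4096):
    dt, df = stft_resolution(fs, n)
    # dt * df is 1 for every N: better time resolution means worse frequency resolution
    print(f"N={n:5d}  window={dt * 1000:7.2f} ms  bin spacing={df:8.2f} Hz  product={dt * df:.1f}")
```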


When using DL, perhaps you could try a downsampling scheme that suits your DL model?

I mean, yes, it would be awesome to use your network to upsample stuff, but that is apparently hard. What about upsampling something DL-friendly, and making the challenge to reduce the downsampled size?
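One way to read that suggestion is to treat the downsampling factor as the knob you gradually push up. The sketch below assumes NumPy and uses a hypothetical helper, `make_pair`, that builds (low-res input, high-res target) training pairs by block-averaging, which is a crude low-pass plus decimation rather than a proper anti-aliasing filter.

```python
import numpy as np

def make_pair(hi_res: np.ndarray, factor: int) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: build a (low-res input, high-res target) pair.

    Averages each block of `factor` consecutive samples, a crude low-pass
    filter plus decimation, instead of a proper anti-aliasing filter.
    """
    usable = len(hi_res) - len(hi_res) % factor        # trim so length divides evenly
    target = hi_res[:usable]
    low_res = target.reshape(-1, factor).mean(axis=1)  # one sample per block
    return low_res, target

rng = np.random.default_rng(2)
audio = rng.standard_normal(16000)  # stand-in for 1 s of 16 kHz audio

# Start with a mild factor and try to push it up; 8 corresponds to 16 kHz -> 2 kHz.
for factor in (2, 4, 8):
    lo, hi = make_pair(audio, factor)
    print(factor, lo.shape, hi.shape)
```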


Since time-domain content is the reconstruction target, wouldn't LSTMs be a better choice than CNNs? I would think the spectral content would be time-variant and depend on the sequential history.
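One way to frame the CNN vs. LSTM question: a CNN's output depends only on a bounded window of input samples (its receptive field), while an LSTM carries a recurrent state with no fixed horizon. The sketch below computes that window for a stack of stride-1 1-D convolutions; the layer counts and kernel sizes are illustrative assumptions, not the architecture from the post. It also shows that dilated convolutions can cover much longer history with the same number of layers.

```python
def receptive_field(kernel_sizes: list[int], dilations: list[int]) -> int:
    """Receptive field (in samples) of a stack of stride-1 1-D convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d  # each layer widens the window by (k - 1) * dilation
    return rf

# Plain stack: 10 layers, kernel 3, no dilation.
print(receptive_field([3] * 10, [1] * 10))                     # 21
# Same depth with dilations doubling per layer (1, 2, 4, ..., 512).
print(receptive_field([3] * 10, [2 ** i for i in range(10)]))  # 2047
```

At 16 kHz, 2047 samples is roughly 128 ms of context, so whether that is "enough history" depends on how far back the relevant structure in the signal extends.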



