"applying a similar technique in the frequency domain", "Maybe training an image reconstructor on the short term spectrogram" - This is what I originally thought to do. However, this approach suffers from information loss whenever you transform from the frequency domain back to the time domain. Since the goal was super-resolution in the time domain, working in the time domain is more sensible.
Mathematically, the DFT is invertible, i.e., lossless; in practice there will be a small amount of loss due to the finite precision of floating-point numbers. Even though it isn't perfectly lossless, that loss should be minuscule compared to the 16 kHz -> 2 kHz loss you are trying to overcome.
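To put a number on that, here is a minimal sketch (assuming NumPy; the signal is synthetic noise standing in for audio) that round-trips a signal through the DFT and measures the floating-point reconstruction error:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)          # one second of synthetic 16 kHz "audio"

# Forward DFT, then inverse DFT; any difference is pure floating-point error.
x_roundtrip = np.fft.ifft(np.fft.fft(x)).real
print(np.max(np.abs(x - x_roundtrip)))  # typically ~1e-13, far below audible loss
```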
It's not precision loss, it's that when you DFT you choose an interval. If you choose a short interval you are less certain about frequencies, while if you choose a long interval you are less certain about time-domain changes (i.e., changes in the signal over your window).
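A hedged illustration of that trade-off using SciPy's STFT (the window sizes here are arbitrary choices of mine): a short window gives coarse frequency bins, a long window gives coarse time steps.

```python
import numpy as np
from scipy.signal import stft

fs = 16000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s of a 440 Hz tone

for nperseg in (64, 4096):
    f, t, Zxx = stft(x, fs=fs, nperseg=nperseg)
    # f[1] is the frequency resolution; t[1]-t[0] is the time step between frames.
    print(f"window={nperseg}: frequency bin = {f[1]:.1f} Hz, "
          f"hop between frames = {t[1] - t[0]:.4f} s")
```

With the short window the frequency bins are 250 Hz wide but frames are 2 ms apart; with the long window the bins shrink to about 3.9 Hz while the frames spread to 128 ms. You can move the trade-off around, but not escape it.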
When using DL, perhaps you could try a downsampling scheme that suits your DL model?
I mean, yes, it would be awesome to use your network to upsample arbitrary audio, but that is apparently hard. What about upsampling something DL-friendly and treating the reduction of the downsample size as the challenge?
Since time-domain content is the reconstruction target, wouldn't LSTMs be a better choice than CNNs? I would think the spectral content is time-variant and depends on the sequential history (rough sketch below).
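For what it's worth, a hypothetical sketch of what that could look like (PyTorch; the class name, layer sizes, and upsampling ratio are my own assumptions, not from the original post): an LSTM walks the low-rate waveform and predicts a block of high-rate samples per input step.

```python
import torch
import torch.nn as nn

class LSTMUpsampler(nn.Module):
    """Hypothetical LSTM upsampler: predicts `ratio` output samples per input sample."""
    def __init__(self, ratio=8, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, ratio)   # ratio high-rate samples per step

    def forward(self, x):                      # x: (batch, time, 1) at 2 kHz
        h, _ = self.lstm(x)                    # hidden state carries sequential history
        y = self.head(h)                       # (batch, time, ratio)
        return y.reshape(x.size(0), -1, 1)     # (batch, time * ratio, 1) at 16 kHz

x = torch.randn(4, 250, 1)                     # 0.125 s of 2 kHz input
print(LSTMUpsampler()(x).shape)                # torch.Size([4, 2000, 1])
```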
Thanks for the feedback.
"applying a similar technique in the frequency domain", "Maybe training an image reconstructor on the short term spectrogram" - This is what I originally thought to do. However, this approach suffers from information loss whenever you transform from the frequency domain back to the time domain. Since the goal was super-resolution in the time domain, working in the time domain is more sensible.