> We can divide by a number (scaling_factor) to scale down its magnitude to the right level
This argument bugs me a bit... since these numbers are represented using floating point, whose relative precision does not depend on their magnitude, what is the point of scaling them?
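To illustrate what I mean (a quick sketch using Octave's built-in eps, which returns the spacing between adjacent doubles near a given value): the relative spacing stays around 2.2e-16 regardless of magnitude.

eps(1)               % spacing near 1: 2.2204e-16
eps(1e100)           % spacing near 1e100: ~1.9427e+84
eps(1e100) / 1e100   % relative spacing: ~1.9e-16, same order as eps(1)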
Furthermore, I do not believe his first example. Is torch really that bad? In Octave:
x = randn(512, 1);   % random input vector
A = randn(512);      % random 512x512 matrix
y = A^100 * x;       % apply the same random linear map 100 times
mean(y), std(y)
this gives regular numbers (9.1118e+135 and 1.9190e+137). They are large, but far from overflowing. And this corresponds to a network 100 layers deep, which is not a realistic scenario anyway.
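For reference, a rough check of the remaining headroom (again in Octave, which uses double precision by default):

realmax              % largest finite double: 1.7977e+308
log10(realmax)       % ~308.25, so 1.9e+137 is ~171 orders of magnitude below overflow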