Let me get this straight: you're saying that for most weights the gradient won't point towards the minimum (which makes sense, since the weights are highly correlated), but for certain groups of weights the gradient is well aligned with the eigenvector, so that group takes a large step toward the optimum and becomes the "winning ticket".
Well, maybe, but won't the gradient become misaligned just after the first step?
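A toy quadratic makes the alignment idea from the first paragraph concrete. This is only a sketch: the fixed, ill-conditioned Hessian `H` is a hypothetical stand-in for a real network's loss surface near an optimum, NumPy is assumed, and the helper names are made up for illustration. When the displacement from the optimum lies along an eigenvector of `H`, the negative gradient points straight at the minimum, and a step of size 1/λ (λ being that eigenvalue) lands on it. For a generic displacement, the stiff direction dominates the gradient.

```python
import numpy as np

# Hypothetical toy loss: f(w) = 0.5 * (w - w_star)^T H (w - w_star).
# H is ill-conditioned (eigenvalues 100 and 1), mimicking strongly coupled directions.
H = np.diag([100.0, 1.0])
w_star = np.zeros(2)


def grad(w):
    """Gradient of the quadratic loss at w."""
    return H @ (w - w_star)


def alignment(w):
    """Cosine between the descent direction (-grad) and the direction to the minimum."""
    g = -grad(w)
    d = w_star - w
    return float(g @ d / (np.linalg.norm(g) * np.linalg.norm(d)))


def one_step(w, lr):
    """A single plain gradient-descent step."""
    return w - lr * grad(w)


w_aligned = np.array([1.0, 0.0])  # displacement along the eigenvector with eigenvalue 100
w_generic = np.array([1.0, 1.0])  # displacement mixing both eigen-directions

print("aligned:", alignment(w_aligned), one_step(w_aligned, lr=1 / 100))
print("generic:", alignment(w_generic), one_step(w_generic, lr=1 / 100))
```

With the same step size, the aligned point reaches the optimum in one step, while the generic point only removes its stiff component and is left far from `w_star` along the flat direction.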