As I understand it (or maybe I should say "in my opinion"), the magic of the Gaussian distribution lies in the two assumptions you make. You want a rotationally invariant answer (so X and Y are related), but you also assume the distributions in X and Y are independent. And these are valid things to assume.
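To spell out why those two assumptions pin down the shape, here is a sketch of the standard argument. The symbols f, g, h, c and sigma are my own notation, and I am assuming the density is continuous and nonzero at the origin.

```latex
% Independence:          p(x, y) = f(x) f(y)
% Rotational invariance: p(x, y) = g(x^2 + y^2)
\begin{align*}
  f(x)\, f(y) &= g(x^2 + y^2) \\
  \text{set } y = 0: \quad g(x^2) &= f(0)\, f(x) \\
  \text{define } h(u) = g(u) / f(0)^2: \quad h(u + v) &= h(u)\, h(v) \\
  \text{continuous solutions:} \quad h(u) &= e^{c u} \\
  \text{normalizable only if } c = -\tfrac{1}{2\sigma^2} < 0: \quad
  f(x) &= \frac{1}{\sigma \sqrt{2\pi}}\, e^{-x^2 / (2\sigma^2)}
\end{align*}
```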
The Gaussian distribution is not a particularly good representation of most real problems, in the sense that the probability of large errors decreases far too rapidly. Maybe there are ideal cases you can call Gaussian, but in any real problem there are outliers of some kind. We go into a calculation assuming we have Gaussian noise, but really we don't, and we have to add additional logic to handle these "outlier" cases.
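To put rough numbers on "far too rapidly", here is a small sketch that computes the probability a Gaussian assigns to errors larger than k standard deviations. The function name `two_sided_tail` is just something I made up for illustration.

```python
import math

def two_sided_tail(k):
    """Probability that a Gaussian error exceeds k standard deviations
    in either direction, i.e. P(|x - center| > k * spread)."""
    return math.erfc(k / math.sqrt(2.0))

if __name__ == "__main__":
    for k in (1, 2, 3, 5, 8):
        print(f"{k} sigma: {two_sided_tail(k):.3e}")
```

At 3 sigma this is roughly 0.27%, and by 5 sigma it is below one in a million, which is why a single bad reading can wreck an estimate that takes the Gaussian assumption literally.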
The thing that is magic is that the Gaussian distribution factorizes. If we are updating the state of a system after taking a measurement, then as long as the system had Gaussian errors and the measurement has a Gaussian error, the system after the measurement will still be Gaussian. We can parameterize our errors with two numbers, the center and the spread.
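Here is a minimal one-variable sketch of that update, assuming the prior and the measurement are independent Gaussian estimates of the same quantity. The names `fuse`, `gaussian_pdf` and `max_shape_error` are mine, not from any library. The last function checks numerically that multiplying the two densities and renormalizing really does give back a Gaussian with the predicted center and spread.

```python
import math

def fuse(prior_mean, prior_var, meas_mean, meas_var):
    """Combine a Gaussian prior with an independent Gaussian measurement.

    Returns (mean, variance) of the resulting Gaussian.
    """
    # Precisions (inverse variances) add.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / meas_var)
    # The new center is the precision-weighted average of the two centers.
    post_mean = post_var * (prior_mean / prior_var + meas_mean / meas_var)
    return post_mean, post_var

def gaussian_pdf(x, mean, var):
    """Density of a Gaussian with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def max_shape_error(prior, meas, lo=-20.0, hi=20.0, step=0.01):
    """Largest gap, on a grid, between the normalized product of the two
    densities and the Gaussian predicted by fuse()."""
    n = int(round((hi - lo) / step)) + 1
    xs = [lo + i * step for i in range(n)]
    product = [gaussian_pdf(x, *prior) * gaussian_pdf(x, *meas) for x in xs]
    norm = sum(product) * step  # Riemann-sum normalization
    fused = fuse(*prior, *meas)
    return max(abs(p / norm - gaussian_pdf(x, *fused)) for x, p in zip(xs, product))

if __name__ == "__main__":
    prior = (0.0, 4.0)        # (mean, variance) before the measurement
    measurement = (3.0, 1.0)  # (mean, variance) of the measurement
    print(fuse(*prior, *measurement))           # mean ≈ 2.4, variance ≈ 0.8
    print(max_shape_error(prior, measurement))  # should be tiny
```

The state before and after the measurement is the same two numbers; nothing else has to be stored.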
If we didn't have this factorization, the distribution would change shape after the measurement. We would have to keep a ton more information, and the amount grows geometrically with the number of variables we have. That is just intractable.
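A back-of-the-envelope comparison, as a sketch: I am assuming the alternative is storing an arbitrary distribution on a grid with a fixed number of bins per variable, and that the Gaussian in several variables is stored as a mean vector plus a symmetric covariance matrix. Both function names are hypothetical.

```python
def gaussian_param_count(n_vars):
    """Numbers needed for a Gaussian in n_vars variables:
    n means plus n*(n+1)/2 independent covariance entries."""
    return n_vars + n_vars * (n_vars + 1) // 2

def grid_cell_count(n_vars, bins_per_var):
    """Numbers needed to tabulate an arbitrary distribution on a grid."""
    return bins_per_var ** n_vars

if __name__ == "__main__":
    for n in (1, 2, 3, 6):
        print(n, gaussian_param_count(n), grid_cell_count(n, 100))
```

With six variables and 100 bins each, the grid needs 10^12 numbers while the Gaussian needs 27.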
So, as I see it at least, we use this distribution because we can, more so than because it is the correct one. (But, of course, it does still work pretty well!)