A simple proof that you can't do better than O(n^2): the output is an n x n matrix with n^2 entries, so you need at least Ω(n^2) time just to write out the result.
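A sketch of that counting argument in Python (the `entry` callback here is a made-up stand-in for however an algorithm computes each product entry): no matter how cheap `entry` is, just materializing the output costs n^2 writes.

    def write_product(entry, n):
        # Fill an n x n output matrix. The doubly nested loop alone
        # performs n * n writes, so any algorithm that returns the
        # full product needs at least Omega(n^2) time.
        out = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                out[i][j] = entry(i, j)  # one write per cell
        return out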
The problem is defined over all (real-valued?) n x n matrices. It's fine if your method uses a representation other than the typical row-major or column-major form, but its performance is going to be measured over the worst case.
No, it doesn’t. You still need to produce n² zeroes for the result. If you multiply two n × n matrices, the result isn’t a vector; it’s an n × n matrix.
By that logic, every algorithm is O(1): first restrict it to a single input, then “correctly define” the output for that input as some fixed-length string, and you’re golden.
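A toy sketch of that fallacy (the input and the answer here are both made up): fix one input, bake in its answer, and the “algorithm” is trivially constant time.

    FIXED_INPUT = (3, 5)          # the single input we allow (made up)
    PRECOMPUTED_ANSWER = 15       # its output, hard-coded as a constant

    def solve(x):
        # Constant-time "algorithm": no computation, just a lookup.
        # This says nothing about the problem's real complexity, which
        # only makes sense over an unbounded family of inputs.
        if x != FIXED_INPUT:
            raise ValueError("only defined for the one fixed input")
        return PRECOMPUTED_ANSWER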
Thus the importance of a good definition of 'reasonable'. I can't remember the one from that course 10 years ago.
However, yes, there are big differences based on the encoding. Nobody would accept a unary input encoding used to artificially inflate the size of the input. It's also why the time complexity of primality testing is sometimes confusing.
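For example (standard trial division, as a sketch): the running time is polynomial in the value n but exponential in the binary input length, which is exactly the unary-vs-binary confusion.

    import math

    def is_prime(n: int) -> bool:
        # Trial division: about sqrt(n) divisions. Polynomial in the
        # *value* n, but the binary input has b = log2(n) bits, so the
        # cost is roughly 2**(b/2): exponential in the input size.
        # Unary encoding would make the input itself n symbols long,
        # making the same algorithm look polynomial only by inflating
        # the input.
        if n < 2:
            return False
        for d in range(2, math.isqrt(n) + 1):
            if n % d == 0:
                return False
        return True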