
Hey OP here!

When I was first introduced to matrices (high school) it was in the context of systems of equations. Matrices were a shorthand for writing out the equations and happened to have interesting rules for addition etc. It took me a while to think about them as functions in their own right and not just tables. This post is my attempt to relearn them as functions, which has helped me develop a much stronger intuition for linear algebra. That’s my motivation for this post and why I decided to work on it. Feedback is more than welcome.



What got me for a while was the concept of a tensor:

For example: What is a tensor?

Wrong way to answer it: Well, the number 5 is a tensor. So's a row vector. So's a column vector. So's the dot product and the cross product. So's a two-dimensional matrix. So's a four-dimensional matrix, just... don't ask me to write one on the board, eh? So's this Greek letter with smaller Greek letters arranged on its top right and bottom right. Literally anything you can think of is a tensor, now... try to find some conceptual unity.

Then coordinate-free fanaticism kicked in, robbing the purported explanations of any explanatory power in terms of practical applications of tensors. The only thing they could do was shift indices around.

What finally made it stick is decomposing every mathematical concept into three parts:

1. Intuition, or why we have the concept to begin with.

2. Definitions, or the axioms which "are" the concept in some formal sense.

3. Implementations, or how we write specific instances of the concept down, including things like the source code of software which implements the concept.


If you ask a mathematician, a tensor is an element of a tensor product, just like a vector is an element of a vector space. This moves the question to "what is a tensor product", which you can think of as a way to turn bilinear maps into linear maps (this is an informal statement of the universal property of the tensor product; you also need a proof of existence of such an object, but it's easy for vector spaces and alright for modules after seeing enough algebra).


Crikey, I hope I never have to talk to that mathematician! That's a terse, unintuitive definition that isn't very helpful unless you're already familiar with the concepts. (Also maybe you meant linear maps into bilinear?)

Reminds me of the time an algebraist mentioned to me that he was working on profinite group theory. I asked what a profinite group was, and he immediately replied 'an inverse limit of an inverse system', with no follow up. Well thanks buddy, that really opened my eyes.


Math is just a much deeper topic than most others. The things people do in research level math can take a really long time to explain to a lay person because of the many layers of abstraction involved.


It is a very deep and specialised topic. However, there are ways to convey intuition to a 'mathematically mature' audience, and there are quick definitions that are correct but unenlightening. I much prefer the former :)


No, it turns bilinear maps into linear ones! If you have three R-modules (one can read K-vector spaces if unfamiliar with modules) N, M, P and a bilinear map N×M→P, then there is a unique linear map N⊗M→P compatible with the map N×M→N⊗M which is part of the structure of a tensor product. (What's really going on here in fancy terms is the so-called Hom-Tensor adjunction, because the _⊗M functor is adjoint to the Hom(M,_) functor, but just thinking about bilinear and linear maps is much clearer.)
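
If it helps to see it concretely, here's a rough numpy sketch (my own toy notation, with P = R so the bilinear map is scalar-valued): a bilinear map b(x, y) factors through the tensor product, represented here by the Kronecker product of the two vectors.

    import numpy as np

    n, m = 3, 2
    rng = np.random.default_rng(0)

    # a bilinear map b(x, y) = x^T A y on R^n x R^m, encoded by a matrix A
    A = rng.standard_normal((n, m))
    def b(x, y):
        return x @ A @ y

    # the induced *linear* map on R^n (x) R^m, where the element x (x) y
    # is represented by the Kronecker product kron(x, y)
    L = A.reshape(-1)
    def b_linear(t):
        return L @ t

    x, y = rng.standard_normal(n), rng.standard_normal(m)
    assert np.isclose(b(x, y), b_linear(np.kron(x, y)))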


Ok fair enough, I think I get you.


I think the ideas behind the coordinate-free formulation of tensor calculus make it relatively easy though.

A tensor is a function that takes an ordered set of N covariant vectors (i.e. row vectors) and M contravariant vectors (i.e. column vectors) and spits out a real number. It has to be linear in each of its arguments.

I'm pretty sure all the complicated transforms follow from that definition (though you may have to assume the Leibniz rule - I can't remember), and from ordinary calculus.
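
If it helps, here's a rough numpy sketch of that definition for a small (1,2)-tensor (my own toy example), checking linearity in one slot:

    import numpy as np

    # a (1,2)-tensor on R^3: eats one covector (row) and two vectors (columns)
    # and returns a number, linearly in each slot; stored as a 3-dimensional array
    rng = np.random.default_rng(1)
    T = rng.standard_normal((3, 3, 3))

    def apply(T, a, v, w):   # a: covector, v, w: vectors
        return np.einsum('ijk,i,j,k->', T, a, v, w)

    a, v, v2, w = (rng.standard_normal(3) for _ in range(4))

    # linearity in the second argument:
    lhs = apply(T, a, 2*v + v2, w)
    rhs = 2*apply(T, a, v, w) + apply(T, a, v2, w)
    assert np.isclose(lhs, rhs)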


As a layman, the word "tensor" always intimidated me. As a programmer, I was surprised then when I found out that a tensor is just a multi-dimensional array (where the number of dimensions can be as small as 0). That was a concept I was already quite comfortable with.
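
In numpy terms (just my own illustration of that point):

    import numpy as np

    scalar = np.array(5.0)               # 0-dimensional array: a single number
    vector = np.array([1.0, 2.0, 3.0])   # 1-dimensional
    matrix = np.eye(3)                   # 2-dimensional
    cube   = np.zeros((2, 3, 4))         # 3-dimensional

    for t in (scalar, vector, matrix, cube):
        print(t.ndim, t.shape)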


That's a bit like saying a vector is 'a row of numbers'. Not incorrect, but missing the point. What matters is what vectors do. It's the properties like pointwise addition, scalar multiplication, and existence of an inverse that make vectors vectors.


You're confusing a tensor with its representation. Tensors are objects which must obey a certain set of rules. (Which rules depends on whether you're talking to a mathematician or a physicist.)


That’s not really what a tensor is; this simplification is due to TensorFlow, I think?


I always just thought of it as a thing that is indexable.


It's a nice article - you focus on matrices as a kind of operator that takes a vector as input and produces another vector. This is one side of the coin.

The other interpretation is that matrices are functions that take two arguments (a row vector and a column vector) and produce a real number. IMO this interpretation opens the door to deeper mathematics. It ties into the idea that a column vector is a linear functional on row vectors (and vice versa), giving you the notion of dual space, and ultimately leading on to differential forms. It also makes tensor analysis much more natural in general.


If you're going to attempt a definition like this you need some more conditions (linearity in each argument, for instance). Otherwise you can have a black box like x^3y^3-u^3v^3 that takes in [u,v] and [x,y]^T and spits out a real number; that's not a matrix-y operation.


That didn’t make any sense to me, and I work with matrices every day. Are you trying to describe a dot product?


I'm describing a somewhat unusual way of thinking about vectors, matrices etc. At least, it's unusual from the perspective of someone with an engineering / CS background.

First think about row and column vectors. A row vector and a column vector can be combined via standard matrix multiplication to produce a real number. From that perspective, a row vector is a function that takes a column vector and returns a real number. Similarly, column vectors take row vectors as arguments and produce real numbers.

It turns out that row (column) vectors are the only linear functions on column (row) vectors. This result is known as the Riesz representation theorem. If I give you a linear function on row vectors, you can find a column vector so that computing my function is equivalent to calculating a matrix multiply with your column vector.

Now on to matrices. Matrices take one row vector and one column vector and produce a real number. I can feed a matrix a single argument - the row vector, say - so that it becomes a function that takes one more argument (the column vector) before it returns a real number. Sort of like currying in functional programming. But as we said, the only linear functions that map column vectors into real numbers are row vectors. So by feeding our matrix one row vector, we've produced another row vector. This is the "matrices transform vectors" perspective in the OP's article. But I think the "Matrices are linear functions" perspective is more general and more powerful.

This perspective of vectors, matrices, etc... as functions might seem needlessly convoluted. But I think it's the right way to think about these objects. Tricky concepts like the tensor product and vector space duality become relatively trivial once you come to see all these objects as functions.
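
Here's a small numpy sketch of the currying point, with names of my own choosing:

    import numpy as np

    rng = np.random.default_rng(2)
    M = rng.standard_normal((3, 4))
    row = rng.standard_normal(3)   # a row vector (covector)
    col = rng.standard_normal(4)   # a column vector

    # fully applied: two arguments in, one real number out
    full = row @ M @ col

    # "curried": feed only the row vector; what remains is itself a row
    # vector, i.e. a linear function still waiting for a column vector
    partial = row @ M              # shape (4,)
    assert np.isclose(partial @ col, full)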


I appreciate you trying to explain it; however, I believe it would have really helped if you had started from the examples where this way of thinking is useful. You mentioned the tensor product and vector space duality, but unless someone is already familiar with these concepts (I'm not), it does seem needlessly convoluted. Are there any practical applications of these concepts that you can describe?


I haven't been involved in abstract math in close to a decade, but I think it's a description of the (general) inner product. So, a generalization of the dot product. The classic dot product is that operation with the identity matrix. My understanding is that using matrices that way is very common in physics.


Any example of some practical use that would make it easier to understand?


>geometrically, all linear maps can be thought of as rotations and scalings.

and reflections.


A reflection is just a scaling with a negative factor.
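
For instance (my own tiny illustration):

    import numpy as np

    # reflection across the y-axis = scaling by -1 along x
    reflect = np.diag([-1.0, 1.0])
    print(reflect @ np.array([2.0, 3.0]))   # [-2.  3.]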


And projections. Without them you only get linear maps with nonzero determinant.


Shearings cannot be represented in this way.


Yes they can. This follows from singular value decomposition. Let S be the matrix representation of a shear transformation. There exist rotation matrices R, B and a diagonal matrix D such that S = RDC, where C is the transpose of B. D is the matrix representation of a scaling transformation and R, B are the matrix representations of rotation transformations. Since S is a product of rotation and scaling matrices, its corresponding linear transformation is a composition of rotations and scalings.

It would ordinarily be weird to represent shear transformations using rotations and scalings because shear matrices are elementary. But it checks out.
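
A quick numpy check, as a sketch of the claim:

    import numpy as np

    # a shear along x
    S = np.array([[1.0, 1.0],
                  [0.0, 1.0]])

    U, d, Vt = np.linalg.svd(S)                 # S = U @ diag(d) @ Vt
    assert np.allclose(U @ np.diag(d) @ Vt, S)

    # U and Vt are orthogonal and diag(d) is an axis-aligned scaling;
    # since det(S) = 1 > 0, signs can be chosen so both are proper rotations
    print(d)                                    # roughly [1.618, 0.618]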


OK, point taken. I considered "scaling" in a less general sense (a scalar multiple of the unit matrix), while you want to allow arbitrary diagonal entries. My definition is, to my knowledge, the common one in linear algebra textbooks, because in yours the set of feasible maps depends on the chosen basis.

EDIT: To state my point more clearly: in textbooks, "scaling" is the linear map that is induced by the "scalar multiplication" in the definition of the vector space (that is why both terms start with "scal").


Reminds me of the old "hack" to use three shear transformations to rotate an image.

The idea being that a shear is much faster on weaker CPUs than doing a "proper" (reverse-mapping) rotation.

A nice write-up can be found here: https://www.ocf.berkeley.edu/~fricke/projects/israel/paeth/r...
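
For anyone curious, a minimal numpy sketch of the three-shear idea (my own reconstruction; see the write-up for the image-resampling details):

    import numpy as np

    theta = np.deg2rad(30)
    a = -np.tan(theta / 2)                 # x-shear factor
    b = np.sin(theta)                      # y-shear factor

    shear_x = np.array([[1, a], [0, 1]])
    shear_y = np.array([[1, 0], [b, 1]])

    # shear_x . shear_y . shear_x equals a rotation by theta
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    assert np.allclose(shear_x @ shear_y @ shear_x, rot)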


Notably though, shearings are very 'rare'. Any perturbation will make a shearing no longer a shearing. At least, if I remember correctly.


Same for the orthogonal matrices, or the diagonal matrices, or the symmetric matrices, or the unit-determinant matrices, or the singular matrices ... They are all sets of Lebesgue measure zero.


Orthogonal, diagonal, symmetric, and unit-determinant matrices are all subgroups though, which makes them 'more special' than all shearing matrices.

Singular matrices are special in the sense that they keep the matrix monoid from being a group. My category theory isn't strong enough to characterize it, but this probably also has a name.

Edit: I think the singular matrices are the 'kernel' of the right adjoint of the forgetful functor from the category of groups to the category of monoids. Though I must admit a lot of that sentence is my stringing together words I only vaguely know.


They can if you add a dimension to the space. That's one of the reasons 3d graphics use 4d vectors and matrices.


You're talking about translation.


No, I wasn't, but I did confuse the terms. Shear can be done without the extra dimension. Skew transforms require the extra dimension, as does translation.
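
The translation case in homogeneous coordinates looks roughly like this (my own sketch):

    import numpy as np

    # a 3D translation as a 4x4 matrix acting on homogeneous coordinates
    t = np.array([1.0, 2.0, 3.0])
    T = np.eye(4)
    T[:3, 3] = t

    p = np.array([5.0, 6.0, 7.0, 1.0])   # the point (5, 6, 7) with w = 1
    print((T @ p)[:3])                    # [6. 8. 10.]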


What do you mean by skew? A perspective transformation (homography)? I'm not sure it's standard terminology.



