Everybody seems to be missing the fact that this was nothing more than a lewd pun. "Unprojected text" = "Unprotected sex". What would it even mean to "never have unprojected text"?
Nit-picking, but points form an "affine space": in particular, there's no meaningful way to "add" points or to "scale" a point. We can only talk about the relations between points (e.g. "the midpoint of A and B", or "the set of points coplanar with A, B and C"). https://en.wikipedia.org/wiki/Affine_space
In contrast, elements of a vector space have absolute properties; in particular, there is a "zero vector" whose magnitude is 0. Vectors can be added and scaled in a meaningful way.
We can obtain a vector space from an affine space of points by considering the displacements between points (i.e. arrows going from one point to another): the zero vector is the displacement from any point to itself; displacements can be added (first follow one arrow, then the other); and displacements can be scaled (changing the magnitude, whilst keeping the direction fixed).
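To make the distinction concrete, here's a minimal sketch in Python. The `Point` and `Vector` class names and the `midpoint` helper are my own, made up for illustration: subtracting two points gives a displacement, a displacement can be added to a point, displacements can be added and scaled, but adding two points is deliberately left undefined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector:
    """A displacement: can be added to other displacements and scaled."""
    dx: float
    dy: float

    def __add__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.dx + other.dx, self.dy + other.dy)

    def __mul__(self, k):
        # Scaling changes the magnitude and keeps the direction.
        return Vector(self.dx * k, self.dy * k)


@dataclass(frozen=True)
class Point:
    """A location: only relations between points are meaningful."""
    x: float
    y: float

    def __sub__(self, other):
        # Point - Point is the displacement from `other` to `self`.
        if not isinstance(other, Point):
            return NotImplemented
        return Vector(self.x - other.x, self.y - other.y)

    def __add__(self, v):
        # Point + Vector follows the arrow; Point + Point is not defined.
        if not isinstance(v, Vector):
            return NotImplemented
        return Point(self.x + v.dx, self.y + v.dy)


def midpoint(a, b):
    # A relation between two points, built only from allowed operations.
    return a + (b - a) * 0.5
```

With this, `Point(1, 2) + Point(5, 4)` raises a `TypeError`, while `midpoint(Point(1, 2), Point(5, 4))` is fine, and `p - p` is the zero vector for any point `p`.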
To form a vector space of points, we can choose some arbitrary point in our affine space (which we call "the origin") and consider the vector space of displacements from our chosen origin. In other words, if we take the "arrows" from our vector space of displacements, and have them all start at the same point (our arbitrary "origin"), then each arrow corresponds to a point (its end), and each point corresponds to an arrow (ending at that point, starting at the origin). The origin point itself corresponds to the zero vector of displacements. This way, we've made points equivalent to displacements (from "the origin"), and since the latter form a vector space, so do the former.
However, despite these shenanigans, it's important to keep in mind that it's still only meaningful to talk about points in a relative way. That's explicit in the affine approach, whilst the vector approach always carries an implicit "relative to this particular origin".
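A quick way to see that implicit origin, sketched in Python with plain `(x, y)` tuples (the function names `add_points` and `midpoint` are hypothetical, just for this illustration): "adding" two points gives a different answer for each choice of origin, whereas the midpoint comes out the same whichever origin we pick.

```python
def add_points(p, q, origin):
    """'Add' two points by treating each as a displacement from `origin`."""
    ox, oy = origin
    return (ox + (p[0] - ox) + (q[0] - ox),
            oy + (p[1] - oy) + (q[1] - oy))


def midpoint(p, q, origin):
    """Average the two displacements from `origin`, then re-anchor at `origin`."""
    ox, oy = origin
    return (ox + ((p[0] - ox) + (q[0] - ox)) / 2,
            oy + ((p[1] - oy) + (q[1] - oy)) / 2)


a, b = (1.0, 2.0), (5.0, 4.0)

# The "sum" depends on which origin we happened to choose...
sum_at_o1 = add_points(a, b, origin=(0.0, 0.0))  # (6.0, 6.0)
sum_at_o2 = add_points(a, b, origin=(1.0, 1.0))  # (5.0, 5.0)

# ...but the midpoint does not.
mid_at_o1 = midpoint(a, b, origin=(0.0, 0.0))    # (3.0, 3.0)
mid_at_o2 = midpoint(a, b, origin=(1.0, 1.0))    # (3.0, 3.0)
```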
Good question. How does one simply find the bounding quad of rotated perspective text? Will that handle perspective distortion?
I guess the author partly answers your question early on with discussion of the Merino-Gracia paper, which fits a quad to individual lines of text, and a comment about how that relies on first being able to detect lines of text.
Matt also doesn’t claim this method is better. He says “I’m sure its neither as accurate or as useful as the Merino-Gracia approach.” I assume the example text “Needlessly Complex” is a bit of self-deprecating humor, acknowledging he may not be taking the easiest path there is. But the method here seems interesting and useful to me for its approach: it doesn’t have to identify word or page boundaries, or lines of text, as a prerequisite. The assumptions are simple and the optimization is simple. It’s a nice study in different ways to think about the problem.
Finding linear boundaries of a whole block of text is much easier than finding letter boundaries.
It's a 1980s textbook matter of finding lines where the brightness gradient is extremely large.
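As a toy illustration of the kind of gradient thresholding I take this to mean (not any particular textbook algorithm), here's a Python sketch. It assumes a grayscale image given as a list of rows of brightness values and a text block with a solid, axis-aligned edge against the background; the function name and the threshold are made up.

```python
def strong_vertical_edges(image, threshold):
    """Return the column indices x where the mean absolute brightness jump
    between column x and column x + 1 is at least `threshold`."""
    height = len(image)
    width = len(image[0])
    edges = []
    for x in range(width - 1):
        mean_jump = sum(abs(row[x + 1] - row[x]) for row in image) / height
        if mean_jump >= threshold:
            edges.append(x)
    return edges


# Dark background (20) with a bright block (230) in columns 2..4.
image = [[20, 20, 230, 230, 230, 20] for _ in range(4)]
edges = strong_vertical_edges(image, threshold=100)  # [1, 4]
```

This only finds boundaries that are solid and line up with the pixel grid, which is a much stronger assumption than real photographed text usually satisfies.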
Which algorithm are you referring to? I have a copy of Jain et al. and I can’t find what you’re describing. Do you have a link to something? The Hough transform is used in this article, if that’s what you’re thinking of, but it will not work for finding the bounding box of text: the lines have to be solid, contiguous, and linear for that to work. Note the method in the article doesn’t depend on the text having a solid surround color, or even on the text being arranged in a roughly rectangular shape. It also doesn’t depend on the text being linear. These differences are valuable: the method doesn’t have to make the same assumptions you’re making, which means it (whether or not it’s “better”) may work in a wider variety of situations, or may make a very good complement to existing methods.
The method does have to identify lines of text to find the rotation angle, but doing so after perspective correction using the "all letters should be about the same size" assumption means that a Hough transform is enough for that step, since the lines should already be roughly parallel.
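For anyone curious what that step could look like, here's a rough pure-Python sketch of Hough-style voting for a dominant text angle. It is not the article's code: the function name, the idea of feeding it letter centroids as `points`, and the scoring by sum of squared votes (so that several parallel lines reinforce each other) are all my own assumptions.

```python
import math
from collections import Counter


def dominant_line_angle(points, angle_step_deg=1.0, rho_step=1.0):
    """Estimate the direction (degrees, in [-90, 90)) along which the points
    line up best, assuming they lie on roughly parallel lines."""
    n_angles = int(round(180.0 / angle_step_deg))
    best_score, best_angle = -1, 0.0
    for i in range(n_angles):
        # Angle of the line normal, as in the usual (rho, theta) form.
        normal_deg = i * angle_step_deg
        normal = math.radians(normal_deg)
        c, s = math.cos(normal), math.sin(normal)
        # Each point votes for the rho bin of the line through it.
        votes = Counter(round((x * c + y * s) / rho_step) for x, y in points)
        # Squared votes reward angles where points pile into few bins.
        score = sum(v * v for v in votes.values())
        if score > best_score:
            best_score, best_angle = score, normal_deg - 90.0
    return best_angle
```

Calling it on points laid out along three parallel rows tilted by 10 degrees should give an angle close to 10. It only works here because the rows are already roughly parallel, which is exactly what the perspective-correction step is supposed to guarantee.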
(Having to identify page boundaries is handwaved away with "I’m going to make a huge simplifying assumption that that the image we’re processing basically contains only the text that we want to rectify")