Unprojecting text with ellipses (2016) (mzucker.github.io)
151 points by nmstoker on May 19, 2024 | 21 comments


I thought at first that it was about https://en.wikipedia.org/wiki/Ellipsis , which makes sense in a textual context, not about https://en.wikipedia.org/wiki/Ellipse , so it took me a minute to understand the relevance of the article.


Never have unprojected text. I learned the hard way it's just not worth it.


If you never have done it, how can you have learned the hard way that it's not worth it?


Since we're nitpicking, OP said:

"Never have unprojected text."

Not:

"I never have [...]"

The absence of an explicit subject means that another correct interpretation of the sentence is that the OP is giving you some good advice.


Good advice about what? That unprojected text is worse than the alternatives, projected text or no text at all?

That's an outrageous claim that needs some sort of justification.


A valid interpretation of the OP's sentence includes the advice never to have unprojected text.

I'm not evaluating the worth of said advice, just the grammar, to play nitpick tennis with alex_duf, who graciously conceded the point.


Everybody seems to be missing the fact that this was nothing more than a lewd pun. "Unprojected text" = "Unprotected sex". What would it even mean to "never have unprojected text"?


I see! I had failed to parse the sentence!


What would I have to learn to understand all the maths in this post?



It’s only basic pre-algebra and matrix multiplication, plus the typical mathematician’s love of variable naming and use of the tilde.

Matrix equations are really just shorthand for several related equations. The notation can be a bit unsettling if you aren’t used to it.
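
As a small illustration of that shorthand (the symbols a, b, c, d, x, y, x', y' are placeholders chosen here, not the article's notation), a 2×2 matrix equation unpacks into two ordinary equations:

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
\quad\Longleftrightarrow\quad
\begin{aligned}
x' &= a x + b y \\
y' &= c x + d y
\end{aligned}
```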


Linear algebra. A point in space can be thought of as a vector. Rotation and scaling are done by multiplying a vector by a matrix.
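
A minimal sketch of what that looks like in 2D, in plain Python with no libraries. The helper names (`mat_vec`, `rotation`, `scaling`) are made up for this example and are not taken from the article:

```python
import math

def mat_vec(m, v):
    """Multiply a 2x2 matrix (given as a list of rows) by a 2D vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def rotation(theta):
    """Matrix for a counter-clockwise rotation by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def scaling(sx, sy):
    """Matrix that scales x by sx and y by sy."""
    return [[sx, 0.0], [0.0, sy]]

# Rotating (1, 0) by a quarter turn gives approximately (0, 1).
rotated = mat_vec(rotation(math.pi / 2), [1.0, 0.0])

# Scaling (1, 1) by (2, 3) gives (2, 3).
scaled = mat_vec(scaling(2.0, 3.0), [1.0, 1.0])
```

Each output coordinate is just a weighted sum of the input coordinates, which is all a matrix-vector product is.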


Nit-picking, but points form an "affine space": in particular, there's no meaningful way to "add" points; or to "scale" a point. We can only talk about the relations between points (e.g. "the midpoint of A and B"; or "the set of points coplanar to A, B and C") https://en.wikipedia.org/wiki/Affine_space

In contrast, elements of a vector space have absolute properties; in particular, there is a "zero vector" whose magnitude is 0. Vectors can be added and scaled in a meaningful way.

We can obtain a vector space from an affine space of points by considering the displacements between points (i.e. arrows going from one point to another): the zero vector is the displacement from any point to itself; displacements can be added (first follow one arrow, then the other); and displacements can be scaled (changing the magnitude, whilst keeping the direction fixed).

To form a vector space of points, we can choose some arbitrary point in our affine space (which we call "the origin") and consider the vector space of displacements from our chosen origin. In other words, if we take the "arrows" from our vector space of displacements, and have them all start at the same point (our arbitrary "origin"), then each arrow corresponds to a point (its end), and each point corresponds to an arrow (ending at that point, starting at the origin). The origin point itself corresponds to the zero vector of displacements. This way, we've made points equivalent to displacements (from "the origin"), and since the latter form a vector space, so do the former.

However, despite these shenanigans it's important to keep in mind that it's still only meaningful to talk about points in a relative way. That's explicit in the affine approach; whilst the vector approach always has an implicit "relative to this particular origin".
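
One way to see the distinction is to encode it in types. This is only a sketch: the `Point` and `Vector` classes and the `midpoint` helper are hypothetical names for illustration. Subtracting two points yields a displacement, adding a displacement to a point yields a point, and adding two points is simply not defined:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vector:
    """A displacement: can be added to other displacements and scaled."""
    dx: float
    dy: float

    def __add__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.dx + other.dx, self.dy + other.dy)

    def __mul__(self, k):
        return Vector(self.dx * k, self.dy * k)

@dataclass(frozen=True)
class Point:
    """A location: only meaningful relative to other points."""
    x: float
    y: float

    def __sub__(self, other):
        # Point - Point is the displacement between them.
        if not isinstance(other, Point):
            return NotImplemented
        return Vector(self.x - other.x, self.y - other.y)

    def __add__(self, v):
        # Point + Vector is a point; Point + Point is rejected.
        if not isinstance(v, Vector):
            return NotImplemented
        return Point(self.x + v.dx, self.y + v.dy)

def midpoint(a, b):
    """Start at a and go halfway along the displacement to b."""
    return a + (b - a) * 0.5

m = midpoint(Point(0.0, 0.0), Point(4.0, 2.0))  # Point(x=2.0, y=1.0)
```

With these definitions `Point(1.0, 1.0) + Point(2.0, 2.0)` raises a `TypeError`, while the midpoint never needs an origin at all.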


Why is this better than simply finding the bounding quadrilateral of the text, and rectangularizing that?


Good question. How does one simply find the bounding quad of rotated perspective text? Will that handle perspective distortion?

I guess the author partly answers your question early on with discussion of the Merino-Gracia paper, which fits a quad to individual lines of text, and a comment about how that relies on first being able to detect lines of text.

Matt also doesn’t claim this method is better. He says “I’m sure its neither as accurate or as useful as the Merino-Gracia approach.” I assume the example text “Needlessly Complex” is a bit of self-deprecating humor, acknowledging he may not be taking the easiest path there is. But the method here seems interesting and useful to me for its approach; it doesn’t have to identify word or page boundaries, or lines of text, as a prerequisite. The assumptions are simple and the optimization is simple. It’s a nice study in different ways to think about the problem.


Finding linear boundaries of a whole block of text is much easier than finding letter boundaries. It's a 1980s textbook matter of finding lines where the brightness gradient is extremely large.
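
A deliberately simplified sketch of that idea, assuming an axis-aligned block and a grayscale image given as a list of rows. It is not a specific textbook algorithm and ignores rotation and perspective; `strongest_row_edge` is a name invented for this example:

```python
def strongest_row_edge(image):
    """Return the index i where mean brightness jumps most between row i and row i + 1."""
    means = [sum(row) / len(row) for row in image]
    jumps = [abs(means[i + 1] - means[i]) for i in range(len(means) - 1)]
    return max(range(len(jumps)), key=jumps.__getitem__)

# Two bright background rows followed by two darker "text" rows.
image = [
    [255, 255, 255, 255],
    [255, 255, 255, 255],
    [40, 60, 40, 60],
    [50, 50, 50, 50],
]
edge = strongest_row_edge(image)  # 1: the boundary lies between rows 1 and 2
```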


Which algorithm are you referring to? I have a copy of Jain et al. and I can’t find what you’re describing. Do you have a link to something? The Hough transform is used in this article, if that’s what you’re thinking of, but that will not work to find the bounding box of text: the lines have to be solid, contiguous, and linear for that to work. Note the method in the article doesn’t depend on the text having a solid surround color, or even on the text being arranged in a roughly rectangular shape. It also doesn’t depend on the text being linear. These differences are valuable: the method doesn’t have to make the same assumptions you’re making, which means it (whether or not it’s “better”) may work in a wider variety of situations, or may make a very good complement to existing methods.


The method does have to identify lines of text to find the rotation angle, but doing so after perspective correction using the "all letters should be about the same size" assumption means that a Hough transform is enough for that step, since the lines should already be roughly parallel.

(Having to identify page boundaries is handwaved away with "I’m going to make a huge simplifying assumption that that the image we’re processing basically contains only the text that we want to rectify")
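
To make the Hough step concrete, here is a rough sketch of Hough-style voting for the dominant line angle. It assumes the input is a list of (x, y) points such as letter centers, and it scores each candidate angle by how concentrated the votes are, which is a common skew-detection variant and not necessarily what the article's code does. The function names and the synthetic data are invented for this example:

```python
import math
from collections import Counter

def estimate_text_angle(points, angles_deg=range(-45, 46), bin_size=1.0):
    """Return the candidate angle (in degrees) whose parallel lines best explain the points."""
    best_angle, best_score = None, -1
    for deg in angles_deg:
        theta = math.radians(deg)
        s, c = math.sin(theta), math.cos(theta)
        # Each point votes for the offset (rho) of the line through it at this angle.
        votes = Counter(round((-x * s + y * c) / bin_size) for x, y in points)
        # Votes pile into few bins when the angle matches the text lines.
        score = sum(n * n for n in votes.values())
        if score > best_score:
            best_angle, best_score = deg, score
    return best_angle

def synthetic_lines(angle_deg, offsets=(0.0, 20.0, 40.0)):
    """Generate points along parallel lines tilted by angle_deg (test data only)."""
    a = math.radians(angle_deg)
    pts = []
    for off in offsets:
        for t in range(0, 100, 5):
            pts.append((t * math.cos(a) - off * math.sin(a),
                        t * math.sin(a) + off * math.cos(a)))
    return pts

angle = estimate_text_angle(synthetic_lines(10))
```

Because the lines are already roughly parallel after perspective correction, a single angle search like this is enough; there is no need to track each line separately.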


As the blog title has it, it's needlessly complex.


I wonder how well it works for images. There is going to be some data loss, but how much?


Not at all for most photos, I think. What would you replace the assumption “on average, all letters should be about the same size” with?



