
This reminds me of how, many years ago, I learned about the world record holder for computer optical character recognition (OCR) accuracy.

The computer scientists took as their target an Eastern European chess journal that printed move-by-move reports of tournament chess matches. They incorporated a crude chess engine into the recognition step, estimating the likelihood of possible next moves and combining that with the OCR engine's likelihood estimate that the printed characters were particular glyphs. Despite the very low quality of the printing, the journal had very high quality editing, so the source material was self-consistent. Completely illegible characters could mostly be filled in as the only sensible game moves that were allowed. It took hundreds of hours of human review time to find a single OCR mistake from this process!
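A minimal sketch of that combination idea, with made-up numbers and move priors (not taken from the paper): score each legal move by the chess engine's prior times the OCR glyph likelihoods for its printed notation, and keep the best-scoring move.

```python
# Illustrative only: combine a chess engine's move prior with per-glyph OCR
# likelihoods and pick the move that best explains the smudged print.
import math

def score_move(san, engine_prior, glyph_likelihoods):
    """Log-score of one candidate: log P(move) + sum of log P(glyph | char)."""
    if len(san) != len(glyph_likelihoods):
        return float("-inf")               # crude handling of length mismatch
    score = math.log(engine_prior)
    for ch, probs in zip(san, glyph_likelihoods):
        score += math.log(probs.get(ch, 1e-6))   # tiny floor for unseen glyphs
    return score

# Hypothetical OCR output for a badly printed move: per-position probabilities
# that the blob on the page is a given character.
ocr = [
    {"N": 0.4, "K": 0.35, "R": 0.25},      # first glyph is nearly illegible
    {"f": 0.9, "t": 0.1},
    {"3": 0.8, "8": 0.2},
]

# Hypothetical legal moves in the current position, with crude engine priors.
candidates = [("Nf3", 0.5), ("Kf3", 0.01), ("Rf3", 0.1), ("e4", 0.39)]

best = max(candidates, key=lambda m: score_move(m[0], m[1], ocr))
print(best[0])   # -> "Nf3": the game context rescues the unreadable glyph
```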



That's really awesome, but it feels like cheating to call it the OCR record holder. It's really OCR + context. However, it would be interesting to apply the same idea at the word and sentence level of written language. I'm guessing there are people who already do this.
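A small illustration of that word-level analogue, under the assumption of a toy word-frequency prior (the vocabulary and probabilities are invented for the example): rescore the OCR engine's character guesses so that common words beat visually similar nonsense strings.

```python
# Illustrative only: word-frequency prior rescoring of OCR character guesses.
import math

vocab = {"the": 0.05, "thc": 1e-7, "tho": 1e-4}   # toy unigram priors

# Per-position OCR likelihoods for a smudged three-letter word.
ocr = [
    {"t": 0.95, "f": 0.05},
    {"h": 0.9, "n": 0.1},
    {"e": 0.4, "c": 0.35, "o": 0.25},             # last glyph is ambiguous
]

def score(word):
    if len(word) != len(ocr):
        return float("-inf")
    return math.log(vocab.get(word, 1e-9)) + sum(
        math.log(probs.get(ch, 1e-6)) for ch, probs in zip(word, ocr)
    )

print(max(vocab, key=score))   # -> "the": the prior breaks the e/c/o tie
```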


> it feels like cheating to call it the OCR record holder

This is how humans recognize text, though. For the most part, humans don't try to read languages they don't understand. To deny a computer access to context is like asking a human to transcribe a language they don't understand.


I'm a pretty fast typist, but if you ask me to transcribe Latin-script text that is gibberish, or in a language I don't know... not so fast. A lot of it is how much I have to slow down to accurately recognize the characters.


I would guess your typing also gets significantly slower. Mine does; I've gone through periods where I had trouble typing the word "in" because my muscle memory turned it into "int".


> It took hundreds of hours of human review time to find a single OCR mistake from this process!

This stands out to me as improbable. Not that the error rate could be that low, but that they actually had humans spend hundreds of hours checking the accuracy of difficult character recognition. How did that happen?


Put a handful of grad students in a room for a week and you have hundreds of hours right there.


I searched out the article: "Reading Chess", 1990, HS Baird and Ken Thompson. (Yes, that Ken Thompson).

http://doc.cat-v.org/bell_labs/reading_chess/reading_chess.p...

It doesn't actually quantify the human proofreading time. I might have recalled incorrectly; I heard about this in the late 1990s as a war story from another OCR researcher.


It's an embarrassing problem to have a system with "accuracy too high to measure"!



