Georgian African American newspapers from 1886-1926 now available freely online (usg.edu)
182 points by DoreenMichele on March 20, 2021 | 40 comments


How should one distinguish Georgia as U.S. State vs. eastern European country?

It’s one of the few contextual proper names I’ve not ever seen have to be differentiated, though it seems it would need to be when speaking to a global audience.

For that matter, as an author, under what circumstances should you need to differentiate, if every audience is global?

Is it adequate to assume that since Google filters and has been filtering content by language for years, that Georgia in search results and relevant pages is almost always the Georgia they seek?


As someone in western Europe, if it's non-US media it's probably Georgia the country, which is merely mentioned "rarely" as opposed to "only in US election week". If it's US media, it's probably Georgia the state. That said, countries are generally considered more important, so if it's not obvious you're talking about the US, I think the onus would be on people referring to Georgia, USA to specify.

Most countries are just not interested in internal divisions of other countries so it doesn't come up often. Just like I don't have the need to distinguish Munster (Ireland) and Münster (Germany) even though the latter often loses its umlaut in English text, or run into confusion between Northern Territory (Australia) and Northwest Territories (Canada). And the English language isn't short of context-sensitive words in other areas either, the most frequent example being "read" vs "read".


So imagine my surprise when staying with friends of my wife in Münster, they had never heard of Muenster cheese. They assured me that it must come from Munster in Ireland because Americans commonly confuse the two. But I was not convinced because the ‘ue’ in the spelling indicated a transliteration from ‘ü’. It was pre-Wikipedia, but luckily our hosts had a copy of ‘Das großes Buch vom Käse’ which let us know we were both wrong. Apparently there is also an Alsatian village named Munster with a distinctive cheese recreated by emigrants to the US.


In the EU the Munster cheese you'll find is generally from Alsace, it's protected (PDO).


I've lived in Georgia (the U.S. state) my whole life, and when I read "Georgian" as an adjective, I usually assume it means Georgia the country. People here typically use "Georgian" only as a noun to refer to a resident of the state. In adjective form, you usually just hear "Georgia" (e.g., Georgia coast, Georgia pines).


I wrote this elsewhere as an expanded form of my other comment here:

In the pictured headline, if "Georgian African American newspapers" is using "Georgian" as a demonym, then it means newspapers published by African Americans from Georgia.

But if the headline is using "Georgian" as an adjective, then lacking hyphenation, it's ambiguous whether it means newspapers published by African Americans from Georgia /or/ newspapers published in Georgia by African Americans.

We rarely use the adjectival form of U.S. states ("Georgia Peaches", not "Georgian Peaches"), so I found the headline a little garden path and initially started parsing it as if it were referring to the country of Georgia.

Then, right below the headline, the text reads "Georgia African American newspapers" which isn't ambiguous and means newspapers published in Georgia by African Americans. That's confirmed by the rest of the article.

Of course, the African Americans in question were likely also from Georgia, so to be pedantic, it's Georgian African American Georgia newspapers.

(I doubt any African Americans from Georgia emigrated to the Russian Empire at the time, otherwise we could be talking about African American Georgian Russian Georgia newspapers ... or something.)

Anyway if I were the headline writer, I think I would have used "African American Georgia newspapers..."


I was born in Georgia-the-US-State, and yet I stumbled a bit on the headline, parsing it as Tbilisi-not-Atlanta.

Your careful analysis might explain my brain. (In this particular case. If that is even possible, because my brain...)


Agreed, as a Midwestern American. Hearing "Georgian", I would think of the country, then the Georgian era (especially regarding architecture) and only then the state.


Yeah, had it said "Georgia" instead I would not have had the same confusion OP had.

I guess it's one of those unspoken rules we don't know we know.


I did think about that issue. In this case, the word Georgian is immediately followed by the phrase African American in the title. That should be sufficient to make it clear it's the US state we are talking about, not the country.


I found it a little garden path. I think "African American Georgia newspapers from..." is slightly clearer.

That slightly alters the meaning by not using the demonym: newspapers published by African Americans from Georgia vs newspapers published in Georgia by African Americans.

I guess you could also go with African American Georgian Georgia newspapers to be explicit that it's newspapers published in Georgia by African Americans from Georgia.

Sorry, I guess I have Georgia on my mind now.


That's good feedback. I also stumbled a bit on the word Georgian and I am originally from Georgia.

To me, that calls to mind Georgian architecture, not stuff from the state of Georgia.

https://en.wikipedia.org/wiki/Georgian_architecture

But the reality is that HN has some guidelines concerning titles (plus a character limit). So I did the best I could within the constraints as I best understand them, but I will certainly keep your remarks here in mind for any future titling challenges.


> That's good feedback. I also stumbled a bit on the word Georgian and I am originally from Georgia.

WHICH GEORGIA?! I swear people in this thread are writing specifically to make the rest of us mad :P


The one with the Tybee bomb.


There may have been African Americans or at least some of African heritage in Georgia of eastern Europe during that time period[1] and a little later[2].

[1]- https://www.blackpast.org/global-african-history/black-prese...

[2]- https://www.wilsoncenter.org/blog-post/black-skin-red-land-a...


Right, but I would bet that even today African Americans make up a very small minority of people of African heritage in the country of Georgia, and African Americans were certainly not numerous enough there in the 1890s to have their own newspapers.


This is not wrong if you consider Abkhazia part of Georgia. There are definitely recorded African descent families speaking Abkhaz and pictured in Abkhaz clothing. You are being unfairly downvoted.

https://en.m.wikipedia.org/wiki/Abkhazians_of_African_descen...


Are they American expatriates?


I've noticed that in desperately searching for the polite term to refer to local black people, a lot of Europeans and Asians accidentally and humorously land on "African-American."


No but their going to America was not out of the realm of possibility if they had Russian documents. I’m just saying it’s not as facially impossible as it sounds


Well, in the context of "Georgian African American" I didn't have any trouble. I'm surprised you did, as the other Georgia isn't currently, and has never in the past, been "American"


Georgian to me especially in this context originally meant the period.

https://en.wikipedia.org/wiki/Georgian_era


A bit off-topic but I recently wrote a blog post about how you could get historical online published news articles:

https://blog.newscatcherapi.com/an-ultimate-list-of-open-sou...


Watch out because this site has links to hundreds of other newspapers too dating back to the 1700s. I made the mistake of checking one of them out yesterday and lost the next four hours of my day reading about historical events. Really amazing stuff.


It’s interesting to see the difference in how African Americans were depicted and covered.

It is well-known that the “white” press of that time often demonized its black citizens and escalated tensions that helped prolong the period of white terrorism, voter suppression and lynching throughout the Reconstruction period.

- https://tulsaworld.com/news/tulsa-race-massacre-1921-tulsa-n...

- https://www.kansascity.com/news/local/article247928045.html

This sadly continues even today both in the US (to a lesser degree) and internationally.

We see similar treatment of minorities in the Chinese press (Hong Kong residents, Uyghurs) and in the US conservative press about migrants and immigrants.


> This sadly continues even today both in the US (to a lesser degree) and internationally.

To a lesser degree? I am inclined to believe that this[0] Vietnamese-language paper publishing an entirely fabricated story about Black men (and using a photograph of the tragically murdered Ahmaud Arbery to identify a fictitious perpetrator) was intended to prey upon the credulity of Vietnamese-Americans.

0. https://tuoitrexahoi.vn/truy-na-4-ke-da-mau-cuop-hiep-2-me-c...


So many ads for watermelon.


[flagged]


I don't think 1886-1926 Georgia was known for its willingness to include Black people at major, lily-white newspapers.

https://www.georgiaencyclopedia.org/articles/history-archaeo...

> Georgia's toll of 458 lynch victims was exceeded only by Mississippi's toll of 538. During the 1880s and 1890s, instances of lethal mob violence increased steadily, peaking in 1899 when twenty-seven Georgians fell victim to lynch mobs. Between 1890 and 1900 Georgia averaged more than one mob killing per month.


White is a complicated term. Definitely before the 1920s, it wasn’t necessarily European, because it excluded, for example, non-Protestants like the Irish, Italians, and Russians.


I think it would be a lot clearer if they did. I'd love if they renamed the years of segregated baseball "White Major League Baseball" or started calling it "The White Constitution of the United States."


Reading through the front pages, one would never know this was an ‘African American’ newspaper, as it is no different from any other. It is quite racist to even mention the fact that it is ‘African American’.


You may not be familiar with US history, but it was an extremely segregated society back then with many lynchings occurring for any black people who stepped out of line. Black Americans frequently had to make parallel systems just for their communities since they were often not served by services run by white Americans.

So in this case the skin color of the people who created these newspapers is an important historical contextual point. Additionally, it’s simply a fact of the matter being discussed and as such it isn’t racist to mention a basic fact.


AI is going to take over the world they say, but we still don't have an easy to use and free high end OCR available.

Australia's Trove is getting humans to translate them‽ - https://trove.nla.gov.au/help/become-voluntrove/text-correct...

Anyway, very cool, the world needs more of these out-of-copyright newspapers online. History has been locked up by historians for too long.


> we still don't have an easy to use and free high end OCR available

There is Tesseract: https://github.com/tesseract-ocr/tesseract

> Australia's Trove is getting humans to translate them‽

Your link clearly refers to correcting an existing transcription:

"While viewing digitised newspaper and gazette articles, you may notice that the text transcript doesn’t always match the text in the article. You have the power to fix this by editing the transcript to match the article text."

Most likely they used OCR software to generate the initial transcript, but allow users to correct the OCR output because they know the software is not perfect.


I used Tesseract for something that sounded extremely simple: OCR a date tag and a few characters identifying a large list of photos. The tag is of very high quality, white over black; a 6yo would read it flawlessly. I had to use the "old" Tesseract (the new one uses ML), probably because the ML-based one was "inventing" characters or swapping them, and could not reliably identify numbers. Even with the old solution, I had to resize, apply some filters, and was barely reaching 99% recognition. For something extremely clean! So we have a ways to go...
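One cheap guard against the swapped-character problem described above is to check that whatever the OCR engine returns actually parses as a date. This is only a sketch: the tag format (`YYYY-MM-DD`), the lookalike table, and the name `parse_date_tag` are all assumptions for illustration, not anything the poster described, and it runs after OCR rather than replacing the resizing and filtering.

```python
from datetime import date, datetime
from typing import Optional

# Letters that are often mistaken for digits (illustrative, not exhaustive).
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def parse_date_tag(raw: str, fmt: str = "%Y-%m-%d") -> Optional[date]:
    """Map digit lookalikes back to digits, then accept the tag only if it is a real date.

    The default format is a hypothetical one; pass the format your tags really use.
    """
    cleaned = raw.strip().translate(LOOKALIKES)
    try:
        return datetime.strptime(cleaned, fmt).date()
    except ValueError:
        # Not a valid date: flag the photo for manual review instead of guessing.
        return None


if __name__ == "__main__":
    for tag in ["2O21-03-2O", "2021-13-40"]:
        print(repr(tag), "->", parse_date_tag(tag))
```

Anything that comes back as `None` can go to a human, which keeps the error rate visible instead of silently storing a wrong date.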


Tesseract makes me feel like we've gone backwards in 20 years.

I know there have been improvements, and probably impressive ones, but they are not being implemented in any sort of unison in anything easy to use.

If computers can't even read newspapers, which only have a certain number of fonts and letters/words to choose from, how can we think they can do much else?

Test of - https://trove.nla.gov.au/newspaper/article/151059024

Tesseract starts really well here but then stops after a few lines. No idea why, it just frustrates me.

"FEMALE PIRATE LEADER. HONG KONO, Sunday. After no sign of activity in Bias" - Then Stops.

Trove is below and doesn't get much.

"FEMALE PIRATE LEADER. HOHCrKOiro. SanOV. After twiigiiJofswUTlty In »» Xay far ? jux. pirates, lei br a woman, aged 28, EagUA tpio^K, attacked the JspaBMe steaiktr TM1 Slant ? betwsan jbaojr and"

ABBYY Finereader is OK -

"FEMALE PIRATE LEADER BOKO KOHO, Sunday, After no ties of activity la Bin Bay for a year. pirates. lad by a woman, aged 28. English speaking attacked the Japanese steamer"
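To put a rough number on a comparison like this, one common measure is character error rate: edit distance to a trusted transcription, divided by its length. A minimal sketch follows. The only reference text used is the dateline, which should read "HONG KONG"; the rest of the article is not transcribed here, so this is far too short a sample to rank the engines and only shows the mechanics.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def char_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, ocr_output) / len(reference)


# Dateline as it should read, and as each engine rendered it in the outputs quoted above.
REFERENCE = "HONG KONG"
DATELINES = {
    "Tesseract": "HONG KONO",
    "Trove": "HOHCrKOiro",
    "ABBYY FineReader": "BOKO KOHO",
}

if __name__ == "__main__":
    for engine, text in DATELINES.items():
        print(f"{engine}: CER = {char_error_rate(REFERENCE, text):.2f}")
```

A fair comparison would need a hand-corrected transcript of the whole article, and it would also have to count the text Tesseract never emitted as errors.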


Tesseract is great in certain situations but a lot comes down to having a robust preprocessing pipeline and a correction flow. I wrote about this a few years back:

https://chris.improbable.org/2014/3/17/content-search-on-a-b...

https://blogs.loc.gov/thesignal/2014/08/making-scanned-conte...

Basically the big problems are getting the content deskewed (even a slight rotation will cause accuracy to plummet, which is a problem if there was page curl or a flaw in the original printing process), breaking text into clean segments (non-trivial in newspaper layouts), and dealing with noise from dust or content from the other side of the page bleeding through. The collection I’ve worked the most with (https://chroniclingamerica.loc.gov/) also had a lot of problems due to many collections having been scanned from microfilm first. Tesseract 4 is better but in my testing you aren’t going to see revolutionary improvements without investing in tooling to identify segments and clean them up before passing them to Tesseract.
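As a toy illustration of why skew and specks hurt segmentation, here is a minimal sketch of one classic approach, the horizontal projection profile: count the ink in each pixel row and cut wherever a row is blank. It assumes the page is already binarised into a grid of 0s and 1s; the names and the tiny `PAGE` grid are made up for the example, and this is not the pipeline used for any of the collections mentioned.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = ink, 0 = background


def row_ink_counts(image: Image) -> List[int]:
    """Horizontal projection profile: number of ink pixels in each row."""
    return [sum(row) for row in image]


def split_into_lines(image: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows holding at least min_ink pixels."""
    segments = []
    start = None
    for y, ink in enumerate(row_ink_counts(image)):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            segments.append((start, y))    # a text line ends at a blank row
            start = None
    if start is not None:
        segments.append((start, len(image)))
    return segments


# Hypothetical 8x6 page: two text lines with a single speck of dust between them.
PAGE = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1],
    [0, 1, 1, 1, 1, 0],
]

if __name__ == "__main__":
    print(split_into_lines(PAGE))             # the speck shows up as its own "line"
    print(split_into_lines(PAGE, min_ink=2))  # a threshold filters the speck out
```

On a skewed scan the blank rows between lines fill with ink from their neighbours, so the profile never drops to zero and the cuts disappear, which is one reason deskewing has to come first. Multi-column newspaper layouts need a column split before any of this applies.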

Since that entire collection is public domain and freely available for download (https://chroniclingamerica.loc.gov/data/ or s3://ndnp-batches), researchers have used various ML tools on it and that definitely looks promising but is not a silver bullet by any means. There are some trained files available here along with a large public S3 dataset:

https://news-navigator.labs.loc.gov/


Are you sure we don't have decent OCR today? I just pulled out a pen, wrote "Pie for breakfast" with my non-dominant hand so it looks like trash, then I scanned it with Google Translate (on an obsolete iPhone, almost in the dark) and it picked it up perfectly. It seems dramatically better than the state of the art 20 years ago.


Where can I get something like that that I can use as a library in my own program? I don't care if it costs money or is open source, just that it works.

The closest I can find is Tesseract which was developed in the '80s, and for handwriting it gets maybe two thirds of the letters right. Doesn't work on cursive.

I want something that gets 99 point something right and works on cursive, that I can use as a library, offline. Plus flying cars, and a pony.


Google Translate is proprietary, not free.



