These sheets are missing a lot of context that has to be interpreted (e.g., how do you taxonomize each field of information?). Asking questions of these documents is a further step on top of that: "Is the allegation about sexual violence?", "What is the name and rank of the person being accused?", "Is anything anomalous in the review process?", "Has this person's rank changed in the past 5 years?", etc.
Now expand this problem to hundreds of thousands of different types of document.