This just confirms it is me. Which, yes, reduces the problem of me being replicated, but it does nothing for my anonymity. That part may not seem important to you, considering you use your real name, but it is to me. It allows me to be more open.
Ah, think I see your point. Your worry is that language use, etc. could be used to deanonymize you, by correlating with text that was not written anonymously. But that's a separate issue from voice or writing style cloning to pretend it's you that said it. In the latter case you could use a pseudonymous signing key?
I agree that deanonymization is an issue that is hard to tackle. I wonder if someone studied how unique writing style is. E.g. browser fingerprints are fairly unique, but I wonder to what extent you can filter a person from, say, a pool of 100 million, using writing style alone (grammar, vocabulary use, etc.). I guess it becomes quite easy if you engage in a lot of domain-specific discussions and use their vocabularies.
E.g. if I talked about Marlin-kernels here, you could probably narrow me down to a few hundred people. Throw in another comment about the Glove80. Maybe ten people at most?
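To put rough numbers on that narrowing, here is a back-of-the-envelope sketch. The fractions are made up to match the guesses above (a few hundred people, then about ten), the function names are hypothetical, and it assumes the two signals are independent, which they probably are not.

```python
import math


def bits_to_single_out(population: int) -> float:
    """Bits of identifying information needed to pick one person from a pool."""
    return math.log2(population)


def remaining_candidates(population: int, fractions: list[float]) -> float:
    """Shrink the pool by each signal's fraction (assumes independent signals)."""
    remaining = float(population)
    for fraction in fractions:
        remaining *= fraction
    return remaining


POOL = 100_000_000
# Hypothetical fractions: ~300 of the pool discuss Marlin kernels,
# and ~1 in 30 of those also mention the Glove80.
signals = [300 / POOL, 1 / 30]

print(f"bits needed: {bits_to_single_out(POOL):.1f}")
print(f"candidates left: {remaining_candidates(POOL, signals):.0f}")
```

Singling out one person in 100 million takes about 27 bits, so two niche topics already cover most of the distance under these assumptions.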
> I wonder if someone studied how unique writing style is.
Since I teach, I can tell you that I can usually tell who wrote something by their language, and it even works with code. There's also the Enron dataset, a common dataset for first-time ML students, where you do exactly this task.
Your language is in fact a fingerprint. And like you suggest, topics too. Much of our freedom of anonymity comes from the fact that it is hard or not worth it to dox people.
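For a sense of how little it takes, here is a toy sketch of authorship attribution. It is not how any particular course or the Enron exercise does it, just one simple approach: compare how often each author uses common function words and pick the closest match by cosine similarity. The word list and the function names are assumptions chosen for illustration; real stylometry uses far richer features and much more text.

```python
import math
import re
from collections import Counter

# Small, hypothetical list of function words; real systems use hundreds.
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in", "that",
                  "it", "is", "i", "but", "not", "you", "for")


def profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unknown: str, known: dict[str, str]) -> str:
    """Return the known author whose profile is closest to the unknown text."""
    target = profile(unknown)
    return max(known, key=lambda author: cosine(target, profile(known[author])))
```

You would call it as `attribute(anonymous_post, {"alice": alice_corpus, "bob": bob_corpus})`, where the corpora are text each person wrote under their real name.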
I do agree that verification is a different issue though. I'm not sure keys will solve it, because you're not going to sign anything that is scandalous, so it might even give evidence to those who want to falsely claim foul play. And how do you sign a leak?
The problem with signing is that it seems to work for the cases we don't care about and do nothing for the ones we do. That is, unless we sign literally everything, including our voice, but then you kill anonymity (which is why I connected the two) and you could then probably clone that too.