I don't trust the "delete" button to scrub it from FB's database.
I'd be slightly more confident (slightly) that editing the post might cause the core data in the db to be updated, however. In which case, I think the more effective script would be one that goes through all of your FB posts and scrambles them, or replaces the text with gibberish.
Lots of people in this thread are claiming, based on gut feeling, that Facebook doesn't actually delete anything, but consider:
- Facebook publicly claims it does [1]
- Mark Zuckerberg testified in front of Congress and stated they do [2]
- Multiple government regulators have specifically checked that it does in their privacy audits [3]
They might be lying and actively conspiring to not delete it, but they'd have to have a very good reason to take on that much legal risk.
Now, consider:
- An infinitesimal fraction of facebook users try to delete anything.
- Facebook makes money from your data by showing you ads. If you stop using facebook, you stop generating any revenue and your data becomes a liability, not an asset.
> Mark Zuckerberg testified in front of Congress and stated they do
I just want to point out that "testified in-front of Congress" seems to have no relation to the truth of any statement. I don't recall the last time (if ever) there have been legal ramifications for lying in front of Congress.
Zuckerberg also testified that shadow profiles don't exist and that users can always remove their personal information from Facebook (they do exist, and you can't). James Clapper testified that the NSA doesn't "wittingly" surveil hundreds of millions of Americans (they do, very "wittingly").
I agree that it is technically illegal, but if the most recent example of a conviction (that you can think of) was almost 30 years ago that tells you that it's effectively unenforced.
Does it? One conviction in 30 years is also consistent with the conclusion that not that many people get invited to give evidence to Congress, and the people that do tell the truth (or at least don't tell provable lies).
He lied to the FBI after a plea bargain which involved him telling them the truth. From memory, this all happened before his Congress testimony but it's definitely not related.
It is illegal to lie to Congress (whether or not you're under oath), but I can think of very few examples where there were real ramifications for it.
Cohen pleaded guilty to several charges [1], including some that alleged that he lied in a written statement sent to the Senate Select Committee on Intelligence and House of Representatives Permanent Select Committee on Intelligence on 28 August 2017, and then repeated those lies in his testimony to the Senate committee on 25 October 2017.
> He lied to the FBI after a plea bargain which involved him telling them the truth.
Sure you're not thinking of Manafort? I don't think Cohen had a plea deal, and I don't think he was charged with lying to the FBI. Manafort and Flynn did, and Manafort, I think, was the one who then broke the deal. There are a lot of crimes to keep track of.
I guess it depends on what you mean by real ramifications, since the sentence is being served concurrently, but Cohen is serving two months for lying to Congress, a charge he pleaded guilty to.
Yep, that looks right from his Wikipedia article. One count of making false statements to a congressional committee. It’s not immediately obvious to me when he was charged/convicted of that though. https://en.m.wikipedia.org/wiki/Michael_Cohen_(lawyer)
Was the nature of deletion explicitly defined as "deleted records are removed from all databases, data warehouses included"?
They technically wouldn't be lying if they only deleted the record from application systems but retained it in historical data by simply setting a bit flag (e.g. "IsDeleted") to 1.
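Something like this toy sketch, purely for illustration (the schema and column names are invented; nobody outside FB knows what their storage actually looks like):

```python
import sqlite3

# Invented schema for illustration; this is not Facebook's real storage.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT, is_deleted INTEGER DEFAULT 0)")
db.execute("INSERT INTO posts (body) VALUES ('something I regret posting')")

# "Deleting" the post just flips the flag; the row itself survives.
db.execute("UPDATE posts SET is_deleted = 1 WHERE id = 1")

# What the application (and the user) sees:
print(db.execute("SELECT body FROM posts WHERE is_deleted = 0").fetchall())  # []

# What a data-warehouse copy would still see:
print(db.execute("SELECT body, is_deleted FROM posts").fetchall())  # [('something I regret posting', 1)]
```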
The question about what motive they have comes down to whether or not the types of posts a person deletes indicate something meaningful about their personality. Ad analytics are ultimately seeking to understand the kind of person you are, after all.
I read the audit, and here's something I noticed (other than that it's 8 years old):
"In determining appropriate retention periods for personal information, data controllers can have due regard to any statutory obligations to retain data. However, if the purpose for which the information was obtained has ceased and the personal information is no longer required for that purpose, the data must be deleted or disposed of in a secure manner. Full and irrevocable anonymisation would achieve the same objective." (Page 69, under 3.4 Retention)
So basically, as long as it's anonymised, the data can be retained.
But “if the purpose for which the information was obtained” was to enable better ad targeting, then “data retention” is still “appropriate”, eh?
Under this reading, is there even an obligation to anonymize? And how anonymous does anonymous need to be? Does a simple base64 encoding count as “anonymized”?
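For what it's worth, base64 can't plausibly count: it's an encoding, not anonymisation, and anyone can reverse it with no key at all. A tiny illustration:

```python
import base64

email = b"jane.doe@example.com"            # stand-in for some piece of PII
encoded = base64.b64encode(email)          # looks scrambled to a human...
print(base64.b64decode(encoded) == email)  # ...but decodes back perfectly: True
```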
I had that thought originally too - however the keywords here are "statutory obligation" (basically, legal reasons.) So if the FBI asked them to retain the data, they can do so as long as the FBI needs it.
And the matter of anonymisation is a great question - "irrevocable" anonymisation certainly had a different meaning back in 2011, when swapping a name for a guid would do the job. Nowadays it would require at least deleting all relationships as well, since social network analysis is much more advanced these days (especially at FB, of all places). It wouldn't be impossible to derive the identity of an anonymous account/record from the undeleted data associated with it. And since FB's "ghost profiles" are something we know exist, I think it's safe to assume those relationships are being maintained somehow.
<cynical interpretation> The data can be retained so long as Facebook is in the business of targeted advertising, and that's "the purpose for which the information was obtained".
Also, I'm now looking forward to Zuck's next apology for "a breach of trust" and his explanation of how they "failed to live up to their own standards" when it becomes publicly undeniable that their "Full and irrevocable anonymisation" of data they've claimed is deleted is as flawed as all the other attempts we've seen of doing that.
Facebook don't need to keep your past posts; their algorithms have already run over the data. The real question is whether you can reset the "personas" they assign to you...
#3: it has a submodel which detects that you are feeding it false data.
Given that people change over time, #2 seems unlikely. The question to me is "how advanced is #3?" Another thing I wonder about is how these relate to each other. Are they at odds?
#1 is a well-established technique called "online learning", and I'd bet money that it's how many of these "industrial scale" ML algorithms are trained.
#2 makes no sense from a business perspective.
#3 is also well-established. This is how Google's Captchas work, for example.
> "Facebook makes money from your data by showing you ads. If you stop using facebook, you stop generating any revenue and your data becomes a liability, not an asset."
If my friends or acquaintances continue to use facebook, then information about me continues to be useful to facebook, since it can be used to complete their model of what sort of people my friends are - the kind who would associate with somebody like me.
I wouldn't trust Zuckerberg to pour a glass of water on a burning orphan, let alone delete data for real. Nor would I trust any politician to know anything about the subject.
Maybe not lie per se but they might have multiple data stores, one for the "social media" side of things and an archive for machine learning, one for targeted ads, etc. So they could legitimately say "we delete your data [from the perspective of social media]" while still having an archive for their other business units.
It wouldn't be the first time Facebook (or any other business that profits from people's data) has been deceptive while treading a careful line between honesty and lies.
edit: I should add that if they were keeping archived records in a separate business unit, there is the possibility that they keep the original or even all edits. So scrambling your data might not do much aside from confusing those who have you in their feed.
The UI doesn't say "delete", it says "remove". Remove could mean "hide from view" or it could mean "delete the underlying database entry". It's a bit of a grey area which, it could be argued, users have consented to by using Facebook.
Granted, GDPR is supposed to catch businesses that pull those kinds of stunts, but you have to remember that Facebook do already break GDPR in a number of public ways too. So it's pretty clear they have a relatively open interpretation of the regulations (and an army of lawyers who are confident they can proceed in such a way). Or it might just be the case that even the worst fine the GDPR can issue is worth the risk, given the financial benefit Facebook gets from retaining data.
>- Facebook makes money from your data by showing you ads. If you stop using facebook, you stop generating any revenue and your data becomes a liability, not an asset.
This part at least is incorrect. Your data will always be an asset, and FB makes money not just by showing you ads but by selling your data.
And they obviously give away some data for free, especially pertaining to users who click through OAuth consent screens as in the Cambridge Analytica case: https://developers.facebook.com/docs/graph-api
What are some examples of transactions where they exchanged data for money?
Do you know what the legal status is of models derived from someone else's data? E.g., can I train an AI on every Marvel movie, then use that to produce other movies?
That's not a great analogy because I would be violating other people's IP, whereas FB generally has rights to the data they collect.
Facebook cares about “your” data, but that doesn’t really mean your painstakingly edited review of the new avengers movie. It’s more what you spend time looking at, what you click on, etc.
They did not sell information to Cambridge Analytica. As stated in the page you linked, the data was scraped using a 3rd party application requiring user opt-in. Now, there was a bug that allowed the app developer to also scrape data from the friends of the actual users but that has been patched and the developer's license has been revoked as the entire ordeal was against FB's licensing agreement.
I am not a fan of Facebook but this entire thread is just filled with misinformation.
I work at a company you've heard of, and you've probably (P > .5) used at least one of our products. We're international (including EU). I personally wrote our user data deletion logic for GDPR compliance.
We delete all of your PII. It doesn't matter if you're from the EU or not, because it's too expensive to figure it out and too risky because you're going to miss some weird edge case - Legal doesn't have much of a sense of humor when it comes to wiggle room.
My experience has been that larger organizations consistently spend more effort (relative to size) to genuinely comply with privacy regulations than smaller ones. The risk:reward ratio for deliberately ignoring or subverting privacy regulations is insanely bad. There are too many surfaces along which that would leak out, and the gains would be pretty marginal. Pretty much anyone who works at a large tech company can and will confirm that this is the case (see Jeff Kaufman's posts on this thread). There is no conspiracy between the six-digit number of engineers who work at these companies to keep quiet about it.
>There is no conspiracy between the six-digit number of engineers who work at these companies to keep quiet about it.
I wouldn't say "conspiracy", but it's been mum's-the-word about things like shadow profiles. Perhaps it comes down to too much kool-aid, but the claim that engineers would speak up in such cases has been proven wrong time and again.
Take the Snowden revelations, as an example: How many years were the programs in service before Snowden went public? How long have we known about shadow profiles and no one from Facebook has come forward to say, "yes, this is what they're doing and it's wrong"?
Relying on people to do the good that should be done by the organisations doesn't take into consideration that those engineers face severe penalties for "going public" about such things - namely because whistle-blower laws do not supersede such things as NDAs.
Doesn't Facebook still have a zoo of MySQL schemas in production for each table? While I'd believe the product works as intended in some fraction of production, I think it's reasonable to suspect some or all of the system is broken for a non-trivial set of users and/or groups.
And then, what about all the data copied into Hive? Copied to 3rd parties? Does Facebook go and delete data that 3rd parties took?
You'll no doubt soon be scolded for mentioning that quote, but it will continue to be relevant for as long as Zuckerberg's actions continue to display that mentality. We'll soon have somebody telling us he said that years ago and people change, but the only thing that changed with Zuck is that he became a bit more diplomatic and guarded with his language. Actions speak louder than words, and Zuckerberg's actions speak clearly.
He's the same old dirt bag he ever was. Here are some fresh stories to back that up:
Yeah. If there was some evidence that Zuckerberg's overall approach had changed, I too would argue that continually resurfacing an old quote until the end of time is unfair and unproductive.
All of Zuckerberg's and Facebook's actions, however, continue to suggest that his approach hasn't changed.
Yeah, for a funny example, we seem to have (mostly) stopped reminding Google of their old "don't be evil" slogan, since it became obvious their overall approach had changed ;-)
I used to be one of the scoldiers saying things like "oh all young nerds say dumb stuff when bragging to their friends"
Honestly the vibe's still there if you pay him/fb enough attention. I'm working on extracting myself completely from them (Still got a few group messages on upcoming events to clear)
Not sure why you are downvoted - it's always good to remember. In many ways that's what he kept saying until last year, even to Congress - just with nicer, politically more appropriate words.
It's always good to remember this and we should keep repeating it.
This is the most profound insight we have into the mind of Mark Zuckerberg and what he thinks of his user base.
Stop listening to the PR and examine his actions since then. You can see how that comment was not merely a youthful indiscretion, but his entire business.
Zuckerberg continues to display this exact same mentality, only now he has billions of dollars in resources.
One of the first things I learned in the IT industry: do not delete any data, no matter what the circumstances! And I believe internet companies whose life and soul is data would not want to delete it.
> - Facebook makes money from your data by showing you ads. If you stop using facebook, you stop generating any revenue and your data becomes a liability, not an asset.
> Facebook publicly claims it does
Hahahahaha YMMD!
> Mark Zuckerberg testified in front of Congress and stated they do
He would tell us anything that helps him, as he is an opportunist.
> Multiple government regulators have specifically checked that it does in their privacy audits
You're talking about the governments that have been secretly spying on us and lying about it for decades? Ouch.
There is no legal risk for those who observe if you abide by the laws.
> An infinitesimal fraction of facebook users try to delete anything
What does that tell you about the majority of ppl?
> Facebook makes money from your data by showing you ads. If you stop using facebook, you stop generating any revenue and your data becomes a liability, not an asset.
That's just what you think you know. What about psychological profiling, law enforcement etc.?
Also, did you hear about the "shadow profiles" of users who don't even have an account? The moment you're surfing into their net (as we know, even through embedded Like buttons etc.) you'll be milked like a dairy cow.
> What reason would there be to lie about it?
What reason is there not to be honest about it? The recent events have shown very clearly that they don't have to fear much.
> They might be lying and actively conspiring to not delete it, but they'd have to have a very good reason to take on that much legal risk.
Or they can just claim, again, that they accidentally forgot to delete copies of these posts from their backups. It just completely slipped their mind.
Mark Zuckerberg is a liar who has demonstrated that he will do and say whatever is in his and Facebook's best interests. If you believe anything other than that, you're a fool.
Hmm ... I wonder if they have an edit limit where old edits get erased if you keep editing - sort of like that trick someone came up with years ago of pinging the credit bureau so many times that eventually your credit looked better, since old items were erased or not considered for your credit score ... or something.
It's called "bumpage" or "B*" on forums. People were leery of using the word back in the day because they thought the bureaus might catch on. It doesn't work anymore.
This is the case with many other online platforms (such as reddit), in that the original content of a post is still reachable even if it was deleted, so a common method is to "scrub" your posts by editing them and then deleting them.
However at Facebook's size, and given they're known for 0 privacy, they likely track all changes anyway.
Reddit doesn't actually store the history of your edits, only the fact that you edited (which is why, when scrubbing your reddit history, people recommend editing posts first - then your content is gone except for, say, a backup). Facebook, however, lets you see the history of a post, so if someone edits it you can see what they originally posted.
I don't think editing a Facebook post first would have the desired effect.
I wouldn't be surprised if it's not being scrubbed from FB's database, but that's not the author's goal.
The README states the intent is to clean up publicly facing content. It's meant to tidy up your internet presence for the general viewing public, not escape the grip of FB's data vacuum.
I'm a big fan of Selenium, and this is a great use of it: scripting a boring, repetitive browser task that would take a large amount of time & effort to do manually.
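For anyone curious what that kind of automation looks like, here's a rough sketch of the pattern (the URL and CSS selectors are invented placeholders, not Facebook's real markup, which changes constantly - and note the ToS concerns raised elsewhere in the thread):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Sketch of the general pattern only; the page URL and selectors below are
# placeholders, not Facebook's actual markup.
driver = webdriver.Firefox()
driver.get("https://example.com/your-activity-log")  # hypothetical activity-log page

# Walk every "edit" control the page exposes and overwrite the post text.
for button in driver.find_elements(By.CSS_SELECTOR, "button.edit-post"):
    button.click()
    editor = driver.find_element(By.CSS_SELECTOR, "textarea.post-editor")
    editor.clear()
    editor.send_keys("lorem ipsum")  # replacement text
    driver.find_element(By.CSS_SELECTOR, "button.save-edit").click()

driver.quit()
```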
I'm pretty sure they use some sort of event store for their backend in which case all versions/changes/updates/deletes are stored as a separate revision alongside the original content.
All the big companies are doing immutable, append-only event logging and probably have no mechanism to expunge this data. All because storage is cheap and they need to hold on to everything for testing or whatever future need that might arise.
> All the big companies are doing immutable, append-only event logging and probably have no mechanism to expunge this data. All because storage is cheap and they need to hold on to everything for testing or whatever future need that might arise.
I'm very confident that the data is fully removed, because properly deleting data within NN days is treated as very serious internally. But I don't know the details of how it's done for cold storage.
(I would love to see someone subpoena something deleted, say, 1y ago and write up whether it was produced.)
I wonder how that works with Facebook's Blu-Ray cold storage [1]. Are optical disks treated like paper documents or is it still considered electronic storage?
Ah, that's interesting. If they store customer data on Blu-ray discs, I assume Facebook took the necessary steps to delete/destroy records according to GDPR requirements when customers request deletion....
It's currently possible to view previous versions of posts: just click "# edits" in the bottom-right, by the "# comments" button (neither of which looks like a link).
Tape backup is re-writable though isn't it? I had inherited one in the 90s and could re-write, granted it was painfully slow. Or are you saying that Blu-ray is re-writable too? The first iteration I used a few years ago could only write once.
Maybe disks are cheaper, even if you can't reuse them. As an added benefit, you can also restore info from the more distant past if you recognize errors too late.
I believe there is zero doubt that somewhere the text has been saved. Not only the original text, but also various results of analysis. Since they store edit history, too, replacing it with other text would simply be an indication that you’re trying to cover it up.
I would guess that this has been demonstrated in court.
> Since they store edit history, too, replacing it with other text would simply be an indication that you’re trying to cover it up.
It's text you wrote, why would there be any problem "covering up" your own text?
Note that I'm not arguing you should have a right to use Selenium to scrape their site and replace the text; but if you went through every post in a non-automated fashion and changed the text, the purpose, from the perspective of "covering up" the text, would be the same. Facebook almost certainly has an interest in keeping 100M users from scraping with Selenium, but that's a separate thing entirely.
If you’re trying to hide the content from someone who has access to the edit history, editing it is ineffective and simply shows that you wish for it to be hidden. When would this be an issue? In court.
It’s true that editing could serve to hide it from other users, but privacy settings could do that also.
This is information that you originally put out, wrote down on Facebook, so it's not that you're hiding something you didn't want anyone to know originally. If it was libelous or something so you wanted to delete it after the fact, it's still likely possible to get it back -- even if Facebook deleted the actual message in the database, there's likely a number of methods -- server logs, db logs, cached resources, db backups, or even a simple screenshot -- that would likely be sufficient proof.
Plenty of people post things that they shouldn't have and regret. It could be plainly incriminating, like when people ill-advisedly make a post essentially admitting to a crime, threatening someone, or discussing events related to a lawsuit in public. It's possible deleting or obscuring such a post could even be construed to be destruction of evidence.
Point being, the ability to edit and view edit history changes very little with regard to the courts. Instead of needing to present Facebook with a subpoena for information from e.g. logs, you're able to simply view history. It doesn't change anything whether you can delete the data or "delete" the data, it just changes how accessible it is.
I think the benefit of this is that other people can't easily see what you posted. For example recruiters for jobs, journalists or activists trying to publicly doxx or embarrass you, stalkers, people with a grudge against you, etc.
If FB has this data squirrelled away somewhere in cold storage then in order for anyone to harm you with it, FB has to admit to keeping it (which would be a scandal) and it has to get out into the wild. That is a reduction in risk compared to someone just looking at your profile.
There was a post a long time ago on HN on how to taint your FB data over time so that it's difficult to tell where your real data stopped and your fake data began. I can't find it now, but part of the advice was: don't replace your real data with gibberish, but rather with non-gibberish (public domain texts, for example, or better yet AI-generated text). And don't do it all at once, but spread it out over a year or so.
With Facebook's distributed data stores, I'd be really surprised if any deletion was immediate.
It's more likely that they tombstone a post. That's where they store a "this post has been deleted" flag in their database. Using this style of deletion, the post and the tombstone would be eventually removed from the data store when it is periodically compacted.
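A toy version of the tombstone pattern (this is how log-structured stores behave in general; whether FB's internal systems do exactly this is speculation):

```python
# Toy append-only store with tombstones; illustrative only.
log = []  # list of (key, value) pairs; value None marks a tombstone

def put(key, value):
    log.append((key, value))

def delete(key):
    log.append((key, None))  # nothing is removed, we just append a tombstone

def get(key):
    for k, v in reversed(log):  # latest entry for the key wins
        if k == key:
            return v
    return None

def compact():
    # Periodic compaction: keep only the newest value per key, drop tombstones.
    global log
    latest = {}
    for k, v in log:
        latest[k] = v
    log = [(k, v) for k, v in latest.items() if v is not None]

put("post:1", "hello world")
delete("post:1")
print(get("post:1"))  # None -- invisible to readers, but the bytes still sit in the log
compact()
print(log)            # [] -- only after compaction is the data actually gone
```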
I had deleted all my posts and photos. Even my profile photos. Then I disabled my account. After a year I enabled it back. Facebook used to prompt me to add some photos - both before disabling and after reenabling.
The eerie part? It used to show faded thumbnails of my photos that I had deleted more than a year ago in Photos section when prompting me to add my pics.
You'll probably want to scramble multiple times, each time with random, realistic-looking sentences (better yet, just take about 100 preprogrammed sentences and permute a subset of them), making sure there's a random spread in the total number of edits for each post.
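A rough sketch of what generating that replacement text could look like (the sentence pool and edit counts here are just placeholders):

```python
import random

# Placeholder pool; in practice you'd want ~100 innocuous sentences.
SENTENCES = [
    "Had a great time at the lake this weekend.",
    "Finally finished that book everyone keeps recommending.",
    "Thinking about repainting the kitchen.",
    "Anyone have a good banana bread recipe?",
]

def fake_post(max_sentences=3):
    # Permute a random subset of the pool into a plausible-looking post.
    k = random.randint(1, min(max_sentences, len(SENTENCES)))
    return " ".join(random.sample(SENTENCES, k))

def edit_plan(post_ids, min_edits=1, max_edits=4):
    # Give each post a randomised number of successive rewrites.
    return {pid: [fake_post() for _ in range(random.randint(min_edits, max_edits))]
            for pid in post_ids}

print(edit_plan(["post-1", "post-2"]))
```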
If you use Reddit and want to delete your comment/post history I would recommend Shreddit[0] for the reason you mentioned[1]. It's important to do this frequently because there are quite a few sites out there that cache Reddit content periodically.
Something like this is probably against some terms of use that you implicitly agree to as a Facebook user, and I wouldn’t be surprised if they would auto detect it and ban you.
> I'd be slightly more confident (slightly) that editing the post might cause the core data in the db to be updated, however.
FWIW, I believe this is still true of reddit. Deleting a post doesn't actually delete it. It just no longer shows on the webpage. To really remove it from the database, you need to edit the post/comment, replace the contents with something simple (e.g., the character "a"), save it, then delete it.
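If you'd rather not rely on a third-party tool, the same edit-then-delete loop is a few lines with PRAW (credentials below are placeholders; you'd register your own script app - this is my reading of the approach, not an official recommendation):

```python
import praw

# Placeholder credentials; create a "script" app in reddit's preferences for real ones.
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    username="YOUR_USERNAME",
    password="YOUR_PASSWORD",
    user_agent="personal history scrubber",
)

for comment in reddit.user.me().comments.new(limit=None):
    comment.edit("a")   # overwrite the stored body first...
    comment.delete()    # ...then delete, so only the placeholder remains
```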
Post-modern privacy models include those developed by University of Chicago law professor Richard Hasen, who argued the Obama administration could regulate Facebook under the guise of protecting privacy. (He was later fired from his posts.) Other authors have argued that allowing Facebook users to choose what information they share online would weaken privacy, and that if Facebook were able to take over the content management systems of websites and allow people to censor content, that was something the First Amendment should prohibit.
This is pretty good, perhaps even for auto-generated absurdities:
Seed with: "Why does my dog speak poor English?"
> It's possible your dog has a rare genetic condition called dyslexia, which puts him or her at risk for learning difficulties. If the problem has progressed beyond a certain point—say, if your dog has been in the hospital for over a year—and when you go to bring him out of the hospital—not for exercise or exercise training—he may not speak English in the proper way. Your veterinarian will work on the problem while you are in the hospital.
What should I do with a dog with dyslexia?
It's important to be vigilant and familiar with what is going on with your dog in hospital. If you find that your dog may be learning, take him home and have him trained as soon as possible. It may take a while for your dog to learn to associate the letters of English with words.
I might have to use this to reply to unclear tasks from now on, great find.
There are times when we need to ask ourselves: are our interests really served by having this ability, when so many of us are now doing so much online anyway?
It's probably safe to say that the idea of a user on a platform like Facebook trying to obfuscate their history on the platform is a short jog from an individual trying to evade government-backed hacking aimed at them.
Facebook has many years of pretty smart work behind it, and it's probably safe to say tens of millions of dollars have gone into meetings about specifically this (re: the costs associated with retaining user data).
What will be the consequences though? I'm sure they can easily discover that the user is trying to overwrite his history, batch edits are easy to spot - but what will they do about it? Ban him & delete his data? Stop him from editing his own posts? Just throw away all changes that don't pass the smell test?
I would speculate that they have built their systems in such a way that it doesn't matter what any user does: any edit to an already existing post probably just appends a new revision.