> For a website that has 10,000 visitors per month, in one year you could save about 4.5 kilograms of CO2 emissions just by replacing Google analytics with Plausible.
What the hell? How do you calculate this figure? That's roughly equivalent to the CO2 created by driving a gas-burning car 10 feet to the data center to ferry information about each request (figuring a car emits about 1.2 pounds of CO2 per mile traveled). That's an astounding claim, and there's no effort to even explain the idea behind it.
This kind of penny-wise, pound-foolish approach seems like a waste of time at best; at worst, it lulls voters into complacency and distracts from the fact that our politicians still aren't doing anywhere close to enough to address carbon emissions. It's just PR and corporate greenwashing.
Like everything else, the best approach is to use a mix of regulation, renewable subsidies, and a carbon tax to make fossil fuels cost-prohibitive compared to renewables, and the market will eliminate them on its own. The wider the cost difference becomes, the faster renewables will displace carbon energy. We're getting there slowly, as wind and solar are now slightly cheaper than carbon fuels, but we should definitely be helping it along a lot faster if we're serious about avoiding the worst-case climate scenarios.
So far, it seems like we aren't serious about it and our leadership is sleepwalking us towards increasing catastrophe.
44.3 kB per visit * 120,000 visits per year * 1.8 kWh per GB * 475 g CO2 per kWh = 4.5 kg CO2 per year, once you convert kB to GB and g to kg.
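For anyone who wants to check the arithmetic, here's a minimal sketch in Python (the 44.3 kB, 1.8 kWh/GB and 475 g CO2/kWh figures are the ones quoted in this thread, not measurements of my own):

```python
# Reproduce the ~4.5 kg/year figure from the numbers quoted above.
kb_per_visit = 44.3            # claimed payload saved per visit (kB)
visits_per_year = 10_000 * 12  # 10,000 visitors per month
kwh_per_gb = 1.8               # websitecarbon.com average energy intensity
g_co2_per_kwh = 475            # grid carbon intensity used in the claim

gb_per_year = kb_per_visit * visits_per_year / 1e6                  # kB -> GB
kg_co2_per_year = gb_per_year * kwh_per_gb * g_co2_per_kwh / 1000   # g -> kg

print(f"{kg_co2_per_year:.2f} kg CO2 per year")  # ~4.55
```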
"These numbers are all estimates but you can imagine if millions of website owners and Google Analytics users end up making a similar reduction in their website size too. The total reduction in the carbon footprint of the web would be immense."
"Energy is used at the data centre, telecoms networks and by the end user’s computer or mobile device. Of course, this varies for every website and every visitor and so we use an average figure. The figures used are for 2017 from the report On Global Electricity Usage of Communication Technology: Trends to 2030 by Anders Andrae and Tomas Edler, adjusted to remove manufacturing energy as this is not relevant to this calculator. We then divide the total amount of energy used by the total annual data transfer over the web as reported in the Nature article, How to stop data centres gobbling up the world’s electricity. This gives us a figure of 1.8 kWh/GB."
I gave the paper [1] a quick scan, and it seems to be counting the electricity usage of all communications devices on top of data centers.
So in power per GB transferred, it's counting all the power used by people's 60" internet-connected TV displays.
Which is absurd to include if you're trying to measure the marginal effect of additional data: more data doesn't increase your screen's power consumption.
An accurate claim for Plausible would have to be based mainly on marginal increases of power by datacenter and communications networks.
It's hard to say. Obviously without any data at all none of those screens would be on.
I always find the discussion of marginal increases of energy tricky. If I buy a plane ticket on a half-empty flight, obviously that flight was going to take off anyway, so the marginal increase of my weight plus my luggage is fairly negligible in comparison, so I'm only to "blame" for a fraction of the fuel spent, right? But who else is there to blame except the passengers, without whom there would be (eventually) no flights? So shouldn't we all divide the blame evenly?
If we keep it simple, there are two kinds of marginal increases.
The first type is when a marginal increase can lead to a "new unit", like the planes you refer to -- or the servers used by data centers. If a plane fits 100 people, then (simplifying) 1/100 of the time your trip will result in a new plane being used, so it makes sense to divide the plane's total resources among the passengers -- not just count the marginal fuel your weight burned.
But the second type never results in a "new unit". In this scenario, using more resource-hungry analytics will never push someone to buy a second cell phone to spread the load. So counting anything beyond the direct marginal increase in CPU energy use is disingenuous.
So in the case of analytics software, their data center server/power resources fall into the first type. But the consumer device resources fall into the second type.
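A toy sketch of the two attribution models, just to make the distinction concrete (all numbers are made up for illustration):

```python
# Type 1: capacity-driven resources (planes, analytics servers). Extra demand
# eventually provisions a new unit, so amortise the whole unit over its users.
flight_fuel_kg = 10_000        # hypothetical fuel burn for one flight
seats = 100
per_passenger_kg = flight_fuel_kg / seats   # 100 kg attributed to each passenger

# Type 2: devices that exist regardless (the visitor's phone, TV, router).
# A heavier analytics script never causes a second phone to be bought, so only
# the marginal energy of processing the extra bytes should be attributed.
phone_baseline_watts = 2.0           # hypothetical draw while browsing anyway
extra_joules_per_pageview = 0.05     # hypothetical marginal cost of the script

print(per_passenger_kg, extra_joules_per_pageview)
```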
So in this case I don't think there's anything tricky at all about it.
What if you are switching from a datacenter that's carbon neutral to one that isn't?
Also, as a note, the Google Analytics JS is heavily cached and thus doesn't have to travel as far, or at all. And Google has on-ramps to their carbon-neutral infrastructure everywhere, so there's also that.
> "These numbers are all estimates but you can imagine if millions of website owners and Google Analytics users end up making a similar reduction in their website size too. The total reduction in the carbon footprint of the web would be immense."
If we removed 40 kB of CDN content per visit, then the 1.8 kWh/GB would become 2.8 kWh/GB.
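If I'm reading this right, the point is that 1.8 kWh/GB is (mostly fixed) total energy divided by total transfer, so shrinking the transfer mainly inflates the ratio rather than saving energy. A rough sketch, assuming a per-visit transfer of about 112 kB, which is simply the value that makes 1.8 become 2.8:

```python
# Assumption: the energy attributable to a visit is roughly fixed (baseline
# networks and devices), so removing bytes shrinks the denominator of the
# kWh/GB average more than it shrinks the energy itself.
kwh_per_gb = 1.8
page_kb = 112.0                                    # assumed, not from the article
energy_per_visit_kwh = kwh_per_gb * page_kb / 1e6  # held fixed

new_page_kb = page_kb - 40.0                       # drop 40 kB of CDN content
new_intensity = energy_per_visit_kwh / (new_page_kb / 1e6)
print(f"{new_intensity:.1f} kWh/GB")               # ~2.8
```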
I mean 10k requests/second seems quite achievable for a single server. And I'd totally believe that 12 seconds of compute (per year!) wouldn't use much energy. In reality those requests would be intermixed with millions more for other sites and the servers would be running continuously, but the resources attributable to an individual site should be the same.
I mean, let's think about this a bit. The US generated about 4.13 trillion kilowatt-hours in 2019, and that generation emitted about 1.72 billion metric tons of CO2, or about 0.92 pounds of CO2 per kWh (https://www.eia.gov/tools/faqs/faq.php). Let's assume Google gets their power at that rate (which is unfair to Google because they claim to use 100% renewable energy, but I don't want to get into that).
A typical server rack might draw anywhere from maybe 5 to 50 kW. Let's say Google has really beefy ones that draw 100 kW. That's about 92 pounds of CO2 per hour. For the 12 seconds you mentioned, that's still only about 0.14 kilograms of CO2. And the claim is that they're BETTER by 4.5 kilograms.
They've gotta be talking about some other expense than the server. But what sort of expense? The cost to build a server? Something about general maintenance of the Internet? ISPs between clients and the server?
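For what it's worth, here's the back-of-the-envelope version of the estimate above, so the gap is explicit (the 100 kW rack, 12 seconds of compute and 0.92 lb CO2/kWh are the assumptions from the parent comments):

```python
# Server-side CO2 for ~12 seconds of compute on a generously sized 100 kW rack,
# at the US grid average of ~0.92 lb CO2 per kWh.
rack_kw = 100
seconds_of_compute = 12
lb_co2_per_kwh = 0.92
kg_per_lb = 0.4536

kwh = rack_kw * seconds_of_compute / 3600      # ~0.33 kWh
kg_co2 = kwh * lb_co2_per_kwh * kg_per_lb      # ~0.14 kg

print(f"{kg_co2:.2f} kg CO2 vs the claimed 4.5 kg saving")
```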
Plausible is great, and I see the need for it, but I've always enjoyed using AWStats instead, as there is no need to add third-party code to my site. It all happens in the background, and it paints a much better picture of your stats since users can't block the gathering of stats with an ad blocker.
Interesting you say that. There's no reason Plausible could not be used like AWStats. Parsing logs is just a different ingestion mechanism, and we already provide self-hosting via Docker. In principle it wouldn't be too difficult to drain your logs into a Plausible instance, or just run it on the same host alongside your web server.
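A rough sketch of what that log-draining idea could look like; the regex targets the standard nginx/Apache "combined" access-log format, while the endpoint URL and event fields are my assumptions from memory of Plausible's Events API, so check the docs before relying on them:

```python
import json
import re
import urllib.request

# Matches the common "combined" access log format (nginx/Apache default).
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)

def forward(line, domain="example.com", endpoint="https://plausible.example/api/event"):
    """Parse one access-log line and forward it as a pageview event."""
    m = LINE.match(line)
    if not m or m.group("method") != "GET":
        return  # skip unparsable lines and non-GET requests
    body = json.dumps({
        "name": "pageview",
        "domain": domain,
        "url": f"https://{domain}{m.group('path')}",
        "referrer": m.group("referrer"),
    }).encode()
    req = urllib.request.Request(endpoint, data=body, method="POST", headers={
        "Content-Type": "application/json",
        "User-Agent": m.group("ua"),       # visitor's UA, not this script's
        "X-Forwarded-For": m.group("ip"),  # so the visit is attributed correctly
    })
    urllib.request.urlopen(req)
```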
That's why we haven't put too much effort into log analysis: the stats we got from AWStats were mostly bot traffic, with no good way to get rid of them.
Have you run AWStats and Plausible side-by-side? Do you not have ~90% bots in your logs?
JS won't ever give you an accurate number (there's a growing army of people blocking JS and trackers). Logs will give you an accurate count, though you may not know whether it's 100% human.
How would you compare it to GoAccess [0]? I've only ever used GoAccess, but AWStats seems to be the older, more mature tool... so I'd be curious about a comparison.
I'm a Plausible customer. They're doing pretty good work, just the basics really, and that's enough. I wish they reduced their prices instead of giving 5% to charity; if I want to give to charity, I'll do it myself, thank you. I feel a little bit like it's a lot of feel-good marketing and barebones software I'm paying for, and I might churn one of these days. Maybe I'm not the only one. I have a very small number of visitors to monitor and your first plan is quite far from free. It's affordable, but not for everyone.
I feel like the title should be Taking on Google Analytics. Everyone associates Google with search, not so much website analytics. This title makes me think there’s someone trying to unseat their position in search.
Google Analytics is the wrong end of Google. Sure, you can get a few customers now and then who love privacy and will ditch GA.
But for most, GA is how Google Ads knows how to calculate conversions. People who want to use Google Ads (which are everywhere) have to use GA. If you're not using Google Ads, I don't think Google cares much about your site anyway.
This is probably a better overview of what makes Plausible a privacy-focused tool: open source, can be self-hosted, no connection to adtech, minimal data collection, no cookies, no persistent identifiers, no personal data, no cross-site/device tracking, etc.
OK, so not quite privacy-preserving in the cryptographic sense, but more of a matter of degree. Plausible Analytics collects less data than Google Analytics, but not zero.
Great to see Plausible on HackerNews. It's one of the few pieces of software (Stripe is one, Starling another) that I deeply enjoy using. I get a good feeling when I open it up. I don't really have the UX vocabulary to explain it better than that unfortunately.
I’ve been using it for a while, but it feels pretty light on the analytics so far. It would be nice to see performance stats per page, if that's possible in a privacy-friendly way.
You mean like a page drilldown to see stats of the individual page? You can do that already. On our live demo, click on any page in the Top Pages report and the dashboard will be segmented to only show the traffic that visited that particular page.
- For the most part, the concern is what is done with the data.
- Data can be anonymized. (Although this is often hard to verify)
- You can hide the data in the client. For example, imagine you want to know how many users use feature X. You can send an analytics report with a 90% chance of a random value and a 10% chance of sending the true boolean. You can't tell whether any specific user has used the feature (because most likely it's a random value), but you can get a pretty good estimate of what portion of your users use the feature (see the sketch below).
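A tiny sketch of that randomized-response idea, using the 90/10 split from the comment above:

```python
import random

P_TRUTH = 0.10  # send the real value 10% of the time, noise otherwise

def report(uses_feature: bool) -> bool:
    """What the client sends: mostly a coin flip, occasionally the truth."""
    if random.random() < P_TRUTH:
        return uses_feature
    return random.random() < 0.5

def estimate(reports):
    """Server-side estimate of the true usage rate.
    observed = P_TRUTH * true_rate + (1 - P_TRUTH) * 0.5, so invert that."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - P_TRUTH) * 0.5) / P_TRUTH

# Example: 100,000 users, 30% of whom actually use the feature.
reports = [report(random.random() < 0.3) for _ in range(100_000)]
print(estimate(reports))  # ~0.3, yet no individual report can be trusted
```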
My understanding is that Plausible is focused on the use of anonymization.
It is better privacy. It's one thing for a single entity to know what everyone is doing on the web, all the websites that you visit and what you do on them, and another thing for an entity to know only that you visited their own website, without knowing what other websites you visit and what you do on them.
Later edit: The best solution is still self-hosting, as hosted Plausible is still a third-party entity that centralizes data (even though they probably don't use or share this data).
But... the website that I'm visiting has no incentive to care about my privacy. I mean, yes, they should, but what's in it for them?
I think this go-to-market approach of "we are better because Google is evil" is just flawed.
Well, I'm an indie dev and I do care. I find advertising and cookie notices really annoying, and I can afford the forty-something euros a year it costs me. I don't need to track everything.
They do care. The data can be collected anonymously, without being linked directly to your person. They can use such data to improve your experience without affecting you personally in any way.
For me, the default uBlock Origin settings do block Plausible tracking, even on a website that used its own domain name to serve the script, but I assume that was because the subdomain was "analytics.site.com".
Seems a bit like Plausible only pays lip service to some of these ideas. Merely 5 months ago the co-founder touted here on HN how they are "big fans of open source so wanted as permissive [a] licence as possible" [0], then promptly went and changed the license to a strong copyleft one (AGPL) a few weeks later!
They might well be the next Elastic/CockroachDB/MongoDB/etc. Or better yet, they might do the classic bait-and-switch later on: get developer buy-in with a good story about openness, then once they've gotten enough of a customer (aka dev) share, do the switch.
What's wrong with AGPL that doesn't fit with our ideas?
We were on the MIT license first and got into a situation where a large corporation wanted to take our code and resell it to tens of thousands of their customers, and they made it clear they didn't want to contribute anything back to our project whatsoever.
We are a two person team putting our own time and savings into this and it could have instantly killed the project and the chance of becoming sustainable.
We changed the license and that was a simple way to stop them without changing our principles/ideas. Could have gone proprietary too at that stage but we didn't.
> What's wrong with AGPL that doesn't fit with our ideas?
Absolutely nothing. That person doesn't know what they're talking about.
I am sorry to hear that you learned about the peril of a permissive license in the way you did, but I'm happy that you switched to strong copyleft. Arguments demanding permissive licensing instead of strong copyleft amount to saying "but then how will I stand on your neck?" You shouldn't have to put up with that.
I don’t really know the background here but it really bugs me when I see people arguing nefarious intent simply because someone changed their mind later. Is there a logical fallacy that addresses “allegations of flip flopping”?
Sometimes people learn something new that changes things. Sometimes situations change and so the strategy needs to change. Sometimes people realize, for whatever reason, they were wrong and so they take steps to correct it. Do some people sometimes flip flop for the purpose of misleading people or pandering? Of course. But I really don’t think that’s typically the motive. We should be supportive of people changing their minds, not suspicious.
I'm not sure I understand the root of your complaint. You're saying that because the developers changed the license from a permissive license to a strong copyleft license, they're not supporting open source? I think that using a license like the AGPL is much better for the open source community in the long run, because it makes it more likely that the code will stay free and accessible no matter what company wants to adapt it.
Does anyone know how they identify the same user? All the solutions I know of generate a unique number and put it in a cookie.
At my app https://hanami.run I don't track users and cannot know if the same users visit our website :-(. I don't want to use cookies and want to stay clear of GDPR issues. At the same time, I'd love to see which visitors repeatedly read my website/blog and where they drop off, so I can optimize my site.
Everything that can be used to uniquely identify a visitor falls under the GDPR. We don't store IP addresses, so it should be GDPR compliant, but we still need to verify that before making the claim.
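For what it's worth, my understanding (from Plausible's public data-policy docs, not from this thread) is that unique visitors are counted with a salted hash that rotates daily instead of a cookie; roughly something like this sketch, where the exact inputs and rotation details are my paraphrase:

```python
import hashlib
import secrets
from datetime import date

# One salt per day: the same visitor hashes to the same ID within a day,
# cannot be linked across days, and the raw IP is never persisted.
_salts = {}

def daily_salt() -> bytes:
    today = date.today()
    if today not in _salts:
        _salts[today] = secrets.token_bytes(16)
    return _salts[today]

def visitor_id(domain: str, ip: str, user_agent: str) -> str:
    material = daily_salt() + f"{domain}{ip}{user_agent}".encode()
    return hashlib.sha256(material).hexdigest()

# Same visitor, same day -> same ID; no cookie and no stored IP.
print(visitor_id("example.com", "203.0.113.7", "Mozilla/5.0 ..."))
```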
The site has a big reCAPTCHA banner on it, one of Google's most consumer-hostile products. They should consider switching to something else if they want to "take on Google".
That's about it. Horrible user experience - oh, you're about to pay us, just click a few sidewalks first - and the condescension of asking people to do a menial task that improves their ML models. But forcing you to use one of their sanctioned browsers and letting them record what they want is where the real hostility comes in. It's exercising monopoly power to squeeze more out of people and repress competition; I'd call that hostile.
Do you block Google trackers aggressively? reCAPTCHA uses that very heavily: if you allow all of their stuff and they track you across the web, you'll have to basically never do more than click the button. On the other hand, if you take your privacy seriously and are aggressive about tracker blocking, you'll have a pretty awful time.
I imagine hCaptcha doesn't have enough trackers sprinkled around the web to use those as signals for this.
I do block Google trackers and have network state partitioning enabled, yet the reCAPTCHA tests are usually bearable (often a checkbox, sometimes a page). It seems like I get at least two pages of tests for hCaptcha every time.
Off topic, but I’ve noticed recently that I'm frequently forced to answer reCAPTCHAs incorrectly, the way a computer would, in order to move forward.
Some examples: “click all the tractors” showed I did not complete the task because of a photo of construction equipment; “click all the crosswalks” because I didn’t select the photo of a thick white fence; “click all the traffic lights” because I didn’t select a photo of a parking meter. I just clicked the incorrect photo so I could move on but I can’t help but wonder if there’s any mechanism to catch those incorrect (manual, human) annotations on the training data Google is collecting.
Old-school "squiggly letters" captcha? For all the fear mongering around AI and machine learning supposedly breaking them, I'm still not aware of a general-purpose tool that would solve those out of the box without significant engineering effort.
The worry is not about ML. It's about bot farms in India/China with real people behind the wheel. That's why CAPTCHA needs to be able to evolve without maintenance from the website operator.
Plausible still allows DNS trickery for cross-domain tracking as far as I can tell. This alone will keep us from ever trusting them. Only bad actors do this.
Analytics, like ads, have no direct benefit to an individual visitor, so I get why you'd block them. I myself don't care about showing up in the analytics of the websites I visit. But I pay for Plausible because they are way less intrusive, and they get added to the blocklists anyway. This doesn't encourage good behaviour. From a website-owner perspective, if I don't circumvent the blockers I need a server-side solution. It would be equivalent privacy-wise, harder to set up, but less visible.
If you try to access a service and it's down, doesn't that impact you as an individual visitor? With analytics they can know it's down. If you click a button and you get an error, doesn't it affect you if they are not notified and don't fix that error? If a company whose products you like and use can get better and release even better and cheaper products using analytics, doesn't that bring a direct benefit to you?