> For a website that has 10,000 visitors per month, in one year you could save about 4.5 kilograms of CO2 emissions just by replacing Google analytics with Plausible.
What the hell? How do you calculate this figure? That's roughly equivalent to the CO2 created by driving a gas-burning car 10 feet to the data center to ferry information about each request (figuring a car emits about 1.2 pounds of CO2 per mile traveled). That's an astounding claim, and there's no effort to even explain the idea behind it.
This kind of penny-wise, pound-foolish approach seems like a waste of time at best; at worst, it lulls voters into complacency and distracts from the fact that our politicians still aren't doing anywhere close to enough to address carbon emissions. It's just PR and corporate greenwashing.
Like everything else, the best approach is to use a mix of regulation, renewable subsidies, and a carbon tax to make fossil fuels cost-prohibitive compared to renewables, and the market will eliminate them on its own. The wider the cost difference becomes, the faster renewables will displace carbon energy. We're getting there slowly, as wind and solar are now slightly cheaper than carbon fuels, but we should definitely be helping it along a lot faster if we're serious about avoiding the worst-case climate scenarios.
So far, it seems like we aren't serious about it and our leadership is sleepwalking us towards increasing catastrophe.
44.3 kB per visit * 120,000 visits per year * 1.8 kWh per GB * 475 g CO2 per kWh = 4.5 kg CO2 per year, once you convert kB to GB and g to kg.
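For anyone who wants to check the arithmetic, here's a minimal sketch in Python (the 44.3 kB, 1.8 kWh/GB and 475 g CO2/kWh figures are the ones quoted in this thread, not measurements of my own):

```python
# Reproduce the ~4.5 kg/year figure from the numbers quoted above.
kb_per_visit = 44.3            # claimed payload saved per visit (kB)
visits_per_year = 10_000 * 12  # 10,000 visitors per month
kwh_per_gb = 1.8               # websitecarbon.com average energy intensity
g_co2_per_kwh = 475            # grid carbon intensity used in the claim

gb_per_year = kb_per_visit * visits_per_year / 1e6                  # kB -> GB
kg_co2_per_year = gb_per_year * kwh_per_gb * g_co2_per_kwh / 1000   # g -> kg

print(f"{kg_co2_per_year:.2f} kg CO2 per year")  # ~4.55
```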
"These numbers are all estimates but you can imagine if millions of website owners and Google Analytics users end up making a similar reduction in their website size too. The total reduction in the carbon footprint of the web would be immense."
"Energy is used at the data centre, telecoms networks and by the end user’s computer or mobile device. Of course, this varies for every website and every visitor and so we use an average figure. The figures used are for 2017 from the report On Global Electricity Usage of Communication Technology: Trends to 2030 by Anders Andrae and Tomas Edler, adjusted to remove manufacturing energy as this is not relevant to this calculator. We then divide the total amount of energy used by the total annual data transfer over the web as reported in the Nature article, How to stop data centres gobbling up the world’s electricity. This gives us a figure of 1.8 kWh/GB."
I gave the paper [1] a quick scan, and it seems to be counting the electricity usage of all communications devices on top of data centers.
So in power per GB transferred, it's counting all the power used by people's 60" internet-connected TV displays.
Which is absurd to include if you're trying to measure the marginal effect of additional data: more data doesn't increase your screen's power consumption.
An accurate claim for Plausible would have to be based mainly on marginal increases of power by datacenter and communications networks.
It's hard to say. Obviously without any data at all none of those screens would be on.
I always find the discussion of marginal increases of energy tricky. If I buy a plane ticket on a half-empty flight, obviously that flight was going to take off anyway, so the marginal increase of my weight plus my luggage is fairly negligible in comparison, so I'm only to "blame" for a fraction of the fuel spent, right? But who else is there to blame except the passengers, without whom there would be (eventually) no flights? So shouldn't we all divide the blame evenly?
If we keep it simple, there are two kinds of marginal increases.
The first type is when a marginal increase can lead to a "new unit", like the planes you refer to -- or the servers used by data centers. If a plane fits 100 people, then (simplifying) 1/100 of the time your trip will result in a new plane being used, so it makes sense to divide the plane's total resources among the passengers -- not just count the marginal fuel your weight burned.
But the second type never results in a "new unit". In this scenario, using more resource-hungry analytics will never push someone to buy a second cell phone to spread the load. So counting anything beyond the direct marginal increase in CPU energy use is disingenuous.
So in the case of analytics software, their data center server/power resources fall into the first type. But the consumer device resources fall into the second type.
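A toy sketch of the two attribution models, just to make the distinction concrete (all numbers are made up for illustration):

```python
# Type 1: capacity-driven resources (planes, analytics servers). Extra demand
# eventually provisions a new unit, so amortise the whole unit over its users.
flight_fuel_kg = 10_000        # hypothetical fuel burn for one flight
seats = 100
per_passenger_kg = flight_fuel_kg / seats   # 100 kg attributed to each passenger

# Type 2: devices that exist regardless (the visitor's phone, TV, router).
# A heavier analytics script never causes a second phone to be bought, so only
# the marginal energy of processing the extra bytes should be attributed.
phone_baseline_watts = 2.0           # hypothetical draw while browsing anyway
extra_joules_per_pageview = 0.05     # hypothetical marginal cost of the script

print(per_passenger_kg, extra_joules_per_pageview)
```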
So in this case I don't think there's anything tricky at all about it.
What if you are switching from a datacenter that's carbon neutral to one that isn't?
Also, as a note, the Google Analytics JS is heavily cached and thus doesn't have to travel as far, or at all. And Google has on-ramps to their carbon-neutral infrastructure everywhere, so there's also that.
> "These numbers are all estimates but you can imagine if millions of website owners and Google Analytics users end up making a similar reduction in their website size too. The total reduction in the carbon footprint of the web would be immense."
If we removed 40 kB of CDN content per visit, then the 1.8 kWh/GB would become 2.8 kWh/GB.
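If I'm reading this right, the point is that 1.8 kWh/GB is (mostly fixed) total energy divided by total transfer, so shrinking the transfer mainly inflates the ratio rather than saving energy. A rough sketch, assuming a per-visit transfer of about 112 kB, which is simply the value that makes 1.8 become 2.8:

```python
# Assumption: the energy attributable to a visit is roughly fixed (baseline
# networks and devices), so removing bytes shrinks the denominator of the
# kWh/GB average more than it shrinks the energy itself.
kwh_per_gb = 1.8
page_kb = 112.0                                    # assumed, not from the article
energy_per_visit_kwh = kwh_per_gb * page_kb / 1e6  # held fixed

new_page_kb = page_kb - 40.0                       # drop 40 kB of CDN content
new_intensity = energy_per_visit_kwh / (new_page_kb / 1e6)
print(f"{new_intensity:.1f} kWh/GB")               # ~2.8
```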
I mean 10k requests/second seems quite achievable for a single server. And I'd totally believe that 12 seconds of compute (per year!) wouldn't use much energy. In reality those requests would be intermixed with millions more for other sites and the servers would be running continuously, but the resources attributable to an individual site should be the same.
I mean, let's think about this a bit. The US generated about 4.13 trillion kilowatt-hours in 2019, and that generation emitted about 1.72 billion metric tons of CO2, or about 0.92 pounds of CO2 per kWh (https://www.eia.gov/tools/faqs/faq.php). Let's assume Google gets their power at that rate (which is unfair to Google because they claim to use 100% renewable energy, but I don't want to get into that).
A typical server rack might draw anywhere from maybe 5 to 50 kW. Let's say Google has really beefy ones that draw 100 kW. That's about 92 pounds of CO2 per hour. For the 12 seconds you mentioned, that's still only about 0.14 kilograms of CO2. And the claim is that they're BETTER by 4.5 kilograms.
They've gotta be talking about some other expense than the server. But what sort of expense? The cost to build a server? Something about general maintenance of the Internet? ISPs between clients and the server?
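For what it's worth, here's the back-of-the-envelope version of the estimate above, so the gap is explicit (the 100 kW rack, 12 seconds of compute and 0.92 lb CO2/kWh are the assumptions from the parent comments):

```python
# Server-side CO2 for ~12 seconds of compute on a generously sized 100 kW rack,
# at the US grid average of ~0.92 lb CO2 per kWh.
rack_kw = 100
seconds_of_compute = 12
lb_co2_per_kwh = 0.92
kg_per_lb = 0.4536

kwh = rack_kw * seconds_of_compute / 3600      # ~0.33 kWh
kg_co2 = kwh * lb_co2_per_kwh * kg_per_lb      # ~0.14 kg

print(f"{kg_co2:.2f} kg CO2 vs the claimed 4.5 kg saving")
```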
Plausible is great, and I see the need for it, but I've always enjoyed using AWStats instead, as there is no need to add third-party code to my site. It all happens in the background, and it paints a much better picture of your stats since users can't block the gathering of stats with an ad blocker.
Interesting you say that. There's no reason Plausible could not be used like AWStats. Parsing logs is just a different ingestion mechanism, and we already provide self-hosting via Docker. In principle it wouldn't be too difficult to drain your logs into a Plausible instance, or just run it on the same host alongside your web server.
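A rough sketch of what that log-draining idea could look like; the regex targets the standard nginx/Apache "combined" access-log format, while the endpoint URL and event fields are my assumptions from memory of Plausible's Events API, so check the docs before relying on them:

```python
import json
import re
import urllib.request

# Matches the common "combined" access log format (nginx/Apache default).
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)

def forward(line, domain="example.com", endpoint="https://plausible.example/api/event"):
    """Parse one access-log line and forward it as a pageview event."""
    m = LINE.match(line)
    if not m or m.group("method") != "GET":
        return  # skip unparsable lines and non-GET requests
    body = json.dumps({
        "name": "pageview",
        "domain": domain,
        "url": f"https://{domain}{m.group('path')}",
        "referrer": m.group("referrer"),
    }).encode()
    req = urllib.request.Request(endpoint, data=body, method="POST", headers={
        "Content-Type": "application/json",
        "User-Agent": m.group("ua"),       # visitor's UA, not this script's
        "X-Forwarded-For": m.group("ip"),  # so the visit is attributed correctly
    })
    urllib.request.urlopen(req)
```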
That's why we haven't put too much effort into log analysis: the stats we got from AWStats were mostly bot traffic, with no good way to get rid of them.
Have you run AWStats and Plausible side-by-side? Do you not have ~90% bots in your logs?
JS won't ever give you an accurate number (there's a growing army of people blocking JS and trackers). Logs will give you an accurate count, though you may not know whether it's 100% human.
How would you compare it to GoAccess [0]? I've only ever used GoAccess, but AWStats seems to be the older, more mature tool... so I'd be curious about a comparison.
I'm a Plausible customer. They're doing pretty good work, just the basics really, and that's enough. I wish they reduced their prices instead of giving 5% to charity; if I want to give to charity, I'll do it myself, thank you. I feel a little bit like it's a lot of feel-good marketing and barebones software I'm paying for, and I might churn one of these days. Maybe I'm not the only one. I have a very small number of visitors to monitor and your first plan is quite far from free. It's affordable, but not for everyone.
I feel like the title should be Taking on Google Analytics. Everyone associates Google with search, not so much website analytics. This title makes me think there’s someone trying to unseat their position in search.
Google Analytics is the wrong end of Google. Sure, you can get a few customers now and then who love privacy and will ditch GA.
But for most, GA is how Google Ads knows how to calculate conversions. People who want to use Google Ads (which are everywhere) have to use GA. If you're not using Google Ads, I don't think Google cares much about your site anyway.
This is probably a better overview of what makes Plausible a privacy-focused tool: open source, can be self-hosted, no connection to adtech, minimal data collection, no cookies, no persistent identifiers, no personal data, no cross-site/device tracking, etc.
OK, so not quite privacy-preserving in the cryptographic sense, but more of a matter of degree. Plausible Analytics collects less data than Google Analytics, but not zero.
Great to see Plausible on HackerNews. It's one of the few pieces of software (Stripe is one, Starling another) that I deeply enjoy using. I get a good feeling when I open it up. I don't really have the UX vocabulary to explain it better than that unfortunately.
I’ve been using it for a while, but it feels pretty light on the analytics so far. It would be nice to see performance stats per page, if that's possible in a privacy-friendly way.
You mean like a page drilldown to see stats of the individual page? You can do that already. On our live demo, click on any page in the Top Pages report and the dashboard will be segmented to only show the traffic that visited that particular page.
- For the most part, the concern is what is done with the data.
- Data can be anonymized. (Although this is often hard to verify)
- You can hide the data in the client. For example, imagine you want to know how many users use feature X. You can send an analytics report with a 90% chance of a random value and a 10% chance of sending the true boolean. You can't tell whether any specific user has used the feature (because most likely it's a random value), but you can get a pretty good estimate of what portion of your users use the feature (see the sketch below).
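A tiny sketch of that randomized-response idea, using the 90/10 split from the comment above:

```python
import random

P_TRUTH = 0.10  # send the real value 10% of the time, noise otherwise

def report(uses_feature: bool) -> bool:
    """What the client sends: mostly a coin flip, occasionally the truth."""
    if random.random() < P_TRUTH:
        return uses_feature
    return random.random() < 0.5

def estimate(reports):
    """Server-side estimate of the true usage rate.
    observed = P_TRUTH * true_rate + (1 - P_TRUTH) * 0.5, so invert that."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - P_TRUTH) * 0.5) / P_TRUTH

# Example: 100,000 users, 30% of whom actually use the feature.
reports = [report(random.random() < 0.3) for _ in range(100_000)]
print(estimate(reports))  # ~0.3, yet no individual report can be trusted
```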
My understanding is that Plausible is focused on the use of anonymization.
It is better privacy. It's one thing for a single entity to know what everyone is doing on the web, all the websites that you visit and what you do on them, and another thing for an entity to know only that you visited their own website, without knowing what other websites you visit and what you do on them.
Later edit: The best solution is still self-hosting, as hosted Plausible is still a third-party entity that centralizes data (even though they probably don't use or share this data).
But... the website that I'm visiting has no incentive to care about my privacy. I mean, yes, they should, but what's in it for them?
I think this go-to-market approach of "we are better because Google is evil" is just flawed.
Well, I'm an indie dev and I do care. I find advertising and cookie notices really annoying, and I can afford the forty-something euros a year it costs me. I don't need to track everything.
They do care. The data can be collected anonymously, without being linked directly to your person. They can use such data to improve your experience without affecting you personally in any way.
For me, the default uBlock Origin settings do block Plausible tracking, even on a website that used its own domain name to serve the script, but I assume that was because the subdomain was "analytics.site.com".
Seems a bit like Plausible only pays lip service to some of these ideas. Merely 5 months ago the co-founder touted here on HN how they are "big fans of open source so wanted as permissive [a] licence as possible" [0], then promptly went and changed the license to a strong copyleft one (AGPL) a few weeks later!
They might well be the next Elastic/CockroachDB/MongoDB/etc. Or better yet, they might do the classic bait-and-switch later on: get developer buy-in with a good story about openness, then once they've gotten enough of a customer (aka dev) share, do the switch.
What's wrong with AGPL that doesn't fit with our ideas?
We were on the MIT license first and got into a situation where a large corporation wanted to take our code and resell it to tens of thousands of their customers, and they made it clear they didn't want to contribute anything back to our project whatsoever.
We are a two person team putting our own time and savings into this and it could have instantly killed the project and the chance of becoming sustainable.
We changed the license and that was a simple way to stop them without changing our principles/ideas. Could have gone proprietary too at that stage but we didn't.
> What's wrong with AGPL that doesn't fit with our ideas?
Absolutely nothing. That person doesn't know what they're talking about.
I am sorry to hear that you learned about the peril of a permissive license in the way you did, but I'm happy that you switched to strong copyleft. Arguments demanding permissive licensing instead of strong copyleft amount to saying "but then how will I stand on your neck?" You shouldn't have to put up with that.
I don’t really know the background here but it really bugs me when I see people arguing nefarious intent simply because someone changed their mind later. Is there a logical fallacy that addresses “allegations of flip flopping”?
Sometimes people learn something new that changes things. Sometimes situations change and so the strategy needs to change. Sometimes people realize, for whatever reason, they were wrong and so they take steps to correct it. Do some people sometimes flip flop for the purpose of misleading people or pandering? Of course. But I really don’t think that’s typically the motive. We should be supportive of people changing their minds, not suspicious.
I'm not sure I understand the root of your complaint. You're saying that because the developers changed the license from a permissive license to a strong copyleft license, they're not supporting open source? I think that using a license like the AGPL is much better for the open source community in the long run, because it makes it more likely that the code will stay free and accessible no matter what company wants to adapt it.
Does anyone know how they identify the same user? All the solutions I know of generate a unique number and put it in a cookie.
At my app https://hanami.run I don't track users and cannot know if the same users visit our website :-(. I don't want to use cookies and want to stay clear of GDPR issues. At the same time, I'd love to see which visitors repeatedly read my website/blog and where they drop off, so I can optimize my site.
Everything that can be used to uniquely identify a visitor falls under the GDPR. We don't store IP addresses, so it should be GDPR compliant, but we still need to verify that before making the claim.
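For what it's worth, my understanding (from Plausible's public data-policy docs, not from this thread) is that unique visitors are counted with a salted hash that rotates daily instead of a cookie; roughly something like this sketch, where the exact inputs and rotation details are my paraphrase:

```python
import hashlib
import secrets
from datetime import date

# One salt per day: the same visitor hashes to the same ID within a day,
# cannot be linked across days, and the raw IP is never persisted.
_salts = {}

def daily_salt() -> bytes:
    today = date.today()
    if today not in _salts:
        _salts[today] = secrets.token_bytes(16)
    return _salts[today]

def visitor_id(domain: str, ip: str, user_agent: str) -> str:
    material = daily_salt() + f"{domain}{ip}{user_agent}".encode()
    return hashlib.sha256(material).hexdigest()

# Same visitor, same day -> same ID; no cookie and no stored IP.
print(visitor_id("example.com", "203.0.113.7", "Mozilla/5.0 ..."))
```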
The site has a big reCAPTCHA banner on it, one of Google's most consumer-hostile products. They should consider switching to something else if they want to "take on Google".
That's about it. Horrible user experience - oh, you're about to pay us, just click a few sidewalks first - and the condescension of asking people to do a menial task that improves their ML models. But forcing you to use one of their sanctioned browsers and letting them record what they want is where the real hostility comes in. It's exercising monopoly power to squeeze more out of people and repress competition; I'd call that hostile.
Do you block Google trackers aggressively? reCAPTCHA uses that very heavily: if you allow all of their stuff and they track you across the web, you'll have to basically never do more than click the button. On the other hand, if you take your privacy seriously and are aggressive about tracker blocking, you'll have a pretty awful time.
I imagine hCaptcha doesn't have enough trackers sprinkled around the web to use those as signals for this.
I do block Google trackers and have network state partitioning enabled, yet the reCAPTCHA tests are usually bearable (often a checkbox, sometimes a page). It seems like I get at least two pages of tests for hCaptcha every time.
Off topic, but I’ve noticed recently that I'm frequently forced to answer reCAPTCHAs incorrectly, the way a computer would, in order to move forward.
Some examples: “click all the tractors” showed I did not complete the task because of a photo of construction equipment; “click all the crosswalks” because I didn’t select the photo of a thick white fence; “click all the traffic lights” because I didn’t select a photo of a parking meter. I just clicked the incorrect photo so I could move on but I can’t help but wonder if there’s any mechanism to catch those incorrect (manual, human) annotations on the training data Google is collecting.
Old-school "squiggly letters" captcha? For all the fear mongering around AI and machine learning supposedly breaking them, I'm still not aware of a general-purpose tool that would solve those out of the box without significant engineering effort.
The worry is not about ML. It's about bot farms in India/China with real people behind the wheel. That's why CAPTCHA needs to be able to evolve without maintenance from the website operator.
Plausible still allows DNS trickery for cross-domain tracking as far as I can tell. This alone will keep us from ever trusting them. Only bad actors do this.
Analytics, like ads, have no direct benefit to an individual visitor, so I get why you'd block them. I myself don't care about showing up in the analytics of the websites I visit. But I pay for Plausible because they are way less intrusive, and they get added to the blocklists anyway. This doesn't encourage good behaviour. From a website-owner perspective, if I don't circumvent the blockers I need a server-side solution. It would be equivalent privacy-wise, harder to set up, but less visible.
If you try to access a service and it's down, doesn't that impact you as an individual visitor? With analytics they can know it's down. If you click a button and you get an error, doesn't it affect you if they are not notified and don't fix that error? If a company whose products you like and use can get better and release even better and cheaper products using analytics, doesn't that bring a direct benefit to you?