How do you reach that conclusion? From the article:
> It just so happens that Telemetry is currently the only Rust-based component in Firefox Desktop that uses the [viaduct/Necko] network stack and adds a Content-Length header. This is why users who disabled Telemetry would see this problem resolved ...
The article contradicts your conclusion. If Firefox did not have telemetry, the bug would have had no impact, and users would not have suffered an outage.
> ...even though the problem is not related to Telemetry functionality itself and could have been triggered otherwise.
And then the article contradicts itself and agrees with you using some heavy-duty doublethink. Sure, if there were hypothetically other Rust services using the buggy network stack, they'd also have hit the bug: BUT THERE ARE NONE. The bug was in code which is only running because it's used by the telemetry services, so even though it might be in a different semantic layer it's the fault of the browser trying to send telemetry.
As a user, I place very low (often negative) importance on the tools I use collecting telemetry data, or on protecting DRM content, or on checking licensing status. They should focus on doing the job I'm trying to do with them on my computer, serving the needs of the user, rather than doing something that someone else wants them to do. Sure, I understand that debugging and quality monitoring are easier with logs and maybe with telemetry, so I can understand using a few resources in the background to serve some of that data, but it must never get in the way of actual work getting done.
> The article contradicts your conclusion. If Firefox did not have telemetry, the bug would have had no impact, and users would not have suffered an outage.
This is your mistake: as explained in the article, it could have affected any component. Telemetry happened to hit it first but anything using HTTP/3 with that path would have been affected.
> This is why users who disabled Telemetry would see this problem resolved even though the problem is not related to Telemetry functionality itself and could have been triggered otherwise.
> ...as explained in the article, it could have affected any component. Telemetry happened to hit it first but anything using HTTP/3 with that path would have been affected.
Is this really relevant, though? To the users who were unable to use their browsers normally it doesn't matter that this problem could have occurred elsewhere as well, but rather that it did occur here in particular.
If particular sites would break, then that could be debugged separately, but as it stands even people who'd be perfectly fine browsing regular HTTP/1.1 or HTTP/2 sites were also impacted, not due to opening a site that they wanted to visit themselves, but rather due to some background piece of functionality.
That's not to say that I think there shouldn't be telemetry in place, just that the poster is correct in saying that this wouldn't be such a high-visibility issue if there were no telemetry in place and thus no HTTP/3 apart from sites the user visits.
The comment I was replying to was worded in a way that tried to attribute blame to the telemetry service. As shown in this thread, there's a certain ideological position which welcomes any attack on telemetry, and I think that's a distraction from the technical discussion about how Mozilla could better have avoided a bug in their networking libraries. Recognizing this as a bug in the network stack that was first triggered by Telemetry makes it clear that this is not the place to have the millionth iteration of flamewars about that service, but rather to ask questions about things like the design of that network loop or the lack of a test suite covering the intersection of those particular libraries.
> Recognizing this as a bug in the network stack that was first triggered by Telemetry makes it clear that this is not the place to have the millionth iteration of flamewars about that service, but rather to ask questions about things like the design of that network loop or the lack of a test suite covering the intersection of those particular libraries.
Surely one could adopt a "shared nothing" approach, or something close to it - a separate process for the telemetry functionality which only reads things from either shared memory or from the disk, where the main browser processes could put what's relevant/needed for it.
If a browser process fails to work with HTTP/3, I don't think the entire OS would suddenly find itself not having any network connectivity. For example, a Nextcloud client would still continue working and synchronizing files. If there was some critical bug in curl, surely that wouldn't necessarily bring down web browsers like Chromium, either!
Why couldn't telemetry be implemented in a similarly decoupled way and thus eliminate the possibility of the "core browser" breaking due to something like this? Let the telemetry break in all the ways you're not aware of, but let the browser continue working until it hits similar circumstances (if it ever does; HTTP/3 isn't all that common yet).
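To make the idea concrete, here's a minimal sketch of what that decoupling could look like, assuming a plain spool-directory handoff; the names, paths, and payload format are purely illustrative and not how Firefox actually structures its telemetry:

```rust
// Sketch of a "shared nothing" split: the browser process only writes pings
// to a spool directory on disk, and a separate uploader process (not shown)
// owns all of the network code. A bug in the uploader's HTTP stack can then
// crash or hang the uploader, but not the browser.
use std::fs;
use std::io::Write;
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Queue a ping by writing it to the spool directory. No network I/O happens
/// here, and any failure is non-fatal by design.
fn queue_ping(spool_dir: &Path, payload: &str) -> std::io::Result<()> {
    fs::create_dir_all(spool_dir)?;
    let stamp = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_millis();
    // Write to a temp file first and rename it into place, so the uploader
    // never picks up a half-written ping.
    let tmp = spool_dir.join(format!("{stamp}.json.tmp"));
    let done = spool_dir.join(format!("{stamp}.json"));
    let mut file = fs::File::create(&tmp)?;
    file.write_all(payload.as_bytes())?;
    fs::rename(&tmp, &done)?;
    Ok(())
}

fn main() {
    // If queuing fails, the ping is simply dropped and the browser carries on.
    if let Err(e) = queue_ping(Path::new("/tmp/telemetry-spool"), r#"{"event":"startup"}"#) {
        eprintln!("telemetry ping dropped: {e}");
    }
}
```

The point isn't this particular mechanism, just that the handoff is dumb enough (files or shared memory) that the telemetry side can fail in arbitrary ways without taking any browser functionality down with it.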
I don't care much for flame wars or "going full Stallman", but surely there is an argument to be made about increasing resiliency against situations like this one. Claiming that the current implementation of this telemetry is blameless doesn't feel adequate.
> I can understand using a few resources in the background to serve some of that data, but it must never get in the way of actual work getting done.
Which is exactly how the code was intended to work. Firefox did not design their software to hang in the event of telemetry losing internet access.
I don't know Firefox's internal architecture or its development; what follows is pure conjecture.
Their intention seems to be to slowly migrate the codebase from C++ to Rust. That telemetry is the only function so far to rely on their new Rust networking library viaduct (and thus trigger the bug) could be because they wanted to use their least important functionality as a test bed. In which case, if there wasn't any telemetry, a different piece of code would have been migrated to Rust first and triggered this same bug. Without the telemetry, it would presumably have taken them longer to realise that things had broken, let alone resolve it.