All end to end tests are non-deterministic due to asynchrony. At some point you ...

aidenn0 · on Sept 24, 2021

I mean the exact output given a certain set of inputs may be slightly different due to asynchrony, but given a set of inputs, there should be a finite set of correct outputs, and the test should check for those.

To use a stupid example: if listAnimals returns [cat, dog, mouse] some of the time and [cat, mouse, dog] other times, and your test passes on the former and not the latter, then your test is broken and you should fix it. If it sometimes returns [cat, dog, mouse, tree], then your system is broken and you should fix it.
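A minimal sketch of what such a check could look like, assuming listAnimals returns a plain list of strings and that any ordering of the same items is acceptable. The helper name `same_animals` and the `EXPECTED` constant are made up for illustration:

```python
from collections import Counter

# Hypothetical expected result; order is deliberately irrelevant.
EXPECTED = ["cat", "dog", "mouse"]

def same_animals(actual, expected=EXPECTED):
    """Return True when both lists hold the same items, ignoring order.

    Counter compares multisets, so duplicates and extra items
    (such as an unexpected "tree") still cause a mismatch.
    """
    return Counter(actual) == Counter(expected)
```

With this, both [cat, dog, mouse] and [cat, mouse, dog] pass, while [cat, dog, mouse, tree] fails.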

amw-zero · on Sept 25, 2021

A more accurate way to look at this, based on your example, is that sometimes listAnimals returns [cat, dog, mouse] and sometimes it returns null.

It’s not that the result is nondeterministic, it’s that _whether or not the result is returned within the timeout of the polling mechanism_ is nondeterministic.

aidenn0 · on Sept 25, 2021

Presumably that happens in production as well, and the test can determine that the system does the proper thing when that happens?

mattnewton · on Sept 24, 2021

I should be able to test that this usually works though, right?

grumple · on Sept 24, 2021

You can test these things, sure. But if you're using other people's software (linux, vms, chromedriver, capybara) on other people's hardware (again, vms), you have to tolerate the fact that you can't control everything if you want to actually get work done. A little electrical, magnetic, or gravitational anomaly here, a little memory access blip there, some competition for cpu time elsewhere... I suspect there are probably only a handful of completely controlled environments on the planet and even those are suspect.

Test suites are sort of an eventual consistency problem themselves...

aidenn0 · on Sept 24, 2021

If you use other people's software and hardware, and those things don't perform the way your software assumes they perform, knowing that would be useful, right? There's always a limit to how much you want to handle, but if you are having a test fail even a large fraction of 1% of the time, then there's probably some underlying behavior that you should account for in production as well.

amw-zero · on Sept 25, 2021

No, that test doesn’t give you any useful information, because all it told you was that your expected answer wasn’t found in the configured time interval. You have no way of knowing whether or not your expected behavior would be satisfied if you ran for t + 1 seconds.

jerome-jh · on Sept 25, 2021

After some time you have to consider the test failed and investigate, even if it would have succeeded had the timeout been 1 second longer. I cannot believe they do not have quality of service requirements. Testing those requirements is of course not easy. It may take too much time to run on every release, or it may be considered out of the scope of E2E tests, with compliance checked against telemetry results instead.

However, pick any response time mandated by the QOS requirements, multiply it by an appropriate x, and use this as the pass/fail timeout for your test. Take a value large enough that exceeding it can easily be considered a bug (because, e.g., the customer would think the operation failed and would hit refresh or back). You then have an issue that is definitely worth investigating. You may actually have reproduced a rare issue that is part of the long tail of your telemetry.
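A rough sketch of that idea, assuming the test polls some callable until the expected value shows up. The names `wait_for`, `fetch`, `qos_seconds`, and `multiplier` are invented for this example and do not come from any particular test framework:

```python
import time

def wait_for(fetch, expected, qos_seconds, multiplier=10, interval=0.05):
    """Poll fetch() until it returns expected or the deadline passes.

    The deadline is the QOS response time times a generous multiplier,
    so a timeout here means something a customer would see as a failure.
    Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + qos_seconds * multiplier
    while True:
        if fetch() == expected:
            return True  # a clean run returns as soon as the value appears
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Because the loop returns on the first match, a large multiplier only costs time on runs that were going to fail anyway.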

aidenn0 · on Sept 26, 2021

Right, have a timeout measured in minutes. The timeouts have zero effect on a clean run, so large timeouts have no effect on time to deploy if you require a clean run of tests for deploying.

mattnewton · on Sept 24, 2021

Right, but the key word here being “usually” - if I can’t just run the test three times and assume 2/3rds of the time it’s good, how can I know it usually works in production?

Is the right solution really to throw up your hands and not test end to end ever? I guess the argument is more convincing if it’s not that it’s impractical, it’s just too expensive relative to the returns.

jart · on Sept 24, 2021

You can if it's BSD-licensed.

amw-zero · on Sept 25, 2021

This is what’s known as “counterintuition.” You would think that you could, but you are wrong.

I’m not saying you can’t write a passing end to end test. Of course you can get it to pass sometimes. But they are inherently non-deterministic.