I'm in complete agreement. A while back I mused on whether we could leverage genetic algorithms to find a single point of failure in the systems we're designing (linked here: https://news.ycombinator.com/item?id=7660998). Heck, I can totally see people using genetic algorithms to come up with nuanced unit/functional test input data just to see what happens (sort of like Haskell's QuickCheck, but on steroids, spanning subsystems). Tired of a simple-minded tool like JMeter that just hits your service with 1000 concurrent requests that all look similar? Let's use a GA to discover how a certain "variety" and "volume" of requests can cause a cascading failure while your app's JVM is undergoing garbage collection. That's a contrived example, but you get what I mean, right? For real though, I wonder how many people are using this on a daily basis to spruce up their test cases (or, as you said, rewarding things like colliding with walls). It would be interesting to see how much success they've had with it.
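To make the idea concrete, here's a minimal sketch of that request-mix GA. Everything here is hypothetical: the endpoint names, the genome shape, and especially the `fitness` function, which in a real setup would drive a load generator against the service and return an observed metric (p99 latency, error rate, GC pause overlap) rather than the toy formula below.

```python
import random

# Hypothetical request "genome": each gene is (endpoint, payload_size_kb, delay_ms).
ENDPOINTS = ["/search", "/checkout", "/upload"]

def random_gene():
    return (random.choice(ENDPOINTS), random.randint(1, 1024), random.randint(0, 100))

def random_genome(n=10):
    # One genome = one candidate mix of requests (the "variety" and "volume").
    return [random_gene() for _ in range(n)]

def fitness(genome):
    # Stand-in metric: pretend big payloads with tiny inter-request delays
    # stress the service most. A real fitness would fire the requests and
    # measure the service's actual behavior.
    return sum(size / (delay + 1) for _, size, delay in genome)

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(genome, rate=0.1):
    return [random_gene() if random.random() < rate else g for g in genome]

def evolve(generations=50, pop_size=20):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the nastiest request mixes
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()  # the most "stressful" request mix found
```

The interesting part is that nothing in the loop knows *why* a mix is stressful; the GA just climbs whatever gradient the measured fitness exposes, which is exactly how it could stumble into a GC-window cascade a human wouldn't script.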
There are people using "generated" tests based on specs crafted by programmers. The whole thing is called "generative" testing.
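The shape of that is roughly as follows — a hand-rolled sketch with hypothetical helper names, where the programmer writes the spec (a generator plus properties) and the harness generates the concrete cases. QuickCheck (Haskell) and Hypothesis (Python) follow this pattern, with shrinking of counterexamples on top.

```python
import random
from collections import Counter

def gen_int_list(rng, max_len=20):
    # Generator: the programmer-crafted "spec" for valid inputs.
    return [rng.randint(-1000, 1000) for _ in range(rng.randint(0, max_len))]

def prop_sorted_is_ordered(xs):
    # Property: sorting yields a non-decreasing sequence.
    ys = sorted(xs)
    return all(a <= b for a, b in zip(ys, ys[1:]))

def prop_sorted_is_permutation(xs):
    # Property: sorting only reorders, never adds or drops elements.
    return Counter(sorted(xs)) == Counter(xs)

def check(prop, gen, trials=200, seed=42):
    # Harness: run the property against generated cases;
    # return a counterexample if one is found, else None.
    rng = random.Random(seed)
    for _ in range(trials):
        case = gen(rng)
        if not prop(case):
            return case
    return None
```

So instead of enumerating examples by hand, you state what must always hold and let randomness hunt for the case that violates it.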
The reason this stuff works is that there are almost always lots of bugs out there, and RNGs aren't subject to the misconceptions programmers are. It's also why fuzzing works.
I definitely think that GAs can uncover a variety of bugs, ranging from simple NPEs to more nuanced ones like memory leaks (which would then require human dev intervention to investigate for a post-mortem), by dynamically generating the test inputs.
In addition to simply generating the input data, do you feel like GAs could broaden their span to essentially "mock" the states of other components in the system? I'm thinking of a case where you have some set of services deployed on different machines that communicate with each other. In theory, could we have the GA simulate network jitter sporadically (intercepting a request from ServiceA to ServiceB and deliberately dropping it)? This extends beyond the input data for some entry point at ServiceA and instead encapsulates a sort of "ether" surrounding all the components. Every permutation and combination of subsystem state could in theory be controlled by the governing GA.
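As a toy illustration of that "ether" idea (entirely made up, in-process, no real network): let the genome be a drop schedule — one bit per message between two mock services — and reward schedules that break an end-to-end invariant while dropping as few messages as possible, i.e. the *minimal* fault that breaks the system.

```python
import random

N = 16  # messages ServiceA sends to ServiceB

def invariant_holds(drop_schedule):
    # Mock ServiceB with a deliberate bug: it assumes messages arrive as a
    # contiguous prefix (0, 1, 2, ...). Dropping an early message while later
    # ones get through violates that assumption.
    received = [i for i in range(N) if not drop_schedule[i]]
    return received == list(range(len(received)))

def fitness(schedule):
    # Reward breaking the invariant; penalize each dropped message so the GA
    # converges toward the smallest fault that still breaks things.
    broke = not invariant_holds(schedule)
    return (100 if broke else 0) - sum(schedule)

def evolve(pop_size=30, generations=40, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        # Refill the population with bit-flip mutants of surviving schedules.
        pop = survivors + [
            [bit ^ (rng.random() < 0.05) for bit in rng.choice(survivors)]
            for _ in range(pop_size - len(survivors))
        ]
    return max(pop, key=fitness)

best = evolve()  # a small drop schedule that still breaks ServiceB's assumption
```

The same genome-as-fault-schedule trick would generalize to delays, reorderings, and partitions — each gene just becomes richer than a single drop bit.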
If someone actually built a DSL/library that handled these things I'm sure it would benefit everybody in a remarkable way.
Your post seems to insinuate that we deem GAs a panacea. No one here is saying that; on the contrary, we're just trying to see how we could use them to dynamically generate interesting test-input data. Beyond that, I'm just thinking out loud about whether you could extend that functionality to "prepare" more interesting test cases when multiple layers are involved.
No one is disputing that you need an expert to tune these to get the desired result on hard problems. I'd argue that a "good enough" understanding of GAs (i.e. you don't need a PhD in the subject) should be sufficient to solve simpler problems like the one we're discussing.
Do you have any counter-arguments to that? Can you cite any other examples where this view is challenged?