As a researcher studying evolutionary algorithms, I can say that one thing these algorithms are quite adept at is finding holes in the objective functions you give them. They are like automated lawyers seeking loopholes.
If you want to evolve a controller for a robot that walks far, evolution will find exploits in the physics engine that defy gravity or somehow catapult the robot through the air (this sort of exploit is the bane of those doing research in virtual creatures [1]). If you want to evolve a talented player for an Atari game like Beamrider, evolution will find an infinite score-loop exploit [2]. It meets the letter of the law in maximizing your measure, but it misses the forest for the trees.
This kind of problem has led some in the community to look at driving evolution by other means -- e.g. my dissertation was on evolving neural network controllers for robots by choosing the most novel behaviors instead of those that performed best by the fitness measure [3].
Overfitting is when your model fits the quirks of the data you are training on rather than the underlying pattern [1], but what we're talking about here is slightly different: the EA has maximized the given objective function correctly, but it turns out that your objective function is corruptible. A high value on your simple measure doesn't match the intuition motivating it.
You had something particular in mind when you set the objective function for your biped robot to be distance travelled before it falls. Finding an exploit in the underlying physics engine that allows the biped to explode and travel quite far -- while indeed excelling at the function -- is far from what you had in mind. Yet it does reveal a vulnerability in your physics engine, which, while unexpected, may be valuable in its own right, as the post's author argues.
This isn't entirely unique to EAs -- but it seems to happen more frequently in the kinds of domains EAs are often applied to -- complex simulations where the gradient of an objective function is difficult or impossible to calculate.
I think he had something else in mind, somewhat related to overfitting -- you're maximizing a metric (too hard) over an unrealistic solution space, and the resulting solution, when applied to the real solution space, won't behave as intended.
So those are "overfitting" to the exact quirks of the engine. I agree it's a stretch, but much as in ML, it shows that optimizing too hard will expose quirks if your 'objective function/simulation' pair is not exactly equal to the 'desired objective function/reality' pair, or if there is an innumerable ensemble of situations in which you expect the solution to behave well.
It happens in EA because the researchers don't account for overfitting. The experimenters often do not validate, so the population converges too far toward the training-set maxima. If you had two simulations of walking, you would select and mutate based on one sim and stop the evolution when performance degraded on the second sim, not blindly maximize the one measure.
If validation were performed, the population would not evolve into the cracks of unlikely physics loopholes.
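The early-stopping idea above can be sketched as follows. This is a hypothetical (1+lambda)-style loop, not code from any particular paper; `train_fit` and `val_fit` stand in for the two walking simulations:

```python
import random

def evolve_with_validation(train_fit, val_fit, genome_len=8,
                           pop_size=30, max_gens=200, patience=10):
    """Evolve on train_fit, but stop once val_fit stops improving,
    and return the champion as measured on the held-out simulation."""
    rng = random.Random(0)
    best = [rng.uniform(-1, 1) for _ in range(genome_len)]
    best_val, stale = val_fit(best), 0
    champion = list(best)
    for _ in range(max_gens):
        kids = [[g + rng.gauss(0, 0.1) for g in best] for _ in range(pop_size)]
        best = max(kids + [best], key=train_fit)   # select on the training sim
        v = val_fit(best)                          # check the second sim
        if v > best_val:
            best_val, champion, stale = v, list(best), 0
        else:
            stale += 1
            if stale >= patience:   # validation degraded for too long: stop
                break
    return champion, best_val
```

If the training sim has an exploitable crack that the validation sim lacks, genomes that crawl into it score poorly on `val_fit` and never become the champion.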
Thomas Ray, who wrote the A-Life program Tierra, found that genetic algorithms were really good at finding bugs in his code and exploiting them. One such bug was a register he forgot to clear between simulated viruses. A virus evolved to exploit the leftover data and use it to attack other viruses.
EvoSuite springs to mind for that one [1]. It's Java only, but their publications describe how it works well enough to port the ideas, from what I can tell.
I'm in complete agreement. A short while back I mused on whether we could leverage genetic algorithms to actually find single points of failure in the systems we're designing (linked here: https://news.ycombinator.com/item?id=7660998). Heck, I can totally see people using genetic algorithms to come up with nuanced unit/functional-test input data just to see what happens (sorta like Haskell's QuickCheck, but on steroids, spanning subsystems). Tired of using a simple-minded tool like JMeter that just hits your service with 1000 concurrent requests that look similar? Let's use a GA to simulate how a certain "variety" and "volume" of requests can cause a cascading failure while your app's JVM is undergoing garbage collection. That's a contrived example, but you get what I mean, right? For real though, I wonder how many people are using this on a daily basis to spruce up their test cases (or, as you said, rewarding things like colliding with walls). It would be interesting to see how much success they've had with this.
There are people using "generated" tests based on specs crafted by programmers. The whole thing is called "generative" testing.
The reason this stuff works is that there are almost always lots of bugs out there, and that RNGs aren't subject to the misconceptions programmers are subject to. It's also why fuzzing works.
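A minimal sketch of the generative-testing idea. Real tools like QuickCheck or Hypothesis also shrink counterexamples and generate structured data; here everything, including the deliberately buggy `clamp`, is a made-up toy:

```python
import random

def buggy_clamp(x, lo, hi):
    # Deliberately buggy: when x == hi it returns lo instead of x,
    # an off-by-one-style slip a human reviewer might not probe.
    if x >= hi:          # should be x > hi
        return lo        # ...and should be hi
    return max(x, lo)

def good_clamp(x, lo, hi):
    return min(max(x, lo), hi)

def check_property(fn, trials=10_000, seed=1):
    """Generative test: random inputs, one declarative property."""
    rng = random.Random(seed)
    for _ in range(trials):
        lo = rng.randint(-100, 100)
        hi = rng.randint(lo, lo + 200)
        x = rng.randint(-300, 300)
        out = fn(x, lo, hi)
        # Property: output is in range, and inputs already in range pass through.
        ok = lo <= out <= hi and (out == x or not (lo <= x <= hi))
        if not ok:
            return (x, lo, hi)   # counterexample found
    return None                  # property held on every trial
```

Random inputs have no preconception that `x == hi` is special, so the boundary bug falls out of sheer volume, while the correct implementation survives all trials.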
I definitely think that GAs can uncover a variety of bugs, ranging from simple NPEs to more nuanced ones like memory leaks (which would then require human dev intervention to investigate in a post-mortem), by dynamically generating the test inputs.
In addition to simply generating the input data, do you feel like GAs could broaden their span to essentially "mock" the states of other components in the system? I'm thinking of a case where you have some set of services deployed on different machines that communicate with each other. In theory, could we have the GA "mock/simulate" network jitter sporadically (thus intercepting a request from ServiceA to ServiceB and deliberately dropping it)? This extends beyond the input data for some entry point at ServiceA, and instead encapsulates some sort of an "ether" surrounding all the components. Every permutation and combination of the subsystem state could in theory be controlled by the governing GA.
If someone actually built a DSL/library that handled these things I'm sure it would benefit everybody in a remarkable way.
Your post seems to be insinuating that we deem GAs a panacea. No one here is saying that; to the contrary, we're just trying to see how we could use them to dynamically generate interesting test-input data. In addition to that, I'm just thinking out loud about whether you could append to that functionality and see how to "prepare" more interesting test cases when multiple layers are involved.
No one is disputing the fact that you need an expert to tune these to get the desired result for hard problems. I argue that a "good enough" understanding of GAs (i.e., you don't need a PhD in the subject) should be sufficient for solving simpler problems such as the one we're discussing.
Do you have any counter-arguments to that? Can you cite any other examples where this view is challenged?
As another example of using genetic algorithms to break things, a friend of mine wrote his dissertation [1] on using genetic algorithms for file-format fuzzing. I believe there may even be some source code available on GitHub.
A long time ago I had a huge interest in genetic algorithms and other soft optimization techniques. It is what inspired me to take up graduate studies in the first place. But very quickly I became thoroughly disillusioned with the community around them.
What I am going to say will be very unpopular with the audience of this post. What turned me off and left a bad taste is the community's tendency to push these methods as magic, much like a shady snake oil salesman would.
Rather than building the theory on analysis, classification, and measurable properties and definitions of the objective functions for which a GA with specific mutation and crossover operators is a good solution, the community's way is to push it as snake oil -- often taking examples of cooked-up functions that aren't really important to globally optimize, and totally ignoring speed of optimization compared with competing tools.
The community's method of choice seemed to be to impress the audience not with how fast and effective the methods are compared to traditional ones, but with colorful analogies. Heck, a bead of perspiration rolling down an armpit optimizes a function; that does not mean we should optimize every function on a computer that way.
This is really unfortunate, because if one goes back to GA's genesis, classifying and characterizing the functions that yield themselves to GAs was a big part of David Goldberg's book. There are indeed functions that are very amenable to GAs and other such methods. Characterize them mathematically, rather than, as it seems to me, cooking up two ad hoc operations, naming them 'crossover' and 'mutation', and looking cool.
That is hardly progress.
On the other hand, if you prove a theorem stating that for these types of problems, if you let a GA run for this amount of time, you will be close to the optimum with high probability -- that would be phenomenally useful.
At its core, a GA is a parallelized and randomized exploration of the search space. Tell me why that particular way of randomization and parallelization is the right thing to do, or for what classes of problems it is the right thing to do. Without this, all we have are anecdotes.
I will give you a specific example of what I think is a better way. Consider the clustering algorithm K-means. Obtaining its global minimum is an NP-hard problem, and it has been a target for GAs, ant colony optimization, swarm optimization, and what have you. All of them are blown out of the park by simply initializing the standard K-means algorithm better. For example, just initialize it with a set of furthest-first points and it will give good results. Not only that, it will provably be very close to the global optimum -- but people did not know that in the beginning. Some people had to say to themselves, "I am going to find out more about the structure of the problem so that I can find efficient methods." That is how progress is made.
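The furthest-first initialization mentioned above is simple enough to sketch. This is a generic toy version (k-means++ is its randomized cousin), not tied to any particular paper:

```python
import random

def furthest_first(points, k, seed=0):
    """Furthest-first traversal: start from an arbitrary point, then
    repeatedly add the point farthest from the centers chosen so far."""
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]

    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    while len(centers) < k:
        # the point whose nearest already-chosen center is farthest away
        nxt = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(nxt)
    return centers
```

On data with well-separated clusters this picks one seed per cluster, so plain Lloyd iteration starts next to the global optimum instead of needing a metaheuristic to rescue a bad start.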
Take another example, non-negative matrix factorization: another objective function that is NP-hard to optimize. I will make a different point with this example (although the same point could be made with the K-means example). The cases where the simple local algorithm for NMF does not converge to near the global optimum are provably uninteresting to begin with. Even if the global optimum were indeed found, it would serve no purpose.
Another example: global optimization of low-rank matrix factorization models for recommender systems -- the same problem again, with the optimal solution NP-hard to obtain. Again, simple methods do provably well if you utilize the structure of the problem.
Often the difficult function one is trying to optimize can be circumvented entirely by modeling the phenomenon with a better-behaved function -- model the problem differently so that better techniques apply. This is not always possible, but I see people not even trying; apparently it is easier to publish a few papers when you spray the problem with colorful analogies. An example: protein folding. It is a misguided effort to find the globally optimal shape; nature does not fold proteins optimally.
Another example: consider classifiers. Misclassification as a function is the worst nightmare: non-differentiable, non-convex, with huge swaths of zero gradient. So what do people in machine learning do? They bound it from above by a well-behaved function and optimize that instead. They don't stop there: they first validate empirically that optimizing the surrogate gives good performance, and then prove theoretically that it comes with guarantees on performance.
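As an illustration of the surrogate-loss idea: the hinge loss is a convex upper bound on the 0-1 misclassification loss, and subgradient descent on the surrogate also drives the 0-1 loss down. All names and data here are made up for the sketch:

```python
def dot(w, x):
    return sum(a * b for a, b in zip(w, x))

def zero_one(w, data):
    # The "nightmare" loss: piecewise constant, no usable gradient.
    return sum(1 for x, y in data if y * dot(w, x) <= 0)

def hinge(w, data):
    # Convex upper bound: max(0, 1 - y*<w,x>) >= [y*<w,x> <= 0].
    return sum(max(0.0, 1 - y * dot(w, x)) for x, y in data)

def train_hinge(data, dim, steps=500, lr=0.05):
    """Batch subgradient descent on the surrogate; 0-1 loss is never touched."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in data:
            if y * dot(w, x) < 1:      # margin violated: subgradient is -y*x
                for i in range(dim):
                    grad[i] -= y * x[i]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w
```

On linearly separable data, minimizing the well-behaved surrogate also yields zero misclassifications.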
I am ever more convinced that analyzing important objective functions (ones that are actually important to optimize), studying their properties, and motivating performant optimization schemes is a far better contribution than coming up with colorful analogies to genetics, evolution, swarms, etc. without characterizing precisely what makes them work and when. Demystify the problem and the technique rather than mystifying them, and please compare running times and resources with more traditional optimization.
Again, I am not saying that GAs or swarm optimization methods are wrong to use. There are indeed cases where they are exactly the correct thing to use. Consider gene/DNA assembly: here a GA is indeed the right tool, because GAs work well when the optimal solution looks like an assembly of small good solutions. Swarm methods work well when the objective looks like a smooth, well-behaved function overlaid with a high-frequency fluctuation. I wish this were what the community focused on.
I was going to post a comment with similar points, but instead I'll just add to srean's ideas.
If nothing is known about the objective function, no optimization algorithm can possibly be said to be better than any other. This is the no free lunch theorem.
So there is absolutely no sense in talking about optimization algorithms in isolation from the problems they're meant to solve. An intelligent appraisal of genetic algorithms would talk about the types of objective functions they seem to be able to find good answers for.
The no free lunch theorem states that the performance of any two search algorithms is equivalent when averaged across all possible problems. This fails to hold in coevolutionary settings -- selecting a champion through self-play. In such cases there are pairs of algorithms where one is demonstrably better than the other for all possible problems: a free lunch by the criteria of the NFL theorem [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100... ; also worth checking out: http://www.santafe.edu/media/workingpapers/12-10-017.pdf].
The No Free Lunch Theorem is also one of those limit statements that rarely impinges on reality. We are not interested in all possible functions -- the majority of which will be of such complexity as to be indistinguishable from random -- only in those with exploitable structure. It's much the same reason why, although k-means is NP-hard, failing to find a good clustering is very often suggestive of an ill-posed problem with no interesting structure. If k-means didn't find a good clustering, very possibly a good one does not exist (e.g. "Clustering is difficult only when it does not matter"; http://arxiv.org/abs/1205.4891)
> If nothing is known about the objective function
What "nothing is known" means, in the context of NFL theorems, is that the values at each point are random and independent of the values at other points.
For example, if our domain is a 100-by-100 grid and we fill the grid with random numbers, we have such a function. And indeed, no optimization algorithm is of any help in trying to find the largest of the 10,000 random numbers.
But this kind of totally random function pretty much does not exist in the real world. So if our objective function came from a real application (and not from a random number generator), we already know enough about it to say that the NFL theorems don't apply to it.
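The grid example can be checked empirically. The made-up experiment below compares two probe orders on iid-random functions; since every unseen cell is an independent draw, any non-repeating order does equally well on average, which is the NFL intuition in miniature:

```python
import random

def best_after(order_fn, n_evals, trials=2000, size=100, seed=0):
    """Average best value found after n_evals probes of an iid-random
    function, for a given probe-ordering strategy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        f = [rng.random() for _ in range(size)]   # "nothing is known"
        order = order_fn(size, rng)
        total += max(f[i] for i in order[:n_evals])
    return total / trials

def left_to_right(size, rng):       # a naive fixed scan
    return list(range(size))

def shuffled(size, rng):            # a "cleverer" randomized search
    order = list(range(size))
    rng.shuffle(order)
    return order
```

Both strategies land on the same expected value (the mean of the maximum of 10 uniform draws, 10/11), because no structure exists to exploit.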
Sure, no particular instance of a function will be so utterly random, but I think you are probably missing the point of the parent comment. All that NFL says is that if you fix an optimization algorithm and average its performance over all possible measurable functions, no algorithm will be better than any other. The average is with respect to the distribution you described. However, if you skew the distribution so that certain classes of functions are more likely than others, then an ordering is imposed over some of the algorithms. Some algorithms will be better than others.
Therefore, trying to design a fully generic optimization algorithm is a fool's errand. On the contrary, a specific objective function will have specific properties, and in that case it would be valuable to use specific algorithms suited to those properties. To follow the argument forward, it would be very useful to characterize exactly the class of objective functions that EAs, GAs, swarms, and ant colony algorithms handle well. Under what assumptions is their specific style of randomization and parallel evaluation the right thing to do? These are resource-intensive procedures, so such a characterization would tell us when that effort is well spent.
One uncharitable but plausible way to read the tail end of your comment is that just because a particular function is not random, it will violate NFL and magically make EA- and GA-style algorithms appropriate; that is not true. Glad that you were able to reword it before the edit window closed.
Edit: replying here because this thread is becoming nested too deep.
Hi @Dn_Ab, I think we may have talked past each other, so let me clarify. Of course a bias/preference over algorithms naturally gets induced when you select objective functions non-uniformly. No magic there. But that is not going to make a particular choice of algorithm (in this case EA, GA, et al.) magically appropriate for whatever specific non-uniform distribution over objective functions was chosen. The choice either has to be deliberate (in which case we would need a measurable description of the class where these algorithms work better) or one has to wait to get wildly lucky; the latter is about as productive as playing the lottery, except that the tickets are pretty expensive when we play EA, GA, etc.
@sampo > have very specific mathematically defined meanings here, and those meanings are probably very different than what a casual reader might expect.
Good point, upvoted, now I understand your previous comment better.
@sampo > I also like your "snake oil" metaphor, and I was happy to see your top comment on this topic.
I expected it to be downvoted out of existence given a few snarks that I yielded to.
> All that NFL says is that if you fix an optimization algorithm and average its performance over all possible measurable functions, no algorithm will be better than the other.
> Therefore, trying to design a fully generic optimization algorithm is a fool's errand.
I don't think we have any factual disagreement. In pure mathematical context, what you say is absolutely true.
I just wish to emphasize that "all possible measurable functions" and "fully generic optimization algorithm" have very specific mathematical meanings here, and those meanings are probably different from what a casual reader might expect.
In the set of "all possible functions," an overwhelming majority are functions that are indistinguishable from random noise. They are, e.g., not continuous -- not even remotely like continuous -- and they have no structure at all whatsoever.
On an intuitive level, everybody understands that it is a fool's errand to try to design an optimization algorithm to find the maximum from an array of random numbers. But when you just say "trying to design a fully generic optimization algorithm", people may not realize that the "fully generic" contains the requirement that it should also work on random noise.
The No Free Lunch Theorem does not say anything about the feasibility of fully generic optimization algorithms, if we restrict the "fully generic" to mean anything that can be expected to appear in any real world application. A lot of people might agree that an algorithm that performs well in any real world problem is "fully generic", even though it may not perform well in all imaginable abstract mathematical settings (which mostly means settings of random noise).
To your last comment: I also agree with you on this front. Also my understanding is that for every or almost every type of problem there are other optimization algorithms that drastically outperform genetic algorithms. I also like your "snake oil" metaphor, and I was happy to see your top comment on this topic.
I'm not sure that the phenomenon you've encountered is unique to the evolutionary computation community: If you go to an academic conference dedicated to specific methodology, because the attendees have a vested interest in that methodology (they may have built their career on it), that hammer will certainly end up finding many questionable 'nails'.
In my opinion, EAs are most useful when you don't have a more specific, hand-crafted algorithm to solve a problem, and particularly when a gradient of your objective function cannot be calculated -- for example, when you are trying to create a controller for a many-jointed robot in a complex simulation with a high-level objective function.
Thank you for your comment; I hope I am not being too harsh, I have just been very peeved by my experience. The downvotes are already here, not unexpectedly.
From my experience -- and this was a while ago, so I will be glad if this has changed -- it seems that the community prefers to push its techniques as snake oil and does not try to nail down their characteristics and show how to match them with a function I want to optimize. I want them to offer principled guidelines that would allow generalizing the techniques beyond anecdotes. I have no problem with communities pushing their techniques; to the contrary, my disenchantment is with them not doing this.
I want the community to produce re-usable pieces of interesting/novel information that I can use when I am faced with optimizing a function.
You mentioned non-differentiable functions. Now let's take a look at a subclass of these functions: convex non-differentiable functions. There are very efficient methods for these.
Consider another class: let's throw away convexity and consider functions that are non-differentiable and very rough locally, but well behaved when filtered with a low-pass filter. Then again we know what to do.
Consider functions that are differences of potentially non-differentiable convex functions (this is a huge class; the difference need be neither differentiable nor convex). Then again we have good ideas about what to do.
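For the convex non-differentiable case, the standard tool is the subgradient method with diminishing step sizes. A generic illustrative sketch (the objective, targets, and step constant are all made up):

```python
import math

def subgradient_descent(f, subgrad, x0, steps=2000):
    """Subgradient method with 1/sqrt(t) steps. Descent is not monotone,
    so we track the best iterate seen so far."""
    x, best, fbest = list(x0), list(x0), f(x0)
    for t in range(1, steps + 1):
        g = subgrad(x)
        step = 0.5 / math.sqrt(t)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        if f(x) < fbest:
            best, fbest = list(x), f(x)
    return best, fbest

# Example: f(x) = sum_i |x_i - a_i|, convex but kinked at its minimum a.
a = [1.0, -2.0, 3.0]
f = lambda x: sum(abs(xi - ai) for xi, ai in zip(x, a))
sg = lambda x: [(1.0 if xi > ai else -1.0) for xi, ai in zip(x, a)]
x_best, f_best = subgradient_descent(f, sg, [0.0, 0.0, 0.0])
```

Gradient descent has no gradient to use at the kink, but any valid subgradient with shrinking steps still homes in on the minimizer.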
I think building this decision function, problem_type -> preferred_algorithm, is a very useful exercise. What annoys me is that the GA community seems uninterested in this, and takes cheap shots by presenting anecdotes.
Prove properties of your techniques, and I will buy them by the bagful.
Metaheuristics are blind methods that make very few assumptions about the structure of the solution space. If you already know some of the structure of course you can find a better optimization algorithm. Better yet, you can find a better representation or mutation function that will massively shrink the search space. Or a fitness function more likely to lead evolution through the correct path. Or even seed solutions that give the algorithm a head start.
>On the other hand, if you prove a theorem stating that for these types of problems, if you let a GA run for this amount of time, you will be close to the optimum with high probability -- that would be phenomenally useful.
>At its core, a GA is a parallelized and randomized exploration of the search space. Tell me why that particular way of randomization and parallelization is the right thing to do, or for what classes of problems it is the right thing to do. Without this, they are anecdotes.
That's pretty much impossible. If you already know so much about the problem space to the point of being able to prove theorems about it, then you don't need metaheuristics.
With apologies, if a proponent of an algorithm cannot tell me under what circumstances the proposed algorithm does better, the proponent is selling me snake oil. There have to be some quantitative statements about its properties -- not necessarily a proof of convergence to the global optimum -- otherwise it is not science (yet). It's not an understanding of the space of an arbitrary problem that I desire; it is a measurable or mathematical characterization of the space where these algorithms work well, and ideally the process of measurement should be cheaper than applying the algorithm.
EDIT: replaced 'you' with 'proponent' lest it gives the impression that I meant it personally.
We do know what kinds of situations genetic algorithms do well in (though generally other metaheuristics, like hillclimbing, do slightly better). The fitness landscape has to be as smooth as possible and not have bad local optima. You want as few variables as possible, and you want each to be independently correlated with fitness: if an improvement requires multiple things to mutate all at once, it's unlikely to happen.
There are various other heuristics. But you definitely need to have an understanding of the solution space (as well as the alternative) to know for sure.
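To make the smooth-landscape point concrete, here is a toy (1+1) hillclimber. Both landscapes below are made up for illustration: it descends a smooth bowl easily, and, as expected, never finds a needle-in-a-haystack spike because nothing in the flat region correlates with fitness:

```python
import random

def hillclimb(fitness, dim, steps=3000, sigma=0.1, seed=0):
    """(1+1) hillclimber: mutate, keep the child if it is no worse."""
    rng = random.Random(seed)
    x = [rng.uniform(-5, 5) for _ in range(dim)]
    fx = fitness(x)
    for _ in range(steps):
        y = [xi + rng.gauss(0, sigma) for xi in x]
        fy = fitness(y)
        if fy >= fx:            # accept non-worsening moves
            x, fx = y, fy
    return x, fx

# Smooth single-peak landscape: gradients everywhere, no bad local optima.
smooth = lambda x: -sum(xi * xi for xi in x)
# "Needle" landscape: flat except a tiny spike -- hopeless for local search.
needle = lambda x: 1.0 if all(abs(xi - 3.7) < 0.01 for xi in x) else 0.0
```

The same search budget gives near-optimal results on `smooth` and pure noise-wandering on `needle`, which is the landscape criterion above in miniature.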
I largely agree with you, but there are recent pushes to make the study of genetic algorithms (usually not stated this way) more palatable to the theory community. Look, for example, to the work of Christos Papadimitriou [1] (who is the anti snake-oil guy if I've ever seen one). He gave a talk last year at the Simons Institute on his ideas [2]. See also [3].
The essential argument of this new school is that GAs (and evolution in general) do not try to find the optimal individual. Here is a reader's-digest argument as to why: say you had actually achieved the perfect individual; then sex would result in imperfect children while the parents die off. So if evolution isn't optimizing fitness, what is it optimizing? One hypothesis is that it's balancing fitness and variation, so that if the environment changes drastically, the entire population does not die off.
[Edit: another note to add] There is a common adage in CS theory that for every problem, a focused gradient descent algorithm (using knowledge about the objective function, domain, etc.) will always outperform a GA. So this is a sort of meta-theorem saying that asexual reproduction is better at finding the optimal individual than sexual reproduction.
There are concrete techniques being analyzed in this context, for example the "Multiplicative Weights Update Algorithm," which has been proven to perform well against adversarial manipulations of the environment. See [4] for a long and detailed reference of its applications to classical CS problems.
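The Multiplicative Weights Update rule itself is only a few lines. This is the generic experts-setting version (losses in [0, 1]; the learning rate `eta` is a made-up choice), a sketch rather than code from the cited survey:

```python
def mwu(losses, eta=0.1):
    """Multiplicative Weights Update over n experts.

    losses: a list of rounds, each a list of per-expert losses in [0, 1].
    After each round, every expert's weight is scaled down by its loss.
    Returns the final weights and our cumulative expected loss."""
    n = len(losses[0])
    w = [1.0] * n
    total_loss = 0.0
    for round_losses in losses:
        z = sum(w)
        # expected loss this round under the current weight mixture
        total_loss += sum(wi * li for wi, li in zip(w, round_losses)) / z
        # multiplicative penalty: bad experts decay exponentially fast
        w = [wi * (1 - eta * li) for wi, li in zip(w, round_losses)]
    return w, total_loss
```

The classic guarantee is that `total_loss` stays within roughly `eta*T + ln(n)/eta` of the best single expert in hindsight, which is the kind of provable statement the parent comments are asking for.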
You provided (i) a link to Papadimitriou's DBLP page, (ii) a link to an academic program to devise algorithms, data structures, and mathematics to reconstruct and study evolution (nothing to do with GAs), a program in which Papadimitriou is involved, and (iii) a survey paper on a classic optimization technique from the mid-80s known as the exponentiated gradient method (alternatively called mirror descent, the Bregman proximal gradient method, or the multiplicative update method), again nothing to do with GAs. Apparently it all connects to GAs, but how it does so totally eludes me :)
A GA is not natural evolution; a GA does whatever it is programmed to do by whoever wrote the code for it. The program that you pointed to is about using CS techniques to analyze huge amounts of natural-evolution-related data (by the terabyte) to make sense of it.
I have not watched the video yet, so it is possible I am missing something, I must be.
GA has nothing to do with evolution? Now I am thoroughly confused. GA is inspired by natural selection. If you program your algorithm to mimic natural selection then of course it's related to evolution, and if you don't then it's no longer a GA. Maybe your point is that the GA community has taken their snake oil so far away from this that they're not even using natural selection anymore?
If you want to study GAs, and you want to avoid the snake oil, what would you do? Selling it as studying the mathematics of natural selection seems like a good idea to me. I'm not saying that's Papadimitriou's motivation, but he does mention things like GAs in his talks.
The DBLP page has a handful of papers at the very beginning relating to this new direction. I didn't want to presume to choose one, so I linked there. But for those who don't have the time to read the titles and abstracts, here is one titled "Multiplicative updates in coordination games and the theory of evolution." [1]
Here is an excerpt:
> In this paper we provide such a demonstration; in doing so, we make some totally unexpected connections between Evolution and familiar concepts from Computation and Game Theory.
They go on to show that the natural selection occurring in some well accepted model of natural selection is equivalent to a multiplicative weight update in a coordination game. This would suggest that if we want to come up with provable guarantees for the convergence properties of GAs, we may fare better by relating their dynamics to these well-understood tools.
GAs use evolutionary analogies and metaphors, but that is about as far as the similarity with large-scale algorithms and models for exploring gene and evolution data goes. So there is really little or no similarity between the two except for the use of common words.
To put it differently, that academic program is about as related to GAs as imaging algorithms for exploring fMRI data are to training artificial neural networks.
However, the specific paper that you point to now -- changing GAs to follow nature's model -- is indeed an interesting idea.
If this is what GAs are doing, then it would imply they're not a good technique for single-purpose optimization at all. They could be a good technique, for example, for designing robust agents to interact with some environment, such as walking robots or viruses in an unknown network.
I think the best use of GAs is as a sort of rapid prototyping: throw them at the problem to get a better feel for the search space, then think about it more deeply and find the best algorithm you can come up with.
With the appropriate disclaimer that there are usually better algorithms, I think they are pretty valuable from a pedagogical point of view (actually, I'm thinking of nature-inspired algorithms in general, not just GAs). It helps if you can relate an algorithm to something in nature, because most algorithms are very abstract and hard to grasp.
I also think nature inspired algorithms can help people think about problem solving in creative ways (just wander through your garden and see if you can replicate some of the stuff in algorithmic form etc.)
I like the implied statistical wisdom, i.e. "if you fail often, then help us find where we are failing," rather than "if your passing solutions are brilliant, then help us find passing solutions."
An aside: for things like unit tests, where in most cases the result of the test is binary, i.e. pass/fail, how do you think about and ultimately represent the fitness criteria?
[1] https://archive.org/details/sims_evolved_virtual_creatures_1...
[2] http://www.cs.utexas.edu/users/ai-lab/?atari
[3] http://eplex.cs.ucf.edu/noveltysearch/userspage/