It could be a fuzzy matching algorithm with some learning capability. For example, if the background changes but the button stays in the same place, you can simply update the test.
In my experience, the fuzzy matching works well enough to pick out text or icons on interfaces with semi-transparent backgrounds that change on scrolling (e.g., due to a parallax effect). It's pixel-based and not scale-invariant, though, so it won't automatically handle multiple zoom levels or resolutions.
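
To make the trade-off concrete, here is a rough sketch of pixel-based fuzzy matching. It is not the actual algorithm of any particular tool: the scoring (mean absolute difference over grayscale pixels), the 0.9 threshold, and all names (`icon`, `make_screen`, `find_template`) are assumptions made up for illustration. It shows a template captured on one background still matching after the background shifts, and the same template failing once the icon is drawn at 2x zoom.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels, 0-255, row-major


def icon(background: int, scale: int = 1) -> Image:
    """Hypothetical 3x3 'X' icon; '.' pixels are transparent and show the background."""
    pattern = ["#.#", ".#.", "#.#"]
    rows: Image = []
    for line in pattern:
        row: List[int] = []
        for ch in line:
            row.extend([255 if ch == "#" else background] * scale)
        rows.extend([list(row) for _ in range(scale)])
    return rows


def make_screen(background: int, scale: int = 1, size: int = 12) -> Image:
    """Uniform background with the icon drawn at row 2, column 4."""
    screen = [[background] * size for _ in range(size)]
    for y, row in enumerate(icon(background, scale)):
        for x, value in enumerate(row):
            screen[2 + y][4 + x] = value
    return screen


def match_score(screen: Image, template: Image, top: int, left: int) -> float:
    """Similarity in [0, 1]; 1.0 means every pixel is identical."""
    h, w = len(template), len(template[0])
    total = 0
    for y in range(h):
        for x in range(w):
            total += abs(screen[top + y][left + x] - template[y][x])
    return 1.0 - total / (255 * h * w)


def find_template(
    screen: Image, template: Image, threshold: float = 0.9
) -> Optional[Tuple[int, int, float]]:
    """Slide the template over the screen; return (row, col, score) of the
    best window if it reaches the threshold, otherwise None."""
    h, w = len(template), len(template[0])
    best: Optional[Tuple[int, int, float]] = None
    for top in range(len(screen) - h + 1):
        for left in range(len(screen[0]) - w + 1):
            score = match_score(screen, template, top, left)
            if best is None or score > best[2]:
                best = (top, left, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


# Template captured while the background was dark (40).
template = icon(background=40)

# Background changed to 70: still matched at row 2, column 4, score below 1.0.
print(find_template(make_screen(background=70), template))

# Same background, but the icon is rendered at 2x zoom: no match (None).
print(find_template(make_screen(background=40, scale=2), template))
```

The threshold is what makes the match "fuzzy": the transparent pixels differ after the background change, but not by enough to drop the score below 0.9. The zoomed case fails because the template is compared pixel for pixel at a fixed size, so you would need one template per zoom level or a rescaling step before matching.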