The thing about Tesauro's backgammon work that excited the community is that the system trained by playing itself (http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node108.htm... -- "To apply the learning rule we need a source of backgammon games. Tesauro obtained an unending sequence of games by playing his learning backgammon player against itself.").
Also, it didn't use an elaborate set of features and heuristics adapted for backgammon, just a simple representation of the state of the board (a list of 0/1 variables encoding how many pieces of each color are on each position).
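To make the "simple representation" concrete, here is a minimal sketch of a board-to-features encoding in that spirit. It assumes a hypothetical layout of 4 binary units per point per color in a thermometer code (unit k is on if the point holds more than k checkers). This is a simplification for illustration and not Tesauro's exact input layout. The bar and borne-off checkers are also left out.

```python
NUM_POINTS = 24
UNITS_PER_POINT = 4  # hypothetical choice: 4 binary units per point per color


def encode_board(white, black):
    """Encode checker counts as a flat list of 0/1 features.

    white, black: lists of 24 non-negative ints, the number of checkers of
    each color on each point (index 0 = point 1, index 23 = point 24).
    Thermometer code: unit k (0-based) is 1 if the point holds more than k
    checkers, so counts above UNITS_PER_POINT saturate at all ones.
    """
    if len(white) != NUM_POINTS or len(black) != NUM_POINTS:
        raise ValueError("expected 24 counts per color")
    features = []
    for counts in (white, black):
        for n in counts:
            if n < 0:
                raise ValueError("counts must be non-negative")
            features.extend(1 if n > k else 0 for k in range(UNITS_PER_POINT))
    return features


# Usage: the standard starting position, black mirrored from white.
white = [0] * NUM_POINTS
white[23], white[12], white[7], white[5] = 2, 5, 3, 5
black = [0] * NUM_POINTS
black[0], black[11], black[16], black[18] = 2, 5, 3, 5
features = encode_board(white, black)
print(len(features))  # 192 inputs, all 0 or 1
```

Nothing here is backgammon strategy. The network would have to learn any notion of "good" positions from self-play.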
This is pretty close to "from scratch", and I think the article would have done well to point out what is actually new here.