You are right that they assume the subject is looking at a face, and so will always produce "face-like" output.
The paper includes a quantitative evaluation, in which they take a set of 30 distractor face images not used elsewhere in the study, and for each of them determine whether the reconstructed face is more similar to it, or to the original face the subject was looking at. On average, the reconstructed face is closer to the correct face than the distractor 62.5% of the time.
So it's better than chance (50%), and I think it's pretty cool work, but the quality of the reconstructions is pretty terrible, considering that a randomly chosen distractor will usually be very different from the test face (opposite sex about half the time, frequently a different race, a different age, etc.). For comparison, it would be interesting to evaluate some simple, obviously terrible reconstructions by the same metric. For example, we could "reconstruct" the face as an image that is a single solid color: the average RGB value of the pixels in the original face. Another "reconstruction" that seems very likely to do better under this evaluation metric is something like the "race-gender-age" perp descriptions in the news ("white male in his thirties").
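To make the solid-color baseline concrete, here is a rough sketch of how it could be scored with the same kind of pairwise test. Everything in it is my own assumption rather than the paper's method: images are plain lists of (r, g, b) tuples, similarity is a sum of squared pixel differences (the paper's similarity measure may well be different), and the names mean_color, solid_color_reconstruction, distance and pairwise_accuracy are made up for illustration. The demo at the bottom uses synthetic noise "faces", so whatever it prints says nothing about real face images.

```python
import random


def mean_color(image):
    """Average RGB value over all pixels; image is a list of (r, g, b) tuples."""
    n = len(image)
    return tuple(sum(px[c] for px in image) / n for c in range(3))


def solid_color_reconstruction(image):
    """'Reconstruct' an image as a solid block of its own mean color."""
    return [mean_color(image)] * len(image)


def distance(a, b):
    """Sum of squared per-channel differences (an assumed metric, not the paper's)."""
    return sum((x - y) ** 2 for pa, pb in zip(a, b) for x, y in zip(pa, pb))


def pairwise_accuracy(reconstruction, target, distractors):
    """Fraction of distractors the reconstruction is strictly farther from than
    it is from the true target. Ties count as misses."""
    d_target = distance(reconstruction, target)
    wins = sum(d_target < distance(reconstruction, d) for d in distractors)
    return wins / len(distractors)


if __name__ == "__main__":
    rng = random.Random(0)

    def fake_face(n_pixels=64):
        # Synthetic stand-in for a face: one random base tone plus pixel noise.
        base = [rng.uniform(50, 200) for _ in range(3)]
        return [tuple(b + rng.gauss(0, 20) for b in base) for _ in range(n_pixels)]

    target = fake_face()
    distractors = [fake_face() for _ in range(30)]  # 30 distractors, as in the paper
    baseline = solid_color_reconstruction(target)
    print(pairwise_accuracy(baseline, target, distractors))
```

Running the same pairwise_accuracy on the paper's reconstructions and on a trivial baseline like this one, over the same faces and distractors, would show how much of the 62.5% comes from coarse cues like overall skin tone.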