Visual Turing Test (deepart.io)
152 points by stared on Feb 6, 2016 | 84 comments


I got 9/10. Chose the first one incorrectly.

For me, the giveaway for computer-generated images was that they don't focus on the "subject" and have too much detail in the background.

For example, look at the computer-generated boats - they blend into the background. It's hard to tell where the boat ends and the rocks begin, or where the sails end and the clouds begin. Or look at the grey house in front of the dude's face, or the pipe behind the head of the lady with glasses, which draw too much attention but add nothing to the painting.


Yesterday I was reading this blog post (http://maryrosecook.com/blog/post/scarface-prince-of-the-cit...), so as soon as I clicked this link I thought there was no way a computer could learn, on its own, to do what I had read about yesterday!

I had read headlines about computer-generated images but had never seen them, yet I managed 9/10, anticipating that an artist's technique is not just relationships between colors and so on, but a philosophy.

Edit: Also, it is not a Turing test in the true sense, since it's one-sided. It would be a Turing test if I told them what to draw and then found it tough to distinguish between what the artist and the computer painted.


For me the giveaway in some of them was that the computer-generated ones were "from a photograph". So the woman with her hair up in an impossible position was a human-generated one [1], for example. Also, the ones that looked more photograph-like were computer-generated. For instance, compare [2] and [3].

[1] http://turing.deepart.io/t/4.jpg

[2] http://turing.deepart.io/t/1.jpg

[3] http://turing.deepart.io/f/3.png


Same here. The first one was hard; I spent a long time on it and it ended up being the only one I got wrong. They mostly looked like real photos with a filter applied, which gave them away pretty easily.


This is not a Turing test. Please correct me if I am wrong, but I bet that none of these were drawn by a computer. Most likely, those that were supposedly "drawn by computers" were pictures made by humans with various types of filtering applied (e.g. by the deep learning algorithms).


Well, yeah. It says as much on the page:

"one is painted by a human and another one is generated by artificial intelligence based on a photo and a style of a painter."

I find this to be easily "beatable" by simply judging which one is most likely to have its origin in a photo (vanity shots and such), with a fallback of picking the one with a lot of repeating patterns.


I would say it is not a Turing test primarily because it does not involve a dialog between the examiner and the subjects.

I don't think programs written to imitate a craft, or even to learn to imitate a craft from examples, count as AI, no matter how impressive.


What about a photograph with filters applied to it? Does that count?


I think what would really count as a "visual Turing test" is if the computer-generated images were really "generated" by a computer, not rendered from a photo. Still, it's an interesting exercise in how much the rendering can make the photo feel "opinionated"—as if you can sense the painter behind it; that's the criterion I used for choosing, but I only got 7/10.


> what would really count as a "visual turing test"

I disagree - nobody would say drawings/paintings (by humans) based on photos or observations by eye are less legitimate than scenes completely imagined.


How about the following: The examiner can provide any picture of their choosing to two subjects A and B, where one is a human and the other a machine. At the end of a reasonable period, both subjects respond to the examiner simultaneously with one drawing each, derived from that same picture. In creating each drawing, A and B are both trying to convince the examiner that they are human. The examiner then should identify which of the two subjects is a computer and which is human. To better reflect the traditional Turing test, the examiner should be able to repeat this process with different pictures (keeping the identities of A and B unknown, but fixed) before making their determination.

Interestingly, this is a Turing test that, when applied purely to humans, doesn't require said humans to share a common language.
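
If it helps, here is a minimal sketch of that protocol as code (purely illustrative; `examiner`, `human` and `machine` are hypothetical callables you would have to supply):

    import random

    def visual_turing_trial(examiner, human, machine, pictures):
        """One trial of the protocol above: subjects A and B have fixed but
        hidden identities; for every picture both return a drawing; at the end
        the examiner names the subject it believes is the machine."""
        subjects = {"A": human, "B": machine}
        if random.random() < 0.5:                  # randomize which label hides the machine
            subjects = {"A": machine, "B": human}

        rounds = [(p, subjects["A"](p), subjects["B"](p)) for p in pictures]
        verdict = examiner(rounds)                 # examiner returns "A" or "B"
        truth = "A" if subjects["A"] is machine else "B"
        return verdict == truth                    # True = the machine was caught

Run many trials with many examiners; if the examiners collectively do no better than chance, the machine passes.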


That's a nice protocol.

I have to quibble, though, that the Turing test is meant to be a test of intelligence, and this sort of task seems pretty different, although it may actually be visual-AI-hard (to invent a new term). So it may deserve a different name.


You can call it a test of human-level aesthetic sensibility if you'd think a highly intelligent alien species would be unlikely to share our aesthetics. Although, arguably, a highly intelligent alien would be able to understand our aesthetics to the point where they pass the test regardless...


I think you are teasing me, but my answer would be: no, it doesn't count. The Turing test is about "understanding". The filtering of those images has no understanding of what the images are about.


The Turing test is not about understanding; it's simply about distinguishability. The idea is that measuring understanding is hard, but it's easy to try to distinguish the output of a machine along several parameters by doing a blind test. So a computer may actually "understand more" of something but be blatantly computer-like (by being much better than anything a human could ever produce) -- a chess expert may identify a top-level chess AI that way -- and thus fail the Turing test. So it's more of a sufficient-but-not-necessary intelligence test.

Those photo modifications kind of fit into that theme: if you can't distinguish the filters from real art, they passed an indistinguishability test.


I got 6/10. I guessed that most of the things that looked "too real" were based on photos with filters applied to make it look like a painting, and that the abstract stuff was generated by humans. E.g. I was very surprised a computer produced this: http://turing.deepart.io/f/0.png or this http://turing.deepart.io/f/4.png

But the opposite heuristic doesn't work either, because not all the computer paintings are abstract. It mimics a lot of different styles and I am extremely impressed.


>I guessed that most of the things that looked "too real" were based on photos with filters applied to make it look like a painting, and that the abstract stuff was generated by humans.

I used the same method too and got 5/10. Looking at the other comments, it seems people have a better idea of what art is than I do.


10/10. Most of these computer "generated" pieces are not good art. This is painted by a human who knows what it is to be human in the place that is painted: http://turing.deepart.io/t/7.jpg. This is just eh, shit? http://turing.deepart.io/f/7.png.

I am convinced that AI programming will tackle these issues some day. But apparently not yet.


I thought it was easier to tell which ones were computer-generated than which were painted by a human. The computer art all seems rather abstract yet uniform.

Also, the high-fidelity, near-photographic paintings are always human.


I only got 3/10, so I think it takes an eye for art to beat it. In my mind this is like a chess program reaching a 1200 rating: not enough to beat an expert, but it's on to something.


I'm not an expert by any means and I got 10/10


Perhaps it's what we were looking for?

I suspect that we can be trained to spot the fakes too, which would make it harder. Perhaps there will be some co-evolution?


I wonder about that too. I had 2/10, way worse than chance level.


Note that all of the computer-generated pieces were created by this impressive deep net algorithm (preprint published in August 2015): http://arxiv.org/abs/1508.06576

Somehow I can hardly believe how fast they got the idea to turn a profit from it: http://deepart.io/
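
For anyone curious, the core of that paper, as I recall it, is an optimization over a generated image x that trades off matching a photo p in content (VGG feature maps F^l vs. P^l) against matching a painting a in style, where style is measured through Gram matrices of the feature maps. Roughly (notation from memory, so treat it as a sketch):

    L_{content}(p, x, l) = \tfrac{1}{2} \sum_{i,j} \bigl(F^{l}_{ij} - P^{l}_{ij}\bigr)^{2}

    G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk}, \qquad
    E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} \bigl(G^{l}_{ij} - A^{l}_{ij}\bigr)^{2}, \qquad
    L_{style}(a, x) = \sum_{l} w_{l}\, E_{l}

    L_{total}(p, a, x) = \alpha\, L_{content}(p, x) + \beta\, L_{style}(a, x)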


Yeah, things move fast there: Paper (August) -> Open source code (September) -> Deepart (November).

As a side note, for me the pace was one of the reasons to move from science to data science (shameless plug: http://p.migdal.pl/2015/12/14/sci-to-data-sci.html).


Were you part of the DeepSense Right Whale Kaggle team? That was some good shit.


I work at DeepSense, and with these guys, but I didn't participate in that team. I was shocked as well, and now I try to learn from them as much as possible. :)


8/10, but I think this is extremely impressive.

If I didn't know that some of them were generated by an ANN, I wouldn't have questioned that something like 90-95% of them were painted by a human. And I consider myself pretty alright at painting.

These 2 are especially awesome:

http://turing.deepart.io/f/0.png

http://turing.deepart.io/t/9.jpg



I'd be interested to see the original photos for those two. I expect a lot of the "art" we sense in those came from the photographer, not the algorithm.


That second one would make for an amazing canvas print.


The second one is actually painted by Francoise Nielly http://www.galeries-bartoux.com/fr/artistes/francoise-nielly...


Here are the results so far http://turing.deepart.io:3838/


As a 10/10er, I think the fact that people tend to score less goes to show that:

1. The photos are low-res enough to hide the most ridiculous artifacts produced by neural nets

2. The machine-generated images are trying to emulate the likes of van Gogh, not the likes of Leonardo (again hiding the extent of the NNs' complete inability to understand what they're doing)

3. Most people simply neither paint nor appreciate visual art, so this is not as powerful as the Turing test! (One point of the Turing test is that a human easily passes it, because all humans speak; they don't all paint.)


It's much simpler than that. This is NOT a Turing test in that the computer generated images have been selected beforehand from the best results. What users are grading here is not the quality of the algorithm that generated the pictures; it's the quality of the choice made by the human who selected these specific pictures among all the examples of computer-generated art.


9/10. The low res made it harder to tell which one was computer generated, but it was still pretty clear. Now if they could do it full size, I'd start to be interested.


https://deepart-io.s3.amazonaws.com/hi-res/leon3.jpg - hi-res version of the algorithm is ready. Soon we will make the second edition of the test, stay tuned.


What's deep about this is how deep it is in the uncanny valley. The stuff you're doing is super interesting though, don't take it too harshly.


This is great work, kidzik. Looking forward to seeing more of it. The hi-res results are quite good; even if they're in the uncanny valley, they can be quite a success for deepart lovers.


Thanks! We teamed up with the authors of the original algorithm and Leon figured out quickly how to improve resolution.


9/10, FB share not working. I would really like to see results for higher-res images. For now I will just pass and not comment on the results.


BTW: "An artist or an ape?" test by Mikhail Simkin http://reverent.org/an_artist_or_an_ape.html


The problem with that one is that all the artists are technically apes too (https://en.wikipedia.org/wiki/Ape).


I got 100% on this one. I looked at all of them before making my selections and found what seemed to be a pattern amongst the ape ones.


Yeah, that seems like the trick. The ape ones are really very similar to each other.


100% on this, and found it pretty easy to tell the difference. The main 'tell' was that the ape images had individual brushstrokes that were erratic in thickness/pressure/amount of paint, whereas the individual strokes in the artist images were generally more consistent.


Knowing upfront that one is computer-generated doesn't really work, nor does showing it to an audience that has seen deep-neural-net pictures on the front page of HN for months. Showing them to people on the street in a non-tech neighborhood and asking which is the better/nicer-looking painting, without saying that one was done by a computer, should get better results.

Prepare for some people actually getting angry once you tell them that the paintings they liked most are computer-generated (from pictures, but still).


> Prepare for some people actually getting angry once you tell them that the paintings they liked most are computer-generated (from pictures, but still).

My girlfriend is a fine arts major and isn't speaking to me right now.


Note that it is much easier for a computer to create a human-like painting than to behave like a human. The Turing test requires a two-way interaction medium.


Exactly right - the Turing test as it was originally conceived is very hard to implement (some argue impossible). To use the popular phrase, this is "not even wrong".


9/10

The computer does not duplicate selective detail. Artists will put detail in areas of focus and intentionally obscure or abstract other areas. Also, there is often an interplay between medium and message. On that last point, I'm guessing there is still a human "artist" who chose the reference scene, pose, and artist style, even if they didn't paint it directly. Toward that end, I see this as a creative tool to be included in some future Photoshop or similar.

Also... Share on Facebook? "App Not Setup: This app is still in development mode, and you don't have access to it. Switch to a registered test user or ask an app admin for permissions."


8/10, also got "App not Setup": http://imgh.us/not-set-up.png


Got 8/10. Of the choices, I was instantly sure that this one wasn't computer-generated from a photo because of the unnatural dark area at the top: http://turing.deepart.io/t/6.jpg. I think (even if photos weren't involved) the fact that the dark area is out of place but still aesthetically congruent may suggest something about human creativity that would be hard to replicate using a computer.


Actually I thought that was human because the nipple stood out so clearly.


It seems more like these images are procedurally generated rather than actual AI-created paintings.


Knowing that it was a Turing test actually made it easier for me (8/10). While I did take the painting style into account (the random rainbow look was suspect), I mostly ignored images where the subject looked like a photograph. This one threw me off on that basis, though: http://turing.deepart.io/t/4.jpg.


For me the giveaway was the motion effect of the hair following the tilt of her head, followed by the spiralling background.


I recently played around in this space ( https://medium.com/data-engineering/artificial-startup-style... ). I was merely thinking through what it would take to turn this into a Turing test and not actually executing one.

These guys seem to be trying harder to actually make this a Turing test, but as the commenters above have noted, the problem with using source photographs makes this a fairly imperfect test. I have a suggested fix for that.

In my last effort in that post, I actually used one source photograph and then compared the output of a human painter and the AI painter. My original results were terrible, but they were then quickly and vastly improved by a better implementation of the algorithm, the Deep Forger implementation.

I think that this variant in the procedure, having both the human and the AI start from the same source photograph, could make for some interesting variants on Turing testing.


I wasn't very successful, but I tried to use a "uniformity" criterion: if it looked like there were portions of the picture being omitted or abstracted away, I assumed human, and vice versa.

I wonder if the algorithm could be improved if it had a sense of what is "important" in the picture, and then chose a different algorithm to process the important and unimportant portions.
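
A crude sketch of that idea in Python (not deepart's actual pipeline; it assumes you already have the original photo, a fully stylized version of it, and a grayscale "importance" mask of the same size, all hypothetical inputs):

    import numpy as np
    from PIL import Image

    def blend_by_importance(photo_path, stylized_path, mask_path, out_path):
        photo = np.asarray(Image.open(photo_path).convert("RGB"), dtype=np.float32)
        stylized = np.asarray(Image.open(stylized_path).convert("RGB"), dtype=np.float32)
        mask = np.asarray(Image.open(mask_path).convert("L"), dtype=np.float32) / 255.0
        mask = mask[..., None]                      # broadcast the mask over RGB channels

        # Important regions keep most of the faithful rendering, while the
        # background is allowed to go fully abstract - a rough stand-in for a
        # painter's selective emphasis.
        result = mask * (0.7 * photo + 0.3 * stylized) + (1.0 - mask) * stylized
        Image.fromarray(result.clip(0, 255).astype(np.uint8)).save(out_path)

A real version would probably swap the fixed 0.7/0.3 mix for running the style transfer itself with spatially varying weights, but the blend shows the shape of the idea.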


I got 10/10 (no Facebook or Twitter account to share this, unfortunately). Update: my wife got 10/10, too.

Right now neural nets do pretty poorly with "loose/messy" styles (you get noise instead of brushstrokes; if they posted high-resolution images there, I cannot believe anyone would score less than 10/10). Neural nets do exceptionally poorly with "neat" styles (try copying the style of the Mona Lisa onto your photo and it'll paint an eye into your mouth).

However, I'm pretty sure that an algorithm hand-crafted to replicate an artist's style might fool me (just like a hand-crafted chess algorithm wipes the floor with a neural net plus some tree search; not so for Go, I know - I don't even play Go so I don't have a firm opinion on that one.)

I must say that this whole "photo + style" thing really gets my goat - not the research as much as the sort of press coverage it gets. I project that a stock market crash will improve such coverage significantly.


I'd be very interested to see the source images before processing, to see how the resulting images differ from the originals. FWIW, http://turing.deepart.io/t/8.jpg has a URL visible in the middle.

One thing I did find interesting is that I did the test initially and got 4/10 (most felt like guesses to be honest). I repeated the test before looking at the answers (though obviously knowing the score introduces some bias), but zoomed the page so the images were about 2-3x bigger than the default. My score increased to 9/10, and I was much more confident of my answers whilst selecting them.


9/10, but I was already familiar with the algorithm. Useful features to find the nature of the "artist" for me included the following:

- Is the style of the painting applied uniformly over the whole image? Humans will, for effect, leave parts of a painting less emphasized and developed, whereas this algorithm will generally apply a style to the entirety of an image.

- If a painting has a surreal style, the subject is generally also surreal. Human painters distort shapes and forms. This is not done by this algorithm.

- Humans will add contrast to make objects stand out, even if colours are similar to their background. This is something the computers haven't yet completely figured out.

Still, this is a very impressive algorithm.


8/10, on a mobile device, just going on gut instinct


I somehow doubt that this is just a photo stylized by a computer. Aren't the eyes way out of proportion? http://turing.deepart.io/f/0.png


Yes, exactly. While the background in the picture of the baseball player doesn't look photographic, at least the proportions are realistic. But the proportions of the female face are completely unrealistic. How could it be based on a photograph of a real human?

Are these computer-generated images based on photographs of real subjects, or on photographs of stylized paintings?

I don't think this test is very interesting until we can see the images the computer-generated ones were based on.


I got 8/10. Style is extremely important in understanding an artist and their work. Without a grasp of the original style, it's hard to see if a painting is genuinely of that style. So a more substantial visual Turing test would be a series of original works by an artist vs. a series of simulated works targeting that style by AI.


9/10. I could tell from some of the pixels and from having seen quite a lot of computer generated imagery in my time.


10/10. My partner got 5/10, which he claims is because he's "more like a computer" (he's autistic). There were features in all of them that struck me as expressive in a very human way.

It's amazing to live in a time when questions about human identity and expression are more than just hypothetical.


9/10. I think you can somewhat tell by the visual flourishes that a real artist will add to a painting and a computer will not. Also, while it wasn't completely obvious, I could tell the computer ones were generated from a real picture, without the exaggerations a real artist might add.


Got 10/10. I think, though, that if I had only been looking at one picture at a time and had to say yes or no, my score would have been much lower. Quite often the computer-generated ones look like someone has gone nuts with the smudge tool in Photoshop.


9/10, missed the last one. I feel like the main differences between the generated ones and the drawn ones are:

- obvious artifacts from the neural networks

- humans paint light differently from how a camera captures it, which is evident even after heavy processing.


Apparently I'm a computer, since I only got 3/10. Am I the only one who seems to lack an artistic aesthetic?

I do try to appreciate art, but I'll readily admit that I just don't get a lot of it.


Your result is 10/10!

I guess as of today T-1000 would get his bitch ass kicked.


I got 8/10 right. I didn't really think about how I might figure it out, but just went with my gut feeling. Apparently that worked pretty well. I wonder why!


3/10. I wouldn't call this a "Turing test" though; in a Turing test I can ask questions, whereas here it's totally static.


Well, is it really considered a Turing test? I thought that in a Turing test, the interrogator is the one asking the questions (choosing them at will).


5/10 – I'm not human?


Does anybody know a way to run an algorithm with similar results on my own pictures? Or, as the kids are calling it: "is there an open source version of this?"


The easiest way is... https://deepart.io/. Open source: https://github.com/jcjohnson/neural-style (not sure if it's exactly this one; there are a few implementations).
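
If you want to stay in Python, here's a rough, minimal PyTorch sketch of the same Gatys-style optimization (this is not the linked repo, which is Torch/Lua; file names, layer indices and loss weights are illustrative guesses, not tuned settings, and newer torchvision versions may want weights=... instead of pretrained=True):

    import torch
    import torch.nn.functional as F
    from torchvision import models, transforms
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    vgg = models.vgg19(pretrained=True).features.to(device).eval()
    for p in vgg.parameters():
        p.requires_grad_(False)

    # ImageNet normalization expected by the pretrained VGG
    norm = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

    def load(path, size=512):
        tf = transforms.Compose([transforms.Resize(size), transforms.ToTensor(), norm])
        return tf(Image.open(path).convert("RGB")).unsqueeze(0).to(device)

    def features(x, layers=(1, 6, 11, 20, 29)):     # ReLU outputs at several depths
        feats, h = [], x
        for i, layer in enumerate(vgg):
            h = layer(h)
            if i in layers:
                feats.append(h)
        return feats

    def gram(f):                                    # feature correlations = "style"
        _, c, h, w = f.shape
        f = f.view(c, h * w)
        return f @ f.t() / (c * h * w)

    content_img = load("my_photo.jpg")              # hypothetical file names
    style_img = load("van_gogh.jpg")
    target = content_img.clone().requires_grad_(True)

    content_feats = features(content_img)
    style_grams = [gram(f) for f in features(style_img)]

    opt = torch.optim.Adam([target], lr=0.02)
    for step in range(500):
        opt.zero_grad()
        feats = features(target)
        content_loss = F.mse_loss(feats[-1], content_feats[-1])
        style_loss = sum(F.mse_loss(gram(f), g) for f, g in zip(feats, style_grams))
        loss = content_loss + 1e4 * style_loss      # content/style trade-off is a guess
        loss.backward()
        opt.step()

    # undo the normalization and save the result
    mean = torch.tensor([0.485, 0.456, 0.406], device=device).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225], device=device).view(1, 3, 1, 1)
    out = (target.detach() * std + mean).clamp(0, 1).squeeze(0)
    transforms.ToPILImage()(out.cpu()).save("out.png")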


9/10 first one incorrect


Your result is 10/10!


Well, this is really nonsense, unfortunately.

First, all images could've been painted by a human.

Second, if I always click the left one, it says "Your result is 5/10". A Turing test tests "a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human". If I – presumably a human being – score 5 out of 10, the test is not a Turing test after all, because humans aren't the ones being tested in a Turing test.

Edit: My observation was obviously wrong. Thanks for pointing it out.


You are not being tested; you are the judge. Now, if all the judges do no better than chance, the system has passed. I got 8/10, and knowing the deepart pictures from before, I was pretty confident in some of the judgments, so I don't think they pass.


Indeed. We haven't passed yet - that was only the first attempt, though. Still, humans achieving only 66% accuracy is quite remarkable.



