The paper I was thinking of is "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets" [0]. I don't have experience training and investigating neural nets, but from what I read in that paper, there's no reason to presume you'll find neurons (or latent variables) that happen to represent a feature you're interested in. In the paper they change the training objective: they add a mutual-information term to the standard GAN objective so that specific latent codes end up corresponding to the features they care about.
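To make that concrete, here's a rough sketch of the generator loss in PyTorch-style code of my own (names like Q_net and lam are placeholders, not from the paper): the usual GAN loss gets a second term that is a variational lower bound on the mutual information between the latent code c and the generated sample, so the generator is pushed to make c recoverable.

    import torch
    import torch.nn.functional as F

    def infogan_generator_loss(D, G, Q_net, z, c, lam=1.0):
        # z: unstructured noise, c: structured latent code (one-hot categorical here)
        x_fake = G(torch.cat([z, c], dim=1))
        # standard non-saturating generator loss against the discriminator D
        gan_loss = F.binary_cross_entropy_with_logits(
            D(x_fake), torch.ones(x_fake.size(0), 1, device=x_fake.device))
        # variational lower bound (up to a constant) on I(c; G(z, c)):
        # an auxiliary network Q_net tries to recover c from the generated sample
        mi_lower_bound = -F.cross_entropy(Q_net(x_fake), c.argmax(dim=1))
        # maximizing mutual information = subtracting the bound from the loss
        return gan_loss - lam * mi_lower_bound

Without that extra term you just get whatever entangled representation the GAN happens to learn, which is the commenter's point: interpretable features don't show up for free.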
[0] https://arxiv.org/pdf/1606.03657v1.pdf