This is the concept behind "transfer learning": taking a fully trained CNN (ResNet, Inception, MobileNet, etc.) and removing the last one or two layers. What I don't understand yet is when to use which pretrained network, i.e. whether they have different features that make them suited to different application domains.
I don't think there's a lot to say about this: you want pretraining on a large dataset and on a task that's similar to yours. The size of the dataset is probably somewhat more important than task similarity. These things aren't an exact science at this point.