A harder but more complete way: read "Deep Reinforcement Learning: An Overview" (https://arxiv.org/abs/1701.07274), progressively implementing parts of it in Python with OpenAI Gym (https://gym.openai.com/read-only.html).
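To give an idea of what "implementing a subpart" can look like, here is a minimal sketch of tabular Q-learning, one of the basic algorithms such an overview covers. To keep it runnable without installing anything, it uses a made-up `Corridor` environment that only imitates the Gym `reset()`/`step()` style; the class, its reward scheme and all hyperparameters are assumptions for illustration, and real Gym environments return additional values from `step()`.

```python
import random


class Corridor:
    """Hypothetical toy environment (not part of Gym) with a Gym-like interface."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Every episode starts in the leftmost cell.
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 = left, 1 = right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1  # rightmost cell is the goal
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Explore with probability epsilon, or when both actions tie.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Bootstrapped target: no future value after a terminal step.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning(Corridor())
# Greedy action for each non-terminal cell (0 = left, 1 = right).
greedy_policy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(greedy_policy)
```

Once this works, the same loop can be pointed at a real Gym environment, and the table swapped for a neural network to get to the "deep" part.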
A few links:
- https://sscaitournament.com/
- https://github.com/dgant/purplewave
- https://github.com/blizzard