How to get stock fundamentals data with Python (theautomatic.net)
129 points by atreadw on May 5, 2020 | 49 comments


For anyone interested in stock market APIs and financial data, I started a YouTube channel on this topic last year, and it is starting to gain some traction:

https://www.youtube.com/parttimelarry

It covers popular services and libraries like Alpaca, Robinhood Private API, TD Ameritrade option prices, Yahoo Finance, TradingView alerts, backtesting, and a lot more.


Curious if you have used your foundational data infra for trading live? Were you able to beat a benchmark (e.g., beta-adjusted index returns?)

I'm asking because I've used semi-pro and pro infra (paid services) and it was hard to get consistent returns. It was easy to get returns, but rarely beyond, say, S&P 500 returns (at which point, I might as well just invest in the S&P 500 index.)


Yup.

What you really want to know is "how problematic is the noise in this data"?

One way to answer it is to create your own FF sorts [1] and regress your Yahoo-derived factor returns against Ken French's (which come from CRSP/Compustat). If the intercepts are small/statistically insignificant and the R² values are reasonable, then you're all set.
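A rough sketch of the regression check described above, using only the standard library. The return series here are synthetic stand-ins (not real Yahoo or Ken French data), and the function name `regress_factor` is made up for illustration; in practice a stats package would do this for you.

```python
import math
import random


def regress_factor(own, reference):
    """Simple OLS of own factor returns on reference factor returns.

    Returns the intercept, slope, t-statistic of the intercept, and R^2.
    """
    n = len(own)
    x_bar = sum(reference) / n
    y_bar = sum(own) / n
    sxx = sum((x - x_bar) ** 2 for x in reference)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(reference, own))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [y - (intercept + slope * x) for x, y in zip(reference, own)]
    sse = sum(e * e for e in residuals)
    sst = sum((y - y_bar) ** 2 for y in own)

    # Standard error of the intercept in a one-regressor OLS.
    sigma2 = sse / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))

    return {
        "intercept": intercept,
        "slope": slope,
        "t_intercept": intercept / se_intercept,
        "r2": 1.0 - sse / sst,
    }


# Synthetic data: 240 "monthly" reference returns, plus a noisy copy
# standing in for a factor rebuilt from a noisier data source.
rng = random.Random(0)
reference_returns = [rng.gauss(0.003, 0.03) for _ in range(240)]
own_returns = [x + rng.gauss(0.0, 0.01) for x in reference_returns]

fit = regress_factor(own_returns, reference_returns)
print(fit)
```

A slope near 1, an intercept with a small t-statistic, and a high R² would suggest the noise is tolerable; what counts as "reasonable" is a judgment call.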

The big problem you will hit is survivorship bias in Yahoo!. My own research suggests that the quality is perfectly acceptable, provided you back-fill an unbiased universe (e.g. Russell 3K) from another source.

I'm actually surprised you didn't outperform the market with the survivorship bias. Back test must have covered a very small time period.

[1] https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data...


What exactly do you mean by "back-fill an unbiased universe (e.g. Russell 3K)"?


Yahoo! doesn't have data for delisted or bankrupt companies, which introduces a "survivorship bias."

You need a set of companies that lacks this bias, and the set of Russell 3000 index constituents is a good set to use.

Some of the older Russell 3000 companies will be missing from Yahoo, because they are delisted or bankrupt. You need to find a secondary source for them, or "back-fill" your securities master.
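The first step of that back-fill is just a set difference: which names in the unbiased universe does the primary source lack? A minimal sketch, with made-up tickers and a hypothetical helper name `find_backfill_gaps`:

```python
def find_backfill_gaps(universe, available):
    """Return tickers in the unbiased universe that the primary source lacks."""
    return sorted(set(universe) - set(available))


# Hypothetical tickers, for illustration only.
historical_constituents = {"AAA", "BBB", "CCC", "DDD"}  # e.g. past index members
primary_source_tickers = {"AAA", "CCC"}                 # what the free source returns

missing = find_backfill_gaps(historical_constituents, primary_source_tickers)
coverage = 1 - len(missing) / len(historical_constituents)

print("need a secondary source for:", missing)
print(f"primary source coverage: {coverage:.0%}")
```

The hard part is not this step but getting the historical constituent list and the secondary data in the first place.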


Looks awesome, I subscribed!


My sense is that the golden age of the home API trader is about to emerge and I'd love to hear how things are going for other tinkerers. From my side, I decided to dive back into math, which I've ended up enjoying so much that I haven't surfaced yet to put any of it to work.


Been trading for roughly 3 years now. And yup, I'm in exactly the same boat; I just recently dove back into filling all the holes I have in my knowledge (portions of math/stats).


Khan is so great for that. Sal's contribution to society is massive. And he's great to listen to. Even at 1.5X speed. :-)

Math is funny that way. Miss out on some fundamental aspect and it'll ruin you. And Khan explains some really basic concepts in ways that have given me new perspective or insight.


How does one get started? I feel like I have a lot of the math background, but approaching putting this toward an actual market feels opaque.


Random question but is this Tel from the east coast? (won't name the city to respect anonymity). Think we worked together before :)

Depends how new you are to it all, but assuming you just know basic trading stuff (what a bid/offer is, what options greeks are, etc.). Everything I do is stat-arb-type trades. My path won't be the same as yours, but here's how I started.

What I did...

- read literally every link and topic under https://www.investopedia.com (the education tab -> investing trading section on the right)

- quantopian.com - look under the learning tab. I went through all the material - literally all of it. There's a series of Lectures that explains a ton

- quantconnect.com - I went through all their tutorials. Read through the existing strategies to get a feel for everything and gain insights - https://www.quantconnect.com/tutorials/strategy-library

- Absorbed almost everything Ernie Chan has written. This was the first book that got me interested in all this - https://www.amazon.com/Quantitative-Trading-Build-Algorithmi...

I read a _ton_ and played around with lots and lots of data to try and get my systems working. Still nowhere near where I'd like to be, but been profitable since the beginning (minus a giant loss on some gambles I took - I no longer trade discretionary because of this). The links I pasted above are what I'd consider the "easy" part of all this; anyone can read and learn. Learning the more in-depth stuff around specific topics is harder, but I have a bunch of books (willing to share some recommendations if you need).

Once you progress, you'll find specific areas you're interested in (equity options, commodity futures, spread trades, fixed income/bonds, etc.).


Has all that paid off for you? Are you seeing any alpha based on strategies you've developed?


yes :)

edit: think I should follow up with a comment/question I ask myself a lot. Is all the work/time worth it? In my case, I'd say yes, because I've always had an interest in finance and literally would be reading + working with markets on the side for fun anyway. If I think of all the effort poured into this vs say working for some large tech company... I'd probably say just go work the high paying tech job. Obviously you can make money in the markets, but the tradeoff for keeping your sanity is a very real question you have to ask yourself.


Very cool. It sounds like you're basically treating this as a profitable hobby. How much alpha are you achieving, and how many hours a week would you say you put into it? Are you trading equities only or options and other derivatives as well?


It's definitely more than a side hobby. I probably put in about 20 hours/week while also working 40 hours at my normal job. I've also had stints where I worked on trading full time (6 months+ stretch). That's why I sent my follow-up comment on "is it worth it". If you're serious about trading, you need to understand the level of commitment. Also add the stress levels from when you inevitably hit bumps in the road (my large loss did huge damage to my psyche at the time). It's not for everyone.

To answer your question, I started in equities but moved to commodity and index futures/options.


Oh dang! Yes, hey! Nice to hear from you—I'll send you an email :)


https://xkcd.com/1570/

The important math for understanding the foundations of finance and business isn't advanced math. It is accounting - primarily addition, subtraction and simple algebra or calculus 101 for things like interest calculations or discounted value modeling. This basic math also underpins the "fundamentals" of business used in objective, common-sense investing - the kind Warren Buffett is fond of.
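To show how basic that math is, here is a minimal sketch of compound interest and a discounted value calculation. The function names and the numbers are made up for illustration, and it assumes annual compounding with cash flows arriving at the end of each year.

```python
def future_value(principal, rate, years):
    """Value of `principal` after `years` of annual compounding at `rate`."""
    return principal * (1 + rate) ** years


def present_value(cash_flows, discount_rate):
    """Discount a list of year-end cash flows back to today.

    cash_flows[0] arrives at the end of year 1, cash_flows[1] at the end
    of year 2, and so on.
    """
    return sum(
        cf / (1 + discount_rate) ** year
        for year, cf in enumerate(cash_flows, start=1)
    )


# 100 invested at 10% for two years.
fv = future_value(100, 0.10, 2)

# Two year-end payments of 110 and 121, discounted at 10%.
pv = present_value([110, 121], 0.10)

print(round(fv, 2), round(pv, 2))
```

Nothing here goes beyond multiplication and exponents, which is the point.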

Also, "stocks" and math are a classic "the map is not the territory" situation. Math describes stocks and business performance very well but does not define it. A machine learning algorithm trained on historical price data in concert with differential equations of 1,000 variables will historically do no better at investing than buying and holding index funds.

Technical Analysts "quants" would disagree, but alas: https://www.bloomberg.com/news/articles/2020-05-02/after-qua...

So the thing to focus on isn't going from math -> stock trading. It is to learn accounting and business and basic stock market concepts. Eventually a math background will help, but it's support not the foundation.


11:15, restate my assumptions: ...


I've been up to this recently. None of the maths I've used is very advanced yet. I've mostly been trying to use simple statistical tools like regression analysis as well as possible. I know a few people who have found employment in finance after having done graduate work in stochastic processes, but I'm not sure what the payoff would be for some of the more advanced stuff as a non-institutional trader.


Do you think this is the beginning of the golden age for home API traders due to market conditions or the tooling that enables home traders to operate getting better and better?


Not the OP, but I think it's the latter (tooling getting better and easier) combined with lower (or close-to-zero) trade execution fees and more widely available information.


What are some good fundamental data providers for small-time amateur investors (non-professionals)? I am looking for 1. US + international stocks 2. deep history of all historical accounting records (15+ years) 3. api, ideally with python examples

So far I have only found https://eodhistoricaldata.com and haven't tested them yet, is there anything better?


Almost no amateur investor data provider has data integrity guarantees.

You're almost always better off getting the data yourself and cleaning it.


They're the only provider if you're not willing to spend several thousand per year. Beware that a lot of their info is scraped from other sources and does not have accuracy guarantees.

You can also scrape the fundamentals directly from the 10-K statements each company publishes, but it's very difficult to get a clean, consistent dataset out of it.


To be honest, I'm surprised yahoo didn't ban web scraping in their terms of service.


Does anybody know anything else about this?

Is Yahoo cool with web scraping? Is it just something they tolerate? Are they just too incompetent to care?


Ahh, the beauty and versatility of Python again. And the momentum doesn’t stop. I've never regretted jumping off Perl, then Ruby, ten years ago to fully commit to Python. I can see so many use cases for it (with the rich ecosystem), on an almost daily basis.


Any programming language with an HTTP client and a SQLite driver won't have problems with this example. Although, this is so simple that even SQLite might be overkill for analyzing the data.
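For what it's worth, the SQLite half of that looks roughly like this in Python's standard library. The HTTP fetch is left out; the rows, table name, and column names below are invented placeholders, not the article's actual schema or data.

```python
import sqlite3

# Placeholder rows standing in for whatever the HTTP step returned:
# (ticker, fiscal_year, revenue, net_income)
rows = [
    ("AAA", "2019", 1000.0, 120.0),
    ("AAA", "2020", 1100.0, 90.0),
    ("BBB", "2020", 500.0, 75.0),
]

conn = sqlite3.connect(":memory:")  # use a file path to persist
conn.execute(
    """
    CREATE TABLE fundamentals (
        ticker      TEXT,
        fiscal_year TEXT,
        revenue     REAL,
        net_income  REAL,
        PRIMARY KEY (ticker, fiscal_year)
    )
    """
)
conn.executemany("INSERT INTO fundamentals VALUES (?, ?, ?, ?)", rows)

# Rank company-years by net margin.
margins = conn.execute(
    """
    SELECT ticker, fiscal_year, net_income / revenue AS net_margin
    FROM fundamentals
    ORDER BY net_margin DESC
    """
).fetchall()

for ticker, year, margin in margins:
    print(ticker, year, f"{margin:.1%}")
```

Any language with an equivalent driver would express the same few statements.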


Python quickly falls over with large datasets. It's great as a "glue" language or for POCs for large data processing or handling jobs - but at some point you need to graduate to faster things...


What’s «large»? I’ve set up multi-TB analyses without issue, using dask. I don’t know of a more productive language for such analyses, either. What is even better?


+1 on dask-distributed. I'm surprised they don't get more coverage/discussion in the industry! Why do you think they are so overlooked?


You can chew through TB even without distributed, as long as you have the disks and cores on the same system.
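The basic trick is to stream rather than load everything into memory. A single-process sketch with the standard library, where the column names and sample data are made up and the in-memory `StringIO` stands in for a large file on disk (spreading files across cores is not shown):

```python
import csv
import io
from collections import defaultdict


def mean_close_by_ticker(lines):
    """Stream CSV rows and keep only running totals per ticker.

    Memory use grows with the number of tickers, not the number of rows.
    """
    totals = defaultdict(lambda: [0.0, 0])  # ticker -> [sum, count]
    for row in csv.DictReader(lines):
        acc = totals[row["ticker"]]
        acc[0] += float(row["close"])
        acc[1] += 1
    return {ticker: s / n for ticker, (s, n) in totals.items()}


# Stand-in for open("prices.csv", newline="") on a very large file.
sample = io.StringIO("ticker,close\nAAA,10\nBBB,20\nAAA,14\n")
result = mean_close_by_ticker(sample)
print(result)
```

With real data you would pass an open file handle instead of the `StringIO` object.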


data.table is leagues ahead of python in terms of code length and performance.

https://h2oai.github.io/db-benchmark/


Between things like PySpark or using an RDBMS you can scale pretty far. Even with other languages like Java this is the case. Once your dataset goes beyond a single machine you need some kind of data platform.


Language speed becomes mostly irrelevant compared to the strategy, framework, toolset, and dataset design for parallelization.

If you have a use case where a program in C runs in a reasonable time on your laptop but one in Python doesn't, that solution is only going to take you so far before you'll want to graduate from your laptop and take advantage of the Python ecosystem again in real big data contexts.


Well that just isn't true. The only reason Python is remotely even usable in this context is because it's merely a wrapper around C the vast majority of the time. Language definitely matters.


You know, actual databases are something you can "glue" onto a Python infrastructure.


This should be fun to play with in a Jupyter notebook. :)


For me I know about what this data stands for and what it represents, I just don't know what to do _after_ acquiring this sort of data. Any pointers on that? I live in a really poor country where I can't do algotrading anyways but the eventual goal is to venture into that field by moving abroad, any pointers on what I can build right now with free APIs or from data like this?


anybody know of a good Python API for options prices? I don't think Alpaca has them


In a similar boat. Would like historical bid/ask if possible and live bid/ask (not 15 min delay). Willing to pay if it's not hundreds per month.


I bought historical option data in the past directly from CBOE [0]. Depending on what you need the prices are very reasonable. E.g. for a single symbol ~$70-$80 for a year depending on the interval you need. There is a big discount if you just buy all the symbols.

I cannot recommend the stock prices though. Somehow there were just a lot of mistakes and you can get them from other sources cheaper.

[0]: https://datashop.cboe.com


I got them from InteractiveBrokers a while back. It was definitely not hundreds per month, more like 15/mo

https://www.interactivebrokers.com/en/index.php?f=14193#coll...


TD's API supports option chains. They don't have a good first-party Python library, but it's pretty straightforward.

https://developer.tdameritrade.com/option-chains/apis/get/ma...

IEX only does end of day pricing and even that is delayed until the next day.


Check out Tradier, https://documentation.tradier.com

In full disclosure, I work at Tradier.


I've been playing with Alpaca - they give you access to a lot of financial data through their apis without relying on scraping Yahoo...


I didn't see much scraping on that site. It just looked like regular calls. Maybe the API is not as clean or consistent as it could be. But anyway I did a quick search for Alpaca. Is this the one you meant: https://alpaca.markets/?


Here's an API guide for alpaca if you're keen: https://algotrading101.com/learn/alpaca-trading-api-guide/

Disclaimer: One of my writers wrote it for my blog


cool tool! fyi you should probably add a license of some kind in your github repo.



