Crawlee is open source and free to use, and we don't have any plans to monetize it in the future. It will always be free to use.
We provide the Apify platform, where you can publish your scrapers as Actors for the developer community, and developers earn money through it. You can use Crawlee for Python as well :)
tl;dr: Crawlee is and always will be free to use and open source.
- Crawlee has out-of-the-box support for headless browser crawling (Playwright). You don't have to install any plugins or set up middleware.
- Crawlee has a minimalistic & elegant interface: set up your scraper in fewer than 10 lines of code (see the sketch after this list). You don't have to figure out which middleware or settings exist or need to be changed. On top of that, we also have templates, which flatten the learning curve further.
- Complete type hint coverage, which is something Scrapy hasn't achieved yet.
- Based on standard asyncio. Integrating Scrapy into a classic asyncio app requires bridging Twisted and asyncio, which is possible but not easy, and can cause trouble.
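A minimal sketch of what that looks like, based on the Crawlee for Python README at the time of writing (the start URL and the 10-request cap are just placeholders):

```python
import asyncio

from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    crawler = PlaywrightCrawler(max_requests_per_crawl=10)

    # The default handler runs for every crawled page.
    @crawler.router.default_handler
    async def handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url}')
        await context.push_data({'url': context.request.url, 'title': await context.page.title()})
        await context.enqueue_links()  # follow links found on the page

    await crawler.run(['https://crawlee.dev'])


asyncio.run(main())
```

Note that the whole thing is a plain `asyncio.run(...)` entry point, which is the point of the last bullet: it drops into any existing asyncio app without a Twisted bridge.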
> You don't have to install any plugin or set up the middleware.
That cuts both ways, in true 80/20 fashion: it also means that anyone who isn't on the happy path Crawlee was designed for is going to have to edit your Python files (`pip install -e` type business) to achieve their goals.
I've been working on a crawler recently, and honestly you need the flexibility middleware gives you. You can only get so far with reasonable defaults; crawling isn't a one-size-fits-all kinda thing.
Crawlee isn’t any less configurable than Scrapy. It just uses different and, in my personal opinion, more approachable patterns. That makes it easier to start with, but you can still tweak whatever you want. Btw, you can add middleware-style behavior via the Crawlee Router (sketch below).
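One way to approximate middleware with the Router is to wrap handlers before registering them. The Router and crawler APIs below follow the Crawlee for Python docs; the `with_logging` wrapper itself is a hypothetical illustration, not part of Crawlee's API:

```python
import functools

from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
from crawlee.router import Router

router = Router[BeautifulSoupCrawlingContext]()


def with_logging(handler):
    # Hypothetical middleware-style wrapper: runs code around the real handler.
    @functools.wraps(handler)
    async def wrapper(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'-> {context.request.url}')
        await handler(context)
        context.log.info(f'<- {context.request.url}')

    return wrapper


@router.default_handler
@with_logging
async def default_handler(context: BeautifulSoupCrawlingContext) -> None:
    await context.push_data({'url': context.request.url})


crawler = BeautifulSoupCrawler(request_handler=router)
```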
> Crawlee isn’t any less configurable than Scrapy.
Oh, then I have obviously overlooked how one would be able to determine if a proxy has been blocked and evict it from the pool <https://github.com/rejoiceinhope/scrapy-proxy-pool/blob/b833...>. Or how to use an HTTP cache independent of the "browser" cache (e.g. to allow short-circuiting the actual request if I can prove it isn't stale for my needs, which enables recrawls to fix logic bugs, or even downloading the actual request-response payloads for making better tests) https://docs.scrapy.org/en/2.11/topics/downloader-middleware...
Unless you meant what I said about `pip install -e && echo glhf`, in which case, yes, it's a "simple matter of programming" into a framework that was not designed to be extended.
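For context on the cache point: in Scrapy, that recrawl-friendly cache is just the built-in downloader middleware flipped on via settings (the values here are illustrative):

```python
# settings.py -- Scrapy's built-in HTTP cache (a downloader middleware).
# With DummyPolicy, any request already in the cache is served from disk
# without hitting the network, which enables cheap recrawls and offline tests.
HTTPCACHE_ENABLED = True
HTTPCACHE_DIR = 'httpcache'    # stored under the project's .scrapy directory
HTTPCACHE_EXPIRATION_SECS = 0  # 0 means cached responses never expire
HTTPCACHE_POLICY = 'scrapy.extensions.httpcache.DummyPolicy'
HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
```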
Cache management is also what I had in mind. I've been using golang+colly, and the default caching behavior is just different enough from what I need. I haven't written a custom cache middleware yet, but I'm getting to that point.