jancurn's comments | Hacker News

You sent me down a rabbit hole: https://esolangs.org/wiki/APGsembly is mentioned in the book

And for a related rabbit hole where people actually went all the way to the bottom, there's of course the full implementation of Tetris in GoL, nerd-sniped into existence by a CodeGolf challenge:

https://codegolf.stackexchange.com/questions/11880/build-a-w...


Sometimes you see something that makes you wonder how you get to exist in the same world as people with the drive and intelligence to do truly awesome (in the original sense) things like this. I am proud of myself when the compiler works on the first try.

I think it's awesome that they can do this amazing fun esoteric stuff, but at the same time a small part of me thinks maybe they need to be doing something more meaningful in the real world.

This small part is what makes broken people. Whoever reads this, go have fun! :)

You know what? I think I will.

I wonder, what would that be, that thing that is more meaningful?

I would make the case that, zoomed out far enough, nothing at all is meaningful, so you might as well make beautiful things, and this is a delightfully beautiful thing.


The only thing that's meaningful is having fun; everything else is a waste of time.

Hey, this is Jan, founder of Apify.

We’re running a marketplace of 8,000+ tools called Actors for all kinds of web data extraction and automation use cases (see https://apify.com/store). Just last month, we paid out more than $500k to community developers who publish these Actors on the Apify platform.

The unit economics work for niche tools: scrapers for specific platforms, packaged open-source tools, MCP servers, or API wrappers. These are too small to build a SaaS around, but developers can earn a few thousand dollars per month in passive income.

We believe there can be many more such Actors. So we're putting $1M in prizes on the table to motivate developers to build new, useful Actors. Our bet is that 10,000 new specific tools can vastly expand the capabilities of many AI agents and unlock a lot of value.


Hey HN,

This is Jan, the founder of Apify (https://apify.com/) — a full-stack web scraping platform.

With the help of the Python community and feedback from early adopters, after a year of building Crawlee for Python in beta, we are launching Crawlee for Python v1.0.0.

The main features are:

- Unified storage client system: less duplication, better extensibility, and a cleaner developer experience. It also opens the door for the community to build and share their own storage client implementations.

- Adaptive Playwright crawler: makes your crawls faster and cheaper, while still letting you reliably handle complex, dynamic websites. In practice, you get the best of both worlds: speed on simple pages and robustness on modern, JavaScript-heavy sites (a short sketch follows this list).

- New default HTTP client `ImpitHttpClient` (https://crawlee.dev/python/api/class/ImpitHttpClient), powered by the Impit library (https://github.com/apify/impit): fewer false positives, more resilient crawls, and less need for complicated workarounds. Impit is also developed as an open-source project by Apify, so you can dive into the internals or contribute improvements yourself. You can also create your own instance, configure it to your needs (e.g., enable HTTP/3 or choose a specific browser profile), and pass it into your crawler.

- Sitemap request loader: makes it easier to start large-scale crawls where sitemaps already provide full coverage of the site.

- Robots exclusion standard support: not only helps you build ethical crawlers, but can also save time and bandwidth by skipping disallowed or irrelevant pages.

- Fingerprinting: each crawler run looks like a real browser on a real device. Using fingerprinting in Crawlee is straightforward: create a fingerprint generator with your desired options and pass it to the crawler.

- OpenTelemetry integration: monitor real-time dashboards or analyze traces to understand crawler performance, and integrate Crawlee into your existing monitoring pipelines more easily.
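
To give a feel for how these pieces fit together, here's a minimal sketch combining the adaptive crawler with a custom Impit client. The import paths, the `with_beautifulsoup_static_parser` factory, whether it accepts an `http_client` argument, and the `http3`/`browser` parameter names are all assumptions based on the docs linked above, so check them before copying:

    import asyncio

    # Assumed import paths; see the Crawlee for Python API docs.
    from crawlee.crawlers import AdaptivePlaywrightCrawler
    from crawlee.http_clients import ImpitHttpClient

    async def main() -> None:
        # Custom Impit client: HTTP/3 enabled, Firefox browser profile (assumed params).
        http_client = ImpitHttpClient(http3=True, browser='firefox')

        # Adaptive crawler: cheap HTTP fetches on static pages,
        # full Playwright rendering on JavaScript-heavy ones.
        crawler = AdaptivePlaywrightCrawler.with_beautifulsoup_static_parser(
            http_client=http_client,
        )

        @crawler.router.default_handler
        async def handler(context) -> None:
            context.log.info(f'Crawling {context.request.url}')
            await context.enqueue_links()

        await crawler.run(['https://crawlee.dev'])

    if __name__ == '__main__':
        asyncio.run(main())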

For details, you can read the announcement blog post: https://crawlee.dev/blog/crawlee-for-python-v1

Our team and I will be happy to answer any questions you might have here.


Hey all,

We’re publishing this whitepaper describing a new concept for building serverless microapps called Actors, which are easy to develop, share, integrate, and build upon. Actors are a reincarnation of the UNIX philosophy for programs running in the cloud.

Our goal is to make Actors an open web standard. We’d love to hear your thoughts.

Here’s the corresponding GitHub repo: https://github.com/apify/actor-whitepaper


For this use case, you might use this ready-made Actor: https://apify.com/apify/website-content-crawler


For sure: simply store the cookies after login and then use them to initiate the crawl.
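
For illustration, here's a rough sketch of the login half using plain Playwright (not a Crawlee-specific API; the URL and login steps are placeholders):

    import asyncio
    import json

    from playwright.async_api import async_playwright

    async def save_login_cookies() -> None:
        async with async_playwright() as p:
            browser = await p.chromium.launch()
            page = await browser.new_page()
            await page.goto('https://example.com/login')  # placeholder URL
            # ... fill in credentials and submit the login form here ...
            cookies = await page.context.cookies()
            with open('cookies.json', 'w') as f:
                json.dump(cookies, f)  # load these when initiating the crawl
            await browser.close()

    asyncio.run(save_login_cookies())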


Not yet, but it’s on the roadmap


Thank you!


The main advantage (for now) is that the library has a single interface for both HTTP and headless browsers, plus bundled autoscaling. You can write your crawlers using the same base abstraction, and the framework takes care of the heavy lifting. Developers of scrapers shouldn't need to reinvent the wheel and can instead focus on building the "business" logic of their scrapers. Having said that, if you wrote your own crawling library, the motivation to use Crawlee might be lower, and that's fair enough.
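
As a rough illustration of that single-interface point (class names follow the Crawlee for Python docs, but treat the exact import paths as assumptions):

    # The same handler logic can sit behind an HTTP-based or a browser-based
    # crawler; the framework supplies queues, retries, and autoscaling either way.
    from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler
    from crawlee.playwright_crawler import PlaywrightCrawler

    def build_crawler(use_browser: bool):
        crawler = PlaywrightCrawler() if use_browser else BeautifulSoupCrawler()

        @crawler.router.default_handler
        async def handler(context) -> None:
            # Identical "business" logic regardless of the underlying fetcher.
            await context.push_data({'url': context.request.url})

        return crawler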

Please note that this is the first release, and we'll keep adding many more features as we go, including anti-blocking, adaptive crawling, etc. To see where this might go, check https://github.com/apify/crawlee


Can I ask - what is anti-blocking?


Usually refers to “evading bot detection”: detecting when you’re blocked and switching proxy/“browser fingerprint”.
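
In its crudest form that's just retry logic. A hypothetical sketch (the proxy URLs are placeholders, and real tools also rotate TLS/browser fingerprints rather than just proxies):

    import itertools

    import requests

    # Placeholder proxy pool; real setups rotate through much larger pools.
    proxies = itertools.cycle(['http://proxy1:8000', 'http://proxy2:8000'])

    def fetch(url: str) -> requests.Response:
        for _ in range(3):
            proxy = next(proxies)
            resp = requests.get(url, proxies={'http': proxy, 'https': proxy}, timeout=30)
            if resp.status_code not in (403, 429):  # treat these as "blocked"
                return resp
        raise RuntimeError('Blocked on all proxies')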


Is this a good feature to include? Shouldn't we respect the host's settings on this?


It’s a fair and totally reasonable question, but it clashes with reality. Many hosts have data that others want to scrape (eBay, Amazon, Google, airlines, etc.), and they set up anti-scraping mechanisms to try to prevent it. Whether or not to respect those wishes is a bigger question, but not one for the scraping library; it’s one for those doing the scraping and their lawyers.

The fact is, many, many people want to scrape these sites, and there is massive demand for tools to help them do that. So if Apify/Crawlee decide to take the moral high ground and not offer a way around bot detection, someone else will.


Ah yes, the old 'if I don't build the bombs for them, someone else will'. I don't think this is taking the moral high ground; this is saying we don't care whether it's moral: there's demand and we'll build it.


There are many legitimate and legal use cases where one might want to circumvent blocking of bots. We believe that everyone has the moral right to access and fairly use non-personal publicly available data on the web the way they want, not just the way the publishers want them to. This is the core founding principle of the open web, which allowed the web to become what it is today.

BTW we continuously update this exhaustive post covering all legal aspects of web scraping: https://blog.apify.com/is-web-scraping-legal/



It’s an “old” law that did not consider many intricacies of the internet and the platforms that exist on it, and it’s mostly been made obsolete by EU case law, which has shrunk the definition of a protected database under this law so much that it’s practically inapplicable to web scraping.

(Not my opinion. I visited a major global law firm’s seminar on this topic a month ago and this is what they said.)


I'm not gonna feel bad if a corporation gets its data scraped (whenever it's legal to do so; that's another question I'm not knowledgeable enough to answer), when they themselves scrape other companies' data.


You seem to be making a massive category error here. To my understanding, this isn't only going to circumvent the scraping protections of companies that themselves scrape other people's data.


Google and Amazon were built on scraped data, who are you kidding?


There's a bidirectional benefit to Google at least. That's why SEO exists. People want to appear in search results.


I make sure to enroll in projects that scrape Google/Amazon en masse, just for the satisfaction.


Probably not :)

