
It is very easy to get this dataset directly from the HN API. Let me just post it here:

Table definition:

    CREATE TABLE hackernews_history
    (
        update_time DateTime DEFAULT now(),
        id UInt32,
        deleted UInt8,
        type Enum('story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5),
        by LowCardinality(String),
        time DateTime,
        text String,
        dead UInt8,
        parent UInt32,
        poll UInt32,
        kids Array(UInt32),
        url String,
        score Int32,
        title String,
        parts Array(UInt32),
        descendants Int32
    )
    ENGINE = ReplacingMergeTree(update_time) ORDER BY id;
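
A ReplacingMergeTree keyed on update_time keeps only the newest version of each id after background merges, so re-running the script refreshes items in place. To force deduplication at query time, before a merge has happened, add FINAL (a minimal sketch against the table above):

    SELECT count() FROM hackernews_history FINAL;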
    
A shell script:

    # Number of item ids fetched per INSERT.
    BATCH_SIZE=1000

    # Settings that tolerate missing or deleted items and parallelize the downloads.
    TWEAKS="--optimize_trivial_insert_select 0 --http_skip_not_found_url_for_globs 1 --http_make_head_request 0 --engine_url_skip_empty_files 1 --http_max_tries 10 --max_download_threads 1 --max_threads $BATCH_SIZE"

    # Fetch the id of the newest item.
    rm -f maxitem.json
    wget --no-verbose https://hacker-news.firebaseio.com/v0/maxitem.json

    # Emit comma-separated batches of ids, newest first, and insert each batch
    # through the url() table function's {a,b,c} glob expansion.
    clickhouse-local --query "
        SELECT arrayStringConcat(groupArray(toString(number)), ',') FROM numbers(1, $(cat maxitem.json))
        GROUP BY number DIV ${BATCH_SIZE} ORDER BY any(number) DESC" |
    while read ITEMS
    do
        echo "$ITEMS"
        clickhouse-client $TWEAKS --query "
            INSERT INTO hackernews_history SELECT * FROM url('https://hacker-news.firebaseio.com/v0/item/{$ITEMS}.json')"
    done
It takes a few hours to download the data and fill the table.
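
To sanity-check progress while it runs, a simple aggregate over the table works (a minimal sketch):

    SELECT count(), max(id), max(time) FROM hackernews_history;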


May I hijack this thread for a related question? I love the public, up-to-date HN dataset.

I saw the recursive CTE blog post, but this doesn't seem to work on your HN dataset:

https://play.clickhouse.com/play?user=play#V0lUSCBSRUNVUlNJV...

Are recursive CTEs disabled on this instance, or am I doing something wrong?


Done, and now it works perfectly.


What was broken?


This is unclear to me; I will ask the author.


The reason is trivial: I disabled the new feature flag on the playground service long ago (when it was still in development). I will enable it again and send an example.
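
Once the flag is on, a recursive CTE over this dataset can walk a comment up through its parents to the root story. A sketch against the table above, starting from item 40298680 (mentioned later in this thread):

    WITH RECURSIVE thread AS
    (
        -- Anchor: the starting comment.
        SELECT id, parent, type, title
        FROM hackernews_history
        WHERE id = 40298680
        UNION ALL
        -- Recursive step: fetch each item's parent.
        SELECT h.id, h.parent, h.type, h.title
        FROM hackernews_history AS h
        INNER JOIN thread AS t ON h.id = t.parent
    )
    SELECT id, type, title FROM thread;

The recursion stops at the root story, whose parent (0) matches no rows.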


While trying the script, I am getting the following error:

<Trace> ReadWriteBufferFromHTTP: Failed to make request to 'https://hacker-news.firebaseio.com/v0/item/40298680.json'. Error: Timeout: connect timed out: 216.239.32.107:443. Failed at try 3/10. Will retry with current backoff wait is 200/10000 ms.

I googled with no luck. I was wondering if you have a solution for it.


The script makes many requests in parallel, which is why some of them may need to be retried. It logs every retry, e.g., "Failed at try 3/10", and throws an error only if all ten tries fail. The number of retries is set in the script (--http_max_tries 10).
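
If requests keep timing out even after all retries, raising the retry budget and lowering the parallelism may help. A sketch of tuned values for the script above; that http_retry_max_backoff_ms caps the per-retry wait is my assumption from its name and the "200/10000 ms" in the log:

    # Hypothetical tuning: fewer parallel requests, more tries, longer maximum backoff.
    BATCH_SIZE=500
    TWEAKS="--optimize_trivial_insert_select 0 --http_skip_not_found_url_for_globs 1 --http_make_head_request 0 --engine_url_skip_empty_files 1 --http_max_tries 20 --http_retry_max_backoff_ms 30000 --max_download_threads 1 --max_threads $BATCH_SIZE"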

Example of how it should work:

    $ ch -q "SELECT * FROM url('https://hacker-news.firebaseio.com/v0/item/40298680.json')" --format Vertical
    Row 1:
    ──────
    by:     octopoc
    id:     40298680
    parent: 40297716
    text:   Oops, thanks. I guess Marx was being referenced? I had thought Marx was English but apparently he was German-Jewish[1]<p>[1] <a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Karl_Marx" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Karl_Marx</a>
    time:   1715179584
    type:   comment


Also, proof that it is updated in real time: https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...
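
The query behind that link is essentially of this shape (a sketch; the playground table name is assumed to match the one above):

    SELECT * FROM hackernews_history ORDER BY time DESC LIMIT 10;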



