nlittlepoole's comments | Hacker News

DuckDB can read and write SQLite files via an extension, so you can do that today with DuckDB as-is.

https://duckdb.org/docs/stable/core_extensions/sqlite
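A rough sketch of what that looks like from a DuckDB session (the file and table names here are made up):

    -- 'app.db' and 'events' are hypothetical names
    INSTALL sqlite;
    LOAD sqlite;
    ATTACH 'app.db' AS app (TYPE sqlite);
    SELECT count(*) FROM app.events;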


My understanding is that this is still too slow for quick inserts, because DuckDB (like all columnar stores) is designed for batched writes.


The way I understood it, you can do your inserts with SQLite "proper", and simultaneously use DuckDB for analytics (i.e., read-only).


Aha! That makes so much sense. Thank you for this.

Edit: Ah, right, the downside is that this is not going to have good OLAP query performance when interacting directly with the SQLite tables. So it's still necessary to copy out to DuckDB tables (probably in batches) if that matters. Still seems very useful to me, though.
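Something like this, continuing the attach above (table and column names are hypothetical):

    -- one-off batch copy into a native DuckDB (columnar) table
    CREATE TABLE events_olap AS SELECT * FROM app.events;
    -- later, incremental batches, assuming a monotonically increasing id
    INSERT INTO events_olap
    SELECT * FROM app.events
    WHERE id > (SELECT max(id) FROM events_olap);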


Analytics is done in "batches" (daily, weekly) anyway, right?

We know you can't get both row and column order at the same time, and that continuously maintaining both means duplication and guarantees you get the worst case from both worlds.

Local, row-wise writing is the way to go for write performance. Column-oriented reads are the way to do analytics at scale. It seems alright to have a sync process that does the order re-arrangement (maybe with extra precomputed statistics, and sharding to allow many workers if necessary) to let queries over now-historical data run fast.


It's not just about row versus column. OLAP stores are potentially denormalised as well, and sometimes pre-aggregated, such as rolling up by day or by customer.

If you really need performance, you'll be building a star schema.
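For what it's worth, a toy version of what I mean (all names invented):

    -- fact table plus a dimension, pre-aggregated into a daily rollup
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (
        sale_date   DATE,
        customer_id INTEGER REFERENCES dim_customer (customer_id),
        amount      DECIMAL(10, 2)
    );
    CREATE TABLE daily_sales AS
    SELECT sale_date, customer_id, sum(amount) AS total
    FROM fact_sales
    GROUP BY sale_date, customer_id;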


Not all OLAP-like queries are for daily reporting.

I agree that the basic architecture should be row order -> delay -> column order, but the question (in my mind) is balancing the length of that delay with the usefulness of column order queries for a given workload. I seem to keep running into workloads that do inserts very quickly and then batch reads on a slower cadence (either in lockstep with the writes, or concurrently) but not on the extremely slow cadence seen in the typical olap reporting type flow. Essentially, building up state and then querying the results.

I'm not so sure about "continuously maintaining both means duplication and ensuring you get the worst case from both worlds". Maybe you're right, I'm just not so sure. I agree that it's duplicating storage requirements, but is that such a big deal? And I think if fast writes and lookups and fast batch reads are both possible at the cost of storage duplication, that would actually be the best case from both worlds?

I mean, this isn't that different conceptually from the architecture of log-structured merge trees, which have this same kind of "duplication" but for good purpose. (Indeed, RocksDB has been the closest thing to what I want for this workload that I've found; I just think it would be neat if I could use SQLite+DuckDB instead, accepting some tradeoffs.)


> the question (in my mind) is balancing the length of that delay with the usefulness of column order queries for a given workload. I seem to keep running into workloads that do inserts very quickly and then batch reads on a slower cadence (either in lockstep with the writes, or concurrently) but not on the extremely slow cadence seen in the typical olap reporting type flow. Essentially, building up state and then querying the results.

I see. Can you come up with row/table watermarks? Say your column store is up to date to a certain watermark; any query that requires freshness beyond that will need to snoop into the rows that haven't made it into the columnar store yet, to check for data up to the required query timestamp.
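For example (assuming an indexed ts column; the literal timestamp stands in for whatever watermark you track):

    -- serve history from the columnar store, fresh rows from SQLite
    SELECT * FROM events_olap
    WHERE ts <= TIMESTAMP '2024-01-01 12:00:00'  -- the tracked watermark
    UNION ALL
    SELECT * FROM app.events
    WHERE ts > TIMESTAMP '2024-01-01 12:00:00';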

In the past I've dealt with a system that had read-optimised columnar data overlaid with fresh write-optimised data, and used timestamps to agree on the data that should be visible to queries. It continuously consolidated data into the read-optimised store instead of having the silly daily job you might see in the extremely slow-cadence reporting flow you mention.

You can write such a system, but in reality I've found it hard to justify building one for continuous updates when a 15-minute delay isn't the end of the world. It's doable if you want it, though.

> I'm not so sure about "continuously maintaining both means duplication and ensuring you get the worst case from both worlds". Maybe you're right, I'm just not so sure. I agree that it's duplicating storage requirements, but is that such a big deal? And I think if fast writes and lookups and fast batch reads are both possible at the cost of storage duplication, that would actually be the best case from both worlds?

I mean that if you want both views in a consistent world, then writes will bring things to a crawl, as both row- and column-ordered data need to be updated before the write lock is released.


Yes! We're definitely talking about the same thing here! Definitely not thinking of consistent writes to both views.

Now that you said this about watermarks, I realize that this is definitely the same idea as streaming systems like Flink (which is where I'm familiar with watermarks from), but my use cases are smaller data and I'm looking for lower latency than distributed systems like that. I'm interested in delays that are on the order of double- to triple-digit milliseconds, rather than 15 minutes. (But also not microseconds.)

I definitely agree that it's difficult to justify building this, which is why I keep looking for a system that already exists :)


Yeah, exactly. Wouldn't it be easier to make it self-reported but use a mechanism to incentivize accurate market prices? Like enforce that the owner can't sell for more than x% above the reported amount, and cap rents based on the reported amount. Use of the property as collateral has to match the reported value. You can only be insured for value up to that reported number. There are lots of things the state can do to create that incentive.


https://jan.ai/ is pretty idiot-proof.


You need a viable Proof of Human/Personhood system, and afaik one does not yet exist.


It's called an ID card / SSN.


I agree with your take here. I have a partner that anyone in this thread would label as a #1, but my two siblings are essentially on the same plane of intimacy. If my partner left my life for whatever reason, I'd still have my siblings, and while losing someone is hard, it wouldn't destroy me, because of that support. It's a fairly normal dynamic in my family. I had two uncles who lived together their whole lives, even through relationships that didn't end up working out, even ones that had children. My grandmother lived in the same apartment building as her brother for 30 years, and her other two siblings lived in the same neighborhood. Her husband (my grandfather) died a few years into their marriage, and her siblings (who never got married) always fulfilled the role that a partner would play.


I agree with your premise, but ultimately it's the responsibility of governments to manage how society distributes what is produced. In the US at least, it seems most of the tension here comes from the fact that we've decided that if you are not doing "work", then you don't "deserve" food, shelter, or healthcare. Trying to stop automation always seems like a fool's errand, because there is simply too much incentive to automate.


Yes, I agree completely.

Though I can imagine a counterargument from a technologist/futurist positing that the incentive to automate comes in part from that merciless system of survival.

The modern Luddite, I think, doesn't necessarily "hate" looms, or their inventors. They just don't have any faith that those necessary compensations you describe will ever happen, or at least not quickly enough to save them. Perhaps the government is convinced by exactly that argument above. Or at the very least too apathetic (or financially tangled) to fight it.

We all act along the axes where we can affect something, effect something. Smashing looms is a fool's errand, but sometimes that's all the power you feel you have.


> The modern Luddite, I think, doesn't necessarily "hate" looms, or their inventors. They just don't have any faith that those necessary compensations you describe will ever happen, or at least not quickly enough to save them. Perhaps the government is convinced by exactly that argument above. Or at the very least too apathetic (or financially tangled) to fight it.

Agree

> We all act along the axes where we can affect something, effect something. Smashing looms is a fool's errand, but sometimes that's all the power you feel you have.

However, I think what's fascinating about this is that it requires believing you can't change your skill set to something else. Sometimes I don't think it's as simple as feeling empowered; there's also fear and anxiety about change.


Are you guys only sticking to Ethereum? Any plans for Algorand support?


Hi! We're starting with Ethereum and Polygon, but we're definitely looking to expand to other chains. Happy to talk more (email address in the GP).


Can we get any of Iceberg/Delta/Hudi that isn't terribly complex to set up? Like, configurable completely from standard SQL via CREATE EXTERNAL TABLE syntax.
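i.e., something on the order of this (hypothetical DDL; as far as I know, no engine accepts exactly this today, and the bucket path is made up):

    -- wished-for one-statement setup, not any engine's actual syntax
    CREATE EXTERNAL TABLE my_events
    STORED AS ICEBERG
    LOCATION 's3://my-bucket/warehouse/my_events';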


100% agree.


Probably for SQL (top n, ...), but not for wrangling, analytics, ML, AI, and viz.


I partake in this trade. There are other DeFi markets than Aave with better rates (in the 6% to 9% range) for borrowing Tether. I also don't use only one market or one chain, to hedge smart contract risk a bit. I also didn't sell the Tether and then just hold cash: maxing out my I Bonds allocation and then buying treasuries has offset the interest on Tether such that I've been slightly net positive for the last 18 months on my position. This is all gambling money; no money I actually need day to day is tied up in this, and my retirement/savings are invested in a traditional portfolio of stocks/bonds/real estate.


Even Aave itself has better rates -- the figure the author is quoting is from the platform's option to lock in a fixed rate for your loan. Currently you can lock in 12.24% [1], but you can also borrow at the variable rate, starting at 3.15%.

Now, that does subject you to uncontrollable variation, but if you look at the chart, it has historically stayed at a very low level. Even the occasional spike you see lasts only a day or two and has little impact on the annual average. [2]

Furthermore, the whole time, you're getting credited for interest accrued on your collateral. (1.18% on the USDC here -- so roughly 3.15% paid minus 1.18% earned, all in all about a 2% annual carrying cost, not a big issue if you think the crypto markets are on borrowed time!)

"But what about the case where USDT borrowing surges and you have a persistent high rate?"

If that happens at all, it's probably because everyone else is dumping Tether, meaning its price is probably falling, and it's a great time to close the short anyway!

[1] https://app.aave.com/reserve-overview/?underlyingAsset=0xdac...

[2] People often miss that "omg high interest rate" for a few days translates into very little expense in absolute terms: even 20% annualised, sustained for three days, costs only about 20% * 3/365, i.e. ~0.16% of the principal. It was especially bad when banks were complaining about having to do one-off overnight loans on a very temporary basis at 4% rather than 2%, supposedly meriting Fed intervention!


> People often miss that "omg high interest rate" for a few days translates into very little expense in absolute terms

That is assuming crypto rates are like USD bank rates.

Do you know of any structural reason the rates can't spike to a megapercent (annualised) rate or higher? If you are being charged interest and the rate spikes, you could lose your collateral quite quickly (and it seems likely trading would be stopped, so you might not even be able to close out).


Yes, I do -- you can look at how the borrow rate varies with the fraction of available tokens borrowed, e.g.:

https://compound.finance/markets/USDT

It saturates at a pretty low level.

I have no idea what you mean by the expression "like USD bank rates", though. Fixed? (Bank rates aren't necessarily that.)


Why do you believe that Tether is investing in riskier assets than treasuries and equivalents? With so much capital, and all the scrutiny they've had for many years and throughout many cycles, it seems incredibly foolish to do anything else.

Tether effectively has a risk-free golden goose; it seems quite foolish to slaughter it in an attempt to gain slightly more alpha.


> Why do you believe that Tether is investing in riskier assets than treasuries and equivalents?

That's easy. I can assume US treasuries will be here in 3 months, or a year, or 10 years, and almost everyone will agree with me.

Almost no one would agree with close to 100% certainty that Tether will be here in 10 years or a year or even 3 months.


1. Doesn't this describe FTX just as well?

2. If they're not doing anything shady, how come they can't be more transparent than they are?

