I still wonder who the audience is for tools like this. The website posits you can answer data questions without going through an analyst, but the role of the analyst is not to be a SQL whisperer for PMs and Executives - it is to be an expert in the model and the data. A data warehouse of any real scale is going to have some amount of issues - anomalous data, different interpretations of the same numbers - how does the LLM deal with that consistently across a business?
the target audience is developers who wish to embed text to SQL functionality into their own products. the target audience is less the 'internal use case' (i.e. a data analyst) and more about letting external users do things they couldn't do before. a good example is payroll software where this type of technology can allow users to pull reports.
With what level of accuracy? And what guarantee of correctness? Because a report that happens to get the joins wrong once every 1000 reports is going to lead to fun legal problems.
You still need someone who understands which approach to use to get the data you need, without getting back completely wrong numbers that _look_ perfectly fine but reflect fantasy, not reality.
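To make that concrete, here's a toy sqlite sketch (schema and numbers invented for illustration) of the classic fan-out join: a query that runs fine, returns a plausible-looking total, and is simply wrong.

```python
import sqlite3

# Hypothetical schema: one 100.00 order, shipped in two parcels.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders(id INTEGER, amount REAL);
    CREATE TABLE shipments(order_id INTEGER, parcel TEXT);
    INSERT INTO orders VALUES (1, 100.0);
    INSERT INTO shipments VALUES (1, 'A'), (1, 'B');
""")

# Naive join: each parcel duplicates the order row, doubling "revenue".
wrong = con.execute(
    "SELECT SUM(o.amount) FROM orders o JOIN shipments s ON s.order_id = o.id"
).fetchone()[0]

# What was actually wanted: aggregate without the fan-out.
right = con.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

print(wrong, right)  # 200.0 vs 100.0 -- both queries "work", only one is real
```

Nothing errors, nothing looks anomalous in isolation; only someone who knows the model catches it.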
> With what level of accuracy? And what guarantee of correctness? Because a report that happens to get the joins wrong once every 1000 reports is going to lead to fun legal problems.
The truth is everyone knows LLMs can't tell correct from error, can't tell real from imagined, and cannot care.
The word "hallucinate" has been used to explain when an LLM gets things wrong, when it's equally applicable to when it gets things right.
Everyone thinks the hallucinations can be trained out, leaving only edge cases. But in reality, edge cases are often horror stories. And an LLM edge case isn't a known quantity for which, say, limits, tolerances and test suites can really do the job. Because there's nobody with domain skill saying, look, this is safe or viable within these limits.
All LLM products are built with the same intention: we can use this to replace real people or expertise that is expensive to develop, or sell it to companies on that basis.
If it goes wrong, they know the excited customer will invest an unbillable amount of time re-training the LLM or double-checking its output -- developing a new unnecessary, tangential skill or still spending time doing what the LLM was meant to replace.
But hopefully you only need a handful of such babysitters, right? And if it goes really wrong there are disclaimers and legal departments.
Getting joins wrong once in 1000 queries would beat 99.9% of experienced data analysts.
Our standards for AI are too high.
If an autonomous car causes one wreck per ten million miles, people set the cars on fire.
When someone finds an LLM that suggests eating a small rock every day, that anecdote is used to discredit all LLM results.
This shit makes errors. But what is the alternative? Human analysts who get joins wrong four times in ten? Human drivers who cause wrecks 30 times per ten million miles? Human social media recommendations about nutritional supplements?
The autonomous car analogy is a good one. The technology is overall far superior to a human (probably scrolling TikTok) driving, but the moment it makes a mistake we remove the AEV, despite its higher societal benefit.
Decisions should be made against an alternative, not against some fictitious perfect solution.
> I don’t see any path from continuous improvements to the (admittedly impressive) ”machine learning” field that leads to a general AI any more than I can see a path from continuous improvements in horse-breeding that leads to an internal combustion engine.
You can counter it doesn't necessarily need an AGI here but that doesn't change the fact you can't crank this engine harder and expect it to power an airplane.
> You might be surprised to learn that I actually think LLMs have the potential to be not only fun but genuinely useful. “Show me some bullshit that would be typical in this context” can be a genuinely helpful question to have answered, in code and in natural language — for brainstorming, for seeing common conventions in an unfamiliar context, for having something crappy to react to.
> Alas, that does not remotely resemble how people are pitching this technology.
> can't crank this engine harder and expect it to power an airplane.
Similarly, but from my far-less-notable self in another discussion today:
> [H]uman exuberance is riding on the (questionable) idea that a really good text-correlation specialist can effectively impersonate a general AI.
> Even worse: Some people assume an exceptional text-specialist model will effectively meta-impersonate a generalist model impersonating a different kind of specialist!
Indeed, AI is not marketed as a BS generator, just as HTTP is not marketed as a spam/ad/fraud/harassment transport protocol. All technologies are dual-use, deal with it!
There's the old adage of "trust, but verify". With LLMs I'm feeling it more like "acknowledge, verify, and verify again". It has certainly pointed me in the right direction faster than Google's "here's some SEO stuff to sort through" :)
I agree with you. The larger point with text to SQL, however, is that it will not work if it is a simple wrap of an LLM (GPT or otherwise). Text to SQL will only work if there is a sufficient understanding of the business context required. To do this is hard, but with tools such as Dataherald a dev's life gets a whole lot easier.
It's not the early days in terms of expecting digital tools to be correct 99% of the time. The early-adoption age was back in 2000-2009. Now everyone expects polished tools that do what they expect them to do.
therein lies the nuance. some people expect to get a natural language answer back. others expect to get a data table back. others expect to get correct SQL back. this is why it's so important to understand the use case and not bucket everything together.
Tbh the original intention was to be the "data analyst" but we found over time (and with literally 100s of user conversations at small cos and enterprises) the embedded use case was more interesting and made for a better business, which was not at all what we expected.
> Because ORM libraries were invented 30 years ago.
> There is no requirement to learn SQL for most of the applications built today.
In the same way that because Linked List libraries were invented 50 years ago, there is no requirement to learn what linked lists are for most of the applications built today?
You aren't getting past the requirement to learn relational databases "because ORM", and there is no material or course that teaches relational databases without teaching SQL.
The unfortunate result of this is that people who boast about knowing $ORM while not knowing SQL have never learned relational databases either.
An ORM doesn't really excuse you from understanding what's going on. In a way, using an ORM is more difficult because you have to understand both what SQL you want and how to get the framework to generate it for you.
Of course there are a lot of incompetent people who have no idea what they're doing; if it seems to work, they ship it. That leads to a lot of nonsensical bullshit and unnecessarily slow systems.
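The textbook case of "seems to work, ships slow" is the N+1 query pattern, which an ORM loop will happily generate if you don't know what SQL you wanted. A sketch in raw sqlite (invented schema) showing the queries a naive loop issues versus the single JOIN you actually meant:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE authors(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts(id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO posts VALUES (1, 1, 'p1'), (2, 1, 'p2'), (3, 2, 'p3');
""")

# What a naive ORM loop often generates: one query per author (N+1).
queries_issued = 1  # the initial SELECT over authors
for (author_id,) in con.execute("SELECT id FROM authors").fetchall():
    con.execute("SELECT title FROM posts WHERE author_id = ?", (author_id,))
    queries_issued += 1

# What you wanted the framework to emit: a single JOIN.
rows = con.execute("""
    SELECT a.name, p.title
    FROM authors a JOIN posts p ON p.author_id = a.id
""").fetchall()

print(queries_issued, len(rows))  # 3 queries vs 1 query returning 3 rows
```

With 2 authors it's 3 queries; with 10,000 it's 10,001, which is exactly the "unnecessarily slow system" shipped because it seemed to work.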
Most applications don't need to get data from a relational database. But for those apps that do, knowing SQL is pretty much a must-have, whether for the developer themselves or someone on the team.
I think GP meant 'where or from whom have you seen/heard demand for this?'.
Weirdly, I was just thinking about using an LLM to form sql queries for me, because I've forgotten much of what I knew. First time I had that thought and 5 minutes later, this fascinating idea rolls into my feed to pull me in further. I know I'm not exactly the target audience, but now I'm intrigued.
I went through a coding/design bootcamp a while back and there was virtually no focus on SQL, so a lot of my classmates were hesitant to jump into relational dbs for projects. I could see it being used in a tool for new devs or those who've focused on a JS stack and need some help with SQL.
allow me to clarify: Dataherald isn't intended for developers because they don't know SQL, it's intended for developers who want to build text to SQL into their products
But who wants text-to-sql in products that they use? You wouldn't be able to trust the results. So what is it useful for? Of course you could learn to check the output. But then you could just learn SQL. I know dozens of not particularly technical people (certainly not software developers) who have learnt enough SQL to be useful over a couple of days.
The demand is huge. Accuracy is less important because the alternative is being completely in the dark or waiting for a developer to get the data for you. In my experience people want to quickly get a ballpark number before they dig deeper.
I agree that you should just learn SQL but that doesn't change the fact that a lot of companies want this right now. SQLAI claims to have hundreds of thousands of customers.
I think a lot of people want something like this, especially as business-analyst duties get added to more non-technical people's job descriptions.
I’ve tried to teach SQL to PMs, bug triage specialists, etc. Even a couple of days is too much time for them to spend learning something not critical or core to their job. Their alternative is to bug data teams with ad hoc requests, which data people hate.
A tool like this would probably save 15% of a data team's time and reduce the worst part of their job. At companies with hundreds, or even thousands, of data folks, that's massive.
And the users are smart people. They can read SQL to see if it looks like the right filters are applied. The “accuracy” issue exists but for certain use cases, it’s honestly not the biggest concern.
Not sure why the tone in this thread is so negative. To the founders, thank you!
we've encountered a lot of instances when people know SQL but just want a first draft of SQL to expedite the process. we see this a lot from data analysts too.
While the engine's response is not accurate all the time, the engine returns a confidence score. We have never encountered a case where a deployment with the necessary training data indicates a 0.9 confidence score on incorrectly generated SQL.
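A confidence score suggests a simple gating pattern. This is my own illustrative sketch, not Dataherald's API: the engine call, cutoff value, and function names are all assumptions.

```python
# Hypothetical gating logic around a text-to-SQL engine's confidence score.
# The 0.9 cutoff mirrors the figure mentioned above; tune it per deployment.
CONFIDENCE_CUTOFF = 0.9

def route(generated_sql: str, confidence: float):
    """Auto-run high-confidence SQL; send the rest to a human for review."""
    if confidence >= CONFIDENCE_CUTOFF:
        return ("execute", generated_sql)
    return ("review", generated_sql)

print(route("SELECT count(*) FROM invoices", 0.95))  # ('execute', ...)
print(route("SELECT something_gnarly ...", 0.62))    # ('review', ...)
```

The point is that the score lets the product fail safe: low-confidence queries become draft SQL for a human rather than silently wrong numbers.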
We’ve seen demand from all types of SaaS applications where the user might need data— software that helps customer support staff answer data questions, CRM, payroll software, just to name a few.
This is the grievance I have as a data scientist. It is one of the fields where the work is deeply technical, yet everyone thinks they could do the job and provides excessive input and exact direction.
there is a middle ground here. the most complicated queries will need the intel and business context of a smart data scientist. there are however so many types of queries where automation would make the world so much easier and allow more self-serve type data inquiries. too often the rhetoric around these topics is binary as in "it works" or "it doesn't work." in reality, there are certain use cases that work now and others that don't yet.