I still wonder who the audience is for tools like this. The website posits you can answer data questions without going through an analyst, but the role of the analyst is not to be a SQL whisperer for PMs and Executives - it is to be an expert in the model and the data. A data warehouse of any real scale is going to have some amount of issues - anomalous data, different interpretations of the same numbers - how does the LLM deal with that consistently across a business?
the target audience is developers who wish to embed text to SQL functionality into their own products. the target audience is less the 'internal use case' (i.e. a data analyst) and more about letting external users do things they couldn't do before. a good example is payroll software where this type of technology can allow users to pull reports.
With what level of accuracy? And what guarantee of correctness? Because a report that happens to get the joins wrong once every 1000 reports is going to lead to fun legal problems.
You still need someone who understands which approach to use to get the data you need, without getting back completely wrong numbers that _look_ perfectly fine but reflect fantasy, not reality.
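To make that concrete, here's a toy sqlite sketch (schema and numbers invented for illustration) of the classic fan-out join: a query that runs fine, returns a plausible-looking total, and is simply wrong.

```python
import sqlite3

# Hypothetical schema: one 100.00 order, shipped in two parcels.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders(id INTEGER, amount REAL);
    CREATE TABLE shipments(order_id INTEGER, parcel TEXT);
    INSERT INTO orders VALUES (1, 100.0);
    INSERT INTO shipments VALUES (1, 'A'), (1, 'B');
""")

# Naive join: each parcel duplicates the order row, doubling "revenue".
wrong = con.execute(
    "SELECT SUM(o.amount) FROM orders o JOIN shipments s ON s.order_id = o.id"
).fetchone()[0]

# What was actually wanted: aggregate without the fan-out.
right = con.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

print(wrong, right)  # 200.0 vs 100.0 -- both queries "work", only one is real
```

Nothing errors, nothing looks anomalous in isolation; only someone who knows the model catches it.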
> With what level of accuracy? And what guarantee of correctness? Because a report that happens to get the joins wrong once every 1000 reports is going to lead to fun legal problems.
The truth is everyone knows LLMs can't tell correct from error, can't tell real from imagined, and cannot care.
The word "hallucinate" has been used to explain when an LLM gets things wrong, when it's equally applicable to when it gets things right.
Everyone thinks the hallucinations can be trained out, leaving only edge cases. But in reality, edge cases are often horror stories. And an LLM edge case isn't a known quantity for which, say, limits, tolerances and test suites can really do the job. Because there's nobody with domain skill saying, look, this is safe or viable within these limits.
All LLM products are built with the same intention: we can use this to replace real people or expertise that is expensive to develop, or sell it to companies on that basis.
If it goes wrong, they know the excited customer will invest an unbillable amount of time re-training the LLM or double-checking its output -- developing a new unnecessary, tangential skill or still spending time doing what the LLM was meant to replace.
But hopefully you only need a handful of such babysitters, right? And if it goes really wrong there are disclaimers and legal departments.
Getting joins wrong once in 1000 queries would beat 99.9% of experienced data analysts.
Our standards for AI are too high.
If an autonomous car causes one wreck per ten million miles, people set the cars on fire.
When someone finds an LLM that suggests eating a small rock every day, that anecdote is used to discredit all LLM results.
This shit makes errors. But what is the alternative? Human analysts who get joins wrong four times in ten? Human drivers who cause wrecks 30 times per ten million miles? Human social media recommendations about nutritional supplements?
The autonomous car analogy is a good one. The technology is overall far superior to a human (probably scrolling TikTok) driving, but the moment it makes a mistake we remove the AEV, despite its higher societal benefit.
Decisions should be made against an alternative, not against some fictitious perfect solution.
> I don’t see any path from continuous improvements to the (admittedly impressive) ”machine learning” field that leads to a general AI any more than I can see a path from continuous improvements in horse-breeding that leads to an internal combustion engine.
You can counter it doesn't necessarily need an AGI here but that doesn't change the fact you can't crank this engine harder and expect it to power an airplane.
> You might be surprised to learn that I actually think LLMs have the potential to be not only fun but genuinely useful. “Show me some bullshit that would be typical in this context” can be a genuinely helpful question to have answered, in code and in natural language — for brainstorming, for seeing common conventions in an unfamiliar context, for having something crappy to react to.
> Alas, that does not remotely resemble how people are pitching this technology.
> can't crank this engine harder and expect it to power an airplane.
Similarly, but from my far-less-notable self in another discussion today:
> [H]uman exuberance is riding on the (questionable) idea that a really good text-correlation specialist can effectively impersonate a general AI.
> Even worse: Some people assume an exceptional text-specialist model will effectively meta-impersonate a generalist model impersonating a different kind of specialist!
Indeed, AI is not marketed as a BS generator, just as HTTP is not marketed as a spam/ad/fraud/harassment transport protocol. All technologies are dual-use, deal with it!
There's the old adage of "trust, but verify". With LLMs I'm feeling it more like "acknowledge, verify, and verify again". It has certainly pointed me in the right direction faster than Google's "here's some SEO stuff to sort through" :)
I agree with you. The larger point with text to SQL, however, is that it will not work if it is a simple wrap of an LLM (GPT or otherwise). Text to SQL will only work if there is a sufficient understanding of the business context required. To do this is hard, but with tools such as Dataherald a dev's life gets a whole lot easier.
It's not the early days in terms of expecting digital tools to be correct 99% of the time. The early-adoption age was back in 2000-2009. Now everyone expects polished tools that do what they expect them to do.
therein lies the nuance. some people expect to get a natural language answer back. others expect to get a data table back. others expect to get correct SQL back. this is why it's so important to understand the use case and not bucket everything together.
Tbh the original intention was to be the "data analyst" but we found over time (and with literally 100s of user conversations at small cos and enterprises) the embedded use case was more interesting and made for a better business, which was not at all what we expected.
> Because ORM libraries were invented 30 years ago.
> There is no requirement to learn SQL for most of the applications built today.
In the same way that because Linked List libraries were invented 50 years ago, there is no requirement to learn what linked lists are for most of the applications built today?
You aren't getting past the requirement to learn relational databases "because ORM", and there is no material or course that teaches relational databases without teaching SQL.
The unfortunate result of this is that people who boast about knowing $ORM while not knowing SQL have never learned relational databases either.
An ORM doesn't really excuse you from understanding what's going on. In a way, using an ORM is more difficult because you have to understand both what SQL you want and how to get the framework to generate it for you.
Of course there are a lot of incompetent people who have no idea what they're doing; if it seems to work, they ship it. That leads to a lot of nonsensical bullshit and unnecessarily slow systems.
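The textbook case of "seems to work, ships slow" is the N+1 query pattern, which an ORM loop will happily generate if you don't know what SQL you wanted. A sketch in raw sqlite (invented schema) showing the queries a naive loop issues versus the single JOIN you actually meant:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE authors(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts(id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO posts VALUES (1, 1, 'p1'), (2, 1, 'p2'), (3, 2, 'p3');
""")

# What a naive ORM loop often generates: one query per author (N+1).
queries_issued = 1  # the initial SELECT over authors
for (author_id,) in con.execute("SELECT id FROM authors").fetchall():
    con.execute("SELECT title FROM posts WHERE author_id = ?", (author_id,))
    queries_issued += 1

# What you wanted the framework to emit: a single JOIN.
rows = con.execute("""
    SELECT a.name, p.title
    FROM authors a JOIN posts p ON p.author_id = a.id
""").fetchall()

print(queries_issued, len(rows))  # 3 queries vs 1 query returning 3 rows
```

With 2 authors it's 3 queries; with 10,000 it's 10,001, which is exactly the "unnecessarily slow system" shipped because it seemed to work.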
Most applications don't need to get data from a relational database. But for those apps that do, knowing SQL is pretty much a must-have, whether for the developer themselves or someone on the team.
I think GP meant 'where or from whom have you seen/heard demand for this?'.
Weirdly, I was just thinking about using an LLM to form sql queries for me, because I've forgotten much of what I knew. First time I had that thought and 5 minutes later, this fascinating idea rolls into my feed to pull me in further. I know I'm not exactly the target audience, but now I'm intrigued.
I went through a coding/design bootcamp a while back and there was virtually no focus on SQL, so a lot of my classmates were hesitant to jump into relational dbs for projects. I could see it being used in a tool for new devs or those who've focused on a JS stack and need some help with SQL.
allow me to clarify: Dataherald isn't intended for developers because they don't know SQL, it's intended for developers who want to build text to SQL into their products
But who wants text-to-sql in products that they use? You wouldn't be able to trust the results. So what is it useful for? Of course you could learn to check the output. But then you could just learn SQL. I know dozens of not particularly technical people (certainly not software developers) who have learnt enough SQL to be useful over a couple of days.
The demand is huge. Accuracy is less important because the alternative is being completely in the dark or waiting for a developer to get the data for you. In my experience people want to quickly get a ballpark number before they dig deeper.
I agree that you should just learn SQL but that doesn't change the fact that a lot of companies want this right now. SQLAI claims to have hundreds of thousands of customers.
I think a lot of people want something like this, especially as business-analyst duties get added to more non-technical people's job descriptions.
I’ve tried to teach SQL to PMs, bug triage specialists, etc. Even a couple of days is too much time for them to spend learning something not critical or core to their job. Their alternative is to bug data teams with ad hoc requests, which data people hate.
A tool like this would probably save 15% of a data team's time and reduce the worst part of their job. At companies with hundreds, or even thousands, of data folks, that's massive.
And the users are smart people. They can read SQL to see if it looks like the right filters are applied. The “accuracy” issue exists but for certain use cases, it’s honestly not the biggest concern.
Not sure why the tone in this thread is so negative. To the founders, thank you!
we've encountered a lot of instances when people know SQL but just want a first draft of SQL to expedite the process. we see this a lot from data analysts too.
While the engine's response is not accurate all the time, the engine returns a confidence score. We have never encountered a case where a deployment with the necessary training data indicates a 0.9 confidence score on incorrectly generated SQL.
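A confidence score suggests a simple gating pattern. This is my own illustrative sketch, not Dataherald's API: the engine call, cutoff value, and function names are all assumptions.

```python
# Hypothetical gating logic around a text-to-SQL engine's confidence score.
# The 0.9 cutoff mirrors the figure mentioned above; tune it per deployment.
CONFIDENCE_CUTOFF = 0.9

def route(generated_sql: str, confidence: float):
    """Auto-run high-confidence SQL; send the rest to a human for review."""
    if confidence >= CONFIDENCE_CUTOFF:
        return ("execute", generated_sql)
    return ("review", generated_sql)

print(route("SELECT count(*) FROM invoices", 0.95))  # ('execute', ...)
print(route("SELECT something_gnarly ...", 0.62))    # ('review', ...)
```

The point is that the score lets the product fail safe: low-confidence queries become draft SQL for a human rather than silently wrong numbers.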
We’ve seen demand from all types of SaaS applications where the user might need data— software that helps customer support staff answer data questions, CRM, payroll software, just to name a few.
This is the grievance I have as a data scientist. It is one of the fields where the work is deeply technical, yet everyone thinks they could do the job and provides excessive input and exact direction.
there is a middle ground here. the most complicated queries will need the intel and business context of a smart data scientist. there are however so many types of queries where automation would make the world so much easier and allow more self-serve type data inquiries. too often the rhetoric around these topics is binary as in "it works" or "it doesn't work." in reality, there are certain use cases that work now and others that don't yet.