What I like about Duckling is that it is a rule-based system that can easily be interrogated. Model-based text extraction is much harder to fix when there is a bug. I run Duckling as a service for value extraction from queries and content, alongside a model-based system for NER (such as spaCy). Using both together generally makes for more accurate enrichment (by cross-referencing values between the two and adding exception rules).
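To make the cross-referencing idea concrete, here is a minimal sketch in Python. It assumes both systems report character offsets for what they found. The inputs are hand-written for illustration: the `body`/`start`/`end`/`dim` fields are modelled on the JSON that Duckling's example server returns, the `(start, end, label)` tuples stand in for NER spans, and `cross_reference` is a hypothetical helper, not part of either library.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-written inputs: in practice the first list would come from
# a Duckling server response and the second from an NER model such as spaCy.
Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True when two half-open character ranges share at least one character."""
    return a_start < b_end and b_start < a_end


def cross_reference(duckling_entities: List[Dict], ner_spans: List[Span]) -> List[Dict]:
    """Attach to each rule-based entity the NER labels whose spans overlap it."""
    enriched = []
    for ent in duckling_entities:
        labels = [label for (start, end, label) in ner_spans
                  if overlaps(ent["start"], ent["end"], start, end)]
        # "agreed" is False when no NER span backs up the rule-based match,
        # which is where an exception rule or a manual review could kick in.
        enriched.append({**ent, "ner_labels": labels, "agreed": bool(labels)})
    return enriched


text = "Book a table tomorrow at 8pm for 4 people"
duckling_entities = [
    {"body": "tomorrow at 8pm", "start": 13, "end": 28, "dim": "time"},
    {"body": "4", "start": 33, "end": 34, "dim": "number"},
]
ner_spans = [(13, 28, "TIME")]  # the NER side missed the "4" in this toy case

for ent in cross_reference(duckling_entities, ner_spans):
    print(ent["body"], ent["dim"], ent["ner_labels"], ent["agreed"])
```

Entities where the two systems disagree are the interesting ones: they are the candidates for exception rules.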
(Very small correction: Duckling is rule-based but uses a super simple Naive Bayes classifier to prioritize between the many potential parses produced by the rules -- we see it as a hybrid approach)
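For readers wondering what "prioritizing between parses" can look like, below is a toy sketch, not Duckling's actual implementation. It assumes each candidate parse is described by the names of the rules that produced it, trains a Naive Bayes model with add-one smoothing on a handful of made-up labelled examples, and picks the candidate with the highest log-odds of being a good parse. All rule names and training data are invented for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple

Example = Tuple[List[str], bool]  # (rule names used by a parse, was it a good parse?)


def train(examples: List[Example]):
    """Count how often each rule name appears in good and in bad parses."""
    counts = {True: Counter(), False: Counter()}
    totals = Counter()
    for features, ok in examples:
        counts[ok].update(features)
        totals[ok] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, totals, vocab


def log_score(model, features: List[str], label: bool) -> float:
    """Log prior plus sum of log likelihoods, with add-one (Laplace) smoothing."""
    counts, totals, vocab = model
    n = sum(counts[label].values())
    score = math.log(totals[label] / sum(totals.values()))
    for f in features:
        score += math.log((counts[label][f] + 1) / (n + len(vocab)))
    return score


def best_parse(model, candidates: List[List[str]]) -> List[str]:
    """Return the candidate with the highest log-odds of being a good parse."""
    return max(candidates,
               key=lambda c: log_score(model, c, True) - log_score(model, c, False))


# Made-up training data: reading "8pm" as an hour is good, as a bare integer is bad.
examples = [
    (["hour", "am_pm"], True),
    (["hour", "am_pm"], True),
    (["integer"], False),
    (["integer", "latent_hour"], False),
]
model = train(examples)
print(best_parse(model, [["integer"], ["hour", "am_pm"]]))
```

The rules still do all the parsing here; the classifier only ranks what they produce, which is why a small number of examples can go a long way.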
Interesting! When I worked at IBM, we evaluated Duckling (the Haskell version) for use in the Watson Assistant product but decided to write our own numerical quantity parser/interpreter. We used ANTLR and created context-free grammars as we found that we could improve both precision and recall substantially. Sadly not open source though.
I must say it looks very neat from a usability point of view. Are the training data sets open? Do you think it is feasible for small-app developers (who don't have thousands of examples to train on) to use Duckling as a more or less ready-made NLP parser without getting too deep into NLP and AI theory?
Are the trained sets meant to be used by different client code or languages?
Duckling is suited to parsing very structured language, typically temporal expressions (dates and times...). It relies on a mix of rules and machine learning. Rules and datasets for many (human) languages are available in the repo. You don't need a lot of data to add support for what you need, owing to this hybrid rules+ML approach (as opposed to just ML).
Hi, thanks for dropping in. What's the status of the Clojure implementation? Would you recommend it for new projects? Is anyone looking at new/old issues? Are there potential new maintainers for the Clojure version?
The current Clojure version is quite stable; we used it at wit.ai/Facebook for several years before moving to Haskell.
I'd love to see somebody take it over and resuscitate it! One interesting direction could be to remove the Java dependencies (mostly on Date) so that it's usable in ClojureScript. It would make a great JS library.
TL;DR Haskell made more sense for us to scale with the number of requests (existing FB infra) as well as the number of engineers working on the project (type checking, etc).
What (s)he means, though, is that in Java, if firstName is there, then it will be a string.
In Clojure firstName might be anything, a string, a number or even an entire other hashmap or type, literally anything. This might or might not cause a runtime crash if you are doing something that assumes it's a string. So you check.
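A small sketch of that "so you check" step, written in Python because it is dynamically typed in the same way. The `greeting` function and the `firstName` key are purely illustrative and not taken from any real codebase.

```python
from typing import Any, Mapping


def greeting(person: Mapping[str, Any]) -> str:
    """Build a greeting, checking at runtime that firstName really is a string."""
    first_name = person.get("firstName")
    if not isinstance(first_name, str):
        # Without this check, a number or nested map would only fail later,
        # at the .upper() call, with a less helpful error.
        raise TypeError(
            f"firstName must be a string, got {type(first_name).__name__}"
        )
    return "Hello, " + first_name.upper()


print(greeting({"firstName": "Ada"}))

try:
    greeting({"firstName": 42})
except TypeError as err:
    print("rejected:", err)
```

In a statically typed language the compiler rules out the second call; here the check has to be written and exercised by hand.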
And "saves" is a bit of a misnomer, since it implies the "cost" (of all the static type machinery) is less. Well, dynamic fans (or those with dynamic preferences if "fan" is too strong) will disagree. ;) In practice many systems get streams of bits from somewhere (like the network) that commonly get interpreted into strings and from there into other types. The validation and conversion are necessary in any language; after that, though, it's just FUD to bring up that a function expecting a person with a :name key and string value could potentially be given something else. In cases where later changes turn it into something else, static types are a nice extra assurance of consistency, but they aren't the only way or the most impactful way to gain assurances.
Also this quote was about the cited reason for FB preferring Haskell to Clojure generally, not about this project specifically. IME it's common in big organisations to prefer stuff that has controls at the cost of productivity. It's probably harder to ship buggy Haskell code even if it's harder to write.
I came here to mention the same thing. I experimented with the Clojure version a long while ago, and evaluated the Haskell version about a year ago for a project at work. Good stuff.