Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
GraphLint: Creating a Domain Specific Language for Graph Validation (knewton.com)
40 points by wfleiss on Feb 8, 2012 | hide | past | favorite | 10 comments


fwiw, when faced with a similar problem i've used graph databases (these days i guess neo4j, but at the time i used datalog). you either find errors loading the database, or via queries that parallel the rule-like approach used here. it's not as cool as this, but you end up with something quite similar to what is described (a way to find errors "declaratively") without needing to implement a new language.

also, i wonder if life would have been easier if the implementation had used prolog? not only is it famously nice for supporting dsls (don't ask me how - i've never done it, but you can imagine that parsing is pretty easy, since you get backtracking "for free"), but it might provide much of the scaffolding needed to evaluate queries.


I'm a software engineer at Knewton who worked with Kenny on this project.

We're not using Neo4j because of other unrelated constraints that have led us to write our own graph data store. Kenny's work implemented the "rule paralleling queries" approach you mention.

I'm not sure about Prolog, but Java's Antlr library was really easy to use. Writing the parser+lexer infrastructure only took a few days. It wasn't quite as clean as using Antlr in a language like SML, but for implementing a parser in Java it was pretty great.


I have recently been experimenting with Treetop, a ruby gem that allows you to easily work with parsing expression grammars.

I may move to Common Lisp or Clojure for writing DSLs, but I am pretty happy with my early experience with Treetop. The one downside, especially compared to Antlr which has a couple of books out there, is that the Treetop documentation had me more than once scratching my head and hacking rather than looking up a relevant reference.

I think parsing expression grammars and packrat parsing in general is going to explode the DSL dev scene . . .


interesting, thanks (sorry for misunderstanding slightly). possibly related / useful in future - did you notice that neo4j's licence terms changed a while back? i can't remember the details, but i had avoided it in the past because of licensing and remember that the change seemed to be an improvement.


The Neo4j community edition is now free and licensed under GPL (http://neo4j.org/download/), and they are working on an Enterprise license that is free for startups (https://groups.google.com/d/msg/gremlin-users/kw9N6xYb6l4/uH...).

There is also a Neo4j Heroku Add On (http://addons.heroku.com/neo4j) that is currently free for developers.


This is really cool - prolog and datalog are great for a wide variety of things, but I'd imagine the performance advantages writing a DSL rather than having to go through the whole *log interpreter is a major plus.

Out of interest, what's the performance like at scale (50 000+ nodes)? (In hind-site, I guess I don't actually have anything to compare it to, but a 50 000 node graph seems pretty big!)


Kenny's initial implementation of the validation on a sample 50,000 node data graph and a specific ontology took four seconds.

The goal of Kenny's project was to get a working DSL and prototype of the validation algorithm. We don't really have much performance data to compare it to either at this point. As we gather that data by using the system, we'll make it faster as necessary. (Nathan Marz's "suffering-oriented programming" post is a perfect description of this development process http://nathanmarz.com/blog/suffering-oriented-programming.ht...)


What advantages does this hold over doing the same thing with Gremlin? Other than being slightly prettier syntactically. [0]

[0] https://github.com/tinkerpop/gremlin/wiki


Kenny's work is a language for describing the schema of a graph (e.g. "this graph is a bipartite graph"). We want our non-programmers to be able to write graph schemas, hence the DSL. I imagine one could use the Gremlin language to specify a graph schema, but I think it would be a bit obtuse as Gremlin is a query language.

We're actually using the Blueprints model under the hood to implement the validation algorithm.


Gremlin is a DSL that comes in several variants -- there is Gremlin-Java, Gremlin-Scala, and the original Gremlin-Groovy (which is what's used in the example on the wiki).

Gremlin-JavaScript is in the works and Gremlin-Jython is on deck.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: