I always wonder how readable those queries really are. It's a nice claim, but has anyone done any serious research on this?
For example, the same in SPARQL 1.1:
MATCH (cypher:QueryLanguage)-[:QUERIES]->(graphs)
MATCH (cypher)<-[:USES]-(u:User) WHERE u.name IN ['Oracle', 'Apache Spark', 'Tableau', 'Structr']
MATCH (openCypher)-[:MAKES_AVAILABLE]->(cypher)
RETURN cypher.attributes
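For reference, here is one way that Cypher query might be rendered in SPARQL 1.1. This is a sketch only: the base URI and the predicate IRIs are assumptions made to mirror the Cypher relationship types, not a published vocabulary.

```
BASE <http://example.org/opencypher/>

SELECT ?attributes
WHERE {
    ?cypher a <QueryLanguage> ;
            <QUERIES> ?graphs ;
            <attributes> ?attributes .
    # Note the forward direction on USES: user -> cypher
    ?user <USES> ?cypher ;
          <name> ?name .
    <openCypher> <MAKES_AVAILABLE> ?cypher .
    VALUES ?name { 'Oracle' 'Apache Spark' 'Tableau' 'Structr' }
}
```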
In the SPARQL case the graph flow does not reverse direction on the <USES> edge (it could, using `?cypher ^<USES> ?user`, but that would be unusual and, in larger queries, very confusing). The SPARQL version also tends to group related concepts together.
This assumes a default BASE URI is selected for the SPARQL version, one that covers all the modeled relations; in a straight comparison with Cypher, that is fair.
I find Gremlin a lot nicer than Cypher, and a lot more powerful as well. Also, to date, Neo4J just has not scaled all that well; I am awaiting the LDBC benchmark results for Neo4J to see if I am wrong.
What Neo4J has been great at is making a nice, solid product aimed at solving developer problems. As a database, I believe it has not been as good at solving enterprise or life-science community problems: it is still a single database instance without federation on demand.
Yeah, I actually wrote that query just as an example to show what Cypher looks like and some fun around the openCypher announcement. Wasn't expecting it to end up on HN, let alone have Marko convert it to Gremlin...
I disagree with you on the statement that Neo4j does not work well in life sciences. I am a data scientist building large scale systems for mining genomic data, and we built a fairly critical piece of that infrastructure around Neo4j. I actually presented an overview of that work at GraphConnect this week:
Many meaningful lineages in life sciences can be hundreds to thousands of levels deep (our datasets are great examples). Neo4j is the only graph database I have evaluated that handles traversals across lineages of this depth while still delivering the performance scalability promised by index-free adjacency, regardless of which node in the cluster a traversal is sent to.
I am just going to point to our work at sparql.uniprot.org.
A graph database with 17 billion edges and 3+ billion nodes, containing the NCBI taxonomy and GO trees in their entirety, which you can access for free over HTTP using standard SPARQL 1.1.
This does not run on a cluster but on single nodes with Virtuoso 7.2.1.
I am not saying that Neo4J is a bad choice; I am saying that, due to its lack of federation support, it is an expensive choice for the life sciences. That is an economic argument rather than a technical one, looking not at one project at a time but at the community in general. Neo4J and Cypher will never support federation in the way that SPARQL allows. All the URI business in RDF is annoying when modelling your data, but it is critical when merging datasets on demand between separate databases, e.g. joining ChEMBL & UniProt & MeSH & PubChem etc.
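The on-demand merging mentioned above is what SPARQL 1.1's SERVICE keyword provides. A minimal sketch: the UniProt endpoint is real, but the second data source and its predicate are placeholders for illustration.

```
PREFIX up: <http://purl.uniprot.org/core/>

SELECT ?protein ?compound
WHERE {
    # Part of the pattern is evaluated remotely at the UniProt endpoint...
    SERVICE <https://sparql.uniprot.org/sparql> {
        ?protein a up:Protein .
    }
    # ...and joined on shared URIs with data in the local store
    # (<http://example.org/targets> is a hypothetical predicate).
    ?compound <http://example.org/targets> ?protein .
}
```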
We in the life sciences rarely do graph traversals for traversal's sake; we tend to join trees, e.g. intersecting a branch of a taxonomic tree with a branch of the GO tree. There are cases where real graph traversals are done (assembly & variation graphs).
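In SPARQL, that kind of tree join is a pair of property paths over the two hierarchies. A sketch, assuming rdfs:subClassOf hierarchies; the root IRIs and the two linking predicates are illustrative, not actual UniProt vocabulary.

```
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?protein
WHERE {
    # Branch of the taxonomic tree: everything at or below a chosen taxon
    ?taxon rdfs:subClassOf* <http://example.org/taxon/Mammalia> .
    # Branch of the GO tree: everything at or below a chosen GO term
    ?goTerm rdfs:subClassOf* <http://example.org/go/GO_0006915> .
    # Intersect the two branches via entities linked to both
    # (both predicates here are hypothetical)
    ?protein <http://example.org/organism> ?taxon ;
             <http://example.org/classifiedWith> ?goTerm .
}
```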
OpenCypher is a great step forward. Now Neo4J needs an open, public standard for serializing graphs to disk that can be imported into Neo4J and other databases. RDF being supported by so many different databases allows us (at UniProt) to support many more of our users, even if they don't use SPARQL or our choice of graph database themselves.
There's nothing stopping you from flipping that relationship order though, or making that pattern more compact. In fact, I'd prefer an overall reversed order, something like:
MATCH (openCypher)-[:MAKES_AVAILABLE]->(cypher:QueryLanguage)-[:QUERIES]->(graphs),
(u:User)-[:USES]->(cypher)
WHERE u.name IN ['Oracle', 'Apache Spark', 'Tableau', 'Structr']
RETURN cypher.attributes
I guess, in this particular case, it's a subjective preference which language you feel expresses the query pattern most legibly. I certainly prefer the visual approach of Cypher.
I like Cypher, then Gremlin, for small queries. The problem is that the queries I see are much, much larger, and at about 10 lines in, SPARQL's use of whitespace starts to make a real difference in readability, in my opinion.
That may of course also be affected by my slight reading disability, where the shape of words matters; that shape can be disturbed by the connecting sigils in both Gremlin and Cypher. So I understand that my preference might not hold for the whole population :)
Does anyone know whether the semantics available in Cypher would be practical to use when querying a distributed graph database? Or is it useful to have a closer to the metal implementation considering the distributed system tradeoffs?