AnyDSL: A Framework for Rapid Development of Domain-Specific Libraries (anydsl.github.io)
87 points by mabynogy on March 5, 2017 | 25 comments


    A key feature of AnyDSL is that transformations of code from higher levels of abstraction to lower levels are not done in a compiler
Look, you can call it a "mumbleencabulator" if you want, it's still quacking an awful lot like a compiler...


Yeah, I tend to agree. I tried reading it even more closely, and here's how I'm interpreting it:

* In a normal compiler, you have the front end, which generates intermediate code in a pseudo-assembly language, and the back end, which compiles that intermediate code into assembly (or into JVM bytecode, CLR IL, or even JavaScript, these days...)

* In AnyDSL, the front end takes the code and compiles it into, well, the same language, only at successively lower levels of abstraction, closer to the hardware.

I don't know if you're familiar with "shader" languages (for OpenGL, DirectX, and similar), but they're written in a syntax that looks very much like C. It's not precisely C -- but every statement effectively generates machine code directly, and the code is executed in parallel. So if you're writing a pixel shader that makes everything more red, you'd write a statement like (pseudocode):

    output.r += 10;
...and when it drew the texture or polygon to the screen, it would execute your code once for each pixel.
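
To make this concrete, here's a rough CPU-side analogue in Rust (a sketch of mine, purely illustrative; a real shader runs on the GPU, one invocation per pixel, in parallel):

    // Hypothetical CPU analogue of the pixel shader above: the "shader"
    // is just a function that the runtime applies to every pixel.
    fn redden(pixel: &mut [u8; 4]) {
        pixel[0] = pixel[0].saturating_add(10); // bump the red channel
    }

    fn run_shader(framebuffer: &mut [[u8; 4]], shader: fn(&mut [u8; 4])) {
        for px in framebuffer.iter_mut() {
            shader(px); // conceptually parallel: one invocation per pixel
        }
    }
    // usage: run_shader(&mut framebuffer, redden);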

In AnyDSL, it looks like the top-level language is very much like a normal (functional) language. That language gets "compiled" into an intermediate abstraction layer (like the one using the @iterate call), and the final "compilation" step is like the shader language, in that it lets you define a target-specific implementation, which could include vector functions or pixel shaders or whatever.

If this were C, the second level would be internal to the compiler and completely opaque, and the final level would be the compiler back end plus libraries, which might have calls to implement vector math. If you imagine lifting those backend libraries up into the same compiler, written in the same language syntax, you can imagine that a lot more can be done to optimize the result, because more of the code gets built at the same time and a lot of intermediate state can be eliminated at compile time.
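
Here's a toy Rust sketch of that "lift the libraries into the compiler" idea (my analogy, not AnyDSL's actual mechanism): the vector "backend" is just a higher-order function, and inlining dissolves the layers:

    // The "backend library" is ordinary code in the same language, so the
    // compiler can inline through every layer and erase the abstraction.
    #[inline(always)]
    fn map2(a: [f32; 4], b: [f32; 4], f: impl Fn(f32, f32) -> f32) -> [f32; 4] {
        [f(a[0], b[0]), f(a[1], b[1]), f(a[2], b[2]), f(a[3], b[3])]
    }

    #[inline(always)]
    fn vec_add(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        map2(a, b, |x, y| x + y) // after inlining: four scalar adds, no calls
    }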

But yes, it still feels fundamentally like a compiler. Only a compiler where you can't just link in existing C-based libraries, because then it wouldn't be able to do its magic. So you therefore need to reinvent all the wheels?


In most compilers today, the front-end generates an AST, from which an(other) intermediate representation (IR, named Thorin in AnyDSL[1]) is generated. On that IR, the "middle-end" already optimizes a lot and the back-end finally generates whatever output formats the compiler targets (for AnyDSL: LLVM, CUDA/NVVM, OpenCL).

> * In AnyDSL, the front end takes the code and compiles it into, well, the same language, only at successively lower levels of abstraction, closer to the hardware.

The AnyDSL language (Impala) compiler works exactly like my compiler description above. However, the levels of abstraction you refer to here are all written in the same "top level" language (a "host" language, in DSL literature terms) and are basically just libraries stacked on top of each other. The @ on the iterate call partially evaluates the call (aggressive inlining and specialization) at compile time on the Thorin IR. This feature allows the compiler to remove the layers of abstraction again and bake the code together, as you say.
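
As a rough Rust analogy (not Impala syntax; Rust can only approximate this when the value is a compile-time constant, whereas the partial evaluator is more general):

    // Analogy for @-style specialization: N is known at compile time,
    // so the compiler can fully unroll the loop and inline the body.
    fn iterate<const N: usize>(mut body: impl FnMut(usize)) {
        for i in 0..N {
            body(i);
        }
    }

    fn main() {
        let mut sum = 0;
        iterate::<4>(|i| sum += i); // specialized for N = 4
        assert_eq!(sum, 6);
    }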

> But yes, it still feels fundamentally like a compiler. Only a compiler where you can't just link in existing C-based libraries, because then it wouldn't be able to do its magic. So you therefore need to reinvent all the wheels?

Yes, it is a compiler - but you don't need to write one for your DSL (unless you really need different syntax). You're correct with regard to external libraries: if you need to remove the abstraction overhead (e.g. function calls) of an external library for performance reasons, you'd need to re-implement it in AnyDSL. However, that holds true for any DSL framework and compiler, unless you compile everything to (C) binaries and do partial evaluation at link time. But at that point you've already lost a lot of information that a compiler can usefully exploit to generate better code: more abstract types, higher-order functions, scheduling info, alias info, etc.

[1] See the paper: http://compilers.cs.uni-saarland.de/papers/lkh15_cgo.pdf


> However, that holds true for any DSL framework and compiler, unless you compile everything to (C) binaries and do partial evaluation at link time. But at that point you've already lost a lot of information that a compiler can usefully exploit to generate better code: more abstract types, higher-order functions, scheduling info, alias info, etc.

For quite some time, C and C++ toolchains have been able to carry the compiler's intermediate representation into the link step and re-optimize the program across object files (link-time optimization). In the Microsoft toolchain this is called "Whole Program Optimization". So I don't see AnyDSL's ability to do the same as a completely unique proposition.

The other issue I see is that 98% of code doesn't need to be optimized down to the level that AnyDSL enables. JIT compilers, for example, are really good at optimizing the hot execution path without involving programmers at the intermediate and low levels.

In the few cases where you need to target LLVM, CUDA, and OpenCL for a particularly performance-critical routine, I could see this being useful, as long as the resulting code can be called from a more standard language. I'd like to see really trivial C bindings (via exported .h files) at a minimum; C++, Go, Rust, JavaScript, Python, and other modern languages can hit the performance required of 98% of the code, leaving only the extreme-performance pieces for something like AnyDSL.

But I'd put that front and center: "AnyDSL is about creating ultra-optimized, cross-hardware, performance-critical code, targeting LLVM, CUDA, and OpenCL from the same high-level code." All of the talk about theory just distracts people from the actual "killer app" of AnyDSL.


Partial evaluation (whether online or offline) is not implemented in any current C/C++/Rust compiler, for very good reasons. "Whole Program Optimization" may perform a significant number of the optimizations that partial evaluation does, and some that partial evaluation cannot (depending on your abstractions, e.g. code motion).

I totally agree that these kinds of optimizations are unnecessary for most applications, which is why we target and compare against high-performance computing algorithms and DSLs. Have a look at the publications and the comparisons with OpenCV and similar.

If you want to wrap AnyDSL HPC code in some other language, sure. There is an experimental compiler flag, --emit-c-interface, right now. :)

However, most current HPC DSL frameworks generate C code from Python etc. by stitching together library calls to BLAS and the like in a haphazard, mostly untyped way - not ideal.

> But I'd put that front and center: "AnyDSL is about creating ultra-optimized, cross-hardware, performance-critical code, targeting LLVM, CUDA, and OpenCL from the same high-level code." All of the talk about theory just distracts people from the actual "killer app" of AnyDSL.

Thanks, I agree. The presentation could definitely use some work. As it stands, it's mostly a presentation of the published research and not yet addressed to users.


> * In AnyDSL, the front end takes the code and compiles it into, well, the same language, only at successively lower levels of abstraction, closer to the hardware.

I thought this was called a macro.


The exact point is that higher-order functions are used instead of macros.
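
For contrast, in Rust-ish terms (illustrative only; Impala's syntax is Rust-like but not identical), here is the same iteration abstraction as a macro and as a higher-order function:

    // The macro version expands at the call site by substitution.
    macro_rules! times {
        ($n:expr, $body:expr) => {
            for i in 0..$n { $body(i) }
        };
    }

    // The function version is a first-class value the compiler understands.
    fn times_fn(n: usize, body: impl Fn(usize)) {
        for i in 0..n { body(i) }
    }

    fn main() {
        times!(3, |i: usize| println!("macro {i}"));
        times_fn(3, |i| println!("function {i}"));
    }

The bet is that, with partial evaluation, the function version ends up costing no more than the macro, while staying typed and first-class.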


Why can't you use higher-order functions in macros?


Basically, the idea is that a compiler that does partial evaluation and can generate extremely efficient code for higher-order functions provides a nice framework for DSLs.

I wouldn't call it reinventing the wheel.


It's reinventing the wheel if you need to rewrite functionality that's otherwise available in C libraries.

Has anyone you can trust written encryption code in AnyDSL? If not, then you either need to write it yourself (reinventing the wheel, and generally considered A Bad Idea for encryption code), or you need to call a C library, obviating much of the optimization advantage of using AnyDSL.


I'm wondering how this compares to Xtext[1], which is another framework for developing domain specific languages. Xtext feels more approachable to me, but maybe that's because there appears to be a lot more documentation and tooling.

For example, the docs on Xtext's grammar language[2] seem very intuitive to me, even though I'm not experienced in compilers or language design. I don't have quite the same intuition when looking over the AnyDSL docs[3]. Maybe the Xtext docs are just more goal-oriented, i.e. "Five simple steps to your first language".

[1] https://eclipse.org/Xtext/index.html [2] https://eclipse.org/Xtext/documentation/301_grammarlanguage.... [3] https://github.com/AnyDSL/anydsl/wiki/Tutorial


Xtext and AnyDSL have very different goals. Xtext is mostly about syntax and IDE support, while AnyDSL is about compilation. With Xtext you get support for defining the grammar of your language, but you write your own compiler for your DSL - the DSL program is "deeply embedded" in the host language Java, that is, represented as a Java data structure. In AnyDSL, you don't get any support for custom syntax - all your DSLs are basically just "libraries"/types/functions in the host language Impala - a "shallow" embedding. Java examples of shallow embedding are most "fluent interface" libraries, e.g. jOOQ[1].
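
To illustrate the distinction in Rust-ish terms (a sketch of mine, not AnyDSL code):

    // Deep embedding: the DSL program is a data structure that you then
    // interpret or compile yourself.
    enum Expr {
        Lit(i64),
        Add(Box<Expr>, Box<Expr>),
    }

    fn eval(e: &Expr) -> i64 {
        match e {
            Expr::Lit(n) => *n,
            Expr::Add(a, b) => eval(a) + eval(b),
        }
    }

    // Shallow embedding: DSL "programs" are just host-language functions.
    fn lit(n: i64) -> i64 { n }
    fn add(a: i64, b: i64) -> i64 { a + b }

    fn main() {
        let deep = Expr::Add(Box::new(Expr::Lit(1)), Box::new(Expr::Lit(2)));
        assert_eq!(eval(&deep), 3);
        assert_eq!(add(lit(1), lit(2)), 3);
    }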

This has the benefit that you don't need to know compiler tech to implement your DSL. However, a domain-specific compiler can optimize using domain-specific knowledge and potentially generate faster code. For this reason, AnyDSL/Impala provides online partial evaluation via the '@' operator, which aggressively specializes functions and evaluates them at compile time. With the right DSL abstractions, this can result in generated code that is as fast as hand-tuned code.

For a more complete view of the relation between partial evaluation and DSL embedding, have a look at the GPCE'15 paper[2].

We totally agree that the website and documentation (there is some in the GitHub wikis) are lacking at the moment, and we're working on them. However, AnyDSL is still a young research project.

[1] http://www.jooq.org/doc/3.9/manual/sql-building/sql-statemen... [2] http://compilers.cs.uni-saarland.de/papers/gpce15.pdf


JetBrains' MPS is also in the same field.


I don't know what people are used to these days, but looking at the diagram in the post: if this is what the author calls a DSL, then they haven't seen a DSL before. What I see in the image is bog-standard use of procedures to abstract code away. Does "DSL" these days just mean "use functions"?

An actual DSL looks like this:

https://www.irif.fr/~jch/software/cl-yacc/cl-yacc.html

(scroll down to "define-parser")

It enables domain-specific abstractions in code. It's not about just naming your functions right.


They're going to ban "lisp as the real example" comments here before too long. (-;


It would be a sad day indeed. There's nothing better than an occasional example from decades ago for one to re-evaluate current hype levels in programming ;).


That "Machine Expert" snipped still looks way too much like a programming language. I fail to see the advantage over rolling a DSL with ANTLR which is reasonably intuitive (I'm assuming this is meant for external DSLs) or better yet using a language workbench like Xtext (which also gives you an Eclipse-IDE for your DSL "for free").

I guess the more the merrier but the linked website doesn't really showcase the tool very well (imo). For quick prototyping I'll probably stick with Prolog DCGs :)


In AnyDSL, Impala is a host language for shallowly embedded DSLs. No parser generator or grammar is needed - that is not the point.

From the last paragraph of the overview section: "The DSL developer just reuses Impala's infrastructure (lexer, parser, semantic analysis, and code generator). He does not need to develop his own front-end."

We should probably emphasise that and restructure the introduction text.


I think the point is that the machine expert part (which essentially implements an iteration construct) can be provided by a target architecture specific library.


How does this compare to lisp and lisp-like languages that use macros to create DSLs? Particularly Racket and its #lang mechanism?


There's nothing worse than DSLs.


amen


After quickly scanning the text, I have no idea in what way a "framework for DSL Libraries" is different from, well, a programming language.

It appears that the levels of abstraction are handled a bit more cleanly, but it's not fundamentally different either.

(no expert, though. Maybe someone can translate it)


Well, who says a programming language cannot provide a framework for DSL libraries?

I like that iteration constructs, for example, can be implemented simply as higher-order functions.

This way, the compiler already knows how to deal with them and doesn't need additional definitions for parsing or code generation.
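
For example (a hypothetical Rust-flavoured sketch; Impala's syntax is similar in spirit), a domain-specific 2D iteration construct is just a plain function, so no new parsing or code-generation rules are needed:

    // A domain-specific iteration construct defined as an ordinary
    // higher-order function; the compiler needs no special support.
    fn for_grid(w: usize, h: usize, body: impl Fn(usize, usize)) {
        for y in 0..h {
            for x in 0..w {
                body(x, y);
            }
        }
    }
    // usage: for_grid(width, height, |x, y| process_cell(x, y));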


Or, you can use Ruby ;]



