Compile time regular expression in C++ (github.com/hanickadot)
72 points by Tideflat on Sept 14, 2023 | 29 comments


I'd love for someone to add this to rebar[1] so that we can get a good sense of how well it does against other general purpose regex engines. It will be a little tricky to add (since the build step will require emitting a C++ program and compiling it), but it should be possible.

[1]: https://github.com/BurntSushi/rebar


The C++ standard library's "general regex engine" is crap (speed-wise), so there's no competition on that front at least...


I know. I'm not really looking to compare it with C++'s standard library regex engine. (Although if someone wanted to do that work, I would welcome it into the benchmark.)

There are other engines written in C++ (in part or in whole) in rebar. And it is useful to compare it with engines not written in C++ too.


It's a good library, but be careful, as it can significantly increase compile times. I added a couple of reasonably long regular expressions to a C++ source file and it increased the compile time from near instant (below 1 second) to 30 seconds. It might be wise to move these to dedicated source files so you don't pay this penalty each time you make changes.


Boost Xpressive has had static/"compile time" regex in C++ since the mid-2000s.

https://www.boost.org/doc/libs/1_83_0/doc/html/xpressive.htm...

A performance comparison would be interesting.


I had great results with this back in the day. The template metaprogramming DSL you had to use was pretty horrifying, but it was a triumph of what you could do with template metaprogramming (which was something discovered, not designed, in C++).

The first compile-time regular expression library I saw that just used normal regular expression syntax was D's CTRE, which produced an even faster "engine" than Boost Xpressive. This was thanks to D's compile-time function evaluation reaching the point where something like this was possible.

I just started using this CTRE for a regular expression whose performance had become problematic, and I'm very impressed with it so far. It's pretty easy to beat std::regex, but doing so without sacrificing usability was surprising. Build times haven't been affected too much either (for my use case).


D has ctRegex since a decade ago https://dlang.org/phobos/std_regex.html#ctRegex


At the risk of sounding like an idiot, what is the benefit of being able to match at compile time?


One of the stated benefits is that the compiler's optimizer can optimize the regex state machine that gets created, resulting in significantly faster match times at runtime. There are some benchmarks in her CppCon talk about the library, iirc.

Edit: here is her talk, at 39 minutes she shares the benchmarks. https://m.youtube.com/watch?v=QM3W36COnE4


It's not necessarily matching at compile time; it's compiling the regex at compile time to eliminate all superfluous runtime costs.


C++ finally catches up to Perl :-)


Since people are posting implementations in other languages... someone did it for Zig too (probably less polished than this C++ lib) [0]. It's nice that the regexes can be used at compile time too [1].

--

0: https://github.com/alexnask/ctregex.zig

1: I think the difference between the C++ template language and Zig's comptime is that Zig's comptime is almost the same as regular Zig, whereas programming with C++ templates almost feels like learning a separate, equally complex language.


This has been available in Lisp since at least 2004.

https://github.com/edicl/cl-ppcre/


C++ has had run-time regular expressions in its standard library since C++11, but this is about compile-time regular expressions.


I've never used cl-ppcre myself, but its docs[1] claim that it provides compile-time regexes:

> CL-PPCRE uses compiler macros to pre-compile scanners at load time if possible. This happens if the compiler can determine that the regular expression (no matter if it's a string or an S-expression) is constant at compile time and is intended to save the time for creating scanners at execution time (probably creating the same scanner over and over in a loop).

[1]: https://edicl.github.io/cl-ppcre/


I think this is more like caching the regex than creating it at compile time? Load time is basically runtime, I think. Lisp code can be loaded and then rehydrated later, I believe, but I'm not sure how common that is.


Hard to say exactly since I don't have experience with cl-ppcre, but this line seems to suggest something is actually happening at compile time:

> This happens if the compiler can determine that the regular expression (no matter if it's a string or an S-expression) is constant at compile time

If this were just a run of the mill caching mechanism, then whether the pattern was a constant at compile time wouldn't matter.


Also in .NET since version 7. It generates quite nice commented code too.

https://learn.microsoft.com/en-us/dotnet/standard/base-types...


Great, thanks, I'll use that.


This has been a part of the dlang standard library for some time: https://dlang.org/phobos/std_regex.html


Hopefully the standard regex will be constexpr soon.


Not to mention, like... trig functions. I know std::cos et al. need to set errno. Maybe cppfront will get rid of it.


What does cppfront have to do with it? You can write/download/import/enable constexpr implementations of cos with standard C++ compilers?

That’s like saying you can’t drive during winter with your car because it came with summer tires... just change your tires?


Compile with -fno-math-errno. Almost nobody checks errno for math functions, and disabling it can give a huge speedup.


That makes the cmath functions constexpr??


No.


Isn’t the issue with functions like this that they are not guaranteed to be the same on all hardware? So making them constexpr could break programs or cause the constexpr-evaluated result to differ from the runtime result, even for the same inputs?


constexpr cmath is coming in C++23


Whoa, slow down there. Let's get proper constexpr string support first.
