It is silly to make an overly broad statement about recursion being dangerous. On modern "hosted" OSes, there are safeguards against stack overflow that will quickly kill your process. This _can_ be a problem on embedded systems with limited memory-management facilities (e.g., hardware with no MMU), and I do understand that a library author can't control where the library is used; in fact, some safety-critical systems require a guaranteed maximum stack depth, which rules out recursion. However, some problems, especially parsing CFGs, are inherently recursive, and I'd argue that going the non-recursive route with explicit stacks would result in bugs elsewhere, because the code becomes hard to reason about.
>> denial of service was considered to be the realistic impact
> It is silly to make an overly broad statement about recursion killing. On modern "hosted" OSes, there are safeguards about stack overflows, which will quickly kill your process.
>> denial of service was considered to be the realistic impact
is in the article as justification for why this has low criticality and therefore isn't subject to the 90 day disclosure timeline. I.e. it's _limiting_ the predicted impact.
I assumed GP was referring to the other, more critical risk: stack clashing (which I guess could lead to RCE?) not being an issue on modern OSes.
The article basically said: "Letting your application get killed this way could practically lead to DoS attacks (a security issue), therefore [conclusion]." The response was basically: "Actually, on modern OSes, your application gets killed, unlike on embedded systems. Therefore, [opposite conclusion]."
This doesn't make sense as a comment, regardless of the particular conclusion.
You can fairly easily recurse with a context argument in which you maintain a depth count.
int recursive_fun(rec_context *ctx, ...)
{
    int res = 0;

    if (++ctx->depth > REC_LIMIT) {
        res = ERROR_RECURSION_LIMIT;
        goto out;  /* still decrement, so the count stays balanced */
    }

    ...

    res = recursive_fun(ctx, ...);

out:
    ctx->depth--;
    return res;
}
However, a problem with recursion might be something other than the maximum depth reached.
If recursion traverses an exponentially sized abstract space, it can chew up a lot of processing time before going anywhere near the protective limit.
The other problem, especially for libraries, is that they know neither the overall stack size nor the size of a single stack frame (the former is specified when linking the executable; the latter is an implementation detail and subject to the optimizer's trickery).
Many "fixes" for recursion effectively re-implement it, but do so with their own data structures that are equivalent to the stack (as originally used) but have precise definitions and therefore can be controlled wrt resources used.
If all the recursive code is under your control (no external libs at the bottom, or, worse, trips through external libs and back into recursion), you can make some generous estimate of the upper bound of a stack frame.
If the OS provides ample stack space, you can inquire about the limit, and subtract a safety margin from that to create your own soft limit that you can test against in the recursion.
Another way is to be absolutely restrictive, rather than track the limit. Create some test cases representing the worst recursive cases you're willing to handle (and document as such). Empirically measure their stack use. Multiply that by 1.5 and use that as the stack check limit.
On startup you can check that this soft limit is not too close to the real one and diagnose the situation.
If your soft limit is many megabytes away from the real one, you're not likely to hit the real one, unless something really stupid happens, like alloca, or a call into some code you don't control that goes rampant.
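On POSIX systems, that inquiry can be done with getrlimit(RLIMIT_STACK). A minimal sketch of computing such a soft limit (the margin size, function name, and error convention here are my own, not from any particular library):

```c
#include <sys/resource.h>

/* Hypothetical safety margin: leave this much headroom below the hard limit. */
#define STACK_SAFETY_MARGIN (1L << 20)   /* 1 MiB */

/* Returns a soft stack budget in bytes, or -1 if the limit cannot be
 * determined, is unlimited, or leaves no room after the margin.  A caller
 * would check this at startup and diagnose the -1 case. */
static long soft_stack_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return -1;
    if (rl.rlim_cur <= (rlim_t)STACK_SAFETY_MARGIN)
        return -1;   /* limit too tight for the chosen margin */
    return (long)(rl.rlim_cur - STACK_SAFETY_MARGIN);
}
```

The recursion would then compare its estimated stack use (frames times the generous per-frame estimate) against this budget.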
The submitted article, by the way, could use more nuance like this. Careless recursion in C, without reasoning about the consequences (what happens for certain inputs), is obviously bad; recursion itself is not.
Keeping the focus specifically on this bug: Do you think it was "careless recursion" in libexpat? That library was started in 1997, and the recursion bug wasn't found until 2022.
> Another way is to be absolutely restrictive, rather than track the limit. Create some test cases representing the worst recursive cases you're willing to handle (and document as such). Empirically measure their stack use. Multiply that by 1.5 and use that as the stack check limit.
We look forward to your patches for libexpat to add new unit tests.
Unixes give us that. You have to fork the computation to have it contained in its own process, whose run-time, virtual memory limit, and stack depth you can control.
Doing it all in your VM/runtime, so you can bail the computation with an exception, is more challenging.
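On POSIX, that containment might be sketched as below (run_contained and the particular limits are hypothetical; a real version would also check the setrlimit calls and distinguish signal deaths):

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Example payload (hypothetical stand-in for the risky computation). */
static int payload(void)
{
    return 42;
}

/* Run fn() in a child process with a capped stack and CPU time.
 * Returns the child's exit status, or -1 on fork/wait failure or if the
 * child was killed by a signal (e.g. SIGSEGV on stack overflow). */
static int run_contained(int (*fn)(void), long stack_bytes, long cpu_seconds)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit rl;

        rl.rlim_cur = rl.rlim_max = (rlim_t)stack_bytes;
        setrlimit(RLIMIT_STACK, &rl);   /* errors ignored in this sketch */
        rl.rlim_cur = rl.rlim_max = (rlim_t)cpu_seconds;
        setrlimit(RLIMIT_CPU, &rl);
        _exit(fn());
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Virtual memory can be capped the same way via RLIMIT_AS before calling fn().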
> Sending a deeply nested JSON object as part of a request to some API should not crash the server that handles the request.
But the API should have limits. For example, respond 413 Payload Too Large if the request is beyond any normal size for that API. Document the limits to API users. You now have an upper bound of how much nesting can be in the user input.
There's still not much reason to recurse using the program's own call stack rather than having your own stack structure that can live in dynamic memory and be handled in a context-appropriate way (e.g. returning an error at some depth limit rather than being killed by the OS).
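As a sketch of that explicit-stack approach, here is a depth-limited iterative traversal of a binary tree (the node type, limit, and error code are made up for illustration; a JSON parser's stack entries would be richer, but the shape is the same):

```c
#include <stdlib.h>

typedef struct node { struct node *left, *right; } node;
typedef struct { node *n; int depth; } frame;

#define MAX_DEPTH    1000   /* hypothetical nesting limit */
#define ERR_TOO_DEEP (-1)

/* Count nodes with an explicit heap-allocated stack.  Returns the node
 * count, or ERR_TOO_DEEP if nesting exceeds MAX_DEPTH (also used here,
 * for brevity, on allocation failure). */
static long count_nodes(node *root)
{
    size_t cap = 64, top = 0;
    frame *stk = malloc(cap * sizeof *stk);
    long count = 0;

    if (!stk)
        return ERR_TOO_DEEP;
    if (root)
        stk[top++] = (frame){ root, 1 };
    while (top > 0) {
        frame f = stk[--top];

        if (f.depth > MAX_DEPTH) {
            free(stk);
            return ERR_TOO_DEEP;   /* clean error, not an OS kill */
        }
        count++;
        if (top + 2 > cap) {       /* grow the stack in dynamic memory */
            frame *tmp = realloc(stk, 2 * cap * sizeof *stk);

            if (!tmp) {
                free(stk);
                return ERR_TOO_DEEP;
            }
            stk = tmp;
            cap *= 2;
        }
        if (f.n->left)
            stk[top++] = (frame){ f.n->left, f.depth + 1 };
        if (f.n->right)
            stk[top++] = (frame){ f.n->right, f.depth + 1 };
    }
    free(stk);
    return count;
}
```

The point is that both failure modes (too deep, out of memory) surface as ordinary return values the caller can handle.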
Recursive parsing of CFGs is only better when they're LL grammars; LR grammars (the most common class used in programming languages) are definitely better with an explicit stack, due to the repeated way the state machine needs to run. You might be able to do a nicer LALR parser recursively, but I personally haven't seen one.
Deeply nested instances of right-recursive rules will blow the stack. If the LALR parser has an unlimited stack due to dynamic allocation, that enables a DoS.
Table-driven LALR(1) with an explicit stack does not make the recursion issue go away. The tooling may provide built-in handling for it which translates excessive depth into a syntax error.
Recursive parsing using the native stack can take steps to protect itself, like by keeping track of the depth, and bailing upon hitting a limit.
Techniques involving obtaining the stack pointer (or close estimate thereof), and comparing it to a limit, are also possible.
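A rough sketch of that technique: record an approximate stack base at entry, then compare the address of a local in each recursive call against a byte budget. (Strictly speaking, comparing addresses of unrelated objects isn't sanctioned by the C standard, and the compiler may rearrange locals, so this is an estimate, not a guarantee; the budget value is hypothetical.)

```c
#include <stdint.h>

#define STACK_BUDGET (4L << 20)   /* hypothetical: allow ~4 MiB of stack */

static char *stack_base;          /* set from a local near the top of main */

static int deep(int n)
{
    char probe;   /* its address approximates the current stack pointer */
    long used = (long)((uintptr_t)stack_base - (uintptr_t)&probe);

    if (used < 0)
        used = -used;             /* tolerate either growth direction */
    if (used > STACK_BUDGET)
        return -1;                /* bail cleanly instead of overflowing */
    if (n == 0)
        return 0;
    return deep(n - 1);
}
```

Set stack_base from the address of a local near the top of main (or the thread's entry function) before the first recursive call.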