
Nothing in the C standard requires bytes to have 8 bits either.

There's a massive gap between what C allows, and what real C codebases can tolerate.

In practice, you don't have room to store lengths alongside pointers without disturbing sizeof and pointer<->integer casts. Fil-C and ASan need to smuggle that information out of band.


Note that Fil-C is a garbage-collected language that is significantly slower than C.

It's not a target for writing new code (you'd be better off with C# or Go); it's more like sandboxing with WASM, except that Fil-C crashes more precisely.


From the topic starter: "I've posted a graph showing nearly 9000 microbenchmarks of Fil-C vs. clang on cryptographic software (each run pinned to 1 core on the same Zen 4). Typically code compiled with Fil-C takes between 1x and 4x as many cycles as the same code compiled with clang"

Thus, Fil-C-compiled code takes 1 to 4 times as many cycles as plain C. This is not in the "significantly slower" ballpark where most interpreters live. The ROOT C/C++ interpreter is 20+ times slower than binary code, for example.


Cryptographic software is probably close to a best case scenario since there is very little memory management involved and runtime is dominated by computation in tight loops. As long as Fil-C is able to avoid doing anything expensive in the inner loops you get good performance.


  > best case scenario since there is very little memory management involved and runtime is dominated by computation in tight loops.
This describes most C programs and many, if not most, C++ programs. Basically, this is how C/C++ code is written: by avoiding memory management, especially in tight loops.


This depends heavily on what problem domain you're talking about. For example, a DBMS is necessarily going to shuffle a lot of data into and out of memory.


It depends. Consider DuckDB or another heavily vectorized columnar DB: there's a big part of the system (SQL parser, storage chunk manager, etc.) that's not especially performance-sensitive and a set of tiny, fast kernels that do things like predicate-push-down-based full table scans, ART lookups, and hash table creation for merge joins. DuckDB is a huge pile of C++. I don't see a RIIR taking off before AGI.

But you know what might work?

Take current DuckDB, compile it with Fil-C, and use a new escape hatch to call out to the tiny unsafe kernels that do vectorized high-speed columnar data operations on fixed buffers that the safe code sets up on behalf of the unsafe kernels. That's how it'd probably work if DuckDB were implemented in Rust today, and it's how it could be made to work with Fil-C without a major rewrite.

Granted, this model would require Fil-C's author to become somewhat less dogmatic about having no escape hatches whatsoever, but I suspect he'll un-harden his heart as his work gains adoption and legitimate use cases for an FFI/escape hatch appear.


> DuckDB is a huge pile of C++. I don't see a RIIR taking off before AGI.

While I'm not a big fan of rewriting things, all of DuckDB has been written in the last 10 years. Surely a rewrite with the benefit of hindsight could reach equivalent functionality in less than 10 years?


the sqlite RIIR is going quite well: https://turso.tech/blog/beyond-the-single-writer-limitation-...

(sqlite is quite a bit smaller than DuckDB tho)


I'm trying to make Turso load some data, but it is so slow that even several months are not enough to load the dataset: https://github.com/ClickHouse/ClickBench/issues/336


Is it? It's much less new.


for one, duckdb includes all of sqlite (and many other dependencies). it knows how to do things like efficiently query over parquet files in s3. it's expansive - a swiss army knife for working with data wherever it's at.

sqlite is a "self contained system" depending on no external software except the C standard library for the target OS:

> A minimal build of SQLite requires just these routines from the standard C library:

> memcmp(), memcpy(), memmove(), memset(), strcmp(), strlen(), strncmp()

> Most builds also use the system memory allocation routines:

> malloc(), realloc(), free()

> Default builds of SQLite contain appropriate VFS objects for talking to the underlying operating system, and those VFS objects will contain operating system calls such as open(), read(), write(), fsync(), and so forth

Quoting from the appropriately named https://sqlite.org/selfcontained.html

as a very rough and unfair estimate between the two projects' sources, sqlite is about 8% the size of duckdb:

    $ pwd
    /Users/jitl/src/duckdb/src
    $ sloc .
    
    ---------- Result ------------
    
                Physical :  418092
                  Source :  317274
                 Comment :  50113
     Single-line comment :  46187
           Block comment :  3926
                   Mixed :  4415
     Empty block comment :  588
                   Empty :  55708
                   To Do :  136
    
    Number of files read :  2611
    
    ----------------------------
    $ cd ~/Downloads/sqlite-amalgamation-3500400/
    $ sloc .
    
    ---------- Result ------------
    
                Physical :  34742
                  Source :  25801
                 Comment :  8110
     Single-line comment :  1
           Block comment :  8109
                   Mixed :  1257
     Empty block comment :  1
                   Empty :  2089
                   To Do :  5
    
    Number of files read :  2
    
    ----------------------------


Oh, wow! I really had no idea!


I am a professional database developer. We do not do what you think we are doing. ;)


I was thinking less about the DB data itself and more about temporary allocations that have to be made per-request. The same is true for most server software. Even if arenas are used to reduce the number of allocations you're still doing a lot more memory management than a typical cryptographic benchmark.


Most databases do almost no memory management at runtime, at least not in any conventional sense. They mostly just DMA disk into and out of a fixed set of buffers. Objects don't have a conventional lifetime.


Along with the sibling comment's point, microbenchmarks should not be used as authoritative data when the use case is full applications. For that matter, highly optimized Java or Go may be "1 to 4 times as slow as plain C". Fil-C has its merits, but they should be described carefully, just as with any technology.


I replied to an unwarranted (to my eye) claim that Fil-C is significantly slower than plain C.

Fil-C has its drawbacks, but they should be described carefully, just as with any technology.


I maintain that microbenchmarks are not convincing, but you have a fair point that GP's statement is unfounded, and now I've made a reply to GP to that effect.


Or JavaScript for that matter


What does "significantly" mean to you? To my ear, "significantly" means "statistically significant".


WASM is a sandbox. It doesn't obviate memory safety measures elsewhere. A program with a buffer overflow running in WASM can still be exploited to do anything that program can do within the WASM sandbox, e.g. disclose information it shouldn't. WASM ensures such a program can't escape its container, but memory safety bugs within a container can still be plenty harmful.


You can buffer overflow in Fil-C and it won't detect it unless the entire buffer was its own stack or heap allocation with nothing following it (and the allocation also needs to be a multiple of 16 bytes, because that's padding Fil-C allows you to overflow into). So it arguably isn't much different from WASM.

Quick example:

    #include <stdio.h>

    typedef struct Foo {
        int buf[2];
        float some_float;
    } Foo;

    int main(void) {
        Foo foo = {0};
        for (size_t i = 0; i < 3; ++i) {
            /* i == 2 writes past buf, into some_float */
            foo.buf[i] = 0x3f000000;
            printf("foo.buf[%zu]: %d\n", i, foo.buf[i]);
        }
        printf("foo.some_float: %f\n", foo.some_float);
        return 0;
    }

This overflows into the float, not causing any panics, printing 0.5 for the float.


At least WASM can be added incrementally. Fil-C is all or nothing and cannot be used without rebuilding everything. In that respect a sandbox ranks lower in comprehensiveness but higher in practicality, and that's the main issue with Fil-C. It's extremely impressive, but it's not a practical solution for C's memory safety issues.


What language do people considering C for a new project actually consider? Rust is the obvious one, which we aren't going to discuss because then we won't be able to talk about anything else. Zig is probably almost as well loved and defended, but it isn't actually memory safe, just much easier to be memory safe in. As you say, C# and Go; maybe also F# and OCaml, and if we are just writing simple C-style stuff, none of those would look all that different. Go has some UB related to concurrency that people run into, but most of these simple utilities are either single-threaded or fine-grained parallel, which is pretty easy to get right. Julia too, maybe?


In terms of GC quality, Nim comes to mind.


I keep ignoring Nim for some reason. How fast is it with all the checks on? The benchmarks for it, Julia, and Swift typically turn off safety checks, which is not how I would run them.


Since anything/0 = infinity, these kinds of things always depend upon what programs do and, as a sibling comment correctly observes, how much the checks interfere with SIMD autovectorization and several other things.

That said, as a rough guideline, nim c -d=release can certainly be almost the same speed as -d=danger and is often within a few (single-digit) percent. E.g.:

    .../bu(main)$ nim c -d=useMalloc --panics=on --cc=clang -d=release -o=/t/rel unfold.nim
    Hint: mm: orc; opt: speed; options: -d:release
    61608 lines; 0.976s; 140.723MiB peakmem; proj: .../bu/unfold.nim; out: /t/rel [SuccessX]
    .../bu(main)$ nim c -d=useMalloc --panics=on --cc=clang -d=danger -o=/t/dan unfold.nim
    Hint: mm: orc; opt: speed; options: -d:danger
    61608 lines; 2.705s; 141.629MiB peakmem; proj: .../bu/unfold.nim; out: /t/dan [SuccessX]
    .../bu(main)$ seq 1 100000 > /t/dat
    .../bu(main)$ /t
    /t$ re=(chrt 99 taskset -c 2 env -i HOME=$HOME PATH=$PATH)
    /t$ $re tim "./dan -n50 <dat>/n" "./rel -n50 <dat>/n"
    225.5 +- 1.2 μs (AlreadySubtracted)Overhead
    4177 +- 15 μs   ./dan -n50 <dat>/n
    4302 +- 17 μs   ./rel -n50 <dat>/n
    /t$ a (4302 +- 17)/(4177 +- 15)
    1.0299 +- 0.0055
    /t$ a 299./55
    5.43636... # kurtosis=>5.4 sigmas is not so significant
Of course, as per my first sentence, the best benchmarks are your own applications run against your own data and its idiosyncratic distributions.

EDIT: btw, /t -> /tmp which is a /dev/shm bind mount while /n -> /dev/null.


In Julia, at least, bounds checks tend to be a pretty minor hit (~20%) unless the bounds check gets in the way of vectorization


A GC lang isn't necessarily significantly slower than C. You should qualify your statements. Moreover, this is a variant of C, which means the programs are likely less liberal with heap allocations. It remains to be seen how much of a slowdown Fil-C imposes under normal operating conditions. And although it is indeed primarily suited for existing programs, its use in new programs isn't necessarily worse than, e.g., C# or Go. If performance is the deciding factor, probably use Rust, Zig, Nim, D, etc.


Test with Fil-C, compile with gcc into production. Easy.


You seem to think of "Rust enthusiasts" as some organized group with a goal of writing Rust for the sake of it. Rust is long past that extremely-early-adopter phase.

What you're seeing now is developers who are interested in writing a better version of whatever they're already working on, and they're choosing Rust to do it. It's not a group of "Rust enthusiast" ninjas infiltrating projects. It's more and more developers everywhere adopting Rust as a tool to get their job done, not to play language wars.


Nah, I called out redox and another commenter pointed out ripgrep as an even better example of what I’d prefer to see, and those are also by what I would call rust enthusiasts. I don’t think of them as a monolithic group.

Where we disagree is I would not call injecting rust into an established project “writing a better version”. I would love it if they did write a better version, so we could witness its advantages before switching to it.


They are referring to adopting the Sequoia PGP library, which is written in Rust. There are plenty of benefits to using Sequoia which you can investigate now, no need to theoretically wait for the integration to happen. Not coincidentally, the RPM package manager also adopted Sequoia PGP.


First off, the mail references far more rust adoption than just Sequoia, but since you bring it up: here is how RPM adopted Sequoia in Fedora-land. There was a proposal, a discussion with developers about the ramifications (including discussion about making sure the RPMs built on all architectures), and there were votes and approvals. Costs and benefits and alternatives were analyzed. Here's a page that has links to the various votes and discussion: https://fedoraproject.org/wiki/Changes/RpmSequoia

Can't you see how much more thought and care went into this, than is on display in this Debian email (the "if your architecture is not supported in 6 months then your port is dead" email)?


> (including discussion about making sure the RPMs built on all architectures)

All officially supported ones. The Debian discussion is not about officially supported Debian ports, it's about unofficial ones.


That's a non sequitur.

The problem here is that C is too basic and dated, with inadequate higher-level abstractions, which makes writing robust and secure software extra difficult and laborious. "Learning the underlying hardware" doesn't solve that at all.

Debian supports dozens of architectures, so it needs to abstract away architecture-specific details.

Rust gives you as much control as C for optimizing software, but at the same time neither Rust nor C really expose actual underlying hardware (on purpose). They target an abstract machine with Undefined Behaviors that don't behave like the hardware. Their optimisers will stab you in the back if you assume you can just do what the hardware does. And even if you could write directly for every logic gate in your hardware, that still wouldn't help with the fragility and tedium of writing secure parsers and correct package validation logic.


You've discarded the error type, which trivialised the example. Rust's error propagation keeps the error value (or converts it to the target type).

The difference is that Result is a value, which can be stored and operated on like any other value. Exceptions aren't, and need to be propagated separately. This is more apparent in generic code, which can work with Result without knowing it's a Result. For example, if you have a helper that calls a callback in parallel on every element of an array, the callback can return Result, and the parallel executor doesn't need to care (and returns you an array of results, which you can inspect however you want). OTOH with exceptions, the executor would need to catch the exception and store it somehow in the returned array.
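A minimal Rust sketch of that point (the helper name is hypothetical, and real parallelism is elided since only the signature matters for the argument):

```rust
// A generic helper that applies `f` to every element and collects
// whatever `f` returns. It is oblivious to whether U is a Result;
// a real executor would fan these calls out to a thread pool, but
// the signature and the caller's view would stay the same.
fn run_all<T, U>(items: Vec<T>, f: impl Fn(T) -> U) -> Vec<U> {
    items.into_iter().map(f).collect()
}

fn main() {
    // The callback returns Result; run_all doesn't need to care.
    let results = run_all(vec!["1", "x", "3"], |s: &str| s.parse::<i32>());

    // The caller gets every outcome back and can inspect them all,
    // not just the first failure.
    assert_eq!(results.iter().filter(|r| r.is_ok()).count(), 2);
    assert!(results[1].is_err());
}
```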


Try:

    Either<Foo, SomeException> x;
    try {
        x = Either.left(foo().bar().baz().car());
    } catch (SomeException e) {
        x = Either.right(e);
    }
I have rewritten the parent code to preserve the exception without semantic difference or loss of type safety.

If there are multiple types of exception that can be thrown, the right can be a union type.


Either is another name for a Result type.

Either works, but now you have two ways of returning errors, and they aren't even mutually exclusive (an Either-returning function can still throw).

Catch-and-wrap doesn't compose in generic code. When every call may throw, it isn't returning its return type T, but actually an Either<T, Exception>, yet you lack a type system capable of reasoning about that explicitly. You get an incomplete return type, because the missing information lives in function signatures and control flow, not in the types they return. It's not immediately obvious how this breaks type systems if you keep throwing, but throwing stops being an option when you want to separate returning errors from the immediate act of changing control flow, like when you collect multiple results without stopping on the first error. Then you need a type for capturing the full result of a call.

If you write a generic map() function that takes T and returns U, it composes well only if exceptions don't alter the types. If you map A->B, B->C, C->D, it trivially chains to A->D without exceptions. An identity function naturally gives you A->A mapping. This works with Results without special-casing them. It can handle int->Result, Result->int, Result->Result, it's all the same, universally. It works the same whether you map over a single element, or an array of elements.

But if every map callback could throw, then you don't get a clean T->U mapping, only T -> Either<U, Exception>. You don't have an identity function! You end up with Either<Either<Either<... when you chain them, unless you special-case collapsing of Eithers in your map function. The difference is that with Result, any transformation of Result<T, E1> to Result<U, E2> (or any other combo) is done inside the concrete functions, abstracted away from callers. But if a function call throws, the type change and transformation of the type is forced upon the caller. It can't be abstracted away from the caller. The map() needs to know about Either, and have a strategy for wrapping and unwrapping them.

catch lets you convert exceptions to values, and throw converts values to exceptions, so in the end you can make it work for any specific use case, but it's just this extra clunky conversion step you have to keep doing, and you juggle two competing designs that don't compose well. With Result, you have one way of returning errors that is more general and more composable, without a second, incomplete, less flexible mechanism that has to be converted to and from.
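The composition point can be sketched in Rust with two toy functions (names hypothetical): each step's whole outcome is an ordinary return value, so chaining stays flat instead of nesting.

```rust
// Each function's complete outcome is its return type; nothing
// travels out of band, so steps chain without wrapping/unwrapping.
fn parse(s: &str) -> Result<i32, String> {
    s.parse().map_err(|e| format!("bad int {s:?}: {e}"))
}

fn halve(n: i32) -> Result<i32, String> {
    if n % 2 == 0 { Ok(n / 2) } else { Err(format!("{n} is odd")) }
}

fn main() {
    // and_then flattens at every step, so no Result<Result<...>>
    // builds up, and an identity step (Ok) composes like any other.
    assert_eq!(parse("42").and_then(halve).and_then(Ok), Ok(21));
    assert_eq!(parse("42").and_then(|_| halve(7)), Err("7 is odd".to_string()));
}
```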


I think you're missing the key point about return types with checked exceptions.

`int thing()` in Java returns type `int`. `int thing() throws AnException` in Java returns type `int | AnException`, with language-mandated destructuring assignment with the `int` going into the normal return path and `AnException` going into a required `catch` block.

`int thing() throws AnException, OtherException` returns type `int | AnException | OtherException`.

The argument you're making, that the compiler doesn't know the return type and "you lack a type system capable of reasoning about that explicitly", is false. Just because the function says its return type is `int` doesn't mean the compiler is unaware there are three possible returns, and it also doesn't mean the programmer is unaware of that.

The argument you are making applies to UNchecked exceptions and does not apply to CHECKED exceptions.


It's not a single return type T that is a sum type. It's two control flow paths returning one type each, and that's a major difference, because the types and control flow are complected together in a way that poorly interacts with the type system.

It's not `fn(T) -> U` where U may be whatever it wants, including Ok|Exception in some cases. It's `fn(T) -> U throws E`, and the `U throws E` part is not a type on its own. It's part of the function signature, but lacks a directly corresponding type for U|E values. It's a separate not-a-type thing that doesn't exist as a value, but is an effect of control flow changes. It needs to be caught and converted to a real value with a nameable type before it can work like a value. Retuning Either<U, E> isn't the `U throws E` thing either. Java's special alternative way of returning either U or E is not a return type, but two control flow paths returning one type each.

The compiler is fully aware of what's happening, but it's not the same mechanism as Result. By focusing on "can this be done at all", you miss the whole point of Result achieving this in a more elegant way, with fewer special non-type things in the language. Being just a regular value with a real type, which simply works everywhere values work without altering control flow, is the main improvement of Result over checked exceptions. Removal of try/catch from the language is the advantage and simplification that Result brings.

Result proves that Java's special-case exception checking is duplicating work of type checking, which needlessly lives half outside of the realm of typed values. Java's checked exceptions could be removed from the language entirely, because it's just a duplicate redundant type checker, with less power and less generality than the type checker for values.
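To illustrate the "real type" point with a toy Rust sketch (names hypothetical): Result<U, E> can be named, aliased, and stored like any other value type, whereas `U throws E` has no corresponding type you can write down in Java.

```rust
// The full outcome of a fallible call is a nameable type...
type Outcome = Result<u32, String>;

// ...so it can sit in ordinary data structures, checked by the
// same type checker that handles every other value.
struct Job {
    last: Option<Outcome>,
}

fn main() {
    let mut job = Job { last: None };
    job.last = Some("7".parse::<u32>().map_err(|e| e.to_string()));
    assert_eq!(job.last, Some(Ok(7)));
}
```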


For Turing Complete languages everything is just a syntactical issue (Turing Tarpit).


And syntax is what most programmers will complain about. Even if it makes the wrong code easier to type.



That would be true if society were already perfectly fair and neutral (which some people believe it is).

However, there is racism and sexism in the world (it's systemic, in a sense it's not about one person not liking another personally, but biases propagated throughout the society). To counter that, you need to recognize it, and it will be necessary to treat some people differently.

For example, women may not feel safe being a small minority at a gathering full of men. If you do nothing, many potentially interested women will not show up. You could conclude that it's just the way things are and women are simply not interested enough in the topic, or you could acknowledge the gender-specific issue and do something about it. But this isn't a problem affecting everyone equally, so it would require treating women specially.


OTOH the Leaf is proof that old batteries can be replaced to upgrade an old car.

Original Leafs were sold with 24 kWh of capacity. Current ones have 48 kWh for the same price, and 64 kWh replacement batteries are available. So you can go from half of the crappiest range to nearly 3x the range the car had when brand new.

Old batteries with reduced capacity don't even have to be thrown out. There are projects that reuse old Leaf batteries for grid energy storage (any capacity is useful when they're sitting on the ground).

I'm betting that the current-gen mainstream cars will benefit similarly, especially since production volume is orders of magnitude higher now (lots of brands share the same platform and battery modules).


It's functionally tolerable when you disable transparency and increase contrast in accessibility settings.

Of course it makes everything look dull and primitive. Crammed and misaligned controls are even more obvious when elements have borders. You still have unhelpful animations.

