No, the fact that this can be done in a library and looks like a native language...

simias · on May 8, 2021

Operator overloading is a mixed blessing though: it can be very convenient, but it's also very good at obfuscating what's going on.

For instance, I'm not familiar with this Boost library, so I'd have a lot of trouble piecing together what your snippet does, especially since there's no explicit function call besides the printf.

Personally, if we're going the OOP route I'd much prefer something like Rust's `var.to_be()`, `var.to_le()` etc... At least it's very explicit.

My hot take is that operator overloading should only ever be used for mathematical operators (multiplying vectors etc...), everything else is almost invariably a bad idea.

pwdisswordfish8 · on May 8, 2021

Ironically, it was proposed not so long ago to deprecate to_be/to_le in favour of to_be_bytes/to_le_bytes, since the former conflate abstract values with bit representations.
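To make that distinction concrete, here is a minimal Rust sketch using only standard-library integer methods (`to_be`, `from_be`, `to_be_bytes`). The two helper names are made up for illustration.

```rust
/// Returns a `u32` whose in-memory layout is big-endian. On a little-endian
/// host its numeric value differs from `x`, yet the type is still `u32`,
/// so nothing stops you from doing arithmetic on it by mistake.
fn as_be_integer(x: u32) -> u32 {
    x.to_be()
}

/// Returns the big-endian byte representation. The result is a `[u8; 4]`,
/// which cannot be confused with a number.
fn as_be_bytes(x: u32) -> [u8; 4] {
    x.to_be_bytes()
}
```

The first helper hands back an "abstract value" that is really a bit pattern; the second keeps the bit pattern in a type that says so.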

nly · on May 8, 2021

That's fine if whatever type 'var' happens to be is NOT usable as an arithmetic type, otherwise you can easily just forget to call .to_le() or .to_native(), or whatever, and end up with a bug. I don't know Rust, so don't know if this is the case?

Boost.Endian actually lets you pick between arithmetic and buffer types.

'big_uint32_buf_t' is a buffer type that requires you to call .value() or do a conversion to an integral type. It does not support arithmetic operations.

'big_uint32_t' is an arithmetic type, and supports all the arithmetic operators.

There are also aligned variants of both, suffixed '_at', for when you know you have aligned access.
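For comparison, a buffer-style type along the same lines can be sketched in Rust. This is not Boost.Endian's implementation, only an illustration of the idea; `BigU32Buf` and its methods are hypothetical names.

```rust
/// Hypothetical buffer type: four bytes that always hold a big-endian u32.
/// No arithmetic traits are implemented, so `a + b` on two buffers does not
/// compile; you have to call `value()` first.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
struct BigU32Buf([u8; 4]);

impl BigU32Buf {
    /// Store a native integer as big-endian bytes.
    fn new(v: u32) -> Self {
        BigU32Buf(v.to_be_bytes())
    }

    /// Convert back to a native integer.
    fn value(self) -> u32 {
        u32::from_be_bytes(self.0)
    }
}
```

An arithmetic variant would additionally implement `std::ops::Add` and friends, which is exactly the convenience versus safety trade-off discussed above.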

raphlinus · on May 8, 2021

The idiomatic way to do this in Rust is to use functions like .to_le_bytes(), so you have the u32 (or whatever) on one end and raw bytes (something like [u8; 4]) on the other. It can get slightly tedious if you're doing it by hand, but it's impossible to accidentally forget. If you're doing this kind of thing at scale, like dealing with TrueType fonts (another bastion of big-endian), it's common to reach for derive macros, which automate a great deal of the tedium.
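A by-hand version might look like the sketch below. The `Record` layout (a big-endian `u16` tag followed by a big-endian `u32` length) is invented for illustration and is not taken from any real format.

```rust
/// Hypothetical on-disk record: big-endian u16 tag, then big-endian u32 length.
struct Record {
    tag: u16,
    length: u32,
}

impl Record {
    /// Serialize to exactly six bytes, big-endian fields in order.
    fn to_bytes(&self) -> [u8; 6] {
        let mut out = [0u8; 6];
        out[..2].copy_from_slice(&self.tag.to_be_bytes());
        out[2..].copy_from_slice(&self.length.to_be_bytes());
        out
    }

    /// Parse the six bytes back into native integers.
    fn from_bytes(b: &[u8; 6]) -> Record {
        Record {
            tag: u16::from_be_bytes([b[0], b[1]]),
            length: u32::from_be_bytes([b[2], b[3], b[4], b[5]]),
        }
    }
}
```

The integers and the raw bytes have different types, so a missing conversion is a compile error rather than a silent bug. The repetition per field is the tedium that derive macros remove.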

nly · on May 8, 2021

Who decides what methods to add to the bytes type/abstraction?

If I have a 3 byte big endian integer can I access it easily in Rust without resorting to shifts?

In C++ I could probably create a fairly convincing big_uint24_t type and use it in a packed struct, and there would be no inconsistencies with how it's used with respect to the more common varieties.

raphlinus · on May 8, 2021

In Rust, [u8; N] and &[u8] are both primitive types, and not abstractions. It's possible to create an abstraction around either (the former even more so now with const generics), but that's not necessary. It's also possible to use "extension traits" to add methods, even to existing and built-in types[1].

I'm not sure about a 3 byte big endian integer. I mean, that's going to compile down to some combination of shifting and masking operations anyway, isn't it? I suspect that if you have some oddball binary format that needs this, it will be possible to write some code to marshal it that compiles down to the best possible asm. Godbolt is your friend here :)

[1]: https://rust-lang.github.io/rfcs/0445-extension-trait-conven...
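As a rough sketch of the extension-trait approach applied to the 3-byte case: the trait name `ReadU24Be` and its method are hypothetical, and the shifts are written out explicitly here.

```rust
/// Hypothetical extension trait adding a 24-bit big-endian read to byte slices.
trait ReadU24Be {
    /// Reads three bytes starting at `offset`, or returns `None` if the
    /// slice is too short.
    fn read_u24_be(&self, offset: usize) -> Option<u32>;
}

impl ReadU24Be for [u8] {
    fn read_u24_be(&self, offset: usize) -> Option<u32> {
        let end = offset.checked_add(3)?;
        let b = self.get(offset..end)?;
        // Most significant byte first.
        Some((u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]))
    }
}
```

With the trait in scope, any `&[u8]` gains the method, for example `data[..].read_u24_be(1)`.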

nly · on May 8, 2021

I agree then that in Rust you could make something consistent.

I think there's no need for explicit shifts. You need to memcpy anyway to deal with alignment issues, so you may as well just copy into the last 3 bytes of a zero-initialized, big endian, 32-bit uint.

https://gcc.godbolt.org/z/jEnsW8WfE
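The same copy-instead-of-shift idea can be sketched in Rust with standard-library calls only; the function name is made up for illustration.

```rust
/// Reads a 3-byte big-endian integer by copying it into the low three
/// bytes of a zeroed 4-byte buffer, then decoding that as a big-endian u32.
fn u24_be_via_copy(src: &[u8; 3]) -> u32 {
    let mut buf = [0u8; 4];
    // buf[0] stays zero: it is the most significant byte in big-endian order.
    buf[1..].copy_from_slice(src);
    u32::from_be_bytes(buf)
}
```

No shifts appear in the source, although the compiler is free to emit whatever load, swap, and shift instructions it likes.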

raphlinus · on May 8, 2021

That's just constant folding. Here's what it looks like when you actually need to go to memory:

https://gcc.godbolt.org/z/9qGqh6M1E

And I think we're on the same page, it should be possible to get similar results in Rust.

Brian_K_White · on May 8, 2021

It demonstrates that C++ is even less safe.

cbmuser · on May 8, 2021

You are still casting one pointer type into another which can result in unaligned access.

If you need to change byte order, you should use a library to achieve that.

nly · on May 8, 2021

Boost.Endian is the library here and this code is safe because the big_uint32_t type has an alignment requirement of 1 byte.

This is why ubsan is silent and not even injecting a check into the compiled code.

You can check the alignment constraints with static_assert (something else you can't do in standard C): https://gcc.godbolt.org/z/KTcf9ax6r
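For readers following the Rust side of the thread, the same kind of compile-time alignment check can be sketched as below. `BigU32` is a hypothetical stand-in, not Boost's type; the pointer cast is sound only because of `#[repr(transparent)]` over a byte array.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical big-endian u32 stored as raw bytes, so its alignment is 1.
#[repr(transparent)]
struct BigU32([u8; 4]);

// Compile-time checks: the build fails if either assumption is wrong.
const _: () = assert!(align_of::<BigU32>() == 1);
const _: () = assert!(size_of::<BigU32>() == 4);

impl BigU32 {
    /// Decode the stored bytes as a native integer.
    fn get(&self) -> u32 {
        u32::from_be_bytes(self.0)
    }
}

/// Reinterprets four bytes at any address as a `BigU32` without copying.
fn view(bytes: &[u8; 4]) -> &BigU32 {
    // SAFETY: `BigU32` is `repr(transparent)` over `[u8; 4]`, so both types
    // have identical size, layout, and an alignment of 1.
    unsafe { &*(bytes as *const [u8; 4] as *const BigU32) }
}
```

Because the alignment requirement is 1, the cast cannot produce a misaligned reference, which mirrors the argument made for `big_uint32_t` above.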

kevin_thibedeau · on May 8, 2021

C11 has static_assert: https://gcc.godbolt.org/z/E3bGc95o3

It also has _Generic(), so you can roll up a family of endianness conversion functions and safely change types without blowing up somewhere else with a hardcoded conversion routine.
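`_Generic` itself is C11; to stay with the Rust used elsewhere in this thread, a rough analogue of that type-dispatched family is a trait implemented for each integer width. The trait and macro names below are hypothetical.

```rust
/// Hypothetical trait: one conversion name, dispatched on the integer type.
trait ToBeVec {
    fn to_be_vec(self) -> Vec<u8>;
}

/// Generates one impl per listed integer type, so changing a field's width
/// automatically selects the matching conversion.
macro_rules! impl_to_be_vec {
    ($($t:ty),*) => {
        $(
            impl ToBeVec for $t {
                fn to_be_vec(self) -> Vec<u8> {
                    self.to_be_bytes().to_vec()
                }
            }
        )*
    };
}

impl_to_be_vec!(u16, u32, u64);
```

As with the `_Generic` approach, the call site never names a width, so widening a `u16` field to `u32` cannot leave a stale 16-bit swap behind.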