Anytime a program has a special case like that, it makes sense to craft an optimal data structure for it.
Also, consider that the terminating 0 byte rarely costs just one byte. malloc rounds every allocation up to its alignment, which may be 4 or 8 or even 16 bytes.
malloc also has internal bookkeeping overhead, typically 16-32 bytes per allocation.
This is why one should never allocate a single (short) string from a generic allocator. Instead, one allocates a big chunk upfront (e.g. 4K bytes or more) and carves strings out of it, using a simple index that points to the first unused byte.
In this way, the overhead of allocating a string is truly only the terminating zero byte - no alignment constraints. This scheme is easy to implement as long as strings don't need to be freed individually.
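A minimal sketch of that scheme, under some assumptions: the names StringPool, pool_strdup and CHUNK_SIZE are made up for illustration, the chunk size is fixed at 4K, and exhausted chunks are simply abandoned because strings are never freed individually.

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096  /* assumed chunk size; tune for the workload */

/* Hypothetical pool type: one current chunk plus a bump index. */
typedef struct StringPool {
    char  *chunk;  /* current chunk, NULL before the first allocation */
    size_t used;   /* index of the first unused byte in chunk */
    size_t cap;    /* size of chunk in bytes */
} StringPool;

/* Copy s into the pool and return the copy, or NULL if malloc fails.
   The only per-string overhead is the terminating zero byte. */
static char *pool_strdup(StringPool *p, const char *s)
{
    size_t n = strlen(s) + 1;  /* bytes needed, including the terminator */

    if (p->chunk == NULL || n > p->cap - p->used) {
        /* Not enough room: start a new chunk. An oversized string gets
           a chunk of its own size. The old chunk is deliberately not
           freed or tracked, since its strings may still be in use. */
        size_t cap = n > CHUNK_SIZE ? n : CHUNK_SIZE;
        char *c = malloc(cap);
        if (c == NULL)
            return NULL;
        p->chunk = c;
        p->used = 0;
        p->cap = cap;
    }

    char *dst = p->chunk + p->used;
    memcpy(dst, s, n);
    p->used += n;  /* bump the index past the copied string */
    return dst;
}
```

A pool initialized with `StringPool p = {0};` is ready to use, and consecutive calls place strings back to back with no padding between them. If the whole pool has to be released at some point, the chunks would need to be linked together so they can be walked and freed; that is left out here.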