For an ordinary program, the OS program loader is what allocates pages for .bss and initializes them to zero. When the program is the OS kernel itself, who does that initialization? At best it would be the bootloader. Zeroing .bss in kernel_init(), before any globals are used, is acceptable too.
Having seen both sides of this: people are arguing about the proper way to skin a cat. The standard only cares that the cat gets skinned, not how.
On a hosted system it's done by the OS, for security and performance reasons. One, you don't want a process snooping on memory freed by other processes. Two, the OS can use the MMU to map in previously zeroed pages on demand, so it doesn't need to actually zero the entire .bss section at startup.
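To make the hosted case concrete, here is a rough sketch of what the program gets to rely on. The names scratch and count_nonzero_bytes are made up for illustration, and the claim that the array lands in .bss is an assumption about typical toolchains, not something the C standard says. The standard only promises that the object starts out zero.

```c
#include <stddef.h>

/* No initializer: typical toolchains place this in .bss (assumption).
 * The C standard only guarantees it starts out as all zeros. */
static unsigned char scratch[1 << 20];

/* Count the bytes of scratch that are not zero.
 * If nothing has written to scratch yet, the result is 0. */
size_t count_nonzero_bytes(void)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof scratch; ++i) {
        if (scratch[i] != 0) {
            ++n;
        }
    }
    return n;
}
```

Calling count_nonzero_bytes() before anything writes to scratch should return 0 no matter who did the zeroing: the loader, the MMU handing out zero pages on first touch, or startup code.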
On a bare-metal system it's usually done in assembly (or, more cheesily, in C) plus some linker magic.
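The C flavor of that might look roughly like the sketch below. It is only a sketch: zero_region is a made-up helper, and the symbol names __bss_start and __bss_end in the comment are hypothetical, since the real names depend entirely on what your linker script exports.

```c
#include <stddef.h>

/* Hypothetical use from startup code, assuming the linker script
 * exports symbols marking the bounds of .bss (names vary by script):
 *
 *   extern unsigned char __bss_start[], __bss_end[];
 *   zero_region(__bss_start, __bss_end);
 */

/* Zero every byte in [start, end). Returns the number of bytes cleared. */
size_t zero_region(unsigned char *start, unsigned char *end)
{
    size_t n = 0;
    for (unsigned char *p = start; p < end; ++p) {
        *p = 0;
        ++n;
    }
    return n;
}
```

The catch is that this has to run before any code that reads a zero-initialized global, which is why it usually lives in the reset handler or the very first lines of startup code.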