All the major OSes evolved along different paths, but eventually ended up with very similar schemes.
Windows (ignoring the 9x series):
In NT 3.x, GDI was just a thin DLL that RPC'd over to CSRSS.EXE. CSRSS did the actual drawing directly from userspace. As recently as XP (I haven't checked later), CSRSS still had an elevated IOPL, which meant that even though it was a userspace process, VMware had to run it under binary translation instead of direct execution.
In NT4/XP, GDI (and some other stuff) moved into win32k.sys. Userspace would essentially do a syscall, and the kernel would talk to the video card. For 3D graphics, the model is that userspace apps load a video-card-specific library to generate command buffers for the video card. When the app wants to execute a command buffer, it uses a syscall; the kernel verifies that the commands are safe and submits the buffer to the hardware.
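To make that flow concrete, here's a rough C sketch. Every name, opcode, and the syscall are made up (the real interface is the vendor's user-mode library plus the NT kernel-mode driver, which I'm not reproducing); the point is just the split: userspace encodes, one syscall hands the buffer over, and the kernel validates it before the hardware ever sees it.

    /* Hypothetical sketch of the user/kernel split -- every name and opcode
       here is invented, not a real NT or driver API. */
    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        uint32_t cmds[1024];   /* vendor-specific GPU commands */
        size_t   len;
    } cmd_buffer;

    /* Userspace: the video-card-specific library encodes commands. */
    static void emit_draw(cmd_buffer *cb, uint32_t vertex_buf, uint32_t count) {
        cb->cmds[cb->len++] = 0x00010000u;   /* made-up DRAW opcode */
        cb->cmds[cb->len++] = vertex_buf;    /* handle, not a raw pointer */
        cb->cmds[cb->len++] = count;
    }

    /* Userspace: hand the whole buffer to the kernel in one syscall. */
    static int gpu_submit(const cmd_buffer *cb) {
        /* return syscall(SYS_gpu_submit, cb->cmds, cb->len);   (invented) */
        (void)cb;
        return 0;
    }

    /* Kernel side, conceptually: walk the buffer, reject anything that
       references memory the process doesn't own or opcodes that could hang
       the GPU, then queue the validated buffer on the hardware ring. */
    int k_gpu_submit(const uint32_t *cmds, size_t len) {
        size_t i = 0;
        while (i < len) {
            switch (cmds[i]) {
            case 0x00010000u:      /* DRAW: check the handle and bounds here */
                i += 3;
                break;
            default:
                return -1;         /* unknown or unsafe packet: refuse it */
            }
        }
        /* ...DMA the validated buffer to the card... */
        return 0;
    }

    int main(void) {
        cmd_buffer cb = { .len = 0 };
        emit_draw(&cb, /*vertex_buf=*/7, /*count=*/3);
        return gpu_submit(&cb);
    }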
In Vista and later, a path similar to the 3D path is used for all drawing, except that the drawing is done to an app-specific offscreen surface. Another userspace process (the compositor) then generates command buffers that composite those offscreen surfaces together into what you see on the screen.
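In this picture the compositor is just another client of the same submission path: it doesn't scan out the app surfaces itself, it builds a command buffer telling the GPU to blend them. Again, purely illustrative names and encoding:

    /* Hypothetical compositor pass, using the same made-up encoding idea as
       the previous sketch; none of these names are DWM's real interface. */
    #include <stdint.h>
    #include <stddef.h>

    typedef struct { uint32_t surface; int32_t x, y, w, h; } window_info;

    /* Build one frame's worth of BLEND packets, back to front. */
    size_t build_composite_frame(uint32_t *cmds, const window_info *wins, size_t n) {
        size_t len = 0;
        for (size_t i = 0; i < n; i++) {
            cmds[len++] = 0x00020000u;          /* made-up BLEND opcode */
            cmds[len++] = wins[i].surface;      /* app's offscreen surface handle */
            cmds[len++] = (uint32_t)wins[i].x;  /* destination rectangle on screen */
            cmds[len++] = (uint32_t)wins[i].y;
            cmds[len++] = (uint32_t)wins[i].w;
            cmds[len++] = (uint32_t)wins[i].h;
        }
        /* then the same validated syscall any 3D client would use:
           gpu_submit(cmds, len);   (invented) */
        return len;
    }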
Linux/X11:
In the dark ages, it was very similar to NT 3.x (X came first; I just ended up writing in this order). Applications used Xlib to generate X protocol messages, which were sent over a Unix domain socket or a TCP socket to the X server. The X server then programmed the video card from userspace. This had the same IOPL implications for the X server as for CSRSS.EXE.
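Here's a minimal Xlib client to make that concrete. Every call below is marshalled into an X protocol request and buffered; the requests go down the socket when Xlib flushes (XNextEvent flushes for you), and the server does the actual rendering. Compile with -lX11.

    #include <X11/Xlib.h>
    #include <stdio.h>

    int main(void) {
        /* XOpenDisplay connects to the server named by $DISPLAY -- a Unix
           domain socket for ":0", a TCP socket for "host:0". */
        Display *dpy = XOpenDisplay(NULL);
        if (!dpy) { fprintf(stderr, "cannot open display\n"); return 1; }

        int scr = DefaultScreen(dpy);
        Window win = XCreateSimpleWindow(dpy, RootWindow(dpy, scr), 0, 0, 200, 100,
                                         1, BlackPixel(dpy, scr), WhitePixel(dpy, scr));
        XSelectInput(dpy, win, ExposureMask | KeyPressMask);
        XMapWindow(dpy, win);
        GC gc = XCreateGC(dpy, win, 0, NULL);

        for (;;) {
            XEvent ev;
            XNextEvent(dpy, &ev);   /* flushes buffered requests, waits for events */
            if (ev.type == Expose)
                /* becomes a PolyFillRectangle request on the wire;
                   the server draws it, not this process */
                XFillRectangle(dpy, win, gc, 20, 20, 60, 40);
            else if (ev.type == KeyPress)
                break;
        }
        XCloseDisplay(dpy);
        return 0;
    }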
When 3D acceleration was added, it worked very similarly to 3D in NT4/XP.
Finally, with compositing and now Wayland, the model is similar to Vista+.
OSX:
In NextStep/early OSX, applications drew (software only) into shared memory chunks. A userspace compositor process did software compositing from those shared memory chunks into the video card's VRAM.
In middle-era OSX (I can't recall the exact versions), the compositor started uploading dirty regions from those shared memory chunks into offscreen VRAM surfaces, and then programming the video card to composite them together.
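For the shared-memory scheme, the compositor's inner step is basically a rectangle-bounded copy out of the app's backing store; something like the sketch below (structs invented for illustration; both buffers are assumed to use the same coordinate space, and in the later scheme the destination would be an upload to an offscreen VRAM surface instead of a plain pixel buffer).

    #include <stdint.h>
    #include <string.h>

    typedef struct { int x, y, w, h; } rect;
    typedef struct {
        uint32_t *pixels;   /* mmap'd shared memory the app draws into */
        int       stride;   /* pixels per row */
    } backing_store;

    /* Copy only the dirty rectangle from the app's backing store toward
       the screen (or toward a staging buffer for a VRAM upload). */
    void copy_dirty(uint32_t *dst, int dst_stride,
                    const backing_store *src, rect dirty) {
        for (int row = 0; row < dirty.h; row++) {
            memcpy(dst + (dirty.y + row) * dst_stride + dirty.x,
                   src->pixels + (dirty.y + row) * src->stride + dirty.x,
                   (size_t)dirty.w * sizeof(uint32_t));
        }
    }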
Finally, modern OSX works similarly to modern Linux and Vista+.
I just wrote them up in this order arbitrarily. X did drawing via RPC to a userspace process long before NT and NextStep existed. NextStep did compositing long before the other two. Ironically, given the flamewars and marketing of the '90s, Linux/X was exactly as "microkerneley" as NT 3.x and NextStep, and more so than NT4. And they all evolved towards very similar architectures.