typically the driver buffers 2-3 frames' worth of the command stream.
the command buffer is only accessible to the driver, in Ring0. you can't get at it from Ring3.
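to picture what buffering 2-3 frames means for the app, here's a rough toy model in C++. it's only a sketch: `FrameQueue`, `submit`, `retire_one` and the limit of 3 are names and numbers made up for illustration, not anything from a real driver or from the DDI. the idea it shows is just that the cpu can run ahead of the gpu by a bounded number of frames, and stalls once that bound is hit.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// hypothetical model, not a real driver API: the "driver" queues whole
// frames of commands and the "gpu" executes them oldest-first.
struct FrameQueue {
    std::size_t max_frames_in_flight;  // assumed limit, e.g. 3
    std::deque<int> pending;           // frame ids queued but not yet executed
    int stalls = 0;                    // times the cpu had to wait on the gpu

    explicit FrameQueue(std::size_t limit) : max_frames_in_flight(limit) {
        assert(limit > 0);
    }

    // the gpu finishes the oldest queued frame; returns its id, or -1 if idle
    int retire_one() {
        if (pending.empty()) return -1;
        int id = pending.front();
        pending.pop_front();
        return id;
    }

    // the app presents a frame; if the queue is already full, the cpu has to
    // wait until the gpu retires the oldest frame before it can queue more
    void submit(int frame_id) {
        if (pending.size() == max_frames_in_flight) {
            ++stalls;
            retire_one();
        }
        pending.push_back(frame_id);
    }

    // how many frames the cpu is currently ahead of the gpu
    std::size_t latency_in_frames() const { return pending.size(); }
};
```

with a limit of 3, the first three `submit` calls return immediately and every one after that has to wait for the oldest frame to retire, so what the app thinks it just drew can be up to 3 frames ahead of what is on screen.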
why is it not acceptable to issue standard API calls, just like the OS does?
for instance, the Windows Desktop Window Manager uses D3D9Ex in Vista, and gives each window that is created a render target buffer to draw into.
unless you're writing a driver layer for your hobby OS, it's pretty hard to get below the API.
and then you'd need to write drivers for each piece of hw you want to support, using your new driver layer, e.g. a device driver interface (DDI).
to understand the interface between the driver and the hw, you'd need not only the Windows DDK and the driver source MS provides as a working example, but also each IHV's hardware reference manual. unless you have relationships with all those companies, that's pretty hard to pull off.