Previously, we had one global GPU_matrix stack, so the API was not thread-safe. This patch makes the stack per GPUContext, which effectively makes it thread-local.
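As a rough illustration of the idea (not the actual patch), the sketch below moves a matrix stack from global state into a context struct and has the push/pop calls go through a per-thread "active context" pointer. All names (SketchContext, sketch_matrix_push, and so on) are hypothetical stand-ins, and the use of C11 `_Thread_local` for the active-context pointer is an assumption about how the active context is tracked.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_STACK_DEPTH 32

/* Hypothetical stand-ins, not the real GPU module types. */
typedef struct SketchMatrixStack {
  float stack[SKETCH_STACK_DEPTH][4][4];
  int top;
} SketchMatrixStack;

typedef struct SketchContext {
  /* The stack is owned by the context instead of being a global. */
  SketchMatrixStack model_view;
} SketchContext;

/* Assumption: each thread tracks its own active context. */
static _Thread_local SketchContext *sketch_active_ctx = NULL;

void sketch_context_init(SketchContext *ctx)
{
  static const float identity[4][4] = {
      {1.0f, 0.0f, 0.0f, 0.0f},
      {0.0f, 1.0f, 0.0f, 0.0f},
      {0.0f, 0.0f, 1.0f, 0.0f},
      {0.0f, 0.0f, 0.0f, 1.0f},
  };
  ctx->model_view.top = 0;
  memcpy(ctx->model_view.stack[0], identity, sizeof(identity));
}

void sketch_context_active_set(SketchContext *ctx)
{
  sketch_active_ctx = ctx;
}

/* Duplicate the top matrix of the active context's stack. */
void sketch_matrix_push(void)
{
  assert(sketch_active_ctx != NULL);
  SketchMatrixStack *s = &sketch_active_ctx->model_view;
  assert(s->top + 1 < SKETCH_STACK_DEPTH);
  memcpy(s->stack[s->top + 1], s->stack[s->top], sizeof(s->stack[0]));
  s->top++;
}

/* Discard the top matrix of the active context's stack. */
void sketch_matrix_pop(void)
{
  assert(sketch_active_ctx != NULL);
  assert(sketch_active_ctx->model_view.top > 0);
  sketch_active_ctx->model_view.top--;
}
```

With this layout, two threads that each activate their own context never touch the same stack, so the matrix calls themselves need no locking.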
One small gripe is that the API now calls into the active GPUContext, so it is no longer fully self-contained. We could add _ex variants of all public GPU_matrix functions that take the GPUContext as an argument, which would make this dependency explicit and controllable. We probably wouldn't use them for the time being, though.
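A minimal sketch of what such _ex variants could look like, purely as an assumption about the shape of the API: the explicit functions take the context as an argument, and the existing-style functions become thin wrappers that pass the active context. The names (DemoContext, demo_matrix_push_ex, ...) are hypothetical, and the matrix is reduced to a 2D offset to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_STACK_DEPTH 16

/* Hypothetical context; the "matrix" is simplified to a 2D offset. */
typedef struct DemoContext {
  float offset[DEMO_STACK_DEPTH][2];
  int top;
} DemoContext;

/* Assumption: the active context is tracked per thread. */
static _Thread_local DemoContext *demo_active = NULL;

void demo_context_active_set(DemoContext *ctx)
{
  demo_active = ctx;
}

/* Explicit variants: the context dependency is visible in the signature. */
void demo_matrix_push_ex(DemoContext *ctx)
{
  assert(ctx != NULL && ctx->top + 1 < DEMO_STACK_DEPTH);
  ctx->offset[ctx->top + 1][0] = ctx->offset[ctx->top][0];
  ctx->offset[ctx->top + 1][1] = ctx->offset[ctx->top][1];
  ctx->top++;
}

void demo_matrix_translate_2f_ex(DemoContext *ctx, float x, float y)
{
  assert(ctx != NULL);
  ctx->offset[ctx->top][0] += x;
  ctx->offset[ctx->top][1] += y;
}

/* Implicit variants: thin wrappers that forward the active context. */
void demo_matrix_push(void)
{
  demo_matrix_push_ex(demo_active);
}

void demo_matrix_translate_2f(float x, float y)
{
  demo_matrix_translate_2f_ex(demo_active, x, y);
}
```

Callers that already hold a context pointer could then use the _ex form without touching the active-context state, while existing call sites keep working unchanged.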
Needed for VR session drawing on a separate thread.