replace GLEW with libepoxy
Closed, Public

Authored by Christian Rauch (christian.rauch) on Jun 25 2022, 1:43 AM.

Details

Summary

As discussed in https://developer.blender.org/D12034#412258, GLEW should be replaced by libepoxy to enable dynamic loading of OpenGL.

Build:

sh
make lite debug ninja BUILD_CMAKE_ARGS="-DWITH_GHOST_X11=OFF -DWITH_GHOST_WAYLAND=ON -DWITH_GHOST_WAYLAND_LIBDECOR=ON -DPYTHON_VERSION=3.10" BUILD_DIR="../blender_build"

Verify that there are no X11 dependencies any more:

sh
lddtree ./blender_build/bin/blender

and:

sh
./blender_build/bin/blender

should then start a pure Wayland client.

This also works with GLX (-DWITH_GHOST_X11=ON -DWITH_GHOST_WAYLAND=OFF).

This has not been tested on other systems (Windows, macOS) as I do not have access to those systems and the build bot does not allow me to trigger experimental builds any more (I get "you need to have role 'any-control'").

Diff Detail

Repository
rB Blender
Branch
epoxy
Build Status
Buildable 22690
Build 22690: arc lint + arc unit

Event Timeline

Operating system: Linux-5.17.5-76051705-generic-x86_64-with-glibc2.34 64 Bits
Graphics card: NVIDIA GeForce GTX 1080 Ti/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 515.48.07

I installed my OS's epoxy package and built with the X11 backend.

When I quit Blender, it aborts on a failed assertion:

Saved session recovery to '/tmp/quit.blend'
blender: ../src/dispatch_common.c:858: epoxy_get_proc_address: Assertion `0 && "Couldn't find current GLX or EGL context.\n"' failed.
Aborted (core dumped)

libc.so.6!__pthread_kill_implementation(int no_tid, int signo, pthread_t threadid) (pthread_kill.c:44)
libc.so.6!__pthread_kill_internal(int signo, pthread_t threadid) (pthread_kill.c:80)
libc.so.6!__GI___pthread_kill(pthread_t threadid, int signo) (pthread_kill.c:91)
libc.so.6!__GI_raise(int sig) (raise.c:26)
libc.so.6!__GI_abort() (abort.c:79)
libc.so.6!__assert_fail_base(const char * fmt, const char * assertion, const char * file, unsigned int line, const char * function) (assert.c:92)
libc.so.6!__GI___assert_fail(const char * assertion, const char * file, unsigned int line, const char * function) (assert.c:101)
libepoxy.so.0![Unknown/Just-In-Time compiled code] (Unknown Source:0)
blender::gpu::GLTexture::samplers_free() (/home/jeroen/blender-git/blender/source/blender/gpu/opengl/gl_texture.cc:582)
blender::gpu::GLBackend::~GLBackend(blender::gpu::GLBackend * const this) (/home/jeroen/blender-git/blender/source/blender/gpu/opengl/gl_backend.hh:45)
blender::gpu::GLBackend::~GLBackend(blender::gpu::GLBackend * const this) (/home/jeroen/blender-git/blender/source/blender/gpu/opengl/gl_backend.hh:48)
GPU_backend_exit() (/home/jeroen/blender-git/blender/source/blender/gpu/intern/gpu_context.cc:243)
WM_exit_ex(bContext * C, const _Bool do_python) (/home/jeroen/blender-git/blender/source/blender/windowmanager/intern/wm_init_exit.c:612)
WM_exit(bContext * C) (/home/jeroen/blender-git/blender/source/blender/windowmanager/intern/wm_init_exit.c:641)
wm_exit_handler(bContext * C, const wmEvent * event, void * userdata) (/home/jeroen/blender-git/blender/source/blender/windowmanager/intern/wm_init_exit.c:411)
wm_handler_ui_call(bContext * C, wmEventHandler_UI * handler, const wmEvent * event, int always_pass) (/home/jeroen/blender-git/blender/source/blender/windowmanager/intern/wm_event_system.cc:732)
wm_handlers_do_intern(bContext * C, wmWindow * win, wmEvent * event, ListBase * handlers) (/home/jeroen/blender-git/blender/source/blender/windowmanager/intern/wm_event_system.cc:3162)
wm_handlers_do(bContext * C, wmEvent * event, ListBase * handlers) (/home/jeroen/blender-git/blender/source/blender/windowmanager/intern/wm_event_system.cc:3281)
wm_event_do_handlers(bContext * C) (/home/jeroen/blender-git/blender/source/blender/windowmanager/intern/wm_event_system.cc:3873)
WM_main(bContext * C) (/home/jeroen/blender-git/blender/source/blender/windowmanager/intern/wm.c:631)

When reporting a bug we might want to include whether Blender is running on Wayland or X11. I don't believe every user is aware of which one they are running, and it might be important for quickly narrowing down issues. We could add it to the Operating system label.
I did some testing:

  • renderdoc integration works
  • draw tests
[==========] 7 tests from 1 test suite ran. (8816 ms total)
[  PASSED  ] 4 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] GPUOpenGLTest.gpu_shader_compute_2d
[  FAILED  ] GPUOpenGLTest.gpu_shader_compute_1d
[  FAILED  ] GPUOpenGLTest.gpu_shader_compute_vbo

 3 FAILED TESTS

These tests can be enabled using WITH_OPENGL_DRAW_TESTS=On. I haven't checked whether it is the downloading/validation that fails or the execution.

I cannot validate the render tests (WITH_OPENGL_RENDER_TESTS=On) because of the crash on exit.

88% tests passed, 34 tests failed out of 278

The following tests FAILED:
         41 - bf_draw_tests (Failed)
         44 - bf_gpu_tests (Failed)
        115 - eevee_bsdf_test (Failed)
        116 - eevee_camera_test (Failed)
        117 - eevee_denoise_test (Failed)
        118 - eevee_displacement_test (Failed)
        119 - eevee_grease_pencil_test (Failed)
        120 - eevee_hair_test (Failed)
        125 - eevee_integrator_test (Failed)
        126 - eevee_light_test (Failed)
        127 - eevee_mesh_test (Failed)
        128 - eevee_motion_blur_test (Failed)
        129 - eevee_openvdb_test (Failed)
        131 - eevee_render_layer_test (Failed)
        132 - eevee_reports_test (Failed)
        133 - eevee_shader_test (Failed)
        134 - eevee_shadow_catcher_test (Failed)
        135 - eevee_sss_test (Failed)
        136 - eevee_volume_test (Failed)
        137 - workbench_bsdf_test (Failed)
        138 - workbench_camera_test (Failed)
        139 - workbench_denoise_test (Failed)
        140 - workbench_displacement_test (Failed)
        142 - workbench_hair_test (Failed)
        147 - workbench_integrator_test (Failed)
        148 - workbench_light_test (Failed)
        149 - workbench_mesh_test (Failed)
        150 - workbench_motion_blur_test (Failed)
        153 - workbench_render_layer_test (Failed)
        154 - workbench_reports_test (Failed)
        155 - workbench_shader_test (Failed)
        156 - workbench_shadow_catcher_test (Failed)
        157 - workbench_sss_test (Failed)
        158 - workbench_volume_test (Failed)

Opening scenes such as rain_restaurant crashes Blender: https://cloud.blender.org/p/characters/5f1ede754347a0fc05ba21a0

We should be careful that we didn't break any GPU workaround as this patch uses strings to detect extensions. I did go over them and couldn't find any, but we cannot know for certain.
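To make the concern about string-based extension detection concrete, here is a minimal sketch of whole-token matching against a space-separated extension list. The helper name `extension_list_contains` is hypothetical and is not part of Blender or libepoxy; in the patch itself the lookup would go through libepoxy (for example `epoxy_has_gl_extension()`), so this only illustrates why a plain substring search is risky for workaround checks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

/* Hypothetical helper (not Blender or libepoxy API): returns true only when
 * `name` appears as a complete space-separated token in `extension_list`.
 * A plain substring search would also match longer names that merely start
 * with `name`, which could silently enable or disable a GPU workaround. */
inline bool extension_list_contains(const std::string &extension_list,
                                    const std::string &name)
{
  /* An empty name or a name containing a space can never be a single token. */
  if (name.empty() || name.find(' ') != std::string::npos) {
    return false;
  }
  std::size_t pos = 0;
  while (pos < extension_list.size()) {
    std::size_t end = extension_list.find(' ', pos);
    if (end == std::string::npos) {
      end = extension_list.size();
    }
    /* Compare the token [pos, end) against the requested name. */
    if (extension_list.compare(pos, end - pos, name) == 0) {
      return true;
    }
    pos = end + 1;
  }
  return false;
}
```

With this helper, a list containing only `GL_ARB_compute_shader_extra` does not report `GL_ARB_compute_shader` as present, which is the kind of false positive worth ruling out when reviewing the workaround checks.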

Jeroen Bakker (jbakker) requested changes to this revision. Jul 15 2022, 9:38 AM
This revision now requires changes to proceed. Jul 15 2022, 9:38 AM

@Jeroen Bakker (jbakker), as mentioned above, there is a bug that needs to be fixed where glDeleteSamplers is called without an active context. I started working on a fix a while ago; I will finish it and submit it for review.

Updates for windows.

The crash in epoxy came from using it as a static
library. Epoxy relies on its DllMain being called
when new threads are created so that it can allocate
memory for the per-thread dispatch table. In a static
configuration those callbacks are never called, so
the TLS memory was never allocated and epoxy ended up
memcpy'ing into a null pointer, causing the crash.

This change moves epoxy to a shared library on
Windows, which stops the crash. Blender still will
not start, however, because epoxy_has_wgl_extension
cannot be called without an active context; Blender
reports that it has issues creating the context and
exits cleanly without crashing.

I hacked a dummy context in (in the ugliest way
possible; the code really isn't contributable) and
was able to get Blender to start, but Jeroen or
Clement will probably have to take a look here to do
this in a maintainable fashion.

Preliminary windows libs added in rBL62973

  • fix pdb being harvested

I updated the wrong patch, sorry about that, this should revert back to the last patch from @Ray Molenkamp (LazyDodo).

After various fixes I got the OpenGL tests passing now on Linux.

  • D15463 broke tests in master so had to fix that first.
  • D15465 fixes crash on exit
  • D15470 fixes crash in OpenSubdiv
  • bf_draw_tests is still partially failing, but it's the same in master (undefined variable "drw_curves")

As a Linux and macOS developer this looks ready to me now, so I'll accept it from that point of view, but presumably the Windows side needs more work.

Add back 'clog' include, needed for building with Wayland.

Tested with WITH_GHOST_WAYLAND & WITH_GHOST_SDL, both seem to work fine (tested multiple-windows & rendering).

Just a heads up, the patch doesn't apply cleanly with the latest master changes.
The rebase conflicts are trivial though. (I would push the changes here, but I currently do not have arc working and I don't want to commandeer the revision for this small fix)

Rendering seems to be working on my end as well.
However, after checking the blender binary with ldd, it seems like we are still linking in libGL and friends.
We have forgotten to remove the library includes from our main cmake file.
However this also means that we will have to rework our logic a bit in a few places.

Here is an example patch that strips out most of the opengl code from our side: https://developer.blender.org/P3098
Note that it is just an example and we need to do more work on it.
However with those changes, Blender doesn't link in any GL libs by itself and things should now all be handled by libepoxy.

If we don't do this, then libepoxy's dynamic library loading will not work.
(Though it might not be obvious, as the program still runs if the linked GL libs are present.)

I've also removed the ANGLE library in the patch as that seems to be dead code (we don't support OpenGL ES).

Besides this, we also need to make EGL and GLX loading truly dynamic and not a compile option.
Otherwise we can't ship official builds that work on both GLX-only and EGL platforms.
I suggest that we do this by trying to load the EGL libs with libepoxy and, if that fails, falling back to GLX.

Once the supported distros that only support GLX (CentOS, IIRC) reach end of life, we can remove GLX support entirely.
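The EGL-first, GLX-fallback idea could be sketched roughly as below. Everything here is hypothetical: `BackendCandidate`, `select_backend` and the probe callbacks are illustrative names, and each probe merely stands in for "try to load the libraries and create a context". It is not how GHOST is structured; it only shows the ordering logic.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

/* Hypothetical description of one GL windowing backend. The probe is an
 * assumption standing in for loading the libraries and creating a context. */
struct BackendCandidate {
  std::string name;
  std::function<bool()> try_init;
};

/* Returns the name of the first backend whose probe succeeds, trying the
 * candidates in the given order. Returns an empty string if none works. */
inline std::string select_backend(const std::vector<BackendCandidate> &candidates)
{
  for (const BackendCandidate &candidate : candidates) {
    if (candidate.try_init && candidate.try_init()) {
      return candidate.name;
    }
  }
  return std::string();
}
```

Passing the candidates in the order EGL, GLX means the GLX probe only runs when EGL initialization fails, so dropping GLX later would just mean removing one entry from the list.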

@Brecht Van Lommel (brecht) I also noticed that the cycles stand alone build option uses the "blender gl libs" variable.
I'm guessing that we would have to make it use libepoxy or make it always look for its own GL libs?

I checked the draw test cases and will add them to my list for fixing.

I've posted my follow up cleanup D15554 and headless rendering patch D15555

Rebase on master (resolve conflicts)

Add WITH_SYSTEM_EPOXY to support linking with the system's libepoxy.
Without this I wasn't able to test D15554, D15555.

  • Previous update unintentionally removed: build_files/cmake/Modules/FindLibEpoxy.cmake

Aren't the WITH_SYSTEM_* flags only supposed to be used to exclude the hard bundled libraries that are in extern/ ?

I talked to Sybren as well about this. We think it is very weird that you would use this to exclude precompiled libraries this way.

For me, using the system libepoxy worked fine without this change when not using the precompiled libraries.
At the same time it also seemed to work fine for Sybren with the precompiled libraries. He didn't need to resort to using his system libraries.

To me it seems like this is not a good fix for the issue. More of a workaround.

Sometimes I bypass libraries in lib/ (occasionally I run into compatibility issues) and was used to using system glew, but you're right, this is mainly for libraries in extern/, with the exception of freetype, which is a special case IIRC.

Removed WITH_SYSTEM_EPOXY.

Thanks!

If you have issues compiling this with the bundled libs, we should of course fix that as well.
Do you still have problems with that?

LGTM, it builds well, and the only X-related libraries I see still getting linked in are pulled in by Pulseaudio, so that's fine:

sybren@ws-sybren ~/w/b/blender (arcpatch-D15291)> lddtree (which blender)
blender => /home/sybren/bin/blender (interpreter => /lib64/ld-linux-x86-64.so.2)
    libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
    libepoxy.so.0 => /usr/lib/x86_64-linux-gnu/libepoxy.so.0
    libjack.so.0 => /usr/lib/x86_64-linux-gnu/libjack.so.0
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    libpulse.so.0 => /usr/lib/x86_64-linux-gnu/libpulse.so.0
        libpulsecommon-13.99.so => /usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-13.99.so
            libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1
                libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6
                libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6
                    libbsd.so.0 => /usr/lib/x86_64-linux-gnu/libbsd.so.0
            libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0
                liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5
                liblz4.so.1 => /usr/lib/x86_64-linux-gnu/liblz4.so.1
                libgcrypt.so.20 => /usr/lib/x86_64-linux-gnu/libgcrypt.so.20
                    libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0
            libwrap.so.0 => /usr/lib/x86_64-linux-gnu/libwrap.so.0
                libnsl.so.1 => /lib/x86_64-linux-gnu/libnsl.so.1
            libsndfile.so.1 => /usr/lib/x86_64-linux-gnu/libsndfile.so.1
                libFLAC.so.8 => /usr/lib/x86_64-linux-gnu/libFLAC.so.8
                libogg.so.0 => /usr/lib/x86_64-linux-gnu/libogg.so.0
                libvorbis.so.0 => /usr/lib/x86_64-linux-gnu/libvorbis.so.0
                libvorbisenc.so.2 => /usr/lib/x86_64-linux-gnu/libvorbisenc.so.2
            libasyncns.so.0 => /usr/lib/x86_64-linux-gnu/libasyncns.so.0
                libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2
            libapparmor.so.1 => /usr/lib/x86_64-linux-gnu/libapparmor.so.1
        libdbus-1.so.3 => /lib/x86_64-linux-gnu/libdbus-1.so.3
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
    ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
Campbell Barton (campbellbarton) accepted this revision. Edited Aug 3 2022, 8:42 AM

If you have issues compiling this with the bundled libs, we should of course fix that as well.
Do you still have problems with that?

Tested make deps and the resulting libepoxy works as expected.

Windows was not working at all for me, and needed significant changes. Now things seem to be working for me, and workbench/eevee tests pass.

With GLEW all function pointers for extensions are initialized in glewInit and remain in place after destroying a context. With libepoxy this is not the case, they can only be initialized within a valid context.

That meant the dummy context used to create the actual context needs to have a longer lifetime than before, and the code was refactored to achieve that.

Additionally, the dispatch table switching does appear to be slow, so I have patched libepoxy to not do that. In Blender all contexts are using the same OpenGL version so there is no need to switch, and with GLEW this was already an implicit assumption.

This requires the Windows libepoxy library to be rebuilt.
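The context-lifetime point above can be modelled with a small RAII sketch. This is a simulation only: `ScopedDummyContext`, `resolve_extension_functions` and `create_real_context` are hypothetical names, and a counter stands in for "a GL context is current". It is not the refactored WGL code, just the shape of the constraint that the dummy context has to outlive the step that resolves extension functions.

```cpp
#include <cassert>

namespace demo {

/* Simulated stand-in for "a GL context is current on this thread".
 * Real code would go through WGL/GLX/EGL; this counter is illustrative. */
static int g_current_context_depth = 0;

/* Hypothetical RAII guard: the dummy context stays current for as long as
 * the guard object is alive. */
class ScopedDummyContext {
 public:
  ScopedDummyContext() { ++g_current_context_depth; }
  ~ScopedDummyContext() { --g_current_context_depth; }
  ScopedDummyContext(const ScopedDummyContext &) = delete;
  ScopedDummyContext &operator=(const ScopedDummyContext &) = delete;
};

/* Stand-in for resolving extension entry points: it only succeeds while a
 * context is current, mirroring the libepoxy behaviour described above. */
inline bool resolve_extension_functions()
{
  return g_current_context_depth > 0;
}

/* Creating the real context needs extension functions, so the dummy context
 * guard must stay alive across that step. Returns whether it could proceed. */
inline bool create_real_context()
{
  ScopedDummyContext dummy;
  return resolve_extension_functions();
}

}  // namespace demo
```

Calling `demo::resolve_extension_functions()` outside any guard fails in this model, which corresponds to the failure mode seen when extension queries ran after the dummy context had already been destroyed.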

Fix Windows buildbot error.

I'm having the following compilation warning on macOS:

[56/2470] Building CXX object intern/opensubdiv/CMakeFiles/bf_intern_opensubdiv.dir/internal/evaluator/gl_compute_evaluator.cc.o
In file included from /Users/sergey/Developer/blender/blender/intern/opensubdiv/internal/evaluator/gl_compute_evaluator.cc:27:
In file included from /Users/sergey/Developer/blender/blender/intern/opensubdiv/internal/evaluator/gl_compute_evaluator.h:31:
In file included from /Users/sergey/Developer/blender/lib/darwin/opensubdiv/include/opensubdiv/osd/opengl.h:36:
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/System/Library/Frameworks/OpenGL.framework/Headers/gl3.h:15:2: warning: gl.h and gl3.h are both included. Compiler will not invoke errors if using removed OpenGL functionality. [-W#warnings]
#warning gl.h and gl3.h are both included. Compiler will not invoke errors if using removed OpenGL functionality.
 ^
1 warning generated.

Something like

diff --git a/intern/opensubdiv/internal/evaluator/gl_compute_evaluator.cc b/intern/opensubdiv/internal/evaluator/gl_compute_evaluator.cc
index df747e23d2a..148770b0d39 100644
--- a/intern/opensubdiv/internal/evaluator/gl_compute_evaluator.cc
+++ b/intern/opensubdiv/internal/evaluator/gl_compute_evaluator.cc
@@ -24,6 +24,20 @@

 #include <epoxy/gl.h>

+/* There are few aspects here:
+ * - macOS is strict about including both gl.h and gl3.h
+ * - libepoxy only pretends to be a replacement for gl.h
+ * - OpenSubdiv internally uses `OpenGL/gl3.h` on macOS
+ *
+ * In order to silence the warning pretend that gl3 has been included, fully relying on symbols
+ * from the epoxy.
+ *
+ * This works differently from how OpenSubdiv internally will use `OpenGL/gl3.h` without epoxy.
+ * Sounds fragile, but so far things seems to work. */
+#if defined(__APPLE__)
+# define __gl3_h_
+#endif
+
 #include "gl_compute_evaluator.h"

 #include <opensubdiv/far/error.h>
would solve the warning by ensuring our OpenSubdiv integration uses epoxy for OpenGL. However, internally OpenSubdiv could still include OpenGL/gl3.h and use that instead of epoxy.
That sounds inconsistent and fragile. I cannot find a documented design for how external libraries are expected to use OpenGL, so I cannot provide any further suggestions at this time.

By testing viewport render and GPU Subdivisions on Mac Pro everything seems to work. I did not run the regression tests yet though.

intern/opensubdiv/internal/evaluator/gl_compute_evaluator.h:28

If OSD_USES_GLEW is never to be used, then simply remove add_definitions(-DOSD_USES_GLEW) from the CMakeLists.txt.

If there is some obscure reason for some code paths to use OSD_USES_GLEW but not others then it should be documented much better.

Do note that GPU subdivision is likely disabled on Mac since they do not meet the GL 4.3 requirement (or compute shader support) for this feature.

  • fix bin path on windows harvest
Ray Molenkamp (LazyDodo) requested changes to this revision. Aug 5 2022, 4:27 PM

Additionally, the dispatch table switching does appear to be slow, so I have patched libepoxy to not do that. In Blender all contexts are using the same OpenGL version so there is no need to switch, and with GLEW this was already an implicit assumption.

Updated the epoxy Windows libs, but... I'll be honest, the nature of that patch is a little concerning. How bad is the perf without it? Patching epoxy means downstream distro builds will have perf issues unless they apply the patch as well, and the patch will likely break other consumers of epoxy in their ecosystem. That ain't right, not right at all.

Options I see:

  1. Take it up with upstream epoxy
  2. Stick epoxy in /extern, default to our patched version, and put a stern note on WITH_SYSTEM_EPOXY warning about the perf implications of enabling it
  3. Keep the patch as is and let downstream builds fend for themselves
This revision now requires changes to proceed. Aug 5 2022, 4:27 PM

@Sebastian Parborg (zeddb) was quick to point out on chat that the changes are WGL only. I'm an idiot. Good to go!

This revision is now accepted and ready to land. Aug 5 2022, 4:31 PM

Thanks for the additional diffs to this. Is the libepoxy support now ready to be merged into master?

Yes, I'll merge the patch.

Include Sergey's patch for macOS warnings.

This revision was automatically updated to reflect the committed changes.

Edit: False alarm, running make update fixes this problem

CMake problem on macOS with commit a296b8f694d1:

-- Detected OS X 12.3 and Xcode 13.4.1 at /Applications/Xcode.app/Contents/Developer
-- SDKs Directory: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs
-- Detected OSX_SYSROOT: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk
-- 'WITH_PUGIXML' is disabled: forcing 'set(WITH_OPENIMAGEIO OFF)'
CMake Error at /opt/homebrew/Cellar/cmake/3.23.2/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find Epoxy (missing: Epoxy_LIBRARY Epoxy_INCLUDE_DIR)
Call Stack (most recent call first):
  /opt/homebrew/Cellar/cmake/3.23.2/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  build_files/cmake/Modules/FindEpoxy.cmake:36 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  build_files/cmake/platform/platform_apple.cmake:230 (find_package)
  CMakeLists.txt:1009 (include)


-- Configuring incomplete, errors occurred!

Would be nice next time to not forget about install_deps script with such big changes...