Things arrived in the mail today! With luck, I'll have the server for the bot set up this weekend :D

a hitch in my plan: opening a complicated model on a windows 11 vm running on half a CPU core on a potato takes a hot minute to compile a few hundred shaders 🙃

another hitch in my plan is my RSI is flaring up tonight :|

this is the problem model in the first image. each color is a different shader that had to be generated by tangerine to render the voxels, and the average complexity of the generated shaders is also high.

the second image is the generated part for one of these shaders. it's basically the object structure w/ all the params pulled out.

it's basically halfway to being an interpreter; i'd just need to also pull the call structure out into a parameter buffer and replace the function with an interpreter loop. it would be slower than rendering with all of the compiled shaders, but it only needs to produce one frame for the bot
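a minimal sketch of what that loop could look like (hypothetical opcodes and encoding, not tangerine's actual interpreter):

```python
import math

# hypothetical opcodes -- the real encoding would be whatever the
# shader compiler emits into the buffer
OP_SPHERE, OP_UNION, OP_CUT = 0, 1, 2

def eval_sdf(ops, params, p):
    """Walk the opcode stream with a value stack; shape parameters come
    from a flat buffer, mirroring how the compiled shaders already
    receive them."""
    stack = []
    cursor = 0  # read head into the parameter buffer
    for op in ops:
        if op == OP_SPHERE:
            cx, cy, cz, r = params[cursor:cursor + 4]
            cursor += 4
            stack.append(math.dist(p, (cx, cy, cz)) - r)
        elif op == OP_UNION:
            b, a = stack.pop(), stack.pop()
            stack.append(min(a, b))
        elif op == OP_CUT:
            b, a = stack.pop(), stack.pop()
            stack.append(max(a, -b))
    return stack[-1]
```

e.g. `eval_sdf([OP_SPHERE, OP_SPHERE, OP_UNION], [0,0,0,1, 3,0,0,1], p)` evaluates the union of two unit spheres at point `p`.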

incidentally this happens to also be my plan for how to deal with shader compiler hitches in general, i've just been procrastinating on it because opengl doesn't believe in async shader compiling

here's some very entertaining reading about the same problem in a completely different project

here's the wheel loading in normally on my main machine. the shaders are cached by the driver, so the pop in is a lot faster than it would be on a cold start.

and here's the same model running with the new interpreter. once the octree stuff is done processing, the model renders with no pop-in. the time to first image is much lower, but each frame takes much longer.

anyways, here's the code for just the interpreter if you are curious. it is quite short.

when i get around to adding occlusion culling this should become quite fast, as a lot of the frame time is burned rendering voxels you can't see. visibility feedback could also be used to prioritize the compile queue. this also might mean a wysiwyg editor could be possible, since the time to render is instant once the octree is solved. lots of exciting stuff.

@aeva whoa, this is cool! Do you have different sdfs at different levels of the octree?

@jonbro yes :D the root of the tree contains the entire model, which would be too slow to render compiled or otherwise. the octree splits to eliminate dead space, and as it does so each node removes the parts of the CSG tree that can't affect it, resulting in a simpler SDF
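the per-node culling step might look roughly like this (illustrative sketch; the bounding-radius test and all the names here are my assumptions, not tangerine's actual code):

```python
import math

def cull_union(operands, center, radius):
    """Drop union operands whose distance at the node center exceeds the
    node's bounding radius (e.g. half the AABB diagonal) -- they provably
    can't touch the node, so each split leaves a simpler SDF.
    Only conservative if each operand is a true (Lipschitz-1) bound."""
    return [sdf for sdf in operands if sdf(center) <= radius]
    # an empty result means the node is pure dead space

# hypothetical leaf shape: a sphere at `c` with radius `r`
def sphere(c, r):
    return lambda p: math.dist(p, c) - r
```

e.g. for a node near the origin, a sphere ten units away gets dropped from that node's copy of the tree.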

@aeva oh neat! storing aabbs for the sdf ops is clever.

@aeva I guess I can't think of an alternate way to approach it :D

it's really cool to see an end-to-end implementation of this.

@jonbro thank you :D also i wrote a blog post a while back about the general technique

@aeva awesome! I'm not sure I'm ready to revive my toy voxel sdf thingy, but these notes are gonna be my starting point if i do.

I gave up at the culling SDF ops stage, so I could never really have complex models :(

@jonbro so far this approach is working quite well for me. the main problem is the distance fields aren't exact after any set operators, so it can't cull as aggressively on the CPU as i would like it to. it also definitely needs clustered occlusion culling. I think this strat has promise though.
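a tiny numeric illustration of why the set operators break exactness (my own toy example, in 2D for brevity):

```python
import math

def d_circle(center, radius):
    # exact signed distance to a circle in 2D
    return lambda p: math.dist(p, center) - radius

A = d_circle((-1.0, 0.0), 1.2)
B = d_circle((1.0, 0.0), 1.2)
intersection = lambda p: max(A(p), B(p))  # standard CSG intersection

p = (0.0, 3.0)
field = intersection(p)
# the true nearest point of the lens-shaped intersection is the upper
# crossing of the two circles, at (0, sqrt(1.2**2 - 1.0))
tip = (0.0, math.sqrt(1.2 ** 2 - 1.0))
true_dist = math.dist(p, tip)
# field < true_dist: after max(), the field is only a lower bound on
# the real distance, so CPU culling has to be more conservative
```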

@aeva I don’t know if it works on any other platform but on macOS you could achieve async shader compilation by using shared contexts — each context could compile one shader at once. ‘Course, some of the compilation was still deferred to see the relevant state at draw time, so you tended to need to draw with each shader and the correct state bound, so it was a huge pain in the butt…

@OneSadCookie i had a go at the shared contexts approach, and it ended up causing a lot of mystery behavior, like long half-minute hangs in strange places like timing queries. the problem with shared contexts is that when you opt into them, the driver turns on a ton of hazard tracking and synchronization it normally doesn't have to do, and that path is not as well QA'd as the main stuff.

@OneSadCookie there's also an extension for parallel shader compiling that doesn't work properly on nvidia, and there's the shader binary trick where you compile in another process and then load the binary. a problem common to all of these is that opengl likes to recompile shaders it already compiled, both the first time you use them and every time you change pipeline state, for Reasons

@OneSadCookie i'm planning on jettisoning gl for vk because of this, but the going is slow because vk is supremely unpleasant to write for some reason.

@aeva ah yeah, that is all very sucks. And I haven’t written Vk myself but I’ve seen somebody’s setup code and some of the complexity reflected through WGPU, so I can understand the desire to stick with GL for a bit longer!

@aeva You could try compiling to SPIR-V in a separate thread, that way the driver only has to go from bytecode to machine code.

@lh0xfb how much of an undertaking is it to convert from gl/glsl to gl/spirv?

@lh0xfb also does SPIRV let you turn off optimizations?


I use shaderc:

It has several options, like optimization level, source language, and target environment (I'm using HLSL + vulkan).
It's shipped as part of the vulkan SDK, so I just LoadLibrary and load its functions to use from C.

There's also glslang:

Also dxc can compile hlsl to dxil or spir-v:


I am not 100% sure, but it looks like it would not be a huge undertaking if you are already using GL3.3 core.
There are some code snippets on the GL wiki linked above; changing the cpu-side code looks pretty trivial.
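For example, a plausible offline invocation via glslc (shaderc's command-line front end, also shipped in the Vulkan SDK); flags sketched from memory, so double-check `glslc --help`:

```shell
# compile a GLSL fragment shader to unoptimized SPIR-V targeting OpenGL;
# shader.frag / shader.spv are placeholder file names
glslc -O0 --target-env=opengl -fshader-stage=fragment shader.frag -o shader.spv
```

`-O0` answers the optimization question: it tells shaderc to skip the optimization passes entirely.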

@lh0xfb i'm targeting 4.2 with some extensions. i don't think i'm using anything particularly exotic

@aeva That should be fine; I think the main requirement is explicit attribute and binding locations, so that everything links together consistently.

@lh0xfb ah! i'm already doing that anyway since it makes the gl side simpler


Here's a site where someone shared their before / after of converting their GLSL shader over to work with SPIR-V:

I think it may also affect your cpu code with regard to relying on GL to perform reflection on your shader, e.g. to find the binding id of a uniform. With SPIR-V you'd instead need to know which binding id you want to update.

It's still a hell of a lot simpler than vulkan, where you have to also provide even more info about *everything* and do descriptor set allocation, writes, and lifetime / state management such that descriptors are not modified while the gpu is using them.

I do the laziest/simplest thing possible, where I only have one global descriptor set per frame, I update it once before any rendering is done, and reclaim it a couple frames later.
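That "reclaim a couple frames later" bookkeeping, sketched API-agnostically (FRAMES_IN_FLIGHT and all the names here are my illustration, not real Vulkan calls):

```python
from collections import deque

FRAMES_IN_FLIGHT = 2  # assumption: how many frames the GPU may still be reading

class FrameGarbage:
    """Resources retired during frame N are only destroyed once
    N + FRAMES_IN_FLIGHT frames have begun, so the GPU can't still
    be reading them when they're freed."""
    def __init__(self):
        self.pending = deque()  # (frame_retired, destroy_fn) in order
        self.frame = 0

    def retire(self, destroy_fn):
        self.pending.append((self.frame, destroy_fn))

    def begin_frame(self):
        self.frame += 1
        while self.pending and self.pending[0][0] + FRAMES_IN_FLIGHT <= self.frame:
            _, destroy = self.pending.popleft()
            destroy()
```

The same ring works for the per-frame descriptor set: write it once at the top of the frame, retire it, and it's recycled two frames later.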

@aeva could u… precompile them on a different machine

@mcc so tangerine is essentially a shader compiler, where the input is a csg graph and the output is a bag of shaders. most stuff generates a low number of permutations and the driver caches them. at worst it's usually a few seconds of waiting before the model renders and then everything is fast.

@mcc i've been planning on building an interpreted variant to run while compiling in the background, but async shader compiling on opengl is really hard to do so i've been putting it off. however i think in this use case i don't want to compile the shaders at all, so an interpreter would be perfect

@mcc it wouldn't be that hard to put together from what I have right now since all of the shape parameters are extracted and passed in via a buffer already, and I'd just have to come up with some op codes to pass in the same way

@aeva so I guess like… it sounded like your goal was to put it on a low power device. Why do you need to generate the cache on a low power device? Could you not generate the cache and the shader bag on the high power device and zap it over to the low power one at init time?

@aeva the fact you are planning to deploy exactly one unit makes several workflows viable that otherwise maybe would not be

@mcc so a quick example, these are the shaders that the compiler produces for the step pyramid vs the wheel-o-problems. they can't be produced ahead of time since they are specific to the models

@mcc since i was using llvmpipe on the potato, i could probably have a different machine with the same version of llvmpipe compile the final glsl and extract the binary and so on, but gl makes that kind of thing very brittle. i'm not confident it would work between two different OSes, and idk if it uses CPU-specific extensions when available

@mcc however, if i build an interpreter for the generated part then i can avoid the problem entirely
