[Cycles] WIP: Partial Implementation of OSL On the gpu.
Abandoned / Public

Authored by Ray Molenkamp (LazyDodo) on Jan 24 2016, 1:33 AM.

Details

Summary

This patch implements a subset of OSL on top of Cycles SVM.

Tested on:
-MSVC2013/X64
-CPU
-GPU Cuda/gtx670
-GPU OpenCL/gtx670

Current state:

  • This patch picks up the compiled oso file and implements a basic oso->svm compiler. External files are currently not supported. (I couldn't find a hook I could use in the osl compiler classes that would just give me the bytecode.)
  • Loops are fully supported (do/while, while, for)
  • Except for the color closure and matrix datatypes, all datatypes should be supported.
  • Given the small stack space we have for svm, string support is very limited. Strings are hashed during pre-process (the hash is lame right now and should really be replaced with crc32 or something better), so direct compares and assignments work, but none of the concatenation, substring, etc. operators do.
  • Colorspace conversion is limited. While hsv->rgb conversion works, the results are different between svm and osl: it seems like svm is clipping the input values while osl looks to do a modulus. I noticed the difference but haven't investigated the cause yet.
  • No matrix support whatsoever.
  • No closures. I don't understand well enough how they are implemented in cycles (or what they really are), so I left this for 'some other time' or 'someone smarter'.
  • Limited globals. P works; I'm fairly sure I implemented u/v wrong, given the results are quite different between osl and this implementation. All others are missing.
  • No texture support
  • No transformations
  • No debug printf support
  • Noise, cellnoise 3D works; everything else is either not implemented or gives different results compared to regular osl.
  • No Pointcloud support
  • The Math node has been extended significantly. While all new operations have been tested, I have not exposed them to the end user in the Node Editor UI: the drop-down in the Math node becomes really long/useless with this many items in there, so a better UI solution might be needed here. Other nodes (like integermath) are not exposed in the end user UI at all.
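
To illustrate the string limitation in the list above, here is a minimal Python sketch of hashing string constants to 32-bit integers during a pre-process pass. The patch itself is not Python and its actual hash is not shown here; `hash_string` and `preprocess_strings` are hypothetical names, and CRC32 is used only because the summary suggests it as a possible replacement.

```python
import zlib


def hash_string(s: str) -> int:
    """Map a string constant to an unsigned 32-bit value (CRC32 is an assumed choice)."""
    return zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF


def preprocess_strings(constants: dict) -> dict:
    """Hypothetical pre-process pass: replace every string constant by its hash."""
    return {name: hash_string(value) for name, value in constants.items()}


# Three string constants as they might appear in a shader.
constants = {"space_a": "rgb", "space_b": "rgb", "space_c": "hsv"}
hashed = preprocess_strings(constants)

# Assignment and direct comparison only need the fixed-size hash.
equal = hashed["space_a"] == hashed["space_b"]
not_equal = hashed["space_a"] != hashed["space_c"]
```

Because the hash is one-way and fixed in size, comparison and assignment survive the pass, while concatenation or substring would need the original characters, which no longer exist at run time.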

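The suspected hsv->rgb mismatch mentioned in the list (one side clipping the input, the other taking a modulus) can be sketched as follows. This is a generic HSV conversion written for illustration, not code from svm or osl; the `hue_mode` parameter is hypothetical, and which implementation does which is only the author's unconfirmed guess.

```python
import math


def hsv_to_rgb(h, s, v, hue_mode="wrap"):
    """Generic HSV to RGB conversion with two ways of handling out-of-range hue."""
    if hue_mode == "wrap":
        h = h - math.floor(h)          # modulus: 1.25 behaves like 0.25
    elif hue_mode == "clamp":
        h = min(max(h, 0.0), 1.0)      # clipping: 1.25 behaves like 1.0
    else:
        raise ValueError("hue_mode must be 'wrap' or 'clamp'")

    if s == 0.0:
        return (v, v, v)

    h6 = (h * 6.0) % 6.0               # h == 1.0 lands in the same sector as h == 0.0
    i = int(h6)                        # sector index, 0..5
    f = h6 - i                         # position inside the sector
    p = v * (1.0 - s)
    q = v * (1.0 - s * f)
    t = v * (1.0 - s * (1.0 - f))
    return [(v, t, p), (q, v, p), (p, v, t), (p, q, v), (t, p, v), (v, p, q)][i]


wrapped = hsv_to_rgb(1.25, 1.0, 1.0, hue_mode="wrap")    # (0.5, 1.0, 0.0)
clamped = hsv_to_rgb(1.25, 1.0, 1.0, hue_mode="clamp")   # (1.0, 0.0, 0.0)
```

For hue values inside [0, 1) both modes agree, so a difference of this kind would only show up when a shader feeds out-of-range hue into the conversion.
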
Performance:

Some benchmarks:
http://www.lazydodo.com/tmp/osl_gpu.png
http://www.lazydodo.com/tmp/gpu_osl3.jpg
http://www.lazydodo.com/tmp/gpu_osl2.png

When rendering 100% osl shaded pixels, osl is generally faster, but when osl is used for only a few elements in a scene, svm tends to be the faster one.

Overall conclusion:

If you're using osl for procedurals, this patch is fairly feature complete and will work wonderfully. More advanced use is currently not supported. Some things should be trivial to implement (transformations), others not as much (string support will be hard given the limited stack space available), and others will be near impossible (printf from a shader? yeahh, unlikely that'll happen).

The biggest risk is the oso format: it is not standardized or documented, so the osl people are free to completely change this format in the next revision and break this implementation.

Diff Detail

Repository
rB Blender

Event Timeline

Ray Molenkamp (LazyDodo) retitled this revision from to [Cycles] WIP: Partial Implementation of OSL On the gpu..
Ray Molenkamp (LazyDodo) updated this object.
Ray Molenkamp (LazyDodo) set the repository for this revision to rB Blender.

For me the biggest issue is the cost to maintain and support this, it adds a lot of visible and hidden code complexity.

This implements a small subset of OSL, and the parts that are missing are the most difficult to implement. And so the question is then how would this evolve? Would you leave it more or less at this subset, or would you actually target a full re-implementation of OSL in Cycles?

Because I'm just not sure if the limited developer time should be spent on doing that, handling bug fixes, documenting which subset works, keeping it updated as OSL evolves, having to keep this working when they want to make bigger shading system changes, etc. It's almost adding a third shading system next to SVM and OSL, and having two is already problematic.

As I explained before, I think this is not the ideal way to add OSL support on the GPU. Performance will not be great because SVM really isn't optimized to be used to execute such granular instructions, and by compiling OSO straight to SVM it's missing all the smart OSL optimizations.

So I'm wondering where you see this going. Right now it provides an easier way for programmers to create certain procedural patterns on the GPU. But a lot of such clever procedurals tend to be nice demos that are not used much by artists in practice, and performance of these scripts could be quite poor. So I feel like for something like this to be worth the cost, it would need to move beyond that, and I'm not sure how that would happen.

This is a proof of concept. I wondered if it could be done, and the answer as it stands is 'partially'. Where to go from here is really up to you guys, which is why I posted this WIP patch. If you go 'nah we'll never accept this' I'll go on my merry way and do something else; if you go 'we could see this work, if you improve X & Y & Z' I'd happily work on that too.

Perf wise I'm not overly worried. Sure, osl does some nice tricks behind the scenes such as unrolling loops and some constant folding (and I've seen some 'dead' instructions with no clear purpose in some oso's, which llvm will most likely get rid of), but I do feel that most of those optimizations are rather isa specific. (I contemplated compiling oso to ptx and trying to force feed that to cuda, but that path just contained too many unknowns to be comfortable with.) I do suspect that most artists will use osl on specific elements in their scene, just to get some functionality regular nodes do not expose, and from what I can tell, the svm implementation is pretty much always faster than rendering the whole scene in OSL.

While it seems like a lot of 'hidden' code / a third shading system, it's really just the oso->svm translation that is specific to this patch; everything else is exposed in regular svm nodes. The pretty much doubling of math operations in the math node? I'm fairly sure a regular nodes user could find a good use for them. More noise types? Regular nodes users would *love* them. (I think there's actually a ticket somewhere for implementing the gabor noise in svm; it doesn't look like someone picked it up yet though.) So even if we don't go through with an osl script node for svm, I think parts of the patch are very usable for end users.

My biggest worry about maintaining this is chasing after osl releases to keep this up to date. Unless the oso format gets formalized and documented by them, I have very little interest in making this patch 'officially supported', but I feel this is a discussion we should have with the osl people after we decide this is a direction we want to go in. (I'd hate for them to pour time into formalizing and documenting, only for us to go 'nah we decided not to do this'.)

I'd love a full implementation (minus the impossible bits like printf) and if that's hard, let it be hard! However given the time i already put in, I decided to get some feedback first to see if i am wasting my time here. So this is more of a 'Is this going to work for you guys?' than a 'here's a patch, it's your problem now!' kind of thing.

For the record: printf is quite doable for compute model 2.0 and above, which covers all the officially supported platforms. But good luck supporting attribute queries, ray tracing and such. Feature complete OSL on SVM isn't really doable in a nice and fast-to-render way.

As for the math node -- those operations are quite specific; I've never seen artists asking for such operations, actually.

Noise textures -- IMO we should only distribute a reasonable set of textures really commonly used by artists, leaving all the remaining cases to either node groups or OSL shaders. Surely one might raise a concern about OSL being slower than SVM, but that's the issue to be addressed.

Clearly not a direction we want to go in.

this would be cool .. coding via scriptnode is KISS-like for procedural-texture-fun ( i wish there could be cycles gpu-power and microdisplacement too :)