May 19, 2011
Sometimes… Working in the games industry SUCKS…
I just found out yesterday that the major NGP project we were working on has now been canned by Sony…
All the hard work we’ve been putting in over the past several months to get the game ready for its big E3 debut was in vain…
& even though I still have a job I can’t help but feel bad for all the talented young men & women who have been let go as a result of this…
I know this kind of thing happens all the time in the games industry but it just sucks so bad when it finally happens to you & you’re left wondering whether the complete lack of developer stability & job security really makes this worth it in the long term… I wanna have a family someday too…
All I can do is pray for all those who got let go, that they can hopefully find something better & (knowing how small this industry really is) rest on the knowledge that we’ll probably work together again sometime in the future…
Wish you all the best guys…
May 6, 2011
After a rather long hiatus I’ve decided to return once more to drop some new nuggets of learning I’ve picked up along my travels…
In this post I’m going to be dealing with the subject of modern graphics development & a technique that a lot of studios probably use but rarely explain in detail…
In modern day console hardware we generally find that dynamic branching sucks..
This generally wouldn’t be a problem except that when you’re trying to maintain a well structured and organised shader library for your game, should you choose to go the uber shader route, you generally need some form of branches or switches to determine which features of the shader are used by each model material.
In an ideal world we’d just stick a load of “if” statements in the shader code (like we do in most modern HLLs) and walk away, however the cost of these simple branches on a piece of hardware that’s built for maximising data throughput can easily cripple performance to the point where the approach becomes prohibitive…
So what’s the solution?
In short, static branch removal…
Now those of you who are familiar with uber shaders will know exactly what this is, however for the rest of you I’ll try to explain..
The aim is to mitigate the cost of dynamic branches in your uber shader by effectively spending memory to save computation time. This is done by simply compiling your shader multiple times, with each “permutation” defined by turning on or off various parts of/features in the code. These collections of shaders can then be used at run time, so when a material is set to be rendered it chooses the correct permutation it needs & runs that. This method keeps the code centralised as per the uber shader paradigm, whilst significantly simplifying the compiled code for materials that only use a subset of the uber shader’s featureset.
Sounds easy right?
Not so much in reality as it can take some careful planning and, depending on the shader asset pipeline for your game, may be much harder to engineer and integrate than initially perceived.
As far as the core implementation goes however, there are various approaches that can be taken in order to do this and since branch removal is performed as a pre-process before compilation, it can be done in any number of ways. The most flexible approach would probably be to set up a tool that parses the shader source and locates literal “if” statements that branch on any feature switch/bool shader constant of interest, generating duplicate copies of the source with and without the conditional code present. This is however a very complex task & not one I’d advise anyone to attempt as it would take a lot of work to get the parsing code robust enough to be useable in the real world.
The alternative however is to get the compiler to do it for you by replacing branches of interest in the code with specially defined macros which enable you to configure at compile time the code that is and isn’t present in the shader.
In general I’ve performed this step by doing the following:-
- In the shader or a shader header file the preprocessor checks if a macro to define branch removal macros (brms) is defined & if not, defines it, setting it to false (e.g. think #ifndef DEFINE_BRANCHREMOVALMACROS #define DEFINE_BRANCHREMOVALMACROS 0 etc..)
- If your brm’s define macro is set to false, define your brms to resolve to bog standard “if” statements (this allows an environment that’s not set up to deal with your branch removal system to still be able to compile and run your shader, useful if your artists are using it directly for visualising materials in Maya or FXComposer for example..) otherwise resolve them to a special case “if” statement where your condition is now a compiler input macro
- During branch removal the compiler is set up with the brm’s define macro passed in as a compiler macro define input, along with a list of branch condition input macros which resolve directly to either true or false depending on the current permutation you wish to generate
- The compiler is then run multiple times with each run modifying the values of the branch condition input macros in order to generate different code for each permutation. This works as each branch condition now resolves fully to either “if( true )” or “if( false )” by the compiler, which will then proceed to optimise out/in code, mitigating the generation of any branch instructions in the compiled binary
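As a rough illustration of the steps above, here's a minimal sketch of how the macros might look. It's written as C++ purely because the preprocessor behaves the same way as in Cg/HLSL; the names BRM_IF, SWITCH_USE_FOG, useFog and shade are all made up for the example and aren't from any real shader library.

```cpp
// Default the "define branch removal macros" flag to off, so tools that know
// nothing about the permutation build (Maya, FXComposer, ...) still compile this.
#ifndef DEFINE_BRANCHREMOVALMACROS
#define DEFINE_BRANCHREMOVALMACROS 0
#endif

#if DEFINE_BRANCHREMOVALMACROS
// Permutation build: the condition is a compiler input macro, which the build
// tool is assumed to define as true or false for each permutation.
#define BRM_IF(runtimeBool, permutationMacro) if (permutationMacro)
#else
// Fallback build: a bog standard "if" on the run-time bool.
#define BRM_IF(runtimeBool, permutationMacro) if (runtimeBool)
#endif

// Hypothetical stand-in for a piece of shader code with one optional feature.
float shade(float base, bool useFog) {
    float colour = base;
    BRM_IF(useFog, SWITCH_USE_FOG) {
        colour *= 0.5f; // the "fog" feature, only present when switched on
    }
    return colour;
}
```

With the flag left at 0 this behaves like the original dynamic branch. Compiling with DEFINE_BRANCHREMOVALMACROS=1 and SWITCH_USE_FOG=false turns the condition into “if( false )”, so the compiler can drop the fog code entirely.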
In order to do this your system still needs to be able to somehow analyse the shader source code and identify all the dynamic branch bools/switches used in the vertex and pixel programs as it’s these bools/switches you’ll be replacing with your macros when the time comes to compile. If your shaders are strictly Cg, NVIDIA provides a Cg API that allows you to quickly and conveniently interrogate your shaders in this way in order to retrieve the information needed.
Once you’ve compiled your list of switches you’ll need to categorise and tag your permutations so that you can identify what configuration of these bools each permutation represents. This will allow you to select the correct shader permutation at run-time for each material by matching the material’s switch configuration against the tag of the corresponding shader binary, probably via some sort of hashing setup.
In the implementation I did, I used a 16-bit bitmask to represent the hashkey for each configuration of the uber shader, where each bool was added into a switch-table (made up of a bit shift position and id pair) & each one represented a single bit in the mask. This meant that no uber shader could define any more than 16 switches, however given that the total number of permutations for any given shader program is 2^n (where n = number of switches), I figured trying to generate more than 2^16 or 65,536 possible permutations would be undesirable at best (both in terms of run-time memory overhead and compilation-time).
Also one of the benefits of using the 16-bit bitmask key was that the algorithm for generating the switch values for compiling the shader programs was simple: I could just iterate from 0 up to (but not including) the total number of configurations of my shader, with each iteration taking the iterator index value, treating it as a bit mask and using its bits to set the value of each branch condition input macro (e.g. MYMACROSWITCH=true or false).
Once you’ve got this far the only thing left to do is re-wire your run-time to load each group of permutations in place of the old, single shader. During render setup, as each material sets the values of the bools to configure the shader for use, the system internally takes each bool, searches the switch-table for the corresponding bit in the current active config mask and sets the bit to the value of the bool for the material. Then when it comes to render with the material, the current active config mask is used to index into the hash structure of shader program permutations to select the correct one for use…
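To make the switch-table & iteration idea a bit more concrete, here's a small C++ sketch of the build-tool side. It's not the code I actually shipped; SwitchEntry, definesForMask, allMasks and the switch names are hypothetical, and the “NAME=value” strings are just assumed to be whatever form your shader compiler accepts for macro inputs.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One row of the switch-table: a macro id paired with its bit position.
struct SwitchEntry {
    std::string name; // e.g. "SWITCH_USE_FOG" (made-up name)
    unsigned bit;     // bit shift position in the 16-bit config mask
};

// Builds the macro define inputs for a single permutation mask.
std::vector<std::string> definesForMask(const std::vector<SwitchEntry>& table,
                                        std::uint16_t mask) {
    std::vector<std::string> defines;
    defines.push_back("DEFINE_BRANCHREMOVALMACROS=1");
    for (const SwitchEntry& entry : table) {
        const bool on = ((mask >> entry.bit) & 1u) != 0;
        defines.push_back(entry.name + (on ? "=true" : "=false"));
    }
    return defines;
}

// Enumerates every config mask: 0 up to (but not including) 2^switchCount.
// Assumes switchCount <= 16 so each index fits in the 16-bit key.
std::vector<std::uint16_t> allMasks(std::size_t switchCount) {
    std::vector<std::uint16_t> masks;
    const std::uint32_t count = 1u << switchCount;
    for (std::uint32_t i = 0; i < count; ++i) {
        masks.push_back(static_cast<std::uint16_t>(i));
    }
    return masks;
}
```

The build tool would then run the compiler once per mask from allMasks, passing in definesForMask(table, mask) & storing the resulting binary under that mask as its key.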
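The run-time side could be sketched along these lines. Again this is only an illustration under my own assumptions: PermutationSet and its methods are invented names, the program handles are plain ints standing in for whatever your renderer uses, and a std::unordered_map plays the part of the hash structure.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Holds all the permutations of one uber shader plus the active config mask.
class PermutationSet {
public:
    // Adds a bool to the switch-table at the given bit position.
    void registerSwitch(const std::string& name, unsigned bit) {
        switchBits_[name] = bit;
    }

    // Stores a compiled permutation under its 16-bit config mask.
    void addProgram(std::uint16_t mask, int programHandle) {
        programs_[mask] = programHandle;
    }

    // Called as a material sets each bool during render setup.
    void setSwitch(const std::string& name, bool value) {
        const auto it = switchBits_.find(name);
        if (it == switchBits_.end()) {
            return; // unknown switch: silently ignored in this sketch
        }
        const std::uint16_t bit = static_cast<std::uint16_t>(1u << it->second);
        if (value) {
            activeMask_ |= bit;
        } else {
            activeMask_ &= static_cast<std::uint16_t>(~bit);
        }
    }

    std::uint16_t activeMask() const { return activeMask_; }

    // Picks the permutation matching the active mask, or -1 if none was built.
    int selectProgram() const {
        const auto it = programs_.find(activeMask_);
        return it == programs_.end() ? -1 : it->second;
    }

private:
    std::unordered_map<std::string, unsigned> switchBits_;
    std::unordered_map<std::uint16_t, int> programs_;
    std::uint16_t activeMask_ = 0;
};
```

In a real engine you’d probably resolve the switch names to bit positions once at load time rather than doing a string lookup per set, but the mask-as-key idea stays the same.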
That’s about it folks!
Not the most straight-forward system to engineer, however a little patience, thought and proper code design can go a long way towards helping you put something together that’s both robust and scalable.
Other factors to note are generally related to the memory-footprint of the system: with so many permutations of each shader, it’s easy to run out of memory if you’re not careful. There are steps that can be taken to reduce the memory load, however these require a much more analytical approach as they’re generally much more specific to the needs of your team/game/engine (e.g. stripping permutations of shaders you know will never be used, either because they don’t logically make sense given the semantics of the switches themselves, or because they’re known to never get assigned for use at run-time, if material parameters never change for example…)
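As one possible example of stripping permutations that don’t make logical sense, here’s a C++ sketch that filters the config masks using simple “switch A only makes sense if switch B is on” rules. The rule format is entirely my own assumption for illustration (DependencyRule, isValidMask & validMasks are hypothetical names); your own switches may need very different rules.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical rule: the dependent switch is meaningless unless the required
// switch is also on (e.g. a detail feature that builds on a base feature).
struct DependencyRule {
    unsigned dependentBit;
    unsigned requiredBit;
};

// Returns false if the mask turns on a dependent switch without its requirement.
bool isValidMask(std::uint16_t mask, const std::vector<DependencyRule>& rules) {
    for (const DependencyRule& rule : rules) {
        const bool dependentOn = ((mask >> rule.dependentBit) & 1u) != 0;
        const bool requiredOn = ((mask >> rule.requiredBit) & 1u) != 0;
        if (dependentOn && !requiredOn) {
            return false;
        }
    }
    return true;
}

// Enumerates only the masks worth compiling. Assumes switchCount <= 16.
std::vector<std::uint16_t> validMasks(unsigned switchCount,
                                      const std::vector<DependencyRule>& rules) {
    std::vector<std::uint16_t> kept;
    const std::uint32_t count = 1u << switchCount;
    for (std::uint32_t i = 0; i < count; ++i) {
        const std::uint16_t mask = static_cast<std::uint16_t>(i);
        if (isValidMask(mask, rules)) {
            kept.push_back(mask);
        }
    }
    return kept;
}
```

With two switches & a single rule saying bit 1 needs bit 0, this keeps three of the four masks, and the saving grows quickly as more switches and rules are added.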
For more details on this however you’ll probably want to check out an awesome tri-ace GDC presentation [here].
That’s all from me for now…
Hope this post has been useful and insightful…
More to come in future
- The Hog
March 12, 2010
Sooo I made it to GDC this year…
First time out in San Fran too which is a great bonus. After arriving on Saturday and spending a couple of days recovering from jet lag & getting accustomed to the surroundings we finally made it to the main conference sessions today. Had a couple of seminars; one from Blizzard on performance & optimizations, one from Epic on building (or rather decorating) procedural buildings & a couple on asset management systems (which were reassuring in that they kinda validated the way we’re handling it at Curve). Tried to make it to the Sony motion controller (i.e. PlayStation Move) seminar but it was packed (seriously! waaaay too many students this year…) but we did manage to have a wander around the expo & check out some cool stuff from:-
- God of War III (looked amazing)
- Live demonstrations of the PlayStation Move technology (seems very accurate & responsive which is encouraging)
- NVIDIA & Sony 3D stereoscopic games (tech works quite well…)
- etc etc
All in all a rather packed out day & more to come tomorrow & Saturday…
Looking forward to it…
January 15, 2010
So I started working on string management & support for my iPhone app & reached that oh so familiar point where I realised “hey! wouldn’t it be great if I could debug wide-char/UTF-16 strings in Xcode!”..
January 2, 2010
After a long break from blogging I figured I’d get back into the habit again..
& what better way to do so than beginning a new series on iPhone development. Over the next few months I’ll try to keep the blog up to date with useful tips & tricks I’ve picked up along the way. Hope it helps someone out!!
June 22, 2009
After a looooong break I feel it’s time to drop back into blogging. Especially since I’ve been getting a fair amount of unexpected traffic of late (I didn’t think anyone actually read this blog! thanks guys!!).
The days have been pretty busy of late, having been moved into the core engine team at my current studio and spending a vast amount of time learning what it takes to make a next-gen multi-platform engine tick and writing everything from particle editors to Flash native rendering systems and more.
It is however time to get back into imparting some of the things I’ve learned & so stay tuned for some more (hopefully) useful tutorials soon..
- The Hog
August 11, 2008
During my travels of programming video games for a living, I take great pleasure in the fulfillment of learning, exploring & experimenting with algorithms & the intricacies of programming languages every day. Being a gameplay programmer I tend to find that I make regular use of the modulus operator & it seems to be one of those little tools that most people overlook but can be really useful for a variety of problems. So I thought it might be a good idea to write a little article about my experiences with lil’ old ‘%’ & hopefully it may encourage others to share good practical uses for it too & teach me a thing or two.
Before we begin I’d just like to note that I’m going to be using C# for all my examples, considering this blog is generally C#/XNA-centric (however I’m pretty sure the same uses of the modulus operator can be applied in C/C++ & most other languages)..