Thursday, February 3, 2011

Syntax highlighting C# and GLSL source code with LaTeX and the 'Listings' package


We wanted to syntax highlight the source code listings in our book in order to make them easier to read. Unfortunately, the languages used for most of the listings, C# and GLSL, are not supported out of the box by the LaTeX Listings package. What to do?

Well, C# and GLSL both have roots in the C language, so we started out tagging our code listings as C++ in order to get some minimal highlighting. The results of that approach aren't very good. Keywords that are shared with C++, like float and return, are nicely highlighted, but GLSL-specific keywords like vec3 are not. And wouldn't it be nice if built-in GLSL functions like sqrt and cos were highlighted, like they are in NShader?

I should mention before I get too far that I'm far from a LaTeX expert.
What I'm about to describe worked for me, but please let me know if I'm doing something odd.

C# and GLSL language definitions

The Listings LaTeX package makes it fairly easy to define new languages. Several language definitions for C# can be found around the web. Here's ours:
\lstdefinelanguage{CSharp}
{
sensitive=true,
morekeywords=[1]{
abstract, as, base, break, case,
catch, checked, class, const, continue,
default, delegate, do, else, enum,
event, explicit, extern, false,
finally, fixed, for, foreach, goto, if,
implicit, in, interface, internal, is,
lock, namespace, new, null, operator,
out, override, params, private,
protected, public, readonly, ref,
return, sealed, sizeof, stackalloc,
static, struct, switch, this, throw,
true, try, typeof, unchecked, unsafe,
using, virtual, volatile, while, bool,
byte, char, decimal, double, float,
int, long, object, sbyte, short, string,
uint, ulong, ushort, void},
morecomment=[l]{//},
morecomment=[s]{/*}{*/},
morecomment=[l][keywordstyle4]{\#},
morestring=[b]",
morestring=[b]',
}
Surprisingly, we weren't able to find a language definition for GLSL. It was pretty easy to put one together based on the GLSL spec, though:
\lstdefinelanguage{GLSL}
{
sensitive=true,
morekeywords=[1]{
attribute, const, uniform, varying,
layout, centroid, flat, smooth,
noperspective, break, continue, do,
for, while, switch, case, default, if,
else, in, out, inout, float, int, void,
bool, true, false, invariant, discard,
return, mat2, mat3, mat4, mat2x2, mat2x3,
mat2x4, mat3x2, mat3x3, mat3x4, mat4x2,
mat4x3, mat4x4, vec2, vec3, vec4, ivec2,
ivec3, ivec4, bvec2, bvec3, bvec4, uint,
uvec2, uvec3, uvec4, lowp, mediump, highp,
precision, sampler1D, sampler2D, sampler3D,
samplerCube, sampler1DShadow,
sampler2DShadow, samplerCubeShadow,
sampler1DArray, sampler2DArray,
sampler1DArrayShadow, sampler2DArrayShadow,
isampler1D, isampler2D, isampler3D,
isamplerCube, isampler1DArray,
isampler2DArray, usampler1D, usampler2D,
usampler3D, usamplerCube, usampler1DArray,
usampler2DArray, sampler2DRect,
sampler2DRectShadow, isampler2DRect,
usampler2DRect, samplerBuffer,
isamplerBuffer, usamplerBuffer, sampler2DMS,
isampler2DMS, usampler2DMS,
sampler2DMSArray, isampler2DMSArray,
usampler2DMSArray, struct},
morekeywords=[2]{
radians,degrees,sin,cos,tan,asin,acos,atan,
sinh,cosh,tanh,asinh,acosh,atanh,pow,
exp,log,exp2,log2,sqrt,inversesqrt,abs,sign,
floor,trunc,round,roundEven,ceil,fract,mod,modf,
min,max,clamp,mix,step,smoothstep,isnan,isinf,
floatBitsToInt,floatBitsToUint,intBitsToFloat,
uintBitsToFloat,length,distance,dot,cross,
normalize,faceforward,reflect,refract,
matrixCompMult,outerProduct,transpose,
determinant,inverse,lessThan,lessThanEqual,
greaterThan,greaterThanEqual,equal,notEqual,
any,all,not,textureSize,texture,textureProj,
textureLod,textureOffset,texelFetch,
texelFetchOffset,textureProjOffset,
textureLodOffset,textureProjLod,
textureProjLodOffset,textureGrad,
textureGradOffset,textureProjGrad,
textureProjGradOffset,texture1D,texture1DProj,
texture1DProjLod,texture2D,texture2DProj,
texture2DLod,texture2DProjLod,texture3D,
texture3DProj,texture3DLod,texture3DProjLod,
textureCube,textureCubeLod,shadow1D,shadow2D,
shadow1DProj,shadow2DProj,shadow1DLod,
shadow2DLod,shadow1DProjLod,shadow2DProjLod,
dFdx,dFdy,fwidth,noise1,noise2,noise3,noise4,
EmitVertex,EndPrimitive},
morekeywords=[3]{
gl_VertexID,gl_InstanceID,gl_Position,
gl_PointSize,gl_ClipDistance,gl_PerVertex,
gl_Layer,gl_ClipVertex,gl_FragCoord,
gl_FrontFacing,gl_FragColor,
gl_FragData,gl_MaxDrawBuffers,gl_FragDepth,
gl_PointCoord,gl_PrimitiveID,
gl_MaxVertexAttribs,gl_MaxVertexUniformComponents,
gl_MaxVaryingFloats,gl_MaxVaryingComponents,
gl_MaxVertexOutputComponents,
gl_MaxGeometryInputComponents,
gl_MaxGeometryOutputComponents,
gl_MaxFragmentInputComponents,
gl_MaxVertexTextureImageUnits,
gl_MaxCombinedTextureImageUnits,
gl_MaxTextureImageUnits,
gl_MaxFragmentUniformComponents,
gl_MaxClipDistances,
gl_MaxGeometryTextureImageUnits,
gl_MaxGeometryOutputVertices,
gl_MaxGeometryTotalOutputComponents,
gl_MaxGeometryUniformComponents,
gl_MaxGeometryVaryingComponents,gl_DepthRange},
morecomment=[l]{//},
morecomment=[s]{/*}{*/},
morecomment=[l][keywordstyle4]{\#},
}
GLSL has a ton of keywords, especially since we've included the built-in functions and variables in separate keyword lists so that they can be highlighted separately.

Coloring the source code

Now that the languages are defined, we can specify how they are highlighted. The color scheme presented here is loosely based on Visual Studio and NShader:
\lstset{
backgroundcolor=\color[rgb]{0.95, 0.95, 0.95},
tabsize=2,
rulecolor=,
basicstyle=\scriptsize,
upquote=true,
aboveskip={1.5\baselineskip},
columns=fixed,
showstringspaces=false,
extendedchars=true,
breaklines=true,
prebreak = \raisebox{0ex}[0ex][0ex]{\ensuremath{\hookleftarrow}},
frame=single,
showtabs=false,
showspaces=false,
identifierstyle=\ttfamily,
keywordstyle=\color[rgb]{1.0,0,0},
keywordstyle=[1]\color[rgb]{0,0,0.75},
keywordstyle=[2]\color[rgb]{0.5,0.0,0.0},
keywordstyle=[3]\color[rgb]{0.127,0.427,0.514},
keywordstyle=[4]\color[rgb]{0.4,0.4,0.4},
commentstyle=\color[rgb]{0.133,0.545,0.133},
stringstyle=\color[rgb]{0.639,0.082,0.082},
}
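For settings like these to compile, the preamble needs a few packages loaded before \lstset. The following is a minimal sketch, not our exact book setup: the article class and the built-in C language are assumptions chosen only to keep the example self-contained. \color comes from the color package (xcolor works too), and the Listings documentation notes that upquote=true needs textcomp.

```latex
\documentclass{article}   % assumption: any standard class should do

\usepackage{listings}     % provides \lstset, \lstdefinelanguage, lstlisting
\usepackage{color}        % provides \color[rgb]{...}; xcolor also works
\usepackage{textcomp}     % required by upquote=true

% The \lstdefinelanguage{CSharp}{...} and \lstdefinelanguage{GLSL}{...}
% definitions would go here, before \lstset.

\lstset{
  upquote=true,
  commentstyle=\color[rgb]{0.133,0.545,0.133},
}

\begin{document}
% language=C ships with Listings, so this compiles without custom definitions.
\begin{lstlisting}[language=C]
// A comment, rendered in green
int main(void) { return 0; }
\end{lstlisting}
\end{document}
```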
I was surprised to learn that the Listings package does not allow styles to be defined per language; a single style is applied to all languages. This is not as limiting as it first appears, however, because we can define multiple groups of keywords and specify a style for each group individually.
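That said, if different styling per language is really needed, the package's \lstdefinestyle command may offer a way to approximate it: a named style is defined once and then selected alongside the language with the style key. This is only a sketch; the style names glslstyle and csharpstyle are hypothetical, the colors are reused from the \lstset above, and it assumes the language definitions and packages from earlier are already in the preamble.

```latex
% Hypothetical style names; each one overrides only the keyword groups
% that should look different for that language.
\lstdefinestyle{glslstyle}{
  keywordstyle=[2]\color[rgb]{0.5,0.0,0.0},
  keywordstyle=[3]\color[rgb]{0.127,0.427,0.514},
}
\lstdefinestyle{csharpstyle}{
  keywordstyle=[3]\color[rgb]{0.127,0.427,0.514},
}

% In the document body: pick the language and its style together.
\begin{lstlisting}[language=GLSL,style=glslstyle]
gl_Position = vec4(0.0, 0.0, 0.0, 1.0);
\end{lstlisting}
```

The style has to be named on each listing (or set with \lstset before a run of listings in the same language), so this is a convention rather than a true per-language binding.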

With the languages and style defined, we can write LaTeX code like this:
\begin{lstlisting}[language=GLSL]
// Vertex shader
in vec4 position;
uniform mat4 og_modelViewPerspectiveMatrix;

void main()
{
gl_Position = og_modelViewPerspectiveMatrix * position;
}

// Fragment shader
out vec3 fragmentColor;
void main() { fragmentColor = vec3(0.0, 0.0, 0.0); }
\end{lstlisting}
To generate a nice listing like this:


Highlighting type names

You may have noticed that the Visual Studio editor highlights the names of classes, structs, interfaces, and enums. How can we do that with LaTeX? LaTeX, of course, has no idea which identifiers in our source listings are class names and which are the names of variables, methods, etc., so we have to tell it. We can do that by specifying, for each listing, the identifiers that belong to a keyword group:
\begin{lstlisting}[language=CSharp,classoffset=2,morekeywords={GL,EnableCap}]
GL.Enable(EnableCap.DepthTest);
// ... Set other states
Render();
\end{lstlisting}
Which produces the following:



Notice that GL and EnableCap are highlighted because they have been explicitly listed, using classoffset and morekeywords, as belonging to keyword group 3 (an offset of 2 from the base group, 1).
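An alternative sketch uses the package's emph and emphstyle keys, which mark identifiers for emphasis independently of the keyword groups. The color below is simply the group 3 color from our \lstset, reused as an assumption so the result looks similar; whether this is preferable to classoffset is a matter of taste.

```latex
% Set the emphasis style once, globally. Keeping \color[rgb] out of the
% listing's optional argument avoids the inner ] ending that argument early.
\lstset{emphstyle=\color[rgb]{0.127,0.427,0.514}}

% Then name the type identifiers per listing.
\begin{lstlisting}[language=CSharp,emph={GL,EnableCap}]
GL.Enable(EnableCap.DepthTest);
// ... Set other states
Render();
\end{lstlisting}
```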

Limitations

The Listings package has a few limitations that we were unable to work around:
  • Highlighting is entirely keyword based and cannot take context into account. It's fairly common in C# to have a single identifier that is a class name in one context and a property name in another context. If both usages of the identifier occur in a single listing, there does not appear to be any way to highlight the identifier only where it refers to a class name. Either it's highlighted everywhere, or nowhere. Update 2011/05/27: I found a way to do this after all. See below.
  • The Listings package allows us to specify a keyword prefix (the keywordsprefix key) so that all identifiers starting with a given character sequence are highlighted. Unfortunately, only a single prefix can be specified. It would have been helpful to highlight all identifiers in GLSL listings that start with gl_ or og_.
  • There does not appear to be a way to highlight numeric literals (such as 1.23) or operators (such as ==).
If you know a solution to any of these problems, please let me know!
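For operators, one approach that may help is the package's literate key, which replaces a character sequence with arbitrary LaTeX output. The sketch below is an untested assumption rather than something used in the book: the purple color is arbitrary, and only == and != are covered. Numeric literals are trickier, since digits also appear inside identifiers such as vec3.

```latex
\lstset{
  % The star asks Listings not to apply the replacements inside
  % strings and comments. Each entry is {sequence}{output}{width}.
  literate=*{==}{{{\color[rgb]{0.5,0,0.5}==}}}2
            {!=}{{{\color[rgb]{0.5,0,0.5}!=}}}2,
}
```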

Update 2011/05/27

I wrote above that I was unable to find a way to highlight some occurrences of an identifier but not others within a single listing. It turns out there is a way after all. The trick is to insert a do-nothing "escape to LaTeX" somewhere in the middle of the occurrence of the identifier that you don't want highlighted.

First, define an escapechar. It can be any character you want, so long as the character will not otherwise occur in your listings. Here I've used backtick (`) as my escapechar:
\lstset{
...
escapechar=`,
...
}
Then write your listing like this:
\begin{lstlisting}[language=CSharp, caption={RenderState Properties.},classoffset=2,morekeywords={RenderState,PrimitiveRestart,FacetCulling,RasterizationMode,ScissorTest,StencilTest,DepthTest,DepthRange,Blending,ColorMask}]
public class RenderState
{
public PrimitiveRestart P``rimitiveRestart { get; set; }
public FacetCulling F``acetCulling { get; set; }
public RasterizationMode R``asterizationMode { get; set; }
public ScissorTest S``cissorTest { get; set; }
public StencilTest S``tencilTest { get; set; }
public DepthTest D``epthTest { get; set; }
public DepthRange D``epthRange { get; set; }
public Blending B``lending { get; set; }
public ColorMask C``olorMask { get; set; }
public bool DepthMask { get; set; }
}
\end{lstlisting}
Notice the double backticks (``) in the names of the properties. The first backtick escapes to LaTeX mode, and the second returns to listing mode. You could include LaTeX commands between the backticks, but there's no need; the interruption alone is enough to cause the identifier to not be highlighted. LaTeX will render the listing above like this: