Tuesday, July 24, 2012

OpenGL Insights

Our new book, OpenGL Insights, will be available at SIGGRAPH and will ship from Amazon and elsewhere in early August.

Christophe Riccio and I served as editors, and worked with 52 authors and 17 technical reviewers from the OpenGL, OpenGL ES, and WebGL communities to create 44 chapters and 712 pages on rendering techniques, performance, debugging, profiling, software design, teaching, and other topics.

The OpenGL Insights website is now up.  It contains:
  • Five sample chapters
    • Octree-Based Sparse Voxelization Using the GPU Hardware Rasterizer by Cyril Crassin and Simon Green
    • Performance Tuning for Tile-Based Architectures by Bruce Merry
    • Asynchronous Buffer Transfers by Ladislav Hrabcak and Arnaud Masserann
    • ARB_debug_output: A Helping Hand for Desperate Developers by António Ramires Fernandes and Bruno Oliveira
    • The ANGLE Project: Implementing OpenGL ES 2.0 on Direct3D by Daniel Koch and Nicolas Capens
  • OpenGL 4.2 and OpenGL ES 2.0 Pipeline Map
    • This was Christophe's idea, and his amazing work created the most detailed pipeline diagram that I've ever seen.  I love it.  I have a copy on my cube wall and plan to hand it out the first day of class in my GPU course.  A two-sided 14x18 inch detachable color version is also included with the book.
  • Tips
    • The book includes short OpenGL tips, for example: "Depth writes only occur if GL_DEPTH_TEST is enabled."  The website includes all the tips in the book, and we plan to continuously update it, so please send your tips to editors@openglinsights.com.

Big thanks to Omar Rodriguez, coauthor of the Browser Graphics Analysis and Optimizations chapter, for his help with the website.

When I look back at the original call for authors, I am delighted that the final book lives up to the high expectations and broad scope that we initially set.  We received an overwhelming number of chapter proposals, especially considering OpenGL Insights was an unestablished, new series.  Authors put in significant effort, sometimes reworking parts several times in feedback loops with reviewers.  All chapters received feedback from multiple reviewers; some chapters had as many as six.  The reviewers' technical knowledge was humbling.

A huge part of this project's success is due to Christophe.  When I initially emailed Christophe in March 2011 to ask him to join me in this effort, I said: this project needs your enthusiasm; it will not be the same quality without you.  Quite frankly, that was an understatement.  Christophe's passion, drive, vision, belief in community, and OpenGL knowledge are amazing.  Working so closely with him was a rewarding experience.

Monday, July 2, 2012

Reflections on Teaching GPU Programming and Architecture: Take II

I recently wrapped up teaching CIS 565: GPU Programming and Architecture at the University of Pennsylvania. The first time I taught it, I wrote a reflections post on my lessons learned. Here are my reflections on my second time through.

But first a brief course history... This course was first offered in 2005 by Suresh Venkatasubramanian, and focused on Cg for GPGPU. In 2007, Gary Katz and Joe Kider immediately migrated to CUDA when it became available in the middle of the semester. I was a student at the time, and remember our bitonic sort homework getting canceled because the solution just came out in the CUDA SDK. Joe taught the course several times, and suggested I teach it starting last spring, 2011. Over the years, we added quite a bit of rendering, making the course a combination of GPU computing and real-time rendering.

This past semester some things went well, and others did not.

1. Exploit parallelism in the final.

GPU Accelerated Path Tracing - Xing Du
My first semester, I had each student create a practice final that I crowdsourced to create a 100% student-made final. It was great fun and had great results.

To mix it up this semester, I crowdsourced questions on each homework to create a 100% student-made take-home final. I took it a step further by also having an in-class final that I made. I brought it into class and said "you probably realized that you made the take-home final, but I made the in-class final myself. However, there is only one copy. Design a parallel algorithm to take it."

Given that much of the course is on exploiting parallelism and parallel algorithms, e.g., reductions, scans, stream compaction, sorting, and searching, I thought the parallelism theme for the final was appropriate and fun.
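For readers who have not seen these algorithms, here is a minimal sketch of two of them, scan and stream compaction, in plain Python. It is not code from the course: the function names are made up for illustration, and the "parallel" steps are emulated sequentially, with each pass of the loop standing in for one step that a GPU would run across all elements at once. The scan follows the well-known Hillis-Steele pattern, and the compaction uses an exclusive scan of keep/discard flags to compute where each surviving element should be written.

```python
from typing import Callable, List, Sequence


def inclusive_scan(values: Sequence[int]) -> List[int]:
    """Hillis-Steele style inclusive prefix sum, emulated sequentially.

    Each pass of the while loop stands in for one parallel step. The list
    comprehension reads only from the previous pass, the way a
    double-buffered GPU kernel would.
    """
    current = list(values)
    offset = 1
    while offset < len(current):
        current = [
            current[i] + current[i - offset] if i >= offset else current[i]
            for i in range(len(current))
        ]
        offset *= 2
    return current


def exclusive_scan(values: Sequence[int]) -> List[int]:
    """Shift the inclusive scan right by one and start from zero."""
    inclusive = inclusive_scan(values)
    return ([0] + inclusive[:-1])[: len(values)]


def compact(values: Sequence[int], keep: Callable[[int], bool]) -> List[int]:
    """Stream compaction: keep only elements passing `keep`, in order.

    The exclusive scan of the 0/1 flags gives each kept element its output
    index, so the final scatter could run with one thread per element.
    """
    if not values:
        return []
    flags = [1 if keep(v) else 0 for v in values]
    indices = exclusive_scan(flags)
    count = indices[-1] + flags[-1]
    out = [0] * count
    for i, v in enumerate(values):
        if flags[i]:
            out[indices[i]] = v
    return out


if __name__ == "__main__":
    data = [3, 1, 7, 0, 4, 1, 6, 3]
    print(inclusive_scan(data))
    print(compact(data, lambda v: v > 2))
```

This version does O(n log n) additions; work-efficient scans do O(n), at the cost of a more involved two-phase structure.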

Instead of using the students' parallel algorithm of ripping the final into pieces (I'm surprised they didn't want to photocopy it), I gave the exam verbally, and had each student work on the same question in parallel. Then, acting as the scheduling hardware, I picked a student to answer. If they got it wrong, the value of the question dropped to emulate a synchronization penalty, and the rest of the class was allowed to synchronize and come up with an answer. Although the class as a whole was a parallel system, they weren't quite a GPU. That's OK - the difference between the two was actually a question.

The in-class final was a ton of fun, created a lot of energy, and hopefully left a lasting impression with the students, which was my real goal. I also received positive feedback on the take-home final; students liked going back through the material - which is the exact point of a final!

2. Everyone should use GitHub in their courses.

Ray Marching Distance Fields in Real-Time in WebGL - Nop Jiarathanakul
I believe students should learn source control as early as possible. Given how much momentum Git and GitHub have (GitHub hosted 2.6 million projects the last time I checked), my students were required to use GitHub for their homework and projects. GitHub kindly provides free educational accounts with private repos.

Although it required some initial ramp-up, by the end of the semester the students really liked using GitHub, and it became a habit for many even after the course. Besides exposing students to source control, GitHub also made it easy to deploy fixes to our starter code and to host the course website backed by a Git repo, which is much better than manual FTP.

In addition to the private repos used for homework, every student made their final project open source, and some continue to develop their projects. Given that GitHub is becoming the new resume, i.e., show me code not paper, next semester all coding will be done in public repos so students can start building a code portfolio. Of course, this creates potential academic integrity issues, but we can combat them with follow-up presentations and by structuring projects as "implement x of n features."

3. The second-lecture effect.

Real-Time Reflections and Refractions - Ian Lilley
Software developers are familiar with the second-system effect, which states that developers tend to over-engineer their second system. This semester, I experienced something I call the second-lecture effect, which is the opposite of the second-system effect: one's second lecture on the same topic is under-delivered.

For me, creating a new lecture took 12-15 hours; however, updating and prepping the same lecture a year later took 0-6 hours (0 for something I do every day, and 6 for something I didn't touch all year). I, of course, expected the prep time to go down, but I didn't expect the quality of the lecture to go down too. Well, it did.

When I first create a lecture, the research is fresh in my mind, the topic is sometimes new to me, and the series of motivating questions to ask comes naturally. My interest and excitement come through in the lecture. However, the second time through is old news, and it is evident because the lecture is shorter, has less depth, and lacks the same energy.

We need to fight the second-lecture effect.

(OK, my lectures weren't that bad this semester, but they were not as good as the first time I taught the course.)

4. Publicly available lectures have more benefits than drawbacks.

GPU-Accelerated Logo Detection - Yu Luo
When I first taught the course, I recorded every lecture, and made the wma files available upon request. Since it was my first time teaching, I wasn't sure if I was polished enough to post them publicly.

This semester, I made them available for the whole world. The benefits were:
  • More students used them!
  • I can point anyone interested in GPUs or rendering to them.
  • Giving back to the larger community makes me feel good.
However, there were some drawbacks:
  • Knowing that the audio will be posted tones me down: I am less likely to complain about the latest driver bug, tell some stories, or name names.
  • I'm not always right. For example, at one point, I said shared memory access takes the same number of cycles as register access on the G80, which is wrong. Shared memory access takes two cycles (assuming no bank conflicts), and register access takes one. Although I correct these mistakes later, they are still embarrassing.

5. I don't scale well.

Procedurally Generating an Infinite City - Alice Yang
This semester I went from 12 to 20 students. It was still a small class, but it was also a 67% size increase for the one instructor and TA. This didn't increase lecture prep time, but it increased everything else. For example, I went from 550 to 740 Gmail conversations over the semester, and had more student presentations and projects to mentor.

Given that I also started the semester beyond burnt out from OpenGL Insights (should be well worth it though), I was spread too thin.

Having more group projects, instead of individual ones, would allow me to spend more time on each project. We actually experimented with pair programming for an image processing assignment, followed by an in-class quiz, with pretty positive results. The idea was to foster collaboration but still ensure that each individual learned the material; having fewer assignments to grade was a side effect.

6. Student projects blew me away.

GPU-Accelerated Simplified General Perturbation No. 4 (SGP4) Model - Matthew Ahn
The final projects were outstanding. Screenshots of many are shown throughout this post. Many received external attention on Twitter and blogs.

They reminded me that this course is about projects, and inspired me to really focus on them next semester. In fact, I am dropping written homework - with the exception of performance analysis - to focus on more projects. I am also re-branding homeworks as projects, and the project as the final project. No one wants to do homework, but everyone wants to do projects.

7. Grades measure rigor; rigor isn't everything.

GPU-Based Physically-Based Photorealistic Unbiased Pathtracer - Peter Kutz and Karl Li
I wrote before that organized and conscientious students get high GPAs, but that doesn't mean that they mastered the material or demonstrated the most passion; it means they carefully dotted their i's and crossed their t's in the quest for grades.

After this semester, I now take an even firmer stance on this position. Grades are all about not making mistakes: avoiding compiler warnings and errors; submitting all files on time; and answering questions carefully and fully. These are good rigorous traits, but not enough.

I want to create a culture of passion for graphics, and to facilitate the quest for knowledge, not the quest for grades. Next semester I'm doing away with grades for the most part. We'll give the students feedback on their projects, and they will grade themselves. My role will be to make sure the students fall into place relative to each other.

8. Apparently 9am is early.

I moved the class from 6pm to 9am thinking it would fit better into students' schedules; however, the most overwhelming course feedback I received was that the course was too early. We're back to 6pm next semester.

9. Compute and graphics.

GPU-Accelerated RGB-D SLAM with Microsoft Kinect - Yedong Niu
Over the years, this course evolved along with the GPU. Prior to this semester, we taught OpenGL for rendering and GPGPU before teaching compute with CUDA. This semester, I taught CUDA first, which made teaching the GLSL programming model trivial; however, GLSL's restrictiveness became even more evident.

Given the diverse backgrounds of the students taking the course - some are interested in rendering; others are interested in compute only; some are interested in both - we are going to split the course. This fall, it will focus on rendering, and we hope to offer something later that covers general compute with many-core GPUs and multi-core CPUs.

10. My success is student success.

Single Pass Order Independent Transparency - Sean Lilley
As I said before, I measure my success by my students' success. I even started a hall of fame for the course. Induction into the hall of fame isn't based on grades; it is based on how taking the course shaped a student's future, most often in the form of obtaining a related job or internship. This semester, Sean Lilley, who is spending the summer at AMD, and Ian Lilley, who is spending the summer with me at AGI working on Cesium, were inducted.

Varun Sampath, now at NVIDIA, was an awesome TA this semester, and was inducted into the hall of fame last spring.