Thursday, December 27, 2012

GitHub and Open Source in the Classroom

I started using source control in high school, and I believe every computer science student should be comfortable with at least one source control package. Last semester in my GPU Programming and Architecture course at the University of Pennsylvania, we had students use GitHub for their code. This semester, we took it a step further.

Last semester, we used a GitHub educational account so each student could access a free private repo. To release a project, we added a new directory to each student's repo (which was slightly less work than creating a new repo; forking was not an option since students weren't required to create their own educational accounts). Students submitted their work by creating a tag. This all worked OK. Students learned basic git features. Rolling out fixes to our starter code was reasonable enough.

But we could do much better.

This semester we said goodbye to private repos, and went completely open source (well, students still had the option to use private repos, but no one did). To release a project, we simply added the new repo to our GitHub organization, and students forked it. Students submitted their work with pull requests. Besides better exposing students to git, this had some major benefits.
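For readers who haven't used the fork-based workflow, here is a rough sketch of the git commands a student might run after forking a project on GitHub. It is an illustration, not our exact course instructions: the student, organization, and project names are hypothetical, the `master` branch name is an assumption, and the pull request itself is opened on the GitHub website rather than from the command line.

```python
import subprocess


def fork_workflow_commands(student, org, project, branch="master"):
    """Build the git commands for working in a fork of a course project.

    All names are hypothetical placeholders; the URLs just follow GitHub's
    usual https://github.com/<owner>/<repo>.git pattern.
    """
    fork_url = f"https://github.com/{student}/{project}.git"
    upstream_url = f"https://github.com/{org}/{project}.git"
    return [
        # Clone the student's own fork; git creates a directory named after the project.
        ["git", "clone", fork_url],
        # Register the course organization's repo so starter-code fixes can be pulled in.
        ["git", "-C", project, "remote", "add", "upstream", upstream_url],
        ["git", "-C", project, "fetch", "upstream"],
        ["git", "-C", project, "merge", f"upstream/{branch}"],
        # Publish the work to the fork; the pull request is then opened on GitHub.
        ["git", "-C", project, "push", "origin", branch],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, since the repos are made up.
    for cmd in fork_workflow_commands("some-student", "course-org", "path-tracer"):
        print(" ".join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; calling `run_all` on the result would only make sense against real repositories.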

Code portfolios

Each student built a code portfolio, which will help them start their careers. This is inspired by an internship posting by Christophe Riccio:

"The profile for the candidates: C++, OpenGL, OpenCL. Applications with a source code and for the principle of it, a resume."

I wholeheartedly agree with the spirit of this. Students with public code portfolios are more marketable than students without. Public code portfolios give students more exposure and give employers a lower barrier to entry. By diffing repos, employers can also see what exactly the student did vs. what was provided.
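To make the diffing idea concrete, here is a minimal sketch using Python's standard `difflib` module. It assumes the starter code and the student's version of a file are already loaded as lists of lines; the function name and the sample lines are made up for illustration. In practice, `git diff` between the course repo and the fork does the same job across a whole project.

```python
import difflib


def student_contribution(starter_lines, student_lines):
    """Return the lines in the student's file that were added to, or replaced
    lines in, the provided starter code."""
    matcher = difflib.SequenceMatcher(a=starter_lines, b=student_lines, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both mean the student wrote these lines.
        if tag in ("insert", "replace"):
            added.extend(student_lines[j1:j2])
    return added


if __name__ == "__main__":
    # Hypothetical starter code and student solution.
    starter = ["void render() {", "  // TODO", "}"]
    student = ["void render() {", "  traceRays();", "  shade();", "}"]
    for line in student_contribution(starter, student):
        print(line)
```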

Combine a public code portfolio with cool projects (our students implement GPU ray tracers, path tracers, and rasterizers to name a few), and quite frankly, I don't know how students without a code portfolio will compete.

In addition to helping our students, keeping everything public has the potential to help anyone.

Help the larger community

Interested in writing a GPU path tracer? Fork ours. Don't stop there: read our slides and listen to the audio. They're public for everyone's benefit. Ultimately, if the course gets attention, it also benefits those who took the course.

Academic integrity

The most common question I get is: if everyone's work is public, can't students cheat?

Yes, but students don't need public repos to cheat. They can talk among themselves and there's already lots of code online.

There are also lots of things that discourage cheating:
  • No two projects are alike - we never provide rigid requirements. For example, the image processing project gives a selection of filters to implement, and requires students to come up with new ones. The rasterizer gives a set of required stages, and a selection of extra ones. In practice, projects turned out to be remarkably different.
  • Random presentations follow projects - three or four students present their projects in class without any notice. The main motivation is for students to get used to thinking on their feet and giving an elevator pitch. Discouraging cheating for fear of not presenting well is perhaps a side effect.
  • Social pressure - the code trail is public and commits have time stamps.

Cheating is, of course, still possible, especially copying one-liners. However, the benefits of public code repos outweigh the risk. We strive for a course culture where students don't care to cheat. They want to implement the projects - actually, even students outside of our course want to implement them. If a student wants to chase after grades, they need to take another course. In our course, we stay up to 2am coding for fun, not grades.
