Introduction to parallel programming is the second MOOC course that I signed up for. The emergence of parallel and distributed computing is not slowing down and it seems that most developers are not accustomed to the very different train of though that parallelism invokes. Most recent GPUs have 1024 simple compute units each of which can run parallel threads. The general course over by the course instructor, John Owens:
The first week started off pretty simple focussing on why GPUs, parallel programming and CUDA. I found the pace of the videos just right and much more engaging than other courses I have looked at.
The basic of CUDA:
Week 1 Assignment
On review the assignment solution is sub-optimal enough to do the job.