So, I have finally gotten a little free time to do some more coding. I am trying to cut down the time required for the lumen tracking algorithm. It was initially planned as pure GPU code, but I ran into some technical issues early on, so in order to have things ready for RSNA I settled on CPU code which performed the same function but took much longer to do it. This led to an awkward 1-1.5 minutes where everything else was on hold while a single thread cranked away at the lumen tracking. Suffice it to say that something that simple shouldn't take a third of the total run time. At any rate, during the conference I had a lot of time to think about the order in which steps are executed, since I had to explain to people what the program was doing.
Surprisingly, even though I have taken out a significant component of the parallelism (previously I had near-full occupancy on 4 cores throughout most of the run time), the overall run time is similar. In fact, once I get the GPU-based lumen tracking code fully integrated, I will actually have dropped my overall run time significantly.
In the interim, the CPU has been upgraded to a Phenom II X4 running at 3.6 GHz. Though it is nice to have this on the workstation, it really doesn't make much difference now that I am getting more adept at writing GPU code. Another bonus is that since I no longer have multiple threads competing for GPU time with large blocks of memory, I have taken the workstation down to a 9600 for display and a single 280GTX for computations. The idle power consumption is 200W (down from 250W with the 65nm 9950 CPU and a similar GPU configuration, and down from 360W during the show with 3x 280GTX and the 9950).