So, I have finally gotten a little free time to do some more coding. I am trying to cut down the time required for the lumen tracking algorithm. It was initially planned as pure GPU code, but I ran into some technical issues early on, so in order to have things ready for RSNA I settled on CPU code that performed the same function but took much longer to do it. This led to an awkward 1-1.5 minutes where everything else was on hold while a single thread was cranking away at the lumen tracking. Suffice it to say that something that simple shouldn't take a third of the total run time. At any rate, during the conference I had a lot of time to think about the order in which steps are executed, since I had to explain to people what the program was doing.
Surprisingly, even though I have taken out a significant component of the parallelism (previously I had near-full occupancy on 4 cores throughout most of the run time), the overall run time is similar. In fact, once I get the GPU-based lumen tracking code fully integrated, I will have actually dropped my overall run time significantly.
There has been an interim upgrade in CPU to a Phenom II X4 running at 3.6 GHz. Though it is nice to have this on the workstation, it really doesn't make too much difference now that I am getting more adept at writing GPU code. Another bonus is that since I don't have multiple threads competing for GPU time with large blocks of memory, I have taken the workstation down to a 9600 for display and a single 280 GTX for computations. The idle power consumption is 200W (down from 250W with the 65nm 9950 CPU and a similar GPU configuration, and down from 360W during the show with 3x 280 GTX and the 9950).
It is best not to know your own limitations, since they can only limit your potential.
Monday, January 19, 2009
Sunday, January 11, 2009
Well, it has been a while since I have posted to this blog. Since the last post, I have mostly been working on my radiology-related computer vision program, which I presented at RSNA about a month ago. It went pretty well.
Since that time, I have decided that a single workstation setup is insufficient for large-scale testing of my program on multiple CT angiograms. So, I have started building a computing cluster in my apartment. Since I have a bunch of graphics hardware sitting around (2x 280 GTX and 2x Tesla), I am setting up a single workstation for each card. I have the first two nodes set up already and have successfully launched a test program across the network.
The ultimate goal is to have all four nodes up and running by the end of the month. That way, I can launch 5 instances of the program simultaneously and cut down on the run time for testing new builds on 12-24 scans.
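As a rough sketch of the arithmetic behind that, here is how a batch of scans could be dealt out across program instances. The function names and scan names are made up for illustration, and it assumes every scan takes about the same time, which may not hold in practice.

```python
import math

def assign_scans(scan_ids, n_instances):
    """Deal scans out round-robin, one list per program instance."""
    batches = [[] for _ in range(n_instances)]
    for i, scan in enumerate(scan_ids):
        batches[i % n_instances].append(scan)
    return batches

def rounds_needed(n_scans, n_instances):
    """Back-to-back runs the busiest instance has to do."""
    return math.ceil(n_scans / n_instances)

# Hypothetical scan names; the real test set is 12-24 CT angiograms.
scans = [f"scan_{i:02d}" for i in range(24)]
batches = assign_scans(scans, 5)

print([len(b) for b in batches])  # [5, 5, 5, 5, 4]
print(rounds_needed(24, 5))       # 5
print(rounds_needed(12, 5))       # 3
```

So with 5 instances, a 24-scan test takes roughly 5 scan-times of wall clock instead of 24, and a 12-scan test takes roughly 3 instead of 12.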
For the processors, I went with the 45W AMD dual-core chips running at 2.6GHz. I have been surprised at how well they perform despite being fairly low on the L2 cache side. It seems that the only things that really change my run time are increasing RAM (beyond 4GB) and increasing the GPU power. The newer NVIDIA cards seem to have relaxed the memory coalescing requirements compared to the G80 series, which has benefited me greatly.
Well, time to go... let's see if I can keep up with posting more frequently these days.
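For anyone wondering why coalescing matters, here is a toy model. It is not how any particular card decides what to coalesce; it just counts how many aligned memory segments a group of threads touches for a given access pattern. The 16-thread group, 4-byte words and 64-byte segment size are assumptions picked for illustration.

```python
def segments_touched(n_threads, stride_words, word_bytes=4,
                     segment_bytes=64, base=0):
    """Count distinct aligned segments hit when thread t reads the word
    at byte address base + t * stride_words * word_bytes."""
    addrs = [base + t * stride_words * word_bytes for t in range(n_threads)]
    return len({a // segment_bytes for a in addrs})

# 16 threads reading consecutive, aligned floats: one segment.
print(segments_touched(16, 1))           # 1
# Same pattern shifted by one word: spills into a second segment.
print(segments_touched(16, 1, base=4))   # 2
# Every 16th float: each thread lands in its own segment.
print(segments_touched(16, 16))          # 16
```

The fewer segments a group of threads touches, the less memory traffic there is for the same useful data. Hardware that tolerates the shifted case gracefully is much more forgiving to write for than hardware that falls back to one transaction per thread.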