Note: This post has been updated after I received some help from nVidia’s Timo Stich. It turned out that some weights have to be set to zero; otherwise, NPP’s graph cut returns an incorrect answer.
Today, I downloaded the latest version of nVidia’s CUDA to try out the graph cut segmentation in NPP, nVidia’s Performance Primitives library.
CUDA ships with an example called “imageSegmentationNPP”, which solves a two-dimensional, 4-connected segmentation problem. These are the results I got:
GPU: 84 ms.
CPU (GridCut): 24 ms.
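For context, a 4-connected segmentation problem like this one is usually a binary labeling energy of the form below. I am assuming this general shape here; it is not a transcription of the exact weights nVidia’s example computes:

```latex
E(x) = \sum_{p} D_p(x_p) + \lambda \sum_{(p,q) \in \mathcal{N}_4} w_{pq}\,[x_p \neq x_q], \qquad x_p \in \{0, 1\}
```

Here N4 contains every pair of horizontally or vertically adjacent pixels, D_p(x_p) is the cost of giving pixel p the label x_p, w_pq ≥ 0 is an edge weight, and λ ≥ 0 sets the amount of regularization. Because the pairwise weights are nonnegative, the global minimum can be computed exactly as a minimum s-t cut, so two correct solvers must reach the same objective value.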
The only thing I had to write myself was a small wrapper function that plugs GridCut into nVidia’s example. Both methods gave exactly the same solution and objective function value, so my wrapper seems to be correct.
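To check that two solvers agree, it is enough to evaluate the objective of each returned labeling. Below is a minimal sketch in Python. It assumes a unary cost per pixel and label, plus λ times an edge weight for every pair of 4-neighbours with different labels. The function name and the nested-list layout are hypothetical and do not reflect how NPP or GridCut store their data:

```python
def segmentation_energy(labels, cost0, cost1, w_right, w_down, lam):
    """Energy of a binary labeling on a 4-connected H x W pixel grid.

    labels[y][x]  -- label of pixel (y, x), 0 or 1
    cost0, cost1  -- unary cost of giving pixel (y, x) label 0 / label 1
    w_right[y][x] -- weight of edge (y, x) -- (y, x + 1); unused in the last column
    w_down[y][x]  -- weight of edge (y, x) -- (y + 1, x); unused in the last row
    lam           -- regularization strength multiplying every pairwise term
    """
    h, w = len(labels), len(labels[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            label = labels[y][x]
            # Unary (data) term for this pixel.
            energy += cost1[y][x] if label else cost0[y][x]
            # Pairwise terms: each 4-neighbour edge is visited once, from its
            # left/upper endpoint, and costs lam * weight only if the labels differ.
            if x + 1 < w and labels[y][x + 1] != label:
                energy += lam * w_right[y][x]
            if y + 1 < h and labels[y + 1][x] != label:
                energy += lam * w_down[y][x]
    return energy


# Hypothetical 2 x 2 example: each pixel prefers one label (cost 0) over the other (cost 5).
cost0 = [[0, 5], [0, 5]]
cost1 = [[5, 0], [5, 0]]
w_right = [[1, 0], [1, 0]]
w_down = [[2, 2], [0, 0]]

print(segmentation_energy([[0, 1], [0, 1]], cost0, cost1, w_right, w_down, 3.0))  # 6.0
print(segmentation_energy([[0, 0], [0, 0]], cost0, cost1, w_right, w_down, 3.0))  # 10.0
```

Given the label images from the GPU and the CPU run, one would compare the two returned energies (with math.isclose if the weights are floating point). Equal energies alone do not prove equal labelings, because the minimum need not be unique, so comparing the labels themselves is a separate check.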
My computer has neither a fast GPU nor a fast CPU, which makes a direct comparison difficult. Instead, I tried a more interesting experiment. The original nVidia example uses very little regularization, so I increased it:
GPU: 84 ms.
CPU (GridCut): 31 ms.
And even more:
GPU: 145 ms.
CPU (GridCut): 103 ms.
NPP presumably uses push-relabel, whereas GridCut uses augmenting paths. The next step for me is to find a better GPU than the GeForce GTS 250 I have been using…
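To illustrate the augmenting-path idea, here is a minimal sketch: a plain Edmonds-Karp max-flow, which repeatedly pushes flow along a shortest source-to-sink path found by BFS. It runs on a tiny 4-connected grid built with the standard construction. Each pixel is tied to the source and the sink by its two unary costs and to each neighbour by λ times the edge weight. This is not GridCut’s algorithm, which is heavily optimized for grids, and all names and the data layout are hypothetical. Push-relabel works differently: it moves excess flow locally between neighbouring nodes, which is the usual reason it is favoured on GPUs.

```python
from collections import deque


def max_flow(cap, s, t):
    """Edmonds-Karp max-flow: augment along shortest (BFS) paths until none remain.

    cap maps node -> {neighbour: residual capacity} and is modified in place.
    Returns the flow value and the set of nodes still reachable from s,
    which is the source side of a minimum cut.
    """
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, set(parent)
        # Walk back from t to collect the path, then push the bottleneck amount.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck  # residual (reverse) capacity
        flow += bottleneck


def segment_min_cut(cost0, cost1, w_right, w_down, lam):
    """Minimize a binary 4-connected grid energy with a single s-t min cut.

    Pixels left on the source side get label 0, the rest get label 1.
    """
    h, w = len(cost0), len(cost0[0])
    cap = {"s": {}, "t": {}}
    cap.update({(y, x): {} for y in range(h) for x in range(w)})

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # make sure the reverse residual entry exists

    for y in range(h):
        for x in range(w):
            p = (y, x)
            add_edge("s", p, cost1[y][x])  # cut iff p ends up with label 1
            add_edge(p, "t", cost0[y][x])  # cut iff p ends up with label 0
            if x + 1 < w:
                c = lam * w_right[y][x]
                add_edge(p, (y, x + 1), c)
                add_edge((y, x + 1), p, c)
            if y + 1 < h:
                c = lam * w_down[y][x]
                add_edge(p, (y + 1, x), c)
                add_edge((y + 1, x), p, c)

    flow, source_side = max_flow(cap, "s", "t")
    labels = [[0 if (y, x) in source_side else 1 for x in range(w)] for y in range(h)]
    return flow, labels


# Hypothetical 2 x 2 example with integer weights.
cost0 = [[0, 5], [0, 5]]
cost1 = [[5, 0], [5, 0]]
w_right = [[1, 0], [1, 0]]
w_down = [[2, 2], [0, 0]]

print(segment_min_cut(cost0, cost1, w_right, w_down, 3))  # (6, [[0, 1], [0, 1]])
```

The returned flow equals the minimum energy (max-flow/min-cut). Raising lam to 100 in this example makes any mixed labeling cost at least 200, so the cut becomes a uniform labeling with energy 10. Finding that answer requires more augmentations in general, consistent with both solvers slowing down as the regularization grows.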