*Note: This post has been updated after I received some help from nVidia’s Timo Stich. It turned out that some weights have to be set to zero; otherwise, CUDA gives an incorrect answer.*

Today, I downloaded the latest version of nVidia’s CUDA to try out the graph cut segmentation present in NPP.

There is an example called “imageSegmentationNPP” which solves a two-dimensional (4-connected) problem. These are the results I got:

GPU: **84 ms.**

CPU (GridCut): **24 ms.**
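
For readers unfamiliar with the setup, here is a minimal sketch of what the edge weights of a two-dimensional, 4-connected grid problem can look like. The note at the top does not say which weights had to be zeroed; this sketch assumes, purely for illustration, that they are the weights of edges that would point outside the grid. The function `grid_edge_weights` and the array names are hypothetical and belong to neither NPP nor GridCut.

```python
import numpy as np


def grid_edge_weights(height, width, weight=1):
    """Build per-pixel edge weights for a 4-connected grid graph.

    Each array holds, for every pixel, the weight of the edge going to
    its left, right, top, or bottom neighbour. Hypothetical helper for
    illustration only; not part of NPP or GridCut.
    """
    shape = (height, width)
    left = np.full(shape, weight, dtype=np.int32)
    right = np.full(shape, weight, dtype=np.int32)
    top = np.full(shape, weight, dtype=np.int32)
    bottom = np.full(shape, weight, dtype=np.int32)

    # ASSUMPTION: edges that would leave the grid have no neighbour,
    # so their weights are set to zero.
    left[:, 0] = 0      # first column has no left neighbour
    right[:, -1] = 0    # last column has no right neighbour
    top[0, :] = 0       # first row has no top neighbour
    bottom[-1, :] = 0   # last row has no bottom neighbour

    return left, right, top, bottom
```

Calling `grid_edge_weights(3, 4)` returns four 3×4 arrays in which only the edges between real neighbouring pixels carry a non-zero weight.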

