Wednesday, March 28, 2012

Preliminary Comparison of OpenSURF and CUDA SURF

OpenSURF[1] is a C++/C# implementation of the SURF feature detector, descriptor, and matcher. CUDA SURF[2] is an implementation of OpenSURF built on the CUDA SDK and CUDPP. Both use OpenCV for basic image operations. CUDA SURF exposes exactly the same function interface as OpenSURF, so the two make a reasonable pair for a performance comparison.


Here's a brief test of the SURF algorithm on CPU vs. GPU on the same computer (Intel Xeon 3.60GHz / 4GB / Nvidia Quadro FX 5800 / Ubuntu 11.04 32-bit). The input images are shown.
Test Images from OpenSURF[1]


The preliminary test shows that both implementations achieve good, similar results (76 matches vs. 66), but CPU-based OpenSURF (0.65s) is about 2.6x faster than GPU-based CUDA SURF (1.72s). I was quite surprised at first, so I added timing probes to see where the two implementations differ. It turns out that CUDA SURF spends almost all of its time in initialization, allocating memory (1.66s), and the rest of the pipeline is far faster than OpenSURF. It is therefore potentially feasible for real-time processing, since the initialization is only needed once. More details will be tested and discussed later, and I will try to optimize CUDA SURF on this specific computer.
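To make the "initialization only once" argument concrete, here is a rough back-of-the-envelope sketch in C++. It assumes the one-time setup is the 1.66s quoted above and that everything else in the GPU run (1.72s - 1.66s = 0.06s, which still includes load, match, and save) is the recurring per-run cost. The function names are hypothetical and are not part of OpenSURF or CUDA SURF.

```cpp
#include <cassert>
#include <cmath>

// Total time for `runs` runs when `setup` seconds are paid once
// and every run then costs `per_run` seconds.
double total_seconds(double setup, double per_run, int runs) {
    assert(runs >= 0);
    return setup + per_run * runs;
}

// Smallest number of runs at which the GPU path (one-time setup plus a small
// per-run cost) is strictly faster in total than the CPU path (no setup).
// Returns -1 if the GPU per-run cost is not lower than the CPU cost.
int break_even_runs(double gpu_setup, double gpu_per_run, double cpu_per_run) {
    if (gpu_per_run >= cpu_per_run) return -1;
    const double n = gpu_setup / (cpu_per_run - gpu_per_run);
    return static_cast<int>(std::floor(n)) + 1;
}
```

With the numbers from this test, `break_even_runs(1.66, 1.72 - 1.66, 0.65)` gives 3: two runs cost about 1.78s on the GPU against 1.30s on the CPU, while three runs cost about 1.84s against 1.95s. This is only an estimate from a single measurement, so treat it as an order-of-magnitude guide.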


Code in the Time(2) section (device memory allocation)


    // Allocate device memory
    int img_width = src->width;
    int img_height = src->height;
    size_t rgb_img_pitch, gray_img_pitch, int_img_pitch, int_img_tr_pitch;
    CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_rgb_img, &rgb_img_pitch, img_width * sizeof(unsigned int), img_height) );
    CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_gray_img, &gray_img_pitch, img_width * sizeof(float), img_height) );
    CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_int_img, &int_img_pitch, img_width * sizeof(float), img_height) );
    CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_int_img_tr, &int_img_tr_pitch, img_height * sizeof(float), img_width) );
    CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_int_img_tr2, &int_img_tr_pitch, img_height * sizeof(float), img_width) );
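One way to pay this cost only once would be to keep the buffers alive between images and reallocate only when the image size changes. The sketch below shows that pattern in plain C++. It is an assumption about how the code could be restructured, not how CUDA SURF is written: `ScratchBuffers` is a hypothetical class, and `std::vector<float>` is a host-side stand-in for the pitched device buffers in the listing above. It is also possible that part of the measured time is one-time CUDA context setup triggered by the first runtime call rather than the allocations themselves, which this test did not separate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: owns the scratch images and resizes them only when
// the requested image size differs from the size used last time.
class ScratchBuffers {
public:
    // Make sure buffers for a width x height image exist.
    // Returns true if the buffers were (re)sized, false if they were reused.
    bool ensure(int width, int height) {
        assert(width > 0 && height > 0);
        if (width == width_ && height == height_) return false;  // reuse
        const std::size_t n = static_cast<std::size_t>(width) * height;
        gray_.assign(n, 0.0f);         // stand-in for d_gray_img
        integral_.assign(n, 0.0f);     // stand-in for d_int_img
        integral_tr_.assign(n, 0.0f);  // stand-in for d_int_img_tr
        width_ = width;
        height_ = height;
        ++allocations_;
        return true;
    }

    int allocations() const { return allocations_; }
    std::vector<float>& gray() { return gray_; }

private:
    int width_ = 0;
    int height_ = 0;
    int allocations_ = 0;
    std::vector<float> gray_, integral_, integral_tr_;
};
```

Calling `ensure(640, 480)` for every frame of a fixed-size video stream would then resize the buffers on the first frame only, and later frames would go straight to the detection and description steps.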


CPU-based OpenSURF(0.65s)
Matches: 76
Time(load):0.03000
Time(descriptor):0.56000
        Time(Integral):0.00000
        Time(FastHessian):0.00000
        Time(getIpoints):0.09000
        Time(descriptor):0.33000
        Time(cvReleaseImage):0.00000
        --------------------------------------
        Time(Integral):0.00000
        Time(FastHessian):0.00000
        Time(getIpoints):0.03000
        Time(descriptor):0.11000
        Time(cvReleaseImage):0.00000
Time(match):0.02000
Time(plot):0.00000
Time(save):0.04000


GPU-based CUDA SURF(1.72s)
Matches: 66
Time(load):0.02000
Time(descriptor):1.69000
        Time(Integral):1.68000
                Time(1):0.0000000000
                Time(2):1.6800000000
                Time(3):0.0000000000
                Time(4):0.0000000000
                Time(5):0.0000000000
                Time(6):0.0000000000
                Time(7):0.0000000000
                Time(8):0.0000000000
        Time(FastHessian):0.00000
        Time(getIpoints):0.00000
        Time(descriptor):0.00000
        Time(freeCudaImage):0.00000
        --------------------------------------
        Time(Integral):0.00000
                Time(1):0.0000000000
                Time(2):0.0000000000
                Time(3):0.0000000000
                Time(4):0.0000000000
                Time(5):0.0000000000
                Time(6):0.0000000000
                Time(7):0.0000000000
                Time(8):0.0000000000
        Time(FastHessian):0.00000
        Time(getIpoints):0.01000
        Time(descriptor):0.00000
        Time(freeCudaImage):0.00000
Time(match):0.01000
Time(plot):0.00000
Time(save):0.03000

             


CPU-based OpenSURF(0.65s)




GPU-based CUDA SURF(1.72s)

BTW, there may be a better way to do the timing that would improve its accuracy.[3]
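As a rough illustration of what a finer-grained measurement could look like: the logs above only change in steps of 0.01s, which suggests (this is an assumption) a coarse timer, so short stages all read as 0.00000. The C++ sketch below uses the standard monotonic clock `std::chrono::steady_clock` and averages over several repetitions. `mean_seconds` is a hypothetical helper, not part of either library.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: run `fn` `repeats` times and return the mean
// wall-clock time per call in seconds, measured with a monotonic clock.
template <typename Fn>
double mean_seconds(Fn&& fn, int repeats) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < repeats; ++i) {
        fn();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / repeats;
}
```

A stage could then be wrapped as `mean_seconds([&] { /* stage under test */ }, 100)`. For the GPU path, note that kernel launches return before the work finishes, so the device would need to be synchronized inside the timed function for the number to mean anything.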

             


References
[1] http://www.chrisevansdev.com/computer-vision-opensurf.html
[2] http://www.d2.mpi-inf.mpg.de/surf
[3] Measuring Computing Times and Operation Counts of Generic Algorithms, http://www.cs.rpi.edu/~musser/gp/timing.html
