Here's a brief test of the SURF algorithm on CPU vs. GPU, run on the same computer (Intel Xeon 3.60GHz / 4GB / Nvidia Quadro FX 5800 / Ubuntu 11.04 32-bit). The input images are shown below.
Test Images from OpenSURF[1]
The preliminary test shows that both implementations achieve good and similar results, but CPU-based OpenSURF (0.65s) is roughly 2.6x faster than GPU-based CUDA SURF (1.72s). I was quite surprised at first, so I added timing probes to find where the two implementations differ. It turns out that CUDA SURF spends almost all of its time (1.66s) in the initialization step that allocates device memory, while the rest of the pipeline is much faster than OpenSURF. If these allocations are the first CUDA runtime calls in the program, that time probably includes the one-time creation of the CUDA context and not just the allocations themselves. Since the initialization is only needed once, real-time processing looks feasible. I will test and discuss more details later, and I will try to optimize CUDA SURF on this specific computer.
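To put a rough number on the "initialization only once" argument, here is a small sketch of the amortization arithmetic. It uses the figures quoted above (1.66s initialization, 1.72s GPU total, 0.65s CPU total) and assumes, purely for illustration, that every later image pair costs the GPU the remaining 0.06s and the CPU the same 0.65s. The function names are made up for this example, and the timings have only 0.01s resolution, so treat the result as an estimate.

```cpp
#include <cassert>
#include <cmath>

// Average seconds per image pair when a one-time init cost is spread
// over `pairs` runs. Assumes a constant steady-state cost per pair.
double amortized_seconds(double init_s, double steady_s, int pairs) {
    assert(pairs > 0);
    return init_s / pairs + steady_s;
}

// Smallest number of image pairs n for which
//   gpu_init + n * gpu_steady < n * cpu
// i.e. n > gpu_init / (cpu - gpu_steady).
// Returns -1 if the GPU never catches up under these assumptions.
int break_even_pairs(double gpu_init_s, double gpu_steady_s, double cpu_s) {
    if (cpu_s <= gpu_steady_s) {
        return -1;
    }
    const double ratio = gpu_init_s / (cpu_s - gpu_steady_s);
    return static_cast<int>(std::floor(ratio)) + 1;
}
```

With the numbers from this test, break_even_pairs(1.66, 0.06, 0.65) gives 3, so under these assumptions the GPU version would already be ahead in total time after three image pairs.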
Code in the Time(2) section:
// Allocate device memory
int img_width = src->width;
int img_height = src->height;
size_t rgb_img_pitch, gray_img_pitch, int_img_pitch, int_img_tr_pitch;
CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_rgb_img, &rgb_img_pitch, img_width * sizeof(unsigned int), img_height) );
CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_gray_img, &gray_img_pitch, img_width * sizeof(float), img_height) );
CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_int_img, &int_img_pitch, img_width * sizeof(float), img_height) );
CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_int_img_tr, &int_img_tr_pitch, img_height * sizeof(float), img_width) );
CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_int_img_tr2, &int_img_tr_pitch, img_height * sizeof(float), img_width) );
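cudaMallocPitch may pad each row for alignment, so the returned pitch (in bytes), not the image width, is what separates one row from the next. The host-only sketch below just illustrates that arithmetic. The 512-byte alignment and the helper names are assumptions for this example; the real pitch is whatever cudaMallocPitch returns on a given device.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical alignment, for illustration only. The actual padding is
// decided by cudaMallocPitch and reported through its pitch argument.
constexpr std::size_t kAssumedAlignment = 512;

// Round a row's width in bytes up to the next multiple of the alignment.
std::size_t padded_pitch(std::size_t row_bytes,
                         std::size_t alignment = kAssumedAlignment) {
    assert(alignment > 0);
    return (row_bytes + alignment - 1) / alignment * alignment;
}

// Byte offset of element (row, col) in a pitched buffer: rows are
// `pitch` bytes apart, elements within a row are `elem_size` apart.
std::size_t pitched_offset(std::size_t pitch, std::size_t row,
                           std::size_t col, std::size_t elem_size) {
    return row * pitch + col * elem_size;
}
```

For a 641-pixel-wide float image, for example, a row holds 2564 bytes of data, but under the assumed alignment each row would start 3072 bytes after the previous one.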
CPU-based OpenSURF(0.65s)
Matches: 76
Time(load):0.03000
Time(descriptor):0.56000
Time(Integral):0.00000
Time(FastHessian):0.00000
Time(getIpoints):0.09000
Time(descriptor):0.33000
Time(cvReleaseImage):0.00000
--------------------------------------
Time(Integral):0.00000
Time(FastHessian):0.00000
Time(getIpoints):0.03000
Time(descriptor):0.11000
Time(cvReleaseImage):0.00000
Time(match):0.02000
Time(plot):0.00000
Time(save):0.04000
GPU-based CUDA SURF(1.72s)
Matches: 66
Time(load):0.02000
Time(descriptor):1.69000
Time(Integral):1.68000
Time(1):0.0000000000
Time(2):1.6800000000
Time(3):0.0000000000
Time(4):0.0000000000
Time(5):0.0000000000
Time(6):0.0000000000
Time(7):0.0000000000
Time(8):0.0000000000
Time(FastHessian):0.00000
Time(getIpoints):0.00000
Time(descriptor):0.00000
Time(freeCudaImage):0.00000
--------------------------------------
Time(Integral):0.00000
Time(1):0.0000000000
Time(2):0.0000000000
Time(3):0.0000000000
Time(4):0.0000000000
Time(5):0.0000000000
Time(6):0.0000000000
Time(7):0.0000000000
Time(8):0.0000000000
Time(FastHessian):0.00000
Time(getIpoints):0.01000
Time(descriptor):0.00000
Time(freeCudaImage):0.00000
Time(match):0.01000
Time(plot):0.00000
Time(save):0.03000
CPU-based OpenSURF(0.65s)
GPU-based CUDA SURF(1.72s)
BTW, there may be a better timing method that would improve the accuracy of these measurements [3].
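One possible option, sketched here and not what the test above used, is a monotonic wall-clock timer with repeated runs. The helper names time_seconds and best_of are made up for this example. Note that CUDA kernel launches return before the kernel finishes, so a host-side timer like this only gives meaningful numbers for GPU code if the timed function synchronizes with the device before it returns.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Run `work` once and return the elapsed wall-clock time in seconds,
// measured with a monotonic clock.
template <typename F>
double time_seconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Run `work` `repeats` times and return the fastest run, which reduces
// the influence of one-off disturbances such as cache misses.
template <typename F>
double best_of(int repeats, F&& work) {
    assert(repeats > 0);
    double best = time_seconds(work);
    for (int i = 1; i < repeats; ++i) {
        best = std::min(best, time_seconds(work));
    }
    return best;
}
```

Reporting the first run separately from the best of the later runs would also keep a one-time initialization cost apart from the steady-state cost.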
References
[1] http://www.chrisevansdev.com/computer-vision-opensurf.html
[2] http://www.d2.mpi-inf.mpg.de/surf
[3] Measuring Computing Times and Operation Counts of Generic Algorithms, http://www.cs.rpi.edu/~musser/gp/timing.html
Dear Mr. Yedong Niu,
I hope you are having a nice day.
Can you help me?
I need an OCR piece of code to use in my application for 5 different mobile platforms (Android, iPhone, Windows Phone, BlackBerry, Symbian).
Input: image captured through mobile camera.
Output: text that contains the identified .
I’d like to inform you that, after searching through various works, I started to use the Tesseract OCR engine on the Android platform, as it’s open source. I’m now trying to compile the library using the Android NDK plus the Cygwin package (a Unix environment emulator); then I’ll use the compiled library in my project.
So, if you have another suggestion, my criteria are that it should be an offline, freely licensed, open-source solution.
Best Regards
Nasr