Tuesday, April 24, 2012
I have been debugging a weird bug for a week: it causes random segmentation faults for no obvious reason. Since I decided to reinvent the wheel and rebuild a casual version of the project from scratch with OpenCV and PCL, every tiny problem is painful, including the incompatibilities between PCL 1.1 and PCL 1.5, and between OpenCV 2.3 and OpenCV 2.0. The result so far is far from satisfactory.
Wednesday, April 18, 2012
OpenCV
There are some problems integrating CUDA SURF with ROS, so I am now shifting to OpenCV GPU support. The stock libopencv 2.3 was not compiled with CUDA support, so I need to recompile OpenCV 2.3 with the CUDA flag enabled.
Monday, April 16, 2012
Continuous SURF detection and matching
Now mainly testing offline, as ROS provides rosbag to log and play back data for real-time processing and debugging. CUDA SURF needs roughly 0.02s for SURF detection and matching, while image loading and writing takes about 0.08s on 640x480 RGB8 images.
Sunday, April 1, 2012
Midpoint Check
To improve the performance of a real-time implementation of RGB-D 3D environment reconstruction, the following algorithms could be implemented with CUDA.
1. SURF(Speeded Up Robust Feature) detector/descriptor/matching
2. RANSAC
3. ICP (Iterative Closest Point)
By midpoint I have basically finished testing the GPU-based SURF algorithm and modified its interface for integration into ROS. It relies on the OpenCV, OpenSURF, CUDPP and CUDA SURF libraries.
Next is to combine RANSAC pose estimation and ICP into sequential kernels with minimal memory transfer. Real-time rendering with OpenGL in an rviz node should be considered as well.
The midpoint check presentation could be downloaded here.
Wednesday, March 28, 2012
Preliminary Comparison of OpenSURF and CUDA SURF
OpenSURF[1] is an implementation of the SURF feature detector/descriptor/matcher in C++/C#. CUDA SURF[2] is an implementation of OpenSURF using the CUDA SDK and CUDPP. Both use OpenCV for basic image operations. CUDA SURF shares exactly the same function interface as OpenSURF, so they are a reasonable pair for comparing performance.
Here's a brief test of the SURF algorithm on CPU vs. GPU on the same computer (Intel Xeon 3.60GHz / 4GB / NVIDIA Quadro FX 5800 / Ubuntu 11.04 32-bit). The input images are shown below.
Test Images from OpenSURF[1]
The preliminary test shows that both algorithms achieve good and similar results, but CPU-based OpenSURF (0.65s) is about 3x faster than GPU-based CUDA SURF (1.72s). I was quite surprised at first, so I added timing probes to find the difference between the two implementations: CUDA SURF spends nearly all of its time in initialization, allocating device memory (1.66s), while the remaining stages are far faster than OpenSURF. Real-time processing still looks feasible, since initialization is only needed once. More details will be tested and discussed later, and I will try to optimize CUDA SURF for this specific computer.
Code in Time(2) section
// Allocate device memory
int img_width = src->width;
int img_height = src->height;
size_t rgb_img_pitch, gray_img_pitch, int_img_pitch, int_img_tr_pitch;
CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_rgb_img, &rgb_img_pitch, img_width * sizeof(unsigned int), img_height) );
CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_gray_img, &gray_img_pitch, img_width * sizeof(float), img_height) );
CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_int_img, &int_img_pitch, img_width * sizeof(float), img_height) );
CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_int_img_tr, &int_img_tr_pitch, img_height * sizeof(float), img_width) );
CUDA_SAFE_CALL( cudaMallocPitch((void**)&d_int_img_tr2, &int_img_tr_pitch, img_height * sizeof(float), img_width) );
CPU-based OpenSURF(0.65s)
Matches: 76
Time(load):0.03000
Time(descriptor):0.56000
Time(Integral):0.00000
Time(FastHessian):0.00000
Time(getIpoints):0.09000
Time(descriptor):0.33000
Time(cvReleaseImage):0.00000
--------------------------------------
Time(Integral):0.00000
Time(FastHessian):0.00000
Time(getIpoints):0.03000
Time(descriptor):0.11000
Time(cvReleaseImage):0.00000
Time(match):0.02000
Time(plot):0.00000
Time(save):0.04000
GPU-based CUDA SURF(1.72s)
Matches: 66
Time(load):0.02000
Time(descriptor):1.69000
Time(Integral):1.68000
Time(1):0.0000000000
Time(2):1.6800000000
Time(3):0.0000000000
Time(4):0.0000000000
Time(5):0.0000000000
Time(6):0.0000000000
Time(7):0.0000000000
Time(8):0.0000000000
Time(FastHessian):0.00000
Time(getIpoints):0.00000
Time(descriptor):0.00000
Time(freeCudaImage):0.00000
--------------------------------------
Time(Integral):0.00000
Time(1):0.0000000000
Time(2):0.0000000000
Time(3):0.0000000000
Time(4):0.0000000000
Time(5):0.0000000000
Time(6):0.0000000000
Time(7):0.0000000000
Time(8):0.0000000000
Time(FastHessian):0.00000
Time(getIpoints):0.01000
Time(descriptor):0.00000
Time(freeCudaImage):0.00000
Time(match):0.01000
Time(plot):0.00000
Time(save):0.03000
By the way, there may be a better way of timing that would improve accuracy.[3]
Reference
[1]http://www.chrisevansdev.com/computer-vision-opensurf.html
[2]http://www.d2.mpi-inf.mpg.de/surf
[3] Measuring Computing Times and Operation Counts of Generic Algorithms, http://www.cs.rpi.edu/~musser/gp/timing.html
CUDPP
CUDPP is the CUDA Data Parallel Primitives Library. CUDPP is a library of data-parallel algorithm primitives such as parallel prefix-sum ("scan"), parallel sort and parallel reduction. Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables.[1]
Since SURF uses integral images for fast convolution, computing the summed-area table with CUDA is essential. CUDPP provides this primitive and is a dependency of CUDA SURF.[2]
When I tried to compile the CUDPP library, it was so extremely slow that I thought the computer had somehow died. After waiting for tens of minutes it finally completed. Curious, I tried to figure out the reason and found an answer on the CUDPP wiki.[3]
"Compile time continues to get longer as we add more functionality. CUDA is really slow at
compiling template functions with multiple parameters, and we use a lot. There are something like
384 different scan kernels, for example, and a similar number for segscan. "
Reference
[1]CUDPP, http://code.google.com/p/cudpp/
[2]CUDA SURF http://www.d2.mpi-inf.mpg.de/surf
[3]http://code.google.com/p/cudpp/issues/detail?id=19
libopencv-2.3.1
In ROS (Electric) the package opencv2 is deprecated; libopencv-dev 2.3.1 is used as the source of OpenCV instead. However, this brings some trouble with linking: the pkg-config entry emits the full versioned library paths (2.3.1), which the linker cannot resolve through -l. Moreover, libopencv-dev 2.3.1 installs opencv-2.3.1.pc instead of opencv.pc in /usr/lib/pkgconfig/, which causes the following errors when compiling a regular C/C++ source file outside ROS, even though pkg-config is pointed at opencv-2.3.1.pc.
/usr/bin/ld: cannot find -l/usr/lib/libopencv_contrib.so.2.3.1
/usr/bin/ld: cannot find -l/usr/lib/libopencv_legacy.so.2.3.1
/usr/bin/ld: cannot find -l/usr/lib/libopencv_objdetect.so.2.3.1
/usr/bin/ld: cannot find -l/usr/lib/libopencv_calib3d.so.2.3.1
/usr/bin/ld: cannot find -l/usr/lib/libopencv_features2d.so.2.3.1
/usr/bin/ld: cannot find -l/usr/lib/libopencv_video.so.2.3.1
/usr/bin/ld: cannot find -l/usr/lib/libopencv_highgui.so.2.3.1
/usr/bin/ld: cannot find -l/usr/lib/libopencv_ml.so.2.3.1
/usr/bin/ld: cannot find -l/usr/lib/libopencv_imgproc.so.2.3.1
/usr/bin/ld: cannot find -l/usr/lib/libopencv_flann.so.2.3.1
/usr/bin/ld: cannot find -l/usr/lib/libopencv_core.so.2.3.1
collect2: ld returned 1 exit status
make: *** [DisplayImage] Error 1
To solve the problem, the method that works on my computer is to create an additional .pc file at /usr/lib/pkgconfig/opencv.pc with the following contents.[1]
# Package Information for pkg-config
prefix=/usr
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir_old=${prefix}/include/opencv-2.3.1/opencv
includedir_new=${prefix}/include/opencv-2.3.1
Name: OpenCV
Description: Open Source Computer Vision Library
Version: 2.3.1
Libs: -L${libdir} -lopencv_contrib -lopencv_legacy -lopencv_objdetect -lopencv_calib3d -lopencv_features2d -lopencv_video -lopencv_highgui -lopencv_ml -lopencv_imgproc -lopencv_flann -lopencv_core
Cflags: -I${includedir_old} -I${includedir_new}
Reference
[1]about compiling opencv programs outside ROS, http://answers.ros.org/question/11916/about-compiling-opencv-programs-outside-ros
[2]Ticket #1475, https://code.ros.org/trac/opencv/ticket/1475
Dropbox Uploader in terminal
Since I am working on shaggy remotely these days, I need a way to transfer code and files to shaggy, execute them, and transfer the results back to my computer. For security reasons I need to ssh into eniac.seas.upenn.edu first and then ssh into shaggy, which makes the plain scp command annoying. There are a couple of ways to do it; here's my experience.
1. github
This works well, no doubt. Just clone the repository on shaggy, then push and pull.
2. scp
Need to scp to eniac first and then shaggy.
3. Dropbox uploader
There is a bash tool called dropbox_uploader[1] and I can upload files to my Dropbox with a single command.
Usage: ./dropbox_uploader.sh [OPTIONS]...
Options:
-u [USERNAME] (required if not hardcoded)
-p [PASSWORD]
-f [FILE/FOLDER] (required)
-d [REMOTE_FOLDER] (default: /)
-v Verbose mode
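For the scp route, the two hops can also be chained with an ssh ProxyCommand so a single scp reaches shaggy directly. A sketch of an ~/.ssh/config fragment; YOURNAME is a placeholder, and I am assuming nc (netcat) is available on eniac:

```
# ~/.ssh/config (sketch)
Host eniac
    HostName eniac.seas.upenn.edu
    User YOURNAME

Host shaggy
    # tunnel through eniac so one command reaches shaggy
    ProxyCommand ssh eniac nc %h %p
    User YOURNAME
```

With this in place, `scp results.tar.gz shaggy:~/` copies in one step instead of two.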
It's convenient to send the results one way back to my computer.
Tuesday, March 27, 2012
Update
Working on some papers about vision-based SLAM these days. Tested (offline) the built-in SIFT GPU code on sequential images last week, and it significantly improves the speed. Now I am thinking of combining SIFT/SURF feature extraction and matching into a single kernel, which would decrease the overhead of memory loading and writing. Working on shaggy; need to install some dependencies.
Wednesday, March 14, 2012
More Resources on SURF & GPU
SURF C++ Implementations
OpenSURF: http://www.chrisevansdev.com/computer-vision-opensurf.html
OpenCV SURF: http://opencv.itseez.com/modules/gpu/doc/feature_detection_and_description.html
SURF GPU Implementations
Speeded Up SURF: http://asrl.utias.utoronto.ca/code/gpusurf/
CUDA SURF: http://www.d2.mpi-inf.mpg.de/surf?q=surf
GPU SURF: http://homes.esat.kuleuven.be/~ncorneli/gpusurf/
Reference
http://en.wikipedia.org/wiki/SURF
Tuesday, March 13, 2012
Reading Digest::SURF
I am reading about the SURF feature detector/descriptor/matcher these days. A major paper is by Herbert Bay et al., SURF: Speeded Up Robust Features[1].
Speeded Up Robust Features (SURF) is a scale-invariant and rotation-invariant feature based on the Hessian matrix. Compared with SIFT and other point-feature approaches, it approximates or even outperforms them in repeatability, distinctiveness and robustness, yet can be computed and compared much faster[1].
The SURF method could be applied to sequential frames sampled by the Kinect RGB camera as well as the structured IR depth camera for global map registration and pose estimation. The scale-invariant and rotation-invariant properties fit the requirement of 3D reconstruction.
From the paper[1], I found a couple of steps that could be implemented on CUDA to improve real-time performance. So far they are:
Detector
(1) Integral image: SURF uses an integral image for convolution, meaning each image point stores the sum of all pixels in the rectangular region formed by that point and the origin. This could be parallelized across multiple threads with CUDA.
(2) Filter parallelism: The approximations of 2nd-order Gaussian derivatives on an integral image cost the same regardless of filter size, which enables applying filters of different scales to the image in parallel.
(3) Point-wise calculation: Convolution and other operations may involve point-wise computations, which could be optimized by well-designed kernel functions with techniques such as coalescing and tiling. This idea applies to the rest of the steps as well.
Descriptor
more to read...
Matching
more to read...
Reference
[1] Herbert Bay, Tinne Tuytelaars, Luc Van Gool, SURF: Speeded Up Robust Features, ECCV 2006
Naive CPU-based RGB-D SLAM
Compiled and ran the ROS packages[1] of RGB-D SLAM on my laptop again yesterday. The result looks cool but is pretty laggy for real-time purposes. Here's a short video of the result. The 3D reconstruction of my dormitory with point clouds is shown.
Reference
[1] Felix Endres, Juergen Hess, Nikolas Engelhard, http://www.ros.org/wiki/rgbdslam
CUDA-based Algorithm Resources
Installed ROS today. Found some resources about SIFT/SURF/ICP implementations on CUDA. These may be good references and pretty helpful for my project.
GPU SIFT: http://cs.unc.edu/~ccwu/siftgpu/
CUDA SURF: http://www.d2.mpi-inf.mpg.de/surf?q=surf
GPU ICP: http://home.hiroshima-u.ac.jp/tamaki/study/cuda_softassign_emicp/
Monday, March 12, 2012
Project Proposal
CIS 565 Final Project Pitch
GPU Accelerated RGB-D SLAM with Microsoft Kinect
Yedong Niu
03/11/2012
Background
The simultaneous localization and mapping (SLAM) problem asks whether it is possible for a mobile robot placed at an unknown location in an unknown environment to incrementally build a consistent map of that environment while simultaneously determining its location within the map[1]. Vision-based SLAM is one of the most recent approaches in the SLAM community, and the RGB-D(epth) method with the affordable Microsoft Kinect sensor is a typical implementation. Real-time application is a challenge, as it deals with a gigantic amount of point data under the limited hardware resources of a mobile robot. A GPU implementation may help address this problem.
Goal
My project aims to improve the performance of real-time 3D environment reconstruction[5] with Kinect by using CUDA. I will mainly focus on improving the efficiency of the related computer vision algorithms, including registration, feature extraction and matching (SIFT/SURF and RANSAC), and the Iterative Closest Point (ICP) algorithm. My project will build on N. Engelhard's paper[2] and apply GPU acceleration at every appropriate step. Their model takes 2 seconds per frame on an Intel i7 @ 2GHz[2], which is the baseline where I start. I may use some OpenCV[3] and PCL[4] GPU libraries if allowed.
The above 3D reconstruction is based on point clouds. An optional goal is to reconstruct the environment with geometry-based surfaces, which is more challenging but more rewarding in applications such as virtual touch input[6]. As time is limited, I don't know whether I will reach this goal in the end.
6D SLAM with RGB-D Data from Kinect [5]
KinectFusion[6]
Reference
[1] Hugh Durrant-Whyte, Tim Bailey, Simultaneous Localization and Mapping: Part I, 2006
[2] N. Engelhard, F. Endres and etal, Real-time 3D visual SLAM with a hand-held RGB-D camera, 2011
[3] OpenCV GPU documentation 2.3, http://opencv.itseez.com/modules/gpu/doc/gpu.html, 2012
[4] PCL documentation, http://pointclouds.org/documentation/, 2012
[5] N. Engelhard, http://www.youtube.com/watch?v=XejNctt2Fcs, 6D SLAM with RGB-D Data from Kinect
[6] Shahram Izadi and etal, KinectFusion: Realtime 3D Reconstruction and Interaction Using a Moving Depth Camera, pp563, 2011
Saturday, March 10, 2012
Just a start
This is the official blog for my CIS 565 final project. The topic is GPU-accelerated Kinect applications. The project aims to improve the performance of applications such as RGB-D SLAM, gesture recognition and real-time rendering. One specific topic will be chosen in the following days.
During the spring break I looked into some topics on the GPU-based OpenCV and PCL libraries. They seem to be powerful for computer vision algorithms including registration, feature extraction and matching, and the like. More to read before deciding the roadmap of the project.