When we first designed the perception module for the Amazon Picking Challenge (APC), we planned to reuse our existing RGB-D object recognition and pose estimation pipeline. However, as soon as we saw the actual items we needed to recognize in the competition, we realized that the traditional keypoint-based method wouldn't work, since most of the items are not textured enough for stable keypoint detection and matching. We were also concerned about the RGB resolution of the Kinect sensor, and the distortion in the Kinect V2 was so severe that the corners of the shelf appeared bent, so we gave it up. Speaking of the Kinect V2, at the competition the MIT team did really good work with it: they mounted a pair of Kinect V2 sensors in the formation below, which avoided the distortion problem in the corners. It was a brilliant idea.
To recognize untextured or less-textured objects, there are two different approaches: 1) using RGB-D features and descriptors, and 2) using more advanced machine learning techniques. Unfortunately, in my experience, there is currently no stable RGB-D keypoint detector or descriptor available for object recognition, so I decided to turn to the machine learning community for help. In the end, we implemented two different methods: 1) kernel descriptors and 2) EBlearn. We dug into the details of both methods, theoretically and practically.
Our recognition pipeline showed very good object detection performance for both textured and less-textured items, even in very cluttered environments. In my view, the remaining problem in this kind of robotic perception is pose estimation (or state estimation) for manipulation. With machine learning methods, once we have the detected ROI, the traditional way to compute the relative pose is ICP, but it is not good enough, especially when the ROI is inaccurate. And since I cannot find a good depth keypoint detector and descriptor, computing the relative pose via feature matching and SVD also seems difficult. My future work will cover RGB-D descriptors for less-textured object recognition and pose estimation.
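
For readers unfamiliar with the "feature matching and SVD" step mentioned above, here is a minimal sketch of the standard closed-form rigid alignment (often called the Kabsch method). It is not the code from our pipeline. It assumes the hard part is already solved, namely that we have matched 3D points between the object model and the scene. The function name `estimate_rigid_transform` and its arguments are hypothetical, chosen only for illustration.

```python
import numpy as np


def estimate_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst ~= R @ src + t.

    src, dst: (N, 3) arrays of matched 3D points (row i of src corresponds
    to row i of dst). The correspondences are assumed to be given, e.g. by
    a descriptor matcher; this sketch does not compute them.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)

    # Center both point sets on their centroids.
    src_centroid = src.mean(axis=0)
    dst_centroid = dst.mean(axis=0)
    src_centered = src - src_centroid
    dst_centered = dst - dst_centroid

    # 3x3 cross-covariance matrix and its SVD.
    H = src_centered.T @ dst_centered
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation
    # (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Translation that maps the source centroid onto the target centroid.
    t = dst_centroid - R @ src_centroid
    return R, t
```

The same closed-form step is what point-to-point ICP repeats in every iteration after re-estimating nearest-neighbor correspondences, which is one reason both approaches degrade when the correspondences (or the ROI they are drawn from) are poor.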