|Title|Monocular SLAM Supported Object Recognition|
|Author|Sudeep Pillai and John Leonard|
Contributions in my opinion
This paper combines monocular visual SLAM with object recognition. At first glance it may look similar to the SLAM++ paper from Andrew Davison's group; however, the problem the authors address is different. The SLAM component here acts as a pre-processing step that produces a reconstructed point cloud, which is then partitioned using density-based over-segmentation. From this result, the authors reproject the segmented point cloud into different viewpoints and recognise objects in image space. The paper spends considerable time explaining image feature coding strategies, from the traditional BoVW to the more recent VLAD and the even more recent FLAIR. In short, FLAIR enables the user to detect the positions of objects in the image. Sliding-window detection ordinarily faces a scalability problem; a global BoVW representation avoids that problem, but only by sacrificing the ability to localise the object in the image. FLAIR appears to be the solution to exactly this localisation problem. In conclusion, I would categorise this paper as an extension of FLAIR rather than a combination of SLAM with object recognition, and I assume the scalability the authors highlight in the abstract is inherited from FLAIR. The work shows improvements on the UW RGB-D Scene dataset.
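As a side note on the density-based over-segmentation step: the paper does not give its implementation, but a minimal DBSCAN-style clustering sketch conveys the idea of partitioning a reconstructed point cloud by local density. The function, its parameters (`eps`, `min_pts`), and the toy point cloud below are my own illustration, not the authors' code.

```python
import numpy as np

def dbscan(points, eps=0.3, min_pts=3):
    """Naive O(n^2) DBSCAN: returns a cluster id per point, -1 = noise."""
    n = len(points)
    # pairwise distances and epsilon-neighbourhoods (self included)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(d[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue  # already assigned, or not a core point
        # grow a new cluster outward from core point i
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    queue.extend(neighbors[j])
        cluster += 1
    return labels

# toy "point cloud": two well-separated planar patches, 0.2 m spacing
grid = np.stack(np.meshgrid(np.arange(5), np.arange(5)), -1).reshape(-1, 2) * 0.2
patch = np.hstack([grid, np.zeros((25, 1))])
cloud = np.vstack([patch, patch + np.array([5.0, 0.0, 0.0])])
labels = dbscan(cloud, eps=0.3, min_pts=3)  # two segments, no noise
```

Each patch is internally connected at the chosen `eps`, so the two patches come out as two separate segments, mimicking how dense object surfaces would be split from one another in the reconstructed cloud.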
Questions in my opinion
Given only an RGB sensor, I believe this work is a great idea: by doing SLAM, geometric information is taken into consideration, so it is no surprise that it generates better results than traditional methods such as BING. But my questions are: 1) if an RGB-D sensor is used instead, does the SLAM part become redundant? 2) what happens when the SLAM part makes errors in scene reconstruction? Assuming the work were done with an RGB-D sensor, I am keen to know: 1) since the RGB-D structure of every frame is available directly from the sensor, can online detection achieve the same results as this paper's detection-after-SLAM pipeline? 2) how can consecutive frames and their detection results help each other? 3) instead of generating bounding-box detections, could we produce more accurate detections using RGB-D segmentation or superpixel segmentation?
This paper shows that object recognition can be improved by using SLAM to add information from more viewpoints, and that using FLAIR for detection is a great way to address the scalability issue.
Preliminary understanding of the scalability of FLAIR: FLAIR densely samples the image space and, instead of running the feature encoding independently for every possible object candidate window, it passes over the image space only once; each candidate's descriptor is then read off from precomputed tables.
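A minimal sketch of this idea, assuming a FLAIR-style integral table per visual word (the function names and the toy `word_map` are mine, not from the paper): after one pass over the image to build the tables, the BoVW histogram of any candidate box costs only four lookups per visual word, independent of the box's area.

```python
import numpy as np

def integral_word_maps(word_map, n_words):
    """One zero-padded integral image per visual word, built in a single pass."""
    H, W = word_map.shape
    ii = np.zeros((n_words, H + 1, W + 1), dtype=int)
    for w in range(n_words):
        ii[w, 1:, 1:] = np.cumsum(np.cumsum(word_map == w, axis=0), axis=1)
    return ii

def box_bow(ii, y0, x0, y1, x1):
    """BoVW histogram of the box [y0:y1, x0:x1): 4 lookups per word."""
    return ii[:, y1, x1] - ii[:, y0, x1] - ii[:, y1, x0] + ii[:, y0, x0]

# toy 2x2 map of visual-word assignments (one word id per dense sample)
word_map = np.array([[0, 1],
                     [2, 0]])
ii = integral_word_maps(word_map, n_words=3)
full = box_bow(ii, 0, 0, 2, 2)  # whole map -> histogram [2, 1, 1]
top = box_bow(ii, 0, 0, 1, 2)   # top row only -> histogram [1, 1, 0]
```

This is why sliding-window search stops being the bottleneck: the expensive dense encoding happens once per image, and evaluating thousands of candidate boxes reduces to cheap table lookups.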