Conference on Robot Learning (CoRL) 2024
The T2SQNet-based method sequentially grasps the objects without re-recognition, while avoiding collisions with the surrounding objects and the environment.
The T2SQNet-based method also successfully rearranges the surrounding objects and finally retrieves an initially non-graspable target object (e.g., a wine glass).
Recognizing and manipulating transparent tableware from partial-view RGB image observations is made challenging by the difficulty of obtaining reliable depth measurements of transparent objects. In this paper, we present the Transparent Tableware SuperQuadric Network (T2SQNet), a neural network model that leverages a family of newly extended deformable superquadrics to produce low-dimensional, instance-wise, and accurate 3D geometric representations of transparent objects from partial views. As a byproduct and contribution of independent interest, we also present TablewareNet, a publicly available toolset of seven parametrized shapes based on our extended deformable superquadrics, which can be used to generate new datasets of tableware objects of diverse shapes and sizes. Experiments with T2SQNet trained on TablewareNet show that T2SQNet outperforms existing methods in recognizing transparent objects, in some cases by significant margins, and can be effectively used in robotic applications like decluttering and target retrieval.
Coming soon
Superquadrics, parametrized by only a few parameters, can represent a relatively wide range of geometric shapes. We employ two kinds of superquadrics: superellipsoids, which have previously been used for object manipulation, and superparaboloids, which we newly introduce. Both are implicit surfaces defined by the following functions, with size parameters \((a_1, a_2, a_3) \in \mathbb{R}_+^3\) and shape parameters \((e_1, e_2) \in \mathbb{R}_+^2\): for \(\textbf{x} = (x, y, z)\), $$ \begin{equation*} \overbrace{f_{se}(\textbf{x})=\left(\left|\frac{x}{a_1}\right|^{\frac{2}{e_2}} + \left|\frac{y}{a_2}\right|^{\frac{2}{e_2}}\right)^{\frac{e_2}{e_1}} + \left|\frac{z}{a_3}\right|^{\frac{2}{e_1}} = 1 }^{\text{Superellipsoid}}, \:\:\:\:\:\: \overbrace{f_{sp}(\textbf{x})= \left(\left|\frac{x}{a_1}\right|^{\frac{2}{e_2}} + \left|\frac{y}{a_2}\right|^{\frac{2}{e_2}}\right)^{\frac{e_2}{e_1}} - \left(\frac{z}{a_3}\right) = 1}^{\text{Superparaboloid}} \label{eq:sq} \end{equation*} $$ Deformable superquadrics extend superquadrics by incorporating global deformations, including tapering, bending, and shearing transformations. By adjusting the parameters, a wide variety of surfaces can be represented, as shown below.
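As an illustrative sketch (not part of the released code), the two implicit functions above can be evaluated directly with NumPy; the function names are ours, and the parameters follow the equation's notation.

```python
import numpy as np

def superellipsoid(x, y, z, a1, a2, a3, e1, e2):
    """Implicit value f_se(x); the surface is the level set f_se = 1."""
    r = (np.abs(x / a1) ** (2 / e2) + np.abs(y / a2) ** (2 / e2)) ** (e2 / e1)
    return r + np.abs(z / a3) ** (2 / e1)

def superparaboloid(x, y, z, a1, a2, a3, e1, e2):
    """Implicit value f_sp(x); the surface is the level set f_sp = 1."""
    r = (np.abs(x / a1) ** (2 / e2) + np.abs(y / a2) ** (2 / e2)) ** (e2 / e1)
    return r - z / a3

# With e1 = e2 = 1 and a1 = a2 = a3 = 1, the superellipsoid reduces to a
# unit sphere, so the point (1, 0, 0) satisfies f_se = 1.
```

Points with implicit value below 1 lie inside the surface and points above 1 lie outside, which is what makes these representations convenient for the collision checks described later on this page.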
We combine deformable superquadrics to define templates representing seven types of tableware: wine glasses, bottles, beer bottles, bowls, dishes, handleless cups, and mugs. By adjusting parameters, we can generate diverse 3D tableware meshes. Spawning these meshes in a user-defined environment (e.g., table or shelf) within a physics simulator allows us to generate cluttered scenes. Using Blender, a photorealistic renderer, with transparent textures, we obtain RGB images of the scenes from arbitrary camera poses.
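The per-template parameter sampling can be sketched as follows; the parameter names and ranges below are purely illustrative and do not reflect the actual TablewareNet values.

```python
import random

# Illustrative parameter ranges for a "bowl" template; the actual
# TablewareNet parameterization and ranges may differ.
BOWL_RANGES = {
    "a1": (0.04, 0.10),    # x half-extent (m)
    "a2": (0.04, 0.10),    # y half-extent (m)
    "a3": (0.02, 0.06),    # height-related size (m)
    "e1": (0.5, 1.5),      # shape exponent
    "e2": (0.5, 1.5),      # shape exponent
    "taper": (-0.3, 0.3),  # global tapering deformation
}

def sample_template_params(ranges, rng=random):
    """Draw one random parameter set uniformly from the given ranges."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in ranges.items()}

params = sample_template_params(BOWL_RANGES)
```

Each sampled parameter set yields one mesh; repeating this per object and spawning the meshes in the simulator produces the cluttered scenes described above.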
Overall, our method consists of four steps: (1) mask prediction in 2D images, (2) prediction of 3D bounding boxes, (3) computation of a smoothed visual hull through voxel carving, and (4) prediction of tableware parameters (i.e., a set of superquadric parameters). We apply these modules sequentially during inference, which can lead to the accumulation of prediction errors. To address this, we develop techniques to train each module accurately and robustly against noise and sim-to-real gaps.
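Step (3) can be illustrated with a minimal voxel-carving sketch: a voxel is kept only if it projects inside the predicted mask in every view. All names here are hypothetical, and the actual pipeline additionally smooths the resulting hull.

```python
import numpy as np

def carve_visual_hull(masks, projections, grid_points):
    """Keep only voxels whose projection falls inside every view's mask.

    masks:        list of HxW boolean arrays (predicted instance masks)
    projections:  list of functions mapping (N, 3) points to (N, 2) pixel (u, v)
    grid_points:  (N, 3) voxel-center coordinates
    Returns an (N,) boolean array marking voxels inside the visual hull.
    """
    keep = np.ones(len(grid_points), dtype=bool)
    for mask, project in zip(masks, projections):
        uv = np.round(project(grid_points)).astype(int)
        h, w = mask.shape
        # Voxels projecting outside the image cannot be confirmed: carve them.
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        keep &= inside
        # Carve voxels that land on background pixels (mask is indexed [row, col]).
        keep[inside] &= mask[uv[inside, 1], uv[inside, 0]]
    return keep
```

Running this over the masks from step (1), restricted to the 3D bounding box from step (2), yields the visual hull that the final parameter-prediction module consumes.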
The figure below shows the ground-truth shapes of the transparent TRansPose objects alongside the implicit surfaces inferred by T2SQNet. Although the nature of superquadric surfaces makes it challenging to capture fine surface details, such as the curvature of a water bottle, T2SQNet recovers the overall shapes to a considerable extent.
T2SQNet offers several practical advantages for downstream object manipulation tasks. For example, it allows for the easy design of an effective 6-DoF grasp sampler based on deformable superquadric representations, enables rapid collision checks through implicit function representations of deformable superquadric surfaces, and facilitates target-driven manipulation with instance-wise object recognition.
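For instance, a collision check against a (non-deformed) superellipsoid reduces to evaluating its implicit function at the gripper's query points: any point with implicit value below 1 lies inside the surface. This sketch is our own illustration, not the paper's implementation.

```python
import numpy as np

def in_collision(points_obj_frame, a, e, margin=0.0):
    """Check whether any query point lies inside a superellipsoid.

    points_obj_frame: (N, 3) points expressed in the object frame
    a: size parameters (a1, a2, a3); e: shape parameters (e1, e2)
    A point is inside when the implicit value falls below 1 - margin.
    """
    x, y, z = points_obj_frame.T
    a1, a2, a3 = a
    e1, e2 = e
    f = (np.abs(x / a1) ** (2 / e2) + np.abs(y / a2) ** (2 / e2)) ** (e2 / e1) \
        + np.abs(z / a3) ** (2 / e1)
    return bool(np.any(f < 1.0 - margin))
```

Because the check is a closed-form, vectorized evaluation rather than a mesh-mesh intersection test, it can be run on many candidate grasps per object at little cost.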
We demonstrate the effectiveness of our model, T2SQNet, on two object manipulation tasks: (i) sequential decluttering, which involves sequential grasping in a cluttered environment, and (ii) target retrieval, which involves object rearrangement planning to retrieve an initially non-graspable target object. The target object is specified by a tableware class name (e.g., wine glass). Real-world manipulation videos can be found below.
@inproceedings{kim2024t2sqnet,
title={T$^2$SQNet: A Recognition Model for Manipulating Partially Observed Transparent Tableware Objects},
author={Kim, Young Hun and Kim, Seungyeon and Lee, Yonghyeon and Park, Frank C},
booktitle={8th Annual Conference on Robot Learning},
year={2024}
}