CMU and Stanford Researchers Develop OBJECTFOLDER 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Perceiving and manipulating a wide range of objects is part of our daily lives. An alarm clock looks round and shiny, cutlery jingles when struck, and a fork's edge feels sharp to the touch. Each object has distinct physical characteristics, including its 3D shape, appearance, and material type, which give rise to its characteristic sensory signatures across modalities.

Computer vision has traditionally focused on recognizing and locating objects in static images, so objects are frequently modeled in 2D. Previous work on shape modeling supplies only geometric 3D CAD models of objects, with low-quality visual textures rather than realistic ones. Additionally, the majority of work focuses on a single modality, usually vision, and does not cover the full range of an object's physical attributes. As a result, past modeling of real-world objects has been quite limited and unrealistic.

Researchers from Stanford University and Carnegie Mellon University recently introduced OBJECTFOLDER 2.0, a large-scale dataset of implicitly represented multisensory replicas of real objects. It contains 1,000 high-quality 3D object models gathered from online repositories. Compared to OBJECTFOLDER 1.0, which renders slowly and has lower multisensory simulation quality, version 2.0 improves the acoustic and tactile simulation pipelines to render significantly more realistic multisensory data.

The team's goal is to build a massive dataset of realistic, multimodal 3D object models so that learning with these virtualized objects generalizes to their physical counterparts. The researchers used existing high-quality scans of real objects to extract physical characteristics such as visual textures, material compositions, and 3D shapes. They then simulated each object's visual, auditory, and tactile data according to its intrinsic properties and encoded the simulated multisensory data with an implicit neural representation network. If the simulated sensory data is accurate enough, models trained on these virtualized objects can then be applied to real-world tasks involving the same objects.
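The core idea of an implicit neural representation is that, instead of storing meshes and texture maps explicitly, a small network maps a query coordinate to a sensory value and is evaluated on demand. The toy sketch below illustrates that pattern with a NumPy MLP mapping a 3D point to a 3-dimensional output (e.g., an RGB color); the class name, architecture, and dimensions are hypothetical and are not the network from the OBJECTFOLDER 2.0 paper.

```python
import numpy as np

class ImplicitSensoryField:
    """Toy coordinate-based MLP: maps a 3D query point to a sensory
    value (e.g., an RGB color). A minimal illustration of the
    implicit-representation idea, not the paper's architecture."""

    def __init__(self, in_dim=3, hidden=64, out_dim=3, seed=0):
        rng = np.random.default_rng(seed)
        # Randomly initialized weights; a real field would be trained
        # to reproduce the simulated multisensory data of one object.
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def query(self, xyz):
        """Evaluate the field at one or more 3D points."""
        xyz = np.atleast_2d(xyz)                       # (N, 3)
        h = np.maximum(xyz @ self.W1 + self.b1, 0.0)   # ReLU hidden layer
        return h @ self.W2 + self.b2                   # (N, out_dim)

field = ImplicitSensoryField()
rgb = field.query([0.1, -0.2, 0.05])  # one point -> shape (1, 3)
```

Because the object is stored as network weights rather than dense geometry and texture data, rendering any modality reduces to batched forward passes over query coordinates, which is what makes fast, resolution-independent querying possible.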

In addition, the researchers developed a new implicit neural representation network that renders tactile, auditory, and visual sensory data in real time with high rendering quality. The team was able to transfer models learned on the virtualized objects to three challenging real-world tasks: contact localization, shape reconstruction, and object scale estimation. OBJECTFOLDER 2.0 enables many applications, such as reinforcement learning of manipulation policies, multisensory learning with vision, audio, and touch, and robotic interaction with a variety of real objects on multiple robotic platforms.


The goal of OBJECTFOLDER 2.0 is to advance multimodal learning in computer vision and robotics by delivering a 1,000-object dataset in the form of implicit neural representations. The dataset is ten times larger and renders orders of magnitude faster than previous efforts, and the researchers also dramatically increased the quality and realism of the multisensory data. Across three benchmark tasks, the researchers demonstrated that models trained on the virtualized objects transferred successfully to their real-world counterparts. The team is excited about the research OBJECTFOLDER 2.0 will make possible and believes the dataset offers a viable path for multimodal object-centric learning in computer vision and robotics.

This article is written as a summary by Marktechpost Staff based on the research paper 'OBJECTFOLDER 2.0: A Multisensory Object Dataset for Sim2Real Transfer'. All credit for this research goes to the researchers on this project. Check out the paper, GitHub link, and project page.

