Holodeck: Language Guided Generation of
3D Embodied AI Environments

1University of Pennsylvania,   2Stanford University,
3University of Washington,   4Allen Institute for Artificial Intelligence
*Equal Technical Contribution
Three professors' offices connected to a long hallway; the professor in office 1 is a fan of Star Wars.
a 1b1b apartment of a researcher who has a cat
an arcade room with a pool table placed in the middle
a spa with large hot tubs and massage tables
a bike shop with various types of bicycles
a garage with a red sedan and a black bicycle
a sculpture museum with diverse statues
(indoor view)
a hunter's cabin adorned with mounted animal heads
(indoor view)
a bohemian style living room
a victorian style living room with a piano
a waiting room
a small hospital room with a wheelchair

Holodeck can generate diverse types of 3D environments (arcade, spa, museum), customize for styles (victorian, bohemian), and understand fine-grained requirements ("has a cat", "fan of Star Wars").

Abstract

3D simulated environments play a critical role in Embodied AI, but their creation requires expertise and extensive manual effort, restricting their diversity and scope. To mitigate this limitation, we present Holodeck, a system that generates 3D environments matching a user-supplied prompt in a fully automated manner. Holodeck can generate diverse scenes, e.g., arcades, spas, and museums, adjust designs for styles, and capture the semantics of complex queries such as "apartment for a researcher with a cat" and "office of a professor who is a fan of Star Wars". Holodeck leverages a large language model (GPT-4) for common sense knowledge about what the scene might look like and uses a large collection of 3D assets from Objaverse to populate the scene with diverse objects. To address the challenge of positioning objects correctly, we prompt GPT-4 to generate spatial relational constraints between objects and then optimize the layout to satisfy those constraints. Our large-scale human evaluation shows that annotators prefer Holodeck over manually designed procedural baselines in residential scenes and that Holodeck can produce high-quality outputs for diverse scene types. We also demonstrate an exciting application of Holodeck in Embodied AI: training agents to navigate novel scenes like music rooms and daycares without human-constructed data, a significant step toward developing general-purpose embodied agents.

Method

Given any text input, Holodeck generates interactive 3D embodied environments by orchestrating a series of specialized modules through multiple rounds of conversation with an LLM (GPT-4).
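The layout step described above (LLM-proposed spatial constraints, then optimization) can be illustrated with a minimal sketch. This is not the actual Holodeck solver; the function names, grid abstraction, and greedy strategy are illustrative assumptions. Given pairwise "near" constraints of the kind GPT-4 might propose, we greedily place each object on the floor-grid cell that satisfies the most constraints with already-placed objects:

```python
import itertools

def solve_layout(objects, constraints, grid=(10, 10), threshold=2):
    """Toy constraint-based layout (hypothetical; illustrative only).

    constraints: list of (obj_a, relation, obj_b) triples, e.g.
    ("sofa", "near", "coffee table"). Each object is greedily placed
    on the free cell satisfying the most 'near' constraints with
    objects placed so far (Manhattan distance <= threshold).
    """
    positions = {}
    for obj in objects:
        best, best_score = None, -1
        for cell in itertools.product(range(grid[0]), range(grid[1])):
            if cell in positions.values():
                continue  # cell already occupied
            score = 0
            for a, rel, b in constraints:
                other = b if a == obj else a if b == obj else None
                if other in positions and rel == "near":
                    dist = (abs(cell[0] - positions[other][0])
                            + abs(cell[1] - positions[other][1]))
                    if dist <= threshold:
                        score += 1
            if score > best_score:
                best, best_score = cell, score
        positions[obj] = best
    return positions

# Example: constraints a language model might emit for a living room.
constraints = [("sofa", "near", "coffee table"), ("tv", "near", "sofa")]
layout = solve_layout(["sofa", "coffee table", "tv"], constraints)
```

A production system would additionally handle object footprints, wall-alignment and facing constraints, and backtracking when the greedy choice leaves a later constraint unsatisfiable.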

Customizability

Holodeck can customize floor plans, materials, objects, etc. to match the user's input.

Floor Plan Customizability
Material Customizability
Object Customizability
Door & Window Customizability

Human Evaluation

Our large-scale user study with 680 participants demonstrates that Holodeck significantly surpasses ProcTHOR in generating residential scenes and achieves high-quality outputs for various scene types.

Humans prefer Holodeck over ProcTHOR in generating residential scenes.


(*iTHOR is designed by human experts.)


Holodeck can generate satisfactory outputs (better than hard-coded rules) for the diverse scene types in the MIT Scenes dataset.


(The three horizontal lines represent the average score of each system on four types of residential scenes: bedroom, living room, bathroom and kitchen.)

Object Navigation in Novel Environments

Holodeck can aid embodied agents in adapting to new scene types and objects during object navigation tasks.

We introduce NoveltyTHOR, an artist-designed benchmark to evaluate embodied agents in diverse scenes.

Daycare_01
Daycare_02
Office_01
Office_02
Arcade_01
Arcade_02
Gym_01
Gym_02
Music_Room_01
Music_Room_02

Results. Agents fine-tuned on Holodeck-generated scenes show better zero-shot generalization on NoveltyTHOR.

(ProcTHOR + Objaverse is a strong baseline that augments ProcTHOR with Objaverse objects selected by Holodeck.)

BibTeX

@article{yang2023holodeck,
      title={Holodeck: Language Guided Generation of 3D Embodied AI Environments}, 
      author={Yue Yang and Fan-Yun Sun and Luca Weihs and Eli VanderBilt and Alvaro Herrasti and Winson Han and Jiajun Wu and Nick Haber and Ranjay Krishna and Lingjie Liu and Chris Callison-Burch and Mark Yatskar and Aniruddha Kembhavi and Christopher Clark},
      journal={arXiv preprint arXiv:2312.09067},
      year={2023}
}