Curieux.JY
  • JungYeon Lee

📃 RoboVerse Review

simulation
benchmark
dataset
sim2real
il
rl
manipulation
cross-embodiment
world-model
humanoid
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
Published

June 29, 2026

  • Paper Link

  • Code Link

  • Project

  • Haoran Geng, Feishi Wang, Songlin Wei, Yuyang Li, Bangjun Wang, et al. (UC Berkeley, PKU, USC, UMich, UIUC, Stanford, CMU, UCLA, BIGAI)

  • Preprint (arXiv:2504.18904v1), 2025

  1. 💡 Every simulator has its own formats, APIs, and assets, which fragments synthetic data and benchmarks. RoboVerse addresses this by building a platform, a dataset, and a benchmark together on top of a single simulator-agnostic abstraction.
  2. ⚙️ The core infrastructure, MetaSim, has three layers (universal config MetaConfig → aligned backend Handler → Gym wrapper) and binds six simulators (Isaac Sim, Isaac Gym, MuJoCo, Genesis, SAPIEN, PyBullet) behind one interface. This enables cross-simulator integration, hybrid simulation, and cross-embodiment retargeting. On top of it, 14 benchmarks are migrated into a dataset of 276 task categories, 510.5k trajectories, 5.5k assets, and 50M+ transitions, together with a unified IL/RL benchmark.
  3. 🎯 The IL benchmark validates data reliability with average success rates of 48.6% for Diffusion Policy and 50.0% for ACT. The four-level generalization protocol (task → env → camera → lighting) measures progressive performance degradation, and OpenVLA fine-tuned on RoboVerse data reaches 50–80% on real-world grasping with no additional training (direct sim-to-real).

🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

NLP and CV grew explosively on "large-scale data + standard benchmarks", but robotics is blocked on both axes. Real-world data collection is expensive and hardware-dependent, and real-world benchmarks change lighting, layout, and background every time, so reproducible, fair comparison is nearly impossible. Simulation is the alternative, and this is where RoboVerse's motivation starts: simulators differ so much in internal structure, external interfaces, and asset formats that moving data, tasks, or models from one simulator to another is labor-intensive, and the synthetic data ecosystem fragments as a result. RoboVerse targets this fragmentation head-on and proposes a three-part package: a platform (MetaSim) that unifies scattered simulators under one standard format and a single infrastructure, a large-scale synthetic dataset built on top of it, and a unified benchmark.


Overview (Fig. 1): RoboVerse consists of a scalable simulation platform, a large-scale synthetic dataset, and a unified benchmark. A unified protocol lets new tasks and demonstrations be integrated seamlessly, and the dataset is built by migrating public datasets.

Core methodology:

RoboVerse is a systems and dataset paper rather than an algorithm paper. Its core lies not in a single equation but in the design of an abstraction. An arbitrary simulation scenario is decomposed into five elements (agents, objects, tasks, sensors, physics) and expressed as a simulator-independent nested dataclass, MetaConfig. Given a scenario config c, the Handler of each simulator backend translates it into that simulator's own commands:

\text{Sim}_b = \mathcal{H}_b(c),\qquad b \in \{\text{IsaacSim},\text{IsaacGym},\text{MuJoCo},\text{Genesis},\text{SAPIEN},\text{PyBullet}\}.

The Handler aligns the whole simulation lifecycle behind common methods (launch(), get_states(), set_states(), …), and the Gym wrapper on top of it provides step()/reset()/render()/close(). This alignment yields three capabilities: ① cross-simulator integration (tasks and trajectories from one simulator are used in another, i.e. sim-to-sim), ② hybrid simulation (the physics engine of one simulator combined with the renderer of another), and ③ cross-embodiment transfer (trajectories are reused across parallel-gripper robots by retargeting end-effector poses). Diffusion Policy, the IL benchmark baseline, predicts noise with standard conditional denoising: \widehat{\epsilon^{k}}=\epsilon_{\theta}(a^{k},s,k).
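The mapping Sim_b = H_b(c) above can be pictured as a registry lookup from a backend name to a handler class. The following is only an illustrative sketch: `ScenarioConfig`, `HANDLERS`, `make_sim`, and the toy handler classes are hypothetical names chosen for this example and are not the MetaSim API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioConfig:
    """Hypothetical stand-in for a simulator-agnostic scenario config c."""
    robot: str
    task: str


class ToyHandler:
    """Hypothetical backend handler H_b; a real one would wrap a simulator."""
    backend = "toy"

    def __init__(self, config: ScenarioConfig) -> None:
        self.config = config

    def describe(self) -> str:
        return f"{self.backend}:{self.config.robot}:{self.config.task}"


class MuJoCoLikeHandler(ToyHandler):
    backend = "mujoco"


class PyBulletLikeHandler(ToyHandler):
    backend = "pybullet"


# Registry playing the role of the map b -> H_b in the equation.
HANDLERS = {cls.backend: cls for cls in (MuJoCoLikeHandler, PyBulletLikeHandler)}


def make_sim(backend: str, config: ScenarioConfig) -> ToyHandler:
    """Build Sim_b = H_b(c) for the requested backend name."""
    try:
        return HANDLERS[backend](config)
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None


# The same config object is handed to every backend unchanged.
c = ScenarioConfig(robot="franka", task="pick_cube")
sims = [make_sim(b, c) for b in ("mujoco", "pybullet")]
```

The point of the sketch is that the scenario description never changes; only the lookup key does.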

Key results (only figures confirmed in the paper):

  • Dataset scale: migrating 14 manipulation source benchmarks → 276 task categories, 510.5k trajectories, 5.5k assets, and 50M+ state transitions for policy learning (Tab. I).
  • IL benchmark (Tab. II): average success rate over 6 representative tasks is 48.6% for Diffusion Policy (78M) and 50.0% for ACT (84M). Per-task variance is large (e.g. on the contact-rich robosuite NutAssembly, DP 7.1% and ACT 0.0%).
  • Four-level generalization (Tab. III): on PickCube, Diffusion Policy drops from 52.7% at Level 0 to 11.1% at Level 1 and 0.0% at Levels 2 and 3, quantifying how fragile current policies are under camera and lighting changes.
  • Direct sim-to-real (Tab. V/VIII): OpenVLA fine-tuned on RoboVerse data, with no additional training, scores 7/10, 8/10, and 5/10 (50–80%) on grasping unseen objects; Octo scores 5/10, 3/10, and 6/10 (30–60%).
  • Trajectory augmentation (Fig. 10): growing 50 source demonstrations into 200/1000/3000 generated demonstrations consistently raises Diffusion Policy success rates.

Conclusion: RoboVerse answers not "how do we learn a better policy" but "how do we unify, scale, and standardize simulation assets". With MetaSim's single simulator-agnostic abstraction it gathers scattered benchmarks into one format, and extensive experiments show that large-scale data, a unified benchmark, and a sim-to-real pipeline can all be run consistently on top of it.

🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

In one line

The real bottleneck that keeps robot learning from scaling like NLP and CV is not the model but fragmented infrastructure: every simulator uses a different format, so data and tasks are not reused. RoboVerse stitches this fragmentation together with a simulator-agnostic abstraction (MetaSim), then stacks a unified dataset and benchmark on top to propose a shared foundation for "simulation-assisted robot learning".

Why it is hard: bottlenecks on both the data and the benchmark side

The authors' starting point is simple. Large-scale data and standard benchmarks lifted NLP and CV, but not robotics, because both available paths are blocked.

Limits of the real-world path. Collecting real-robot demonstrations is time- and resource-intensive, and the collected data is tied to specific hardware and modalities, so it transfers poorly to new scenarios. More fundamentally, real-world benchmarks cannot be reproduced: object placement changes between rollouts, natural light fluctuates, and backgrounds change. Fair comparison is therefore hard and development iteration is expensive.

Limits of the simulation path. Simulators are an attractive alternative that offers efficient computation, synthetic assets, and reproducible setups, but there are two barriers. (1) Simulator design is complex and many platforms are immature, so building data requires expertise. (2) Internal architectures and external interfaces vary widely between simulators, so moving data, models, and workflows from one simulator to another is labor-intensive. The result is a fragmented ecosystem: existing synthetic datasets and benchmarks are hard to reuse, and large-scale data use is blocked.

RoboVerse's thesis is that this fragmentation is itself the first-order problem to solve. Before building better policies, the scattered simulators must be bound into one standard format and a single infrastructure.

Method in detail: a three-part framework

RoboVerse consists of three parts: (1) a simulation platform, (2) a large-scale, high-quality dataset, and (3) a unified benchmark. At its heart is the MetaSim infrastructure.


Overall composition (Fig. 2): simulation platform, large-scale dataset, and unified benchmark. MetaSim is the core of the platform, and both dataset generation and benchmark construction go through it.

MetaSim's three-layer architecture

MetaSim is a high-level interface that sits above specific simulator implementations. It operates in three layers.

(1) Universal configuration system: MetaConfig. A typical simulation environment consists of five elements: agents (who acts), objects (what the environment looks like), tasks (what to do: instruction, success metric, reward), sensors (how to perceive and measure), and physics (the governing physical laws). Ideally these should be simulator-agnostic. RoboVerse abstracts them as a nested dataclass, MetaConfig. Different backends interpret this config to build the corresponding simulation, and they can also optionally accept simulator-specific hyperparameters (such as solver type) so that each simulator's unique features remain usable.


MetaConfig (Fig. 4): a nested dataclass that abstracts the core elements of an arbitrary simulation environment (agents, objects, task, sensors, physics) in a simulator-agnostic way. task expands into TaskConfig (instructions, success_metrics, reward_funcs), and physics expands into PhysicsConfig (gravity, collision, friction).
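A minimal sketch of what such a nested dataclass could look like is given below. The class and field names (MetaConfig, TaskConfig, PhysicsConfig, instructions, success_metrics, reward_funcs, gravity, collision, friction) come from the caption above; the field types, the default values, and the `sim_params` field for simulator-specific hyperparameters are assumptions made for illustration, not the library's actual definitions.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskConfig:
    instructions: List[str] = field(default_factory=list)
    success_metrics: List[str] = field(default_factory=list)  # assumed: metric names
    reward_funcs: List[str] = field(default_factory=list)     # assumed: function names


@dataclass
class PhysicsConfig:
    gravity: Tuple[float, float, float] = (0.0, 0.0, -9.81)  # assumed default
    collision: bool = True                                    # assumed type
    friction: float = 1.0                                     # assumed type


@dataclass
class MetaConfig:
    agents: List[str] = field(default_factory=list)
    objects: List[str] = field(default_factory=list)
    task: TaskConfig = field(default_factory=TaskConfig)
    sensors: List[str] = field(default_factory=list)
    physics: PhysicsConfig = field(default_factory=PhysicsConfig)
    # Hypothetical slot for optional simulator-specific hyperparameters
    # (e.g. solver type), keyed by backend name.
    sim_params: Dict[str, Dict[str, str]] = field(default_factory=dict)


cfg = MetaConfig(
    agents=["franka"],
    objects=["cube"],
    task=TaskConfig(instructions=["pick up the cube"], success_metrics=["cube_lifted"]),
    sensors=["rgb_camera"],
    sim_params={"mujoco": {"solver": "Newton"}},
)

# A plain nested dict is one convenient form to hand to a backend or to serialize.
flat = asdict(cfg)
```

Keeping backend-specific options in a separate keyed field is one way to leave the five shared elements untouched while still letting a backend read its own settings.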

(2) Aligned simulator backends: Handler. Implementations differ between simulators, but routine operations (scene initialization, object loading, physics stepping, observation retrieval, time management, success checking) follow similar patterns. MetaSim aligns these behind the common interface of a Handler class. Each simulator has its own Handler instance that implements common methods covering the whole lifecycle, such as launch(), get_states(), and set_states().

(3) Gym environment wrapper. It wraps a Handler into a standard learning environment (Gym). step()/reset()/render()/close() are implemented by calling Handler methods internally, so the result plugs directly into the most widely used paradigm in RL and robot learning.
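The relationship between layers (2) and (3) can be sketched as follows. The method names launch(), get_states(), set_states(), step(), reset(), render(), and close() are taken from the text; everything else is an assumption for illustration, including the `simulate` method name, the 1-D toy dynamics, and the classic four-tuple return of step(). The sketch does not depend on the gym package.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple

State = Dict[str, float]


class Handler(ABC):
    """Common lifecycle interface that each backend would implement."""

    @abstractmethod
    def launch(self) -> None: ...

    @abstractmethod
    def get_states(self) -> State: ...

    @abstractmethod
    def set_states(self, states: State) -> None: ...

    @abstractmethod
    def simulate(self, action: float) -> None:
        """Hypothetical physics-step method; the name is an assumption."""

    @abstractmethod
    def close(self) -> None: ...


class ToyHandler(Handler):
    """A 1-D point standing in for a real simulator backend."""

    def __init__(self) -> None:
        self._x = 0.0
        self.launched = False

    def launch(self) -> None:
        self.launched = True

    def get_states(self) -> State:
        return {"x": self._x}

    def set_states(self, states: State) -> None:
        self._x = states["x"]

    def simulate(self, action: float) -> None:
        self._x += action

    def close(self) -> None:
        self.launched = False


class GymStyleEnv:
    """Gym-style wrapper whose methods only delegate to Handler methods."""

    def __init__(self, handler: Handler, goal: float = 1.0, max_steps: int = 10) -> None:
        self.handler, self.goal, self.max_steps = handler, goal, max_steps
        self.steps = 0
        handler.launch()

    def reset(self) -> State:
        self.steps = 0
        self.handler.set_states({"x": 0.0})
        return self.handler.get_states()

    def step(self, action: float) -> Tuple[State, float, bool, dict]:
        self.handler.simulate(action)
        self.steps += 1
        obs = self.handler.get_states()
        success = abs(obs["x"] - self.goal) < 1e-6
        done = success or self.steps >= self.max_steps
        return obs, float(success), done, {"success": success}

    def render(self) -> str:
        return f"x={self.handler.get_states()['x']:.2f}"

    def close(self) -> None:
        self.handler.close()


env = GymStyleEnv(ToyHandler())
obs = env.reset()
obs, reward, done, info = env.step(0.5)
obs, reward, done, info = env.step(0.5)
```

Because the wrapper touches the backend only through the Handler interface, swapping `ToyHandler` for another Handler subclass would leave the training loop unchanged.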


MetaSim's three layers (Fig. 3): universal configuration system + aligned backends (Isaac Lab, Isaac Gym, MuJoCo, SAPIEN, Genesis, Bullet, CoppeliaSim) + Gym wrapper. This abstraction enables the three capabilities of cross-simulator integration, hybrid simulation, and cross-embodiment transfer, and the unified benchmark and high-quality dataset are built on top of it.

Three core capabilities

Three capabilities follow naturally from this alignment. ① Cross-Simulator Integration: tasks and trajectories from one simulator are used as-is in another. For example, Meta-World tasks can be used for fast parallel training in Isaac Gym, and the resulting trajectories rendered in Isaac Sim (sim-to-sim). ② Hybrid Simulation: a strong renderer from one simulator (e.g. Isaac Sim) is combined with an accurate physics engine from another (e.g. MuJoCo) through a single command to generate high-quality data. ③ Cross-Embodiment Transfer: end-effector poses are retargeted so that trajectories can be reused across different parallel-gripper robot morphologies, unifying heterogeneous robot data in a single format.
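Hybrid simulation, as described above, amounts to stepping physics in one backend and copying the resulting state into another backend that only renders. The sketch below shows that loop with two toy classes; `PhysicsBackend`, `RenderBackend`, and `hybrid_rollout` are hypothetical names, and the falling-ball dynamics merely stand in for a real physics engine.

```python
from typing import Dict, List

State = Dict[str, float]


class PhysicsBackend:
    """Toy physics engine: a ball falling under gravity (explicit Euler)."""

    def __init__(self, dt: float = 0.1) -> None:
        self.dt = dt
        self.state: State = {"z": 1.0, "vz": 0.0}

    def step(self) -> None:
        self.state["vz"] -= 9.81 * self.dt
        self.state["z"] = max(0.0, self.state["z"] + self.state["vz"] * self.dt)

    def get_states(self) -> State:
        return dict(self.state)


class RenderBackend:
    """Toy renderer: never steps physics, only draws the state it is given."""

    def __init__(self) -> None:
        self.state: State = {}

    def set_states(self, states: State) -> None:
        self.state = dict(states)

    def render(self) -> str:
        return f"ball at z={self.state['z']:.3f}"


def hybrid_rollout(physics: PhysicsBackend, renderer: RenderBackend, n_steps: int) -> List[str]:
    """Step physics in one backend and render each state in the other."""
    frames = []
    for _ in range(n_steps):
        physics.step()
        renderer.set_states(physics.get_states())  # state hand-off between backends
        frames.append(renderer.render())
    return frames


frames = hybrid_rollout(PhysicsBackend(), RenderBackend(), n_steps=3)
```

The only coupling between the two backends is the state dictionary, which is why a shared get_states()/set_states() contract is what makes such a combination possible.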

Dataset: migration as the main source, with generation and augmentation on top

The main source of data is migration from existing simulation environments. Where direct migration is difficult, complete trajectories are produced with motion planning or RL rollouts, and then strictly filtered against a matching success checker. On the manipulation side, 14 benchmarks are currently integrated: ManiSkill, RLBench, CALVIN, Meta-World, robosuite, MimicGen, GAPartNet, Open6DOR, ARNOLD, LIBERO, SIMPLER, GraspNet, GarmentLab, and UniDoorManip. For navigation, R2R (10k episodes) and RxR (20k episodes) from VLN-CE are combined with MatterPort3D (90 scenes), and for locomotion and whole-body control, HumanoidBench, Humanoid-X, and SkillBlender are brought in.

Beyond migration, more data is collected along three routes: (a) teleoperation (keyboard, joystick, smartphone app, motion capture, and VR, controlling arms, dexterous hands, and bimanual setups), (b) AI-assisted task generation (large generative models learn spatial and semantic constraints to lay out physically plausible scenes, followed by two-stage filtering with format validation and a feasibility check), and (c) real-to-sim (mobile multi-view capture → COLMAP and Gaussian Splatting → physical property inference with a VLM → TSDF mesh → URDF construction). Finally, trajectory augmentation (object-centric subtask decomposition based on the MimicGen framework) and domain randomization increase diversity and scale. Domain randomization is done in the Isaac Sim handler in four categories: table/floor/wall materials (300 for tables, about 150 each for walls and floors), lighting (distant light + cylinder light array), camera pose (59 candidates), and reflection properties (roughness, specular, metallic).
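The four randomization categories listed above can be pictured as independent draws per scene. This is a sketch under stated assumptions: the pool sizes (300, about 150, 59) come from the text, while the `RandomizedScene` structure, the integer indices standing in for materials and camera poses, and the value ranges for light intensity and reflection properties are hypothetical.

```python
import random
from dataclasses import dataclass

N_TABLE_MATERIALS = 300  # from the text
N_WALL_MATERIALS = 150   # "about 150" in the text
N_FLOOR_MATERIALS = 150  # "about 150" in the text
N_CAMERA_POSES = 59      # from the text


@dataclass(frozen=True)
class RandomizedScene:
    table_material: int
    wall_material: int
    floor_material: int
    light_type: str
    light_intensity: float
    camera_pose: int
    roughness: float
    specular: float
    metallic: float


def sample_scene(rng: random.Random) -> RandomizedScene:
    """Draw one scene variant; each category is sampled independently."""
    return RandomizedScene(
        table_material=rng.randrange(N_TABLE_MATERIALS),
        wall_material=rng.randrange(N_WALL_MATERIALS),
        floor_material=rng.randrange(N_FLOOR_MATERIALS),
        light_type=rng.choice(["distant", "cylinder_array"]),
        light_intensity=rng.uniform(0.5, 1.5),  # hypothetical range
        camera_pose=rng.randrange(N_CAMERA_POSES),
        roughness=rng.random(),                  # hypothetical range [0, 1)
        specular=rng.random(),
        metallic=rng.random(),
    )


# A seeded generator keeps the randomized evaluation set reproducible.
rng = random.Random(0)
scenes = [sample_scene(rng) for _ in range(100)]
```

Seeding the generator matters for a benchmark: two runs that share a seed see the same randomized scenes, so differences in success rate come from the policy rather than the draw.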


Dataset comparison and gallery (Fig. 8): left, representative synthetic robot datasets; right, the RoboVerse dataset. Representative tasks with rich domain randomization applied.

Benchmark: the four-level generalization protocol and IL/RL

The IL benchmark uses a fixed set of demonstrations and a controlled evaluation environment. Its key design is a four-level generalization protocol that allocates 90% of the data to training and 10% to generalization evaluation. Level 0: task-space generalization (camera, materials, and lighting fixed; only object initialization and instructions are split 90/10). Level 1: environment randomization (scene, table, and floor change). Level 2: camera randomization (viewpoint height and angle). Level 3: lighting and reflection randomization. The RL benchmark integrates PPO from Stable-Baselines3 and rsl_rl, plus TD-MPC2 from the original benchmark, into the MetaSim interface so that HumanoidBench can be trained in both MuJoCo and Isaac Sim.


Four-level generalization protocol (Fig. 9): randomization gets stronger from Level 0 (task space) → Level 1 (environment) → Level 2 (camera) → Level 3 (lighting and reflection). 90% training / 10% generalization evaluation.
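The protocol above can be written down as a small configuration plus a 90/10 split. In this sketch the level names follow the text, but treating the levels as cumulative (each level keeps the randomization of the previous one) is an assumption, and `LEVEL_AXES`, `split_90_10`, and `eval_settings` are hypothetical names.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Randomization axes switched on at each level. Treating the levels as
# cumulative is an assumption of this sketch.
LEVEL_AXES: Dict[int, Tuple[str, ...]] = {
    0: (),
    1: ("environment",),
    2: ("environment", "camera"),
    3: ("environment", "camera", "lighting_reflection"),
}


def split_90_10(items: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Shuffle deterministically, then keep 90% for training and 10% for evaluation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.9 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def eval_settings(level: int) -> Dict[str, bool]:
    """Return which randomization axes are active when evaluating at a level."""
    active = LEVEL_AXES[level]
    return {axis: axis in active for axis in LEVEL_AXES[3]}


# Example: 100 object initializations split for Level 0 evaluation.
train, held_out = split_90_10(range(100))
```

Level 0 then evaluates on `held_out` with every axis off, and higher levels reuse the same held-out items with more axes switched on.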

Intuition: why "abstraction first"

The paper's key insight is that "building good policies" and "building good data and benchmarks" are different problems, and that the latter is a prerequisite for the former. If a task from simulator A cannot be used in simulator B, every researcher ends up re-implementing the same task and reinventing the wheel. The core of MetaSim is separating "a language that describes the scene (MetaConfig)" from "a backend that executes that language (Handler)", much as a compiler separates source code from the target architecture. With this one separation, cross-simulator, hybrid, and cross-embodiment use all become natural corollaries of the same abstraction. Hybrid simulation in particular (accurate physics combined with a good renderer) contributes directly to sim-to-real, because it cheaply produces data that is both accurate and realistic, which no single simulator provides.

Experiments: the goal is to verify reliability

The authors are explicit that the purpose of the experiments is not to compete on policy performance but to verify the reliability of the data and benchmark and to demonstrate the breadth of the system.

IL benchmark (Tab. II). Representative tasks are chosen from the source benchmarks (ManiSkill PickCube and StackCube, RLBench CloseBox, CALVIN MoveSliderLeft, LIBERO PickChocolatePudding, robosuite NutAssembly) and evaluated in a single-task setting, averaged over 3 seeds. Average success rates are similar, 48.6% for Diffusion Policy and 50.0% for ACT, but per-task variance is large: on CALVIN MoveSliderLeft the scores are high (ACT 85.0%, DP 76.5%), while on the contact-rich robosuite NutAssembly they collapse (DP 7.1%, ACT 0.0%). This variance itself is presented as evidence that the benchmark covers a range of difficulty.

Four-level generalization (Tab. III). The key message is that current policies are very weak at visual generalization. On PickCube, Diffusion Policy goes from 52.7% at Level 0 to 11.1% at Level 1 and 0.0% at Levels 2 and 3, and ACT goes 31.7% → 30.0% → 6.7% → 3.3%, nearly collapsing under camera and lighting changes. Some tasks are relatively robust, such as MoveSliderLeft (DP 76.5% at L0 → 60.0% at L3), but overall performance drops sharply as visual randomization gets stronger. This is a weakness of the policies and, at the same time, evidence that the benchmark provides a meaningful difficulty gradient.
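To make the degradation easier to compare across policies, the PickCube numbers quoted above can be expressed as the share of Level 0 performance retained at each level. The success rates are the ones reported in the text; the `retention` helper is just illustrative arithmetic.

```python
# PickCube success rates (%) at Levels 0-3, as quoted from Tab. III above.
SUCCESS = {
    ("PickCube", "Diffusion Policy"): [52.7, 11.1, 0.0, 0.0],
    ("PickCube", "ACT"): [31.7, 30.0, 6.7, 3.3],
}


def retention(rates):
    """Success at each level as a percentage of the Level 0 success rate."""
    base = rates[0]
    return [round(100.0 * r / base, 1) for r in rates]


table = {key: retention(values) for key, values in SUCCESS.items()}

for (task, policy), kept in table.items():
    print(f"{task:9s} {policy:17s} {kept}")
```

Read this way, Diffusion Policy keeps about a fifth of its Level 0 performance once the environment is randomized, while ACT holds up at Level 1 and loses most of its performance only when the camera changes.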

Trajectory augmentation (Fig. 10). As augmentation grows from 50 source demonstrations to 200, 1000, and 3000, the Diffusion Policy success rate rises consistently on 4 representative tasks, showing the effectiveness and scalability of the augmentation API.


Effect of trajectory augmentation (Fig. 10): success rates of policies trained on the augmented datasets compared with the source dataset. Performance improves as more generated data is added.
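The object-centric idea behind MimicGen-style augmentation is that an end-effector segment is stored relative to the object it acts on, so it can be replayed when the object is placed somewhere new. The sketch below shows that transform in the plane, assuming poses are (x, y, theta) and using T_new_ee = T_new_obj · T_src_obj⁻¹ · T_src_ee. It is a simplified illustration, not RoboVerse's augmentation API, and `compose`, `inverse`, and `retarget_segment` are hypothetical helpers.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, theta) in the world frame


def compose(a: Pose, b: Pose) -> Pose:
    """Return a * b: pose b, given in frame a, expressed in the world frame."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)


def inverse(a: Pose) -> Pose:
    """Inverse rigid transform, so that compose(a, inverse(a)) is the identity."""
    ax, ay, at = a
    c, s = math.cos(at), math.sin(at)
    return (-(c * ax + s * ay), s * ax - c * ay, -at)


def retarget_segment(ee_traj: List[Pose], src_obj: Pose, new_obj: Pose) -> List[Pose]:
    """Move an end-effector segment so it keeps its pose relative to the object."""
    delta = compose(new_obj, inverse(src_obj))
    return [compose(delta, p) for p in ee_traj]


# Source demo: the gripper approaches an object at (1, 0) along the x-axis.
src_obj = (1.0, 0.0, 0.0)
src_traj = [(0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]

# New scene: the object sits at (0, 2), rotated by 90 degrees.
new_obj = (0.0, 2.0, math.pi / 2)
new_traj = retarget_segment(src_traj, src_obj, new_obj)
```

In the new scene the segment still ends at the object and approaches it from the same side relative to the object's own orientation, which is what lets one source demonstration seed many object placements.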

World model learning (VI-E). An action-conditioned world model trained only on 50k DROID episodes follows the action conditioning but fails to capture the physics of gripper-object contact, so objects appear "warped" on contact. Adding 50k synthetic RoboVerse episodes, for 100k in total, improves the preservation of object geometry, which shows that synthetic data can complement real-world data (although the authors candidly admit that "watching videos alone" is not enough to learn all of DROID's intricate physics).

Direct sim-to-real / sim-to-sim-to-real (VI-F, VI-G, Tab. V). OpenVLA fine-tuned on RoboVerse data is transferred to the real world with no additional training to manipulate unseen objects in unseen environments. After fine-tuning on demonstrations adapted from GraspNet, OpenVLA scores 7/10, 8/10, and 5/10 (50–80%) on challenging language-guided grasping, and Octo scores 5/10, 3/10, and 6/10. For RL, HumanoidBench whole-body control is transferred sim-to-sim-to-real.


Direct sim-to-real (Fig. 12): training inside the RoboVerse framework enables smooth direct sim-to-real transfer (IL) for manipulating unseen objects in unseen environments, and sim-to-sim-to-real transfer (RL) for whole-body humanoid control.

🔬 Reproduction notes (claude-curio demo)

The paper's central claim (does the simulator-agnostic abstraction actually work?) was checked directly on consumer GPU hardware: an RTX 4070 Laptop with 8GB.

  • Verified the unified API through to rendering. Running python metasim/example/control_test.py --sim mujoco --headless with the MuJoCo backend gave exit code 0, 100 steps, and a rendered video of a Franka Panda arm and objects. The check went beyond the command succeeding and validated the output: the decoded video has shape (100, 1024, 1024, 3), dtype uint8, and 100/100 non-blank frames.
  • "One-argument backend switching" holds at the code level. The structure in which a single argument, ScenarioCfg(simulator=...), switches the backend was confirmed. The simulator-agnostic abstraction the paper describes exists as a real interface, not just as a diagram.
  • Reproducibility details. Headless MuJoCo requires MUJOCO_GL=egl. The extras cannot be combined arbitrarily: numpy/torch/CUDA pins differ per backend and conflict, so one venv per backend is the practical rule.
  • A limitation found. At this commit, the PyBullet backend's unified state API was incomplete: _get_states did not populate body_state, which raised a TypeError during tensor validation. MuJoCo passes the same path, so this is not an environment problem but a difference in integration maturity between backends.

The reproduction was done with claude-curio's own demo (based on the original MetaSim). What was verified is whether the unified abstraction works, not a reproduction of the dataset or benchmark at full scale.
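The output check described in the notes above (expected shape and a count of non-blank frames) can be sketched with the standard library alone. This is not the script used in the reproduction: `count_non_blank` is a hypothetical helper, the video is assumed to be available as one contiguous uint8 buffer, and "non-blank" is taken to mean "not every byte in the frame is identical". Setting MUJOCO_GL before MuJoCo is imported reflects the headless requirement noted above.

```python
import os

# For headless rendering, MUJOCO_GL is set before MuJoCo is imported.
os.environ.setdefault("MUJOCO_GL", "egl")

EXPECTED_SHAPE = (100, 1024, 1024, 3)  # (frames, height, width, channels), uint8


def frame_size(shape) -> int:
    """Number of bytes in one uint8 frame of the given video shape."""
    _, height, width, channels = shape
    return height * width * channels


def count_non_blank(video: bytes, shape) -> int:
    """Count frames whose bytes are not all the same value."""
    n_frames = shape[0]
    size = frame_size(shape)
    if len(video) != n_frames * size:
        raise ValueError("buffer length does not match the expected shape")
    view = memoryview(video)
    non_blank = 0
    for i in range(n_frames):
        frame = view[i * size:(i + 1) * size]
        if min(frame) != max(frame):
            non_blank += 1
    return non_blank


# Tiny stand-in video: 3 frames of 2x2 RGB pixels (12 bytes per frame).
demo_shape = (3, 2, 2, 3)
demo_video = bytes(12) + bytes(range(12)) + bytes([255] * 12)
n_ok = count_non_blank(demo_video, demo_shape)
```

In the stand-in video only the middle frame varies, so it is the only one counted; for a real run the same function would be called with `EXPECTED_SHAPE` and the decoded frames.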

A critical look

Strengths

  • An abstraction that tackles fragmentation head-on. Separating "scene description (MetaConfig)" from "execution backend (Handler)" is simple but powerful. With this single decision, cross-simulator, hybrid, and cross-embodiment use all follow as corollaries. It is a clean design that applies compiler-style separation to robot simulation instead of inventing a new format.
  • Scale and breadth achieved together. Bringing 14 manipulation benchmarks + navigation (R2R, RxR, MatterPort3D) + locomotion (HumanoidBench, Humanoid-X) into a single format to produce 276 categories, 510.5k trajectories, and 50M+ transitions is valuable not for sheer volume but because heterogeneous sources were unified.
  • The benchmark's difficulty gradient shows up in the measurements. Policy success rates collapsing as the level rises in the four-level protocol (PickCube 52.7 → 0.0%) is strong evidence that the benchmark applies meaningful generalization pressure. This is consistent with the stated goal of "reliability verification, not policy competition".
  • End-to-end, closing the loop to sim-to-real. Real-to-sim asset reconstruction → data generation → policy training → direct sim-to-real all run within one framework, and 50–80% grasping with no additional training lends empirical support to the realism claim for hybrid simulation.
  • Honest about its own limits. In the appendix the authors measure directly that three simulators (SAPIEN, Isaac Gym, PyBullet) fail to obey even the conservation of momentum, angular momentum, and kinetic energy, and describe this as "a pessimistic signal for hopes of direct sim-to-real on complex behaviors". It is rare candor not to hide the limits of the simulators the platform is built on.

Weaknesses and limitations

  • Teaser numbers do not match the body. The Fig. 1 caption advertises "1,000+ tasks, 10M+ transitions", but the statistics in the body are 276 task categories, 510.5k trajectories, and 50M+ transitions. The gap may come from different definitions (task vs task category), but a headline figure that disagrees with the body invites confusion when citing. This review treats the body numbers as authoritative.
  • Small evaluation samples. IL uses 10 training + 10 validation scenarios with 3 seeds, and sim-to-real grasping uses 10 trials per task, so the samples are too small to support meaningful statistical confidence intervals. OpenVLA was evaluated on only 20 scenarios because of resource constraints, and VLAs were evaluated only in the single-task setting.
  • Baselines may not be optimal (acknowledged by the authors). All baselines were re-implemented inside RoboVerse, and the authors state that some may be suboptimal. The absolute numbers in the tables (e.g. ACT at 0.0% on NutAssembly) should therefore not be compared directly with the original papers; they are strictly for system validation.
  • Narrow scope for cross-embodiment. Retargeting is limited to parallel-gripper robots. General retargeting between dexterous or multi-fingered hands is not covered, so cross-embodiment reuse of dexterous manipulation data is out of scope.
  • Non-rigid objects and foundation models are unfinished. A unified format for non-rigid objects is not yet supported (stated as a limitation), and the most interesting use, pre-training a foundation model on the large-scale data, is outside the paper's scope because of resource constraints. "We built a dataset" and "what that data makes possible" are so far only partially connected.
  • The ceiling set by simulator physics itself. As the conservation-law experiments show, however clean the unified abstraction is, the physical inaccuracies of the underlying simulators are inherited as they are. Unification does not create accuracy.
  • Infrastructure and maintenance burden. Keeping six simulator backends aligned at the same time is an ongoing cost, and the Handlers must be brought back into agreement with every simulator update. The paper proposes maintaining this through community contributions, but long-term consistency is an open problem.
  • Uneven integration maturity across backends. The platform presents six backends as equally supported, but actual maturity differs per backend: in the reproduction notes above, MuJoCo's unified state API worked correctly while PyBullet failed on the same path because it did not populate body_state. The "unified interface" should not be assumed to be equally complete on every backend.
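The small-sample concern in the list above can be made concrete with a Wilson score interval for a binomial proportion. The success counts below are the OpenVLA and Octo grasping results quoted earlier in this review; the choice of the Wilson interval at 95% (z = 1.96) is this sketch's assumption, not an analysis from the paper.

```python
import math
from typing import List, Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial success probability."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# (successes, trials) per grasping task, as quoted in the review.
OPENVLA: List[Tuple[int, int]] = [(7, 10), (8, 10), (5, 10)]
OCTO: List[Tuple[int, int]] = [(5, 10), (3, 10), (6, 10)]

for name, results in (("OpenVLA", OPENVLA), ("Octo", OCTO)):
    for k, n in results:
        low, high = wilson_interval(k, n)
        print(f"{name:8s} {k}/{n}: {low:.2f} to {high:.2f}")
```

With 10 trials per task the intervals are wide (for 5/10, roughly 0.24 to 0.76), so per-task differences between the two models at this sample size are best read as indicative.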

Positioning against related work

RoboVerse sits at the confluence of three streams. First, robot simulators and data generators: ManiSkill3 (see the ManiSkill3 review), which provides large-scale data and rendering through GPU-parallel simulation, is one of the sources RoboVerse migrates, and the two contrast as "depth in a single simulator vs integration across multiple simulators". RoboTwin2 (see the RoboTwin2 review), a scalable data generator and benchmark with strong domain randomization, has very similar goals (scalable data + a robust benchmark), but RoboVerse differs in choosing cross-simulator integration rather than a single simulator. Second, benchmarks: BEHAVIOR-1K (see the BEHAVIOR-1K review) covers 1,000 everyday activities, and HumanoidBench (see the HumanoidBench review) covers whole-body locomotion and manipulation. RoboVerse migrates HumanoidBench directly and absorbs it into its RL benchmark, which illustrates the difference in level between "an individual benchmark" and "a meta-integration of benchmarks". RoboVerse's sim-to-sim-to-real humanoid transfer connects with the Whole-Body Humanoid Locomotion review. Third, data augmentation and world models: trajectory augmentation builds on the MimicGen framework (object-centric subtask decomposition), which DexMimicGen (see the DexMimicGen review) extends, and the world model experiment meets Newt (TD-MPC2 World Model; see the Newt review), which deals with large-scale multi-task world models, on the question of whether synthetic data strengthens world model learning.

Summary

RoboVerse's contribution is backing up, with extensive evidence, a systems-level thesis: the scaling bottleneck of robot learning is not the model but the fragmentation of simulation infrastructure, and a single simulator-agnostic abstraction (MetaSim) can unify scattered simulators, benchmarks, and data in one format. The three layers of MetaConfig (scene description), Handler (backend execution), and the Gym wrapper make cross-simulator, hybrid, and cross-embodiment use corollaries, and on top of them 14 migrated benchmarks + teleoperation, AI generation, and real-to-sim + augmentation and randomization add up to 276 categories, 510.5k trajectories, and 50M+ transitions. The IL benchmark (DP 48.6%, ACT 50.0%) and four-level generalization (PickCube 52.7 → 0.0%) verify the reliability and difficulty gradient of the data and benchmark, and the 50–80% direct sim-to-real grasping of OpenVLA fine-tuned on RoboVerse data supports the realism of hybrid simulation. The limitations are also clear: mismatched teaser numbers, small evaluation samples, cross-embodiment limited to parallel grippers, unfinished support for non-rigid objects and foundation model use, and a physics ceiling set by underlying simulators that violate even conservation laws. Even so, the framework's view that a shared foundation should be laid before competing on policies offers a persuasive blueprint for unifying simulation-assisted robot learning, which had been stalled by fragmentation. (Code and datasets are released on the project page; the evaluation can be reproduced once the environment is set up.)

Copyright 2026, JungYeon Lee