Curieux.JY
  • JungYeon Lee

📃 VIRAL Review

humanoid
loco-manipulation
Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation
Published

November 24, 2025

🔍 Ping. 🔔 Ring. ⛏️ Dig. A tiered review series: quick look, key ideas, deep dive.

  • Paper Link
  • Homepage
  1. ✨ VIRAL proposes a visual sim-to-real framework that deploys policies trained in simulation to real hardware zero-shot, addressing the lack of autonomous loco-manipulation skills in humanoid robots.
  2. 🧑‍🏫 The framework is teacher-student: a privileged RL teacher learns long-horizon loco-manipulation, and is then distilled into an RGB-based student policy using large-scale simulation and extensive visual domain randomization.
  3. 🤖 Deployed on a Unitree G1 humanoid, VIRAL's RGB-based policy completed 54 consecutive loco-manipulation cycles in the real world, showing strong generalization to environmental variation and performance approaching that of an expert teleoperator.

🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

VIRAL (Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation) is a visual sim-to-real framework aimed at a key barrier to deploying humanoid robots: the lack of autonomous loco-manipulation (locomotion plus manipulation) skills. All training takes place in simulation, and the learned policy is deployed to real hardware zero-shot.

VIRAL's core methodology follows the Teacher-Student design below.

1. Teacher Training (Phase 1: Reinforcement Learning)

  • Teacher Formulation: The teacher policy $\pi_{teacher}$ is a goal-conditioned RL policy that uses privileged information (full state information) to learn long-horizon loco-manipulation tasks.
  • Action Space: The teacher outputs high-level commands for a low-level Whole-Body Control (WBC) policy (e.g., HOMIE). Specifically, it outputs delta commands $a_t = (\Delta v_t, \Delta \omega_{yaw,t}, \Delta q_{arm,t}, \Delta q_{finger,t})$, where $\Delta v_t$ is the change in linear velocity (x, y), $\Delta \omega_{yaw,t}$ the change in yaw angular velocity, and $\Delta q_{arm,t}$ and $\Delta q_{finger,t}$ the delta joint targets for the arm joints and finger motors. This delta action space substantially accelerates and stabilizes RL training.
  • Privileged Observation: The teacher's observation $o^{priv}_t = [o^{prop-priv}_t, o^{exte-priv}_t]$ contains:
    • Proprioception ($o^{prop-priv}_t$): the robot's base linear velocity ($v_t$) and angular velocity ($\omega_t$), base projected gravity ($g_t$), previous action ($a_{t-1}$), joint positions ($q_t$) and velocities ($\dot{q}_t$), and fingertip forces ($f_{finger,t}$).
    • Exteroception ($o^{exte-priv}_t$): the current task stage ($e_t$), the placement and lift targets ($T_t$), and the transforms of the object and table relative to the robot ($O_t$).
  • Reward Design: The task is split into a sequence of walking, placing, grasping, and turning, and a reward is designed for each stage. For example, the reward for walking toward the object is $r_{walk} = \exp(-4 (\|p_{robot} - p_{GraspObj}\| - 0.45)^2)$.
  • Reference State Initialization (RSI): 200 teleoperated demonstrations are collected in simulation and used as a state-initialization buffer for RL. At each episode reset, a demonstration snapshot is sampled to initialize the robot, object, and table. This exposes the policy to a variety of rewarding states that are hard to reach from scratch, which improves exploration and reduces the reward-design burden.
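
To make the walking reward above concrete, here is a minimal sketch that evaluates $r_{walk}$ at a few robot positions. The function name, the 2D positions, and the reading of 0.45 as a standoff distance in meters are assumptions for illustration; only the formula itself comes from the text.

```python
import math

def walk_reward(robot_xy, object_xy, standoff=0.45, sharpness=4.0):
    """r_walk = exp(-sharpness * (distance - standoff)^2).

    The reward peaks at 1.0 when the robot stands exactly `standoff`
    away from the object and decays on both sides of that distance.
    """
    distance = math.dist(robot_xy, object_xy)
    return math.exp(-sharpness * (distance - standoff) ** 2)

# Hypothetical positions, chosen only to show the shape of the reward.
far = walk_reward((0.0, 0.0), (2.0, 0.0))        # well beyond the standoff
ideal = walk_reward((0.0, 0.0), (0.45, 0.0))     # at the standoff distance
too_close = walk_reward((0.0, 0.0), (0.1, 0.0))  # closer than the standoff
```

Because the exponent penalizes deviation in both directions, the reward discourages walking into the table as well as stopping short of it.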

2. Student Training (Phase 2: Supervised Learning)

  • Visual Distillation: The teacher policy is distilled into a vision-based student policy $\pi_{student}$ that receives only observations available on the real robot (proprioception and RGB images).
  • DAgger & BC Mixture: The student is trained with a mixture of online DAgger and Behavior Cloning (BC). The goal is to imitate the teacher's actions. The loss is an MSE objective computed over a mixture of the teacher-induced and student-induced observation distributions: $L_{distill} = E_{o_t \sim \rho_o}[\Vert \pi_{teacher}(o^{teacher}_t) - \pi_{student}(o^{student}_t) \Vert_2^2]$, where $\rho_o \triangleq \alpha \rho_o^{\pi_{teacher}} + (1 - \alpha) \rho_o^{\pi_{student}}$ and $\alpha$ is the mixing coefficient; $\alpha = 0.5$ was found to work well.
  • Network Backbone: The student's vision backbone is a recent image encoder such as DINOv3, which extracts high-quality RGB features; these are fused with proprioception and passed to the policy head. History-aware architectures that integrate temporal context are also evaluated.
  • Distributed Simulation Learning System: To scale visual simulation training throughput, the authors build a large-scale distributed training system on Isaac Lab using TRL (Transformer Reinforcement Learning) and Accelerate, running on up to 64 GPUs.

3. Sim-to-Real Transfer

  • SysID for Dexterous Hand: To address the sim-to-real mismatch of the Unitree G1's high-gear-ratio 3-finger dexterous hand, real grasp-release motions are replayed in simulation, and system identification (SysID) is performed over the finger armature, stiffness, and damping parameters so that the simulated joint trajectories match the real measurements.
  • FOV Alignment and Randomization: The simulator's camera intrinsics (focal length, focus distance, sensor apertures) are matched to the manufacturer's specifications, and a real-to-sim extrinsics calibration is done visually. During training, extrinsics randomization is applied for robustness to hardware-induced viewpoint differences.
  • Visual and Simulation Randomization: To strengthen sim-to-real transfer, extensive visual and physical randomization is applied during training. This covers image quality (brightness, contrast, hue, saturation, Gaussian noise, blur), camera extrinsics, sensor latency, dome-light environments, and the material and color properties of the floor, table, objects, and robot components.

The importance of scale

VIRAL emphasizes that compute scale matters for both teacher and student training. Scaling simulation to tens of GPUs (up to 64) is what makes learning reliable; in low-compute settings, training often fails.

Experimental results

The RGB-based policy deployed on the Unitree G1 humanoid performs up to 54 consecutive loco-manipulation cycles and generalizes to a range of spatial and appearance variations in the real world. Without any real-world fine-tuning, it shows robustness and efficiency approaching the performance of an expert human teleoperator. Extensive ablation studies analyze the key design choices needed to make RGB-based humanoid loco-manipulation work.

🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

TL;DR: VIRAL is a visual sim-to-real framework that learns humanoid loco-manipulation purely in simulation and deploys it zero-shot on real hardware. With large-scale training on 64 GPUs and extensive domain randomization, it achieves 54 successes in 59 consecutive trials (91.5%) on a Unitree G1 humanoid, approaching expert teleoperation.

1. Introduction: Why Is Humanoid Loco-Manipulation Hard?

Humanoid robots are regarded as the ultimate embodiment of general-purpose physical AI. If a human-shaped robot could carry out diverse physical tasks in environments designed for humans, it could take over much of society's physical labor. Today, however, most humanoid systems show only limited practical use outside carefully designed demo environments.

The key missing piece is autonomous loco-manipulation: the ability to tightly coordinate locomotion and manipulation with perception from onboard sensors, and to perform useful tasks over long horizons in diverse environments.

1.1 Limitations of Existing Approaches

The main threads of current humanoid research are:

  • Blind Locomotion: focuses on walking from proprioceptive information alone. Without perception of the environment, real tasks are out of reach.
  • Tabletop Manipulation: manipulation from a fixed base. Hard to apply to real environments that require moving around.
  • Reliance on teleoperation: requires real-time intervention by a human operator. With no autonomy, scalability is limited.
  • Reliance on external sensors: requires off-board sensing such as motion capture. Unusable in real deployment environments.

1.2 Real-World Data vs. Simulation

Recent work tries to carry the recipe behind LLMs over to robotics: collect large real-world datasets and train a “robotic foundation model.” But mobile manipulation involves far more variation than a fixed tabletop setup, and for humanoids the cost per data point is even higher, because of the high number of degrees of freedom (DoF), safety constraints, and the engineering overhead of the teleoperation stack.

Simulation is the alternative. Modern GPU-accelerated photorealistic simulators can generate large amounts of data at a much lower marginal cost than human teleoperation. Sim-to-real is already the de facto approach for legged locomotion, but manipulation is still dominated by real-world imitation learning.

Key research question: “Can visual sim-to-real enable useful humanoid loco-manipulation with onboard perception?”


2. VIRAL Framework Overview

VIRAL (Visual Sim-to-Real At ScaLe) is a visual sim-to-real framework that learns humanoid loco-manipulation entirely in simulation and deploys it zero-shot on real hardware. The paper's goal is not to propose a new RL or sim-to-real algorithm, but to provide a full-stack technical recipe that makes RGB-based humanoid loco-manipulation actually work.

2.1 Teacher-Student Architecture

VIRAL adopts the teacher-student privileged learning paradigm to make visual simulation training efficient. The idea behind this two-stage approach is to solve the “easy” problem first and then transfer that solution to the “hard” problem.

  1. Stage 1 - Teacher Policy training: an RL teacher is trained on privileged state using 16 GPUs (2 nodes × 8 L40S). It uses ground-truth information that is only accessible in simulation (object positions, table transforms, and so on).

  2. Stage 2 - Student Policy distillation: large-scale visual distillation with tiled rendering on 64 GPUs (8 nodes × 8 L40S). The teacher's knowledge is transferred to a student that observes only RGB images and proprioception.

The student policy is what finally gets deployed on the real robot, with no real-world fine-tuning at any point. Using onboard sensors only, it performs continuous loco-manipulation that includes walking, placing, grasping, and object transport.


3. Teacher Policy Training: Key Design Elements

The teacher policy is formulated as a goal-conditioned RL policy. At each time step t, the teacher receives a privileged observation and outputs a high-level command for the low-level WBC (Whole-Body Control) policy.

3.1 Action Space Design: Delta vs. Absolute

RL for legged locomotion typically uses absolute joint targets as the action space. VIRAL instead adopts a delta action space: the policy outputs increments rather than absolute positions, and these are accumulated into the WBC command.

Action composition: $a = (\Delta v_x, \Delta v_y, \Delta \omega_{yaw}, \Delta q_{arm}, \Delta q_{finger})$
  • $\Delta v$: delta linear velocity (x, y)
  • $\Delta \omega$: delta angular velocity (yaw)
  • $\Delta q$: delta joint targets for arm and finger motors

In the experiments, the delta action space markedly accelerates and stabilizes RL training. The variant using absolute actions never reached a high success rate, whereas the teacher using delta actions solved the task reliably.
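
A minimal sketch of how delta actions can be accumulated into a running command. The flat-list command layout, the clipping step, and the numeric limits are assumptions made for illustration; the text only states that increments are accumulated into the WBC command.

```python
def apply_delta(command, delta, limits):
    """Add the policy's increments to the current command, then clip.

    Clipping to per-channel limits is an assumed safeguard, not a detail
    taken from the paper.
    """
    return [min(hi, max(lo, c + d)) for c, d, (lo, hi) in zip(command, delta, limits)]

# Hypothetical 4-channel command: [v_x, v_y, yaw_rate, one arm joint target].
limits = [(-0.5, 0.5), (-0.5, 0.5), (-1.0, 1.0), (-1.5, 1.5)]
cmd = [0.0, 0.0, 0.0, 0.2]
delta = [0.3, 0.0, -0.1, 0.05]

for _ in range(2):  # two control steps with the same increment
    cmd = apply_delta(cmd, delta, limits)
# v_x saturates at its limit; the other channels keep accumulating.
```

One intuition for why this helps: a zero action means "keep doing what you were doing", so small policy outputs early in training produce smooth commands rather than abrupt jumps between targets.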

3.2 WBC Command as API

Rather than learning low-level motor skills from scratch, VIRAL's teacher operates on top of a pre-trained WBC policy (HOMIE). The main benefits of this design are:

  • Less reward engineering: the WBC handles basic locomotion and balance, so the designer can focus on task-level rewards.
  • Safe and reliable deployment: the action space is confined to the region of safe, reliable humanoid motion.
  • Modularity: the design is flexible enough to swap in a different WBC controller (e.g., TWIST, SONIC).

The WBC command interface consists of velocity and height tracking commands for locomotion, plus upper-body joint commands. VIRAL adds finger actions on top of this to form a complete loco-manipulation action space.

3.3 Reference State Initialization (RSI)

Learning a long-horizon walking-placing-grasping-turning skill with RL on a high-DoF humanoid is extremely hard. It requires heavy reward engineering, and even then often yields policies that are suboptimal or that fail to transfer from sim to real.

VIRAL addresses this with Reference State Initialization (RSI). The authors collect 200 teleoperated demonstrations in simulation and use them as a state-initialization buffer for RL.

How RSI works: at every episode reset, a snapshot is sampled from the demonstrations and used to initialize the scene (robot, objects, tables). This lets the policy experience a variety of rewarding states before it has discovered everything from scratch.

RSI has two key benefits:

  1. It greatly reduces the dependence on brittle reward tuning.
  2. The human-provided grasping and placement poses act as a strong prior, which improves sim-to-real transfer.

In the ablation, the teacher without RSI quickly plateaus below a 10% success rate, whereas the full VIRAL teacher with RSI reaches nearly 95%. This shows that RSI is essential for learning humanoid loco-manipulation.
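
The reset logic behind RSI can be sketched as follows. The snapshot fields, the `rsi_prob` parameter, and the fallback default state are hypothetical; the text only says that a demonstration snapshot is sampled at each episode reset.

```python
import random

def build_rsi_buffer(demos):
    """Flatten demonstrations into one buffer of individual state snapshots."""
    return [snapshot for demo in demos for snapshot in demo]

def reset_episode(buffer, rng, default_state, rsi_prob=1.0):
    """Start from a random demonstration snapshot with probability rsi_prob."""
    if buffer and rng.random() < rsi_prob:
        return dict(rng.choice(buffer))
    return dict(default_state)

# Two toy "demonstrations"; real snapshots would hold full simulator state.
demos = [
    [{"stage": "walk", "robot_xy": (2.0, 0.0)}, {"stage": "place", "robot_xy": (0.5, 0.0)}],
    [{"stage": "grasp", "robot_xy": (0.5, 0.1)}, {"stage": "turn", "robot_xy": (0.6, 0.0)}],
]
buffer = build_rsi_buffer(demos)
default_state = {"stage": "walk", "robot_xy": (3.0, 0.0)}

rng = random.Random(0)
visited = {reset_episode(buffer, rng, default_state)["stage"] for _ in range(200)}
```

Even with an untrained policy, episodes now begin in every stage of the task, so late-stage rewards such as grasping are seen from the first iterations.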

3.4 Stage-Based Reward Design

To design rewards for humanoid loco-manipulation, the task is decomposed into the sequential stages of walking, placing, grasping, and turning. Four key rewards are defined, shown here in schematic form:

  1. Walking toward objects: r_walk = -d_robot_object (minimize the robot-object distance)
  2. Placing objects: r_place = -d_object_target - f_finger (object-target distance and finger force)
  3. Grasping objects: r_grasp = f_grasp - d_hand_object (grasp force, minus the hand-object distance)
  4. Turning: r_turn = -|y - y_target| (rotate to the target yaw angle)

The total reward is a stage-weighted sum, and stage transitions are triggered by stage-specific conditions. A complete place-pickup cycle consists of five stages:

  1. Walk to the object
  2. Move the arm and hand to the pre-place pose
  3. Place the object
  4. Grasp and lift the next object
  5. Turn

Repeating this sequence produces a long-horizon loco-manipulation loop.
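
One way to read "stage-weighted sum with stage-specific transitions" is sketched below. The stage names, the indicator-style gating (only the active stage contributes), and the weights are assumptions for illustration, not the paper's exact formulation.

```python
STAGES = ["walk", "pre_place", "place", "grasp_lift", "turn"]

def next_stage(stage, condition_met):
    """Advance when the stage's completion condition holds; wrap after 'turn'."""
    if not condition_met:
        return stage
    return STAGES[(STAGES.index(stage) + 1) % len(STAGES)]

def total_reward(stage, stage_rewards, weights):
    """Stage-weighted sum in which only the active stage is switched on."""
    return sum(weights[s] * stage_rewards[s] * (s == stage) for s in STAGES)

# Hypothetical per-stage reward values and weights for a single time step.
stage_rewards = {"walk": 0.8, "pre_place": 0.1, "place": 0.5, "grasp_lift": 0.2, "turn": 0.0}
weights = {s: 1.0 for s in STAGES}
weights["place"] = 2.0

reward_now = total_reward("place", stage_rewards, weights)

# Completing all five stages returns to 'walk', which gives the repeating loop.
stage = "walk"
for _ in range(len(STAGES)):
    stage = next_stage(stage, condition_met=True)
```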


4. Student Policy Training: Visual Distillation

Once the privileged teacher has discovered strong behavior, it is distilled into a student policy that receives only the observations accessible on the real robot (proprioception and RGB images). This is done in large-scale simulation with tiled rendering.

4.1 DAgger + BC Mixture

The student policy is trained with a hybrid of online DAgger (Dataset Aggregation) and Behavior Cloning (BC). Both share the same MSE objective and differ only in where the observation distribution comes from:

$L_{distill} = \alpha \, E_{o_t \sim \rho_o^{\pi_{teacher}}}[\Vert \pi_{teacher}(o^{teacher}_t) - \pi_{student}(o^{student}_t) \Vert_2^2] + (1 - \alpha) \, E_{o_t \sim \rho_o^{\pi_{student}}}[\Vert \pi_{teacher}(o^{teacher}_t) - \pi_{student}(o^{student}_t) \Vert_2^2]$
  • Teacher rollouts (BC): provide clean, near-optimal demonstrations. They quickly imprint a strong prior on the student.
  • Student rollouts (DAgger): expose states outside the teacher's ideal distribution. They improve error-correction robustness and prevent compounding errors.

In the ablation, pure BC ($\alpha = 1$) drives the loss down quickly but produces a brittle policy that cannot correct its own mistakes. Introducing student rollouts ($\alpha = 0.5$) slows optimization slightly but greatly improves deployment success. The authors adopt $\alpha = 0.5$ as the default DAgger-BC ratio.
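
A toy sketch of the mixture: each rollout is driven by the teacher with probability α and by the student otherwise, while the loss always regresses the student's action onto the teacher's label. The scalar policies and function names are hypothetical stand-ins for the real networks.

```python
import random

def choose_rollout_policy(alpha, rng):
    """Decide which policy drives the next rollout (and so which states are visited)."""
    return "teacher" if rng.random() < alpha else "student"

def distill_loss(batch, teacher, student):
    """Mean squared L2 distance between teacher and student actions.

    Each batch item pairs the privileged observation (teacher input) with
    the deployable observation (student input) for the same state.
    """
    total = 0.0
    for obs_teacher, obs_student in batch:
        a_teacher = teacher(obs_teacher)
        a_student = student(obs_student)
        total += sum((x - y) ** 2 for x, y in zip(a_teacher, a_student))
    return total / len(batch)

# Toy one-dimensional policies, purely for illustration.
teacher = lambda obs: [2.0 * obs[0]]
student = lambda obs: [1.5 * obs[0]]
batch = [([1.0], [1.0]), ([2.0], [2.0])]
loss = distill_loss(batch, teacher, student)

rng = random.Random(0)
picks = [choose_rollout_policy(0.5, rng) for _ in range(1000)]
teacher_share = picks.count("teacher") / len(picks)
```

Regardless of who drives the rollout, the regression target is always the teacher's action; α only changes which states end up in the batch.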

4.2 Choice of Vision Backbone

The student uses DINOv3, a recent image encoder, as its vision backbone. It extracts high-quality visual features from 640×480 RGB images, which are fused with proprioceptive information and passed to the policy head.

In the backbone comparison, the state-of-the-art backbone (DINOv3) provides stronger visual representations and more capacity, which leads to faster convergence and higher task success. Better visual features translate directly into more reliable policy learning.

4.3 History Architecture

For the student policy head, the authors compare a single-step MLP baseline, a feed-forward history model, and LSTMs with various history lengths.

History-aware models consistently outperform the single-step baseline, and longer temporal windows give further gains as far as resources allow. This suggests that temporal context matters for sequential decision-making tasks such as loco-manipulation.
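
As a rough illustration of a feed-forward history model, the sketch below keeps the last K per-step feature vectors and concatenates them into a single policy-head input. The class name, the zero padding at episode start, and the toy feature size are assumptions; the paper's actual head design is not reproduced here.

```python
from collections import deque

class HistoryStack:
    """Fixed-length window of feature vectors, flattened oldest to newest."""

    def __init__(self, length, feature_dim):
        # Assumed convention: pad with zeros until enough real frames arrive.
        self.frames = deque(
            ([0.0] * feature_dim for _ in range(length)), maxlen=length
        )

    def push(self, features):
        """Append the newest features and return the concatenated window."""
        self.frames.append(list(features))
        return [x for frame in self.frames for x in frame]

history = HistoryStack(length=3, feature_dim=2)
first = history.push([1.0, 2.0])
second = history.push([3.0, 4.0])
third = history.push([5.0, 6.0])
fourth = history.push([7.0, 8.0])  # the oldest frame has now been dropped
```

An LSTM head would instead carry a recurrent state between steps; the window above is the simpler alternative that a plain MLP can consume.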

4.4 Distributed Simulation Learning System

Large-scale visual simulation is at least an order of magnitude slower than physics simulation without rendering. To scale visual simulation training throughput, the authors implemented a customized version of TRL with Accelerate support, so that training scales efficiently across multiple GPUs and compute nodes.

The implementation keeps the simplicity of single-GPU training while enabling near-linear scaling to large clusters for high-throughput visual sim-to-real learning.


5. Sim-to-Real Transfer Strategy

For visual sim-to-real transfer, VIRAL combines two complementary strategies: large-scale domain randomization on the simulation side, and real-to-sim alignment on the hardware side.

5.1 System Identification for Dexterous Hand

Modern humanoids increasingly use low-gear-ratio motors, which reduces the need for motor-level SysID. The Unitree G1's 3-finger dexterous hand, however, uses a high gear ratio, which causes a substantial sim-to-real mismatch.

To address this, the authors define a grasp-release primitive in the real world and replay the same action sequence in simulation. They then run SysID over the finger armature, stiffness, and damping parameters to align the simulated joint trajectories with the real measurements.
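
The SysID idea can be illustrated on a single joint: simulate a grasp-release command under candidate (armature, stiffness, damping) values and keep the candidate whose trajectory best matches the measurement. Everything here is a toy stand-in: the one-joint PD model, the grid search, the parameter values, and the "measured" trajectory (generated from the same model). The authors' actual optimizer and simulator are not described in this review.

```python
import itertools

def simulate(armature, stiffness, damping, targets, dt=0.005):
    """One joint driven by a PD law, integrated with semi-implicit Euler."""
    q, qd, trajectory = 0.0, 0.0, []
    for target in targets:
        qdd = (stiffness * (target - q) - damping * qd) / armature
        qd += qdd * dt
        q += qd * dt
        trajectory.append(q)
    return trajectory

def fit_parameters(measured, targets, grid):
    """Grid search for the parameters that minimize squared trajectory error."""
    def error(params):
        simulated = simulate(*params, targets)
        return sum((s - m) ** 2 for s, m in zip(simulated, measured))
    return min(itertools.product(*grid), key=error)

targets = [1.0] * 100 + [0.0] * 100  # grasp, then release
# Stand-in for encoder data logged on the real hand during the same primitive.
measured = simulate(0.02, 4.0, 0.3, targets)

grid = (
    [0.01, 0.02, 0.04],  # armature candidates
    [2.0, 4.0, 8.0],     # stiffness candidates
    [0.1, 0.3, 0.5],     # damping candidates
)
best = fit_parameters(measured, targets, grid)
```

In this toy model only the ratios stiffness/armature and damping/armature affect the trajectory, so a grid can contain several equally good answers; a real setup would need additional measurements or priors to pin down all three values.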

5.2 Camera FOV Alignment and Randomization

The simulator's camera intrinsics (focal length, focus distance, sensor apertures) are matched to the manufacturer's specifications. The camera extrinsics of the Unitree G1, however, differ from unit to unit and can drift over time even on the same robot.

To better align simulated and real visual observations, the authors perform a lightweight real-to-sim extrinsics calibration by visually matching rendered images against real ones. They additionally apply extrinsics randomization during training so that the student is robust to hardware-induced viewpoint differences.
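
A minimal sketch of per-episode extrinsics randomization around a calibrated nominal pose. The noise magnitudes, the (x, y, z, roll, pitch, yaw) layout, and the nominal values are hypothetical; the review does not list the ranges used by the authors.

```python
import random

def randomize_extrinsics(nominal, rng, pos_noise_m=0.01, rot_noise_deg=2.0):
    """Perturb a camera pose (x, y, z, roll, pitch, yaw) with uniform noise."""
    x, y, z, roll, pitch, yaw = nominal
    position = [v + rng.uniform(-pos_noise_m, pos_noise_m) for v in (x, y, z)]
    rotation = [a + rng.uniform(-rot_noise_deg, rot_noise_deg) for a in (roll, pitch, yaw)]
    return tuple(position + rotation)

# Hypothetical calibrated head-camera pose: meters and degrees.
nominal = (0.05, 0.0, 0.45, 0.0, 40.0, 0.0)
rng = random.Random(0)
samples = [randomize_extrinsics(nominal, rng) for _ in range(100)]
```

Calibration centers the distribution on the real camera; randomization then covers the unit-to-unit differences and drift that calibration cannot remove.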

5.3 Visual and Simulation Randomization

To improve robustness and sim-to-real transfer, extensive visual and physical randomization is applied during training:

| Category | Randomization parameters |
|---|---|
| Image Quality | brightness, contrast, hue, saturation, Gaussian noise, blur |
| Camera Extrinsics | position noise (x, y, z), rotation noise (roll, pitch, yaw) |
| Camera Latency | modeled transmission delays |
| Dome Light | Indoor, Clear, Cloudy, Night, Studio environments |
| Material | color and material properties of floors, tables, objects, robot components |
| Table Properties | variation in height, depth, width, thickness |

In the ablation, three components have the largest effect:

  1. Material randomization (M)
  2. Dome-light randomization (D)
  3. Camera-extrinsics randomization (E)

Turning off all randomization reduces performance by 35.1%, and removing any individual component also degrades it. This indicates that the randomizations are complementary and together form the core pipeline for robust sim-to-real transfer.


6. Experimental Results and Analysis

6.1 Experimental Setup

Experiments are run on a 29-DoF Unitree G1 humanoid equipped with 7-DoF three-finger dexterous hands. Perception comes from an Intel RealSense D435i, and all policy inference runs on a desktop workstation with an Intel i9-14900K CPU and an NVIDIA RTX 4090 GPU.

6.2 Robustness Evaluation

The robustness of the trained student policy is evaluated on a continuous loco-manipulation task: the humanoid repeatedly walks between two tables, places an object, grasps a new one, and turns around.

Key result: across 59 consecutive real-world trials, VIRAL succeeds 54 times (91.5% success rate), showing strong reliability over extended deployment.

Comparison with human teleoperators

VIRAL is compared with two human teleoperators: an expert with more than 1000 hours of G1 teleoperation experience, and a non-expert with about 1 hour. All conditions use the same HOMIE policy, which makes this close to an apples-to-apples comparison.

| Condition | Success rate | Cycle time |
|---|---|---|
| Expert (1000+ hrs) | 100% | 21.4 s |
| VIRAL | 91.5% | 20.2 s |
| Non-expert (~1 hr) | 73% | Slower |

The expert achieves a 100% success rate with a 21.4 s cycle time. VIRAL is slightly faster at 20.2 s per cycle, with a success rate approaching the expert's. The non-expert reaches 73% and executes markedly more slowly.

These results show that VIRAL does not yet match expert-level success, but clearly outperforms the non-expert in both reliability and efficiency.
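
The arithmetic behind these numbers can be checked with a short sketch. The "successful cycles per hour" figure is a derived quantity introduced here for illustration, and it assumes a failed cycle costs the same time as a successful one, which the source does not state.

```python
def success_rate(successes, trials):
    """Fraction of successful trials."""
    return successes / trials

def successful_cycles_per_hour(rate, cycle_time_s):
    """Cycles attempted per hour, scaled by the success rate."""
    return rate * 3600.0 / cycle_time_s

viral_rate = success_rate(54, 59)          # reported as 91.5%
viral_percent = round(100.0 * viral_rate, 1)

viral_throughput = successful_cycles_per_hour(viral_rate, 20.2)
expert_throughput = successful_cycles_per_hour(1.0, 21.4)
```

Under that assumption the expert still completes slightly more successful cycles per hour, which is consistent with the reading that VIRAL approaches but does not yet match expert performance.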

6.3 Generalization Evaluation

Real-world generalization is evaluated by systematically varying several environmental factors:

  • Tray Position: Y axis (left/center/right), X axis (from 20 cm inside the table to 15 cm beyond the edge)
  • Cylinder Position: manipulating the cylindrical object at various positions
  • Robot Position: Y axis (left/center/right), X axis (near to far)
  • Table Height: from 26.5 to 31.8 inches
  • Lighting Conditions: bright, dark, and flickering
  • Table Cloth Color: gray, green, yellow, purple, teal, blue, orange, red
  • Table Type: tables of various materials and designs
  • Object Variety: objects of various shapes, sizes, and materials

Across these variations, VIRAL consistently completes the task without additional tuning. The authors attribute this behavior to the domain randomization used during simulation training and to the robustness of RL.

6.4 The Importance of Compute Scaling

One of the paper's most important findings is that compute scale is critical for learning loco-manipulation reliably. In the low-compute regime, training often fails.

Teacher Training Scaling

Teacher training is analyzed while scaling GPU resources from 1 to 16. Adding GPUs speeds up learning considerably: a larger batch of parallel environments gives broader state-space coverage per unit of wall-clock time.

Early in training the speedup is better than linear. For example, reaching a modest success rate with 4 GPUs takes less than half the time it takes with 2 GPUs.

Beyond speed, scaling has a clear effect on asymptotic performance:

  • Insufficient compute (1-2 GPUs): the teacher plateaus far below the desired performance range
  • 8-16 GPUs: success rates consistently reach 90% or higher

Student Training Scaling

A clear scaling trend is observed for the student policy as well. Distillation loss and downstream success rate are measured while increasing the number of GPUs from 1 to 64.

  • Larger-scale training consistently accelerates convergence
  • The same loss threshold is reached much sooner
  • Success-rate curves rise much more steeply

Beyond speed, scaling also improves training stability. Policies trained on more GPUs show smoother loss curves and lower variance in success rate.

Key insight: substantial compute is not a mere convenience but a practical requirement for reliable visual loco-manipulation distillation.

6.5 Object Generalization

Object-level generalization of the grasping subtask is studied under two training regimes:

  1. Single-object training with the cylinder only
  2. Multi-object training with 10 different objects

At test time, both are evaluated on the same 10 objects and normalized success rates are reported.

The results show that training on multiple objects yields much better generalization. The multi-object policy achieves a higher success rate than the cylinder-only baseline in every category.


7. Limitations and Future Directions: Four Coverage Gaps

Sim-to-real has been remarkably successful for individual capabilities such as locomotion, geometric perception, and rigid-body manipulation. Extending these methods to general-purpose loco-manipulation (“go anywhere, perceive anything, manipulate anything”), however, exposes four critical coverage gaps that the current paradigm has not yet solved.

7.1 Physics Coverage: The Physical Diversity Gap

Modern simulators can, in theory, model complex dynamics including fluid-structure interactions and deformable bodies. The fundamental bottleneck is not missing simulation capability but the scalability of the engineering effort needed to ground those capabilities in reality.

With enough effort, one can engineer a simulation of a specific scenario such as scooping rice, picking up noodles with tongs, crushing garlic, shaping sushi by hand, or loading beans into a coffee machine. But each such scenario requires bespoke tuning of material properties and boundary conditions to align with the real world.

The challenge lies in scaling this effort to the open-ended diversity of everyday life. The barrier is not that these interactions cannot be simulated, but that the engineering cost of instantiating them accurately across the long tail of real-world physics may exceed the complexity of collecting real-world data.

7.2 Task Coverage: The Long Tail of Task Creation

Even if physics could be simulated perfectly, the diversity of tasks would remain an open problem. Building a simulation environment for a single task (e.g., washing dishes) requires modeling not just object geometries but also functional affordances, varied states (dirty vs. clean), and interaction logic.

Moreover, simulation is bounded by human imagination. It cannot cover the “unknown unknowns”: the edge cases and task variants that only appear during real-world deployment.

7.3 Reward and Policy Coverage: The Reward Engineering Bottleneck

Defining “RL-friendly” reward functions that are dense enough to guide exploration yet sparse enough to prevent specification gaming is a delicate craft that does not scale.

In practice, there is a tension between under-exploration and over-exploration:

  • Dense, shaped rewards → bias the policy toward local optima or simulator exploits
  • Sparse rewards → fail to bootstrap learning in high-dimensional spaces

For a single task, it is possible to tune the rewards into a “Goldilocks” regime. Manually designing robust reward functions for thousands of different tasks, however, is intractable.

This highlights an important trade-off: sim-to-real offers scalable data generation but demands high upfront engineering effort, whereas imitation learning shifts the burden to data collection. For now, a few days of high-quality teleoperation data can often outperform months of sim-to-real engineering on a specific task.

7.4 Hardware Coverage: The Hardware-Simulation Gap

Finally, a distinct gap remains between the idealized actuation of simulation and the reality of current humanoid hardware. Quasi-direct drive (QDD) actuators for locomotion are relatively well modeled, but dexterous manipulation hardware often suffers from unmodeled friction, backlash, thermal throttling, and sensor noise.

7.5 The Authors' Outlook

These four gaps suggest that sim-to-real will keep an important role in robotics, especially for safe and stable evaluation and for solving skills with bounded state spaces, but that scaling it to general-purpose loco-manipulation is likely out of reach in the near future.

The field has successfully identified the sweet spot of sim-to-real for locomotion: aggressive randomization of a limited set of parameters (terrain, mass) combined with carefully designed reward functions produces robust policies that generalize well. The equivalent sweet spot for manipulation has not yet been found.

The authors' proposal: the path forward is to redefine the role of simulation within a broader data ecosystem. Rather than forcing simulation to generate the full distribution of the real world, the next frontier lies in integrating sim-to-real with the rapidly maturing stack of real-world imitation learning and foundation models. Finding this synergy, in which simulation complements rather than replaces real-world learning, is the most exciting direction for the future of general-purpose loco-manipulation.


8. Conclusion and Key Lessons

VIRAL provides a comprehensive technical recipe for making RGB-based humanoid loco-manipulation work in practice.

8.1 Main Technical Contributions

  1. Teacher-Student Framework: a two-stage approach that trains with privileged information and then distills into a visual policy
  2. Delta Action Space: using increments instead of absolute targets to accelerate and stabilize RL training
  3. Reference State Initialization: using teleoperation demonstrations as a state-initialization buffer
  4. DAgger-BC Mixture: robust visual distillation through a mixture of teacher and student rollouts
  5. Large-Scale Domain Randomization: extensive randomization across visual, physical, and camera parameters

8.2 Practical Lessons for Roboticists

  1. Compute scale matters: in low-compute regimes, training often fails. Compute on the order of 64 GPUs is a practical requirement for reliable training.

  2. Use the WBC as an API: rather than learning low-level control from scratch, training a high-level policy on top of a pre-trained WBC reduces the reward engineering burden and makes deployment safer.

  3. Make active use of demonstrations: pure RL tends to fail on long-horizon loco-manipulation. Demonstration-guided strategies such as RSI are essential.

  4. Invest in hardware alignment: SysID of dexterous hands with high gear ratios and calibration of camera extrinsics are especially important.

  5. Randomizations are complementary: material, lighting, and camera randomization together form the core pipeline of sim-to-real transfer.

8.3 Remaining Challenges

VIRAL shows impressive results, but as the paper candidly acknowledges, scaling to general-purpose loco-manipulation remains an open problem. Four fundamental gaps remain: physics diversity, task coverage, reward engineering, and the hardware-simulation gap.

The most promising direction is to integrate sim-to-real with real-world imitation learning and foundation models. The key is to find the synergy in which simulation complements rather than replaces real-world learning.

Appendix: Key Hyperparameters

Teacher Policy (PPO)

| Parameter | Value |
| --- | --- |
| Number of environments | 32,768 (2048 × 8 GPUs × 2 nodes) |
| Discount factor (γ) | 0.998 |
| Learning rate | 0.00002 |
| Entropy coefficient | 0.01 |
| Value loss coefficient | 1 |
| Init noise std | 0.5 |
| MLP size | [512, 256, 128] |

Student Policy (DAgger + BC)

| Parameter | Value |
| --- | --- |
| Number of environments | 65,536 (1024 × 8 GPUs × 8 nodes) |
| Steps per environment | 1 |
| Learning rate | 0.0002 |
| DAgger-BC ratio (α) | 0.5 |

Domain Randomization Ranges

| Parameter | Distribution |
| --- | --- |
| Brightness | U(-0.2, 0.2) |
| Contrast | U(0.8, 1.2) |
| Camera Position X | U(-0.02, 0.02) m |
| Camera Position Y | U(-0.02, 0.02) m |
| Camera Position Z | U(-0.02, 0.02) m |
| Table Height | U(0.68, 0.81) m |
| Dome Light Intensity | U(500, 2000) |
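As a rough illustration of how the ranges listed above might be consumed at episode reset, the sketch below draws one value per parameter from its uniform range. The dictionary keys and the `sample_randomization` helper are hypothetical names chosen for this example; only the numeric ranges come from the table.

```python
import random

# Ranges copied from the table above; the key names are illustrative only.
DR_RANGES = {
    "brightness": (-0.2, 0.2),
    "contrast": (0.8, 1.2),
    "camera_pos_x_m": (-0.02, 0.02),
    "camera_pos_y_m": (-0.02, 0.02),
    "camera_pos_z_m": (-0.02, 0.02),
    "table_height_m": (0.68, 0.81),
    "dome_light_intensity": (500.0, 2000.0),
}


def sample_randomization(rng: random.Random) -> dict:
    """Draw one value per parameter from its uniform range U(low, high)."""
    return {name: rng.uniform(low, high) for name, (low, high) in DR_RANGES.items()}


rng = random.Random(0)  # fixed seed so the draw is reproducible
params = sample_randomization(rng)
```

In a massively parallel setup, one such draw would presumably be made per environment, so that each of the many environments sees a different camera offset, table height, and lighting level.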

⛏️ Dig Review

⛏️ Dig: Go deep, uncover the layers. Dive into technical detail.

VIRAL: A Large-Scale Visual Sim-to-Real Framework for Humanoid Loco-Manipulation

Getting a humanoid robot to autonomously perform long-horizon behaviors that combine walking and manipulation (loco-manipulation) has long been a hard problem. Existing humanoids have struggled to chain locomotion and object manipulation over long durations, and have mostly relied on human teleoperation or on constrained environments. Two obstacles are usually cited: the lack of long-horizon control policies grounded in visual perception, and the Sim-to-Real gap that appears when a policy trained in simulation is moved onto a real robot.

In this post, we take a close look at a recent work that tackles these problems: "VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation". VIRAL is a framework in which a visual policy trained only in simulation performs continuous loco-manipulation on a real humanoid (Unitree G1) without any real-world tuning. We will walk through VIRAL's methodology (teacher-student training structure, system setup, training procedure), its simulation environment, and its Sim-to-Real transfer techniques, using the paper's main figures and tables along the way. The last part of the post, roughly 10% of it, looks at limitations and possible improvements from a critical angle.

VIRAL Overview: A Large-Scale Teacher-Student Training Framework

The core idea of VIRAL is to solve the problem in stages using a teacher-student training structure. The teacher policy first learns long-horizon loco-manipulation skills with RL in simulation, using privileged full-state information. The student policy, which acts from visual sensor input alone, is then trained by distilling what the teacher has learned. Finally, Sim-to-Real transfer techniques carry the student policy over to the real robot. The process can be summarized in three stages:

  1. Teacher policy RL: train an RL teacher policy in simulation with privileged sensing (full-body state, object positions, and all other available information)
  2. Student policy distillation: train the student policy in large-scale simulation using only RGB camera images and the limited sensor information that matches the real robot. The student imitates the teacher's actions (DAgger + BC)
  3. Sim-to-Real transfer: apply the trained student policy to the real robot zero-shot, relying on visual domain randomization and simulator-to-hardware alignment

In short, VIRAL follows a strategy of "learn at scale in simulation, deploy directly in reality". The figure below shows the overall structure of the VIRAL framework at a glance.

Figure 1: Overview of the VIRAL framework pipeline. From the left, the figure shows the simulation environment and camera rendering, followed by the structure of the teacher policy (using privileged sensing) and the student policy (using visual input). The teacher policy \pi_{\text{teacher}} takes the simulator's full-state information as input and outputs high-level action commands, which the whole-body controller (WBC) turns into joint-level control. The student policy \pi_{\text{student}} outputs actions based only on RGB images rendered in the simulator and the limited joint/IMU sensor information that is also available on the real robot, and it is trained to imitate the teacher's actions. The trained student policy is finally deployed on the real robot (the Unitree G1 humanoid on the right), where it runs on camera input.

Teacher Policy Training: Reinforcement Learning with Privileged Information

Stage 1 of VIRAL is training the teacher policy. The teacher policy runs only in the simulator and has privileged access to the full state of the robot and the environment. Its observations include, for example, the robot's joint angles and velocities, the exact positions and relative transforms of the object and the table, and the current task stage. Because it learns from internal state without any visual processing, the problem becomes much easier and RL becomes efficient.

The teacher policy is trained with PPO and works together with a low-level controller known as a Whole-Body Controller (WBC). In other words, the teacher does not output joint torques directly. Instead, it sends high-level action commands to HOMIE, a pre-designed, robust WBC. The action space is defined as a "delta action space", meaning the policy commands a change relative to the current target. Concretely, the teacher's action \mathbf{a}_t consists of changes to the body's linear and angular velocity (\Delta \mathbf{v}, \Delta \omega) and changes to the target joint angles of both arms and the fingers (\Delta \mathbf{q}_{\text{arm}}, \Delta \mathbf{q}_{\text{finger}}). According to the paper, delta commands make training more stable and faster to converge than directly outputting absolute joint angles. In the paper's results, training with absolute actions almost entirely failed, whereas with delta actions both reward and success rate rose quickly.
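The delta action idea described above can be sketched as a running command target that the policy nudges each step. This is a minimal sketch, not the paper's implementation: the function name `apply_delta_action`, the scale factor, and the clipping limits are all assumptions made for illustration.

```python
def apply_delta_action(target, delta, scale=0.05, lower=-1.0, upper=1.0):
    """Return the new command target: previous target plus a scaled increment, clipped.

    `target` and `delta` are equal-length lists of floats. `scale`, `lower`,
    and `upper` are illustrative values, not taken from the paper.
    """
    return [min(upper, max(lower, t + scale * d)) for t, d in zip(target, delta)]


# Three steps with a constant policy output move the target gradually.
target = [0.0, 0.0, 0.0]
for _ in range(3):
    target = apply_delta_action(target, [1.0, -1.0, 0.0])
```

Because each step can only move the target by a bounded amount, consecutive commands stay close to one another, which is one plausible reason delta actions are easier to learn than absolute targets.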

The teacher's RL reward is also designed per task stage. The full loco-manipulation task is split into four stages, (1) walking, (2) placing the object, (3) grasping, and (4) turning, and each stage has its own reward for reaching its goal. For instance, the "walking" stage gives a reward r_{\text{walk}} for reducing the distance between the robot and the target object, while the "grasping" stage gives a reward r_{\text{grasp-z}} based on how high the hand lifts the object above the table. This stage-wise reward breakdown helped stabilize learning over the long horizon.
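A stage-based reward of the kind described above might be organized as in the sketch below. The exact reward terms in the paper are not reproduced here: the progress-based form of the walking term, the lift-height form of the grasping term, and the completion bonus are assumptions, and `stage_reward` is a hypothetical helper.

```python
STAGES = ("walk", "place", "grasp", "turn")  # stage order as listed in the text


def stage_reward(stage, prev_dist=0.0, dist=0.0, lift_height=0.0,
                 stage_done=False, bonus=1.0):
    """Reward for the currently active stage only (illustrative shaping terms)."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    reward = bonus if stage_done else 0.0      # assumed bonus for finishing a stage
    if stage == "walk":
        reward += prev_dist - dist             # positive when the robot gets closer
    elif stage == "grasp":
        reward += max(0.0, lift_height)        # grows with how high the object is lifted
    return reward
```

Only the active stage contributes, so the policy is never rewarded for, say, lifting the object before it has walked to the table.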

Reference State Initialization (RSI) is another factor that decides whether teacher training succeeds. Learning a long-horizon behavior from start to finish with RL alone is very hard because of the exploration problem. To ease this, VIRAL takes states from 200 demonstration trajectories collected by teleoperation (an expert piloting the robot in the simulator) and uses them as episode start states. That is, the robot is sometimes placed in an intermediate state from a human demonstration and continues learning from there. This lets the RL agent directly experience a variety of promising states, which makes it far easier to reach long-horizon rewards. According to the paper's analysis, the teacher's final success rate reached 95% with RSI, but stayed below 10% without it, which is effectively a failure. RSI therefore turned out to be essential for learning long-horizon tasks.
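The reset logic behind RSI can be sketched in a few lines. This is a simplified illustration: `sample_initial_state` and the `rsi_prob` parameter are hypothetical, and the probability with which demonstration states are used in the paper is not given in this post.

```python
import random


def sample_initial_state(demo_states, default_state, rsi_prob, rng):
    """Pick the episode start state.

    With probability `rsi_prob` the episode starts from a state taken from a
    demonstration trajectory; otherwise it starts from the default reset.
    """
    if demo_states and rng.random() < rsi_prob:
        return rng.choice(demo_states)
    return default_state


rng = random.Random(0)
demo_states = ["near_table", "holding_object", "mid_turn"]  # placeholder labels
starts = [sample_initial_state(demo_states, "start_pose", 0.5, rng) for _ in range(100)]
```

Mixing both kinds of reset keeps the policy able to solve the task from the true start while still exposing it to late-stage states it would rarely reach on its own early in training.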

Figure 2: Effect of RSI and the delta action space on teacher training. The left plot shows cumulative reward over training and the right plot shows task success rate. The yellow curve is the full setup with both RSI and delta actions, green uses delta actions without RSI, and blue uses absolute joint commands instead of delta actions. Without delta actions, RL makes almost no progress (reward and success rate stay flat), and without RSI the success rate barely rises from its low initial level. With RSI and delta actions combined, reward rises sharply within a short training time and the final success rate reaches 95%. The result shows how much stable humanoid learning depends on delta actions and the initialization strategy.

To sum up, privileged information allowed the teacher policy to learn "expert-level" long-horizon behavior in the simulator with relative ease, and a stable action representation (delta) together with effective initialization (RSI) played the key roles. Next, we move on to training a student policy, deployable on the real robot, on top of this trained teacher.

Student Policy Distillation: Large-Scale Training of a Vision-Based Policy

Stage 2 of VIRAL is training the student policy (the visual policy). The student is restricted to information that is available on the real robot: onboard sensor data (joint angles and velocities, IMU, and so on) and images from the RGB camera mounted on the robot's head. Unlike the teacher, it cannot use privileged information such as object coordinates or the robot-to-object distance, so its problem is much harder. Instead, the student learns by imitating the teacher's actions. This is the "policy distillation" stage, and it is where the teacher's role ends.

The student is trained with a mixture of two imitation learning (IL) methods, DAgger (Dataset Aggregation) and Behavior Cloning (BC). Early on, the student simply imitates the teacher's demonstration data (BC). Later, the student gathers experience by acting on its own while repeatedly consulting the teacher's correct action to fix its errors (online DAgger). The paper reports that mixing BC and DAgger at roughly 50:50 (\alpha=0.5) was the most stable. Compared with pure Behavior Cloning, the DAgger mixture improved the success rate in real-robot deployment. The reason is that the student gets a chance to experience and correct its own mistakes, so the learned policy stays robust even when it drifts away from the initial data distribution.
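One way to picture the DAgger-BC mixture is as a per-environment choice of which policy drives the rollout, while the teacher's action is always the supervision target. In the sketch below, treating \alpha as the probability that the teacher drives is an assumption, and `mixed_rollout_step` and `imitation_loss` are hypothetical helpers rather than the paper's code.

```python
import random


def mixed_rollout_step(obs_student, obs_teacher, student, teacher, alpha, rng):
    """Return (action_to_execute, supervision_label) for one environment step."""
    label = teacher(obs_teacher)              # the teacher's action is always the target
    if rng.random() < alpha:
        executed = label                      # BC-style: the teacher drives the rollout
    else:
        executed = student(obs_student)       # DAgger-style: the student drives, teacher relabels
    return executed, label


def imitation_loss(predicted, label):
    """Mean squared error between the student's action and the teacher's label."""
    return sum((p - l) ** 2 for p, l in zip(predicted, label)) / len(label)


# Toy policies standing in for the real networks.
teacher = lambda obs: [1.0, 1.0]
student = lambda obs: [0.0, 0.0]
executed, label = mixed_rollout_step("rgb", "full_state", student, teacher, 0.5, random.Random(0))
loss = imitation_loss(student("rgb"), label)
```

Teacher-driven rollouts supply clean, on-task states, while student-driven rollouts expose the states the student actually visits, including its own mistakes, which is what makes the mixture more robust than BC alone.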

The hardest part of the student's input is processing high-resolution RGB images. For this, VIRAL uses DINOv3, a recent vision backbone. DINOv3 is a powerful vision transformer trained with self-supervision on large-scale data, and it extracts useful feature representations from images. The image features from this backbone are combined with the robot's proprioceptive sensor readings to form the student policy's final input. To let the student use temporal context, history-aware architectures (for example, an RNN or a stack of past time steps) were also tried, and these history-based models consistently outperformed single-observation models. In short, "good visual features plus use of past state" mattered for student training as well.
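The "visual features plus proprioception plus history" input described above could be assembled as in the following sketch. It shows only the simplest variant, a fixed-length stack of past frames. The class name `ObservationHistory` and the zero-padding at episode start are assumptions, and the paper's actual history architecture is not specified in this post.

```python
from collections import deque


class ObservationHistory:
    """Keeps the last `length` per-step feature vectors and returns them flattened."""

    def __init__(self, length, feature_dim):
        # Start with zero-filled frames so the output size is constant from step one.
        self.buffer = deque(
            ([0.0] * feature_dim for _ in range(length)), maxlen=length
        )

    def push(self, image_features, proprio):
        """Append one step (image features concatenated with proprioception)."""
        self.buffer.append(list(image_features) + list(proprio))
        return [value for frame in self.buffer for value in frame]


history = ObservationHistory(length=3, feature_dim=3)
policy_input = history.push([1.0, 2.0], [3.0])
```

In practice the image features would come from the vision backbone and be far larger than this toy example, but the concatenate-then-stack pattern is the same.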

Training the student policy requires an enormous amount of simulation data. Because the input is visual, the simulator has to render realistic images in bulk. For this, the authors used Isaac Lab, NVIDIA's platform built on Isaac Sim, to run multi-GPU distributed simulation. With so-called "tiled rendering", image rendering and physics simulation run simultaneously across parallel environments on many GPUs to collect data and train. In the VIRAL experiments, up to 64 GPUs were used to train the student. Scaling compute this way made it possible to learn a long, complex task reliably, and the paper goes as far as to note that without this scale, training failed or the policy was unstable. We return to this in a later section, but the strong effect of compute scale on performance is one of the paper's main messages.

The Simulation Environment and the Role of Large-Scale Training

The simulation environment effectively serves as VIRAL's data factory. In the real world, collecting a single episode of data costs a lot of time and money: a person has to set up the scenario, the robot has to move, and the sensors have to be read out. In the simulator, by accelerating time and replicating environments in parallel, the team reports gathering experience at 10,000 times real-world speed. Obtaining vast amounts of training data in a short time through physics acceleration and parallelization is one pillar of VIRAL's results.

As the paper stresses, the 64-GPU parallel training mentioned earlier was "a necessity, not an option". In concrete terms, the teacher policy could be trained reliably with about 16 GPUs (with fewer, the chance of failure rises), whereas student distillation only reached the desired performance when up to 64 GPUs were used. The figures below show how training speed and performance improve with GPU count.

Figure 3: Effect of compute scaling on teacher training. The plots compare teacher policies trained in parallel on 1, 2, 4, 8, and 16 GPUs. The left plot shows cumulative reward against training time and the right plot shows task success rate. In general, more GPUs reach high performance faster within the same wall-clock time. For example, 16 GPUs (yellow) reach near-maximum reward and a success rate above 90% in about 10 hours, whereas 1 GPU (purple) reaches a markedly lower reward and only about 30% success in the same time. Greater parallelism substantially improved both the reliability of training and the final performance, and with insufficient GPU resources even teacher RL came close to failing.

Figure 4: Effect of compute scaling on student training. The amount of parallel GPU resources is decisive for student distillation as well. The plots compare training on 1, 2, 4, 8, 16, 32, and 64 GPUs. The left plot shows the distillation loss, which measures the gap between teacher and student actions, and the right plot shows the success rate of the resulting policy over time. With more GPUs, the loss drops more steeply and more stably, and the final success rate is higher. With a single GPU (purple), the loss falls only slightly and the success rate stays around 70%, whereas with 64 GPUs (yellow) the loss drops quickly and the success rate rises to about 85% or more. From these experiments, the authors conclude that without large-scale simulation, training a long-horizon visual policy is either impossible or highly unstable, and they stress that scaling up was the only reliable path to performance.

In summary, the VIRAL study shows that training through large-scale simulation both accelerated and enabled the solution of a complex long-horizon robot task. Given enough compute, the simulator can deliver an "explosion of experience", which in turn improves the policy's performance and generalization. Of course, running experiments at this scale requires substantial infrastructure and cost, which is a practical constraint. We come back to this in the limitations below.

Sim-to-Real Transfer Techniques: Visual Domain Randomization and Real-to-Sim Alignment

We now turn to the step of moving the student visual policy, trained in simulation, onto the real robot: the problem of Sim-to-Real transfer. The performance drop caused by differences between simulation and reality is commonly called the "reality gap". VIRAL takes a two-pronged approach to reduce it: (a) making the student policy robust through visual domain randomization, and (b) adjusting the simulator itself to be closer to reality.

Visual domain randomization is a technique widely used in recent Sim-to-Real research. During training, the simulator's rendering style is randomly varied so that the agent acquires visual perception that generalizes. VIRAL applies very broad randomization, covering lighting, materials, camera, and image effects. In detail:

  • Lighting (random lighting): randomly vary the simulator's global illumination (dome light) in brightness, color temperature, shadow patterns, and so on
  • Material properties (random materials): randomly change the color, texture, and reflectance of the floor, table, and objects (for example, a different tablecloth color or pattern every episode)
  • Camera extrinsics and intrinsics (random camera): slightly perturb the camera's position and orientation (extrinsics), and vary intrinsics such as field of view within the manufacturer's specification. Random camera latency is also injected so that the policy stays robust even when simulation is not perfectly synchronized in real time
  • Image quality (random image effects): vary brightness, contrast, saturation, and hue, and randomly add Gaussian noise, blur, and similar effects to the image
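Of the randomizations listed above, camera latency is perhaps the least obvious to implement. One simple way to emulate it is a small frame buffer from which a randomly aged frame is returned each step. The `DelayedCamera` class and its `max_delay_steps` parameter are hypothetical, and how latency is actually injected in VIRAL is not described in this post.

```python
import random
from collections import deque


class DelayedCamera:
    """Returns frames that are a random number of steps old (illustrative helper)."""

    def __init__(self, max_delay_steps, rng):
        self.max_delay_steps = max_delay_steps
        self.rng = rng
        # Keep just enough frames to serve the largest possible delay.
        self.frames = deque(maxlen=max_delay_steps + 1)

    def observe(self, new_frame):
        """Store the newest frame and return one that is 0..max_delay_steps steps old."""
        self.frames.append(new_frame)
        oldest_available = len(self.frames) - 1
        delay = self.rng.randint(0, min(self.max_delay_steps, oldest_available))
        return self.frames[-1 - delay]


camera = DelayedCamera(max_delay_steps=2, rng=random.Random(0))
seen = [camera.observe(t) for t in range(10)]  # frames are labeled by their time step
```

Here each "frame" is just its time index, which makes it easy to see that the policy would receive observations lagging the true state by up to two steps.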

Figure 5: Examples of the visual domain randomization applied during student training. A range of random visual transformations trains the student network to withstand varied environmental changes. From the top, the global lighting is changed to produce indoor/outdoor or brightness variation, the image color and quality are perturbed, and material properties are altered. The examples on the right also show camera viewpoint variation: this is camera extrinsics randomization, which renders from slightly different angles and positions than the real setup so that the policy behaves consistently. Applying all of these randomizations together makes the trained policy insensitive to unexpected visual differences it may meet in reality (for example, dim lighting or a noisy camera). In the paper's analysis, turning off all visual randomization dropped performance by 35.1 percentage points, so the effect is very large.

The second pillar of Sim-to-Real is the effort to narrow the gap by making the simulation itself resemble reality. However robust the policy is, performance is ultimately limited if the underlying model difference between the simulator and the real robot is too large. VIRAL focuses its system identification (SysID) and calibration on the robot hand and the camera.

  • Finger system identification: the hand of the Unitree G1 humanoid is a three-finger dexterous hand, and if its motion in simulation differs from the real hand's motion, grasping behavior is strongly affected. The authors tuned the motor model parameters of the finger joints (for example, motor armature, spring stiffness, and damping coefficient) to match real measurement data. In other words, careful system identification of the hand made the simulated hand's dynamics match the real one as closely as possible. The match between simulated and real hand motion reportedly improved greatly after SysID (for instance, objects that were grasped well in simulation but dropped on the real robot before SysID were handled correctly after tuning, a difference described as "night and day").

  • Camera FOV alignment and calibration: the intrinsics (focal length, field of view, and so on) of the head-mounted RGB-D camera (Intel RealSense D435i) were set in the simulator from the manufacturer's values, and the extrinsic position and angle were calibrated so that the view rendered in simulation matches the view seen by the actual mounted camera. Put simply, the simulated camera was adjusted to see as nearly the same scene as the real camera's "eye" as possible. As mentioned in the randomization part, the extrinsics (camera position and angle) are also randomly perturbed slightly during training, so that calibration error or small shifts do not cause problems.
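The link between camera intrinsics and field of view mentioned above follows the standard pinhole relation, FOV = 2·atan(w / (2·f_x)), where w is the image width and f_x the focal length, both in pixels. The sketch below converts in both directions. The function names are illustrative, and no actual D435i parameter values are assumed.

```python
import math


def horizontal_fov_deg(fx_pixels, image_width_pixels):
    """Horizontal field of view of a pinhole camera, in degrees."""
    return math.degrees(2.0 * math.atan(image_width_pixels / (2.0 * fx_pixels)))


def focal_length_pixels(fov_deg, image_width_pixels):
    """Inverse relation: focal length in pixels that yields the given horizontal FOV."""
    return image_width_pixels / (2.0 * math.tan(math.radians(fov_deg) / 2.0))


# A focal length equal to half the image width gives a 90-degree field of view.
fov = horizontal_fov_deg(fx_pixels=320.0, image_width_pixels=640.0)
```

A helper like this is handy when a simulator expects a field of view (or a focal length and aperture) while the camera's calibration file provides the intrinsics matrix, or the other way round.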

Thanks to the visual randomization and real-to-sim alignment above, VIRAL's final student policy transferred smoothly to the real world. Notably, the study shows successful policy execution without any fine-tuning on real data, which suggests that large-scale simulation training combined with careful Sim-to-Real techniques was enough to close the domain gap.

Experimental Results: 54 Consecutive Cycles at Expert-Level Performance

So what happened when this trained visual policy was deployed on a real Unitree G1 humanoid? According to the paper, the robot performed continuous loco-manipulation reliably without any real-world retraining. Specifically, in an experiment that repeats the behavior cycle "walk, stop, place object, grasp, turn", the robot recorded up to 54 consecutive successful cycles. Out of 59 repetitions in total it completed 54, a success rate of about 91.5%, which indicates very high reliability.

Interestingly, task speed was also comparable to a human's. When an expert (a humanoid teleoperation specialist such as one of the paper's authors) performed the same task by teleoperation, one cycle took 21.4 seconds on average, while the VIRAL policy finished slightly faster at 20.2 seconds on average. Compared with a non-expert operator, the VIRAL policy was reported to be far superior, making this a case of automation at or above human level.

The VIRAL policy also showed impressive generalization. Without any change to the experimental setup, it coped with a variety of spatial arrangements and visual changes. For example, the robot still picked up and moved the object accurately when its start position was shifted to the left, center, or right of the table, when the tray on the table was moved outward or inward, and when the table height was varied between 66 cm and 81 cm. The policy also worked when the lights were bright, flickering, or nearly dark, when the tablecloth was changed to bright blue, green, yellow, or red, and even when the table was swapped for one of a different material or shape. Finally, when new object shapes outside the training set were used (for example, a water bottle, a bowling pin, a spray can, or a tennis ball tube), the policy still generalized to a considerable degree. This is evidence that VIRAL's visual policy did not overfit to a specific environment and learned to handle a broad distribution of situations.

Of course, it did not handle every variation perfectly, and the paper mentions some failure cases. Reported examples include objects that could be grasped in simulation but slipped in reality because of a slippery material, fingers getting caught, and mistakes on unusually shaped, previously unseen objects. These failures were exceptions, however, and overall VIRAL's zero-shot policy performed robustly and reliably in the real environment.

So far we have covered VIRAL's methodology and performance. Next, we review the work from a somewhat more critical angle and discuss directions for improvement.

Limitations and Discussion: VIRAL's Significance and Future Challenges

VIRAL offers a major insight for robotics in that it taught a humanoid a long-horizon skill with zero real-world data. In particular, it shows that the combination of RL, simulation, and large-scale GPU compute is enough to produce a policy at the level of a human expert, and it is regarded as a creative way around data scarcity. At the same time, it is hard to claim that this approach solves every problem. The limitations the paper itself acknowledges, and the thinking needed to move past them, are summarized below:

  • Limits of physics coverage: there are limits to the physical phenomena that current simulators can represent. VIRAL deals with relatively rigid objects in a fixed environment (grasping an object on a table), but once fluids, deformables such as cloth, or granular materials such as sand appear, accurate simulation becomes difficult. Capturing all of the real world's physical diversity in simulation takes enormous engineering effort and cost. This Physical Diversity Gap therefore remains a major obstacle for the Sim-to-Real approach.

  • Task diversity (long tail of tasks): this experiment covers one specific task (picking up and moving an object on a table). Ordinary homes and industrial sites contain a far wider range of tasks. Designing every one of those task environments and scenarios by hand in simulation would take an almost unimaginable effort, so it ultimately becomes a content-creation problem. Building virtual environments that cover even the "unknown situations" nobody has thought of is close to impossible in practice. This remains a challenge on the way past the limits of Sim-to-Real toward general-purpose robots.

  • Reward design and policy coverage: Even for training the teacher policy, the developers designed the reward functions by hand. This task had clear, well-structured stages, so that was relatively manageable. If the goal is a robot general enough to handle every piece of furniture and every object in a home, however, thousands or more rewards and skills might have to be designed one by one. This is the reward-design bottleneck, and it carries a further risk: in situations the design did not anticipate, the policy may optimize the reward in unintended ways and behave dangerously. For this reason, there is also an argument that learning from real demonstration data (Imitation Learning) may be the more practical approach for some tasks.

  • ํ•˜๋“œ์›จ์–ด-์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ ๊ฒฉ์ฐจ: VIRAL์—์„œ๋„ ์†๊ฐ€๋ฝ SysID, ์นด๋ฉ”๋ผ ๋ณด์ • ๋“ฑ์œผ๋กœ ๋…ธ๋ ฅํ–ˆ์ง€๋งŒ, ์—ฌ์ „ํžˆ ํ˜„์‹ค ํ•˜๋“œ์›จ์–ด์™€ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ ๊ฐ„ ์ฐจ์ด๋Š” ๋‚จ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ์‹œ๊ฐ„ ์ง€๋‚˜๋ฉฐ ๋ชจํ„ฐ ๋ฐœ์—ด๋กœ ์ธํ•œ ์ถœ๋ ฅ ์ €ํ•˜, ๊ธฐ๊ณ„ ๋งˆ๋ชจ, ์„ผ์„œ ๋…ธ์ด์ฆˆ, ๊ด€์ ˆ ๋งˆ์ฐฐ/๋ฐฑ๋ž˜์‹œ ๋“ฑ ์‹ค์ œ ๋กœ๋ด‡ ๊ณ ์œ ์˜ ํŠน์„ฑ์€ ์‹œ๊ฐ„์ด ํ๋ฅด๋ฉฐ ๋ณ€๋™๊นŒ์ง€ ์ƒ๊น๋‹ˆ๋‹ค. ์™„๋ฒฝํžˆ ํ˜„์‹ค๊ณผ ๋™์ผํ•œ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ๋ฅผ ๋งŒ๋“œ๋Š” ๊ฑด ๋ถˆ๊ฐ€๋Šฅํ•˜๋ฏ€๋กœ, ๊ฒฐ๊ตญ Sim-to-Real์—” ์–ธ์ œ๋‚˜ ์ž”์—ฌ ์˜ค์ฐจ๊ฐ€ ์žˆ๊ฒŒ ๋งˆ๋ จ์ž…๋‹ˆ๋‹ค. ์ด๋กœ ์ธํ•ด ๋ณต์žกํ•œ ์กฐ์ž‘์ผ์ˆ˜๋ก ์‹œ๋ฎฌ๋ ˆ์ด์…˜์—์„œ ์ž˜ ๋๋˜ ๊ฒƒ์ด ํ˜„์‹ค์—์„  ์กฐ๊ธˆ์”ฉ ์–ด๊ธ‹๋‚  ์œ„ํ—˜์ด ์žˆ์Šต๋‹ˆ๋‹ค.

Given these limitations, VIRAL is unlikely to be the final word on general-purpose humanoid learning. The researchers themselves acknowledge that large-scale simulation is powerful but probably cannot fully replace real-world data. Instead, combining simulation with real-world learning is put forward as the direction ahead. Hybrid strategies will likely be needed: for example, supplementing a small amount of real demonstration data with simulation, or conversely pretraining in simulation first and then fine-tuning in the real world. Simulation could also be combined with large pretrained models (e.g., large vision-action models) or cross-modal learning. In that setup the simulator's role narrows to safety verification and initial, domain-limited learning, while final generalization comes from real-world data and large-scale models.
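As a rough illustration of the first hybrid strategy, a training batch can mix plentiful simulated samples with a fixed share of scarce real demonstrations. This is a minimal sketch under stated assumptions, not something described in the paper: `mixed_batch` and its `real_fraction` parameter are hypothetical names, and samples are treated as opaque items.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mixed_batch(sim_data: Sequence[T], real_data: Sequence[T],
                batch_size: int, real_fraction: float,
                rng: random.Random) -> List[T]:
    """Draw a batch in which roughly `real_fraction` of the items come
    from real demonstrations and the rest from simulation.

    Sampling is with replacement, so a small real dataset is effectively
    oversampled relative to its size.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    n_real = round(batch_size * real_fraction)
    n_sim = batch_size - n_real
    if n_real > 0 and not real_data:
        raise ValueError("real_data is empty but real samples were requested")
    if n_sim > 0 and not sim_data:
        raise ValueError("sim_data is empty but sim samples were requested")
    batch = [rng.choice(real_data) for _ in range(n_real)]
    batch += [rng.choice(sim_data) for _ in range(n_sim)]
    rng.shuffle(batch)  # avoid a fixed real-then-sim ordering
    return batch
```

For example, `mixed_batch(sim, real, batch_size=32, real_fraction=0.25, rng=random.Random(0))` draws 8 real and 24 simulated items. The right ratio is a tuning choice that depends on how much real data exists and how far it is from the simulated distribution.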

Finally, one clear lesson from the VIRAL work is that "sometimes scale is the answer." The result came after roughly six months of infrastructure building and large-scale experiments on dozens of GPUs, so parts of it are not easy to replicate. Still, it is encouraging that economies of scale and a data-centric approach were shown to work in robotics as well. The hope is that advances in simulation technology and growing compute will make this line of research reproducible in more places, and that combining it with real-world data will bring truly general-purpose humanoid robots a step closer.

References: This post was compiled from the original paper, the project website released by the authors, and related review material. Details of VIRAL's algorithm implementation and additional experimental results can be found in the paper published on arXiv.

Sources:

  1. T. He et al., "VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation," arXiv:2511.15200 (2025)
  2. VIRAL project website: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation
  3. Moonlight Review: VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation (summary and commentary)

Copyright 2026, JungYeon Lee