Curieux.JY
  • Jung Yeon Lee

On this page

  • Brief Review
  • Detail Review
    • Background and Problem Definition
    • Key Contributions and Innovations
    • Technical Design: The Two-Stage ManipTrans Method
      • Stage 1: Hand Motion Imitation (Trajectory Imitation Pre-training)
      • Stage 2: Interaction Fine-tuning via a Residual Policy (Residual Learning Fine-tuning)
    • Experimental Results: Performance Evaluation and Analysis
    • Discussion and Limitations

📃ManipTrans Review

retargeting
imitation
residual
Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning
Published

August 25, 2025

  • Paper Link
  • Project Link
  • Code Link
  1. This paper proposes MANIPTRANS, a novel two-stage method for efficiently transferring complex bimanual robotic manipulation skills.
  2. MANIPTRANS first pre-trains a hand trajectory imitation model and then applies residual learning under physical interaction constraints, so that human motions are imitated accurately and in a physically plausible way.
  3. The method outperforms the previous SOTA in success rate, accuracy, and efficiency. It is also used to build DEXMANIPNET, a large-scale bimanual manipulation dataset, and it generalizes to different robot embodiments and to real-world settings.

Brief Review

Embodied AI is advancing rapidly as research on dexterous robotic manipulation grows. Data-driven Embodied AI algorithms need large-scale, precise, human-like manipulation sequences, but such data are hard to obtain with conventional reinforcement learning (RL) or real-world teleoperation. RL requires a task-specific reward function, which limits scalability and task complexity; teleoperation is labor-intensive and costly, and it produces only embodiment-specific data.

To address these problems, the paper introduces MANIPTRANS, a novel two-stage method for efficiently transferring human bimanual manipulation skills to dexterous robotic hands in simulation. MANIPTRANS first pre-trains a generalist trajectory imitator that mimics hand motion, and then fine-tunes a specific residual module under interaction constraints, which enables efficient learning and accurate execution of complex bimanual tasks.

The core idea of MANIPTRANS is to split the transfer process into two stages.

The first stage is a pre-training stage focused on imitating hand motion; the second is an action fine-tuning stage that satisfies the interaction constraints. In the first stage, a strong generalist model learns to imitate human finger motion accurately and robustly, even when the input is noisy. Building on this initial imitation, a residual learning module progressively refines the robot's actions. This module focuses on two key aspects: 1) ensuring stable contact with object surfaces under physical constraints so that objects can be manipulated effectively, and 2) coordinating the two hands so that complex bimanual tasks are executed precisely and with high fidelity.

This design has three advantages:

  1. The first stage concentrates on dynamic hand imitation through large-scale pre-training, which effectively mitigates the morphological differences between human and robot hands.
  2. Building on this, the second stage concentrates on tracking bimanual hand-object interactions, capturing subtle movements accurately and enabling natural, high-fidelity manipulation.
  3. Decoupling human hand-motion imitation from physics-based object interaction constraints greatly reduces the complexity of the action space and thereby improves learning efficiency.

Based on this framework, MANIPTRANS corrects arbitrary, noisy hand MoCap data into physically plausible motions. It does so without predefined phases (e.g., "approach-grasp-manipulate") or task-specific reward engineering.

3. Method

MANIPTRANS aims to learn a policy with which a dexterous robotic hand accurately reproduces given human hand-object interaction reference trajectories in simulation while satisfying the task's semantic manipulation constraints. To this end, the paper proposes a two-stage framework.

3.1. Preliminaries

The manipulation-transfer problem is formalized for a complex bimanual setting. Left and right dexterous hands d=\{d_l, d_r\} replicate the behavior of human hands h=\{h_l, h_r\} that cooperatively interact with two objects o=\{o_l, o_r\}. The reference trajectories of the human demonstration are defined as T^h=\{\tau_t^h\}_{t=1}^T and T^o=\{\tau_t^o\}_{t=1}^T.

\tau_t^h contains the wrist's 6-DoF pose w^h \in SE(3), its linear and angular velocities \dot{w}^h=\{v^h, u^h\}, the finger joint positions j^h \in \mathbb{R}^{F \times 3} defined by MANO [96], and their velocities \dot{j}^h=\{v^j, u^j\}, where F is the number of hand keypoints. \tau_t^o contains each object's 6-DoF pose p_t^o \in SE(3) and velocities \dot{p}_t^o=\{v^o, u^o\}. To reduce spatial complexity, all translations are normalized relative to the wrist position of the dexterous hand.
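A minimal sketch of the wrist-relative normalization, assuming it amounts to subtracting the robot wrist position from every translation (whether positions are also rotated into the wrist frame is not stated here, so no rotation is applied). The function name `to_wrist_relative` is hypothetical:

```python
from typing import List, Sequence


def to_wrist_relative(points: Sequence[Sequence[float]],
                      wrist_pos: Sequence[float]) -> List[List[float]]:
    """Express 3D positions relative to the wrist position (translation only)."""
    # Subtract the wrist translation component-wise from each point.
    return [[p[i] - wrist_pos[i] for i in range(3)] for p in points]


# Example: two world-frame positions (e.g. a keypoint and an object) and a wrist position.
rel = to_wrist_relative([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]], [0.5, 1.0, 1.5])
print(rel)  # [[0.5, 1.0, 1.5], [0.0, -0.5, -1.0]]
```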

The problem is modeled as an implicit Markov Decision Process (MDP) M=\langle S, A, \mathcal{T}, R, \gamma \rangle, with state space S, action space A, transition dynamics \mathcal{T}, reward function R, and discount factor \gamma. For each dexterous hand, the action a_t \in A at time t consists of the target positions a_t^q \in \mathbb{R}^K of every joint, tracked by PD control, and a 6-DoF force a_t^w \in \mathbb{R}^6 applied to the robotic wrist, where K is the number of DoF of the robotic hand.
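For concreteness, the action dimension follows directly from this definition. Taking the simulated 12-DoF Inspire Hand (the main DEXMANIPNET platform) as an example, and assuming the two hands' actions are simply stacked:

```latex
a_t = \left[\, a_t^q \,;\, a_t^w \,\right] \in \mathbb{R}^{K+6},
\qquad K = 12 \;\Rightarrow\; a_t \in \mathbb{R}^{18} \text{ per hand},
\qquad \left[\, a_t^{(l)} \,;\, a_t^{(r)} \,\right] \in \mathbb{R}^{36} \text{ for both hands.}
```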

The transfer process is divided into two stages:

  1. a pre-trained, hand-only trajectory imitation model I, and
  2. a residual module R that fine-tunes the coarse actions into task-compliant actions.

The state at time t is s_t^I \in S_I in the first stage and s_t^R \in S_R in the second, and the corresponding rewards are r_t^I = R(s_t^I, a_t^I) and r_t^R = R(s_t^R, a_t^R).

Both stages use Proximal Policy Optimization (PPO) to maximize the expected discounted reward E[\sum_{t=1}^T \gamma^{t-1} r_t^{stage}].
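The quantity inside the expectation is the discounted return of one episode. The sketch below computes it for a single list of per-step rewards; it is not PPO itself (PPO estimates and maximizes the expectation of this quantity over many rollouts), and \gamma = 0.99 is the value listed under the implementation details:

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Return sum_{t=1}^{T} gamma^(t-1) * r_t for one episode."""
    total = 0.0
    # Accumulate backwards using G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```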

3.2. Hand Trajectory Imitating

The goal of this stage is to learn a general hand trajectory imitation model I that accurately replicates detailed human finger motion. The state of each dexterous hand at time t is s_t^I = \{\tau_t^h, s_t^{prop}\}, consisting of the target hand trajectory \tau_t^h and the current proprioception s_t^{prop} = \{q_t^d, \dot{q}_t^d, w_t^d, \dot{w}_t^d\}, where q_t^d and w_t^d are the joint angles and the wrist pose, respectively. A policy \pi_I(a_t | s_t^I, a_{t-1}) is trained with RL to produce the actions a_t^I.

Reward Functions: r_t^I is designed so that the dexterous hand tracks the reference hand trajectory \tau_t^h while remaining stable and smooth.

  1. Wrist tracking reward r_t^{wrist}: minimizes the pose error w_t^d \ominus w_t^h and the velocity error \dot{w}_t^d - \dot{w}_t^h, where \ominus denotes the difference in SE(3).
  2. Finger imitation reward r_t^{finger}: encourages the dexterous hand to closely track the reference finger joint positions. F finger keypoints j_{t,f}^d corresponding to the MANO model are manually selected on the dexterous hand. The weights w_f and decay rates \lambda_f emphasize the fingertips, especially the thumb, index, and middle fingers, which mitigates the effect of morphological differences between human and robot hands: r_t^{finger} = \sum_{f=1}^F w_f \cdot \exp(-\lambda_f \|j_{t,f}^d - j_{t,f}^h\|_2^2).
  3. Smoothness reward r_t^{smooth}: penalizes the power exerted at each joint.

\text{Total reward:} r_t^I = w_{wrist} \cdot r_t^{wrist} + w_{finger} \cdot r_t^{finger} + w_{smooth} \cdot r_t^{smooth}
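A minimal sketch of the Stage 1 reward. The finger term follows the formula above directly; the wrist term is passed in precomputed because its exact functional form is not given here, and writing the smoothness term as -\sum_k |\tau_k \dot{q}_k| (joint torque times joint velocity) is an assumed form of the power penalty. All weights and function names are hypothetical:

```python
import math
from typing import Sequence

Vec3 = Sequence[float]


def finger_reward(robot_kp: Sequence[Vec3], human_kp: Sequence[Vec3],
                  weights: Sequence[float], decays: Sequence[float]) -> float:
    """r_finger = sum_f w_f * exp(-lambda_f * ||j_{t,f}^d - j_{t,f}^h||_2^2)."""
    total = 0.0
    for jd, jh, w, lam in zip(robot_kp, human_kp, weights, decays):
        sq_dist = sum((a - b) ** 2 for a, b in zip(jd, jh))
        total += w * math.exp(-lam * sq_dist)
    return total


def smoothness_reward(torques: Sequence[float], joint_vel: Sequence[float]) -> float:
    """Assumed power penalty: -sum_k |tau_k * qdot_k| over all joints."""
    return -sum(abs(tau * qd) for tau, qd in zip(torques, joint_vel))


def imitation_reward(r_wrist: float, r_finger: float, r_smooth: float,
                     w_wrist: float, w_finger: float, w_smooth: float) -> float:
    """r_t^I = w_wrist * r_wrist + w_finger * r_finger + w_smooth * r_smooth."""
    return w_wrist * r_wrist + w_finger * r_finger + w_smooth * r_smooth


# Perfect tracking gives each keypoint its full weight: 2.0 + 1.0.
print(finger_reward([[0, 0, 0], [1, 1, 1]], [[0, 0, 0], [1, 1, 1]],
                    [2.0, 1.0], [10.0, 5.0]))  # 3.0
```

Emphasizing the thumb, index, and middle fingertips would then correspond to giving those keypoints larger w_f (and larger \lambda_f, so their reward drops faster with error).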

Training Strategy:

Because hand imitation is decoupled from object interaction, \pi_I does not need manipulation data, which is hard to obtain. The policy is trained on hand-only datasets, including existing hand motion datasets and synthetic data. For efficiency, Reference State Initialization (RSI) and early termination are used: if a dexterous hand keypoint j_t^d deviates from its reference by more than a threshold \epsilon_{finger}, the episode is terminated early and reset to a randomly sampled MoCap state. Curriculum learning gradually decreases \epsilon_{finger}, so that training first explores broadly and then focuses on fine finger control.
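The early-termination and curriculum logic can be sketched as follows. The 6 cm to 4 cm range for \epsilon_{finger} comes from the implementation details; the linear schedule, the rule that a single keypoint exceeding the threshold triggers termination, and all function names are assumptions for illustration:

```python
import math
import random
from typing import Sequence

Vec3 = Sequence[float]


def epsilon_finger(step: int, total_steps: int,
                   start: float = 0.06, end: float = 0.04) -> float:
    """Termination threshold in meters, shrunk linearly from start to end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def should_terminate(robot_kp: Sequence[Vec3], human_kp: Sequence[Vec3],
                     eps: float) -> bool:
    """Early termination if any keypoint is farther than eps from its reference."""
    return any(math.dist(jd, jh) > eps for jd, jh in zip(robot_kp, human_kp))


def reference_state_init(num_frames: int, rng: random.Random) -> int:
    """RSI: choose a random MoCap frame to (re)start an episode from."""
    return rng.randrange(num_frames)


rng = random.Random(0)
eps = epsilon_finger(step=500, total_steps=1000)  # about 0.05 m halfway through
if should_terminate([[0.0, 0.0, 0.0]], [[0.07, 0.0, 0.0]], eps):
    start_frame = reference_state_init(num_frames=300, rng=rng)
```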

3.3. Residual Learning for Interaction

Building on the pre-trained \pi_I, a residual module R fine-tunes the coarse actions so that they satisfy task-specific constraints.

State Space Expansion for Interaction: To account for interactions between the dexterous hand and the object, the state space is expanded by adding interaction-related information to the hand-related state s_t^I:

  • Object information: the convex hull \hat{o} of the MoCap object mesh o is generated in the simulation environment. To manipulate the object along the reference T^o, the state includes the object's position p_t^{\hat{o}} (relative to the wrist position w_t^d), its velocities \dot{p}_t^{\hat{o}}, its center of mass m_t^{\hat{o}}, and the gravitational force vector G_t^{\hat{o}}. The object shape is encoded with BPS [91].
  • Spatial relationship: the spatial relation between hand and object is encoded by the distance metric D(j_t^d, p_t^{\hat{o}}) = \|j_t^d - p_t^{\hat{o}}\|_2^2.
  • Contact force C_t: the contact force obtained from the simulation is included explicitly, since it is crucial for stable grasping and manipulation.

Expanded interaction state: s_t^{interact} = \{\tau_t^o, p_t^{\hat{o}}, \dot{p}_t^{\hat{o}}, m_t^{\hat{o}}, G_t^{\hat{o}}, \text{BPS}(\hat{o}), D(j_t^d, p_t^{\hat{o}}), C_t\}

Combined state: s_t^R = s_t^I \cup s_t^{interact}
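A minimal sketch of how these interaction features could be assembled. `bps_encode` implements the usual Basis Point Set idea (distance from each fixed basis point to the nearest object point); the basis points, the sampled object points, and the function names are hypothetical, and the union s_t^I \cup s_t^{interact} is realized here as plain vector concatenation:

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def bps_encode(basis: Sequence[Vec3], object_points: Sequence[Vec3]) -> List[float]:
    """For each fixed basis point, the distance to the nearest object point."""
    return [min(math.dist(b, x) for x in object_points) for b in basis]


def keypoint_object_distance(keypoints: Sequence[Vec3], obj_pos: Vec3) -> List[float]:
    """D(j_t^d, p_t^o_hat) = ||j_t^d - p_t^o_hat||_2^2 for every robot keypoint."""
    return [sum((a - b) ** 2 for a, b in zip(j, obj_pos)) for j in keypoints]


def residual_state(s_imitation: Sequence[float], s_interact: Sequence[float]) -> List[float]:
    """s_t^R = s_t^I united with s_t^interact, here as concatenation."""
    return list(s_imitation) + list(s_interact)


basis = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]    # hypothetical basis points
obj_pts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical surface samples
s_interact = (bps_encode(basis, obj_pts)
              + keypoint_object_distance([[1.0, 2.0, 2.0]], [0.0, 0.0, 0.0]))
s_R = residual_state([0.1, 0.2], s_interact)
print(s_R)  # [0.1, 0.2, 1.0, 1.0, 9.0]
```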

Residual Actions Combining Strategy: The goal is to learn residual actions \Delta a_t^R that refine the initial imitation actions a_t^I and ensure task compliance. The final action is a_t = a_t^I + \Delta a_t^R, where the residual is added element-wise, and the resulting a_t is clipped to the joint limits of the dexterous hand. Early in training, the dexterous hand's motion is already close to the reference hand trajectory, so the residual actions are expected to be near zero; this prevents model collapse and speeds up convergence. The residual module is initialized with a zero-mean Gaussian distribution, and a warm-up strategy activates its training gradually.
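The combination step can be sketched as below. The element-wise sum and the clipping follow the text; expressing the warm-up as a linear scale on the residual is only one possible reading (the warm-up may instead schedule when the residual module starts training), and the names and bounds are hypothetical:

```python
from typing import List, Sequence


def warmup_scale(step: int, warmup_steps: int) -> float:
    """Assumed linear warm-up: the residual's weight grows from 0 to 1."""
    if warmup_steps <= 0:
        return 1.0
    return min(step / warmup_steps, 1.0)


def combine_actions(a_imitation: Sequence[float], delta_residual: Sequence[float],
                    lower: Sequence[float], upper: Sequence[float],
                    scale: float = 1.0) -> List[float]:
    """a_t = clip(a_t^I + scale * delta_a_t^R, lower, upper), element-wise."""
    return [min(max(a + scale * d, lo), hi)
            for a, d, lo, hi in zip(a_imitation, delta_residual, lower, upper)]


a_t = combine_actions([0.5, 1.0], [0.25, 0.5], lower=[0.0, 0.0], upper=[1.0, 1.25],
                      scale=warmup_scale(step=2000, warmup_steps=1000))
print(a_t)  # [0.75, 1.25]: the second joint is clipped to its upper limit
```

With a zero-mean initialization, `delta_residual` starts near zero, so early actions stay close to the Stage 1 output regardless of the warm-up scale.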

Reward Functions: The rewards are designed to be task-agnostic, avoiding task-specific reward engineering.

  • The hand imitation reward r_t^I (Sec. 3.2).
  • Object following reward r_t^{object}: minimizes the position and velocity differences between the simulated object and the reference trajectory (p_t^{\hat{o}} \ominus p_t^o and \dot{p}_t^{\hat{o}} - \dot{p}_t^o).
  • Contact force reward r_t^{contact}: encourages appropriate contact force whenever the hand-object distance in the MoCap data is below a threshold \xi_c: r_t^{contact} = w_c \cdot \exp\left(-\lambda_c \sum_{f=1}^F C_{t,f}^d \cdot \mathbf{1}\left(D(j_{t,f}^h, p_t^o \cdot o) < \xi_c\right)\right), where \mathbf{1}(\cdot) is the indicator function and C_{t,f}^d is the contact force at fingertip f.

\text{Total reward: } r_t^R = r_t^I + w_{object} \cdot r_t^{object} + w_{contact} \cdot r_t^{contact}

Training Strategy: Inspired by QuasiSim, a relaxation mechanism is introduced that directly adjusts the physical constraints in the Isaac Gym environment to improve training efficiency. Initially, the gravitational constant G is set to 0 and the friction coefficient F (here F denotes friction, not the keypoint count) to a high value; as training progresses, G is restored to its real value and F is reduced to an appropriate value. As in the imitation stage, RSI, early termination (when p_t^{\hat{o}} deviates from the reference by more than \epsilon_{object}), and curriculum learning (gradually decreasing \epsilon_{object}) are used. An additional contact termination condition is imposed: whenever the MoCap data indicate that the human hand is firmly grasping the object (D(j_{t,f}^h, p_t^o \cdot o) < \xi_t), the contact force C_{t,f}^d must be non-zero; if this condition is violated, the episode is terminated early.
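The relaxation and the two termination checks can be sketched as follows. The \epsilon_{object} endpoints (90°/6 cm to 30°/2 cm) come from the implementation details; the linear schedules, the friction values, the z-up gravity convention, and all function names are hypothetical, and writing these values into Isaac Gym is deliberately not shown because it depends on the simulator setup:

```python
from typing import Sequence, Tuple


def relaxed_physics(progress: float, g_real: float = -9.81,
                    mu_high: float = 2.0, mu_final: float = 1.0) -> Tuple[float, float]:
    """Return (gravity_z, friction) for training progress in [0, 1].

    Zero gravity and high friction at the start, real gravity and a normal
    friction value at the end; linear interpolation and numbers are assumed.
    """
    p = min(max(progress, 0.0), 1.0)
    return g_real * p, mu_high + (mu_final - mu_high) * p


def epsilon_object(progress: float) -> Tuple[float, float]:
    """(rotation threshold in degrees, position threshold in meters),
    shrunk linearly from 90 deg / 6 cm to 30 deg / 2 cm (assumed schedule)."""
    p = min(max(progress, 0.0), 1.0)
    return 90.0 + (30.0 - 90.0) * p, 0.06 + (0.02 - 0.06) * p


def object_deviates(rot_err_deg: float, pos_err_m: float, progress: float) -> bool:
    """Early termination when the simulated object strays from the reference."""
    eps_rot, eps_pos = epsilon_object(progress)
    return rot_err_deg > eps_rot or pos_err_m > eps_pos


def contact_violated(human_tip_dists: Sequence[float],
                     robot_tip_forces: Sequence[float], xi: float) -> bool:
    """True if MoCap shows fingertip f gripping (distance < xi) while the
    simulated contact force at that fingertip is zero."""
    return any(d < xi and f <= 0.0
               for d, f in zip(human_tip_dists, robot_tip_forces))


gravity_z, friction = relaxed_physics(progress=0.0)  # zero gravity, high friction
done = (object_deviates(45.0, 0.01, progress=1.0)
        or contact_violated([0.005, 0.05], [0.0, 0.0], xi=0.01))
```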

3.4. DEXMANIPNET Dataset

DEXMANIPNET is generated with MANIPTRANS. It is derived from two representative large-scale hand-object interaction datasets, FAVOR and OakInk-V2: OakInk-V2 contains complex interactions such as pen capping and bottle unscrewing, while FAVOR contains basic tasks such as object rearrangement. Because dexterous robotic hands lack standardization, the Inspire Hand (simulated with 12 DoF) is used as the main platform.

DEXMANIPNET covers 61 diverse and challenging tasks and consists of 3.3K episodes of robotic hand manipulation on 1.2K objects, 1.34 million frames in total. About 600 of these sequences involve complex bimanual tasks. Every episode is executed precisely in the Isaac Gym simulation.

4. Experiments

MANIPTRANS is evaluated in terms of manipulation precision, task compliance, and transfer efficiency. The metrics are adapted from prior work but are stricter because of the complexity of bimanual tasks.

  1. Per-frame average object rotation and translation errors, reported in degrees and cm: E_r = \frac{1}{TP} \sum_{t=1}^T \angle\left(p_t^{\hat{o},\text{rot}} \cdot (p_t^{o,\text{rot}})^{-1}\right) and E_t = \frac{1}{TP} \sum_{t=1}^T \|p_t^{\hat{o},\text{tsl}} - p_t^{o,\text{tsl}}\|_2^2, where \angle(\cdot) is the rotation angle of the relative rotation.
  2. Mean per-joint position error E_j (cm), the average error of the hand joint positions: E_j = \frac{1}{T \cdot F} \sum_{t=1}^T \sum_{f=1}^F \|j_{t,f}^d - j_{t,f}^h\|_2^2.
  3. Mean per-fingertip position error E_{ft} (cm), which measures how well fingertip motion is imitated: E_{ft} = \frac{1}{T \cdot M} \sum_{t=1}^T \sum_{ft=1}^M \|t_{t,ft}^d - t_{t,ft}^h\|_2^2, where t_{t,ft} is the position of fingertip ft, and M is 5 for one hand and 10 for two hands.
  4. Success rate (SR): an episode succeeds if E_r, E_t, E_j, and E_{ft} are at most 30°, 3 cm, 8 cm, and 6 cm, respectively. A bimanual task fails if either hand violates any of these conditions.
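The success criterion can be sketched as follows. The thresholds come from the text; this sketch uses the plain (unsquared) Euclidean distance so that errors come out in centimeters, the unit in which the thresholds are stated (if the squared norms above are taken literally, the thresholds would have to be squared too). The rotation error uses the standard geodesic angle of R_sim R_ref^T; function names are hypothetical:

```python
import math
from typing import Dict, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]

# Success thresholds from the text: degrees for E_r, centimeters for the rest.
THRESHOLDS = {"E_r": 30.0, "E_t": 3.0, "E_j": 8.0, "E_ft": 6.0}


def rotation_error_deg(r_sim: Mat3, r_ref: Mat3) -> float:
    """Rotation angle of R_sim * R_ref^T (R^-1 = R^T for rotation matrices)."""
    # trace(A @ B^T) equals the element-wise sum of A * B.
    tr = sum(r_sim[i][j] * r_ref[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp for numerical safety
    return math.degrees(math.acos(cos_angle))


def mean_point_error(pred: Sequence[Sequence[Vec3]],
                     ref: Sequence[Sequence[Vec3]]) -> float:
    """Mean Euclidean distance over all frames t and points f (E_j / E_ft style)."""
    dists = [math.dist(p, r)
             for frame_p, frame_r in zip(pred, ref)
             for p, r in zip(frame_p, frame_r)]
    return sum(dists) / len(dists)


def hand_success(errors: Dict[str, float]) -> bool:
    """One hand (and its object) succeeds if every metric is within its threshold."""
    return all(errors[name] <= limit for name, limit in THRESHOLDS.items())


def episode_success(per_hand_errors: Sequence[Dict[str, float]]) -> bool:
    """A bimanual episode fails if either hand misses any threshold."""
    return all(hand_success(e) for e in per_hand_errors)


rz90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(round(rotation_error_deg(rz90, eye), 6))  # 90.0
```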

Implementation Details: 21 keypoints (fingertips, palm, and phalangeal positions) are manually selected on each dexterous robotic hand. Curriculum learning shrinks \epsilon_{finger} from 6 cm to 4 cm and \epsilon_{object} from 90°/6 cm to 30°/2 cm. Training uses PPO with batch size 1024 and \gamma=0.99, with 4096 parallel environments in Isaac Gym (RTX 4090, i9-13900KF).

4.3. Evaluations

MANIPTRANS is compared with RL-combined methods and optimization-based methods.

  • RL-Combined: RL-Only; Retarget + Residual (residual actions applied after human-robot keypoint alignment); and Retarget-Only (a naive baseline).
  • Table 1: MANIPTRANS achieves better precision and SR than every baseline, most notably a bimanual SR of 39.5% versus 13.9%, 12.1%, and 0.0%. Retarget-Only almost never succeeds, and RL-Only is suboptimal. Compared with Retarget + Residual, leveraging the pre-trained imitation model enables more accurate manipulation; retargeting-based approaches cause instability in contact-rich situations. Fig. 3 shows qualitative results of MANIPTRANS.
  • Optimization-Based (QuasiSim): qualitative comparison (Fig. 4). On the Shadow Hand, MANIPTRANS produces more natural and stable contacts and smoother motion. It is also far more efficient (minutes versus hours).

4.4. Cross-Embodiments Validation

Extensibility is demonstrated on several embodiments: the Shadow Hand, the articulated MANO hand [27, 96], the Inspire Hand, and the Allegro Hand (22, 22, 12, and 16 DoF, respectively) (Fig. 4, Fig. 5, Appx. A). Because the method relies only on the correspondence between human fingers and robot joints, it is embodiment-agnostic, and it achieves consistent, smooth, and precise performance without changing network hyperparameters or reward weights. To adapt to the Allegro Hand (four fingers and a larger size), the fingertip mapping is adjusted and \epsilon_{finger} is relaxed to 8 cm. Appx. A.3 (Table 3) summarizes the extended experimental settings.

4.5. Real-World Deployment

Two 7-DoF Realman arms [95] are used together with upgraded Inspire Hands fitted with tactile sensors (Fig. 6). To bridge the gap between the 12-DoF simulated robotic hand and the 6-DoF real hardware, a fitting-based method optimizes the real robot's joint angles q_t^{\tilde{d}} \in \mathbb{R}^6 to align the fingertips: \arg\min_{\{q_t^{\tilde{d}}\}} \frac{1}{T \cdot M} \sum_{t=1}^T \sum_{ft=1}^M \|t_{t,ft}^d - t_{t,ft}^{\tilde{d}}\|_2^2, with an additional temporal smoothness loss L_{smooth} = \frac{1}{T-1}\sum_{t=1}^{T-1} \|q_{t+1}^{\tilde{d}} - q_t^{\tilde{d}}\|_2^2. The arms are controlled with inverse kinematics, and strict temporal alignment is not enforced during replay. Complex, fine-grained bimanual manipulations such as "opening the toothpaste" are demonstrated successfully.
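A rough sketch of the fitting objective and a naive optimizer for it. Everything beyond the two loss terms is an assumption: `toy_fk` is a made-up 6-joint kinematic model standing in for the real hand's forward kinematics, the smoothness weight and step size are arbitrary, and forward-difference gradient descent is used only to keep the example dependency-free (a real pipeline would more likely use autodiff or a least-squares solver):

```python
import math
from typing import Callable, List, Sequence

Vec3 = Sequence[float]
FK = Callable[[Sequence[float]], List[List[float]]]


def fitting_loss(q_traj: Sequence[Sequence[float]],
                 target_tips: Sequence[Sequence[Vec3]],
                 fk: FK, smooth_weight: float) -> float:
    """Fingertip fitting term (mean over T*M tips) plus weighted L_smooth."""
    T, M = len(q_traj), len(target_tips[0])
    fit = sum(sum((a - b) ** 2 for a, b in zip(p, g))
              for q, tips in zip(q_traj, target_tips)
              for p, g in zip(fk(q), tips)) / (T * M)
    smooth = 0.0
    if T > 1:
        smooth = sum(sum((a - b) ** 2 for a, b in zip(q_traj[t + 1], q_traj[t]))
                     for t in range(T - 1)) / (T - 1)
    return fit + smooth_weight * smooth


def fit_joint_trajectory(init: Sequence[Sequence[float]],
                         target_tips: Sequence[Sequence[Vec3]], fk: FK,
                         smooth_weight: float = 0.1, lr: float = 0.5,
                         iters: int = 200, h: float = 1e-5) -> List[List[float]]:
    """Naive gradient descent with forward-difference gradients."""
    q = [list(row) for row in init]
    for _ in range(iters):
        base = fitting_loss(q, target_tips, fk, smooth_weight)
        grad = [[0.0] * len(row) for row in q]
        for t, row in enumerate(q):
            for k in range(len(row)):
                old = row[k]
                row[k] = old + h  # perturb one joint of one frame
                grad[t][k] = (fitting_loss(q, target_tips, fk, smooth_weight) - base) / h
                row[k] = old      # restore it exactly
        for t, row in enumerate(q):
            for k in range(len(row)):
                row[k] -= lr * grad[t][k]
    return q


def toy_fk(q: Sequence[float]) -> List[List[float]]:
    """Hypothetical 6-joint hand: five unit-length planar fingers driven by
    q[0..4]; q[5] lifts the thumb tip (finger 0) out of the plane."""
    tips = [[math.cos(q[i]), math.sin(q[i]), 0.0] for i in range(5)]
    tips[0][2] = q[5]
    return tips


q_star = [0.3, 0.2, 0.1, 0.4, 0.5, 0.2]
targets = [toy_fk(q_star) for _ in range(3)]  # stand-in for simulated fingertips
init = [[0.0] * 6 for _ in range(3)]
before = fitting_loss(init, targets, toy_fk, smooth_weight=0.1)
q_fit = fit_joint_trajectory(init, targets, toy_fk)
after = fitting_loss(q_fit, targets, toy_fk, smooth_weight=0.1)
```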

4.6. Ablation Studies

  • Tactile information: the effect of integrating the contact force C into the observation, the reward, and the termination condition is analyzed (Fig. 7a). Including C in the reward improves SR, and including it in the observation speeds up convergence. Removing C from the termination condition yields better early performance but slower convergence.
  • Training strategy (curriculum learning): the effects of gravity relaxation, increased friction, and threshold relaxation are analyzed (Fig. 7b). Ignoring gravity and using high friction speed up convergence and raise the final SR. Without the initial relaxation of the threshold constraints, training may fail to converge.

4.7. DEXMANIPNET for Policy Learning

DEXMANIPNET's potential for policy learning is benchmarked on a rearrangement task (moving a bottle to a goal) with IBC, BET [101], and Diffusion Policy [25] (UNet- and Transformer-based variants) (Table 2, Fig. 11). 85% of the data are used for training and 15% for evaluation, and a rollout counts as a success if the object's final position is within 10 cm of the goal. The results highlight the difficulty of dexterous manipulation tasks; regression-based methods suffer from error accumulation.
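The evaluation protocol fits in a few lines. The 85/15 split and the 10 cm success radius come from the text; splitting by a seeded random shuffle over episode IDs and the ID format are assumptions:

```python
import math
import random
from typing import List, Sequence, Tuple


def split_episodes(episode_ids: Sequence[str], train_frac: float = 0.85,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle episode IDs and split them into train / evaluation sets."""
    ids = list(episode_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_frac * len(ids))
    return ids[:n_train], ids[n_train:]


def success_rate(final_positions: Sequence[Sequence[float]],
                 goal_positions: Sequence[Sequence[float]],
                 radius_m: float = 0.10) -> float:
    """Fraction of rollouts whose object ends within radius_m of its goal."""
    hits = [math.dist(p, g) <= radius_m
            for p, g in zip(final_positions, goal_positions)]
    return sum(hits) / len(hits)


train_ids, eval_ids = split_episodes([f"ep{i:03d}" for i in range(20)])
print(len(train_ids), len(eval_ids))  # 17 3
print(success_rate([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0]],
                   [[0.05, 0.0, 0.0], [0.0, 0.0, 0.0]]))  # 0.5
```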

5. Conclusion and Discussion

MANIPTRANS is a two-stage framework for efficiently transferring human manipulation skills to dexterous robotic hands. By decoupling hand-motion imitation from object interaction through residual learning, it overcomes morphological differences and the difficulty of complex tasks while ensuring high-fidelity motion and efficient learning. Experiments show that MANIPTRANS outperforms SOTA methods in motion precision and computational efficiency, and they demonstrate its cross-embodiment adaptability and real-world deployability. The scalable DEXMANIPNET dataset establishes a new benchmark for advancing Embodied AI.

Discussion and Limitations: MANIPTRANS transfers most MoCap data effectively, but some sequences fail. The main reasons are 1) excessive noise in the interaction poses and 2) inaccurate object models for simulation, especially for articulated objects. Improving the robustness of MANIPTRANS and generating physically plausible object models are directions for future work.

The supplementary material additionally provides extensibility experiments (articulated object manipulation, Appx. A.1, Fig. 8; challenging hand embodiments such as the Allegro Hand, Appx. A.2, Fig. 9, Table 3), a robustness evaluation with noisy hand trajectory input (Appx. B, Table 4), a time cost analysis (Appx. C, Fig. 10), setting details (hand/dexterous-hand correspondence and training/simulation parameters, Appx. D), DEXMANIPNET statistics (Table 6), and details of the rearrangement policy learning (Table 7, Fig. 11).


Detail Review

Background and Problem Definition

Modern dexterous robotic hands are being developed to manipulate objects as finely as human hands do, and achieving human-level bimanual cooperation is a particularly large challenge. Human hands can perform complex coordinated actions such as opening and closing a pen cap or twisting off a bottle cap, but teaching robots these abilities is not easy. Existing methods have either let the robot hand explore behaviors on its own through reinforcement learning (RL) or collected data by having a human control the robot through teleoperation. Traditional RL, however, requires a carefully designed reward function for each task, which limits scalability and makes complex tasks hard to tackle. In teleoperation, a human operator drives the robot hand directly, typically with a virtual reality (VR) device, to collect data; this is costly and labor-intensive and yields only datasets specialized to a limited setup.

To address these problems, recent research has turned to transferring human demonstration data (e.g., human hand motion recorded with motion capture, MoCap) to robot hands through imitation. Imitating human manipulation trajectories yields natural, human-like hand-object interactions, and large-scale MoCap datasets and advances in hand tracking make high-quality human manipulation sequences easy to obtain. Performing this imitation learning in simulation has the added benefit that its effectiveness can be verified without running real-world experiments right away.

The motion transfer problem is the task of taking a demonstration of two human hands manipulating objects and transferring it to a given bimanual robot system so that the robot performs the same task. More formally, the paper defines a scenario in which two dexterous robot hands, left and right, imitate the given movements of the human left and right hands to cooperatively manipulate two objects. Consider, for example, capping a pen, where one hand holds the cap and the other grips the pen body. The inputs are the reference trajectories of the human hand demonstration (frame sequences of the wrist's 6-DoF pose, the finger joint positions, their velocities, and so on) and the motion trajectories of the objects. The goal is to learn a policy with which the robot hands follow these reference motions accurately in physics simulation while satisfying the task's physical constraints.

Bimanual motion transfer of this kind is hard for several reasons. First, human and robot hands differ in morphology, so direct retargeting that simply matches joint angles tends to produce unnatural poses. Second, even if the MoCap data themselves are accurate, small per-frame errors can accumulate and cause failure in high-precision object manipulation. Finally, controlling two hands at once instead of one makes the action space very high-dimensional, which sharply increases the difficulty of learning. For these reasons, most prior work has stopped at single-hand grasping or object lifting and has rarely addressed complex bimanual actions such as unscrewing a bottle cap or capping a pen.

Against this background, ManipTrans (accepted at CVPR 2025) proposes a new method for effectively transferring human bimanual manipulation demonstrations to two robot hands. Its most distinctive feature is the original idea of splitting the "motion transfer" problem into two stages. Whereas traditional retargeting maps MoCap data directly onto robot joints and thereby produces physically unstable motions, ManipTrans succeeds in converting demonstrated motions into physically executable robot actions. The figure compares failure cases of conventional retargeting (the robot follows the MoCap trajectory literally and drops the object) with results learned by ManipTrans (successfully reproducing diverse tasks such as capping a pen and opening a bottle).

Key Contributions and Innovations

The paper's core contributions can be summarized as follows:

  • A two-stage framework for bimanual robot-hand motion transfer: to transfer human bimanual manipulation skills to robots precisely, the paper presents the ManipTrans framework, which first imitates the hand motion itself and then fine-tunes the object interaction. This makes it possible to complete tasks while accurately tracking both the reference hand and object trajectories.
  • DexManipNet, a large-scale imitation dataset: using the proposed method, the authors generated a large-scale robotic manipulation dataset that also includes a variety of new bimanual tasks (capping a pen, unscrewing a bottle cap, shaking a laboratory flask, and so on). DexManipNet contains about 1.34 million frames of robot-hand manipulation data in 3,300 episodes and covers as many as 61 tasks. It is considerably larger and more diverse than previously released datasets of this kind and should be a valuable resource for future research on robot policy learning.
  • Strong performance and generalization: across a range of experiments, the method substantially improves motion precision and transfer success rate over state-of-the-art methods. Training is efficient even on a personal PC, so transfer is fast, and the method applies to robot hands with different shapes and DoF (e.g., Shadow Hand, Allegro Hand) with only minimal changes while maintaining consistent performance. Furthermore, by replaying the manipulation trajectories obtained in simulation on real robot hardware, the authors demonstrated in the real world agile, natural bimanual manipulation that previous RL or teleoperation approaches had not achieved.

In short, ManipTrans presents a simple yet effective bimanual robot-hand motion transfer framework and accurately reproduces complex human manipulation demonstrations on robots, which makes it significant both academically and practically.

Technical Design: The Two-Stage ManipTrans Method

The core idea of ManipTrans is to learn motion transfer by "splitting it in two."

The first stage focuses on the hand motion itself, and the second stage builds on that motion to correct the detailed manipulation of objects. This separation mitigates the problems caused by structural differences between human and robot hands and lets high-dimensional bimanual control be handled efficiently.

Stage 1: Hand Motion Imitation (Trajectory Imitation Pre-training)

The first stage learns a policy that imitates the hand's own motion trajectory while excluding interaction with objects. Specifically, the targets are sequences of the 6-DoF wrist pose and the finger joint positions of both hands taken from the human demonstration, and the robot hands learn to follow them [19]. During this stage, the robot hands do not grasp or push anything; they only mimic finger shapes and movements in mid-air.

Training uses reinforcement learning (the PPO algorithm), and the reward function consists of the following terms:

  • Wrist position/orientation tracking reward: rewards how closely the robot wrist matches the wrist in the reference trajectory, driving down wrist position and orientation errors.
  • Finger pose tracking reward: brings the robot's finger keypoints close to the reference human finger joints. Because accurate fingertip positions matter most, the robot-hand keypoints that correspond to the human hand model (MANO), such as the distal segments of the thumb, index, and middle fingers, are weighted more heavily so that they are tracked precisely [22]. This compensates for the morphological differences between human and robot hands.
  • Motion smoothness reward: to avoid abrupt, jerky motions, excessive angular-velocity changes or torque usage at the robot hand joints are penalized. This yields natural motions similar to the demonstration.

The training data are MoCap datasets that contain only hand motion, such as existing public hand motion collections and synthetic interpolated data, with left-right mirroring applied to balance the frequency of left- and right-hand motions [23]. Training on hand-only data without objects allows finger motion to be imitated precisely without complex physical interaction, which greatly reduces the problems caused by human-robot morphological differences. To improve training efficiency, episodes start from random points on the demonstration trajectory (reference state initialization) and are terminated early and restarted when the hand strays far from the trajectory, with a curriculum that gradually tightens this tolerance. As a result, Stage 1 yields a general-purpose hand motion imitation model that is robust to noise [25].

Stage 2: Interaction Fine-tuning via a Residual Policy (Residual Learning Fine-tuning)

Building on the Stage 1 policy, which already follows finger motion well, Stage 2 trains a residual policy with residual learning so that the constraints of actually manipulating objects are taken into account. Residual learning trains a policy that adds a small correction (the residual) to the actions of an existing policy; because it solves a complex problem as "existing solution + α," it is known to be efficient and stable [27]. In ManipTrans, for each base action produced by the Stage 1 imitation policy, the Stage 2 residual policy computes the additional adjustment needed, and the combined final action is executed on the robot.

Because the robot hands now actually grasp and handle objects, Stage 2 adds object-interaction information to the state. Specifically, in addition to the Stage 1 hand-joint states, the state includes the object's state (position and velocity relative to the wrist, center of mass, gravity direction, and so on), and the object's shape is embedded with a BPS (Basis Point Set) representation. The distance between each robot finger keypoint and the object is also computed as a hand-object spatial-relation feature, and the finger-object contact forces obtained from the simulation are included explicitly [30]. This gives the policy enough information to perceive the physical interaction between the hands and the objects and to grasp or manipulate objects stably.

The residual policy works as follows. At every simulation step, the Stage 1 imitation policy first samples its predicted action a_{\text{im}} for the current state. The residual policy then looks at the expanded state representation and outputs a corrective action a_{\text{res}}. The final robot command is their sum, a = a_{\text{im}} + a_{\text{res}}, clipped where necessary so that joint limits are not exceeded. At the start of training, the Stage 1 action alone already produces motion similar to the reference trajectory, so the residual output should start close to zero. Accordingly, the residual policy's weights are initialized with small zero-mean values, and a warm-up strategy gradually increases the residual's contribution early in training, so that the policy learns only fine corrections without destroying the existing imitation behavior.

The Stage 2 reward adds two terms to the hand-motion imitation reward used in Stage 1:

  1. ๋ฌผ์ฒด ๊ฒฝ๋กœ ์ถ”์ข… ๋ณด์ƒ์€ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ ์ƒ์˜ ๋ฌผ์ฒด๊ฐ€ ์ธ๊ฐ„ ์‹œ์—ฐ์˜ ๋ฌผ์ฒด ๊ถค์ ์„ ์ž˜ ๋”ฐ๋ผ๊ฐ€๋„๋ก ์œ„์น˜ ๋ฐ ์†๋„ ์˜ค์ฐจ๋ฅผ ์ค„์ด๋Š” ๋ณด์ƒ์ž…๋‹ˆ๋‹ค.
  2. Contact-force reward: during segments of the human demonstration in which the fingers hold the object, this term encourages the robot fingers to generate at least a certain level of contact force. For example, in mocap frames where a finger is grasping the object, the reward drops if the robot is not holding the object with sufficient force. This teaches the robot to grip the object firmly without dropping it.

Overall, ManipTrans is designed to work across diverse tasks with only this generic, task-agnostic reward composition. It does not require reward shaping tailored to a specific task.
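The review describes the two extra reward terms only qualitatively. The sketch below shows one plausible form. It assumes exponential kernels on the object errors and a thresholded per-fingertip contact check. The kernel shapes, scales, weights, and force threshold are all assumptions, not the paper's values.

```python
import math
from typing import Sequence


def object_tracking_reward(pos_err: float, vel_err: float,
                           k_pos: float = 1.0, k_vel: float = 1.0) -> float:
    """Hypothetical form: exp(-k * error) per term, so zero error gives the maximum of 2."""
    return math.exp(-k_pos * pos_err) + math.exp(-k_vel * vel_err)


def contact_reward(ref_in_contact: Sequence[bool], sim_forces: Sequence[float],
                   force_threshold: float) -> float:
    """Fraction of fingertips in contact in the demo whose simulated force reaches the threshold."""
    required = [f for in_contact, f in zip(ref_in_contact, sim_forces) if in_contact]
    if not required:
        return 1.0  # no contact expected in this frame, so nothing to enforce
    return sum(f >= force_threshold for f in required) / len(required)


def stage2_reward(r_hand: float, r_obj: float, r_contact: float,
                  w_obj: float = 1.0, w_contact: float = 1.0) -> float:
    """Stage-1 hand imitation reward plus the two stage-2 terms (weights are placeholders)."""
    return r_hand + w_obj * r_obj + w_contact * r_contact
```

Neither term refers to a specific task. Both are computed purely from the demonstration and the simulator state, which matches the task-agnostic design described above.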

Several tricks are used to make training work well. Early in residual-policy training, the physical interaction constraints often trap the policy in local optima or destabilize learning. To mitigate this, the authors introduce physics-parameter relaxation. Early in training, gravity is set to zero and the friction coefficient is raised, which temporarily makes the environment easier. With no gravity and high friction, the robot hand can hold the object lightly and still match the reference trajectory, so successful trajectories are found quickly at the start. As training progresses, gravity is gradually restored to its real value and friction is lowered to its normal level, so the final environment is close to real physics. Prior work such as QuasiSim relied on a separate, purpose-built simulator. ManipTrans instead only changes the settings of a standard simulator (Isaac Gym) on the fly, which keeps the method simple to implement while remaining effective.

As in stage 1, initial states are sampled near the demonstration trajectory. Rules such as early termination of an episode when the object deviates beyond a certain range are also applied [39]. In particular, a contact-force condition ends the episode immediately if the robot fails to apply adequate force at moments when the human demonstration grips the object firmly with both hands. This fine-grained mechanism pushes the policy to never lose hold of the object.

Both the stage-1 imitation policy and the stage-2 residual policy are trained efficiently with thousands of parallel episodes running in the NVIDIA Isaac Gym simulator. The paper reports training PPO-based policies with 4096 parallel environments and notes that training runs smoothly even on a single GPU. This underscores that the proposed method is training-efficient and therefore practical.
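Here is a minimal sketch of the physics-relaxation curriculum and the two early-termination rules described above. It assumes a linear anneal and illustrative friction values. Pushing these parameters into Isaac Gym at each iteration is omitted, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PhysicsParams:
    gravity_z: float  # m/s^2, negative = downward
    friction: float   # friction coefficient applied to hand/object shapes


def relaxed_physics(step: int, anneal_steps: int, g_real: float = -9.81,
                    mu_real: float = 1.0, mu_relaxed: float = 3.0) -> PhysicsParams:
    """Linearly anneal from relaxed physics (no gravity, high friction) to real physics.

    The linear shape and the friction values are illustrative assumptions.
    """
    t = 1.0 if anneal_steps <= 0 else min(1.0, step / anneal_steps)
    return PhysicsParams(
        gravity_z=t * g_real,
        friction=mu_relaxed + t * (mu_real - mu_relaxed),
    )


def should_terminate(obj_pos_err: float, max_obj_err: float,
                     ref_in_contact: Sequence[bool], sim_forces: Sequence[float],
                     force_threshold: float) -> bool:
    """Early termination: the object drifted too far, or a grasp required by the demo is missing."""
    if obj_pos_err > max_obj_err:
        return True
    return any(c and f < force_threshold for c, f in zip(ref_in_contact, sim_forces))
```

Annealing both parameters with the same progress variable `t` means the task only reaches full difficulty once training has progressed. The termination check stops rollouts that no longer carry useful signal.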

Experimental Results: Performance Evaluation and Analysis

ManipTrans was evaluated on a range of metrics and clearly outperformed the compared methods. The quantitative experiments use the validation set of OakInk-V2, a representative bimanual manipulation dataset in which roughly half of the tasks are bimanual. Data from GRAB, FAVOR, ARCTIC, and other datasets were also used for qualitative evaluation. The authors compute average errors in object position and orientation, robot-hand joint positions, and fingertip positions. The success rate counts an episode as successful when the robot hands track the reference trajectory within set error thresholds. For bimanual tasks the criterion is strict: the episode counts as a failure if either hand misses the thresholds.
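The strict bimanual success criterion can be written as a small helper. The error names and thresholds are placeholders supplied by the caller. The paper's exact metric definitions and threshold values are not reproduced here.

```python
from typing import Mapping, Sequence

HandErrors = Mapping[str, float]


def hand_within(errors: HandErrors, thresholds: Mapping[str, float]) -> bool:
    """True if every tracked error for this hand is at or below its threshold."""
    return all(errors[name] <= limit for name, limit in thresholds.items())


def episode_success(hands: Sequence[HandErrors], thresholds: Mapping[str, float]) -> bool:
    """A (bimanual) episode succeeds only if EVERY hand meets EVERY threshold."""
    return all(hand_within(h, thresholds) for h in hands)


def success_rate(episodes: Sequence[Sequence[HandErrors]],
                 thresholds: Mapping[str, float]) -> float:
    """Fraction of successful episodes; 0.0 for an empty evaluation set."""
    if not episodes:
        return 0.0
    return sum(episode_success(ep, thresholds) for ep in episodes) / len(episodes)
```

Because `episode_success` uses `all` over the hands, one hand exceeding any threshold is enough to fail a bimanual episode. This is the stricter condition described above.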

The baselines compared include:

  1. Retarget-Only: directly transferring the mocap hand motion to the robot joints without any learning;
  2. RL-Only: reinforcement learning from scratch using only the imitation reward;
  3. Retarget + Residual: using a trajectory retargeted through a human-to-robot finger correspondence as the base motion, with only residual RL applied on top of it.

These baselines either use only some of ManipTrans's components or reimplement approaches from prior literature. They serve as control groups for validating ManipTrans's effectiveness.

ManipTrans performed best on every metric. On bimanual tasks, plain retargeting achieved a success rate of 0% and RL-Only only about 12%, whereas ManipTrans reached about 39.5%. On single-hand tasks, the other methods stayed in the 30 to 47% range while ManipTrans reached about 58%. ManipTrans also had the lowest precision errors, such as object pose error and fingertip position error, which indicates that it follows the reference motion most accurately. The authors attribute these gains to the two-stage transfer framework, which captures both fine-grained finger motion and object interaction effectively.

Notably, Retarget-Only almost always fails because it cannot cope with error accumulation in the high-DoF robot-hand space. RL-Only has to explore from scratch, so it trains slowly, produces imprecise motion, and gives disappointing results. Retarget + Residual also falls short of ManipTrans. The initially retargeted motion is itself unnatural; for example, it causes collisions when the hand contacts the object, and this hinders residual learning. ManipTrans instead uses the pretrained hand-motion imitation model to provide a more stable and accurate base motion. The residual stage then only has to learn the additional interaction constraints, which lowers the control difficulty and yields higher final performance.

The qualitative results also show ManipTrans's strength. The simulation videos and images in the paper show the robot hands performing very delicate motions naturally. Examples include picking up a thin flower stem with two fingers and inserting it into a vase, using a long spoon to scoop objects out of a bottle with both hands working together, and writing with a thin pen.

Figure 3 shows some examples. The top two rows are single-hand manipulation scenes (shaking a flask, writing with a pen), and the bottom row shows bimanual manipulation scenes (arranging flowers, pouring water, scooping with a spoon). The robot reproduces these tasks, which demand skillful human hand movements, without noticeable awkwardness. This shows how realistic the motions produced by the proposed transfer method are.

Next, to verify generalization, ManipTrans was applied to several robot-hand platforms with different structures and degrees of freedom: the Shadow Hand (24 DoF), a MANO hand model (22 virtual joints), the Inspire Hand (12 DoF), and the Allegro Hand (16 DoF). Applying ManipTrans in the same way to each hand produced similar performance and natural motion in every case, without extra parameter tuning.

In other words, ManipTrans only needs a correspondence between the fingers and joints of the human hand and those of the robot hand. It is not tied to a particular hand embodiment and can transfer motions in a platform-agnostic way. This follows from its structure: the stage-1 imitation model focuses only on tracking fingertip keypoints, while stage 2 handles the physical interaction. The authors report that even the four-fingered Allegro Hand worked without major problems once a partial finger correspondence was defined.
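The embodiment-agnostic claim rests on a keypoint correspondence table. Below is a hypothetical sketch for a four-fingered hand. The key names and the choice to drop the little finger are illustrative assumptions. The error function is a generic mean keypoint distance, not the paper's exact imitation reward.

```python
import math
from typing import Dict, Mapping, Tuple

Point = Tuple[float, float, float]

# Hypothetical correspondence for a four-fingered hand (e.g. Allegro):
# robot fingertip name -> human (MANO-style) fingertip name.
# The human little finger has no counterpart, so it is simply left out.
FOUR_FINGER_MAP: Dict[str, str] = {
    "thumb_tip": "thumb",
    "index_tip": "index",
    "middle_tip": "middle",
    "ring_tip": "ring",
}


def keypoint_targets(human_kps: Mapping[str, Point],
                     correspondence: Mapping[str, str]) -> Dict[str, Point]:
    """Reference position for each robot keypoint; unmapped human keypoints are ignored."""
    return {robot: human_kps[human] for robot, human in correspondence.items()}


def mean_keypoint_error(robot_kps: Mapping[str, Point], targets: Mapping[str, Point]) -> float:
    """Mean Euclidean distance over the corresponded keypoints."""
    return sum(math.dist(robot_kps[k], targets[k]) for k in targets) / len(targets)
```

Switching to another hand only means swapping the correspondence dictionary. The tracking logic itself does not change.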

Finally, the authors demonstrated real-world applicability. They mounted two real Inspire robot hands on two 7-DoF robot arms and replayed bimanual manipulation trajectories from DexManipNet, the dataset they had generated in simulation. The real robot hands have fewer joint degrees of freedom than the simulated model. The authors therefore used an additional joint-angle fitting algorithm to approximate the simulated 12-DoF motion with the real 6-DoF mechanical hand. The arms were controlled by solving inverse kinematics (IK) so that the robot wrists follow the simulated wrist trajectories. In the "opening a toothpaste cap" task, for example, the robot grips the tube firmly with one hand while the thumb and index finger of the other hand flick open the small cap. This motion requires careful force control even from a human and is hard to achieve through teleoperation, yet here it was realized with relatively little effort from trajectories produced by the learned policies. The authors also publish additional real-robot videos on the project website and emphasize the method's potential for real-world robot learning.
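The review does not detail the joint-angle fitting step. One simple way to map 12 simulated joint angles to 6 actuator commands is a per-actuator least-squares fit. This sketch assumes that each actuator drives a group of joints through fixed linear coupling ratios. The grouping, gains, and limits below are hypothetical, and the real Inspire hand's linkage may differ.

```python
from typing import List, Sequence, Tuple


def fit_actuator(q_targets: Sequence[float], gains: Sequence[float],
                 u_min: float, u_max: float) -> float:
    """Least-squares actuator command under a hypothetical linear coupling q_j ≈ k_j * u.

    Minimizes sum_j (k_j * u - q_j)^2. For a 1-D convex quadratic, clipping the
    unconstrained minimizer to [u_min, u_max] gives the constrained optimum.
    """
    den = sum(k * k for k in gains)
    u = sum(k * q for k, q in zip(gains, q_targets)) / den if den > 0 else 0.0
    return min(max(u, u_min), u_max)


def fit_hand(q_sim: Sequence[float], groups: Sequence[Sequence[int]],
             gains: Sequence[Sequence[float]],
             limits: Sequence[Tuple[float, float]]) -> List[float]:
    """Map simulated joint angles (e.g. 12) to fewer real actuator commands (e.g. 6).

    groups[i] lists the simulated joint indices driven by actuator i, and
    gains[i] holds their (hypothetical) coupling ratios.
    """
    return [
        fit_actuator([q_sim[j] for j in idx], k, lo, hi)
        for idx, k, (lo, hi) in zip(groups, gains, limits)
    ]
```

Each actuator's objective is a one-dimensional convex quadratic, so no iterative solver is needed.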

Discussion and Limitations

ManipTrans successfully transfers a wide range of complex human manipulations to bimanual robot hands, but the authors also discuss several limitations and directions for future work. First, performance depends on the quality of the input mocap data. Some human demonstrations contain substantial errors or noise and do not record the hand-object interaction accurately. The authors note that transfer performance can drop on such noisy demonstrations. For example, if a recording shows a finger slightly penetrating the object or holding it unstably, the robot may fail while trying to follow that motion. Second, the object's simulation model may be inaccurate. Real objects have intricate shapes and joints (e.g., the threads on a cap), and if the simulator's object model is simplified or inaccurate, the robot's manipulation can go wrong. Articulated objects are particularly hard to model in simulation, and the authors report cases where transfer did not work well for them.

Because of these limitations, some demonstration sequences could not be reproduced perfectly even with ManipTrans. To address this, the authors suggest more robust learning methods, physically plausible data preprocessing and augmentation, and more accurate object models. Looking ahead, they see improving ManipTrans's robustness and the physical fidelity of real-object models as a promising direction for solving the remaining hard cases.

Overall, the paper "ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning" presents a new method for effectively learning complex bimanual robot-hand manipulation from human demonstrations. Its residual-learning-based two-stage transfer structure separates finger-motion imitation from adaptation to object-manipulation constraints. This separation lets it exceed the accuracy and efficiency of previous methods. With it, the authors accomplish challenging new tasks such as capping a pen and twisting a bottle cap. They also organize and release the results as a large-scale dataset (DexManipNet) to support future research. ManipTrans brings robots a step closer to delicate, human-like bimanual manipulation. If the remaining challenges are solved, it could find wide use in domestic service robots and industrial manipulation.

Copyright 2024, Jung Yeon Lee