Curieux.JY
  • JungYeon Lee


📃 SimToolReal Review

humanoid
whole-body-control
motion-tracking
An Object-Centric Policy for Zero-Shot Dexterous Tool Manipulation
Published

February 23, 2026

  • Paper Link
  • Code Link
  • Project Link
  1. 👉 SimToolReal trains a single object-centric RL policy in simulation, using procedurally generated tool primitives and random goal poses, so that it can perform a wide range of real-world tool-manipulation tasks zero-shot.
  2. 🦾 The framework defines tool use as manipulating a tool through a sequence of goal poses. It generalizes to novel tools and tasks by tracking goal trajectories obtained from human demonstration videos through a perception pipeline built on vision foundation models.
  3. 🚀 On the DexToolBench benchmark, SimToolReal outperforms prior retargeting and fixed-grasp methods by 37% and achieves results similar to specialist policies, demonstrating effective zero-shot transfer for dexterous tool use.

Ping Review

Ping: A light tap on the surface. Get the gist in seconds.

SimToolReal is a framework that learns an object-centric policy for dexterous robotic tool manipulation via sim-to-real transfer. The work aims to train a single general-purpose RL policy in simulation and transfer it zero-shot to new tools and tasks in the real world.

Core Insight

SimToolReal reframes dexterous tool use as "manipulating a tool through a sequence of goal poses." This reduces complex tool-use tasks to a single goal-reaching RL problem. In simulation, the policy is trained to move a diverse set of procedurally generated tool-like primitive objects to random goal poses. This training objective induces the dexterous skills that tool use depends on, such as stable grasping and in-hand reorientation. At test time, the same policy sequentially tracks a tool trajectory extracted from a human demonstration video, enabling diverse tool manipulation zero-shot, with no task-specific training.

Training Methodology

  1. Environment Setup

    • Procedural Tool Generation: To cover the diversity of real tools, the simulation procedurally generates tool-like primitive objects. Each tool combines a handle and a head whose geometry (cuboids or capsules) and density are sampled at random: the handle uses a low density \rho_{\text{low}} \sim U[300, 600] \text{ kg/m}^3 and the head a high density \rho_{\text{high}} \sim U[300, 2000] \text{ kg/m}^3. This yields tools with varied centers of mass and rotational inertia, pushing the policy to adapt to a range of physical properties.
      • Handle dimensions: length \in [5, 30] cm; width/height/diameter \in [1, 4] cm.
      • Head dimensions: length \in [1, 15] cm; width/height/diameter \in [0.5, 12] cm.
    • Initialization: At the start of each episode, a randomly selected object is placed on the table in a random pose, and the robot is initialized in a random joint configuration.
    • Goal Sampling: The first goal is sampled at random within the robot's reachable workspace, so the policy learns a wide range of object poses and large relocations. Subsequent goals are sampled close to the previous goal to induce smooth, trajectory-like motion.
  2. Reward Function

    • The total reward r has three main components: r = r_{\text{smooth}} + r_{\text{grasp}} + I_{\text{grasped}}r_{\text{goal}}

    • Smoothness reward (r_{\text{smooth}}): Penalizes the L_1 norm of the joint velocities to encourage smooth motion. r_{\text{smooth}} = -\lambda_{\text{arm}}\|\dot{q}_{\text{arm}}\|_1 - \lambda_{\text{hand}}\|\dot{q}_{\text{hand}}\|_1, where \dot{q}_{\text{arm}} and \dot{q}_{\text{hand}} are the current velocities of the 7-DoF arm and the 22-DoF hand, respectively.

    • Grasp reward (r_{\text{grasp}}): Helps the transition from a neutral posture to a stable grasp. r_{\text{grasp}} = r_{\text{approach}} + (1 - I_{\text{grasped}})r_{\text{lift}}

      • r_{\text{approach}}: Encourages reducing the distance between the robot hand and the object. r_{\text{approach}} = \lambda_{\text{approach}} \max(\bar{d}^*_{\text{ft}} - \bar{d}_{\text{ft}}, 0), where \bar{d}_{\text{ft}} is the current mean distance between the fingertips and the object and \bar{d}^*_{\text{ft}} is the smallest mean distance achieved so far in the episode.
      • r_{\text{lift}}: Encourages lifting the object. r_{\text{lift}} = \lambda_{\text{lift}} \max(z - z_{\text{init}}, 0) + I[z \ge z_{\text{lifted}}]B_{\text{lifted}}, where z is the object's vertical position, z_{\text{init}} its initial vertical position, z_{\text{lifted}} the lift threshold, and B_{\text{lifted}} a bonus. I_{\text{grasped}} becomes true when z \ge z_{\text{lifted}}.
    • Goal-pose reward (r_{\text{goal}}): Becomes the dominant reward term once I_{\text{grasped}} = 1. r_{\text{goal}} = \max(d^* - d(o_t, g), 0) + B_{\text{succ}} I[d(o_t, g) < \epsilon]

      • Dense progress term: Minimizes the distance d(o_t, g) between the current object pose o_t and the goal pose g. d^* is a stateful variable that tracks the smallest distance achieved for the current goal.
      • Sparse success bonus (B_{\text{succ}}): Granted when d(o_t, g) < \epsilon, at which point a new goal is sampled.
      • Keypoint Distance Formulation: The object pose distance d(o_t, g) is measured with D=4 object-frame keypoints: d(o_t, g) = \max_i \|o_{t,i} - g_i\|, where o_{t,i} and g_i are the world-frame positions of the i-th keypoint under the current and goal poses. The keypoints are defined with a fixed scale s_{\text{rew}} = [0.14, 0.03, 0.03] (meters). Setting s_{\text{rew},x} > s_{\text{rew},y}, s_{\text{rew},z} makes the distance more sensitive to pitch and yaw errors of long tools.
  3. Object-Centric Policy Inputs

    For the object, the policy receives only the tool's current 6D pose and a coarse 3D bounding box of the graspable region (center + extents in the object frame). This abstraction effectively bypasses the sim-to-real visual gap and enables zero-shot transfer. An LSTM backbone integrates the interaction history and implicitly infers latent physical and geometric properties that are not directly observed. The policy's observation keypoints are defined using the object's grasp bounding box size s \in \mathbb{R}^3 (length, width, height), which conditions the policy on the specific tool geometry.

  4. RL Training Details

    • SAPG (Split and Aggregate Policy Gradients): Training uses SAPG, a variant of PPO. SAPG maintains a population of policies and updates a leader policy with their collective experience, which promotes diverse exploration and eases the exploration bottleneck in massively parallel simulation.
    • Domain Randomization: To aid sim-to-real transfer, training applies observation delays, action-execution latency, noise and delay in object pose estimates, perturbations of the grasp-region bounding box, and random force and torque disturbances on the object.
    • Asymmetric Critic: The actor receives only the minimal object representation available at test time, while the critic has access to privileged simulation states (ground-truth velocities, reward signals, stateful progress features, and the noise-free, undelayed object pose). This improves value function estimation and stabilizes training.
    • Action Space: The policy outputs joint position targets for a 29-DoF robot (7-DoF KUKA iiwa arm and 22-DoF Sharpa hand). Arm outputs are interpreted as relative displacements (deltas) from the previous target, and hand outputs as absolute targets within the joint limits. An EMA filter is applied in both cases.
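
The action interpretation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `process_action`, the delta scale `arm_delta_scale`, the smoothing factor `ema_alpha`, and the convention that raw policy outputs lie in [-1, 1] are all hypothetical.

```python
def process_action(raw_action, prev_targets, hand_lower, hand_upper,
                   arm_delta_scale=0.05, ema_alpha=0.5):
    """Map a raw 29-dim policy output to joint position targets.

    raw_action:   29 floats, assumed to lie in [-1, 1] (7 arm + 22 hand).
    prev_targets: the 29 previous joint position targets (radians).
    hand_lower, hand_upper: 22 hand joint limits each (radians).
    arm_delta_scale, ema_alpha: hypothetical values, not from the paper.
    """
    n_arm = 7
    clipped = [max(-1.0, min(1.0, a)) for a in raw_action]

    # Arm: relative displacement (delta) from the previous target.
    arm_desired = [prev_targets[i] + arm_delta_scale * clipped[i]
                   for i in range(n_arm)]

    # Hand: absolute target, rescaled from [-1, 1] into the joint limits.
    hand_desired = [lo + 0.5 * (a + 1.0) * (hi - lo)
                    for a, lo, hi in zip(clipped[n_arm:], hand_lower, hand_upper)]

    desired = arm_desired + hand_desired
    # EMA filter applied to both arm and hand targets.
    return [ema_alpha * d + (1.0 - ema_alpha) * p
            for d, p in zip(desired, prev_targets)]
```

A smaller `ema_alpha` trades responsiveness for smoother targets, which is one plausible reason to filter both the arm and the hand.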

Real-World Deployment

  1. Human Video Processing
    • Obtaining the Metric-Scale Mesh and Grasp Bounding Box: From the first RGB-D frame, SAM 3D [16] (modified to use the captured depth map to ensure metric accuracy) reconstructs a metric-scale 3D mesh of the object. SAM 2 [65] then segments the intended grasp region, which is converted into a coarse 3D bounding box and supplied as part of the policy input. The bounding box is centered on the handle and aligned so that its x-axis runs along the handle's principal axis, pointing toward the head.
    • Extracting the 6D Object Pose Trajectory: FoundationPose [80] extracts the 6D object pose trajectory from the RGB-D video. The trajectory is downsampled to 3 Hz, and the static phase is trimmed so that it starts at the moment the object lifts off the table (z_{\text{thresh}}=10\text{cm}).
  2. Inference-time Object Tracking: At inference time, FoundationPose [80] estimates the current 6D object pose at 30 Hz from the RGB-D observations of a third-person camera. The policy is conditioned on proprioception, the current object pose, the fixed grasp-region bounding box, and the current goal pose taken from the demonstration trajectory. When the object pose gets sufficiently close to the goal pose (d(o_t, g) < \epsilon), the goal switches to the next one.
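
The trajectory preprocessing described above (trim the static phase, then downsample to 3 Hz) might look like the sketch below. The function name and data layout are hypothetical, and so is the reading of z_{\text{thresh}} as a height gain of 10 cm over the first frame; the text does not say how lift-off is detected.

```python
def preprocess_goal_trajectory(poses, timestamps, target_hz=3.0, z_thresh=0.10):
    """Turn a dense pose track into a sparse goal sequence.

    poses:      list of (position, rotation); position is (x, y, z) in meters.
    timestamps: list of increasing times in seconds, one per pose.
    Assumption: lift-off is the first frame whose height exceeds the
    first-frame height by at least z_thresh.
    """
    if not poses:
        return []
    z_init = poses[0][0][2]

    # Trim the static phase before lift-off.
    start = next((i for i, (pos, _) in enumerate(poses)
                  if pos[2] - z_init >= z_thresh), None)
    if start is None:
        return []  # the object never left the table

    # Keep roughly one pose every 1 / target_hz seconds.
    period = 1.0 / target_hz
    goals = []
    next_time = timestamps[start]
    for pose, t in zip(poses[start:], timestamps[start:]):
        if t >= next_time - 1e-9:
            goals.append(pose)
            next_time = t + period
    return goals
```

With a 30 Hz pose track, this keeps about every tenth frame after lift-off.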

DexToolBench

The work introduces DexToolBench, a real-world dexterous manipulation benchmark made up of challenging tool-use tasks. It comprises 24 everyday tool-use tasks and 12 unique object instances across 6 categories. Each task is paired with an RGB-D human video demonstration. Performance is measured by Task Progress: the fraction of demonstrated goal poses that are tracked successfully (\epsilon = 2\text{cm}).
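
As a rough illustration of the Task Progress metric, assuming goals are consumed in order so that a rollout is summarized by how many goals it reached (the function names are hypothetical):

```python
def task_progress(num_goals_reached, num_goals):
    """Fraction of the demonstrated goal poses reached in one rollout."""
    if num_goals <= 0:
        raise ValueError("a demonstration needs at least one goal pose")
    return num_goals_reached / num_goals


def mean_task_progress(rollouts):
    """Average Task Progress over (num_goals_reached, num_goals) pairs."""
    scores = [task_progress(reached, total) for reached, total in rollouts]
    return sum(scores) / len(scores)
```

For example, a rollout that reaches 3 of 4 goals scores 0.75, and averaging it with a fully successful rollout gives 0.875.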

Experimental Results

  • Zero-Shot Real-World Tool-Use: Across more than 120 real-world rollouts, SimToolReal showed strong zero-shot generalization over 24 tasks, 12 object instances, and 6 tool categories. Task Progress was especially high on tasks that need less in-hand rotation (eraser, marker). The main failure modes were loss of pose tracking (43.7%), dropping the object (34.5%), and failing to reach the goal pose because of incomplete in-hand rotation (18.2%).
  • Comparison with Retargeting and Fixed Grasp Baselines: SimToolReal outperformed prior methods based on a fixed grasp or on Kinematic Retargeting by 37%, and completed complex tasks through dexterous in-hand object rotation. Kinematic Retargeting failed to grasp because it does not account for contact interactions, while Fixed Grasp showed limitations such as the robot arm colliding with the table on tasks that require in-hand rotation.
  • Comparison with Specialist RL Policies: In simulation, SimToolReal performed on par with specialist policies trained for a specific object and trajectory. The specialists, however, tended to overfit: their performance dropped sharply outside their training conditions (a new trajectory or a new object). SimToolReal maintained strong zero-shot performance under changes to both objects and trajectories.
  • Training Objective as a Predictor of Generalization: Reward gains on the training objective (reaching random goal poses with procedurally generated primitive objects) correlated strongly with Task Progress gains on DexToolBench tasks. This suggests the training objective is a reliable indicator of generalization to real tool use.
  • RL Training Ablation: SAPG and the Asymmetric Critic were both found to be essential for maximizing performance. Learning performance dropped substantially with PPO or with a standard critic.

Discussion and Limitations

SimToolReal provides a strong framework for zero-shot dexterous tool manipulation. However, the approach does not guarantee functional task completion in high-force interactions, and because object pose goals alone carry no awareness of the environment, collisions are possible in cluttered scenes. In addition, it currently assumes rigid tools, and the high-level goal trajectory is fixed rather than replanned dynamically.


🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

One-line summary: By reducing the hard and varied problem of tool manipulation to the single task of "moving an arbitrary object to a goal pose," the authors handle diverse real-world tools zero-shot with just one RL policy.


1. Introduction: Why Is Tool Manipulation Still Hard?

Picture a person using a tool. When we pick up a hammer, we grab the thin part of the handle, rotate it in the hand so the heavy part points up, and keep hold of it even under a strong impact. Three fundamental challenges hide inside this motion.

  1. Thin Object Grasping: The handle of a marker, brush, or screwdriver lying on a table can be only a few millimeters thick. It is easy to see why general-purpose gripper grasping algorithms struggle here.

  2. In-Hand Reorientation: After grasping, the tool has to be turned into a functional pose. This goes beyond simple grasping and demands fine manipulation that only a multi-fingered dexterous hand can perform.

  3. Forceful Interaction: Hammering a nail or erasing with an eraser involves strong contact with the environment. Force has to be transmitted without losing the grasp.

Limitations of Existing Approaches

Approaches to this problem have broadly followed two lines.

Imitation Learning: A human collects demonstration data through teleoperation, and the policy learns from it. Methods such as ACT, Diffusion Policy, and RT-2 belong here. The problem is that collecting demonstrations of grasping thin, awkward objects is itself extremely difficult. Imagine demonstrating, more than 50 times, picking up a single marker and rotating it into the pose you would use for a screwdriver.

Sim-to-Real RL: A policy is learned with reinforcement learning in a simulator and then applied in the real world. OpenAI Dactyl is the representative example. Existing approaches, however, required separate simulation modeling and reward design for each object and task, which means starting from scratch every time a new tool is added.

This paper, SimToolReal, proposes an elegant solution that addresses both problems at once.


2. Core Idea: Redefine the Problem

The paper's strongest contribution is a shift in how the problem is defined.

"Tool manipulation = moving the tool through a sequence of goal poses"

The definition looks simple at first glance, but it has far-reaching implications.

  • Hammering can be expressed as {grasp-the-handle pose, lift-up pose, strike-down pose}
  • Brushing can be expressed as {grasp-the-brush pose, approach-the-canvas pose, left-right stroke poses}
  • Driving a screw can be expressed as {grasp pose, rotation poses}

Every tool task reduces to the same abstract instruction: "move to the next goal pose." It then suffices to learn this single task well in simulation. The policy does not need to know what the tool is or what the task is.

This is what the name Object-Centric means. Once the world is described from the object's point of view rather than the robot's, all tool manipulation becomes one universal problem.


3. Detailed Analysis of the Method

3.1 System Architecture Overview

flowchart TD
    subgraph SIM ["🖥️ Simulation (Isaac Gym)"]
        A[Procedurally Generated Primitives] --> B[Random Goal Pose Sampling]
        B --> C[RL Policy Learning\nLSTM Policy Training]
        C --> D[Trained General-Purpose RL Policy]
    end

    subgraph REAL ["🤖 Real-World Deployment"]
        E[Human Video Demo] --> F[FoundationPose\n6D goal pose extraction]
        G[Real Tool] --> H[SAM 3D\nMesh + grasp bounding box extraction]
        F --> I[Goal pose sequence]
        H --> J[Tool 6D pose tracking]
    end

    D -->|Zero-Shot transfer| K[Real-time policy inference]
    I --> K
    J --> K
    K --> L[29-DoF joint position target output]
    L --> M[Tool task completed]

    style SIM fill:#e8f4f8,stroke:#2196F3
    style REAL fill:#f0f8e8,stroke:#4CAF50

3.2 Simulation Training: What to Learn

Procedural Object Generation

Prior work modeled specific tools (for example, one hammer or one screwdriver) and built an environment around each. SimToolReal turns this around completely.

It procedurally generates hundreds of virtual tool primitives. These primitives do not need to match real tools exactly. What matters is that they span a wide range of shapes, sizes, and mass distributions.

This diversity is what makes generalization to real tools possible, much as someone who has practiced stacking blocks of many sizes can also stack an object they have never seen before.
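
To make the idea concrete, here is a minimal sampling sketch that uses the handle and head ranges quoted in the Ping Review. The `Part` dataclass, the function names, and the independent uniform sampling of each dimension are assumptions for illustration; the paper's actual generator may differ.

```python
import random
from dataclasses import dataclass


@dataclass
class Part:
    shape: str      # "cuboid" or "capsule"
    length: float   # meters
    width: float    # meters (diameter for capsules)
    height: float   # meters (equals width for capsules)
    density: float  # kg/m^3


def sample_part(rng, length_range_cm, cross_range_cm, density_range):
    """Sample one part; dimension ranges are in cm, as in the text."""
    shape = rng.choice(["cuboid", "capsule"])
    length = rng.uniform(*length_range_cm) / 100.0
    width = rng.uniform(*cross_range_cm) / 100.0
    # A capsule has a circular cross-section, so height equals width.
    if shape == "capsule":
        height = width
    else:
        height = rng.uniform(*cross_range_cm) / 100.0
    return Part(shape, length, width, height, rng.uniform(*density_range))


def sample_tool(rng):
    """Sample a (handle, head) pair of tool-like primitives."""
    handle = sample_part(rng, (5, 30), (1, 4), (300, 600))
    head = sample_part(rng, (1, 15), (0.5, 12), (300, 2000))
    return handle, head


handle, head = sample_tool(random.Random(0))
```

Because head density can reach 2000 kg/m^3 while handle density stays below 600 kg/m^3, many samples end up head-heavy, which shifts the center of mass away from the grasp region.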

General-Purpose Objective: Reaching Arbitrary Poses

There is only one training task:

\pi^* = \arg\max_\pi \mathbb{E}\left[\sum_t r(s_t, a_t)\right]

In simplified form, the reward minimizes the distance between the current tool pose and the goal pose:

r(s_t, a_t) = -\left\|T_{\text{obj}}^{\text{current}} \ominus T_{\text{obj}}^{\text{goal}}\right\|

(\ominus denotes the pose error on SE(3))

It could hardly be simpler. There is no task-specific reward shaping at all.

When this simple task is learned repeatedly over thousands of diverse objects, the policy naturally learns how to pick an object up from the table, how to rotate it in the hand toward a desired orientation, and how to keep a stable grasp. These three skills are the core of all tool manipulation, and they emerge without any tool-specific or task-specific instruction in the reward function.
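
The simplified reward above compresses what the Ping Review spells out: a keypoint-based pose distance and a stateful goal term. A sketch of that goal term follows. The layout of the four keypoints (`SIGNS`), the bonus value, and the training-time tolerance are assumptions; only the scale s_{\text{rew}} = [0.14, 0.03, 0.03] m and the max-over-keypoints form come from the text.

```python
import math

S_REW = (0.14, 0.03, 0.03)  # fixed keypoint scale in meters (from the text)
# Hypothetical layout: four non-coplanar corners of a box with half-extents S_REW.
SIGNS = ((1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1))


def keypoints_world(rotation, translation, scale=S_REW):
    """World-frame keypoints for a pose given as (3x3 rotation, translation)."""
    points = []
    for signs in SIGNS:
        k = [s * e for s, e in zip(signs, scale)]
        points.append(tuple(
            sum(rotation[r][c] * k[c] for c in range(3)) + translation[r]
            for r in range(3)))
    return points


def keypoint_distance(pose, goal):
    """d(o_t, g) = max_i ||o_{t,i} - g_i|| over the keypoints."""
    return max(math.dist(p, q)
               for p, q in zip(keypoints_world(*pose), keypoints_world(*goal)))


class GoalReward:
    """r_goal = max(d* - d, 0) + B_succ * [d < eps], with d* kept per goal."""

    def __init__(self, epsilon=0.02, success_bonus=1.0):  # assumed values
        self.epsilon = epsilon
        self.success_bonus = success_bonus
        self.best = None  # d*: smallest distance reached for the current goal

    def reset(self):
        """Call whenever a new goal is sampled."""
        self.best = None

    def __call__(self, pose, goal):
        d = keypoint_distance(pose, goal)
        if self.best is None:
            self.best = d
        progress = max(self.best - d, 0.0)  # pays only for new progress
        self.best = min(self.best, d)
        success = d < self.epsilon
        return progress + (self.success_bonus if success else 0.0), success
```

Because the progress term pays only for improvements over the best distance so far, the policy gains nothing by moving back and forth near the goal.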

3.3 Policy Network Architecture

flowchart LR
    subgraph OBS ["Input Observation"]
        P["Proprioception\nJoint angles and velocities"]
        O["Object 6D Pose\nPosition + orientation"]
        B["Grasp Bounding Box\n3D AABB"]
        G["Goal Pose\nPosition + orientation"]
    end

    subgraph NET ["LSTM Policy Network"]
        L["LSTM Cell\nMaintains temporal context"]
        MLP["MLP Head"]
    end

    subgraph ACT ["Output Action"]
        JT["Joint Position Targets\n29-DoF\nArm + Dexterous Hand"]
    end

    P --> L
    O --> L
    B --> L
    G --> L
    L --> MLP
    MLP --> JT

Why an LSTM? Picking up a tool, rotating it, and using it is a sequential, context-dependent process. "What state is the hand in right now?" and "In which direction was force just applied?" both matter for the next action. The LSTM keeps this temporal context in its internal state.

The Key to the Observation Space Design: Object-Centric Representation

The most important design decision is that no camera images enter the observation space. Instead, the object is described by just three abstract representations:

  • Object 6D Pose: the tool's current position and orientation (an element of SE(3))
  • Grasp Bounding Box: a 3D bounding box marking the part to be grasped (for example, the handle of a hammer)
  • Goal Pose: the next goal pose the tool should reach

These three representations are independent of the tool's appearance. A marker, a hammer, and a brush are all expressed in the same input space. This is the key to zero-shot generalization.

3.4 Real-World Deployment: Perception Pipeline

Because the policy trained in simulation takes abstract representations (6D pose + bounding box) as input, the same quantities have to be extracted accurately in the real world. Two vision foundation models are used for this.

flowchart TD
    subgraph OFFLINE ["Offline: run once before deployment"]
        V["Human demonstration video"] --> FP["FoundationPose\n6D pose sequence extraction"]
        FP --> TRAJ["Goal Pose Trajectory"]
        
        IMG["Tool RGB-D image\nStatic scan"] --> SAM["SAM 3D\nMesh reconstruction + segmentation"]
        SAM --> MESH["Tool mesh + grasp bounding box"]
    end

    subgraph ONLINE ["Online: real-time loop, 60 Hz"]
        CAM["Live camera"] --> FP2["FoundationPose\nCurrent tool pose tracking"]
        MESH --> FP2
        FP2 --> POSE_NOW["Current tool pose"]
        
        TRAJ --> GM["Goal Manager"]
        POSE_NOW --> GM
        GM --> POLICY["LSTM Policy\nPolicy Inference"]
        MESH --> POLICY
        
        POLICY --> CMD["Joint position commands\n60 Hz"]
    end

    style OFFLINE fill:#fff3e0,stroke:#FF9800
    style ONLINE fill:#e8f5e9,stroke:#4CAF50

SAM 3D: An extension of Meta's Segment Anything Model to 3D. From an RGB-D image it reconstructs the tool's 3D mesh, and the graspable region (the handle) is segmented into a 3D bounding box. This step runs only once, before deployment.

FoundationPose: NVIDIA's general-purpose 6D pose estimation and tracking model. Given the tool's mesh, it tracks the 6D pose in real time. It serves two purposes here: extracting the goal pose sequence from the human demonstration video, and tracking the current tool pose at run time.

Goal management logic: When the current pose is close enough to the goal pose (within an \epsilon tolerance), the system automatically advances to the next goal pose. This is how the demonstration trajectory is tracked step by step.

3.5 Pseudocode for Goal Management

# Goal-management logic (sketch). A pose is a (translation, rotation) pair:
# translation is a length-3 sequence in meters, rotation a 3x3 rotation matrix.
import math

def geodesic_distance(R_a, R_b):
    """Geodesic distance on SO(3): the angle (rad) of the relative rotation R_a^T R_b."""
    trace = sum(R_a[i][j] * R_b[i][j] for i in range(3) for j in range(3))
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))

def goal_manager(get_current_pose, goal_trajectory, epsilon_pos=0.05, epsilon_rot=0.1):
    """
    get_current_pose: callable returning the latest tool pose (translation, rotation)
    goal_trajectory: list of goal poses (extracted from the human video)
    epsilon_pos: float, position tolerance (meters)
    epsilon_rot: float, orientation tolerance (radians)
    Yields the goal pose the policy should track at each control step.
    """
    current_goal_idx = 0

    while current_goal_idx < len(goal_trajectory):
        # Re-read the tracked pose on every control step.
        cur_t, cur_R = get_current_pose()
        goal_t, goal_R = goal_trajectory[current_goal_idx]

        # Position error
        pos_error = math.dist(cur_t, goal_t)
        # Rotation error (geodesic distance on SO(3))
        rot_error = geodesic_distance(cur_R, goal_R)

        if pos_error < epsilon_pos and rot_error < epsilon_rot:
            # Current goal reached: advance before handing a goal to the policy.
            current_goal_idx += 1
            continue

        yield goal_t, goal_R  # pass the active goal to the policy

It is interesting that such simple logic works well in practice. As long as the policy can reach each intermediate goal, it can carry out an entire complex tool task automatically.

3.6 DexToolBench: Evaluation Benchmark

SimToolReal does not stop at proposing a method. It also releases a new benchmark, DexToolBench.

| Item | Details |
| --- | --- |
| Tool categories | 6 (brush, spatula, screwdriver, marker, eraser, hammer) |
| Object instances | 12 (2 per category) |
| Tasks | 24 (2 per instance) |
| Real-world rollouts | 120 |
| Evaluation protocol | Zero-Shot (no evaluation object or task used in training) |
| Release | Code, assets, and training scripts all released |

The important point is that every evaluation is truly zero-shot. None of the 12 real tools and none of the 24 tasks were used in the training simulation.


4. Experiments and Results

4.1 Baselines

The paper compares SimToolReal against three methods.

① Fixed Grasp + Motion Planning: The tool is grasped with a predefined, fixed grasp pose, and the task is executed with motion planning. The tool geometry must be known exactly, and the method has no flexibility.

② Kinematic Retargeting: Human hand motion is mapped kinematically onto the robot hand. Because this is plain pose copying with no physical interaction, it suffers from insufficient grasp force.

③ Specialist RL Policy: An RL policy trained separately for each tool and task. This is effectively an oracle, since it saw the tool during training.

4.2 Quantitative Results

| Method | Task Progress | Notes |
| --- | --- | --- |
| Fixed Grasp | Lower (baseline) | Requires per-tool, per-task engineering |
| Retargeting | Lower (baseline) | Weak physical grasp force |
| SimToolReal | +37% (vs. the two methods above) | Zero-Shot; evaluation tools unseen in training |
| Specialist RL | ~SimToolReal level | Trained directly on the evaluation tool |

The most notable result is parity with the Specialist RL policies, a comparison carried out in simulation. The specialists saw their tool directly during training, yet SimToolReal, which never saw the new tools, performs comparably.

4.3 Difficulty Analysis by Tool Category

graph LR
    subgraph HARD ["Hard categories"]
        M["Marker"]
        B["Brush"]
    end
    subgraph MED ["Medium difficulty"]
        S["Spatula"]
        E["Eraser"]
    end
    subgraph EASY ["Relatively easy categories"]
        H["Hammer"]
        D["Screwdriver"]
    end

    M -.->|"Thin and light → hard to grasp"| HARD
    B -.->|"Flexible tip → pose uncertainty"| HARD
    H -.->|"Clear center of mass, thick handle"| EASY
    D -.->|"Cylindrical symmetry"| EASY

Brushes and markers are thin and light, so grasping them is difficult in itself. A hammer has a well-defined center of mass and a thick handle, which makes it comparatively easy. This matches intuition.

4.4 Failure Mode Analysis

The paper classifies its failure causes candidly. The three main ones are grasp failures, where the fingers cannot get underneath the tool during the initial pickup; pose estimation errors, where FoundationPose loses track during fast motion; and in-hand reorientation failures, where the tool is dropped while being rotated toward the goal orientation.

Interestingly, strong recovery behavior is also observed. In some rollouts the robot briefly loses the tool, picks it up again, and completes the task. This is a natural product of RL training, because the policy experienced many kinds of failure in simulation.


5. Technical Deep Dive

5.1 Why Pose Representations Instead of Images?

This is an important design choice. The recent VLA (Vision-Language-Action) trend favors end-to-end approaches that take raw images as input. Why did SimToolReal go the other way?

The answer becomes clear from the perspective of the sim-to-real gap. When images are the policy input, differences in visual appearance between simulation and reality (textures, lighting, shadows) directly degrade performance. However good the domain randomization, this gap is hard to eliminate completely.

A 6D pose, by contrast, is abstract and domain-invariant: it means the same thing in simulation and in reality. The perception pipeline (SAM 3D + FoundationPose) handles this abstraction, and the RL policy focuses purely on "how to move the object."

This is a clear decoupling of perception and control, with the added benefit that each module can be improved independently.

5.2 Why Diversity in Procedural Generation Matters

For a policy to generalize, the training distribution has to cover the test distribution.

Let \mathcal{T} be the property space of real tools and \mathcal{P} the property space of the primitives used in training. Then, as a heuristic:

\text{Zero-Shot Generalization} \propto \text{overlap}(\mathcal{T}, \mathcal{P})

If procedural generation makes \mathcal{P} wide enough, a new real tool is more likely to fall approximately inside \mathcal{P}. A marker resembles a "thin, light cylindrical primitive," and a hammer resembles a "heavy, asymmetric primitive with a large head."

5.3 LSTM vs. Transformer: Why an LSTM?

Transformer architectures are now widely used as RL policy networks. SimToolReal's choice of an LSTM has practical reasons. In terms of compute, an LSTM offers latency low enough for 60 Hz real-time control. In terms of variable-length sequences, tool-manipulation tasks vary in length, and the LSTM's hidden state handles this naturally. It is also a proven architecture for running thousands of parallel environments in Isaac Gym.

One could of course argue that a Transformer would capture long-range dependencies better. That is a direction worth exploring in future work.


6. Comparison with Related Work

6.1 The Sim-to-Real RL Lineage

| Work | Method | Generalization level | Notes |
| --- | --- | --- | --- |
| OpenAI Dactyl (2019) | DR + RL | Single object (Rubik's Cube) | Large-scale engineering |
| HORA (2022) | Adaptive RL | Single object, multiple settings | Object required at training time |
| AnyGrasp (2023) | Pose estimation + motion | Diverse objects, grasping only | No manipulation |
| DexGraspNet (2023) | RL + tactile | Grasping-focused | No tool manipulation |
| SimToolReal (2026) | RL + Object-Centric | Diverse tools, diverse tasks | Zero-Shot |

6.2 Comparison with Dex4D (Concurrent Work)

A direct comparison with Dex4D (arXiv:2602.15828), which appeared around the same time, is instructive.

graph TD
    subgraph STR ["SimToolReal"]
        STR1["Goal representation: SE(3) pose"]
        STR2["Perception: FoundationPose\n6D pose tracking"]
        STR3["Goal source: real human video"]
        STR4["Training objects: tool primitives"]
    end

    subgraph D4D ["Dex4D"]
        D4D1["Goal representation: 3D point tracks"]
        D4D2["Perception: 4D reconstruction\npoint cloud tracking"]
        D4D3["Goal source: video generation model"]
        D4D4["Training objects: thousands of generic objects"]
    end

    COMP["Key differences\nSE(3) pose vs. point tracks\nreal video vs. generated video"]

    STR --- COMP
    D4D --- COMP

The philosophical difference between the two methods lies in how goals are represented. SimToolReal's SE(3) pose is more compact and interpretable, but it can be vulnerable to changes in object shape and to partial visibility. Dex4D's point tracks carry richer shape information but depend on a video generation model.
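The relationship between the two goal representations can be sketched in a few lines. For a rigid object, a single SE(3) goal pose already determines where every point on the object ends up, so point tracks add information only when the shape itself changes. The keypoint names and the pose below are made-up values for illustration.

```python
import math


def rot_z(theta):
    """3x3 rotation about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_pose(R, t, point):
    """Map a point from the object frame to the world frame: R @ p + t."""
    return tuple(
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )


# Hypothetical keypoints on a hammer, in the object frame (metres).
keypoints = {"handle_end": (0.0, 0.0, 0.0), "head": (0.25, 0.0, 0.0)}

# One goal pose: rotate 90 degrees about z, then translate.
R, t = rot_z(math.pi / 2), (0.5, 0.0, 0.1)

# The "point track" goal implied by the pose goal.
goal_points = {name: apply_pose(R, t, p) for name, p in keypoints.items()}
for name, p in goal_points.items():
    print(name, tuple(round(v, 3) for v in p))
```

The pose goal stays the same size however many keypoints are tracked, which is the compactness the SE(3) representation buys at the cost of the rigid-body assumption.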

6.3 Comparison with Imitation Learning Approaches

IL methods such as ACT and Diffusion Policy are very strong when high-quality demonstration data is available. The regime SimToolReal targets, however, is exactly where that data is hard to collect: teleoperated demonstrations of picking up and reorienting a thin tool take tens of hours of operator effort.

SimToolReal sidesteps this data collection problem with simulation. The price is a dependence on the accuracy of the perception pipeline.


7. Critical Assessment: Strengths and Limitations

7.1 Strengths

Elegant problem reduction: Reducing complex tool manipulation to a single goal-reaching problem is the paper's standout contribution. This simplification is what makes zero-shot generalization possible.

Genuinely zero-shot: Many papers claim "zero-shot" even though similar objects were seen during training or some fine-tuning was involved. SimToolReal is zero-shot with respect to genuinely new objects and tasks.

Practical pipeline: The SAM 3D + FoundationPose combination relies on existing public models and is therefore reproducible. The full code and assets are released and can be used directly.

Meaningful benchmark contribution: DexToolBench could become a standard evaluation platform for future work.

Specialist-level performance: It is notable that, despite being zero-shot, the policy performs on par with object-specific specialist policies.

7.2 Limitations

Dependence on the perception pipeline: Policy performance is tied directly to pose estimation accuracy. If FoundationPose loses track, the policy acts on incorrect observations. The problem shows up especially in fast motions such as a hammer swing.

Rigid objects only: Tools are assumed to be rigid bodies. The bristles of a brush or a flexible handle are hard to represent in the current framework.

Functional outcomes not evaluated: The paper's title and videos are impressive, but functional outcomes such as actually driving a nail or tightening a screw are not used as success metrics. Task Progress measures whether the tool follows the correct trajectory.

No bimanual manipulation: Much real-world tool use involves holding an object steady with one hand while using the tool with the other. The current framework applies only to a single multi-fingered hand.

Limits of the pose representation: An SE(3) pose captures only the rigid-body state of the object as a whole. When the relative relationship between objects matters, as when seating a screwdriver in a screw, the current representation alone may not be enough.
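The quantity missing in that last case is a relative pose between two objects. As a minimal sketch, given world-frame poses of a screw and a screwdriver tip as 4x4 homogeneous transforms, the tip's pose in the screw frame is the inverse of the screw pose composed with the tip pose. The frames and numbers below are hypothetical, and nothing here is part of the paper's goal representation.

```python
import math


def make_pose(yaw, translation):
    """4x4 homogeneous transform: rotation about z by `yaw`, then translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    x, y, z = translation
    return [
        [c, -s, 0.0, x],
        [s, c, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_pose(T):
    """Inverse of a rigid transform: [R t]^-1 = [R^T, -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    new_t = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def relative_pose(T_world_ref, T_world_obj):
    """Pose of `obj` expressed in the frame of `ref`."""
    return matmul(invert_pose(T_world_ref), T_world_obj)


# Hypothetical world-frame poses (metres, radians).
T_world_screw = make_pose(0.0, (0.40, 0.10, 0.02))
T_world_driver_tip = make_pose(math.pi / 6, (0.40, 0.10, 0.05))

T_screw_tip = relative_pose(T_world_screw, T_world_driver_tip)
offset = [T_screw_tip[i][3] for i in range(3)]
print([round(v, 3) for v in offset])  # tip position in the screw frame
```

A goal expressed this way moves with the screw, whereas a world-frame tool pose goal does not, which is why the single-object representation can fall short for insertion-style tasks.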

7.3 Follow-up Research Directions for Researchers

Follow-up research questions that come up naturally while reading the paper:

  • Automatic grasp bounding-box inference: Automatically inferring "where to grasp" on a tool with complex geometry remains hard. The current approach appears to be SAM 3D plus manual selection. Could this be replaced by language grounding?

  • Automated goal trajectory generation: Can goal trajectories be generated from a text command, without a human video? For example, producing a trajectory automatically from the instruction "drive the nail in with the hammer."

  • Transfer to other platforms: Does a policy optimized for a 29-DoF system generalize to other configurations such as the Allegro Hand or Shadow Hand?

  • Tactile sensing integration: Pose alone cannot reveal whether the grasp force is sufficient. Adding tactile sensors would likely improve grasp stability considerably.


8. Summary and Conclusion

SimToolReal makes three fundamental contributions to robotic tool manipulation.

First, a new problem formulation: dozens of different tools and tasks are reduced to one unified goal-pose tracking problem. This simplification is an important insight in its own right.

Second, genuine zero-shot generalization: it reaches specialist-level performance on 12 real tools and 24 tasks never seen during training.

Third, a reproducible, practical system: the released code, assets, and DexToolBench benchmark provide a solid foundation for follow-up research.

What makes the paper especially meaningful is its demonstration that simplicity, rather than per-task complexity, is what delivers generalization. Instead of complex task-specific engineering, a single simple and universal objective (move an arbitrary object to an arbitrary pose) enabled a wide range of real tool use.

There is of course still a long way to go before robots actually drive nails or tighten screws. But SimToolReal sets an important milestone on that path. Picking up a tool and reorienting it into a functional pose already clears one of the hardest hurdles in robotic tool manipulation.

For robotics researchers, the biggest lesson of the paper comes down to one thing: find a good abstraction, and the rest follows.


Reference Information

  • Paper: arXiv:2602.16863
  • Project page: simtoolreal.github.io
  • Code: GitHub: tylerlum/simtoolreal
  • Authors: Kushal Kedia (Cornell), Tyler Ga Wei Lum (Stanford), Jeannette Bohg (Stanford), C. Karen Liu (Stanford)
  • Published: February 18, 2026 (v1); February 24, 2026 (v2)

Key External Models Used

Model | Purpose | Notes
FoundationPose (NVIDIA) | 6D pose estimation and tracking | CVPR 2024
SAM 3D (Meta) | 3D segmentation + mesh reconstruction | SAM-based extension
Isaac Gym (NVIDIA) | RL training environment (GPU-parallel) | Thousands of parallel environments

Copyright 2026, JungYeon Lee