📃 RotateIt Review

tactile
dexterous manipulation
General In-Hand Object Rotation with Vision and Touch
Published

March 24, 2026

  • Paper Link

  • Project Link

  • Haozhi Qi, Brent Yi, Sudharshan Suresh, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik

  • UC Berkeley, Meta AI, CMU, TU Dresden

  • Conference on Robot Learning (CoRL), 2023

  1. 🚀 This work proposes RotateIt, a system that uses multimodal sensory input (vision, touch, proprioception) to rotate objects about multiple axes with the fingertips.
  2. 💡 RotateIt takes a two-stage approach: it first trains an oracle policy in simulation using privileged information, then learns a visuotactile policy in which a visuotactile transformer infers that information from realistic sensor inputs.
  3. ✅ It shows that vision and tactile sensing matter for manipulation performance and out-of-distribution (OOD) generalization, and demonstrates that a policy trained in simulation transfers successfully to a variety of real-world objects.

🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

In "General In-Hand Object Rotation with Vision and Touch", Haozhi Qi and colleagues introduce RotateIt, a system that integrates visual and tactile feedback to rotate diverse objects in-hand about multiple axes. The work addresses the generalization and stability problems faced by earlier manipulation methods, focusing in particular on the difficulty of maintaining stable force closure across objects of varied shape.

Core Methodology

RotateIt follows a sim-to-real approach: it is trained in simulation and deployed directly in the real world. Training consists of two main stages.

  1. Oracle Policy Training:
    • Privileged Information: In simulation, the object's ground-truth physical properties and shape are used as the privileged information ("extrinsics") z_t. This mimics a policy that knows the object's properties perfectly.
    • Shape Information: N_p points are sampled from the object's 3D mesh and encoded with PointNet [72] into a c_p-dimensional feature vector z_{shape_t}. Unlike prior work, the authors find that injecting explicit shape information into the policy is important for manipulating complex objects.
    • Physical Property and Pose: A 7-dimensional physical property vector (mass, center of mass, coefficient of friction, scale, restitution) and a 10-dimensional pose vector (position, quaternion orientation, angular velocity) are combined and mapped to an 8-dimensional encoding z_{phys_t}. The final privileged encoding concatenates the two: z_t = [z_{phys_t}, z_{shape_t}].
    • Observations and Outputs: The oracle policy \pi takes the robot's proprioception p_t and the encoded privileged information z_t as input. p_t contains a short temporal window of joint positions and previous actions. The policy outputs an action a_t, the PD controller targets for the 16 joints; that is, a_t = \pi(p_t, z_t).
    • Reward Function: The rotation reward r_{rotr} = \max(\min(\omega \cdot k, r_{max}), r_{min}) encourages the object's angular velocity \omega to align with the target rotation axis k. To discourage unintended rotation (especially for the x and y axes), a penalty r_{rotp} = -\|\omega \times k\|_1 is added. Further penalty terms on hand pose deviation, torque, energy consumption, and object linear velocity encourage stable, efficient motion.
    • Policy Optimization: The oracle policy is optimized with PPO [75], using a variety of objects, randomized physical properties, and randomized initial grasps during training.
  2. Visuotactile Policy Training:
    • Motivation: The privileged information z_t is unavailable in the real world, so an estimate \hat{z}_t must be inferred from the robot's actual observations (vision, touch, proprioception).
    • Touch Sensing (Figure 4):
      • In simulation, the discretized contact locations on a 2D plane reported by the simulator (8 locations) serve as a proxy for touch. The contact observation o_{touch_t} is an N_c \times 9 array, where N_c is the number of contacts (8 dimensions for contact location plus 1 for the finger index). Each contact is processed by an MLP and the results are merged by average pooling.
      • In the real world, four omnidirectional vision-based touch sensors are mounted on the fingertips. For each sensor, the pixel with the strongest deformation is tracked as a proxy for contact location, and the resulting 2D keypoint is fed directly to the policy.
    • Vision Sensing (Figure 5):
      • Object depth is used as the visual representation: it needs no human labeling in the real world, and while RGB images are hard to simulate realistically, depth abstracts object shape well.
      • At deployment, Segment Anything [12, 13] segments the object foreground from the raw depth image to narrow the sim-to-real gap.
      • The object depth image o_{depth_t} is encoded by a 3-layer ConvNet into a feature vector f_{depth_t}. Camera position and orientation are randomized during training to make the policy more robust.
    • Visuotactile Transformer (Figure 2):
      • The transformer \phi models the multimodal sensor stream, with the goal of accurately inferring \hat{z}_t, a learned representation of the privileged information.
      • The encoded depth image f_{depth_t}, the encoded tactile contact points f_{touch_t}, the joint positions q_t, and the previous action a_{t-1} are concatenated into a feature vector f_t.
      • The transformer takes the feature sequence f_T = \{f_{t-k}, ..., f_{t-1}, f_t\} as input and outputs the predicted extrinsics vector \hat{z}_t.
    • Training: The oracle policy is rolled out with the predicted extrinsics vector, producing actions \hat{a}_t = \pi(p_t, \hat{z}_t). The ground-truth privileged information z_t is stored alongside to build the training dataset B = \{(f_T^{(i)}, z_t^{(i)}, \hat{z}_t^{(i)})\}_{i=1}^N. The transformer \phi is optimized with Adam [78] to minimize the l_2 distance between z_t and \hat{z}_t together with the difference between a_t and \hat{a}_t, where a_t = \pi(p_t, z_t) is the oracle's action.
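
The training objective in the last bullet can be written down in a few lines. This is only a sketch under assumptions: the squared Euclidean form, the function names, and the `action_weight` knob are illustrative choices, since the summary above says only that the latent distance and the action difference are both minimized.

```python
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distillation_loss(z, z_hat, a, a_hat, action_weight: float = 1.0) -> float:
    """Latent regression term plus an action-matching term.

    z, z_hat: privileged encoding and the transformer's estimate of it.
    a, a_hat: oracle action pi(p, z) and the action pi(p, z_hat).
    action_weight: hypothetical weighting; the source gives no value.
    """
    return squared_l2(z, z_hat) + action_weight * squared_l2(a, a_hat)
```

With `action_weight=0.0` the loss reduces to pure latent regression, which makes it easy to compare the two terms separately.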

Evaluation Setup

  • Hardware: AllegroHand (16 joints), an Intel RealSense D435 depth camera, and omnidirectional vision-based touch sensors on the fingertips.
  • Simulation: Based on IsaacGym [79]. Camera-robot extrinsics are calibrated with an ArUco tag [80], and random pose noise and realistic depth noise [81] are applied to the simulated images.
  • Object Set: A curated, diverse set of objects from EGAD [30], Google Scanned Objects [31], YCB [32], and ContactDB [33], restricted to objects whose width/depth/height (w/d/h) ratios are below 2.0.
  • Evaluation Metrics:
    • Time-to-Fall (TTF): average episode length before the object falls from the hand (higher is better).
    • Rotation Reward (RotR): average \omega \cdot k per episode (higher is better).
    • Rotation Penalty (RotP): average \|\omega \times k\| per timestep (lower is better; especially important for x- and y-axis rotation).
    • Radians Rotated (Rotations): total rotation angle achieved in real-world experiments.
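
As a rough illustration of how the three simulation metrics could be computed for a single episode, here is a minimal sketch. It assumes the episode is given as a list of per-step angular velocities recorded until the object fell (or the episode timed out), uses the L1 norm for RotP to match the penalty term in the reward, and reports RotR as the episode total; the function and key names are made up for this example.

```python
from typing import Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def episode_metrics(omegas: Sequence[Vec3], k: Vec3) -> Dict[str, float]:
    """Metrics for one episode.

    omegas: object angular velocity at every step before the fall/timeout.
    k: unit vector of the target rotation axis.
    """
    steps = len(omegas)
    rot_r = sum(dot(w, k) for w in omegas)  # episode total of omega . k
    rot_p_total = sum(sum(abs(c) for c in cross(w, k)) for w in omegas)
    return {
        "ttf_steps": float(steps),                       # survived steps
        "rot_r": rot_r,
        "rot_p": rot_p_total / steps if steps else 0.0,  # per-step mean
    }
```

Averaging these dictionaries over many episodes would give table-style numbers; any normalization by maximum episode length is left out here.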

Results and Analysis

  • Object Shape Importance: Table 1, Figure 7, and Figure 8 show that explicitly using object shape information via PointNet when training the oracle policy improves performance substantially, with the largest gains on irregular objects and objects with non-uniform w/d/h ratios. Without shape information, the policy tends to treat objects as spheres or cuboids, which hurts generalization to out-of-distribution (OOD) objects.
  • Importance of Visuotactile Transformer: Figure 6, Figure 7, Figure 8, and Table 4 show that using either vision or touch alone already improves substantially over a proprioception-only baseline, and that combining the two improves performance further, approaching the oracle policy. The transformer architecture models sequences better than the temporal convolution used in prior work (Table 4). Visuotactile information is also essential for OOD generalization.
  • Finer Tactile Sensing: Table 2 shows that binary contact information (contact or no contact) provides no additional benefit, whereas discretized contact location is very important for performance. This may be because RotateIt already uses proprioception together with the action history.
  • Representation Learned in the Latent Space: Figure 9 shows that the learned encodings z_t and \hat{z}_t preserve the object's 3D shape information well. When shape is included in the privileged information, the policy understands the object's true shape more accurately, and the visuotactile sensors help it understand the shape of irregular objects that are hard to tell apart from proprioception alone.
  • Real-world Evaluations: Figure 10 shows that, unlike Hora [7], RotateIt successfully rotates objects of varied geometry about the x axis in the real world. RotateIt generalizes well despite objects absent from the training set and real-world differences in physics. It performs similarly even with some touch sensors disabled, which demonstrates the robustness of the algorithm.
  • Multi-axis Training: Table 3 shows that a single network can be trained to rotate objects about multiple axes. When the desired rotation axis k is added to the observation space and the network is trained with an imitation learning objective, the distilled multi-axis policy performs on par with the single-axis oracle policies.

Limitations and Future Work

One limitation is that objects must be within the robot's mechanical limits and must not be too long. Another is that the policy is frozen after training and so cannot exploit real-world experience during deployment. Proposed future directions include lifelong learning in the real world via cross-modal supervision, using the full information from image-based touch sensors, and improving the vision system (for example, through visual pretraining).

By integrating touch and vision so that a robot can manipulate diverse objects in-hand about multiple axes, RotateIt takes an important step toward general dexterous hand manipulation.

🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

At a Glance

This paper answers one simple question: "Can a single policy rotate an object of any shape in-hand, about any axis?" The answer is yes, and there are two keys. (1) Teacher-student distillation: build a teacher policy with a god's-eye view inside simulation, then swap in a student policy that sees only vision and touch in the real world. (2) A simple representation: one depth camera and a few dozen binary contact signals on the hand, streamed into a single transformer. As a result, the Allegro Hand takes an arbitrary rotation axis \hat{k} as input and rotates a variety of objects without interruption.

Let's pin down the core message first. More than rich sensors or elaborate models, what you feed in as input and how you design the learning signal decide about 80% of generalization. RotateIt is a clean demonstration of this.

Why Is This Problem Hard?

Try spinning a pen in your hand. Over one revolution your fingers grip and release dozens of times. It all happens within a second, and if any one finger lets go slightly too early, the pen drops. People do this almost unconsciously. It is a kind of magic produced by tactile sensors covering the whole skin, the hand's proprioception, visual information from the periphery of the visual field, and a lifetime of experience.

A robot hand lacks all of these. So in-hand rotation is less a "simple rotation problem" and more a problem of "somehow handling contact dynamics while starved of sensing". The difficulties break down as follows.

Object diversity. An apple, a cylinder, a pen, and a cube differ in friction distribution, center of mass, and inertia tensor. Handling them all with one policy requires an object representation that generalizes.

Rotation-axis generalization. Most prior work handled only rotation about the z axis (the gravity axis); HORA and the earlier OpenAI cube work are examples. The moment you try to rotate about the x or y axis, gravity changes the problem completely. Having the fingers prop the object up is dynamically different from having them hold it from the side.

Partial observability. The hand occludes the far side of the object, and the camera sees from a single viewpoint. Touch is needed, yet the Allegro Hand ships with no rich built-in tactile sensing.

Sim-to-real. Contact dynamics is the noisiest part even in simulators such as PhysX. For a learned policy not to collapse when moved to the real hand, domain randomization and representation design both have to be just right.

The Big Picture: A Two-Stage Learning Structure

Drawn as a single diagram, RotateIt's system structure looks like this.

graph TB
    subgraph Stage1["Stage 1: Teacher Policy (Simulation Only)"]
        S1[Privileged State<br/>object pose, velocity, shape,<br/>mass, friction, COM, axis k]
        S1 --> T[Teacher MLP]
        T --> A1[Joint Targets]
        A1 --> R[Reward: omega dot k]
        R -.PPO Update.-> T
    end
    
    subgraph Stage2["Stage 2: Student Policy (Distillation)"]
        O[Depth Image] --> PE[Point Cloud<br/>Encoder]
        TC[Tactile Binary] --> TE[Tactile Encoder]
        P[Proprioception<br/>q, q_target] --> PR[Proprio Encoder]
        K[Target Axis k] --> KE[Axis Embedding]
        PE --> TF[Transformer Backbone]
        TE --> TF
        PR --> TF
        KE --> TF
        TF --> A2[Joint Targets]
    end
    
    T -.Action Imitation.-> A2

Learning happens in two stages.

Stage 1: Training the teacher policy. Inside the simulator (IsaacGym), the policy is told everything: the object's exact pose and velocity, mass, inertia, friction coefficient, center-of-mass position, and even the target rotation axis \hat{k}. With all of this in hand, the policy is trained with PPO. The reward is almost a one-liner: "the faster the object spins about the target axis, the better." Every other term is a penalty (don't let the hand stray too far, don't let the actions jump around, and so on).

Stage 2: Distilling the student policy. Next, we build a student that imitates the teacher's behavior. The student's inputs are limited to what a real robot can obtain: depth images, binary contact signals, joint angles, and the target rotation axis. The teacher's output actions serve as labels in something close to supervised learning, but with DAgger-style distillation that trains on trajectories generated by the student.

Here is the intuition for why this structure is powerful. Reinforcement learning works better the better the representation is, so RL goes well when the policy is told the exact state. But that representation cannot be obtained in the real world. So what then? First build a policy that solves the task well inside simulation, then separately build a student that imitates that policy's behavior from "inputs available in the real world". In effect, the same behavior is re-implemented from different inputs.

This pattern is shared by HORA, DeXtreme, and AnyRotate. What sets RotateIt apart is that it feeds both vision and touch into the student policy, and that it explicitly designs conditioning for rotation-axis generalization.

Teacher Policy: Simplifying RL with a God's-Eye View

The teacher's input is privileged information pulled directly from the simulator.

| Input type | Contents | Dimension (approx.) |
| --- | --- | --- |
| Proprioception | Hand joint angles q, target joint angles q^{tgt} | \sim 32 |
| Object state | Position p_o, orientation quaternion, linear/angular velocity | \sim 13 |
| Physical parameters | Mass, friction, center of mass, inertia tensor, size | \sim 10 |
| Shape representation | Object point cloud or BPS embedding | \sim 32 to 128 |
| Target rotation axis | Unit vector \hat{k} | 3 |

This goes into an MLP, which outputs target joint positions for proportional control. The reward can be summarized in the following form.

r_t = \underbrace{\boldsymbol{\omega}_t \cdot \hat{k}}_{\text{axis-aligned rotation speed}} \;-\; c_1 \, d_{\text{wrist}} \;-\; c_2 \, \|q_t - q_0\| \;-\; c_3 \, \|\tau_t\| \;-\; c_4 \, \|a_t\|

Let's read this intuitively. The only core reward is the first term: the object's angular velocity \boldsymbol{\omega}_t projected onto the target axis \hat{k}. If the object is spinning fast about exactly the desired axis, this value is large; if it spins about another axis or does not spin at all, it is small. Everything else is a "don't do anything weird" penalty: the object must not drift too far from the palm, the joints must not stray too far from their initial pose, and torque and action magnitude should stay small.
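
The reward formula above translates almost directly into code. The sketch below is illustrative only: the coefficient values are placeholders (the review does not give c_1 through c_4), and the function name and argument layout are assumptions.

```python
import math
from typing import Sequence


def norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def teacher_reward(omega, k_hat, d_wrist, q, q0, tau, action,
                   coeffs=(0.1, 0.1, 0.01, 0.01)) -> float:
    """r_t = omega . k_hat - c1*d_wrist - c2*|q - q0| - c3*|tau| - c4*|a|.

    coeffs are placeholder penalty weights, not values from the paper.
    """
    c1, c2, c3, c4 = coeffs
    rotation = sum(w * k for w, k in zip(omega, k_hat))  # signed projection
    pose_dev = norm([a - b for a, b in zip(q, q0)])
    return (rotation - c1 * d_wrist - c2 * pose_dev
            - c3 * norm(tau) - c4 * norm(action))
```

Because the first term is a signed projection, spinning the wrong way about the target axis yields a negative reward rather than zero.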

One interesting design choice here: the object's orientation is never given as a goal. The instruction is not "match this angle" but "keep rotating about this axis", because the essence of in-hand rotation is not reaching a particular angle but the ability to rotate continuously. When rotation speed is rewarded, the policy naturally picks up a "grasp, release, regrasp" rhythm.

PPO runs in IsaacGym across thousands of parallel environments. Domain randomization also takes place in the teacher stage: friction, mass, center of mass, disturbance torques, joint PD gains, and more are perturbed every episode. As a result, the teacher policy is itself already robust across varied physical conditions.
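
Per-episode randomization of this kind usually amounts to drawing a fresh parameter set at every reset. The sketch below shows the idea with the standard library only; every range is a placeholder chosen for illustration, and the `EpisodePhysics` container is a hypothetical name rather than anything from the RotateIt code.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EpisodePhysics:
    friction: float
    mass_kg: float
    com_offset_m: Tuple[float, float, float]
    pd_gain_scale: float


def sample_episode_physics(rng: random.Random) -> EpisodePhysics:
    """Draw one set of physics parameters; all ranges are placeholders."""
    return EpisodePhysics(
        friction=rng.uniform(0.3, 3.0),
        mass_kg=rng.uniform(0.01, 0.25),
        com_offset_m=(rng.uniform(-0.01, 0.01),
                      rng.uniform(-0.01, 0.01),
                      rng.uniform(-0.01, 0.01)),
        pd_gain_scale=rng.uniform(0.8, 1.2),
    )


# One draw per environment reset; a seeded generator keeps runs repeatable.
params = sample_episode_physics(random.Random(0))
```

Passing the generator in explicitly, rather than using the module-level `random` state, makes it straightforward to give each parallel environment its own seed.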

Student Policy: How to Tie Vision and Touch Together

To recap, the student policy has four kinds of input.

  1. Proprioception: joint angles q_t and the previous target joint angles q^{tgt}_{t-1}. About 32 dimensions.
  2. Vision: a partial point cloud from a depth camera mounted above the palm, downsampled to a few hundred points.
  3. Touch: binary contact signals distributed over several links of the hand; a 0/1 vector in which each contact pad reports only "touching" or "not touching".
  4. Goal: the rotation axis \hat{k} \in \mathbb{R}^3.

These are fed as tokens into a single transformer backbone. Each modality passes through its own encoder (an MLP or a PointNet-style network) to become a vector of a common dimension; the tokens are then stacked over time and passed through the transformer's self-attention.

Pseudocode: Student Forward Pass

def student_policy(depth_img, tactile_bin, proprio, axis_k):
    # Per-modality encoders
    pc_tokens   = pointnet_encoder(depth_to_pointcloud(depth_img))   # [N_pc, D]
    tac_tokens  = tactile_mlp(tactile_bin)                            # [N_tac, D]
    prop_token  = proprio_mlp(proprio)                                # [1, D]
    axis_token  = axis_mlp(axis_k)                                    # [1, D]
    
    # Concatenate as a token sequence and run transformer
    tokens = concat([pc_tokens, tac_tokens, prop_token, axis_token])
    
    # Temporal context: stack last T frames of tokens
    seq = stack_recent_frames(tokens, history=T)
    z = transformer(seq)
    
    # Read out action from a designated CLS-like token
    action = action_head(z[CLS])
    return action
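
The pseudocode above leaves the encoders, `T`, and `CLS` undefined. A runnable toy version, written with NumPy and random (untrained) weights, might look like the following. Everything here is an assumption made for illustration: the sizes, the single attention layer, the per-point linear map standing in for a PointNet, and the choice to pass points directly instead of a depth image.

```python
import numpy as np

# Illustrative sizes only; none of these come from the paper.
D, T, N_PC, N_TAC, N_JOINTS = 32, 4, 64, 16, 16
rng = np.random.default_rng(0)


def dense(in_dim: int, out_dim: int):
    """Random linear layer standing in for a trained encoder or head."""
    w = rng.normal(scale=in_dim ** -0.5, size=(in_dim, out_dim))
    return lambda x: x @ w


pointnet_encoder = dense(3, D)        # per-point embedding, no pooling
tactile_mlp = dense(1, D)             # one token per binary contact pad
proprio_mlp = dense(2 * N_JOINTS, D)  # joint angles and previous targets
axis_mlp = dense(3, D)
action_head = dense(D, N_JOINTS)
w_q, w_k, w_v = (rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3))
cls_token = rng.normal(size=(1, D))


def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention with a residual."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(D)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + weights @ v


def frame_tokens(points, tactile_bin, proprio, axis_k) -> np.ndarray:
    """Encode one time step into a [N_PC + N_TAC + 2, D] token matrix."""
    return np.concatenate([
        pointnet_encoder(points),           # [N_PC, D]
        tactile_mlp(tactile_bin[:, None]),  # [N_TAC, D]
        proprio_mlp(proprio[None, :]),      # [1, D]
        axis_mlp(axis_k[None, :]),          # [1, D]
    ])


def student_policy(history, axis_k) -> np.ndarray:
    """history: list of (points, tactile_bin, proprio) for the last T steps."""
    frames = [frame_tokens(p, t, q, axis_k) for p, t, q in history]
    seq = np.concatenate([cls_token] + frames)  # CLS token goes first
    z = self_attention(seq)
    return action_head(z[0])                    # read action from CLS slot


history = [(rng.normal(size=(N_PC, 3)),
            rng.integers(0, 2, size=N_TAC).astype(float),
            rng.normal(size=2 * N_JOINTS)) for _ in range(T)]
action = student_policy(history, np.array([0.0, 0.0, 1.0]))
```

The output has one entry per joint; a real implementation would add positional or modality embeddings, multiple layers, and trained weights.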

A few details are worth pointing out here.

Why a point cloud rather than an image? The depth image could be fed straight into a CNN, but converting it to a point cloud and processing it with a PointNet-style network is more robust to coordinate-frame alignment and viewpoint changes. As the Allegro Hand moves, the object's appearance from the camera viewpoint changes every frame, whereas a point cloud can be transformed into the hand's coordinate frame before use.
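
The conversion described here is standard pinhole back-projection followed by a rigid transform. The sketch below assumes a depth image in meters, known camera intrinsics (`fx`, `fy`, `cx`, `cy`), and a calibrated 4x4 camera-to-hand transform; the function names are hypothetical.

```python
import numpy as np


def depth_to_pointcloud(depth: np.ndarray, fx: float, fy: float,
                        cx: float, cy: float) -> np.ndarray:
    """Back-project a depth image (meters) to camera-frame 3D points."""
    v, u = np.nonzero(depth > 0)        # rows, cols of valid depth pixels
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # [N, 3]


def to_hand_frame(points_cam: np.ndarray, T_hand_cam: np.ndarray) -> np.ndarray:
    """Apply a 4x4 homogeneous transform mapping camera to hand coordinates."""
    ones = np.ones((len(points_cam), 1))
    homogeneous = np.concatenate([points_cam, ones], axis=1)
    return (homogeneous @ T_hand_cam.T)[:, :3]
```

Once the points are in the hand frame, the encoder sees the object in a consistent coordinate system regardless of where the camera is mounted.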

Why binary touch? High-resolution tactile images from sensors such as GelSight or DIGIT do carry far more information, yet RotateIt simplifies on purpose, for two reasons. (1) Rich touch is hard to reproduce faithfully in simulation: binary contact is nearly free in PhysX, whereas precise pressure distributions suffer a large sim-to-real gap. (2) Binary signals alone are enough to tell "which finger is touching where", and this contact-topology information is what matters most for in-hand rotation.

The effect of axis conditioning. The same network produces z-axis rotation when \hat{k} = (0,0,1) and x-axis rotation when \hat{k} = (1,0,0). It may look like a simple trick, but it makes a big difference in RL training. Training a separate policy per axis splits the data and fails to exploit representations that could be shared across axes. When one policy learns all axes, it shares the common "grasp and release" pattern and only fine-tunes per axis.
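
Mechanically, axis conditioning is just appending a unit vector to the observation. The sketch below shows one way to draw a random axis and attach it; sampling uniformly on the sphere is an assumption made for illustration (the review does not say how training axes are chosen), and both function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sample_unit_axis(rng: random.Random) -> List[float]:
    """Draw a direction uniformly on the unit sphere (normalized Gaussian)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(x * x for x in v))
        if n > 1e-8:  # reject the vanishingly rare near-zero draw
            return [x / n for x in v]


def condition_on_axis(observation: Sequence[float],
                      axis: Sequence[float]) -> List[float]:
    """Append the 3-D target axis to an observation vector."""
    return list(observation) + list(axis)
```

Fixing the axis to (0, 0, 1) or (1, 0, 0) instead of sampling recovers the two cases mentioned in the paragraph above.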

The distillation loss is based on behavior cloning.

\mathcal{L}_{\text{distill}} = \mathbb{E}_{s \sim \pi_{\text{student}}} \big[ \|\pi_{\text{teacher}}(s_{\text{priv}}) - \pi_{\text{student}}(s_{\text{obs}})\|^2 \big]

On trajectories generated by the student (as in DAgger), the student regresses onto the action the teacher would have taken in the same state. Because the student also experiences the states in which its own clumsiness leads to drops, it becomes robust to distribution shift.
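
To make the DAgger-style loop concrete, here is a deliberately tiny toy: a one-dimensional system, a linear student, and a fixed feedback law standing in for the teacher. None of it comes from the RotateIt code; it only shows the pattern of rolling out the student, relabeling visited states with teacher actions, and regressing on the aggregated data.

```python
import random


def teacher_action(state: float) -> float:
    """Stand-in for the privileged teacher: a fixed stabilizing feedback law."""
    return -0.8 * state


def dagger_distill(iterations: int = 50, horizon: int = 20,
                   lr: float = 0.05, seed: int = 0) -> float:
    """Return the learned gain w of the student policy action = w * state."""
    rng = random.Random(seed)
    w = 0.0
    dataset = []  # aggregated (state, teacher_action) pairs
    for _ in range(iterations):
        state = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            # The teacher labels the visited state...
            dataset.append((state, teacher_action(state)))
            # ...but the student's own action drives the rollout.
            state = state + w * state + rng.gauss(0.0, 0.01)
        # Squared-error regression over everything collected so far.
        for s, a_star in dataset:
            w -= lr * 2.0 * (w * s - a_star) * s
    return w


learned_gain = dagger_distill()
```

In this toy the teacher and student see the same state; in the setting described above the teacher would read privileged state while the student reads sensor observations of the same time step.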

Reward and Domain Randomization Details

We have already seen the teacher's reward, but a few details matter in practice.

Sign of the axis-alignment term. \boldsymbol{\omega}_t \cdot \hat{k} is signed, so spinning in the opposite direction yields a negative reward. The policy learns to rotate in the specified direction.

Drop penalty. If the object leaves the palm region (its height falls below some threshold), the episode is terminated with a large negative reward. The priority that not dropping the object comes first is thereby written into the reward.

Action smoothness. A term such as \|a_t - a_{t-1}\| is included to keep the actions from oscillating. It is a very important term when moving to real hardware.

The key domain randomization parameters can be summarized roughly as follows.

| Category | Parameters | Purpose |
| --- | --- | --- |
| Object | Mass, size, friction coefficient, center-of-mass offset | Generalization across objects |
| Hand | Joint PD gains, joint friction, joint-angle noise | Absorbing real-robot response characteristics |
| Dynamics | Disturbance torques and forces applied to the object | Absorbing small disturbances |
| Sensors | Depth noise, contact-signal dropout, observation delay | Absorbing sensor noise and latency |

The trickiest of these are tactile dropout and observation delay. In a real system, touch sensors occasionally miss signals, and the vision pipeline incurs latency on the order of 100 ms. Unless these are deliberately injected into the simulation during training, the policy collapses almost immediately on the real robot.
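
Injecting dropout and delay into simulated observations can be done with a small wrapper around the sensor readout. The sketch below is one possible shape for it, with a hypothetical class name; the dropout probability and the delay in control steps are free parameters (a 100 ms delay corresponds to a different number of steps depending on the control rate).

```python
import random
from collections import deque
from typing import Deque, List, Sequence


class NoisySensorModel:
    """Applies random contact dropout and a fixed observation delay."""

    def __init__(self, dropout_prob: float, delay_steps: int, seed: int = 0):
        self.dropout_prob = dropout_prob
        self.delay_steps = delay_steps
        self.rng = random.Random(seed)
        self.buffer: Deque[List[int]] = deque()

    def step(self, contacts: Sequence[int]) -> List[int]:
        # Dropout: each active contact is zeroed independently with prob. p.
        noisy = [0 if (c and self.rng.random() < self.dropout_prob) else int(c)
                 for c in contacts]
        self.buffer.append(noisy)
        # Delay: keep at most delay_steps + 1 frames and return the oldest.
        # Until the buffer fills, the first frame is repeated.
        if len(self.buffer) > self.delay_steps + 1:
            self.buffer.popleft()
        return list(self.buffer[0])
```

For example, `NoisySensorModel(dropout_prob=0.05, delay_steps=2)` would hand the policy contact readings that are two steps old and occasionally missing.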

Experimental Setup

The actual setup is as follows.

Hardware. An Allegro Hand (4 fingers, 16 degrees of freedom), a RealSense-class depth camera, and custom-built binary tactile pads on the hand. The hand is tested in several poses, including palm-up and facing sideways.

Simulator. IsaacGym, with thousands of parallel environments. Training takes on the order of days.

Object set. YCB-style shapes of many kinds: cubes, cylinders, spheres, asymmetric shapes, and so on, split into training and evaluation sets.

Evaluation metrics

  1. Mean rotation speed (rad/s), measured about the target axis.
  2. Mean time (or number of rotations) until a drop occurs.
  3. Rotation ability on unseen objects.
  4. Ability to track changes of axis.

Results: What Was Shown

Qualitatively, the paper's key results are these.

Arbitrary-axis rotation really works. The qualitative videos show the Allegro Hand rotating an object resting on the palm about each of the x, y, and z axes (and arbitrary axes in between) without interruption. Rotations other than about z, where gravity acts sideways relative to the rotation, are also carried out stably.

Shape generalization. Rotation speed does not drop much on new objects never seen in training. That it holds up on asymmetric objects (anything other than a ball or a cube) is especially important, because an object with an off-center mass produces a constantly changing torque pattern as it rotates.

Tracking axis changes. Even if the target axis \hat{k} is switched abruptly mid-rotation, the policy redirects the rotation to the new axis within a short time. This is evidence that multiple axes are integrated within a single policy.

Ablation: What Really Matters

The ablations are where this paper has the most to teach. Seeing what breaks when something is removed reveals what is essential.

| Condition | Result (rough trend) |
| --- | --- |
| Vision only, no touch | Drops become more frequent, especially on unseen objects |
| Touch only, no vision | Rotation speed decreases; center-of-mass estimation becomes unstable |
| Proprioception only | Performance falls sharply for rotations other than about z |
| Axis conditioning removed | Learns a single axis but cannot generalize |
| No domain randomization | Fails immediately on the real robot |

There are two key takeaways.

Vision and touch are complementary. Vision sees "overall shape and pose"; touch sees "what is in contact right now". With vision alone, the policy is blind to the parts occluded by the hand; with touch alone, it is hard to predict where the object is heading. The policy becomes stable only when the two are combined.

Axis conditioning is simple but decisive. All it does is add a 3-dimensional vector to the input, yet generalization changes qualitatively. This is a lesson in representation learning: one input that states the answer explicitly is sometimes more effective than a hundred hidden layers.

Strengths

Generality. It generalizes across rotation axes and object shapes at the same time. This is the first step on the road to every sub-problem of in-hand manipulation (rotating, translating, reorienting).

Practicality. It works with just one depth camera and binary touch, with no need for precision tactile sensing such as GelSight. For Allegro Hand users this is a very realistic setup.

Extensible architecture. A transformer backbone with token-based inputs makes it easy to slot in new modalities (for example audio, additional cameras, or force-torque sensing).

Reproducibility of the distillation pattern. The code and training recipe are reused in follow-up work (AnyRotate, DexNDM, and others); in other words, the methodology presented in this paper has become a community standard.

Weaknesses and Limitations

Limits of binary touch. A 0/1 signal cannot detect slip precisely. In the human hand, slip detection is central to regulating rotation speed, but with a binary signal slip can only be known after the fact. Fusing visual-tactile sensing such as DIGIT would likely allow faster rotation or the handling of smaller or more slippery objects.

Limits of position control. The policy outputs target joint angles, not impedance or force commands. It may be a poor fit for soft, deformable, or fragile objects.

Cost of sim-to-real. Domain randomization combined with multimodal training consumes a lot of GPU time, which makes it heavy to reproduce in a small lab.

Limited rotation speed. Judging by the demo videos, the rotation is not fast and falls well short of the speed at which a person spins a pen. This appears to be the combined result of hardware (the Allegro's dynamic limits), policy (conservative learning), and sensing (low resolution).

Mostly palm-up poses. Robustness with the hand facing sideways or held upside down is reported comparatively little. A person can produce rotation with the fingers alone however the hand is held.

A fixed rotation task. It is good at "keep rotating" about an axis, but "precisely aligning" to a specific pose is a separate problem. Follow-up work (for example, the general manipulation in Dexonomy) addresses this.

Map of Related Work

This paper is a branching point in in-hand manipulation research. Its position relative to neighboring work can be laid out as follows.

graph LR
    OpenAI[OpenAI Cube<br/>2019] --> HORA[HORA<br/>2022<br/>z-axis, proprio]
    HORA --> RotateIt[RotateIt<br/>2023<br/>any axis, vision+touch]
    RotateIt --> AnyRotate[AnyRotate<br/>continuous axis,<br/>tactile-focused]
    RotateIt --> DexNDM[DexNDM<br/>neural dynamics]
    RotateIt --> DextER[DextER<br/>extreme rotation]
    HORA --> DeXtreme[DeXtreme<br/>2022<br/>quaternion target]
    DeXtreme --> CTR[CTR-MPC<br/>contact-trajectory]

HORA (2022). Earlier work from the same group. Learns z-axis rotation from proprioception alone. The direct parent of RotateIt, which extends the same teacher-student pattern to vision plus touch.

DeXtreme (2022). Aligns a cube to a target pose (quaternion). Focuses on "precise alignment" rather than rotation "ability". Its reward design philosophy differs from RotateIt's.

AnyRotate. A follow-up that pushes arbitrary-axis rotation further with a tactile focus. Where RotateIt showed the fusion of vision and touch, AnyRotate explores what touch alone can do.

DexNDM, DexMimicGen. Extend general manipulation through training-data generation and neural dynamics. They use RotateIt's policy-learning pattern as a module.

CTR-MPC. Model-based contact-trajectory optimization. The camp that contrasts with RotateIt's RL approach. It is a direct point of comparison in that comparative studies are possible on the same Allegro Hand platform.

Implications for Allegro Hand Users

Here are a few insights that can be put to use right away in the lab.

You can start without rich tactile sensing. If attaching GelSight or DIGIT sensors to an Allegro V4/V5 is too much of a burden, the message reads as: custom-built binary contact pads (or contact estimation from motor current) are enough for RotateIt-level rotation.

A point-cloud representation is more sensible than RGB. The viewpoint of a camera mounted on the hand shakes constantly, and a point-cloud encoder is more robust to viewpoint changes than an RGB CNN. It pays to adopt a PointNet-style encoder and a hand-frame coordinate transform as a package.

Axis conditioning is almost free. Adding a 3-dimensional unit-vector input is enough for the policy to generalize. The idea applies beyond rotation to other in-hand tasks: target pose, target position, and target torque can all be conditioned on in the same way.

The real secret of domain randomization is the sensors. Almost every RL paper randomizes friction and mass. What truly makes the difference is simulating sensor noise, delay, and dropout, and these details deserve top priority in the training code.

Caveats when porting to Isaac Lab. There are two pitfalls when moving RotateIt's IsaacGym-based training recipe to Isaac Lab. (1) The defaults of the contact friction model (torsional and rolling friction in particular) differ. (2) PhysX solver settings and the number of substeps strongly affect rotation stability. Pitfalls of the same kind as the angular_damping difference found when porting HORA are quite likely to turn up again.

Intuitive Summary: What This Paper Really Showed

Finally, compressing what this paper teaches us into one line once more:

"What matters is not how rich the sensors are, but that the policy receives exactly what it needs."

RotateIt uses no GelSight, no 6-axis F/T sensor, and no high-frame-rate RGB. Just one depth camera above the hand, binary contact signals embedded in the fingers, and a 3-dimensional vector that says "rotate about this axis". That is all.

Instead, it is clever on the learning side. It builds a policy with a god's-eye view in simulation, then separately builds a student that reproduces the same behavior from inputs available in the real world. It puts the rotation axis explicitly into the input, opening up a dimension of generalization. And it uses domain randomization to cover for the flaws of simulated dynamics.

This is a model of good robotics research: before trying to solve a problem with hardware and sensors, solve as much of it as possible with representation and learning signal. The result shows that even a lightweight system can yield a policy that generalizes.

The questions to ask the next time you design an in-hand manipulation policy are clear. What inputs does my policy truly need? And can those inputs be produced consistently in simulation and in reality? RotateIt's answer was depth plus binary contact plus an axis vector. Our answer will depend on our task, but the questions themselves are the same.

Copyright 2026, JungYeon Lee