Curieux.JY
  • JungYeon Lee


📃 DexEXO

exoskeleton
dexterity
wearable
A Wearability-First Dexterous Exoskeleton for Operator-Agnostic Demonstration and Learning
Published

May 21, 2026

  • Project Page
  • Paper Link

🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

This paper tackles a bottleneck in scaling dexterous robot learning: high-quality demonstration data is hard to collect from diverse, heterogeneous operators. Existing wearable interfaces often chase kinematic fidelity at the cost of wearability and cross-user adaptability, and the embodiment mismatch between the demonstration stage and the deployment stage forces visual post-processing before policy learning.

To overcome these limits, the paper proposes DexEXO, a hand exoskeleton designed with wearability as the first priority. DexEXO aligns visual appearance, contact geometry, and kinematics at the hardware level. Its key features are:

  1. Wearability and cross-user adaptability:
    • Slider-Based Finger Interface: Each finger uses a passive spring-loaded linear slider that accommodates differences in finger length while preserving enough curl displacement. This decouples insertion depth from joint-axis alignment and makes donning more robust. Analytical modeling shows the design supports hand lengths from 140 mm to 217 mm, allowing the finger TPU ring to sit up to $\delta = 17$ mm above the finger webbing depending on the user's hand length. The minimum and maximum compatible hand lengths are $H_{min} = L_{min} / r$ and $H_{max} = (d_{max} - d_{curl} + \delta) / r$, where $L_{min}$ is the minimum distance between the TPU ring and the fingercot, $d_{max}$ is the maximum slider travel, $d_{curl}$ is the minimum free slider length needed for full finger flexion, and $r$ is the finger-length-to-hand-length ratio. This reduces the need for per-user fitting and calibration and enables scalable cross-user data collection.
    • Pose-Tolerant Thumb Mechanism: This mechanism preserves the natural workspace of the human thumb while keeping a consistent mapping to the robot thumb's degrees of freedom (DOFs). The exoskeleton thumb contains an instrumented IP joint $J_1$ (angle $\theta_1$); the passive thumb contains an IP joint $J_2$ (angle $\theta_2$) and a TM abduction/adduction joint $J_4$ (angle $\theta_4$), with $J_3$ mechanically coupled to $J_2$. The exoskeleton thumb connects to the passive thumb through two rigid linkages (a distal linkage and a metacarpal linkage). The architecture does not force rigid orientation alignment between the exoskeleton and the human thumb. It imposes only geometric distance constraints, so the exoskeleton can translate and rotate relative to the palm. The exoskeleton pose ${}^{B}T_{E}$ has 6 degrees of freedom and satisfies the holonomic distance constraints $\|{}^{B}r^{E}_{d} - {}^{B}r_{d}(q_p)\| = L_d$ and $\|{}^{B}r^{E}_{m} - {}^{B}r_{m}(q_p)\| = L_m$, where $L_d$ and $L_m$ are the lengths of the distal and metacarpal linkages. The remaining freedom shows up as a "wiggle space": the exoskeleton body can shift and rotate relative to the palm without changing the passive thumb pose.
  2. Embodiment alignment:
    • DexEXO integrates a passive hand that visually matches the deployed robot hand (OYMotion ROH-AP001), so the wrist-mounted camera view during demonstration matches the view on the robot hand. This removes the visual mismatch between collection and inference and allows policies to be learned directly from raw wrist-mounted RGB observations, with no segmentation or other visual post-processing.

Data collection and policy learning:

Data is collected from six analog encoders embedded in the exoskeleton mechanism (finger joint positions), an iPhone-based AR tracking system (6-DOF end-effector pose), and a wrist-mounted Intel RealSense camera (RGB images). All sensor streams are synchronized. Policy learning uses a diffusion policy whose observations are RGB images from the wrist-mounted camera and, optionally, a low-dimensional hand state. A DINOv2 ViT-S/14 encoder extracts visual features from the RGB images, and these features serve as the policy's main conditioning signal. The policy outputs a 16-step action prediction horizon, and the first 8 actions are executed in a receding-horizon fashion.

Results:

In the user study, DexEXO showed better comfort and usability than an existing wearable system (DexUMI) and teleoperation: it was the only interface that completed the scissors-cutting task, its piano-task success rate was 54.5% higher, and its completion time was 16.6% faster. Wearers also reported higher finger independence, better physical comfort, and lower frustration with the exoskeleton design. The policy evaluation shows that hardware-level geometric and visual alignment enables effective end-to-end policy learning without visual post-processing, and that raw RGB observations alone carry enough state information to make explicit hand-state conditioning unnecessary. Compared with DexUMI's best configuration (with segmentation and inpainting), DexEXO reached a similar success rate without either step.

In short, by prioritizing wearability and hardware-level embodiment alignment, DexEXO offers an effective way to reduce both the human and the algorithmic bottlenecks in dexterous robot learning without sacrificing task performance.

🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

Getting Started: If the Hand Doesn't Fit, There Is No Data

Teaching dexterous manipulation ultimately comes down to a fight over data. Collecting contact-rich motion data on a multi-fingered hand is nothing like collecting data for a parallel-jaw gripper. There are many degrees of freedom, occlusion is routine, and contacts are constantly made and broken. The moment something is strapped onto the hand, users complain of discomfort, and because hand size and proportions vary, a device fitted to one person ends up loose or tight on another.

The question this paper asks is simple. "Isn't the real bottleneck of a dexterous demo-collection device wearability rather than accuracy?" And one more. "Even if the kinematics match, if the images the camera sees do not match too, won't policy learning still need post-processing?" DexEXO is an attempt to solve both problems at once, at the hardware level. And the essence of its solution is surprisingly simple: "Don't try to fit. Absorb the misalignment."

This post takes a close look at DexEXO, presented by the RoMeLa and PRG groups at UCLA, from the viewpoint of a roboticist who works on multi-fingered dexterous manipulation. If you are thinking about data collection on the Allegro Hand or a similar robot-hand platform, I believe the design philosophy of this paper can influence not just your choice of tools but your research direction.

The Problem Landscape: Three Paths to Demonstrations

There are broadly three ways to collect dexterous manipulation data.

First, simulation and video. Quantity is abundant. Internet video is close to unlimited, and simulation can be run in parallel almost without bound. But capturing contact forces and fine finger-object interaction realistically is still hard. The split-second dynamics of a fingernail slipping as a finger picks up a sheet of paper are not captured on video and are often missing from simulators.

Second, teleoperation. Because the data is produced directly in the robot's control space, it is the most accurate. However, teleoperating a multi-fingered hand is slow and unintuitive, and without enough haptic feedback the operator struggles to tell what the hand is doing in contact-rich tasks.

Third, wearable devices (gloves, exoskeletons). They can map human hand motion directly onto the robot hand's degrees of freedom, which reduces retargeting ambiguity. But this path splits again into two branches.

flowchart TD
    A[Demo collection methods] --> B[Simulation/Video]
    A --> C[Teleoperation]
    A --> D[Wearable devices]
    B --> B1[Scale: near-unlimited]
    B --> B2[Contact fidelity: low]
    C --> C1[Matches control space]
    C --> C2[Loses speed and intuitiveness]
    D --> E[Data gloves]
    D --> F[Exoskeletons]
    E --> E1[Correspondence problem]
    F --> G[DexUMI: vision-based alignment]
    F --> H[DexOP: rigid-linkage alignment]
    F --> I[DexEXO: pose-tolerant + Passive Hand]
    G --> G1[Needs segmentation/inpainting]
    H --> H1[Tied to one robot hand, limited ergonomics]
    I --> I1[Learns directly from raw RGB]

DexUMI pursues a lightweight exoskeleton plus vision-based restoration. But because the exoskeleton's shape follows the robot hand's proportions, it does not adapt to the variety of human hand sizes, and because the hand the camera sees differs from the real robot hand, the data must pass through segmentation and inpainting before training. DexOP ties human-hand and robot-hand kinematics together tightly with rigid linkages. Accuracy is good, but the device is locked to one specific robot hand geometry and cannot be used with another hand.

DexEXO does not sit between these two. It takes a position that is effectively orthogonal to both. The core idea is: "Do not force kinematic alignment; deliberately leave degrees of freedom that accept the misalignment."

Core Idea: Don't Fit, Absorb

Human hands differ in both length and proportion. When a rigid exoskeleton tries to line up its joint axes with the joint axes of the human hand, it fits some people well and hurts others. Per-user calibration can make it fit, but it kills the scalability of data collection. The cost of adjusting the equipment for every person, every time, is too high.

DexEXO's idea is to flip this thinking. Rather than forcing a kinematic match, it leaves intentional degrees of freedom (residual DoF) between the human hand and the exoskeleton. These degrees of freedom locally absorb anatomy that differs from person to person. Two elements are key.

  • On the fingers, a spring-loaded slider absorbs variation in finger length.
  • On the thumb, a pose-tolerant coupling made of two linkages (distal + metacarpal) lets the exoskeleton body float freely relative to the palm.

On top of this there is one clever trick: the Passive Hand, a "dummy shaped like the robot hand" that hangs off the human hand. This passive hand is visually almost identical to the robot hand used at deployment (ROHand). What the wrist-mounted camera sees becomes the same at demonstration time and at inference time. The policy can then learn straight from raw RGB, and segmentation, masking, and inpainting can all be skipped.

ํ•˜๋“œ์›จ์–ด ์„ค๊ณ„ ๊นŠ์ด ๋“ค์—ฌ๋‹ค๋ณด๊ธฐ

์‹œ์Šคํ…œ ๊ฐœ์š”

DexEXO๋Š” ์„ธ ๋ถ€๋ถ„์œผ๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค.

  1. ๋งํ‚ค์ง€ ๊ตฌ๋™ ์™ธ๊ณจ๊ฒฉ(linkage-driven wearable exoskeleton), (2) ์ˆ˜๋™ ๋ฐ๋ชจ ํ•ธ๋“œ(passive demonstration hand), (3) ๋ฌด์„  ๋™์ž‘์„ ์œ„ํ•œ ์˜จ๋ณด๋“œ ์„ผ์‹ฑ/์ „๋ ฅ ๋ชจ๋“ˆ์ž…๋‹ˆ๋‹ค. Passive hand๋Š” 6 ์ž์œ ๋„์˜ OYMotion ROH-AP001 (ROHand) ๊ธฐํ•˜๋ฅผ ๋”ฐ๋ฆ…๋‹ˆ๋‹ค. ROHand๋Š” ์—„์ง€์— 2์ž์œ ๋„(IP flexion/extension, TM abduction/adduction), ๋‚˜๋จธ์ง€ ๋„ค ์†๊ฐ€๋ฝ์— ๊ฐ๊ฐ 1์ž์œ ๋„์˜ ๊ตด๊ณก(flexion)์„ ๊ฐ€์ง‘๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‹ˆ๊นŒ ์ด 6 ์ž์œ ๋„์ž…๋‹ˆ๋‹ค.
[Dorsal-mounted electronics module]
      |
      v
[Exoskeleton structure (linkage)]
      |
      +-- 4-finger parallel four-bar linkage -> [Passive finger]
      |   (parallel motion transmission)
      |
      +-- 2-link distal+metacarpal coupling -> [Passive thumb]
      |   (pose-tolerant)
      |
      v
[Wrist-mounted RealSense RGB + iPhone (AR pose)]

The beauty of this structure is that the exoskeleton floats "loosely" over the human hand while the hand's motion is transmitted through the linkages to the passive hand. And the camera looks at the passive hand. The human hand is either barely in the camera's field of view or, when it is, sits behind the exoskeleton and can be ignored.

Slider-Based Finger Interface

Each finger has a spring-loaded linear slider with a compliant fingercot made of TPU (thermoplastic polyurethane) at its end. The fingercot is a kind of soft thimble that wraps the fingertip. The slider is designed to guarantee the same curl displacement even though finger length varies between users. The key is decoupling insertion depth from joint-axis alignment. Whether the finger is long or short, the slider soaks up the difference.

The paper analyzes the geometric limits of this slider explicitly. Using the middle finger as the reference, the following relations bound the compatible range.

$$L_{max} = d_{max} - d_{curl}, \quad MFL_{max} = L_{max} + \delta$$

$$H_{min} = \frac{L_{min}}{r}, \quad H_{max} = \frac{MFL_{max}}{r}$$

Here $L_{min}$ is the minimum distance between the TPU ring and the fingercot at rest, $d_{max}$ is the maximum slider travel, $d_{curl}$ is the slider margin needed for full flexion, $\delta$ is the allowed offset by which the TPU ring may float above the webbing between the fingers, and $r$ is the middle-finger-to-hand-length ratio (about 0.39 to 0.40).

Read intuitively, the equations say this: "The slider absorbs variation in finger length, but only up to a limit. That limit is the slider's physical travel ($d_{max}$) minus the minimum margin needed for flexion ($d_{curl}$). Adding the amount by which the ring is allowed to float above the webbing ($\delta$) gives the usable finger length, and dividing by the body proportion ($r$) gives the usable overall hand length."

Plugging in the paper's values ($L_{min} = 56$ mm, $d_{max} = 86$ mm, $d_{curl} = 16$ mm, $\delta = 17$ mm, $r = 0.40$) gives $H_{min} = 56 / 0.40 = 140$ mm and $H_{max} = (86 - 16 + 17) / 0.40 \approx 217$ mm. In other words, hand lengths from 140 to 217 mm are all covered without calibration. All 14 participants in the user study (hand lengths 165 to 195 mm) fell inside this range.
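As a quick check of the arithmetic above, the range can be reproduced in a few lines of Python. This is only a sketch of the formulas as stated in this post; the function name `compatible_hand_lengths` and its argument names are made up for illustration and do not come from the paper.

```python
def compatible_hand_lengths(l_min, d_max, d_curl, delta, r):
    """Return (H_min, H_max) in mm from the slider geometry described above."""
    l_max = d_max - d_curl        # longest finger the slider travel can absorb
    mfl_max = l_max + delta       # add the allowed ring offset above the webbing
    return l_min / r, mfl_max / r


# Values quoted in the post (mm, except the dimensionless ratio r)
h_min, h_max = compatible_hand_lengths(
    l_min=56.0, d_max=86.0, d_curl=16.0, delta=17.0, r=0.40
)
print(f"H_min = {h_min:.1f} mm, H_max = {h_max:.1f} mm")

# Hand-length range of the 14 study participants
participants = (165.0, 195.0)
print("participants covered:", h_min <= participants[0] and participants[1] <= h_max)
```

The exact upper bound comes out at 217.5 mm, which matches the 217 mm quoted above after rounding down.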

This simple analysis has a big implication: the compatible hand-size range can be controlled at hardware design time with a one-line formula. That is fundamentally different from approaches that compensate after the fact with stretchy fabric or machine learning.

Pose-Tolerant Thumb Mechanism: The Cleverest Part

The thumb is the trickiest digit of the hand. Through abduction, adduction, and opposition it faces the other fingers and arguably carries more than half of in-hand manipulation. Yet the location of the thumb joints varies a great deal between people. If the exoskeleton is built rigidly to coincide with the human thumb's joint axes, it hurts some people and restricts motion for others.

DexEXO's answer is interesting. It does not force orientation; it constrains only distance. Two links, a distal link and a metacarpal link, connect the exoskeleton thumb to the passive thumb. Each link has a swivel joint (a freely rotating joint) at both ends. This structure keeps the link lengths constant while leaving the exoskeleton free to rotate and translate relative to the palm.

The paper formalizes this in $SE(3)$. The pose of the exoskeleton frame $\{E\}$ relative to the palm-base frame $\{B\}$ is written as

$${}^{B}T_{E} = \begin{bmatrix} {}^{B}R_{E} & {}^{B}p_{E} \\ 0 & 1 \end{bmatrix} \in SE(3)$$

Let the passive thumb configuration be $q_p = [\theta_2, \theta_4]^\top$ (IP angle, TM ab/ad angle), with $\theta_3 = f(\theta_2)$ mechanically coupled, and let the two attachment points $\{{}^{B}r_d, {}^{B}r_m\}$ be computed from the passive thumb kinematics. The exoskeleton-side attachment points are constant vectors $\{{}^{E}\bar{r}_d, {}^{E}\bar{r}_m\}$ in frame $\{E\}$, and moving them into $\{B\}$ gives

$${}^{B}r^E_i = {}^{B}R_E \, {}^{E}\bar{r}_i + {}^{B}p_E, \quad i \in \{d, m\}$$

The holonomic constraints imposed by the two links are simply distance equalities.

$$\|{}^{B}r^E_d - {}^{B}r_d(q_p)\| = L_d, \quad \|{}^{B}r^E_m - {}^{B}r_m(q_p)\| = L_m$$

The point is what these two equations do not do. They do not enforce orientation alignment. As long as each pair of attachment points stays the prescribed distance apart, the constraint is satisfied.

Now for the degree-of-freedom arithmetic. ${}^{B}T_E$ has 6 degrees of freedom. The distance constraints are two scalar holonomic constraints. So with the passive thumb pose $q_p$ held fixed, the exoskeleton pose ${}^{B}T_E$ is generically free to move on a 4-dimensional self-motion manifold.

$$\text{Residual DoF} = 6 - 2 = 4$$

These 4 degrees of freedom are what the paper calls the "wiggle space." If the user keeps the thumb in nearly the same pose and slightly shakes or twists the exoskeleton body over the palm, the passive thumb holds its pose. Put differently, the 4-dimensional self-motion manifold absorbs small wobbles of the human thumb and person-to-person differences in where the thumb base sits.
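The 6 - 2 = 4 count can be checked numerically by linearizing the two distance constraints around a pose and looking at the rank of the resulting Jacobian. The sketch below assumes NumPy and uses made-up attachment points; none of the numbers come from the paper. A small twist $[\omega, v]$ of $\{E\}$ in $\{B\}$ moves an attachment point by $v + \omega \times a_i$, so each constraint contributes the row $[(a_i \times u_i)^\top, u_i^\top]$, where $u_i$ is the unit vector along link $i$.

```python
import numpy as np


def constraint_jacobian(R, p, r_bar, r_passive):
    """2x6 Jacobian of the two link-length constraints w.r.t. a twist [omega, v] of {E} in {B}."""
    rows = []
    for rb, rp in zip(r_bar, r_passive):
        a = R @ rb                                # attachment offset from the exo origin, in {B}
        x = a + p                                 # exo-side attachment point in {B}
        u = (x - rp) / np.linalg.norm(x - rp)     # unit vector along the link
        rows.append(np.concatenate([np.cross(a, u), u]))
    return np.array(rows)


# Hypothetical geometry in metres, chosen only for illustration
R = np.eye(3)
p = np.array([0.0, 0.0, 0.03])
r_bar = [np.array([0.04, 0.0, 0.0]), np.array([0.0, 0.02, 0.0])]           # exo-side points in {E}
r_passive = [np.array([0.05, 0.01, 0.0]), np.array([0.0, 0.03, -0.01])]    # passive-thumb points in {B}

J = constraint_jacobian(R, p, r_bar, r_passive)
rank = int(np.linalg.matrix_rank(J))
print("constraint rank:", rank, "residual DoF:", 6 - rank)
```

For generic geometry the rank is 2, leaving a 4-dimensional null space of pose perturbations that do not change either link length. Degenerate configurations (for example, both links collinear through the same point) could lower the rank, which is why the text says "generically."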

This is more than plain compliance. In geometric terms, it is a design that "constrains only the information that is needed and releases the rest." It is a clean instance of the "constrain only what you must" principle often seen in robotics.

Passive Hand: So the Camera Doesn't Lie

The passive hand is a part built to follow the ROHand's outer shape as is. The hand seen by the wrist-mounted RealSense camera is the passive hand at demonstration time and the real ROHand at inference time. Because the two are visually almost identical, the embodiment gap disappears at the image level.

DexUMI tried to solve this problem in the learning stage. It used segmentation to cut out the exoskeleton and inpainting to synthesize a robot hand in its place. DexEXO handles all of that post-processing in hardware, because the picture the camera sees is right from the start.

The difference looks simple on the surface, but it affects the whole policy learning pipeline. There is no dependence on segmentation quality, no visual artifacts from inpainting, and the training data is related to deployment by real-to-real equivalence rather than separated from it by a sim-to-real style gap.

Data Collection and Policy Learning Pipeline

The data flow can be summarized as follows.

[1 kHz] 6 analog encoders (finger joints) -- onboard MCU --
                                                          \
                                                           +--> Host PC
                                                          /        |
[60 Hz] iPhone AR (6-DoF EE pose via TeleDex) -----------/         |
                                                                   |
[30 Hz] Intel RealSense RGB (wrist-mounted, 640x480) --------------+
                                                                   |
                                                                   v
                          Time-synchronized via video timestamps
                                                                   |
                                                                   v
                 [DINOv2 ViT-S/14 encoder] + (optional 6D finger state)
                                                                   |
                                                                   v
                            [Diffusion Policy backbone]
                                                                   |
                                                                   v
                  12-D action (6 EE delta + 6 finger commands)
                  horizon=16, execute first 8, receding horizon
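The diagram says the streams are time-synchronized via the video timestamps but does not spell out how. One common way to do this, shown here purely as an assumption and not as the authors' method, is to pick for every video frame the nearest sample from each faster stream. The helper name `nearest_indices` and the synthetic one-second timestamps are illustrative.

```python
import numpy as np


def nearest_indices(source_t, query_t):
    """For each query timestamp, return the index of the closest source timestamp.

    source_t must be sorted in increasing order.
    """
    idx = np.searchsorted(source_t, query_t)
    idx = np.clip(idx, 1, len(source_t) - 1)
    left, right = source_t[idx - 1], source_t[idx]
    return np.where(query_t - left <= right - query_t, idx - 1, idx)


# Synthetic timestamps (seconds) at the rates shown in the diagram
video_t = np.arange(0.0, 1.0, 1 / 30)       # 30 Hz RGB frames
encoder_t = np.arange(0.0, 1.0, 1 / 1000)   # 1 kHz finger encoders
pose_t = np.arange(0.0, 1.0, 1 / 60)        # 60 Hz AR pose

enc_idx = nearest_indices(encoder_t, video_t)
pose_idx = nearest_indices(pose_t, video_t)
print("worst encoder offset (s):", np.max(np.abs(encoder_t[enc_idx] - video_t)))
```

With nearest-sample matching, the worst-case offset is half the sampling period of the faster stream, so the 1 kHz encoders contribute at most about 0.5 ms of misalignment per frame.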

Finger positions are measured at 1 kHz by six analog encoders embedded in the exoskeleton. The encoder values do not become robot-hand actuator commands directly, though. Because the ROHand's actuation is nonlinear, the same physical poses are sampled on both the exoskeleton and the ROHand, and these waypoints are used to build a piecewise linear interpolation that maps one to the other. This is a common form of retargeting.
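A piecewise linear waypoint mapping of this kind can be sketched with `numpy.interp`. The waypoint values below are invented for illustration (one finger, arbitrary units); the real calibration table is not given in this post.

```python
import numpy as np

# Hypothetical waypoints for one finger: the same physical poses sampled on both devices
exo_waypoints = np.array([0.0, 0.5, 1.0, 1.5])          # exoskeleton encoder readings (increasing)
robot_waypoints = np.array([0.0, 20.0, 55.0, 100.0])    # matching ROHand actuator commands


def exo_to_robot(encoder_value):
    """Map an encoder reading to an actuator command by piecewise linear interpolation."""
    return np.interp(encoder_value, exo_waypoints, robot_waypoints)


print(exo_to_robot(0.75))   # halfway between the 20.0 and 55.0 waypoints
print(exo_to_robot(2.0))    # readings past the last waypoint are clamped to it
```

`numpy.interp` clamps inputs outside the waypoint range to the end values, which is a reasonable default here because it keeps commands inside the range that was actually sampled.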

The 6-DoF end-effector pose is captured with ARKit-based tracking on an iPhone (the TeleDex app). Strapping an iPhone to the wrist is a clever idea: it makes in-the-wild data collection possible without an expensive motion capture system.

Visual observations come from the wrist-mounted RealSense RGB camera (640×480, 30 Hz). For training, images are resized to 240×240, randomly cropped to 224×224, and color-jittered. The encoder is DINOv2 ViT-S/14, which brings in strong self-supervised pretrained visual features as is. A 6D finger state can also be added as an auxiliary input (the difference with and without it is examined in the ablation in Section V-C).
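The augmentation steps above (resize to 240×240, random crop to 224×224, color jitter) can be sketched with NumPy alone. The nearest-neighbour resize, the brightness-only jitter, and the 10% jitter range are simplifying assumptions made here; the post does not state which interpolation or jitter parameters the authors use.

```python
import numpy as np


def preprocess(image, rng, resize=240, crop=224, jitter=0.1):
    """Resize an HxWx3 uint8 frame, take a random crop, and apply a brightness jitter."""
    h, w, _ = image.shape
    rows = np.arange(resize) * h // resize            # nearest-neighbour row indices
    cols = np.arange(resize) * w // resize            # nearest-neighbour column indices
    resized = image[rows][:, cols].astype(np.float32) / 255.0
    top, left = rng.integers(0, resize - crop + 1, size=2)
    patch = resized[top:top + crop, left:left + crop]
    gain = 1.0 + rng.uniform(-jitter, jitter)          # assumed jitter: brightness only
    return np.clip(patch * gain, 0.0, 1.0)


rng = np.random.default_rng(0)
frame = np.full((480, 640, 3), 128, dtype=np.uint8)   # stand-in for one 640x480 RGB frame
out = preprocess(frame, rng)
print(out.shape)
```

In a real training loop this would normally be replaced by the image transforms of the deep learning framework in use; the sketch only makes the shapes and the order of operations concrete.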

The policy is a diffusion policy (Chi et al., RSS 2023). It predicts a 16-step action sequence at once and executes the first 8 steps in a receding-horizon manner. Actions are expressed relative to the start of the horizon ($T_k - T_0$). The 12-dimensional action consists of 6 DoF for the end effector and 6 DoF for the fingers.
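The predict-16, execute-8 loop can be sketched as below. `dummy_policy` is a placeholder that returns zeros instead of running a diffusion model, and `get_obs` / `apply_action` are hypothetical callbacks standing in for the camera and the robot interface. The relative-action helper follows the $T_k - T_0$ form written above, applied to a plain vector representation as a simplification.

```python
import numpy as np

HORIZON, EXECUTE, ACTION_DIM = 16, 8, 12


def to_relative(abs_chunk, start):
    """Express each step relative to the horizon start (T_k - T_0)."""
    return abs_chunk - start


def dummy_policy(obs):
    """Placeholder for the diffusion policy: returns a (16, 12) chunk of relative actions."""
    return np.zeros((HORIZON, ACTION_DIM))


def rollout(policy, get_obs, apply_action, num_chunks):
    """Receding-horizon execution: predict 16 steps, run the first 8, then replan."""
    executed = 0
    for _ in range(num_chunks):
        chunk = policy(get_obs())
        for action in chunk[:EXECUTE]:
            apply_action(action)
            executed += 1
    return executed


log = []
steps = rollout(dummy_policy, get_obs=lambda: None, apply_action=log.append, num_chunks=3)
print("executed steps:", steps)
```

Discarding the last 8 predicted steps and replanning from a fresh observation trades some compute for the ability to react to changes that happened during execution.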

One thing is worth noting here. The DexEXO pipeline has no segmentation, masking, or inpainting. Raw RGB goes into the policy as is. This is possible thanks to hardware-level visual alignment. As the learning pipeline gets simpler, the errors that used to accumulate in visual post-processing also go away.

Experiments and Results

Measuring the Wiggle Space: Does the Theory Hold?

In theory a 4-dimensional self-motion manifold exists, but what happens in practice? The paper measures it with motion capture. While a pinch pose is held, reflective markers are attached to the base and to the exoskeleton thumb linkage, and the user makes small natural adjustments for 25 seconds. The point cloud traced by the exoskeleton relative to the base is the wiggle space.

The distribution of this point cloud is summarized by its covariance.

$$\Sigma = \frac{1}{N-1} \sum_{i=1}^{N} (p_i - \bar{p})(p_i - \bar{p})^\top$$

From the eigenvalues $\lambda_i$ of $\Sigma$, the ellipsoid semi-axis lengths are computed as $a_i = k\sqrt{\lambda_i}$ (with $k = 2$, described as the 95% confidence region). The measured result was an ellipsoid with semi-axes of 66.12 mm, 49.19 mm, and 21.14 mm.
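The covariance-ellipsoid summary is easy to reproduce for any marker point cloud. The sketch below assumes NumPy and runs on synthetic Gaussian points, not the paper's motion-capture data; the standard deviations of 30, 20, and 10 mm are arbitrary.

```python
import numpy as np


def ellipsoid_semi_axes(points, k=2.0):
    """Semi-axis lengths k * sqrt(lambda_i) of the covariance ellipsoid, largest first."""
    centered = points - points.mean(axis=0)
    sigma = centered.T @ centered / (len(points) - 1)   # sample covariance, as in the formula above
    eigvals = np.linalg.eigvalsh(sigma)                 # ascending order for symmetric matrices
    return k * np.sqrt(np.clip(eigvals[::-1], 0.0, None))


rng = np.random.default_rng(0)
points = rng.normal(size=(5000, 3)) * np.array([30.0, 20.0, 10.0])   # synthetic positions in mm
axes = ellipsoid_semi_axes(points)
print(np.round(axes, 1))
```

With these synthetic standard deviations and $k = 2$, the semi-axes come out near 60, 40, and 20 mm, which is the same kind of summary as the three measured values quoted above.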

What these numbers imply is clear. Even when the exoskeleton moves freely by more than 6 cm along its longest axis relative to the palm, the passive thumb pose stays almost unchanged. Given that a human thumb is roughly 6 cm long, the same exoskeleton works even if thumb position differs between people by about a thumb's length. This is the empirical basis for the claim that cross-user demonstration collection is possible without calibration.

User Study: The Truth the Scissors Revealed

Fourteen participants (7 male, 7 female, ages 18 to 27, hand lengths 165 to 195 mm) performed four tasks with three devices. The devices were DexEXO, DexUMI, and vision-based teleoperation (TeleDex). The tasks were:

  • Scissors cutting: pick up scissors and cut tape
  • Page flipping: turn a notebook page with the fingertips
  • Cup stacking: stack 3 cups facing upward
  • Piano playing: play 16 notes with 4 fingers

Results in table form (120 s time limit, mean ± SEM):

| Method | Scissors success | Scissors time (s) | Page success | Page time (s) | Cup success | Cup time (s) | Piano success | Piano time (s) |
|---|---|---|---|---|---|---|---|---|
| DexEXO | 0.79 ± 0.10 | 11.7 ± 1.4 | 0.88 ± 0.03 | 5.4 ± 0.6 | 0.82 ± 0.07 | 12.0 ± 1.1 | 0.96 ± 0.02 | 21.6 ± 1.8 |
| DexUMI | 0.00 ± 0.00 | n/a | 0.86 ± 0.04 | 4.7 ± 0.7 | 0.80 ± 0.07 | 8.9 ± 1.0 | 0.62 ± 0.13 | 25.9 ± 2.5 |
| Teleoperation | 0.00 ± 0.00 | n/a | 0.51 ± 0.06 | 18.0 ± 2.1 | 0.33 ± 0.09 | 68.6 ± 13.1 | 0.60 ± 0.09 | 97.4 ± 7.8 |

The most interesting entry in this table is scissors cutting. Only DexEXO succeeds. For DexUMI, the paper reports that the exoskeleton adds external geometry not present in the robot hand's shape, so the fingers could not fit into the scissor handle holes. Teleoperation failed for lack of precision, responsiveness, and force feedback. This one result says a lot. When the exoskeleton's shape differs from the robot hand's shape, the person's very ability to handle tools is impaired. That is a problem that comes before data fidelity.

For page flipping and cup stacking, DexUMI is faster (by 13.0% and 25.8%, respectively). DexEXO's success rate is slightly higher, though, so there is a trade-off.
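The percentages quoted for completion time can be recomputed from the mean times in the table. The helper name `pct_faster` is illustrative; the inputs are the table's mean values, and the SEM is ignored.

```python
def pct_faster(slow, fast):
    """Relative time saving of the faster method, as a percentage of the slower time."""
    return 100.0 * (slow - fast) / slow


page = pct_faster(slow=5.4, fast=4.7)      # page flipping: DexEXO 5.4 s vs DexUMI 4.7 s
cup = pct_faster(slow=12.0, fast=8.9)      # cup stacking: DexEXO 12.0 s vs DexUMI 8.9 s
piano = pct_faster(slow=25.9, fast=21.6)   # piano: DexUMI 25.9 s vs DexEXO 21.6 s

print(f"page: {page:.1f}%  cup: {cup:.1f}%  piano: {piano:.1f}%")
```

The first two values reproduce the 13.0% and 25.8% advantages for DexUMI, and the third reproduces the 16.6% faster piano completion time for DexEXO mentioned in the summary at the top of this post.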

Piano playing shows a large gap: in the table, DexEXO reaches a success rate of 0.96 against 0.62 for DexUMI. In tasks where finger independence matters, the exoskeleton's design philosophy is decisive. In the subjective ratings, DexEXO also led on finger independence with $p \ll 0.01$. Physical comfort ($p = 0.0127$) and reduced frustration ($p = 0.0219$) showed statistically significant differences as well.

It is also notable that teleoperation sits at or near the bottom on every task. It fails the scissors task, has clearly lower success on page flipping and cup stacking, and on piano playing it roughly matches DexUMI's success rate while taking almost four times as long. The result confirms once more that vision-based teleoperation is a weak tool for collecting dexterous manipulation data.

Policy Evaluation: Is Raw RGB Alone Enough?

The paper trains diffusion policies on block pick-and-place, egg carton arrangement, and bottle manipulation tasks and shows the roll-outs (Figure 7). The central claim is that embodiment-aligned RGB alone (that is, without segmentation or inpainting) yields competitive performance. The learning pipeline becomes much simpler while task performance is maintained or improved.

The ablations compare vision-only input against vision plus finger state (proprioception), and compare action representations (absolute vs. relative), among others. Detailed numbers are tabulated in Section V-C of the paper, but the core message is consistent: when the hardware aligns the visual domain, there is less work left for the algorithm.

Critical Discussion

Strengths

First, the clarity of the "don't fit, absorb" design philosophy. Reducing kinematic alignment and deliberately leaving degrees of freedom is counterintuitive, but it is close to the only way to accommodate user diversity. Creating a 4-dimensional self-motion manifold with two holonomic constraints, then measuring it quantitatively, is clean.

Second, simplification of the learning pipeline through hardware-level visual alignment. A single passive hand removes segmentation, masking, and inpainting altogether. On the surface this looks like "adding one corrective part," but in terms of how trustworthy the visual domain is for policy learning it makes a big difference.

Third, the differentiator shown by the scissors task. The fact that a person cannot even grasp a tool when the exoskeleton's shape differs from the robot hand's shape is an aspect other papers rarely address. It could become a new criterion for evaluating wearable exoskeletons.

Fourth, the simplicity of the analysis and the directness of the validation. The compatible hand-size range is derived in two lines of formulas, and all 14 participants are shown to fall within it. The wiggle space is measured with motion capture and summarized with a covariance ellipsoid. The evidence is simple but decisive.

Fifth, the breadth of the 14-person user study and the appropriateness of the baselines. Comparing DexUMI and teleoperation directly on the same tasks, and including NASA-TLX-based subjective ratings, adds to the credibility.

์•ฝ์ ๊ณผ ํ•œ๊ณ„

์ฒซ์งธ, ROHand 6์ž์œ ๋„๋ผ๋Š” ์ž‘์€ ํƒ€๊ฒŸ์— ์ข…์†์ ์ž…๋‹ˆ๋‹ค. Allegro Hand(16์ž์œ ๋„)๋‚˜ LEAP Hand(16์ž์œ ๋„)์ฒ˜๋Ÿผ ์ž์œ ๋„๊ฐ€ ๋” ๋งŽ์€ ์†์—์„œ ๊ฐ™์€ ์„ค๊ณ„ ์ฒ ํ•™์ด ํ†ตํ• ์ง€๋Š” ์ถ”๊ฐ€ ๊ฒ€์ฆ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์ž์œ ๋„๊ฐ€ ๋Š˜์–ด๋‚˜๋ฉด abduction-adduction(AA) ์ œ์–ด๊ฐ€ ๋ณธ๊ฒฉ์ ์œผ๋กœ ํ•„์š”ํ•œ๋ฐ, ๋ณธ ์„ค๊ณ„๋Š” calibration-free๋ฅผ ์œ„ํ•ด AA joint๋ฅผ ์˜๋„์ ์œผ๋กœ ๋บ์Šต๋‹ˆ๋‹ค. ํ›„์† ์ž‘์—…(WHED ๋ผ์ธ์—…, DEX-Mouse ๋“ฑ ๋™์ผ ๊ทธ๋ฃน/๊ด€๋ จ ๊ทธ๋ฃน ์ž‘์—…)์—์„œ flexure ์„ผ์„œ๋กœ lateral motion์„ ์ถ”๊ฐ€ํ•˜๊ฒ ๋‹ค๊ณ  ์–ธ๊ธ‰๋˜์ง€๋งŒ, DexEXO ์ž์ฒด๋Š” AA ๋ถ€์žฌ๋ผ๋Š” ํ‘œํ˜„๋ ฅ ์†์‹ค์„ ์•ˆ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

Second, the evaluation of bimanual tasks and of precision tasks performed with a held tool is limited. Scissor cutting is the key case, but quantitative evaluation is lacking for tasks that need two-handed coordination (for example, opening a bottle cap with both hands) and for tasks that demand true in-hand dexterity, such as in-hand re-orientation.

Third, tactile sensing is missing. The claim that a diffusion policy can be trained from raw RGB is a strong one, but vision alone has limits in contact-rich tasks. Compared with concurrent work such as MILE (which integrates fingertip visuotactile sensing) or PolyTouch (a tactile-diffusion policy), the absence of tactile input is a clear weakness. Integrating a vision-based tactile sensor such as DIGIT or GelSight into the fingertips of the passive hand looks like the natural next step.

Fourth, the reliance on a single wrist-mounted RGB viewpoint. A wrist camera suffers from occlusion and a narrow field of view, so the setup is hard to apply as-is to tasks that need a third-person view or multi-view fusion. DexCap, which combines mocap gloves with a third-person view, makes a useful point of contrast.

Fifth, the limits of “calibration-free.” A hand-length range of 140 to 217 mm is broad, but variation in hand width, thickness, and palm-to-finger ratio is not addressed explicitly. Whether a user can don the device the same way every time (repeatability) is also hard to verify fully with a 14-participant study.

Sixth, the end-effector pose depends on iPhone AR. ARKit's accuracy and drift are adequate indoors, but they can become a problem in long-horizon tasks or in metallic and reflective environments. This is an explicit weakness for in-the-wild data collection.

Comparison with Related Work

flowchart LR
    subgraph Simulation Video
        A1[Real2Render2Real]
        A2[Hand-object pretrain]
    end
    subgraph Teleoperation
        B1[DexPilot]
        B2[OmniH2O]
        B3[GELLO]
        B4[Mobile ALOHA]
    end
    subgraph Glove Vision
        C1[DexCap]
        C2[Doglove]
        C3[TeleDex]
    end
    subgraph Exoskeleton
        D1[DexUMI: visual alignment post-processing]
        D2[DexOP: rigid linkage]
        D3[MILE: tactile integration]
        D4[Tilde: delta hand]
        D5[WHED: prior work]
        E[DexEXO: pose tolerance + Passive Hand]
    end
    D1 --> E
    D2 --> E
    D5 --> E

Compared with DexUMI, DexEXO closes the visual gap with hardware rather than with an algorithm. DexUMI's segmentation-inpainting pipeline tends to accumulate visual artifacts and boundary errors, and generalizing across lighting conditions and backgrounds becomes a weak point. DexEXO sidesteps the problem at its source.

Compared with DexOP, DexEXO makes a deliberate concession. DexOP couples human and robot hand kinematics 1:1 through rigid linkages to maximize motion fidelity, and in exchange it is locked to one specific robot hand. DexEXO gives up a little motion fidelity (it accepts the self-motion manifold) and gains cross-user scalability and ergonomic comfort in return.

MILE takes an orthogonal direction. It integrates fingertip visuotactile sensing into the exoskeleton fingertips to capture rich contact information. DexEXO prioritizes visual alignment and does not handle touch. Combining the two approaches (for example, a miniature GelSight sensor on the passive hand's fingertips) would make a very capable system.

WHED (arXiv 2602.17908, February 2026), from the same UCLA group, appears to be DexEXO's direct predecessor. WHED also emphasizes a wearability-first design with a pose-tolerant thumb, but visual alignment through a passive hand is the distinguishing feature that DexEXO introduces in earnest.

The comparison with UMI (Universal Manipulation Interface) is also interesting. UMI was highly successful as a hand-held interface for parallel-jaw grippers. DexEXO is an attempt to carry that spirit (in-the-wild, portable, embodiment-aligned) into the multi-finger domain.

Insights for Roboticists

From the standpoint of a researcher working with multi-joint hand platforms, the paper's implications can be summarized as follows.

First, “exact alignment” is not always the right goal. To handle user diversity, a design that intentionally frees some degrees of freedom is more robust. Even on a 16-DoF hand such as the Allegro Hand, there are many tasks where matching the fingertip workspace matters more than aligning the base joints. It helps to design the human-to-robot mapping explicitly in terms of which information is enforced and which is left free.

Second, hardware-level visual alignment is a large, practical source of leverage. Removing the learning pipeline's post-processing with a single passive hand is highly cost-effective compared with the effort spent narrowing the sim-to-real gap in simulation. When collecting demonstrations with an Allegro Hand, too, it is worth checking whether the wrist-mounted camera footage matches what the camera sees at deployment.

Third, the combination of DINOv2, Diffusion Policy, and relative actions works on multi-joint tasks with raw RGB. The strong self-supervised features of DINOv2 ViT-S/14 extract enough signal even from raw observations. This also has implications for designing demonstration-collection pipelines for training VLAs (RT-2, OpenVLA, π0, and others): the choice of visual encoder and the alignment of the visual domain matter as much as alignment post-processing does.
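The exact action parameterization is not spelled out in this review, so the following is only a sketch of one common convention for relative actions: each future end-effector pose in an action chunk is re-expressed in the frame of the current pose. The function names and the 4x4 homogeneous-matrix representation are assumptions made for illustration.

```python
def invert_rigid(T):
    """Inverse of a 4x4 rigid transform [R t; 0 1], i.e. [R^T, -R^T t; 0 1]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def matmul4(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def to_relative_actions(current_pose, future_poses):
    """Express each future world-frame pose in the frame of current_pose."""
    current_inv = invert_rigid(current_pose)
    return [matmul4(current_inv, T) for T in future_poses]

# Toy example: the hand frame is rotated 90 degrees about world z,
# then translated 1 unit along world y with the orientation unchanged.
current = [[0.0, -1.0, 0.0, 0.0],
           [1.0,  0.0, 0.0, 0.0],
           [0.0,  0.0, 1.0, 0.0],
           [0.0,  0.0, 0.0, 1.0]]
future = [[0.0, -1.0, 0.0, 0.0],
          [1.0,  0.0, 0.0, 1.0],
          [0.0,  0.0, 1.0, 0.0],
          [0.0,  0.0, 0.0, 1.0]]
relative = to_relative_actions(current, [future])[0]
```

In the toy example the relative transform is a pure translation of 1 unit along the hand's own x-axis, regardless of where the hand sits in the world frame. That independence from the global frame is the usual motivation for relative actions.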

Fourth, the value of “scissor cutting” as a task choice in the user study. It is a task that reveals how the exoskeleton's shape affects tool use. When evaluating a demonstration-collection device, including tasks where a tool held in the hand has to do the work, and not only tasks done by the hand itself, exposes the effect of the exoskeleton's bulk.

Fifth, the concept of “wiggle space” may generalize. Deliberately creating a self-motion manifold instead of forcing rigid alignment is an idea that applies beyond hand exoskeletons, for example to whole-body humanoid teleoperation and wearable haptic interfaces. It can be extended into a general design principle: choose the number and type of holonomic constraints on purpose in order to control the dimension of the residual DoF.
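The counting argument behind this principle can be sketched in a few lines: for n configuration coordinates and a set of holonomic constraints, the residual (self-motion) dimension is n minus the rank of the constraint Jacobian. The sketch assumes a 6-coordinate relative pose, and the constrained coordinates are chosen purely for illustration; which coordinates the real device constrains is not stated here.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    m = [list(map(float, r)) for r in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == len(m):
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank

def residual_dof(constraint_jacobian, n_coords):
    """Dimension of the self-motion manifold: n minus independent constraints."""
    return n_coords - matrix_rank(constraint_jacobian)

# Hypothetical coordinates (x, y, z, roll, pitch, yaw) with two constraints
# that pin z and pitch. Each row is the gradient of one constraint.
J_two = [[0, 0, 1, 0, 0, 0],
         [0, 0, 0, 0, 1, 0]]
# A third row that merely rescales the first adds no new constraint.
J_redundant = J_two + [[0, 0, 2, 0, 0, 0]]

dof_two = residual_dof(J_two, 6)
dof_redundant = residual_dof(J_redundant, 6)
```

Under these assumptions two independent constraints leave four free dimensions, which is consistent with the 4-dimensional manifold discussed above. The redundant case shows why the rank matters and not the raw number of constraints.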

Closing

DexEXO can be summed up in one sentence: “Do not force kinematic agreement; guarantee visual agreement in hardware.” These two principles deliver two practical gains at once: absorbing user diversity and simplifying the learning pipeline.

The paper's larger message is this: the real bottleneck in demonstration collection is not the algorithm but whether people can use the device for long periods, comfortably, and accurately. That perspective applies to every researcher trying to secure dexterous manipulation data. Before writing a more accurate retargeting algorithm, first check that the exoskeleton does not catch on the tool's handle.

Open questions remain. Does the same philosophy hold for a 16-DoF hand? How should the passive hand design change once tactile sensing is added? How should wrist pose tracking be strengthened to extend to bimanual use? If follow-up work answers these questions, DexEXO's philosophy has a good chance of outgrowing a single device and becoming a standard design principle for collecting dexterous manipulation data.

For dexterous-skill learning in the era of large-scale data, this work is a reminder that the cleverest algorithmic trick is often best solved with a hardware trick.

Copyright 2026, JungYeon Lee