
📃 Dexonomy Review

grasp
taxonomy
mujoco
Synthesizing All Dexterous Grasp Types in a Grasp Taxonomy
Published

February 24, 2026

  • Paper Link
  • Project Link
  • Code Link
  1. 🤝 This work proposes a new pipeline that efficiently synthesizes contact-rich, penetration-free, physically plausible dexterous grasps from a single human-annotated template.
  2. 📊 With this pipeline the authors build a large-scale dataset of 9.5M grasps covering 10.7k objects and 31 grasp types from the GRASP taxonomy. In simulation it substantially outperforms earlier type-unaware baselines.
  3. 🤖 On this dataset they train a type-conditional generative model that produces a desired grasp type from a single-view object point cloud alone, and it reaches an 82.3% success rate in the real world.

🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

This paper proposes Dexonomy, a pipeline that efficiently synthesizes every dexterous grasp type defined in the GRASP taxonomy. Generalizable dexterous grasping is a fundamental skill for intelligent robots that must interact flexibly with their environment, and the paper aims to overcome the restriction of existing automatic grasp synthesis methods to specific grasp types or object categories. To address the difficulty of building a large, high-quality grasp dataset, it presents a pipeline that generates contact-rich, penetration-free, physically plausible dexterous grasps from only one human-annotated template per hand and grasp type.

Core Methodology

The proposed pipeline consists of two main stages:

  1. Lightweight Global Alignment stage: the object pose is sampled and optimized to match the hand contact information of the selected grasp template (hand contact points $p^h_i$ and normals $n^h_i$). Unlike earlier methods, the hand pose stays fixed and only the object pose is adjusted.
    • Sampling: A random grasp template is chosen, then a random hand contact point from that template. Next, a random object and a random point on its surface are chosen. The object is initialized by aligning the sampled hand and object contact points with their contact normals pointing in opposite directions. The object scale and the in-plane rotation about the normal are sampled at random. A large batch of samples can be processed in parallel on a single GPU, so this step is cheap.
    • Optimization: The optimization variables are the object transformation: scale $s_o \in \mathbb{R}$, rotation $R_o \in \mathrm{S}^3$, and translation $t_o \in \mathbb{R}^3$. For each hand contact point $p^h_i$, the nearest point $p^o_i$ on the object surface is computed with the Warp library. The object pose is then optimized by minimizing the following energy, which penalizes mismatch between hand and object contacts:

      $$L = k_p \sum_{i=1}^m \|p^h_i - p^o_i\|^2 + k_n \sum_{i=1}^m \|n^h_i - n^o_i\|^2$$

      where $k_p$ and $k_n$ are hyperparameters.
    • Post-optimization filtering: After optimization, results are filtered by four criteria:
      • The final energy $L$ must be below a threshold.
      • There must be no severe penetration between hand and object. This is detected efficiently with a hand collision skeleton parameterized by line segments.
      • The object contact quality (measured by Eq. 6 in Section III) must pass a threshold.
      • A process similar to farthest point sampling removes redundant object transformations.
  2. Simulation-based Local Refinement stage: with the object fixed, the hand pose is refined locally to improve hand-object contact. In MuJoCo, a virtual force $f_i$ is applied at each hand contact point $p^h_i$, directed toward the nearest point $p^o_i$ on the object. These virtual forces are converted into hand joint torques $\tau$ through a simplified transposed Jacobian control:

      $$f_i = k_f (p^h_i - p^o_i), \quad \tau = \sum_{i=1}^m J^T_{h,i} f_i$$

      where $k_f$ is a hyperparameter and $J^T_{h,i} \in \mathbb{R}^{q \times 3}$ is the transpose of the hand contact Jacobian. The process relies on MuJoCo's second-order Newton optimizer and reaches submillimeter-level contact convergence.
    • Penetration prevention: To keep hand and object strictly penetration-free, a 1 mm contact margin is set in MuJoCo. When the hand comes within 1 mm of the object surface a repulsive force is applied, which keeps the contact distance in the 0-2 mm range.
    • Post-optimization filtering: After optimization, results are filtered by three criteria:
      • There must be no penetration between hand and object (checked with collision meshes).
      • Every finger with an annotated contact must touch the object.
      • Grasp quality (measured by Eq. 6 in Section III) must pass a threshold.
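The "similar to farthest point sampling" deduplication step can be pictured as a greedy selection that keeps a sample only if it is far enough from everything already kept. The sketch below is an illustration under assumptions, not the paper's implementation: the helper name `select_diverse`, the use of plain Euclidean distance between object translations (ignoring rotation and scale), and the `min_dist` threshold are all made up for the example.

```python
import math

def select_diverse(translations, min_dist):
    """Greedy farthest-point-style selection over object translations.

    Hypothetical helper: returns indices of samples that are mutually
    at least `min_dist` apart, so near-duplicates are dropped.
    """
    if min_dist <= 0:
        raise ValueError("min_dist must be positive")
    if not translations:
        return []
    kept = [0]  # seed with the first sample
    # nearest[i] = distance from sample i to its closest kept sample
    nearest = [math.dist(t, translations[0]) for t in translations]
    while True:
        far = max(range(len(translations)), key=lambda i: nearest[i])
        if nearest[far] < min_dist:
            break  # every remaining sample is a near-duplicate
        kept.append(far)
        for i, t in enumerate(translations):
            nearest[i] = min(nearest[i], math.dist(t, translations[far]))
    return kept

samples = [(0.0, 0.0, 0.0), (0.001, 0.0, 0.0), (0.1, 0.0, 0.0), (0.1, 0.001, 0.0)]
print(select_diverse(samples, min_dist=0.01))  # prints [0, 3]
```

A real version would need a distance that also accounts for rotation and scale; translation-only distance is used here just to keep the idea visible.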

Grasp Quality Metric (Section III):

The paper uses a force-closure metric as its grasp quality measure, in a formulation that unifies the metrics used in earlier work. When an object $O$ is grasped by a robot hand with $m$ contact points, each contact $i$ has a contact position $p_i \in \mathbb{R}^3$, an inward-pointing unit surface normal $n_i \in \mathbb{R}^3$, and two unit tangent vectors $d_i, c_i \in \mathbb{R}^3$ with $n_i = d_i \times c_i$. The Coulomb friction cone $F_i$ and the contact Jacobian $J_{o,i}$ with respect to the object are:

$$F_i = \{x_i \in \mathbb{R}^3 \mid 0 \leq x_{i,1} \leq 1,\ x_{i,2}^2 + x_{i,3}^2 \leq \mu^2 x_{i,1}^2 \}$$

$$J^T_{o,i} = \begin{pmatrix} n_i & d_i & c_i \\ p_i \times n_i & p_i \times d_i & p_i \times c_i \end{pmatrix} \in \mathbb{R}^{6 \times 3}$$

where $\mu$ is the friction coefficient. For an external wrench $g \in \mathbb{R}^6$ (for example, the object's gravity), the optimal contact forces $\{f_i\}_{i=1}^m$ are obtained by solving the following quadratic program (QP):

$$(f_1, \dots, f_m) = \arg \min_{(x_1, \dots, x_m)} \left\| \sum_{i=1}^m J^T_{o,i} x_i - g \right\|^2 \quad \text{s.t.} \quad x_i \in F_i,\ i \in \{1, \dots, m\}, \quad \sum_{i=1}^m x_{i,1} \geq \lambda$$

where $\lambda$ is a hyperparameter that enforces a minimum total normal force. The friction cone is approximated by a pyramid for computational efficiency. The final grasp quality metric $e$ is defined as:

$$e = \left\| \sum_{i=1}^m J^T_{o,i} f_i - g \right\|^2$$

A lower $e$ indicates a more stable grasp.
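To make the metric concrete, here is a small sketch that evaluates $e$ for hand-picked contact forces on a two-finger pinch. It does not solve the QP; it only applies $J^T_{o,i}$ to given forces and checks the friction-cone condition. The function names, the contact geometry, and the value $\mu = 0.6$ are assumptions chosen for the example.

```python
def cross(a, b):
    """3D cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def in_friction_cone(x, mu):
    """x = (normal, tangent_d, tangent_c) force components in the contact frame."""
    return 0.0 <= x[0] <= 1.0 and x[1] ** 2 + x[2] ** 2 <= (mu * x[0]) ** 2

def contact_wrench(p, n, d, c, x):
    """Apply J^T_{o,i} to x: the 6D wrench (force, torque) on the object."""
    force = tuple(x[0] * n[k] + x[1] * d[k] + x[2] * c[k] for k in range(3))
    return force + cross(p, force)

def residual(contacts, forces, g):
    """e = || sum_i J^T_{o,i} x_i - g ||^2 for the given contact forces."""
    total = [0.0] * 6
    for (p, n, d, c), x in zip(contacts, forces):
        total = [t + w for t, w in zip(total, contact_wrench(p, n, d, c, x))]
    return sum((t - gk) ** 2 for t, gk in zip(total, g))

# Two opposing contacts on the x axis; each frame satisfies n = d x c.
contacts = [
    ((-0.05, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
    ((0.05, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)),
]
# Squeeze with unit normal force and push up along +z through friction.
forces = [(1.0, 0.0, 0.5), (1.0, 0.5, 0.0)]
g = (0.0, 0.0, 1.0, 0.0, 0.0, 0.0)  # target wrench: unit force along +z

e = residual(contacts, forces, g)
feasible = all(in_friction_cone(x, mu=0.6) for x in forces)
print(e, feasible)
```

With these forces the squeeze components cancel, the two friction components add up to the target wrench, and the torques cancel, so the residual is zero while both forces stay inside the cone. Lowering `mu` to 0.4 makes the same forces infeasible, which is exactly the situation where the QP would have to settle for a nonzero $e$.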

Simulation Validation and Template Construction:

Synthesized grasps are validated in MuJoCo to assess their stability. The proposed contact-aware control strategy uses Eq. 3 (with $g = 0$) to compute the force required at each contact and converts it to joint torques through transposed Jacobian control. A grasp counts as successful if the object stays stably held for 2 seconds in simulation under six orthogonal external forces. Successful grasps are turned into new grasp templates and added to the template library, where later iterations can use them.
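The success criterion can be sketched as a loop over six axis-aligned force directions. This is only a schematic of the test logic: `simulate` is a hypothetical callback standing in for a physics rollout, the force magnitude is left as a parameter, and treating the six forces as independent trials that must all pass is an assumption about how the criterion is applied.

```python
def six_axis_forces(magnitude):
    """Forces along +x, -x, +y, -y, +z, -z with the given magnitude."""
    forces = []
    for axis in range(3):
        for sign in (1.0, -1.0):
            f = [0.0, 0.0, 0.0]
            f[axis] = sign * magnitude
            forces.append(tuple(f))
    return forces

def grasp_succeeds(simulate, magnitude, duration=2.0):
    """simulate(force, duration) -> True if the object stays held.

    `simulate` is a hypothetical stand-in for a simulator rollout.
    """
    return all(simulate(f, duration) for f in six_axis_forces(magnitude))

# Toy stand-in: a grasp that only fails when pulled straight down.
def weak_against_down(force, duration):
    return force[2] >= 0.0

print(grasp_succeeds(lambda f, t: True, magnitude=1.0))
print(grasp_succeeds(weak_against_down, magnitude=1.0))
```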

Dexonomy Dataset:

Using the proposed pipeline, the authors build a large-scale dataset for the Shadow hand that covers 31 grasp types from the GRASP taxonomy. It consists of 10.7k object assets (5,697 from DexGraspNet and 5,000 from Objaverse) and 9.5M successful grasp data points. Each data point contains three key poses: the grasp pose, a pre-grasp pose (generated with a 2 cm contact margin for collision-free motion planning), and a squeeze pose (derived from the control signal used in simulation validation).

Type-Conditional Grasp Generative Model:

To generate grasps from partial observations for real-world deployment, the paper proposes a type-conditional generative model. The model takes a single-view object point cloud and a type feature $f^i_t$ selected from a grasp-type codebook. The point cloud is encoded into a vision feature $f_v$ by a Sparse3DConv network. $f_v$ and $f_t$ are concatenated to form the conditional feature $f_c$. A Mobius normalizing flow conditioned on $f_c$ maps random samples from the base distribution to a grasp pose $(R_g, T_g)$ and computes a probability $p$ that indicates pose quality. The predicted grasp pose is concatenated with $f_c$ and passed through an MLP to predict the pre-grasp pose $(R_p, T_p)$ and three hand joint configurations $(q_p, q_g, q_s)$. The model is trained end to end.

Experiments and Results:

  • Type-unaware grasp synthesis comparison: Against existing analytical methods such as DexGraspNet, FRoGGeR, SpringGrasp, and BODex, the proposed method achieves the highest Grasp Success Rate on the Allegro hand (60.50%), a strong Contact Link Number (4.38), and low Contact Distance Consistency (0.21 mm), Penetration Depth (0.00 mm), and Self-Penetration Depth (0.00 mm). It also clearly outperforms BODex under harder benchmark conditions, such as higher object mass and lower friction coefficient, which demonstrates generalizability to in-the-wild objects.
  • Type-aware grasp synthesis: The paper reports statistics for power, intermediate, and precision grasp types and shows that the pipeline synthesizes elaborate contact-rich grasps.
  • Per-module ablation study: The optimization and post-filtering in Global Alignment, the optimization in Local Refinement, and the strategy of constructing new grasp templates all contribute positively to overall performance. The template-addition strategy in particular makes the pipeline much more robust to noise or variation in the initial templates.
  • Learning-based grasp synthesis: The type-conditional model trained on the Dexonomy dataset outperforms the baselines in simulation when generating grasps from single-view object point clouds.
  • Real-world experiments: The trained type-conditional model attempts 12 grasp types on 13 diverse objects and reaches an 82.3% success rate, showing that it can synthesize a requested grasp type effectively.

Applications and Limitations:

The proposed algorithm can support an efficient annotation UI for collecting semantic grasp data: with a few clicks that specify contact points on the object and a grasp type, a user can synthesize high-quality grasps. The limitations are that some grasp types are unsuitable or unstable for certain objects, that the work focuses on static grasp pose synthesis and lacks trajectory generation for dynamic grasping, and that it is restricted to single-object grasps.


🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

"From a single 'recipe' in which a human shows how each finger should hold an object, 9.5 million grasps were produced."

Introduction: Why Does This Paper Matter?

The Core Problem: Why Robot Hands Are Still Clumsy

From picking up a toothbrush in the morning to holding a pencil and writing, turning a key, or gripping an apple, we humans switch effortlessly between many distinct grasp types, hundreds of times a day. The fact that we never notice doing it is itself a sign of how capable the human hand is.

How would robotics reproduce this? Most existing approaches have concentrated on a single grasp type, usually the "power grasp" (wrapping the object tightly). The reason is simple: the data is hard to produce. A five-fingered robot hand has more than 20 degrees of freedom (DoF), the contact conditions are nonlinear, and generating physically plausible grasps without penetration is an extremely difficult optimization problem.

GRASP Taxonomy: The Periodic Table of Human Grasps

The GRASP Taxonomy compiled by Feix et al. in 2016 is a kind of "periodic table" that classifies human grasps systematically. Its 33 grasp types are organized by the following criteria:

  1. Opposition Type: Pad, Palm, Side
  2. Virtual Finger assignment: which fingers act together as one unit
  3. Power / Precision / Intermediate: whether the grasp prioritizes force or precision
  4. Thumb Position: where the thumb is placed

| Category | Description | Examples |
| --- | --- | --- |
| Power Grasp | Wraps the object with the palm and all fingers | Large Diameter (#1), Medium Wrap (#2) |
| Intermediate Grasp | Between Power and Precision | Lateral Tripod (#10), Thumb-2 Finger (#12) |
| Precision Grasp | Holds the object delicately with the fingertips | Tip Pinch (#17), Palmar Pinch (#18) |

Existing automatic grasp synthesis methods, such as DexGraspNet and BODex, either cover only a subset of these or are type-unaware, meaning they do not distinguish grasp types at all. Dexonomy proposes the first general-purpose pipeline that can synthesize all 31 of the GRASP Taxonomy types it targets.

Key Contributions

The contributions of this paper (RSS 2025) can be summarized in three points:

  1. A general grasp synthesis pipeline: applicable to any grasp type, any object, and any articulated hand, while requiring only one human-annotated template per hand and grasp type.
  2. The Dexonomy dataset: 10,700 objects and 31 grasp types, for a total of 9.5 million physics-based grasps.
  3. A type-conditional generative model: generates a requested grasp type from a single-view point cloud, with an 82.3% real-world success rate.

flowchart TB
    A["🤚 Human-annotated template<br/>(one per grasp type)"] --> B["⚡ Global Alignment<br/>(GPU-parallel optimization)"]
    B --> C["🔧 Local Refinement<br/>(MuJoCo simulation)"]
    C --> D{"✅ Simulation<br/>Validation"}
    D -->|success| E["📚 Dexonomy Dataset<br/>9.5M grasps"]
    D -->|success| F["🔄 Add as new template<br/>to expand the library"]
    F --> B
    D -->|failure| G["❌ Discard"]
    E --> H["🧠 Train type-conditional<br/>generative model"]
    H --> I["🤖 Real-world grasping<br/>82.3% success rate"]

    style A fill:#4CAF50,color:white
    style E fill:#2196F3,color:white
    style I fill:#FF9800,color:white


Method: From One Recipe to Millions of Grasps

The core idea of Dexonomy is surprisingly intuitive. Here is an analogy:

Imagine you are a chef with a single recipe that says "with this hand shape, an object of this size should be touched like this." When a new object arrives, you (1) first fit the object to the hand by adjusting the object, and (2) then fine-tune the hand so that it can actually hold the object.

This two-stage design is the heart of the Dexonomy pipeline.

Stage 1: Grasp Template Library

The pipeline starts from a grasp template. Each template contains the following information:

  • Hand joint angles ($\mathbf{q} \in \mathbb{R}^{n_\text{dof}}$): the state of each joint of the hand
  • Contact points ($\mathbf{p}_i \in \mathbb{R}^3$): the points on the hand surface that should touch the object
  • Contact normals ($\mathbf{n}_i \in \mathbb{R}^3$): the force direction at each contact point

The important point is that a person only has to create one template per grasp type. For the Shadow Hand and 31 grasp types, that means just 31 initial templates. From then on, the pipeline automatically generates new templates from successful grasps and expands the library, a kind of self-amplification mechanism.

Stage 2: Global Alignment

Key question: "How do we fit this object to the hand template?"

Traditional approaches try to fit the hand to the object. Dexonomy turns this around: it optimizes the object's position and orientation so that the object fits the hand template.

Why do it this way? An object pose lives in SE(3), which is 6-dimensional (3 for translation, 3 for rotation), whereas the joint space of the hand has more than 20 dimensions. A 6-dimensional optimization is far faster and more stable.

Concretely, the object pose $\mathbf{T} = (\mathbf{R}, \mathbf{t})$ is optimized to minimize the following energy:

$$E_\text{align}(\mathbf{T}) = \sum_{i=1}^{K} \left[ \lambda_d \cdot d(\mathbf{p}_i, \text{Surf}(\mathcal{O}, \mathbf{T}))^2 + \lambda_n \cdot (1 - \mathbf{n}_i \cdot \mathbf{n}_{\text{obj},i})^2 \right]$$

where:

  • $d(\mathbf{p}_i, \text{Surf}(\mathcal{O}, \mathbf{T}))$: the distance from the hand contact point $\mathbf{p}_i$ to the transformed object surface
  • $\mathbf{n}_i \cdot \mathbf{n}_{\text{obj},i}$: the dot product of the hand contact normal and the object surface normal (the degree of alignment)
  • $K$: the total number of contact points

Put simply, the object is moved so that "its surface comes close to the contact points specified on the hand, and the normal directions at those points agree." Thousands of these optimizations run in parallel on the GPU, so the stage is very fast.
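As a rough sketch, the energy can be evaluated in a few lines once the nearest object surface point and its normal are known for each hand contact. The function name, the default weights, and the assumption that the closest-point query has already been done are all mine. The formula is zero when each dot product equals 1, so the object normals here are taken to point the same way as the hand contact normals.

```python
def alignment_energy(hand_pts, hand_normals, obj_pts, obj_normals,
                     lam_d=1.0, lam_n=1.0):
    """E_align for precomputed nearest surface points and their normals.

    obj_pts[i] is assumed to be the closest object surface point to
    hand_pts[i]; all normals are assumed to be unit length.
    """
    energy = 0.0
    for p, n, q, m in zip(hand_pts, hand_normals, obj_pts, obj_normals):
        dist_sq = sum((a - b) ** 2 for a, b in zip(p, q))
        dot = sum(a * b for a, b in zip(n, m))
        energy += lam_d * dist_sq + lam_n * (1.0 - dot) ** 2
    return energy

hand_pts = [(0.0, 0.0, 0.0), (0.02, 0.0, 0.0)]
hand_normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]

# Perfect fit: surface points coincide and normals agree.
print(alignment_energy(hand_pts, hand_normals, hand_pts, hand_normals))

# Object surface 1 cm away from the first contact, normals still aligned.
shifted = [(0.0, 0.0, 0.01), (0.02, 0.0, 0.0)]
print(alignment_energy(hand_pts, hand_normals, shifted, hand_normals))
```

In the actual pipeline this energy would be differentiated with respect to the object pose and minimized in large batches; the sketch only shows what is being minimized.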

Two additional filters are applied at this stage:

  • Severe penetration detection: the hand is simplified to straight line segments (a skeleton) so that intersections with the object mesh can be checked quickly
  • Contact quality filter: checks minimum thresholds on contact point distance and normal agreement

Stage 3: Local Refinement

Even after Global Alignment, the contact between hand and object is not perfect, because every object has a different shape. In this stage the hand joints are fine-tuned inside a MuJoCo simulation.

This is where the most technically elegant part of Dexonomy appears: contact-based refinement using transposed Jacobian control.

Earlier methods designed complicated objective functions and custom optimizers. Dexonomy instead uses the dynamics of the physics simulator directly:

  1. Define virtual forces at the contact points on the hand surface, directed toward the object surface
  2. Convert these forces into joint torques: $\boldsymbol{\tau} = \mathbf{J}^\top \mathbf{f}$
  3. Let MuJoCo's physics simulation press the hand naturally against the object

This approach has several advantages:

  • Penetration is suppressed at the source: the physics engine handles collisions
  • Simple to code: no need to design a complicated custom energy function
  • Rich contact: the simulator resolves the contact dynamics naturally
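The force-to-torque step is easy to try on a toy model. The sketch below uses a planar two-link finger instead of a real hand and does not touch MuJoCo at all; the link lengths, the gain `k_f`, and the function names are assumptions. The virtual force is taken to point from the fingertip toward the target point on the object, as the steps above describe.

```python
import math

def fingertip(q, lengths):
    """Forward kinematics of a planar two-link finger."""
    q1, q2 = q
    l1, l2 = lengths
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))

def jacobian(q, lengths):
    """2x2 Jacobian of the fingertip position w.r.t. the joint angles."""
    q1, q2 = q
    l1, l2 = lengths
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def joint_torque(q, lengths, p_obj, k_f):
    """tau = J^T f with a spring-like virtual force toward p_obj."""
    tip = fingertip(q, lengths)
    f = [k_f * (p_obj[0] - tip[0]), k_f * (p_obj[1] - tip[1])]
    J = jacobian(q, lengths)
    return [J[0][j] * f[0] + J[1][j] * f[1] for j in range(2)]

# Straight finger along x, target slightly above the fingertip.
tau = joint_torque(q=(0.0, 0.0), lengths=(1.0, 1.0), p_obj=(2.0, 0.1), k_f=10.0)
print(tau)
```

Both torques come out positive, which rotates the joints counter-clockwise and moves the fingertip up toward the target. In the real pipeline the simulator integrates these torques while enforcing contact constraints, which is what keeps the fingers from passing through the object.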

flowchart LR
    subgraph GA["Stage 2: Global Alignment"]
        A1["Sample object pose T<br/>(thousands in parallel)"] --> A2["Optimize contact distance +<br/>normal alignment"]
        A2 --> A3["Penetration check +<br/>quality filter"]
    end

    subgraph LR["Stage 3: Local Refinement"]
        B1["Place hand + object<br/>in MuJoCo"] --> B2["Define virtual force<br/>f → toward object surface"]
        B2 --> B3["τ = J^T f<br/>transposed Jacobian control"]
        B3 --> B4["Physics simulation<br/>refines the hand"]
    end

    GA --> LR

    style GA fill:#E3F2FD
    style LR fill:#FFF3E0

Stage 4: Simulation Validation

Generating a grasp is not the end of the story. We still have to verify that it can actually "lift" the object. For this, Dexonomy proposes a contact-aware control strategy.

The Core Idea of Force-Closure Validation

Force closure asks: "Does the grasp hold no matter which direction an external force comes from?" Mathematically it is formulated as follows.

The force $\mathbf{f}_i$ at each contact point $i$ must lie inside the friction cone $\mathcal{F}_i$, and the sum of all contact forces must cancel the external wrench $\mathbf{w}_\text{ext}$:

$$\min_{\mathbf{f}_1, \ldots, \mathbf{f}_K} \sum_{i=1}^{K} \|\mathbf{f}_i\|^2 \quad \text{s.t.} \quad \sum_{i=1}^{K} \mathbf{G}_i \mathbf{f}_i = -\mathbf{w}_\text{ext}, \quad \mathbf{f}_i \in \mathcal{F}_i$$

Here $\mathbf{G}_i$ maps the force at contact $i$ to a wrench on the object. If the friction cone $\mathcal{F}_i$ is approximated by a pyramid, the problem becomes a linearly constrained quadratic program (LCQP) that can be solved efficiently.
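Why does the pyramid make the constraints linear? A force inside a $k$-sided pyramid is just a non-negative combination of $k$ edge vectors, and "coefficients $\geq 0$" is a linear constraint. The sketch below builds those edges in the contact frame (normal, tangent 1, tangent 2). The function names and the choice of an inscribed pyramid with $k = 4$ are assumptions for illustration.

```python
import math

def pyramid_edges(mu, k=4):
    """Edge directions of a k-sided pyramid inscribed in the friction cone.

    Each edge is (normal, tangent_1, tangent_2) with unit normal component.
    """
    return [(1.0,
             mu * math.cos(2.0 * math.pi * j / k),
             mu * math.sin(2.0 * math.pi * j / k)) for j in range(k)]

def combine(edges, alphas):
    """Contact force as a non-negative combination of pyramid edges."""
    if any(a < 0.0 for a in alphas):
        raise ValueError("coefficients must be non-negative")
    return tuple(sum(a * e[i] for a, e in zip(alphas, edges)) for i in range(3))

mu = 0.5
edges = pyramid_edges(mu, k=4)
force = combine(edges, [0.2, 0.3, 0.1, 0.4])

# Any such combination satisfies the original quadratic cone condition.
inside = force[1] ** 2 + force[2] ** 2 <= (mu * force[0]) ** 2
print(force, inside)
```

Because the pyramid is inscribed, it is slightly conservative: some forces that the true cone allows are excluded, but nothing outside the cone is ever admitted.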

A conventional force-closure check, however, has to test gravity in all six directions, which means solving the LCQP six times. Dexonomy replaces this with simulation-based validation:

  1. Use the LCQP to compute the desired force $\mathbf{f}_i^*$ at each contact point
  2. Use transposed Jacobian control so that the hand approximately applies these forces: $\boldsymbol{\tau} = \mathbf{J}^\top \mathbf{f}^*$
  3. Check in simulation whether the object falls

Unlike the usual heuristic of simply squeezing the hand shut, this scheme applies to every grasp type. The same framework works whether only the fingertips apply force, as in a precision grasp, or the palm is involved as well, as in a power grasp.

Stage 5: Template Self-Amplification

A successful grasp becomes a new template. One design principle matters here:

  • Joint angles: taken directly from the successful grasp
  • Contact information: updated only where an actual contact was detected near the original contact point

This conservative update strategy keeps templates from drifting far from their original grasp type. The template library grows gradually as the full object set is processed repeatedly over 10 epochs.
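Here is one way the conservative update rule could look in code. Everything specific is an assumption: the dictionary layout of a template, the `radius` that defines "near", the choice to keep the old annotation when no nearby contact is found, and the idea that contact points are expressed in a frame where old and new positions are comparable.

```python
import math

def update_template(template, grasp_q, detected_contacts, radius):
    """Build a new template from a successful grasp (hypothetical layout).

    template: {"q": [...], "contacts": [(point, normal), ...]}
    detected_contacts: [(point, normal), ...] measured in simulation.
    """
    new_contacts = []
    for p_old, n_old in template["contacts"]:
        near = [c for c in detected_contacts if math.dist(c[0], p_old) <= radius]
        if near:
            # Adopt the closest detected contact near the annotation.
            new_contacts.append(min(near, key=lambda c: math.dist(c[0], p_old)))
        else:
            # No contact detected nearby: keep the original annotation.
            new_contacts.append((p_old, n_old))
    # Joint angles always come from the successful grasp.
    return {"q": list(grasp_q), "contacts": new_contacts}

template = {
    "q": [0.0, 0.0],
    "contacts": [((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
                 ((0.05, 0.0, 0.0), (0.0, 0.0, 1.0))],
}
detected = [((0.002, 0.0, 0.0), (0.0, 0.1, 0.995)),
            ((0.2, 0.0, 0.0), (1.0, 0.0, 0.0))]
new_template = update_template(template, [0.3, 0.4], detected, radius=0.005)
print(new_template)
```

In this example the first contact moves to the detected point 2 mm away, while the second keeps its annotation because the only other detected contact is far from it.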


The Dexonomy Dataset: Scale in Numbers

| Item | Value |
| --- | --- |
| Total objects | 10,700 (DexGraspNet 5,697 + Objaverse 5,000) |
| Total grasps | 9,500,000 |
| Grasp types | 31 (GRASP Taxonomy) |
| Robot hand | Shadow Hand |
| Object scale range | 0.06 to 0.12 (realistic sizes) |
| Object mass | 100 g (heavier than in prior work) |

The distribution of grasp types in the dataset is naturally imbalanced. This is unavoidable: the Lateral grasp (#16) suits only small, flat objects, whereas the Large Diameter grasp (#1) applies to a wide variety of objects.
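For a sense of scale, the headline numbers can be turned into averages. These are only back-of-the-envelope means computed from the rounded totals in the table; as noted above, the real distribution across grasp types is far from uniform.

```python
total_grasps = 9_500_000
num_objects = 10_700
num_types = 31

per_object = total_grasps / num_objects    # average grasps per object
per_object_type = per_object / num_types   # average per object and grasp type

print(round(per_object), round(per_object_type, 1))  # prints 888 28.6
```

So an "average" object has a little under 900 grasps, or roughly 29 per grasp type, with widely applicable types contributing far more than that and specialized ones far less.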


Learning-Based Grasp Generation: Type-Conditional Generative Model

The real value of the dataset is that it makes training learning-based models possible.

Model Architecture

The paper uses a type-conditional generative model built around a conditional normalizing flow:

  • Input: a single-view point cloud (partial observation)
  • Condition: the grasp type (one of 31)
  • Output: hand joint angles plus the wrist pose (an SE(3) transform)

On top of this, a grasp type classifier is trained separately to select the best grasp type automatically from the object's point cloud.

flowchart LR
    A["📷 Single-view<br/>point cloud"] --> B["🏷️ Type Classifier<br/>(selects best grasp type)"]
    A --> C["🧠 Type-Conditional<br/>generative model"]
    B -->|"grasp type t"| C
    C --> D["✋ Grasp generation<br/>(joint angles + wrist pose)"]
    D --> E["🔍 Simulation<br/>validation"]
    E -->|success| F["🤖 Execute"]

    style C fill:#9C27B0,color:white
    style F fill:#4CAF50,color:white

Simulation Results

Main results of the comparison, with the 10,700 objects split 4:1 into training and test sets:

| Method | Dataset size | Success rate (Normal) | Success rate (Hard) |
| --- | --- | --- | --- |
| BODex dataset + learning | 0.7M grasps | Low | Very low |
| Ours-type1 (Large Diameter only) | 0.4M | Medium | Medium |
| Ours-all (all 31 types) | 9.5M | Best | Best |

Key insight: data that includes every grasp type lifts overall performance above what a single grasp type achieves, because different grasp types provide complementary information about different object shapes.

Real-World Experiments

Real-world experiments with the Shadow Hand reach an 82.3% success rate. What stands out is that the requested grasp type can be executed from a single-view point cloud alone.


Experimental Analysis: What Works and What Doesn't

Type-Unaware Grasp Synthesis Comparison (Simulation)

Fingertip grasp synthesis results compared with existing methods (DexGraspNet, BODex, and others):

| Metric | Dexonomy | BODex | DexGraspNet |
| --- | --- | --- | --- |
| Success rate | Best | Medium | Low |
| Contact richness | Best | Medium | Medium |
| Penetration | Minimal | Low | Medium |
| Speed | Medium | Best (GPU-optimized) | Slow |
| Diversity | Medium | High | High |

Dexonomy is somewhat slower than BODex because the Local Refinement stage runs on the CPU version of MuJoCo. Across the full dataset, however, grasp diversity is far higher because it spans 31 types.

Success Rate by Grasp Type

The three top-level categories of the GRASP Taxonomy behave in interestingly different ways:

| Grasp category | Success rate (Normal) | Success rate (Hard) | Characteristics |
| --- | --- | --- | --- |
| Precision | Best | Drops sharply | Uses only the fingertips; applies to many objects but depends heavily on friction |
| Power | High | Relatively stable | Large contact area makes it robust to reduced friction |
| Intermediate | Medium | Medium | Properties in between the other two |

This result matches intuition. Precision grasps have a small contact area and become fragile when friction drops, whereas power grasps recruit the palm to secure broad contact and are therefore more robust. This is evidence for why grasp type diversity matters: a robot needs to be able to choose the grasp strategy that suits the situation.

Template Robustness

The result shown in Figure 5 of the paper is striking. Even when starting from very inaccurate contact annotations, the pipeline produces reasonable grasps. This means the two-stage design of Global Alignment followed by Local Refinement is robust to initial noise.


Critical Assessment

Strengths

1. An elegant problem formulation

The two-stage strategy of "fit the object to the hand, then fit the hand to the object" decomposes a simultaneous optimization over 20+ dimensions into a 6-dimensional problem plus local adjustment. Splitting a complex problem into simple subproblems is the essence of good engineering.

2. A clever use of the transposed Jacobian

The idea of using a physics simulator as the "optimizer" is excellent. Instead of designing a custom energy function and computing gradients, the method relies on what MuJoCo already does well (contact resolution and penetration prevention). It gets the most physical realism for the least code.

3. The self-amplification mechanism

The feedback loop in which successful grasps become new templates creates a "data flywheel" effect. Starting from one template per grasp type, the library accumulates templates that can handle an increasingly varied set of objects.

4. Scale and practicality

10,700 objects, 9.5 million grasps, 31 types: a dataset of this scale with explicit grasp-type labels is unprecedented in the field. The dataset is publicly available on Hugging Face and serves as a foundation for follow-up research.

Weaknesses and Limitations

1. A Shadow Hand-centric design

The dataset was built with the Shadow Hand only. The paper claims the pipeline applies to "any articulated hand", but apart from the type-unaware comparison on the Allegro hand reported above, it does not demonstrate the full taxonomy on other platforms such as the Allegro Hand or LEAP Hand. The Shadow Hand is a highly anthropomorphic 24-DoF hand, so whether all 31 grasp types are feasible on the 16-DoF Allegro Hand needs separate verification.

2. Speed-quality trade-off

Local Refinement depends on CPU MuJoCo and is slower than BODex. That is sufficient for building a large dataset, but it makes the pipeline hard to use directly for real-time grasp planning.

3. Imbalanced success rates across grasp types

Some specialized grasp types (for example, Lateral and Sphere 4 Finger) have very low success rates. The reason is that these types suit only particular object shapes, but the outcome is an imbalanced distribution of grasp types in the dataset.

4. Limited to static grasps

This work covers static grasps only. Dynamic behaviors needed in real manipulation tasks, such as in-hand manipulation, regrasping, and tool use, are out of scope.

5. Limits of the type classifier

The automatic grasp type selector considers only the object's shape and does not reflect task semantics. The same cup calls for different grasp types when "grasping it to drink" and when "grasping it to put it away", and this kind of functional grasping is not addressed.


Comparison with Related Work

Evolution of Grasp Synthesis Methods

timeline
    title History of robot-hand grasp synthesis
    section Early (before 2020)
        Sampling-based : Non-differentiable optimization such as simulated annealing
    section DexGraspNet era (2023)
        DexGraspNet : Differentiable energy + gradient-based optimization
                    : 1.32M grasps, type-unaware
    section Bilevel optimization (2025)
        BODex : GPU-accelerated bilevel optimization
              : Synthesizes 49+ grasps per second
    section Taxonomy-Aware (2025)
        Dexonomy : General synthesis of 31 grasp types
                 : 9.5M grasps
        OmniDexVLG : Language conditioning + functional affordance

| Method | Type-aware | Hand coverage | Scale | Physics validation | Learned model |
| --- | --- | --- | --- | --- | --- |
| DexGraspNet (2023) | ✗ | Shadow | 1.32M | Limited | CVAE |
| DexGraspNet 2.0 (2024) | ✗ | Shadow | 427M | MuJoCo | CVAE |
| BODex (2025) | ✗ | Shadow/Allegro/LEAP | ~several M | MuJoCo | - |
| GraspXL (2025) | Partial | Shadow | RL-based | IsaacGym | Policy |
| OmniDexVLG (2025) | ✓ (language+) | Shadow | - | Physics optimization | VLM-conditioned |
| Dexonomy (2025) | ✓ (31 types) | Shadow | 9.5M | MuJoCo | Normalizing flow |
| DemoGrasp (2025) | ✗ | Multiple hands | RL-based | IsaacGym | RL policy |

Contrast with DemoGrasp

DemoGrasp, published at almost the same time, learns general grasping with RL and works on 110 real-world objects. The differences are clear:

  • DemoGrasp: does not distinguish grasp types, but is closer to a closed-loop approach
  • Dexonomy: controls the grasp type explicitly, but focuses on open-loop grasp pose generation

The two approaches are complementary, and training a DemoGrasp-style RL policy on Dexonomy's type-aware dataset looks like a promising combination.

Comparison with OmniDexVLG

OmniDexVLG considers functional affordance in addition to the grasp taxonomy and addresses semantic grasp generation with a VLM (Vision-Language Model). Where Dexonomy concentrates on the "quantity" and "physical accuracy" of grasp synthesis, OmniDexVLG concentrates on "semantics".


Implications for Our Research

From an RL Research Perspective

The Dexonomy dataset can be used directly in RL-based manipulation research:

  • Goal-conditioned RL: sample target grasp poses from Dexonomy
  • Reward shaping: design type-aware contact rewards
  • Curriculum learning: train from easier power grasps to harder precision grasps

From a VLA Model Perspective

Adding language conditioning on top of the type-conditional generative model would enable grasp generation from natural-language commands such as "pick up this cup with a pinch." This connects naturally to the idea of structuring the action space of VLA (Vision-Language-Action) models by grasp type.


Summary and Conclusion

Dexonomy offers a very convincing answer to a fundamental question: "Can a robot reproduce the human grasp taxonomy?"

  1. The power of inverting the problem: Global Alignment fits the object to the hand instead of the hand to the object, a clean example of dimensionality reduction
  2. Physics simulator = optimizer: using MuJoCo as a "contact optimizer" through transposed Jacobian control is a highly practical insight
  3. One template per type → 9.5 million grasps: self-amplification extracts the maximum amount of data from minimal human effort
  4. Diversity is performance: the model trained on all 31 grasp types beats the single-type model, which supports the complementarity of grasp types

Remaining Challenges

  • Extension to other robot hands (Allegro, LEAP, RUKA, and so on)
  • Integration with dynamic manipulation (continuous grasp → manipulate motion)
  • Semantic extension toward functional grasping
  • Faster Local Refinement through a GPU-based implementation

Dexonomy raises the infrastructure of dexterous grasping research by a level. Much as ImageNet did for computer vision, a large-scale, high-quality grasp dataset should open up new benchmarks and research directions for this field.


Reference Information

| Item | Details |
| --- | --- |
| Paper | Dexonomy: Synthesizing All Dexterous Grasp Types in a Grasp Taxonomy |
| Authors | Jiayi Chen, Yubin Ke, Lin Peng, He Wang (Peking University / Galbot) |
| Venue | RSS 2025 (Robotics: Science and Systems) |
| ArXiv | 2504.18829 |
| Project | pku-epic.github.io/Dexonomy |
| Code | github.com/JYChen18/Dexonomy |
| Dataset | HuggingFace |
| License | CC BY-NC 4.0 |

Copyright 2026, JungYeon Lee