📃 SeqMultiGrasp Review

grasp
diffusion
multi-objects
Sequential Multi-Object Grasping with One Dexterous Hand
Published

August 6, 2025

  • Paper Link
  • Project Link
  1. 🤖 This paper proposes SeqMultiGrasp, a robotic system that uses an Allegro Hand to grasp multiple objects sequentially with a single hand.
  2. ✋ The system first synthesizes single-object grasp candidates constrained to specific links of the hand, validates them in simulation, and then merges them to produce multi-object grasp configurations.
  3. ✅ For real-world deployment, a point-cloud-based diffusion model proposes grasp poses, and a heuristic execution strategy carries them out, reaching average success rates of 65.8% in simulation and 56.7% in the real world.

1 Brief Review

This paper addresses the problem of sequentially grasping multiple objects with one dexterous hand and proposes a system for it, SeqMultiGrasp. Humans exploit the dexterity of their hands to grasp several objects simultaneously or one after another, but for robots this remains hard because of the variety of object shapes and the complex contact interactions of a high-DOF hand. The difficulty grows further in the sequential setting, where the hand must grasp a second object while already holding the first.

SeqMultiGrasp focuses on sequentially grasping two objects with the four-fingered Allegro Hand. The goal is to fully enclose and lift the first object, then grasp the second object without dropping the first.

The core method consists of the following steps.

  1. Single-object grasp candidate synthesis:
    • First, single-object grasp poses are synthesized with an approach based on the Differentiable Force Closure (DFC) [13] algorithm, which formulates grasping as the optimization of an energy function.
    • The hand configuration $H = (\theta, T)$ consists of the joint configuration of the robot hand, $\theta \in \mathbb{R}^d$, and its pose relative to the object $O$, $T \in SE(3)$.
    • The energy function is $E = E_{fc} + w_{dis}E_{dis} + w_pE_p + w_{sp}E_{sp} + w_qE_q$, where $E_{fc}$ is the force closure term, $E_{dis}$ penalizes the distance between the contact points and the object surface, $E_p$ penalizes penetration among the hand, the object, and the table, $E_{sp}$ penalizes self-penetration of the hand, and $E_q$ penalizes joint limit violations. The $w$ terms are the weights of the individual components.
    • Synthesis samples contact points from candidate contact points on the hand surface and sets an initial configuration. The configuration is then optimized with a gradient-based approach combined with the Metropolis-Adjusted Langevin Algorithm (MALA). Configurations whose energy exceeds a given threshold are filtered out.
    • For sequential multi-object grasping, several modifications are made to the original DFC pipeline. For example, contact candidates are restricted so that the first object is held by a pinch-like grasp using the thumb, index, and middle fingers, and the second object by a side grasp using the ring finger and the palm.
  2. Physics-simulation-based grasp validation:
    • The synthesized grasp candidates are executed in ManiSkill [39], a GPU-accelerated physics simulator, to verify their stability and feasibility.
    • Rotation Robustness: checks whether the object stays in contact with the hand after 2.5 seconds of simulation under each of six axis-aligned gravity directions (±x, ±y, ±z).
    • Execution Feasibility: checks whether the grasp can be executed successfully without colliding with the environment.
  3. Merging into multi-object grasp configurations:
    • The validated single-object grasp poses are merged to produce multi-object grasp configurations. Merging is possible only when the hand links and joints involved in the two grasps are completely disjoint.
    • When merging, the joint angles of each finger are set according to the contact points of the object that finger holds. The joint angles of a finger that holds no object are inherited at random from one of the single-object grasps, which preserves the non-overlapping control constraint.
  4. Diffusion-based pose generation:
    • To reduce the computational cost of generating grasp poses, a diffusion model [40] conditioned on the object point cloud $P = \{P_j\}_{j=1}^{N_o}$ is trained to propose hand poses.
    • Forward Process (adding noise): $q(H_t | H_{t-1}) = \mathcal{N} \left( H_t ; \sqrt{1 - \beta_t} H_{t-1}, \beta_t \mathbf{I} \right)$, where $\beta_t$ controls the noise level and $\mathbf{I}$ is the identity matrix.
    • Reverse Process (denoising and reconstruction): $p_\phi (H_{t-1} | H_t, P) = \mathcal{N} \left( H_{t-1}; \mu_\phi (H_t, t, P), \Sigma_\phi (H_t, t, P) \right)$, where $\mu_\phi$ and $\Sigma_\phi$ are the predicted mean and covariance, respectively.
    • The network extracts point cloud features with PointNet++ [43], represents the object orientation as a rotation matrix, and applies singular value decomposition (SVD) [44] to guarantee orthogonality.
  5. Heuristic execution strategy:
    • Instead of a complex reinforcement learning (RL) policy, a simple squeeze-and-lift procedure is adopted.
    • CuRobo [45] is used to plan the motion of the end effector to a collision-free pose that is offset from the grasp pose.
    • The hand then moves slowly to the grasp pose without collision checking, and the hand joint positions are adjusted in two stages: first to the pre-grasp joint position, which retracts the fingertips, and then to the target joint position, which closes the fingers.
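
The sampling-and-filtering loop in step 1 can be illustrated with a minimal MALA sketch. Everything here is an assumption for illustration: the quadratic `energy` is a toy stand-in for $E$ (not the DFC energy), the names `mala_step`, `run_mala`, and `filter_by_energy` are hypothetical, and the paper's pipeline combines MALA with a gradient-based optimizer in ways this sketch does not reproduce.

```python
import numpy as np

def energy(x):
    # Toy stand-in for E: a quadratic bowl centered at 1.0 (NOT the DFC energy).
    return 0.5 * np.sum((x - 1.0) ** 2)

def grad_energy(x):
    # Analytic gradient of the toy energy above.
    return x - 1.0

def mala_step(x, step, rng):
    """One Metropolis-adjusted Langevin step targeting exp(-energy)."""
    noise = rng.standard_normal(x.shape)
    prop = x - step * grad_energy(x) + np.sqrt(2.0 * step) * noise

    def log_q(dst, src):
        # Log density (up to a constant) of proposing dst when at src.
        mean = src - step * grad_energy(src)
        return -np.sum((dst - mean) ** 2) / (4.0 * step)

    # Metropolis-Hastings correction for the asymmetric Langevin proposal.
    log_alpha = (-energy(prop) + log_q(x, prop)) - (-energy(x) + log_q(prop, x))
    if np.log(rng.uniform()) < log_alpha:
        return prop, True
    return x, False

def run_mala(x0, n_steps=2000, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        x, _ = mala_step(x, step, rng)
        samples.append(x.copy())
    return np.array(samples)

def filter_by_energy(configs, threshold):
    """Keep configurations whose energy is at or below the threshold."""
    return [c for c in configs if energy(c) <= threshold]
```

In the real pipeline the state would be the hand configuration $H = (\theta, T)$ plus the sampled contact points, and the threshold would be tuned per energy term; the sketch only shows the accept/reject structure and the final energy filter.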
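
The merging rule in step 3 can be sketched as follows. The joint layout (16 joints, four per finger, in the order index, middle, ring, thumb) and the function name `merge_grasps` are assumptions made for this example, not details given in the review. The sketch merges joint angles only; the relative object poses $T$ would have to be carried alongside.

```python
import numpy as np

# Assumed joint layout (hypothetical): 16 joints, 4 per finger.
FINGER_JOINTS = {
    "index": slice(0, 4),
    "middle": slice(4, 8),
    "ring": slice(8, 12),
    "thumb": slice(12, 16),
}

def merge_grasps(theta_a, theta_b, fingers_a, fingers_b, rng=None):
    """Merge two single-object joint configurations that use disjoint fingers."""
    if rng is None:
        rng = np.random.default_rng()
    fingers_a, fingers_b = set(fingers_a), set(fingers_b)
    if fingers_a & fingers_b:
        # Merging is only defined when the two grasps share no finger.
        raise ValueError("grasps share fingers and cannot be merged")
    merged = np.empty(16)
    for name, idx in FINGER_JOINTS.items():
        if name in fingers_a:
            merged[idx] = theta_a[idx]
        elif name in fingers_b:
            merged[idx] = theta_b[idx]
        else:
            # Unused finger: inherit from one of the two grasps at random.
            source = theta_a if rng.random() < 0.5 else theta_b
            merged[idx] = source[idx]
    return merged
```

For the finger assignment described in step 1, a call would look like `merge_grasps(theta_first, theta_second, ["thumb", "index", "middle"], ["ring"])`.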

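A standard consequence of the forward process in step 4 is that $H_t$ can be sampled directly from $H_0$ without iterating over intermediate steps. The derivation below assumes $H$ is treated as a flat vector; the symbols $\alpha_t$, $\bar{\alpha}_t$, and $\epsilon$ are notation introduced here, not taken from the review.

```latex
\begin{aligned}
\alpha_t &= 1 - \beta_t, \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s, \\
q(H_t \mid H_0) &= \mathcal{N}\left( H_t ; \sqrt{\bar{\alpha}_t}\, H_0, (1 - \bar{\alpha}_t)\, \mathbf{I} \right), \\
H_t &= \sqrt{\bar{\alpha}_t}\, H_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).
\end{aligned}
```

This follows from composing the per-step Gaussians $q(H_t | H_{t-1})$: each step scales the mean by $\sqrt{1 - \beta_t}$, and the independent noise variances accumulate to $1 - \bar{\alpha}_t$.
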
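The SVD orthogonalization mentioned in step 4 is commonly done by projecting a raw 3×3 matrix onto the nearest rotation matrix. The sketch below shows that projection in NumPy; the function name `project_to_so3` is hypothetical, and how the network actually lays out its rotation output is not specified in the review.

```python
import numpy as np

def project_to_so3(m):
    """Return the rotation matrix closest to m in the Frobenius norm."""
    u, _, vt = np.linalg.svd(m)
    # Flip the last singular direction if needed so that det(R) = +1.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt
```

The determinant correction matters: without it, the projection can return a reflection (determinant -1), which is orthogonal but not a valid orientation.
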
Extensive experiments were conducted in simulation and in the real world. In simulation, over 8×8 object combinations, the Synthesized Grasp (SG) approach achieved an average success rate of 82.7%, and the diffusion-model-based Learned Grasp (LG) approach achieved 65.8%. In experiments on the real robot system, over 6×3 object combinations, SG reached an average success rate of 64.4% and LG reached 56.7%. To acquire real-world point clouds, Nerfstudio [50], COLMAP [51], Stable Normal [52], and 2D Gaussian Splatting [53] were used, which helped reduce the sim-to-real gap.

SeqMultiGrasp still has several limitations: it handles only two objects, its dataset is limited in size and diversity, and it depends on heuristics. Even so, it provides a promising foundation for future research on versatile multi-object grasping.


2 Detail Review

Copyright 2024, Jung Yeon Lee