📃 6DOF GraspNet

grasp
pointcloud
vae
manipulation
6-DOF GraspNet: Variational Grasp Generation for Object Manipulation
Published

March 31, 2024

  • Paper Link (arXiv:1905.10520)
  • Presentation Video
  • Related follow-up/application: VCGS (arXiv:2302.10745)
  1. 🤖 This work generates 6-DOF grasp poses for robotic object manipulation from 3D point clouds. It proposes a Grasp Sampler built on a Variational Autoencoder (VAE) and an iterative refinement process driven by a Grasp Evaluator.
  2. ⚙️ The Grasp Sampler generates diverse grasps, the Grasp Evaluator estimates each grasp's probability of success, and the gradient of the evaluator is used to improve the sampled grasps.
  3. 🏆 Although trained purely in simulation, the model reaches an 88% success rate on a variety of objects in real-robot experiments and outperforms the prior approach it is compared against.

🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

This paper targets grasp pose generation for robotic object manipulation. It casts the problem as grasp sampling with a Variational Autoencoder (VAE) and proposes a grasp evaluator model that scores and then improves the sampled grasps. Both the Grasp Sampler and the Grasp Refinement networks take as input a 3D point cloud observed by a depth camera.

The method is built from two main networks:

  1. Grasp Sampler (Variational Autoencoder, VAE): generates a diverse set of grasps from the partial point cloud of the observed object.
  2. Grasp Evaluator Network: scores the quality of each generated grasp and uses the gradient of that score to improve the grasp samples.

1. 6-DOF Grasp Pose Generation

Grasp pose generation produces a 6-DOF pose (3D translation and 3D orientation) with which the robot gripper can stably grasp a given object. Grasps are defined in the object reference frame, whose origin is \bar{X}, the center of mass of the object point cloud, and whose axes are parallel to those of the camera frame. The paper aims to learn the posterior distribution P(\mathcal{G}^* | X) over the space of successful grasps \mathcal{G}^*, where X is the partial point cloud of the object. The distribution of successful grasps can be complex and discontinuous.

1.1. Variational Grasp Sampler

The grasp sampler is a generative model trained to maximize the likelihood P(\mathcal{G} | X).

  • Input: the object point cloud X and a latent variable z.
  • Output: a predicted grasp \hat{g}.
  • Latent space (z): the prior is P(z) = \mathcal{N}(0, I). Sampling different values of z yields diverse grasps.
  • Objective: the encoder Q(z | X, g) maps each (point cloud X, grasp g) pair to a small subspace of the latent space, and the decoder reconstructs a grasp \hat{g} from a sampled z \sim Q. Training minimizes the reconstruction loss \mathcal{L}(\hat{g}, g) between the ground-truth grasp g and the reconstruction \hat{g}, together with the KL divergence D_{KL} between Q(\cdot|\cdot) and the standard normal \mathcal{N}(0, I):
    \mathcal{L}_{vae} = \sum_{z \sim Q, g \sim \mathcal{G}^*} \mathcal{L}(\hat{g}, g) - \alpha D_{KL} [Q(z|X, g), \mathcal{N}(0, I)]
    The reconstruction loss is defined on a set of predefined gripper points p, transformed by each grasp:
    \mathcal{L}(g, \hat{g}) = \frac{1}{n} \sum ||T(g; p) - T(\hat{g}; p)||_1
    At inference the encoder Q is discarded and latent values are sampled from \mathcal{N}(0, I).
  • Network architecture: based on PointNet++ [24]; each point carries a 3D coordinate and a feature vector. In the encoder, the features of each input point x \in X are concatenated with the grasp g = [R, T]. In the decoder, each point's features are concatenated with the latent variable z.
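
The objective above can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the grasp is a unit quaternion plus a translation (the generator output format listed in the architecture notes of this post), the encoder is assumed to output a diagonal Gaussian (mean and log-variance), the KL term is added as a penalty because the text says both terms are minimized, and `CONTROL_POINTS` is a hypothetical stand-in for the gripper points p, which the post does not list.

```python
import math

# Hypothetical gripper control points in the gripper frame (metres).
CONTROL_POINTS = [(0.0, 0.0, 0.0), (0.04, 0.0, 0.06), (-0.04, 0.0, 0.06),
                  (0.04, 0.0, 0.10), (-0.04, 0.0, 0.10)]

def quat_rotate(q, v):
    """Rotate vector v by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * (q_vec x v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + q_vec x t
    return (vx + w * tx + (y * tz - z * ty),
            vy + w * ty + (z * tx - x * tz),
            vz + w * tz + (x * ty - y * tx))

def transform_points(grasp, points=CONTROL_POINTS):
    """T(g; p): apply the grasp g = (quaternion, translation) to every point."""
    q, t = grasp
    return [tuple(r + ti for r, ti in zip(quat_rotate(q, p), t)) for p in points]

def reconstruction_loss(g, g_hat):
    """Mean (over points) L1 distance between control points under g and g_hat."""
    a, b = transform_points(g), transform_points(g_hat)
    return sum(abs(x - y) for pa, pb in zip(a, b) for x, y in zip(pa, pb)) / len(a)

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL[N(mu, diag(exp(log_var))) || N(0, I)]."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def vae_loss(g, g_hat, mu, log_var, alpha=0.01):
    # Assumption: the KL term enters as a penalty, since both terms are minimized.
    return reconstruction_loss(g, g_hat) + alpha * kl_to_standard_normal(mu, log_var)
```

Comparing transformed control points rather than raw pose parameters puts rotation and translation errors in the same unit (metres), which is why a single loss term suffices.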

1.2. Grasp Pose Evaluator

Because the grasp sampler is trained only on successful grasps, its samples can include failed grasps that lie between the modes of the distribution. The evaluator network assigns each grasp a success probability P(S|g, X) so that these false positives can be identified and pruned.

  • Input: the object point cloud X and a grasp g.
  • Gripper representation: the robot gripper is approximated by a point cloud \mathcal{X}_g rendered at the 6D grasp pose g. The object point cloud X and the gripper point cloud \mathcal{X}_g are merged into a single point cloud X \cup \mathcal{X}_g, with an extra binary feature marking whether each point belongs to the object or to the gripper. This lets the classifier use all relative information between the grasp pose and the object point cloud.
  • Objective: the network is optimized with a cross-entropy loss,
    \mathcal{L}_{evaluator} = - (y \log(s) + (1 - y) \log(1 - s))
    where y is the ground-truth binary label for grasp success and s is the success probability predicted by the evaluator.
  • Training data: to obtain a robust evaluator, it is trained on both positive and negative grasps. In particular it uses hard negative grasps, which have poses similar to a positive grasp but either collide with the object or are too far from it:
    \mathcal{G}^- = \{g^- | \exists g \in \mathcal{G}^* : \mathcal{L}(g, g^-) < \epsilon\}
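
The evaluator input and loss described above can be sketched as follows. This is an illustrative assumption about data layout, not the authors' code: points are plain `(x, y, z)` tuples, and the choice of flag value 0 for object points and 1 for gripper points is arbitrary.

```python
import math

def merge_point_clouds(object_points, gripper_points):
    """Build X ∪ X_g as (x, y, z, flag) tuples; flag 0.0 = object, 1.0 = gripper."""
    merged = [(x, y, z, 0.0) for x, y, z in object_points]
    merged += [(x, y, z, 1.0) for x, y, z in gripper_points]
    return merged

def binary_cross_entropy(y, s, eps=1e-7):
    """L_evaluator = -(y log s + (1 - y) log(1 - s)); s is clamped for stability."""
    s = min(max(s, eps), 1.0 - eps)
    return -(y * math.log(s) + (1.0 - y) * math.log(1.0 - s))
```

For example, `merge_point_clouds([(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.1)])` returns two 4-tuples whose last entry tells the network which points came from the gripper.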

1.3. Iterative Grasp Pose Refinement

The evaluator network rejects implausible grasps, but many rejected grasps lie close to a successful one. The refinement step exploits this by searching for a transformation \Delta g \in SE(3) that turns a failed grasp into a successful one.

  • Method: the evaluator network is a differentiable function of the success probability s. The refinement transformation that increases the success probability is computed from the derivative of success with respect to the grasp transformation, \partial S / \partial g.
  • Formula: \Delta g = \eta \times \frac{\partial S}{\partial g} = \eta \times \frac{\partial S}{\partial T(g; p)} \times \frac{\partial T(g; p)}{\partial g}, where \eta is a hyperparameter that limits the magnitude of the update.
  • Rigidity constraint: the gripper point cloud \mathcal{X}_g is defined as a function of the grasp orientation, given by Euler angles R_g = (\alpha_g, \beta_g, \gamma_g), and the translation T_g. \Delta g is computed with the chain rule, so the update moves the gripper as a rigid body.
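
The refinement loop can be sketched with a toy score function. Everything here is an assumption made for illustration: `toy_score` is a smooth stand-in for the learned evaluator, the gradient is taken by central finite differences instead of backpropagation, `POINTS` is a hypothetical set of gripper points, and the 1 cm cap on the translation update follows the step limit quoted later in this post.

```python
import math

POINTS = [(0.04, 0.0, 0.06), (-0.04, 0.0, 0.06), (0.0, 0.0, 0.0)]  # hypothetical p

def transform(g, points=POINTS):
    """T(g; p) for g = (alpha, beta, gamma, tx, ty, tz), with R = Rz(gamma) Ry(beta) Rx(alpha)."""
    a, b, c, tx, ty, tz = g
    ca, sa, cb, sb = math.cos(a), math.sin(a), math.cos(b), math.sin(b)
    cc, sc = math.cos(c), math.sin(c)
    rot = [[cc * cb, cc * sb * sa - sc * ca, cc * sb * ca + sc * sa],
           [sc * cb, sc * sb * sa + cc * ca, sc * sb * ca - cc * sa],
           [-sb, cb * sa, cb * ca]]
    return [tuple(sum(rot[i][j] * p[j] for j in range(3)) + t
                  for i, t in enumerate((tx, ty, tz))) for p in points]

def toy_score(g, target):
    """Stand-in for S(g): equals 1 at the target grasp and decays smoothly away from it."""
    d2 = sum((x - y) ** 2
             for p, q in zip(transform(g), transform(target)) for x, y in zip(p, q))
    return math.exp(-200.0 * d2)

def numerical_gradient(score, g, h=1e-5):
    """Central finite-difference estimate of dS/dg."""
    grad = []
    for i in range(len(g)):
        up, down = list(g), list(g)
        up[i] += h
        down[i] -= h
        grad.append((score(up) - score(down)) / (2.0 * h))
    return grad

def refine(score, g, eta=1e-3, steps=10, max_translation=0.01):
    """Gradient ascent on the score; each update is rescaled so translation moves <= 1 cm."""
    g = list(g)
    for _ in range(steps):
        delta = [eta * d for d in numerical_gradient(score, g)]
        norm = math.sqrt(sum(d * d for d in delta[3:]))
        if norm > max_translation:
            delta = [d * max_translation / norm for d in delta]
        g = [x + d for x, d in zip(g, delta)]
    return g
```

Because all gripper points are moved through the same six pose parameters, the update cannot deform the gripper, which is the role of the rigidity constraint.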

2. Experiments

2.1. Training data generation

  • Simulator: the physics simulator FleX [18] is used to generate the reference set of successful grasps.
  • Objects: 206 objects from six categories in ShapeNet [3], including boxes and cylinders (randomly generated), bowls, bottles, and mugs.
  • Grasp sampling: random points are sampled on the object mesh surface, the gripper z-axis is aligned with the surface normal, and the distance between gripper and object surface is sampled uniformly within the length of the gripper fingers. Only grasps whose gripper does not collide with the object and whose closing volume intersects the object are simulated.
  • Success definition: a grasp counts as successful if the object stays between the two fingers while the gripper, after closing its fingers, performs a predefined shaking motion. In total 2,104,894 successful grasps were generated (19.4%).

2.2. Network architecture details

  • Backbone: the PointNet++ [24] architecture.
  • Layers: three set-abstraction layers followed by fully connected layers.
    • The set-abstraction layers sample 128 points, 32 points, and all points, respectively.
    • They group the points within radii of 2 cm, 4 cm, and ∞ around each sampled point.
    • Each set-abstraction layer computes features with three fully connected layers, with channel sizes [64, 64, 128], [128, 128, 256], and [256, 256, 512], respectively.
  • Fully connected layers: the set-abstraction layers are followed by two fully connected layers with 1024 units.
  • Grasp generator output: a rotation R represented as a unit quaternion, and a translation T.
  • Evaluator output: a softmax layer predicts the score of each grasp.
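
The layer sizes listed above can be collected in a small configuration object. This is only a bookkeeping sketch: the names `SetAbstraction`, `SET_ABSTRACTION_LAYERS`, `FC_UNITS`, and `GENERATOR_OUTPUT_DIM` are hypothetical, and `None` is used here to mean "all points".

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class SetAbstraction:
    num_samples: Optional[int]          # None means "use all points"
    radius_m: float                     # grouping radius in metres
    mlp_channels: Tuple[int, int, int]  # three fully connected layers

SET_ABSTRACTION_LAYERS = (
    SetAbstraction(128, 0.02, (64, 64, 128)),
    SetAbstraction(32, 0.04, (128, 128, 256)),
    SetAbstraction(None, math.inf, (256, 256, 512)),
)
FC_UNITS = (1024, 1024)        # two fully connected layers after set abstraction
GENERATOR_OUTPUT_DIM = 4 + 3   # unit quaternion (4) + translation (3)
```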

2.3. Evaluation metrics

  • Success rate: the fraction of all predicted grasps that are successful.
  • Coverage rate: measures how well the generated grasps cover the space of positive grasps \mathcal{G}^*. A positive grasp g \in \mathcal{G}^* counts as covered if the predicted grasp set \hat{\mathcal{G}} contains a grasp within 2 cm of it.
  • AUC (Area Under the Curve): the area under the success-coverage curve is used to analyze and compare methods.
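
The two rates above can be computed as in this minimal sketch. It assumes that "within 2 cm" is measured between grasp translations only, which the post does not spell out, and that grasp positions are `(x, y, z)` tuples in metres.

```python
import math

def success_rate(outcomes):
    """Fraction of predicted grasps that succeeded; outcomes is a list of booleans."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def coverage_rate(predicted_positions, positive_positions, radius=0.02):
    """Fraction of positive grasps that have a predicted grasp within `radius` metres."""
    if not positive_positions:
        return 0.0
    covered = sum(
        1 for g in positive_positions
        if any(math.dist(g, p) <= radius for p in predicted_positions)
    )
    return covered / len(positive_positions)
```

Success rate plays the role of precision and coverage rate the role of recall, so a sampler that emits only one very safe grasp scores high on the first and low on the second.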

2.4. Analysis and ablation studies

  • Latent dimension: comparing the AUC of the success-coverage curve for latent dimensions 1, 2, 3, and 4, the 2-dimensional latent space performs best. One dimension lacks capacity, while three or more dimensions lose performance to overfitting and to poor coverage of the latent space at inference.
  • Effect of refinement steps: both the success rate and the coverage rate of the generated grasps increase with more refinement iterations. The AUC converges after the 10th iteration.
  • Effect of the number of sampled grasps: the coverage rate increases as more grasps are sampled.

2.5. Robot experiments

  • Setup: a 7-DOF Franka Panda robot arm equipped with an Intel RealSense D415 camera.
  • Objects: 17 common objects that are visually or physically challenging. The 3D models of the objects are unknown.
  • Protocol: each object is placed on the table in front of the robot in three different stable poses. The robot arm is moved so that the camera has a clear view. The table plane is removed from the measured point cloud and the remaining points are clustered to extract the object point cloud.
  • Comparison: 6-DOF GraspNet is compared against GPD [31].
  • Success definition: a trial is successful if the robot can lift the object 10 cm without dropping it.
  • Results: the proposed method achieves a higher success rate than GPD [31] in every object category (88% vs. 47% overall). Because it generates diverse grasps, it is more likely to find one that is kinematically feasible. GPD struggles to generate grasps on thin structures such as the rim of a mug.

3. Conclusions

The paper introduces 6-DOF GraspNet, which generates diverse grasps for unknown objects. The method samples diverse grasps with a VAE, scores grasp quality with a Grasp Evaluator network, and improves grasps through a gradient-based refinement process. Training uses only synthetic data generated in simulation, yet the method achieves a high success rate in the real world on objects whose 3D models are unknown. This suggests that a learned grasp sampler combined with gradient-based refinement is effective for robot manipulation.

Future directions:

  • Train the sampler or the evaluator to take surrounding objects into account, so that colliding or infeasible grasps are avoided directly.
  • Beyond refining sampled grasps, use the evaluator to provide real-time feedback guidance for a manipulator approaching the object.


🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

Introduction

Grasp selection is one of the most important problems in robot manipulation. The robot observes an object and must decide where (3D position) and in which direction (3D rotation) to move its gripper in order to pick it up. Grasp stability depends on the geometry of the object and gripper, the mass distribution, and surface friction, and the geometry around the object adds a further constraint: where can the object be grasped so that the arm can reach it without colliding with other objects?

Traditional approaches pick promising grasp points with geometric heuristics and then analyze stability and reachability, but many of them assume a complete 3D model. That is a serious limitation in practice, where the scene is observed through noisy depth images. One could move the camera to build a full model or apply shape completion, but this is either impossible in confined spaces or not accurate enough.

Recent deep learning work evaluates grasp quality from point clouds, but it still samples the candidates to evaluate with hand-crafted heuristics or relies on black-box optimization such as the cross-entropy method (CEM), and it has no efficient way to improve a sampled grasp. Many methods also represent a grasp as a rectangle parallel to the image plane (3-DOF), which restricts diversity to mostly top-down grasps.

One-line summary of the paper: the first learning-based framework that efficiently generates a stable and diverse set of grasps for unknown objects. It samples grasps with a VAE, scores them with a grasp evaluator, and iteratively refines them with the evaluator's gradient.

Main contributions:

  • VAE-based grasp sampler: maps a partial point cloud to a diverse set of grasps, generating feasible functional grasps with high coverage and few failures.
  • Grasp evaluator network: scores the quality of a 6D gripper pose and uses its gradient to improve the grasp, for example by pulling it out of collision or correcting its alignment.
  • Picks up 17 unknown objects with an 88% success rate, outperforming the existing baseline (GPD) while keeping the grasps diverse.

Method

flowchart LR
    PC["Object Point Cloud X<br/>(depth camera)"] --> SAMP
    subgraph SAMP["1 Variational Grasp Sampler (VAE)"]
        Z["z ~ N(0,I)"] --> DEC["Decoder P"]
        PC2["X"] --> DEC
        DEC --> G["Diverse grasp set<br/>g=(R,T)∈SE(3)"]
    end
    G --> EVAL
    subgraph EVAL["2 Grasp Evaluator (PointNet++)"]
        E["P(S|g,X)<br/>success probability"]
    end
    EVAL --> REF
    subgraph REF["3 Iterative Refinement"]
        R["Δg = η·(∂S/∂g)<br/>iterative refinement"]
    end
    R -.->|refined grasp| EVAL
    REF --> OUT["Above threshold<br/>high-quality grasp set"]

The input is the point cloud of the object to be grasped. The goal is to learn the posterior P(G^* \mid X), where G^* is the space of successful grasps and X is the partial point cloud seen by the camera. A grasp g=(R,T)\in SE(3) is defined in the object reference frame, whose origin is the point-cloud centroid \bar X. G^* can be complex and discontinuous: a mug, for example, has several modes along the rim, the handle, and the bottom, and each mode is continuous inside. Because the number of modes is not known in advance, a generator module is trained to maximize the likelihood of successful grasps.

Variational Grasp Sampler

The grasp sampler is a generative model that maximizes the likelihood P(G\mid X) of a predefined set of successful grasps g\in G^*. With a latent variable z and prior P(z)=\mathcal N(0,I),

P(G\mid X) = \int P(G\mid X, z; \Theta)\, P(z)\, dz

This integral is intractable, so an encoder Q(z\mid X, g) maps each (point cloud, grasp) pair to a small subspace of the latent space, and a decoder reconstructs the grasp \hat g from z\sim Q. The VAE loss is

\mathcal L_{\text{vae}} = \sum_{z\sim Q,\, g\sim G^*} \mathcal L(\hat g, g) \;-\; \alpha\, \mathcal D_{KL}\big[Q(z\mid X, g)\,\Vert\,\mathcal N(0,I)\big]

To combine the rotation and translation errors into one term, the reconstruction loss compares predefined points p on the gripper after transforming them by each grasp:

\mathcal L(g, \hat g) = \frac{1}{n}\sum \big\lVert \mathcal T(g; p) - \mathcal T(\hat g; p) \big\rVert_1

Here \mathcal T(\cdot;p) transforms the gripper points according to the grasp pose. Both the encoder and the decoder are based on PointNet++, with \alpha=0.01 and a 2-dimensional latent space. At inference the encoder is dropped and z is sampled from \mathcal N(0,I). The authors confirm qualitatively that the learned latent space correlates strongly with the grasp pose.
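
Inference with the encoder removed can be sketched as below. The `decoder` argument stands for the trained decoder network, and `toy_decoder` is a hypothetical placeholder that exists only so the example runs; the default of 200 samples simply mirrors the batch size quoted in the robot experiments.

```python
import random

def sample_grasps(decoder, point_cloud, num_samples=200, latent_dim=2, seed=None):
    """Draw z ~ N(0, I) and decode one grasp per latent sample."""
    rng = random.Random(seed)
    grasps = []
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        grasps.append(decoder(point_cloud, z))
    return grasps

def toy_decoder(point_cloud, z):
    """Placeholder for the trained decoder: shifts the cloud centroid by the latent."""
    n = len(point_cloud)
    cx, cy, cz = (sum(p[i] for p in point_cloud) / n for i in range(3))
    quaternion = (1.0, 0.0, 0.0, 0.0)
    translation = (cx + 0.01 * z[0], cy + 0.01 * z[1], cz)
    return quaternion, translation
```

Calling `sample_grasps(toy_decoder, cloud, num_samples=5, seed=0)` returns five `(quaternion, translation)` pairs; diversity comes entirely from the different latent draws, since the point cloud is the same for every sample.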

Grasp Pose Evaluation

Because the sampler sees only positive grasps and learns a continuous posterior, transitional false positives between the modes of the distribution can slip in. An evaluator P(S\mid g, X) is needed to filter them out. The key representational trick is to express the grasp g as a gripper point cloud X_g, obtained by rendering the gripper shape at that pose, merge it with the object points into X\cup X_g, attach a binary feature indicating whether each point belongs to the object or the gripper, and classify the result with PointNet++. The evaluator is trained with cross-entropy:

\mathcal L_{\text{evaluator}} = -\big(y\log(s) + (1-y)\log(1-s)\big)

Here y is the grasp success label and s is the predicted probability. The 6D grasp space is combinatorially large, so not all negatives can be sampled, and hard negative mining is used instead. The hard-negative set is built by slightly perturbing positive grasps so that the gripper mesh either collides with the object or moves too far away from it:

G^- = \{\, g^- \mid \exists\, g\in G^*:\ \mathcal L(g, g^-) < \epsilon \,\}
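
Hard negative mining of this kind can be sketched as a perturb-and-filter loop. This is an illustration only: `perturb`, `distance`, and `is_failure` are hypothetical callables supplied by the caller (in the paper's setting, failure would come from collision checks or simulation), and the toy usage reduces a grasp to a 3D position above a flat surface at z = 0.

```python
import random

def mine_hard_negatives(positives, perturb, distance, is_failure,
                        epsilon, tries_per_grasp=20, seed=0):
    """Keep perturbed grasps that stay within epsilon of a positive yet fail."""
    rng = random.Random(seed)
    negatives = []
    for g in positives:
        for _ in range(tries_per_grasp):
            candidate = perturb(g, rng)
            if distance(g, candidate) < epsilon and is_failure(candidate):
                negatives.append(candidate)
    return negatives

# Toy usage with hypothetical helpers.
def perturb(g, rng, scale=0.03):
    return tuple(c + rng.uniform(-scale, scale) for c in g)

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def is_failure(g, max_standoff=0.05):
    # Below the surface counts as a collision, far above it as "too far away".
    return g[2] < 0.0 or g[2] > max_standoff

positives = [(0.0, 0.0, 0.02), (0.1, 0.0, 0.03)]
hard_negatives = mine_hard_negatives(positives, perturb, l1_distance,
                                     is_failure, epsilon=0.06)
```

The point of the epsilon filter is that easy negatives (grasps nowhere near the object) teach the evaluator little, whereas near-miss grasps sharpen the decision boundary around the positive set.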

Iterative Grasp Pose Refinement

The evaluator filters out unsuitable grasps, but a large share of the rejected grasps are close to success. The method therefore looks for a transformation \Delta g\in SE(3) that raises the success probability. Since the evaluator is differentiable, differentiating the success S with respect to the grasp transformation gives the direction of improvement.

\Delta g = \eta \times \frac{\partial S}{\partial \mathcal T(g; p)} \times \frac{\partial \mathcal T(g; p)}{\partial g}

The partial derivative \partial S/\partial g is valid only in a local neighborhood, so \eta limits the step size (the translation update is capped at 1 cm per step). Repeating this step improves the grasp gradually. For example, a bowl grasp that initially had no points between the gripper fingers is pushed into a successful pose.
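
The chain rule in the update can be written out per pose parameter. This is a sketch under the assumption that the gripper points move rigidly, \mathcal T(g; p_i) = R(\alpha_g, \beta_g, \gamma_g)\, p_i + T_g with Euler-angle orientation, so that \partial \mathcal T(g; p_i)/\partial T_g is the identity:

```latex
\begin{aligned}
\frac{\partial S}{\partial T_g} &= \sum_i \frac{\partial S}{\partial \mathcal T(g; p_i)}, \\
\frac{\partial S}{\partial \alpha_g} &= \sum_i \left(\frac{\partial S}{\partial \mathcal T(g; p_i)}\right)^{\!\top} \frac{\partial R}{\partial \alpha_g}\, p_i
\quad (\text{likewise for } \beta_g,\ \gamma_g), \\
g_{k+1} &= g_k + \Delta g_k, \qquad \Delta g_k = \eta \,\frac{\partial S}{\partial g}\Big|_{g = g_k}.
\end{aligned}
```

Summing the per-point gradients into six pose parameters is what keeps the gripper rigid: individual points never move independently of one another.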

Training Data

Grasps on objects of arbitrary shape are simulated with the physics simulator FleX. The dataset contains 206 objects from six ShapeNet categories (boxes, cylinders, bowls, bottles, and mugs, including randomly generated boxes and cylinders). Points are sampled on the object surface and the gripper z-axis is aligned with the surface normal. The simulation uses a free-floating gripper and object without gravity, and a grasp is labeled successful if the object is still held after the fingers close and a shaking motion is executed. Out of 10,816,720 candidates in total, 7,074,038 (65.4%, those with a non-empty closing volume) were simulated, yielding 2,104,894 successful grasps (19.4% of all candidates). The networks are PointNet++ models trained with Adam at a learning rate of 0.0001. All grasps are generated in simulation and no real data is used for training.
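
The funnel from sampled candidates to successful grasps can be checked with simple arithmetic. The counts are the ones quoted above; the third ratio (successes among simulated grasps) is not stated in the post and is derived here only for orientation.

```python
candidates = 10_816_720   # sampled grasp candidates
simulated = 7_074_038     # candidates that passed the pre-simulation filter
successful = 2_104_894    # grasps that survived the shaking test

simulated_fraction = simulated / candidates      # about 0.654
success_of_candidates = successful / candidates  # about 0.1946, quoted as 19.4%
success_of_simulated = successful / simulated    # about 0.298 (derived, not quoted)
```

The quoted 19.4% is therefore relative to all sampled candidates; relative to the grasps that were actually simulated, roughly three in ten succeed.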

Experiments

The evaluation metrics are the success rate (the fraction of predicted grasps that succeed) and the coverage rate (how much of the positive grasp space G^* the generated grasps cover, where a positive grasp counts as covered if a generated grasp lies within 2 cm). They correspond to precision and recall in classification. Methods are analyzed through the AUC of the success-coverage curve.
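
The AUC of a success-coverage curve can be computed with the trapezoidal rule, as in this minimal sketch. It assumes the curve is given as `(coverage, success)` pairs, for instance one pair per evaluator score threshold; how the curve points are produced is an assumption here, not something the post specifies.

```python
def success_coverage_auc(points):
    """Trapezoidal area under a curve given as (coverage, success) pairs."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

A curve that keeps success at 1.0 over the full coverage range has an AUC of 1.0, so the AUC values reported in the ablation (0.18 versus 0.07) indicate how far both samplers remain from that ideal.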

Ablation

  • Latent dimension: 2D is best. 1D lacks capacity, while 3D and 4D reach a slightly better training loss but perform worse because the VAE cannot cover the latent space densely at inference.
  • Number of refinement steps: success and coverage increase together and plateau after 10 iterations. Coverage rises as well because refined grasps move closer to the positive set G^*.
  • VAE sampler vs. geometric sampler: with the same evaluator attached, VAE + Evaluator reaches an AUC of 0.18 versus 0.07 for the geometric Baseline + Evaluator. The surface-normal-based geometric sampler produces almost no grasps on rims and thin structures and does not generalize to missing points or occlusion.

Robot Experiments

The setup is a 7-DOF Franka Panda with an Intel RealSense D415 mounted on the gripper. Visually and physically challenging unknown objects (mass 42 g to 618 g) are placed in 3 stable poses, and a trial succeeds if the object is lifted 10 cm. The highest-scoring grasp that has a collision-free path is executed; if no feasible grasp exists, the trial counts as a failure. Each method is run for 51 trials. Inference takes 0.04 s for VAE + Evaluator and 0.3 s per refinement step (Titan XP, batch of 200).
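
The trial count and a rough latency budget follow from the figures in this post. The estimate assumes the 17 test objects mentioned in the contributions, the 10 refinement steps at which the ablation plateaus, and that refinement steps run sequentially on one batch; it is back-of-the-envelope arithmetic, not a measurement reported by the authors.

```latex
\begin{aligned}
\text{trials per method} &= 17 \text{ objects} \times 3 \text{ poses} = 51, \\
t_{10\ \text{steps}} &\approx 0.04\,\text{s} + 10 \times 0.3\,\text{s} = 3.04\,\text{s}.
\end{aligned}
```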

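Table 1: comparison with GPD (the baseline 6-DOF grasp planner):

| Category             | 6-DOF GraspNet | GPD |
|----------------------|----------------|-----|
| Box                  | 83%            | 50% |
| Cylinder             | 89%            | 78% |
| Bowl                 | 100%           | 78% |
| Mug                  | 86%            | 6%  |
| Average success rate | 90%            | 52% |
| Overall success rate | 88%            | 47% |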

Interpretation: GraspNet generates diverse grasps, which makes it easy to find one that is kinematically feasible, whereas GPD often fails to produce any feasible grasp. On mugs in particular, GraspNet generates far more grasps along the rim, while GPD cannot generate rim grasps at all and ends up at 6%. Grasps in which the fingers are tangent to the surface are sensitive to execution error, and a diverse candidate set compensates for this.

Critical Discussion

Strengths

  • A three-stage generate-evaluate-refine pipeline. The structure "diversity from the VAE → precision from the evaluator → iterative improvement from the gradient" is clear. The learned grasp sampler and the gradient-based refinement were a first at the time, and the refinement actively pushes false positives toward successful poses.
  • Diversity translates into feasibility. Generating a diverse set instead of a single best grasp raises the chance of finding a grasp that passes kinematic and collision constraints. The mug rim case shows this dramatically (86% vs. 6%).
  • Sim-to-real transfer. Trained purely in simulation and with no additional adaptation step, the method reaches 88% on a real robot, extending to unknown objects without real-world data collection cost.
  • Representing the gripper as a point cloud. The evaluator representation that renders the grasp as X_g and merges it with the object points is shown experimentally to be more accurate than simply feeding the 6D pose as a first-layer feature.

Weaknesses and Limitations

  • Single object, surroundings ignored (acknowledged by the authors). All latents are sampled uniformly and infeasible grasps are removed afterwards by collision and kinematic checks; the sampler and evaluator do not directly account for collisions with surrounding objects. This limits the method in cluttered scenes. The authors list training that accounts for surrounding objects as future work.
  • Locality of refinement. \partial S/\partial g is a local approximation, so updates are limited to 1 cm per step. Grasps that need a large correction take many steps and may get stuck in a poor local optimum (this is speculation).
  • Limited object categories. The training objects are concentrated in six categories (boxes, cylinders, bowls, bottles, mugs), so generalization to objects with very different shapes needs further validation.
  • Inference cost. At 0.3 s per refinement step, a larger number of iterations can be a burden for real-time closed-loop control. The authors mention using the evaluator as a real-time guide as a future direction.

Summary and Conclusion

6-DOF GraspNet is a work from NVIDIA that solves 6-DOF grasp generation for unknown objects with a three-stage pipeline: VAE sampler + grasp evaluator + gradient-based iterative refinement. The VAE samples diversely from the complex, discontinuous distribution of successful grasps (multiple modes such as a mug's rim, handle, and bottom), the evaluator filters out transitional false positives, and the evaluator's gradient pushes grasps toward successful poses to raise precision.

In numbers: trained purely in simulation, the method picks up 17 unknown objects on a 7-DOF Franka with an 88% success rate (90% category average), far ahead of GPD (47%), and the mug result of 86% vs. 6% for GPD in particular demonstrates the value of generating diverse grasps. The ablations confirm the 2D latent, roughly 10 refinement iterations, and the advantage of the VAE sampler (AUC 0.18 vs. 0.07).

From a practical standpoint, the value of this work lies in showing that diverse, executable 6-DOF grasps can be generated and refined by a learned model from a partial point cloud alone, without a complete 3D model. The limitation of handling a single object without its surroundings is clear, but the generate + evaluate + gradient-refine framework is a landmark that became the basis for many later grasping works (for example VCGS, which this post also references).

Copyright 2026, JungYeon Lee