Curieux.JY
  • JungYeon Lee
📃 PokePreNet Review

tactile
transparent
Where Shall I Touch? Vision-Guided Tactile Poking for Transparent Object Grasping
Published

March 30, 2026

  • Paper Link
  • Project Page
  • Code Link
  1. 🤔 Robotic grasping of transparent objects is hard because visual sensing breaks down on them. This paper proposes a new approach, vision-guided tactile poking, to address the problem.
  2. 💡 The framework uses a segmentation network called PokePreNet to predict the best "poking regions" on an object, then obtains an accurate local profile with a GelSight tactile sensor and plans the grasp from it.
  3. 📈 In experiments, the proposed method raised the grasp success rate on transparent objects from 38.9% to 85.2%, and the authors show that it also applies to a range of other challenging objects.

🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

This paper proposes a vision-guided tactile poking framework that overcomes the limits of current vision-based grasping methods when a robot picks up transparent objects. Because transparent objects reflect and refract light, depth sensors cannot measure them reliably, so most existing grasping methods cannot be applied directly. The approach is inspired by how people handle transparent objects: they first take in a coarse profile by eye, then poke the region of interest to obtain a fine profile, and only then grasp.

The proposed framework has three main stages:

  1. Poking Region Segmentation: The network takes an RGB image and predicts the "poking regions" of each transparent object. A poking region is a horizontal top area where a good tactile reading can be obtained while disturbing the object's state as little as possible.
  2. Vision-guided Tactile Poking: Guided by the predicted poking regions, the robot arm pokes the object with a GelSight tactile sensor. The contact yields the object's local profile.
  3. Heuristic Grasp Planning: Using the object profile refined by the tactile information, the system generates a heuristic grasping proposal and grasps the object.

Core Methodology

1. Poking Region Segmentation (PokePreNet)

Poking region segmentation is treated as an instance segmentation problem. The paper proposes PokePreNet, a deep network built on Mask R-CNN, with two main improvements:

  • Larger Output Feature Map: Standard Mask R-CNN outputs 28x28 masks. PokePreNet adds two deconvolutional layers that enlarge the mask to 112x112. Each layer uses a 2x2 filter (S_f), zero padding (d = 0), and stride s = 2, so it doubles the feature map size. The output size S_o follows from the input size S_i as: S_o = s \times (S_i - 1) + S_f - 2 \times d

  • Pixel-level Positive-Negative-balanced Loss (LPN Loss): The averaged binary cross-entropy loss of standard Mask R-CNN struggles here because a poking region covers only a small part of its bounding box, so positive and negative pixels are heavily imbalanced (for example, only 5% of the pixels belong to the poking region). Poking-region pixels then contribute little to the loss and precision drops. PokePreNet therefore uses a Positive-Negative-balanced loss (L_{mask}):

    L_{mask}(X_i) = - \beta_i \sum_{j \in Y^+_i} \log Pr(y_j = 1 | X_i) - \sum_{j \in Y^-_i} \log Pr(y_j = 0 | X_i)

    Here Y^+_i and Y^-_i are the positive and negative ground-truth label sets of the i-th RoI (X_i), and \beta_i is the weight that balances the loss between positive and negative pixels. The initial PN loss set \beta_i = |Y^-_i| / |Y^+_i|, but for the extremely small poking regions in the synthetic dataset \beta_i became very large and false positives increased. The Log-Positive-Negative-balanced (LPN) loss fixes this by applying a logarithm to limit the range of \beta_i:

    \beta_i = \begin{cases} \ln\left(\frac{|Y^-_i|}{|Y^+_i|}\right) & \text{if } |Y^+_i| > 0 \\ 1 & \text{if } |Y^+_i| = 0 \end{cases}

    This works much like hard example mining: the pixels of instances with small poking regions (the hard examples) are learned efficiently.
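As a rough illustration of the LPN weighting described above, the following pure-Python sketch evaluates \beta_i and the mask loss for one RoI from per-pixel probabilities. The function names, the flat-list input format, the `eps` guard, and the fallback when an RoI has no negative pixels are assumptions made for this example; they are not taken from the paper or its code.

```python
import math

def lpn_beta(num_pos: int, num_neg: int) -> float:
    """Log-balanced weight: ln(|Y-| / |Y+|) when positives exist, else 1."""
    # The num_neg == 0 fallback is an assumption; the formula above only
    # specifies the |Y+| = 0 case.
    if num_pos == 0 or num_neg == 0:
        return 1.0
    return math.log(num_neg / num_pos)

def lpn_mask_loss(probs, labels, eps=1e-12):
    """Mask loss for one RoI.

    probs:  predicted Pr(y_j = 1) for each pixel (flat list of floats)
    labels: ground-truth 0/1 label for each pixel (flat list of ints)
    eps:    small constant guarding against log(0) (illustrative choice)
    """
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    beta = lpn_beta(num_pos, num_neg)
    pos_term = sum(math.log(p + eps) for p, y in zip(probs, labels) if y == 1)
    neg_term = sum(math.log(1.0 - p + eps) for p, y in zip(probs, labels) if y == 0)
    return -beta * pos_term - neg_term

# An RoI where 5% of the pixels are poking region: beta = ln(95 / 5) = ln(19).
beta_5_percent = lpn_beta(5, 95)
```

Note that the logarithm becomes negative if positives outnumber negatives in an RoI; the text above does not say how that case is handled, so the sketch leaves it as is.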

2. Vision-guided Tactile Poking

The poking region mask detected by PokePreNet (M_{poking}) is used to generate a poking point in the image frame (P_t = [x_t, y_t]).

  • OpenCV's findContours function finds the external contour of the poking region mask.
  • The fitEllipse function fits an ellipse to the contour and returns its center (P_c).
  • If P_c lies inside the poking region mask (P_c \in M_{poking}), P_c is used as the poking point P_t. (This is the usual case for simply connected masks, where the center lies inside the region, such as the side of a cylindrical object.)
  • If P_c lies outside the poking region mask (for example, a ring-shaped mask), the positive pixel closest to P_c is used as the poking point P_t. (This keeps the GelSight sensor from plunging into the object's interior.)

The robot arm is guided to this poking point and stops as soon as the GelSight sensor touches the object. Contact is detected with a simple image-subtraction algorithm: the element-wise absolute difference between a reference image and the current frame is computed and binarized with a threshold. If the number of positive pixels in the difference frame exceeds a predefined threshold, the frame is treated as contact.
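The contact check described above can be sketched in a few lines. This is a minimal pure-Python version that works on grayscale frames stored as nested lists; the function name and both threshold values are illustrative assumptions, since the review does not give the values the authors used.

```python
def detect_contact(reference, current, pixel_thresh=30, count_thresh=50):
    """Return True when enough pixels differ from the no-contact reference.

    reference, current: 2D lists of grayscale intensities (0-255), same shape.
    pixel_thresh: per-pixel difference needed to count as changed (assumed value).
    count_thresh: number of changed pixels needed to declare contact (assumed value).
    """
    changed = 0
    for ref_row, cur_row in zip(reference, current):
        for r, c in zip(ref_row, cur_row):
            # Element-wise absolute difference followed by binary thresholding.
            if abs(r - c) > pixel_thresh:
                changed += 1
    return changed > count_thresh

# Synthetic example: a flat reference frame and a frame with a 10x10 bright patch.
reference = [[100] * 20 for _ in range(20)]
pressed = [row[:] for row in reference]
for y in range(5, 15):
    for x in range(5, 15):
        pressed[y][x] = 180  # simulated gel deformation

no_touch = detect_contact(reference, reference)
touch = detect_contact(reference, pressed)
```

In a real setup the same logic would normally run on camera frames with an array library; nested lists are used here only to keep the sketch dependency-free.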

3. Heuristic Transparent Object Grasping

From the predicted poking region and the local profile obtained by tactile poking (that is, the contact position), the system generates a heuristic grasp proposal for top-down parallel grasping, G_{hrst} = [x, y, z, w, \theta]. Here [x, y, z] is the grasp center in the world frame, w is the gripper width, and \theta is the orientation about the vertical axis.

  • Case 1: ellipse.centroid in M_{poking} (centroid-based grasp). The poking position P^W_t (world frame) is set equal to P^W_c.
      x, y, z \leftarrow P^W_t
      w \leftarrow \text{maximum gripper width}
      \theta \leftarrow \text{ellipse.rotation angle} (grasping along the short axis of the ellipse)
  • Case 2: ellipse.centroid not in M_{poking} (edge-based or centroid-based grasp). P^W_c (world frame) is computed with the pin-hole camera model.
      D \leftarrow \text{calculateDistance}(P^W_c, P^W_t)
      Angle \leftarrow \text{calculateAngle}(P^W_c, P^W_t)
    If D is larger than half the gripper finger width (the gripper can be inserted into the object), an edge grasp is used:
      x, y, z \leftarrow P^W_t
      w \leftarrow 2 \times D
      \theta \leftarrow Angle (parallel to the vector <P^W_c, P^W_t>)
    Otherwise (the gripper cannot be inserted into the object), a centroid-based grasp is used:
      x, y, z \leftarrow P^W_c
      w \leftarrow \text{maximum gripper width}
      \theta \leftarrow \text{ellipse.rotation angle}
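The two cases above can be written as a small decision function. The sketch below is one possible reading, not the authors' implementation: the `Grasp` dataclass, the argument names, the use of planar (x, y) distance for D, and the units (metres, radians) are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Grasp:
    x: float
    y: float
    z: float
    width: float
    theta: float  # rotation about the vertical axis, in radians (assumed unit)

def heuristic_grasp(p_c, p_t, centroid_in_mask, ellipse_angle,
                    max_width, finger_width):
    """Build G_hrst = [x, y, z, w, theta] from the poke result.

    p_c, p_t: (x, y, z) ellipse center and poking position in the world frame.
    centroid_in_mask: True when the ellipse center lies inside the poking mask.
    ellipse_angle: grasp angle derived from the fitted ellipse (supplied by caller).
    """
    if centroid_in_mask:
        # Case 1: the poke landed on the center, so grasp there at full opening.
        return Grasp(*p_t, max_width, ellipse_angle)
    dx, dy = p_t[0] - p_c[0], p_t[1] - p_c[1]
    d = math.hypot(dx, dy)  # planar distance; using 2D here is an assumption
    if d > finger_width / 2:
        # Case 2, edge grasp: a finger fits inside the opening.
        return Grasp(*p_t, 2 * d, math.atan2(dy, dx))
    # Case 2, fallback: opening too narrow, grasp around the center instead.
    return Grasp(*p_c, max_width, ellipse_angle)

# Example: a cup rim poked 4 cm from the ellipse center, 2 cm wide fingers.
edge = heuristic_grasp((0.0, 0.0, 0.1), (0.04, 0.0, 0.1), False, 0.3, 0.085, 0.02)
```

Keeping the rule as a pure function of the two points makes each branch easy to check in isolation before it is connected to a robot.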

Datasets and Experiments

  • Synthetic dataset: The authors built a high-quality synthetic dataset of more than 9,000 views using Blender's physics engine and the LuxCoreRender rendering engine. The renders reproduce subtle effects of transparent objects such as specular highlights and caustics, and ground truth is generated automatically for RGB images, depth images, surface normals, instance masks, and, most importantly, poking regions. Domain randomisation is used to narrow the sim2real gap.
  • Real-world dataset: 180 real-world images covering 9 transparent plastic and glass objects were collected to evaluate how well PokePreNet generalizes.
  • Results: PokePreNet reached an mAP of 0.360 on the real-world test benchmark. Vision-guided tactile poking achieved a poking success rate of 89.8%, clearly higher than guiding the poke with a bounding box or an instance mask, and raised the final grasp success rate on transparent objects from 38.9% to 85.2%. In a tactile alignment experiment on small objects (for example, a vial), compensating for hand-eye and sensor-to-end-effector calibration errors raised the grasp success rate from 80% to 100%.

This study is the first to successfully combine vision and tactile sensing for transparent object grasping. Because the method is simple, it should transfer to other force or tactile sensors and to grasping other challenging objects.

🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

1. Problem Definition: Why Are Transparent Objects Hard?

What happens if you mount an RGB-D camera on a robot arm and try to grab a glass cup? The camera either barely sees the object or, when it does, returns depth values that are completely wrong.

The problem comes from two optical properties of transparent materials.

① No color or texture features. An opaque object has its own color and surface texture, but glass and clear plastic simply transmit the background. The visual features that a CNN-based detector has learned to rely on are not there.

② Breakdown of the geometric-optics assumption. Active depth sensors (structured light, ToF, or active stereo such as the Intel RealSense) are designed on the assumption that light reflects off the surface it hits. Glass refracts and reflects at the same time, so the sensor's light is redirected unpredictably at a single surface. The depth map ends up with holes or with completely wrong values.

\underbrace{\text{transparent object}}_{\text{refraction, reflection}} \Rightarrow \underbrace{d_{\text{sensor}} \neq d_{\text{true}}}_{\text{depth error}}

Earlier work (ClearGrasp, Dex-NeRF, and others) tried to solve the problem with vision alone. This paper asks a different question.

"How does a person pick up a glass cup?"

A person locates the cup roughly by eye, touches it lightly with a finger to confirm its exact shape, and then picks it up. Vision answers "roughly where is it" and touch answers "exactly what shape is it". The paper implements this division of labor directly on a robot.


2. Core Idea: Vision-Guided Tactile Poking

The full pipeline has three stages.

flowchart LR
    A["RGB image"] --> B["PokePreNet\n(poking location prediction)"]
    B --> C["Poking point generation"]
    C --> D["Robot arm motion\n& GelSight contact"]
    D --> E["Tactile image\n(local shape acquisition)"]
    E --> F["Heuristic Grasp\nplanning & execution"]

2.1 What Is a Poking Region?

The poking region, the paper's central concept, is not just "somewhere on the object". It is defined as the horizontal top area whose surface normal is close to that of the table. The reasons for this condition matter.

  • It gives a good GelSight reading: GelSight captures the sharpest shape when it touches a surface perpendicularly. A horizontal top surface is the easiest area for the robot arm to press straight down on from above.
  • It disturbs the object as little as possible: pushing on the side can tip the object over or slide it. A light tap on the top leaves it in place.

In other words, a poking region is the best contact candidate: high in information and low in disturbance. On a drink bottle it is the cap; on a glass cup it is the rim of the opening.

2.2 PokePreNet: Poking Region Segmentation

Poking region prediction is formulated as an instance segmentation problem. The network is based on Mask R-CNN with two key modifications.

Problem 1: The poking region is very small within its bounding box.

Relative to the bounding box of a glass bottle, the horizontal top surface covers only about 5% of the area. At the mask resolution of standard Mask R-CNN (28x28), such a small region is hard to predict precisely.

→ Solution: add deconvolution layers to the mask head to raise the resolution of the output feature map.
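To make the resolution gain concrete, the short sketch below applies the transposed-convolution size formula quoted in the Ping summary, S_o = s × (S_i − 1) + S_f − 2 × d, with the layer settings given there (2x2 filter, stride 2, zero padding). The function name is made up for this example.

```python
def deconv_output_size(size_in, kernel=2, stride=2, padding=0):
    """Transposed-convolution output size: s * (S_i - 1) + S_f - 2 * d."""
    return stride * (size_in - 1) + kernel - 2 * padding

size = 28  # standard Mask R-CNN mask resolution
sizes = [size]
for _ in range(2):  # two additional deconvolution layers
    size = deconv_output_size(size)
    sizes.append(size)

print(sizes)  # [28, 56, 112]
```

Each layer doubles the side length, so two layers give 16 times as many mask pixels for the same RoI, which is what lets a region covering about 5% of the box be resolved.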

Problem 2: Class imbalance.

If positive (poking region) pixels make up only 5% of the pixels in a bounding box, standard binary cross-entropy is dominated by the negative pixels and training drifts toward ignoring the poking region.

→ Solution: design a Positive-Negative balanced (PN) loss.

\mathcal{L}_{\text{mask}}(X_i) = -\beta_i \sum_{j \in \mathcal{Y}^+_i} \log \Pr(y_j = 1 | X_i) - \sum_{j \in \mathcal{Y}^-_i} \log \Pr(y_j = 0 | X_i)

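Here \beta_i = |\mathcal{Y}^-_i| / |\mathcal{Y}^+_i|, so the per-instance ratio of negative to positive pixels is used as the weight on the positive term. The fewer poking-region pixels an instance has, the larger their contribution to the loss automatically becomes. As noted in the Ping summary, the final LPN variant takes the natural logarithm of this ratio so that \beta_i stays bounded for very small regions.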
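As a quick worked example of the weighting, take an RoI with N mask pixels of which 5% are poking region (the figure used above), and, purely for illustration, a second RoI with only 0.5% positives:

```latex
\beta_i^{\text{PN}} = \frac{|\mathcal{Y}^-_i|}{|\mathcal{Y}^+_i|}
                    = \frac{0.95\,N}{0.05\,N} = 19,
\qquad
\beta_i^{\text{LPN}} = \ln 19 \approx 2.94

\beta_i^{\text{PN}} = \frac{0.995\,N}{0.005\,N} = 199,
\qquad
\beta_i^{\text{LPN}} = \ln 199 \approx 5.29
```

Shrinking the region by a factor of ten multiplies the plain ratio by roughly ten, while the logarithmic weight less than doubles. This is consistent with the stated motivation for the log: very small regions no longer dominate the loss and push up false positives.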

2.3 Poking Point Generation

How is the actual contact point P_t computed from the predicted poking region mask?

First, OpenCV's findContours followed by fitEllipse fits an ellipse to the mask's outer contour and gives its center P_c. Two cases then follow from the shape of the mask.

  • Simply connected region (for example, a round cap): P_t = P_c (the ellipse center itself)
  • Ring-shaped region (for example, a cup rim): P_t is the positive poking-region pixel closest to P_c, because poking at the center would send the GelSight into the cup.
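The case split above reduces to a membership test plus a nearest-pixel search. The sketch below assumes the ellipse center has already been computed (in the pipeline described here it would come from OpenCV's fitEllipse) and represents the mask as a nested list; the function name and data layout are illustrative assumptions.

```python
def select_poking_point(mask, center):
    """Pick the poking point P_t in image coordinates.

    mask:   2D list of 0/1 values, indexed as mask[row][col]
    center: (x, y) ellipse center P_c in pixels
    Returns (x, y), or None if the mask has no positive pixel.
    """
    cx, cy = center
    col, row = int(round(cx)), int(round(cy))
    inside = 0 <= row < len(mask) and 0 <= col < len(mask[0])
    if inside and mask[row][col]:
        return (col, row)  # simply connected case: poke the center itself
    # Ring-like case: fall back to the positive pixel closest to the center.
    positives = [(x, y) for y, r in enumerate(mask) for x, v in enumerate(r) if v]
    if not positives:
        return None
    return min(positives, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)

# A filled 5x5 disc-like mask and the same mask with a hole at the center.
disc = [[1] * 5 for _ in range(5)]
ring = [row[:] for row in disc]
ring[2][2] = 0

p_disc = select_poking_point(disc, (2.0, 2.0))
p_ring = select_poking_point(ring, (2.0, 2.0))
```

For the ring mask the result is one of the four pixels adjacent to the hole; which one is returned depends only on iteration order, since they are equally close.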

2.4 Heuristic Grasp Generation

A five-dimensional grasp vector is generated from the local shape obtained by touch (the contact position P^W_t) and the predicted poking region.

G_{\text{hrst}} = [x,\; y,\; z,\; w,\; \theta]

[x, y, z] is the grasp center, w is the gripper width, and \theta is the orientation. Again there are two cases.

  • P_c inside the poking region: centroid grasp, suited to cylindrical or box-shaped objects
  • P_c outside the poking region: if the distance D(P^W_c, P^W_t) is larger than half the gripper finger width, an edge grasp is used

The heuristic decides the grasp by geometric reasoning alone, with no complex learning involved. That is both the strength and the limitation of the system.
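The Ping summary mentions that P^W_c is obtained with a pin-hole camera model. As a reminder of what that step involves, here is a minimal back-projection sketch. The function name and the intrinsics in the example are hypothetical, the depth is assumed to be known (for instance from the contact height found by poking; the review does not say how the paper obtains it), and the final camera-to-world transform from hand-eye calibration is left out.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) at a known depth to a 3D point in the camera frame.

    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    depth:  distance along the optical axis, in metres (assumed known).
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Hypothetical intrinsics for a 640x480 image.
fx = fy = 600.0
cx, cy = 320.0, 240.0

on_axis = backproject(320, 240, 0.5, fx, fy, cx, cy)
off_axis = backproject(620, 240, 0.5, fx, fy, cx, cy)
```

A pixel at the principal point maps onto the optical axis, and a pixel 300 columns to the right at 0.5 m depth maps to 0.25 m sideways, which is the kind of metric offset the distance D in the grasp rule depends on.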


3. Dataset: Sim-to-Real Synthetic Data

Labeling poking regions by hand on real transparent objects is difficult and slow. The paper solves this with photorealistic synthetic data rendered in Blender.

Composition:
  • More than 9,000 RGB images with poking region mask annotations
  • Randomized HDRI environment maps, background textures, and camera angles (domain randomization)
  • A variety of shapes such as glass cups, plastic bottles, and beakers

The notable point is that the model is trained on synthetic data only and then tested directly in the real world. To narrow the Sim-to-Real gap, the authors invested in rendering quality and environmental diversity.

The synthetic data rendering code is also released in a separate repository (TransparentObjectRender).


4. Experimental Results

4.1 Hardware Setup

Component | Specification
Robot arm | UR5
Gripper | Robotiq 2-finger
Tactile sensor | GelSight (high-resolution, optical)
Depth camera | Intel RealSense D415/D435
Calibration method | Tsai hand-eye calibration

4.2 PokePreNet Evaluation

Method | mAP
Standard Cross-Entropy Loss (Mask R-CNN) | 0.319
PN Loss (PokePreNet) | 0.360

This is a relative improvement of about 13% (0.319 to 0.360). It also confirms that a model trained only on synthetic data generalizes to the real environment.

4.3 Poking Success Rate Comparison

Poking location guidance | Poking success rate
Bounding box center | 78.4%
Instance mask center | 84.3%
PokePreNet poking region center (PN Loss) | 89.8%

A poke counts as successful when the GelSight actually makes correct contact with the object and returns a valid tactile image.

4.4 Final Grasp Success Rate (Key Result)

Method | Grasp success rate
Vision-based direct grasping (ClearGrasp, etc.) | 38.9%
Vision-guided tactile poking (proposed) | 85.2%

The success rate improves by 46.3 percentage points. This is the headline number of the paper: transparent object grasping, where purely vision-based methods struggle at under 40%, exceeds 85% with the addition of a single tactile poke.


5. Strengths

① An elegant problem formulation. The "poking region" concept itself is clever. Rather than "touch it anywhere", the idea is to predict from vision the location that jointly optimizes information gain (a good tactile reading) and side effects (minimal disturbance of the object). It is both intuitive and practical.

② Successful Sim-to-Real with synthetic data. The difficulty of labeling real transparent objects is sidestepped with synthetic data, and domain randomization delivers generalization to the real environment. Releasing the data pipeline as open source is a further strength.

③ Modularity and extensibility. The design is not specific to GelSight and can be applied to other optical tactile sensors such as GelTip and TacTip. The poking region prediction module and the grasp planning module are separate, so each can be replaced independently.

④ Reproducibility. The code, the synthetic data renderer, and the pretrained models are all public. Unusually for a T-Mech-level paper, a complete reproduction environment is provided.


6. Limitations and Critical Analysis

① Latency of the sequential pipeline. The sequence of visual prediction → poking motion → tactile acquisition → grasp planning takes time at every step. The poke itself is a physical motion, so it becomes the bottleneck in real-time applications. The paper reports no timing measurements for this.

② Objects without a poking region. For objects such as a flat glass plate, where a horizontal top surface is missing or very small, a poking region is hard to define. All test objects in the paper are cups, bottles, beakers, and similar shapes with a clear top surface.

③ Simplicity of the heuristic grasp. Because grasp planning rests on a geometric heuristic, it may fail to find the best grasp pose for complex shapes (asymmetric objects, objects with handles, and so on). Combining it with learning-based grasp planning would make it more robust.

④ Limits of the gripper-based experiments. The experiments are confined to the UR5 + 2-finger gripper combination. With a multi-finger dexterous hand, the fine-grained grasp strategy after the poke matters more, and the paper does not explore that direction. This is currently the largest research gap and the main opportunity for extension.

⑤ A single poke. The grasp decision is made from one poke per object. For complex shapes, several strategically chosen pokes could provide better information. TransTouch (IROS 2023) optimizes this "where to touch next" question with a utility function.


7. Connections to Follow-up Work

This paper is the starting point of the "transparent objects + touch" line of research. Later work addresses its limitations from different directions.

flowchart TD
    A["Where Shall I Touch?\n(T-Mech 2022)\n• Predicts the poking location from vision\n• UR5 + 2-finger + GelSight"] --> B
    A --> C
    A --> D

    B["TransTouch (IROS 2023)\n• Selects the best poking location\n  with a utility function\n• Corrects the stereo network itself\n  using touch"]

    C["Visual-Tactile Fusion\n(T-RO 2023)\n• Cluttered backgrounds, underwater settings\n• Visual-tactile fusion classification\n• TaTa soft gripper"]

    D["TEVG (IEEE 2025)\n• Uncertainty in weight and resting pose\n• Strengthens vision with touch"]

9. Summary

Item | Content
Key contribution | A vision-guided tactile poking framework for transparent object grasping
Methodology | PokePreNet (PN Loss + high-resolution mask) → GelSight poking → Heuristic Grasp
Data | 9,000+ Blender synthetic images, Sim-to-Real
Key result | Grasp success rate 38.9% → 85.2% (+46.3 percentage points)
Strengths | Elegant problem formulation, modularity, fully open source
Limitations | Sequential latency, limited to one gripper, simple grasp planning
Research gap | Extension to multi-finger hands remains unexplored

Copyright 2026, JungYeon Lee