Curieux.JY
  • JungYeon Lee


📃 Neural feels with neural fields Review

paper
tactile
sdf
Visuo-tactile perception for in-hand manipulation
Published

June 4, 2025

🔍 Ping. 🔔 Ring. ⛏️ Dig. A tiered review series: quick look, key ideas, deep dive.

  • Paper link (arXiv)
  • Code
  1. Existing in-hand manipulation systems rely on vision alone. This makes them vulnerable to occlusion and leaves them with poor spatial awareness of novel objects. NeuralFeels addresses this with a visuo-tactile perception method that integrates vision and touch.
  2. The method learns a neural field online and tracks the object through pose graph optimization. It uses vision-based tactile sensors as a source of local depth information.
  3. On novel objects, NeuralFeels reaches a final reconstruction F-score of 81% with an average pose drift of 4.7 mm. Under heavy visual occlusion, it improves tracking by up to 94% over vision-only perception, demonstrating the value of touch.


1 🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

This paper, titled "Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation," addresses visuo-tactile perception for robotic in-hand manipulation of novel objects. Its goal is the spatial awareness needed for human-level dexterity. The authors point out three problems with existing in-hand perception systems: they rely on vision alone, they are limited to tracking objects known a priori, and they are vulnerable to visual occlusion during manipulation.

To overcome these limitations, the paper proposes a new method called NeuralFeels. NeuralFeels combines vision and tactile sensing on a multi-fingered robot hand to estimate the object's pose and shape during manipulation. The core methodology is as follows.

  1. Online neural field learning: the object's geometry is learned online as a neural field. The neural field is a continuous function F_{\theta, \mathbf{x}_t}(p): \mathbb{R}^3 \to \mathbb{R}. It takes a 3D coordinate p \in \mathbb{R}^3 and outputs the signed distance from that coordinate to the object surface. Here \theta denotes the network weights and \mathbf{x}_t the object pose at time t. Like instant-NGP [49], the field is built on a multiresolution hash table, which enables fast training and optimization.

  2. Pose graph optimization: the learned neural field is used to track the object's pose at the same time. As in SLAM (Simultaneous Localization and Mapping), the system alternates between tracking and mapping.

    • Frontend: processes visual data from the robot's RGB-D cameras and tactile data from the DIGIT tactile sensors to extract depth measurements.
      • Visual depth segmentation: the Segment Anything Model (SAM) [36] and the robot's kinematics are used to robustly segment the object's depth pixels. The grasp center and robot kinematics serve as SAM prompts. Assuming that the object lies between the robot's fingers yields accurate segmentation even during occluded interactions.
      • Tactile Transformer: predicts contact depth from the DIGIT sensor's RGB image. It is a vision transformer [58] model trained on a large corpus of tactile images generated in the TACTO [78] simulator. The model generalizes to a variety of real DIGIT sensors. For sim-to-real transfer, training randomizes the sensor's LED lighting, indentation depth, pixel noise, and similar factors.
    • Backend: builds the object model online from the frontend's depth measurements and the sensor poses.
      • Shape optimizer: following an online learning approach [69, 52], the neural field weights \theta are optimized while the current pose estimates \bar{\mathbf{x}}_t are held fixed. The optimization runs over a set of keyframes K selected at regular intervals. The SDF loss L_{\text{shape}} = L_f + w_{\text{tr}}L_{\text{tr}} uses both surface pixels and free-space pixels to update the network weights. Here L_f is the loss on free-space pixels and L_{\text{tr}} is the truncated SDF loss.
      • Pose optimizer: with the network weights \bar{\theta} held fixed, the object poses \mathbf{x}_t are optimized over a pose graph [13] with a sliding window of size n. This is formulated as a nonlinear least-squares problem and solved with the Levenberg–Marquardt (LM) solver in Theseus [55]. The loss is L_{\text{pose}} = w_{\text{sdf}}L_{\text{sdf}} + w_{\text{reg}}L_{\text{reg}} + w_{\text{icp}}L_{\text{icp}}, where:
        • L_{\text{sdf}}: the SDF loss on the surface point of each ray.
        • L_{\text{reg}}: a weak regularizer between consecutive keyframe poses.
        • L_{\text{icp}}: an Iterative Closest Point (ICP) loss between the current visuo-tactile point cloud and the previous one.

The paper evaluates NeuralFeels in a total of 70 experiments on a variety of objects in simulation and the real world. The results are as follows.

  • SLAM performance (novel objects): the final reconstruction F-score averaged 81%, and pose drift was 4.7 mm, indicating stable tracking. Adding touch improved reconstruction accuracy by 15.3% and pose tracking accuracy by 21.3% in simulation. In the real world, the improvements were 14.6% and 26.6% respectively. Tracking failures also dropped sharply compared with vision-only perception (e.g., in simulation, 153 failures for vision-only versus 5 for NeuralFeels).
  • Pose tracking (known objects): when a CAD model is given, touch further refines the pose estimates and reduces the average pose error to 2.3 mm. The average pose error decreased by 22.29% in simulation and by 3.9% in the real world.
  • Performance under occlusion and sensing noise: under heavy visual occlusion, integrating touch improved tracking by up to 94%. Touch also substantially reduced pose tracking error when the visual depth data was very noisy.

NeuralFeels achieves robust object-centric SLAM through interaction. It exploits rich sensory data to overcome the limits of vision-only approaches. Touch disambiguates noisy frontend estimates and supplies critical information under occlusion, which improves both the completeness and the precision of the reconstruction. This highlights that the visual and tactile modalities complement each other. It also shows that a modular combination of online learning and pretrained models can perform strongly despite little training data.


2 🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

2.1 Introduction: Why Can't Robots Handle Objects Properly Yet?

2.1.1 The Essence of the Problem

Close your eyes for a moment and take the keys out of your pocket. Isn't that remarkable? Without looking, you picked out exactly the keys from among dozens of items, turned them the way you wanted, and pulled them out. Throughout the process, your fingers continuously "feel" the object's shape, position, and orientation.

To borrow the kind of question Richard Feynman liked to ask when explaining physics: "Why can't a robot do this?"

The core of the problem is simple:

  1. Limits of vision: once a hand grasps an object, the most important region (where the object meets the fingers) is hidden behind the fingers.
  2. Need for prior knowledge: most manipulation systems today can handle only objects known in advance (with CAD models).
  3. Reliance on a single sense: most systems use either vision or touch, not both.

2.1.2 Enter NeuralFeels

NeuralFeels, published by researchers from Meta AI (FAIR), CMU, and UC Berkeley, offers an elegant answer to this problem. The core idea is remarkably intuitive:

"Use vision and touch together, as humans do, and learn even unknown objects while touching them."

The paper was published in Science Robotics (2024). It proposes a system in which a robot hand grasps and rotates an object it has never seen before, while simultaneously performing:

  • 6-DoF pose tracking
  • 3D shape reconstruction

All of this runs online, while the object is being manipulated.

2.1.3 Summary of Key Contributions

| Contribution area | Details |
|---|---|
| System | Object-centric SLAM fusing vision, touch, and proprioception |
| Representation | A neural SDF learned online to encode object shape |
| Algorithm | Joint shape and pose estimation via pose graph optimization |
| Learning | A Tactile Transformer trained only in simulation that generalizes to real sensors |
| Dataset | Release of the FeelSight benchmark with 70 experimental sequences |

2.2 Background: What Is a Neural Field?

To understand NeuralFeels, you first need the concept of a neural field. Don't worry: it is more intuitive than it sounds.

2.2.1 Intuition Behind the Signed Distance Function (SDF)

Given a point \mathbf{x} = (x, y, z) in 3D space, consider a function that returns the distance from that point to the nearest object surface. It carries a sign that says whether the point is inside or outside:

\text{SDF}(\mathbf{x}) = \begin{cases} +d(\mathbf{x}) & \text{outside the object} \\ 0 & \text{on the surface} \\ -d(\mathbf{x}) & \text{inside the object} \end{cases}

Here d(\mathbf{x}) \geq 0 is the unsigned distance to the nearest surface point. This is like reading contour lines on a topographic map: the zero level set, where the SDF equals 0, is exactly the object's surface.
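To make the sign convention concrete, here is a minimal NumPy sketch using the analytic SDF of a sphere: the distance to the center minus the radius. The sphere is our illustrative choice; NeuralFeels learns the SDF rather than using a closed form.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance from each row of `points` (N x 3) to a sphere's surface:
    positive outside, zero on the surface, negative inside."""
    return np.linalg.norm(points - center, axis=1) - radius

queries = np.array([
    [2.0, 0.0, 0.0],   # outside, 1.0 away from the surface
    [1.0, 0.0, 0.0],   # exactly on the surface
    [0.0, 0.25, 0.0],  # inside, 0.75 beneath the surface
])
sdf = sphere_sdf(queries, center=np.zeros(3), radius=1.0)  # values: 1.0, 0.0, -0.75
```

The object's surface is the set of points where `sphere_sdf` returns 0, which is the zero level set described above.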

2.2.2 Neural SDF: Learning an SDF with an MLP

Traditionally, an SDF is stored in a voxel grid. But with N voxels per side, memory grows as O(N^3), so raising the resolution quickly becomes prohibitive.

A neural SDF takes a different approach:

f_\theta : \mathbb{R}^3 \rightarrow \mathbb{R}

A small neural network f_\theta takes a coordinate \mathbf{x} as input and outputs its SDF value. The network weights \theta themselves encode the object's shape.

2.2.3 Instant-NGP: Why Is It Fast?

The key to NeuralFeels running online is NVIDIA's Instant-NGP architecture.

2.2.3.1 Multi-Resolution Hash Encoding

A plain MLP struggles to learn high-frequency detail. Positional encoding (Fourier features) helps, but training remains slow.

Instant-NGP's solution:

input coordinate x → [hash table lookup] → multiresolution feature vector → small MLP → SDF value
Tip (analogy): finding a book in a library

A conventional MLP is like "memorizing the contents of every book." Instant-NGP is like "building an index (the hash table) to quickly find just the pages you need."

The multiresolution grids are stored in hash tables. At each resolution, feature vectors are interpolated from the surrounding grid vertices, and the results are concatenated across resolutions. This gives:

  • Training speed: a high-quality SDF in tens of seconds
  • Memory efficiency: hash collisions are allowed, and the multiresolution structure resolves the resulting ambiguity
  • Query speed: mesh extraction in a few hundred milliseconds
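The encoding step can be sketched in NumPy as below. This is a simplified illustration, not the optimized CUDA implementation NeuralFeels relies on:

- Level count, table size, base resolution, growth factor, and feature width are illustrative values.
- The XOR-of-primes spatial hash follows the Instant-NGP paper.
- It always hashes, whereas Instant-NGP indexes coarse levels directly when they fit in the table.
- There is no MLP and no training.

```python
import numpy as np

# Per-axis primes used by the Instant-NGP spatial hash (the first axis uses 1).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)


class HashEncoding:
    """Toy multiresolution hash encoding for 3D points in [0, 1]^3 (no training)."""

    def __init__(self, n_levels=4, table_size=2**14, n_features=2,
                 base_res=16, growth=1.5, seed=0):
        rng = np.random.default_rng(seed)
        self.table_size = table_size
        # Grid resolution grows geometrically from coarse to fine levels.
        self.resolutions = [int(base_res * growth**level) for level in range(n_levels)]
        # One feature table per level; in a real model these entries are learned.
        self.tables = [rng.uniform(-1e-4, 1e-4, size=(table_size, n_features))
                       for _ in range(n_levels)]

    def _hash(self, corners):
        """Map integer grid corners (N x 3) to table indices (N,)."""
        c = corners.astype(np.uint64) * PRIMES  # array arithmetic wraps modulo 2**64
        h = c[:, 0] ^ c[:, 1] ^ c[:, 2]
        return (h % np.uint64(self.table_size)).astype(np.int64)

    def __call__(self, x):
        """Encode points x (N x 3) into features (N x n_levels*n_features)."""
        per_level = []
        for res, table in zip(self.resolutions, self.tables):
            scaled = x * res
            base = np.floor(scaled).astype(np.int64)  # lower corner of the voxel
            frac = scaled - base                      # position inside the voxel
            feat = np.zeros((x.shape[0], table.shape[1]))
            # Trilinear interpolation of the features stored at the 8 voxel corners.
            for dx in (0, 1):
                for dy in (0, 1):
                    for dz in (0, 1):
                        offset = np.array([dx, dy, dz])
                        idx = self._hash(base + offset)
                        w = np.prod(np.where(offset == 1, frac, 1.0 - frac), axis=1)
                        feat += w[:, None] * table[idx]
            per_level.append(feat)
        # Concatenate all levels; a small MLP would map this vector to an SDF value.
        return np.concatenate(per_level, axis=1)


enc = HashEncoding()
pts = np.random.default_rng(1).uniform(0.0, 1.0, size=(5, 3))
features = enc(pts)
print(features.shape)  # (5, 8)
```

Each level contributes `n_features` values, so 4 levels × 2 features give an 8-dimensional input for the MLP. Lookups cost O(levels × 8) regardless of scene size, which is where the speed comes from.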

2.3 ๋ฐฉ๋ฒ•๋ก : NeuralFeels ํŒŒ์ดํ”„๋ผ์ธ ์ƒ์„ธ ๋ถ„์„

2.3.1 ์ „์ฒด ์‹œ์Šคํ…œ ์•„ํ‚คํ…์ฒ˜

flowchart TB
    subgraph Sensors["🤖 Sensor inputs"]
        CAM[RGB-D camera<br/>RealSense D435]
        TAC[DIGIT tactile sensors<br/>x4 fingers]
        PROP[Joint encoders<br/>16D joints]
    end
    
    subgraph Frontend["⚙️ Frontend processing"]
        SAM[SAM segmentation<br/>+ Embodied Prompts]
        TT[Tactile Transformer<br/>ViT-based depth prediction]
        FK[Forward Kinematics<br/>tactile sensor poses]
    end
    
    subgraph Backend["🧠 Backend optimization"]
        SDF[Neural SDF<br/>Instant-NGP]
        PG[Pose Graph<br/>Sliding Window]
    end
    
    subgraph Output["📤 Output"]
        POSE[6-DoF object pose]
        MESH[Reconstructed 3D mesh]
    end
    
    CAM --> SAM
    TAC --> TT
    PROP --> FK
    
    SAM --> |visual depth| Backend
    TT --> |tactile depth| Backend
    FK --> |sensor poses| Backend
    
    SDF <--> |alternating optimization| PG
    
    Backend --> POSE
    Backend --> MESH
    
    style Frontend fill:#e1f5fe
    style Backend fill:#fff3e0
    style Sensors fill:#f3e5f5
    style Output fill:#e8f5e9
Figure 1: NeuralFeels system architecture

The system consists of a frontend and a backend. It resembles a traditional visual SLAM system, with tactile information added.

2.3.2 Frontend: From Sensor Data to Depth Maps

2.3.2.1 Visual Processing: SAM + Embodied Prompts

How do you segment the object when the hand occludes it?

NeuralFeels uses the Segment Anything Model (SAM) together with a clever trick called "embodied prompts":

  1. Compute the 3D position of each fingertip (tactile sensor) with forward kinematics
  2. Project these positions into the camera image
  3. Use the projected points as point prompts for SAM
  4. Also add the grasp center, telling SAM that "the region around these points is the object"

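\mathbf{p}_i^{\text{2D}} = \pi\left( \mathbf{K} \, \mathbf{T}_{c \leftarrow h} \, \text{FK}(q_t, i) \right)

where:

  • \mathbf{K}: camera intrinsic matrix
  • \mathbf{T}_{c \leftarrow h}: hand-to-camera transform (applied to the point in homogeneous coordinates)
  • \pi(\cdot): dehomogenization, (a, b, c) \mapsto (a/c, b/c), which turns the result into pixel coordinates
  • \text{FK}(q_t, i): 3D position of the i-th fingertip in the hand frame, computed from the joint angles q_t at time t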
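Here is a minimal NumPy sketch of this projection. The inputs are hypothetical placeholders:

- The intrinsics `K`, the transform `T_c_h`, and the fingertip positions are made-up numbers standing in for calibration and forward-kinematics outputs.
- Taking the grasp center as the mean fingertip position is our simplification, not necessarily the paper's definition.

```python
import numpy as np

def project_points(K, T_c_h, pts_hand):
    """Project 3D points in the hand frame (N x 3) to pixel coordinates (N x 2).

    K: 3x3 camera intrinsics; T_c_h: 4x4 transform from hand frame to camera frame.
    """
    pts_h = np.hstack([pts_hand, np.ones((pts_hand.shape[0], 1))])  # homogeneous
    pts_cam = (T_c_h @ pts_h.T).T[:, :3]                              # camera frame
    uvw = (K @ pts_cam.T).T
    return uvw[:, :2] / uvw[:, 2:3]                                   # pi: divide by depth

# Hypothetical calibration: 640x480 camera, hand origin 0.5 m in front of it.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
T_c_h = np.eye(4)
T_c_h[2, 3] = 0.5

# Stand-ins for FK(q_t, i), i = 1..4, in meters in the hand frame.
fingertips = np.array([[0.05, 0.0, 0.0],
                       [-0.05, 0.0, 0.0],
                       [0.0, 0.05, 0.0],
                       [0.0, -0.05, 0.0]])

prompts = project_points(K, T_c_h, fingertips)
grasp_center = project_points(K, T_c_h, fingertips.mean(axis=0, keepdims=True))[0]
```

These pixel coordinates could then be passed as point prompts, for example through the `point_coords` and `point_labels` arguments of `SamPredictor.predict` in the `segment-anything` package. How NeuralFeels labels and filters its prompts is not reproduced here.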

2.3.2.2 Tactile Processing: Tactile Transformer

DIGIT is a vision-based tactile sensor. A camera sits behind a transparent gel pad and "sees" how the pad deforms.

The question is: how do you estimate contact depth from this RGB image?

Earlier methods used photometric stereo or CNNs. NeuralFeels adopts a Vision Transformer (ViT) architecture:

flowchart LR
    subgraph Input["Input"]
        IMG["DIGIT RGB image<br/>(240×320)"]
    end
    
    subgraph Encoder["ViT encoder"]
        PATCH["Patch embedding<br/>16×16 patches"]
        TRANS["Transformer Blocks<br/>12 layers"]
    end
    
    subgraph Decoder["Decoder"]
        REASSEMBLE["Feature Reassembly"]
        FUSION["Multi-scale Fusion"]
    end
    
    subgraph Output["Output"]
        DEPTH["Contact depth map"]
        MASK["Contact mask"]
    end
    
    IMG --> PATCH --> TRANS --> REASSEMBLE --> FUSION --> DEPTH
    FUSION --> MASK
    
    style Encoder fill:#e3f2fd
    style Decoder fill:#fce4ec
Figure 2: Tactile Transformer architecture

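Key design decisions:

| Component | Choice | Rationale / details |
|---|---|---|
| Architecture | Dense ViT (DPT-based) | Strong at high-resolution depth prediction |
| Training data | TACTO simulator | 40 YCB objects × 10,000 contacts |
| Domain adaptation | Randomization (LED, depth, noise) | Sim-to-real transfer |
| Parameter count | 21.7M | Lightweight (compared with CNNs) |

Loss function:

\mathcal{L}_{\text{tactile}} = \frac{1}{N} \sum_{i} \| \hat{D}_i - D_i^{\text{GT}} \|_2^2

Here \hat{D}_i is the predicted depth map for the i-th training image, D_i^{\text{GT}} is its simulated ground-truth depth map, and N is the number of images.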

2.3.3 Backend: Neural SLAM

The backend alternates between two optimizations:

  1. Map optimizer: updates the neural SDF weights \theta
  2. Pose optimizer: updates the object poses \{x_t\}

This is similar in spirit to the EM algorithm: "If you know the shape, estimating the pose is easy; if you know the pose, estimating the shape is easy."
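A toy 2D analogue makes the alternation concrete. This is our own illustration, not NeuralFeels code. The "shape" is the radius r of a circle in its own frame, the "pose" is its center c, and the observations are points on the circle's boundary.

- Map step: with c fixed, the least-squares radius has a closed form (the mean distance).
- Pose step: with r fixed, we set the gradient of the squared SDF residual with respect to c to zero and apply the resulting fixed-point update once.

```python
import numpy as np

# Observed boundary points of an "unknown" circle: true r = 2, true c = (0.5, -0.3).
angles = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
true_c, true_r = np.array([0.5, -0.3]), 2.0
obs = true_c + true_r * np.stack([np.cos(angles), np.sin(angles)], axis=1)

c = np.zeros(2)  # initial pose guess
for _ in range(60):
    # Map step (pose fixed): argmin_r sum (||p - c|| - r)^2 is the mean distance.
    r = np.linalg.norm(obs - c, axis=1).mean()
    # Pose step (shape fixed): the gradient of sum (||p - c|| - r)^2 w.r.t. c
    # vanishes when c = mean(p) + r * mean((c - p) / ||c - p||); iterate it once.
    diff = c - obs
    dirs = diff / np.linalg.norm(diff, axis=1, keepdims=True)
    c = obs.mean(axis=0) + r * dirs.mean(axis=0)

# r converges toward 2.0 and c toward (0.5, -0.3)
```

Neither step alone recovers the circle from a wrong initial guess. Alternating them lets each estimate improve the other, which is the intuition behind the map/pose split.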

2.3.3.1 Map Optimizer: Learning the SDF

When the visual and tactile depth maps for a frame arrive, the map optimizer does the following:

  1. Back-project the depth pixels into 3D points
  2. Transform them into the object frame (using the current pose estimate \hat{x}_t)
  3. Sample points along the camera rays
  4. Train the neural SDF with a truncated SDF loss

SDF loss function:

For a sample \mathbf{p} on the ray \mathbf{r}(u) = \mathbf{o} + u \cdot \mathbf{d}:

\mathcal{L}_{\text{SDF}}(\theta) = \begin{cases} |f_\theta(\mathbf{p}) - d_{\text{surf}}| & \text{if } |d_{\text{surf}}| < \tau \\ \text{free-space loss} & \text{otherwise} \end{cases}

Here d_{\text{surf}} is the signed distance from \mathbf{p} to the measured surface point along the ray, and \tau is the truncation distance. Samples far from the surface (|d_{\text{surf}}| \geq \tau) receive only a free-space loss, because their exact SDF is not known from a single ray.
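Steps 1 and 3 and the piecewise loss above can be sketched for one depth image as follows. This is a simplified illustration under our own assumptions:

- pinhole intrinsics;
- uniform samples along each ray;
- the projective approximation d_surf ≈ D − u, with u measured along the camera z-axis;
- an L1 term inside the truncation band;
- a hinge-style free-space term that keeps predictions between 0 and the projective bound.

Step 2, the transform into the object frame, is omitted. NeuralFeels' actual sampling and loss follow iSDF and differ in detail.

```python
import numpy as np

def backproject(depth, K):
    """Valid depth pixels -> ray directions (z component = 1) and measured depths."""
    v, u = np.nonzero(depth > 0)
    pix = np.stack([u, v, np.ones_like(u)], axis=1).astype(float)
    dirs = (np.linalg.inv(K) @ pix.T).T  # K^{-1} [u, v, 1]^T has z = 1
    return dirs, depth[v, u]

def sample_rays(dirs, D, n_samples, tau, near=0.05, seed=0):
    """Sample depths z in [near, D + tau) along each ray, plus their SDF targets."""
    rng = np.random.default_rng(seed)
    frac = rng.uniform(0.0, 1.0, size=(len(D), n_samples))
    z = near + frac * (D[:, None] + tau - near)  # ray parameter u of each sample
    pts = dirs[:, None, :] * z[..., None]        # (rays, samples, 3), camera frame
    d_surf = D[:, None] - z                      # projective distance to the surface
    return pts, d_surf

def sdf_loss(pred, d_surf, tau):
    """L1 inside the truncation band; free-space hinge elsewhere (illustrative)."""
    band = np.abs(d_surf) < tau
    surf = np.abs(pred - d_surf)
    # Free space: the true SDF is positive and at most the projective distance.
    free = np.maximum(pred - d_surf, 0.0) + np.maximum(-pred, 0.0)
    return np.where(band, surf, free).mean()

K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
depth = np.full((4, 4), 0.5)  # a flat wall 0.5 m in front of the camera
dirs, D = backproject(depth, K)
pts, d_surf = sample_rays(dirs, D, n_samples=8, tau=0.02)
# For a wall facing the camera the true SDF is exactly 0.5 - z, i.e. d_surf.
print(sdf_loss(d_surf, d_surf, tau=0.02))  # 0.0
```

In a real map optimizer, `pred` would come from the network f_\theta evaluated at the sampled points after mapping them into the object frame, and the loss would be minimized over \theta.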

Keyframe-based training:

To save memory and prevent catastrophic forgetting:

  • Keep the K most recent keyframes
  • At each optimization step, replay past keyframes alongside the current frame
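A minimal sketch of such a buffer. `KeyframeBuffer` is a hypothetical helper that keeps the K most recent keyframes and replays a random subset with each new frame. How NeuralFeels actually selects keyframes is not reproduced here.

```python
import random
from collections import deque


class KeyframeBuffer:
    """Hypothetical keyframe store: keep the last K keyframes and replay them."""

    def __init__(self, max_keyframes, n_replay, seed=0):
        self.keyframes = deque(maxlen=max_keyframes)  # oldest entry dropped automatically
        self.n_replay = n_replay
        self.rng = random.Random(seed)

    def add(self, frame):
        self.keyframes.append(frame)

    def batch(self, current_frame):
        """Current frame plus up to n_replay randomly chosen past keyframes."""
        k = min(self.n_replay, len(self.keyframes))
        return [current_frame] + self.rng.sample(list(self.keyframes), k)


buf = KeyframeBuffer(max_keyframes=3, n_replay=2)
for i in range(5):
    buf.add(f"kf{i}")
print(list(buf.keyframes))       # ['kf2', 'kf3', 'kf4']
print(len(buf.batch("frame5")))  # 3
```

Replaying old keyframes keeps regions of the object that are no longer visible in the training signal. Otherwise the neural SDF would drift toward only the currently observed surface.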

2.3.3.2 Pose Optimizer: Factor Graph

Pose estimation is formulated as a nonlinear least-squares problem:

x_t^* = \arg\min_{x_t} \sum_k \mathcal{L}_k(x_t)

Here k runs over the factor types below.

Factors:

graph LR
    subgraph Factors["Factor Types"]
        SDF_F["🔵 SDF Factor<br/>Point-to-SDF alignment"]
        ICP_F["🟢 ICP Factor<br/>Frame-to-frame registration"]
        REG_F["🟡 Regularization<br/>Pose smoothing"]
    end
    
    subgraph Graph["Factor Graph"]
        X1((x₁)) --- X2((x₂)) --- X3((x₃)) --- X4((x₄))
    end
    
    SDF_F --> Graph
    ICP_F --> Graph
    REG_F --> Graph
Figure 3: Pose graph factor structure

1. SDF factor (point-to-SDF):

\mathcal{L}_{\text{sdf}}(x_t) = \sum_{\mathbf{p} \in \mathcal{P}_t} \rho\left( f_\theta(x_t^{-1} \cdot \mathbf{p}) \right)

The observed point cloud \mathcal{P}_t is mapped into the object frame with the current pose estimate. There, the neural SDF should evaluate to nearly 0, because the points should lie on the surface. \rho is a robust cost function that limits the influence of outlier points.
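A minimal NumPy sketch of this factor's cost. It makes two stand-in choices:

- A 3 cm sphere plays the role of the learned f_\theta.
- A Huber function plays the role of \rho; the source does not say which robust kernel NeuralFeels uses.

The pose is represented as a rotation R and translation t mapping object coordinates to world coordinates.

```python
import numpy as np

def huber(r, delta=0.005):
    """Huber cost: quadratic for |r| <= delta, linear beyond (limits outliers)."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def sdf_factor_cost(points_world, R, t, sdf_fn, delta=0.005):
    """Robustified SDF cost of observed points for object pose (R, t).

    (R, t) maps object coordinates to world coordinates, so x_t^{-1} p = R^T (p - t).
    """
    pts_obj = (points_world - t) @ R  # row-wise R^T (p - t)
    return huber(sdf_fn(pts_obj), delta).sum()

def sphere_sdf(p):
    """Stand-in for the learned field: a 3 cm sphere in the object frame."""
    return np.linalg.norm(p, axis=1) - 0.03

rng = np.random.default_rng(0)
dirs = rng.normal(size=(200, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
t_true = np.array([0.1, 0.0, 0.2])
observed = t_true + 0.03 * dirs  # surface points seen by camera and touch

cost_true = sdf_factor_cost(observed, np.eye(3), t_true, sphere_sdf)
cost_off = sdf_factor_cost(observed, np.eye(3), t_true + np.array([0.01, 0.0, 0.0]), sphere_sdf)
print(cost_true < 1e-12, cost_off > cost_true)  # True True
```

A solver such as Levenberg–Marquardt would adjust (R, t) to drive this cost down, together with the other factors.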

2. ICP factor (frame-to-frame):

\mathcal{L}_{\text{icp}}(x_t, x_{t-1}) = \| (x_{t-1}^{-1} \cdot x_t) \ominus \Delta T_{\text{ICP}} \|^2

The relative transform between adjacent frames should match the transform \Delta T_{\text{ICP}} estimated by ICP.
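ICP alternates between two steps: finding closest-point correspondences, and solving for the rigid transform that best aligns them. The second step has a closed-form SVD solution (the Kabsch method). Below is a sketch of that step only, assuming the correspondences are already known; a full ICP loop would re-estimate correspondences and repeat.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src_i + t ~= dst_i (known correspondences)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.eye(3)
    S[2, 2] = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ S @ U.T
    t = mu_d - R @ mu_s
    return R, t

rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
a = np.deg2rad(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.01, -0.02, 0.005])
dst = src @ R_true.T + t_true                      # row-wise R_true @ p + t_true
R_est, t_est = best_rigid_transform(src, dst)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))  # True True
```

The recovered (R_est, t_est) plays the role of \Delta T_{\text{ICP}} in the factor above.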

3. Regularization factor:

\mathcal{L}_{\text{reg}}(x_t, x_{t-1}) = \| x_t \ominus x_{t-1} \|^2_\Sigma

This factor penalizes abrupt pose changes between consecutive frames; \Sigma sets how strongly.

Sliding-window optimization:

Optimizing the entire trajectory makes the cost grow linearly with its length. Instead:

  • Only the most recent W frames are kept in the active window
  • Theseus (a PyTorch-based nonlinear optimization library) is used as the solver
  • Because the solver is differentiable, end-to-end learning becomes possible (future work)

2.4 Experimental Setup and Hardware

2.4.1 Robot Platform

| Component | Specification |
|---|---|
| Manipulator | Franka Panda 7-DoF |
| Hand | Allegro Hand (16-DoF, 4 fingers) |
| Tactile sensors | DIGIT × 4 (one per fingertip) |
| Vision sensor | Intel RealSense D435 RGB-D |
| GPU | NVIDIA RTX 3090/4090 |

2.4.2 DIGIT Tactile Sensor

DIGIT is a vision-based tactile sensor developed by Meta:

  • Resolution: 240 × 320 RGB
  • Frame rate: 30 Hz
  • Principle: a built-in camera images the deformation of a gel pad
  • Advantages: inexpensive ($50), high resolution, easily replaceable

2.4.3 FeelSight Dataset

The research team released the FeelSight dataset for benchmarking:

| Item | Count |
|---|---|
| Total sequences | 70 |
| Simulation | 35 (Isaac Gym + TACTO) |
| Real world | 35 |
| Objects | 14 (YCB, ContactDB, etc.) |
| Sequence length | 30 s per sequence |
| Ground truth | Multi-camera tracking |

2.4.4 Manipulation Policy

Objects are rotated with the HORA policy (Haozhi Qi et al.):

  • Based on proprioception
  • Trained with reinforcement learning in Isaac Gym
  • Transferred successfully from simulation to the real world

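2.5 Results and Analysis

2.5.1 Evaluation Metrics

Pose tracking:

  • ADD-S: symmetric average distance (mm)

\text{ADD-S} = \frac{1}{|\mathcal{M}|} \sum_{\mathbf{p} \in \mathcal{M}} \min_{\mathbf{q} \in \mathcal{M}} \| (\hat{R}\mathbf{p} + \hat{t}) - (R\mathbf{q} + t) \|

Here \mathcal{M} is the set of object model points, (\hat{R}, \hat{t}) is the estimated pose, and (R, t) is the ground-truth pose.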
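A brute-force NumPy sketch of ADD-S. It compares the model transformed by the estimated pose against the model transformed by the ground-truth pose, using nearest-neighbor distances so that symmetric objects are not penalized for equivalent poses. Units follow the input points (meters here); the toy square is our own example.

```python
import numpy as np

def add_s(model_pts, R_est, t_est, R_gt, t_gt):
    """Mean distance from each estimated-pose model point to the closest
    ground-truth-pose model point."""
    est = model_pts @ R_est.T + t_est
    gt = model_pts @ R_gt.T + t_gt
    # O(N^2) nearest neighbors; a KD-tree would be used for large models.
    dists = np.linalg.norm(est[:, None, :] - gt[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

square = np.array([[1.0, 1.0, 0.0], [-1.0, 1.0, 0.0],
                   [-1.0, -1.0, 0.0], [1.0, -1.0, 0.0]])
Rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
eye, zero = np.eye(3), np.zeros(3)
sym_err = add_s(square, Rz90, zero, eye, zero)                          # 0.0: square maps onto itself
shift_err = add_s(square, eye, np.array([0.002, 0.0, 0.0]), eye, zero)  # about 0.002
```

The 90° rotation of a square scores zero error, which is exactly why the symmetric variant is used for objects with symmetries.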

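Shape reconstruction:

  • F-score: harmonic mean of precision P and recall R at a 5 mm threshold. P is the fraction of reconstructed points within 5 mm of the ground-truth surface, and R is the fraction of ground-truth points within 5 mm of the reconstruction.

\text{F-Score} = \frac{2 \cdot P \cdot R}{P + R}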
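A brute-force NumPy sketch of the F-score between two point sets, using the 5 mm threshold above (points in meters). The toy point sets are our own example.

```python
import numpy as np

def f_score(pred_pts, gt_pts, threshold=0.005):
    """F-score between a reconstruction and ground truth at a distance threshold."""
    d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    precision = (d.min(axis=1) < threshold).mean()  # reconstructed points near GT
    recall = (d.min(axis=0) < threshold).mean()     # GT points covered by reconstruction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gt = np.array([[0.0, 0, 0], [0.01, 0, 0], [0.02, 0, 0], [0.03, 0, 0]])
pred = np.array([[0.0, 0, 0], [0.011, 0, 0], [0.5, 0, 0]])
score = f_score(pred, gt)  # P = 2/3, R = 1/2, so F = 4/7 (about 0.571)
```

Precision punishes spurious geometry (the outlier at 0.5 m), and recall punishes missing geometry (the two uncovered ground-truth points).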

2.5.2 Quantitative Results

2.5.2.1 Shape Reconstruction (SLAM Mode)

| Environment | Modality | F-score (%) | Median error (mm) |
|---|---|---|---|
| Simulation | Vision Only | 73.2 | 2.8 |
| Simulation | Vision+Tactile | 81.4 | 2.1 |
| Real | Vision Only | 62.1 | 4.2 |
| Real | Vision+Tactile | 74.8 | 3.9 |

Note: Key finding

Adding touch improves the F-score by about 11% (relative) in simulation and about 20% in the real world.

2.5.2.2 Pose Tracking (Known Shape)

Pure tracking performance when a CAD model is given:

| Environment | Modality | ADD-S (mm) | Improvement |
|---|---|---|---|
| Simulation | Vision Only | 3.2 | - |
| Simulation | Vision+Tactile | 2.3 | 28% ↓ |
| Real | Vision Only | 5.8 | - |
| Real | Vision+Tactile | 4.7 | 19% ↓ |

2.5.2.3 Robustness to Occlusion

This is the most impressive result. The camera viewpoint is varied over a sphere, and performance is measured as a function of the degree of occlusion:

| Occlusion level | Vision Only | Vision+Tactile | Improvement |
|---|---|---|---|
| Mild (0-30%) | 4.1 mm | 3.8 mm | 7% |
| Moderate (30-60%) | 8.2 mm | 5.1 mm | 38% |
| Severe (60-90%) | 22.4 mm | 6.2 mm | 72% |
| Extreme (90%+) | Failure | 12.1 mm | 94% |

โ€œTouch, at the very least, refines and, at the very best, disambiguates visual estimates.โ€

2.5.3 Qualitative Analysis

2.5.3.1 Complementarity of Vision and Touch

flowchart LR
    subgraph Vision["👁️ Vision"]
        V1["✅ Captures global shape"]
        V2["✅ Works from a distance"]
        V3["❌ Vulnerable to occlusion"]
        V4["❌ No contact-surface information"]
    end
    
    subgraph Tactile["🖐️ Touch"]
        T1["✅ Unaffected by occlusion"]
        T2["✅ High-resolution contact geometry"]
        T3["❌ Local information only"]
        T4["❌ Works only during contact"]
    end
    
    subgraph Fusion["🔀 Fusion synergy"]
        F1["Global + local information"]
        F2["Robustness to occlusion"]
        F3["Precise contact-surface modeling"]
    end
    
    Vision --> Fusion
    Tactile --> Fusion
Figure 4: Complementary roles of vision and touch

2.5.3.2 Failure Case Analysis

The authors candidly acknowledge the limitations:

  1. Early convergence failure: tracking can fail in the first few seconds, while the neural SDF is still incomplete
  2. Fast rotations: ICP fails when the change between frames is too large
  3. Transparent/reflective objects: optical limitations of the RGB-D camera
  4. Very small objects: limited tactile resolution

2.6 Technical Deep Dive: Mathematical Background

2.6.1 Pose Representation on the Lie Group SE(3)

Working with robot poses requires an understanding of SE(3), the special Euclidean group.

\text{SE}(3) = \left\{ \begin{pmatrix} R & t \\ 0 & 1 \end{pmatrix} \mid R \in \text{SO}(3), t \in \mathbb{R}^3 \right\}

Why a Lie group?

  1. Optimizing rotation matrices directly involves awkward constraints (orthonormality)
  2. Mapping to the Lie algebra \mathfrak{se}(3) allows unconstrained optimization
  3. Derivatives and interpolation become natural

Exponential/logarithmic maps:

\exp: \mathfrak{se}(3) \rightarrow \text{SE}(3), \quad \log: \text{SE}(3) \rightarrow \mathfrak{se}(3)

In NeuralFeels' pose graph, x_t \ominus x_{t-1} denotes exactly this logarithmic map:

x_t \ominus x_{t-1} = \log(x_{t-1}^{-1} \cdot x_t)
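A NumPy sketch of the SE(3) exponential and logarithmic maps and the resulting \ominus residual, using the standard closed-form formulas. A few conventions are our assumptions:

- The twist ordering (v, ω) is our choice, not necessarily the one used inside Theseus.
- The rotation angle is assumed to be below π.
- The 1 − cos θ term is written as 2 sin²(θ/2) to stay accurate at small angles.

```python
import numpy as np

def hat(w):
    """3-vector -> 3x3 skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Twist xi = (v, w) in R^6 -> 4x4 SE(3) matrix."""
    v, w = xi[:3], xi[3:]
    th = np.linalg.norm(w)
    W = hat(w)
    if th < 1e-9:
        R, V = np.eye(3) + W, np.eye(3) + 0.5 * W
    else:
        A = np.sin(th) / th
        B = 2.0 * np.sin(0.5 * th) ** 2 / th**2  # (1 - cos th) / th^2
        C = (th - np.sin(th)) / th**3
        R = np.eye(3) + A * W + B * W @ W
        V = np.eye(3) + B * W + C * W @ W
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ v
    return T

def se3_log(T):
    """4x4 SE(3) matrix -> twist (v, w) in R^6; assumes rotation angle < pi."""
    R, t = T[:3, :3], T[:3, 3]
    s = 0.5 * np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    sin_th = np.linalg.norm(s)  # s = sin(th) * rotation axis
    th = np.arctan2(sin_th, 0.5 * (np.trace(R) - 1.0))
    if th < 1e-9:
        w, c = s, 1.0 / 12.0  # small-angle limits
    else:
        w = (th / sin_th) * s
        c = (1.0 - th * np.sin(th) / (4.0 * np.sin(0.5 * th) ** 2)) / th**2
    W = hat(w)
    V_inv = np.eye(3) - 0.5 * W + c * W @ W
    return np.concatenate([V_inv @ t, w])

def ominus(x_t, x_prev):
    """x_t (-) x_prev = log(x_prev^{-1} x_t), returned as a 6-vector."""
    return se3_log(np.linalg.inv(x_prev) @ x_t)

xi = np.array([0.01, -0.02, 0.03, 0.2, -0.1, 0.4])
x_prev = se3_exp(np.array([0.1, 0.0, 0.0, 0.0, 0.3, 0.0]))
x_t = x_prev @ se3_exp(xi)
print(np.allclose(ominus(x_t, x_prev), xi))  # True
```

Because the residual lives in R^6, a nonlinear least-squares solver can treat it like any unconstrained vector residual. This is the practical payoff of the Lie-algebra parameterization.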

2.6.2 Truncated SDF (TSDF) vs. Neural SDF

TSDF (traditional):

\text{TSDF}(\mathbf{x}) = \text{clamp}\left( \frac{d(\mathbf{x})}{\tau}, -1, 1 \right)

  • Stored in a voxel grid
  • O(N^3) memory
  • Resolution-limited

Neural SDF (NeuralFeels):

f_\theta(\mathbf{x}) \approx \text{SDF}(\mathbf{x})

  • Stored implicitly in the network weights
  • Continuous and differentiable
  • Adaptive resolution (Instant-NGP)

2.6.3 The Key to Sim-to-Real Transfer

Why does the Tactile Transformer, trained only in simulation, work on real sensors?

Domain randomization strategy:

| Factor | Randomization range |
|---|---|
| LED color temperature | ±20% |
| Gel pad refractive index | ±5% |
| Contact depth | 0.5-3 mm |
| Camera noise | Gaussian + salt-and-pepper |
| Background texture | Composited real DIGIT no-contact images |

Important: The key to closing the domain gap

Real "no-contact" background images from DIGIT sensors are composited into the simulated data. This absorbs the sensor-to-sensor differences in optical characteristics.
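Below is a rough NumPy sketch of image-space randomization for a rendered tactile image. It is illustrative only; `randomize_tactile` and its arguments are hypothetical names, not TACTO or NeuralFeels code. What it does and does not cover:

- The contact signal is transferred onto a real no-contact background by adding the simulated difference image. This is our assumed compositing rule.
- A per-channel gain of ±20% is a crude stand-in for LED color changes.
- Gaussian plus salt-and-pepper noise is added.
- Refractive index and contact depth would have to be randomized inside the renderer, so they are omitted here.

```python
import numpy as np

def randomize_tactile(sim_img, sim_bg, real_bg, rng, gain_range=0.2,
                      noise_std=0.02, sp_prob=0.01):
    """Augment a simulated tactile image (H x W x 3, floats in [0, 1])."""
    # 1. Move the simulated contact signal onto a real sensor's no-contact image.
    img = real_bg + (sim_img - sim_bg)
    # 2. Random per-channel gain as a crude proxy for LED color/intensity variation.
    img = img * rng.uniform(1 - gain_range, 1 + gain_range, size=3)
    # 3. Additive Gaussian pixel noise.
    img = img + rng.normal(0.0, noise_std, size=img.shape)
    # 4. Salt-and-pepper noise: set random pixels to black (0) or white (1).
    mask = rng.random(img.shape[:2]) < sp_prob
    img[mask] = rng.integers(0, 2, size=(mask.sum(), 1)).astype(float)
    return np.clip(img, 0.0, 1.0)

rng = np.random.default_rng(0)
H, W = 240, 320
sim_bg = np.full((H, W, 3), 0.5)
sim_img = sim_bg.copy()
sim_img[100:140, 140:180] += 0.2                   # a fake contact blob
real_bg = rng.uniform(0.3, 0.7, size=(H, W, 3))    # placeholder for a real capture
out = randomize_tactile(sim_img, sim_bg, real_bg, rng)
print(out.shape, out.min() >= 0.0, out.max() <= 1.0)  # (240, 320, 3) True True
```

Drawing a fresh background and fresh randomization parameters for every training sample is what pushes the network to rely on the contact signal rather than on one sensor's appearance.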


2.7 Comparison with Related Work

2.7.1 Lineage of Visuo-Tactile SLAM

Timeline of visuo-tactile perception research:

Table 1: Lineage of visuo-tactile SLAM research

| Year | Work | Contribution |
|---|---|---|
| 2000s | Moll & Erdmann | Shape reconstruction by rolling objects between palms |
| 2015 | GelSight | High-resolution tactile imaging |
| 2019 | DIGIT sensor | Low-cost, compact design |
| 2020 | Bauza et al. | Tactile SLAM demonstration |
| 2022 | iSDF | Real-time neural SDF |
| 2023 | FingerSLAM | Single-finger SLAM |
| 2024 | NeuralFeels | Multi-finger visuo-tactile fusion |

2.7.2 Main Points of Comparison

| Work | Vision | Touch | Online | Unknown object | Multi-finger |
|---|---|---|---|---|---|
| FingerSLAM | ✅ | ✅ | ✅ | ✅ | ❌ |
| Bauza et al. | ❌ | ✅ | ❌ | ✅ | ❌ |
| BundleSDF | ✅ | ❌ | ✅ | ✅ | N/A |
| NeuralFeels | ✅ | ✅ | ✅ | ✅ | ✅ |

Differences from FingerSLAM:

  • FingerSLAM: a single tactile sensor, with the object always in contact
  • NeuralFeels: four fingers and intermittent contact, a more realistic manipulation scenario

2.8 Critical Discussion

2.8.1 Strengths

  1. Complete system: end-to-end from sensing through processing to output
  2. Generalization: works on previously unseen objects
  3. Interpretability: the neural SDF provides a queryable 3D geometric representation (vs. black-box end-to-end models)
  4. Reproducibility: code, dataset, and models are released
  5. Practical hardware: uses commercially available sensors (DIGIT, RealSense)

2.8.2 Weaknesses and Limitations

2.8.2.1 Technical Limitations

  1. Processing speed: 1-5 Hz (too slow to call real-time)
  2. Early convergence: unstable for the first few seconds
  3. Generality: only in-hand rotation is demonstrated (other manipulation tasks are untested)
  4. Sensor dependence: specialized for DIGIT (other tactile sensors are unverified)

2.8.2.2 Methodological Questions

  1. No 3D prior: shape is learned from scratch every time (category-level priors are not used)
  2. Single-object assumption: multi-object scenarios are not considered
  3. Deformable objects: soft objects are not supported

2.8.2.3 Experimental Design

  1. Object diversity: only 14 objects are tested (more varied shapes and materials are needed)
  2. Failure modes: systematic failure analysis is lacking
  3. Baselines: a wider range of comparisons is needed (e.g., end-to-end methods)

2.8.3 Proposed Future Directions

2.8.3.1 Short Term (1-2 years)

  1. Speed optimization: reach 10 Hz or more with tools such as TensorRT
  2. Generalization across sensors: support GelSight, DIGIT-360, and other sensors
  3. Stronger sim-to-real: fast adaptation via meta-learning

2.8.3.2 Mid Term (2-5 years)

  1. Category-level priors: improve early convergence with pretrained shape priors
  2. Closed-loop control: feed perception results back into the manipulation policy
  3. Deformable objects: combine neural fields with physics (e.g., NeuralCloth)

2.8.3.3 Long Term (5+ years)

  1. Foundation models: learn general-purpose tactile-visual representations
  2. Whole-body manipulation: contact perception over the robot's entire body
  3. Human-robot handover: extend to human-robot interaction

2.9 Hands-On Guide: Installing and Running NeuralFeels

2.9.1 Environment Setup

## Clone the repository
git clone git@github.com:facebookresearch/neuralfeels.git
cd neuralfeels

## Create the Conda environment (micromamba recommended)
./install.sh -e neuralfeels
micromamba activate neuralfeels

2.9.2 Downloading the Data

## FeelSight dataset (Hugging Face)
cd data
git clone https://huggingface.co/datasets/suddhu/Feelsight
mv Feelsight/* . && rm -r Feelsight
find . -name "*.tar.gz" -exec tar -xzf {} \; -exec rm {} \;
cd ..

## Tactile Transformer model
git clone https://huggingface.co/suddhu/tactile_transformer

## SAM weights
mkdir -p data/segment-anything && cd data/segment-anything
for model in sam_vit_h_4b8939.pth sam_vit_l_0b3195.pth sam_vit_b_01ec64.pth; do
  wget https://dl.fbaipublicfiles.com/segment_anything/$model
done
cd ../..

2.9.3 Example Runs

## Simulated SLAM (rubber duck)
./scripts/run --slam-sim

## Real-world tracking (bell pepper)
./scripts/run --slam-real

## Custom configuration
./scripts/run feelsight slam vitac 077_rubiks_cube 00 5 0 1
##             [dataset] [mode] [modality] [object] [log] [FPS] [record] [visualize]

2.9.4 Required Hardware

| Component | Minimum | Recommended |
|---|---|---|
| GPU | RTX 3080 (10GB) | RTX 4090 (24GB) |
| RAM | 32GB | 64GB |
| Storage | 50GB SSD | 100GB NVMe |

2.10 Conclusion

2.10.1 Key Message

NeuralFeels is an important milestone in perception for robot manipulation:

  1. It clearly quantifies the benefit of visuo-tactile fusion
  2. It successfully brings a modern representation, neural fields, to robotic SLAM
  3. It demonstrates operation in online, unknown-object scenarios
  4. It provides a reproducible benchmark that encourages follow-up research

2.10.2 Feynman-Style Summary

"If you could tell a robot hand to 'learn by feeling,' it could handle objects even with its eyes covered. NeuralFeels implements that 'learning by feeling' with neural networks."

2.10.3 Advice for Researchers

If you want to extend this work:

  1. Apply other tactile sensors: GelSight, Soft Bubble, etc.
  2. Apply it to other manipulation tasks: insertion, non-prehensile manipulation, bimanual manipulation
  3. Combine with foundation models: deeper integration with CLIP and SAM
  4. Improve sim-to-real: domain adaptation, meta-learning

2.11 References

Key references:

  1. Suresh et al., "Neural feels with neural fields: Visuotactile perception for in-hand manipulation," Science Robotics, 2024.
  2. Müller et al., "Instant neural graphics primitives with a multiresolution hash encoding," ACM TOG, 2022.
  3. Ortiz et al., "iSDF: Real-time neural signed distance fields for robot perception," RSS, 2022.
  4. Qi et al., "In-hand object rotation via rapid motor adaptation," CoRL, 2022.
  5. Lambeta et al., "DIGIT: A novel design for a low-cost compact high-resolution tactile sensor with application to in-hand manipulation," RA-L, 2020.
  6. Kirillov et al., "Segment anything," ICCV, 2023.
  7. Zhao et al., "FingerSLAM: Closed-loop unknown object localization and reconstruction from visuo-tactile feedback," arXiv, 2023.

Tip: Original paper and resources
  • arXiv: https://arxiv.org/abs/2312.13469
  • Science Robotics: https://www.science.org/doi/10.1126/scirobotics.adl0628
  • GitHub: https://github.com/facebookresearch/neuralfeels
  • Dataset: https://huggingface.co/datasets/suddhu/Feelsight

3 ⛏️ Dig Review

⛏️ Dig: Go deep, uncover the layers. Dive into technical detail.

3.1 Introduction: Why vision and touch need to be combined

Humans naturally integrate multiple senses to perceive their surroundings. We can find keys in a dark pocket, or fit a key into a lock at night without turning on the light. When vision is limited, we use touch to work out an object's shape and position, and we combine the two senses to manipulate precisely. Today's robots, however, make almost no use of this kind of multimodal perception. This is especially true of in-hand manipulation, where a multi-fingered (dexterous) robot hand turns an object this way and that: the object is frequently occluded by the palm or fingers. Prior in-hand perception work mostly relied on camera vision and was limited to tracking the pose of objects whose models were known in advance, often only in open settings without occlusion. Some work also simplified the perception problem with shortcuts such as attaching fiducial markers to the object or the environment. But realizing general robot dexterity in homes and other unstructured environments requires robust, general-purpose object perception.

This is where touch emerges as a powerful complementary sense. Robot vision easily fails because of real-world issues such as lighting, reflections, and transparency, whereas tactile sensors provide local shape and relative position directly, through actual contact. Studies of human perception likewise show that vision and touch complement each other. Vision-based tactile sensors (e.g., GelSight, DIGIT) have recently become cheap and small enough to build into robot fingers, and tactile simulators have matured, making it easier to use tactile data for learning. Robots are now ready to use visual and tactile data together.

So how should this multimodal data be represented and used? Continuous 3D representations based on neural fields have recently drawn a lot of attention in computer vision. A neural field is a neural network that takes a coordinate as input and outputs a property at that point (e.g., density, color, or distance); techniques such as NeRF have shown that it enables high-quality 3D reconstruction. Because a neural field is continuous and not tied to a fixed resolution, it can represent object shape more finely than a point cloud or a mesh. Classic NeRF, however, relies on offline batch optimization, which makes it hard to use directly for a robot's real-time, online perception. Fortunately, lightweight neural SDF models have recently appeared, enabling online learning of environment maps and object tracking. For example, Ortiz et al. built real-time SDF maps of the environment with iSDF, and Lin et al. used iNeRF to invert a pretrained NeRF and estimate the camera pose. Work applying neural fields to robot manipulation is also starting to appear, but optimization methods that fuse different modalities such as vision and touch are still in their infancy.
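To make the “coordinate in, property out” contract concrete, here is a minimal sketch that uses a hand-written analytic SDF for a sphere instead of a trained network. A neural field such as iSDF or NeuralFeels replaces `sphere_sdf` with an MLP whose weights are learned from data; the function name and the numbers are illustrative assumptions.

```python
import numpy as np

def sphere_sdf(points: np.ndarray, center: np.ndarray, radius: float) -> np.ndarray:
    """Signed distance from each 3D point to a sphere's surface.

    Negative inside the object, zero on the surface, positive outside.
    A neural field would replace this closed-form expression with an MLP f_theta(x).
    """
    return np.linalg.norm(points - center, axis=-1) - radius

center = np.zeros(3)
queries = np.array([
    [0.00, 0.0, 0.0],  # at the center: inside  -> negative
    [0.05, 0.0, 0.0],  # on the surface         -> zero
    [0.10, 0.0, 0.0],  # outside                -> positive
])
print(sphere_sdf(queries, center, radius=0.05))
```

The surface is simply the set of points where the function returns 0, which is why an SDF can be turned into a mesh later (e.g., with Marching Cubes).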

Against this backdrop, NeuralFeels (Science Robotics, 2024), a collaboration among CMU, Meta AI, UC Berkeley, and others, proposes a reliable perception method that simultaneously tracks and models an object in the hand through multimodal (vision + touch) SLAM. In a nutshell, it is a technique that “learns even the feel of an object with a neural field.” Taking input streams from a fixed RGB-D camera and the tactile sensors on the robot's fingertips, it estimates the object's pose (6-DoF position and orientation) and shape (3D geometry) in real time. NeuralFeels needs no prior model of the object: even for a completely unseen object, it learns the shape through interaction while tracking it, and when visual information is insufficient, touch fills the gap and improves tracking accuracy. Under heavy occlusion, tracking improves by up to 94% compared with vision alone, and the final system reaches a shape-reconstruction F-score of 81% with an average pose error of 4.7 mm. (F-score and pose error are explained in detail later.) When the same method is given the object's CAD model, the error drops further to about 2.3 mm on average, better than existing model-based tracking. The authors released the source code along with FeelSight, a dataset of 70 experimental sequences collected in simulation and on a real robot, to accelerate future research.

In this post, I analyze the NeuralFeels paper in depth from a roboticist's perspective: its core ideas and contributions, the techniques and algorithms it uses, its main experimental results, and its strengths, limitations, and future directions. In the spirit of Richard Feynman's explanations, I will use intuitive analogies wherever possible and unpack complex equations and technical concepts in plain language.

3.2 Method: the visuo-tactile SLAM algorithm of NeuralFeels

In a word, the NeuralFeels method is “multimodal SLAM.” SLAM here means Simultaneous Localization And Mapping, as the term is commonly used in robotics: normally, the process by which a robot works out its own position and a map of its surroundings at the same time. Interestingly, NeuralFeels estimates not the robot's pose but the pose (position and orientation) and the shape (the counterpart of the map) of the object held in the hand, so it is the same idea of SLAM with the target switched to the “object.” While the robot hand grasps the object and rotates it in various directions, the system receives a continuous stream of observations from an RGB-D camera (vision) and from the tactile sensors on the fingers. The core loop of NeuralFeels processes this stream and updates, at every moment, the “map” (the object's gradually completed shape) and the “location” (the object's pose). The figure below summarizes the whole pipeline.

flowchart LR
    subgraph frontend["Frontend"]
        Vision[RGB-D camera] --> Seg[Segmentation]
        Seg --> VDepth[Object depth map]
        Tactile[Tactile images] --> TT[Tactile transformer]
        TT --> TDepth[Contact depth maps]
    end
    subgraph backend["Backend"]
        VDepth & TDepth --> SDF[Neural SDF model]
        SDF --> PoseOpt[Object pose optimization]
        PoseOpt --> SDF
    end
    PoseOpt --> PoseOut[Estimated object pose]
    SDF --> ShapeOut[Estimated object shape]

The pipeline splits into two modules, a frontend and a backend. The frontend takes raw sensor data and converts it into useful inputs. Based on these inputs, the backend learns an object model (the map) in the form of a neural field in real time while also tracking the object's 6-DoF pose. Let's look at each part in turn.

3.2.1 Frontend: processing visual and tactile data

The frontend's job is to extract only the object-relevant information from the various sensor outputs (camera images, depth, tactile images, and so on). In human terms, it is like picking out the object of interest in a cluttered visual scene and extracting just the surface contours from the signals at your fingertips.

  • Vision: When the robot hand holds an object, the scene the camera sees is complex. Fingers, object, and background are mixed together, and parts of the object are hidden by the hand. NeuralFeels uses a fixed RGB-D camera and applies an algorithm that isolates only the object from the depth image it produces. The key is to use a foundation segmentation model such as Meta AI's recently released Segment Anything Model (SAM), combined with the robot's kinematics. Because the hand's joint angles are always known, the 3D positions of the fingers and palm in the current frame can be predicted. Conceptually, these regions can be given to SAM as prompts that say “this region is the robot hand, so exclude it, and find the pixels belonging to the object among the rest.” With this kinematics-constrained segmentation, the authors cleanly separate the object's pixels even when hand and object are tightly intertwined. The result is an object depth map D_{\text{vision}} from the RGB-D camera with the background and the hand removed. This visual depth map densely covers the distances to the part of the object surface that is visible to the camera.

  • Touch: The DIGIT tactile sensors on the fingertips act like tiny cameras, imaging the deformation of each finger's gel pad. When a finger presses on an object, for example, a sharp imprint appears in that spot. These tactile images are hard even for humans to interpret, and their distribution (illumination patterns and so on) is completely different from natural images, so 3D information cannot be read off them directly. Earlier work often estimated the depth of the contact geometry from tactile images with convolutional neural networks (CNNs); NeuralFeels goes a step further and proposes a tactile transformer based on the Vision Transformer (ViT). The idea comes from recent results showing that ViTs perform well at depth estimation on natural images, on the bet that self-attention would also work well for tactile data. The tactile transformer is trained entirely in simulation: using frameworks such as Meta AI's TACTO, the authors generated large numbers of virtual objects being pressed into a DIGIT and collected pairs of tactile images and exact contact depth maps as supervision. Training with data augmentation that mixes in camera noise and sensor-to-sensor variation lets the model generalize across real DIGIT sensors. In the end, the model outputs a depth map D_{\text{tactile}} of the contact surface for a given tactile image. Contact regions carry depth values (local displacement relative to the finger surface), while non-contact regions are masked out and ignored. The authors report that the mean error of these tactile depth predictions on the simulated test set is very low and that the predictions also transfer well to real data (sim-to-real predictions are verified in Fig. 8(b)).
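The masking step can be sketched as follows. This is a hypothetical illustration, not the NeuralFeels implementation: it assumes the tactile transformer's output is a depth image measured from the DIGIT's internal camera, compares it with the depth of the undeformed gel, and treats pixels indented by more than a made-up threshold `min_indent` as contact.

```python
import numpy as np

def contact_mask(pred_depth: np.ndarray, ref_depth: np.ndarray, min_indent: float = 1e-4) -> np.ndarray:
    """Pixels where the gel is pressed in by more than `min_indent` meters.

    Assumed convention: depth is measured from the sensor's internal camera, so
    pressing an object into the gel moves the surface closer and lowers depth.
    """
    indentation = ref_depth - pred_depth
    return indentation > min_indent

def masked_tactile_depth(pred_depth, ref_depth, min_indent=1e-4):
    """Keep depth only at contact pixels; non-contact pixels become NaN (ignored later)."""
    mask = contact_mask(pred_depth, ref_depth, min_indent)
    return np.where(mask, pred_depth, np.nan), mask

# Toy example with made-up numbers: a flat gel 20 mm from the sensor camera,
# with a 0.5 mm indentation in the middle 2x2 pixels.
ref = np.full((4, 4), 0.020)
pred = ref.copy()
pred[1:3, 1:3] -= 0.0005
depth, mask = masked_tactile_depth(pred, ref)
print(mask.sum())  # 4 contact pixels
```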

As the output of the frontend, each time step yields two kinds of depth information: D_{\text{vision}} (from the camera) and D_{\text{tactile}} (from touch). The first represents part of the object's visible surface; the second, the local surface where the fingers make contact. The next stage must fuse this multimodal “point cloud” data into a single, consistent 3D model.
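Turning either depth map into the point cloud mentioned above is a standard pinhole back-projection. The sketch below assumes known intrinsics (`fx`, `fy`, `cx`, `cy`) and a known 4x4 sensor-to-world pose; modeling each DIGIT as a small pinhole camera is an assumption made here for illustration. The inverse operation, `project`, is the kind of step one could use to turn fingertip positions from forward kinematics into pixel prompts for SAM; the fingertip coordinates and all other numbers are made up.

```python
import numpy as np

def backproject(depth, mask, fx, fy, cx, cy):
    """Lift valid masked depth pixels to Nx3 points in the sensor (camera) frame."""
    v, u = np.nonzero(mask & np.isfinite(depth) & (depth > 0))
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

def project(points_cam, fx, fy, cx, cy):
    """Project Nx3 camera-frame points to Nx2 pixel coordinates (u, v)."""
    z = points_cam[:, 2]
    u = fx * points_cam[:, 0] / z + cx
    v = fy * points_cam[:, 1] / z + cy
    return np.stack([u, v], axis=-1)

def to_world(points_cam, T_world_cam):
    """Apply a 4x4 homogeneous sensor-to-world transform to Nx3 points."""
    R, t = T_world_cam[:3, :3], T_world_cam[:3, 3]
    return points_cam @ R.T + t

# Object point cloud from a toy masked camera depth map, expressed in the world frame.
fx = fy = 100.0
cx = cy = 1.5
depth = np.full((4, 4), 0.5)
mask = np.ones((4, 4), dtype=bool)
mask[0, 0] = False                       # e.g. a pixel the segmentation assigned to the hand
T_world_cam = np.eye(4)
T_world_cam[:3, 3] = [1.0, 2.0, 3.0]
cloud_world = to_world(backproject(depth, mask, fx, fy, cx, cy), T_world_cam)

# Hypothetical fingertip position from forward kinematics, projected to a pixel
# that could serve as a point prompt for a segmentation model.
fingertip_cam = np.array([[0.01, -0.02, 0.40]])
prompt_uv = project(fingertip_cam, 600.0, 600.0, 320.0, 240.0)
```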

3.2.2 Backend: neural-field shape learning and pose tracking

The NeuralFeels backend is an optimization engine that learns the object's shape as a neural field model (mapping) while tracking the object's pose (localization). It is like a cartographer and a surveyor working as a team and taking turns: the cartographer gradually revises the map using the data gathered so far, and the surveyor consults that map (model) to correct the current position (pose). NeuralFeels alternates between the two tasks so that, over time, it obtains accurate shape and pose together. Concretely, each iteration first performs pose optimization and then, based on its result, shape optimization. In the shape step, the weights of the neural SDF (signed distance function) model are adjusted to learn the object's shape; in the pose step, the current neural field is frozen and only the object's pose variables are adjusted. This avoids the difficulty of optimizing everything jointly, while fast alternation approximates joint estimation in practice. Let's look at each inner algorithm in detail.
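The alternation can be seen in a deliberately tiny 2D stand-in (not NeuralFeels code): the “object” is a circle whose radius plays the role of the shape and whose center plays the role of the pose, and the residual of each observed surface point is its SDF value |p - c| - r. The pose step takes one Gauss-Newton update of the center with the radius frozen; the shape step solves for the radius in closed form with the center frozen. All numbers are made up.

```python
import numpy as np

def circle_residuals(points, center, radius):
    """SDF residuals of observed surface points: 0 when the model explains them."""
    return np.linalg.norm(points - center, axis=1) - radius

def pose_step(points, center, radius):
    """One Gauss-Newton step on the 2D translation with the shape (radius) frozen."""
    diff = points - center
    dist = np.linalg.norm(diff, axis=1)
    J = -diff / dist[:, None]                 # d residual / d center
    r = dist - radius
    delta = np.linalg.solve(J.T @ J, -J.T @ r)
    return center + delta

def shape_step(points, center):
    """Closed-form least-squares radius with the pose (center) frozen."""
    return np.linalg.norm(points - center, axis=1).mean()

# Synthetic noiseless observations: points on a circle.
true_center, true_radius = np.array([0.3, -0.2]), 1.0
angles = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
points = true_center + true_radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)

center, radius = np.zeros(2), 0.5             # deliberately wrong initial guesses
for _ in range(30):
    center = pose_step(points, center, radius)    # surveyor: map fixed, refine pose
    radius = shape_step(points, center)           # cartographer: pose fixed, refine map

print(center, radius)
```

Even though the first pose step runs with a badly wrong radius, the two steps pull each other toward the true values, which is the intuition behind the cartographer/surveyor analogy.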

(a) Shape representation and optimization: NeuralFeels represents the object's shape as a neural field (SDF). This is a continuous function that takes a 3D coordinate \mathbf{x} and outputs how far that point is from the object surface (a signed distance). The set of coordinates where the SDF equals 0 forms the object's surface. The SDF is represented by a small multilayer perceptron (MLP), and training is accelerated with Instant-NGP's multiresolution grid (hash) encoding. Put simply, a lightweight iSDF-style network gradually learns the object's shape.

Because this shape network (the map) must be updated in real time, accumulating every frame and training on all of them at once is impractical. Instead, the authors adopt keyframes. Only informative frames are kept as keyframes, and a new observation is adopted as a keyframe only if it contributes substantially. So that old information is not entirely forgotten, past keyframes with large errors are replayed during training with higher probability. This guards against catastrophic forgetting (the loss of previously learned information). Specifically, the first frame is always taken as a keyframe; after that, a frame is added as a keyframe when its “rendering loss” exceeds a threshold, and if no keyframe has been added for too long, one is added periodically.
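The keyframe rules described above can be sketched as a small buffer. This is a hypothetical illustration of the policy as described, not the authors' code; `loss_threshold`, `max_gap`, and the loss-proportional replay sampling are assumptions chosen for clarity.

```python
import random
from dataclasses import dataclass, field

@dataclass
class KeyframeBuffer:
    """Hypothetical keyframe policy; thresholds are made up, not the paper's values."""
    loss_threshold: float = 0.05     # add a keyframe when the rendering loss exceeds this
    max_gap: int = 10                # ...or when this many frames passed without one
    keyframes: list = field(default_factory=list)   # entries: [frame_id, last_loss]
    _since_last: int = 0

    def maybe_add(self, frame_id: int, render_loss: float) -> bool:
        first = not self.keyframes                  # the first frame is always kept
        too_long = self._since_last >= self.max_gap
        if first or render_loss > self.loss_threshold or too_long:
            self.keyframes.append([frame_id, render_loss])
            self._since_last = 0
            return True
        self._since_last += 1
        return False

    def sample_replay(self, k: int, rng: random.Random) -> list:
        """Replay k past keyframes, favoring high-loss ones to fight forgetting."""
        weights = [max(loss, 1e-8) for _, loss in self.keyframes]
        return [fid for fid, _ in rng.choices(self.keyframes, weights=weights, k=k)]

buf = KeyframeBuffer()
added = [buf.maybe_add(i, loss) for i, loss in enumerate([0.20, 0.01, 0.01, 0.09, 0.01])]
print(added)  # [True, False, False, True, False]
print(buf.sample_replay(3, random.Random(0)))  # ids from {0, 3}, biased toward frame 0
```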

For shape optimization, sample points are drawn on and around the object surface from the collected depth maps (visual and tactile) to compute the SDF network's loss. For pixels that hit the surface (e.g., pixels where the object is observed in the depth map), 3D points near the surface are sampled along the pixel's ray and trained toward SDF = 0. Pixels that see empty space instead imply there is no object along that ray, so points on that segment are trained to have positive SDF (distance) values. For each ray, a fixed number of samples is drawn, mixing near-surface points and free-space points. For the camera (vision), the empty space around the object matters too, so surface and free-space samples are mixed in a suitable ratio; for touch, the tactile sensor only carries information about the contacted surface, so sampling focuses on surface points. For the collected samples \{\mathbf{x}_{i}\}, the SDF network's predictions f_{\Theta}\left( \mathbf{x}_{i} \right) are computed and a truncated SDF loss is applied. Following, for example, Azinović et al., points within a truncation distance \tau of the surface contribute a squared error between the SDF prediction d_{i} and its target (0 for surface points, a positive value around \tau for free-space points), while points with overly large \left| d_{i} \right| are treated as outliers and not fully counted in the loss (their contribution is truncated). This way the SDF matches 0 precisely near the object surface and converges stably without having to predict distances accurately far away. In short, shape optimization updates the SDF parameters \Theta so that, under the current pose estimates, the model agrees with the visual and tactile observations.
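Below is a sketch of per-ray sampling and one common truncated-SDF variant in the spirit of the loss described above. It assumes a projective SDF target s = depth - t along each ray (the signed distance to the measured hit, measured along the ray), regresses near-surface samples to s and free-space samples to +tau, and ignores samples more than tau behind the surface. The sample counts and `tau` are illustrative assumptions, and the exact loss used in NeuralFeels may differ.

```python
import numpy as np

def sample_ray(depth, n_surface, n_free, tau, rng):
    """Distances t along one ray whose measured hit is at `depth` (meters).

    Surface samples fall within +-tau of the hit; free-space samples fall between
    the sensor and depth - tau. Use n_free=0 for tactile rays (surface only).
    """
    t_surf = rng.uniform(depth - tau, depth + tau, size=n_surface)
    t_free = rng.uniform(0.0, max(depth - tau, 0.0), size=n_free)
    return np.concatenate([t_surf, t_free])

def truncated_sdf_targets(t, depth, tau):
    """Projective SDF target s = depth - t, truncated at +tau.

    Near-surface samples regress to s, free-space samples to +tau; samples more
    than tau behind the measured surface are unobserved and marked invalid.
    """
    s = depth - t
    return np.minimum(s, tau), s >= -tau

def truncated_sdf_loss(pred_sdf, targets, valid, tau):
    """Mean squared error on valid samples after clamping predictions to [-tau, tau]."""
    pred = np.clip(pred_sdf, -tau, tau)
    return float(np.mean((pred[valid] - targets[valid]) ** 2))

rng = np.random.default_rng(0)
depth, tau = 0.40, 0.01                     # made-up camera hit at 40 cm, 1 cm truncation
t = sample_ray(depth, n_surface=8, n_free=8, tau=tau, rng=rng)
targets, valid = truncated_sdf_targets(t, depth, tau)

# A network predicting the projective distance exactly would get zero loss;
# here we fake a slightly biased prediction instead.
pred = (depth - t) + 0.002
print(truncated_sdf_loss(pred, targets, valid, tau))
```

For a tactile ray one would call `sample_ray(d, n_surface, 0, tau, rng)`, i.e., surface samples only, matching the sampling rule described above for touch.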

(b) Pose optimization: The shape learning above proceeded as if the object's pose were given, but in reality the object moves and rotates in the hand over time. The object pose T_{t} (e.g., the transform from the world frame to the object frame) must be tracked at every frame, which is not easy. NeuralFeels solves this as a pose graph optimization. Conceptually similar to bundle adjustment (BA) in visual SLAM, it jointly optimizes the object poses of several recent keyframes. For example, it takes the object poses of the most recent N keyframes, \{ T_{t - N + 1},\ldots,T_{t}\}, as variables and sets up objectives that make each keyframe's observations agree with the current shape model (the frozen SDF). These objectives can be viewed as factors in a graph, hence the names factor graph or pose graph. The graph is solved as a nonlinear least-squares problem with Theseus, a PyTorch-based optimization library. The solver is Levenberg–Marquardt (LM), a damped Gauss–Newton (second-order-style) method chosen for faster convergence, an improvement over earlier methods such as iNeRF that used first-order gradient descent.
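To see what an LM solver does, here is a minimal dense Levenberg–Marquardt loop (not Theseus' API) applied to a toy version of the SDF alignment factor: surface points observed on a partially visible sphere are aligned to a frozen sphere SDF f(p) = |p| - R by solving for the object translation t. Because a sphere looks the same under any rotation, only translation is estimated here; all values are made up. The damping `lam` shrinks after a successful step (closer to Gauss–Newton) and grows after a failed one (closer to gradient descent).

```python
import numpy as np

def levenberg_marquardt(residual_fn, jacobian_fn, x0, iters=50, lam=1e-3):
    """Minimal dense Levenberg-Marquardt for min_x ||r(x)||^2 (small problems only)."""
    x = np.asarray(x0, dtype=float)
    cost = np.sum(residual_fn(x) ** 2)
    for _ in range(iters):
        r, J = residual_fn(x), jacobian_fn(x)
        H, g = J.T @ J, J.T @ r                 # Gauss-Newton approximation of the Hessian
        step = np.linalg.solve(H + lam * np.eye(x.size), -g)
        new_cost = np.sum(residual_fn(x + step) ** 2)
        if new_cost < cost:                     # accept: trust the quadratic model more
            x, cost, lam = x + step, new_cost, lam * 0.5
        else:                                   # reject: damp harder (gradient-descent-like)
            lam *= 10.0
    return x

# Toy SDF-alignment factor: the frozen "map" is a sphere SDF f(p) = |p| - R in the
# object frame; observed surface points x_i should satisfy f(x_i - t) = 0.
R = 0.05                                        # made-up radius (m)
t_true = np.array([0.01, -0.005, 0.008])        # made-up object translation (m)
theta, phi = np.meshgrid(np.linspace(0.1, np.pi / 2, 8),
                         np.linspace(0.0, 2 * np.pi, 16, endpoint=False))
dirs = np.stack([np.sin(theta) * np.cos(phi),
                 np.sin(theta) * np.sin(phi),
                 np.cos(theta)], axis=-1).reshape(-1, 3)
observed = t_true + R * dirs                    # only the upper half is "visible"

def residuals(t):
    return np.linalg.norm(observed - t, axis=1) - R

def jacobian(t):
    d = observed - t
    return -d / np.linalg.norm(d, axis=1, keepdims=True)   # analytic d f / d t

t_est = levenberg_marquardt(residuals, jacobian, np.zeros(3))
print(t_est)
```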

The factors (cost terms) of the pose graph come in three main kinds:

  1. Neural-field alignment error (SDF alignment loss): this term makes each keyframe's depth observations (visual and tactile) agree with the current, frozen neural SDF model. Think of how iNeRF adjusts the camera pose so that a rendered image matches the real one. Concretely, for each keyframe k, the depth-map pixels are cast as rays and a few surface points are sampled on each (only points near the surface, because points far from it give only a weak error signal); these points are then transformed into the object frame with the current pose estimate T_{k}. Their SDF values are queried from the neural field, and the error pushes them toward 0, since they should lie on the surface. Intuitively, the condition is: “seen from the current object pose, the points in the keyframe depth map should lie exactly on the surface of the SDF model.” The Jacobian (gradient) of this error with respect to the object pose variables (the Lie algebra representation of the transform) is computed analytically and passed to Theseus. (Because PyTorch autograd was too slow, the authors reportedly derived and implemented a custom Jacobian by hand, making this step about 4× more efficient.)
  2. Pose continuity regularizer (pose regularizer): a weak constraint that keeps the change in object pose between consecutive keyframes from becoming unrealistically large. It mainly suppresses jumpy estimates caused by depth noise or segmentation errors. Because in-hand rotation involves no extreme motion between consecutive frames, this regularizer makes the estimate more stable.
  3. ICP alignment error (Iterative Closest Point loss): finally, a term that helps align point clouds between frames. It encourages the depth points of the current keyframe and the previous keyframe to agree; for example, T_{k} and T_{k - 1} are adjusted so that the object surface point clouds obtained at keyframes k and k - 1 overlap well. As in the classic ICP algorithm, the error is computed over nearest-neighbor point pairs. This term provides relative, frame-to-frame alignment of the observations, complementing the neural-field alignment, which is a global frame-to-model alignment. In other words, the optimization is constrained from two directions: frame-to-frame and frame-to-model.
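The ICP term can be illustrated with textbook point-to-point ICP: match each source point to its nearest target point, then solve for the best rigid transform in closed form with the Kabsch/SVD method. This is a generic sketch with brute-force matching, not the factor as implemented in NeuralFeels; the grid, rotation, and translation in the usage lines are made up.

```python
import numpy as np

def best_fit_transform(A, B):
    """Least-squares rigid transform (R, t) with R @ a + t close to b (Kabsch / SVD)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                # avoid returning a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    return R, cb - R @ ca

def icp(source, target, iters=30):
    """Point-to-point ICP with brute-force nearest neighbours (fine for small clouds)."""
    R_total, t_total = np.eye(3), np.zeros(3)
    src = source.copy()
    for _ in range(iters):
        d2 = ((src[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)
        matched = target[d2.argmin(axis=1)]  # closest target point for each source point
        R, t = best_fit_transform(src, matched)
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

# Toy "keyframe k-1" cloud: a 5x5x5 grid with 1 cm spacing; "keyframe k" is the
# same cloud after a small made-up rotation (3 deg about z) and translation.
g = np.linspace(-0.02, 0.02, 5)
source = np.stack(np.meshgrid(g, g, g), axis=-1).reshape(-1, 3)
a = np.deg2rad(3.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.001, -0.001, 0.0005])
target = source @ R_true.T + t_true
R_est, t_est = icp(source, target)
```

Because every point moves by less than half the grid spacing here, the nearest-neighbor matches are correct from the first iteration and the transform is recovered essentially exactly; with larger motions, nearest-neighbor matching can pick wrong pairs.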

Combining all these factors yields the pose graph's minimization problem; solving it with LM adjusts the object poses of all N keyframes at once. The most recent pose in the solution becomes the current object pose estimate T_{t}, and older poses are dropped from the graph once they are no longer needed (a sliding window) or kept when necessary.

In summary, the backend runs a (pose optimization → shape optimization) loop for each new input, alternately refining the neural SDF parameters \Theta and the object pose T. This makes the shape map and the pose track more accurate together. Unlike a fully end-to-end deep learning approach, NeuralFeels is built from these modular optimizations, so its outputs are interpretable and reliable. (For example, the final SDF-based 3D model and the object pose trajectory are easy for people to understand and can be used directly in downstream robot tasks.) It is also notable on the hardware side: it solves the problem with a simple sensor setup (one camera plus a few fingertip tactile sensors), without special multi-camera rigs or motion capture.

3.3 Experiments: validation in simulation and on a real robot

The NeuralFeels team implemented and evaluated the method both in simulation and on a real robot platform. The hardware is a four-fingered Allegro Hand with a DIGIT tactile sensor on each fingertip. The hand's joint encoders report the finger positions, and a fixed RealSense RGB-D camera is installed in the environment. The objects are everyday items of roughly fist size; in simulation, object 3D models (meshes) were selected and held and rotated by the hand inside a physics engine (IsaacGym). In the real experiments, the same or similar objects were 3D-printed or otherwise prepared and placed in the robot's hand. The objects include a toy rubber duck, a large die, a Rubik's cube, blocks, and other items with diverse shapes and surface properties (some models reportedly come from the YCB benchmark or ContactDB).

In-hand rotation policy: In the experiments, the robot hand starts with the object grasped in front of the palm and then rotates it freely. The policy for this motion is not the focus of this work; the authors reportedly used HORA (In-Hand Object Rotation via Rapid Motor Adaptation), developed by Haozhi Qi et al. Simply put, it is a learned controller that keeps rotating the object using only the fingers without dropping it. Rotating for a few seconds brings many faces of the object into and out of contact with the fingers, providing tactile information, while the camera also sees the object from many angles. In both simulation and the real world, a sequence typically lasts a few hundred frames (a few seconds). The FeelSight dataset contains 70 such rotation sequences (40 in simulation, 30 in the real world), with each object run five times under different initial conditions for statistical validity. The dataset is publicly available on HuggingFace and consists of roughly 25 GB of simulated data, 15 GB of real data, and an additional 12 GB for the occluded-viewpoint experiments (including depth images, tactile frames, robot states, and, for the real data, estimated “pseudo-ground-truth” poses).

Evaluation metrics: Performance was measured along two axes, shape reconstruction accuracy and pose tracking accuracy.

  • Shape accuracy is reported as the F-score, the harmonic mean of precision and recall when comparing 3D shapes. Concretely, after each experiment a mesh is extracted from the final neural SDF with Marching Cubes to obtain the reconstructed object model, which is compared against the ground-truth mesh (known in simulation; pre-scanned or taken from a CAD model in the real world). The two meshes are aligned, and the fraction of points lying within a distance threshold (e.g., 5 mm) of the other mesh is computed. Precision is the fraction of reconstructed points close to the ground truth; recall is the fraction of ground-truth points explained by the reconstruction. The F-score is their harmonic mean, and higher means a closer match between reconstruction and ground truth.
  • Pose tracking error uses the ADD-S metric (average distance, symmetric). Points sampled uniformly on the object surface are transformed by the estimated pose and by the ground-truth pose, and the average nearest-neighbor distance between the two sets is computed. If the object has symmetries (e.g., a cube-shaped die that looks the same after certain rotations), using nearest neighbors keeps the error from being overestimated. A lower ADD-S error (reported in mm) means better tracking. In simulation the ground-truth pose is known, so the metric can be computed directly; in the real experiments, ground-truth poses are hard to obtain, so a “pseudo-ground-truth” was used instead: the object was observed with several cameras in an unobstructed setup, and the best tracking result from NeuralFeels run in its CAD-model mode was treated as ground truth. (Motion capture was reportedly impractical because the markers interfered.)

Baselines and experimental scenarios: To validate NeuralFeels, the authors set up several comparison modes:

  1. Vision-only vs. visuo-tactile: Since the paper's core claim is the improvement from adding touch, the touch-free configuration serves as a kind of baseline. The vision-only mode runs the same pipeline using camera depth alone, while the vision + touch mode uses all modalities. Comparing shape and pose performance between the two, the authors analyze in particular how the gap changes with the degree of occlusion.
  2. Unknown-object SLAM vs. known-object tracking: The unknown-object SLAM scenario evaluates the full NeuralFeels, which jointly estimates shape and pose without a CAD model or any prior information. A separate known-object tracking scenario initializes NeuralFeels' neural SDF with the object's ground-truth CAD model, or freezes it entirely, and tracks only the pose. This allows a direct comparison with existing methods as a pure multimodal tracker. In particular, it shows how long tracking can continue on touch alone when the object is barely visible (e.g., almost completely covered by the hand). The improvement over vision-only trackers (CAD-model alignment + ICP, etc.) is also quantified.
  3. Occlusion and noise stress tests: Finally, the authors measured how performance changes with camera occlusion and depth noise. Occlusion was varied by changing the camera position and thus the range of viewing angles on the object: from some angles the object is almost hidden by the hand, from others it is clearly visible. In simulation, cameras were placed on a sphere around the hand and the change in pose error was analyzed. In the noise experiments, a RealSense depth-noise model was applied to make the depth data progressively less accurate, and the resulting drop in tracking performance was measured. This mimics real sensor instability caused by lighting or object material (e.g., glossy objects).
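A stress test of this kind can be sketched as below. The noise model is a hypothetical stand-in, NOT the RealSense noise model used in the paper: Gaussian noise whose standard deviation grows with the square of depth (a common rule of thumb for stereo depth sensors), plus random dropouts marked as 0. `noise_scale`, `dropout_prob`, and the sweep values are made up.

```python
import numpy as np

def corrupt_depth(depth, noise_scale, dropout_prob, rng):
    """Hypothetical depth corruption for a stress test (not the paper's RealSense model).

    Gaussian noise with std = noise_scale * depth**2 (stereo-style growth with range),
    plus random dropouts set to 0, which downstream code treats as invalid.
    """
    sigma = noise_scale * depth ** 2
    noisy = depth + rng.normal(size=depth.shape) * sigma
    dropped = rng.random(depth.shape) < dropout_prob
    return np.where(dropped, 0.0, noisy)

rng = np.random.default_rng(0)
clean = np.full((120, 160), 0.4)                # made-up flat surface 40 cm away
for level in [0.0, 0.01, 0.02, 0.05]:           # made-up sweep of noise/dropout levels
    noisy = corrupt_depth(clean, noise_scale=level, dropout_prob=level, rng=rng)
    valid = noisy > 0
    rmse = np.sqrt(np.mean((noisy[valid] - clean[valid]) ** 2))
    print(f"level={level:.2f}  valid={valid.mean():.1%}  rmse={rmse * 1000:.2f} mm")
```

In a full experiment, each corrupted depth stream would be fed to both the vision-only and the visuo-tactile pipelines, and the pose error would be recorded per noise level.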

์ฃผ์š” ๊ฒฐ๊ณผ: - ๋ฏธ์ง€ ๋ฌผ์ฒด SLAM: NeuralFeels๋Š” ์•„๋ฌด ์‚ฌ์ „์ •๋ณด ์—†๋Š” ์ƒˆ๋กœ์šด ๋ฌผ์ฒด๋“ค์„ ๋Œ€์ƒ์œผ๋กœ ์•ˆ์ •์ ์œผ๋กœ 3D ๋ชจ๋ธ์„ ํ˜•์„ฑํ•˜๊ณ  ์ถ”์ ํ•ด๋ƒˆ์Šต๋‹ˆ๋‹ค. ํ‰๊ท  F-Score ์•ฝ 81% ์ˆ˜์ค€์œผ๋กœ ํ˜•์ƒ์„ ๋ณต์›ํ–ˆ์œผ๋ฉฐ, ADD-S ์ž์„ธ ์˜ค์ฐจ ํ‰๊ท  4.7โ€ฏmm๋กœ ์ดˆ๊ธฐ ์œ„์น˜์—์„œ ํฌ๊ฒŒ ๋ฒ—์–ด๋‚˜์ง€ ์•Š๊ณ  ๋๊นŒ์ง€ ์ถ”์  ์œ ์ง€ํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด ๊ฒฐ๊ณผ๋Š” ์‹œ๋ฎฌ๋ ˆ์ด์…˜๊ณผ ์‹ค์ œ์—์„œ ํฐ ์ฐจ์ด ์—†์ด ์œ ์‚ฌํ•œ ์„ฑ๋Šฅ์ด์—ˆ์œผ๋ฉฐ, ์ด๋Š” ํ•™์Šต๋œ ์ด‰๊ฐ ํŠธ๋žœ์Šคํฌ๋จธ์˜ ์‹œ๋ฎฌ๋ ˆ์ด์…˜โ†’์‹ค์ œ ์ผ๋ฐ˜ํ™”๊ฐ€ ์„ฑ๊ณต์ ์ž„์„ ๋ฐ˜์ฆํ•ฉ๋‹ˆ๋‹ค. ๋น„์ „ ์ „์šฉ ๋Œ€๋น„ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ์˜ ์ด์ ์€ ํŠนํžˆ ์–ด๋ ค์šด ์ƒํ™ฉ์—์„œ ๋‘๋“œ๋Ÿฌ์กŒ์Šต๋‹ˆ๋‹ค. ์ „์ฒด 70ํšŒ ์‹คํ—˜์„ ํ†ต๊ณ„๋‚ผ ๋•Œ ์‹œ๊ฐ+์ด‰๊ฐ ์œตํ•ฉ์ด ๋ชจ๋“  ์‹คํ—˜์—์„œ ํ˜•์ƒ F-์Šค์ฝ”์–ด๋ฅผ ๋” ๋†’๊ฒŒ ๋‹ฌ์„ฑํ–ˆ๊ณ , ์ž์„ธ ๋“œ๋ฆฌํ”„ํŠธ๋„ ์ค„์—ฌ์ฃผ์–ด Vision-only๊ฐ€ ๊ฐ„ํ˜น ์ถ”์ ์— ์‹คํŒจํ•˜๋Š” ์ผ€์ด์Šค๋“ค์„ ํ˜„์ €ํžˆ ์ค„์˜€์Šต๋‹ˆ๋‹ค. ๋…ผ๋ฌธ Figure 3(c)์—์„œ๋Š” Vision-only์˜ ์ถ”์  ์‹คํŒจ ํšŸ์ˆ˜๊ฐ€ ๋งŽ์ง€๋งŒ Visuo-tactile์˜ ์‹คํŒจ๋Š” ํ›จ์”ฌ ์ ๋‹ค๋Š” ์ ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ์ •์„ฑ์ ์ธ ์˜ˆ๋กœ, Vision-only๋Š” ํฐ ์ฃผ์‚ฌ์œ„์˜ ์ˆจ์€ ๋ฉด์ด๋‚˜ ๊ณ ๋ฌด ์˜ค๋ฆฌ์˜ ๋“ฑ ๋’ค์ฒ˜๋Ÿผ ๋ณด์ด์ง€ ์•Š๋Š” ๋ถ€๋ถ„์„ ์ œ๋Œ€๋กœ ์žฌํ˜„ ๋ชปํ•˜์ง€๋งŒ, ์ด‰๊ฐ์„ ์“ด ๋ฐฉ๋ฒ•์€ ๊ทธ ๋ถ€๋ถ„๊นŒ์ง€ ๋น„๊ต์  ์™„์„ฑ๋œ ํ˜•ํƒœ๋ฅผ ์–ป์—ˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ์ด‰๊ฐ ์ •๋ณด๊ฐ€ ๋ณด์ด์ง€ ์•Š๋Š” ํ‘œ๋ฉด์„ ๋ฉ”์›Œ์ฃผ์–ด ๋ฌผ์ฒด ๋ชจ๋ธ์˜ ์™„์„ฑ๋„(completion)๋ฅผ ๋†’์—ฌ์คฌ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

  • CAD ๋ชจ๋ธ ์ถ”์ : ๋ฌผ์ฒด์˜ 3D ๋ชจ๋ธ์ด ๋ฏธ๋ฆฌ ์ฃผ์–ด์ ธ ์žˆ๋Š” ๊ฒฝ์šฐ, NeuralFeels๋Š” ์‹ ๊ฒฝ์žฅ ํ•™์Šต์„ ์ƒ๋žตํ•˜๊ณ  ์ž์„ธ ์ถ”์  ์ „์šฉ ๋ชจ๋“œ๋กœ ๋™์ž‘ํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ฒฝ์šฐ ์ดˆ๊ธฐ ๋ช‡ ํ”„๋ ˆ์ž„๋งŒ์— ๋ฌผ์ฒด ์ž์„ธ๋ฅผ ์ •ํ™•ํžˆ ์ฐพ์•„๋‚ธ ๋’ค, ์ดํ›„์—๋Š” LM ์ตœ์ ํ™” + ์ด‰๊ฐ ๋ณด์กฐ๋กœ ๋งค์šฐ ๋‚ฎ์€ ๋“œ๋ฆฌํ”„ํŠธ๋ฅผ ์œ ์ง€ํ–ˆ์Šต๋‹ˆ๋‹ค. ํ‰๊ท  2.3โ€ฏmm ์˜ค์ฐจ ์ˆ˜์ค€์€, ์ผ๋ฐ˜์ ์ธ ๋ชจ๋ธ ๊ธฐ๋ฐ˜ 6-DoF ์ถ”์ ๊ธฐ๋“ค(์˜ˆ: ICP ๊ธฐ๋ฐ˜)๋ณด๋‹ค๋„ ๋›ฐ์–ด๋‚œ ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค. ํŠนํžˆ ์‹œ์•ผ๊ฐ€ ๊ฐ€๋ ค์งˆ์ˆ˜๋ก ์„ฑ๋Šฅ ์ฐจ์ด๊ฐ€ ๊ทน๋ช…ํ•ด์กŒ๋Š”๋ฐ, Vision-only ์ถ”์ ์˜ ๊ฒฝ์šฐ ์†๊ฐ€๋ฆผ์œผ๋กœ ํŠน์ง•์ด ๋ถ€์กฑํ•ด์ง€๋ฉด ๋จธ๋ญ‡๊ฑฐ๋ฆฌ๊ฑฐ๋‚˜ ์ž˜๋ชป๋œ ๋ฐฉํ–ฅ์œผ๋กœ ํŠ€๋Š” ๋ฐ˜๋ฉด, ์ด‰๊ฐ ์œตํ•ฉ ์ถ”์ ์€ ์†๋์—์„œ ๋А๋‚€ ์›€์ง์ž„์„ ํฌ์ฐฉํ•˜์—ฌ ์—ฐ์†์„ฑ ์žˆ๊ฒŒ ์ถ”์ ํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด์— ๋”ฐ๋ผ ๊ฐ•ํ•œ occlusion ํ™˜๊ฒฝ์—์„œ ์ตœ๋Œ€ 94%๊นŒ์ง€ ์ถ”์  ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๋‹ฌ์„ฑํ–ˆ๋‹ค๊ณ  ๋ณด๊ณ ๋ฉ๋‹ˆ๋‹ค. Figure 4์— ํ•ด๋‹นํ•˜๋Š” ๊ฒฐ๊ณผ์—์„œ, ์‹œ๊ฐ์ด ๊ฑฐ์˜ ์ฐจ๋‹จ๋œ ๊ทน๋‹จ์  ๊ฐ๋„์—์„œ์กฐ์ฐจ ์ด‰๊ฐ์ด ๋กœ์ปฌํ•˜๊ฒŒ ๋ณด์ถฉ ์‹œ์•ผ ์—ญํ• ์„ ํ•˜์—ฌ ์ถ”์ ์„ ์ด์–ด๊ฐ€๋Š” ๊ฒƒ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ๋ฐ˜๋Œ€๋กœ ๋ฌผ์ฒด๊ฐ€ ์นด๋ฉ”๋ผ์— ์ž˜ ๋ณด์ด๋Š” ๊ฒฝ์šฐ์—๋Š” ์ด‰๊ฐ์˜ ์˜ํ–ฅ์ด ์ƒ๋Œ€์ ์œผ๋กœ ์ ์—ˆ๋Š”๋ฐ, ์ด๋•Œ๋Š” Vision-only๋„ ์ถฉ๋ถ„ํžˆ ์ž˜ ์ถ”์ ํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ์ด‰๊ฐ์€ ๋ฏธ์„ธํ•œ ๋ณด์ • ์ •๋„์˜ ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ด€์ฐฐ์€ โ€œ์ด‰๊ฐ์€ ์‹œ๊ฐ์ด ๋ชจํ˜ธํ•  ๋•Œ ๊ฒฐ์ •์ ์œผ๋กœ ๋„์›€์ด ๋˜๊ณ , ์‹œ๊ฐ ์ •๋ณด๊ฐ€ ํ’๋ถ€ํ•  ๋•Œ๋Š” ์„ธ๋ถ€๋ฅผ ๋‹ค๋“ฌ์–ด์ฃผ๋Š” ์—ญํ• ์„ ํ•œ๋‹คโ€๋Š” ์—ฐ๊ตฌ์ง„์˜ ๊ฒฐ๋ก ๊ณผ๋„ ์ผ์น˜ํ•ฉ๋‹ˆ๋‹ค.

  • Noise sensitivity: In simulation, the authors progressively added noise to the camera depth maps. The vision-only method's tracking error rose sharply as the noise grew, whereas the multimodal method degraded relatively gracefully. This is because the added noise corrupts only the camera depth and not the tactile channel, so touch acts as an anchor. Real tactile signals are of course somewhat noisy too, but far less so than depth cameras. Real depth sensors such as the RealSense often drop or mismeasure points depending on the environment. The result suggests that touch can serve as a reinforcing signal that keeps tracking robust in those cases as well.
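The F-score and ADD-S figures above are easier to interpret once the metrics are defined. The sketch below uses the standard definitions as we understand them:

- **F-score** is the harmonic mean of precision and recall of surface samples at a distance threshold `tau`.
- **ADD-S** is the mean closest-point distance between the model placed at the ground-truth pose and the model placed at the estimated pose.

The function names, the brute-force nearest-neighbour search, the 5 mm threshold and the toy sphere data are all our own assumptions. This is not the paper's evaluation code.

```python
import numpy as np


def nearest_dists(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Distance from each point in src (N, 3) to its nearest neighbour in dst (M, 3).

    Brute force with O(N * M) memory; a KD-tree would be used for large clouds.
    """
    diff = src[:, None, :] - dst[None, :, :]
    return np.linalg.norm(diff, axis=-1).min(axis=1)


def f_score(pred: np.ndarray, gt: np.ndarray, tau: float) -> float:
    """F-score at threshold tau between predicted and ground-truth surface samples."""
    precision = np.mean(nearest_dists(pred, gt) < tau)  # predicted points lying near the true surface
    recall = np.mean(nearest_dists(gt, pred) < tau)     # true surface points covered by the prediction
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


def add_s(model: np.ndarray, R_est, t_est, R_gt, t_gt) -> float:
    """ADD-S: mean distance from each GT-posed model point to the closest estimate-posed point."""
    gt_pts = model @ R_gt.T + t_gt
    est_pts = model @ R_est.T + t_est
    return float(nearest_dists(gt_pts, est_pts).mean())


# Usage on a toy sphere of radius 5 cm (hypothetical data)
rng = np.random.default_rng(0)
sphere = rng.normal(size=(500, 3))
sphere = 0.05 * sphere / np.linalg.norm(sphere, axis=1, keepdims=True)
half = sphere[sphere[:, 2] > 0]           # only the upper half was "reconstructed"
print(f_score(half, sphere, tau=0.005))   # precision 1.0 (subset of GT), recall < 1.0
I3 = np.eye(3)
print(add_s(sphere, I3, np.array([0.002, 0.0, 0.0]), I3, np.zeros(3)))  # at most 2 mm
```

A half-reconstructed object keeps perfect precision but loses recall. This is why completing unseen surfaces with touch shows up directly in the F-score.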

FeelSight dataset: FeelSight bundles the 70 experiment sequences mentioned above and is significant as the first public benchmark in this area. It provides 40 simulated and 30 real in-hand rotation sequences in a common format, which can be used to train models or evaluate other algorithms. Datasets that offer touch and vision together are rare, so FeelSight could become a standard for evaluating future multimodal perception algorithms. Each sequence comes in several versions (e.g., 1 fps and 5 fps), and a separate occlusion-specific experiment set is also included. The trained tactile transformer and the Segment Anything weights are distributed through a Hugging Face model repository, so researchers can run reproduction experiments right away. In the GitHub code, the ./scripts/run script replays the data and runs the algorithm in different modes (vi: vision only, vitac: vision+tactile, tac: tactile only, and so on).

3.4 Critical Discussion: Strengths, Weaknesses, and Future Directions

NeuralFeels makes several important advances in integrated visuo-tactile perception for robots. Its core strengths are:

  • Demonstrated value of multimodal fusion: The work quantifies the value of touch experimentally. It clearly shows the performance difference with and without touch under realistic problems such as occlusion and sensor noise. This gives grounds for adopting tactile sensors in future robot systems. Tactile sensors have often been treated as secondary because of implementation complexity and the difficulty of processing their data. This work puts numbers behind the claim that touch makes things measurably better, which carries a strong message for roboticists.

  • Generality to unknown objects: NeuralFeels works without restrictions on object category or prior models. Most earlier in-hand tracking work either handled only objects with CAD models or tagged objects with markers. This method, by contrast, is fully model-free and does not even use category-level shape priors. It is remarkable that it achieves good reconstruction quality while handling every new object zero-shot. That capability is essential for robots that must deal with the endless variety of objects found in environments such as homes.

  • A successful neural SLAM framework: Optimizing a neural implicit model online is still an unfamiliar approach, and NeuralFeels structures it well from a SLAM perspective. It achieves stable convergence with proven techniques such as keyframe management, alternating map and pose optimization, and factor graphs. Two choices stand out as engineering wins: speeding up optimization with a Theseus-based LM solver and custom Jacobians, and strengthening practical reliability with an ICP factor. As a result, the method is an online optimization formulation that needs no per-object offline training. Its outputs (point maps or meshes, and 6-DoF pose sequences) are easy for people to interpret, which also provides transparency.

  • Integration of state-of-the-art techniques: A close reading shows that the paper puts recent techniques to work exactly where they fit:
    - vision foundation models for robotics, represented by Segment Anything;
    - tactile sensing hardware and simulation, embodied by DIGIT and TACTO;
    - neural-field acceleration, from Instant-NGP to iSDF;
    - differentiable nonlinear least-squares optimization with Theseus and its Levenberg-Marquardt solver;
    - ViT-based transformers.
    Each of these is a leading component in its own field. A major contribution of NeuralFeels is integrating them all into a working robot system. It is a comprehensive example of multimodal integration research, and it gives other researchers a foundation for reusing or improving these components.

  • Reproducibility and openness: As one would expect from Meta (formerly Facebook) researchers, the results are released as well-organized open source. The dataset, code, and pretrained models are published as a full set, which makes the work a model case for reproducible research. This kind of openness leads to standard evaluation sets and benchmarks for the field and creates a virtuous cycle that draws out better follow-up work.
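The review credits a Theseus-based Levenberg-Marquardt (LM) solver with custom Jacobians and an ICP factor. The NumPy sketch below illustrates, under our own assumptions, what an LM loop with a hand-written Jacobian looks like. It aligns two point sets with known correspondences over SE(3), using a point-to-point residual in the style of ICP. It does not use the Theseus API. NeuralFeels' actual factors (SDF residuals from vision and touch, ICP, pose regularizers) are more involved. The left-perturbation parametrization and the damping schedule here are our choices.

```python
import numpy as np


def skew(v: np.ndarray) -> np.ndarray:
    """Cross-product matrix: skew(v) @ x == np.cross(v, x)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def so3_exp(w: np.ndarray) -> np.ndarray:
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + skew(w)
    K = skew(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def lm_align(src: np.ndarray, dst: np.ndarray, iters: int = 30, lam: float = 1e-3):
    """Levenberg-Marquardt for min_{R,t} sum_i ||R src_i + t - dst_i||^2.

    Pose update is a left perturbation q -> exp(dw) q + dt with q = R src + t,
    so the Jacobian of each 3D residual w.r.t. (dw, dt) is [-skew(q_i), I].
    """
    def cost(R_, t_):
        return float(np.sum((src @ R_.T + t_ - dst) ** 2))

    R, t = np.eye(3), np.zeros(3)
    c = cost(R, t)
    for _ in range(iters):
        q = src @ R.T + t
        r = (q - dst).reshape(-1)                   # stacked residuals, shape (3N,)
        J = np.zeros((r.size, 6))
        for i, qi in enumerate(q):
            J[3 * i:3 * i + 3, :3] = -skew(qi)      # d r_i / d dw
            J[3 * i:3 * i + 3, 3:] = np.eye(3)      # d r_i / d dt
        H, g = J.T @ J, J.T @ r
        delta = np.linalg.solve(H + lam * np.eye(6), -g)   # damped normal equations
        dR = so3_exp(delta[:3])
        R_new, t_new = dR @ R, dR @ t + delta[3:]
        c_new = cost(R_new, t_new)
        if c_new < c:      # accept: trust the quadratic model more
            R, t, c, lam = R_new, t_new, c_new, lam * 0.1
        else:              # reject: increase damping toward gradient descent
            lam *= 10.0
    return R, t


# Usage with synthetic data (hypothetical object points within a 10 cm cube)
rng = np.random.default_rng(1)
src = rng.uniform(-0.05, 0.05, size=(200, 3))
R_true = so3_exp(np.array([0.3, -0.2, 0.5]))
t_true = np.array([0.01, -0.02, 0.03])
R_est, t_est = lm_align(src, src @ R_true.T + t_true)
print(np.abs(R_est - R_true).max(), np.abs(t_est - t_true).max())
```

Writing the Jacobian analytically instead of relying on automatic differentiation is the kind of speed-up the review attributes to custom Jacobians. The damping `lam` interpolates between Gauss-Newton steps (small `lam`) and cautious gradient steps (large `lam`).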

That said, the work also has clear limitations and weaknesses. A few of them:

  • System complexity and computational cost: The NeuralFeels pipeline is not one or two modules but many intertwined stages of neural networks and optimization. SAM segmentation, ViT transformer inference, neural-field training, and LM optimization all run every frame, so the compute load is substantial. The paper does not explicitly discuss real-time performance. The system is probably impractical without GPU acceleration and may take several hundred milliseconds or more per frame even on a GPU. Tracking could therefore lag when the robot rotates an object very quickly. On the other hand, Instant-NGP is known to make neural-field training quite fast, and Theseus's LM solver is efficient. With good parallelization, the optimization could therefore approach real time. The number of optimization iterations and the size of the keyframe window presumably govern this trade-off. Practical deployment will require lighter models and faster optimization.

  • Dependence on the initial state: The current method starts from a state in which the object is already grasped, which leaves room for initial-condition problems. Tracking could be lost. Segmentation also requires the object to be at least partly visible at the start, so the system might fail to start if the object is completely buried in the hand. A person feeling for keys in a pocket starts from fingertip sensation alone. NeuralFeels, by contrast, likely needs to lean somewhat on vision at first, because SAM can only segment the object if some of it is visible. Touch also kicks in once the object contacts the fingers, so neither channel is ever completely empty. Without an initial exploration strategy, however, blindly rubbing the fingers around would not get far. The paper moves the fingers with a proprioception-driven policy, but if the object's location is unknown at the outset, even grasping it is hard. In other words, the problem setting begins after the grasp, and using touch before the grasp is out of scope. Future work could extend the method to a closed visuo-tactile loop that starts at the grasping stage.

  • Model limitations and extensibility: A neural SDF is a precise continuous representation, but it can lack local detail when learning complex shapes. Instant-NGP's multi-level grid alleviates much of this, yet very thin structures or fine high-resolution surface detail may still be missed. Fortunately, this work deals only with object geometry, so surface material and color do not matter. Incorporating texture later, for example for object identification, may require extending to a richer neural field such as a radiance field. Non-rigid or deformable objects also cannot be handled by a single SDF and need more complex models. This work covers rigid objects only.

  • Limited coverage of tactile information: A DIGIT sensor only senses contact over a very small area of the fingertip. Compared with the touch of a human hand, its coverage and sensing modalities are limited, so most of the object's surface still has to come from vision. In this work, touch serves as an auxiliary source of local contour information. To broaden tactile coverage, it would be worth covering the whole hand with multiple sensors, or adding force/torque sensors to collect richer tactile feedback.

  • Segmentation reliability: Although the method uses SAM, segmentation may still be imperfect. Misclassification is possible in three situations: when the boundary between finger and object is ambiguous, when colors bleed between fingers and object, or when reflections on the object surface corrupt the depth. The authors added a pose regularizer to the pose optimization as a buffer against bad segmentation, but a very large error could still derail tracking. Segmentation errors can be fatal for a robot, so stabilizing this component matters. Joint segmentation that combines visual, tactile, and kinematic information could make it more robust in the future.

  • Lack of general interaction: The experimental scenarios cover only in-air rotation. That is a very clean setting. A real household robot, by contrast, picks objects up and moves them, inserts them into tools, and sets them down on desks, so object-object interactions and contact with the environment also occur. NeuralFeels currently models only hand-object interaction. Future work should move toward perceiving more general interactions, such as gathering tactile information by rubbing the object against another surface. In such cases the scope of SLAM must also widen, for example to jointly model the held object and the surrounding environment.
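The segmentation bullet mentions a pose regularizer that cushions the optimizer against bad masks. The toy below estimates translation only, under our own assumptions, and shows two common ways to get that kind of cushioning:

- a Huber loss solved by iteratively reweighted least squares (IRLS), which down-weights mis-segmented points;
- a quadratic prior that pulls the estimate toward the previous frame's pose.

The function names, the 5 mm Huber threshold and the 20% outlier rate are hypothetical, and the paper's actual regularizer may take a different form.

```python
import numpy as np


def huber_weights(res_norm: np.ndarray, delta: float) -> np.ndarray:
    """IRLS weights for the Huber loss: 1 inside delta, delta/|r| outside."""
    w = np.ones_like(res_norm)
    big = res_norm > delta
    w[big] = delta / res_norm[big]
    return w


def estimate_translation(model, observed, t_prev, prior_weight=0.0, delta=None, iters=20):
    """Minimise sum_i rho(||model_i + t - observed_i||) + prior_weight/2 * ||t - t_prev||^2.

    rho is the Huber loss (quadratic part r^2/2) when delta is given, else plain least squares.
    Each IRLS iteration solves the weighted problem in closed form:
    t = (sum_i w_i (obs_i - model_i) + prior_weight * t_prev) / (sum_i w_i + prior_weight).
    """
    t_prev = np.asarray(t_prev, dtype=float)
    t = t_prev.copy()
    for _ in range(iters):
        res_norm = np.linalg.norm(model + t - observed, axis=1)
        w = np.ones(len(model)) if delta is None else huber_weights(res_norm, delta)
        num = (w[:, None] * (observed - model)).sum(axis=0) + prior_weight * t_prev
        t = num / (w.sum() + prior_weight)
    return t


# Usage: 20% of observed points are "mis-segmented" and shifted 5 cm (hypothetical data)
rng = np.random.default_rng(2)
model = rng.uniform(-0.05, 0.05, size=(100, 3))
t_true = np.array([0.01, 0.0, 0.0])
observed = model + t_true
observed[:20, 1] += 0.05
t_prev = np.zeros(3)                                   # previous-frame estimate
t_ls = estimate_translation(model, observed, t_prev)
t_huber = estimate_translation(model, observed, t_prev, delta=0.005)
t_anchored = estimate_translation(model, observed, t_prev, prior_weight=1e9, delta=0.005)
print(np.linalg.norm(t_ls - t_true), np.linalg.norm(t_huber - t_true))
```

Plain least squares is dragged toward the outliers. The Huber weights shrink their influence. A very large `prior_weight` effectively freezes the pose at the previous estimate, which shows why such a regularizer must be weighted carefully.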

Nevertheless, NeuralFeels' contribution can be seen as opening a new horizon for robot dexterity. The authors also emphasize that the method can serve as a perception backbone for future advances in robot dexterity. Building on this, here are a few directions for follow-up research:

  • Combining learning with prior knowledge: NeuralFeels learns each object from scratch, but in some cases it could borrow the power of pretraining or generative models. For example, deep learning from the shape-completion field could let the robot infer the whole object from the fragmentary information gathered in a short interaction. Category-specific generators, or fast initialization from a large pretrained neural field, are also possible. A person roughly guesses the shape of an object seen for the first time by recalling similar ones. In the same way, the robot would use experience to model objects faster.

  • Real-time operation and lightweight models: To address the speed issue noted above, research on more compact models or parallel optimization is needed. For instance, a learning-plus-optimization hybrid could train pose estimation fully end-to-end and replace it with a network, while still feeding the network's output into the neural-field updates. Alternatively, pose optimization could run less often and be supplemented by additional sensors such as IMUs. Perception must ultimately run in real time to be used in robot control, so handling multiple objects on a single GPU should also be considered.

  • Integration with diverse tactile sensors: Beyond DIGIT there are many tactile sensing approaches, such as pressure matrices, capacitive sensors, and pin-array sensors. Each has different properties (e.g., force-deformation relationships), so integrating them in the same framework could raise accuracy through richer information. Modalities beyond touch and vision could also be considered, such as audio (e.g., estimating material from the sound an object makes as it moves). Multimodal sensor fusion remains open to extension.

  • Interactive (next-best-sense) perception: Currently a fixed policy rotates the object. In the future, the robot could actively explore to improve perception. For example, it could move to touch parts it has not yet touched, or turn the wrist to get a better view of an angle it has not seen. Framing this as a planning problem ("where should I touch next?") would yield intelligent perception that gains the most information with the least motion. NeuralFeels' components fit well with such active-perception strategies. By analyzing the uncertainty of the neural-field model, the robot could choose actions that gather information about the most uncertain regions.

  • Comparison with related work: Finally, it helps to place this work among existing visuo-tactile perception research. Prior work includes FingerSLAM (reconstructing a static object model with dense touch) and GelSight-based pose tracking. FingerSLAM used a single tactile sensor with no full camera occlusion and was therefore simpler. NeuralFeels raises the difficulty with multiple tactile sensors and dynamic manipulation that tolerates occlusion. Work from the Rodriguez group, such as SimPLE (a visuotactile pick-localize-regrasp-place method learned in simulation), also used touch to estimate object pose. Those methods, however, considered objects resting on a table or only simple motions. NeuralFeels stands apart by covering complex 6-DoF in-hand motion while jointly estimating shape. In short, it can be said to open a new chapter in robotic tactile SLAM.
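The active-perception bullet suggests choosing actions that probe the most uncertain part of the neural field. One common, simple proxy for that uncertainty is disagreement within an ensemble of shape models. Using it here is our assumption; the review does not report that NeuralFeels does this. The toy sketch below replaces learned SDFs with three hypothetical sphere SDFs. All three agree on the surface point already touched but disagree elsewhere. The sketch then picks the candidate touch location with the largest prediction variance.

```python
import numpy as np


def sphere_sdf(points: np.ndarray, center: np.ndarray, radius: float) -> np.ndarray:
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points - center, axis=1) - radius


def next_best_touch(candidates: np.ndarray, ensemble) -> tuple[int, np.ndarray]:
    """Pick the candidate where the ensemble's SDF predictions disagree the most."""
    preds = np.stack([sdf(candidates) for sdf in ensemble])  # shape (K, N)
    variance = preds.var(axis=0)                               # disagreement per candidate
    return int(np.argmax(variance)), variance


# Hypothetical ensemble: spheres of radius 3, 4 and 5 cm that all pass through the
# already-touched point (0.05, 0, 0) but differ on the far side of the object.
ensemble = [
    (lambda p, r=r: sphere_sdf(p, np.array([0.05 - r, 0.0, 0.0]), r))
    for r in (0.03, 0.04, 0.05)
]
candidates = np.array([[0.05, 0.0, 0.0],    # already touched: all models agree
                       [0.0, 0.05, 0.0],    # side of the object
                       [-0.05, 0.0, 0.0]])  # far side, never observed
best, var = next_best_touch(candidates, ensemble)
print(best, var)
```

In a real system the ensemble members would be separately initialized neural fields, and candidate locations would be restricted to poses the fingers can actually reach.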

3.5 Summary and Conclusion

NeuralFeels, whose witty title combines Neural Fields and Feels (touch), presents a way for robots to understand objects by also using the sense of touch at their fingertips. The paper demonstrates a multimodal in-hand SLAM pipeline. With it, a robot recovers the 3D shape and pose trajectory of an object it has never seen, with millimeter-scale pose accuracy, from just a short interaction of holding and moving the object. This is a major step in tactile exploration and perception, an area where robots have lagged far behind humans. In human terms, it is like giving a machine the ability to feel an object with its hands and picture it even with its eyes closed.

The success of NeuralFeels has several implications. Practically, as the technology matures, robots should get better at tasks that have so far been hard, such as finding things in a kitchen drawer, assembling irregular objects, and manipulating in the dark. The recovered 3D model and pose can also feed directly into other robot modules (e.g., path planning, grasp adjustment, object identification), which improves the robot's capabilities as a whole. Academically, this work achieves genuine sensor fusion of vision and touch, and it should stimulate much follow-up research in multimodal SLAM and interactive perception.

As a closing note, the authors open their paper with the insight "To perceive deeply is to have sensed fully." In other words, perceiving deeply is no different from having sensed thoroughly. For a robot, sensing thoroughly likely means not relying on a single sensor but mobilizing every available sense. NeuralFeels is an excellent example that opens up the possibilities of such sensory integration in robots. It also brightens the prospect of meeting smarter, more dexterous robot hands around us in the future.



Copyright 2026, JungYeon Lee