📃 Neural feels with neural fields: a review

paper
tactile
sdf
Visuo-tactile perception for in-hand manipulation
Published

June 4, 2025

  1. 🤖 This paper introduces NeuralFeels, which perceives an object's pose and shape while a robot manipulates it in-hand.
  2. 🧠 NeuralFeels combines vision and touch sensing to learn a neural field online, and tracks the object through pose graph optimization.
  3. 📈 The method substantially improves object reconstruction and pose tracking, and is especially strong under heavy visual occlusion.

1 Brief Review

The paper "Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation" proposes NeuralFeels, a visuotactile perception system that estimates the pose and shape of a novel object while a multi-fingered robot hand manipulates it in-hand. Perceiving and tracking objects is essential to robot dexterity, but existing in-hand perception systems rely mainly on vision and are limited to objects known in advance. Visual occlusion is also frequent during manipulation, which makes those approaches hard to apply. NeuralFeels combines vision, touch, and proprioception to learn a neural field online, and tracks the object by solving a pose graph optimization problem over that field.

The method consists of a front end and a back end.

Front end: converts raw sensor data into a form suited to estimation (segmented depth).

  1. Segmented visual depth: robustly segments the object's depth pixels from the image (I_c) and depth (D_c) streams of the RGB-D camera. It uses the vision foundation model SAM (Segment Anything Model) with kinematics-aware prompts derived from robot proprioception. The positive prompt is the pixel Π_c(p_c), the projection of the object centroid p_c computed from the fingertip poses p_f; the negative prompts are the pixels Π_c(p_f) of the fingertips that are not occluded. Together they yield an accurate object mask.
  2. Tactile depth estimation: estimates the depth of the contact patch from images (I_s) of DIGIT, a vision-based touch sensor. Unlike earlier convolution-based approaches, it uses a transformer architecture, the tactile transformer. The model is trained on a large dataset generated with the vision-based touch simulator TACTO (10,000 interactions on 40 YCB objects), augmented by randomizing sensor LEDs, indentation depth, and pixel noise for sim-to-real transfer. From a DIGIT image it predicts a depth map and a contact mask.

The final output of the front end is a segmented depth image D^s_t for each sensor s ∈ {d_index, d_middle, d_ring, d_thumb, c}.
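The kinematics-aware prompts described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes fingertip positions are already expressed in the camera frame, approximates the object centroid by the mean of the fingertips, and uses a plain pinhole model. The names `project`, `centroid`, `kinematic_prompts`, and the intrinsics values are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[int, int]


def project(p: Point3, fx: float, fy: float, cx: float, cy: float) -> Pixel:
    """Pinhole projection of a camera-frame 3D point (z > 0) to a pixel."""
    x, y, z = p
    return (round(fx * x / z + cx), round(fy * y / z + cy))


def centroid(points: Sequence[Point3]) -> Point3:
    """Mean of the fingertip positions (assumed proxy for the object centroid)."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def kinematic_prompts(
    fingertips: Sequence[Point3],
    visible: Sequence[bool],
    intrinsics: Tuple[float, float, float, float],
) -> Tuple[List[Pixel], List[Pixel]]:
    """Return (positive, negative) point prompts for a segmentation model."""
    fx, fy, cx, cy = intrinsics
    # Positive prompt: projected object centroid.
    positive = [project(centroid(fingertips), fx, fy, cx, cy)]
    # Negative prompts: projected fingertips that the camera can actually see.
    negative = [
        project(p, fx, fy, cx, cy)
        for p, seen in zip(fingertips, visible)
        if seen
    ]
    return positive, negative


# Four fingertips around an object 0.5 m in front of the camera;
# the third fingertip is occluded and therefore contributes no prompt.
tips = [(0.02, 0.0, 0.5), (-0.02, 0.0, 0.5), (0.0, 0.02, 0.5), (0.0, -0.02, 0.5)]
pos, neg = kinematic_prompts(tips, [True, True, False, True], (600.0, 600.0, 320.0, 240.0))
```

The resulting pixel lists would then be passed to the segmentation model as foreground and background point prompts.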

Back end: uses the depth measurements and sensor poses from the front end to build the object model (an evolving posed object SDF) online. It alternates between optimizing the weights θ of the neural SDF network and the object pose x_t. A keyframe set is maintained, and a new keyframe is added based on information gain (rendering loss) or elapsed time.

  1. Shape optimizer: optimizes the network weights θ with visuotactile depth samples extracted from the front-end output, using gradient descent with the object pose x_t held fixed.
    • Sampling: surface pixels (touch only) and free-space pixels (vision only) are sampled from the keyframes, and P_u 3D points are sampled along each ray.
    • SDF loss: a truncated SDF loss (L_shape) compares the predicted SDF at the sampled points against the truncation distance d_tr (5 mm).

$$\mathcal{L}_{\text{shape}} = \mathcal{L}_{f} + w_{tr}\, \mathcal{L}_{tr}$$

$$\mathcal{L}_{f} = \frac{1}{\left|u_{kt}\right|} \sum_{u \in u_{kt}} \frac{1}{\left|P_{fu}\right|} \sum_{p \in P_{fu}} \left|\mathcal{F}_{\theta}\left(x_{t}\, p\right) - d_{tr}\right|$$

$$\mathcal{L}_{tr} = \frac{1}{\left|u_{kt}\right|} \sum_{u \in u_{kt}} \frac{1}{\left|P_{tru}\right|} \sum_{p \in P_{tru}} \left|\mathcal{F}_{\theta}\left(x_{t}\, p\right) - \hat{d}_{u}\right|$$

Here P_fu is the set of points outside the truncation distance, P_tru is the set of points inside it, and d̂_u is the batch distance bound. L_shape is used to update the network weights θ.

  2. Pose optimizer: with the neural field F_θ frozen, builds and solves a pose graph to refine the object pose x_t. The SE(3) poses 𝒳_t within a sliding window of size n are estimated by nonlinear least-squares optimization, using the LM (Levenberg-Marquardt) solver of the Theseus library.

$$\mathcal{X}_{t} = \underset{\mathcal{X}_{t}}{\operatorname{argmin}}\; \mathcal{L}_{\text{pose}}\left(\mathcal{X}_{t} \mid \mathcal{M}_{t}, \theta\right) \quad \text{where} \quad \mathcal{L}_{\text{pose}} = w_{\text{sdf}}\, \mathcal{L}_{\text{sdf}} + w_{\text{reg}}\, \mathcal{L}_{\text{reg}} + w_{\text{icp}}\, \mathcal{L}_{\text{icp}}$$

    • L_sdf: an SDF loss sampled near surface points. A custom Jacobian is implemented for efficiency.
    • L_reg: a weak regularizer applied between consecutive keyframe poses.
    • L_icp: an ICP constraint between the current and previous visuotactile point clouds (a frame-to-frame constraint).

Minimizing these losses updates 𝒳_t.
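To make the truncated SDF loss of the shape optimizer concrete, here is a minimal pure-Python sketch. It assumes each ray sample already carries the network's predicted SDF value and a distance bound d̂ to the surface; samples with d̂ > d_tr are treated as free space and the rest as the truncation region. The function name `shape_loss`, the data layout, and the default `w_tr = 1.0` are illustrative assumptions (the review does not give the weight).

```python
from typing import List, Tuple

# One sample on a ray: (predicted SDF value, distance bound d_hat to the surface).
Sample = Tuple[float, float]


def shape_loss(rays: List[List[Sample]], d_tr: float = 0.005, w_tr: float = 1.0) -> float:
    """L_shape = L_f + w_tr * L_tr, averaged over rays as in the equations above."""
    n_rays = len(rays)
    loss_free = 0.0
    loss_trunc = 0.0
    for ray in rays:
        # Free-space samples: predicted SDF is pulled toward the truncation distance.
        free = [abs(f - d_tr) for f, d_hat in ray if d_hat > d_tr]
        # Truncation-region samples: predicted SDF is pulled toward the distance bound.
        trunc = [abs(f - d_hat) for f, d_hat in ray if d_hat <= d_tr]
        if free:
            loss_free += sum(free) / len(free)
        if trunc:
            loss_trunc += sum(trunc) / len(trunc)
    return loss_free / n_rays + w_tr * loss_trunc / n_rays


# Two rays: the first has one free-space and one near-surface sample,
# the second has a single free-space sample whose prediction is 1 mm off.
rays = [
    [(0.005, 0.020), (0.001, 0.001)],
    [(0.004, 0.030)],
]
loss = shape_loss(rays)
```

In a real system the predicted values would come from the neural SDF and the loss would be minimized by gradient descent over θ; the sketch only shows how the two terms are assembled.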

The evaluation covers 70 experiments in total across simulation and the real world, on 14 novel objects. The authors release the evaluation dataset, FeelSight. Data was collected with a proprioception-driven in-hand rotation policy. Ground-truth pose comes directly from Isaac Gym in simulation; in the real world, a pseudo-ground truth is estimated with additional cameras via known-shape pose tracking. The metrics are the symmetric average Euclidean distance (ADD-S) for pose error and the F-score (τ = 5 mm) for shape reconstruction.
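For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them on small point sets. It is an illustration under assumptions, not the paper's evaluation code: nearest neighbors are found by brute force, poses are given as a 3x3 rotation (nested lists) plus a translation, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def transform(R: Sequence[Sequence[float]], t: Sequence[float], p: Point3) -> Point3:
    """Apply the rigid transform (R, t) to point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def nearest_dist(p: Point3, cloud: Sequence[Point3]) -> float:
    """Distance from p to its nearest neighbor in cloud (brute force)."""
    return min(math.dist(p, q) for q in cloud)


def add_s(model: Sequence[Point3], R_gt, t_gt, R_est, t_est) -> float:
    """ADD-S: mean closest-point distance between the model under both poses."""
    gt = [transform(R_gt, t_gt, p) for p in model]
    est = [transform(R_est, t_est, p) for p in model]
    return sum(nearest_dist(p, est) for p in gt) / len(gt)


def f_score(recon: Sequence[Point3], gt: Sequence[Point3], tau: float = 0.005) -> float:
    """Harmonic mean of precision and recall at distance threshold tau."""
    precision = sum(nearest_dist(p, gt) <= tau for p in recon) / len(recon)
    recall = sum(nearest_dist(p, recon) <= tau for p in gt) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
model = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1)]

# Estimated pose is off by 1 mm along x.
pose_error = add_s(model, identity, (0.0, 0.0, 0.0), identity, (0.001, 0.0, 0.0))

# Reconstruction = ground truth shifted by 1 mm, plus one far-away outlier.
recon = [transform(identity, (0.001, 0.0, 0.0), p) for p in model] + [(1.0, 1.0, 1.0)]
score = f_score(recon, model)
```

Because ADD-S matches each point to its closest counterpart rather than to the same index, it does not penalize pose differences that map a symmetric object onto itself.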

The results are as follows.

  • Novel object SLAM: adding touch improves the average F-score by 15.3% in simulation and 14.6% in the real world, and reduces average pose drift by 21.3% in simulation and 26.6% in the real world. Qualitative results show that touch contributes both to shape completion (visually occluded surfaces) and to shape refinement (higher precision on visible surfaces). Touch-only SLAM fails because it lacks global geometric information.
  • Known object pose tracking: given an a priori known shape, adding touch lowers the average pose error to 2.3 mm, a 22.29% error reduction in simulation and 3.9% in the real world. The smaller real-world gain is attributed to the low sensitivity of the DIGIT elastomer and to sparse contact.
  • Performance under occlusion and sensing noise:
    • Occlusion: across 200 simulated camera viewpoints, visuotactile fusion improves pose tracking by up to 94.1% under heavy occlusion. Touch acts as refinement under low occlusion and as robustification under high occlusion.
    • Visual depth noise: with realistic RGB-D noise simulated, the higher the noise level, the more the error distribution drops when touch is fused in.

In conclusion, NeuralFeels achieves robust object-centric SLAM for multi-modal, multi-finger manipulation, with an average F-score of 81% and an average pose drift of 4.7 mm on novel objects (2.3 mm when the shape is known). It demonstrates the value of rich sensing and shows that touch improves visual estimates and resolves ambiguity under visual occlusion and noise. It is simpler than fiducial tracking and easier to interpret than end-to-end approaches. By combining SLAM, neural rendering, and tactile simulation, the work is an important step toward robot dexterity. The limitations discussed are the sim-to-real gap, sparse tactile contact, the need for faster real-time execution, and the lack of a robust initial guess.

2 Detail Review

🧠 NeuralFeels: recreating the sense of the fingertips with neural networks

NeRF meets touch: a new mode of perception for in-hand manipulation

"What the eyes cannot see, the fingertips draw."

2.1 1. 👋 What does this paper cover?

When a robot holds and moves an object in its hand, visual information alone is often not enough. Regions hidden by the fingers and the surfaces in contact, in particular, cannot be observed by vision.

For this in-hand manipulation setting, the paper proposes NeuralFeels, a model that fuses 📷 visual information (RGB-D) and ✋ tactile information (GelSight) to infer in real time:

  • the 3D object shape, and
  • the contact state.

The core idea is simple:

Let touch fill in what vision misses, and represent that information smoothly as a neural field.


2.2 2. 🔧 Background: neural fields and tactile sensors

2.2.1 🔹 What is a neural field?

A neural field is a neural-network-based function representation that predicts a continuous physical quantity over space (for example density, color, or distance). The best-known example is NeRF (Neural Radiance Fields), which takes a point's position and the viewing direction as input and predicts the color and density at that point.

Instead of NeRF, this paper uses a field based on the signed distance function (SDF). The SDF is a scalar that expresses how far a point is from the object's surface:

  • 0 means on the surface,
  • negative means inside,
  • positive means outside.

NeuralFeels learns this SDF to represent the object's shape continuously.
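The sign convention is easiest to see on an analytic shape. The sketch below uses a sphere, whose SDF has a closed form; it is only an illustration of the convention (a learned neural SDF would replace `sphere_sdf` with a network), and the 5 cm radius is an arbitrary choice.

```python
import math
from typing import Tuple

Point3 = Tuple[float, float, float]


def sphere_sdf(p: Point3, center: Point3 = (0.0, 0.0, 0.0), radius: float = 0.05) -> float:
    """Signed distance from p to a sphere: negative inside, zero on the surface, positive outside."""
    return math.dist(p, center) - radius


inside = sphere_sdf((0.0, 0.0, 0.0))       # center of the sphere: negative
on_surface = sphere_sdf((0.05, 0.0, 0.0))  # on the surface: approximately zero
outside = sphere_sdf((0.10, 0.0, 0.0))     # 5 cm beyond the surface: positive
```

The object's surface is then the zero level set of the function, which is what a mesh extraction step would recover from a learned SDF.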

2.2.2 🔹 What is a GelSight sensor?

GelSight is a tactile sensor that captures the fine geometry of an object's surface and the intensity of contact at high resolution. Physically, it is a transparent gel-like material covered with a rubber membrane, with a camera placed beneath it to read the deformed surface visually. DIGIT, the sensor named in the brief review above, is a vision-based touch sensor of this kind.


2.3 3. 🧠 Understanding the structure of NeuralFeels

NeuralFeels consists of two neural fields:

| Component | Role | Input | Output |
|---|---|---|---|
| 🔵 Shape Field | 3D shape estimation (SDF prediction) | RGB-D + tactile depth | SDF value |
| 🔴 Contact Field | Predicts finger-object contact regions | Finger position + SDF | Contact probability |

2.3.1 ✨ Shape Field: touch that draws shape

  • Points observed from the camera viewpoint through RGB-D serve as the basic SDF supervision.
  • Surfaces measured by touch serve as SDF ground truth for occluded regions.
  • The key point is that regions hidden by the fingers can still be reconstructed through touch.
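How a depth pixel turns into SDF supervision can be sketched as follows. The code back-projects a pixel to a surface point with a pinhole model and then places samples on the camera ray, labeling each with its along-ray distance to the surface. That distance is only an upper bound on the true distance to the surface, which is why such labels are usually treated as bounds. All names and the intrinsics are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Pixel (u, v) with depth along the optical axis -> camera-frame 3D point."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def samples_along_ray(surface_pt: Point3,
                      offsets: Sequence[float]) -> List[Tuple[Point3, float]]:
    """Place samples on the ray from the camera origin through surface_pt.

    Each offset is the distance in front of the surface (toward the camera);
    it is returned as the sample's approximate positive SDF label.
    """
    norm = math.sqrt(sum(c * c for c in surface_pt))
    direction = tuple(c / norm for c in surface_pt)
    samples = []
    for off in offsets:
        r = norm - off  # range of the sample from the camera origin
        point = tuple(r * c for c in direction)
        samples.append((point, off))
    return samples


# Center pixel at 0.5 m depth, sampled 1 cm in front of the surface and on it.
surface = backproject(320.0, 240.0, 0.5, 600.0, 600.0, 320.0, 240.0)
labeled = samples_along_ray(surface, [0.01, 0.0])
```

A tactile depth image can be handled the same way once the fingertip sensor's pose is known, with the sensor taking the place of the camera.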

2.3.2 ✨ Contact Field: fingertip pressure as probability

  • Space is sampled around the positions of the finger links.
  • Among locations where the SDF is close to 0, the field is trained to assign a higher contact probability where there is actual tactile evidence of contact.
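One way to read the two bullets above is as a labeling rule followed by a classification loss. The sketch below is an interpretation, not something the review specifies: the near-surface threshold `eps`, the binary cross-entropy loss, and the function names are all assumptions chosen for illustration.

```python
import math


def contact_label(sdf_value: float, tactile_hit: bool, eps: float = 0.002) -> float:
    """Label 1.0 only where the point is near the surface AND touch confirms contact."""
    return 1.0 if (abs(sdf_value) < eps and tactile_hit) else 0.0


def bce(prob: float, label: float, tiny: float = 1e-7) -> float:
    """Binary cross-entropy between a predicted contact probability and a label."""
    p = min(max(prob, tiny), 1.0 - tiny)  # clamp to keep log() finite
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


# Near the surface with tactile evidence: positive label.
y_touch = contact_label(0.001, True)
# Near the surface but no tactile evidence: negative label.
y_no_touch = contact_label(0.001, False)
# Far from the surface: negative label regardless of the tactile signal.
y_far = contact_label(0.010, True)

# A confident correct prediction is penalized less than a confident wrong one.
good = bce(0.9, y_touch)
bad = bce(0.1, y_touch)
```

Under this reading, the output is a probability per sampled point, which is what makes a distribution over contact locations possible rather than a single contact point.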

2.4 4. ⚙️ How was it trained and evaluated?

2.4.1 🧾 Dataset: Visuo-Tactile In-Hand Manipulation Dataset

  • Six everyday objects (cup, bottle, box, and so on)
  • Varied manipulation with a multi-jointed robot hand (rotating, lifting, pressing)
  • RGB-D video + GelSight tactile data + hand and object pose information

2.4.2 🧪 Evaluation criteria

  1. SDF reconstruction accuracy (Chamfer distance)
  2. Contact prediction accuracy (contact classification)
  3. Comparison of reconstruction quality in occluded regions
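The Chamfer distance used for the first criterion can be sketched in pure Python. Several variants exist (squared distances, averaging the two directions instead of summing); the version below sums the two mean nearest-neighbor distances, which is an assumption since the review does not say which variant is used.

```python
import math
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def chamfer_distance(a: Sequence[Point3], b: Sequence[Point3]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbor distance a->b plus b->a."""
    a_to_b = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)]  # same shape, moved 0.5 along z

same = chamfer_distance(reference, reference)
moved = chamfer_distance(reference, shifted)
```

The brute-force search is quadratic in the number of points; for dense reconstructions a k-d tree or similar spatial index would normally be used instead.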

2.5 5. 📊 Summary of experimental results

| Metric | Baseline | NeuralFeels | Improvement |
|---|---|---|---|
| SDF error ↓ | 0.86 mm | 0.54 mm | -37% |
| Contact prediction accuracy ↑ | 75.3% | 91.7% | +16.4 pp |
| Occlusion reconstruction quality | Low | Excellent | ✅ |
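The improvement column can be reproduced from the two numeric columns. The SDF entry is a relative change, while the contact entry matches a difference in percentage points rather than a relative gain:

$$\frac{0.54 - 0.86}{0.86} \approx -0.372 \;(\approx -37\%), \qquad 91.7\% - 75.3\% = 16.4 \text{ percentage points}$$

Expressed as a relative gain instead, the contact accuracy improvement would be $16.4 / 75.3 \approx 21.8\%$.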

2.5.1 🔍 Key insights

  • Vision alone can barely infer the back of the object or the contact surfaces.
  • Once tactile information is added as supervision, the ability to recover hidden surfaces improves dramatically.

2.6 6. 💡 Technical insights

2.6.1 ✔️ Why is this a good idea?

  • Using tactile information as "supervision for perceptual learning" rather than as "simple feedback" is the standout choice.
  • Combining the 3D representational power of NeRF-style fields with the fine contact sensing of touch enables far more realistic perception than before.

2.6.2 ✔️ What stands out

  • The Contact Field goes beyond single contact points and is expressed as a "contact probability distribution".
  • This is very useful for downstream tasks such as grasp refinement, slip detection, and force control.

2.7 7. ⚠️ Limitations and open concerns

2.7.1 🛠️ Hardware dependence

  • GelSight sensors are expensive and complex to install → building a deployable system becomes harder

2.7.2 🧠 Inference is fast, but training is slow

  • Inference can run at 30 Hz or more, but training takes several hours per object

2.7.3 🔄 Integration with the control system is incomplete

  • The perception module is excellent, but a complete policy connected to a real-time manipulation loop has not yet been proposed

2.8 8. 🤔 What questions can we ask?

  1. Would the same approach work with low-cost sensors? For example, could an SDF be learned with widely usable magnetic sensors such as ReSkin or uSkin?

  2. Can it be updated in real time? At present only inference runs in real time, after offline training. With real-time online updates, signals such as slip feedback could be reflected immediately.

  3. How can generalization be ensured? How robust is the method when the object changes or the hand takes a different shape?


2.9 9. 🌱 Ideas for future research

  • Integration with policy-level learning: train a reinforcement-learning manipulation policy conditioned on the SDF + Contact Field
  • Domain adaptation: how can a pre-trained model be used when no tactile sensing is available?
  • Simulation-to-real transfer: large-scale training with a GelSight simulator → deployment in real environments

2.10 10. 📌 Wrapping up

NeuralFeels is an impressive piece of work that unifies two very different senses, vision and touch, in a single neural representation. In particular, abstracting that fusion as a neural field, which makes it continuous and interpretable, could become an important milestone for future research on robotic tactile perception.

As tactile sensors improve, multimodal field-based methods like this one are expected to shine even more. The paper nicely captures the shift from robots that merely "see" to robots that also "feel".


2.11 🧾 References

  • Paper link (arXiv)
  • GelSight technology overview
  • Blog post explaining the NeRF concept
  • Original Paper
  • Project Homepage

Copyright 2024, Jung Yeon Lee