
📃ViserDex

dexterity
in-hand-reorientation
gaussian-splatting
sim2real
rl
pose-estimation
ViserDex: Visual Sim-to-Real for Robust Dexterous In-hand Reorientation
Published

June 24, 2026

  • Paper Link (arXiv:2604.11138)
  • Project Page
  • Video
  • Authors: Arjun Bhardwaj, Maximum Wilder-Smith, Mayank Mittal, Vaishakh Patil, Marco Hutter (ETH Zürich, NVIDIA), RSS 2026
  1. 🚀 ViserDex uses the expressive power of 3D Gaussian Splatting (3DGS) to obtain visual diversity for complex objects inside the simulator, and on that basis proposes a dexterous in-hand manipulation framework that achieves robust sim-to-real transfer with a single monocular RGB camera.

  2. 💡 The team developed physically consistent pre-rasterization augmentations in the Gaussian representation space, enabling accurate object pose estimation even in adversarial conditions with strong lighting changes and heavy occlusion.

  3. 🤖 In experiments on a real 16-DoF Allegro Hand, the system achieved more than 25 consecutive successful reorientations on average across diverse objects while using far less compute than prior approaches, demonstrating broad applicability and efficiency.


🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

This paper proposes ViserDex, a deep-reinforcement-learning sim-to-real framework for robotic dexterous in-hand manipulation (repositioning an object within the hand) that uses only a single monocular RGB camera. To address the difficulty existing methods have with complex objects and challenging lighting, it integrates 3D Gaussian Splatting (3DGS) into the simulation loop, achieving high visual realism while keeping training efficient.


Figure 1: Overview of ViserDex, a visual sim-to-real pipeline that performs in-hand reorientation from monocular RGB alone

Core methodology

1. Visual simulation based on 3D Gaussian Splatting

Instead of conventional mesh-based rendering, ViserDex uses 3D Gaussian Splatting (3DGS) to generate high-quality visual data in real time.

  • Pre-rasterization augmentation: the Spherical Harmonics (SH) coefficients of the Gaussians are manipulated directly, before rendering.
  • Cluster-based perturbation: Gaussians are grouped into clusters by spatial position, by photometric correlation, or as the whole scene, and noise is injected into their color (SH_0) and higher-order, view-dependent reflection (SH_{N}) terms.
  • Formula: the color c(d) seen from viewing direction d is c(d) = \text{Sigmoid}\left(\sum_{\ell=0}^{L} \sum_{m=-\ell}^{\ell} k_{\ell}^{m} Y_{\ell}^{m}(d)\right), where k_{\ell}^{m} are a Gaussian's SH coefficients and Y_{\ell}^{m} the SH basis functions up to degree L.
  • This yields realistic lighting and material variation without ray tracing, which makes visual domain randomization both cheap and effective.
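To make the color formula concrete, here is a minimal sketch that evaluates it up to degree L = 1 for a single Gaussian. It assumes the real-SH constants and sign convention of the reference 3DGS code and applies the sigmoid exactly as written above (the original 3DGS code instead adds 0.5 and clamps at zero). The function name and the coefficient layout are illustrative, not taken from the paper.

```python
import math

# Real spherical-harmonic constants for degrees 0 and 1, with the same values
# and sign convention as the reference 3DGS implementation.
C0 = 0.28209479177387814
C1 = 0.4886025119029199


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sh_color(coeffs, d):
    """c(d) = Sigmoid(sum_l sum_m k_l^m Y_l^m(d)), truncated at degree L = 1.

    coeffs: four RGB triples ordered [k_0^0, k_1^-1, k_1^0, k_1^1]
            (a hypothetical layout chosen for this sketch).
    d:      viewing direction (x, y, z); normalized inside.
    """
    norm = math.sqrt(sum(v * v for v in d))
    x, y, z = (v / norm for v in d)
    basis = [C0, -C1 * y, C1 * z, -C1 * x]  # Y_0^0, Y_1^-1, Y_1^0, Y_1^1
    return tuple(
        sigmoid(sum(b * k[ch] for b, k in zip(basis, coeffs)))
        for ch in range(3)
    )


# Only the degree-0 term is non-zero, so the color is the same from every view.
dc_only = [(1.0 / C0, 0.0, -1.0 / C0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(sh_color(dc_only, (0.0, 0.0, 1.0)), sh_color(dc_only, (1.0, 0.0, 0.0)))
```

Because Y_0^0 is a constant, perturbing only SH_0 shifts a Gaussian's color identically from every viewpoint, while perturbing the higher-order coefficients changes how its appearance varies with viewing direction, which is the color/reflection split the cluster perturbations rely on.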

2. Modular training pipeline

Training is decomposed into three stages, each efficient enough to run on a consumer GPU.

  • Privileged teacher training: in simulation, a PPO (Proximal Policy Optimization) teacher policy is trained with full state information (object velocity, contact forces, etc.). A performance-based curriculum raises the difficulty step by step.
  • Student distillation: the teacher is distilled into a student policy with a recurrent architecture. A belief encoder infers the system state from noisy observations, and the online DAgger algorithm is used to counter the covariate shift encountered at deployment.
  • Visual pose estimator training: a ResNet-34-based pose estimator is trained on 3DGS-rendered data to predict 9 keypoints from an RGB image. The predicted keypoints are converted into a 6D pose with a rigid Procrustes alignment.
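The last step above, turning 9 predicted keypoints into a 6D pose, is a standard rigid alignment problem. The sketch below solves it with Horn's closed-form unit-quaternion method, one common way to solve rigid Procrustes; which solver the paper actually uses is not stated here. It assumes the keypoints are already lifted to 3D camera coordinates and matched one-to-one with model keypoints in the object frame. All function names are illustrative.

```python
import math


def _jacobi_eigen(a, max_sweeps=50):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V) where column k of V is the k-th eigenvector."""
    n = len(a)
    a = [list(row) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-30 * sum(a[i][i] ** 2 for i in range(n)):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def quat_to_matrix(w, x, y, z):
    """Rotation matrix of a unit quaternion (w, x, y, z)."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rigid_procrustes(model_pts, observed_pts):
    """Return (R, t) minimizing sum_i ||R m_i + t - o_i||^2 (Horn's method)."""
    n = len(model_pts)
    cm = [sum(p[i] for p in model_pts) / n for i in range(3)]
    co = [sum(p[i] for p in observed_pts) / n for i in range(3)]
    # Cross-covariance S[a][b] = sum_i (m_i - cm)[a] * (o_i - co)[b]
    S = [[0.0] * 3 for _ in range(3)]
    for m, o in zip(model_pts, observed_pts):
        for a in range(3):
            for b in range(3):
                S[a][b] += (m[a] - cm[a]) * (o[b] - co[b])
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    eigvals, eigvecs = _jacobi_eigen(N)
    best = max(range(4), key=lambda i: eigvals[i])
    q = [eigvecs[r][best] for r in range(4)]  # optimal unit quaternion (w, x, y, z)
    R = quat_to_matrix(*q)
    t = [co[i] - sum(R[i][j] * cm[j] for j in range(3)) for i in range(3)]
    return R, t
```

With noise-free correspondences this recovers the pose exactly; with noisy network outputs it returns the least-squares fit over all 9 points. It is not robust to gross outliers such as a flipped keypoint set, which is one reason the temporal filtering in the student policy matters.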

Technical results

  • Robustness: object reorientation remains stable even under adversarial lighting (low contrast, color distortion, etc.).
  • Efficiency: compared with the Isaac Lab tiled renderer, VRAM usage drops from 34 GB to 12 GB at 1,024 environments, and 3DGS rendering throughput is about 1.6× higher.
  • Performance: on a 16-DoF Allegro Hand, the system achieves more than 25 consecutive successful reorientations on average across 5 different objects. Experiments also show that the 3DGS-based augmentation lowers pose-estimation error compared with a rendering-based domain-randomization baseline (ADD 10.2 mm vs 12.2 mm under nominal lighting, 12.9 mm vs 14.0 mm under adversarial lighting).

In short, ViserDex overcomes the limits of visual perception through an efficient 3DGS-based data-generation strategy, showing that sophisticated robotic dexterity is achievable with a single RGB camera even in complex real-world conditions.


🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

Introduction

In-hand reorientation is a signature challenge of dexterous manipulation. Rolling an object with the fingers alone until it reaches a target orientation requires precise object pose estimation, and that runs into two major obstacles.

  • Fast motion plus severe self-occlusion. With the fingers constantly covering the object, estimating its 6D pose reliably from monocular RGB is difficult.
  • The visual sim-to-real gap. Simulator renderings differ from real camera images in lighting, materials, and reflections, so a pose estimator trained in simulation tends to break down in the real world.

Prior solutions relied on (1) multi-camera rigs, (2) computationally expensive ray-traced rendering, or (3) non-visual modalities such as touch. All three carry a heavy burden in cost, complexity, or scalability.

The authors' question is clear: “Is robust in-hand reorientation that survives extreme lighting possible with just one camera (monocular RGB) and a consumer-grade GPU?”

One-line summary of the paper: integrate 3D Gaussian Splatting into the simulator and apply domain randomization to the SH coefficients before rasterization, so photorealistic training data can be generated efficiently. The result is in-hand reorientation that stays robust under extreme lighting, using only monocular RGB and training on a consumer-grade GPU.

flowchart LR
    subgraph T["1 Teacher RL"]
        PRIV["Privileged obs<br/>(GT pose·velocity·contact force)"]
        PPO["PPO<br/>24,576 parallel envs"]
        PRIV --> PPO
    end
    subgraph S["2 Student distillation"]
        BELIEF["belief encoder-decoder<br/>(LSTM)"]
        DAG["Online DAgger<br/>BC + reconstruction loss"]
        BELIEF --- DAG
    end
    subgraph V["3 Visual pose estimation"]
        GS["3DGS render + SH augmentation<br/>(spatial/color/global clusters)"]
        RES["ResNet-34<br/>9 keypoints, 2.5D"]
        GS --> RES
    end
    PPO --> BELIEF
    RES --> DEPLOY["Real-robot deployment<br/>Allegro + RealSense<br/>monocular RGB"]
    BELIEF --> DEPLOY

Method

The full system consists of three modules: teacher RL → student distillation → visual pose estimation. Policy learning (state-based) is separated from perception (vision-based), so each can be optimized independently.


Figure 2: The three-stage structure: (1) RL teacher trained on privileged state → (2) student distilled from noisy observations → (3) RGB pose estimator trained on 3DGS data

Stage 1: Teacher RL training

A teacher policy with full access to privileged observations is trained with PPO.

  • Action space: position targets for the 16 joints of the Allegro hand.
  • Reward: a dense reward based on the inverse of the orientation error plus a success bonus, with regularization penalties on action smoothness, joint velocity, and energy consumption.
  • Observations: proprioceptive (joint positions, action history, goal), exteroceptive (object pose, difference to the goal), and privileged (velocities, forces, randomized physical properties).
  • Architecture: the proprioceptive, exteroceptive, and privileged inputs are each encoded by an MLP and then fed into a [1024, 1024, 1024, 512] backbone. γ = 0.998, λ = 0.95, 24 rollout steps per environment, 24,576 parallel environments.
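Two pieces of Stage 1 are easy to state in code: the orientation-error reward and the advantage estimate PPO computes from each 24-step rollout. The sketch below assumes an inverse-distance reward of the form 1/(|Δθ| + ε) plus a success bonus, a common form for in-hand reorientation tasks; ε, the success tolerance, and the bonus are hypothetical placeholders, and the regularization penalties are omitted. The GAE part uses γ = 0.998 and λ = 0.95 from the text.

```python
import math


def quat_mul(a, b):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotation_distance(q_obj, q_goal):
    """Angle (rad) of the relative rotation between two unit quaternions."""
    w, x, y, z = q_goal
    d = quat_mul(q_obj, (w, -x, -y, -z))  # q_obj * conj(q_goal)
    return 2.0 * math.asin(min(math.sqrt(d[1] ** 2 + d[2] ** 2 + d[3] ** 2), 1.0))


def orientation_reward(q_obj, q_goal, eps=0.1, success_tol=0.4, bonus=250.0):
    """Dense inverse-error term plus a sparse success bonus (hypothetical constants)."""
    dist = rotation_distance(q_obj, q_goal)
    return 1.0 / (dist + eps) + (bonus if dist < success_tol else 0.0)


def gae(rewards, values, last_value, dones, gamma=0.998, lam=0.95):
    """Generalized Advantage Estimation over one rollout (e.g. 24 steps per env).
    dones[t] is True if the episode ended after step t (no bootstrapping)."""
    advantages = [0.0] * len(rewards)
    running, next_value = 0.0, last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]  # value-function targets
    return advantages, returns
```

With γ = 0.998 the effective horizon (about 1/(1 − γ) = 500 steps) is much longer than the 24-step rollout, so the value bootstrap at the rollout boundary carries most of the long-term credit.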

Stage 2: Student distillation

Privileged information is unavailable in the real world, so a student policy that runs on noisy observations alone is distilled from the teacher.

  • A recurrent belief encoder-decoder (hidden sizes [256, 256], 2-layer LSTM) estimates the latent state from noisy observations.
  • It is trained with online DAgger on the composite loss L = L_{BC} + 0.2 \cdot L_{recon} (behavior cloning plus state reconstruction).
  • This belief structure lets the student filter out transient pose-estimation failures (e.g., 180° flips) over time.
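The composite objective and the online-DAgger data flow can be sketched as follows. This is a framework-free illustration under stated assumptions: both loss terms are taken as mean-squared error, the student is a callable returning an action together with its decoded estimate of the privileged state (its LSTM hidden state lives inside that callable), and `env_reset`/`env_step` are hypothetical stand-ins for the simulator. Gradients and the optimizer are omitted.

```python
from typing import Callable, List, Sequence, Tuple

Vec = Sequence[float]


def mse(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def student_loss(a_student: Vec, a_teacher: Vec, state_hat: Vec, state_priv: Vec,
                 recon_weight: float = 0.2) -> float:
    """Composite loss L = L_BC + 0.2 * L_recon (both terms as MSE here)."""
    return mse(a_student, a_teacher) + recon_weight * mse(state_hat, state_priv)


def dagger_rollout(env_reset: Callable, env_step: Callable, student: Callable,
                   teacher: Callable, horizon: int) -> List[Tuple]:
    """Online DAgger: the student's own actions drive the environment, and the
    privileged teacher labels every state the student visits."""
    samples = []
    obs, priv = env_reset()
    for _ in range(horizon):
        a_student, state_hat = student(obs)   # noisy obs -> action + decoded belief
        a_teacher = teacher(priv)             # teacher sees the privileged state
        samples.append((a_student, a_teacher, state_hat, priv))
        obs, priv = env_step(a_student)       # execute the STUDENT's action
    return samples


def batch_loss(samples) -> float:
    return sum(student_loss(*s) for s in samples) / len(samples)
```

Executing the student's action rather than the teacher's is the essential design choice: the labels are collected on the state distribution the student will actually face, which is how DAgger counters covariate shift.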

Stage 3: Visual pose estimation with 3DGS domain randomization (the core contribution)

The main contribution is integrating 3D Gaussian Splatting into the simulator as its renderer and performing domain randomization in the Gaussian representation space.

Simulation integration. The object moves while the camera stays fixed; this is handled by applying an inverse transform to the Gaussians so that the “static scene” assumption of 3DGS still holds. Occlusion by the hand is reproduced with physics-based depth masking (comparing the hand's depth with the Gaussians' depth).
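A minimal sketch of the depth test described above. It assumes the hand is rendered in a separate pass that produces its own color and depth buffers, that the 3DGS pass also outputs per-pixel depth, and that empty pixels carry infinite depth; images are nested lists to keep the example dependency-free. The function name is hypothetical.

```python
import math
from typing import List, Tuple

RGB = Tuple[float, float, float]
Image = List[List[RGB]]
Depth = List[List[float]]

INF = math.inf  # depth value for pixels where nothing was rendered


def composite_hand_occlusion(gs_rgb: Image, gs_depth: Depth,
                             hand_rgb: Image, hand_depth: Depth):
    """Per-pixel z-test: where the hand is closer to the camera than the splatted
    object, the hand pixel wins. Returns the composite and a hand-in-front mask."""
    out, hand_mask = [], []
    for r in range(len(gs_rgb)):
        out_row, mask_row = [], []
        for c in range(len(gs_rgb[r])):
            hand_in_front = hand_depth[r][c] < gs_depth[r][c]
            out_row.append(hand_rgb[r][c] if hand_in_front else gs_rgb[r][c])
            mask_row.append(hand_in_front)
        out.append(out_row)
        hand_mask.append(mask_row)
    return out, hand_mask
```

Because the test uses real depth from both renders, individual fingers occlude the object in a geometrically consistent way rather than as random cut-outs.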

Pre-rasterization augmentation. Perturbations are applied directly to the SH coefficients before rendering; no ray tracing is required, so this is very fast.

  1. Random Noise: independent Gaussian perturbation of each splat (unstructured noise).
  2. Spatial Cluster: one perturbation per cluster, over 64 k-means clusters on position → mimics local shadows and wear.
  3. Color Cluster: one perturbation per cluster, over 32 clusters on the SH₀ coefficients → varies reflectance per material.
  4. Global Shift: a uniform perturbation of the whole scene → changes ambient brightness and color temperature.

The key is that each cluster is perturbed as a single unit, which preserves photometric consistency. Unlike random per-pixel noise, this produces physically plausible changes in appearance.
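The four augmentations, and the idea of perturbing each cluster as one unit, can be sketched in a few lines. This is a simplified illustration rather than the authors' code: it assumes cluster labels from plain k-means (64 spatial clusters on Gaussian centers, 32 color clusters on SH_0, as stated above), perturbs only the degree-0 (SH_0) coefficients, and uses hypothetical noise scales.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
        for j in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def augment_sh_dc(positions, sh_dc, n_spatial=64, n_color=32, sigma_noise=0.05,
                  sigma_spatial=0.2, sigma_color=0.2, sigma_global=0.3, seed=0):
    """Return perturbed SH_0 (DC) coefficients, one RGB triple per Gaussian.

    Labels would normally be computed once per asset; they are recomputed here
    (deterministically) to keep the sketch self-contained."""
    rng = random.Random(seed)
    spatial = kmeans(positions, n_spatial, seed=seed)  # clusters on Gaussian centers
    color = kmeans(sh_dc, n_color, seed=seed + 1)      # clusters on SH_0 coefficients

    def gauss3(sigma):
        return [rng.gauss(0.0, sigma) for _ in range(3)]

    spatial_off = [gauss3(sigma_spatial) for _ in range(n_spatial)]
    color_off = [gauss3(sigma_color) for _ in range(n_color)]
    global_off = gauss3(sigma_global)
    out = []
    for i, dc in enumerate(sh_dc):
        out.append(tuple(
            dc[ch]
            + rng.gauss(0.0, sigma_noise)      # 1. Random Noise: independent per Gaussian
            + spatial_off[spatial[i]][ch]      # 2. Spatial Cluster: shared within a cluster
            + color_off[color[i]][ch]          # 3. Color Cluster: shared per "material"
            + global_off[ch]                   # 4. Global Shift: shared by the whole scene
            for ch in range(3)
        ))
    return out
```

Every Gaussian in a cluster receives the same offset, so a shadowed patch or a recolored material changes coherently instead of turning into speckle; only the Random Noise term is independent per Gaussian. A production version would vectorize this over hundreds of thousands of Gaussians on the GPU.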


Figure 3: Pre-rasterization augmentation: perturbing the SH coefficients of clustered Gaussians produces color, reflectance, and spatial appearance changes without ray tracing

Pose estimator. A ResNet-34 backbone regresses 9 keypoints (8 object-specific points plus the centroid) as 2.5D coordinates.
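One common reading of “2.5D” is pixel coordinates (u, v) plus a depth per keypoint. Under that assumption, and with a standard pinhole camera whose intrinsics (fx, fy, cx, cy) the text does not give, lifting the predicted keypoints to 3D is a direct back-projection. The cube layout (8 corners plus centroid) is likewise an illustrative choice of the 8 object-specific points.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def cube_keypoints(side: float) -> List[Point3]:
    """8 corners + centroid of an axis-aligned cube in its own frame (illustrative)."""
    h = side / 2.0
    corners = [(sx * h, sy * h, sz * h) for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
    return corners + [(0.0, 0.0, 0.0)]


def project(p: Point3, fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Camera-frame 3D point -> 2.5D keypoint (u, v, depth)."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy, z)


def back_project(kp: Point3, fx: float, fy: float, cx: float, cy: float) -> Point3:
    """2.5D keypoint (u, v, depth) -> camera-frame 3D point (pinhole model)."""
    u, v, z = kp
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)
```

Feeding the lifted points, together with the matching model keypoints, into a rigid Procrustes solver yields the object's 6D pose in the camera frame.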

Performance-based curriculum RL

The expensive ADR (Automatic Domain Randomization) is replaced with a lightweight curriculum.

  • Gradually increasing regularization penalties: early training focuses on task success, later training on smoothness.
  • Gradually adding action latency: prepares the policy for real-world asynchrony.
  • Gradually shrinking the success time window: demands ever faster reorientation.

All three are tied to a moving average of consecutive successes, so they scale automatically without per-object hyperparameter tuning.
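A minimal sketch of such a performance-driven schedule. It assumes an exponential moving average (EMA) of consecutive successes mapped linearly onto the three knobs; the target value, smoothing factor, and every range (penalty scale, latency in control steps, success window in seconds) are hypothetical placeholders, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class PerformanceCurriculum:
    target_successes: float = 20.0          # EMA value at which the curriculum is complete
    alpha: float = 0.01                     # EMA smoothing factor
    penalty_scale: tuple = (0.1, 1.0)       # regularization weight: start -> end
    latency_steps: tuple = (0, 3)           # injected action delay: start -> end
    success_window_s: tuple = (10.0, 2.0)   # time allowed per goal: shrinks over training
    ema: float = 0.0

    def update(self, consecutive_successes: float) -> dict:
        """Fold in one episode's consecutive-success count and return new settings."""
        self.ema += self.alpha * (consecutive_successes - self.ema)
        return self.params()

    def progress(self) -> float:
        return max(0.0, min(self.ema / self.target_successes, 1.0))

    def params(self) -> dict:
        p = self.progress()

        def lerp(lo_hi):
            return lo_hi[0] + p * (lo_hi[1] - lo_hi[0])

        return {
            "penalty_scale": lerp(self.penalty_scale),
            "latency_steps": round(lerp(self.latency_steps)),
            "success_window_s": lerp(self.success_window_s),
        }
```

Because progress is driven by measured performance and clamped to [0, 1], the schedule only tightens as fast as the policy actually improves, which is what removes the need to hand-tune, per object, when each knob should move.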


Figure 5: Curriculum learning efficiency: the full curriculum converges fastest and reaches the most consecutive successes

System setup

  • Hardware: 16-DoF Allegro hand plus a wrist-mounted Intel RealSense D435i.
  • Control: inference at 30 Hz, joint control at 300 Hz.
  • Rendering efficiency: 1.6× faster than the Isaac Lab tiled renderer; 12 GB of VRAM (vs 34 GB) at 1,024 environments; augmentation overhead below 22 ms per frame (~4%).

Experiments


Figure 4: Experimental setup (RGB camera, Allegro hand, multicolor light sources) and the 5 test objects under nominal and adversarial lighting

Pose estimation (Table II)

Metrics are ADD (mm) and accuracy (< 10 mm, < 10°).

| Lighting | Method | ADD (mm) | Accuracy |
|---|---|---|---|
| Nominal | ViserDex (Ours) | 10.2 ± 0.66 | 65.4% |
| Nominal | DR Tiled | 12.2 ± 0.67 | 55.6% |
| Nominal | Naive GS (no augmentation) | 14.4 ± 0.93 | 38.4% |
| Adversarial | ViserDex (Ours) | 12.9 ± 0.69 | 56.3% |
| Adversarial | DR Tiled | 14.0 ± 0.96 | 47.2% |
| Adversarial | Naive GS | 18.6 ± 1.17 | 36.5% |

Under adversarial lighting (low illumination, dynamic color changes), ViserDex improves accuracy over DR Tiled by 9.1 percentage points (56.3% vs 47.2%). Naive GS without augmentation degrades sharply, which shows that the key ingredient is SH-space domain randomization rather than 3DGS itself.
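For reference, the two Table II metrics can be computed as below. The sketch assumes ADD is the mean distance between model points transformed by the ground-truth and by the estimated pose, and that a frame counts toward accuracy only when the translation error is under 10 mm and the rotation error under 10°; the exact thresholding rule is not spelled out here, so treat that part as an assumption.

```python
import math
from typing import Sequence, Tuple

Mat3 = Sequence[Sequence[float]]
Vec3 = Sequence[float]


def transform(R: Mat3, t: Vec3, p: Vec3) -> Tuple[float, float, float]:
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_metric(model_pts, R_gt, t_gt, R_est, t_est) -> float:
    """ADD: mean distance between model points under the GT and estimated poses."""
    return sum(math.dist(transform(R_gt, t_gt, p), transform(R_est, t_est, p))
               for p in model_pts) / len(model_pts)


def rotation_error_deg(R_gt: Mat3, R_est: Mat3) -> float:
    """Angle of R_est @ R_gt^T, using trace = 1 + 2 cos(angle)."""
    tr = sum(R_est[i][k] * R_gt[i][k] for i in range(3) for k in range(3))
    return math.degrees(math.acos(max(-1.0, min(1.0, (tr - 1.0) / 2.0))))


def pose_accuracy(frames, max_trans_mm=10.0, max_rot_deg=10.0) -> float:
    """Fraction of frames with translation error < 10 mm AND rotation error < 10 deg.
    Each frame is (R_gt, t_gt_mm, R_est, t_est_mm)."""
    ok = sum(
        1 for R_gt, t_gt, R_est, t_est in frames
        if math.dist(t_gt, t_est) < max_trans_mm and rotation_error_deg(R_gt, R_est) < max_rot_deg
    )
    return ok / len(frames)
```

ADD averages errors over the whole object, so it rewards estimates that are close on average, whereas the thresholded accuracy counts how often a frame is good enough to act on; the two can rank methods differently.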

Augmentation Ablation (Table III)

| Removed component | Nominal accuracy | Adversarial accuracy |
|---|---|---|
| None (full model) | 65.4% | - |
| Global Shift | 51.2% | 23.6% (collapse) |
| Random Noise | 58.6% | - |
| Spatial Cluster | - | 42.5% |
| Color Cluster | - | 44.7% |

Removing Global Shift in particular makes adversarial-lighting accuracy collapse to 23.6%, showing that modeling global brightness and color-temperature changes is central to robustness under extreme lighting. The results also suggest that correlated (cluster-level) perturbations matter more than unstructured noise.

Real-robot deployment (Table IV)

The metric is the average number of consecutive successes.

| Object | Nominal lighting |
|---|---|
| Cube | 35.4 ± 13.8 (DeXtreme: 27.8 ± 19.0) |
| 3D Printed Toy | 28.2 ± 12.6 |
| Rubber Duck | 24.2 ± 15.3 |
| Tablet Bottle | 12.6 ± 8.8 (degraded by unmodeled low friction) |
| Globe | 87.6 ± 41.4 |
| Average | 37.6 ± 21.8 |

Under adversarial lighting the system still averages 25.4 ± 30.1 consecutive successes, which the authors present as the first demonstration of sustained dexterous manipulation under extreme visual perturbation.

Training efficiency

  • Teacher training: 26 hours for the Cube (single RTX 4090); 90 hours for complex objects (two GPUs).
  • Student distillation: 16 hours (single RTX 4090, 4,096 environments).
  • Roughly an order of magnitude more efficient than DeXtreme (8× A40, 60 hours).
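Converting the training budgets above into GPU-hours makes the comparison explicit. This is back-of-the-envelope arithmetic that assumes the quoted hours are wall-clock time with every listed GPU busy and that student distillation also takes 16 h for the complex objects; it ignores the per-GPU speed difference between an A40 and an RTX 4090:

```latex
\begin{aligned}
\text{DeXtreme:} &\quad 8 \times 60\,\text{h} = 480\ \text{GPU-h} \\
\text{ViserDex, Cube:} &\quad 26\,\text{h} + 16\,\text{h} = 42\ \text{GPU-h}, \qquad 480 / 42 \approx 11.4 \\
\text{ViserDex, complex objects:} &\quad 2 \times 90\,\text{h} + 16\,\text{h} = 196\ \text{GPU-h}, \qquad 480 / 196 \approx 2.4
\end{aligned}
```

Under these assumptions the order-of-magnitude gap holds for the Cube, the only object with a DeXtreme number in Table IV; for the complex objects the saving is closer to 2.4×.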

Robustness of the belief decoder (Figure 7)

During intervals with artificially injected noise, the belief decoder kept its error low, outperforming the corrupted input, and filtered out catastrophic tracking failures such as 180° flips. This shows that temporal belief estimation acts as a safeguard against transient perception failures.
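The learned belief decoder is an LSTM and cannot be reduced to a few lines, but the intuition for why temporal context rejects a 180° flip can be shown with a hand-written stand-in. The sketch below is NOT the paper's method: it is a hypothetical jump gate that keeps the last accepted orientation whenever a new estimate implies a rotation larger than is plausible between two 30 Hz frames (the 60° threshold is an arbitrary assumption).

```python
import math


def quat_angle(a, b):
    """Rotation angle (rad) between unit quaternions (w, x, y, z); sign-invariant."""
    dot = abs(sum(x * y for x, y in zip(a, b)))
    return 2.0 * math.acos(min(dot, 1.0))


def gate_orientations(estimates, max_jump_rad=math.radians(60.0)):
    """Hold the previous accepted orientation when a frame-to-frame jump is implausible."""
    accepted = [estimates[0]]
    for q in estimates[1:]:
        plausible = quat_angle(accepted[-1], q) <= max_jump_rad
        accepted.append(q if plausible else accepted[-1])
    return accepted
```

Unlike a learned belief, this gate can lock onto a stale orientation if a flip persists for many frames, and it has no model of how the object actually moves; that gap is what a recurrent belief trained against privileged state is meant to fill.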


Figure 6: Real-robot rollouts and the belief decoder filtering out a 180° tracking failure

๋น„ํŒ์  ๊ณ ์ฐฐ

๊ฐ•์ 

  • ์‹œ๊ฐ sim-to-real์„ ์ •๋ฉด ๊ณต๋žต. ๋Šฅ์ˆ™ ์กฐ์ž‘์˜ ํ•ต์‹ฌ ๋ณ‘๋ชฉ์ธ ๋‹จ์•ˆ ์‹œ๊ฐ ํฌ์ฆˆ ์ถ”์ •์„, 3DGS ํ‘œํ˜„ ๊ณต๊ฐ„ ๋„๋ฉ”์ธ ๋žœ๋คํ™”๋ผ๋Š” ์ƒˆ ๊ฐ๋„๋กœ ํ’€์—ˆ์Šต๋‹ˆ๋‹ค. ablation์—์„œ Naive GS๊ฐ€ ๋ฌด๋„ˆ์ง€๋Š” ๊ฒƒ์„ ๋ณด์—ฌ, ๊ธฐ์—ฌ์˜ ์›์ฒœ์ด โ€œ3DGS ์‚ฌ์šฉโ€์ด ์•„๋‹ˆ๋ผ โ€œSH ์‚ฌ์ „ ๋ž˜์Šคํ„ฐํ™” augmentationโ€์ž„์„ ๋ช…ํ™•ํžˆ ๋ถ„๋ฆฌํ•œ ์ ์ด ์„ค๋“๋ ฅ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ์ ‘๊ทผ์„ฑ/ํšจ์œจ. ์นด๋ฉ”๋ผ ํ•œ ๋Œ€ + ์†Œ๋น„์ž๊ธ‰ GPU๋กœ ํ•™์Šตยท๋ฐฐํฌ๊ฐ€ ๊ฐ€๋Šฅํ•ด, 8ร— A40 ๊ฐ™์€ ๋Œ€๊ทœ๋ชจ ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ์š”๊ตฌํ•˜๋˜ ์„ ํ–‰ ์—ฐ๊ตฌ์˜ ์ง„์ž… ์žฅ๋ฒฝ์„ ํฌ๊ฒŒ ๋‚ฎ์ท„์Šต๋‹ˆ๋‹ค. ๋ Œ๋”๋ง 1.6๋ฐฐ ๊ฐ€์†, VRAM 1/3 ์ ˆ๊ฐ๋„ ์‹ค์šฉ์ ์ž…๋‹ˆ๋‹ค.
  • ๊ทนํ•œ ์กฐ๋ช… ๊ฐ•๊ฑด์„ฑ์˜ ์‹ค์ฆ. ์ ๋Œ€์  ์กฐ๋ช…(์ €์กฐ๋„ยท๋™์  ์ƒ‰)์—์„œ ํ‰๊ท  25.4ํšŒ ์—ฐ์† ์„ฑ๊ณต์€, ์‹œ๊ฐ ๊ธฐ๋ฐ˜ ์†์•ˆ ์žฌ๋ฐฐํ–ฅ์—์„œ ๋ณด๊ธฐ ๋“œ๋ฌธ ๊ฐ•๊ฑด์„ฑ ์ˆ˜์ค€์ž…๋‹ˆ๋‹ค.
  • ๊ฒฝ๋Ÿ‰ ์ปค๋ฆฌํ˜๋Ÿผ. ADR์„ ์—ฐ์† ์„ฑ๊ณต ๊ธฐ๋ฐ˜ ์ž๋™ ์Šค์ผ€์ผ ์ปค๋ฆฌํ˜๋Ÿผ์œผ๋กœ ๋Œ€์ฒดํ•ด, ๋ฌผ์ฒด๋ณ„ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹ ๋ถ€๋‹ด์„ ์—†์•ค ์ ์ด ๊น”๋”ํ•ฉ๋‹ˆ๋‹ค.

์•ฝ์ ๊ณผ ํ•œ๊ณ„

  • ๋ฌผ๋ฆฌ ๋ชจ๋ธ๋ง ์˜์กด. Tablet Bottle์ด 12.6ํšŒ๋กœ ์ €ํ•˜๋œ ์›์ธ์ด โ€œ๋ฏธ๋ชจ๋ธ๋ง ์ €๋งˆ์ฐฐโ€์ด๋ผ๋Š” ์ ์€, ์‹œ๊ฐ์€ ๊ฐ•๊ฑดํ•ด์กŒ์œผ๋‚˜ ๋™์—ญํ•™ ์ •ํ™•๋„๊ฐ€ ์—ฌ์ „ํžˆ ์„ฑ๋Šฅ ์ƒํ•œ์„ ์ขŒ์šฐ ํ•จ์„ ์‹œ์‚ฌํ•ฉ๋‹ˆ๋‹ค(์ถ”์ธก).
  • ๋ฌผ์ฒด๋ณ„ ํ‚คํฌ์ธํŠธ. ํฌ์ฆˆ ์ถ”์ •๊ธฐ๊ฐ€ ๋ฌผ์ฒด๋ณ„ 8๊ฐœ ํ‚คํฌ์ธํŠธ๋ฅผ ์“ฐ๋ฏ€๋กœ, ์™„์ „ํžˆ ์ƒˆ๋กœ์šด ๋ฌผ์ฒด๋กœ์˜ ์ฆ‰์‹œ ์ผ๋ฐ˜ํ™”(category-level/novel object)๋Š” ์ถ”๊ฐ€ ๊ฒ€์ฆ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค(์ถ”์ธก).
  • ์žฌ๊ตฌ์„ฑ ์ „์ฒ˜๋ฆฌ ๋น„์šฉ. ๊ฐ ๋ฌผ์ฒด์˜ 3DGS ์ž์‚ฐ์„ ์‚ฌ์ „์— ์žฌ๊ตฌ์„ฑํ•ด์•ผ ํ•˜๋ฏ€๋กœ, ๋Œ€๊ทœ๋ชจ ๋ฌผ์ฒด๊ตฐ์œผ๋กœ ํ™•์žฅ ์‹œ ์ž์‚ฐ ์ค€๋น„ ํŒŒ์ดํ”„๋ผ์ธ์˜ ๋น„์šฉ์ด ๋ณ€์ˆ˜์ž…๋‹ˆ๋‹ค.
  • ๊ฐ•์ฒด ๊ฐ€์ •. ๋ณ€ํ˜•์ฒดยท๊ด€์ ˆ ๋ฌผ์ฒด๋กœ์˜ ํ™•์žฅ์€ ๋ณธ ํ‹€์—์„œ ์ง์ ‘ ๋‹ค๋ค„์ง€์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

Summary and conclusion

ViserDex closes the visual sim-to-real gap for monocular-RGB in-hand reorientation through domain randomization in the 3D Gaussian Splatting representation space. Spatial, color, and global cluster augmentations applied to the SH coefficients before rasterization produce photorealistic training data efficiently and without ray tracing, while teacher-student distillation and performance-based curriculum RL yield a robust policy.

In numbers: pose estimation reaches 65.4% / 56.3% accuracy under nominal / adversarial lighting (ahead of DR Tiled); on the real robot the system averages 37.6 consecutive successes under nominal lighting and 25.4 under adversarial lighting; and training fits on a consumer RTX 4090, roughly an order of magnitude more efficient than DeXtreme.

Its practical value lies in demonstrating “dexterous manipulation that withstands extreme lighting, with just one camera and a consumer GPU.” Limitations remain in its dependence on physics modeling and per-object asset preparation, but the idea of domain randomization in the 3DGS representation space has strong potential to become a new reference point for sim-to-real transfer in vision-based robotic manipulation.

Copyright 2026, JungYeon Lee