Curieux.JY
  • JungYeon Lee

On this page

  • 🔍 Ping Review
  • 🔔 Ring Review
    • Introduction: Why Can't Robots Use Their Hands Yet?
    • Method I: DIGIT Sensor Design
      • How Vision-Based Tactile Sensors Work
      • Mechanical Design: A Camera That Fits in a Fingertip
      • Electronics Design: A Camera System in 7 cm²
      • Elastomer Design: A Leap in Durability
    • Method II: Learning Tactile In-Hand Manipulation
      • System Pipeline Overview
      • Self-Supervised Data Collection
      • Keypoint Autoencoder: Compressing Images to 14 Dimensions
      • Dynamics Model: Struct-NN
      • Model-Based Control: MPC + CEM
    • Experiments: Results and Interpretation
      • Video Prediction Model Performance
      • Marble Manipulation Experiments
    • Overall System Flowchart
    • Critical Discussion: Strengths and Limitations
      • Strengths
      • Weaknesses and Limitations
    • Comparison with Related Work
    • Implications for Allegro Hand Researchers
    • Summary and Conclusion
    • References (Selected)

📃 XL-VLA Review

cross-embodiment
vla
dexterity
latent
Cross-Hand Latent Representation for Vision-Language-Action Models
Published

March 13, 2026

  1. 💡 XL-VLA is a Vision-Language-Action (VLA) framework for scalable cross-embodiment dexterous manipulation. It relies on a unified latent action space shared across diverse dexterous hands.
  2. 🛠️ This embodiment-invariant latent space is pretrained with an unsupervised autoencoder. Reconstruction, retargeting, and latent-regularization losses bridge the kinematic differences between hands.
  3. 📈 In real-robot experiments, XL-VLA outperforms existing VLA models. It also generalizes zero-shot to new hand-task combinations, which makes efficient reuse of existing data possible.

๐Ÿ” Ping Review

๐Ÿ” Ping โ€” A light tap on the surface. Get the gist in seconds.

XL-VLA ๋…ผ๋ฌธ์€ Vision-Language-Action (VLA) ๋ชจ๋ธ์„ ์œ„ํ•œ Cross-Hand Latent Representation์„ ์ œ์•ˆํ•˜์—ฌ, ๋‹ค์–‘ํ•œ ํ˜•ํƒœ์˜ Dexterous Hand์— ๊ฑธ์ณ ํ™•์žฅ ๊ฐ€๋Šฅํ•œ ๋กœ๋ด‡ ์กฐ์ž‘(Manipulation)์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. ๊ธฐ์กด VLA ๋ชจ๋ธ์€ ๋กœ๋ด‡์˜ Morphology์— ๋”ฐ๋ผ ๋‹ฌ๋ผ์ง€๋Š” ํ–‰๋™ ๊ณต๊ฐ„(Action Space) ๋•Œ๋ฌธ์— ์ƒˆ๋กœ์šด ๋กœ๋ด‡์ด ๋“ฑ์žฅํ•  ๋•Œ๋งˆ๋‹ค ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ง‘ํ•˜๊ณ  ์žฌํ•™์Šตํ•ด์•ผ ํ•˜๋Š” ๋น„ํšจ์œจ์„ฑ์„ ๊ฐ€์ง‘๋‹ˆ๋‹ค. ํŠนํžˆ Dexterous Hand์˜ ๊ฒฝ์šฐ, ๊ด€์ ˆ ์œ„์น˜(Joint Position) ํŒŒ๋ผ๋ฏธํ„ฐํ™”๊ฐ€ embodiment๋งˆ๋‹ค ํฌ๊ฒŒ ๋‹ฌ๋ผ์ง€๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ณธ ์—ฐ๊ตฌ๋Š” ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด, ๋‹ค์–‘ํ•œ Dexterous Hand์— ๊ฑธ์ณ ๊ณต์œ ๋˜๋Š” ํ†ตํ•ฉ๋œ Latent Action Space๋ฅผ ๋„์ž…ํ•ฉ๋‹ˆ๋‹ค.

ํ•ต์‹ฌ ๋ฐฉ๋ฒ•๋ก  (Core Methodology)

XL-VLA์˜ ํ•ต์‹ฌ์€ ๋‹ค์Œ ๋‘ ๊ฐ€์ง€ ์ฃผ์š” ๊ตฌ์„ฑ ์š”์†Œ๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค: (1) ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ์ž…๋ ฅ(Vision V, Language T)์„ ์ธ์ฝ”๋”ฉํ•˜๋Š” VLA Backbone, (2) Cross-Embodiment Transfer๋ฅผ ์œ„ํ•ด ๋ฏธ๋ฆฌ ํ•™์Šต๋œ(pretrained) Latent Encoder ๋ฐ Decoder ์„ธํŠธ.

  1. ๋ฌธ์ œ ์ •์˜ (Problem Formulation): ๊ฐ Dexterous Hand h \in H๋Š” d_h๊ฐœ์˜ actuated joints๋ฅผ ๊ฐ€์ง€๋ฉฐ, ์ ˆ๋Œ€ ๊ด€์ ˆ ํšŒ์ „(Absolute Joint Rotations) q^{(h)} \in \mathbb{R}^{d_h}๋ฅผ ์ œ์–ดํ•ฉ๋‹ˆ๋‹ค. ์ •์ฑ…์€ Action Chunk ๋‹จ์œ„๋กœ ์ž‘๋™ํ•˜๋ฉฐ, ๊ฐ Action q^{(h)}_t \in \mathbb{R}^{64 \times d_h}๋Š” 20Hz๋กœ ์ƒ˜ํ”Œ๋ง๋œ 64๊ฐœ์˜ ๊ด€์ ˆ ์œ„์น˜ ๋ช…๋ น์–ด ์‹œํ€€์Šค(3.2์ดˆ์˜ ๋™์ž‘)์ž…๋‹ˆ๋‹ค. ์ •์ฑ…์€ ํ˜„์žฌ ๋‹จ๊ณ„ t์—์„œ ์ด์ „ ๊ด€์ ˆ ์ƒํƒœ, ์ด์ „์— ์‹คํ–‰๋œ Action Chunk q^{(h)}_t, ํ˜„์žฌ ์ด๋ฏธ์ง€ V, ์–ธ์–ด ์ง€์‹œ T๋ฅผ ์ž…๋ ฅ๋ฐ›์•„ ๋‹ค์Œ Chunk q^{(h)}_{t+1}๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค: q^{(h)}_{t+1} = F(q^{(h)}_t, V, T) ์—ฌ๊ธฐ์„œ F๋Š” Hand-Agnostic ๋ชจ๋ธ์ด๋ฉฐ, Hand ID h๋Š” ์ ์ ˆํ•œ Encoder/Decoder๋ฅผ ์„ ํƒํ•˜๋Š” ๋ฐ๋งŒ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.

  2. XL-VLA ํŒŒ์ดํ”„๋ผ์ธ: XL-VLA๋Š” \pi_0 [6]์˜ ์•„ํ‚คํ…์ฒ˜๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค. ๊ธฐ์กด \pi_0๊ฐ€ proprioceptive history๋ฅผ state token ์Šคํƒ์œผ๋กœ ์ œ๊ณตํ–ˆ๋˜ ๊ฒƒ๊ณผ ๋‹ฌ๋ฆฌ, XL-VLA์—์„œ๋Š” latent action token์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ๊ฐ Hand h์— ๋Œ€ํ•ด, Hand-specific Encoder E_h๋Š” ์ด์ „ ์ ˆ๋Œ€ ๊ด€์ ˆ ์œ„์น˜ Action Chunk q^{(h)}_t๋ฅผ ์••์ถ•๋œ Latent Vector z_t = E_h(q^{(h)}_t)๋กœ ๋งคํ•‘ํ•ฉ๋‹ˆ๋‹ค. VLA ๋ชจ๋ธ์€ ์ด๋Ÿฌํ•œ Latent Token๋“ค์˜ ์งง์€ History์™€ Vision ๋ฐ Language Token์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋‹ค์Œ Latent Chunk \hat{z}_{t+1}์„ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค. ์ด Latent Vector๋Š” Embodiment-specific Decoder D_h์— ์˜ํ•ด ๋‹ค์Œ ๊ด€์ ˆ ๋ช…๋ น Chunk \hat{q}^{(h)}_{t+1} = D_h(\hat{z}_{t+1})๋กœ ๋””์ฝ”๋”ฉ๋ฉ๋‹ˆ๋‹ค. VLA Fine-tuning ์ค‘์—๋Š” ๋ชจ๋“  Latent Encoder์™€ Decoder๋Š” Frozen ์ƒํƒœ๋ฅผ ์œ ์ง€ํ•ฉ๋‹ˆ๋‹ค.

  3. Latent Space Learning: The latent space is pretrained separately from the VLA model, using a multi-head, VAE (Variational Autoencoder)-style autoencoder. Each hand type h \in H has its own encoder E_h and decoder D_h. The encoder MLP projects the input q^{(h)} into the shared latent space, and the decoder MLP projects the latent embedding back to that hand's original joint configuration.

    Three training constraints shape a meaningful cross-embodiment latent space:

    • Reconstruction loss (L_1): ensures each encoder-decoder pair acts as an autoencoder for its own hand. L_1 = L_{rec} = \frac{1}{|H|} \sum_{h \in H} \text{MSE}(\hat{q}^{(h)}, q^{(h)}). This keeps hand-specific kinematics recoverable from the latent space.
    • ๋ฆฌํƒ€๊ฒŸํŒ… ์†์‹ค (L_2, Retargeting Loss): ๋‹ค๋ฅธ Dexterous Hand ๋กœ๋ด‡ ๊ฐ„์˜ Fingertip Geometry๋ฅผ ์ •๋ ฌํ•ฉ๋‹ˆ๋‹ค. ๊ฐ Hand h์— ๋Œ€ํ•ด ๋ฏธ๋ถ„ ๊ฐ€๋Šฅํ•œ Forward Kinematics (FK)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ด€์ ˆ์„ Fingertip Position p^{(h)}_i์— ๋งคํ•‘ํ•˜๊ณ , Fingertip Displacement \delta^{(h)}_{ij} = p^{(h)}_i - p^{(h)}_j๋ฅผ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค. L_2 = \frac{1}{|H|(|H|-1)|P|} \sum_{s \neq t} \sum_{(i,j) \in P} w^{(s)}_{ij} \left[ \lambda_{dis} \| \delta^{(s)}_{ij} \|^2 - \| \hat{\delta}^{(t)}_{ij} \|^2 \right]^2 + \lambda_{dir}(1 - c^{(s,t)}_{ij}) ์—ฌ๊ธฐ์„œ \hat{\delta}^{(t)}_{ij}๋Š” Hand t์˜ ๋””์ฝ”๋”ฉ๋œ ๊ตฌ์„ฑ์—์„œ ๊ณ„์‚ฐ๋˜๋ฉฐ, c^{(s,t)}_{ij}๋Š” Pinch Directions \delta^{(s)}_{ij}์™€ \hat{\delta}^{(t)}_{ij} ์‚ฌ์ด์˜ ๊ฐ๋„ ์ฝ”์‚ฌ์ธ ๊ฐ’์ž…๋‹ˆ๋‹ค. w^{(s)}_{ij} = \exp(-\lambda_{exp} \| \delta^{(s)}_{ij} \|^2)๋Š” ๊ฐ•ํ•œ Pinch์— ๊ฐ€์ค‘์น˜๋ฅผ ๋‘ก๋‹ˆ๋‹ค. ์ด ์†์‹ค์€ ๋™์ผํ•œ Latent Code๊ฐ€ ๋‹ค์–‘ํ•œ Hand์—์„œ ๊ธฐํ•˜ํ•™์ ์œผ๋กœ ์ผ๊ด€๋œ Pinch Behavior๋ฅผ ์ƒ์„ฑํ•˜๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.
    • Latent ์†์‹ค (L_3, Latent Loss): Dexterous Hand Latent Space๋ฅผ ๋ถ€๋“œ๋Ÿฝ๊ณ  ์ž˜ ์ž‘๋™ํ•˜๋„๋ก ์ •๊ทœํ™”ํ•˜๊ธฐ ์œ„ํ•ด Latent ๋ณ€์ˆ˜์— ํ‘œ์ค€ ๊ฐ€์šฐ์‹œ์•ˆ ์‚ฌ์ „(Standard Gaussian Prior)์„ ๋ถ€๊ณผํ•ฉ๋‹ˆ๋‹ค. L_3 = L_{KL} = \mathbb{E}_q[ \text{KL}(q(z | q) \| \mathcal{N}(0, I)) ] ์ด๋Š” ๊ณต์œ  Latent Space๊ฐ€ \mathcal{N}(0, I) ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅด๋„๋ก ๊ถŒ์žฅํ•˜๋ฉฐ, Sampling ๋ฐ Interpolation์„ ์šฉ์ดํ•˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.

    ์ด Latent ๋ชฉ์  ํ•จ์ˆ˜ (Total Latent Objective)๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค: L_{latent} = L_1 + L_2 + \beta L_3 ์—ฌ๊ธฐ์„œ \beta = 10^{-5}, \lambda_{dis} = 2000.0, \lambda_{dir} = 5.0, \lambda_{exp} = 12.0๋กœ ๊ณ ์ •๋ฉ๋‹ˆ๋‹ค.

    ์ด Latent Autoencoder๋Š” ์–ด๋– ํ•œ Demonstration์ด๋‚˜ Inverse Kinematics (IK)๋กœ ์ƒ์„ฑ๋œ Trajectory ์—†์ด ํ›ˆ๋ จ๋ฉ๋‹ˆ๋‹ค. ๋Œ€์‹ , ๊ฐ Hand s \in H์— ๋Œ€ํ•ด ํ•˜๋“œ์›จ์–ด ๊ด€์ ˆ ํ•œ๊ณ„ ๋‚ด์—์„œ ๋ฌด์ž‘์œ„๋กœ ๊ด€์ ˆ ๊ตฌ์„ฑ q^{(s)}๋ฅผ ์ƒ˜ํ”Œ๋งํ•ฉ๋‹ˆ๋‹ค. Latent ๊ณต๊ฐ„์˜ ์ •๋ ฌ์€ ์™„์ „ํžˆ Self-supervised ๋ฐฉ์‹์œผ๋กœ ์ด๋ฃจ์–ด์ง€๋ฉฐ, Cross-Hand Trajectory ์Œ์ด ํ•„์š”ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.
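The three loss terms, plus the random joint sampling that feeds them, can be sketched in plain Python. This is an illustrative reading of the formulas above, not the authors' code. It rests on these assumptions:

- λ_dis multiplies the squared difference of the squared pinch widths.
- The direction term sits inside the sum over hand and fingertip pairs.
- Fingertip positions are passed in directly rather than computed by a real differentiable FK.
- The KL term uses a diagonal Gaussian posterior.
- The toy hands and coordinates are hypothetical.

```python
import math
import random

# Weights reported in the review for L_latent = L_1 + L_2 + beta * L_3.
BETA, LAMBDA_DIS, LAMBDA_DIR, LAMBDA_EXP = 1e-5, 2000.0, 5.0, 12.0

def sample_joint_configs(joint_limits, n, rng):
    """Training inputs: configurations drawn uniformly within [low, high] per joint."""
    return [[rng.uniform(lo, hi) for lo, hi in joint_limits] for _ in range(n)]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def reconstruction_loss(recon):
    """L_1: average over hands of MSE(q_hat, q); recon maps hand -> (q, q_hat)."""
    return sum(mse(q_hat, q) for q, q_hat in recon.values()) / len(recon)

def _sub(p, r):
    return [a - b for a, b in zip(p, r)]

def _sq(v):
    return sum(x * x for x in v)

def _cos(u, v, eps=1e-12):
    return sum(a * b for a, b in zip(u, v)) / max(math.sqrt(_sq(u) * _sq(v)), eps)

def retargeting_loss(tips, tips_dec, pinch_pairs):
    """L_2. tips[s]: fingertip positions of source hand s (from FK).
    tips_dec[s][t]: fingertips of hand t decoded from hand s's latent code."""
    hands = list(tips)
    total = 0.0
    for s in hands:
        for t in hands:
            if s == t:
                continue
            for i, j in pinch_pairs:
                d_s = _sub(tips[s][i], tips[s][j])
                d_t = _sub(tips_dec[s][t][i], tips_dec[s][t][j])
                w = math.exp(-LAMBDA_EXP * _sq(d_s))          # tight pinches weigh more
                total += w * LAMBDA_DIS * (_sq(d_s) - _sq(d_t)) ** 2
                total += LAMBDA_DIR * (1.0 - _cos(d_s, d_t))  # pinch-direction term
    n = len(hands)
    return total / (n * (n - 1) * len(pinch_pairs))

def kl_standard_normal(mu, log_var):
    """L_3 for a diagonal Gaussian posterior: KL(N(mu, exp(log_var)) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def latent_objective(l1, l2, l3):
    return l1 + l2 + BETA * l3

# Toy check: two hands whose decoded pinches point the opposite way.
tips = {"a": [[0, 0, 0], [0.02, 0, 0]], "b": [[0, 0, 0], [0.02, 0, 0]]}
flipped = {"a": {"b": [[0, 0, 0], [-0.02, 0, 0]]}, "b": {"a": [[0, 0, 0], [-0.02, 0, 0]]}}
l2 = retargeting_loss(tips, flipped, [(0, 1)])
print(round(l2, 6))  # 10.0: equal pinch widths, cosine -1 -> LAMBDA_DIR * 2
```

In the toy check the pinch widths match, so only the direction term fires. This illustrates why both terms are needed: matching distances alone would not stop a decoded pinch from pointing the wrong way.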

์‹คํ—˜ ๋ฐ ๊ฒฐ๊ณผ (Experiments and Results)

๋ณธ ์—ฐ๊ตฌ๋Š” 10๊ฐ€์ง€ ๋‹ค์–‘ํ•œ ์กฐ์ž‘ Task์™€ Ability, Paxini DexH13, X-Hand1, Inspire ๋“ฑ 4๊ฐ€์ง€ Dexterous Hand๋ฅผ ํฌํ•จํ•˜๋Š” ๋Œ€๊ทœ๋ชจ Teleoperation Dataset์„ ๊ตฌ์ถ•ํ–ˆ์Šต๋‹ˆ๋‹ค (์ด 2M State-Action Pair). ์‹คํ—˜์€ xArm๊ณผ Unitree G1 ํœด๋จธ๋…ธ์ด๋“œ ๋กœ๋ด‡์—์„œ ์ˆ˜ํ–‰๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

  1. Effectiveness of VLA + Latent Integration:
    • Cross-hand data scaling: XL-VLA delivers consistent, large gains over the \pi_0 baseline across all hands and tasks (Table 2). \pi_0 reaches an average success rate of only 0.32, while XL-VLA reaches 0.72, an improvement of 40 percentage points (125% in relative terms). The gains are especially pronounced on fine-grained manipulation tasks.
    • Cross-robot data scaling: When trained jointly on data from the tabletop xArm and the G1 humanoid, XL-VLA achieves a 57% higher (relative) success rate than \pi_0 on the G1 (XL-VLA: 0.825, \pi_0: 0.525) (Figure 5, Table 6). This shows that the unified latent space also helps across heterogeneous robot systems.
    • Zero-shot task generalization: XL-VLA generalizes zero-shot to held-out tasks (Figure 4). Compared with the \pi_0+RT baseline, which relies on standard kinematic retargeting, XL-VLA performs consistently better across all embodiments and tasks. Its advantage is clearest on fine-grained dexterous tasks.
  2. Effectiveness of the Latent Action Space:
    • Latent replay comparison: XL-VLA's latent space achieves much higher replay success rates than supervised latent-space retargeting methods such as Latent Action Diffusion (LAD) [2] (Table 4): 0.82 and 0.81 versus 0.60 and 0.61 for LAD. This suggests that XL-VLA's latent space captures embodiment-invariant structure effectively even though it is learned without supervision.
    • Design choice comparison: An ablation study analyzes how the architecture and loss design of the latent space affect results (Table 5). The final configuration (hidden sizes H128->64, latent dimension 32) strikes a strong balance across metrics, including reconstruction accuracy, cross-embodiment retargeting, latent continuity, and interpolation smoothness. Both the reconstruction loss (L_1) and the retargeting loss (L_2) prove essential for cross-embodiment performance. An overly large latent dimension (e.g., L128) can disrupt the embodiment-invariant structure.
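The results above mix absolute and relative improvements, and a quick check keeps them apart. The snippet only recomputes gains from the success rates quoted above; it adds no new data.

```python
def gains(baseline, ours):
    """Absolute gain (in success-rate units) and relative gain over the baseline."""
    return ours - baseline, (ours - baseline) / baseline

cross_hand = gains(0.32, 0.72)     # average success rates, Table 2
cross_robot = gains(0.525, 0.825)  # G1 success rates, Figure 5 / Table 6
for name, (abs_gain, rel_gain) in [("cross-hand", cross_hand), ("G1", cross_robot)]:
    print(f"{name}: +{abs_gain * 100:.0f} percentage points, {rel_gain:.0%} relative")
# cross-hand: +40 percentage points, 125% relative
# G1: +30 percentage points, 57% relative
```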

Conclusion

XL-VLA presents a scalable framework for applying Vision-Language-Action models to dexterous manipulation through a unified latent action space. The approach allows seamless training across many robot hands and supports zero-shot generalization to new hand-task combinations. Across extensive real-world experiments, XL-VLA consistently outperforms standard VLA models and retargeting-based baselines. This suggests that a latent action space can be a strong foundation for building generalizable, data-efficient dexterous manipulation systems.

🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

Introduction: Why Can't Robots Use Their Hands Yet?

Think about it for a moment. Imagine picking up a glass marble from your desk and rolling it between your fingers, and consider how complex that motion is. Your fingers must press hard enough not to slip over the marble, yet not so hard that it shoots out. Your fingertip nerves relay where the marble is, how hard it is pressed, and whether it is about to slip to your brain in real time.

Why can't robots do this? There are many reasons, but the lack of touch sensing is one of the key bottlenecks. If a robot cannot "feel" what happens when it grasps an object, dexterous manipulation is fundamentally out of reach. A camera watching the hand from the outside cannot reveal what is happening at the contact between finger and object.

That is the backdrop for this paper. DIGIT, a vision-based tactile sensor developed by the Facebook AI Research (FAIR) team, aims to solve two problems at once.

Problem 1: Why aren't existing tactile sensors widely used?

Existing high-resolution tactile sensors (such as GelSight) perform well, but they are bulky, hard to manufacture reproducibly, and expensive. Cheap pressure sensors, on the other hand, have low spatial resolution and are hard to use for fine manipulation. This "performance vs. practicality" trade-off has frustrated researchers for a long time.

Problem 2: How do you actually manipulate with high-resolution touch?

Even with a good sensor, using 640×480-pixel images arriving at 60 fps for real-time control is computationally demanding. How should tactile streams arriving simultaneously from several fingers be processed?
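A back-of-the-envelope estimate shows the scale of that stream. It assumes uncompressed 8-bit RGB frames (3 bytes per pixel). How DIGIT actually encodes its video is not covered here, so treat the numbers as an upper-bound illustration.

```python
def raw_rgb_rate_bytes(width, height, fps, bytes_per_pixel=3):
    """Uncompressed data rate of one camera stream (assumes 8-bit RGB, no compression)."""
    return width * height * bytes_per_pixel * fps

per_sensor = raw_rgb_rate_bytes(640, 480, 60)
four_fingers = 4 * per_sensor          # one sensor per Allegro Hand finger
print(per_sensor / 1e6)                # 55.296 (MB/s per sensor)
print(four_fingers * 8 / 1e9)          # 1.769472 (Gbit/s for four sensors)
```

Tens of megabytes per second per finger is fine to record but far too much to push through a model hundreds of thousands of times per control step. This is the motivation for the compression discussed in Method II.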

DIGIT๋Š” ์ด ๋‘ ๋ฌธ์ œ์— ๋Œ€ํ•œ ๊ณตํ•™์ ยท์•Œ๊ณ ๋ฆฌ์ฆ˜์  ํ•ด๋‹ต์„ ๋™์‹œ์— ์ œ์‹œํ•œ๋‹ค.


Method I: DIGIT Sensor Design

How Vision-Based Tactile Sensors Work

Let's begin with how this family of sensors works. The principle itself is beautifully simple.

[Object] --presses--> [Soft Elastomer Gel]
         [Deformed surface reflects light differently]
[RGB Camera inside sensor] --captures--> [Deformation image]

A soft gel made of an elastomer (an elastic polymer) covers the sensor surface. When an object touches the gel, the gel surface deforms, and internal LEDs illuminate the deformed surface. The internal camera captures the resulting change in light as an image. Deformation = image change = contact information. This is the fundamental principle behind GelSight-style sensors.
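The idea that deformation shows up as an image change can be illustrated with the simplest possible contact detector: subtract a reference frame captured with nothing touching the gel. This is a hypothetical sketch for intuition, not DIGIT's processing pipeline. The threshold and the tiny 5×5 "images" are made up.

```python
def contact_mask(frame, reference, threshold=10):
    """Flag pixels whose intensity differs from a no-contact reference image.

    frame, reference: equally sized 2D lists of grayscale intensities (0-255).
    threshold: hypothetical intensity change that counts as contact.
    """
    return [[abs(p - r) > threshold for p, r in zip(row_f, row_r)]
            for row_f, row_r in zip(frame, reference)]

def contact_centroid(mask):
    """Centroid (x, y) of the flagged pixels, or None when nothing touches the gel."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

reference = [[100] * 5 for _ in range(5)]  # gel at rest
frame = [row[:] for row in reference]
frame[2][3], frame[3][3] = 160, 150        # pixels under a press change brightness
print(contact_centroid(contact_mask(frame, reference)))  # (3.0, 2.5)
```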

์ด ๋ฐฉ์‹์˜ ์žฅ์ ์€ ๊ณต๊ฐ„ ํ•ด์ƒ๋„๊ฐ€ ์นด๋ฉ”๋ผ ํ•ด์ƒ๋„์— ์˜ํ•ด์„œ๋งŒ ์ œํ•œ๋œ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. ์นด๋ฉ”๋ผ ํ”ฝ์…€์ด ์ถฉ๋ถ„ํžˆ ์ž‘์œผ๋ฉด ์ˆ˜์‹ญ ๋งˆ์ดํฌ๋กœ๋ฏธํ„ฐ ์ˆ˜์ค€์˜ ํ‘œ๋ฉด ๊ตฌ์กฐ๋„ ๊ฐ์ง€ํ•  ์ˆ˜ ์žˆ๋‹ค โ€” ๋…ผ๋ฌธ์˜ Fig. 3์ด ๋ณด์—ฌ์ฃผ๋“ฏ, DIGIT๋Š” ์„œ๋ธŒ๋ฐ€๋ฆฌ๋ฏธํ„ฐ ๊ตฌ์กฐ๋ฅผ ์„ ๋ช…ํ•˜๊ฒŒ ํฌ์ฐฉํ•œ๋‹ค.

๊ธฐ๊ณ„์  ์„ค๊ณ„: ์†๊ฐ€๋ฝ ๋์— ๋“ค์–ด๊ฐ€๋Š” ์นด๋ฉ”๋ผ

DIGIT๊ฐ€ ๊ธฐ์กด GelSight ๋Œ€๋น„ ๊ฐ€์žฅ ๊ทน์ ์œผ๋กœ ๊ฐœ์„ ํ•œ ๋ถ€๋ถ„์€ ํผํŒฉํ„ฐ๋‹ค.

์„ผ์„œ ํฌ๊ธฐ (mm) ๋ฌด๊ฒŒ (g) ์„ผ์‹ฑ ๋ฉด์  (mm) ํ•ด์ƒ๋„ FPS ๋ถ€ํ’ˆ ๋น„์šฉ
DIGIT (Ours) 20ร—27ร—18 20 19ร—16 640ร—480 60 $15*
Fingertip GelSight [11] 35ร—60ร—35 NA 18ร—14 1920ร—1080 30 ~$30
GelSlim [12] 50ร—205ร—20 NA 30ร—40 640ร—480 60 NA

1,000๊ฐœ ๋‹จ์œ„ ์ƒ์‚ฐ ๊ธฐ์ค€

GelSight์˜ ๊ธด ์ถ•์ด 205mm์ธ ๋ฐ˜๋ฉด, DIGIT๋Š” 27mm๋‹ค. ์ด ์ฐจ์ด๊ฐ€ ๊ฒฐ์ •์ ์ด๋‹ค. GelSight๋Š” Allegro Hand ๊ฐ™์€ ๋ฉ€ํ‹ฐํ•‘๊ฑฐ ๋กœ๋ด‡ ์†์— ์žฅ์ฐฉ ์ž์ฒด๊ฐ€ ๋ถˆ๊ฐ€๋Šฅํ•˜๋‹ค. DIGIT๋Š” ์ฒ˜์Œ๋ถ€ํ„ฐ Allegro Hand์˜ ๊ฐ ์†๊ฐ€๋ฝ ๋์— ์žฅ์ฐฉ ๊ฐ€๋Šฅํ•˜๋„๋ก ์„ค๊ณ„๋˜์—ˆ๋‹ค(Fig. 1 ์ฐธ์กฐ).

๊ตฌ์กฐ๋Š” 7๊ฐœ ๋ถ€ํ’ˆ์œผ๋กœ ์ด๋ฃจ์–ด์ง„๋‹ค:

A) Elastomer (contact surface)
B) Acrylic window
C) Snap-fit holder
D) Lighting PCB (RGB LEDs)
E) Plastic housing
F) Camera PCB (OVM7692)
G) Back housing

ํ•ต์‹ฌ ์„ค๊ณ„ ์ฒ ํ•™์€ ๋ชจ๋“ˆ์„ฑ๊ณผ press-fit ์กฐ๋ฆฝ์ด๋‹ค. ๋‚˜์‚ฌ๋ฅผ ํ•˜๋‚˜๋งŒ ํ’€๋ฉด ์ ค์„ ๊ต์ฒดํ•  ์ˆ˜ ์žˆ๊ณ , ํ•„์š”์— ๋”ฐ๋ผ ๋‹ค๋ฅธ ์ข…๋ฅ˜์˜ ์—˜๋ผ์Šคํ† ๋จธ๋ฅผ ๋ผ์šธ ์ˆ˜ ์žˆ๋‹ค:

  • ๋ถˆํˆฌ๋ช… ๋ฐ˜์‚ฌํ˜•: ํ‘œ๋ฉด ํ…์Šค์ฒ˜ยทํ˜•์ƒ ์ธก์ • (๊ธฐ๋ณธ๊ฐ’)
  • ๋งˆ์ปค ์žˆ๋Š” ๋ฐ˜์‚ฌํ˜•: ๊ด‘ํ•™ ํ๋ฆ„(optical flow) ๊ณ„์‚ฐ
  • ๋งˆ์ปค ์žˆ๋Š” ํˆฌ๋ช…ํ˜•: ํŒŒ์ง€ ์ค‘ ์†๊ฐ€๋ฝ ์œ„์น˜ ํ™•์ธ (FingerVision ์Šคํƒ€์ผ)

ํ•˜๋‚˜์˜ ํ•˜๋“œ์›จ์–ด๋กœ ์„ธ ๊ฐ€์ง€ ์šด์šฉ ๋ชจ๋“œ๋ฅผ ์ง€์›ํ•œ๋‹ค๋Š” ์ ์€ ์—ฐ๊ตฌ ํ”Œ๋žซํผ์œผ๋กœ์„œ ๋งค๋ ฅ์ ์ด๋‹ค.

์ „์ž ์„ค๊ณ„: 7cmยฒ์— ๋‹ด์€ ์นด๋ฉ”๋ผ ์‹œ์Šคํ…œ

DIGIT๋Š” ๊ธฐ์„ฑํ’ˆ ์นด๋ฉ”๋ผ ๋ชจ๋“ˆ ๋Œ€์‹  ์ปค์Šคํ…€ PCB๋ฅผ ์„ค๊ณ„ํ–ˆ๋‹ค. ์นด๋ฉ”๋ผ๋กœ๋Š” Omnivision OVM7692๋ฅผ ์ฑ„ํƒํ–ˆ๋Š”๋ฐ, ์ด ์นฉ์€ ์ดˆ์ ๊ฑฐ๋ฆฌ 1.15mm, ์‹ฌ๋„ 30cm์˜ ๋งˆ์ดํฌ๋กœ๋ Œ์ฆˆ ์–ด๋ ˆ์ด๋ฅผ ๋‚ด์žฅํ•ด ๋Œ€๋‹จํžˆ ์งง์€ ๊ฑฐ๋ฆฌ์—์„œ๋„ ์„ ๋ช…ํ•œ ์ด๋ฏธ์ง€๋ฅผ ์–ป๋Š”๋‹ค. ์ „์ฒด ์ „์ž๋ถ€ํ’ˆ์ด ์ฐจ์ง€ํ•˜๋Š” ๋ฉด์ ์€ 7cmยฒ โ€” ์ธ๊ฐ„ ์†๊ฐ€๋ฝ ๋๋ณด๋‹ค ์กฐ๊ธˆ ํด ๋ฟ์ด๋‹ค.

์กฐ๋ช…์€ ์„ธ ๊ฐœ์˜ RGB LED๋กœ ๊ตฌ์„ฑ๋˜์–ด ์—˜๋ผ์Šคํ† ๋จธ ํ‘œ๋ฉด์— ์ตœ๋Œ€ 4๋ฃจ๋ฉ˜์„ ๊ณต๊ธ‰ํ•œ๋‹ค. ์—ฌ๋Ÿฌ DIGIT๋ฅผ ํ•˜๋‚˜์˜ USB ํฌํŠธ์— ์—ฐ๊ฒฐํ•  ์ˆ˜ ์žˆ๋„๋ก SuperSpeed USB 3.0 ํ—ˆ๋ธŒ๋ฅผ PCB์— ํ†ตํ•ฉํ–ˆ๋‹ค. ์ด๋Š” ๋ฉ€ํ‹ฐํ•‘๊ฑฐ ํ•ธ๋“œ ์šด์šฉ์—์„œ ์ค‘์š”ํ•œ ์‹ค์šฉ์  ๊ณ ๋ ค์‚ฌํ•ญ์ด๋‹ค.

์—˜๋ผ์Šคํ† ๋จธ ์„ค๊ณ„: ๋‚ด๊ตฌ์„ฑ์˜ ํ˜์‹ 

๊ธฐ์กด GelSight ๊ณ„์—ด ์„ผ์„œ์˜ ๊ฐ€์žฅ ํฐ ์•ฝ์ ์€ ์ ค์˜ ๋งˆ๋ชจ์˜€๋‹ค. ์ ค ํ‘œ๋ฉด์˜ ๋ถˆํˆฌ๋ช… ์ด๋ฏธ์ง€ ์ „์‚ฌ ๋ ˆ์ด์–ด๊ฐ€ ๋ฐ˜๋ณต ์ ‘์ด‰์œผ๋กœ ์†์ƒ๋˜๋ฉด ์„ผ์„œ ํŠน์„ฑ์ด ๋‹ฌ๋ผ์ง€๊ณ , ์ƒˆ ์ ค๋กœ ๊ต์ฒดํ•˜๋ฉด ์žฌํ›ˆ๋ จ์ด ํ•„์š”ํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค.

DIGIT์˜ ์ ค ์ œ์กฐ ๊ณต์ •์€ 3๋‹จ๊ณ„๋‹ค:

Step 1: Airbrush silicone-based white pigment into mold
        + chemical kicker -> uniform image transfer layer
Step 2: Apply base layer silicone to finger-shaped mold, cure
Step 3: Remove from mold, glue onto acrylic window
        using Smooth-On Sil-Poxy (optically clear adhesive)
-> Acrylic-gel unit press-fit into DIGIT body

The material is Smooth-On Solaris silicone, which is used for coating solar panels. This choice of material, together with the fabrication process, makes the decisive difference in durability.

The quantitative results are striking. The authors used an industry-standard linear abrader (1.7 N, H-18 Calibrade medium-abrasion tip) in cycles of 5 passes each and measured wear as the change in light transmission (%):

์ ค / ๋งˆ๋ชจ ์‚ฌ์ดํด 5ํšŒ 10ํšŒ 15ํšŒ
DIGIT (Ours) 0% 0.3% 0.3%
Yuan et al. [11] ์ ค 276% 482% 805%
GelSight Inc. ์ ค 475% 662% 918%

๋‹จ 5๋ฒˆ์˜ ํŒจ์Šค ๋งŒ์— ๊ธฐ์กด ์ ค๋“ค์€ ์ฐข์–ด์ง€๊ฑฐ๋‚˜ ํ‘œ๋ฉด ์†Œ์žฌ๊ฐ€ ํƒˆ๋ฝํ•ด ์‚ฌ์šฉ ๋ถˆ๊ฐ€ ์ƒํƒœ๊ฐ€ ๋œ ๋ฐ˜๋ฉด, DIGIT ์ ค์€ 15๋ฒˆ ์‚ฌ์ดํด ํ›„์—๋„ 0.3% ๋ณ€ํ™”์— ๊ทธ์ณค๋‹ค. 1,000๋ฐฐ ์ด์ƒ์˜ ๋‚ด๊ตฌ์„ฑ ์ฐจ์ด๋‹ค.

ํ•œ ๊ฐ€์ง€ trade-off๋ฅผ ์ง€์ ํ•ด๋‘์–ด์•ผ ํ•œ๋‹ค: DIGIT ์ ค์€ ๊ธฐ์กด ์ ค ๋Œ€๋น„ ํˆฌ๊ณผ์œจ์ด ๋†’๋‹ค(676 Lux vs. 17~16 Lux). ์ ค์ด ์•ฝ๊ฐ„ ๋” ๋ฐ˜ํˆฌ๋ช…ํ•˜๋‹ค๋Š” ์˜๋ฏธ์ธ๋ฐ, ์ €์ž๋“ค์€ ์ด๊ฒƒ์ด ์ด‰๊ฐ ์„ผ์‹ฑ ์„ฑ๋Šฅ์— ๋ถ€์ •์  ์˜ํ–ฅ์„ ์ฃผ์ง€ ์•Š์Œ์„ ์‹คํ—˜์œผ๋กœ ๋ณด์˜€๋‹ค.


Method II: Learning Tactile In-Hand Manipulation

If the DIGIT sensor design is half of the paper, the other half is how to learn manipulation skills with it. The target task is rolling a glass marble between two fingers to a desired position. Consider how hard this is. The marble is small and smooth, and the contact surfaces are curved and deformable. Squeeze too hard and it shoots out; grip too lightly and it falls.

System Pipeline Overview

flowchart TD
    A["Raw DIGIT Images\n(left + right finger, 640x480)"] --> B["Keypoint Encoder\n(ResNet-18 mini)"]
    B --> C["K=8 Feature Maps\n-> Active Keypoint k=[x,y,i]"]
    C --> D["State: s = [k_L, k_R, j]\n(14-dimensional)"]
    D --> E["Neural Network\nDynamics Model\nf(s,a) -> s'"]
    E --> F["MPC + CEM Optimizer\n250 particles, horizon T=10\n~120 iterations per step"]
    F --> G["Optimal Action a*_t"]
    G --> H["Allegro Hand\n(8 DOF: 4 joints × 2 fingers)"]
    H --> A
    
    style A fill:#2d6a9f,color:#fff
    style D fill:#1a6b3a,color:#fff
    style E fill:#7b3291,color:#fff
    style F fill:#c0392b,color:#fff

์ž๊ธฐ์ง€๋„ ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘

4,800๋ฒˆ์˜ ์‹œํ–‰์—์„œ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ง‘ํ–ˆ๋‹ค. ๊ฐ ์‹œํ–‰์—์„œ:

  1. ๊ธˆ์† ๋ฐ›์นจ๋Œ€๊ฐ€ ๊ตฌ์Šฌ์„ ๋“ค์–ด์˜ฌ๋ฆฐ๋‹ค
  2. Sawyer ๋กœ๋ด‡ ์•”์ด ์‚ฌ์ „ ํ”„๋กœ๊ทธ๋ž˜๋ฐ๋œ ๋™์ž‘์œผ๋กœ ๊ตฌ์Šฌ์„ ์ง‘๋Š”๋‹ค
  3. 4๊ฐœ ์„œ๋ณด ร— 2์†๊ฐ€๋ฝ = 8์ฐจ์› ํ–‰๋™ ๊ณต๊ฐ„์—์„œ ๋žœ๋ค ๊ฐ๋„ ๋ณ€์œ„ ๋ช…๋ น 20ํšŒ ๋ฐœํ–‰ (~10์ดˆ)
  4. ๊ตฌ์Šฌ์ด ๋–จ์–ด์ง€๋ฉด ๊ทธ๋ฆ‡์— ๋‹ด๊ธฐ๊ณ  ๋ฐ›์นจ๋Œ€๊ฐ€ ๋‹ค์‹œ ๋“ค์–ด์˜ฌ๋ฆฐ๋‹ค

์ „์ฒด ๋ฆฌ์…‹ ์‚ฌ์ดํด์ด ์ž๋™ํ™”๋˜์–ด ์žˆ์–ด ์ธ๊ฐ„ ๊ฐœ์ž… ์—†์ด ์ˆ˜์ฒœ ํšŒ ์ž์œจ ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘์ด ๊ฐ€๋Šฅํ•˜๋‹ค. 950๊ฐœ ์‹œํ–‰์€ ๊ฒ€์ฆ ์„ธํŠธ๋กœ ๋ถ„๋ฆฌํ–ˆ๋‹ค.
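The two data-handling decisions in this loop, random 8-D commands and a train/validation split at the trial level, look roughly like the sketch below. The displacement bound `max_delta` is hypothetical because the review gives none. The sketch also assumes that the 950 validation trials are drawn from the 4,800 collected.

```python
import random

N_SERVOS = 8            # 4 servos x 2 fingers
ACTIONS_PER_TRIAL = 20  # random commands per trial (~10 s)

def random_trial_actions(rng, max_delta=0.1):
    """20 random joint-angle displacement commands in the 8-D action space.
    max_delta (radians) is a hypothetical bound."""
    return [[rng.uniform(-max_delta, max_delta) for _ in range(N_SERVOS)]
            for _ in range(ACTIONS_PER_TRIAL)]

def split_trials(n_trials=4800, n_val=950, seed=0):
    """Hold out whole trials, so no trial contributes to both train and validation."""
    ids = list(range(n_trials))
    random.Random(seed).shuffle(ids)
    return ids[n_val:], ids[:n_val]

rng = random.Random(0)
actions = random_trial_actions(rng)
train_ids, val_ids = split_trials()
print(len(actions), len(actions[0]), len(train_ids), len(val_ids))  # 20 8 3850 950
```

Splitting by trial rather than by individual transition keeps near-duplicate consecutive frames from leaking between the two sets.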

ํ‚คํฌ์ธํŠธ ์˜คํ† ์ธ์ฝ”๋”: ์ด๋ฏธ์ง€๋ฅผ 14์ฐจ์›์œผ๋กœ ์••์ถ•ํ•˜๊ธฐ

์ด ๋ถ€๋ถ„์ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์ ์œผ๋กœ ๊ฐ€์žฅ ํ•ต์‹ฌ์ ์ธ ์•„์ด๋””์–ด๋‹ค. 640ร—480 ์ด๋ฏธ์ง€๋ฅผ ์ง์ ‘ ๋‹ค๋ฃจ๋ฉด์„œ ์ˆ˜์‹ญ๋งŒ ๋ฒˆ์˜ ์˜ˆ์ธก์„ ์‹ค์‹œ๊ฐ„์— ๋Œ๋ฆฌ๋Š” ๊ฑด ๋ถˆ๊ฐ€๋Šฅํ•˜๋‹ค. ์–ด๋–ป๊ฒŒ ํ• ๊นŒ?

ํ•ต์‹ฌ ํ†ต์ฐฐ: ๊ตฌ์Šฌ ์กฐ์ž‘ ํƒœ์Šคํฌ์—์„œ ์‹ค์ œ๋กœ ์ค‘์š”ํ•œ ์ •๋ณด๋Š” ๊ตฌ์Šฌ์ด ์–ด๋””์— ์žˆ๋Š”๊ฐ€ ์™€ ์–ผ๋งˆ๋‚˜ ๋ˆŒ๋ ธ๋Š”๊ฐ€ ๋ฟ์ด๋‹ค. ๋‚˜๋จธ์ง€ ํ”ฝ์…€ ์ •๋ณด๋Š” ์ œ์–ด ๋ชฉ์ ์ƒ ์žก์Œ์ด๋‹ค.

๊ตฌ์กฐ์  ์˜คํ† ์ธ์ฝ”๋”(Structural VRNN [31] ๊ธฐ๋ฐ˜)๊ฐ€ ์ด ์••์ถ•์„ ํ•™์Šตํ•œ๋‹ค:

์ธ์ฝ”๋” ๊ฒฝ๋กœ:

\text{Encoder}(I) \rightarrow \{f_1, f_2, \ldots, f_K\} \quad (K \text{ feature maps})

๊ฐ ํ”ผ์ฒ˜๋งต f_k์—์„œ ํ‚คํฌ์ธํŠธ๋ฅผ ์ถ”์ถœํ•œ๋‹ค:

k_k = [x_k, y_k, i_k]

  • (x_k, y_k): the 2D location of maximum activation in the feature map
  • i_k: the mean activation magnitude of that feature map (how deeply the marble is pressed)
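A minimal sketch of the encoder side of the keypoint bottleneck follows. It makes one deliberate substitution: instead of a hard arg-max, it uses a soft-argmax (the softmax-weighted spatial expectation). This is a common differentiable way to locate the peak, which lets the autoencoder be trained by gradient descent. The normalized [-1, 1] coordinates and the `temperature` parameter are assumptions of this sketch.

```python
import math

def extract_keypoint(fmap, temperature=1.0):
    """Map one feature map (2D list, H x W) to a keypoint k = [x, y, i].

    x, y: soft-argmax location in normalized [-1, 1] coordinates
          (a differentiable stand-in for the arg-max).
    i:    mean activation of the map.
    """
    h, w = len(fmap), len(fmap[0])
    peak = max(max(row) for row in fmap)  # subtract the max for numerical stability
    weights = [[math.exp((v - peak) / temperature) for v in row] for row in fmap]
    z = sum(sum(row) for row in weights)
    xs = [-1 + 2 * c / (w - 1) for c in range(w)]
    ys = [-1 + 2 * r / (h - 1) for r in range(h)]
    x = sum(weights[r][c] * xs[c] for r in range(h) for c in range(w)) / z
    y = sum(weights[r][c] * ys[r] for r in range(h) for c in range(w)) / z
    intensity = sum(sum(row) for row in fmap) / (h * w)
    return [x, y, intensity]

fmap = [[0.0] * 5 for _ in range(5)]
fmap[0][4] = 50.0  # strong response in the top-right corner
print([round(v, 3) for v in extract_keypoint(fmap)])  # [1.0, -1.0, 2.0]
```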

๋””์ฝ”๋” ๊ฒฝ๋กœ:

๊ฐ ํ‚คํฌ์ธํŠธ (x_k, y_k)์— ๋Œ€ํ•ด ๋นˆ ํ”ผ์ฒ˜๋งต์— ๊ฐ€์šฐ์‹œ์•ˆ ๋ธ”๋กญ์„ ๊ทธ๋ฆฐ๋‹ค. ์ด K๊ฐœ์˜ ํ”ผ์ฒ˜๋งต์„ ๋””์ฝ”๋”์— ์ž…๋ ฅํ•ด ์›๋ณธ ์ด๋ฏธ์ง€๋ฅผ ์žฌ๊ตฌ์„ฑํ•œ๋‹ค.

The loss is the L2 image-reconstruction error plus auxiliary losses that enforce sparse, non-redundant keypoints:

\mathcal{L} = \mathcal{L}_{\text{reconstruction}} + \lambda \mathcal{L}_{\text{sparsity}} + \mu \mathcal{L}_{\text{separation}}

With K=8, 7 of the 8 keypoints became inactive in the experiments, and a single active keypoint accurately tracked the marble's position. Its intensity i increased as the marble was pressed more deeply. In other words, the self-supervised training discovered a task-relevant representation on its own.

Final state representation:

s = [k_L, k_R, j] \in \mathbb{R}^{14}

  • k_L = [x_L, y_L, i_L]: keypoint from the left (thumb) DIGIT
  • k_R = [x_R, y_R, i_R]: keypoint from the right (middle finger) DIGIT
  • j \in \mathbb{R}^8: joint angles of the 8 servos

Two 64×64 images (8,192 dimensions) are compressed to 14 dimensions, a roughly 585× reduction.

Dynamics Model: Struct-NN

The dynamics are learned in the compressed state space:

s' = f_\theta(s, a)

14์ฐจ์› ์ƒํƒœ s์™€ 8์ฐจ์› ํ–‰๋™ a๋ฅผ ์ž…๋ ฅ๋ฐ›์•„ ๋‹ค์Œ ์ƒํƒœ s'๋ฅผ ์˜ˆ์ธกํ•˜๋Š” MLP๋‹ค. ํ™˜๊ฒฝ์ด ์™„์ „ ๊ด€์ธก ๊ฐ€๋Šฅํ•˜๋ฏ€๋กœ(ํ‚คํฌ์ธํŠธ๊ฐ€ ๊ตฌ์Šฌ ์œ„์น˜๋ฅผ ์™„์ „ํžˆ ๊ธฐ์ˆ ), ๋ณต์žกํ•œ VRNN ๋Œ€์‹  ๊ฐ„๋‹จํ•œ MLP๋กœ ์ถฉ๋ถ„ํ•˜๋‹ค.

ํ›ˆ๋ จ ์‹œ ๋‘ ๊ฐ€์ง€ ๋ฐ์ดํ„ฐ ์ฆ๊ฐ•์„ ์ ์šฉํ•œ๋‹ค:

  • Zero-action ํŠœํ”Œ ์‚ฝ์ž…: (s, 0, s) ํ˜•ํƒœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฌด์ž‘์œ„ ์‚ฝ์ž…ํ•˜์—ฌ ๋ชจ๋ธ์ด โ€œ์•„๋ฌด๊ฒƒ๋„ ์•ˆ ํ•˜๋ฉด ์ƒํƒœ๊ฐ€ ๋ณ€ํ•˜์ง€ ์•Š๋Š”๋‹คโ€๋Š” ๋ฌผ๋ฆฌ์  ์ƒ์‹์„ ํ•™์Šตํ•˜๊ฒŒ ํ•จ
  • RGB ๊ฐ’ยท๊ฐ๋งˆ ๊ต๋ž€: ์กฐ๋ช… ๋ณ€ํ™”์— ๋Œ€ํ•œ ๊ฐ•์ธ์„ฑ ํ™•๋ณด
๋ชจ๋ธ 1 forward-backward 1 forward MPC 1 step ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜
Struct-NN (Ours) 4.3 ms 1.6 ms 1.4 s 1.2M
CDNA [35] 6.8 ms 2.3 ms 69 s 4M

MPC 1์Šคํ…์—์„œ 50๋ฐฐ ์†๋„ ์ฐจ์ด๊ฐ€ ํ•ต์‹ฌ์ด๋‹ค. CDNA๋Š” 69์ดˆ๊ฐ€ ๊ฑธ๋ ค ์‹ค์‹œ๊ฐ„ ์ œ์–ด์— ์‚ฌ์šฉ ๋ถˆ๊ฐ€๋Šฅํ•˜๋‹ค.
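Below is a forward-pass sketch of such a dynamics model, together with the zero-action augmentation described above. It is not the authors' network: the hidden width, the single hidden layer, and the 10% augmentation fraction are hypothetical, and training is omitted.

```python
import random

STATE_DIM, ACTION_DIM = 14, 8  # s = [k_L, k_R, j], a = 8 servo commands
HIDDEN = 64                    # hypothetical width; not given in the review

def init_layer(n_in, n_out, rng):
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(layer, x, relu):
    weights, bias = layer
    y = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in y] if relu else y

class DynamicsMLP:
    """s' = f_theta(s, a) with one ReLU hidden layer (forward pass only)."""
    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.hidden = init_layer(STATE_DIM + ACTION_DIM, HIDDEN, rng)
        self.out = init_layer(HIDDEN, STATE_DIM, rng)

    def __call__(self, s, a):
        h = dense(self.hidden, list(s) + list(a), relu=True)
        return dense(self.out, h, relu=False)

def add_zero_action_tuples(transitions, fraction=0.1, seed=0):
    """Append (s, 0, s) tuples built from randomly chosen states, teaching the
    model that a zero action leaves the state unchanged."""
    rng = random.Random(seed)
    picked = rng.sample(transitions, int(len(transitions) * fraction))
    return transitions + [(list(s), [0.0] * ACTION_DIM, list(s)) for s, _, _ in picked]

rng = random.Random(1)
data = [([rng.random() for _ in range(STATE_DIM)],
         [rng.uniform(-0.1, 0.1) for _ in range(ACTION_DIM)],
         [rng.random() for _ in range(STATE_DIM)]) for _ in range(50)]
augmented = add_zero_action_tuples(data)
model = DynamicsMLP()
print(len(augmented), len(model(data[0][0], data[0][1])))  # 55 14
```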

๋ชจ๋ธ ๊ธฐ๋ฐ˜ ์ œ์–ด: MPC + CEM

ํ•™์Šต๋œ ๋™์—ญํ•™ ๋ชจ๋ธ๋กœ ๋ชจ๋ธ ์˜ˆ์ธก ์ œ์–ด(MPC)๋ฅผ ์ˆ˜ํ–‰ํ•œ๋‹ค. ์ตœ์ ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ๋Š” ๊ต์ฐจ ์—”ํŠธ๋กœํ”ผ ๋ฐฉ๋ฒ•(CEM)์„ ์‚ฌ์šฉํ•œ๋‹ค.

MPC with CEM (one control step):
  Input: current state s_t, goal keypoint g = (x_g, y_g, i_g)
  Parameters: 250 particles, horizon T = 10, ~120 CEM iterations

  for each CEM iteration:
    sample 250 action sequences (a_t, ..., a_{t+T-1}) from the current distribution
    for each sequence:
      roll out the model: s_{t+1} = f(s_t, a_t), ..., s_{t+T} = f(s_{t+T-1}, a_{t+T-1})
      cost = sum_{tau=t+1}^{t+T} || (x_tau, y_tau, i_tau) - (x_g, y_g, i_g) ||_2
    refit the distribution to the top-K lowest-cost (elite) sequences

  Apply a*_t (the first action of the best sequence) to the Allegro Hand

The cost is the sum of Euclidean distances in keypoint space. The (x, y) terms move the marble toward the goal position. The i term discourages actions that drop the marble (low i) or press it too hard (high i). It is an elegantly simple cost design.

Thanks to Struct-NN, the encoder runs on a real image only once per MPC step. All subsequent predictions are made by the 14-dimensional MLP: 250 particles × 10 steps × ~120 iterations, or about 300,000 per step. The design takes image processing out of the inner loop, so nearly all the compute goes to planning in a compact state space.
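The CEM loop in the pseudocode above can be made runnable in a few dozen lines. This is a generic cross-entropy-method sketch under stated assumptions, not the authors' code. The elite count, the initial standard deviation, and the toy dynamics (the keypoint simply moves by the action) are hypothetical. The example runs only 5 iterations with 100 particles to stay fast.

```python
import math
import random

def cem_first_action(dynamics, s0, goal, cost_dims, action_dim, horizon=10,
                     n_particles=250, n_iters=120, n_elite=25, init_std=0.1, seed=0):
    """One MPC step: optimize an action sequence with the cross-entropy method and
    return the first action of the lowest-cost sequence seen.

    dynamics(s, a) -> s' is the learned model; cost_dims selects the state entries
    (e.g. a keypoint [x, y, i]) that are compared with `goal`.
    """
    rng = random.Random(seed)
    mean = [[0.0] * action_dim for _ in range(horizon)]
    std = [[init_std] * action_dim for _ in range(horizon)]
    best_cost, best_seq = math.inf, None
    for _ in range(n_iters):
        scored = []
        for _ in range(n_particles):
            seq = [[rng.gauss(m, sd) for m, sd in zip(m_row, s_row)]
                   for m_row, s_row in zip(mean, std)]
            s, cost = s0, 0.0
            for a in seq:  # roll the model forward over the horizon
                s = dynamics(s, a)
                cost += math.dist([s[d] for d in cost_dims], goal)  # L2 in keypoint space
            scored.append((cost, seq))
        scored.sort(key=lambda item: item[0])
        if scored[0][0] < best_cost:
            best_cost, best_seq = scored[0]
        elites = [seq for _, seq in scored[:n_elite]]
        for t in range(horizon):  # refit the sampling Gaussian to the elites
            for k in range(action_dim):
                vals = [e[t][k] for e in elites]
                mu = sum(vals) / len(vals)
                mean[t][k] = mu
                std[t][k] = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals)) + 1e-6
    return best_seq[0]

def toy_dynamics(s, a):
    """Hypothetical stand-in for the learned f(s, a): the keypoint moves by the action."""
    return [si + ai for si, ai in zip(s, a)]

first = cem_first_action(toy_dynamics, s0=[0.0, 0.0, 0.5], goal=[0.5, 0.0, 0.5],
                         cost_dims=[0, 1, 2], action_dim=3, n_particles=100, n_iters=5)
print(first)  # the x component should push the keypoint toward the goal
```

In the real system, `dynamics` would be the trained Struct-NN MLP over the 14-D state, and `cost_dims` would select the keypoint entries compared with the goal.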


Experiments: Results and Interpretation

Video Prediction Model Performance

First, the dynamics model itself is benchmarked against CDNA on two datasets: the BAIR robot pushing dataset and the authors' own marble manipulation dataset.

๋ฐ์ดํ„ฐ์…‹ Struct-NN RMSE CDNA RMSE
BAIR ํ‘ธ์‹ฑ 0.06023 0.01082
๊ตฌ์Šฌ ์กฐ์ž‘ 0.00657 0.00028

An interesting pattern emerges. CDNA has the lower RMSE, yet Struct-NN comes out ahead for MPC in practice, not least because it is fast enough to run at all (1.4 s vs. 69 s per step). Why the gap? Image-reconstruction error does not translate directly into control performance. The result suggests that the keypoint representation Struct-NN captures is good enough for control.

Marble Manipulation Experiments

Each experiment is repeated 50 times. Goal positions are sampled at random, at least 16 pixels away from the current position.

Baseline: a hand-tuned linear proportional (P) controller

The P controller's gain matrix P \in \mathbb{R}^{3 \times 8} maps the 3-dimensional displacement vector (the keypoint error e) to the 8-dimensional action, a = P^\top e. Consider how hard it is to tune this matrix by hand: 24 parameters must be set manually for a system with nonlinear dynamics.
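The baseline fits in a few lines, which also makes its weakness visible: the same fixed 24 gains apply everywhere in the configuration space. In the sketch below the gain values are hypothetical, and applying P through its transpose (a = Pᵀe) is one reading of the 3 × 8 shape.

```python
def p_controller(gain, keypoint, goal):
    """Proportional control a = P^T e, with e = goal - keypoint.

    gain: 3 x 8 nested list (24 hand-tuned entries); returns one command per servo.
    """
    error = [g - k for g, k in zip(goal, keypoint)]
    n_servos = len(gain[0])
    return [sum(gain[r][c] * error[r] for r in range(len(error))) for c in range(n_servos)]

# Hypothetical gains: x error drives servo 0, y error drives servo 1.
P = [[0.0] * 8 for _ in range(3)]
P[0][0], P[1][1] = 0.5, 0.5
action = p_controller(P, keypoint=[0.0, 0.0, 1.0], goal=[0.2, -0.1, 1.0])
```

However the gains are tuned, a single linear map cannot capture how the effect of a servo command on the keypoint changes with finger pose. This is the failure mode discussed below.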

Results (see Fig. 8):

  • Struct-NN MPC: the Euclidean distance to the goal decreases steadily as more actions are taken
  • P controller: the distance actually increases on average
  • Marble drop rate: drops increase over time for both methods, with Struct-NN lower overall
  • In about 25% of trials, the marble drops before reaching the goal

A 25% drop rate may look high, but the task's difficulty must be taken into account. Precisely controlling a 20 g glass marble between two curved, 6 mm-diameter elastic gels is a motion even humans need practice for. The authors note that improving the low-level controller and collecting more data could reduce the drop rate.

The root cause of the P controller's failure is the nonlinearity of the dynamics. The mapping from finger servo commands to the tangential direction on the DIGIT surface is a complex transformation built from trigonometric functions. On top of that, the DIGIT surface itself is curved and deforms. Expecting a single linear matrix to work well across the whole configuration space is unrealistic.


Overall System Flowchart

flowchart LR
    subgraph Hardware["Hardware Platform"]
        A1["Sawyer Arm"]
        A2["Allegro Hand\n(4-finger)"]
        A3["DIGIT x2\n(Thumb + Middle)"]
        A1 --> A2 --> A3
    end

    subgraph DataCollection["Self-supervised Data Collection"]
        B1["Random Action\nExploration\n4,800 trials"]
        B2["Auto-reset\nMechanism\n(bowl + platform)"]
        B1 <--> B2
    end

    subgraph Learning["Learning Pipeline"]
        C1["Keypoint\nAutoencoder\n(ResNet-18 mini)"]
        C2["State Compression\n640x480 img x2\n-> 14D vector"]
        C3["MLP Dynamics\nModel f(s,a)->s'"]
        C1 --> C2 --> C3
    end

    subgraph Control["Model Predictive Control"]
        D1["CEM Optimizer\n250 particles\nHorizon T=10"]
        D2["Cost:\nL2 distance\nin keypoint space"]
        D1 --> D2
    end

    Hardware --> DataCollection
    DataCollection --> Learning
    Learning --> Control
    Control --> Hardware


๋น„ํŒ์  ๊ณ ์ฐฐ: ๊ฐ•์ ๊ณผ ํ•œ๊ณ„

๊ฐ•์ 

1. ๊ณตํ•™์  ์™„์„ฑ๋„์™€ ์˜คํ”ˆ์†Œ์Šค ๊ณต๊ฐœ

๋…ผ๋ฌธ์ด ๋‹จ์ˆœํ•œ ํ”„๋กœํ† ํƒ€์ž… ๋ณด๊ณ ์— ๊ทธ์น˜์ง€ ์•Š๊ณ , ๋Œ€๋Ÿ‰ ์ƒ์‚ฐ์„ ๊ณ ๋ คํ•œ ์„ค๊ณ„ ๊ฒฐ์ •(injection molding, press-fit, ํ‘œ์ค€ ๋ถ€ํ’ˆ)๊นŒ์ง€ ์ƒ์„ธํžˆ ๊ธฐ์ˆ ํ•œ๋‹ค. ์„ค๊ณ„๋ฅผ www.digit.ml์— ์˜คํ”ˆ์†Œ์Šค๋กœ ๊ณต๊ฐœํ•œ ๊ฒƒ์€ ์ปค๋ฎค๋‹ˆํ‹ฐ์— ๋Œ€ํ•œ ์‹ค์งˆ์  ๊ธฐ์—ฌ๋‹ค. ์‹ค์ œ๋กœ DIGIT๋Š” ์ด ๋…ผ๋ฌธ ์ดํ›„ ์ด‰๊ฐ ์„ผ์‹ฑ ์—ฐ๊ตฌ์˜ ์‚ฌ์‹ค์ƒ์˜ ํ‘œ์ค€ ํ”Œ๋žซํผ ์ค‘ ํ•˜๋‚˜๊ฐ€ ๋˜์—ˆ๋‹ค.

2. ๋‚ด๊ตฌ์„ฑ ๊ฐœ์„ ์˜ ์ •๋Ÿ‰์  ๊ฒ€์ฆ

๋งˆ๋ชจ ํ…Œ์ŠคํŠธ๋ฅผ ์ •๋Ÿ‰์ ์œผ๋กœ ์ˆ˜ํ–‰ํ•˜๊ณ  ๋น„๊ตํ•œ ๊ฒƒ์€ ๋…ผ๋ฌธ์˜ ์‹ ๋ขฐ๋„๋ฅผ ๋†’์ธ๋‹ค. โ€œ๋” ํŠผํŠผํ•˜๋‹คโ€๋Š” ์ฃผ์žฅ์„ ์ˆ˜์น˜๋กœ ๋’ท๋ฐ›์นจํ–ˆ๋‹ค.

3. ์•Œ๊ณ ๋ฆฌ์ฆ˜์  ํ™•์žฅ์„ฑ

Struct-NN์˜ ํ•ต์‹ฌ ๊ธฐ์—ฌ๋Š” ํ‚คํฌ์ธํŠธ ์ถ”์ƒํ™”๋กœ ์ด‰๊ฐ MPC๋ฅผ ๋‹จ์ผ ์„ผ์„œ์—์„œ ๋ฉ€ํ‹ฐํ•‘๊ฑฐ ์„ค์ •์œผ๋กœ ํ™•์žฅํ•œ ๊ฒƒ์ด๋‹ค. CDNA ๋Œ€๋น„ 50ร— ์†๋„ ํ–ฅ์ƒ์€ ์‹ค์šฉ์„ฑ์„ ์œ„ํ•œ ํ•„์ˆ˜์  ๊ฐœ์„ ์ด์—ˆ๋‹ค.

4. ์ž๊ธฐ์ง€๋„ ํ‘œํ˜„ ํ•™์Šต์˜ ํ†ต์ฐฐ

K=8 ํ‚คํฌ์ธํŠธ ์ค‘ 7๊ฐœ๊ฐ€ ๋น„ํ™œ์„ฑํ™”๋˜๊ณ  1๊ฐœ๊ฐ€ ๊ตฌ์Šฌ ์œ„์น˜๋ฅผ ์ •ํ™•ํžˆ ์ถ”์ ํ–ˆ๋‹ค๋Š” ๊ฒฐ๊ณผ๋Š”, ์˜คํ† ์ธ์ฝ”๋”๊ฐ€ ํƒœ์Šคํฌ ๊ด€๋ จ ๊ตฌ์กฐ๋ฅผ ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ ์Šค์Šค๋กœ ๋ฐœ๊ฒฌํ–ˆ์Œ์„ ๋ณด์—ฌ์ค€๋‹ค. ์ด๋Š” ์ด‰๊ฐ ๋ฐ์ดํ„ฐ์—์„œ์˜ ๋น„์ง€๋„ ํ‘œํ˜„ ํ•™์Šต ๊ฐ€๋Šฅ์„ฑ์„ ์‹œ์‚ฌํ•˜๋Š” ํฅ๋ฏธ๋กœ์šด ๊ด€์ฐฐ์ด๋‹ค.

์•ฝ์ ๊ณผ ํ•œ๊ณ„

1. ํƒœ์Šคํฌ์˜ ์ œํ•œ์  ๋ฒ”์œ„

์œ ๋ฆฌ ๊ตฌ์Šฌ ํ•˜๋‚˜๋ฅผ ๋‘ ์†๊ฐ€๋ฝ ์‚ฌ์ด์—์„œ ๊ตด๋ฆฌ๋Š” ๊ฒƒ์€ ์ธ-ํ•ธ๋“œ ์กฐ์ž‘์˜ ๊ทนํžˆ ์ผ๋ถ€๋‹ค. ๋‹ค์–‘ํ•œ ๋ฌผ์ฒด, ๋‹ค์–‘ํ•œ ๊ทธ๋ฆฝ, ๋‹ค์–‘ํ•œ ๋™์ž‘์— ๋Œ€ํ•œ ์ผ๋ฐ˜ํ™”๋Š” ๊ฒ€์ฆ๋˜์ง€ ์•Š์•˜๋‹ค. ๊ตฌ์Šฌ์ด๋ผ๋Š” ํƒœ์Šคํฌ๊ฐ€ ํ‚คํฌ์ธํŠธ ํ‘œํ˜„์— ํŠนํžˆ ์œ ๋ฆฌํ•˜๊ฒŒ ์ž‘์šฉํ–ˆ์„ ๊ฐ€๋Šฅ์„ฑ์ด ์žˆ๋‹ค(๊ตฌํ˜•์ด๋ผ ํ•˜๋‚˜์˜ (x,y,i)๋กœ ์™„์ „ํžˆ ๊ธฐ์ˆ  ๊ฐ€๋Šฅ).

2. 25% ๋‚™ํ•˜์œจ

ํƒœ์Šคํฌ์˜ ๋‚œ์ด๋„๋ฅผ ๊ฐ์•ˆํ•˜๋”๋ผ๋„, 4๋ฒˆ ์ค‘ 1๋ฒˆ ์‹คํŒจ๋Š” ์‹ค์šฉ์  ๋ฐฐ์น˜์—๋Š” ๋ถ€์กฑํ•˜๋‹ค. ์ €์ž๋“ค ์Šค์Šค๋กœ ์ด๋ฅผ ์ธ์ •ํ•˜๊ณ  ํ–ฅํ›„ ๊ณผ์ œ๋กœ ๋‚จ๊ฒจ๋‘์—ˆ์ง€๋งŒ, ํ˜„ ์‹œ์Šคํ…œ์˜ ์™„์„ฑ๋„๋ฅผ ๋ณด์—ฌ์ฃผ๋Š” ์ง€ํ‘œ์ด๊ธฐ๋„ ํ•˜๋‹ค.

3. ์ด‰๊ฐ ์ด๋ฏธ์ง€ ํ•ด์„์˜ ๊นŠ์ด ๋ถ€์žฌ

๋…ผ๋ฌธ์€ ์›์‹œ ์ด‰๊ฐ ์ด๋ฏธ์ง€๋ฅผ ์ง์ ‘ ํ•ด์„ํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค๋Š” ํ‚คํฌ์ธํŠธ๋กœ ์••์ถ•ํ•ด ์‚ฌ์šฉํ•œ๋‹ค. ์ด๋Š” ๊ณ„์‚ฐ ํšจ์œจ์„ ์œ„ํ•œ ํ•ฉ๋ฆฌ์  ์„ ํƒ์ด์ง€๋งŒ, ์„ผ์„œ ์ž์ฒด๊ฐ€ ์ œ๊ณตํ•˜๋Š” ํ’๋ถ€ํ•œ ์ •๋ณด(ํ‘œ๋ฉด ํ…์Šค์ฒ˜, ํž˜ ๋ถ„ํฌ, ๋ณ€ํ˜• ํŒจํ„ด)๋ฅผ ๋Œ€๋ถ€๋ถ„ ๋ฒ„๋ฆฌ๋Š” ๊ฒƒ์ด๊ธฐ๋„ ํ•˜๋‹ค.

4. ๋‹จ์ผ ํƒœ์Šคํฌ์— ํŠนํ™”๋œ ํŒŒ์ดํ”„๋ผ์ธ

ํ‚คํฌ์ธํŠธ ์˜คํ† ์ธ์ฝ”๋”์™€ MPC ๋น„์šฉ ํ•จ์ˆ˜๋Š” ๊ตฌ์Šฌ ์œ„์น˜ ์ถ”์ ์— ํŠนํ™”๋˜์–ด ์žˆ๋‹ค. ์ƒˆ๋กœ์šด ํƒœ์Šคํฌ์— ์ ์šฉํ•˜๋ ค๋ฉด ํŒŒ์ดํ”„๋ผ์ธ ์ „์ฒด๋ฅผ ์žฌ์„ค๊ณ„ํ•ด์•ผ ํ•  ๊ฐ€๋Šฅ์„ฑ์ด ๋†’๋‹ค. ํƒœ์Šคํฌ-๋…๋ฆฝ์  ์ด‰๊ฐ ํ‘œํ˜„์„ ์œ„ํ•œ ๋ณด๋‹ค ๋ฒ”์šฉ์ ์ธ ์ ‘๊ทผ์ด ํ•„์š”ํ•˜๋‹ค.

5. ์„ผ์„œ ๊ฐ„ ์žฌํ˜„์„ฑ ๋ฏธ๊ฒ€์ฆ

์ €์ž๋“ค์€ ๋Œ€๋Ÿ‰ ์ƒ์‚ฐ ์žฌํ˜„์„ฑ์„ ๊ฐ•์กฐํ•˜์ง€๋งŒ, ์‹ค์ œ๋กœ ์—ฌ๋Ÿฌ DIGIT ์œ ๋‹› ๊ฐ„์˜ ๊ต์ฒด ๊ฐ€๋Šฅ์„ฑ(Sensor-to-sensor consistency)์„ ์‹คํ—˜์ ์œผ๋กœ ๊ฒ€์ฆํ•˜์ง€๋Š” ์•Š์•˜๋‹ค. ์ด‰๊ฐ ์„ผ์„œ์—์„œ ๊ฐœ๋ณ„ ์ ค์˜ ํŠน์„ฑ ํŽธ์ฐจ๋Š” ์‹ค์šฉ์ ์œผ๋กœ ์ค‘์š”ํ•œ ๋ฌธ์ œ๋‹ค.


๊ด€๋ จ ์—ฐ๊ตฌ์™€์˜ ๋น„๊ต

graph TD
    A["Vision-based Tactile Sensors"] --> B["TacTip Family\n[13,14]\nMarker pins, low resolution"]
    A --> C["FingerVision [10]\nTransparent gel, dual-use\nbut lower tactile resolution"]
    A --> D["GelSight [11]\nHigh res, bulky\n35x60x35mm"]
    A --> E["GelSlim [12]\nSlimmer but 50x205mm\nAllegro-incompatible"]
    A --> F["DIGIT (This work)\n20x27x18mm\nAllegro-compatible\n$15/unit"]

    G["Tactile Control Methods"] --> H["tactile-MPC [17]\nSingle sensor, 3-DOF\nCDNA-based, slow"]
    G --> I["DIGIT + Struct-NN\nDual sensor, 8-DOF\n50x faster MPC"]
    G --> J["OpenAI Dexterous\nManipulation [26]\nNo tactile, many cameras"]

    style F fill:#2196F3,color:#fff
    style I fill:#2196F3,color:#fff

DIGIT์˜ ์ง์ ‘์  ์„ ์กฐ๋Š” GelSight[11]์™€ GelSlim[12]์ด๋‹ค. GelSight๋Š” ์„ฑ๋Šฅ์€ ๋›ฐ์–ด๋‚˜์ง€๋งŒ ๋ฉ€ํ‹ฐํ•‘๊ฑฐ ํ•ธ๋“œ ์žฅ์ฐฉ์ด ๋ถˆ๊ฐ€๋Šฅํ•˜๋‹ค. GelSlim์€ ๋” ๋‚ฉ์ž‘ํ•˜์ง€๋งŒ ๊ธธ์ด๊ฐ€ 205mm๋กœ ์†๊ฐ€๋ฝ ๋์—๋Š” ๋งž์ง€ ์•Š๋Š”๋‹ค. DIGIT๋Š” ์ด ๋‘ ์„ผ์„œ๊ฐ€ ์—ด์ง€ ๋ชปํ•œ ๋ฉ€ํ‹ฐํ•‘๊ฑฐ ๊ณ ํ•ด์ƒ๋„ ์ด‰๊ฐ ์กฐ์ž‘์˜ ๋ฌธ์„ ์ฒ˜์Œ ์—ด์—ˆ๋‹ค.

์ œ์–ด ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ธก๋ฉด์—์„œ tactile-MPC[17]๋Š” ์ง์ ‘์  ์ „์‹ ์ด๋‹ค. DIGIT ๋…ผ๋ฌธ์€ ์ด๋ฅผ ๋‹จ์ผ ์„ผ์„œ 3-DOF ์„ค์ •์—์„œ ์ด์ค‘ ์„ผ์„œ 8-DOF ์„ค์ •์œผ๋กœ ํ™•์žฅํ•˜๋Š” ๊ฒƒ์ด ์™œ ์–ด๋ ค์šด์ง€(๊ณ„์‚ฐ ๋น„์šฉ), ๊ทธ๋ฆฌ๊ณ  Struct-NN์ด ์–ด๋–ป๊ฒŒ ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๋Š”์ง€๋ฅผ ์„ค๋ช…ํ•œ๋‹ค.

OpenAI์˜ Dexterous In-Hand Manipulation[26]๊ณผ ๋น„๊ตํ•˜๋ฉด ํฅ๋ฏธ๋กญ๋‹ค. OpenAI๋Š” ์ด‰๊ฐ ์—†์ด ์ˆ˜์‹ญ ๋Œ€์˜ ์ถ”์  ์นด๋ฉ”๋ผ๋กœ ์†๊ฐ€๋ฝ ์ƒํƒœ๋ฅผ ์ถ”์ •ํ•˜๋Š” ์ ‘๊ทผ์„ ํƒํ–ˆ๋‹ค. DIGIT๋Š” ๋ฐ˜๋Œ€๋กœ ์ด‰๊ฐ์—์„œ ์ง์ ‘ ์ƒํƒœ๋ฅผ ์–ป์–ด ์นด๋ฉ”๋ผ ๊ธฐ๋ฐ˜ ์ถ”์ ์˜ ์˜์กด์„ฑ์„ ์ค„์ธ๋‹ค. ๋‘ ์ ‘๊ทผ ๋ชจ๋‘ ๊ฐ์ž์˜ ์žฅ๋‹จ์ ์ด ์žˆ๋‹ค.


Allegro Hand ์—ฐ๊ตฌ์ž๋ฅผ ์œ„ํ•œ ์‹œ์‚ฌ์ 

๋…ผ๋ฌธ์˜ Fig. 1์€ DIGIT๊ฐ€ Allegro Hand์— ์žฅ์ฐฉ๋œ ๋ชจ์Šต์„ ๋ณด์—ฌ์ค€๋‹ค. Allegro Hand๋ฅผ ํ”Œ๋žซํผ์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ์—ฐ๊ตฌ์ž์—๊ฒŒ DIGIT๋Š” ๋ช‡ ๊ฐ€์ง€ ๊ตฌ์ฒด์ ์ธ ํ•จ์˜๋ฅผ ๊ฐ–๋Š”๋‹ค.

๊ธ์ •์  ์ธก๋ฉด:

  • Allegro Hand์˜ ์†๊ฐ€๋ฝ ๋ ์น˜์ˆ˜(~20mm ํญ)์— DIGIT๊ฐ€ ์ •ํ™•ํžˆ ๋งž๋„๋ก ์„ค๊ณ„๋˜์—ˆ๋‹ค
  • USB 3.0 ํ—ˆ๋ธŒ ํ†ตํ•ฉ์œผ๋กœ 4๊ฐœ ์†๊ฐ€๋ฝ ๋ชจ๋‘์— DIGIT๋ฅผ ์žฅ์ฐฉํ•ด๋„ ์ผ€์ด๋ธ” ๊ด€๋ฆฌ๊ฐ€ ๋‹จ์ˆœํ•˜๋‹ค
  • $15/์œ ๋‹›์˜ ๊ฐ€๊ฒฉ์€ Allegro Hand ์‚ฌ์šฉ์ž๊ฐ€ ์—ฌ๋Ÿฌ ๊ฐœ ๊ตฌ๋น„ํ•˜๋Š” ๊ฒƒ์„ ์‹ค์šฉ์ ์œผ๋กœ ๋งŒ๋“ ๋‹ค
  • ์ ค ๊ต์ฒด ์šฉ์ด์„ฑ โ†’ ์‹คํ—˜ ์ค‘ ์ ค ์†์ƒ ์‹œ ๋น ๋ฅธ ๋ณต๊ตฌ ๊ฐ€๋Šฅ

๊ณ ๋ ค์‚ฌํ•ญ:

  • ๊ด€์ ˆ ์ปจํŠธ๋กค๋Ÿฌ ๋…ธ์ด์ฆˆ(์ €์ž๋“ค์ด 25% ๋‚™ํ•˜์œจ ์›์ธ ์ค‘ ํ•˜๋‚˜๋กœ ์ง€๋ชฉ)๋Š” Allegro Hand์˜ ๊ณ ์งˆ์  ๋ฌธ์ œ๋‹ค. ์ด‰๊ฐ ์ œ์–ด๊ฐ€ Allegro Hand์˜ ๋‚ฎ์€ ์ปจํŠธ๋กค ์ •๋ฐ€๋„์™€ ์ƒํ˜ธ์ž‘์šฉํ•˜๋Š” ๋ฐฉ์‹์„ ์ฃผ์˜ํ•ด์•ผ ํ•œ๋‹ค
  • ๋ฉ€ํ‹ฐํ•‘๊ฑฐ ๋™์‹œ ์ด‰๊ฐ ๋ฐ์ดํ„ฐ ์ŠคํŠธ๋ฆผ ์ฒ˜๋ฆฌ โ†’ USB ๋Œ€์—ญํญ๊ณผ ํ˜ธ์ŠคํŠธ CPU ๋ถ€ํ•˜๋ฅผ ์‚ฌ์ „์— ๊ฒ€ํ† ํ•  ๊ฒƒ
  • DIGIT๋Š” ํ‰๋ฉด ์ ‘์ด‰๋ฉด์„ ๊ฐ€์ •ํ•˜๋Š” ๊ฒฝํ–ฅ์ด ์žˆ๋Š”๋ฐ, Allegro Hand์˜ ์†๊ฐ€๋ฝ ๋์€ ๊ณก๋ฉด์ด๋ฏ€๋กœ ์žฅ์ฐฉ ์ธํ„ฐํŽ˜์ด์Šค ์„ค๊ณ„๊ฐ€ ํ•„์š”ํ•˜๋‹ค

์š”์•ฝ ๋ฐ ๊ฒฐ๋ก 

DIGIT๋Š” ๋‘ ๊ฐ€์ง€๋ฅผ ๋™์‹œ์— ํ•ด๋ƒˆ๋‹ค๋Š” ์ ์—์„œ ๋กœ๋ด‡๊ณตํ•™ ์ปค๋ฎค๋‹ˆํ‹ฐ์— ๊ฐ€์น˜ ์žˆ๋Š” ๊ธฐ์—ฌ๋‹ค.

ํ•˜๋“œ์›จ์–ด ์ธก๋ฉด: ๊ณ ํ•ด์ƒ๋„ ์ด‰๊ฐ ์„ผ์‹ฑ์„ ๋ฉ€ํ‹ฐํ•‘๊ฑฐ ํ•ธ๋“œ์—์„œ ์‹ค์šฉ์ ์œผ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋งŒ๋“  ํผํŒฉํ„ฐ์˜ ์†Œํ˜•ํ™”. ์ œ์กฐ ๋น„์šฉ($15)๊ณผ ๋‚ด๊ตฌ์„ฑ(๊ธฐ์กด ๋Œ€๋น„ 1,000ร—+) ๊ฐœ์„ ์€ ์‹คํ—˜์‹ค ํ”„๋กœํ† ํƒ€์ž…์„ ๋„˜์–ด ์—ฐ๊ตฌ ํ”Œ๋žซํผ์œผ๋กœ์„œ์˜ ์ง€์† ๊ฐ€๋Šฅ์„ฑ์„ ์˜๋ฏธํ•œ๋‹ค.

์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ธก๋ฉด: ํ‚คํฌ์ธํŠธ ์˜คํ† ์ธ์ฝ”๋”๋ฅผ ํ†ตํ•œ ๊ณ ์ฐจ์› ์ด‰๊ฐ ์ด๋ฏธ์ง€์˜ ํƒœ์Šคํฌ-๊ด€๋ จ ์ €์ฐจ์› ํ‘œํ˜„ ์••์ถ•, ๊ทธ๋ฆฌ๊ณ  ์ด๋ฅผ ํ†ตํ•œ ๋ฉ€ํ‹ฐํ•‘๊ฑฐ ์ด‰๊ฐ MPC์˜ ์‹ค์šฉ์  ๊ตฌํ˜„. 50ร— ์†๋„ ํ–ฅ์ƒ์ด ๋‹จ์ˆœํ•œ ์—”์ง€๋‹ˆ์–ด๋ง ํŠธ๋ฆญ์ด ์•„๋‹ˆ๋ผ ์‹œ์Šคํ…œ์„ ์‹ค์‹œ๊ฐ„ ์ œ์–ด ๊ฐ€๋Šฅ/๋ถˆ๊ฐ€๋Šฅ์œผ๋กœ ๊ฐ€๋ฅด๋Š” ์งˆ์  ์ฐจ์ด๋ฅผ ๋งŒ๋“ ๋‹ค.

ํ•œ๊ณ„๋„ ๋ช…ํ™•ํ•˜๋‹ค: ๋‹จ์ผ ํƒœ์Šคํฌ ๊ฒ€์ฆ, 25% ๋‚™ํ•˜์œจ, ๋ฒ”์šฉ ์ด‰๊ฐ ํ‘œํ˜„ ๋ถ€์žฌ. ๊ทธ๋Ÿฌ๋‚˜ ์ด ๋…ผ๋ฌธ์ด ์—ด์–ด๋†“์€ ๋ฐฉํ–ฅโ€”๊ณ ํ•ด์ƒ๋„ ์ด‰๊ฐ + ๋ฉ€ํ‹ฐํ•‘๊ฑฐ + ํ•™์Šต ๊ธฐ๋ฐ˜ ์ œ์–ดโ€”์€ ์ดํ›„ ๋งŽ์€ ์—ฐ๊ตฌ๊ฐ€ ๋”ฐ๋ผ๊ฐ€๊ฒŒ ๋  ๊ธธ์ด๋‹ค.

์ด‰๊ฐ ์„ผ์‹ฑ์ด ๋กœ๋ด‡ ์กฐ์ž‘์˜ ๋ณด์กฐ ์ˆ˜๋‹จ์ด ์•„๋‹Œ ํ•ต์‹ฌ ๋ชจ๋‹ฌ๋ฆฌํ‹ฐ๋กœ ์ž๋ฆฌ ์žก๊ธฐ ์œ„ํ•œ ํ† ๋Œ€ ์ž‘์—…์œผ๋กœ์„œ, DIGIT๋Š” ์‹œ๊ธฐ์ ์ ˆํ•˜๊ณ  ์ž˜ ์‹คํ–‰๋œ ์—ฐ๊ตฌ๋‹ค.


References (Selected)

  • [11] Yuan et al., "GelSight: High-Resolution Robot Tactile Sensors for Estimating Geometry and Force," Sensors, 2017
  • [12] Donlon et al., "GelSlim: A High-Resolution, Compact, Robust, and Calibrated Tactile-Sensing Finger," IROS, 2018
  • [17] Tian et al., "Manipulation by Feel: Touch-Based Control with Deep Predictive Models," ICRA, 2019
  • [31] Minderer et al., "Unsupervised Learning of Object Structure and Dynamics from Videos," NeurIPS, 2019
  • [35] Finn et al., "Unsupervised Learning for Physical Interaction through Video Prediction," NeurIPS, 2016

Copyright 2026, JungYeon Lee