

📃 DexNDM Review

dexterous manipulation
sim2real
Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model
Published

March 21, 2026


  • Xueyi Liu, He Wang, Li Yi

  1. 🤖 This work aims for unprecedented generalization in dexterous in-hand rotation, a task made especially hard by the sim-to-real reality gap.
  2. 🦾 To that end, it proposes a joint-wise neural dynamics model that learns effectively from limited real-world data and adjusts the sim policy's actions, together with an autonomous data collection strategy.
  3. ✨ With a single policy, DexNDM successfully manipulates real-world objects with complex shapes and high aspect ratios under diverse wrist orientations, enabling complex dexterous tasks such as teleoperation.

🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

The paper "DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model" proposes DexNDM, a new approach that uses a joint-wise neural dynamics model to narrow the sim-to-real gap in dexterous in-hand rotation.

Achieving generalized in-hand object rotation remains a major challenge in robotics. Existing methods have been confined to constrained scenarios: simple geometries, a limited range of object sizes, fixed wrist poses, or custom hardware. The main cause is the sim-to-real gap created by complex, contact-rich dynamics. In dexterous manipulation in particular, it is hard to collect distribution-relevant data at scale, catastrophic failures (for example, dropping the object) call for frequent human intervention, and occlusion by the hand makes object state estimation difficult.

DexNDM introduces two core designs to address this.

  1. Specialist-to-generalist policy training: First, oracle policies are trained with reinforcement learning (RL) across several object categories (cylinders, cuboids, complex shapes, and so on). These oracle policies use rich privileged observations. Only the successful oracle trajectories are then aggregated, and a single generalist policy is trained on them with Behavior Cloning (BC). The generalist policy's observation $o_t^{\text{gene}}$ contains the proprioception history, the wrist orientation, and the rotation axis. This avoids both optimization failure in simulation on hard tasks and performance degradation in the real world, while imitating high-quality oracle behavior to produce a policy that can be deployed on real hardware.
  2. Joint-wise neural dynamics model: This model is the key component that bridges the sim-to-real gap.
    • Model design: Unlike conventional "whole-hand" models, the dynamics of each joint $i$ are modeled separately. The next state of a joint, $q^i_{t+1}$, is predicted only from that joint's own $W$-step state-action history $h^i_t = \{q^i_j, a^i_j\}_{j=t-W+1}^t$, that is, $q^i_{t+1} = f_{\psi_i}(h^i_t)$. This design distills high-dimensional, system-wide influences (for example, inter-joint coupling, actuation, and object-induced effects) into low-dimensional "effective" variables, so that their evolution is captured implicitly from each joint's own dynamic profile.
    • Theoretical grounding (generalization through information contraction): The model's central strength is that it improves generalization through information contraction.
      • Data processing inequality for KL divergence (Theorem 3.1): Given a mapping $g: X \to Y$ from the whole-system state $X = H_t$ to the per-joint state $Y = h^i_t$, we have $KL(P\|Q) \ge KL(g(P)\|g(Q))$. Here $P$ is the real-world distribution and $Q$ is the distribution of the simulated or collected data. In particular, if $g$ is non-injective in a way that merges points where $P$ and $Q$ have different relative structure, the inequality is strict ($>$). In other words, when high-dimensional information is reduced to a lower dimension, the KL divergence between the two distributions shrinks and the distribution shift is mitigated.
      • Generalization gap contraction (Theorem 3.2): When $KL(g(P)\|g(Q)) < KL(P\|Q)$, the generalization gap of the joint-wise model $f_2 \circ g_X$ is smaller than that of the whole-hand model $f_1$. A model that works on the contracted information therefore generalizes better under distribution shifts such as the sim-to-real gap.
    • Autonomous data collection: A low-cost autonomous data collection strategy called the "Chaos Box" is used. The robot hand is placed in a container filled with soft balls, the actions of the simulation-trained base policy are replayed open-loop, and Gaussian noise ($\sigma=0.01$) is added to the actions, so that the hand experiences a wide variety of randomized loads. The process is fully autonomous and hardware-safe, and because there is no object to drop it needs no human intervention or resets, which makes large-scale collection feasible.
    • Residual policy: Using the learned joint-wise dynamics model $f_\psi$, a residual policy $\pi_{res}$ is trained to compensate the base policy's actions. Given the base policy's observation $o_t^{\text{gene}}$ and base action $a_t$, $\pi_{res}$ outputs a correction $a_{res,t}$, and $a_t + a_{res,t}$ is executed at deployment. This fine-tunes the behavior to reflect real-world dynamics without substantially changing what the existing policy does.

Experimental results:

In simulation, DexNDM's generalist policy outperformed the existing AnyRotate implementation by 37% to 81% on unseen objects. In the real world, DexNDM showed unprecedented dexterity: it achieved successful in-air rotation of complex shapes (animal models), high-aspect-ratio objects (up to 5.33), and small objects, across diverse wrist orientations and rotation axes. Notably, it rotated long objects of 10-16 cm nearly a full turn in the air in a palm-down configuration, something prior work had either not attempted or found difficult. Compared with Visual Dexterity and AnyRotate, it demonstrated superior performance and generalization across a broad range of objects and conditions. A comparison with a whole-hand neural dynamics model confirmed that the joint-wise model is far more sample-efficient and generalizes much better when data is limited or when there is a train-test distribution shift. In contrast, existing sim-to-real methods such as ASAP and UAN failed in the real world because they do not generalize to object-loaded interaction dynamics. Finally, DexNDM was shown to plug into a teleoperation system for complex dexterous tasks such as tool use and assembly.

Conclusion:

DexNDM provides a new sim-to-real framework built on a joint-wise neural dynamics model and an autonomous data collection strategy, and it enables unprecedented in-hand object rotation. The work is an important step toward closing the sim-to-real gap in dexterous manipulation; as future work, the authors aim to overcome the model's limitations by integrating tactile sensors and richer signals.

🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

Key points at a glance

DexNDM is a new framework for narrowing the sim-to-real gap in in-hand object rotation. Its core idea compresses into two points. First, do not learn the whole hand-object system as one lump; learn each joint independently. Second, because that model generalizes well, shove the hand into a tub of balls (the Chaos Box) and collect data automatically while it is subjected to random loads. Combined, these two decisions let a single policy work across a wide range of conditions: animal shapes, an aspect ratio of 5.33, and even palm-down poses. Especially notable is that this is the first demonstration of rolling a 10-16 cm rod through roughly a full turn in the air with the palm facing down.

DexNDM's LEAP hand results match or exceed Visual Dexterity's D'Claw performance on smaller and less specialized hardware. That matters to dexterous manipulation researchers. The message is clear: the next breakthrough in sim-to-real is not "making the hardware more expensive" but "fixing the structure of the data and the model".

The problem: why sim-to-real for in-hand rotation is still unsolved

Rolling an object inside the hand is effortless for humans but one of the hardest manipulation tasks for robots. What makes it hard? Contacts change quickly, the object slips between the fingers, and the external load differs from moment to moment. A simulator models all of this "roughly right", but on real hardware, small differences in friction coefficients, motor backlash, wear on the finger surfaces, and PD control response delay accumulate and break the policy. That is the sim-to-real gap.

Prior work sidestepped the problem in one of three ways.

| Approach | Representative work | Limitation |
| --- | --- | --- |
| Assumes a palm-up pose only | RotateIt (Qi 2023), PenSpin (Wang 2024) | Cannot handle diverse wrist orientations |
| Handles only simple, regular objects | RotateIt, AnyRotate (Yang 2024) | Cannot handle complex shapes such as animal figures or rods |
| Expensive custom hardware + precise tactile sensors | Visual Dexterity (Chen 2022, D'Claw) | Hard to reproduce on commodity hardware |

AnyRotate achieved generalization over wrist orientation and rotation axis, but its objects stayed within ordinary sizes and shapes. Visual Dexterity rolled complex shapes in the air, but its performance was not validated on small or elongated objects. The gap DexNDM targets is that there was still no "single policy that is general along every dimension at once".

The limits of existing sim-to-real strategies are also clear. Domain randomization depends on heuristically chosen distribution ranges, and system identification (SysID) only captures what can be parameterized. The more ambitious approach is to learn neural dynamics from real-world data (ASAP, UAN, MB-Max). That works well for locomotion, but in dexterous manipulation it runs into the following contradiction.

The data contradiction: generality requires a vast amount of data over diverse objects. But for that data to be task-relevant in distribution, the policy must already be able to handle those objects. Yet the reason we are collecting data is that the policy does not work well. It is a chicken-and-egg problem.

Real-world data collection has further pitfalls. With hard objects such as rods, an immature policy keeps dropping them, so a person has to reset the scene over and over, and accurately tracking the state of a small object hidden by the hand with vision is also difficult. The resulting dataset is small, biased, and noisy. DexNDM unties this knot from the model side and the data side at the same time.

Insight 1: decompose the dynamics joint by joint

A conventional neural dynamics model looks at the whole hand at once. It takes the hand's length-$W$ state-action history $H_t = \{\mathbf{q}_j, \mathbf{a}_j\}_{j=t-W+1}^{t}$ and predicts the entire next state.

$$\mathbf{q}_{t+1} = f_\theta(H_t)$$

This idea descends from the approach of RMA (Kumar 2021). For a 16-DoF hand, the dimension of $H_t$ is $2 \times 16 \times W$, which is very large. A large input dimension means more data is needed, which in turn makes distribution matching much harder. DexNDM turns this around.

Predict the next state of a single joint $i$ from that joint's own history alone.

$$\mathbf{q}_{t+1}^i = f_{\psi_i}(h_t^i), \quad h_t^i = \{\mathbf{q}_j^i, \mathbf{a}_j^i\}_{j=t-W+1}^{t}$$

Why does this make sense? Look at the equation of motion of one joint. In standard manipulator dynamics,

$$M(\mathbf{q})\ddot{\mathbf{q}} + C(\mathbf{q},\dot{\mathbf{q}})\dot{\mathbf{q}} + G(\mathbf{q}) = \boldsymbol{\tau} + \boldsymbol{\tau}_{\text{ext}}$$

Split this into "the joint being modeled, $m$" and "all the other slave joints, $s$", ignore the Coriolis term under a low-speed assumption, and rearrange with the Schur complement. The result compresses to:

$$\mathbf{H}_t^{\text{eff}} \ddot{\mathbf{q}}_t^i + \mathbf{G}_t^{\text{eff}} = \tau_t^i$$

Here $\mathbf{H}_t^{\text{eff}}, \mathbf{G}_t^{\text{eff}} \in \mathbb{R}$ are scalars. The accelerations of neighboring joints, gravity, external loads, and contact forces from the object are all condensed into these two effective terms. Once these two terms are known, the joint's next state is determined.
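
The reduction can be written out explicitly. What follows is a sketch under the stated low-speed assumption (Coriolis term dropped), with the time index omitted. The block names $M_{mm}, M_{ms}, M_{sm}, M_{ss}$ and the exact grouping of terms into the two effective quantities are my own notation and may differ from the paper's. $M_{ss}$ is invertible because it is a principal submatrix of the positive-definite mass matrix.

```latex
\begin{aligned}
&\begin{bmatrix} M_{mm} & M_{ms} \\ M_{sm} & M_{ss} \end{bmatrix}
 \begin{bmatrix} \ddot{q}_m \\ \ddot{\mathbf{q}}_s \end{bmatrix}
 + \begin{bmatrix} G_m \\ \mathbf{G}_s \end{bmatrix}
 = \begin{bmatrix} \tau_m + \tau_{\text{ext},m} \\ \boldsymbol{\tau}_s + \boldsymbol{\tau}_{\text{ext},s} \end{bmatrix} \\[4pt]
&\ddot{\mathbf{q}}_s = M_{ss}^{-1}\left(\boldsymbol{\tau}_s + \boldsymbol{\tau}_{\text{ext},s} - \mathbf{G}_s - M_{sm}\,\ddot{q}_m\right) \\[4pt]
&\underbrace{\left(M_{mm} - M_{ms} M_{ss}^{-1} M_{sm}\right)}_{H^{\text{eff}}}\,\ddot{q}_m
 + \underbrace{G_m - \tau_{\text{ext},m} + M_{ms} M_{ss}^{-1}\left(\boldsymbol{\tau}_s + \boldsymbol{\tau}_{\text{ext},s} - \mathbf{G}_s\right)}_{G^{\text{eff}}}
 = \tau_m
\end{aligned}
```

The second line solves the slave rows for $\ddot{\mathbf{q}}_s$, and the third substitutes it into the row of joint $m$. The Schur complement $M_{mm} - M_{ms} M_{ss}^{-1} M_{sm}$ plays the role of the effective inertia, and everything that does not multiply $\ddot{q}_m$ (gravity, external and contact loads, the other joints' torques) lands in the effective bias term.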

Intuitively: picture 16 people holding hands in a tug-of-war. Each person only needs to know the net force acting on them and their own center of mass to take the next step correctly. They do not need to know what posture the neighbors are in or which muscles they used, because the neighbors' entire state reaches them compressed into a single signal, "the net tension felt in my palm".

Of course, we cannot measure these effective terms directly. But a short history $h_t^i$ contains acceleration, velocity, position, and commanded torque, and if we assume the effective terms vary like a continuous function over a short time span, that history alone is enough to predict the next state. The neural network learns this mapping.

%%| label: fig-jointwise
%%| fig-cap: "Structure of the joint-wise dynamics model. Each joint i takes only its own W-step history as input and predicts its next state."
flowchart LR
    subgraph WHOLE["Whole-hand model f_theta"]
        H["H_t<br/>(whole-hand history)<br/>dim: 2*W*d"] --> Q["q_{t+1}<br/>(whole-hand state)"]
    end
    subgraph JOINT["Joint-wise model f_psi_i"]
        H1["h_t^1"] --> Q1["q_{t+1}^1"]
        H2["h_t^2"] --> Q2["q_{t+1}^2"]
        HN["..."] --> QN["..."]
        HD["h_t^d"] --> QD["q_{t+1}^d"]
    end
    WHOLE -.->|"decompose<br/>(factorize)"| JOINT

This decision has two consequences.

  1. Sample efficiency: the input dimension drops from $2Wd$ to $2W$. With $d=16$ that is a 16-fold reduction. As a side effect, a single trajectory now yields $d$ training samples per time step.
  2. No dependence on object state estimation: there is no need to estimate the 6D pose of an object that is occluded between the fingers, because the object's influence is already condensed into the effective terms. The noise and occlusion problems of vision tracking disappear.
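
To make the two consequences concrete, here is a minimal sketch of how one whole-hand trajectory could be sliced into per-joint training samples. The function name `jointwise_samples`, the `(q_t, a_t)` tuple layout, and the flattened window ordering are assumptions made for illustration, not the paper's data format.

```python
from typing import List, Tuple

# Hypothetical layout: one trajectory is a list of (q_t, a_t) pairs,
# where q_t and a_t are length-d lists (joint positions and commanded actions).
Step = Tuple[List[float], List[float]]
Sample = Tuple[List[float], float]


def jointwise_samples(traj: List[Step], window: int) -> List[Sample]:
    """Slice a whole-hand trajectory into per-joint (history, target) pairs.

    Each history is the flattened W-step window [q_j^i, a_j^i, ...] of a single
    joint i (length 2*W); the target is that joint's next state q_{t+1}^i.
    """
    d = len(traj[0][0])
    samples: List[Sample] = []
    for t in range(window - 1, len(traj) - 1):
        for i in range(d):
            history: List[float] = []
            for j in range(t - window + 1, t + 1):
                q_j, a_j = traj[j]
                history.extend([q_j[i], a_j[i]])
            samples.append((history, traj[t + 1][0][i]))
    return samples


# Toy trajectory: 5 steps of a 16-DoF hand with made-up values.
d, W = 16, 3
toy = [([float(100 * t + i) for i in range(d)],
        [100 * t + i + 0.5 for i in range(d)]) for t in range(5)]
per_joint = jointwise_samples(toy, window=W)
print(len(per_joint), "samples, each with input dimension", len(per_joint[0][0]))
print("a whole-hand model would see input dimension", 2 * W * d)
```

Each usable time step contributes `d` samples of dimension `2*W` instead of one sample of dimension `2*W*d`, and no object pose appears anywhere in the input.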

Insight 2: information compression shrinks the generalization gap

Up to here this is merely "a plausible modeling choice". Where the paper goes a step further is in proving mathematically why this decomposition is robust to distribution shift.

The setup is as follows. The training distribution $\mathcal{Q}$ (data collected in the Chaos Box) differs from the target distribution $\mathcal{P}$ (the actual rotation task). We want a model trained on $\mathcal{Q}$ to also work well on $\mathcal{P}$.

The key tool is the Data Processing Inequality (DPI). For any measurable transformation $g$ (here non-injective),

$$\mathrm{KL}(\mathcal{P} \| \mathcal{Q}) \geq \mathrm{KL}(g(\mathcal{P}) \| g(\mathcal{Q}))$$

holds. That is, after passing through $g$, the KL divergence between the two distributions can only shrink, never grow. The stronger form says that when $g$ genuinely loses information, the inequality is strict.

In DexNDM, $g$ is "the projection that extracts only joint $i$'s history from the whole-hand history", mapping a domain of dimension $2Wd$ to a codomain of dimension $2W$, so information is lost. Dimension reduction alone does not force strictness: the inequality is strict when the projection merges points at which $\mathcal{P}$ and $\mathcal{Q}$ have different density ratios, which is the condition Theorem 3.1 states. Under that condition,

$$\mathrm{KL}(g(\mathcal{P}) \| g(\mathcal{Q})) < \mathrm{KL}(\mathcal{P} \| \mathcal{Q})$$

Theorem 3.2 states that this KL contraction carries over to a contraction of the generalization gap. Under a covariate shift assumption, with the supremum taken over the learned functions,

$$\sup |R_{\mathcal{P}}(f_2 \circ g_X) - R_{\mathcal{Q}}(f_2 \circ g_X)| < \sup |R_{\mathcal{P}}(f_1) - R_{\mathcal{Q}}(f_1)|$$

holds. Spelled out:

Given the same amount of data, a joint-wise model trained in the low-dimensional projected space has a smaller generalization gap than a whole-hand model trained in the original high-dimensional space.

Here is the intuition. Suppose the training and evaluation distributions sit in different corners of a high-dimensional space. If both are projected onto the same low-dimensional axis, they look far more alike on that axis. It is like a scattered, colorful constellation blurring into a similar haze when seen from far away. This "perspective effect" is the essence of information compression.
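
The contraction is easy to check numerically on a small discrete example. The sketch below is an illustration with made-up probability tables, not data from the paper: it compares the KL divergence of two joint distributions over `(x1, x2)` with the KL divergence after projecting onto `x1`. The second pair (`P2`, `Q2`) is constructed so that the merged points share the same density ratio, in which case the projection does not shrink the divergence at all.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable


def kl(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """KL(p || q) for discrete distributions given as {outcome: probability}."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0.0)


def push_forward(dist: Dict[Hashable, float],
                 g: Callable[[Hashable], Hashable]) -> Dict[Hashable, float]:
    """Distribution of g(X) when X follows dist (the mass of merged points adds up)."""
    out: Dict[Hashable, float] = defaultdict(float)
    for x, pr in dist.items():
        out[g(x)] += pr
    return dict(out)


def first_coord(x):
    """Non-injective projection: keep x1, discard x2."""
    return x[0]


# Case 1: P and Q disagree strongly in how x2 depends on x1.
P = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}
Q = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}

# Case 2: x2 is independent of x1 and identically distributed under both,
# so the density ratio P2/Q2 does not depend on x2.
p1, q1, r = {0: 0.6, 1: 0.4}, {0: 0.5, 1: 0.5}, {0: 0.3, 1: 0.7}
P2 = {(a, b): p1[a] * r[b] for a in p1 for b in r}
Q2 = {(a, b): q1[a] * r[b] for a in q1 for b in r}

for name, (p, q) in {"case 1": (P, Q), "case 2": (P2, Q2)}.items():
    full = kl(p, q)
    projected = kl(push_forward(p, first_coord), push_forward(q, first_coord))
    print(f"{name}: KL full = {full:.4f}, KL projected = {projected:.4f}")
```

In the first case the projected divergence is strictly smaller; in the second the two values agree up to rounding. This is why the strict form of the inequality needs a condition on what the projection merges, not just a drop in dimension.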

What could have been a weakness of DexNDM is protected by the same mechanism. A single joint's history is sufficient to estimate the effective terms, but insufficient to reconstruct the high-dimensional influence of the other joints. Expressiveness survives while spurious correlations are blocked. This is how an information bottleneck works.

The paper pretrains the model on simulation data to obtain an initialization and then fine-tunes it on real-world data. In the ablations, pretraining made a large difference.

Insight 3: collect data for free with the Chaos Box

The fact that the model is robust to distribution shift makes it possible to simplify data collection fundamentally. For practitioners this is the most attractive part of DexNDM.

Consider the pitfalls of existing data collection.

| Method | Problem |
| --- | --- |
| Base policy rollouts (ASAP, MB-Max) | Hard objects keep getting dropped; a person has to keep resetting the scene |
| Wave action (UAN) | No object load, so the data is far from the real dynamics |
| Vision-based object tracking | Tracking fails under hand occlusion; worse for smaller objects |

DexNDM's answer is simple: shove the hand into a tub of balls (the Chaos Box). The LEAP hand is placed in a container filled with soft balls, and the simulation policy's actions are replayed open-loop. With 50% probability, Gaussian noise ($\sigma=0.01$) is added. That is all.

This simple setup satisfies four principles at once.

  1. Policy-awareness: because it replays the action distribution of the simulation policy, the data stays, at a coarse level, in an action region similar to the task.
  2. Object-loaded interaction: the balls create random loads between the fingers. This is the decisive difference from plain wave actions.
  3. Broad coverage: the added noise and the randomness of the balls widen the distribution.
  4. Scalability: there is no object to drop and no need for a person to intervene. The risk of hardware damage is also low.
flowchart TB
    A["Simulated base policy actions<br/>(open-loop replay)"] --> B{"Add Gaussian noise?<br/>p=0.5"}
    B -->|"yes"| C["a_t + noise"]
    B -->|"no"| D["a_t"]
    C --> E["Execute on LEAP hand<br/>inside Chaos Box"]
    D --> E
    E --> F["Hand interacts with<br/>soft balls -> random loads"]
    F --> G["Record (q_t, a_t) histories"]
    G --> H["Train joint-wise<br/>neural dynamics f_psi"]
Figure 1: Flow of Chaos Box autonomous data collection.
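
The action-perturbation step of this flow can be sketched in a few lines. This is a hedged illustration: the function name `perturb_actions` is hypothetical, and applying the 50% coin flip independently at every time step is my reading of the flowchart; the paper may instead decide per trajectory. Sending the actions to the hand and logging the resulting joint states are outside the scope of the sketch.

```python
import random
from typing import List, Optional


def perturb_actions(actions: List[List[float]],
                    sigma: float = 0.01,
                    noise_prob: float = 0.5,
                    rng: Optional[random.Random] = None) -> List[List[float]]:
    """Return a copy of an open-loop action sequence with randomly injected noise.

    For each time step, with probability noise_prob, independent zero-mean
    Gaussian noise with standard deviation sigma is added to every joint's action.
    Assumption: the coin flip is made per time step.
    """
    if rng is None:
        rng = random.Random()
    perturbed: List[List[float]] = []
    for a_t in actions:
        if rng.random() < noise_prob:
            perturbed.append([a + rng.gauss(0.0, sigma) for a in a_t])
        else:
            perturbed.append(list(a_t))
    return perturbed


# Toy replay buffer: 2000 steps of a 16-DoF action, with made-up values.
base_actions = [[0.1 * j for j in range(16)] for _ in range(2000)]
replay = perturb_actions(base_actions, rng=random.Random(0))
n_changed = sum(1 for a, b in zip(base_actions, replay) if a != b)
print(f"{n_changed} of {len(base_actions)} steps received noise")
```

Passing a seeded `random.Random` keeps a collection run reproducible, which helps when comparing datasets gathered with and without noise injection.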

The results are impressive. The paper extrapolates that reaching the same performance with task-aware data collection would take about 7.5 million trajectories, or about 417,000 hours. The Chaos Box achieves the same effect within a few days. The reason is that the model's ability to generalize relaxes the requirement that the data match the task distribution.

Let us answer in advance a question that is likely to come up: "Will a model trained on data from a different distribution actually work on the task?" Figure 4 of the paper provides empirical evidence. The distribution of a single joint's input-output history almost overlaps between the Chaos Box data and the real task data. Viewed at the whole-hand level, the same data splits into two separate distributions. This is a direct visualization of the projection $g$ shrinking the KL divergence.

Insight 4: leave the base policy untouched (residual policy)

How should the learned dynamics model be used? There are two natural options.

  1. Model-based control (MPC) or policy fine-tuning: build a simulator from the learned dynamics and retrain the policy in it. This is the approach of ASAP and UAN.
  2. Residual policy: separately train a small policy that corrects the base policy's output.

DexNDM chooses the latter. Why? The learned dynamics model is only partially accurate (there is no guarantee of global accuracy). If a policy is retrained on top of it, the policy overfits to the model's errors. A residual reduces that risk.

As a formula:

$${\pi^{\text{res}}}^{*} = \arg\min_{\pi^{\text{res}}} \mathbb{E}_{\tau \sim p_{\pi^*}(\tau)} \sum_{t=1}^{N-1} \left\| \mathbf{q}_{t+1} - f_\psi\left(\{\mathbf{q}_j, \mathbf{a}_j + \pi^{\text{res}}(\mathbf{o}_j^{\text{gene}}, \mathbf{a}_j)\}_{j=t-W+1}^{t}\right) \right\|$$

Intuitively: take the reference trajectories from simulation and learn the correction term so that "feeding the corrected action into the real-world dynamics model yields the simulation's next state". The logic is that when the corrected action is then given to the real robot, the outcome matches the behavior the simulation showed.

At deployment, the robot simply executes $\mathbf{a}_t + \mathbf{a}_t^{\text{res}}$. The base policy stays as it is. This is a big practical advantage: when adding a new object or a new wrist pose, only the dynamics model and the residual need updating, with no need to retrain the base policy.
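
A one-joint toy version of this objective shows the mechanics. Everything here is an assumption made for illustration: the linear `f_sim` and `f_real` stand in for the simulator and the frozen learned model $f_\psi$, the gains `K_SIM` and `K_REAL` are invented, the residual is a single scalar gain rather than a network on $(\mathbf{o}^{\text{gene}}, \mathbf{a})$, and a squared error replaces the norm. The paper's residual policy is more general than this.

```python
import random
from typing import List, Tuple

K_SIM, K_REAL = 0.8, 0.5  # hypothetical tracking gains of the simulator and the real joint


def f_sim(q: float, a: float) -> float:
    """Toy simulator dynamics: the joint moves a fraction K_SIM toward target a."""
    return q + K_SIM * (a - q)


def f_real(q: float, a: float) -> float:
    """Stand-in for the learned real-world model f_psi (kept frozen)."""
    return q + K_REAL * (a - q)


def fit_residual_gain(samples: List[Tuple[float, float]],
                      lr: float = 1.0, steps: int = 200) -> float:
    """Fit w in the residual a_res = w * (a - q) by gradient descent.

    Objective: mean squared error between f_real(q, a + a_res) and the
    simulator's next state f_sim(q, a), over (q, a) pairs from sim rollouts.
    """
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for q, a in samples:
            err = f_real(q, a + w * (a - q)) - f_sim(q, a)
            grad += 2.0 * err * K_REAL * (a - q)  # d f_real / d w = K_REAL * (a - q)
        w -= lr * grad / len(samples)
    return w


rng = random.Random(0)
samples = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(256)]
w = fit_residual_gain(samples)

# Deployment: execute the base action plus the residual correction.
q, a = 0.2, 0.7
a_exec = a + w * (a - q)
print("learned residual gain:", w)
print("sim next state:", f_sim(q, a), "| real model with corrected action:", f_real(q, a_exec))
```

The base action `a` is never modified at its source; only the additive correction is learned. In this toy the optimum is `K_SIM / K_REAL - 1`, that is, the residual simply rescales the command to make up for the weaker real-world response.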

The full pipeline

The full pipeline fits on a single page.

flowchart TB
    subgraph SIM["Simulation training"]
        A["(A) Train category-specific<br/>oracle policies (PPO)"] --> B["(B) Distill into generalist<br/>via Behavior Cloning"]
    end
    subgraph S2R["Neural sim-to-real"]
        C["(C) Chaos Box<br/>autonomous data collection"] --> D["(D) Train joint-wise<br/>neural dynamics f_psi"]
        D --> E["(E) Train residual policy<br/>pi_res via supervised learning"]
    end
    B --> E
    E --> F["Deploy: a_t + a_t^res<br/>on LEAP hand"]
Figure 2: The full DexNDM pipeline. (A-B) is simulation training; (C-E) is the sim-to-real correction.

There is one more detail in how the base policy itself is trained. DAgger-style distillation breaks down in this setting: either optimization fails in simulation or the policy collapses in the real world. The paper notes that this agrees with the observations in PenSpin. The alternative is plain BC: roll out the per-category oracle policies, keep only the successful trajectories, and train the generalist on them with supervised learning. At this level of difficulty, imitating only high-quality behavior works better.

Experiments: what, how, and why

Setup

  • Hardware: LEAP hand (16-DoF, 4 fingers). It is smaller and more commonplace than Visual Dexterity's custom D'Claw.
  • Object distribution: complex shapes (elephant, rabbit, teapot), aspect ratios up to 5.33 (for example, a 20 cm rod), and small objects (2-3 cm). The object-to-hand ratio ranges from 0.31 to 1.68.
  • Wrist orientation: diverse orientations including palm up/down, base up/down, and thumb up/down.
  • Rotation axes: multiple axes.

Main results

Generalization in simulation: on new complex shapes, the base policy beats the baseline by a margin of 37%-81%. This is evidence that a single policy has acquired generalist ability across categories.

Real-world validation: the sim-to-real module consistently raises rotation performance. In particular, this is the first demonstration of rotating a 10-16 cm rod about its long axis through nearly a full turn in the air with the palm facing down. It is a breakthrough in a pose that is known to be hard for dexterous manipulation.

Comparison with Visual Dexterity: DexNDM matches or exceeds, on the smaller LEAP hand, the complex-shape rotation performance that VD showed with the large D'Claw. It works better on shapes VD struggled with, such as the elephant, rabbit, and teapot. On the "survival angle" metric (cumulative rotation angle before the object is dropped) it is comparable or better.

Comparison with AnyRotate: AnyRotate's axis and wrist generality was limited to regular objects. DexNDM keeps the same generality while also handling a harder object distribution (small sizes, high aspect ratios). Its finger gaiting is also more refined.

Comparison with ASAP/UAN: these two sim-to-real techniques fail completely in dexterous manipulation. The reason is clear. Their dynamics models or compensators are trained on free-motion data (no object), so they do not generalize to the rich contact dynamics with an object. DexNDM does not have this gap because it collects object-loaded data with the Chaos Box.

Ablation summary

| Change | Effect |
| --- | --- |
| Joint-wise → whole-hand dynamics | 37%-81% performance drop in low-data / distribution-shift settings |
| Joint-wise → finger-wise dynamics | Moderate drop; even the finger level does not compress information enough |
| Remove simulation pretraining | Large performance drop |
| Chaos Box → wave action only | Fails to learn real-world dynamics because object loads are absent |
| Remove noise injection | Narrower distribution, weaker generalization |
| Remove policy-aware replay | Lower accuracy in the task region |

The ablations make it clear that none of the design choices is accidental.

Application: Teleoperation

A Meta Quest 3 based teleoperation system is layered on top of the generalized rotation policy to perform long-horizon dexterous tasks such as using a screwdriver, handling a knife, and assembling parts. A general rotation skill means that more complex task layers can be stacked on top of it, which has significant industrial implications.

Critical discussion

Strengths

  1. Theory and experiments point in the same direction. The DPI-based generalization analysis explains the results cleanly. The argument that "a single joint's history is sufficient to predict its own dynamics but insufficient to reconstruct the influence of the other joints" strikes a good balance between expressiveness and regularization.
  2. It lowers the hardware barrier to entry. It reaches state of the art with a LEAP hand, without expensive tactile sensors or a custom hand. Follow-up work on a common research hand such as the Allegro Hand looks feasible.
  3. Practical data collection. The Chaos Box is simple enough that any lab could reproduce it within a few days. Its core value is that it removes both human intervention and the dependence on vision tracking.
  4. Modular design. The base policy, the dynamics model, and the residual policy are separate, so partial updates are possible. Adding a new object or a new wrist pose can be handled by swapping a module rather than retraining everything.

Weaknesses and limitations

  1. No tactile sensing. The paper acknowledges this limitation itself. Capabilities such as slip detection, fine contact detection, and surface material estimation are missing. Integrating DIGIT or GelSight is a natural next step.
  2. The low-speed assumption. Ignoring the Coriolis term is reasonable at typical in-hand rotation speeds, but it can break down with fast finger gaiting or dynamic manipulation. Those are the cases where the assumption that the effective terms vary like a continuous function over a short window becomes shaky.
  3. Validation specific to rotation. The residual policy's training objective is "reach the next state that the simulation saw". That works well when the trajectory is essentially a rotation, but moving to deformable objects or to tasks such as grasping and handover may expose the representational limits of the dynamics model.
  4. Is the Chaos Box distribution sufficient? The distribution match shown in Figure 4 is appealing, but whether the Chaos Box adequately covers the distribution of the effective terms for more extreme poses or very small objects has to be judged case by case. Extreme cases may need a separate strategy to supplement the distribution.
  5. The correction range of the residual policy. When the base policy is entirely unsuitable (for example, a new task that cannot be solved even in simulation), the residual can only do so much. The quality of the base policy sets the ceiling.

Positioning relative to related work

| Work | Approach | Relation to DexNDM |
| --- | --- | --- |
| RMA (Kumar 2021) | Learns a latent representation from proprioceptive history | DexNDM's joint-wise modeling generalizes RMA by decomposing it to the joint level |
| Visual Dexterity (Chen 2022) | Vision + RL + large D'Claw | DexNDM matches or exceeds it with a smaller hand and has the edge in wrist-orientation generality |
| AnyRotate (Yang 2024) | Tactile sensing + axis/wrist generalization | DexNDM extends to object generality without tactile sensing |
| ASAP (He 2025) | Learns whole-system dynamics, locomotion-focused | DexNDM uses decomposed dynamics suited to dexterous manipulation |
| UAN (Fey 2025) | Learns sim-real delta actions | UAN trains without object loads and fails to generalize to manipulation |
| HORA (Qi 2023) | Proprioception-based in-hand rotation | Combining DexNDM's sim-to-real module with HORA-style policies could enable hardware generalization |

Takeaways for Allegro Hand researchers

DexNDM's design decisions carry implications that apply directly to Allegro Hand based research as well.

  1. It can be layered onto HORA- or RotateIt-style policies as a sim-to-real correction module. A natural scenario is to keep the existing base policy without retraining it, collect only joint-wise dynamics data on the Allegro Hand, and add a residual policy on top. One workable workflow: fix the PD gains in IsaacLab, measure the real-world dynamics with Chaos Box data, and then train the residual.
  2. It can lighten the burden of object state estimation. Even before DIGIT or GelSight is integrated, being able to do sim-to-real correction without vision-based object tracking simplifies the setup.
  3. A complement to PD-gain domain randomization. The residual fills in the modeling discrepancy that domain randomization cannot capture. DR and the residual are complementary, not mutually exclusive.
  4. Validating the effective terms with F/T sensor data. With a sensor such as the ATI Mini45, one can examine quantitatively how the learned dynamics' predictions of the effective terms compare with actual measurements. How the theoretical effective terms are encoded in the network is an interesting topic for analysis.
  5. Possible combination with VLA models. If the base policy is a VLA and only the dynamics correction is handled by the residual, sim-to-real could be simplified even for hard manipulation tasks specified through vision and language. An experimental scenario worth considering is correcting the action output of models such as π0/π0.5 or GR00T with a residual.

Closing thoughts

The value of DexNDM lies not in a single trick but in the combination of two decisions. Decomposing the model to the joint level raised generalization, and that generalization in turn made it possible to simplify data collection. Either decision alone would have been unremarkable. Tied together, they loosen the long-standing knot of sim-to-real.

Applying the data processing inequality to generalization in dexterous manipulation is a fresh perspective. It leads naturally to the next question: "Which projection of the system we are modeling is task-sufficient while compressing the distribution difference the most?" For in-hand rotation, the answer was the joint level. For other tasks a different projection may be the answer (the finger level, the palm level, the level of object-finger contact patches, and so on). Generalizing this framework could plausibly yield a sim-to-real recipe that applies across dexterous manipulation.

That the lack of tactile information creates a ceiling makes for an obvious follow-up direction. Work that integrates vision-based tactile sensors such as DIGIT and uses them as a signal for estimating the effective terms directly will likely follow soon. At that point the DexNDM framing will become a notch more solid.

Right now the most attractive aspect is reproducibility. No expensive hardware is needed, and data collection takes a single tub of balls. The two core ideas are clear. That the results are packaged in a form anyone can try within a few days is what will determine the paper's real impact on the dexterous manipulation community.

Copyright 2026, JungYeon Lee