Curieux.JY
  • JungYeon Lee

📃 DexScrew Review

dexterity
rl
teleop
Learning Dexterous Manipulation Skills from Imperfect Simulations
Published

December 4, 2025

🔍 Ping. 🔔 Ring. ⛏️ Dig. A tiered review series: quick look, key ideas, deep dive.

  • Paper Link
  • Training Code, Operation Code
  • Project
  1. 💡 This paper proposes DexScrew, a sim-to-real framework for learning complex, contact-rich dexterous manipulation skills in imperfect simulation environments.
  2. 🛠️ DexScrew learns a rotation skill in a simplified simulation, uses that skill in teleoperation to collect real-world tactile data, and then trains a behavior cloning policy on the resulting multisensory data to improve real-world applicability.
  3. 🚀 In experiments on nut-bolt fastening and screwdriving, the proposed method achieved higher success rates than direct sim-to-real transfer and generalized to unseen objects, and the results showed that tactile sensing and temporal history are essential for robust performance.

🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

This paper proposes DexScrew, a sim-to-real framework for learning dexterous manipulation skills that depend on complex contact dynamics and multisensory signals (tactile feedback in particular) from imperfect simulations. It targets two sets of limitations: those of existing sim-to-real methods (accurate physics simulation is hard, and sensory modalities have their own sim-to-real gap) and those of imitation learning (high-quality data is hard to collect).

The proposed framework consists of three main stages.

  1. Training a reinforcement learning (RL) policy in simulation (III-A):
    • Simplified Object Modeling: Instead of simulating the complex thread structure of a nut or screw directly, the method captures the essence of the rotation motion with simple geometric shapes (for example, a thick triangle for the nut and an octagon or dodecagon for the screwdriver) attached to a fixed base by a revolute joint. This lets the policy learn the rotation behavior efficiently.
    • Training pipeline: An oracle policy with access to privileged information is trained first. It is then distilled into a sensorimotor policy that acts on an embedding $\hat{z}_t = \phi(h_t)$, which a prediction module $\phi$ infers from the proprioceptive history $h_t$.
    • Privileged Information: The oracle policy has access to ground-truth information about the environment and object properties, such as object position, scale, mass, and friction coefficient (see Appendix A for details).
    • Actions: The policy outputs relative target positions, which are passed to the robot's low-level PD controller and converted to torques.
    • Reward: In simulation, the policy's goal is to rotate the simplified object about its revolute joint. The reward is a weighted sum of a task reward ($r_{task}$, encouraging rotation and proximity), an energy penalty ($r_{energy}$, discouraging inefficient motion), and a stability penalty ($r_{stability}$, keeping behavior stable) (see Appendix B).
    • Training: The oracle policy is trained with PPO (Proximal Policy Optimization), and the sensorimotor policy is trained with the DAgger algorithm. The sensorimotor objective makes the predicted action and the predicted privileged embedding match the oracle policy's action and the true privileged embedding: $\mathcal{L} = \|a_{Hand_t} - \hat{a}_{Hand_t}\|^2_2 + \|z_t - \hat{z}_t\|^2_2$.
    • Domain Randomization: To improve the robustness of the RL policy, object mass, center of mass, friction coefficient, size, and PD gains are randomized, and observation and action noise is added (see Appendix C, Table V).
  2. Collecting real-world data with the learned policy (III-B):
    • Skill-Based Assisted Teleoperation: The finger rotation skill (skill primitive) trained in simulation is used to collect real-world data. Instead of commanding individual joints, the human operator uses the joystick of a VR controller to control only the wrist motion of the robot arm, and activates the finger rotation skill when needed.
    • Data recording: At each timestep the system records the action $a_t = [a_{Hand_t}, a_{Arm_t}]$, which combines the hand action generated by the RL policy ($a_{Hand_t}$) with the arm action generated by human teleoperation ($a_{Arm_t}$). It also records the multisensory observation $(q_t, c_t)$, which contains all joint positions ($q_t = [q_{Hand_t}, q_{Arm_t}]$) and the raw tactile signals of all five fingers ($c_t \in \mathbb{R}^{5 \times 120 \times 3}$).
    • Tactile Signal: The pressure-based tactile sensors built into the XHand robot are used. Each fingertip has 120 sensing elements that measure 3-axis force.
  3. Training a behavior cloning (BC) policy on multisensory data (III-C):
    • Neural Network Architecture: The policy is a feedforward neural network. Observations from the past $K$ timesteps ($q_{t-K+1:t}$, $c_{t-K+1:t}$) are concatenated into a single feature vector. The tactile signals are first flattened and passed through an MLP. The combined feature vector is processed by an hourglass encoder, which outputs the action prediction.
    • Action Chunking: The policy predicts a sequence of future actions $\hat{a}_{t:t+H}$ rather than a single-timestep action (defaults: $K=5$, $H=16$).
    • Training: The BC policy $\pi_{BC}$ is trained by supervised learning on the collected data $D_{Real}$. The loss is the sum of squared L2 differences between the predicted action chunk and the expert action sequence: $\mathcal{L}_{BC} = \sum_{t=1}^{T} \sum_{h=0}^{H} \|\hat{a}_{t+h} - a_{t+h}\|^2_2$.

Experiments and results:

The framework is evaluated on two tasks: nut-bolt fastening and screwdriving.

  • Nut-Bolt Fastening: Because thread interaction is not simulated, direct sim-to-real transfer cannot drive the nut down the bolt. The learned policy generalizes to a variety of nut shapes (square, triangle, hexagon, cross).
    • Effect of observation history and tactile information: Adding a short temporal history to the observation substantially improves the progress ratio and reduces execution time. Adding tactile input improves the progress ratio, especially on difficult shapes such as the triangular and cross-shaped nuts. Combining tactile input with temporal history gives the strongest performance.
    • Failure modes: Policies without observation history struggle to respond to subtle shape changes, and non-tactile policies often slip into unstable contact states and lose alignment. Tactile policies can recover from these failures by adjusting the wrist or applying downward force.
  • Screwdriving: This task is inherently less stable than nut fastening.
    • Sim-to-real policy and expert replay: The direct sim-to-real policy produces meaningful behavior but does not fully complete the task. Expert replay has a high success rate but cannot adapt to variation at deployment time.
    • Behavior cloning policy: The proposed BC policy clearly improves on both the sim-to-real and the expert replay baselines. Adding either tactile sensing or temporal history alone improves the progress ratio, and combining the two gives the strongest performance (95.00% progress ratio).
    • Failure modes: The open-loop baseline frequently fails because of gradual handle slip and orientation drift. The BC policy with both tactile feedback and temporal history compensates for these effects by adjusting wrist orientation and applying appropriate force.
  • Robustness to disturbances (Out-of-distribution Robustness): Even under external disturbances never seen during training (for example, pulling a finger away from the object or rotating the screwdriver in the opposite direction), the policy consistently recovers to stable fastening behavior.

Conclusion:

This work presents the DexScrew framework, which learns a dexterous rotation skill in a simplified simulation, collects real-world data through teleoperation built on that skill, and then learns a behavior cloning policy that integrates tactile feedback. On the nut-bolt fastening and screwdriving tasks, simulation alone could not capture the complex dynamics, but behavior cloning that combines tactile sensing with temporal history achieved robust and reliable performance across diverse and previously unseen object shapes. The staged pipeline offers a practical and scalable solution for contact-rich manipulation, and it highlights tactile sensing and skill-based teleoperation as an effective bridge between simulation and real-world deployment.

🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

1. Introduction: Why Does This Paper Matter?

1.1 Current Challenges in Dexterous Manipulation

In robotics, dexterous manipulation with multi-fingered hands is a core challenge on the way to human-level general-purpose robots. The human hand has more than 20 degrees of freedom (DoF) and processes fine contact information in real time through thousands of tactile receptors. Reproducing this complexity in a robotic system raises fundamental difficulties in mechanical design, sensing, and control alike.

In recent years, reinforcement learning (RL) and sim-to-real transfer have driven major advances in this field. OpenAI's Rubik's Cube manipulation and the extreme dexterity shown by the DexTreme project demonstrated the potential of simulation-based learning. However, most of these successes are limited to simple object reorientation or grasping, and extending them to the complex tool use and precision assembly required in real industrial settings remains an open problem.

1.2 The Fundamental Problem of "Imperfect Simulation"

The key phrase in the paper's title is "Imperfect Simulations". Current physics simulators (Isaac Gym, MuJoCo, PyBullet, and others) are reasonably accurate for rigid body dynamics, but they have fundamental limitations in the following areas:

  1. Complex contact dynamics: Sliding, rolling, and stiction between the fingers and the object are difficult to model accurately.

  2. Multisensory signals: Tactile feedback in particular involves a complex interaction of contact distribution, shear force, and normal force, and reproducing it accurately in simulation is close to impossible.

  3. Tools and articulated objects: Articulated mechanisms such as screws, bolts, and scissors have internal friction, backlash, clearance, and other properties that are extremely hard to model in simulation.

The paper accepts this "imperfection" and offers a practical solution by proposing a framework that works around it or compensates for it.


2. Core Methodology: A Three-Stage Sim-to-Real Framework

The paper's main contribution is a systematic three-stage pipeline for overcoming the limits of imperfect simulation.

2.1 Stage 1: RL Training in a Simplified Simulation

The first stage trains a reinforcement learning policy in simulation with deliberately simplified object models. The key insights behind this approach are as follows.

Philosophy of simplification:

  • The focus is on the emergence of the correct behavioral structure rather than on accurate physical reproduction.
  • For nut-bolt fastening, the correct finger gait can be learned from a basic cylindrical shape and its friction properties alone, without modeling the exact thread geometry.

The concept of finger gait: Finger gait is a central concept in in-hand manipulation. When a person spins a pen or rolls a coin, the fingers manipulate the object by making and breaking contact in sequence. The paper shows that the finger gait patterns needed to turn a nut or operate a screwdriver can emerge naturally even in a simplified simulation.

Why does this work?

  • Combined with domain randomization, the policy does not overfit to specific simulation parameters.
  • The "coarse" policy learned in the simplified environment encodes a basic manipulation strategy, which is refined in the later stages.

2.2 Stage 2: Real-World Data Collection via Teleoperation

The second stage is the most innovative part of the framework. The RL policy learned in stage 1 is used as a skill primitive inside the teleoperation system.

2.2.1 Fundamental Limits of Existing Approaches

Problems with pure teleoperation:

Direct teleoperation of a multi-fingered hand has the following fundamental difficulties:

  1. Curse of dimensionality: 16-24 joints must be controlled simultaneously, which puts an extremely high cognitive load on the human operator.

  2. Kinematic mismatch: The kinematics of the human hand and the robot hand differ, so an intuitive mapping is hard to find. For example, the Allegro Hand has only four fingers, and the joint layout of each finger differs from a human's.

  3. Temporal precision: Dynamic manipulation such as finger gait depends on millisecond-level timing, which a human can hardly control directly.

  4. Training cost: Training a skilled operator takes tens to hundreds of hours, and fatigue degrades data quality.

Problems with pure sim-to-real transfer:

Reasons a policy learned in simulation fails when applied directly to the real world:

  1. Contact dynamics mismatch: The simulator's contact model does not capture the complex friction, deformation, and stiction of real contact.

  2. Sensory gap: Simulation has access to perfect state information, whereas the real world offers only noisy sensor data.

  3. Actuator modeling error: Motor nonlinearity, backlash, friction, and similar effects are not modeled accurately.

  4. Environmental variability: Factors not considered in simulation, such as lighting, temperature, and humidity, affect real-world performance.

2.2.2 Detailed Design of the Proposed Hybrid Approach

The key insight of the paper is to use the RL policy as an "autopilot".

System architecture:

┌──────────────────────────────────────────────────────────────────┐
│                      HUMAN OPERATOR                              │
│  - High-level intent: start, stop, direction, force level        │
│  - Cognitive load: LOW (only strategic decisions)                │
└───────────────────────────┬──────────────────────────────────────┘
                            │ Sparse commands (1-5 Hz)
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                   COMMAND INTERFACE                              │
│  - Joystick / Keyboard / Voice commands                          │
│  - Maps discrete inputs to continuous conditioning signals       │
│  - Direction vector: 3D rotation axis                            │
│  - Force level: scalar multiplier for torque limits              │
└───────────────────────────┬──────────────────────────────────────┘
                            │ Conditioning signal c(t)
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│              RL SKILL PRIMITIVE (from Stage 1)                   │
│                                                                  │
│  π(a|s,c) : (proprioception, conditioning) → joint commands      │
│                                                                  │
│  - Handles ALL low-level finger coordination                     │
│  - Generates finger gait patterns automatically                  │
│  - Adjusts grip force based on sensed slip                       │
│  - Execution rate: 30-50 Hz                                      │
└───────────────────────────┬──────────────────────────────────────┘
                            │ Joint position/torque commands
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                    ROBOT HARDWARE                                │
│  - Multi-fingered hand (16-24 DoF)                               │
│  - Tactile sensor arrays on fingertips                           │
│  - Joint encoders and torque sensors                             │
│  - Low-level PD control at 500-1000 Hz                           │
└───────────────────────────┬──────────────────────────────────────┘
                            │ Sensor feedback
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                   DATA COLLECTION MODULE                         │
│  Records synchronized streams:                                   │
│  - Tactile: contact distribution, force magnitude (100+ Hz)      │
│  - Proprioception: joint angles, velocities, torques (500 Hz)    │
│  - Task state: object pose, rotation angle (30 Hz)               │
│  - Labels: success/failure, phase annotations                    │
└──────────────────────────────────────────────────────────────────┘

Design of the conditioning signal:

The RL policy takes the conditioning signal c(t) as an additional input and modulates its behavior accordingly:

c(t) = [direction_vector, force_level, task_phase]

direction_vector ∈ ℝ³:
  - Unit vector specifying desired rotation axis
  - Example: [0, 0, 1] for clockwise rotation about the z-axis (viewed looking along +z, per the right-hand rule)
  - Example: [0, 0, -1] for counter-clockwise rotation (same viewpoint)

force_level ∈ [0, 1]:
  - Scalar multiplying base torque limits
  - 0.3: gentle manipulation (initial threading)
  - 0.7: normal operation
  - 1.0: high-torque (final tightening)

task_phase ∈ {approach, grasp, rotate, release}:
  - Discrete phase indicator
  - Enables phase-specific behaviors
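As a rough illustration of how such a signal could be packed into a single input vector, here is a minimal standard-library sketch. The function name `make_conditioning`, the one-hot encoding of the phase, and the resulting 8-element layout are assumptions made for this example; the post does not state how the phase is encoded.

```python
import math

# Phase names follow the list above; the one-hot encoding is an assumption.
PHASES = ("approach", "grasp", "rotate", "release")


def make_conditioning(direction, force_level, phase):
    """Pack (direction, force level, phase) into a flat list of 3 + 1 + 4 floats."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    if not 0.0 <= force_level <= 1.0:
        raise ValueError("force_level must lie in [0, 1]")
    unit = [d / norm for d in direction]  # normalize to a unit rotation axis
    one_hot = [1.0 if p == phase else 0.0 for p in PHASES]
    if sum(one_hot) != 1.0:
        raise ValueError(f"unknown phase: {phase}")
    return unit + [force_level] + one_hot
```

For example, `make_conditioning([0, 0, 2], 0.7, "rotate")` normalizes the axis and yields an 8-element vector, so a policy consuming it would use a conditioning dimension of 8 under these assumptions.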

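Conditional structure of the RL policy:

import torch
import torch.nn as nn


def MLP(sizes):
    """Build a ReLU MLP from a list of layer sizes (helper assumed; the post does not define it)."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)


class ConditionalSkillPolicy(nn.Module):
    def __init__(self, obs_dim, cond_dim, action_dim):
        super().__init__()  # must run before submodules are assigned
        self.obs_encoder = MLP([obs_dim, 256, 256])
        self.cond_encoder = MLP([cond_dim, 64, 64])
        # 256 observation features + 64 conditioning features = 320
        self.policy_head = MLP([320, 256, action_dim])

    def forward(self, observation, conditioning):
        # Encode proprioceptive observation
        obs_features = self.obs_encoder(observation)
        # Encode conditioning signal
        cond_features = self.cond_encoder(conditioning)
        # Concatenate and produce action
        combined = torch.cat([obs_features, cond_features], dim=-1)
        return self.policy_head(combined)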

2.2.3 Principles of Human-Robot Role Allocation

Applying Fitts's MABA-MABA principle:

The division of roles between the human and the machine (robot) plays to the strengths of each:

| Capability | Human | Robot (RL Policy) |
|---|---|---|
| Strategic planning | ✓ Excellent | ✗ Limited |
| Anomaly detection | ✓ Excellent | △ Moderate |
| High-frequency control | ✗ Poor | ✓ Excellent |
| Precise timing | ✗ Poor | ✓ Excellent |
| Fatigue resistance | ✗ Poor | ✓ Excellent |
| Adaptability to novel situations | ✓ Good | △ Within training distribution |

Concrete division of roles (nut fastening task):

Human responsibilities:
├── Decide WHEN to start grasping
├── Specify rotation DIRECTION (CW/CCW)
├── Judge if nut is properly seated
├── Detect cross-threading (via visual inspection)
├── Decide when tightening is complete
└── Handle exceptions and failures

RL Policy responsibilities:
├── Execute finger gait for continuous rotation
├── Maintain stable multi-finger grasp
├── Adjust grip force to prevent slip
├── Coordinate 16-24 joints simultaneously
├── React to contact events in real-time
└── Generate smooth, collision-free motions

2.2.4 Detailed Data Collection Protocol

Data streams collected:

1. Tactile Stream (100-1000 Hz):
   ├── Per-finger contact maps: [N_fingers × H × W] pressure images
   ├── Aggregated features: total force, CoP, contact area
   ├── Temporal derivatives: force rate, slip indicators
   └── Raw sensor values for offline reprocessing

2. Proprioceptive Stream (500-1000 Hz):
   ├── Joint positions: q ∈ ℝ^{n_joints}
   ├── Joint velocities: q̇ ∈ ℝ^{n_joints}
   ├── Joint torques: τ ∈ ℝ^{n_joints}
   └── End-effector poses (computed via FK)

3. Task State Stream (30-100 Hz):
   ├── Object pose (from external tracking or estimation)
   ├── Rotation angle accumulated
   ├── Task phase labels
   └── Success/failure flags

4. Command Stream (1-10 Hz):
   ├── Human input commands (raw)
   ├── Interpreted conditioning signals
   └── Timestamps for synchronization

Data quality assurance mechanisms:

  1. Automatic filtering: unstable or failed episodes are excluded automatically
  2. Synchronization check: time alignment across sensor streams is verified
  3. Outlier detection: abnormal sensor values are flagged
  4. Balancing: the dataset is balanced across success/failure and across different objects
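The outlier-flagging and filtering steps could look roughly like the sketch below. The post does not say which detector is used, so this example swaps in a standard modified z-score based on the median and the median absolute deviation (MAD); the 3.5 threshold, the 5% tolerance, and both function names are assumptions.

```python
import numpy as np


def flag_outliers(values, threshold=3.5):
    """Flag samples whose modified z-score (median/MAD based) exceeds threshold."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0.0:
        # No spread around the median: nothing can be ranked as an outlier here.
        return np.zeros(values.shape, dtype=bool)
    modified_z = 0.6745 * (values - median) / mad
    return np.abs(modified_z) > threshold


def keep_episode(success, sensor_trace, max_outlier_fraction=0.05):
    """Keep successful episodes whose fraction of flagged samples stays small."""
    fraction = float(flag_outliers(sensor_trace).mean())
    return bool(success) and fraction <= max_outlier_fraction
```

A median-based score is used instead of the mean and standard deviation because a single large spike would otherwise inflate the spread and hide itself.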

2.2.5 Theoretical Advantages of the Hybrid Approach

Information-theoretic view:

Traditional Teleoperation:
  I(Demo; Task) ≤ I(Human_skill; Task)
  → Limited by human's motor control capability

Hybrid Approach:
  I(Demo; Task) = I(Human_intent; Task) + I(RL_execution; Task | Human_intent)
  → Human provides WHAT, RL provides HOW
  → Information is additive, not bottlenecked

Sample-complexity view:

Let $N_{teleop}$ be the number of demonstrations needed to learn a given task with pure teleoperation, and $N_{hybrid}$ the number needed with the hybrid approach. Then:

N_hybrid << N_teleop

Reasons:
1. RL policy already knows basic manipulation structure
2. Human only needs to provide high-level variation
3. Low-level noise is filtered by RL

Qualitative differences in the collected data:

| Aspect | Pure Teleoperation | Hybrid Approach |
|---|---|---|
| Motion smoothness | Variable (human tremor) | Consistent (RL generated) |
| Timing precision | Poor (human reaction time) | Excellent (policy-controlled) |
| Coverage of state space | Biased to human preferences | More systematic |
| Failure modes captured | Uncontrolled failures | Controlled exploration |
| Sensory richness | Same | Same |

The main advantage of this approach is that it achieves efficiency and data quality at the same time. Meaningful manipulation demonstrations can be collected efficiently without thousands of human trial-and-error attempts, and they contain rich real-world sensory information that simulation cannot provide.

2.3 Stage 3: Tactile-Integrated Behavior Cloning

The final stage uses the collected real-world data to train a behavior cloning policy that integrates tactile information.

Why does touch matter?

In tasks such as nut-bolt fastening and screwdriving, touch provides the following key information:

  1. Contact state awareness: whether a finger is touching the object, and where
  2. Slip detection: sensing the moment the object starts to slip so that the grip can be adjusted
  3. Force feedback: judging whether the nut is fully tightened and whether the threads are aligned
  4. Shape inference: identifying object properties that cannot be seen
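To illustrate why 3-axis force readings are useful for slip detection, here is a small sketch based on the Coulomb friction cone: a contact is at risk of slipping when the tangential force exceeds the friction coefficient times the normal force. This is a textbook check, not the mechanism the learned policy uses (the BC policy consumes raw tactile signals end to end). The function name, the axis convention (z as the normal direction), and the friction coefficient are all assumptions.

```python
import math


def slip_margin(force_xyz, mu):
    """Return mu * f_n - |f_t| for one contact; negative values lie outside the friction cone.

    force_xyz: (fx, fy, fz), with fz assumed to be the normal component pressing
               into the surface and (fx, fy) the shear components
    mu:        assumed Coulomb friction coefficient (illustrative value only)
    """
    fx, fy, fz = force_xyz
    tangential = math.hypot(fx, fy)
    normal = max(fz, 0.0)  # a contact cannot pull on the surface
    return mu * normal - tangential
```

A shrinking margin over consecutive timesteps is the kind of cue a policy with temporal history can react to before the object actually slips, for example by pressing harder or re-aligning the wrist.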

Generalization:

One of the paper's central claims is that the learned policy generalizes to nuts and screwdrivers of various shapes. This is because tactile information provides implicit information about object properties that are not visible (thread pitch, head shape, and so on).


3. Technical Deep Dive

3.1 Simulation Environment Design Philosophy

3.1.1 Principles and Rationale for Simplifying the Object Model

The object model simplification adopted in the paper is not merely a pursuit of computational efficiency. It is a design decision that balances learnability against transferability.

Details of the simplification strategy:

| Real Object | Simulation Representation | Simplification Point | Preserved Properties |
|---|---|---|---|
| Hex nut | Cylindrical primitive | Thread removed, basic friction only | Rotation axis, graspable region, basic friction |
| Bolt | Fixed axis | Only rotation instead of helical motion | Axis direction, torque-rotation relationship |
| Screwdriver | Straight rod | Head shape simplified | Length, mass distribution, grip region |

Why not model the threads?

Accurate physical modeling of threads causes the following problems:

  1. Contact point explosion: The helical geometry of a thread generates hundreds of contact points, which slows the simulation down considerably.

  2. Numerical instability: Contact resolution at fine thread spacing (0.5-2 mm) is numerically unstable and produces different results in different simulators.

  3. Unnecessary complexity: The core behaviors the RL agent needs to learn (finger gait, grip adjustment) are independent of thread details.

Key insight:

The purpose of the simulation is not to provide "a physical experience identical to reality" but to provide "an environment rich enough to explore the correct behavior patterns".

3.1.2 Systematic Application of Domain Randomization

Domain randomization is a standard technique for narrowing the sim-to-real gap, but the paper applies it selectively and systematically.

Hierarchy of randomized parameters:

Level 1: Physical Parameters (Strong Randomization)
├── Object mass: uniform(0.5x, 1.5x)
├── Friction coefficient: uniform(0.3, 1.2)
├── Moment of inertia: uniform(0.8x, 1.2x)
└── Contact stiffness: uniform(0.7x, 1.3x)

Level 2: Geometric Parameters (Moderate Randomization)
├── Object scale: uniform(0.85, 1.15)
├── Finger length: uniform(0.95, 1.05)
└── Joint offset: gaussian(0, 0.5mm)

Level 3: Sensor/Actuator Noise (Light Randomization)
├── Joint position noise: gaussian(0, 0.01rad)
├── Torque sensor noise: gaussian(0, 0.1Nm)
└── Control delay: uniform(0, 50ms)

Core principles of randomization:

  1. Conservative approach: Excessive randomization hinders learning. Geometric parameters in particular are randomized only within their real-world range of variation.

  2. Preserving correlations: Physically related parameters (for example, mass and inertia) are varied together to prevent unrealistic combinations.

  3. Gradual widening: A curriculum can be applied that starts with a narrow range early in training and gradually widens the randomization range.
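The second and third principles can be sketched together: widen a sampling range as training progresses, and derive the inertia scale from the mass scale instead of sampling it independently. The starting fraction of 20%, the linear schedule, and the function names are assumptions for illustration; the post only says that such a curriculum is possible.

```python
import random


def curriculum_range(center, half_width, progress, start_fraction=0.2):
    """Linearly widen a uniform range from start_fraction to 100% of its half-width.

    progress is the fraction of training completed, clamped to [0, 1].
    """
    progress = min(max(progress, 0.0), 1.0)
    width = half_width * (start_fraction + (1.0 - start_fraction) * progress)
    return center - width, center + width


def sample_mass_and_inertia(rng, progress):
    """Sample a mass scale and a matching inertia scale for a fixed-shape object."""
    low, high = curriculum_range(1.0, 0.5, progress)
    mass_scale = rng.uniform(low, high)
    # For a rigid body of fixed shape, inertia is proportional to mass,
    # so the two scales are kept equal rather than drawn independently.
    inertia_scale = mass_scale
    return mass_scale, inertia_scale
```

At `progress=0.0` the mass scale is drawn from roughly [0.9, 1.1], and by `progress=1.0` it covers the full [0.5, 1.5] range.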

3.2 Reinforcement Learning Algorithm Design

3.2.1 Rationale for the Choice of Algorithm

The multi-fingered hand control problem has the following characteristics:

  • A high-dimensional continuous action space (16-24 DoF)
  • A non-smooth reward landscape caused by complex contact dynamics
  • Long-horizon temporal dependencies (a finger gait unfolds over tens of steps)

Given these characteristics, the following algorithms are suitable:

PPO (Proximal Policy Optimization):

Advantages:
- Stable learning (clipping prevents large policy changes)
- Easy parallelization (thousands of environments simultaneously)
- Relatively few hyperparameters to tune

Expected hyperparameters:
- Learning rate: 3e-4
- Clip range: 0.2
- Entropy coefficient: 0.01
- GAE lambda: 0.95
- Batch size: 4096-16384
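
To make the clip range and GAE lambda above concrete, here is a minimal NumPy sketch of generalized advantage estimation and the PPO clipped surrogate loss. The discount `gamma=0.99` is an assumption (it is not listed above), the function names are illustrative, and the sketch handles a single trajectory without terminations.

```python
import numpy as np


def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory (no terminations)."""
    values = np.append(values, last_value)
    adv = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


def ppo_clip_loss(new_logp, old_logp, adv, clip_range=0.2):
    """Negative clipped surrogate objective, averaged over samples."""
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range) * adv
    return -np.mean(np.minimum(unclipped, clipped))
```

With a clip range of 0.2, a sample with positive advantage stops contributing gradient once its probability ratio exceeds 1.2, which is the mechanism behind the "stable learning" advantage listed above.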

SAC (Soft Actor-Critic):

Advantages:
- Maximum entropy principle encourages exploration
- Sample efficient (off-policy)
- Can learn diverse action modes

Application scenarios:
- Useful for fine-tuning on real robot
- When learning with limited data

3.2.2 Detailed reward function design

Reward design is one of the hardest parts of RL-based manipulation. This section analyzes a reward structure suited to the paper's tasks:

Reward decomposition for the nut-bolt fastening task:

import numpy as np

# The *_scale constants, slip_penalty, success_bonus, alignment_temp, max_fingers
# and the helper functions called below are placeholders that the surrounding
# training code would have to supply.
def compute_reward(state, action, next_state):
    # 1. Progress reward: change in the nut's rotation angle
    delta_angle = next_state.nut_angle - state.nut_angle
    r_progress = progress_scale * delta_angle  # positive reward for clockwise rotation

    # 2. Contact quality reward: encourage stable multi-finger contact
    num_contacts = count_finger_contacts(next_state)
    contact_stability = compute_grasp_stability(next_state)
    r_contact = contact_scale * (num_contacts / max_fingers) * contact_stability

    # 3. Alignment reward: how well the nut is aligned with the bolt axis
    alignment_error = compute_axis_alignment(next_state)
    r_align = align_scale * np.exp(-alignment_error / alignment_temp)

    # 4. Energy penalty: discourage excessive force
    total_torque = np.sum(np.abs(action))
    r_energy = -energy_scale * total_torque

    # 5. Slip penalty: applied when object slip is detected
    r_slip = -slip_penalty if detect_slip(state, next_state) else 0.0

    # 6. Sparse success reward: large bonus when the task is completed
    r_success = success_bonus if task_completed(next_state) else 0.0

    return r_progress + r_contact + r_align + r_energy + r_slip + r_success

Considerations when tuning reward weights:

Reward Component | Problem with Low Weight | Problem with High Weight
Progress | Slow learning, meaningless motion | Unstable fast rotation, object drop
Contact | Unstable grip | Overly conservative motion
Energy | Inefficient force use | Too weak grip, task failure
Slip | Frequent object drops | Overly cautious motion

3.2.3 Teacher-Student Distillation architecture

Applying the teacher-student structure that has proven effective in many sim-to-real studies to this work gives:

Teacher Policy (Simulation only):
├── Input: Full state information (object pose, velocity, contact points, etc.)
├── Output: Optimal action
└── Training: Millions of steps in simulation

Student Policy (Real deployment):
├── Input: Limited sensory information (proprioception, tactile)
├── Output: Action (similar to teacher)
└── Training: Imitate teacher's behavior + fine-tune with real data

Use of privileged information:

The teacher uses "privileged information" that is accessible only in simulation:

  • Exact object pose
  • Position and force at every contact point
  • Physical parameters of the object

The student is trained to output similar actions without this information, and in doing so it ends up performing implicit state estimation.
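
The teacher-student idea can be illustrated with a toy, one-dimensional DAgger-style loop. Everything in this sketch is an assumption chosen for brevity: the scalar dynamics, the gain `K`, the noise levels, and the linear student fitted by least squares. It only shows the mechanism of labelling the student's own visited states with the teacher's actions.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 0.8  # teacher feedback gain (illustrative)


def teacher(x):
    """Privileged policy: acts on the true state x."""
    return -K * x


def rollout(w, steps=50):
    """Run the student (action = w * obs) and log, for every state it visits,
    what it observed and what the teacher would have done there."""
    x = rng.normal()
    obs_log, label_log = [], []
    for _ in range(steps):
        obs = x + rng.normal(scale=0.01)          # noisy, non-privileged reading
        obs_log.append(obs)
        label_log.append(teacher(x))              # teacher labels the student's state
        x = x + 0.5 * (w * obs) + rng.normal(scale=0.05)
    return obs_log, label_log


w = 0.0                                           # untrained student
all_obs, all_labels = [], []
for _ in range(5):                                # DAgger iterations
    obs_log, label_log = rollout(w)
    all_obs += obs_log                            # aggregate across iterations
    all_labels += label_log
    O, L = np.array(all_obs), np.array(all_labels)
    w = float(O @ L / (O @ O))                    # least-squares fit of the student
```

After a few iterations the student gain `w` settles close to `-K`, even though the student never sees the true state. A real student would replace the scalar gain with a network over an observation history.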

3.3 Technical design of the teleoperation system

3.3.1 Hybrid control architecture

┌──────────────────────────────────────────────────────────────┐
│                    Human Operator                            │
│  (High-level intent: start, stop, direction, force adjust)   │
└─────────────────────────┬────────────────────────────────────┘
                          │ High-level commands (5-10Hz)
                          ▼
┌──────────────────────────────────────────────────────────────┐
│                 Command Interpreter                          │
│  - Convert to continuous direction vector                    │
│  - Map force level to torque limits                          │
└─────────────────────────┬────────────────────────────────────┘
                          │ Conditioning signal
                          ▼
┌──────────────────────────────────────────────────────────────┐
│              RL Skill Primitive Policy                       │
│  - Input: proprioception + conditioning signal               │
│  - Output: per-joint torque/position commands                │
│  - Execution rate: 30-50Hz                                   │
└─────────────────────────┬────────────────────────────────────┘
                          │ Low-level commands
                          ▼
┌──────────────────────────────────────────────────────────────┐
│                 Low-level Controller                         │
│  - PD controller (for position commands)                     │
│  - Torque control (for direct torque commands)               │
│  - Execution rate: 500-1000Hz                                │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────────┐
│                  Multi-fingered Hand                         │
│  - 16-24 DoF                                                 │
│  - Tactile sensor arrays                                     │
│  - Joint encoders                                            │
└──────────────────────────────────────────────────────────────┘
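
The layering in the diagram amounts to three loops running at different rates. The sketch below drives all of them from the fastest clock and simply counts how often each layer runs. The specific rates (10 Hz, 50 Hz, 1000 Hz) are picked from the ranges in the diagram, and the placeholder values for the command and the policy output are assumptions.

```python
def run_hierarchy(duration_s=1.0, cmd_hz=10, policy_hz=50, pd_hz=1000):
    """Count how often each layer runs when everything is driven from the
    fastest (PD) clock."""
    counts = {"command": 0, "policy": 0, "pd": 0}
    command, target = 0.0, 0.0
    for tick in range(int(duration_s * pd_hz)):
        if tick % (pd_hz // cmd_hz) == 0:
            command = 1.0                  # placeholder for operator intent
            counts["command"] += 1
        if tick % (pd_hz // policy_hz) == 0:
            target = 0.1 * command         # placeholder for the RL policy output
            counts["policy"] += 1
        counts["pd"] += 1                  # a PD step tracking `target` would go here
    return counts


counts = run_hierarchy()
```

In this arrangement the operator's command is held constant between updates, so the policy always conditions on the most recent intent while the PD loop tracks the most recent joint target.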

3.3.2 Human-robot interface options

Option 1: Haptic device based

Device: Geomagic Touch, Falcon, etc.
Advantages: Force feedback, intuitive operation
Disadvantages: Not suitable for high-DoF hand control
Application: Used for wrist/arm position control, fingers delegated to RL

Option 2: Glove based (Manus, HaptX, etc.)

Advantages: Natural hand movement mapping
Disadvantages: Kinematic mismatch between human and robot hands
Application: Retargeting algorithm required

Option 3: Simplified command interface

Input: Joystick, keyboard, voice commands
Advantages: Low cost, easy to learn
Disadvantages: Fine control difficult
Application: Suitable for this paper's approach (RL handles fine control)

3.4 Technical details of the tactile sensing system

3.4.1 Characteristics by tactile sensor type

Resistive array sensors:

Principle: Measure resistance change under pressure
Resolution: 4-16 taxel/cm²
Sampling: 100-1000Hz
Advantages: Low cost, high spatial resolution
Disadvantages: Hysteresis, drift
Examples: FSR array, Tekscan

Capacitive sensors:

Principle: Capacitance change under pressure
Resolution: 1-4 taxel/cm²
Sampling: 100-500Hz
Advantages: Low hysteresis, stable
Disadvantages: Sensitive to electromagnetic interference
Examples: Syntouch BioTac, Robotic Skin

Optical/vision-based:

Principle: Camera imaging of gel deformation
Resolution: Hundreds to thousands taxel equivalent
Sampling: 30-60Hz (camera framerate)
Advantages: Very high resolution, 3-axis force measurement possible
Disadvantages: Processing delay, computational cost
Examples: GelSight, DIGIT, Soft Bubble

3.4.2 Neural network input representations for tactile data

There are several ways to feed tactile data into a policy network:

Method 1: Raw image representation

# Treat the tactile array as a 2D image (schematic; CNN stands for any conv encoder)
tactile_image = reshape(tactile_readings, (H, W, 1))
features = CNN(tactile_image)  # Conv layers for spatial features

Method 2: Summary statistics

import numpy as np

# Compress into low-dimensional features
# (compute_cop and compute_gradient are placeholder helpers)
tactile_features = {
    'total_force': np.sum(tactile_readings),
    'center_of_pressure': compute_cop(tactile_readings),
    'contact_area': np.count_nonzero(tactile_readings > threshold),
    'max_pressure': np.max(tactile_readings),
    'pressure_gradient': compute_gradient(tactile_readings)
}

Method 3: Including temporal features

# Learn temporal patterns with an LSTM/Transformer (schematic)
tactile_sequence = tactile[t - k : t + 1]  # frames t-k, ..., t-1, t
temporal_features = TemporalEncoder(tactile_sequence)
# Captures dynamic properties such as slip detection and contact transitions
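
For the temporal variant, the sliding window itself is easy to make concrete. The sketch below keeps the last `k + 1` tactile frames in a fixed-length buffer and exposes a frame-to-frame difference as a crude cue for contact changes. The class name, the zero pre-fill, and the use of a raw difference are assumptions of this example, not details from the paper.

```python
from collections import deque

import numpy as np


class TactileHistory:
    """Fixed-length window over the most recent tactile frames."""

    def __init__(self, k, shape):
        # Pre-fill with zeros so the window has k + 1 frames from step 0.
        self.frames = deque([np.zeros(shape) for _ in range(k + 1)], maxlen=k + 1)

    def push(self, frame):
        """Add the newest frame and return the window, oldest frame first."""
        self.frames.append(np.asarray(frame, dtype=float))
        return np.stack(self.frames)        # shape (k + 1, *shape)

    def latest_delta(self):
        """Newest frame minus the previous one."""
        return self.frames[-1] - self.frames[-2]


history = TactileHistory(k=2, shape=(3, 4))
window = history.push(np.ones((3, 4)))
```

The stacked window can be fed to any sequence encoder, and `latest_delta()` gives a cheap signal that spikes when contact appears, disappears, or shifts.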

3.4.3 Multimodal fusion architecture

┌────────────────┐  ┌────────────────┐  ┌────────────────┐
│ Tactile Input  │  │ Proprioceptive │  │ Task Condition │
│    (H×W×T)     │  │  Input (J×T)   │  │      (D)       │
└───────┬────────┘  └───────┬────────┘  └───────┬────────┘
        │                   │                   │
        ▼                   ▼                   ▼
┌────────────────┐  ┌────────────────┐  ┌────────────────┐
│      CNN       │  │      MLP       │  │   Embedding    │
│    Encoder     │  │    Encoder     │  │     Layer      │
└───────┬────────┘  └───────┬────────┘  └───────┬────────┘
        │                   │                   │
        └───────────────────┼───────────────────┘
                            │ Concatenation
                            ▼
                   ┌──────────────────┐
                   │  Fusion Network  │
                   │ (MLP/Attention)  │
                   └────────┬─────────┘
                            │
                            ▼
                   ┌──────────────────┐
                   │   Policy Head    │
                   │ (Action Output)  │
                   └──────────────────┘
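
A forward pass through this kind of architecture can be sketched in a few lines of NumPy. All sizes, the random weights, and the use of a single linear layer in place of the CNN encoder are assumptions for illustration. The point is only to show how the three streams are encoded separately, concatenated, and mapped to an action.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, T, J, D, ACT = 4, 4, 3, 12, 2, 12   # illustrative sizes


def linear(n_in, n_out):
    """Random weight matrix and zero bias for one dense layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)


def mlp(x, layers):
    """Apply dense layers with tanh between them (no activation on the last)."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x


tactile_enc = [linear(H * W * T, 32)]                 # stand-in for the CNN encoder
proprio_enc = [linear(J * T, 32)]                     # MLP encoder
cond_embed = [linear(D, 8)]                           # embedding layer
fusion = [linear(32 + 32 + 8, 64), linear(64, ACT)]   # fusion MLP + policy head


def policy(tactile, proprio, cond):
    z = np.concatenate([
        np.tanh(mlp(tactile.reshape(-1), tactile_enc)),
        np.tanh(mlp(proprio.reshape(-1), proprio_enc)),
        mlp(cond, cond_embed),
    ])
    return mlp(z, fusion)


action = policy(rng.normal(size=(H, W, T)), rng.normal(size=(J, T)), np.ones(D))
```

Concatenation is the simplest fusion choice. An attention-based fusion block, as the diagram also allows, would replace the first `fusion` layer.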

4. Comparison with related work

4.1 Existing Sim-to-Real approaches

Method | Advantages | Limitations | Comparison with This Paper
Domain Randomization | Easy to implement, no additional real data required | Learns unrealistic behavior under excessive randomization | This paper uses DR only in Stage 1, then refines with real data
System Identification | Accurate simulation possible | Time consuming, needs to be redone for each object | This paper does not depend on object model accuracy
Real-to-Sim-to-Real | Corrects simulation with real data | Complex pipeline, computational cost | This paper directly uses real data instead of simulation correction
Online Adaptation | Responds to real-time environment changes | Can be dangerous on real robot | This paper deploys after offline learning for safety

4.2 Similar work

DexTreme (2023):

  • Demonstrates extreme in-hand manipulation
  • Relies on large-scale simulation + Domain Randomization
  • Uses only proprioception, without tactile sensing
  • This paper performs more complex tasks by integrating tactile sensing

Transic (2024):

  • Sim-to-real transfer through online correction
  • Policy is corrected with real-time human feedback
  • This paper differs in using teleoperation for data collection

CyberDemo (CVPR 2024):

  • Large-scale augmentation of simulated demonstrations
  • Uses pretrained visual representations
  • This paper uses real tactile data directly, which brings in richer sensory information

4.3 Tactile-based manipulation work

Visual Dexterity (Chen et al., 2023):

  • In-hand reorientation from vision alone
  • Relies on visual inference without tactile sensing
  • This paper uses tactile sensing to compensate for the limits of vision

In-Hand Manipulation of Articulated Tools (2025):

  • In-hand manipulation of articulated tools
  • Simulation policy + tactile-based refinement
  • Similar philosophy to this paper, but a different application area


5. Task analysis: nut-bolt fastening and screwdriving

5.1 Nut-Bolt Fastening

Task decomposition:

  1. Approach Phase
    • The hand approaches the nut
    • An appropriate grip posture is formed
  2. Grasping Phase
    • Stable multi-finger grasp of the nut
    • Confirm the nut can rotate without initial torque resistance
  3. Rotation Phase
    • Sequential repositioning of the fingers (finger gait)
    • Rotation under a steady applied torque
  4. Completion Detection
    • Detect the rise in torque
    • Judge "fully tightened" from tactile/proprioceptive signals

Challenges:

  • Aligning the nut and bolt threads (avoiding cross-threading)
  • Preventing nut slip during rotation
  • Generalizing across nut sizes and shapes

5.2 Screwdriving

Task characteristics:

  • Adds the use of a tool (screwdriver)
  • Alignment between the screw head and the driver tip is critical
  • Requires a combination of axial pressure and rotational torque

Key challenges:

  • Maintaining driver-screw engagement
  • Applying appropriate axial force (too weak causes cam-out, too strong causes damage)
  • Adapting to different screw head types (Phillips, slotted, star, etc.)


6. Theoretical significance and practical implications

6.1 In-depth analysis of the theoretical contributions

6.1.1 Formalizing the "Good Enough" simulation hypothesis

The paper's most fundamental theoretical contribution is a new perspective on the relationship between simulation fidelity and learning effectiveness.

Conventional view (High-Fidelity Paradigm):

Sim-to-Real Performance ∝ Simulation Accuracy
→ More accurate simulation = Better transfer
→ Invest in system identification, precise modeling

Proposed view (Behavioral Sufficiency Paradigm):

Sim-to-Real Performance = f(Behavioral Structure Learning) × g(Real Data Refinement)
→ Simulation only needs to be "sufficient for exploring correct behavior space"
→ Lack of physical accuracy is compensated by real data

Theoretical grounds for this hypothesis:

  1. Manifold Hypothesis for Manipulation: Successful manipulation policies lie on a low-dimensional manifold within the high-dimensional action space. If even a simplified simulation can capture the rough structure of this manifold, the fine adjustments can be made afterwards.

  2. Behavioral Invariance: Certain behavior patterns (e.g., finger gait) remain valid across wide variations in physical detail, because they arise from structural constraints rather than from specific physical parameters.

  3. Information Bottleneck view: Not all information in the real environment is needed for manipulation. The simulation only has to convey task-relevant information, and mismatches in irrelevant details can be ignored.

6.1.2 Theoretical basis of hierarchical learning

The paper's three-stage pipeline can be seen as a concrete implementation of Hierarchical Skill Learning.

Connection to the Options Framework:

Traditional Options:
- Option = (Initiation set, Policy, Termination condition)
- Learn high-level policy on pre-defined primitives

This paper's approach:
- RL-learned skill primitive (simulation)
- Learn option selection/composition via teleoperation (real)
- Form unified policy via BC

Information-theoretic view:

I(Action; Task Success | Observation) =
    I(Action; Task Success | Low-level State)  [Maximized in RL stage]
  + I(Action; Task Success | High-level Intent) [Collected in Teleop stage]
  + I(Action; Task Success | Tactile Feedback)  [Integrated in BC stage]

Each stage draws on a different information source to maximize the overall mutual information. The sum above is a schematic accounting of where each stage contributes, not an exact identity.

6.1.3 Theoretical analysis of why tactile sensing is essential

Why is vision alone not enough?

For tasks such as nut-bolt fastening, the limits of vision can be analyzed from an information-theoretic standpoint:

Observable information via vision:
- Approximate object pose
- Finger positions
- Global scene structure

Information NOT observable via vision (tactile required):
- Contact presence (when occluded by fingers)
- Contact force magnitude and direction
- Onset of slip
- Thread engagement state
- Torque resistance changes

Observability analysis:

In terms of the observation y needed to estimate the system state x:

Vision only:
x_estimated = f(visual_obs)
→ Contact-related states are unobservable

Vision + Tactile:
x_estimated = g(visual_obs, tactile_obs)
→ Full state observable (or sufficiently estimable)
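
A toy linear version of this argument: with a two-dimensional state (object angle, contact force) and static measurements y = C x, the rank of C tells how many state directions the sensors pin down. The state layout and the measurement matrices are assumptions made purely for illustration.

```python
import numpy as np

# Toy state: [object_angle, contact_force]. Each row of C is one measurement.
C_vision = np.array([[1.0, 0.0]])     # the camera sees the angle only
C_tactile = np.array([[0.0, 1.0]])    # the fingertip sensor sees the force only


def recoverable_dims(C):
    """Number of state directions a static linear measurement y = C x determines."""
    return int(np.linalg.matrix_rank(C))


vision_only = recoverable_dims(C_vision)
vision_and_tactile = recoverable_dims(np.vstack([C_vision, C_tactile]))
```

With vision alone the contact force direction is left undetermined (rank 1 out of 2). Stacking the tactile row restores full rank, which mirrors the "full state observable" line above.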

The principle of complementary sensing:

Vision and touch provide complementary information:

Aspect | Vision | Tactile
Spatial range | Global (entire workspace) | Local (contact points only)
Information type | Geometric, appearance | Dynamic, force
Occlusion robustness | Vulnerable to occlusion | Only valid during contact but occlusion-independent
Temporal resolution | Framerate limited | Very high (kHz possible)

6.2.2 Quantitative analysis of data collection efficiency

Comparison with existing methods:

Pure Teleoperation (Conventional):
├── Training time: 10-50 hours (depending on task complexity)
├── Demo collection rate: 5-20 demos/hour (after training)
├── Demo quality: High variance (fatigue, concentration)
├── Required personnel: Expert operator
└── Total cost: High

RL + Teleoperation (This paper):
├── Training time: 1-5 hours (RL handles low-level)
├── Demo collection rate: 20-100 demos/hour
├── Demo quality: Consistent (RL provides stable base motion)
├── Required personnel: General worker possible
└── Total cost: Medium

Pure Simulation RL (Zero-shot):
├── Training time: Days to weeks (depending on compute)
├── Real data: Not required
├── Success rate: Low to medium (Sim2Real Gap)
├── Adaptability: Low (retrain for each new object)
└── Total cost: Low (hardware cost only)

ROI analysis:

Analyzing the return on investment of this approach in a manufacturing setting:

Initial Investment:
├── Multi-fingered robot hand: $20,000-100,000
├── Tactile sensors: $5,000-20,000
├── Teleoperation equipment: $2,000-10,000
├── System integration: $10,000-50,000
└── Total initial cost: $37,000-180,000

Annual Cost Savings:
├── Labor cost reduction: $30,000-80,000 (replacing 1-2 workers)
├── Quality cost reduction: $5,000-20,000 (lower defect rate)
├── Flexibility value: $10,000-50,000 (fast line changeover)
└── Total annual savings: $45,000-150,000

Break-even point: 1-3 years
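
The break-even figure follows from dividing the initial cost by the annual savings. The sketch below does this for the extreme pairings of the totals listed above. It uses a simple payback calculation with no discounting, which is an assumption of this example.

```python
def break_even_years(initial_cost, annual_savings):
    """Simple payback period, ignoring discounting and running costs."""
    return initial_cost / annual_savings


initial = (37_000, 180_000)      # total initial cost range from the list above
savings = (45_000, 150_000)      # total annual savings range from the list above

best_case = break_even_years(initial[0], savings[1])    # cheapest setup, largest savings
worst_case = break_even_years(initial[1], savings[0])   # priciest setup, smallest savings
```

Pairing the extremes gives roughly 0.25 to 4 years, so the 1-3 year figure corresponds to the middle of the ranges rather than to their endpoints.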

6.2.3 Technology maturity and adoption barriers

Technology Readiness Level (TRL) analysis:

Current TRL: 4-5 (Validated in laboratory environment)

Challenges for TRL 6-7:
├── Hardware reliability (MTBF > 10,000 hours)
├── Software stability (99.9% uptime)
├── Safety certification (ISO 10218, ISO/TS 15066)
└── User interface improvement

Challenges for TRL 8-9:
├── Mass-producible hardware
├── Standardized integration protocols
├── Ease of maintenance
└── Total cost of ownership (TCO) optimization

Organizational adoption barriers:

Barrier | Description | Mitigation Strategy
Technical uncertainty | Difficult to guarantee performance | Validate with pilot projects
Initial investment | High equipment cost | RaaS (Robot-as-a-Service) model
Workforce transition | Existing worker reallocation | Phased deployment, retraining programs
Integration complexity | Connection with existing systems | Develop standard interfaces
Regulatory compliance | Safety certification requirements | Consider from early design stage

6.2.4 Positioning against competing technologies

Position on the technology spectrum:

Manual Work ← -------- [This Paper] -------- → Full Automation
     ↑                    ↑                    ↑
   Max Flexibility     Balance Point       Max Speed
   Max Cost            Medium              Min Cost (at scale)
   Variable Quality    Consistent Quality  Consistent Quality

Comparison with alternative technologies:

Dedicated Automation Equipment:
├── Advantages: High speed, proven reliability
├── Disadvantages: No flexibility, high initial cost
└── Suitable for: Mass production, single product

Collaborative Robot (Simple gripper):
├── Advantages: Low cost, easy programming
├── Disadvantages: Cannot perform precision manipulation
└── Suitable for: Pick-and-place, simple assembly

This Paper's Multi-fingered Hand Approach:
├── Advantages: High flexibility, complex tasks possible
├── Disadvantages: Currently high cost, immature technology
└── Suitable for: High-mix low-volume, precision assembly, high-value products

7. Limitations and future research directions

7.1 Limitations of the current approach

1. Limited task scope:

  • Currently focused on nut-bolt fastening and screwdriving
  • Needs extension to more complex bimanual manipulation and tool changes

2. Boundaries of generalization:

  • Generalization is verified only within the trained object categories
  • Transfer to entirely new forms of manipulation is unverified

3. Sensory modalities:

  • Centered on tactile sensing and proprioception
  • Limited discussion of vision integration

4. Real-time adaptation:

  • A fixed policy is deployed after offline learning
  • Online adaptation capability during deployment is unclear

7.2 Future research directions

1. Multimodal sensor fusion:

Vision + Tactile + Proprioception + Audio
    ↓
 Multimodal Transformer
    ↓
 Unified Policy

  • Vision for global scene understanding
  • Tactile sensing for local contact information
  • Audio for detecting state changes (e.g., the sound of a screw tightening)

2. Long-Horizon Tasks:

  • Now: a single manipulation of a single object
  • Future: continuous assembly sequences, error recovery

3. Foundation Model integration:

  • Large-scale pretrained tactile/manipulation models
  • Few-shot adaptation to new tasks

4. Safety and reliability:

  • Failure prediction and safe stopping
  • Human collaboration scenarios


8. Conclusion

"Learning Dexterous Manipulation Skills from Imperfect Simulations" presents a practical and scalable framework for precise robot manipulation. The paper's key contributions are:

  1. Accepting imperfect simulation: focusing on the emergence of behavioral structure rather than perfect physical accuracy

  2. Three-stage pipeline: a systematic approach that runs from simulation RL → teleoperated data collection → tactile-integrated BC

  3. The necessity of touch: demonstrating the importance of tactile feedback in precise manipulation

  4. Practical tasks: validating effectiveness on industrially meaningful nut-bolt fastening and screwdriving

For robotics researchers, the paper offers a new paradigm for Sim-to-Real. Rather than trying to close the gap between simulation and reality, it acknowledges the gap and compensates for it systematically, an approach that should serve as an important milestone toward general-purpose manipulation robots.


Related work mentioned in this review:

  1. OpenAI, "Learning Dexterous In-Hand Manipulation," IJRR 2020
  2. Chen et al., "Visual Dexterity: In-Hand Reorientation of Novel and Complex Object Shapes," Science Robotics 2023
  3. Handa et al., "DexTreme: Transfer of Agile In-Hand Manipulation from Simulation to Reality," ICRA 2023
  4. Wang et al., "CyberDemo: Augmenting Simulated Human Demonstration," CVPR 2024
  5. Lin et al., "Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids," arXiv 2025
  6. Yu & Wang, "Dexterous Manipulation for Multi-Fingered Robotic Hands With Reinforcement Learning: A Review," Frontiers in Neurorobotics 2022

⛏️ Dig Review

⛏️ Dig: Go deep, uncover the layers. Dive into technical detail.

The DexScrew paper, published in 2025, proposes a new framework that combines simulation, teleoperation, and behavior cloning to reduce the sim-to-real gap. Specifically, the authors learn a basic rotational manipulation behavior in a simplified simulator, use it as a teleoperation skill in the real world to collect real robot data, and finally train the real-world task policy by behavior cloning on multisensory observations that include tactile information. This lets them overcome the mismatch in simulated physics and tactile sensing and obtain policies that reliably perform contact-rich tasks such as screw fastening and nut-bolt assembly.

Methodology overview and pipeline

The DexScrew framework consists of three stages: simulation RL → skill-based teleoperation → behavior cloning. The list below splits the simulation stage into oracle training and student distillation, which gives four steps:

  1. Oracle RL policy training: Reinforcement learning in the simulator is used to acquire a basic rotational skill. A teacher (oracle) policy is trained with access to privileged information. For example, the simulator simplifies the nut/bolt or the screwdriver handle to a revolute joint and skips modeling of the real threads. The teacher policy can observe all of the simulator's internal information, including the object's exact position, size, mass, friction coefficient, and center of mass, as well as the fingers' contact state. A teacher trained this way picks up the rotation skill quickly thanks to the "exact" simulator information, but the imperfection of the simulation stands in the way of applying it directly in the real world.

  2. Sensorimotor policy extraction (PADAPT-based student policy): Using data obtained from the teacher policy, a student policy is trained that takes only proprioceptive information such as joint positions as input. Specifically, the authors use DAgger (on-policy behavior cloning): the student policy is repeatedly executed in simulation, and the action the teacher policy predicts at each of those moments serves as the supervision signal. In the process, the student uses a history-based latent embedding module over past joint information (past joint targets) to estimate the teacher's privileged information. The student's observation space consists of joint positions and a 3-step sliding window of past targets, and it is trained in an RL + Behavior Cloning fashion with the teacher's actions as training data. The result is a student policy that acquires an approximate rotation skill without the oracle's privileged observations.

  3. Skill-based teleoperation (data collection): The rotation skill learned in simulation is used to collect real manipulation data. The human operator controls only the position and orientation of the robot arm through a virtual reality (VR) joystick, while the rotational motion of the finger joints is executed automatically by the learned simulation policy. In other words, the human adjusts the wrist pose and decides only when to start and stop the rotation skill, and the complex finger coordination is left to the simulation policy. As a result, even non-experts can collect data efficiently without having to control every finger motion themselves. At every timestep during teleoperation, the robot's joint state is recorded together with the finger joint targets (action) generated by the simulation policy and the arm joint commands (action) the human issues through the joystick. In addition, the high-resolution tactile sensors mounted on the XHand (120 three-axis pressure sensors on each fingertip, minimum detectable force 5 gf) are used to collect multisensory observations (joint positions and velocities plus tactile signals). The collected data therefore contains a variety of sensor information from the real task environment, such as finger contact patterns, force information (tactile), and arm position.

  4. Final policy training by Behavior Cloning: A policy usable in the real world is trained on the collected teleoperation data. Specifically, neural-network behavior cloning maps the multisensory observations (a history of joint and tactile information) to the collected expert actions (the combined actions of the simulation policy and the human operator). The policy network concatenates past observations into a sequence as its input, and the tactile data is first compressed by an MLP and then integrated through an Hourglass architecture. The paper also applies Action Chunking, training the policy to predict a sequence of consecutive actions (e.g., the commands over a fixed time span) at once rather than a single timestep. The training loss is an MSE loss that minimizes the difference between the predicted action chunk and the expert sequence. The resulting policy uses both temporal and tactile information to produce coordinated arm and finger motion, and is ultimately able to perform nut fastening and screwdriving on the real robot.
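
Step 4 can be made concrete with a small data-preparation sketch: each training sample pairs a concatenated observation history with the next `horizon` expert actions, and the loss is the mean squared error over the whole chunk. The function names, the `history` and `horizon` parameters, and the array shapes are assumptions of this example, not values from the paper.

```python
import numpy as np


def make_chunks(obs, actions, history, horizon):
    """Pair each observation window with the next `horizon` expert actions.

    obs:     (N, obs_dim) array of per-step observations
    actions: (N, act_dim) array of per-step expert actions
    """
    X, Y = [], []
    for t in range(history - 1, len(actions) - horizon + 1):
        X.append(obs[t - history + 1 : t + 1].reshape(-1))   # concatenated history
        Y.append(actions[t : t + horizon].reshape(-1))       # action chunk
    return np.array(X), np.array(Y)


def chunk_mse(pred, target):
    """MSE between predicted and expert action chunks."""
    return float(np.mean((pred - target) ** 2))


obs = np.arange(20, dtype=float).reshape(10, 2)
actions = np.arange(30, dtype=float).reshape(10, 3)
X, Y = make_chunks(obs, actions, history=3, horizon=4)
```

Predicting a chunk rather than a single action means the policy commits to a short, temporally consistent motion, which tends to suit periodic behaviors such as a finger gait.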

Both reinforcement learning and supervised learning are used in the pipeline. The oracle policy in the simulation stage is trained with Proximal Policy Optimization (PPO), and the observations and actions are defined as follows:

  • Observation: The robot's joint positions and target positions (a 3-step history) plus privileged information. The privileged information includes environment variables such as object position, size, mass, friction, and center of mass, along with the hand's joint state and the PD controller parameters.
  • Action: The policy outputs relative joint target positions. In actual control, the scaled action is added to the current joint position (pos + 0.1 * action) and sent to a PD controller, which converts it to torque.
  • Reward: A rotation reward gives higher reward the more positive the angular velocity about the rotation axis, and a proximity reward keeps the fingers close to the object. On top of these, penalties for stability are included as a weighted sum, such as joint torque and work penalties that discourage excessive energy use and a penalty on deviation from the initial finger pose. For example, the rotation reward clips the axis velocity to a bounded range and normalizes it so that only positive rotation earns reward, while the energy penalty puts a cost on large, fast torque actions.
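
The action and reward definitions above can be sketched as follows. Only the `pos + 0.1 * action` rule comes from the text. The PD gains, the clipping bound `omega_max`, the form of each penalty, and all weights are placeholders assumed for illustration.

```python
import numpy as np


def joint_target(joint_pos, action, scale=0.1):
    """Relative position action: target = pos + 0.1 * action."""
    return np.asarray(joint_pos) + scale * np.asarray(action)


def pd_torque(target, joint_pos, joint_vel, kp=3.0, kd=0.1):
    """PD law that turns the position target into torque (placeholder gains)."""
    return kp * (np.asarray(target) - np.asarray(joint_pos)) - kd * np.asarray(joint_vel)


def rotation_reward(omega_axis, omega_max=0.5):
    """Clip the angular velocity about the task axis to [0, omega_max] and
    normalize to [0, 1], so only positive rotation is rewarded."""
    return float(np.clip(omega_axis, 0.0, omega_max) / omega_max)


def total_reward(omega_axis, finger_dist, torque, joint_vel, pose_dev,
                 w_rot=1.0, w_prox=0.5, w_torque=0.01, w_work=0.01, w_pose=0.1):
    """Weighted sum of the reward and penalty terms (all weights are placeholders)."""
    torque, joint_vel = np.asarray(torque), np.asarray(joint_vel)
    return (w_rot * rotation_reward(omega_axis)
            - w_prox * float(np.mean(finger_dist))                 # proximity term
            - w_torque * float(np.sum(torque ** 2))                # torque penalty
            - w_work * float(np.sum(np.abs(torque * joint_vel)))   # work penalty
            - w_pose * float(np.sum(np.asarray(pose_dev) ** 2)))   # pose deviation
```

Because the action is a small offset on the current joint position, a single policy step can move a joint target by at most `0.1 * |action|`, which keeps commands smooth even when the policy output is noisy.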

During oracle policy training, domain randomization varies the object mass, friction coefficient, size, controller gains, and similar parameters at random. The goal is to account for uncertainty in the simulation and improve generalization to the real world. In addition, an episode terminates early when the distance between the thumb or index finger and the object exceeds a threshold, when the object stops moving, or when contact force disappears, so that failure modes are quickly excluded from training.
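A minimal sketch of the randomization and early-termination logic described above. The parameter ranges, thresholds, and the names `EpisodeParams`, `sample_params`, and `should_terminate` are hypothetical; the review does not state the actual ranges or thresholds.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class EpisodeParams:
    mass: float      # object mass in kg
    friction: float  # friction coefficient
    scale: float     # object size multiplier
    kp: float        # PD proportional gain
    kd: float        # PD derivative gain

# Placeholder ranges for illustration only; the paper's ranges are not given here.
RANGES = {
    "mass": (0.05, 0.5),
    "friction": (0.3, 1.5),
    "scale": (0.9, 1.1),
    "kp": (2.0, 4.0),
    "kd": (0.05, 0.2),
}

def sample_params(rng: np.random.Generator) -> EpisodeParams:
    """Draw one set of physics and controller parameters uniformly."""
    return EpisodeParams(**{k: float(rng.uniform(lo, hi))
                            for k, (lo, hi) in RANGES.items()})

def should_terminate(thumb_dist, index_dist, object_ang_speed, contact_force,
                     max_dist=0.1, min_speed=1e-3, min_force=1e-3) -> bool:
    """Early termination: a finger drifted away, the object stalled, or contact was lost."""
    finger_too_far = max(thumb_dist, index_dist) > max_dist
    object_stalled = abs(object_ang_speed) < min_speed
    contact_lost = contact_force < min_force
    return finger_too_far or object_stalled or contact_lost

rng = np.random.default_rng(0)
params = sample_params(rng)  # one randomized episode configuration
```

Sampling a fresh `EpisodeParams` at every reset is the usual way to apply this kind of randomization across thousands of parallel environments.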

The training setup for the simulation stage is as follows. PPO was trained on roughly 3.1 × 10^9 samples (environment steps) collected from 8,192 parallel environments, with a learning rate of 5 × 10^-3. The policy and value function embed the robot state and the privileged information with separate MLPs and combine the two embeddings to make their predictions. The student policy is likewise trained with on-policy BC: at every step the action taken by the student is compared with the action the teacher predicts, and the student is optimized to close the gap. With this large-scale training the simulation policy acquires a stable rotation behavior relatively quickly, and the whole training process reportedly finishes in about one day on a single GPU.
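The on-policy BC idea (roll out the student, label the visited states with the teacher's actions) can be illustrated on a toy problem. Everything here is an assumption made for the sketch: the linear teacher and student, the toy dynamics in `step`, and the use of an aggregated least-squares fit in place of the gradient updates an actual implementation would use.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 4, 2
teacher_W = rng.normal(size=(ACT_DIM, OBS_DIM))  # stand-in for the trained oracle
student_W = np.zeros((ACT_DIM, OBS_DIM))         # untrained student

def step(obs, action):
    """Hypothetical toy dynamics: the state decays and the action nudges two dims."""
    nxt = 0.5 * obs
    nxt[:ACT_DIM] += 0.1 * np.tanh(action)
    return nxt

def rollout_student(W, horizon=20):
    """Roll out the student, but label each visited state with the teacher's action."""
    obs = rng.normal(size=OBS_DIM)
    states, labels = [], []
    for _ in range(horizon):
        states.append(obs.copy())
        labels.append(teacher_W @ obs)
        obs = step(obs, W @ obs)  # the student's own action drives the state
    return np.array(states), np.array(labels)

def imitation_error(W, states, labels):
    """MSE between student actions and teacher labels on the given states."""
    return float(np.mean((states @ W.T - labels) ** 2))

initial_error = imitation_error(student_W, *rollout_student(student_W))

all_states, all_labels = [], []
for _ in range(8):
    states, labels = rollout_student(student_W)
    all_states.append(states)
    all_labels.append(labels)
    S = np.concatenate(all_states)
    L = np.concatenate(all_labels)
    # Least-squares fit of L ≈ S @ W.T on all data aggregated so far.
    student_W = np.linalg.lstsq(S, L, rcond=None)[0].T

final_error = imitation_error(student_W, *rollout_student(student_W))
```

The point of collecting data under the student's own behavior is that the student is supervised on the states it actually reaches, which a fixed dataset of teacher rollouts would not cover.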

Experimental Setup and Results Analysis

ํ•˜๋“œ์›จ์–ด ๋ฐ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ํ™˜๊ฒฝ

The experiments use a UR5e robot arm (6 DoF) and a 12-DoF multi-fingered XHand. On the XHand, the thumb and index finger each have 3 degrees of freedom (rotation, flexion/extension, and abduction/adduction), and the remaining three fingers have 2 degrees of freedom each. Simulation runs on the NVIDIA Isaac Gym engine with 8,192 parallel environments, and each episode lasts at most 800 steps (a 20 Hz control rate, equivalent to 40 seconds).

There are two manipulation tasks: (1) nut-bolt fastening and (2) screwdriving. Instead of modeling real threads, the training simulator uses geometric object models attached through a revolute joint (nuts with triangular or polygonal cross-sections, polygonal handles, and so on). For example, nut-fastening training uses a triangular nut to induce a rotation gait with high clearance, and screwdriving training uses a variety of non-circular handles such as octagonal and dodecagonal ones. Exposing the policy to varied shapes is meant to make it robust to shape changes in the real world.

The evaluation metrics are progress ratio and completion time. Progress ratio is the number of rotations actually achieved divided by the number required, where 100% means the task fully succeeded. Completion time is the time taken to reach full fastening (a progress ratio of 100%). The baselines include direct deployment of the pure simulation policy (direct sim-to-real) and replay of expert teleoperation trajectories collected on the real robot.
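The two metrics can be computed as in the sketch below. The trial tuples and the function names are illustrative assumptions; the numbers are made up for the example and are not results from the paper.

```python
from statistics import mean
from typing import List, Optional, Tuple

def progress_ratio(achieved_turns: float, required_turns: float) -> float:
    """Fraction of the required rotation achieved, clamped to [0, 1]."""
    if required_turns <= 0:
        raise ValueError("required_turns must be positive")
    return min(max(achieved_turns / required_turns, 0.0), 1.0)

def summarize(trials: List[Tuple[float, float, float]]) -> Tuple[float, Optional[float]]:
    """trials: (achieved_turns, required_turns, elapsed_seconds) per run.

    Returns the mean progress ratio and the mean completion time over runs that
    reached 100%, or None for the time when no run finished."""
    ratios = [progress_ratio(a, r) for a, r, _ in trials]
    finished = [t for a, r, t in trials if progress_ratio(a, r) >= 1.0]
    avg_time = mean(finished) if finished else None
    return mean(ratios), avg_time

# Made-up example: two complete runs and one that stalled halfway.
trials = [(5.0, 5.0, 42.0), (2.5, 5.0, 60.0), (5.0, 5.0, 38.0)]
avg_progress, avg_time = summarize(trials)
```

Returning `None` for the completion time mirrors the case where a method never reaches 100% and no completion time can be reported.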

Summary of Experimental Results

  • Nut-Bolt Fastening: deployed directly, the simulation policy can produce the screwing rotation but cannot push the nut downward, so it never completes the task. The policy obtained by BC on real-world data (which includes the human's wrist adjustments) performed well across nut shapes (square, triangle, hexagon, cross). The policy that uses both temporal history and tactile information scored highest. According to Table I, adding touch on top of history substantially improves results on the difficult shapes (triangle, cross), and using both signals gives the best performance in almost every case. For example, with tactile plus history the policy reached 97.5% on the square nut and 95% on the cross nut, versus 87.5% and 85.0% with history alone. This suggests that tactile information is especially helpful for maintaining stability and sensing rotation progress on difficult shapes. Policies without history struggled to infer the shape and generalized poorly; adding history largely mitigated this.

  • Screwdriving: this task is less constrained than nut fastening, so some rotation is possible even with the arm held fixed. Even so, the pure simulation policy reached a progress ratio of only about 41.6% (no run fully succeeded, so completion time could not be computed), and expert replay (playing back the collected data as is) reached only 50.8%. The BC policy trained with the DexScrew method achieved 69.2% in its basic form, 67.6% with history added, 87.5% with tactile input added, and 95.0% with both. Combining tactile and temporal information also cut the average completion time considerably, so the policy is more efficient as well. The authors observed that BC sometimes outperforms the expert data it was trained on, which they attribute to a filtered behavior cloning effect: only successful trajectories are selected for training.

  • Qualitative experiments: the learned policy stayed robust under disturbances. For example, when a finger was pushed off the object or the object was turned in the opposite direction, the policy recognized that the tactile signal pattern differed from a normal rotation phase and adjusted the wrist orientation to recover the rotation. An analysis of tactile signatures showed a stable tactile pattern during correct rotational contact, and the policy tended to adjust wrist angle and pressure to maintain it.
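The filtered behavior cloning effect mentioned in the screwdriving results amounts to a data-selection step before supervised training. A minimal sketch, assuming a hypothetical `Episode` record and a success threshold of 100% progress:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Episode:
    observations: List[List[float]]
    actions: List[List[float]]
    progress: float  # progress ratio in [0, 1] reached by this teleop episode

def filter_successes(episodes: List[Episode], min_progress: float = 1.0) -> List[Episode]:
    """Keep only episodes that reached the required progress."""
    return [ep for ep in episodes if ep.progress >= min_progress]

def to_training_pairs(episodes: List[Episode]) -> List[Tuple[List[float], List[float]]]:
    """Flatten kept episodes into (observation, action) pairs for supervised learning."""
    return [(o, a) for ep in episodes for o, a in zip(ep.observations, ep.actions)]

# Made-up data: one successful and one stalled episode.
episodes = [
    Episode(observations=[[0.0], [0.1]], actions=[[1.0], [1.0]], progress=1.0),
    Episode(observations=[[0.0], [0.0]], actions=[[0.2], [0.0]], progress=0.4),
]
kept = filter_successes(episodes)
pairs = to_training_pairs(kept)
```

Because the stalled episode never enters the training set, the cloned policy imitates only the behavior that worked, which is one way it can end up better than the average demonstration.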

Performance Metrics and Quantitative Evaluation

The experiments compare the progress ratio and completion time of each method. Progress ratio is the rotation achieved relative to the full rotation required (for the nut, turning to the end of the thread; for the screwdriver, full fastening), so a higher value means the run came closer to the goal. Completion time is the average duration of the runs that reached full fastening. As the numbers in Tables I and II show, DexScrew's behavior cloning policy (tactile plus history) has a much higher progress ratio than direct sim-to-real and a shorter completion time when it succeeds. For nut fastening, the tactile-plus-history BC policy recorded progress ratios of roughly 95 to 98% on every nut shape, clearly ahead of the tactile-free baselines, which the review puts at around 60 to 80%. For screwdriving, the same policy reached a 95.0% progress ratio, a large improvement over pure simulation (41.6%). The review attributes this to real data filling in the physical interaction and tactile feedback that the simplified simulator alone could not capture.

Critical Analysis: Strengths, Weaknesses, and Limitations

Strengths: DexScrew is a hybrid approach that draws on the advantage of simulation policies (fast, large-scale training) and the advantage of real data (accurate physics and sensing). Its novelty lies in learning a basic motion skill in simulation, using that skill to assist the human operator, and then using real teleoperation to capture real-world contact dynamics and tactile information as training data. As the real-robot evaluation shows, combining tactile sensing with temporal information brings marked gains in complex contact situations (for example, nuts of varying shape or screwdriving with uncertain friction). Introducing action chunking in the behavior cloning stage, so that the policy predicts longer action sequences, is also a useful technique for real robot control. The policy network design and training process rely on fairly standard components (MLP embeddings, an hourglass network, PPO, DAgger, and so on), which helps reproducibility, and the large number of parallel environments makes training efficient.

Weaknesses and limitations: there are still several limitations. First, the teleoperation stage requires direct human involvement, which limits full autonomy. As the authors also point out, skill-based teleoperation is less efficient than automated data collection and constrains large-scale data gathering. Moreover, in the experiments teleoperation starts with the nut already placed on the bolt and the screwdriver already seated in the screw. In other words, the work covers a single task phase and excludes object recognition and initial alignment. Extending it would likely require adding vision sensors, high-precision force sensors, and similar hardware.

The approach also demands a complex training pipeline and substantial resources. The oracle RL stage needs billions of samples, and specialized equipment is required, such as an expensive tactile-sensor-equipped hand like the XHand and a VR-based teleoperation interface. Policy training and the integration of the three stages are fairly labor-intensive, and because the design is specialized to a particular task (for example, a rotation skill for nut fastening), each stage may have to be redone whenever the target task changes. Behavior cloning also depends heavily on the quality of the collected data and may generalize poorly to situations outside the collected trajectories (the authors also note the importance of history). Finally, the simulation policy itself does not capture real thread physics or the full assembly dynamics, so a gap between simulation and reality remains and simulation cannot fully stand in for the real system. As the simulation ablation confirms, performance drops sharply when training without privileged information, so at this stage the method is constrained to rely heavily on the simulator's internal state.

Scope of application: the proposed method suits precision assembly tasks where contact and touch matter (nut-bolt fastening, screwdriving, and the like). It is most effective on systems equipped with a multi-fingered robot hand and tactile sensors. Tasks that are entirely contact-free or that depend heavily on visual information would need further modification, and the method is limited in settings where teleoperation is hard, such as large-scale swarm manipulation. Simplifying the simulation is central to the approach, but if the simulation is too simple the learned skill may not fit the real situation, so simulator modeling matched to the task's characteristics is important.

Conclusion and Outlook

DexScrew combines imperfect simulation with learning from real data and presents a promising approach to delicate, tactile-driven manipulation. The in-depth analysis shows that the rotation behavior learned in simulation, once combined with real teleoperation, achieves high success rates even on complex contact tasks. For future work, it seems important to reduce the dependence on teleoperation through autonomous data collection or learning from human visual demonstrations, and to address initial alignment through tight visuo-tactile integration. Validating the method on a broader range of manipulation tasks to assess generalization, and optimizing the algorithms to improve training efficiency, are also open tasks. If it develops in these directions, a DexScrew-style framework could become a strong solution for real industrial applications such as precision assembly or product inspection with general-purpose robot hands.

Copyright 2026, JungYeon Lee