📃 Geometric Fabrics Review

rl
fabric-guided
behavioral dynamics
a Safe Guiding Medium for Policy Learning
Published

February 6, 2026

🔍 Ping. 🔔 Ring. ⛏️ Dig. A tiered review series: quick look, key ideas, deep dive.

  • Paper Link
  • Project: Related to DeXtreme
  • Code
  1. 🤔 This work proposes “Geometric Fabrics” as a safe guiding medium to address the control and safety problems that arise when a Reinforcement Learning (RL) policy acts directly on complex robot dynamics.
  2. 🛠️ The proposed framework couples the robot's real dynamics with artificial second-order dynamics (“behavioral dynamics”), providing a new action space that simplifies reward engineering and enables smoother, safer behavior.
  3. 🚀 Applied to a high-performance in-hand cube reorientation task, the Fabric-Guided Policy (FGP) achieves higher performance (CS and RPM) than the previous state-of-the-art DeXtreme model with markedly lower action noise, demonstrating effective sim2real transfer.


🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

This paper addresses the problem that a robot policy is subject to complex second-order dynamics, which entangle its actions with the resulting states. In Reinforcement Learning (RL) settings in particular, the policy must decipher these interactions from vast amounts of experience and complex reward functions in order to perform a task. Conventional controllers such as Operational Space Control (OSC) or joint PD control mainly induce straight-line motion in task or joint space, and do not capture the rich, nonlinear behavior a robot should exhibit. In addition, RL policies often produce bang-bang actions that can damage robot hardware.

To overcome these limitations, the paper proposes a control framework built on “Geometric Fabrics”. Artificial second-order dynamics reshape the robot's “uncontrolled dynamics” into “behavioral dynamics”, which provide a new, safe, guiding action space for training RL policies. This makes even bang-bang-like RL actions safe on the real robot, simplifies reward engineering, and helps sequence high-performance policies.

Core method: Geometric Fabrics for Behavioral Dynamics

The proposed framework reshapes the robot's real second-order dynamics through an artificial second-order dynamical system. These artificial dynamics are based on a stable subclass of the Geometric Fabrics proposed in [20].

  1. Basic dynamics of a geometric fabric: The fabric dynamics are given by: \ddot{q}_f = e_h(q_f, \dot{q}_f) + \alpha_L(q_f, \dot{q}_f) \dot{q}_f - M_f^{-1}(q_f, \dot{q}_f) \left( \partial \psi (q_f) + B(q_f, \dot{q}_f) \dot{q}_f \right) - \beta(q_f, \dot{q}_f) \dot{q}_f \quad (1) where q_f, \dot{q}_f, \ddot{q}_f \in \mathbb{R}^n are the fabric position, velocity, and acceleration.
    • M_f \in \mathbb{R}^{n \times n} is the positive-definite system metric (mass).
    • e_h \in \mathbb{R}^n is the fabric term that is homogeneous of degree 2 (HD2) in velocity; it generates geometric paths through space.
    • \alpha_L \in \mathbb{R} is the energization coefficient, which makes the fabric maintain a given energy L.
    • \partial \psi \in \mathbb{R}^n is the gradient of a potential function and B \in \mathbb{R}^{n \times n} is a positive semi-definite damping matrix; together they can be used to further perturb the system acceleration in order to impose constraints.
    • \beta \in \mathbb{R}^+ is an additional damping scalar that preserves the geometric structure of the fabric and stabilizes the system.
  2. Behavioral Dynamics: The fabric dynamics in (1) can be rewritten compactly as: M_f (q_f, \dot{q}_f) \ddot{q}_f + f_f (q_f, \dot{q}_f) = 0 \quad (2) where f_f \in \mathbb{R}^n is an artificial force. These fabric dynamics are coupled to the robot's real dynamics as follows: M(q)\ddot{q} + f(q, \dot{q}) = \tau(q, \dot{q}, q_f, \dot{q}_f, \ddot{q}_f) \quad (3) where q, \dot{q}, \ddot{q} \in \mathbb{R}^n are the real robot's position, velocity, and acceleration, and M \in \mathbb{R}^{n \times n} and f \in \mathbb{R}^n are the real robot's mass and forces (including contact, centrifugal/Coriolis, friction, and gravity). \tau is the torque control law that connects the artificial dynamics to the real dynamics. Typically the torque control law consists of a joint-level proportional-derivative (PD) controller with inverse dynamics compensation, which ensures ||q_f - q|| \le \epsilon_1 and || \dot{q}_f - \dot{q}|| \le \epsilon_2 so that the robot tracks the motion of the fabric. As in impedance and admittance control, the separation between the fabric state and the real state makes it possible to induce and control contact forces.
  3. Action space of the RL policy: The RL policy \pi(\cdot) outputs an action a that applies a driving force to the fabric: M_f (q_f, \dot{q}_f) \ddot{q}_f + f_f (q_f, \dot{q}_f) + f_\pi(a) = 0 \quad (4) where f_\pi(a) is interpreted as a driving force acting on the fabric. As a result, the time evolution of the fabric state (q_f, \dot{q}_f) is a function of both the fabric itself and the control force generated by the policy. The RL policy steers the fabric's behavior through this driving force, which is in turn converted into the behavior of the real robot.
  4. Acceleration and jerk handling:
    • Robot controllers typically require q_f to be sufficiently smooth in order to protect the actuators, which leads to acceleration and jerk constraints. The framework handles these with the following quadratic program: L = \frac{1}{2} (\ddot{q} - \ddot{q}_f)^T M_f (\ddot{q} - \ddot{q}_f) + \frac{\alpha}{2} \ddot{q}^T \ddot{q} \quad (5) where \ddot{q} is the decision variable, \ddot{q}_f = -M_f^{-1} f_f is the nominal fabric acceleration from (2), and \alpha \in \mathbb{R}^+ regularizes ||\ddot{q}|| \to 0 while accounting for M_f. Setting the gradient to zero and taking the minimizer as the new fabric acceleration gives the closed-form solution: (M_f + \alpha I)\ddot{q}_f + f_f = 0 \quad (6) Solving for \ddot{q}_f gives \ddot{q}_f = -(M_f + \alpha I)^{-1} f_f, and since ||\ddot{q}_f|| \to 0 as \alpha \to \infty, the acceleration can be made arbitrarily small. This means a single \alpha can be found that satisfies each joint's acceleration limit | \ddot{q}_{f,i} | \le \ddot{q}_i.

    • To accommodate jerk limits, the following time-discretized jerk model is used: \dddot{q}_f^t = \frac{\ddot{q}_f^{t+1} - \ddot{q}_f^t}{\Delta t} \quad (7) The largest jerk occurs when the next acceleration is at its maximum and the previous acceleration is at its minimum, where \ddot{q} now denotes the acceleration limit: \dddot{q}_f^t = \frac{2\ddot{q}}{\Delta t} \quad (8) Therefore, if the jerk must not exceed a limit \dddot{q}: \frac{2\ddot{q}}{\Delta t} \le \dddot{q} \quad (9) From this, a single acceleration limit \ddot{q} that respects both the original acceleration limit and the jerk limit can be computed: \ddot{q} = \min \left( \ddot{q}_{\text{original}}, \frac{\Delta t \, \dddot{q}}{2} \right) \quad (10) The acceleration-limiting procedure above is then run with this new \ddot{q}, so that both the acceleration and the jerk constraints hold.
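The two steps in item 4 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the metric M_f is taken to be diagonal so that (6) can be solved element-wise and each |\ddot{q}_{f,i}| shrinks monotonically in \alpha, the bisection search for \alpha is one possible way to find it, and the function names and numeric limits are hypothetical.

```python
from typing import List


def effective_accel_limit(accel_limit: float, jerk_limit: float, dt: float) -> float:
    """Eq. (10): one acceleration bound that also keeps the discrete jerk in range."""
    return min(accel_limit, 0.5 * dt * jerk_limit)


def regularized_accel(m_diag: List[float], f_f: List[float], alpha: float) -> List[float]:
    """Eq. (6) for a diagonal metric: solve (M_f + alpha*I) qdd_f + f_f = 0."""
    return [-f / (m + alpha) for m, f in zip(m_diag, f_f)]


def smallest_alpha(m_diag: List[float], f_f: List[float], limit: float,
                   iters: int = 60) -> float:
    """Bisection for the smallest alpha >= 0 with max_i |qdd_f,i| <= limit."""
    def within_limit(alpha: float) -> bool:
        return max(abs(a) for a in regularized_accel(m_diag, f_f, alpha)) <= limit

    if within_limit(0.0):
        return 0.0
    lo, hi = 0.0, 1.0
    while not within_limit(hi):  # grow the bracket until the limit is met
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if within_limit(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical numbers: 5 rad/s^2 accel limit, 100 rad/s^3 jerk limit, 10 ms step.
limit = effective_accel_limit(5.0, 100.0, 0.01)
alpha = smallest_alpha([1.0, 2.0], [10.0, -4.0], limit)
accel = regularized_accel([1.0, 2.0], [10.0, -4.0], alpha)
```

With these illustrative numbers the jerk limit is the binding one, so the effective acceleration bound comes from the second argument of the min in (10). For a full (non-diagonal) M_f the element-wise division would be replaced by a linear solve.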

Application to cube reorientation with a multi-fingered robot hand

The proposed framework is applied to the in-hand cube reorientation problem using a four-fingered Allegro Hand v4 with 16 actuators.

  • Fabric design:
    • Attraction: geometric attractors are constructed in fingertip space, to encourage contact between the cube and the fingertips, and in configuration space, to encourage the fingers to curl around the cube.
    • Repulsion: to respect the hand's joint limits, repulsive forcing terms are introduced in the upper and lower joint-limit task spaces. These are implemented through a barrier metric.
    • Energization: the energization coefficient \alpha_L is computed to ensure the energy stability of the fabric itself.
    • Geometrically-Consistent Damping: the final damping term \beta is set to a small constant during training to ease exploration, and is tested at several levels at deployment to strengthen sim2real transfer.
    • Action Space: the RL policy's action is converted into a force in the concatenated fingertip space (a \in \mathbb{R}^{12}). This force is passed to the fabric as f_\pi(a) = \gamma J^T(q_f) \text{clamp}(a, -1, 1).
  • Reinforcement Learning setup: the same reward terms and weights as DeXtreme [8] are used, but because the fabric layer ensures smooth and safe motion, the action, action delta, and joint velocity penalties are removed entirely, which simplifies reward design. PPO is used for RL training, together with Automatic Domain Randomization (ADR). A full wrench disturbance is applied to the cube during training to reduce the sim2real gap and to encourage the policy to establish a prehensile lock more often.
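The action-to-force mapping f_\pi(a) = \gamma J^T(q_f) \text{clamp}(a, -1, 1) from the Action Space bullet can be illustrated with plain Python lists. This is a sketch only: `policy_force` is a hypothetical helper, the Jacobian is passed in as a nested list rather than computed from a kinematic model, and the small 2x3 example stands in for the 12x16 fingertip Jacobian of the hand.

```python
from typing import List


def clamp(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Saturate a scalar action component to [lo, hi]."""
    return max(lo, min(hi, x))


def policy_force(action: List[float], jacobian: List[List[float]],
                 gamma: float) -> List[float]:
    """Joint-space driving force f_pi(a) = gamma * J^T * clamp(a, -1, 1).

    `jacobian` has one row per task-space (fingertip) coordinate and one
    column per joint, so J^T maps a task-space force to joint space.
    """
    a = [clamp(x) for x in action]
    n_task, n_joint = len(jacobian), len(jacobian[0])
    assert len(a) == n_task, "action must match the task-space dimension"
    return [gamma * sum(jacobian[i][j] * a[i] for i in range(n_task))
            for j in range(n_joint)]


# Toy example: 2 task coordinates, 3 joints; the first action saturates at 1.
force = policy_force([3.0, -0.5], [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]], gamma=2.0)
```

Because the action is clamped before it is scaled by \gamma, the driving force has a fixed upper bound no matter what the network outputs, which is what lets the fabric layer absorb bang-bang actions.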

Experimental results and analysis

Comparing FGP (Fabric-Guided Policy) with the DeXtreme policy, FGP trained more slowly than DeXtreme (smoother policies are harder to learn) but reached a lower final entropy level.

Real-world performance was evaluated with two metrics: consecutive successes (CS) and rotations per minute (RPM).

  • CS: with \beta=40, FGP recorded a mean CS of 94.1, more than a 3x improvement over the earlier DeXtreme model. DeXtreme (new) had a higher mean CS (244.6) thanks to a few very long runs, but FGP had the higher median (85.5 vs 70).
  • RPM: FGP clearly outperformed all DeXtreme policies on RPM, and its RPM tended to become more precise as \beta increased.
  • Action Noise Rejection: FGP's spectral amplitudes are close to zero above 5 Hz, so its action noise is markedly lower. By contrast, the DeXtreme policy still contains noise above 5 Hz despite substantial low-pass filtering. This suggests that FGP has an important advantage in reducing hardware wear and extending the robot's service life.
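A simple way to quantify this kind of comparison is to measure how much of a command signal's spectral energy lies above a cutoff. The sketch below uses a naive discrete Fourier transform from the standard library. It is not the paper's analysis code: the 60 Hz sample rate and the synthetic signals are hypothetical, chosen only to illustrate the measurement.

```python
import cmath
import math
from typing import List, Tuple


def amplitude_spectrum(signal: List[float], sample_rate_hz: float) -> Tuple[List[float], List[float]]:
    """One-sided amplitude spectrum of a mean-removed signal via a naive DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        freqs.append(k * sample_rate_hz / n)
        amps.append(2.0 * abs(coeff) / n)
    return freqs, amps


def high_freq_fraction(signal: List[float], sample_rate_hz: float,
                       cutoff_hz: float = 5.0) -> float:
    """Fraction of spectral energy strictly above cutoff_hz (0.0 for a flat signal)."""
    freqs, amps = amplitude_spectrum(signal, sample_rate_hz)
    total = sum(a * a for a in amps)
    if total == 0.0:
        return 0.0
    high = sum(a * a for f, a in zip(freqs, amps) if f > cutoff_hz)
    return high / total


# Synthetic 2-second commands at a hypothetical 60 Hz rate.
rate, n = 60.0, 120
smooth = [math.sin(2 * math.pi * 1.0 * t / rate) for t in range(n)]
noisy = [s + 0.5 * math.sin(2 * math.pi * 12.0 * t / rate)
         for t, s in enumerate(smooth)]
smooth_ratio = high_freq_fraction(smooth, rate)
noisy_ratio = high_freq_fraction(noisy, rate)
```

A command trace like `smooth` has essentially no energy above the cutoff, while the 12 Hz component in `noisy` shows up directly in the ratio.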

Conclusion

This work proposes a new paradigm that cascades an RL policy, an artificial dynamical system (geometric fabrics), and the real system dynamics. In particular, it uses high-performance geometric fabrics to achieve state-of-the-art results on cube reorientation with a dexterous, high-DoA robot hand. The fabric layer offers RL policies substantial benefits: it simplifies reward design, provides inherent safety, and greatly reduces action noise, which extends hardware life. FGP can slow down RL training, but this is interpreted as a general property of smoother policies.

Future work aims to apply the approach to different robot platforms, different tasks, and different fabric designs. Other optimization and planning methods for maximizing policy performance will also be considered.


🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

Imagine commanding a robot arm to “grab that cup”. The simplest approach is to move in a straight line from the current position to the goal. Whether it is joint PD control or Operational Space Control (OSC), most traditional controllers essentially generate this kind of straight-line motion. The problem is that the real world is not solved by straight lines. The robot has to avoid obstacles, respect joint limits, and slide its fingertips along the surface of an object. The burden of all this nonlinear behavior falls entirely on the reinforcement learning (RL) agent.

Karl Van Wyk, Ankur Handa, Viktor Makoviychuk, and colleagues at NVIDIA ask a fundamental question in this paper, presented at ICRA 2024: “What if we made the underlying dynamics on which the RL policy operates smarter?” Their answer is the Fabric-Guided Policy (FGP) framework, built on Geometric Fabrics.

An analogy captures the core idea. Instead of having the RL agent write from scratch on a blank sheet, let it write on paper that already has guide lines drawn in the direction the pencil naturally flows. Those guide lines are the geometric fabric: the agent follows the flow and corrects its direction only when needed.

The problem this paper addresses

Traditional RL-based robot control pipelines have three structural problems:

First, an impoverished action space. When the policy passes a target position to an OSC or joint PD controller, the controller generates straight-line motion toward that target. The rich nonlinear behavior needed for complex tasks such as dexterous manipulation cannot be expressed by this straight-line motion.

Second, a lack of safety. RL policies tend to favor bang-bang control, that is, abrupt actions that swing between maximum and minimum values. This is harmless in simulation but can seriously damage the motors and drivetrain of a real robot. The usual remedy has been a low-pass filter that suppresses high-frequency content, but a filter does not handle joint constraints explicitly and it reduces the policy's responsiveness.

Third, the complexity of reward design. Because the agent has to learn every behavior from the reward function (curling the fingers inward, respecting joint limits, making fingertip contact with the object), reward engineering becomes extremely complicated.

Method: Geometric Fabrics and Behavioral Dynamics

What are Geometric Fabrics?

The theoretical foundations of Geometric Fabrics come from a series of works by Nathan Ratliff. Let us unpack the core concepts step by step.

In classical mechanics, the motion of a particle is described by a second-order differential equation: M\ddot{q} + f(q, \dot{q}) = 0. Here M is the mass matrix and f collects terms such as the Coriolis, centrifugal, and gravity forces. What this equation defines is a path on the configuration space.

A geometric fabric generalizes these classical dynamics. There are two key differences:

  1. The metric also depends on velocity. The mass matrix M(q) of classical mechanics depends only on position, whereas the metric M_f(q_f, \dot{q}_f) of a geometric fabric depends on both position and velocity. This corresponds to extending Riemannian geometry to Finsler geometry.

  2. It generates speed-invariant paths. The paths produced by a fabric keep the same shape regardless of how fast the system moves. This is a powerful property because it lets the “shape” of a path and the “speed” along it be designed separately.
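The second property can be checked directly for the simplest case of a uniform speed-up. The derivation below is a sketch for a purely geometric system \ddot{q} = h(q, \dot{q}), where h is a generic stand-in (an assumed symbol) for a term that is homogeneous of degree 2 in velocity, i.e. h(q, \lambda\dot{q}) = \lambda^2 h(q, \dot{q}). It shows that replaying a solution \lambda times faster gives another solution that traces the same path.

```latex
\begin{aligned}
\tilde{q}(t) &:= q(\lambda t), \qquad \lambda > 0 \\
\dot{\tilde{q}}(t) &= \lambda\, \dot{q}(\lambda t) \\
\ddot{\tilde{q}}(t) &= \lambda^{2}\, \ddot{q}(\lambda t)
  = \lambda^{2}\, h\big(q(\lambda t), \dot{q}(\lambda t)\big) \\
 &= h\big(q(\lambda t), \lambda\, \dot{q}(\lambda t)\big)
  = h\big(\tilde{q}(t), \dot{\tilde{q}}(t)\big)
\end{aligned}
```

So \tilde{q} satisfies the same differential equation while visiting the same points as q, only faster or slower. This covers constant rescaling only; the general statement for arbitrary speed profiles is part of the fabrics theory and is not derived here.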

Intuitively, a geometric fabric is a “flow field” laid over the configuration space. The flow guides the robot to move naturally in desirable directions, while still letting it deviate when an external force (forcing) is applied. It is like the current of a river: a boat set afloat follows the current, but rowing lets it leave the flow.

Mathematical formulation: Forcing Energized Fabrics

The fabric used in the paper is the following stable second-order dynamical system:

\ddot{q}_f = \underbrace{e_h(q_f, \dot{q}_f)}_{\text{geometric term}} + \underbrace{\alpha_L(q_f, \dot{q}_f) \dot{q}_f}_{\text{energy regulation}} - \underbrace{M_f^{-1}(q_f, \dot{q}_f)\left(\nabla\psi(q_f) + B(q_f, \dot{q}_f)\dot{q}_f\right)}_{\text{potential + damping (forcing)}} - \underbrace{\beta(q_f, \dot{q}_f) \dot{q}_f}_{\text{geometric damping}}

Let us look at the role of each term in turn:

| Symbol | Role | Intuition |
|---|---|---|
| q_f, \dot{q}_f, \ddot{q}_f | Fabric position, velocity, acceleration | State of the virtual robot |
| e_h(q_f, \dot{q}_f) | Geometric acceleration (geodesic term) | “Natural change of direction caused by the curvature of the space” |
| \alpha_L \dot{q}_f | Energization term (coefficient \alpha_L) | “Injects or absorbs energy to regulate speed” |
| M_f^{-1}\nabla\psi | Gradient of the potential function | “Attraction toward the goal, repulsion from obstacles” |
| B\dot{q}_f | Damping term | “Suppresses oscillation around the potential” |
| \beta \dot{q}_f | Geometrically consistent damping | “Energy dissipation of the fabric itself” |

There is an important theoretical guarantee here. By Theorem IV.1 of Ratliff et al. (2020), a fabric of this form is asymptotically stable. In other words, it is mathematically proven that the system minimizes the potential function \psi without diverging.

Behavioral Dynamics: coupling the virtual and the real

Now the key idea appears. A geometric fabric is a virtual dynamical system. The real robot has its own physical dynamics:

M(q)\ddot{q} + f(q, \dot{q}) = \tau

Here M(q) is the real inertia matrix, f is the Coriolis/gravity term, and \tau is the joint torque.

Behavioral dynamics connect these two worlds: the real robot's torque is computed so that it follows the acceleration \ddot{q}_f generated by the fabric:

\tau = M(q)\ddot{q}_f + f(q, \dot{q})

This is essentially similar to inverse-dynamics-based control, except that the target is not a simple position or trajectory but the acceleration dictated by the fabric. In a real implementation the fabric state (q_f, \dot{q}_f) and the real robot state (q, \dot{q}) do not always coincide, so a PD compensation term is added:

\tau = M(q)\ddot{q}_f + f(q, \dot{q}) + K_p(q_f - q) + K_d(\dot{q}_f - \dot{q})
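The torque law above can be written out as a short function. This is a minimal sketch assuming a diagonal inertia matrix and scalar gains so that it stays in plain Python. `tracking_torque` and all numeric values are hypothetical, and a real implementation would take M(q) and f(q, \dot{q}) from a rigid-body dynamics library.

```python
from typing import List


def tracking_torque(mass_diag: List[float], bias: List[float],
                    q: List[float], qd: List[float],
                    q_f: List[float], qd_f: List[float], qdd_f: List[float],
                    kp: float, kd: float) -> List[float]:
    """tau = M(q) qdd_f + f(q, qd) + Kp (q_f - q) + Kd (qd_f - qd), per joint.

    mass_diag: diagonal of M(q) (assumption: off-diagonal coupling ignored)
    bias:      f(q, qd), e.g. Coriolis and gravity terms, already evaluated
    """
    return [m * a + b + kp * (pf - p) + kd * (vf - v)
            for m, b, p, v, pf, vf, a
            in zip(mass_diag, bias, q, qd, q_f, qd_f, qdd_f)]


# One joint: the fabric is slightly ahead of the robot in position and velocity.
tau = tracking_torque(mass_diag=[2.0], bias=[0.5],
                      q=[0.1], qd=[0.0],
                      q_f=[0.2], qd_f=[1.0], qdd_f=[3.0],
                      kp=10.0, kd=1.0)
```

The feedforward part (M \ddot{q}_f + f) asks the robot to reproduce the fabric's acceleration, and the PD part pulls the real state back toward the fabric state when the two drift apart, for example during contact.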

The role of the RL policy: acting in force space

Where does the RL policy come in? The policy applies a force to the fabric. Mathematically, the policy output \pi(s) becomes an extra forcing term added alongside the fabric's potential (in the sign convention of Eq. (4), F_{\text{policy}} = -f_\pi(a)):

\ddot{q}_f = [\text{fabric terms}] + M_f^{-1} \cdot F_{\text{policy}}

Consider why this matters. A conventional RL policy directly commands “move the joints to this position”, whereas in FGP the policy says “push the current fabric flow in this direction”. Because the fabric already encodes safe and desirable behavior:

  • If the policy does nothing (zero action), the robot follows the fabric's default behavior
  • Even if the policy outputs extreme actions, the structure of the fabric automatically enforces joint limits, acceleration constraints, and similar limits
  • As a result, bang-bang style policy actions can be executed safely on the real robot
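These three points can be illustrated with a one-dimensional toy system. The sketch below is deliberately much simpler than a geometric fabric: it keeps only a spring-like attractor, a constant damping \beta, and a clamped policy force, and it omits the metric, energization, and barrier terms, so it says nothing about joint limits. All constants are hypothetical.

```python
from typing import Callable, List


def simulate(action_fn: Callable[[int], float], steps: int = 1000, dt: float = 0.01,
             k: float = 4.0, beta: float = 4.0, gamma: float = 1.0,
             q_goal: float = 0.5) -> List[float]:
    """Integrate qdd = -k (q - q_goal) - beta qd + gamma clamp(a) from rest at q = 0."""
    q, qd = 0.0, 0.0
    trajectory = []
    for i in range(steps):
        a = max(-1.0, min(1.0, action_fn(i)))  # saturate the raw policy output
        qdd = -k * (q - q_goal) - beta * qd + gamma * a
        qd += dt * qdd   # semi-implicit Euler: update velocity first,
        q += dt * qd     # then position with the new velocity
        trajectory.append(q)
    return trajectory


zero = simulate(lambda i: 0.0)                               # default behavior
bang = simulate(lambda i: 5.0 if i % 2 == 0 else -5.0)       # bang-bang actions
saturated = simulate(lambda i: 100.0)                        # constant extreme action
```

With zero action the state settles at the attractor. Rapidly alternating extreme actions barely disturb the position, because the second-order dynamics act as a low-pass filter on the force. A constant saturated action only shifts the equilibrium by \gamma / k. In every case the clamp bounds how far the policy can push the system.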

The diagram below shows the flow of the whole system.

flowchart TD
    subgraph RL["RL Policy"]
        OBS["State Observation"] --> POLICY["Neural network policy π(s)"]
        POLICY --> ACTION["Action output (Force Space)"]
    end

    subgraph FABRIC["Geometric Fabric Layer"]
        GEOM["Geometric term e_h"] --> FABRIC_ACC["Fabric acceleration q̈_f"]
        ENERGY["Energization α_L"] --> FABRIC_ACC
        POT["Potential/damping ∇ψ, B"] --> FABRIC_ACC
        DAMP["Geometric damping β"] --> FABRIC_ACC
        ACTION --> |"policy forcing"| FABRIC_ACC
    end

    subgraph ROBOT["Real robot (Allegro Hand)"]
        FABRIC_ACC --> |"inverse dynamics + PD compensation"| TORQUE["Joint torque τ"]
        TORQUE --> HAND["16-DOF Allegro Hand"]
        HAND --> |"state feedback"| OBS
    end

    style RL fill:#e8f4f8,stroke:#2196F3
    style FABRIC fill:#fff3e0,stroke:#FF9800
    style ROBOT fill:#e8f5e9,stroke:#4CAF50
Figure 1: Overview of the FGP system architecture

Comparison with existing control frameworks

A natural question arises at this point: “How is this different from OSC or impedance control?” The key differences are summarized below:

graph LR
    subgraph TRAD["Traditional RL + controller"]
        P1["RL policy"] -->|"target position/velocity"| C1["OSC / Joint PD"]
        C1 -->|"generates straight-line motion"| R1["Robot"]
    end

    subgraph FGP_ARCH["FGP (Fabric-Guided Policy)"]
        P2["RL policy"] -->|"Force"| F2["Geometric Fabric"]
        F2 -->|"nonlinear behavior + constraint guarantees"| C2["Torque controller"]
        C2 --> R2["Robot"]
    end

    style TRAD fill:#ffebee,stroke:#f44336
    style FGP_ARCH fill:#e8f5e9,stroke:#4CAF50
Figure 2: Conventional controllers vs. Geometric Fabric based control
| Property | OSC / Joint PD | Dynamic Movement Primitives | Geometric Fabrics |
|---|---|---|---|
| Generated motion | Straight line (task/joint space) | Learned trajectory | Nonlinear geometric paths |
| Stability guarantee | Conditional | Under specific conditions | Mathematically proven |
| Constraint handling | Needs separate handling | Needs separate handling | Built in (joint limits, acceleration, jerk) |
| Behavioral richness | Low | Medium | High |
| Coupling with RL | Action space = target position | Parameter learning | Action space = force space |
| Design time | Minutes | Requires demonstrations | Hours (but reduces RL training cost) |

Application: In-Hand Cube Reorientation

Fabric design for the Allegro Hand

The paper applies the framework to an in-hand cube reorientation task using a 16-degree-of-freedom Allegro Hand V4 (4 fingers × 4 joints). The task is to rotate a cube resting on the palm to a goal pose without dropping it.

The fabric design consists of the following core components:

Attraction

Each fingertip is designed to be drawn toward the cube surface. The magnitude of the attraction depends on the distance between the fingertip and the cube, and is expressed through a combination of the potential function \psi and the metric M_f. The intuition is clear: manipulation is only possible while the fingers are in contact with the object, so the fabric itself has a default tendency to “move the fingertips toward the cube”.

A tendency to curl inward is also encoded, because fingers can hold an object more stably when they are curled than when they are extended.

Repulsion

A repulsive force arises as a joint approaches its position limit. It acts as a soft barrier that keeps the joint from exceeding its physical limits and maintains a safe range of motion without explicit clamping.

Energization

The energization coefficient \alpha_L regulates the overall energy level of the fabric. The main hyperparameter the paper varies at deployment, \beta, is a separate quantity: it is the geometrically consistent damping scalar in the fabric equation above, not the energization coefficient.

Acceleration and Jerk Handling

This is the most important part for deployment on real hardware. The paper handles acceleration and jerk constraints with a closed-form quadratic programming (QP) solution, which makes it possible to generate motor-safe commands without separate low-pass filtering.

Earlier systems such as DeXtreme had to apply a heavy low-pass filter to the policy output, which reduced responsiveness and still did not remove high-frequency content completely. FGP handles the constraints directly at the fabric level, so no filter is needed.

Action Space

The action space of the RL policy is the force applied to the fabric. The policy outputs a force vector for each finger's fabric state, and this force is added to the fabric's default dynamics to determine the final behavior. If the policy outputs the zero vector, only the fabric's default behavior is executed.

Reinforcement learning setup

Simulation environment

  • Simulator: NVIDIA Isaac Gym (GPU-based massively parallel simulation)
  • Number of parallel environments: thousands (the paper runs a setup similar to DeXtreme's)
  • RL algorithm: PPO (Proximal Policy Optimization)
  • Policy network: MLP (Multi-Layer Perceptron)

Cube Disturbance Wrench

As a key technique for sim-to-real transfer, the paper applies a random full wrench (both force and torque) to the cube during training. This greatly improves the policy's robustness to physical differences between simulation and reality, such as friction coefficients and contact dynamics.
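Sampling such a disturbance is straightforward. The sketch below draws a six-dimensional wrench with independent uniform components. The uniform distribution, the magnitude bounds, and the name `sample_wrench` are all assumptions made for illustration; this review does not state how the paper samples or schedules the disturbance.

```python
import random
from typing import List, Optional


def sample_wrench(max_force: float = 1.0, max_torque: float = 0.05,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Return [fx, fy, fz, tx, ty, tz] with each component drawn uniformly.

    max_force (N) and max_torque (N*m) are hypothetical bounds.
    """
    rng = rng or random.Random()
    force = [rng.uniform(-max_force, max_force) for _ in range(3)]
    torque = [rng.uniform(-max_torque, max_torque) for _ in range(3)]
    return force + torque


# Seeded generator so a training run can be reproduced.
wrench = sample_wrench(rng=random.Random(0))
```

In a training loop the sampled wrench would be applied to the cube body through the simulator's own force API at whatever interval the randomization schedule prescribes.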

Reward function simplification

This is one of FGP's biggest practical advantages. Because the fabric already encodes the following:

  • Guiding the fingertips into contact with the cube
  • An inward-curled posture
  • Respecting joint limits
  • Acceleration/jerk constraints

there is no need to design them separately in the reward function. The paper argues that an experienced practitioner with suitable software tooling can design and tune a fabric within hours, which is much faster than repeatedly revising a complex reward function and rerunning long RL training.

Numerical Integration

The fabric state is numerically integrated independently of the simulator's physics engine. This lets the fabric maintain a “virtual world” separate from the simulator's physics while its results are fed into the control of the actual robot.

Experiments: results and analysis

Performance metrics

The paper uses two main metrics:

  • Consecutive Successes (CS): the number of goal poses reached in a row without dropping the cube. Reaching a goal pose to within 0.4 radians counts as a success, after which a new goal is issued.
  • Rotations Per Minute (RPM): the number of rotations per minute, which indicates the policy's manipulation speed.
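The 0.4 radian success test and the CS count can be sketched as follows. This assumes orientations are unit quaternions in (w, x, y, z) order and measures the error as the angle of the relative rotation. Both are assumptions for illustration, as are the helper names; the review does not state which orientation error the paper uses.

```python
import math
from typing import Sequence


def rotation_distance(q1: Sequence[float], q2: Sequence[float]) -> float:
    """Angle in radians of the rotation taking unit quaternion q1 to q2."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))  # abs: q and -q are the same rotation
    return 2.0 * math.acos(min(1.0, dot))          # min guards against rounding above 1


def is_success(cube_quat: Sequence[float], goal_quat: Sequence[float],
               tol_rad: float = 0.4) -> bool:
    """True when the cube is within tol_rad of the goal orientation."""
    return rotation_distance(cube_quat, goal_quat) <= tol_rad


def consecutive_successes(outcomes: Sequence[bool]) -> int:
    """Count goals reached in a row before the first failure (e.g. a drop)."""
    count = 0
    for reached in outcomes:
        if not reached:
            break
        count += 1
    return count


goal = (1.0, 0.0, 0.0, 0.0)
cube = (math.cos(0.15), 0.0, 0.0, math.sin(0.15))  # 0.3 rad about z
reached = is_success(cube, goal)
```

Taking the absolute value of the dot product matters because a quaternion and its negation describe the same orientation. Without it, a cube sitting exactly on the goal could be reported as a full turn away.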

Main results

Performance as a function of β

\beta is the geometric damping scalar of the fabric (not the energization coefficient); it controls how strongly the fabric damps motion at deployment. The paper tested \beta = \{2.5, 10, 20, 30, 40, 50\}.

| Setting | Mean CS | Median CS | RPM | Notes |
|---|---|---|---|---|
| FGP β=2.5 | Low | - | Low | Weak fabric influence, high policy freedom |
| FGP β=10 | Medium | - | Medium | |
| FGP β=20 | High | - | High | |
| FGP β=30 | Very high | - | Very high | |
| FGP β=40 | Best | Best | Best | Best setting |
| FGP β=50 | High | - | High | Excessive fabric dominance |

At the best setting, \beta=40, FGP achieved 186 CS in a single run, meaning the cube reached the goal pose 186 times in a row without being dropped.

Comparison with DeXtreme

| Metric | FGP (β=40) | DeXtreme (new) | Notes |
|---|---|---|---|
| Mean CS | 94.1 | 244.6 | FGP is more than 3x the earlier DeXtreme model; the DeXtreme (new) mean is lifted by a few very long runs |
| Median CS | 85.5 | 70 | FGP has the higher median |
| Max CS | 186 | 670 | DeXtreme's longest single run is longer, but FGP is more consistent |
| RPM | Best | Moderate | FGP has the fastest and most consistent rotation speed |
| Action smoothness | Nearly 0 above 5 Hz | Components present above 5 Hz | FGP's spectrum is dramatically cleaner |
| Low-pass filter | Not needed | Required | A fundamental difference in hardware safety |

To summarize the key results:

  1. On mean CS, FGP improved more than 3x over the previous SOTA (the earlier DeXtreme model).
  2. Its RPM is the highest and most consistent, so the policy manipulates quickly and steadily.
  3. Spectral analysis shows that FGP's action noise is dramatically lower. Frequency components above 5 Hz are close to zero, whereas DeXtreme retains high-frequency content despite heavy low-pass filtering.

What action smoothness means

The practical significance of this result is large. A clean spectrum means:

  • Stress on the motors drops dramatically → longer hardware life
  • No separate filtering is needed → lower system complexity and lower latency
  • The policy behaves more predictably → better safety and reliability

This goes beyond “the numbers look good”: it is a key property for making industrial-grade policy deployment realistic.

Sim-to-Real Transfer

The paper successfully transfers a policy trained in simulation to a real Allegro Hand V4. Key observations on the real robot:

  • FGP showed robust recovery behavior under external disturbances (for example, the cube being pushed away with a finger).
  • The higher \beta was, the stronger the fabric's "guiding behavior" became, and recovery from disturbances was faster and more stable.
  • Motor-safe commands were generated even without a low-pass filter.

Critical Discussion

Strengths

1. An elegant union of theory and practice

The paper's greatest virtue is that it connects geometric fabrics, a mathematically rigorous theory, to actual SOTA performance. Many theoretical frameworks are validated only on simple tasks, whereas this paper reaches SOTA on contact-rich manipulation with a 16-DOF hand, one of the most challenging tasks.

2. Structural safety guarantees

The safety of the RL policy is ensured structurally rather than post hoc. If a low-pass filter "cleans up bad actions after the fact," a geometric fabric creates "a structure in which bad actions are hard to produce in the first place." The difference is fundamental.

3. Simpler reward engineering

Because the fabric encodes the basic characteristics of the behavior, the reward function can focus solely on what to achieve. This greatly eases a practical difficulty of RL.

4. Composability

An important theoretical property of geometric fabrics is that they are closed under the pullback and combination operations. This lets complex behaviors be built by composing simpler components, and it works naturally on a transform tree.
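The composition idea can be illustrated with a small metric-weighted pullback, in the style of the RMP/fabric algebra. This is a minimal sketch, not the paper's implementation: the function name `pullback_combine`, the 2-DOF setup, and the choice to ignore curvature terms (the Jacobian time-derivative) are all assumptions made for illustration.

```python
def pullback_combine(tasks):
    """Combine 1-D task policies into one 2-DOF joint acceleration.

    Each task is (J, m, a): a 1x2 Jacobian row, a scalar metric weight, and a
    desired task-space acceleration. Curvature terms (J_dot * q_dot) are
    ignored, which is an assumption of this sketch.
    """
    # Pulled-back metric M = sum m * J^T J and force b = sum m * a * J^T.
    M = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for J, m, a in tasks:
        for r in range(2):
            b[r] += m * a * J[r]
            for c in range(2):
                M[r][c] += m * J[r] * J[c]
    # Solve the 2x2 system M qdd = b with Cramer's rule.
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if abs(det) < 1e-12:
        raise ValueError("tasks do not constrain every joint direction")
    qdd0 = (b[0] * M[1][1] - M[0][1] * b[1]) / det
    qdd1 = (M[0][0] * b[1] - M[1][0] * b[0]) / det
    return [qdd0, qdd1]


# Two tasks that disagree about joint 0 are blended by their metric weights.
blended = pullback_combine([([1.0, 0.0], 3.0, 1.0),
                            ([1.0, 0.0], 1.0, -1.0),
                            ([0.0, 1.0], 1.0, 0.0)])
```

In the usage lines, the task with weight 3 dominates the task with weight 1, so joint 0 receives the weighted average of the two requested accelerations rather than either one outright.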

5. Removal of high-frequency noise

Achieving action smoothness without a low-pass filter is a paradigm shift for hardware deployment. Guaranteeing safe actions without a filter means that safety can be secured while latency is reduced.

Weaknesses and Limitations

1. Fabric design requires expertise

The paper notes that "an experienced practitioner can design one within a few hours," but this presupposes a substantial understanding of geometric fabrics theory. It requires a relatively non-mainstream mathematical background, including Finsler geometry and spectral semi-sprays, which can be a barrier to entry for the typical robotics researcher. If RL reward design is "hard but intuitive," fabric design is "theoretically powerful but with a high barrier to entry."

2. No generalization across object geometry

The paper states explicitly: "We did not focus on generalization across object geometries." The experiments were run on a single cube only. How a fabric can be adapted to a variety of object shapes remains an open question. The recent follow-up work DextrAH-RGB (2024) has started to explore this direction, but it is still at an early stage.

3. Task-Specific Fabric Design

The current fabric is custom-designed for one specific task, in-hand cube reorientation. Applying it to other tasks (for example, tool use, bimanual manipulation, or deformable object manipulation) requires designing a new fabric from scratch. A general-purpose "foundation fabric" does not yet exist.

4. Sensitivity to the β hyperparameter

Performance varies considerably with \beta, and several experiments are needed to find the best value. This adds another tuning axis to fabric design. However, good performance is observed over a fairly wide range of \beta, so this is not a critical problem.

5. Dependence on simulation

The approach depends heavily on massively parallel simulation (Isaac Gym) and GPU resources. This limitation is shared with DeXtreme, but adding the fabric layer can further increase simulation complexity. Because the fabric is integrated numerically, independently of the physics engine, care is needed with synchronization and numerical stability.

Comparison with Related Work

Positioning

This paper sits at the intersection of several research threads:

mindmap
  root((Geometric Fabrics<br/>for Policy Learning))
    Geometric Control
      Riemannian Motion Policies (RMPflow)
      Geometric Control (energy shaping)
      Finsler Geometry for Robotics
      Optimization Fabrics
    Dexterous Manipulation
      OpenAI Rubik's Cube (Shadow Hand)
      DeXtreme (Allegro Hand)
      DexPBT
      HORA
    RL + Structured Control
      Residual Policy Learning
      Hybrid RL-Control
      TamedPUMA (IL + Fabrics)
    Sim-to-Real Transfer
      Domain Randomization
      Automatic Domain Randomization (ADR)
      System Identification
Figure 3: Map of related work

Key Comparisons

vs. RMPflow

The direct ancestor of geometric fabrics is RMPflow (Cheng et al., 2018). RMPflow proposed a framework for combining policies defined in multiple task spaces in a geometrically consistent way. Geometric fabrics generalize it with Finsler geometry and add stability guarantees. If RMPflow was "practical but short on theoretical guarantees," geometric fabrics are "practical and theoretically sound as well."

vs. OpenAI / DeXtreme

OpenAI's Shadow Hand project and DeXtreme are pioneering works that achieved dexterous manipulation with end-to-end RL. In both, however, the policy outputs joint targets directly, and the safety of the actions relied entirely on low-pass filtering and reward regularization. FGP adds a fabric layer to this structure and improves safety and performance at the same time.

vs. Residual Policy Learning

Residual policy learning adds a learned correction to the output of an existing controller. It shares a similar spirit with FGP, but it differs fundamentally in that it lacks the geometric structure and stability guarantees that a fabric provides.

vs. TamedPUMA

The recent work TamedPUMA combines imitation learning (IL) with geometric fabrics. Whereas FGP uses RL, TamedPUMA learns the policy with IL and uses the fabric to enforce collision avoidance and joint limits. Both exploit the fabric's role as a safe guide, but their approaches differ and are complementary.

vs. DextrAH-RGB

This is a direct follow-up to FGP that combines an RGB-input vision policy with a fabric. Where FGP tracked the cube pose with motion capture, DextrAH-RGB performs grasping end-to-end from a stereo camera. It is an important follow-up that demonstrates the extensibility of the fabric framework.

Practical Takeaways: Notes for Allegro Hand Researchers

Concrete takeaways from this paper for practitioners working with the Allegro Hand V4:

  1. Joint-limit handling: the fabric's repulsion term lets joint limits be handled softly. This is dynamically more natural than hard clamping and, especially in fast motions, avoids abrupt discontinuities.

  2. Use torque control mode: FGP is built on torque control. Using the Allegro Hand's torque control mode instead of its position control mode makes it possible to take full advantage of the fabric.

  3. Closed-form handling of acceleration/jerk constraints: treating motor-protection constraints with a QP that has a closed-form solution is a practical technique that carries over directly to other tasks.

  4. Disturbance wrench for sim-to-real: applying random wrenches to the cube is an effective way to narrow the sim-to-real gap in contact-rich manipulation.

  5. Reusability of the fabric design: fabric components such as joint-limit avoidance and finger curling are likely to be reusable in other in-hand manipulation tasks, not only cube reorientation.

Future Research Directions

Directions proposed by the paper's authors, together with additional directions suggested by this reviewer:

  1. Extension to other robot platforms: applying the method to robotic hands other than the Allegro Hand (LEAP, Shadow Hand, humanoid hands)
  2. Extension to other tasks: tool use, bimanual manipulation, deformable object manipulation, and others beyond in-hand manipulation
  3. Automated fabric design: methods to learn or optimize the fabric automatically instead of designing it by hand
  4. Foundation Fabric: development of a base fabric that applies generally across many tasks
  5. Combination with VLA models: a hybrid architecture that uses the output of a Vision-Language-Action model as the fabric's forcing
  6. Extension to locomotion and whole-body control: applying fabrics to whole-body control of legged robots or humanoids (early attempts such as Kinodynamic Fabrics already exist)

Summary and Conclusion

This paper fundamentally rethinks a structural limitation of robot reinforcement learning. The core insight is simple but powerful: if the dynamical foundation on which an RL policy operates is itself made geometrically rich and safe, policy learning becomes easier, the results become safer, and performance improves.

  • Problem: conventional RL policies operate on top of simple controllers (OSC, PD), so they must learn all nonlinear behavior themselves, and the results are not safe.
  • Solution: insert artificial dynamics defined by geometric fabrics between RL and the real robot, and train the policy on top of a safe, guiding behavioral foundation.
  • Results: on Allegro Hand in-hand cube reorientation, mean CS improves by more than 3x over the previous SOTA, action noise above 5 Hz is almost eliminated, and no low-pass filter is needed.

The framework's message to the robotics community is clear: "Before learning a smarter policy, make the soil in which the policy operates more fertile." Geometric fabrics are a mathematically rigorous yet practical tool that provides that fertile soil.


⛏️ Dig Review

⛏️ Dig: Go deep, uncover the layers. Dive into technical detail.

1. Introduction: "Don't make the policy learn everything" (moving safety and structure into the dynamics layer)

RL on robots is hard for more than the usual reason that "the state-action space is large." What the policy actually faces is second-order dynamics (inertia, Coriolis forces, contact, friction, gravity, and so on), and those dynamics entangle how an action changes the state. The paper addresses this head-on: an RL policy has to untangle that coupling through vast experience and elaborate reward design, and as a result it tends to produce bang-bang (abrupt) actions that are hard on the hardware. The common architecture looks like this:

  • the policy outputs joint target angles or target velocities, and
  • a PD (or OSC) controller underneath performs straight-line (linear) target tracking.

But straight-line tracking in joint or task space captures almost none of the rich nonlinear behavior a robot actually needs (for example, curving naturally around obstacles, avoiding joint limits, or manipulating while maintaining contact). All of that nonlinearity then has to be learned by RL through rewards, and reward-design difficulty and safety problems explode. The paper's core proposal, in one sentence:

Reduce what the policy has to learn.
Do so by laying the "behavioral dynamics" produced by geometric fabrics underneath the policy,
so that the policy only learns safe, guided control on top of it, in a changed action space.

2. Method: Geometric Fabrics → Behavioral Dynamics → Fabric-Guided Policy (FGP)

2.1 The big picture (architecture)

The paper layers "artificial second-order dynamics" on top of the "real robot dynamics" and blends the two. The artificial dynamics are the (forced, energized) geometric fabric, and the combination is the behavioral dynamics. The diagram below restates the paper's structure from a control perspective.

flowchart TB
  subgraph Policy["RL Policy π(a|o)"]
    O[Observations o_t] --> A[Actions a_t]
  end

  subgraph FabricLayer["Geometric Fabric / Behavioral Dynamics"]
    A --> Fdrive["Driving force f_π(q, qdot, a)"]
    Fdrive --> FabricDyn["Artificial 2nd-order dynamics: q̈_f = h(q_f, q̇_f) + ..."]
    FabricDyn --> Qdes["Target/desired states (e.g., q_des or task-space force)"]
  end

  subgraph LowLevel["Low-level torque control"]
    Qdes --> Tau["τ = ID(q, q̇, q̈_des) + PD tracking"]
  end

  subgraph RealRobot["Real robot dynamics"]
    Tau --> Plant["M(q)q̈ + C(q,q̇) + g(q) + contacts = τ"]
    Plant --> O
  end

The key point is that, instead of emitting τ (torque) or q_des (target angle) directly, the policy emits an input (for example, a force-like action) that pushes on the fabric's guiding dynamics, which in turn makes safe, meaningful trajectories and behaviors the default.

2.2 (Equations) Forcing Energized Fabric: why is it "geometric" and why is it stable?

The paper treats a fabric as "a particular subclass of provably stable second-order systems." Some of the equations are elided in the HTML rendering, so the exact notation is not reproduced here, but the components named in the paper's text are clear:

  • State: (\mathbf{q}, \dot{\mathbf{q}}, \ddot{\mathbf{q}})
  • System metric \mathbf{M}(\cdot): plays the role of mass/priority (positive definite)
  • Fabric term: designed to be HD2 (homogeneous of degree 2) in velocity, so that it generates "geometric paths" that are speed-invariant
  • The gradient of a potential \psi(\mathbf{q}), a damping matrix \mathbf{B}, and a geometry-preserving scalar damping (mentioned as a separate scalar in the paper)
  • Energization coefficient \alpha: keeps the system energy-stable

Intuitively:

  • \mathbf{M} decides "which directions matter in this space" (weights/priorities),
  • the fabric (HD2) creates "the roads one wants to follow by default" (a road network of paths),
  • \nabla\psi, \mathbf{B}, and the scalar damping impose constraints, safety, and damping, and
  • energization balances the system so that it does not run away in terms of energy.

The "road network" metaphor is the central intuition emphasized by the fabrics theory paper (arXiv:2309.07368): a geometric fabric lays out a speed-invariant road network, and the policy reaches its goal by modulating speed and switching roads on top of it.
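The speed-invariance of HD2 dynamics can be checked numerically with a toy system. The sketch below is only an illustration: the acceleration field `hd2_accel` is a made-up example that is homogeneous of degree 2 in velocity, not the fabric used in the paper, and the integrator is plain explicit Euler.

```python
def hd2_accel(q, v):
    """Toy HD2 geometry: acceleration scales with |v|^2 and pulls toward the origin."""
    speed_sq = v[0] * v[0] + v[1] * v[1]
    return [-speed_sq * q[0], -speed_sq * q[1]]


def rollout(q0, v0, dt, steps):
    """Explicit-Euler rollout that returns the list of visited positions."""
    q, v = list(q0), list(v0)
    path = [tuple(q)]
    for _ in range(steps):
        a = hd2_accel(q, v)
        q = [q[i] + dt * v[i] for i in range(2)]
        v = [v[i] + dt * a[i] for i in range(2)]
        path.append(tuple(q))
    return path


# Same start, same direction, twice the speed, half the time step.
slow = rollout([1.0, 0.0], [0.0, 0.5], dt=0.01, steps=400)
fast = rollout([1.0, 0.0], [0.0, 1.0], dt=0.005, steps=400)
max_gap = max(abs(s[i] - f[i]) for s, f in zip(slow, fast) for i in range(2))
```

Doubling the initial speed while halving the time step visits the same sequence of positions, so `max_gap` stays at numerical zero: the path is a property of the geometry, and speed only changes how fast the path is traversed.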

2.3 (Equations) Behavioral Dynamics: coupling the artificial and real dynamics

The paper rewrites the artificial fabric dynamics as an "artificial force" and connects it to the real robot dynamics through a torque control law (paper Sections II-B and II-C). Conceptually:

  1. The artificial dynamics (the fabric) can be viewed as \ddot{\mathbf{q}}_f = \text{(fabric dynamics)} + \text{(driving force from the policy)}. The paper notes that this can be rewritten compactly in "artificial force" form.

  2. The real robot dynamics can be thought of as having the form \mathbf{M}(\mathbf{q})\ddot{\mathbf{q}} + \mathbf{f}(\mathbf{q},\dot{\mathbf{q}}) = \boldsymbol{\tau} (including contact, friction, Coriolis, and gravity terms); the paper names \mathbf{M} and \mathbf{f} explicitly.

  3. What links the two is the torque control law \boldsymbol{\tau}(\cdot). The paper says it mainly uses a form that is common in practice:

    • inverse dynamics compensation + joint PD tracking.

With this law, \mathbf{q}\approx \mathbf{q}_f can be enforced tightly in free space, while in contact the "separation" between the fabric state and the real state induces contact forces (an idea similar to impedance/admittance control). This link matters a great deal from a control standpoint: rather than having the policy model and match contact forces directly, contact is designed to arise naturally from the separation produced by the fabric/PD layer, which makes manipulation easier for the policy to learn on top of it.
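The coupling between the artificial state and the real state can be sketched with a 1-DOF toy simulation. Everything here is an illustrative assumption rather than the paper's setup: the function `simulate_contact`, the spring-damper stand-in for the fabric, the gains, the unit inertia, and the rigid wall model. Inverse dynamics compensation is omitted because the toy joint has no gravity or Coriolis terms.

```python
def simulate_contact(target=1.0, wall=0.6, policy_force=0.0,
                     k_fabric=4.0, b_fabric=4.0, kp=50.0, kd=10.0,
                     dt=0.002, steps=5000):
    """1-DOF sketch: a fabric state drives a PD-tracked joint into a rigid wall."""
    qf, vf = 0.0, 0.0      # artificial (fabric) state
    q, v = 0.0, 0.0        # real joint state, unit inertia assumed
    tau = 0.0
    for _ in range(steps):
        # Artificial second-order dynamics: attractor + damping + policy forcing.
        af = -k_fabric * (qf - target) - b_fabric * vf + policy_force
        vf += dt * af
        qf += dt * vf
        # Torque law: PD tracking of the fabric state.
        tau = kp * (qf - q) + kd * (vf - v)
        v += dt * tau
        q += dt * v
        # Rigid wall: the real joint cannot pass it and loses its velocity.
        if q > wall:
            q, v = wall, 0.0
    return qf, q, tau


qf_free, q_free, _ = simulate_contact(wall=10.0)       # no contact
qf_hit, q_hit, tau_hit = simulate_contact()            # blocked at 0.6
_, _, tau_push = simulate_contact(policy_force=0.8)    # policy pushes harder
```

In free space the real joint converges to the fabric state. Against the wall the fabric state keeps moving to its target, and the steady torque equals the PD stiffness times the separation, so a policy that pushes the fabric further raises the contact force without ever modeling contact explicitly.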

2.4 (Key idea) Redefining the policy action space: push with a fingertip-space force instead of setting joint targets

The most striking design choice in the paper's implementation (Allegro Hand cube reorientation) is the action space:

  • the RL action a_t is not used directly as a joint target;

  • it is converted into a force in fingertip space; and

  • it is pulled back through the Jacobian to the configuration (root) space and injected into the fabric.

Summarizing the paper's description:

  • the action is clamped to a range such as [-1, 1] and scaled to produce a fingertip-space force, and

  • that force is pulled back to the root through the Jacobian and added to the fabric's forcing term.

Why does this matter?

  • The policy does not have to design fine-grained joint-angle trajectories itself.

  • Even if the policy only emits a rough, bang-bang signal for "which fingertip to push, in which direction, and how hard,"

  • the fabric supplies the default paths, constraint avoidance, and damping,

  • so the resulting behavior is steered toward safe, smooth motion (or at least motion with high-frequency noise suppressed).

Indeed, the paper presents a spectral analysis showing that although the raw RL actions are bang-bang, the resulting target joint signal is close to zero above 5 Hz.
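The clamp, scale, and pull-back steps can be written out in a few lines. This is a minimal sketch under stated assumptions: the function name `action_to_joint_forcing`, the `force_scale` parameter, and the tiny 2x3 Jacobian in the usage lines are hypothetical, and the real system uses the hand's fingertip Jacobians.

```python
def action_to_joint_forcing(action, jacobian, force_scale=1.0):
    """Map a raw policy action to a joint-space forcing term via J^T f."""
    # Clamp each action component to [-1, 1], then scale to a fingertip force.
    force = [force_scale * max(-1.0, min(1.0, a)) for a in action]
    n_joints = len(jacobian[0])
    # Pull the task-space force back to joint space: tau_i = sum_k J[k][i] * f[k].
    return [sum(jacobian[k][i] * force[k] for k in range(len(force)))
            for i in range(n_joints)]


# Hypothetical 2-D fingertip task on a 3-joint finger.
J = [[1.0, 0.0, 0.0],
     [0.0, 2.0, 0.0]]
forcing = action_to_joint_forcing([0.5, 3.0], J, force_scale=2.0)
```

The second action component (3.0) saturates at 1.0 before scaling, so even an out-of-range or bang-bang action produces a bounded forcing term that the fabric then shapes further.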

2.5 (Equations) Handling acceleration and jerk limits with a QP (with a closed-form solution)

Two constraints that are often ignored on real robots, only to come back as serious failures, are:

  • the joint acceleration limit
  • the joint jerk limit

The paper presents a simple QP that folds these naturally into the fabric/torque-law design. The core idea:

  • when a desired \ddot{\mathbf{q}} (or \ddot{\mathbf{q}}_f) is produced,

  • do not use it as-is;

  • instead, solve for a scalar such as \eta that scales it down until the limits are satisfied.

According to the paper's text:

  • the QP has a closed-form solution, and \ddot{\mathbf{q}}\to 0 as \eta \to \infty,

  • so the acceleration can be made as small as needed.

The jerk limit is handled by computing the maximum possible jerk from a discretized jerk model and resetting the acceleration limit more conservatively.

Sketch (practical form)

def limit_acceleration(qdd_des, qdd_max, j_max, dt):
    """Scale a desired joint acceleration (from fabric/policy) to respect limits."""
    # 1) jerk -> effective accel limit (the paper's discretized-bound idea):
    #    a tight jerk limit forces a smaller acceleration limit as well.
    qdd_max_eff = min(qdd_max, (j_max * dt) / 2)

    # 2) Scaling QP (closed-form in the paper): choose the smallest scalar
    #    eta >= 0 such that ||qdd_des / (1 + eta)||_inf <= qdd_max_eff.
    peak = max(abs(a) for a in qdd_des)
    eta = max(0.0, peak / qdd_max_eff - 1.0)

    # 3) Intuitive form only; the paper derives its own closed-form expression.
    return [a / (1.0 + eta) for a in qdd_des]

Note: the exact closed-form expression depends on the paper's equations (5) to (10); the essential structure is that a single scalar adjusts the magnitude of the whole acceleration vector so that the constraints are satisfied.
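One way to motivate the jerk-derived acceleration cap and the scalar \eta in the sketch above is the short derivation below. It assumes a finite-difference jerk model and the illustrative scaling \ddot{\mathbf{q}}_{\text{safe}} = \ddot{\mathbf{q}}_{\text{des}}/(1+\eta); the paper's own equations may differ in detail.

```latex
% Finite-difference jerk with both accelerations inside the limit:
\[
\dddot{q}_k \approx \frac{\ddot{q}_k - \ddot{q}_{k-1}}{\Delta t},
\qquad
|\ddot{q}_k|,\ |\ddot{q}_{k-1}| \le \ddot{q}_{\max}
\;\Longrightarrow\;
|\dddot{q}_k| \le \frac{2\,\ddot{q}_{\max}}{\Delta t}.
\]
% Requiring the worst case to respect the jerk limit j_max:
\[
\frac{2\,\ddot{q}_{\max}}{\Delta t} \le j_{\max}
\;\Longleftrightarrow\;
\ddot{q}_{\max} \le \frac{j_{\max}\,\Delta t}{2},
\qquad
\ddot{q}_{\max}^{\text{eff}} = \min\!\Big(\ddot{q}_{\max},\ \tfrac{j_{\max}\,\Delta t}{2}\Big).
\]
% Smallest scaling that satisfies the infinity-norm bound:
\[
\eta^\star = \max\!\Big(0,\ \frac{\lVert \ddot{\mathbf{q}}_{\text{des}} \rVert_\infty}{\ddot{q}_{\max}^{\text{eff}}} - 1\Big),
\qquad
\ddot{\mathbf{q}}_{\text{safe}} = \frac{\ddot{\mathbf{q}}_{\text{des}}}{1+\eta^\star}.
\]
```

Because one scalar rescales every joint, the direction of the desired acceleration is preserved and only its magnitude shrinks.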

2.6 Fabric design (Allegro Hand): Attraction / Repulsion / Energization / Geometric Damping

The paper breaks down the components that make up the fabric for cube reorientation.

(1) Attraction: "contact and caging by default"

  • Fingertips are "pulled" toward the cube center, which induces contact.
  • Finger joints default to an inward "curled" posture, which provides a caging effect.

This design gives RL the foundations of manipulation, such as maintaining contact and wrapping around the object's shape, without any reward term.

(2) Repulsion: "joint limits as a barrier metric"

  • Task spaces are defined for the upper and lower joint limits.
  • The diagonal entries of the metric are designed as a barrier that grows as a joint approaches its limit.
  • The acceleration term is also set to act in the positive direction, away from the limit.

The important point is that the constraint is not merely added as a penalty; it is built into the metric/acceleration structure of the dynamics.
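A barrier-style joint-limit term can be sketched as follows. The inverse-distance forms, the `gain` and `eps` parameters, and the merging of the upper and lower limits into one function are assumptions chosen for illustration; the paper defines separate task spaces for each limit and its own functional forms.

```python
def joint_limit_barrier(q, q_lower, q_upper, gain=1.0, eps=1e-6):
    """Toy barrier metric and repulsion for one joint (assumed 1/d^2 form)."""
    d_low = max(q - q_lower, eps)    # distance to the lower limit
    d_up = max(q_upper - q, eps)     # distance to the upper limit
    # Metric (priority) blows up as either distance shrinks toward zero.
    metric = gain / (d_low * d_low) + gain / (d_up * d_up)
    # Repulsion pushes away from whichever limit is closer.
    accel = gain / d_low - gain / d_up
    return metric, accel


mid_metric, mid_accel = joint_limit_barrier(0.0, -1.0, 1.0)
hi_metric, hi_accel = joint_limit_barrier(0.9, -1.0, 1.0)
```

In the middle of the range the term is weak and exerts no net push. Near the upper limit the metric grows by roughly two orders of magnitude and the acceleration points back into the range, so this term dominates any competing task when combined by metric weight.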

(3) Energization: "energy stability of the fabric itself"

The energization coefficient is computed from a theorem established in the earlier theory (optimization/geometric fabrics), which guarantees that the fabric is energy-stable.

(4) Geometrically consistent damping: "keep the path, slow only the speed"

The paper keeps damping small during training to help exploration and tries several values at deployment. In particular, it reports that increasing the damping yields

  • a path (the finger path) that is largely preserved, while
  • only the speed decreases; this "geometrically consistent" damping is favorable for sim2real.

3. Experiments: Allegro Hand in-hand cube reorientation (training speed, performance, hardware friendliness)

3.1 Experimental setup

  • Platform: 16-actuator, 4-finger Allegro Hand v4
  • Task: continuously reorient a cube in the hand
  • Perception: vision-based cube pose estimation (similar to the existing DeXtreme setup)

The paper uses the DeXtreme family (the previous SOTA line) as its baseline and compares the fabric-guided policy (FGP) on training, real-robot performance, and action noise.

3.2 Training observations: "learning smoothness" is harder for FGP, but achievable

Key points of the paper's Table I:

  • DeXtreme (new) tends to reach a given entropy level (for example, -0.5 npd) faster.

  • FGP tends toward lower entropy (more deterministic/smooth), but training can take longer.

  • The conclusion is that "a high-performing FGP can also be trained in about one week."

The important interpretation here:

  • Because FGP's action space is force-like and the dynamics underneath "clean up" the behavior,

  • the trade-off between exploration and performance can look different from the policy's point of view.

  • That is, safety and paths are preserved even if the policy explores in a more bang-bang manner,

  • but optimizing so that the bang-bang behavior disappears entirely is a separate problem (entropy, smoothness, and so on).

3.3 Real-robot performance: CS (consecutive successes) / RPM (speed) / high-frequency noise suppression

The paper looks at three metrics:

  • CS (consecutive successes): the number of consecutive successful rotations
  • RPM (rotations per minute): the speed of successful rotations
  • Action noise rejection: suppression of components above 5 Hz (analyzed with an FFT)
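To make the first two metrics concrete, here is a minimal sketch of how CS and RPM could be computed from a run log. The log format (a list of timestamped `"start"`, `"success"`, and `"drop"` events), the function name `summarize_run`, and the choice to count each success as one rotation are assumptions; the paper's exact bookkeeping may differ.

```python
def summarize_run(events):
    """Return (longest success streak, successes per minute) for one run.

    events: list of (time_s, kind) with kind in {"start", "success", "drop"}.
    """
    best = streak = successes = 0
    for _, kind in events:
        if kind == "success":
            streak += 1
            successes += 1
            best = max(best, streak)
        elif kind == "drop":
            streak = 0   # a drop ends the current streak
    duration_min = (events[-1][0] - events[0][0]) / 60.0 if len(events) > 1 else 0.0
    rpm = successes / duration_min if duration_min > 0 else 0.0
    return best, rpm


log = [(0, "start"), (15, "success"), (30, "success"), (40, "drop"),
       (50, "success"), (60, "success"), (120, "success")]
cs, rpm = summarize_run(log)
```

On this hypothetical log the longest streak is 3 and the five successes over two minutes give 2.5 rotations per minute.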

(A) CS / RPM: trade-offs across damping values

In Table II, FGP's performance is reported for varying damping values. Key statements:

  • FGP's CS improves with increasing damping up to a point (a particular value), then degrades when damping becomes excessive.
  • FGP's RPM is better than DeXtreme's overall; small damping makes the fingers move faster but increases errors, so RPM does not necessarily improve.
  • As the motion becomes more "meticulous," precision and CS improve.

In one line:

FGP can tune sim2real performance with a damping knob that "adjusts only the speed while keeping the path."

(B) High-frequency noise suppression: "bang-bang actions, yet hardware-friendly"

This is a point the paper argues very strongly.

  • DeXtreme retains components above 5 Hz even with heavy low-pass filtering,

  • whereas FGP's spectral amplitude above 5 Hz is close to zero.

  • Interestingly, FGP's raw RL actions can be bang-bang, yet the final joint target signal is clean.

The practical implications are significant.

  • Piling post-processing filters (EMA/LPF) onto the policy output

    • hinders exploration,
    • reduces responsiveness, and
    • still does not guarantee complete safety.
  • FGP, by contrast, structurally attenuates the noise passed from the top of the stack to the bottom, which puts practicality in terms of hardware maintenance cost (wear and repair) front and center.
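The responsiveness cost of post-hoc filtering can be quantified with a standard exponential moving average (EMA). The sketch below counts how many control steps an EMA needs to reach 90% of a unit step; the function name `ema_steps_to_settle` and the example smoothing factors are illustrative and are not the values used by DeXtreme.

```python
def ema_steps_to_settle(alpha, threshold=0.9):
    """Count control steps an EMA filter needs to reach `threshold` of a unit step."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    y, steps = 0.0, 0
    while y < threshold:
        y += alpha * (1.0 - y)   # y_k = (1 - alpha) * y_{k-1} + alpha * u, with u = 1
        steps += 1
    return steps


heavy = ema_steps_to_settle(0.1)   # strong smoothing
light = ema_steps_to_settle(0.5)   # weak smoothing
```

Strong smoothing (alpha = 0.1) needs 22 steps to settle versus 4 steps for alpha = 0.5. At an assumed 60 Hz control rate, 22 steps is roughly 0.37 s of added lag, which is the kind of delay a structurally smooth action pathway avoids.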

3.4 Paper figure (text description): Fig. 1, FFT spectrum

Fig. 1 of the paper shows the FFT of the target joint angles (or a similar target signal) for FGP and DeXtreme.

    1. FGP: energy is concentrated below 2 Hz, and content above 5 Hz is close to zero.
    2. DeXtreme: components above 5 Hz remain comparatively large.

Interpretation: even when the policy "hammers roughly" (fast switching), the fabric + torque layer mechanically tidies it up in FGP, preventing high frequencies from being translated into joint targets.
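A spectrum check of this kind can be reproduced on any logged joint-target signal. The sketch below uses a naive DFT from the standard library to measure the share of non-DC energy at or above a cutoff; the function name `high_freq_fraction`, the 60 Hz sample rate, and the synthetic test signals are assumptions for illustration, not the paper's data.

```python
import cmath
import math


def high_freq_fraction(signal, sample_rate_hz, cutoff_hz=5.0):
    """Fraction of non-DC spectral energy at or above cutoff_hz (naive DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    total = high = 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate_hz / n >= cutoff_hz:
            high += energy
    return high / total if total > 0 else 0.0


rate = 60.0
smooth = [math.sin(2 * math.pi * 1.0 * t / rate) for t in range(120)]
noisy = [s + 0.5 * math.sin(2 * math.pi * 10.0 * t / rate)
         for t, s in enumerate(smooth)]
smooth_share = high_freq_fraction(smooth, rate)
noisy_share = high_freq_fraction(noisy, rate)
```

The pure 1 Hz signal has essentially no energy above 5 Hz, while adding a 10 Hz component at half the amplitude puts 20% of the energy above the cutoff. For long logs, an FFT library would be the practical choice over this quadratic-time loop.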

4. Critical discussion: strengths, weaknesses, limitations

4.1 Strength 1: safety moves from rewards/penalties into the dynamics

The paper's greatest virtue is that it takes safety out of RL reward shaping and builds it into the control/dynamics layer.

  • Joint limits: barrier metric + repulsion acceleration
  • Energy stability: energization-based stability
  • Jerk/acceleration: QP-based limiting (closed-form solution)

These are qualitatively different from adding a "joint-limit penalty" or an "acceleration penalty" to the reward. A penalty only makes violations less frequent during training, whereas building the constraint into the dynamics makes it hard for actions to translate into violations at all.

4.2 Strength 2: the power of redesigning the action space: "push with force, and let the roads handle the rest"

From a control perspective, making the policy action a fingertip-space force in FGP means:

  • What RL has to do: "where should I push, and in which direction?"
  • What RL does not have to do: "how do I make the joint trajectory smooth?", "how do I avoid the limits?", "how do I maintain contact?"

This division of labor has large potential not only for performance but also for policy transfer and reuse, since on the same fabric a policy's actions carry a fairly consistent meaning ("apply this force and the system responds like this").

4.3 Strength 3: a clear sim2real tuning knob (damping)

The damping knob demonstrated in the paper greatly improves practical debuggability.

  • Raising the damping at deployment reduces speed but preserves the path → more stable and more meticulous motion → possible sim2real improvement

With conventional end-to-end RL it is hard to work out why the policy shakes on the real robot; FGP at least offers a clear axis: slow down to mitigate dynamics, latency, and unmodeled effects.

4.4 Weakness 1: fabric design ends up being human work (design complexity/portability)

The paper's implementation is very clever, but it also leaves open questions:

  • How should the fingertip attractor targets be chosen?
  • How should the curled posture target be set?
  • How should the shape and gains of the barrier metric be tuned?
  • By what principle does composing energization and metrics preserve stability?

In other words, building a good fabric becomes another kind of control engineering. That is an advantage (structure, interpretability), but it can be a cost when extending to new domains.

The direction proposed by the reference work (arXiv:2309.07368) is to treat fabrics as a prior and to theorize them in a more general, easier-to-apply form; this paper grafts that framework firmly onto RL.

4.5 Weakness 2: the realistic scope of the guarantees: the policy driving force can, in theory, destabilize

The paper acknowledges that the driving force supplied by the policy can in principle destabilize the artificial dynamics. It adds that, in practice, stability can be maintained with sufficient damping and energy capping (a theoretically available method). So rather than claiming "complete safety guarantees," the more accurate position is:

"The fabric itself is a stable medium, the policy moves on top of it, and appropriate design (damping, energy caps, and so on) reduces the risk."


5. Comparison with related work: RMP (1801) ↔ Fabrics (2309) ↔ FGP (2405)

5.1 RMP (arXiv:1801.02854): combining, transforming, and composing "motion policies + Riemannian metrics"

RMP is a framework that:

  • defines, in each task space, an acceleration field (a second-order policy) and a metric (important directions/weights),

  • composes them in a geometrically consistent way (coordinate transforms, pushforward, pullback), and

  • obtains a final configuration-space policy.

This paper (2405) cites RMP directly as a reference framework and suggests that RMP is the broader concept that can contain fabrics (the text indicates that RMPs are broad and fabrics are a special subclass).

Differences (practical view)

  • RMP centers on modular policy composition, while

  • 2405 focuses on using that structure as the action space and safe medium for RL policy learning.

That is, if RMP is a "control structure," FGP goes one level further down and serves as the footing (medium) for RL training.

5.2 Fabrics theory (arXiv:2309.07368): "the policy navigates on a stable medium (road network)"

2309 defines fabrics as follows:

  • a fabric is an energy-conserving (conservative) autonomous second-order differential equation,
  • it forms a foundationally stable medium for policy design, and
  • a geometric fabric provides a speed-invariant road network, so the policy only needs to modulate speed and switch paths.

2405 implements this theory as an architecture that actually raises RL performance:

  • stability, constraints, and damping are built into the fabric, and
  • the policy "navigates" that medium with force-like actions.

5.3 The unique contribution of this paper (2405) in one sentence

It turns the stable geometric medium of fabrics into a concrete "control stack for learning" that includes a redesigned RL action space and hardware constraints (acceleration/jerk), and demonstrates real-robot performance and hardware friendliness (high-frequency suppression) at the same time.

6. Summary and conclusion: a more realistic way to handle "safety" in robot RL control

The message of this paper is quite practical.

  • Suppressing bang-bang RL policies with regularization and filtering has its limits.
  • If instead the floor the policy stands on (the dynamics) is changed,

    • the policy only has to learn simpler choices, and
    • safety, constraints, and smoothness can be secured structurally.
  • Geometric fabrics provide that floor as a "geometrically intuitive road network," and the policy navigates it.

(Appendix) One-page summary table

| Item | Typical RL control | Fabric-Guided Policy (this paper) |
|---|---|---|
| Policy output | Joint targets/torques (direct) | Fingertip-space force-like driving → injected into the fabric |
| Safety/constraints | Reward penalties/filters/clipping | Built into the structure: barrier metric, repulsion, energization, QP, and so on |
| High-frequency noise | May persist even when suppressed by filters | Spectrum close to zero above 5 Hz (experimental) |
| sim2real tuning | Opaque (causes are hard to identify) | Geometric damping adjusts only the speed (path preserved) |
| Philosophy | "The policy learns everything" | "The policy only navigates; the floor lays out the roads" |

References

  • Geometric Fabrics: a Safe Guiding Medium for Policy Learning (arXiv:2405.02250)
  • Fabrics: A Foundationally Stable Medium for Encoding Prior Experience (arXiv:2309.07368)
  • Riemannian Motion Policies (arXiv:1801.02854)

Copyright 2026, JungYeon Lee