Curieux.JY
  • JungYeon Lee

📃 Contact Trust Region Review (feat. DeXtreme)

mpc
rl
dexterous
contact
trust-region
contact-rich
dexterity
Dexterous Contact-Rich Manipulation via the Contact Trust Region
Published

June 13, 2025

CTR vs DeXtreme: two paths toward dexterous contact manipulation. This post analyzes in depth the mathematical principles and structure of model-based contact planning (MPC-CTR) and reinforcement-learning-based manipulation (DeXtreme), and compares the two methodologies from several perspectives.

  • Paper Link
  • Project Homepage
  • Code Link
  1. The paper proposes the Contact Trust Region (CTR), which accounts for unilateral contact mechanics in order to overcome the limitations of the traditional ellipsoidal trust region.
  2. 🤖 Building on CTR, the authors develop an efficient local Model Predictive Control (MPC) algorithm and combine it with an initial-guess heuristic and frequent replanning to stabilize complex contact-rich manipulation tasks.
  3. 🗺️ The proposed CTR-based local MPC is integrated into a roadmap framework for global planning, and demonstrates dexterous manipulation on complex systems such as a bimanual robot and the Allegro hand with far less computation time than existing approaches.

Brief Review

The paper "Dexterous Contact-Rich Manipulation via the Contact Trust Region" addresses the problem of defining an efficient local dynamics model, together with its trust region, for dexterous contact-rich manipulation. Many existing approaches rely on a Taylor approximation of the dynamics combined with an ellipsoidal trust region, but the paper argues that this combination is fundamentally inconsistent with the unilateral nature of contact.

To resolve this, the paper proposes the Contact Trust Region (CTR), which captures the unilateral nature of contact while remaining computationally efficient. Building on CTR, it first develops a Model-Predictive Control (MPC) algorithm that can synthesize local contact-rich plans. It then extends this capability by chaining local MPC plans together, which enables global planning and efficient, dexterous contact-rich manipulation.

The paper makes three main contributions. First, the Contact Trust Region (CTR), which efficiently approximates contact mechanics. Second, a highly efficient gradient-based MPC controller specialized for local contact-rich manipulation. Third, a global planner that connects local trajectories.

Core Methodology: Contact Trust Region (CTR)

The paper represents contact dynamics with the Convex Quasidynamic Differentiable Contact (CQDC) model, which formulates contact simulation as a Second-Order Cone Program (SOCP) of the following form:

$$
\begin{aligned} \min_{q_+} & \quad \frac{1}{2} q_+^\top P(q)q_+ + b(q, u)^\top q_+, \\ \text{subject to} & \quad J_i(q)q_+ + c_i(q) \in K_i, \quad \forall i \in I_c. \end{aligned}
$$

Here q is the system configuration, u is the robot's control input (actuated configuration command), P, b, J_i, c_i are matrices and vectors that depend on q and u, I_c is the index set of contact pairs, and K_i is the cone of feasible velocities. The KKT conditions of this SOCP encode the quasi-dynamic equations of motion together with the non-penetration, friction cone, and complementarity constraints.

Differentiating this model directly yields gradients that are discontinuous across contact mode switches. To mitigate this, the paper uses relaxed dynamics f_\kappa(q,u) obtained by log-barrier smoothing. The relaxed dynamics depend on the smoothing parameter \kappa and generate forces even between objects that are not in contact. Gradients of the smoothed dynamics are obtained through sensitivity analysis.

Using a Taylor approximation of the smoothed dynamics, the paper builds linear models of the next state \hat{q}_+ and the contact forces \hat{\lambda}_{+,i}:

$$
\begin{aligned} \hat{q}_+ &= A_\kappa \delta q + B_\kappa \delta u + f_\kappa(\bar{q}, \bar{u}), \\ \hat{\lambda}_{+,i} &= C_{\kappa,i} \delta q + D_{\kappa,i} \delta u + \lambda_{\kappa,i}(\bar{q}, \bar{u}). \end{aligned}
$$

Here (\bar{q}, \bar{u}) is the current nominal point and (\delta q, \delta u) is the perturbation.
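As a rough illustration of how the two affine models above are evaluated, the sketch below uses plain Python lists. All matrices and nominal values are made-up toy numbers, and the helper names (`matvec`, `predict`) are hypothetical; in the paper these quantities come from sensitivity analysis of the smoothed CQDC dynamics, which is not reproduced here.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def predict(jac_q, jac_u, nominal, dq, du):
    """Evaluate an affine model: jac_q @ dq + jac_u @ du + nominal."""
    from_q = matvec(jac_q, dq)
    from_u = matvec(jac_u, du)
    return [a + b + c for a, b, c in zip(from_q, from_u, nominal)]


# Toy numbers (assumptions for illustration, not taken from the paper).
A = [[1.0, 0.0], [0.0, 1.0]]   # plays the role of A_kappa
B = [[0.5], [0.0]]             # plays the role of B_kappa
f_bar = [1.0, 2.0]             # f_kappa(q_bar, u_bar)
C = [[0.0, -2.0]]              # plays the role of C_kappa,i
D = [[4.0]]                    # plays the role of D_kappa,i
lam_bar = [0.3]                # lambda_kappa,i(q_bar, u_bar)

dq, du = [0.1, 0.0], [0.2]
q_next_hat = predict(A, B, f_bar, dq, du)        # predicted next state
lam_next_hat = predict(C, D, lam_bar, dq, du)    # predicted contact force
```

The same `predict` helper serves both equations because the state and force models share the affine form; only the Jacobians and the nominal term differ.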

The Ellipsoidal Trust Region (ETR) constrains (\delta q, \delta u) through \delta z^\top \Sigma \delta z \leq 1, where \delta z stacks \delta q and \delta u. Because an ellipsoid is symmetric about the nominal point, it cannot capture the unilateral nature of contact.

The Contact Trust Region (CTR) adds to the ETR constraint the requirement that the next state \hat{q}_+ and contact forces \hat{\lambda}_{+,i} predicted by the linear model above satisfy the primal and dual feasibility constraints of the original, unrelaxed SOCP dynamics:

$$
\begin{aligned} J_i \hat{q}_+ + c_i &\in K_i, \\ \hat{\lambda}_{+,i} &\in K_i^*. \end{aligned}
$$

Because these constraints are imposed on linearized variables, CTR is still a convex set (specifically, an intersection of several second-order cone constraints). Examples 1 and 2 show that the primal feasibility constraint (J_i \hat{q}_+ + c_i \in K_i) sometimes restricts the trust region too conservatively compared with the region that is actually reachable.

The paper therefore proposes the Relaxed Contact Trust Region (R-CTR), which drops the primal feasibility constraint. R-CTR contains only the ETR constraint and the dual feasibility constraint (\hat{\lambda}_{+,i} \in K_i^*). Example 3 shows that the Motion Set built from the relaxed, action-only variant (RA-CTR), that is, the image of RA-CTR under the linearized primal solution map, captures the local reachability of object motion better. The paper also shows theoretically that RA-CTR and the resulting Wrench Set and Motion Set connect to classical concepts in contact mechanics (Lemma 2).
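To make the difference between ETR and R-CTR concrete, here is a minimal membership check in plain Python. It assumes each predicted contact force is stored as `[normal, tangential...]` and that the dual cone K_i^* is a standard Coulomb friction cone with coefficient `mu`; both the parametrization and the function names are illustrative assumptions, not the paper's implementation.

```python
import math


def in_ellipsoid(dz, sigma):
    """ETR check: dz^T Sigma dz <= 1."""
    n = len(dz)
    quad = sum(dz[i] * sigma[i][j] * dz[j] for i in range(n) for j in range(n))
    return quad <= 1.0


def in_friction_cone(force, mu):
    """Dual feasibility for one contact: ||tangential|| <= mu * normal."""
    normal, tangential = force[0], force[1:]
    return math.sqrt(sum(t * t for t in tangential)) <= mu * normal


def in_relaxed_ctr(dz, sigma, predicted_forces, mu):
    """R-CTR membership: ellipsoid plus friction cone on every predicted force."""
    return in_ellipsoid(dz, sigma) and all(
        in_friction_cone(force, mu) for force in predicted_forces
    )


sigma = [[4.0, 0.0], [0.0, 4.0]]          # ellipsoid of radius 0.5 (toy value)
step = [0.3, 0.2]                         # candidate perturbation dz
pushing = [[1.0, 0.2, 0.1]]               # predicted force pushes on the object
pulling = [[-0.1, 0.0, 0.0]]              # predicted force would pull on it

etr_ok = in_ellipsoid(step, sigma)
rctr_pushing = in_relaxed_ctr(step, sigma, pushing, mu=0.5)
rctr_pulling = in_relaxed_ctr(step, sigma, pulling, mu=0.5)
```

The same step lies inside the ellipsoid in both cases, but R-CTR rejects it when the linear model predicts a pulling (negative normal) force, which is the unilateral behavior the ellipsoid alone cannot express.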

Local Planning and Control

The proposed R-CTR is used for local trajectory optimization and MPC. Algorithm 1 improves a nominal trajectory by repeatedly solving an SOCP subproblem that includes the R-CTR constraints. The method relies on a linear approximation of the smoothed dynamics, and R-CTR keeps the plan inside the region where that approximation is locally valid. In particular, when starting from an initial state with no contact, an initial-guess heuristic that steers the robot into contact with the object is applied to make planning more efficient. Examples 4 and 5 illustrate how the method explores contact mode switches and moves in directions that benefit the plan.

Algorithm 2 applies Algorithm 1 within an MPC framework. It plans a trajectory from the current state over a future horizon, applies the first planned control input to the real system, observes the next state, and then plans again (re-planning).
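The plan, apply, observe, replan cycle described above can be sketched on a toy scalar system. This is only a stand-in for illustration: `plan_controls` replaces the SOCP-based trajectory optimizer with a simple clipped step toward the goal, the "true" dynamics are an integrator with an unmodeled disturbance, and all names and numbers are assumptions.

```python
def plan_controls(state, goal, horizon, radius):
    """Toy stand-in for the trajectory optimizer: step toward the goal,
    clipping each control to the trust-region radius."""
    controls, x = [], state
    for _ in range(horizon):
        u = max(-radius, min(radius, goal - x))
        controls.append(u)
        x += u
    return controls


def run_mpc(state, goal, steps, horizon=5, radius=0.1, disturbance=0.0):
    """Receding-horizon loop: plan, apply only the first control, observe, replan."""
    for _ in range(steps):
        first_control = plan_controls(state, goal, horizon, radius)[0]
        # Toy "true" dynamics: integrator plus an unmodeled disturbance.
        state = state + first_control + disturbance
    return state


nominal_final = run_mpc(0.0, 1.0, steps=20)
disturbed_final = run_mpc(0.0, 1.0, steps=20, disturbance=-0.02)
```

Because the loop replans from the observed state at every step, the disturbed run settles at a small bounded offset from the goal instead of drifting, which is the role feedback plays in the closed-loop experiments.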

Experiments

The paper evaluates the proposed method comprehensively on two contact-rich robot systems: IiwaBimanual (planar, 29 collision geometries) and AllegroHand (3D in-hand, 39 collision geometries).

  1. Local MPC performance under CQDC dynamics (Section 5):
    • Compares the goal-reaching performance (final object position and rotation error) of MPC using R-CTR, CTR, and ETR.
    • The generated goals are locally reachable but challenging for MPC (Figure 9).
    • Results (Figure 9, Table 2): R-CTR performed best on both systems in terms of mean error and variance. On IiwaBimanual in particular it significantly outperformed CTR and ETR. On AllegroHand the differences were relatively small, which the authors conjecture is because the bilateral contact regime is active more often in that system.
    • Experiments on the trust region radius (r) and MPC rollout horizon (H) (Figure 10): performance peaks at intermediate r and H. Too small an r limits reachability, while too large an r degrades performance because the linear approximation becomes inaccurate.
  2. Stabilization performance under second-order dynamics (Section 6):
    • Evaluates stabilization while accounting for the gap between the CQDC model and real physics (Drake simulation and hardware), especially hydroplaning.
    • Proposes Algorithm 3: execute the MPC plan over several physics steps, and on each replan reapply the initial-guess heuristic to the current robot state to reinforce contact maintenance (MPCProj).
    • Evaluates three algorithm variants: Open-loop, No Heuristics, and Closed-loop.
    • Results (Figure 11, Table 4):
      • Closed-loop MPC performs far better than Open-loop, suggesting that feedback matters despite the inaccuracy of the contact dynamics model.
      • Initial-guess heuristic (Closed-loop vs. No Heuristics): the reduction in mean error is small, but the heuristic significantly reduced how often large errors caused by loss of contact occurred (Figure 11 histogram). It also shortened the robot path length (Figure 12).
      • IiwaBimanual vs. AllegroHand: because of the inherent difficulty of the AllegroHand task (in-hand manipulation, with slipping), its mean error was larger than that of IiwaBimanual.
      • Hardware experiments: performance was similar to the simulation results (Table 4).

Global Planning

Local MPC struggles to reach global goals that require non-greedy motion. To address this, the paper proposes a roadmap-based global planning method that builds on the strengths of local MPC.

  1. Goal-conditioned contact configuration generation (Section 7):
    • Given an object state (q_o) and a goal (q_{og}), the problem is to find a robot configuration (q_a) from which local MPC can reach the goal efficiently.
    • The cost function of the optimization combines the finite-horizon value function (V) of local MPC with a robustness regularizer (r). Here r is defined as the radius of the largest ball inscribed in the RA-CTR-based wrench set, and indicates how large a disturbance on the object the robot can resist in that configuration. The cost has the form C(q_a; q_o, q_{og}) = V(q_a; q_o, q_{og}) - \alpha r(q_a; q_o)^2.
    • Because the problem is nonconvex and its gradients are hard to compute, it is solved with a sampling-based optimization heuristic. For high-dimensional robots such as AllegroHand, a reduced-order model (4 spheres) is used, and its solution is mapped to a robot configuration through inverse kinematics (IK).
    • Results (Figure 18, Table 6): on AllegroHand the method found initial robot configurations that are intuitive and effective for reaching the goal, and MPC rollouts achieved position errors within 10 mm and rotation errors within 30 mrad.
  2. Roadmap-based global planning (Section 8):
    • Offline phase (Algorithm 4): contact configurations corresponding to stable object configurations that sufficiently cover the workspace are generated as roadmap vertices. For each pair of vertices, local MPC (reaching the object goal) and collision-free planning (repositioning the robot) are applied in sequence, and an edge is added if the transition succeeds (Figure 19). For AllegroHand, object symmetry is exploited to make roadmap construction efficient, and the roadmap can be built within 10 minutes on a standard laptop CPU alone. On hardware, 150 consecutive edge transitions succeeded, confirming the robustness of the roadmap.
    • Online phase: to plan from an arbitrary start configuration to an arbitrary goal object configuration, the start and goal are connected to their nearest roadmap vertices and a shortest path is searched on the graph (Figure 20).
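The online phase reduces to a graph search once the roadmap exists. The sketch below shows one way to do that search with Dijkstra's algorithm over a hand-written toy roadmap; the vertex names, edge costs, and the choice of Dijkstra are assumptions for illustration, since the summary above only states that a shortest path is searched on the graph.

```python
import heapq


def shortest_path(edges, start, goal):
    """Dijkstra over {vertex: [(neighbor, cost), ...]}.
    Returns the list of vertices on a cheapest path, or None if unreachable."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, vertex, path = heapq.heappop(queue)
        if vertex == goal:
            return path
        if vertex in visited:
            continue
        visited.add(vertex)
        for neighbor, edge_cost in edges.get(vertex, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + edge_cost, neighbor, path + [neighbor]))
    return None


# Toy roadmap: an edge exists only where a local transition succeeded offline.
roadmap = {
    "pose_a": [("pose_b", 1.0), ("pose_c", 4.0)],
    "pose_b": [("pose_c", 1.0)],
    "pose_c": [("pose_d", 1.0)],
}
plan = shortest_path(roadmap, "pose_a", "pose_d")
no_plan = shortest_path(roadmap, "pose_d", "pose_a")
```

Each consecutive pair in `plan` corresponds to one stored edge, so executing the plan means replaying the local MPC and repositioning steps attached to those edges in order.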

Conclusion

Through the Contact Trust Region (CTR), the paper provides a local dynamics approximation that respects the unilateral nature of contact, and uses it to develop an efficient MPC-based method for local planning and control. It also realizes global contact-rich manipulation planning through contact configuration generation and a roadmap. The method's performance and computational efficiency are demonstrated in simulation and on real hardware. In particular, the paper shows that goals can be reached with markedly less computation time than deep reinforcement learning (deep RL) approaches.

Open problems remain. The causes of specific planning failures, a deeper understanding of why feasibility constraints play different roles on IiwaBimanual and AllegroHand, and robustly maintaining contact despite gaps between the model and real physics, such as hydroplaning in CQDC, all call for future work. Even so, the CTR, the MPC, the contact configuration generation, and the roadmap presented in the paper are powerful new tools for contact-rich robot manipulation.


Detail Review

CTR Optimization Framework

Overview: the Contact Trust Region (CTR) is a new trust region model that extends the conventional Ellipsoidal Trust Region (ETR) by explicitly including the physical constraints of contact dynamics. The key idea is to apply not only a small ellipsoidal region that controls linearization error, but also contact feasibility constraints (unilateral contact forces, friction cone constraints, and so on), so that the searchable region is restricted to a physically realistic range.

1. Differentiable contact dynamics model

CTR relies on a differentiable contact simulator. Specifically, it builds on the earlier Convex Quasidynamic Differentiable Contact (CQDC) model, which expresses contact dynamics as a convex optimization problem (an SOCP). Solving this model yields the contact forces as well as the next state, along with their sensitivities (Jacobians) with respect to the state and the control input. This is made possible by sensitivity analysis of the KKT conditions, in which the contact forces are treated as dual variables.

2. Linearization of the state and contact forces

Based on the differentiable model, the next state $\hat{q}_+$ and the contact forces $\hat{\lambda}_{+,i}$ are approximated linearly as follows:

  • State update: \hat{q}_+ = A_\kappa \, \delta q + B_\kappa \, \delta u + f_\kappa(\bar{q}, \bar{u})
  • Contact force response: \hat{\lambda}_{+,i} = C_{\kappa,i} \, \delta q + D_{\kappa,i} \, \delta u + \lambda_{\kappa,i}(\bar{q}, \bar{u})

Unlike a standard state linearization, this also approximates the change in contact forces, so the first-order response of contact is captured as well.

3. Contact Feasibility Constraints

CTR applies the following physics-based constraints to the linearized model above:

  • Non-penetration condition (primal feasibility): J_i \, \hat{q}_+ + c_i \in K_i → restricts relative motion at the contact so that it does not cause interpenetration

  • Friction cone condition (dual feasibility): \hat{\lambda}_{+,i} \in K_i^* → enforces the friction coefficient and the unilateral contact force condition (the normal force is non-negative)

These conditions are formulated as second-order cone constraints, so every candidate inside the trust region satisfies the contact feasibility conditions under the linear model.

4. Mathematical definition of the Contact Trust Region

CTR is defined as the set of perturbations $(\delta q, \delta u)$ around the nominal point $(\bar{q}, \bar{u})$ that satisfy the following conditions:

  1. Ellipsoidal constraint: \delta z^T \Sigma \delta z \leq 1 \quad (\delta z = [\delta q; \delta u])
  2. The linearized state and contact force equations hold
  3. Non-penetration constraint: $\hat{q}_+$ does not penetrate the contact surface
  4. Friction cone constraint: $\hat{\lambda}_{+,i}$ lies inside the friction cone

CTR is the intersection of these constraints and is therefore a convex set. As a result, the subsequent optimization step also remains a convex problem (an SOCP).

5. Variants: A-CTR, R-CTR

  • A-CTR (Action-only CTR): the state is held fixed and only the input $u$ is searched. This reduces computation and speeds up inference.
  • R-CTR (Relaxed CTR): drops the non-penetration condition and applies only the friction constraint, which reduces conservatism and enlarges the search region.

In the experiments R-CTR sometimes performed better than the full CTR, because the less restrictive optimization can still plan valid contact manipulation.


Integrating CTR into Model Predictive Control (MPC)

CTR by itself is only a set of constraints. To use it as a practical manipulation controller, it has to be embedded in an MPC (model predictive control) framework. This section explains how CTR is integrated into MPC and how contact-rich manipulation is turned into an optimization problem that can be solved online.

1. Contact-implicit MPC

The CTR paper formulates a contact-implicit MPC problem. That is, contact mode transitions are not specified in advance; whether contact occurs and what contact forces arise are determined automatically during optimization.

  • At each time step, build linear models of the state and contact forces through CQDC-based linearization
  • Formulate an SOCP with the CTR constraints (contact feasibility, friction, and so on)
  • Optimize over a fixed time horizon, apply only the first control input, and repeat (receding horizon planning)

Thanks to the structure of CTR, each subproblem of this MPC remains a convex optimization problem (an SOCP) over the whole horizon.

2. Iterative optimization and feedback

Like any MPC, CTR-MPC observes the new state at every time step, relinearizes, and then optimizes. This repeated feedback structure provides the following benefits:

  • Robustness to modeling errors and disturbances
  • Online adaptation to contact changes and small changes in environmental conditions

3. Handling contact without mode scheduling

CTR-MPC does not need to describe contact mode transitions (mode scheduling) explicitly. The making and breaking of contact is covered naturally by the following conditions:

  • The condition $\hat{\lambda}_{+,i} \in K_i^*$ also admits $\hat{\lambda}_{+,i} = 0$ (no contact)
  • The condition $J_i \hat{q}_+ + c_i \in K_i$ is also satisfied when the object and the finger are separated

This design is considerably more flexible and computationally efficient than existing methods that branch explicitly on contact modes.

4. Computational efficiency

Each optimization in CTR-MPC is a convex problem (an SOCP), and the paper reports the following results:

  • For cube manipulation with the Allegro hand, online optimization runs within a few seconds
  • Building the manipulation graph (roadmap) for the full task takes less than 10 minutes

This means comparable performance can be achieved with far fewer computational resources than typical reinforcement-learning-based approaches.

5. Example tasks and results

CTR-MPC was validated on two real examples:

  • Bimanual manipulation: two KUKA iiwa arms move a large cylindrical object. The task requires complex contact coordination, and CTR-MPC performed it successfully both in simulation and on the real robot.

  • In-hand cube rotation (in-hand manipulation): the Allegro hand rotates a cube to various orientations. Relaxed CTR (R-CTR) gave the best performance, and the roadmap-based strategy also made distant goal rotations achievable.

6. Integration with global planning

Because CTR-MPC is inherently based on local optimization, planning paths across the whole state space can be difficult. To compensate, the paper proposes global roadmap planning:

  • Use a variety of stable cube poses as nodes
  • Use CTR-MPC to generate short-range manipulation trajectories (edges) between these nodes
  • Search the whole graph so that distant goals can be reached through a sequence of manipulations

This resembles traditional sampling-based planning, but it uses MPC-based motion primitives, which makes contact-rich path generation possible.


DeXtreme: Reinforcement-Learning-Based Cube Rotation Control

DeXtreme (NVIDIA Research, 2022) is a system that performs precise cube rotation on a low-cost robot hand using a policy trained with deep reinforcement learning. It tackles the same Allegro hand manipulation setting that CTR addresses, but with an entirely different approach.

1. Simulation-based training

  • The policy is trained in Isaac Gym, a GPU-accelerated physics simulator
  • More than 100,000 parallel environments run simultaneously on the GPU
  • This lets the robot learn by trial and error at superhuman speed

2. Policy structure

  • The policy is a deep neural network whose inputs are the robot state and the object pose
  • A vision-based policy is also trained: three RGB cameras are used to estimate the object pose, which is then fed to the policy
  • A separate pose estimation network is trained alongside it to recover the 3D object pose from visual input

3. Domain Randomization

  • To bridge the Sim2Real gap, physical properties and visual conditions are randomized extensively

    • Mass, friction coefficient, surface texture, lighting conditions, camera position, and so on
  • As a result, the policy learns behavior that is robust across a wide distribution of conditions

4. Training cost and compute resources

  • Training takes about 32 hours on a high-performance GPU server
  • In that time the policy accumulates roughly 42 years' worth of simulated experience
  • This figure illustrates sample inefficiency, a well-known drawback of reinforcement learning
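To put the two figures above in proportion, a short calculation gives the ratio of simulated experience to wall-clock training time. The only assumption added here is a year of 365.25 days; the 42-year and 32-hour figures are the ones quoted above.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: one year = 365.25 days


def simulated_to_wallclock_ratio(sim_years, wallclock_hours):
    """How many hours of simulated experience are gathered per wall-clock hour."""
    return sim_years * HOURS_PER_YEAR / wallclock_hours


ratio = simulated_to_wallclock_ratio(sim_years=42, wallclock_hours=32)
print(f"about {ratio:,.0f}x faster than real time")
```

Under that assumption the simulation runs on the order of ten thousand times faster than real time, which is what massive GPU parallelism buys and also why the same amount of experience could not be collected on physical hardware.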

5. Execution and deployment on the real robot

  • Once trained, the policy supports high-rate real-time control (only a neural network forward pass is needed)
  • It rotates the cube stably toward the target orientation on the Allegro hand
  • Unlike OpenAI's Shadow Hand work, it succeeds on the Allegro hand, which has fewer joints and costs less, which is notable

6. Generalization and robustness

  • Domain randomization provides robustness even to hardware degradation

    • Example: the policy compensates and still succeeds when the thumb joint is loose
  • The vision network is trained to tolerate occlusion and motion blur

7. Limitations of the policy

DeXtreme shows remarkable performance, but unlike the CTR approach it does not explicitly encode the physical laws of contact:

  • Friction cones, non-penetration conditions, and the like are acquired implicitly through learning
  • Behavior is shaped by the simulator's physics engine and by reward function design
  • It is therefore hard to interpret why the policy takes a given action, and hard to determine explicitly whether a constraint is violated

CTR vs DeXtreme: A Comparative Analysis of the Two Approaches

CTR-MPC and DeXtreme both target difficult contact manipulation such as in-hand cube rotation, but they approach it from opposite philosophies: model-based optimization versus data-driven learning. The comparison below looks at the two methods from several key perspectives.

1. Contact handling

| Item | CTR-MPC | DeXtreme (RL) |
| --- | --- | --- |
| Contact modeling | Friction cones, non-penetration conditions, and so on are modeled with explicit equations and built into the optimization | Contact strategies are learned implicitly through simulation and rewards |
| Contact force reasoning | Contact forces are computed directly as optimization variables and used during planning | Formed implicitly inside the neural network (not observable) |
| Possibility of physics violations | Excluded within the planning model by the explicit constraints | The learned policy may violate physical constraints (e.g. interpenetration) |

2. Sample efficiency and compute resources

| Item | CTR-MPC | DeXtreme (RL) |
| --- | --- | --- |
| Need for prior training | None: optimization is run at every execution | Required: billions of simulation steps |
| Compute cost at execution | Moderate: solves SOCPs | Very low: only a neural network forward pass |
| Sample efficiency | Very high: model-based reasoning | Low: requires massive trial and error |

3. Generalization and adaptability

| Item | CTR-MPC | DeXtreme (RL) |
| --- | --- | --- |
| Response to environment changes | Immediate once the model is updated | Retraining needed outside the training distribution |
| Adapting to goal changes | Immediate (only the goal state changes) | Possible, but generalizes only within the fixed goal format |
| Disturbance response | High: based on replanning | Medium: robust to some disturbances, but has no planning ability |

4. Policy structure and interpretability

| Item | CTR-MPC | DeXtreme (RL) |
| --- | --- | --- |
| Policy form | Optimization-based: computes a plan from the current state | Neural-network-based: maps observations to actions |
| Interpretability | High: contact forces, constraints, and so on can be inspected | Low: black-box policy |
| Ease of adding constraints | Easy: add the constraint to the formulation | Hard: the network must be retrained |

Summary

| Item | CTR-MPC | DeXtreme (RL) |
| --- | --- | --- |
| Contact handling | Explicit, interpretable | Implicit, not interpretable |
| Training requirement | None | Large (billions of steps) |
| Execution speed | Slow but accurate | Very fast |
| Generalization | Model-based adaptation | Generalization within a limited goal set |
| Extensibility and maintenance | Constraints are easy to add or change | Retraining required |

Conclusion and Future Research Directions

CTR and DeXtreme show contrasting strengths: accurate, physically interpretable model-based planning on one side, and fast, robust data-driven control on the other. This difference in character suggests that the two could be integrated in a complementary way.

1. Potential of hybrid strategies

Future work could explore hybrid models such as the following:

  • Use trajectories generated by CTR as the teacher for imitation learning

    • This can help the initial RL policy converge quickly
  • Use a DeXtreme policy as a warm start to accelerate CTR optimization

    • Initializing the optimization from the RL policy reduces computation
  • Replace part of the contact model with a learned approximate model

    • Example: calibrating real-world parameters by estimating friction or damping coefficients

Combining the strengths of both sides in this way is a promising direction for obtaining physics-based accuracy and learning-based flexibility at the same time.

2. Improving real-time performance

For CTR-MPC, the real-time capability of the optimization is still limited. The following approaches could address this:

  • Pre-train a CTR-based policy and approximate it with a neural network (policy distillation)
  • Collect CTR solutions as a dataset, then train a policy with offline RL or trajectory matching

Such approaches would let a constraint-satisfying policy run quickly while partly preserving its interpretability.

3. Extending to more complex manipulation tasks

Future work could aim to extend to more complex tasks such as:

  • Manipulating irregular objects (irregular shapes, deformable objects, and so on)
  • Integrating vision-based input (combining CTR with camera perception)
  • Collaborative manipulation with humans (joint carrying, safety constraints, and so on)

In particular, the reliability and safety that come from explicit constraints make the CTR-based approach a candidate for environments shared with people.


Closing

The paper "Dexterous Contact-Rich Manipulation via the Contact Trust Region" works out, in a mathematically elegant way, how contact constraints in difficult manipulation can be handled explicitly and integrated into a model-based control framework. DeXtreme, by contrast, is a conventional deep reinforcement learning approach that leans on large-scale compute, yet it proves to be a very strong approach in terms of real-world applicability.

Rather than competing, these two lines of work can be understood as the two ends of a spectrum of techniques that next-generation manipulation systems may use side by side.

Future research can help build more flexible, safe, and generalizable robot manipulation systems by choosing between these methods, or combining them, as the situation requires.


🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

Why Does This Paper Matter?

Why do robot arms always use only their fingertips when grasping objects? Humans use the whole hand, the forearm, and even the elbow to roll, push, and pick up objects. The starting point of this paper is that this gap is not merely a problem of mechanical design but a problem of computational algorithms.

Contact-rich manipulation has long been a hard problem in robotics. The moment a single finger touches an object, the dynamics change abruptly and discontinuously. Many methods have been proposed to deal with this discontinuity, but most have chosen one of the following two options.

  • Give up computational efficiency and use an accurate contact model, or
  • Give up physical accuracy and use a simple linear approximation

The Contact Trust Region (CTR), proposed by the team of H.J. Terry Suh, Tao Pang, Tong Zhao, and Russ Tedrake at MIT CSAIL, tackles this dilemma head-on. The core idea is surprisingly simple: "Define the region where the Taylor approximation can be trusted in a way that matches the physical nature of contact."

The results are impressive. A roadmap for in-hand cube reorientation with the Allegro Hand is built in under 10 minutes on an ordinary laptop CPU without parallelization, and online inference finishes in just a few seconds. This contrasts sharply with existing RL-based methods, which pour in hundreds of GPU-hours.


Problem Definition: Local Approximation of Contact Dynamics

The Discontinuity of Contact and the Pitfall of Taylor Approximation

At the heart of iterative optimization algorithms in robot control there is always the same question.

"How should the dynamics be approximated in the neighborhood of the current state?"

For a smooth system, a first-order Taylor expansion is enough. The problem is that contact dynamics is inherently non-smooth. The moment an object touches a surface, or separates from it, the dynamics change discontinuously.

The existing approaches to this problem can be summarized as follows:

Method | Representative work | Core idea | Limitation
Contact-implicit trajectory optimization (CITO) | Posa et al. 2014, Ding et al. 2019 | Directly encodes complementarity constraints | Number of contact modes grows exponentially
Differentiable-simulator-based | Pang et al. 2023, Howell et al. 2023 | Taylor expansion of smoothed dynamics | Trust region of the Taylor approximation is left undefined
RL (sampling-based) | OpenAI 2019, Chen 2022 | Learns a policy from millions of rollouts | Requires enormous compute

The authors' key insight: the Taylor expansion of a differentiable simulator and the complementarity conditions of CITO are in fact two views of the same thing, connected through the Implicit Function Theorem. When using the Taylor approximation, however, people have implicitly assumed an Ellipsoidal Trust Region (ETR).

ETR (Ellipsoidal Trust Region):
  || Sigma^{-1/2} * [delta_q; delta_u] ||_2 <= 1

Symmetric -> allows "pulling" as much as "pushing"
Contact is UNILATERAL -> pulling is physically impossible!

The Core Problem: ETR Contradicts the Unilateral Nature of Contact

Imagine a spherical robot pushing a box. Because the ETR is symmetric, it treats "pushing forward" and "pulling backward" as equally valid local actions. But contact forces are unilateral: the robot can push the box, but it cannot pull it.

This is the fundamental flaw of the ETR. If the local model permits physically impossible actions, the planner can end up optimizing in the wrong direction.
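
The symmetry is easy to check numerically. The following is a minimal sketch, assuming NumPy; the matrix `sigma` and the 5 cm displacements are made-up illustrative values, not numbers from the paper. It tests whether a push and its mirrored pull both satisfy the ETR inequality.

```python
import numpy as np

def in_etr(delta, sigma):
    """True if delta lies in the ellipsoid ||Sigma^{-1/2} delta||_2 <= 1."""
    # delta^T Sigma^{-1} delta equals ||Sigma^{-1/2} delta||^2 for symmetric positive definite Sigma.
    return float(delta @ np.linalg.solve(sigma, delta)) <= 1.0

sigma = np.diag([0.1**2, 0.1**2])   # hypothetical trust region: 10 cm in (delta_q, delta_u)
push = np.array([0.0, 0.05])        # command the robot 5 cm into the box
pull = -push                        # command the robot 5 cm away from the box

etr_accepts_push = in_etr(push, sigma)
etr_accepts_pull = in_etr(pull, sigma)
```

Both flags come out `True`: the ellipsoid alone cannot distinguish pushing from pulling.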


Methodology: Contact Trust Region (CTR)

Underlying Model: CQDC (Convex Quasidynamic Differentiable Contact)

CTR is built on the CQDC model of Pang et al. (2023). CQDC frames contact simulation as a convex optimization problem, which makes sensitivity analysis through the KKT conditions possible.

The system setup in brief:

  • State: q = (q^a, q^o), the robot (actuated) and object (unactuated) configurations
  • Input: u \in \mathbb{R}^{n_{q_a}}, the position command sent to a stiffness controller
  • Contact force: \lambda_i = (\lambda_{n_i}, \lambda_{t_i}), the normal and friction components

The equations of motion take the following force-balance form:

\mathbf{P}(q) q_+ + b(q,u) - \sum_{i=1}^{n_c} \mathbf{J}_i(q)^\top \lambda_i = 0

Here \mathbf{P} is the stiffness/mass matrix and b is the external-force vector, which includes gravity and similar terms.

There are two contact constraints:

  1. Normal velocity-force condition (Signorini): a normal force arises only when the surfaces are in contact, and the bodies do not penetrate each other.
  2. Friction cone (Coulomb friction): the contact force must lie inside the friction cone \mathcal{K}_i^\star.

\lambda_i \in \mathcal{K}_i^\star = \{(\lambda_{n_i}, \lambda_{t_i}) \mid \|\lambda_{t_i}\| \leq \mu_i \lambda_{n_i},\ \lambda_{n_i} \geq 0\}
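
The cone condition above translates directly into a membership test. This is a minimal sketch, assuming NumPy; the function name `in_friction_cone` and the numerical values are illustrative, not from the paper.

```python
import numpy as np

def in_friction_cone(lam_n, lam_t, mu):
    """Return True if ||lam_t|| <= mu * lam_n and lam_n >= 0."""
    lam_t = np.atleast_1d(np.asarray(lam_t, dtype=float))
    return bool(lam_n >= 0.0 and np.linalg.norm(lam_t) <= mu * lam_n)

mu = 0.5                                           # illustrative friction coefficient
inside = in_friction_cone(1.0, [0.3, 0.0], mu)     # tangential force within mu * normal
outside = in_friction_cone(1.0, [0.6, 0.0], mu)    # tangential force exceeds mu * normal
pulling = in_friction_cone(-0.1, [0.0, 0.0], mu)   # negative normal force (would be adhesion)
```

Only the first case is feasible; the third case is exactly the "pulling" that unilateral contact rules out.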

CQDC combines these into a single convex program:

q_+ = \arg\min_{q_+} \left[ \frac{1}{2} q_+^\top \mathbf{P}(q) q_+ + b(q,u)^\top q_+ \right]

\text{subject to contact constraints at each contact pair } i

This convex structure is the key ingredient of CTR.

Smoothing: Softening the Discontinuity

Pure contact dynamics is discontinuous: the dynamics change abruptly the moment bodies touch. To handle this, CQDC uses barrier smoothing. Performing an interior-point relaxation with parameter \kappa yields dynamics f_{(\kappa)} that exhibit "force at a distance", in which small contact forces arise even when the bodies are apart.

As \kappa \to 0 the dynamics converge to the original discontinuous dynamics, and the larger \kappa is, the smoother the approximation becomes. Suh et al. (2022) analyzed that this kind of smoothing is a key reason RL can succeed on contact-rich problems.
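
A one-dimensional toy problem makes the "force at a distance" effect concrete. The sketch below is an illustration under stated assumptions, not the CQDC implementation: a finger at position x is pulled toward a command u by a spring of stiffness k, a wall at x = 0 imposes x ≥ 0, and a log-barrier term with weight κ replaces the hard constraint. It follows the convention of the text above, in which κ → 0 recovers hard contact. The names `smoothed_contact`, `k`, and `kappa` are hypothetical.

```python
import math

def smoothed_contact(u, k=100.0, kappa=1.0):
    """Minimize 0.5*k*(x - u)**2 - kappa*log(x) over x > 0 (closed form).

    Setting the derivative to zero gives k*(x - u) = kappa / x, a quadratic in x.
    Returns the finger position x and the contact force lam = kappa / x.
    """
    x = 0.5 * (u + math.sqrt(u * u + 4.0 * kappa / k))
    lam = kappa / x
    return x, lam

# Finger commanded to stay 0.5 away from the wall: hard contact would give zero force.
x_far, lam_far = smoothed_contact(0.5, kappa=1.0)      # smoothed: small positive force
x_hard, lam_hard = smoothed_contact(0.5, kappa=1e-8)   # nearly hard: force almost zero

# Finger commanded 0.1 into the wall: hard contact gives force k * 0.1 = 10.
x_pen, lam_pen = smoothed_contact(-0.1, kappa=1e-8)
```

With a large κ the wall pushes back even before touching; shrinking κ recovers the hard-contact answers (zero force when apart, spring force when pressed against the wall).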

Sensitivity Analysis: Extracting Gradients from Both Primal and Dual

For the Taylor approximation, prior work has mostly used only the gradients of the state (the configuration, i.e. the primal variable).

\hat{q}_+ = f_{(\kappa)}(\bar{q}, \bar{u}) + \mathbf{A}_{(\kappa)} \delta q + \mathbf{B}_{(\kappa)} \delta u

where \mathbf{A}_{(\kappa)} = \partial f_{(\kappa)}/\partial q and \mathbf{B}_{(\kappa)} = \partial f_{(\kappa)}/\partial u.

The key innovation of CTR is that it also builds a linear model of the contact forces (the dual variables):

\hat{\lambda}_{(\kappa),i} = \bar{\lambda}_{(\kappa),i} + \mathbf{C}_{(\kappa),i} \delta q + \mathbf{D}_{(\kappa),i} \delta u

where \mathbf{C}_{(\kappa),i} = \partial \lambda_{(\kappa),i}/\partial q and \mathbf{D}_{(\kappa),i} = \partial \lambda_{(\kappa),i}/\partial u.

Thanks to CQDC's convex-optimization structure, these dual gradients follow naturally from a sensitivity analysis of the KKT conditions. Many differentiable simulators already have this information but have not been using it.
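
To illustrate how primal and dual gradients fall out of the same optimality condition, here is a minimal sketch on a hypothetical 1D smoothed contact problem (a finger at position x, spring stiffness k toward command u, log-barrier weight κ for the wall at x = 0). It is not the CQDC code; `K_STIFF`, `KAPPA`, and the function names are assumptions made for the example. The analytic gradients from the implicit function theorem are compared with finite differences.

```python
import math

K_STIFF = 100.0   # hypothetical spring stiffness k
KAPPA = 1.0       # hypothetical barrier weight kappa

def solve(u):
    """Smoothed 1D contact: returns finger position x and contact force lam."""
    x = 0.5 * (u + math.sqrt(u * u + 4.0 * KAPPA / K_STIFF))
    return x, KAPPA / x

def analytic_gradients(u):
    """Differentiate the optimality condition g(x, u) = k*(x - u) - kappa/x = 0."""
    x, _ = solve(u)
    g_x = K_STIFF + KAPPA / x**2    # dg/dx
    g_u = -K_STIFF                  # dg/du
    B = -g_u / g_x                  # primal gradient dx/du (implicit function theorem)
    D = -(KAPPA / x**2) * B         # dual gradient dlam/du, since lam = kappa / x
    return B, D

u_bar, eps = 0.2, 1e-6
B, D = analytic_gradients(u_bar)
(x_plus, lam_plus), (x_minus, lam_minus) = solve(u_bar + eps), solve(u_bar - eps)
B_fd = (x_plus - x_minus) / (2 * eps)       # central finite difference for dx/du
D_fd = (lam_plus - lam_minus) / (2 * eps)   # central finite difference for dlam/du
```

The dual gradient `D` costs almost nothing extra once `B` is available, which is the point the text makes about simulators already holding this information.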

Defining CTR: Integrating the Physical Constraints of Contact into the Trust Region

Now for the core of the method. CTR is defined as follows:

Definition 1 (Contact Trust Region, CTR)

\mathcal{S}_{\mathbf{\Sigma}, \kappa} = \mathcal{E}_\mathbf{\Sigma} \cap \left\{ (\delta q, \delta u) \mid \hat{\lambda}_{(\kappa),i}(\delta q, \delta u) \in \mathcal{K}_i^\star,\ \forall i \right\}

That is, CTR = ETR (a size limit) + the condition that the estimated contact forces must lie inside the friction cone.

Let us build some intuition for this definition.

The world of the ETR:
  "You may move in any direction within radius r of the current state"
  (implicitly allows pulling as well as pushing)

The world of the CTR:
  "Stay within radius r of the current state, and the contact forces produced
   by the motion must lie within the physically feasible range"
  (a motion that needs forces outside the friction cone is physically impossible to begin with)
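
The contrast can be sketched as a membership test. The snippet below is a minimal illustration, assuming NumPy and a single contact with one normal and one tangential force component; all numbers (`sigma`, `lam_bar`, `C`, `D`, `mu`) are invented for the example, and `in_ctr` is a hypothetical helper name.

```python
import numpy as np

def in_ctr(dq, du, sigma, lam_bar, C, D, mu):
    """CTR membership: ellipsoid bound AND linearized force inside the friction cone."""
    z = np.concatenate([dq, du])
    inside_ellipsoid = float(z @ np.linalg.solve(sigma, z)) <= 1.0
    lam_hat = lam_bar + C @ dq + D @ du          # linear contact-force model
    lam_n, lam_t = lam_hat[0], lam_hat[1:]       # [normal, tangential...]
    inside_cone = lam_n >= 0.0 and np.linalg.norm(lam_t) <= mu * lam_n
    return bool(inside_ellipsoid and inside_cone)

sigma = np.diag([0.1**2, 0.1**2])     # hypothetical ETR size
lam_bar = np.array([1.0, 0.0])        # nominal contact force: 1 N normal, no friction
C = np.array([[-40.0], [0.0]])        # object moving away reduces the normal force
D = np.array([[40.0], [0.0]])         # pushing further increases the normal force
mu = 0.5

dq = np.array([0.0])
push_ok = in_ctr(dq, np.array([0.05]), sigma, lam_bar, C, D, mu)    # lam_n = 3.0
pull_ok = in_ctr(dq, np.array([-0.05]), sigma, lam_bar, C, D, mu)   # lam_n = -1.0
```

Both displacements lie inside the ellipsoid, but only the push survives the cone check, because the pull would require a negative normal force.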

Proof that CTR Is Locally Equivalent to the Full Contact-Dynamics Constraints

One of the paper's key mathematical contributions is the following lemma:

Lemma 2: For a sufficiently small trust-region radius, CTR is locally equivalent to the full constraint set of the smoothed contact dynamics.

Why does this matter? CITO methods must handle complementarity constraints directly, which makes the computation hard. CTR replaces these complicated complementarity conditions with a convex set (specifically, an intersection of second-order cone constraints) while preserving local accuracy.

\mathcal{S}_{\mathbf{\Sigma}, \kappa} = \mathcal{E}_\mathbf{\Sigma} \cap \bigcap_{i=1}^{n_c} \{ (\delta q, \delta u) \mid \hat{\lambda}_{n_i} \geq 0,\ \|\hat{\lambda}_{t_i}\| \leq \mu_i \hat{\lambda}_{n_i} \}

For each i, the friction-cone condition \|\hat{\lambda}_{t_i}\| \leq \mu_i \hat{\lambda}_{n_i} is a second-order cone (SOC) constraint, because \hat{\lambda} is affine in (\delta q, \delta u): it is convex and can be handled efficiently.
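
To see why the condition has second-order cone form, substitute the linear force model into the cone. In the worked step below, the stacked matrix \mathbf{G}_i, its normal row g_{n_i}^\top, and its tangential rows \mathbf{G}_{t_i} are notation introduced here for illustration; they do not appear in the paper.

```latex
% Stack the perturbation and the dual gradients:
\hat{\lambda}_{(\kappa),i} = \bar{\lambda}_{(\kappa),i} + \mathbf{G}_i z,
\qquad
\mathbf{G}_i = \begin{bmatrix} \mathbf{C}_{(\kappa),i} & \mathbf{D}_{(\kappa),i} \end{bmatrix},
\quad
z = \begin{bmatrix} \delta q \\ \delta u \end{bmatrix}

% Split the rows into normal (n) and tangential (t) parts:
\hat{\lambda}_{n_i} = \bar{\lambda}_{n_i} + g_{n_i}^\top z,
\qquad
\hat{\lambda}_{t_i} = \bar{\lambda}_{t_i} + \mathbf{G}_{t_i} z

% The friction cone becomes the standard SOC form \|A z + b\|_2 \le c^\top z + d:
\left\| \mathbf{G}_{t_i} z + \bar{\lambda}_{t_i} \right\|_2
  \le \mu_i \left( g_{n_i}^\top z + \bar{\lambda}_{n_i} \right)
```

For \mu_i > 0 the last inequality already implies \hat{\lambda}_{n_i} \geq 0, since its left-hand side is non-negative.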

Action-only CTR and the Motion Set

In actual planning, it is more natural to constrain the control input u than the state q. The authors define the Action-only CTR as follows:

\mathcal{S}^{\mathcal{A}}_{\mathbf{\Sigma}, \kappa} = \{ \delta u \mid \exists \delta q : (\delta q, \delta u) \in \mathcal{S}_{\mathbf{\Sigma}, \kappa} \}

That is, it is the set of control inputs for which some state change exists that satisfies the CTR conditions.

Going further, the Motion Set \mathcal{M}^{(\mathcal{A})}_{\mathbf{\Sigma}, \kappa} represents all possible next states within the CTR. This connects directly to the classical notion of the Wrench Set:

\mathcal{W}^{\mathcal{A}}_{\mathbf{\Sigma}, \kappa} = \left\{ \sum_{i=1}^{n_c} \mathbf{J}_{o_i}^\top \lambda_i \mid \lambda_i \in \mathcal{C}^{\mathcal{A}}_{\mathbf{\Sigma}, \kappa, i} \right\}

This is not just a mathematical curiosity. It shows that CTR connects naturally to the Wrench Set and friction-cone analysis of classical robotics, which suggests that the new concept is an extension of classical theory.

Relaxed CTR (R-CTR): Relaxation for Practicality

CTR is theoretically sound, but in practical MPC it has one problem: when the CTR is too small, the optimization problem can become infeasible.

To resolve this, the Relaxed CTR (R-CTR) is introduced:

\tilde{\mathcal{S}}_{\mathbf{\Sigma}, \kappa} = \mathcal{E}_\mathbf{\Sigma} \cap \left\{ (\delta q, \delta u) \mid \hat{\lambda}_{n_i} + s_i \geq 0,\ \|\hat{\lambda}_{t_i}\| \leq \mu_i (\hat{\lambda}_{n_i} + s_i),\ s_i \geq 0 \right\}

A slack variable s_i is introduced to softly relax the friction-cone condition, and it is penalized in the cost function. This ensures that the MPC problem is always feasible.
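
For a fixed predicted force, the smallest slack that satisfies the relaxed cone can be written in closed form. The sketch below is an illustration under the relaxed condition stated above, assuming NumPy; `min_slack` is a hypothetical helper name and the numbers are invented.

```python
import numpy as np

def min_slack(lam_n, lam_t, mu):
    """Smallest s >= 0 with ||lam_t|| <= mu * (lam_n + s).

    Because ||lam_t|| / mu >= 0, this s also guarantees lam_n + s >= 0.
    """
    return max(0.0, float(np.linalg.norm(lam_t)) / mu - lam_n)

mu = 0.5
s_feasible = min_slack(1.0, np.array([0.3, 0.0]), mu)   # already inside the cone
s_friction = min_slack(1.0, np.array([0.8, 0.0]), mu)   # too much tangential force
s_pulling = min_slack(-1.0, np.array([0.0, 0.0]), mu)   # negative normal force
```

Forces already inside the cone need no slack; violations need a positive slack, which the cost-function penalty then discourages.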


System Architecture: CTR-based MPC and the Global Planner

After constructing CTR, the authors propose a three-stage planning scheme.

flowchart TB
    subgraph LOCAL["Local Planning (CTR-MPC)"]
        A[Current State q, u] --> B[CQDC Simulator]
        B --> C[Sensitivity Analysis]
        C --> D[A, B, C, D matrices]
        D --> E[CTR Construction]
        E --> F[SOCP Trajectory Optimization]
        F --> G[Optimal Local Plan]
    end

    subgraph GLOBAL["Global Planning (Roadmap)"]
        H[Goal Configurations] --> I[Node Generation]
        I --> J[Contact Config Search]
        J --> K[CTR-MPC Edge Connection]
        K --> L[Roadmap Graph]
    end

    subgraph INFERENCE["Online Inference"]
        L --> M[Find Nearest Node]
        M --> N[Graph Search]
        N --> O[Execute MPC Policy]
    end

    G --> K
    O --> A
Figure 1: CTR-based manipulation system architecture

Contribution 2: CTR-MPC (Local Contact-Implicit MPC)

The CTR-based trajectory optimization algorithm CtrTrajOpt iteratively solves the following problem:

Algorithm 1: CtrTrajOpt

Input: q_0, u_0, goal q_goal, horizon T, max_iter n_max

Repeat until convergence (at most n_max iterations):
  For t = 0 to T-1:
    Compute A_(kappa),t, B_(kappa),t from CQDC sensitivity analysis at the nominal (q_bar_t, u_bar_t)
    Compute C_(kappa),t, D_(kappa),t (dual gradients for contact forces)
    Build R-CTR constraints: S_tilde_{Sigma, kappa}

  Solve SOCP over (delta_q_t, delta_u_t), with q_t = q_bar_t + delta_q_t:
    min  sum_t || q_t - q_goal ||^2_Q + || delta_u_t ||^2_R
    s.t. q_{t+1} = f(q_bar_t, u_bar_t) + A_(kappa),t * delta_q_t + B_(kappa),t * delta_u_t  (linearized dynamics)
         (delta_q_t, delta_u_t) in S_tilde_{Sigma, kappa}    (R-CTR)
         joint limits, torque limits

  Update nominal trajectory

Output: Locally optimal trajectory {q_t, u_t}

Since each iteration of this algorithm is a Second-Order Cone Program (SOCP), it can be solved with efficient convex solvers (e.g., ECOS, Mosek). It can be viewed as a natural extension of iLQR, that is, as a "contact-aware iLQR with CTR added".

Applying it as MPC: running CtrTrajOpt repeatedly online yields an MPC controller.

Algorithm 2: CTR-MPC

Input: current state q, goal q_goal, rollout horizon H

1. Run CtrTrajOpt with horizon T from q to q_goal
2. Execute first H steps of computed trajectory
3. Re-plan from new state
4. Repeat until goal reached or timeout
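
The receding-horizon structure of Algorithm 2 can be sketched on a scalar toy system. This is an illustration only: `plan` is a hypothetical stand-in for CtrTrajOpt that simply steps toward the goal with each step clipped to a trust-region radius `r`, and the state update replaces real hardware or simulation.

```python
def plan(q, q_goal, T, r):
    """Hypothetical stand-in for CtrTrajOpt: T steps toward the goal, each clipped to radius r."""
    traj = [q]
    for _ in range(T):
        step = max(-r, min(r, q_goal - traj[-1]))
        traj.append(traj[-1] + step)
    return traj

def ctr_mpc(q, q_goal, T=5, H=2, r=0.1, tol=1e-3, max_replans=100):
    """Plan T steps, execute the first H, then re-plan from the new state."""
    history = [q]
    for _ in range(max_replans):
        if abs(q - q_goal) <= tol:
            break                      # goal reached
        traj = plan(q, q_goal, T, r)
        for q_next in traj[1:H + 1]:
            q = q_next                 # on a robot: send the command, then measure the state
            history.append(q)
    return history

history = ctr_mpc(q=0.0, q_goal=1.0)
```

With a radius of 0.1 and H = 2, the loop needs five re-plans to cover a distance of 1.0, which previews the role of the product r·H discussed in the experiments.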

Initial-guess heuristic: CtrTrajOpt needs an initial trajectory guess. The authors propose a simple but effective method: straight-line interpolation. This simple initial guess works well in practice because CTR restricts the search to physically meaningful regions.
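
A straight-line initial guess is a one-liner for vector-valued configurations. The sketch below assumes NumPy and a plain Euclidean configuration vector; for 3D orientations one would interpolate on the rotation group instead, which is not shown. The name `straight_line_guess` is hypothetical.

```python
import numpy as np

def straight_line_guess(q_start, q_goal, T):
    """Return a (T + 1, n) array interpolating linearly from q_start to q_goal."""
    alphas = np.linspace(0.0, 1.0, T + 1)[:, None]   # column of blend weights
    return (1.0 - alphas) * q_start + alphas * q_goal

q_start = np.array([0.0, 0.0, 0.0])
q_goal = np.array([1.0, 2.0, -1.0])
guess = straight_line_guess(q_start, q_goal, T=4)
```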

Contribution 3: Roadmap-Based Global Planner

Local MPC excels at nearby goals but is limited for distant goals (e.g., rotating the cube by more than 90 degrees). To overcome this, a roadmap-based global planner is proposed.

Roadmap construction process:

flowchart LR
    A[Sample Stable Object\nConfigurations] --> B[Generate Goal-conditioned\nContact Configurations]
    B --> C[CTR Value Function\nEvaluation]
    C --> D[Select Best\nContact Config]
    D --> E[Run CTR-MPC\nto Connect Nodes]
    E --> F{Connection\nSuccessful?}
    F -- Yes --> G[Add Edge to Roadmap]
    F -- No --> H[Discard]
    G --> I[Complete Roadmap]

    I --> J[Online Query]
    J --> K[Find Nearest Node]
    K --> L[Graph Search\nfor Path]
    L --> M[Execute Edge\nMPC Policies]
Figure 2: Roadmap construction and inference pipeline
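
The online query step (find a path through the roadmap, then execute the MPC policy attached to each edge) reduces to a graph search. The following is a minimal sketch using Dijkstra's algorithm from the Python standard library; the node names and edge costs are invented, and the paper may use a different search method.

```python
import heapq

def shortest_path(edges, start, goal):
    """Dijkstra over a roadmap given as {node: [(neighbor, cost), ...]}."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, edge_cost in edges.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + edge_cost, neighbor, path + [neighbor]))
    return None   # goal not reachable in the roadmap

# Hypothetical roadmap: nodes are stable cube orientations, edges are successful CTR-MPC connections.
roadmap = {
    "yaw_0": [("yaw_90", 1.0), ("yaw_180", 3.0)],
    "yaw_90": [("yaw_180", 1.0)],
}
result = shortest_path(roadmap, "yaw_0", "yaw_180")
```

Here the two-edge route through `yaw_90` is cheaper than the direct edge, so the planner would chain two edge policies.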

Key Step: Goal-conditioned Contact Configuration Search (Section 7)

To connect two nodes with CTR-MPC, the initial posture (contact configuration) the robot adopts matters a great deal. From a bad initial posture, MPC fails to reach the goal.

The authors formulate this as the following optimization problem:

q^a_{\text{init}} = \arg\min_{q^a} \underbrace{V_H(q; q_{\text{goal}})}_{\text{MPC value function}} + \underbrace{\rho \cdot R(q)}_{\text{robustness regularizer}}

  • MPC value function V_H: the predicted cost-to-goal when MPC is run for H steps from the current state. It is approximated efficiently using CTR's Motion Set.
  • Robustness regularizer R(q): how well suited the current contact configuration is for manipulation toward the goal. It measures the extent to which CTR's Motion Set contains the goal direction.

This optimization is solved by sampling, and it is efficient thanks to the fast evaluation that CTR's Motion Set provides.


Experiments: Validation on Two Systems

The authors validate the method on two representative contact-rich systems.

Experimental Systems

See Figure 1 of the paper: a banner image showing the hardware experiment results, with in-hand cube reorientation by the Allegro Hand on the left and bucket manipulation by IiwaBimanual on the right.

System | DOF (actuated) | DOF (unactuated) | Collision geometries | Task
IiwaBimanual (planar) | 6 | 3 | 29 (14 spheres per arm + cylinder) | SE(2) bucket manipulation
AllegroHand (3D) | 16 | 6 | 39 | 6 cm cube reorientation

The AllegroHand system is Wonik Robotics' Allegro Hand V4, with the wrist fixed to the world frame and the cube free to move. The goal is 6D pose reorientation.

Experiment 1: Local MPC Performance (Section 5)

Effect of the Trust-Region Radius r and the Rollout Horizon H

  • If r is too small: the distance that can be moved per step is limited → slow but stable
  • If r is too large: the valid range of the Taylor approximation is exceeded → unstable

The authors propose the notion of "effective lookahead", r \cdot H. The trust-region radius r and the horizon H must be considered together to predict actual performance.

Goal-reaching performance (quasidynamic simulation):

  • AllegroHand: high reach rate across a variety of goal angles
  • IiwaBimanual: robust performance in planar manipulation

Computation time: since each MPC iteration is an SOCP, it takes on the order of tens of milliseconds on a standard CPU, which is fast enough for real-time control.

Experiment 2: Stabilization under Second-Order Dynamics (Section 6)

An important practical challenge: local MPC plans under the quasidynamic (heavily damped) assumption, but real hardware follows second-order dynamics, with inertia, oscillation, and overshoot.

The authors propose heuristics to handle this and show that closed-loop MPC is far more robust than open-loop execution.

CTR vs. ETR comparison:

Method | Success rate (AllegroHand) | Success rate (IiwaBimanual) | Notes
R-CTR (proposed) | Highest | Highest | Reflects physical constraints
CTR (no relaxation) | Medium | Medium | Infeasibility issues
ETR | Low | Low | Ignores the friction cone

The key ablation result is that R-CTR is clearly superior to ETR. It shows that integrating friction-cone information into the trust region helps in practice.

Hardware experiments: hardware validation was carried out on both IiwaBimanual and AllegroHand, including statistical validation over 100 hardware rollouts.

Experiment 3: Global Planning Performance (Section 8)

Allegro Hand roadmap:

  • Offline roadmap construction: under 10 minutes on a single laptop CPU core
  • Online inference: a few seconds
  • Continuous hardware performance: 150 consecutive edge traversals (terminated due to overheating)

Compared with existing methods, the gap is striking:

xychart-beta
    title "Computation Cost Comparison (approximate)"
    x-axis ["OpenAI Dactyl (RL)", "DextrAH (RL)", "MPPI (sampling)", "CTR (ours)"]
    y-axis "GPU Hours" 0 --> 10000
    bar [6000, 4000, 500, 0.01]
Figure 3: Computation cost comparison

Note: the figures in the chart above are rough estimates for comparison purposes, not numbers taken directly from the paper.

The CTR method needs only minutes of CPU computation and no GPU, which is a major step forward for practical deployability.

Performance across diverse goals: even for new goals that are not in the roadmap, generalization is possible by finding the nearest node and connecting to it.


Theoretical Insight: Connecting CTR to Classical Theory

Connection to the Wrench Set

The Motion Set derived from CTR connects directly to the Wrench Set of classical robotics. The Wrench Set is the set of all generalized forces that the robot can apply to the object in the current contact state.

\mathcal{W}^{\mathcal{A}}_{\mathbf{\Sigma},\kappa} = \left\{ \sum_{i=1}^{n_c} \mathbf{J}_{o_i}^\top \lambda_i \mid \lambda_i \in \mathcal{C}^{\mathcal{A}}_{\mathbf{\Sigma}, \kappa, i} \right\}

This connection means that CTR is not merely a new heuristic but a systematic extension of classical contact-mechanics theory. It serves as a bridge across a gap of several decades to the classical grasp analysis of Craig (1986) and Murray et al. (1994).

KKT Conditions and the Meaning of the Dual Gradient

In CQDC's convex-optimization structure, the contact force \lambda is the dual variable. The KKT conditions are:

\mathbf{P}(q) q_+ + b(q,u) - \sum_i \mathbf{J}_i^\top \lambda_i = 0 \quad \text{(stationarity)}

\lambda_i \in \mathcal{K}_i^\star \quad \text{(dual feasibility)}

\lambda_i^\top (\mathbf{J}_i q_+ - \phi_i \mathbf{e}_1) = 0 \quad \text{(complementarity)}

Applying sensitivity analysis (the Implicit Function Theorem) to these KKT conditions yields the gradients of both the state and the contact forces at once. Whereas prior work used only the primal gradient, CTR makes full use of the dual gradient as well.


Critical Discussion

Strengths

1. Theoretical consistency

The paper clearly identifies the fundamental inconsistency of the ETR and offers an alternative grounded in physical principles. It is not a mere engineering trick but a mathematically rigorous approach rooted in convex optimization theory.

2. Computational efficiency

Being able to plan in minutes on a laptop CPU without parallelization is a significant advance for practical deployability. Compared with RL-based methods that demand thousands of GPU-hours, it is directly usable by small research teams and in industrial settings.

3. Connection to classical theory

It naturally connects decades of accumulated classical contact-mechanics theory, such as the Wrench Set and friction-cone analysis, with modern differentiable simulators. There is a theoretical elegance to it.

4. Robust experimental validation

2,000 simulation rollouts and more than 100 hardware experiments amount to statistically meaningful validation. The ablation study (R-CTR vs. CTR vs. ETR) was carried out systematically.

5. Practical design

Introducing R-CTR ensures an always-feasible MPC, and the initial-guess heuristic makes real implementations more practical.

Weaknesses and Limitations

1. Restrictions of the quasidynamic assumption

CQDC, the core model of the method, assumes that the system follows heavily damped dynamics. This makes it hard to apply to manipulation that involves fast motions or elastic collisions. Section 6 attempts stabilization under second-order dynamics, but this is a heuristic bridge rather than a complete solution.

2. Contact geometry must be defined in advance

CTR assumes that the current contact pairs \mathcal{I}_c are known; that is, one must know where contact will occur. In practice a collision-detection algorithm provides this, but switching to new contact modes (mode switching) remains challenging.

3. Offline dependence of the roadmap

The global planner depends on offline roadmap construction. "10 minutes" is dramatically faster than existing RL, but the roadmap must be rebuilt whenever the environment changes or a new object appears. Online adaptability is limited.

4. Local minima

Each SOCP-based trajectory-optimization subproblem is convex, but the overall MPC problem is nonlinear (it is re-linearized at every iteration). When complex contact modes are required, it can get stuck in local minima. The roadmap partially mitigates this, but global optimality is not guaranteed.

5. Open questions on scalability

The method has been scaled successfully to the 16-DOF Allegro Hand with a 6-DOF object (39 collision geometries), but scalability to more complex systems (e.g., cooperative two-handed manipulation, dozens of active contacts) is still unverified.

6. No integration with perception

The current method assumes full state observation. Real deployment requires state estimation through cameras, tactile sensors, and the like, which adds observation noise and partial observability.


Comparison with Related Work

Differentiable-Simulator-Based Methods

Method | Trust region | Contact-force usage | Friction cone integrated | Computational cost
Suh et al. 2022 (bundled gradients) | ETR (implicit) | Primal only | No | Medium
Pang et al. 2023 (CQDC) | ETR | Primal only | No | Medium
Howell et al. 2023 (MJPC) | ETR | Primal only | No | Low
CTR (this work) | CTR | Primal + Dual | Yes | Very low

RL-Based Methods

RL methods such as OpenAI Dactyl, DextrAH (NVIDIA), and Chen et al. (CoRL 2022) achieve excellent performance with sim-to-real strategies, but they require thousands of GPU-hours of training and generalize poorly to new tasks without retraining.

CTR is better viewed as a complement to these methods than as a competitor. The authors themselves suggest in Section 9.1 that CTR can create synergy with RL:

  • Data synthesis for imitation learning: use the rich contact-rich demonstration data generated by CTR-MPC for IL training
  • Boosting RL exploration: the CTR-MPC roadmap provides an initial exploration distribution (reset distribution) for the RL agent

MPPI (Sampling-based MPC)

The sampling-based MPC of Li et al. (2024) performs many rollouts in parallel without gradients. Compared with CTR:

  • MPPI: many rollouts, requires parallel GPUs → hard to apply in real time
  • CTR-MPC: a single SOCP, fast on CPU alone → practical

HiDex (Cheng et al. 2023)

An MCTS-based hierarchical planning framework that excels at exploring contact modes. Differences from CTR:

  • HiDex: explicitly explores discrete contact-mode transitions
  • CTR: handles mode transitions implicitly through continuous, smoothed dynamics

Summary and Conclusion

Let us recall the question this paper set out to answer.

"What is a good local description of contact dynamics for contact-rich manipulation, and where can this description be trusted?"

The answer is CTR. CTR achieves three things at once:

  1. Physical consistency: the unilateral nature of contact and the friction cone are built directly into the trust region
  2. Computational efficiency: a convex set described by second-order cone constraints that can be handled efficiently
  3. Theoretical elegance: a bridge between the classical Wrench Set and modern differentiable simulators

The end result of the three contributions is impressive: a roadmap for 3D cube reorientation with the Allegro Hand is built in under 10 minutes on a laptop without a GPU and executed 150 times in a row on hardware.

Implications for the Robotics Community

Short-term implications:

  • CTR can be added as a plug-in to existing contact-implicit trajectory optimization pipelines. Dual gradients can already be computed in most modern differentiable simulators.
  • Replacing ETR with R-CTR in existing contact-rich MPC implementations can be expected to improve performance immediately.

Medium-term implications:

  • Combined with tactile sensors, a pipeline of real-time contact-state estimation → CTR update → reactive manipulation becomes possible.
  • Synergy with IL/RL: efficient learning from high-quality demonstration data generated with the CTR roadmap.

Long-term implications:

  • It raises the feasibility of whole-body contact-rich manipulation, in which a robot actively uses its whole arm, palm, and entire fingers, much as humans use their whole body.
  • In loco-manipulation for humanoid robots, the framework could be extended to whole-body contact planning that uses feet and hands at the same time.

Ultimately, what this paper shows is that deeply understanding and exploiting the physical structure of a problem can yield remarkable results without enormous compute. Where reinforcement learning tries to solve the contact problem through the power of scale, CTR approaches the same problem through the power of understanding. The two philosophies are not rivals but partners that complement each other.


References

  • Paper: Suh, H.J.T., Pang, T., Zhao, T., Tedrake, R. "Dexterous Contact-Rich Manipulation via the Contact Trust Region." International Journal of Robotics Research (IJRR), 2025. arXiv:2505.02291
  • Project page: ctr.theaiinstitute.com
  • Prior work: Pang et al. "Global Planning for Contact-Rich Manipulation via Local Smoothing of Quasi-dynamic Contact Models." IEEE T-RO, 2023. (King-Sun Fu Memorial Best Paper Award Honorable Mention)
  • Underlying model: Pang, T., Tedrake, R. "A Convex Quasidynamic Differentiable Contact Simulator." 2021.

Copyright 2026, JungYeon Lee