Curieux.JY

📃 Arm-assisted Fall Recovery

rl
recovery
quadruped
mobile-manipulator
Learning Arm-Assisted Fall Damage Reduction and Recovery for Legged Mobile Manipulators
Published

April 17, 2026

  • Paper Link (arXiv:2303.05486)
  • Video

🤖 This paper proposes a learning-based approach in which a legged mobile manipulator uses its arm to reduce fall damage and assist recovery, together with an asymmetric actor-critic training structure that makes this possible. 🦾 In simulation, the proposed policy substantially reduced base contact impulse, peak joint internal force, and base acceleration compared with baseline methods, and in recovery experiments it achieved a high 98.9% success rate while reducing leg torque consumption. ✅ The authors also verified that a time-invariant actor policy produces more uniform recovery behavior, and that the method carries over to other tasks such as resting and self-righting.


🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

์ด ๋…ผ๋ฌธ์€ Legged Mobile Manipulator์˜ ๋‚™์ƒ ํ”ผํ•ด ๊ฐ์†Œ ๋ฐ ํšŒ๋ณต์„ ์œ„ํ•ด ๋กœ๋ด‡ ํŒ”์„ ํ™œ์šฉํ•˜๋Š” ํ•™์Šต ๊ธฐ๋ฐ˜ ์ ‘๊ทผ ๋ฐฉ์‹์„ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค. ๊ธฐ์กด ๋‚™์ƒ ๋ฐ ํšŒ๋ณต ์ „๋žต์€ ๋น„ํƒ„์„ฑ ์ถฉ๋Œ ๋˜๋Š” ์ •์˜๋œ ๋ฐฉํ–ฅ์œผ๋กœ์˜ ๋‚™์ƒ๊ณผ ๊ฐ™์€ ์ œํ•œ์ ์ธ ๊ฐ€์ •์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝํ–ฅ์ด ์žˆ์–ด ์‹ค์‹œ๊ฐ„ ๊ณ„์‚ฐ์ด ๊ฐ€๋Šฅํ•˜๋„๋ก ๋‹จ์ˆœํ™”๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๋ณธ ์—ฐ๊ตฌ๋Š” ์ด๋Ÿฌํ•œ ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•˜๊ณ  ๋กœ๋ด‡ ํŒ”์„ ์‚ฌ์šฉํ•˜์—ฌ ๋‚™์ƒ ํ”ผํ•ด๋ฅผ ์ค„์ด๊ณ  ๋กœ๋ด‡์˜ ํšŒ๋ณต์„ ๋•๋Š” ๋ฐฉ๋ฒ•์„ ํƒ๊ตฌํ•ฉ๋‹ˆ๋‹ค.

์ฃผ์š” ๋ชฉํ‘œ๋Š” ๋‹ค์–‘ํ•œ ์ดˆ๊ธฐ ๋‚™์ƒ ์กฐ๊ฑด์—์„œ ๋กœ๋ด‡์ด ๋‚™์ƒ์œผ๋กœ ์ธํ•œ ์†์ƒ์„ ์ค„์ด๊ณ  ์ •ํ•ด์ง„ ์‹œ๊ฐ„ ๋‚ด์— ์•ˆ์ •์ ์ธ ์Šคํƒ ์Šค ์ž์„ธ๋กœ ํšŒ๋ณตํ•˜๋„๋ก ํ•˜๋Š” ๋‹จ์ผ ์ œ์–ด ์ •์ฑ…์„ ํ›ˆ๋ จํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด ์ €์ž๋“ค์€ time-varying reward ํ•จ์ˆ˜๋ฅผ ํฌํ•จํ•˜๋Š” finite-horizon MDP(Markov Decision Process)๋กœ ๋ฌธ์ œ๋ฅผ ์ •ํ˜•ํ™”ํ•ฉ๋‹ˆ๋‹ค.

ํ•ต์‹ฌ ๋ฐฉ๋ฒ•๋ก  (Core Methodology):

๋ณธ ์—ฐ๊ตฌ์˜ ํ•ต์‹ฌ์€ Asymmetric Actor-Critic ํ›ˆ๋ จ ๊ตฌ์กฐ์™€ time-varying reward ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋ฐ ์žˆ์Šต๋‹ˆ๋‹ค.

  1. State Initialization and Rollout:
    • To simulate diverse fall scenarios, the initial base state and joint states are randomized.
    • In each episode the joint actuators are disabled for a random duration between 0.04 s and 1.50 s, simulating a robot that is already falling. This prepares the policy for cases in which the controller detects the fall late.
    • Once this initial deactivation period ends, the learned policy takes control of the robot to reduce fall damage and begin recovery. The episode ends when the fixed time horizon of the MDP expires.
  2. Asymmetric Actor-Critic:
    • PPO (Proximal Policy Optimization) is used as the reinforcement learning algorithm.
    • Actor observation: the robot state, including base orientation, base angular velocity, and joint states. Base linear velocity is excluded because state estimation is unreliable after a fall. During training, Gaussian noise is added to the actor's observations to make the policy robust to state-estimation noise. The actor's policy is time-invariant.
    • Critic observation: the critic sees the actor's noiseless observations plus privileged observations. These are unavailable at deployment but improve the accuracy of value-function estimation. They include the remaining episode time, foot contact states, a binary flag indicating whether the actuators are active, and the remaining time of the initial deactivation period. The critic estimates a time-variant value function that matches the time-varying reward function.
  3. Actions:
    • The joint target produced by the policy is computed as s\,a + \bar{q}, where s is the action scaling factor, a is the policy's action output, and \bar{q} is the default joint position.
    • The computed joint positions are used as position targets for the drives' PD controllers. This parameterization is used because, for ALMA, outputting perturbations around the specified default angles makes a good initial policy.
  4. Reward Function:
    • ๋‚™์ƒ ํ”ผํ•ด ๊ฐ์†Œ๋Š” ๋†’์€ ์ ‘์ด‰ ์ถฉ๊ฒฉ๋Ÿ‰(contact impulse)๊ณผ ์‹ ์ฒด ๊ฐ€์†๋„(body acceleration)์™€ ๊ฐ™์€ ์›์น˜ ์•Š๋Š” ์ธก์ •๊ฐ’์˜ ์กฐํ•ฉ์„ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฒƒ์œผ๋กœ ๊ณต์‹ํ™”๋ฉ๋‹ˆ๋‹ค.
    • Time-variant task rewards (๋นจ๊ฐ„์ƒ‰):
      • Base height: ์—ํ”ผ์†Œ๋“œ์˜ ๋งˆ์ง€๋ง‰ 2์ดˆ ๋™์•ˆ ๋กœ๋ด‡์˜ ๋ชธํ†ต ๋†’์ด๊ฐ€ 0.5m ์ด์ƒ์ผ ๋•Œ ์ตœ๋Œ€ ๋ณด์ƒ์„ ๋ฐ›์Šต๋‹ˆ๋‹ค.
      • Joint position: ALMA์˜ ๊ธฐ๋ณธ ๊ด€์ ˆ ์œ„์น˜(default joint position)์—์„œ ๋ฒ—์–ด๋‚˜๋Š” ๊ฒƒ์— ํŽ˜๋„ํ‹ฐ๋ฅผ ๋ถ€์—ฌํ•ฉ๋‹ˆ๋‹ค.
      • Base orientation: ๋กœ๋ด‡์ด ๋กค(roll) ๋ฐ ํ”ผ์น˜(pitch) ๊ฐ๋„๋ฅผ ์ค„์—ฌ ๊ธฐ๋ณธ ๋ฐฉํ–ฅ์„ ํšŒ๋ณตํ•˜๋Š” ๊ฒƒ์— ๋ณด์ƒ์„ ๋ถ€์—ฌํ•ฉ๋‹ˆ๋‹ค.
      • ์ด๋Ÿฌํ•œ task reward๋Š” ์—ํ”ผ์†Œ๋“œ์˜ ๋งˆ์ง€๋ง‰ 2์ดˆ ๋™์•ˆ์—๋งŒ ํ™œ์„ฑํ™”๋ฉ๋‹ˆ๋‹ค.
    • Time-invariant behavior rewards (ํŒŒ๋ž€์ƒ‰):
      • Body collision: ์‹ ์ฒด ์ถฉ๋Œ์— ๋Œ€ํ•œ ํŽ˜๋„ํ‹ฐ๋กœ, ์ ‘์ด‰๋ ฅ์˜ ํฌ๊ธฐ(\sum_{b \in B} \|\lambda_b[t]\|^2)์— ๋น„๋ก€ํ•ฉ๋‹ˆ๋‹ค. ์Šค์ผ€์ผ์€ -0.2์ž…๋‹ˆ๋‹ค.
      • Momentum change: ์šด๋™๋Ÿ‰ ๋ณ€ํ™”(\sum_{b \in B} \|m_b a_b[t]\|)์— ๋Œ€ํ•œ ํŽ˜๋„ํ‹ฐ๋กœ, ์Šค์ผ€์ผ์€ -5e-3์ž…๋‹ˆ๋‹ค.
      • Body yank: ๋ฐ”๋”” ์ €ํฌ(jerk, ํž˜์˜ ๋ณ€ํ™”์œจ)(\sum_{b \in B} \|F_b[t] - F_b[t-1]\|^2)์— ๋Œ€ํ•œ ํŽ˜๋„ํ‹ฐ๋กœ, ์Šค์ผ€์ผ์€ -5e-2์ž…๋‹ˆ๋‹ค.
      • Action rate: ์•ก์…˜ ๋ณ€ํ™”์œจ(\sum (a[t] - a[t-1])^2)์— ๋Œ€ํ•œ ํŽ˜๋„ํ‹ฐ๋กœ, ์Šค์ผ€์ผ์€ -3e-3์ž…๋‹ˆ๋‹ค.
      • Joint velocity: ๊ด€์ ˆ ์†๋„(\sum_j \dot{q}_j^2)์— ๋Œ€ํ•œ ํŽ˜๋„ํ‹ฐ๋กœ, ์Šค์ผ€์ผ์€ -5e-4์ž…๋‹ˆ๋‹ค.
      • Torques: ํ† ํฌ(\sum_j \tau_j^2)์— ๋Œ€ํ•œ ํŽ˜๋„ํ‹ฐ๋กœ, ์Šค์ผ€์ผ์€ -4e-7์ž…๋‹ˆ๋‹ค.
      • Acceleration: ๊ด€์ ˆ ๊ฐ€์†๋„(\sum_j \ddot{q}_j^2)์— ๋Œ€ํ•œ ํŽ˜๋„ํ‹ฐ๋กœ, ์Šค์ผ€์ผ์€ -1e-8์ž…๋‹ˆ๋‹ค.
      • ์ด๋Ÿฌํ•œ behavior reward๋Š” ํ›ˆ๋ จ ์—ํ”ผ์†Œ๋“œ ์ „์ฒด์— ๊ฑธ์ณ ํ™œ์„ฑํ™”๋ฉ๋‹ˆ๋‹ค.
    • ์ดˆ๊ธฐ ๋น„ํ™œ์„ฑํ™” ๊ธฐ๊ฐ„ ๋™์•ˆ์—๋Š” ์ •์ฑ…์ด ๋กœ๋ด‡์˜ ๊ด€์ ˆ ์•ก์…˜์— ์˜ํ–ฅ์„ ๋ฏธ์น˜์ง€ ์•Š์œผ๋ฏ€๋กœ, task ๋ฐ behavior reward ๋ชจ๋‘ 0์œผ๋กœ ์„ค์ •๋ฉ๋‹ˆ๋‹ค.
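The behavior penalties above can be read as one weighted sum per time step. Below is a minimal plain-Python sketch of that sum, assuming the simulator supplies per-body contact forces, masses, accelerations and net forces as 3-vectors; `behavior_reward`, `norm` and the argument layout are hypothetical names of ours, and the scales are simply the ones listed in the bullets.

```python
import math

# Scales from the behavior-reward bullets above (negative values = penalties).
BEHAVIOR_SCALES = {
    "body_collision": -0.2,
    "momentum_change": -5e-3,
    "body_yank": -5e-2,
    "action_rate": -3e-3,
    "joint_velocity": -5e-4,
    "torques": -4e-7,
    "acceleration": -1e-8,
}


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def behavior_reward(contact_forces, masses, body_accels, forces, prev_forces,
                    action, prev_action, qd, tau, qdd):
    """Weighted sum of the time-invariant behavior penalties for one time step.

    Per-body quantities are lists of 3-vectors (one entry per body b in B);
    per-joint quantities are flat lists. Returns (total, unweighted terms).
    """
    terms = {
        # sum_b ||lambda_b[t]||^2
        "body_collision": sum(norm(f) ** 2 for f in contact_forces),
        # sum_b ||m_b a_b[t]||
        "momentum_change": sum(norm([m * x for x in a]) for m, a in zip(masses, body_accels)),
        # sum_b ||F_b[t] - F_b[t-1]||^2
        "body_yank": sum(norm([x - y for x, y in zip(f, fp)]) ** 2
                         for f, fp in zip(forces, prev_forces)),
        # sum (a[t] - a[t-1])^2
        "action_rate": sum((a - ap) ** 2 for a, ap in zip(action, prev_action)),
        # sum_j qdot_j^2, sum_j tau_j^2, sum_j qddot_j^2
        "joint_velocity": sum(v * v for v in qd),
        "torques": sum(t * t for t in tau),
        "acceleration": sum(x * x for x in qdd),
    }
    total = sum(BEHAVIOR_SCALES[k] * v for k, v in terms.items())
    return total, terms
```

Returning the unweighted terms alongside the total makes it easy to log which penalty dominates during a fall.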

Sim-to-Real Transfer:

NVIDIA Isaac Gym is used to simulate the training environment at 200 Hz, while the policy runs at 100 Hz. The following techniques are applied for sim-to-real transfer:

  • Actuator model: an actuator model of the leg drives is used in the simulator. For the arm drives, friction is randomized and a torque delay is added.
  • Terrain randomization: uneven terrain is used instead of flat ground, which randomizes the ground contact normal direction and encourages larger clearance.
  • Observation noise: Gaussian noise is added to the actor observations, making the policy robust to the robot's state-estimation noise.
  • Robot randomization: the robot's base mass is randomized, as is the ground friction coefficient of the robot's bodies.
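With the simulator at 200 Hz and the policy at 100 Hz, each policy action is held for two physics steps. A minimal sketch of that control decimation, where `policy` and `step_physics` are hypothetical callables standing in for the network and the simulator (they are not Isaac Gym API calls):

```python
SIM_HZ = 200     # physics rate stated in the text
POLICY_HZ = 100  # policy rate stated in the text
DECIMATION = SIM_HZ // POLICY_HZ  # physics steps per policy action (= 2)


def rollout(policy, step_physics, obs, n_policy_steps):
    """Run the policy at POLICY_HZ while the physics advances at SIM_HZ."""
    physics_steps = 0
    for _ in range(n_policy_steps):
        action = policy(obs)            # one new action per policy tick
        for _ in range(DECIMATION):     # the same action is held for DECIMATION sim steps
            obs = step_physics(action)
            physics_steps += 1
    return obs, physics_steps
```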

Results:

  • Fall damage reduction: compared with existing emergency controllers (freezing, damping), the proposed policy substantially reduces base contact impulse, peak joint internal forces, and base acceleration. In particular, more of its rollouts keep the base impulse below 0.05 Ns, showing that it often avoids hard contact and thus damage.
  • Fall recovery: the arm-assisted policy recovers from 98.9% of initial fall configurations, outperforming the 95.2% of the tucked-arm policy. Using the arm also reduces average leg torque consumption by 9.17%. The policy learns to use the arm adaptively, depending on the fall, to soften the impact and support recovery.
  • Ablation study on observation configurations:
    • Importance of the privileged critic: if the critic cannot observe the episode progress or privileged robot-state observations, the task cannot be learned. A privileged critic that estimates a time-variant value function greatly reduces the variance of the policy updates, which makes learning possible.
    • Time-variant vs. time-invariant actor: including the remaining episode time in the actor observation barely changes the episode return but does change the recovery behavior. The time-variant policy tends to recover quickly only just before the task reward activates, so its motion is temporally uneven, and it may abandon recovery after a failed attempt. The time-invariant policy, by contrast, behaves consistently over time and always attempts recovery, making it more robust and better suited to deployment.
    • Asymmetric actor-critic vs. privileged policy: the proposed setting with a non-privileged time-invariant actor and a privileged critic (configuration 2) has a 3.0% lower mean episode return than the privileged-policy setting (configuration 4), but the latter would require an additional policy distillation step for hardware deployment.

Reproducibility and adaptation to other tasks:

The proposed training pipeline can be applied to other state-transition tasks by changing only high-level settings such as the rollout initialization period, the total task duration, the joint target positions, and the reward scales. Policies for two additional tasks, “Resting” (settling onto the ground from an arbitrary stance configuration) and “Self-righting” (getting up from a fallen state into the default rest pose), were also trained and validated on hardware, demonstrating the pipeline's robustness.

Conclusion:

This work presents an asymmetric actor-critic method for training a time-invariant control policy for legged mobile manipulators. The policy is trained in simulation from random fall configurations with time-based rewards and learns to use the arm adaptively to reduce fall damage and recover. The proposed controller outperforms the baseline emergency controllers in peak instantaneous impulse, base acceleration, and peak joint internal force during falls. The arm-assisted recovery policy also outperforms a learned tucked-arm recovery policy in recovery success rate and leg torque consumption. The policy was extensively tested in simulation and deployed on the ALMA hardware. Future work aims to extend the policy to account for potential damage caused by falls, such as malfunctioning drives.


🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

Introduction

Legged mobile manipulators combine manipulation with locomotion over unstructured terrain, which makes them highly practical. In real applications, however, they often carry special payloads such as sensors or end effectors, and a fall can easily damage these payloads and the arm. Fall damage reduction during a fall and recovery from failure therefore remain key open problems for legged robots. Both are contact-rich behaviors in which the robot must make meaningful contact with the ground, so they have to handle frequent contact switches.

The limitations of existing approaches are clear.

  • Planning-based damage reduction: these methods plan fixed or adaptive contact sequences, or absorb the impact with motions such as ukemi (breakfall). However, they rely on restrictive assumptions such as inelastic, non-slip collisions and falls only in the sagittal or frontal plane, and they assume the limbs are agile enough to track the contact sequence, which does not hold for robots with heavy legs and limited joint speeds. The “fully-stretched arm” posture used by small humanoids is undesirable for a heavy robot like ALMA (about 58 kg) because it puts excessive impact stress on the drives.
  • Planning-based recovery: these methods depend on accurate state and contact-point estimation, which is hard because both are uncertain after a fall. They also need heuristics such as predefined sequences, state transitions, and model simplifications, which ties them to a specific robot.
  • RL-based recovery (prior work): reduces the need for heuristics, but balancing the weights between behavior rewards (smoothness) and task rewards (recovery time) is brittle and demands laborious tuning.

One-line summary of the paper: with minimal simplifications, it learns a single policy that actively uses the arm to reduce fall damage and recover, via an asymmetric actor-critic that pairs a time-invariant policy with time-varying rewards.

The main contributions are threefold.

  1. A quantitative comparison between the proposed policy and the emergency controllers deployed today (drive freezing/damping), showing lower base impulse, peak joint internal force, and base acceleration during falls, plus an ablation showing improved recovery behavior over symmetric and time-variant variants.
  2. Simulation results in which the arm adaptively arrests the fall and assists recovery to a stance pose within a fixed time, validated on the ALMA hardware.
  3. Extension to other tasks such as resting (from an arbitrary pose to a rest pose) and self-righting (from a fallen state to the default pose).

Method

flowchart LR
    ENV["Environment<br/>(fall simulation)"] --> ACTOR
    subgraph OBS["Observation split"]
        AO["Actor obs (non-privileged)<br/>base orientation,<br/>angular velocity,<br/>joint states + noise"]
        CO["Critic obs (privileged)<br/>+ remaining time,<br/>foot contact flags,<br/>MDP obs, noiseless state"]
    end
    ACTOR["Actor (time-invariant)"] -->|joint target<br/>s·a + q̄| PD["PD controller"]
    PD --> ENV
    AO --> ACTOR
    CO --> CRITIC["Critic (privileged)"]
    ENV --> CRITIC
    CRITIC -->|value| RL["PPO"]
    RL --> ACTOR

The problem is formulated as a finite-horizon MDP. The robot starts from a random initial fall pose and is rewarded for keeping damage measures low throughout the episode and for how far it has returned upright by the end of the episode. The key tool is an asymmetric actor-critic, in which only the critic has access to the simulator's privileged observations.
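As a sketch in our own notation (not the paper's), with horizon T, the end of the actuator-off initialization interval t_0, and the task-reward window T_task = 2.0 s, and ignoring discounting, the objective and the resulting value function can be written as:

```latex
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T-1} r_t\right],
\qquad
r_t = \mathbb{1}[t \ge t_0]\,\Big(r^{\mathrm{beh}}_t + \mathbb{1}[t \ge T - T_{\mathrm{task}}]\; r^{\mathrm{task}}_t\Big)

V^{\pi}(s, t) = \mathbb{E}_{\pi}\!\left[\sum_{k=t}^{T-1} r_k \;\middle|\; s_t = s\right]
```

Because r_t depends explicitly on t, the value depends on the remaining time T - t and not only on the state, which is why the remaining episode time is given to the critic.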

์ƒํƒœ ์ดˆ๊ธฐํ™”์™€ ๋กค์•„์›ƒ

๋ชจ๋ฐ”์ผ ๋งค๋‹ˆํ“ฐ๋ ˆ์ดํ„ฐ๋Š” ๋ณดํ–‰ยท์กฐ์ž‘ยทthrowing ๋“ฑ ์ž‘์—…๋งˆ๋‹ค ์ปจํŠธ๋กค๋Ÿฌ๊ฐ€ ๋‹ค๋ฅด๊ณ  ์ƒํƒœ ๋ถ„ํฌ(๋‹ค๋ฆฌ ์ ‘์ด‰, base ์ž์„ธ ๋“ฑ)๊ฐ€ ํฌ๊ฒŒ ๋‹ฌ๋ผ, ๋ชจ๋“  ์ปจํŠธ๋กค๋Ÿฌ์— ๋งž๋Š” ๋‚™์ƒ ๊ฐ์ง€๊ธฐ๋ฅผ ๋งŒ๋“ค๊ธฐ ์–ด๋ ต์Šต๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ ์ž‘์—… ์ปจํŠธ๋กค๋Ÿฌ๊ฐ€ ์•ˆ์ „ ์ ๊ฒ€์„ ์ˆ˜ํ–‰ํ•ด ๋‚™์ƒ์„ ๋ณด๊ณ  ํ•˜๋ฉด ์ด ์ •์ฑ…์œผ๋กœ ์ „ํ™˜ํ•˜๋Š” ๋น„์ƒ ์ปจํŠธ๋กค๋Ÿฌ ๋กœ ์„ค๊ณ„ํ•ฉ๋‹ˆ๋‹ค.

To generate diverse initial fall conditions, each episode randomizes the initial base and joint states and disables the joint actuators for 0.04 to 1.50 s so that the robot falls (the 1.50 s upper bound also allows for late fall detection). After this initialization interval the policy takes control, and the episode terminates only at the end of the MDP time horizon.
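A minimal sketch of sampling the actuator-off interval, assuming a uniform distribution over [0.04, 1.50] s (the text only says the duration is random within that range) and converting it to 100 Hz policy steps; `sample_init_interval` is a hypothetical helper.

```python
import random

POLICY_DT = 1.0 / 100              # 100 Hz policy, from the Sim-to-Real section
OFF_MIN_S, OFF_MAX_S = 0.04, 1.50  # actuator-off range from the text


def sample_init_interval(rng):
    """Sample the actuator-off duration (assumed uniform) and convert it to policy steps."""
    duration_s = rng.uniform(OFF_MIN_S, OFF_MAX_S)
    return duration_s, round(duration_s / POLICY_DT)
```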

Asymmetric actor-critic

PPO is used, and only the critic sees privileged observations.

  • Actor observations: base orientation, base angular velocity, and joint states. Base linear velocity is excluded because its estimate is highly uncertain after a fall. Gaussian noise is added during training for robustness.
  • Critic observations: the noiseless actor observations plus privileged observations (remaining episode time, foot contact states, and MDP observations such as a binary flag indicating whether the actuators are active and the time remaining until they activate).

Key point: because only the critic observes the remaining time, the actor's policy stays time-invariant. The privileged critic can accurately estimate the true value induced by the time-varying rewards, which lets the non-privileged actor learn fall damage reduction and recovery skills stably.
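The split between actor and critic inputs can be sketched as two observation builders. Assumptions: joint states are joint positions plus velocities, the orientation representation is left open, and the default `noise_std` is a placeholder value; all function names are hypothetical. The point is structural: the remaining time and the other MDP signals appear only in the critic input, so the actor stays time-invariant.

```python
import random


def _state_vector(base_orientation, base_ang_vel, joint_pos, joint_vel):
    """Concatenate the proprioceptive state shared by actor and critic."""
    return list(base_orientation) + list(base_ang_vel) + list(joint_pos) + list(joint_vel)


def actor_obs(base_orientation, base_ang_vel, joint_pos, joint_vel, rng, noise_std=0.05):
    """Non-privileged actor input: no base linear velocity, no time signal, Gaussian noise."""
    state = _state_vector(base_orientation, base_ang_vel, joint_pos, joint_vel)
    return [x + rng.gauss(0.0, noise_std) for x in state]


def critic_obs(base_orientation, base_ang_vel, joint_pos, joint_vel,
               time_left_s, foot_contacts, actuators_on, init_time_left_s):
    """Privileged critic input: noiseless state plus remaining time, MDP flags and contacts."""
    state = _state_vector(base_orientation, base_ang_vel, joint_pos, joint_vel)
    mdp = [time_left_s, float(actuators_on), init_time_left_s]
    contacts = [float(c) for c in foot_contacts]
    return state + mdp + contacts
```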

Actions

The joint target output by the policy is s\,a + \bar{q} (s: action scaling, a: policy output, \bar{q}: default joint position). This joint position is used as the target for the drive-level PD controllers (an absolute target rather than a joint-delta action). The default joint position and the action scale are chosen so that, when the robot lies on its side, random actions have a chance of flipping it over, which helps self-righting exploration early in training.

๋ณด์ƒ ํ•จ์ˆ˜ (ํ•ต์‹ฌ ์„ค๊ณ„)

๋‚™์ƒ ์†์ƒ ๊ฐ์†Œ๋ฅผ ์—ฌ๋Ÿฌ ๋ฐ”๋žŒ์งํ•˜์ง€ ์•Š์€ ์ธก์ •๊ฐ’(๋†’์€ contact impulse, body acceleration ๋“ฑ)์˜ ์กฐํ•ฉ ์ตœ์†Œํ™” ๋กœ ์ •์‹ํ™”ํ•˜๊ณ , ์—ํ”ผ์†Œ๋“œ ๋์—์„œ stance ์ž์„ธ ๊ทผ์ฒ˜๋กœ ์ผ์–ด์„  ๋ฐ ๋Œ€ํ•ด ๋ณด์ƒํ•ฉ๋‹ˆ๋‹ค. ๋ณด์ƒ์€ ๋‘ ์ข…๋ฅ˜๋กœ ๋‚˜๋‰ฉ๋‹ˆ๋‹ค.

(1) Time-variant ์ž‘์—… ๋ณด์ƒ โ€” ์—ํ”ผ์†Œ๋“œ์˜ ๋งˆ์ง€๋ง‰ 2.0์ดˆ ์—๋งŒ ํ™œ์„ฑ:

ํ•ญ๋ชฉ ์˜๋ฏธ scale
Base height ํ† ๋ฅด์†Œ ๋†’์ด๊ฐ€ ๋†’์„์ˆ˜๋ก ๋ณด์ƒ(โ‰ฅ0.5m์—์„œ ์ตœ๋Œ€) 600
Stand joint position ALMA ๊ธฐ๋ณธ ๊ด€์ ˆ ์ž์„ธ์™€์˜ ํŽธ์ฐจ ํŽ˜๋„ํ‹ฐ 350
Base orientation rollยทpitch ํŽ˜๋„ํ‹ฐ๋กœ base ์ž์„ธ ๋ณต๊ท€ ์œ ๋„ 120
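The table gives the scales but not the shaping functions. As a hedged sketch, the following assumes a linear height ramp capped at 0.5 m and exp(-squared error) kernels for joint pose and roll/pitch, so each term lies in [0, 1] and the positive scales peak at the stance pose; these forms and the name `task_reward` are our assumptions, not the paper's definitions.

```python
import math

TASK_SCALES = {"base_height": 600.0, "stand_joint_position": 350.0, "base_orientation": 120.0}
HEIGHT_TARGET_M = 0.5  # the height term saturates at or above this torso height


def task_reward(base_height_m, q, q_default, roll, pitch):
    """Time-variant task reward (meant to be applied only in the final 2.0 s window).

    Assumed shaping: linear height ramp capped at 1, exp(-squared error) kernels
    for joint pose and base roll/pitch.
    """
    height = min(max(base_height_m / HEIGHT_TARGET_M, 0.0), 1.0)
    joint = math.exp(-sum((qi - q0) ** 2 for qi, q0 in zip(q, q_default)))
    orientation = math.exp(-(roll ** 2 + pitch ** 2))
    return (TASK_SCALES["base_height"] * height
            + TASK_SCALES["stand_joint_position"] * joint
            + TASK_SCALES["base_orientation"] * orientation)
```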

(2) Time-invariant behavior rewards, active throughout the episode (encouraging a gentle fall):

| Term | Scale |
|---|---|
| Body collision | -0.2 |
| Momentum change | -5\times10^{-2} |
| Body yank (rate of change of force) | -5\times10^{-2} |
| Action rate | -3\times10^{-3} |
| Joint velocity | -5\times10^{-4} |
| Torques | -4\times10^{-7} |
| Acceleration | -1\times10^{-8} |

Intuition: tying “when to stand up” (task) to time while always applying “how to move smoothly” (behavior) separates the trade-off between smoothness and recovery time through the temporal structure of the reward rather than through delicate weight tuning. During the initialization interval (actuators disabled) the policy's actions have no effect on the joints, so both rewards are set to zero.

Sim-to-Real

Training uses NVIDIA Isaac Gym (200 Hz simulation, 100 Hz policy). For transfer:

  • Actuator model: applied to the leg drives (SEAs). The arm uses pseudo-direct drives with good transparency, so instead of a model, only friction randomization and a torque delay are added.
  • Terrain randomization: uneven terrain instead of flat ground randomizes the ground normal direction and encourages larger clearance.
  • Observation noise: Gaussian noise on the actor observations.
  • Robot randomization: added base mass \sim \mathcal{U}(-5, 5) kg; the ground friction coefficient is also randomized.
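A minimal per-episode randomization sampler. Only the ±5 kg added base mass comes from the text; the friction range and the arm torque-delay bound (expressed here in control steps) are hypothetical placeholders, as are the names.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodeRandomization:
    added_base_mass_kg: float
    ground_friction: float
    arm_torque_delay_steps: int


def sample_randomization(rng, friction_range=(0.4, 1.0), max_delay_steps=2):
    """Per-episode domain randomization.

    The +/-5 kg base-mass range is from the text; friction_range and
    max_delay_steps are hypothetical placeholders.
    """
    return EpisodeRandomization(
        added_base_mass_kg=rng.uniform(-5.0, 5.0),
        ground_friction=rng.uniform(*friction_range),
        arm_torque_delay_steps=rng.randint(0, max_delay_steps),
    )
```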

Experiments

Fall damage reduction

Over 2560 rollouts with random initial fall poses and base masses, the policy is compared with two baseline emergency controllers (drive freezing and damping). The metrics are peak instantaneous base impulse, mean/peak base acceleration, and peak joint internal force.

  • Base contact impulse (Fig. 5a): the proposed policy produces fewer samples with impulse above 0.05 Ns. Damping shows two peaks (corresponding to straight-down and sideways falls), while freezing has a flat tail reflecting a substantial probability of high impulses.
  • Base acceleration (Fig. 5b): the 95th-percentile base acceleration is significantly lower than for the baselines, an advantage in worst-case scenarios.
  • Peak joint internal force (Fig. 6): slightly lower than damping and significantly lower than freezing. In other words, the policy reduced impulse and acceleration without inducing higher peak internal forces.
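The two summary statistics used above, the share of rollouts whose peak base impulse exceeds 0.05 Ns and a 95th-percentile acceleration, can be computed as below. This is a sketch with our own helper names; it interpolates linearly between order statistics, which matches NumPy's default `percentile` method. The test values are toy numbers, not the paper's data.

```python
import math

IMPULSE_THRESHOLD_NS = 0.05  # threshold discussed for Fig. 5a


def fraction_above(values, threshold):
    """Share of rollouts whose value (e.g. peak base impulse) exceeds the threshold."""
    return sum(v > threshold for v in values) / len(values)


def percentile(values, q):
    """q-th percentile with linear interpolation between sorted samples."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)
```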

๋‚™์ƒ ๋ณต๊ท€

์ •์ฑ…์€ ํŒ”์„ ์ ์‘์ ์œผ๋กœ ์”๋‹ˆ๋‹ค(Fig. 4 ์˜ˆ์‹œ: ๋จผ์ € ๋‹ค๋ฆฌ๋กœ ๋น ๋ฅด๊ฒŒ ์ผ์–ด์„œ๋ ค๋‹ค ์‹คํŒจโ†’๊ท ํ˜• ์ƒ์‹คโ†’ํŒ”๊ณผ ๋ฌด๋ฆŽ์œผ๋กœ ๋‚™์ƒ ์•ˆ์ •ํ™”โ†’ํŒ”๋กœ ์ง€๋ฉด์„ ๋ฐ€์–ด ๋ณต๊ท€โ†’ํŒ” ํšŒ์ˆ˜). ์„ฑ๊ณต ๊ธฐ์ค€์€ ์—ํ”ผ์†Œ๋“œ ๋์—์„œ base height >0.5m, ์ตœ๋Œ€ ๊ด€์ ˆ ์†๋„ <0.01 rad/s.

  • Arm-assisted recovery success rate: 98.9% vs. 95.2% for the tucked-arm policy (2560 episodes; same rewards and MDP, differing only in whether the arm is used).
  • Using the arm reduces the average torque consumption of the leg drives by 9.17% (without the arm's help, the legs have to push harder).

Ablation: observation configuration (Table II)

Mean over 3 seeds after 20000 iterations (episode return / value error):

| Config | Actor obs | Critic obs | Episode return | Value error |
|---|---|---|---|---|
| 1 | o_s | o_s | -3.88 | 0.0902 |
| 2 (ours) | o_s | o_s, o_{priv}, o_{MDP} | 12.9 | 0.00379 |
| 3 | o_s, o_{eplen} | o_s, o_{priv}, o_{MDP} | 12.9 | 0.00411 |
| 4 | o_s, o_{priv}, o_{MDP} | o_s, o_{priv}, o_{MDP} | 13.3 | 0.00336 |

  • A privileged critic is essential: with a non-privileged critic (config 1) the task is not learned. A time-invariant critic cannot see the true value of the time-varying rewards, so the variance of the policy updates is large.
  • Time-invariant vs. time-variant actor: episode returns are similar, but the time-variant actor (config 3) learns temporally uneven behavior, “standing halfway, resting against the hip joint limits, then hurriedly repositioning just before the task reward starts,” which is unsuitable for deployment. The time-invariant actor consistently keeps attempting recovery and is more robust.
  • Asymmetric vs. privileged policy: the non-privileged time-invariant actor with a privileged critic (config 2) loses only 3.0% of mean return relative to the privileged policy (config 4), yet can be deployed on hardware directly without additional distillation.
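The 3.0% figure follows directly from the table means, and the same arithmetic shows how much larger the non-privileged critic's value error is. A quick check:

```python
# Mean episode return and value error per configuration (Table II).
episode_return = {1: -3.88, 2: 12.9, 3: 12.9, 4: 13.3}
value_error = {1: 0.0902, 2: 0.00379, 3: 0.00411, 4: 0.00336}

# Relative return lost by the deployable asymmetric setup (config 2) vs. the privileged policy (config 4).
drop = (episode_return[4] - episode_return[2]) / episode_return[4]
# Value error of the non-privileged critic (config 1) relative to ours (config 2).
value_error_ratio = value_error[1] / value_error[2]

print(f"{drop:.1%}", round(value_error_ratio, 1))  # 3.0% 23.8
```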

Reproducibility and extension

Even with the task rewards scaled up 3×, the distribution of recovered heights stays similar, indicating robustness to the reward scale. Moreover, by changing only high-level settings such as the initialization interval, task duration, joint targets, and reward scales, the pipeline was easily extended to resting (arbitrary pose → rest pose) and self-righting (fallen state → default pose), both validated on hardware.
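The high-level settings named above can be grouped into one configuration object, with task variants derived by copying it. A hypothetical sketch: the field names are ours, the reward scales are the fall-recovery values from the task-reward table, and `episode_len_s` has no default because the text does not state the episode length.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class StateTransitionTaskConfig:
    """High-level knobs for a fall-recovery-style task (hypothetical field names)."""
    episode_len_s: float                   # total task duration; not given in the text
    init_interval_s: tuple = (0.04, 1.50)  # actuator-off initialization interval
    task_window_s: float = 2.0             # task rewards active only at the end
    target_joint_pos: tuple = ()           # stance / rest / self-righting target pose
    task_reward_scales: dict = field(default_factory=lambda: {
        "base_height": 600.0,
        "stand_joint_position": 350.0,
        "base_orientation": 120.0,
    })


def scale_task_rewards(cfg, factor):
    """Copy of cfg with every task-reward scale multiplied by factor."""
    scaled = {k: v * factor for k, v in cfg.task_reward_scales.items()}
    return replace(cfg, task_reward_scales=scaled)
```

For example, `scale_task_rewards(cfg, 3.0)` expresses the 3× task-reward setting from the robustness check without touching any other setting.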

๋น„ํŒ์  ๊ณ ์ฐฐ

๊ฐ•์ 

  • ์‹œ๊ฐ„ ๊ตฌ์กฐ๋กœ ๋ณด์ƒ ํŠœ๋‹์„ ์šฐํšŒ. โ€œ์ž‘์—…์€ time-variant, ํ–‰๋™์€ time-invariantโ€๋ผ๋Š” ๋ถ„๋ฆฌ๊ฐ€, ๋ถ€๋“œ๋Ÿฌ์›€ vs ๋ณต๊ท€ ์‹œ๊ฐ„์ด๋ผ๋Š” ๊ณ ์งˆ์  ๊ฐ€์ค‘์น˜ ํŠœ๋‹์„ ๊น”๋”ํ•˜๊ฒŒ ํ’‰๋‹ˆ๋‹ค. time-invariant ์ •์ฑ…์ด ๋ฐฐํฌ์— ๋” ์ ํ•ฉํ•˜๋‹ค๋Š” ์ ๋„ ablation์œผ๋กœ ๋ณด์˜€์Šต๋‹ˆ๋‹ค.
  • ๋ฌด๊ฑฐ์šด ๋กœ๋ด‡์— ๋งž๋Š” ํ˜„์‹ค์  ๊ฐ€์ • ์™„ํ™”. ๋น„ํƒ„์„ฑ/ํ‰๋ฉด ๋‚™์ƒ ๊ฐ™์€ ์ œ์•ฝ์„ ๋ฒ„๋ฆฌ๊ณ , ํŒ”์„ ๋Šฅ๋™์ ์œผ๋กœ ์จ์„œ 58kg๊ธ‰ ๋กœ๋ด‡์˜ ๋‚™์ƒยท๋ณต๊ท€๋ฅผ ๋‹ค๋ฃฌ ์ ์ด ์‹ค์šฉ์ ์ž…๋‹ˆ๋‹ค.
  • ์ •๋Ÿ‰ + ํ•˜๋“œ์›จ์–ด ๊ฒ€์ฆ. 2560ํšŒ ๋กค์•„์›ƒ์˜ ์ •๋Ÿ‰ ๋น„๊ต(impulseยทaccelerationยทinternal force)์™€ ์‹ค์ œ ALMA ๊ฒ€์ฆ์„ ํ•จ๊ป˜ ์ œ์‹œํ•ด ์„ค๋“๋ ฅ์ด ๋†’์Šต๋‹ˆ๋‹ค.
  • ๋‹จ์ผ ์ •์ฑ…์˜ ํ†ตํ•ฉ์„ฑยทํ™•์žฅ์„ฑ. ๋‚™์ƒ ๊ฐ์†Œ์™€ ๋ณต๊ท€๋ฅผ ํ•œ ์ •์ฑ…์— ๋‹ด๊ณ , ๋™์ผ ํŒŒ์ดํ”„๋ผ์ธ์œผ๋กœ restingยทself-righting๊นŒ์ง€ ํ™•์žฅํ–ˆ์Šต๋‹ˆ๋‹ค.

์•ฝ์ ๊ณผ ํ•œ๊ณ„

  • ์ €์ž๊ฐ€ ์ธ์ •ํ•œ ํ•ต์‹ฌ ํ•œ๊ณ„: ํ•™์Šต ํ™˜๊ฒฝ์ด base๊ฐ€ ์ง€๋ฉด์— ๋‹ฟ๋Š” ์ƒํ™ฉ์„ ํšŒํ”ผ ํ•˜๋„๋ก ๊ตฌ์„ฑ๋ผ, ๋‚™์ƒ์œผ๋กœ ๋“œ๋ผ์ด๋ธŒ๊ฐ€ ๊ณ ์žฅ๋‚œ(dysfunctional) ๊ฒฝ์šฐ์—๋Š” ์ ์‘ํ•˜์ง€ ๋ชปํ•ฉ๋‹ˆ๋‹ค. ์‹ค์ œ ์‹ฌํ•œ ๋‚™์ƒ์˜ ์†์ƒ ํ›„ ๊ฑฐ๋™์€ ๋‹ค๋ฃฐ ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.
  • ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ์œ„์ฃผ์˜ ์ •๋Ÿ‰ ํ‰๊ฐ€. ์†์ƒ ์ง€ํ‘œ ๋น„๊ต๋Š” ๋Œ€๋ถ€๋ถ„ ์‹œ๋ฎฌ๋ ˆ์ด์…˜์ด๋ฉฐ, ํ•˜๋“œ์›จ์–ด ๊ฒ€์ฆ์€ ์ •์„ฑ์  ์‹œ์—ฐ ์ค‘์‹ฌ์ž…๋‹ˆ๋‹ค. ์‹ค์„ธ๊ณ„ ์†์ƒ ๊ฐ์†Œ์˜ ์ •๋Ÿ‰ ์ˆ˜์น˜๋Š” ์ œํ•œ์ ์ž…๋‹ˆ๋‹ค.
  • peak joint internal force๋Š” ์†Œํญ ๊ฐœ์„ . damping ๋Œ€๋น„ internal force ๊ฐ์†Œ๋Š” marginal์ด๋ผ, ๋ชจ๋“  ์†์ƒ ๊ธฐ์ค€์—์„œ ์••๋„์ ์ด์ง„ ์•Š์Šต๋‹ˆ๋‹ค.
  • ๋น„์ƒ ์ปจํŠธ๋กค๋Ÿฌ ์ „ํ™˜ ์˜์กด. ์ž‘์—… ์ปจํŠธ๋กค๋Ÿฌ๊ฐ€ ๋‚™์ƒ์„ ์ •ํ™•ํžˆ ๊ฐ์ง€ยท๋ณด๊ณ ํ•ด์•ผ ๋™์ž‘ํ•˜๋Š”๋ฐ, ๋‚™์ƒ ๊ฐ์ง€ ์ž์ฒด์˜ ์‹ ๋ขฐ์„ฑ์€ ์ด ๋…ผ๋ฌธ์˜ ๋ฒ”์œ„ ๋ฐ–์ž…๋‹ˆ๋‹ค.

Summary and conclusion

This paper addresses fall damage reduction and recovery for a heavy legged mobile manipulator (ALMA) with a single learned policy that actively uses the arm. The core idea is to train a time-invariant policy with time-varying rewards via an asymmetric actor-critic, separating the weight tuning between motion smoothness and recovery time through temporal structure.

In numbers: the policy recovered from 98.9% of random fall poses (vs. 95.2% with the arm tucked), saved 9.17% of leg torque by using the arm, and reduced base impulse, 95th-percentile acceleration, and peak joint internal force compared with the baselines (freezing/damping). The learned policy was validated on the real ALMA for both falling and recovery.

From a practical standpoint, the value of this work is that it brings legged robots carrying expensive manipulators a step closer to field deployment by letting them limit damage when they fall and recover on their own. Limitations remain, namely handling broken drives and real-world quantitative evaluation, but the design of time-varying rewards plus a non-privileged time-invariant policy offers a good recipe for learning contact-rich emergency behaviors.

Copyright 2026, JungYeon Lee