Curieux.JY
  • JungYeon Lee


📃Imitating Animals

quadruped
rl
imitation
sim2real
Learning Agile Robotic Locomotion Skills by Imitating Animals
Published

November 20, 2022

  • Paper Link (arXiv:2004.00784)
  • Project Page
  1. 🐾 This paper proposes an imitation learning framework that lets legged robots learn diverse, agile locomotion skills by imitating real animal motion data.
  2. 🛠️ The framework enables sim-to-real transfer through three components: motion retargeting via inverse kinematics, Reinforcement Learning-based policy training on reference motions, and sample-efficient domain adaptation through a latent space with an information bottleneck.
  3. ✨ Applied to the 18-DoF quadruped robot Laikago, it learned a range of dynamic gaits and behaviors, and the adaptive policies outperformed non-adaptive ones in both performance and robustness in the real world.

🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

This paper proposes a framework that lets a robot learn agile locomotion skills by imitating the movements of real animals. It targets two pain points: the complexity of hand-designing controllers, and, for reinforcement learning (RL), the burden of reward design together with the sim-to-real transfer problem.

I. Framework Overview

The framework consists of three main stages:

  1. Motion Retargeting: converts motion capture (mocap) data recorded from animals to fit the robot's morphology.
  2. Motion Imitation: uses the retargeted motion data to train, with reinforcement learning, a policy under which the simulated robot imitates the motion.
  3. Domain Adaptation: applies techniques that efficiently transfer the policy trained in simulation to the real robot.

II. Motion Retargeting

Because the animal's morphology differs from the robot's, the animal motion data are remapped (retargeted) onto the robot using inverse kinematics (IK).

  • Specific keypoints on the animal (feet, hips, etc.) are mapped to the corresponding keypoints on the robot.
  • For each timestep, a robot pose q_t is computed, giving a sequence q_{0:T} whose keypoints track the source motion's 3D keypoint positions \hat{x}_i(t).
  • The optimization problem is: \arg \min_{q_{0:T}} \sum_t \sum_i ||\hat{x}_i(t) - x_i(q_t)||^2 + (\bar{q} - q_t)^T W(\bar{q} - q_t), where \bar{q} is the default pose and W is a matrix of regularization coefficients.

III. Motion Imitation

Motion imitation is formulated as a reinforcement learning problem. The policy \pi takes the environment state s_t and the goal g_t (the target motion to imitate) as input and samples an action a_t.

  • Policy input: the state s_t = (q_{t-2:t}, a_{t-3:t-1}) consists of the three most recent robot poses (q) and the three most recent actions (a). The pose features comprise the root orientation measured by the IMU (Inertial Measurement Unit) and the local rotation of each joint. The root position is excluded to avoid having to estimate it during real-world deployment.

  • Goal input: g_t = (\hat{q}_{t+1}, \hat{q}_{t+2}, \hat{q}_{t+10}, \hat{q}_{t+30}) contains the target poses at four future timesteps of the reference motion, spanning roughly one second.

  • Action output: a_t specifies the target rotation for each joint's PD controller. The targets pass through a low-pass filter for smooth motion.

  • Reward function: the reward encourages the policy to track the sequence of target poses (\hat{q}_0, \hat{q}_1, ..., \hat{q}_T) of the reference motion. The reward r_t is a weighted sum of several terms: r_t = w_p r_{pt} + w_v r_{vt} + w_e r_{et} + w_{rp} r_{rpt} + w_{rv} r_{rvt}

    • Pose reward r_{pt}: encourages each joint's local rotation q_j^t to match the reference \hat{q}_j^t: r_{pt} = \exp \left[ -5 \sum_j ||\hat{q}_j^t - q_j^t||^2 \right]
    • Velocity reward r_{vt}: encourages each joint's angular velocity \dot{q}_j^t to match the reference \hat{\dot{q}}_j^t: r_{vt} = \exp \left[ -0.1 \sum_j ||\hat{\dot{q}}_j^t - \dot{q}_j^t||^2 \right]
    • End-effector reward r_{et}: encourages the 3D relative position x_e^t of each end-effector (foot) to track the reference \hat{x}_e^t: r_{et} = \exp \left[ -40 \sum_e ||\hat{x}_e^t - x_e^t||^2 \right]
    • Root pose and velocity rewards r_{rpt}, r_{rvt}: encourage the global position and orientation of the robot's root (torso), and its linear and angular velocities, to match the reference motion: r_{rpt} = \exp [-20||\hat{x}_{\text{root}}^t - x_{\text{root}}^t||^2 - 10||\hat{q}_{\text{root}}^t - q_{\text{root}}^t||^2] and r_{rvt} = \exp [-2||\hat{\dot{x}}_{\text{root}}^t - \dot{x}_{\text{root}}^t||^2 - 0.2||\hat{\dot{q}}_{\text{root}}^t - \dot{q}_{\text{root}}^t||^2]
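The action path described above (policy output, then a low-pass filter, then a per-joint PD controller) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the post does not specify the filter, so a first-order exponential filter with a hypothetical smoothing factor `alpha` is used, and the PD gains `kp` and `kd` are placeholder values.

```python
from typing import List, Optional, Sequence


class LowPassFilter:
    """First-order low-pass filter applied to per-joint PD targets.

    Assumption: the exponential (first-order) form and `alpha` are
    illustrative choices, not values taken from the paper.
    """

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._y: Optional[List[float]] = None  # last filtered target

    def reset(self) -> None:
        self._y = None

    def __call__(self, action: Sequence[float]) -> List[float]:
        if self._y is None:
            # Start from the first action so there is no jump at t = 0.
            self._y = list(action)
        else:
            # y_t = alpha * a_t + (1 - alpha) * y_{t-1}, joint by joint.
            self._y = [self.alpha * a + (1.0 - self.alpha) * y
                       for a, y in zip(action, self._y)]
        return list(self._y)


def pd_torque(q_target: Sequence[float], q: Sequence[float],
              q_dot: Sequence[float], kp: float = 50.0,
              kd: float = 1.0) -> List[float]:
    """tau_j = kp * (q_target_j - q_j) - kd * q_dot_j (hypothetical gains)."""
    return [kp * (qt - qj) - kd * vj for qt, qj, vj in zip(q_target, q, q_dot)]


# One control step: filter the raw policy action, then track it with PD.
lpf = LowPassFilter(alpha=0.3)
smoothed = lpf([0.2, -0.9, 1.5])
tau = pd_torque(smoothed, q=[0.1, -0.8, 1.4], q_dot=[0.0, 0.0, 0.0])
```

On hardware the PD loop usually runs much faster than the 30 Hz policy, holding the latest filtered target between policy steps; the sketch evaluates it once only for illustration.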

IV. Domain Adaptation

To handle the dynamics discrepancies between simulation and the real world, the following methods are used.

  • A. Domain Randomization: dynamics parameters (e.g., mass, inertia, friction, motor strength, latency) are randomly varied during training so that the policy becomes robust to a range of dynamics.

  • B. Domain Adaptation (latent space method): the goal is to go beyond robustness and learn strategies that can adapt to new environments. A stochastic encoder E(z|\mu) encodes the dynamics parameters \mu into a latent embedding z, and the policy \pi(a|s, g, z) is conditioned on this z.

    • Information Bottleneck: to keep the policy from overfitting to the exact dynamics it is given, an upper bound I_c is placed on the mutual information I(M, Z) between the dynamics parameters M and the encoding Z. Using a variational upper bound, this constraint is approximated by a Kullback-Leibler divergence (D_{KL}) term.
    • The optimization objective then takes an information-regularized form: \arg \max_{\pi,E} E_{\mu \sim p(\mu)} E_{z \sim E(z|\mu)} E_{\tau \sim p(\tau|\pi,\mu,z)} \left[ \sum_{t=0}^{T-1} \gamma^t r_t \right] - \beta E_{\mu \sim p(\mu)} [D_{KL}[E(\cdot|\mu)||\rho(\cdot)]] where \rho is a prior over z and \beta \ge 0 is a Lagrange multiplier that sets the balance between robustness and adaptability. A larger \beta yields a more robust but less adaptive policy; a smaller \beta yields a less robust but more adaptive one.
  • C. Real World Transfer: to deploy the policy on the real robot, directly search for the encoding z^* that yields the highest return under the real-world dynamics: z^* = \arg \max_z E_{\tau \sim p^*(\tau|\pi,z)} \left[ \sum_{t=0}^{T-1} \gamma^t r_t \right]. AWR (Advantage-Weighted Regression) is used to find z^*:

    1. Sample encodings z_k from the current search distribution, starting from \omega_0(z) = \mathcal{N}(0, I).
    2. Run an episode on the real robot with the policy \pi conditioned on z_k, and record its return R_k.
    3. Update the replay buffer D, which holds all previous samples and their returns.
    4. Fit a new distribution \omega_{k+1} that assigns higher likelihood to the samples in D with larger advantages. Each sample z_i is weighted by \exp\left(\frac{1}{\alpha}(R_i - \bar{v})\right), where \bar{v} is a baseline value and \alpha a temperature, and \omega_k(z) is updated incrementally by gradient steps on this weighted log-likelihood.
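The four steps above can be sketched as a small search loop. This is a hedged, self-contained illustration, not the authors' code: the search distribution ω is assumed to be a diagonal Gaussian, the baseline \bar{v} is taken to be the mean return in D, and in place of incremental gradient updates each iteration uses the closed-form weighted maximum-likelihood fit, which is the optimum those gradient steps move toward. `toy_return` is a hypothetical stand-in for a real-robot episode.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Latent = List[float]
Buffer = List[Tuple[Latent, float]]  # replay buffer D of (z_i, R_i)


def fit_search_distribution(buffer: Buffer, alpha: float,
                            min_std: float = 0.05) -> Tuple[Latent, Latent]:
    """Advantage-weighted fit of a diagonal Gaussian omega_{k+1}(z).

    Weights are exp((R_i - v_bar) / alpha) with v_bar = mean return in D
    (assumed baseline). Closed-form weighted MLE replaces gradient steps.
    """
    returns = [r for _, r in buffer]
    v_bar = sum(returns) / len(returns)
    logits = [(r - v_bar) / alpha for r in returns]
    shift = max(logits)  # constant shift for stability; cancels after normalising
    weights = [math.exp(lg - shift) for lg in logits]
    total = sum(weights)
    dim = len(buffer[0][0])
    mean = [sum(w * z[d] for w, (z, _) in zip(weights, buffer)) / total
            for d in range(dim)]
    var = [sum(w * (z[d] - mean[d]) ** 2 for w, (z, _) in zip(weights, buffer)) / total
           for d in range(dim)]
    std = [max(math.sqrt(v), min_std) for v in var]  # floor keeps exploring
    return mean, std


def awr_latent_search(episode_return: Callable[[Latent], float], dim: int,
                      iterations: int = 10, samples_per_iter: int = 5,
                      alpha: float = 0.5, seed: int = 0) -> Tuple[Latent, Buffer]:
    rng = random.Random(seed)
    mean, std = [0.0] * dim, [1.0] * dim  # omega_0(z) = N(0, I)
    buffer: Buffer = []
    for _ in range(iterations):
        for _ in range(samples_per_iter):
            z = [rng.gauss(m, s) for m, s in zip(mean, std)]  # step 1
            buffer.append((z, episode_return(z)))             # steps 2-3
        mean, std = fit_search_distribution(buffer, alpha)   # step 4
    return mean, buffer


# Hypothetical stand-in for a real-robot rollout: return peaks at z_true.
z_true = [0.5, -0.3]


def toy_return(z: Sequence[float]) -> float:
    return -sum((a - b) ** 2 for a, b in zip(z, z_true))


z_star, D = awr_latent_search(toy_return, dim=2)
```

Ten iterations of five samples consume 50 episodes, on the order of the roughly 50 real-world trials reported. The `min_std` floor (hypothetical) keeps the search from collapsing onto a single sample too early.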

V. Experimental Results

Laikago, an 18-degree-of-freedom (DoF) quadruped robot, is used to learn and evaluate a variety of dynamic locomotion skills.

  • Learned skills: a variety of gait patterns and dynamic skills, including pacing, trotting, backward walking, In-Place Steps, Side-Steps, Turn, and Hop-Turn. The Dog Trot policy reached 1.08 m/s and backward trotting 1.20 m/s, both faster than the manufacturer's hand-designed controller (0.84 m/s).
  • Effect of domain adaptation: three kinds of policies were compared: “No Rand” (trained without randomization), “Robust” (randomization only), and “Adaptive” (the adaptation method proposed in this paper).
    • When deployed on the real robot, the “Adaptive” policies outperformed the “No Rand” and “Robust” policies on most skills. On dynamic skills such as Dog Pace and Dog Spin in particular, the “Robust” policies fell frequently, whereas the “Adaptive” policies performed the skills more consistently.
    • The “Adaptive” policies kept their balance longer than the other methods and in many cases reached the maximum episode length without falling.
    • Over dynamics parameter ranges wider than those used in training (out-of-distribution), the “Adaptive” policies achieved higher returns, indicating better generalization to unfamiliar dynamics.
    • On the real system, AWR-based adaptation typically adapted to a new environment within a small number of episodes (about 50).
  • Effect of the information bottleneck: the influence of the bottleneck coefficient \beta was analyzed.
    • With a larger \beta, the policy depends less on the dynamics parameters, so performance before adaptation (robustness) improves while the gain from adaptation (adaptability) shrinks.
    • With a smaller \beta, the policy is less robust but more adaptive.
    • The authors found that \beta=10^{-4} gives a good balance between robustness and adaptability. Information-constrained policies outperformed policies without an information bottleneck (No IB) both before and after adaptation.

VI. Discussion and Future Work

The framework lets a quadruped robot learn agile locomotion skills by imitating diverse animal motion data and transfers them efficiently to the real world. Hardware and algorithmic limitations, however, mean it cannot yet learn more dynamic behaviors such as large jumps or running. Future work aims to improve the stability of the learned controllers and to learn from more diverse sources of behavior data (e.g., video clips).


🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

Introduction

Agents trained with RL often look good in simulation, but on a real robot they are prone to unnatural, dangerous, or physically infeasible behaviors. This raises a natural question: if the robot directly imitates animal motion, can more agile controllers be built with less effort?

Using reference motions greatly reduces the burden of designing a reward function for each skill. A policy learned in simulation must still cross the sim-to-real gap, however, and the authors fine-tune its behavior with a sample-efficient adaptation technique. The target robot is the Laikago quadruped, and the skills range from various gaits to dynamic hops and turns.

One-line summary: imitate animal mocap to learn diverse agile gaits without skill-specific reward design, and use latent space domain adaptation to make simulation→real-robot transfer efficient.

Related Work and Differentiation

  • Trajectory optimization / MPC: reduced the manual effort of controller design, but because legged systems have high-dimensional, complex dynamics, these methods relied on reduced-order models.
  • Motion imitation: applications to legged robots were mostly limited to upper-body or static lower-body behaviors, with balance left to a separate control strategy. Recent RL-based motion imitation learns acrobatic skills well in simulation.
  • Sim-to-real: approaches include building accurate simulators, calibrating simulators with real data, domain randomization (varying the dynamics during training for robustness), and adaptation techniques such as fine-tuning and meta-learning.

Differentiation (Ours): combines the latent space method with motion imitation. Pre-training learns a latent representation of behaviors that are effective across diverse scenarios; in a new domain, the latent space is searched for a behavior that accomplishes the task. Compared with previous methods that relied on elaborate skill-specific reward design or system identification (Hwangbo et al. on ANYmal, Xie et al. on Cassie, Yu et al. on Darwin OP2), it performs a more diverse and agile set of behaviors on a real robot.

Method (Overview)

Given a reference motion of the desired skill (e.g., real animal mocap), the framework uses RL to synthesize a policy that reproduces that skill in the real world. It has three stages.

  1. Motion Retargeting: map the motion clip from the source subject's (the animal's) morphology to the robot's morphology via inverse kinematics.
  2. Motion Imitation: train a policy so that the simulated robot reproduces the retargeted reference; domain randomization is applied to aid transfer.
  3. Domain Adaptation: adapt the policy to the real robot sample-efficiently using a learned latent dynamics representation.

1. Motion Retargeting

The robot and the animal the motion was captured from have different morphologies, so the motion is retargeted with IK. The keypoints are the positions of the feet and hips. Given the source motion's 3D position \hat{\mathbf x}_i(t) for each keypoint i, a pose sequence \mathbf q_{0:T} is constructed so that the corresponding robot keypoints \mathbf x_i(\mathbf q_t), determined by the robot pose \mathbf q_t, track them. A regularization term (with a diagonal matrix \mathbf W of per-joint coefficients) keeps the poses from straying too far from the default pose \bar{\mathbf q}.

\underset{\mathbf q_{0:T}}{\arg\min} \sum_t \Big[ \sum_i \big\lVert \hat{\mathbf x}_i(t) - \mathbf x_i(\mathbf q_t) \big\rVert^2 + (\bar{\mathbf q} - \mathbf q_t)^T \mathbf W (\bar{\mathbf q} - \mathbf q_t) \Big]
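To make the objective concrete, here is a deliberately tiny sketch: one planar two-link leg with a fixed hip and a single foot keypoint, instead of the full floating-base robot with feet and hips as keypoints. The link lengths, the default pose `q_bar`, and the diagonal weights `w_diag` are hypothetical. Because the objective has no terms coupling different timesteps, it decomposes into one small problem per frame; the sketch solves each by plain gradient descent with a finite-difference gradient and warm-starts from the previous frame's solution.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical planar leg: hip at the origin, thigh L1 and shank L2 (metres).
L1, L2 = 0.25, 0.25


def foot_position(q: Sequence[float]) -> Tuple[float, float]:
    """Forward kinematics x_i(q) for the single keypoint (the foot)."""
    q1, q2 = q
    x = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    z = -L1 * math.cos(q1) - L2 * math.cos(q1 + q2)
    return x, z


def frame_cost(q: Sequence[float], target: Tuple[float, float],
               q_bar: Sequence[float], w_diag: Sequence[float]) -> float:
    """||x_hat - x(q)||^2 + (q_bar - q)^T W (q_bar - q) with diagonal W."""
    fx, fz = foot_position(q)
    track = (target[0] - fx) ** 2 + (target[1] - fz) ** 2
    reg = sum(w * (qb - qi) ** 2 for w, qb, qi in zip(w_diag, q_bar, q))
    return track + reg


def retarget(targets: Sequence[Tuple[float, float]], q_bar: Sequence[float],
             w_diag: Sequence[float], steps: int = 2000, lr: float = 1.0,
             eps: float = 1e-6) -> List[List[float]]:
    """Solve each frame by gradient descent (finite-difference gradient)."""
    q = list(q_bar)
    poses = []
    for target in targets:  # warm-start from the previous frame
        for _ in range(steps):
            base = frame_cost(q, target, q_bar, w_diag)
            grad = []
            for j in range(len(q)):
                q_eps = list(q)
                q_eps[j] += eps
                grad.append((frame_cost(q_eps, target, q_bar, w_diag) - base) / eps)
            q = [qi - lr * g for qi, g in zip(q, grad)]
        poses.append(list(q))
    return poses


# Synthetic "source" foot trajectory generated from known leg angles.
targets = [foot_position(q) for q in ([0.3, -0.6], [0.4, -0.7], [0.2, -0.5])]
poses = retarget(targets, q_bar=[0.6, -1.2], w_diag=[1e-4, 1e-4])
```

With a small `w_diag` the foot tracks the targets closely; increasing it trades tracking accuracy for poses closer to `q_bar` (the step size `lr` would then need to shrink for gradient descent to stay stable).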

2. Motion Imitation

The method maximizes the standard RL objective J(\pi) = \mathbb E_{\tau\sim p(\tau\mid\pi)}\big[\sum_{t=0}^{T-1}\gamma^t r_t\big], but adds to the policy input a goal \mathbf g_t that specifies the motion to imitate: \pi(\mathbf a_t \mid \mathbf s_t, \mathbf g_t). The policy is queried at 30 Hz.

  • State \mathbf s_t = (\mathbf q_{t-2:t}, \mathbf a_{t-3:t-1}): the three most recent poses plus the three most recent actions. The pose features are the IMU readings of the root orientation (roll/pitch/yaw) and the local rotation of each joint. The root position is excluded, which avoids the burden of estimating it during real-world deployment.
  • Goal \mathbf g_t = (\hat{\mathbf q}_{t+1}, \hat{\mathbf q}_{t+2}, \hat{\mathbf q}_{t+10}, \hat{\mathbf q}_{t+30}): target poses at four future time points of the reference (spanning about 1 s).
  • Action \mathbf a_t: target rotations for each joint's PD controller. A low-pass filter is applied to the PD targets for smooth motion.
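A minimal sketch of how these inputs could be assembled at each 30 Hz step follows. The flat-list layout, the history padding at episode start, and wrapping around the end of cyclic clips are assumptions for illustration; the goal offsets (1, 2, 10, 30 frames) and the three-step histories come from the text. At 30 Hz the largest offset is exactly 30/30 = 1 s ahead, which is where the one-second horizon comes from.

```python
from collections import deque
from typing import Deque, List, Sequence

CONTROL_HZ = 30                # policy query rate stated in the text
GOAL_OFFSETS = (1, 2, 10, 30)  # frames ahead used in g_t


def goal_frames(t: int, clip_len: int, loop: bool = True) -> List[int]:
    """Reference-frame indices that make up g_t.

    Assumption: cyclic clips wrap around; non-cyclic clips hold the last frame.
    """
    if loop:
        return [(t + k) % clip_len for k in GOAL_OFFSETS]
    return [min(t + k, clip_len - 1) for k in GOAL_OFFSETS]


class ObservationBuilder:
    """Keeps q_{t-2:t} and a_{t-3:t-1} and appends the goal poses g_t."""

    def __init__(self, history: int = 3) -> None:
        self.poses: Deque[List[float]] = deque(maxlen=history)
        self.actions: Deque[List[float]] = deque(maxlen=history)

    def reset(self, pose: Sequence[float], default_action: Sequence[float]) -> None:
        # Assumption: pad the history by repeating the initial pose/action.
        self.poses.clear()
        self.actions.clear()
        for _ in range(self.poses.maxlen or 0):
            self.poses.append(list(pose))
            self.actions.append(list(default_action))

    def record(self, action: Sequence[float], next_pose: Sequence[float]) -> None:
        # Call after each control step: a_t was applied, q_{t+1} was observed.
        self.actions.append(list(action))
        self.poses.append(list(next_pose))

    def build(self, t: int, reference: Sequence[Sequence[float]],
              loop: bool = True) -> List[float]:
        obs: List[float] = []
        for q in self.poses:    # q_{t-2}, q_{t-1}, q_t
            obs.extend(q)
        for a in self.actions:  # a_{t-3}, a_{t-2}, a_{t-1}
            obs.extend(a)
        for i in goal_frames(t, len(reference), loop):
            obs.extend(reference[i])
        return obs
```

For a 40-frame cyclic clip, `goal_frames(20, 40)` selects frames 21, 22, 30 and, after wrapping, 10.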

The reward function is a weighted sum of five terms that encourage tracking of the target pose sequence.

r_t = w^p r_t^p + w^v r_t^v + w^e r_t^e + w^{rp} r_t^{rp} + w^{rv} r_t^{rv}

w^p=0.5,\ w^v=0.05,\ w^e=0.2,\ w^{rp}=0.15,\ w^{rv}=0.1

Each term (all of exponential form):

r_t^p = \exp\Big[-5\sum_j \lVert \hat{\mathbf q}_t^j - \mathbf q_t^j \rVert^2\Big] \quad\text{(pose: joint rotations)}

r_t^v = \exp\Big[-0.1\sum_j \lVert \hat{\dot{\mathbf q}}_t^j - \dot{\mathbf q}_t^j \rVert^2\Big] \quad\text{(velocity: joint angular velocities)}

r_t^e = \exp\Big[-40\sum_e \lVert \hat{\mathbf x}_t^e - \mathbf x_t^e \rVert^2\Big] \quad\text{(end-effector positions)}

r_t^{rp} = \exp\big[-20\lVert \hat{\mathbf x}_t^{\text{root}} - \mathbf x_t^{\text{root}} \rVert^2 - 10\lVert \hat{\mathbf q}_t^{\text{root}} - \mathbf q_t^{\text{root}} \rVert^2\big] \quad\text{(root pose)}

r_t^{rv} = \exp\big[-2\lVert \hat{\dot{\mathbf x}}_t^{\text{root}} - \dot{\mathbf x}_t^{\text{root}} \rVert^2 - 0.2\lVert \hat{\dot{\mathbf q}}_t^{\text{root}} - \dot{\mathbf q}_t^{\text{root}} \rVert^2\big] \quad\text{(root velocity)}
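The five terms and weights above translate almost line by line into code. In this sketch the dictionary keys are a hypothetical data layout, and the root-orientation error is a plain vector difference, which is a simplification (a rotation-aware distance would be more faithful). Because the weights sum to 0.5 + 0.05 + 0.2 + 0.15 + 0.1 = 1 and every exponential term lies in (0, 1], the per-step reward r_t also lies in (0, 1], reaching 1 only under perfect tracking.

```python
import math
from typing import Dict, Sequence

# Weights w^p, w^v, w^e, w^rp, w^rv from the text; they sum to 1.
WEIGHTS = {"p": 0.5, "v": 0.05, "e": 0.2, "rp": 0.15, "rv": 0.1}


def sq(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def imitation_reward(ref: Dict, sim: Dict) -> float:
    """Weighted sum of the five exponential tracking terms.

    Keys (hypothetical layout): 'q' joint angles, 'qd' joint velocities,
    'ee' list of end-effector positions, 'root_x'/'root_q' root position and
    orientation, 'root_xd'/'root_qd' root linear/angular velocity.
    Assumption: orientation error is a plain vector difference.
    """
    r_p = math.exp(-5.0 * sq(ref["q"], sim["q"]))
    r_v = math.exp(-0.1 * sq(ref["qd"], sim["qd"]))
    r_e = math.exp(-40.0 * sum(sq(a, b) for a, b in zip(ref["ee"], sim["ee"])))
    r_rp = math.exp(-20.0 * sq(ref["root_x"], sim["root_x"])
                    - 10.0 * sq(ref["root_q"], sim["root_q"]))
    r_rv = math.exp(-2.0 * sq(ref["root_xd"], sim["root_xd"])
                    - 0.2 * sq(ref["root_qd"], sim["root_qd"]))
    return (WEIGHTS["p"] * r_p + WEIGHTS["v"] * r_v + WEIGHTS["e"] * r_e
            + WEIGHTS["rp"] * r_rp + WEIGHTS["rv"] * r_rv)


ref = {"q": [0.1, -0.8], "qd": [0.0, 0.0], "ee": [[0.2, 0.1, -0.4]],
       "root_x": [0.0, 0.0, 0.45], "root_q": [0.0, 0.0, 0.0],
       "root_xd": [0.5, 0.0, 0.0], "root_qd": [0.0, 0.0, 0.0]}
sim = dict(ref, q=[0.2, -0.8])  # one joint off by 0.1 rad
```

With one joint off by 0.1 rad, only the pose term drops (to exp(-0.05) ≈ 0.95), so the total reward is 0.5 · exp(-0.05) + 0.5 ≈ 0.976.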

3. Domain Adaptation

(A) Domain Randomization. Varying the dynamics during training raises robustness, because the policy has to learn strategies that work across different dynamics. No single strategy works in every environment, however, which is why adaptation is needed.
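Domain randomization amounts to drawing a fresh set of dynamics parameters μ for each training episode. The sketch below uses the parameter types named in this post (mass, inertia, friction, motor strength, latency), but the ranges, the uniform sampling, and the rescaling to [-1, 1] before μ reaches the encoder are hypothetical choices, not values from the paper.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical bounds: the post lists parameter types, not their ranges.
RANGES: Dict[str, Tuple[float, float]] = {
    "mass_scale": (0.8, 1.2),
    "inertia_scale": (0.5, 1.5),
    "friction": (0.4, 1.25),
    "motor_strength_scale": (0.8, 1.2),
    "latency_s": (0.0, 0.04),
}


def sample_dynamics(rng: random.Random,
                    ranges: Dict[str, Tuple[float, float]] = RANGES) -> Dict[str, float]:
    """Draw one dynamics setting mu ~ p(mu), uniformly per parameter (assumed)."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


def to_vector(mu: Dict[str, float],
              ranges: Dict[str, Tuple[float, float]] = RANGES) -> List[float]:
    """Rescale each parameter to [-1, 1] (assumed preprocessing for the encoder)."""
    return [2.0 * (mu[name] - lo) / (hi - lo) - 1.0
            for name, (lo, hi) in ranges.items()]


rng = random.Random(0)
episodes = [sample_dynamics(rng) for _ in range(1000)]  # a new mu per episode
```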

(B) Domain Adaptation (latent + information bottleneck). A stochastic encoder E maps the dynamics parameters \boldsymbol\mu \sim p(\boldsymbol\mu), which are randomized in simulation, to a latent embedding \mathbf z \sim E(\mathbf z\mid\boldsymbol\mu) that is given to the policy as an additional input: \pi(\mathbf a\mid\mathbf s, \mathbf z) (goal omitted for brevity). The key is an information bottleneck on the encoder, which places an upper bound I_c on the mutual information I(\mathbf M, \mathbf Z) between the dynamics parameters \mathbf M and the encoding \mathbf Z.

\underset{\pi, E}{\arg\max}\ \mathbb E_{\boldsymbol\mu\sim p(\boldsymbol\mu)} \mathbb E_{\mathbf z\sim E(\mathbf z\mid\boldsymbol\mu)} \mathbb E_{\tau\sim p(\tau\mid\pi,\boldsymbol\mu,\mathbf z)}\Big[\sum_{t=0}^{T-1}\gamma^t r_t\Big] \quad \text{s.t. } I(\mathbf M, \mathbf Z) \le I_c

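In practice the constraint is relaxed into a penalty \beta\,\mathbb E_{\boldsymbol\mu}\big[D_{KL}[E(\cdot\mid\boldsymbol\mu)\,\Vert\,\rho(\cdot)]\big] with Lagrange multiplier \beta, so a larger \beta means a tighter bottleneck. The stronger the bottleneck (larger \beta), the less the policy relies on the exact dynamics values: performance before adaptation is higher, but the room for adaptation shrinks. A weaker bottleneck (smaller \beta) gives a less robust policy before adaptation but a larger improvement after it.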
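The penalty term can be written out for a diagonal Gaussian encoder, for which the KL divergence to a standard normal prior has a closed form. The standard-normal choice of ρ, the function names, and the Monte Carlo averaging over sampled μ are assumptions for illustration; `beta` defaults to the 10^{-4} value reported as a good trade-off.

```python
import math
from typing import List, Sequence, Tuple


def kl_to_standard_normal(mean: Sequence[float], std: Sequence[float]) -> float:
    """D_KL( N(mean, diag(std^2)) || N(0, I) ) for a diagonal Gaussian encoder.

    Assumption: the prior rho(z) is a standard normal.
    """
    return 0.5 * sum(s * s + m * m - 1.0 - 2.0 * math.log(s)
                     for m, s in zip(mean, std))


def penalized_objective(returns: Sequence[float],
                        encoder_outputs: Sequence[Tuple[List[float], List[float]]],
                        beta: float = 1e-4) -> float:
    """Monte Carlo estimate of E[return] - beta * E_mu[KL(E(.|mu) || rho)]."""
    avg_return = sum(returns) / len(returns)
    avg_kl = (sum(kl_to_standard_normal(m, s) for m, s in encoder_outputs)
              / len(encoder_outputs))
    return avg_return - beta * avg_kl
```

A larger `beta` makes any deviation of E(·|μ) from ρ more costly, so the encoding z carries less information about μ; with `beta = 0` the bottleneck disappears.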

(C) Real World Transfer. In the real world, adaptation consists of searching the latent space for the \mathbf z^* that maximizes the cumulative reward.

\mathbf z^* = \underset{\mathbf z}{\arg\max}\ \mathbb E_{\tau\sim p^*(\tau\mid\pi,\mathbf z)}\Big[\sum_{t=0}^{T-1}\gamma^t r_t\Big]

Model Architecture

The encoder E(\mathbf z\mid\boldsymbol\mu) is a fully connected network (256 and 128 ReLU units) that maps the dynamics parameters to the mean and standard deviation of a distribution. The policy \pi(\mathbf a\mid\mathbf s, \mathbf g, \mathbf z) takes the state, goal, and dynamics encoding and outputs the mean of a Gaussian action distribution (layers of 512 and 256 units; the standard deviation is a fixed diagonal matrix). The value function V(\mathbf s, \mathbf g, \boldsymbol\mu) is a separate network (512, 256).
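The encoder described above, together with the reparameterization trick used to train it end-to-end, might look like the following dependency-free sketch. The input and latent sizes (`mu_dim=5`, `z_dim=8`), the weight initialization, and predicting a log standard deviation are hypothetical; only the 256/128 ReLU hidden layers and the mean/standard-deviation outputs come from the text. Pure Python has no autograd, so this only shows the forward computation; in a deep learning framework, writing z = mean + std · ε with ε drawn independently of the network is what lets gradients flow back into the encoder.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], bias[out])


def init_linear(rng: random.Random, n_in: int, n_out: int) -> Layer:
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x: Sequence[float], layer: Layer) -> List[float]:
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


class DynamicsEncoder:
    """E(z | mu): two ReLU layers (256, 128), then mean and log-std heads."""

    def __init__(self, mu_dim: int, z_dim: int,
                 hidden: Tuple[int, ...] = (256, 128), seed: int = 0) -> None:
        rng = random.Random(seed)
        sizes = [mu_dim, *hidden]
        self.body = [init_linear(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]
        self.mean_head = init_linear(rng, hidden[-1], z_dim)
        self.log_std_head = init_linear(rng, hidden[-1], z_dim)  # assumed parameterization

    def __call__(self, mu: Sequence[float]) -> Tuple[List[float], List[float]]:
        h = list(mu)
        for layer in self.body:
            h = relu(linear(h, layer))
        mean = linear(h, self.mean_head)
        std = [math.exp(v) for v in linear(h, self.log_std_head)]
        return mean, std


def reparameterize(mean: Sequence[float], std: Sequence[float],
                   rng: random.Random) -> List[float]:
    """z = mean + std * eps with eps ~ N(0, I), drawn independently of the weights."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


encoder = DynamicsEncoder(mu_dim=5, z_dim=8)  # hypothetical sizes
mean, std = encoder([0.3, -0.1, 0.5, 0.2, -0.4])
z = reparameterize(mean, std, random.Random(1))
```

The policy's mean network (512, 256) could be assembled from the same `init_linear`, `linear`, and `relu` helpers, with the concatenation of s, g, and z as its input.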

Experiments

Setup: an 18-DoF quadruped (3 actuated DoF per leg × 4 legs = 12, plus 6 unactuated root DoF). The mocap comes from a public dataset. Performance is reported as normalized return (0 = minimum, 1 = maximum). Each policy is trained in simulation with PPO on about 200 million samples, end-to-end (encoder and policy together) via the reparameterization trick. Real-world adaptation runs AWR (Advantage-Weighted Regression) in the latent dynamics space, using about 50 real-world trials per policy (each lasting 5–10 s, depending on the skill).
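The normalized return can be read as achieved return divided by the best achievable return. The sketch below is one plausible definition, not necessarily the paper's exact one: it assumes a discount factor `gamma` (a hypothetical value), that every per-step reward lies in [0, 1], and that a fall ends the episode early so the missing steps count as zero reward. The same block spells out the adaptation budget: 50 trials of 5–10 s each is 250–500 s, roughly 4 to 8 minutes of robot time.

```python
from typing import Optional, Sequence


def normalized_return(rewards: Sequence[float], gamma: float = 0.95,
                      max_steps: Optional[int] = None) -> float:
    """Discounted return divided by the best achievable one.

    Assumptions: r_t lies in [0, 1], the best episode earns r_t = 1 for all
    `max_steps` steps, and an early fall simply ends the reward stream.
    """
    horizon = len(rewards) if max_steps is None else max_steps
    achieved = sum(gamma ** t * r for t, r in enumerate(rewards))
    best = sum(gamma ** t for t in range(horizon))
    return achieved / best


# Adaptation budget from the text: ~50 trials of 5-10 s each.
budget_minutes = (50 * 5 / 60, 50 * 10 / 60)  # roughly 4.2 to 8.3 minutes
```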

Learned Skills

The robot learns gaits such as pacing and trotting as well as agile turning and spinning; different gaits are learned simply by supplying different reference motions (pacing: the two legs on the same side move together, at slower speeds; trotting: diagonal legs move together, at faster speeds). Playing the mocap in reverse also yielded a backward gait, and the learned gaits were faster than the manufacturer's controller (manufacturer's top speed 0.84 m/s; Dog Trot 1.08 m/s; backward trot 1.20 m/s). Artist-made animations (e.g., a Hop-Turn with a 90° turn in the air) were imitated as well, although some motions, such as the Running Man, proved hard to reproduce.
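Playing a clip in reverse, as done here for the backward gait, means reversing the frame order. A consequence worth making explicit is that every velocity, root and joint alike, flips sign, so velocities should be recomputed from the reversed poses rather than copied over. The frame layout below is hypothetical, and velocities come from finite differences. The reversed clip is kinematically valid but not necessarily dynamically natural; presumably it is the RL imitation stage that turns it into a physically feasible gait.

```python
from typing import List, Sequence

Frame = List[float]  # e.g. [root_x, joint_1, ..., joint_n] (hypothetical layout)


def reverse_clip(frames: Sequence[Sequence[float]]) -> List[Frame]:
    """Play a reference clip backwards: same poses, opposite time order."""
    return [list(f) for f in reversed(frames)]


def finite_diff_velocities(frames: Sequence[Sequence[float]], dt: float) -> List[Frame]:
    """Per-channel velocities (f[t+1] - f[t]) / dt between consecutive frames."""
    return [[(b - a) / dt for a, b in zip(f0, f1)]
            for f0, f1 in zip(frames[:-1], frames[1:])]


# Toy forward clip: root moves +0.1 m per frame while one joint swings.
forward = [[0.0, 0.2], [0.1, -0.1], [0.2, 0.2]]
backward = reverse_clip(forward)
vel_fwd = finite_diff_velocities(forward, dt=0.1)
vel_bwd = finite_diff_velocities(backward, dt=0.1)
```

With dt = 0.1 s the forward root velocity is +1 m/s and the reversed one is -1 m/s, while the joint velocities mirror each other in reverse order.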

Real-World Performance (Fig. 5, 7)

Four settings are compared: No Rand (no randomization), Robust (randomization, no adaptation), and Adaptive before and after adaptation. 3 seeds × 5 episodes = 15 trials per method.

  • Adaptive policies outperform non-adaptive policies on most skills.
  • For simple skills (In-Place Steps, Side-Steps), Robust alone transfers adequately. On dynamic skills (Dog Pace, Dog Spin), however, Robust tends to fall, while Adaptive performs them consistently.
  • Policies trained without randomization fail to transfer on most skills.
  • Fig. 7 (time until falling): adaptive policies keep their balance longer and often last the full episode length.

Out-of-distribution & Information Bottleneck (Fig. 8–10)

  • OOD: across 100 simulated environments whose dynamics are sampled from a range wider than the training range, adaptive policies achieve high returns over a broader variety of dynamics (e.g., on Dog Pace the adaptive policy reaches a return > 0.6 in 50% of the environments, versus 38% for the robust policy). The adaptation learning curves (Fig. 9) show adaptation to new environments within relatively few episodes.
  • Information bottleneck: \beta=10^{-4} gives a good trade-off between robustness and adaptability. Policies with a bottleneck (IB) are generally better than those without (No IB), both before and after adaptation.

Critical Analysis

Strengths

  • Removes the reward-design burden. Imitating reference motions eliminates elaborate skill-specific reward functions, and a single system automatically synthesizes a variety of agile skills. Extensions are easy, too, such as obtaining a backward gait by playing the mocap in reverse.
  • Efficiency of latent + IB adaptation. Domain randomization produces robust, adaptable policies, and latent space search adapts them to the real robot in about 50 trials. Using the information bottleneck to tune the robustness↔adaptability trade-off is elegant.
  • Breadth of real-robot validation. A variety of motions (pacing, trotting, spin, hop-turn, etc.) are demonstrated on the physical Laikago, and the advantage of adaptive policies under OOD dynamics is quantified.
  • Deployment-friendly design. Excluding the root position from the state and low-pass filtering the PD targets reflect deployment realities (estimation uncertainty, jitter).

Weaknesses and Limitations

  • Limits on dynamic behaviors, acknowledged by the authors. Hardware and algorithmic constraints kept them from learning more dynamic behaviors such as large jumps or fast running.
  • Stability relative to hand-designed controllers. The learned controllers are not as stable as the best hand-designed ones; more complex real-world applications will need better robustness.
  • Dependence on references and mocap. Good reference motions (mocap or animation) are required, and physically inaccurate animations led to reproduction failures on some motions (Running Man).
  • Adaptation still needs real-world trials. About 50 trials is few, but for risky dynamic skills the real-world trials themselves carry cost and risk (my conjecture). As future work, the authors propose expanding the behavior data with sources such as video clips.

Summary and Conclusion

This paper presents a framework that learns diverse, agile gait skills for a quadruped robot by imitating animal mocap. Its core components are (1) IK-based motion retargeting, (2) motion imitation that follows the reference through goal-conditioned rewards (+ domain randomization), and (3) sample-efficient domain adaptation based on latent dynamics + an information bottleneck. Reference motions replace skill-specific reward design, and latent space search makes sim-to-real transfer efficient.

In numbers: on the 18-DoF Laikago the method learned pacing, trotting, a backward gait (up to 1.20 m/s), spin, and hop-turn; after adaptation with about 50 real-world trials, adaptive policies ran more stably than non-adaptive ones (lasting longer before falling) and also held an advantage under OOD dynamics. An information bottleneck of \beta=10^{-4} was a good trade-off point between robustness and adaptability.

From a practical standpoint, the value of this work lies in showing that “a variety of agile gaits can be realized on a real robot without skill-specific reward design, by imitating animal motion and adapting efficiently.” Dynamic behaviors such as large jumps and running, and stability on par with the best hand-designed controllers, remain open limitations, but the imitation + latent adaptation framework became an important foundation for subsequent research on learning legged locomotion.

Copyright 2026, JungYeon Lee