  • Jung Yeon Lee

📃 IBRL Review

rl
il
Imitation Bootstrapped Reinforcement Learning
Published

October 30, 2025

  • Paper Link
  • Code
  1. IBRL (Imitation Bootstrapped Reinforcement Learning) is a new RL framework that combines the strengths of imitation learning (IL) and reinforcement learning (RL) to improve sample efficiency.
  2. The method uses a separate IL policy, trained on expert demonstrations, to propose better actions during online interaction and to bootstrap the Q-function's target value estimates during RL training, which substantially accelerates exploration and learning.
  3. IBRL substantially outperforms prior methods on 6 simulation tasks and 3 real-robot tasks, with especially strong performance and sample efficiency on the harder tasks.

Brief Review

This paper proposes IBRL (Imitation Bootstrapped Reinforcement Learning), a new framework that combines the strengths of imitation learning (IL) and reinforcement learning (RL) to improve sample efficiency. IL is widely used for robot control tasks because it is sample efficient, but collecting expert demonstrations comprehensive enough to generalize to every scenario is expensive, and the data must be re-collected whenever a distribution shift occurs. RL, by contrast, is an autonomous self-improvement procedure, which makes it attractive if it can build on top of IL.

The core method of IBRL is as follows.

  1. Training an independent imitation policy (\mu_\psi): A separate, standalone imitation policy \mu_\psi is first trained on the provided demonstrations. This IL policy can use a deeper and more powerful network than is typical in online RL.
  2. Using the IL policy in two phases: The trained IL policy is used explicitly in two key phases to accelerate RL training.
    • Online interaction (Actor Proposal): During interaction with the environment, the IL policy and the RL policy currently being trained (\pi_\theta) each propose an action (a^{IL}, a^{RL}). The agent executes whichever action has the higher Q-value under the target Q-function Q_{\phi'} of the critic being learned. That is, the executed action a^* is chosen as a^* = \underset{a \in \{a^{IL}, a^{RL}\}}{\text{argmax}} Q_{\phi'}(s, a)
    • RL training (Bootstrap Proposal): When computing the target for the Q-value update, instead of using only the action a^{RL}_{t+1} sampled from the RL policy's target network \pi_{\theta'}, IBRL bootstraps from whichever of the IL action a^{IL}_{t+1} and the RL action a^{RL}_{t+1} has the higher Q-value. Q_\phi(s_t, a_t) \leftarrow r_t + \gamma \underset{a' \in \{a^{IL}_{t+1}, a^{RL}_{t+1}\}}{\text{max}} Q_{\phi'}(s_{t+1}, a')
    • In addition, as in prior work, the RL replay buffer is pre-filled with the demonstrations so that a learning signal is available before the policy achieves its first online success.
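The pre-filling step in the last bullet can be illustrated with a minimal sketch. It is not the authors' implementation: the `Transition` fields, the `ReplayBuffer` class, and `prefill_with_demos` are hypothetical names chosen for illustration, and the simple FIFO buffer is an assumption (a real setup might keep demonstrations in a separate store so they are never evicted).

```python
import random
from collections import deque
from typing import Iterable, List, NamedTuple, Tuple


class Transition(NamedTuple):
    """One environment step (hypothetical layout for this sketch)."""
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float            # sparse 0/1 reward
    next_state: Tuple[float, ...]
    done: bool


class ReplayBuffer:
    """Simple FIFO replay buffer; the oldest transitions are evicted first."""

    def __init__(self, capacity: int) -> None:
        self._storage = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def __len__(self) -> int:
        return len(self._storage)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        return rng.sample(list(self._storage), batch_size)


def prefill_with_demos(buffer: ReplayBuffer,
                       demos: Iterable[Iterable[Transition]]) -> None:
    """Insert every demonstration transition before any online interaction."""
    for episode in demos:
        for transition in episode:
            buffer.add(transition)


# A toy two-step demonstration that ends in success (reward 1).
demo_episode = [
    Transition((0.0,), (0.5,), 0.0, (0.5,), False),
    Transition((0.5,), (0.5,), 1.0, (1.0,), True),
]
buffer = ReplayBuffer(capacity=1000)
prefill_with_demos(buffer, [demo_episode])
batch = buffer.sample(2, random.Random(0))
```

Because the buffer already contains a rewarded transition, the critic can receive a non-zero learning signal from the very first update, before any online episode has succeeded.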

By keeping the IL policy separate from the RL policy, IBRL lets RL and IL each use the network architecture and loss function best suited to its own task, without an explicit regularization loss or elaborate hyperparameter tuning to prevent catastrophic forgetting. This substantially improves exploration quality and value estimation in the early phase, when the RL policy is still weak.

The paper also proposes architectural improvements that further boost IBRL's performance:

  • Actor Dropout: Dropout is applied to the policy network (actor) \pi_\theta to improve stability and sample efficiency.
  • Improved vision encoder and critic design: When learning from image inputs, a shallow ViT (Vision Transformer) based architecture replaces the usual shallow ConvNet, removing a performance bottleneck on more complex tasks.
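To make the Actor Dropout idea concrete, here is a minimal pure-Python sketch of an actor with dropout on its hidden layer. It is an illustration under stated assumptions rather than the paper's network: the two-layer shape, the weights, the dropout rate of 0.5, and the choice to disable dropout at evaluation time are all hypothetical.

```python
import math
import random
from typing import List, Sequence


def dropout(values: Sequence[float], rate: float,
            rng: random.Random, training: bool) -> List[float]:
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(values)
    keep_prob = 1.0 - rate
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def actor_forward(state: Sequence[float],
                  w_hidden: Sequence[Sequence[float]],
                  w_out: Sequence[Sequence[float]],
                  rate: float, rng: random.Random, training: bool) -> List[float]:
    """Tiny actor: linear -> ReLU -> dropout -> linear -> tanh."""
    hidden = [max(0.0, sum(w * s for w, s in zip(row, state))) for row in w_hidden]
    hidden = dropout(hidden, rate, rng, training)
    return [math.tanh(sum(w * h for w, h in zip(row, hidden))) for row in w_out]


# Hypothetical weights: 2-dimensional state, 3 hidden units, 1-dimensional action.
W_HIDDEN = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
W_OUT = [[0.7, -0.5, 0.2]]
state = [1.0, 2.0]
rng = random.Random(0)

train_action = actor_forward(state, W_HIDDEN, W_OUT, 0.5, rng, training=True)
eval_action = actor_forward(state, W_HIDDEN, W_OUT, 0.5, rng, training=False)
```

In training mode the output varies with the sampled dropout mask, while in evaluation mode it is deterministic. The final tanh keeps actions in [-1, 1], a common convention for continuous control.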

IBRL was evaluated across a range of difficulty levels on 6 simulation tasks (Meta-World and Robomimic) and 3 real-robot tasks (Lift, Drawer, Hang). All tasks use a sparse 0/1 reward. IBRL matched or outperformed strong existing methods on every task, and the margin was largest on the harder tasks. For example, on the hardest simulation task it achieved nearly twice the performance of the second-best method, and on the challenging real-world deformable cloth hanging task it performed 2.4 times better than the second-best RL method.

Detail Review

In-Depth Review of the Imitation Bootstrapped Reinforcement Learning (IBRL) Paper

1. Background and Motivation

Reinforcement learning (RL) performs well on complex control problems, but poor sample efficiency and the difficulty of exploration have kept it from being widely used for real robot control. Imitation learning (IL, e.g. behavior cloning) can efficiently learn an initial policy from expert demonstrations, but demonstrations can rarely cover every situation, and distribution mismatch at deployment forces data to be re-collected. What is needed is a learning method that starts from only a handful of demonstrations and then improves autonomously.

Prior work generally falls into three groups: (1) inserting the demonstrations into the replay buffer and oversampling them during training (RLPD: Reinforcement Learning with Prior Data), (2) pretraining the RL policy on demonstrations and then applying an additional regularization loss during fine-tuning, and (3) model-based methods (MoDem) that pretrain the policy, critic, and model on demonstrations and then run RL with model predictive control.

However, approach (1) does not fully exploit the useful behavior that IL generalizes to, approach (2) requires hyperparameter tuning to avoid losing the initial knowledge during RL and constrains both stages to the same network architecture, and the model-based approach (3) is computationally expensive. To overcome these limitations, the paper proposes IBRL, a new framework that integrates the IL policy directly into RL to improve sample efficiency.

2. IBRL Algorithm Overview

The core idea of IBRL (Figure 1) is to first train an imitation policy (\mu_\psi) on expert demonstrations and then use that policy in two phases of RL training.

First, in the online interaction phase (Actor Proposal), at every time step the IL policy and the RL policy currently being trained (\pi_\theta) each propose an action, a_{IL} \sim \mu_{\psi}(s) and a_{RL} \sim \pi_{\theta}(s). The two candidates are scored by the target Q-network Q_{\phi'}, and the one with the higher Q-value, a^{*} = \arg\max_{a \in \{ a_{IL},a_{RL}\}}Q_{\phi'}(s,a), is executed (equation (1)). Because the IL policy keeps supplying reliable actions during early exploration, the agent reaches successful episodes quickly even under sparse rewards.
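The selection rule in equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `select_action` and the three toy functions are hypothetical stand-ins for the trained IL policy, the RL actor, and the target critic.

```python
from typing import Callable, Sequence

State = Sequence[float]
Action = Sequence[float]
Policy = Callable[[State], Action]
QFunction = Callable[[State, Action], float]


def select_action(state: State, il_policy: Policy,
                  rl_policy: Policy, target_q: QFunction) -> Action:
    """Actor proposal: execute whichever candidate the target critic scores higher."""
    a_il = il_policy(state)
    a_rl = rl_policy(state)
    # On an exact tie, max() returns the first candidate, i.e. the IL action.
    return max((a_il, a_rl), key=lambda a: target_q(state, a))


# Toy stand-ins (hypothetical): the critic prefers actions close to 0.4.
def toy_il_policy(state: State) -> Action:
    return [0.5]


def toy_rl_policy(state: State) -> Action:
    return [-1.0]


def toy_target_q(state: State, action: Action) -> float:
    return -(action[0] - 0.4) ** 2


chosen = select_action([0.0], toy_il_policy, toy_rl_policy, toy_target_q)
```

Here the IL proposal wins because the toy critic scores it higher. As the RL actor improves, its proposals start to win the comparison, so control shifts to the RL policy without any explicit schedule.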

Second, in the RL training phase (Bootstrap Proposal), the Q-function update evaluates the next state using whichever of the IL and RL proposals has the higher Q-value. That is, instead of the usual TD target r + \gamma Q'\left( s',\pi'(s') \right), the value function is bootstrapped with r + \gamma\max\{ Q'\left( s',a_{IL} \right),Q'\left( s',a_{RL} \right)\}.
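The difference between the two targets can be made concrete with a small sketch. The function names and toy policies below are hypothetical, and masking the bootstrap term on terminal transitions (`done`) is a common convention assumed here rather than something stated in the text.

```python
from typing import Callable, Sequence

State = Sequence[float]
Action = Sequence[float]
Policy = Callable[[State], Action]
QFunction = Callable[[State, Action], float]


def standard_td_target(reward: float, next_state: State, done: bool, gamma: float,
                       rl_target_policy: Policy, target_q: QFunction) -> float:
    """Usual target: r + gamma * Q'(s', pi'(s'))."""
    if done:  # assumption: no bootstrapping past a terminal state
        return reward
    return reward + gamma * target_q(next_state, rl_target_policy(next_state))


def ibrl_td_target(reward: float, next_state: State, done: bool, gamma: float,
                   il_policy: Policy, rl_target_policy: Policy,
                   target_q: QFunction) -> float:
    """Bootstrap proposal: r + gamma * max(Q'(s', a_IL), Q'(s', a_RL))."""
    if done:
        return reward
    q_il = target_q(next_state, il_policy(next_state))
    q_rl = target_q(next_state, rl_target_policy(next_state))
    return reward + gamma * max(q_il, q_rl)


# Toy stand-ins (hypothetical): the critic prefers actions close to 0.4.
def toy_il_policy(state: State) -> Action:
    return [0.5]


def toy_rl_policy(state: State) -> Action:
    return [-1.0]


def toy_target_q(state: State, action: Action) -> float:
    return 1.0 - abs(action[0] - 0.4)


standard = standard_td_target(0.0, [0.0], False, 0.99, toy_rl_policy, toy_target_q)
ibrl = ibrl_td_target(0.0, [0.0], False, 0.99,
                      toy_il_policy, toy_rl_policy, toy_target_q)
```

Since the maximum is taken over a set that includes the RL proposal, the IBRL target is never lower than the standard one for the same target critic. The gap is largest early in training, when the RL actor's proposals are still poor.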

As a result, the high-quality actions proposed by the IL policy feed directly into Q-value learning and speed up training. Thanks to the modular structure, the IL and RL policies can be trained independently, each with the network that suits it best (e.g. ResNet-18 vs. ViT), and there is no need to worry about RL erasing the initial IL knowledge (catastrophic forgetting). As in earlier methods, the replay buffer is also pre-filled with expert demonstrations to provide an early learning signal.

3. Theoretical Foundations

IBRL assumes a standard MDP \left( \mathcal{S},\mathcal{A},T,R,\gamma \right) and builds on off-policy RL (TD3/SAC). The critic network Q_\phi minimizes the RL loss L(\phi) = \left( r_{t} + \gamma Q_{\phi'}\left( s_{t + 1},\pi_{\theta'}\left( s_{t + 1} \right) \right) - Q_{\phi}\left( s_{t},a_{t} \right) \right)^{2}, and the policy (actor) \pi_\theta is trained with the loss L(\theta) = - Q_{\phi}\left( s,\pi_{\theta}(s) \right).
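For reference, substituting the bootstrap proposal from Section 2 into this critic loss gives the following form. It is written here with the critic parameters as \phi and the superscript notation of the Brief Review, as a reconstruction from the formulas in this review rather than a quotation from the paper:

```latex
L_{\text{IBRL}}(\phi) =
  \left( r_t
    + \gamma \max_{a' \in \{a^{IL}_{t+1},\, a^{RL}_{t+1}\}} Q_{\phi'}(s_{t+1}, a')
    - Q_{\phi}(s_t, a_t)
  \right)^{2},
\qquad
a^{IL}_{t+1} \sim \mu_{\psi}(s_{t+1}),
\quad
a^{RL}_{t+1} = \pi_{\theta'}(s_{t+1})
```

The actor loss L(\theta) is unchanged. Only the target inside the critic loss differs from the standard off-policy update.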

The imitation (IL) policy \mu_\psi is trained on the expert dataset \mathcal{D} by cloning actions with a maximum-likelihood or mean-squared-error objective. IBRL first trains \mu_\psi on this dataset and then uses it as a reference policy during RL.
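As a minimal illustration of the mean-squared-error variant, the sketch below fits a one-dimensional linear policy to toy demonstrations by gradient descent. Everything in it is hypothetical (the linear policy a = w * s + b, the synthetic expert a = 2s + 1, the learning rate, and the epoch count). The IL policy discussed in this review is a neural network, not a linear model.

```python
from typing import List, Tuple

Demo = Tuple[float, float]  # (state, expert_action) pairs


def bc_mse_loss(w: float, b: float, demos: List[Demo]) -> float:
    """Mean squared error between policy actions and expert actions."""
    return sum((w * s + b - a) ** 2 for s, a in demos) / len(demos)


def train_bc(demos: List[Demo], lr: float = 0.1, epochs: int = 500) -> Tuple[float, float]:
    """Fit a = w * s + b to the demonstrations with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(demos)
    for _ in range(epochs):
        grad_w = sum(2.0 * (w * s + b - a) * s for s, a in demos) / n
        grad_b = sum(2.0 * (w * s + b - a) for s, a in demos) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic expert (hypothetical): a = 2 * s + 1 on states in [-1, 1].
demos = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = train_bc(demos)
final_loss = bc_mse_loss(w, b, demos)
```

After training, the fitted policy is frozen and only queried for proposals, which matches the role of \mu_\psi as a fixed reference policy during RL.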

Among prior work with a similar idea, PEX and EfficientImitate use a human-derived reference policy in online or offline settings, but they mostly use the IL policy only to assist exploration, or are limited to low-dimensional observations, and their experiments are confined to simulation. IBRL differs in that it actively uses the IL policy in both the exploration and training phases and is evaluated on real-robot tasks as well.

4. Experimental Setup

The authors evaluated IBRL on 6 simulation tasks and 3 real-robot tasks. The simulation experiments are sparse-reward continuous control problems drawn from the Robomimic benchmark (Stanford), which consists mainly of pick-and-place style tasks, and from the Meta-World benchmark.

  • Robomimic tasks (Lift, PickPlaceCan, NutAssemblySquare): Difficulty increases over three levels, from lifting a block, to picking and placing a can, to nut assembly. Lift is easy and uses 1 expert demonstration, Can is moderate and uses 10, and Square is hard and uses 50. Experiments were run with both image (pixel) observations and low-dimensional state observations (robot joint state).

  • Meta-World tasks (Assembly, BoxClose, CoffeePush, StickPull): Four tasks were selected at random from the benchmark's task set: assembly, closing a box, pushing a coffee cup, and pulling with a stick. Each task uses 3 scripted expert demonstrations. Observations are images (pixels).

The baselines are as follows. On Robomimic, the comparison includes RLPD+ (a TD3-based implementation with demonstration oversampling), BC pretraining followed by regularized RL (BC+RLreg), and SQIL (Soft Q Imitation Learning). On Meta-World, the comparison includes MoDem (model-based RL with demonstrations) and RLPD+. All methods share the same network architecture and hyperparameters for a fair comparison.

The real-robot experiments use a Franka robot arm on the Lift, Drawer, and Hang tasks. Between 10 and 30 demonstrations were collected per task, and every method was given the same interaction budget (number of steps) and the same number of policy updates. Success is measured with a rule-based sparse reward (1 on success, 0 otherwise).

5. Simulation Results

5.1 Robomimic Tasks

Figure 1 shows learning curves comparing IBRL with the RLPD+ baseline on the Robomimic Lift, PickPlaceCan (Can), and NutAssemblySquare (Square) tasks, with both pixel and state observations. The red line is IBRL, the blue line is RLPD+, and the dashed line is the basic IBRL variant. On Lift and Can, IBRL raises the success rate much faster. On the simple Lift task in particular, it reaches 100% success in under 10K steps and converges about 3 times faster than RLPD+.

The upper-right plot of Figure 1 (Can) shows that IBRL solves the task within 20K steps, whereas RLPD+ needs more. Square is the hardest environment and early learning is very slow, but IBRL stays ahead throughout. Overall, IBRL clearly outperforms RLPD+ on every Robomimic task and is far more sample efficient with the same number of demonstrations.

Figure 1: Learning performance of IBRL (red) and RLPD+ (blue) on the Robomimic Lift, PickPlaceCan, and NutAssemblySquare tasks. Each plot shows performance (success rate) against the number of interaction steps. In every environment IBRL converges faster and reaches a higher success rate.

As Table 1 shows, policies trained with IBRL also tend to have shorter average episode lengths than the human demonstrations. On Lift, Can, and Square, IBRL completed the tasks roughly 2.2 to 3 times faster than the human experts (48.3, 116.0, and 150.8 steps), or about 2.3 times faster on average. This suggests that, through RL, IBRL learned behavior more efficient than what it saw in the demonstrations.

With pixel observations, random-shift data augmentation and the use of a wrist camera actually led to faster convergence than state observations on Lift and Can. On Square, the limited field of view made pixel-based learning difficult, but even there IBRL improved the policy faster than the other methods.

For example, on Robomimic PickPlaceCan, with only 10 demonstrations and 100K interaction steps, IBRL reached a success rate about 6.4 times that of RLPD. IBRL thus makes effective use of RL even with few demonstrations, and the gains are largest on the harder tasks.

5.2 Meta-World Tasks

Figure 2 compares IBRL (red), IBRL Basic (red dashed), MoDem (green), and RLPD+ (blue) on four Meta-World tasks (Assembly, Box Close, Coffee Push, Stick Pull). In every plot the horizontal axis is interaction steps and the vertical axis is success rate. IBRL (solid red) leads the other methods on all four tasks. On the harder tasks, Assembly and Stick Pull, IBRL approaches a 100% success rate while MoDem and RLPD+ remain at considerably lower success rates.

IBRL Basic (dashed) is a variant with a simplified encoder. It actually outperforms IBRL in simple environments, but on complex tasks the deeper IBRL architecture is more stable. Overall, IBRL and its variant solved every Meta-World task, whereas MoDem failed to reach a reliable success rate in 3 of the 4 environments. MoDem also takes more than 150 hours because of its model learning and planning stages, while IBRL converges faster with policy learning alone.

Figure 2: Performance of IBRL, MoDem, and RLPD+ on the Meta-World tasks (Assembly, Box Close, Coffee Push, Stick Pull). Solid red is IBRL, dashed is IBRL Basic, green is MoDem, and blue is RLPD+. IBRL (red) learns fast and reaches high performance on every task, with the gap most pronounced in the harder environments.

5.3 Summary of Key Results

In summary, IBRL shows a marked improvement in sample efficiency across the 6 simulation tasks. Compared with simple demonstration-oversampling methods such as RLPD+, actively using the IL policy provides high-quality action candidates from the very beginning and enables faster exploration. IBRL also performs consistently well across architectures, including IBRL Basic, and is more computationally efficient than the model-based MoDem. These results support the claim that IBRL achieves higher sample efficiency and higher final performance than existing methods.

6. Real-Robot Experiments and Applicability

To verify that IBRL is applicable in practice, the authors set up 3 real-robot manipulation tasks. The tasks are performed with a Franka Panda arm and increase in difficulty from Lift (lifting a block) to Drawer (opening a drawer) to Hang (hanging a cloth). Initial conditions and robot start positions vary, and 10 to 30 demonstrations were collected per task. Lift uses a wrist camera view and Hang uses a third-person camera. All methods use the same network architecture and parameters, and the interaction budget was set according to task difficulty.

According to Figure 8 and Table I, IBRL outperformed RLPD, RFT, and behavior cloning (BC) on all three tasks. On Lift, IBRL reached a 100% success rate after only 8K interaction steps, followed by RLPD at 95% and RFT at 90%. In the harder Lift Hard Eval setting (the block is placed at the edge of the wrist camera's field of view), IBRL maintained a 95% success rate while BC collapsed to 0%. This indicates that IBRL overcame the distribution mismatch by experiencing a variety of initial states during training.

On the Drawer task (opening a drawer), IBRL had the highest success rate, 95%, at 16K interaction steps. Even at the 10K-step point where the experiment was stopped early, IBRL had already reached 100%, while RLPD and RFT had very low success rates of 15% or less. This shows that IBRL quickly learned the fine manipulation the real environment requires, which the demonstrations alone did not provide.

On the hardest task, Hang (hanging a deformable cloth), only IBRL proved robust. After 30K interaction steps IBRL reached an 85% success rate, 20 percentage points above BC (65%), while RLPD and RFT stayed at 15% and 25%, respectively. Because the deformation of the cloth is hard to predict, random exploration is nearly hopeless here, and IBRL improved quickly by consistently exploiting the good initial actions supplied by the IL policy. The rollout examples in Figure 9 likewise show IBRL succeeding in fewer steps, including from initial conditions where BC fails.

In summary, IBRL also reaches high success rates quickly in the real world, demonstrating considerably higher sample efficiency than the other RL methods. It goes beyond the limits of BC-based policies and recovers quickly in situations where distribution shift or noise degrades performance. In real-robot applications, IBRL therefore enables rapid improvement well beyond the underlying IL policy.

7. Contributions, Limitations, and Future Work

IBRL makes the following contributions to robot reinforcement learning. First, by explicitly integrating the imitation policy into RL, it makes the most of the demonstrations and eases the RL exploration problem. Second, separating the IL and RL policies lets each use its own best network and training method, allowing a more flexible and efficient design. Third, it achieves state-of-the-art performance across a broad set of simulation and real-world experiments, setting a new reference point for sample-efficient robot learning.

There are limitations as well. The real-world experiments did not use autonomous resets. Manual resets were used to keep the evaluation reliable. Large-scale real-world deployment would need an automatic reset mechanism, which is left for future work. In addition, the study only tested a single type of IL policy, trained with BC, although the IBRL framework can in principle be combined with any IL method. Future work could adopt recent IL methods such as diffusion policies, or run comparative studies against methods such as PEX and PILCO, to improve performance further.

8. Conclusion

IBRL is a new method that achieves sample-efficient reinforcement learning by using an imitation policy, trained on a limited set of expert demonstrations, as a reference policy. In the experiments, IBRL reached high success rates with fewer interactions than existing methods, and the effect was most pronounced on the harder tasks. It also clearly outperformed the other methods in the real-robot experiments, enabling rapid improvement of robot control policies. IBRL thus points to a new direction for combining learning from demonstrations with reinforcement learning, and is regarded as a method that can substantially improve both efficiency and performance in real-robot applications.

Copyright 2024, Jung Yeon Lee