
📃 End-to-end RL Dex-Grasping Review

rl
gpu-parallel
grasping
End-to-end RL Improves Dexterous Grasping Policies
Published

November 17, 2025

๐Ÿ” Ping. ๐Ÿ”” Ring. โ›๏ธ Dig. A tiered review series: quick look, key ideas, deep dive.

  • Paper Link
  • Blog
  1. 🧐 End-to-end vision-based reinforcement learning matters for learning dexterous grasping policies, but conventional data parallelism is memory-inefficient, which limits how far the number of simulated environments and the batch size can scale.
  2. 🚀 This work proposes "disaggregated simulation," which places the simulator on separate GPUs from RL training and the experience buffer. On the same hardware this doubles the number of simulated environments and makes end-to-end depth policy training feasible.
  3. 📈 Compared with distilling a state-based policy, the approach removes the information asymmetry between teacher and student and achieves higher grasp success rates in both simulation and the real world. In particular, the larger batch size enabled by disaggregated simulation further improves real-world performance.

๐Ÿ” Ping Review

๐Ÿ” Ping โ€” A light tap on the surface. Get the gist in seconds.

๋ณธ ์—ฐ๊ตฌ๋Š” ๋กœ๋ด‡ ์•”๊ณผ ํ•ธ๋“œ ์‹œ์Šคํ…œ์„ ์ด์šฉํ•œ ์ •๊ตํ•œ ํŒŒ์ง€(dexterous grasping)๋ฅผ ์œ„ํ•œ ์ด๋ฏธ์ง€ ๊ธฐ๋ฐ˜์˜ end-to-end Reinforcement Learning (RL) ์Šค์ผ€์ผ๋ง ๊ธฐ๋ฒ•์„ ํƒ๊ตฌํ•ฉ๋‹ˆ๋‹ค. ๊ธฐ์กด์˜ ์ƒํƒœ ๊ธฐ๋ฐ˜(state-based) RL๊ณผ ๋‹ฌ๋ฆฌ, ์‹œ๊ฐ ๊ธฐ๋ฐ˜(vision-based) RL์€ ๋ฉ”๋ชจ๋ฆฌ ํšจ์œจ์„ฑ์ด ๋‚ฎ์•„ ๋ฐฐ์น˜ ํฌ๊ธฐ(batch size)๊ฐ€ ์ž‘์•„์ง€๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” PPO์™€ ๊ฐ™์€ ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ๋ถˆ๋ฆฌํ•จ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ , ์ƒํƒœ ๊ธฐ๋ฐ˜ ์ •์ฑ…์„ ์‹œ๊ฐ ๋„คํŠธ์›Œํฌ๋กœ ์ฆ๋ฅ˜(distillation)ํ•˜๋Š” ๊ธฐ์กด ๋ฐฉ์‹๊ณผ ๋‹ฌ๋ฆฌ, ์‹œ๊ฐ ๊ธฐ๋ฐ˜ RL์€ ๋Šฅ๋™ ์‹œ๊ฐ(active vision) ํ–‰๋™์„ ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์žฅ์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค.

ํ•ต์‹ฌ ๋ฌธ์ œ์ :

์ด๋Ÿฌํ•œ ์‹œ๊ฐ ๊ธฐ๋ฐ˜ RL ์ •์ฑ… ํ›ˆ๋ จ์˜ ์ฃผ์š” ๋ณ‘๋ชฉ ํ˜„์ƒ์€ ๋Œ€๋ถ€๋ถ„์˜ ๊ธฐ์กด ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ๊ฐ€ ์ „ํ†ต์ ์ธ ๋ฐ์ดํ„ฐ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ(data parallelism) ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•˜์—ฌ ์—ฌ๋Ÿฌ GPU์— ์Šค์ผ€์ผ๋ง๋˜๋Š” ๋ฐฉ์‹์—์„œ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ ๋ฐฉ์‹์—์„œ๋Š” ๊ฐ GPU๊ฐ€ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ๋ฅผ ์‹คํ–‰ํ•˜๊ณ , RL ๊ฒฝํ—˜ ๋ฒ„ํผ๋ฅผ ์ €์žฅํ•˜๋ฉฐ, ์•กํ„ฐ(actor)์™€ ํฌ๋ฆฌํ‹ฑ(critic)์˜ ๊ธฐ์šธ๊ธฐ(gradients)๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. ์‹œ๊ฐ ๊ธฐ๋ฐ˜ RL์˜ ๊ฒฝ์šฐ ๊ฒฝํ—˜ ๋ฒ„ํผ์˜ ํฌ๊ธฐ๊ฐ€ ๊ธ‰์ฆํ•˜๊ณ , ๊ฐ GPU๊ฐ€ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ์˜ ๋Œ€๊ทœ๋ชจ ์ž์‚ฐ ์บ์‹œ(asset cache)๋ฅผ ๋ณต์‚ฌํ•˜์—ฌ ๋ถˆํ•„์š”ํ•œ ๋ฉ”๋ชจ๋ฆฌ ์†Œ๋ชจ๋ฅผ ์•ผ๊ธฐํ•˜๋ฏ€๋กœ ๋น„ํšจ์œจ์ ์ž…๋‹ˆ๋‹ค.

์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•๋ก : Disaggregated Simulation and RL

๋ณธ ๋…ผ๋ฌธ์€ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ์™€ RL ํ•™์Šต (๊ฒฝํ—˜ ๋ฒ„ํผ ๋ฐ ํ›ˆ๋ จ)์„ ๋ณ„๋„์˜ GPU์— ๋ถ„๋ฆฌ(disaggregate)ํ•˜์—ฌ ๋ฐฐ์น˜ํ•˜๋Š” ์ƒˆ๋กœ์šด ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, 4๊ฐœ์˜ GPU๊ฐ€ ์žˆ๋Š” ๋…ธ๋“œ์—์„œ 3๊ฐœ์˜ GPU๋Š” ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ํ™˜๊ฒฝ ์‹คํ–‰์— ์ „๋…ํ•˜๊ณ , ๋‚˜๋จธ์ง€ 1๊ฐœ์˜ GPU๋Š” RL ํ•™์Šต๊ณผ ๊ฒฝํ—˜ ๋ฒ„ํผ ์ €์žฅ์— ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.

  • Simulation replica (Algorithm 1): each simulation GPU $s \in \{0, 1, 2\}$ works as follows.
    1. Reset the environments and produce the initial observations (obs).
    2. Send the initial observations to the learner GPU (e.g. $\ell \leftarrow 3$).
    3. In an infinite loop, receive actions from the learner.
    4. Step the environments with the received actions to obtain the next observations (obs'), rewards (rew), and done flags (dones).
    5. Send (rew, dones, obs') back to the learner GPU.
  • Learner/trainer (Algorithm 2): learner GPU 3 works as follows.
    1. Receive the initial observations obs[s] from the simulation GPUs $S \leftarrow \{0, 1, 2\}$.
    2. Initialize the trajectory buffer $D$.
    3. Repeat the following for the specified horizon $H$:
      • Gather the observations from all simulation GPUs and compute actions with the policy $\pi$ (e.g. actions $\leftarrow \pi_{\text{stack}}(\{obs[s]\}_{s \in S})$).
      • Send actions[s] to each simulation GPU $s$.
      • Receive (rew, dones, nextObs) from each simulation GPU.
      • Append the collected (obs, actions, rew, done) to the trajectory buffer $D$ and set obs to nextObs.
    4. After $H$ steps, train the policy $\pi$ with PPO on the trajectory buffer $D$.
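The two algorithms above can be sketched in a few lines. This is a toy illustration, not the authors' code: threads and queues stand in for GPUs and inter-GPU transfers, the one-dimensional "environment" and the `policy` function are placeholders, and every name and constant (`sim_worker`, `collect_rollout`, `ENVS_PER_WORKER`, the tiny `HORIZON`) is hypothetical.

```python
import queue
import random
import threading

NUM_SIM_WORKERS = 3   # stands in for simulation GPUs 0, 1, 2
HORIZON = 4           # H: steps collected before a PPO update (illustrative)
ENVS_PER_WORKER = 2   # tiny on purpose; real runs use thousands


def sim_worker(worker_id, to_learner, from_learner):
    """Algorithm 1 analogue: reset, send obs, then step on received actions."""
    rng = random.Random(worker_id)
    obs = [rng.uniform(-1.0, 1.0) for _ in range(ENVS_PER_WORKER)]  # reset
    to_learner.put(obs)
    while True:
        actions = from_learner.get()          # block until the learner replies
        if actions is None:                   # shutdown signal (sketch only)
            return
        obs = [o + a for o, a in zip(obs, actions)]   # toy dynamics
        rew = [-abs(o) for o in obs]                  # toy reward
        dones = [False] * ENVS_PER_WORKER
        to_learner.put((rew, dones, obs))


def policy(flat_obs):
    """Placeholder policy: one batched pass over every worker's observations."""
    return [-0.5 * o for o in flat_obs]


def collect_rollout():
    """Algorithm 2 analogue: the learner owns the policy and the buffer."""
    up = [queue.Queue() for _ in range(NUM_SIM_WORKERS)]
    down = [queue.Queue() for _ in range(NUM_SIM_WORKERS)]
    workers = [
        threading.Thread(target=sim_worker, args=(s, up[s], down[s]))
        for s in range(NUM_SIM_WORKERS)
    ]
    for w in workers:
        w.start()

    obs = [up[s].get() for s in range(NUM_SIM_WORKERS)]
    buffer = []
    for _ in range(HORIZON):
        flat = [o for per_worker in obs for o in per_worker]
        actions = policy(flat)
        split = [
            actions[s * ENVS_PER_WORKER:(s + 1) * ENVS_PER_WORKER]
            for s in range(NUM_SIM_WORKERS)
        ]
        for s in range(NUM_SIM_WORKERS):
            down[s].put(split[s])
        for s in range(NUM_SIM_WORKERS):
            rew, dones, next_obs = up[s].get()
            buffer.append((obs[s], split[s], rew, dones))
            obs[s] = next_obs

    for s in range(NUM_SIM_WORKERS):
        down[s].put(None)
    for w in workers:
        w.join()
    return buffer   # a PPO update on this buffer would follow


rollout = collect_rollout()
```

The point of the structure is that only the learner holds the policy and the buffer, so the simulation side spends its memory on environments alone.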

With the same number of GPUs as data parallelism, this split more than doubles the number of environments that can be simulated (see Table I). That matters in memory-heavy vision-based environments, where a sufficiently large PPO batch is needed for a good learning signal.

Training environment and policy architecture:

The setup uses the same Kuka iiwa arm and Allegro V4 hand as DextrAH-RGB [9], and the task is to grasp and lift 140 objects from the Visual Dexterity dataset [26]. For sim-to-real transfer, Automatic Domain Randomization (ADR) [27, 9] randomizes physical parameters (joint friction, stiffness, damping, mass, and so on). The policy is trained end to end with PPO [28]. The network architecture (Figure 3) is:

  • CNN backbone: the input image passes through a 4-layer CNN with [16, 32, 64, 128] filters, using layer normalization and ReLU activations.
  • Embedding: the CNN output is mapped to a 32-dimensional embedding.
  • Concatenation and LSTM: the embedding is concatenated with the robot's proprioception data and passed through two LSTM layers with 1024 units.
  • MLP: the LSTM output feeds a 3-layer fully connected network (MLP) with [512, 512, 256] hidden units.

For GPU memory efficiency and training speed, a depth-based policy is trained with RL instead of an RGB one and then distilled into a stereo RGB policy. Because depth can be recovered from a stereo RGB pair, this resolves the information asymmetry that existed between a state-based teacher policy and its student policy and removes the theoretical information gap (Figure 4).
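To get a feel for how the layer sizes listed above chain together, here is a rough size calculation. It is a back-of-the-envelope sketch, not the authors' model: `proprio_dim` is an assumed value (the review does not give the proprioception size), the LSTM formula is the textbook one with a single bias per gate (frameworks such as PyTorch keep two bias vectors, so their counts are slightly higher), and the CNN is left out because kernel sizes and strides are not given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyShape:
    cnn_filters: tuple = (16, 32, 64, 128)   # from the description above
    embed_dim: int = 32                      # CNN output embedding
    proprio_dim: int = 46                    # ASSUMPTION: not stated in the review
    lstm_layers: int = 2
    lstm_units: int = 1024
    mlp_hidden: tuple = (512, 512, 256)


def lstm_input_dim(shape):
    """Image embedding concatenated with proprioception."""
    return shape.embed_dim + shape.proprio_dim


def lstm_param_count(shape):
    """Textbook LSTM: 4 gates, each with input weights, recurrent weights, one bias."""
    total = 0
    in_dim = lstm_input_dim(shape)
    h = shape.lstm_units
    for _ in range(shape.lstm_layers):
        total += 4 * (h * (in_dim + h) + h)
        in_dim = h   # the next layer consumes the previous hidden state
    return total


def mlp_param_count(shape):
    """Weights plus biases of the hidden layers fed by the LSTM output."""
    total = 0
    in_dim = shape.lstm_units
    for width in shape.mlp_hidden:
        total += in_dim * width + width
        in_dim = width
    return total


shape = PolicyShape()
lstm_total = lstm_param_count(shape)
mlp_total = mlp_param_count(shape)
```

Under these assumptions the recurrent core dominates the parameter count, while the image enters the rest of the network only through a 32-dimensional embedding.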

Experimental results:

  1. Disaggregated Simulation vs Data Parallelism: the proposed disaggregated simulation beats conventional data parallelism on every metric (ADR Inc., % Full ADR, SR) (Table II). At 320x240 resolution in particular, data parallelism failed to train a policy that grasps objects at all, while disaggregated simulation achieved a meaningful improvement. This shows that vision-based RL tasks are strongly constrained by the number of environments that can be simulated.
  2. Distilling State Teachers vs Depth Teachers: in simulation, an RGB student distilled from a depth-based teacher outperformed an RGB student distilled from a state-based teacher (Figure 5). With less information asymmetry between teacher and student, the student can learn behavior better suited to its own modality.
  3. Real World Benchmarking: on a real-world bin packing task, the depth-based teacher policy (especially the one trained with disaggregated simulation) achieved the highest success rate, surpassing the state-based teacher policy (Table III). This shows that a larger RL batch size contributes to better real-world performance for vision-based policies.

Conclusion:

This work presents an end-to-end depth-based RL method for dexterous grasping. Instead of training a state-based policy and then distilling it into a vision-based one, it trains a depth policy directly with RL and distills that into an RGB policy, which closes the information gap and yields better performance in simulation and in the real world. It also proposes disaggregated simulation as a way to train end-to-end RL policies efficiently on a small number of GPUs: on the same hardware the number of simulated environments doubles, and vision-based RL training improves substantially as a result.

🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

Robots now learn by "seeing": Dexterous Grasping with End-to-end RL

This is a look at a really interesting paper from researchers at NVIDIA and UC Berkeley. Getting a robot hand to pick up all sorts of objects the way a person does, that is, dexterous grasping, has long been an open problem in robotics. This paper proposes a new way to tackle it.

🤔 Where the problem starts: why is it hard for robots to learn to "see"?

Teaching a robot to pick up objects is much more complicated than it sounds. Doing it with a multi-fingered hand, shaped like a human hand, across a wide range of objects is especially hard.

How was it done before?

Robot grasping policies have traditionally developed along two paradigms:

State-based RL
  • Direct access to perfect state information (object position, joint angles, and so on)
  • High learning efficiency and stability
  • But such perfect state information is hard to obtain in the real world

Vision-based policy distillation
  • Train a state-based teacher policy first
  • Distill it into a vision-based student policy
  • Currently the most widely used method in industry

The distillation approach has a fundamental problem, though: the "observability gap."

🔍 The core problem: Observability Gap

Picture a robot arm reaching for an object. The teacher policy knows the object's exact 3D position, so it can carry out the grasp even when the arm itself is blocking the view of the object. The student policy, which only receives camera images, cannot see an object hidden behind the arm and fails to reproduce the teacher's behavior.

In that situation an ideal vision-based policy should learn to:
  1. Recognize that the object is occluded
  2. Move the arm to clear the view
  3. Confirm the object, then grasp

This is called "Active Vision Behavior," and it is hard to learn through distillation, because the teacher policy never needs to behave this way.


💡 The paper's core idea: learn by seeing from the start!

The paper's central proposal is clear: train directly with RL on vision input from start to finish.

✅ Advantages of End-to-End RL

Active vision emerges naturally
  • The policy learns from visual information alone from the beginning
  • Behaviors such as clearing the view and tracking the object appear on their own
  • There is no observability gap by construction

Better sim-to-real transfer
  • No behavioral mismatch between teacher and student
  • The learned active vision behavior carries over to the real world

⚠️ So why wasn't it used? The problem with vision-based RL

Why, then, has vision-based end-to-end RL not been widely used? The reason is memory efficiency.

State-based RL vs Vision-based RL

State-based input:
  • Input size: a vector of roughly tens to hundreds of dimensions
  • Memory usage: very small
  • Batch size: tens of thousands of environments can run at once

Vision-based input:
  • Input size: image pixels (e.g. 320×240×3 = 230,400 dimensions)
  • Memory usage: thousands of times that of state input
  • Batch size: limited

On-policy RL algorithms such as PPO (Proximal Policy Optimization) depend heavily on large batches. With small batches the gradient estimate is noisy and training becomes unstable.
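The arithmetic behind that comparison can be checked directly. This is a sketch under stated assumptions: `STATE_DIM = 100` is an illustrative pick from the "tens to hundreds" range, 4 bytes per value assumes float32 storage, and the environment count and horizon passed to `rollout_buffer_bytes` are example inputs rather than the paper's settings.

```python
def values_per_image(width, height, channels):
    """Number of scalar values in one image observation."""
    return width * height * channels


STATE_DIM = 100                                # ASSUMPTION: illustrative state size
RGB_VALUES = values_per_image(320, 240, 3)     # the 230,400 figure from the text
DEPTH_VALUES = values_per_image(320, 240, 1)   # a single-channel depth image


def rollout_buffer_bytes(num_envs, horizon, values_per_obs, bytes_per_value=4):
    """Memory for the observations of one on-policy rollout (float32 assumed)."""
    return num_envs * horizon * values_per_obs * bytes_per_value


def to_gib(num_bytes):
    return num_bytes / 2**30


# Example inputs only: 1,280 environments, an assumed horizon of 16 steps.
depth_buffer = rollout_buffer_bytes(1280, 16, DEPTH_VALUES)
state_buffer = rollout_buffer_bytes(1280, 16, STATE_DIM)
print(f"RGB vs state values per observation: {RGB_VALUES // STATE_DIM}x")
print(f"depth rollout: {to_gib(depth_buffer):.2f} GiB, "
      f"state rollout: {to_gib(state_buffer):.4f} GiB")
```

With these inputs the depth observations alone take several GiB per rollout, which is why it matters where the buffer lives.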


🚀 The game changer: Disaggregated Simulation

The paper's most important technical contribution is the "Disaggregated Simulation and RL Framework."

🤔 What is wrong with the existing approach

With the traditional approach (Data Parallelism):
  • Every GPU runs both the simulator and RL training
  • With 4 GPUs, GPUs 1, 2, 3, and 4 each operate independently
  • Each GPU's memory must hold the simulation environments, the experience buffer, and the model

Problems:
  • Vision-based environments spend a great deal of memory on rendering
  • Each GPU can only run a small number of environments
  • The total batch size is too small for PPO training

💡 The new idea: split the roles!

The new approach separates the roles.

On a 4-GPU node:
  • GPU 1, 2, 3: simulator only (Simulator Workers)
    • Environment rendering
    • Physics simulation
    • Experience data generation
  • GPU 4: RL training only (Learner)
    • Neural network training
    • Experience buffer management
    • Policy update

Workflow:
  1. Each simulator worker sends its observations to the learner.
  2. The learner runs the current policy on the combined observations and sends the actions back to the workers.
  3. The workers step their environments and return rewards, done flags, and next observations.
  4. After the rollout, the learner trains on all the collected experience as one large batch and updates the policy.

📈 Performance gains: the effect in numbers

The results reported in the paper are impressive.

Environment count comparison (4 NVIDIA L40S GPUs):

160×120 resolution:
  • Data Parallel: 4,096 environments
  • Disaggregated: 8,704 environments (2.13x)

320×240 resolution:
  • Data Parallel: 512 environments
  • Disaggregated: 1,280 environments (2.5x)

Running more than twice as many environments on the same hardware means the PPO batch size can be more than doubled. That contributes directly to both training stability and final performance.

🧠 Source of inspiration: learning from LLMs

Interestingly, the idea is inspired by the disaggregated prefill and decode technique used in modern LLM inference. In LLM serving:
  • Prefill: processes the long input prompt (compute-intensive)
  • Decode: generates one token at a time (memory-intensive)

Just as separating those two phases greatly improves efficiency, separating simulation from RL training has a similar effect.


🎯 Strategy in practice: the Depth → Stereo RGB pipeline

The paper does not stop at proposing end-to-end vision RL. It lays out a realistic pipeline for actual deployment.

🎨 Why use depth as an intermediate step?

Problems with RGB end-to-end RL:
  • Photorealistic rendering is very slow
  • It requires accurate light-transport simulation
  • The compute cost is too high

Advantages of depth:
  • Rendering is much faster (geometric information only)
  • It provides the object's 3D structure directly
  • It can be generated efficiently in simulation

What deployment requires:
  • The real robot uses a stereo RGB camera
  • Depth sensors are often inaccurate or unavailable

🔄 Divide and conquer in three stages

Stage 1: Depth-based End-to-End RL

Input: Depth images (simulation)
Method: PPO with disaggregated simulation
Output: Depth-based teacher policy
  • Trains efficiently across a large number of environments
  • Learns active vision behavior naturally

Stage 2: Distillation to Stereo RGB

Input: Stereo RGB images (simulation)
Teacher: Depth-based policy
Output: Stereo RGB student policy
  • Distills the depth teacher's behavior into a stereo RGB student
  • An observability gap remains, but it is far smaller than for state → vision
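Stage 2 is, at its core, supervised imitation of the teacher's actions by a student that sees a different but closely related observation. The toy below only illustrates that idea and is not the paper's distillation procedure: the "teacher" is a fixed linear function of a scalar depth feature, the "student" sees a noisy copy of that feature (a stand-in for stereo RGB), and all names and constants are hypothetical.

```python
import random


def teacher_action(depth_feature):
    """Stand-in for a frozen depth teacher: a fixed mapping to one action."""
    return 2.0 * depth_feature - 1.0


def distill(steps=500, lr=0.05, seed=0):
    """Fit a linear student to the teacher's actions with SGD on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    losses = []
    for _ in range(steps):
        depth = rng.uniform(-1.0, 1.0)
        # The student observes a correlated but not identical signal.
        student_obs = depth + rng.gauss(0.0, 0.05)
        target = teacher_action(depth)     # teacher label from its own modality
        pred = w * student_obs + b
        err = pred - target
        losses.append(err * err)
        w -= lr * 2.0 * err * student_obs  # gradient of squared error w.r.t. w
        b -= lr * 2.0 * err                # gradient of squared error w.r.t. b
    return w, b, losses


w, b, losses = distill()
early = sum(losses[:50]) / 50
late = sum(losses[-50:]) / 50
```

Because the student's observation carries almost the same information as the teacher's, the imitation error drops to a small residual; with a state teacher, the student would be asked to imitate decisions based on information it cannot see.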

Stage 3: Sim-to-Real Transfer

Domain Randomization + Real-world deployment
  • Apply ADR (Automatic Domain Randomization)
  • Randomize physical parameters, lighting, textures, and so on

🏆 Why depth beats state

One of the paper's key empirical findings is that a depth teacher is better than a state teacher.

Comparison setup:
  1. State teacher → Stereo RGB student (existing approach)
  2. Depth teacher → Stereo RGB student (proposed approach)

Results (simulation success rate):
  • State teacher: ~70-80%
  • Depth teacher (baseline): ~85%
  • Depth teacher (disaggregated): ~90%

Why is depth better?

From the observability gap point of view:
  • State and vision: completely different kinds of information
    • State: exact numeric values (object coordinates [x, y, z])
    • Vision: pixel patterns, with uncertainty
  • Depth and stereo RGB: similar kinds of information
    • Both are visual representations
    • Both contain geometric information
    • Depth can be reconstructed from stereo RGB

In other words, the depth teacher "thinks" in a way much closer to the information the student can actually have.


🔬 Experimental setup: what was it tested on?

🤖 Hardware

Simulation:
  • Kuka iiwa 7-DoF robot arm
  • Allegro Hand V4 16-DoF dexterous hand
  • 23 DoF in total

Real robot:
  • The same Kuka + Allegro configuration
  • Stereo RGB camera (ZED 2)
  • Task: grasp the object and drop it into a bin

📦 Test objects

Visual Dexterity Dataset:
  • 140 varied objects
  • Everyday items (cups, tools, toys, and so on)
  • A range of shapes and sizes

Evaluation metric:
  • Success Rate: the fraction of objects successfully grasped and dropped into the bin

🎲 Adding realism: Domain Randomization

ADR (Automatic Domain Randomization) is used to narrow the sim-to-real gap.

Physical parameters:
  • Joint friction, stiffness, damping
  • Object mass, friction
  • Contact parameters

Visual factors (when training on RGB):
  • Lighting conditions
  • Texture randomization
  • Camera parameters

How ADR behaves:
  • Starts with a narrow randomization range
  • Widens the range gradually as the policy stabilizes
  • Ends with a policy that is robust across a wide range
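The "start narrow, widen as the policy stabilizes" behavior can be sketched as a small controller. This is an illustrative simplification rather than the ADR implementation used in the paper: the success threshold, the number of increments, and the function names are all assumptions.

```python
import random

MAX_INCREMENTS = 10   # ASSUMPTION: number of steps from no to full randomization


def adr_step(increments, success_rate, threshold=0.8):
    """Widen the randomization by one increment once the policy is doing well."""
    if success_rate >= threshold and increments < MAX_INCREMENTS:
        return increments + 1
    return increments


def sample_param(nominal, max_deviation, increments, rng):
    """Sample a physical parameter from the currently allowed range."""
    half_width = max_deviation * increments / MAX_INCREMENTS
    return rng.uniform(nominal - half_width, nominal + half_width)


# A made-up success-rate history: good, one dip, then consistently good.
history = [0.9, 0.9, 0.9, 0.5] + [0.9] * 20
increments = 0
for success_rate in history:
    increments = adr_step(increments, success_rate)

rng = random.Random(0)
friction_at_start = sample_param(1.0, 0.5, 0, rng)           # no randomization yet
friction_at_full = sample_param(1.0, 0.5, increments, rng)   # full range
```

Counting increments like this mirrors the "ADR Inc." metric mentioned in the Ping review: a policy that keeps succeeding earns a wider range, and one that struggles stays where it is.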


📊 So how did it go? (Spoiler: very well!)

💻 Report card in simulation

Depth End-to-End RL comparison:

| Resolution | Method | Environments | Success Rate |
|---|---|---|---|
| 160×120 | Data Parallel | 4,096 | ~82% |
| 160×120 | Disaggregated | 8,704 | ~88% |
| 320×240 | Data Parallel | 512 | ~75% |
| 320×240 | Disaggregated | 1,280 | ~85% |

Key findings:
  • The disaggregated approach is better at every resolution
  • It remains effective at the higher resolution (320×240)
  • A larger batch size translates directly into better performance

🎓 Teacher comparison: who teaches better?

Teacher type comparison (measured on the stereo RGB student):

| Teacher Type | Student Success Rate |
|---|---|
| State (baseline) | ~72% |
| Depth (data parallel) | ~79% |
| Depth (disaggregated) | ~85% |

Analysis:
  • The depth teacher yields a student success rate 7 to 13 percentage points higher than the state teacher
  • Reducing the observability gap shows up as a concrete gain
  • The benefit of disaggregated simulation persists after distillation

🌍 What about the real robot?

Real-World Success Rate:

The paper reports that the method outperforms the previous state-of-the-art vision-based method on the real robot. The exact figures (the paper's Table III) are not reproduced in this review, but the following were observed:

Qualitative results:
  • Robust grasping across a variety of objects
  • Active vision behavior appearing on the real robot (e.g. movements that secure a better view)
  • A smaller gap to simulation performance than with previous methods

Effect on sim-to-real transfer:
  • Policies based on a depth teacher transfer better
  • Synergy with ADR
  • Robustness to real-world uncertainty


🛠️ Technical depth: how was it implemented?

🧱 Network structure

This section does not repeat the full network (the Ping review above lists the layers from the paper's Figure 3). In outline it follows the usual pattern for vision-based RL:

Encoder (Vision):
  • Convolutional layers for spatial feature extraction (a 4-layer CNN in the summary above)
  • Possibly separate encoders for the left/right stereo images (a guess by this review, not confirmed)

Policy/Value Networks:
  • Recurrent and MLP layers after vision encoding
  • Separate heads for actor and critic (PPO)
  • Action output: 23-dimensional continuous control

⚙️ Training setup

PPO Configuration:
  • Horizon length: the paper mentions 3 as an example (the actual value may be longer)
  • Learning rate, entropy coefficient, and similar settings are not specified
  • Results are averaged over 5 seeds for reproducibility

Simulation Scale:
  • Thousands to tens of thousands of parallel environments
  • The environment count is set to use as much GPU memory as possible

🔧 Things to watch when implementing this

Communication Overhead:
  • Data transfer between the simulator workers and the learner
  • Moving high-resolution images efficiently
  • Network bandwidth
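A quick estimate shows how much observation data crosses from the simulator workers to the learner on every step. The numbers below are a sketch under explicit assumptions: 2 bytes per depth pixel (16-bit depth) and a 25 GB/s link are illustrative choices, not values from the paper, and rewards, done flags, and actions are ignored because they are tiny by comparison.

```python
def obs_bytes_per_step(num_envs, width, height, bytes_per_pixel):
    """Observation payload sent to the learner for one simulation step."""
    return num_envs * width * height * bytes_per_pixel


def transfer_seconds(num_bytes, bandwidth_bytes_per_s):
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return num_bytes / bandwidth_bytes_per_s


ASSUMED_BYTES_PER_PIXEL = 2   # ASSUMPTION: 16-bit depth images
ASSUMED_BANDWIDTH = 25e9      # ASSUMPTION: 25 GB/s GPU-to-GPU link

high_res = obs_bytes_per_step(1280, 320, 240, ASSUMED_BYTES_PER_PIXEL)
low_res = obs_bytes_per_step(8704, 160, 120, ASSUMED_BYTES_PER_PIXEL)

for label, payload in (("320x240 x 1,280 envs", high_res),
                       ("160x120 x 8,704 envs", low_res)):
    ms = 1000 * transfer_seconds(payload, ASSUMED_BANDWIDTH)
    print(f"{label}: {payload / 1e6:.1f} MB per step, ~{ms:.1f} ms at the assumed bandwidth")
```

Under these assumptions the lower-resolution configuration actually moves more data per step, because its larger environment count outweighs the smaller images.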

Load Balancing:
  • Balancing 3 simulators against 1 learner
  • If the simulators are too fast, the learner becomes the bottleneck
  • If the learner is too fast, the simulators sit idle

Memory Management:
  • Managing the experience buffer efficiently
  • The trade-off between compressed and uncompressed image data


🔄 How does it compare with other work?

👁️ Predecessors in vision-based grasping

Parallel Jaw Grippers with End-to-End RL:
  • QT-Opt (Kalashnikov et al.) and others
  • End-to-end RL has succeeded with simple grippers
  • But it has been hard to apply to dexterous hands
  • Reason: a much higher-dimensional action space (2 DoF vs 16+ DoF)

DextrAH Series:
  • DextrAH-RGB (the previous state of the art)
  • State teacher → vision student distillation
  • This paper outperforms it

🤹 Other approaches to dexterous manipulation

UniDexGrasp, DexGraspNet:
  • Approaches built on large-scale datasets
  • Mainly focused on grasp pose generation
  • This paper focuses on a dynamic execution policy

DemoGrasp, AdaDexGrasp:
  • Learn from a single demonstration
  • Use RL, but in a trajectory-editing style
  • A different direction from this paper

🌟 What makes this paper special

A first:
  • The first successful sim-to-real transfer of end-to-end vision RL on a multi-fingered hand
  • An important milestone for robotics

A practical solution:
  • Not just a theoretical proposal
  • A system that works under real hardware constraints
  • A scalable way to implement it


🤷 It isn't perfect: limitations and future directions

😅 What still falls short

Hardware requirements:
  • Still needs several high-end GPUs
  • 4 GPUs is the minimum
  • A barrier to entry for small labs

Dependence on simulation:
  • Training still happens in simulation
  • The sim-to-real gap is not fully closed
  • Methods that learn directly in the real world are still needed

Task Specificity:
  • Specialized for the grasp-and-place task
  • It is unclear how well it extends to more complex manipulation

🔨 Things that could be improved right away

Real-World RL Fine-tuning:
  • Train in simulation first
  • Then do a small amount of additional training on the real robot
  • Combine with methods such as SERL and DexGraspRL

Multi-task Learning:
  • Learn a range of manipulation skills rather than a single task
  • Task-conditioned policy
  • Transfer learning across tasks

Tactile Integration:
  • Currently vision only
  • Adding tactile sensors would allow more robust grasping
  • Multi-modal learning with vision + tactile

🚀 What might be possible in the future?

Foundation Models for Manipulation:
  • Pre-training on large-scale data
  • Fine-tuning for specific tasks
  • Vision-Language-Action models

Hierarchical RL:
  • High-level planning + low-level control
  • More complex long-horizon tasks
  • Modeling active vision explicitly

Better sample efficiency:
  • Model-based RL approaches
  • World models for dexterous manipulation
  • Off-policy algorithms


✨ Conclusion: robots have really started to learn by "seeing"

What this paper taught us

The paper shows three things that really matter for dexterous grasping:

1. The power of smart infrastructure design 💪
  • Disaggregated simulation runs more than twice as many environments on the same GPUs
  • A lesson that system design can sometimes matter more than the algorithm
  • An idea that applies directly to other vision-based robotics tasks

2. The wisdom of an intermediate step 🎯
  • A realistic approach that uses depth as the intermediate step
  • Depth → Vision is far more effective than State → Vision
  • A good balance between theory and practicality

3. A first success story 🏆
  • The first successful sim-to-real transfer of end-to-end vision RL on a multi-fingered hand
  • State-of-the-art performance
  • Confirmation that active vision behavior really emerges

The message for robotics

The paradigm is shifting
  • Distillation is not the only option
  • End-to-end learning is now a practical choice
  • Infrastructure improvements matter as much as algorithms

Problems still to solve
  • More efficient vision-based RL algorithms are needed
  • Ways to narrow the sim-to-real gap further
  • Moving toward systems that learn directly in the real world

Outlook

This research is a starting point. Going forward, we can probably expect:
  • Extension to more complex manipulation tasks
  • Training that works with fewer compute resources
  • Systems that learn directly in the real world
  • Generalization to a variety of robot platforms

One-line summary

Robots that "see and learn" the way people do are no longer science fiction. With the right infrastructure and methodology, vision-based end-to-end learning can become the future of dexterous manipulation! 🚀

Papers worth reading alongside this one
  • DextrAH-RGB: the previous state-of-the-art vision-based grasping
  • UniDexGrasp: large-scale dexterous grasping
  • DexGraspNet: a dexterous grasp dataset
  • DemoGrasp: learning from a single demonstration
  • Visual Dexterity: the dataset used in this paper

💬 Closing thoughts

I think this paper is excellent research with both technical depth and practical value. What stands out is how a clever engineering solution, disaggregated simulation, combines with a practical approach, depth-based learning, to make vision-based dexterous grasping, long out of reach, a reality.

The lesson for robotics researchers is clear: improving infrastructure is sometimes as important as improving the algorithm. Simply being able to simulate more environments can improve learning performance substantially.

โ›๏ธ Dig Review

โ›๏ธ Dig โ€” Go deep, uncover the layers. Dive into technical detail.

์„œ๋ก : ์‹œ๊ฐ ๊ธฐ๋ฐ˜ ๋‹ค์ง€ ๊ทธ๋ฆฌํ•‘์˜ ๋„์ „๊ณผ ๊ธฐ์กด ์ ‘๊ทผ๋ฒ•์˜ ํ•œ๊ณ„

ํ˜„๋Œ€ ๋กœ๋ด‡๊ณตํ•™์—์„œ ๋‹ค์ง€ ๋กœ๋ด‡ ์†์„ ์‚ฌ์šฉํ•œ ์„ฌ์„ธํ•œ ๊ทธ๋ฆฌํ•‘(dexterous grasping)์€ ๋†’์€ ๊ธฐ๋ฏผ์„ฑ๊ณผ ์ ์‘์„ฑ์„ ์š”๊ตฌํ•˜๋Š” ์–ด๋ ค์šด ๋ฌธ์ œ์ž…๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ณต์žกํ•œ ์กฐ์ž‘ ๊ณผ์ œ์—์„œ๋Š” ๋กœ๋ด‡์ด ์‹œ๊ฐ ์ •๋ณด๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์ฃผ๋ณ€ ํ™˜๊ฒฝ์„ ์ธ์‹ํ•˜๊ณ  ๋ฌผ์ฒด๋ฅผ ์ •ํ™•ํžˆ ํŒŒ์ง€ํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ํŠนํžˆ, ๋น„์ „ ๊ธฐ๋ฐ˜ ์ •์ฑ…(vision-based policy)์„ ํ•™์Šต์‹œํ‚ค๋Š” ๊ฒƒ์€ ํ•„์ˆ˜์ ์ด์ง€๋งŒ, ์ด๋ฏธ์ง€์™€ ๊ฐ™์€ ๊ณ ์ฐจ์› ์ž…๋ ฅ ๊ณต๊ฐ„์—์„œ์˜ ๊ฐ•ํ™”ํ•™์Šต์€ ์ƒ˜ํ”Œ ๋ณต์žก๋„๊ฐ€ ๋งค์šฐ ๋†’์•„ ์˜ค๋ž˜ ์ „๋ถ€ํ„ฐ ๋‚œ์ œ๋กœ ์ธ์‹๋˜์–ด ์™”์Šต๋‹ˆ๋‹ค.

๊ณผ๊ฑฐ์—๋Š” ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋‹จ๊ณ„๋ณ„ ํ•™์Šต ๋˜๋Š” ๊ต์‚ฌ-ํ•™์ƒ(distillation) ์ ‘๊ทผ๋ฒ•์ด ์ฃผ๋กœ ์‚ฌ์šฉ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๋Œ€ํ‘œ์ ์œผ๋กœ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ์—์„œ ์ƒํƒœ ์ •๋ณด(์˜ˆ: ๋ฌผ์ฒด์˜ ์œ„์น˜, ๋กœ๋ด‡ ๊ด€์ ˆ๊ฐ ๋“ฑ ํŠน๊ถŒ ์ •๋ณด)๋ฅผ ํ™œ์šฉํ•ด ๊ฐ•ํ™”ํ•™์Šต์œผ๋กœ ๊ต์‚ฌ ์ •์ฑ…์„ ๋จผ์ € ํ›ˆ๋ จํ•œ ํ›„, ์ด๋ฅผ ๋ชจ๋ฐฉ ํ•™์Šต์œผ๋กœ ๋น„์ „ ๊ธฐ๋ฐ˜ ํ•™์ƒ ์ •์ฑ…์— ์ฆ๋ฅ˜(distill)ํ•˜๋Š” 2๋‹จ๊ณ„ ๋ฐฉ๋ฒ•์ด ๋งŽ์ด ์—ฐ๊ตฌ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด ์ ‘๊ทผ์€ ๊ต์‚ฌ์™€ ํ•™์ƒ์˜ ์ž…๋ ฅ ๊ณต๊ฐ„์„ ๋‹ค๋ฅด๊ฒŒ ์„ค๊ณ„ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์œ ์—ฐ์„ฑ์„ ์ œ๊ณตํ•˜์—ฌ, ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ์—์„œ๋งŒ ์ด์šฉ ๊ฐ€๋Šฅํ•œ ์™„์ „ ์ƒํƒœ ์ •๋ณด๋ฅผ ๊ต์‚ฌ์— ์ฃผ์–ด RL ํ•™์Šต์„ ์šฉ์ดํ•˜๊ฒŒ ํ•˜๊ณ , ์ดํ›„ ํ•™์ƒ์€ ์นด๋ฉ”๋ผ ์ด๋ฏธ์ง€๋งŒ์œผ๋กœ ๋™์ž‘์„ ๋ชจ์‚ฌํ•˜๋„๋ก ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์ด๋ฏธ์ง€ ๊ธฐ๋ฐ˜ RL์˜ ๋‚ฎ์€ ์ƒ˜ํ”Œ ํšจ์œจ ๋ฌธ์ œ๋ฅผ ์ผ๋ถ€ ํšŒํ”ผํ•˜๊ณ , ํ•™์Šต ๊ณผ์ •์„ โ€œํ–‰๋™ ํ•™์Šต(๊ต์‚ฌ ๋‹จ๊ณ„)โ€๊ณผ โ€œํ‘œํ˜„ ํ•™์Šต(ํ•™์ƒ ๋‹จ๊ณ„)โ€์œผ๋กœ ํŒฉํ† ๋ผ์ด์ฆˆํ•˜๋Š” ์ด์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ์‹ค์ œ๋กœ ์ด๋Ÿฌํ•œ ๋ฐฉ๋ฒ•์œผ๋กœ ๋กœ๋ด‡ ํŒ”-์† ์‹œ์Šคํ…œ์˜ ๋‹ค์–‘ํ•œ ์ž‘์—…์ด ์„ฑ๊ณต์ ์œผ๋กœ ํ•™์Šต๋œ ๋ฐ” ์žˆ์Šต๋‹ˆ๋‹ค.

ํ•˜์ง€๋งŒ ๊ธฐ์กด ๋‹จ๊ณ„๋ณ„ ์ ‘๊ทผ๋ฒ•์—๋Š” ๋ณธ์งˆ์ ์ธ ํ•œ๊ณ„๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ฐ€์žฅ ํฐ ๋ฌธ์ œ๋Š” ๊ต์‚ฌ ์ •์ฑ…์ด ์‹œ๊ฐ ์ •๋ณด๋ฅผ ์ง์ ‘ ์‚ฌ์šฉํ•˜์ง€ ์•Š์œผ๋ฏ€๋กœ, ํ•™์ƒ ์ •์ฑ…์ด ์‹œ๊ฐ์ ์œผ๋กœ ์ตœ์ ํ™”๋œ ํ–‰๋™์„ ๋ฐฐ์šฐ์ง€ ๋ชปํ•œ๋‹ค๋Š” ์ ์ž…๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ๊ต์‚ฌ RL ์ •์ฑ…์€ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ๊ฐ€ ์ œ๊ณตํ•˜๋Š” ๋ฌผ์ฒด์˜ ์ง€๋ฉด์ง„์‹ค ์œ„์น˜(ground-truth pose)๋ฅผ ์•Œ๊ณ  ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ๋กœ๋ด‡ ํŒ”์ด ์นด๋ฉ”๋ผ ์‹œ์•ผ๋ฅผ ๊ฐ€๋ฆฌ๊ณ  ์žˆ์–ด๋„ ๋ฐ”๋กœ ๋ฌผ์ฒด๋ฅผ ์ง‘์–ด๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ด๋ฅผ ๊ทธ๋Œ€๋กœ ๋ชจ๋ฐฉํ•˜๋Š” ํ•™์ƒ ์ •์ฑ…์€ ์˜ค์ง ์นด๋ฉ”๋ผ ์˜์ƒ๋งŒ ๋ณด๋ฉด์„œ ํ•™์Šตํ•˜๋ฏ€๋กœ, ํŒ”์ด ๋ฌผ์ฒด๋ฅผ ๊ฐ€๋ฆฐ ์ƒํ™ฉ์—์„œ ํŒ”์„ ์น˜์›Œ ์‹œ์•ผ๋ฅผ ํ™•๋ณดํ•˜๋Š” ํ–‰๋™์„ ๋ฐฐ์šฐ์ง€ ๋ชปํ•œ ์ฑ„ ๋ฌผ์ฒด๋ฅผ ์žก์œผ๋ ค ํ•ด ์‹คํŒจํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด์ฒ˜๋Ÿผ ๊ต์‚ฌ-ํ•™์ƒ ๊ฐ„ ๊ด€์ธก ์ฐจ์ด(observability gap)๋กœ ์ธํ•ด ํ•™์ƒ ์ •์ฑ…์ด ๋ถ€๋ถ„ ๊ด€์ธก ํ•˜์—์„œ ๊ต์‚ฌ์˜ ํ–‰๋™์„ ์ถฉ์‹คํžˆ ์žฌํ˜„ํ•˜์ง€ ๋ชปํ•˜๊ณ  ์ตœ์  ์„ฑ๋Šฅ์—์„œ ๋ฉ€์–ด์ง€๋Š” ๋ฌธ์ œ๊ฐ€ ๋ณด๊ณ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๊ถ๊ทน์ ์œผ๋กœ ์ด๋Ÿฌํ•œ ๊ฐ„๊ทน์€ ์ •์ฑ…์˜ ์•„ํ‚คํ…์ฒ˜๋‚˜ ์ž…๋ ฅ๊ณต๊ฐ„์„ ์•„๋ฌด๋ฆฌ ๊ฐœ์„ ํ•ด๋„, ๊ต์‚ฌ๊ฐ€ ํ•™์Šตํ•œ ํ–‰๋™ ์ „๋žต ์ž์ฒด๊ฐ€ ์‹œ๊ฐ ์„ผ์„œ์— ์ตœ์ ํ™”๋˜์–ด ์žˆ์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์— ๋ฐœ์ƒํ•˜๋Š” ๊ตฌ์กฐ์  ํ•œ๊ณ„์ž…๋‹ˆ๋‹ค.

In short, staged and imitation-based methods are attractive for training stability and ease of use, but active behaviors specific to vision (for example, arm motions that keep the object in view) rarely emerge, and performance has a ceiling. Against this background, end-to-end reinforcement learning (end-to-end RL), in which the robot learns the whole path from camera images to control commands jointly, is regarded as the most desirable approach in principle. Because the policy is trained directly on sensor input, there is no observability gap, and the robot can discover for itself the behavior strategy best suited to its own sensor modality. On the other hand, the approach demands enormous amounts of data and compute, and training is unstable, so it had not been applied to tasks as complex as difficult multi-fingered grasping. Earlier work attempted end-to-end RL only on simple grasping tasks, and, according to the review, no one had succeeded in applying RL directly from pixels to complex manipulation with a multi-jointed robot hand.

This review analyzes the paper "End-to-end RL Improves Dexterous Grasping Policies" (Ritvik Singh et al., 2025) in depth and examines how it uses end-to-end RL to improve grasping policies for a multi-fingered robot hand. With a focus on the algorithmic contributions, it explains what the proposed method improves over prior work, which architectural and technical elements are new, how the training setup and policy network are configured, and which experimental environments and benchmarks were used and what results were obtained. It also discusses the advantages and disadvantages of end-to-end RL relative to traditional staged RL and imitation learning, with the aim of giving practicing robotics engineers insights that are useful when building real systems.

Summary of Key Contributions and Innovations

The authors present the first demonstration of improved sim-to-real performance from end-to-end learning on a multi-fingered robot hand, and highlight three core contributions:

First sim-to-real success for a multi-fingered hand using end-to-end RL: A pixels-to-action policy trained in simulation was transferred to a multi-fingered robot hand and performed well in the real world. According to the paper this had not been done before, making it the first application of end-to-end RL to fine manipulation with a multi-fingered hand. By carrying the result all the way through sim-to-real transfer, the work demonstrates that end-to-end learning is practical.

Simulator infrastructure that makes large-scale vision-based RL possible: To improve the parallel training efficiency of RL with image inputs, the authors propose a new way of using multiple GPUs, which separates the simulator from the learner (disaggregated simulation). On the same hardware, the batch size (number of concurrent environments) more than doubles, so much more data can be used efficiently, and high-resolution vision RL that previously failed to train now produces results.

New state of the art in vision-based grasping: The proposed end-to-end policy substantially raises the real-world grasping success rate over the previous best. Under the same evaluation protocol it clearly outperforms the earlier state-to-vision distillation approach, which shows the benefit of end-to-end learning that closes the observation gap. This is an important result for the future development of vision-based robot manipulation policies.

The sections below look in detail at the techniques and experimental findings behind these contributions.

Training Environment and Policy Architecture

The experiments build on the DextrAH (Dexterous Arm-Hand) environment, which this review attributes to UC Berkeley and NVIDIA. The robot consists of a 7-DoF KUKA iiwa arm and a 16-DoF Allegro V4 hand, and is trained to grasp and lift objects of varied shape and size. Training uses 140 object models from the Visual Dexterity dataset, and the agent repeatedly grasps a randomly selected object from a table and lifts it. Training on such a varied object set encourages the policy to learn a general grasping skill.

To transfer the simulation-trained policy to the real robot, the authors rely heavily on domain randomization. Physics parameters such as gravity, friction coefficients, joint friction/stiffness/damping, and object mass are randomized with Automatic Domain Randomization (ADR): the randomization ranges start narrow and are widened gradually during training. The resulting curriculum begins with relatively easy dynamics and progressively approaches real-world difficulty. ADR was established in prior work by OpenAI and others; it keeps early training stable while increasing the robustness of the final policy in the real world.
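The curriculum idea behind ADR can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: the `AdrParam` class, the success threshold of 0.4, and the step size of 0.1 are all hypothetical choices made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class AdrParam:
    """One randomized physics parameter whose range grows during training."""
    nominal: float          # value used when no randomization is applied
    max_delta: float        # widest allowed deviation from nominal (full ADR)
    progress: float = 0.0   # 0.0 = no randomization, 1.0 = full range

    def sample(self, rng: random.Random) -> float:
        # Draw uniformly from the currently unlocked range around nominal.
        delta = self.max_delta * self.progress
        return rng.uniform(self.nominal - delta, self.nominal + delta)


def update_progress(param: AdrParam, success_rate: float,
                    threshold: float = 0.4, step: float = 0.1) -> None:
    """Widen the range when the policy does well enough (hypothetical rule)."""
    if success_rate >= threshold:
        param.progress = min(1.0, param.progress + step)


rng = random.Random(0)
friction = AdrParam(nominal=1.0, max_delta=0.5)
for success_rate in [0.1, 0.5, 0.6, 0.3, 0.7]:
    update_progress(friction, success_rate)
sample = friction.sample(rng)  # lies within nominal +/- 0.15 at this point
```

The "ADR progress" figures quoted later in the review can be read as the `progress` value here: 100% means the policy has unlocked the full randomization range.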

The policy network is designed to process high-dimensional image input together with the robot's own state. The visual input is a depth map, at a resolution of 160×120 or 320×240 depending on the experiment. The depth image passes through a 4-layer convolutional neural network (CNN) that extracts features. Each convolutional layer is followed by Layer Normalization and a ReLU activation, and the output of the last layer is compressed into a 32-dimensional embedding vector. Proprioceptive information, such as joint angles and velocities, is also an input to the policy; it is concatenated with the visual embedding from the CNN and passed on to the later layers.

The combined vision-and-state features are fed to a 2-layer LSTM (long short-term memory network). The LSTM lets the policy remember and use temporally continuous information, for example object motion and the history of robot actions. The LSTM's hidden-state output is then passed to a 3-layer fully connected network (MLP), which computes the robot's final action output (joint velocity commands and the like). (A critic network of similar structure is presumably designed in a shared or analogous way; the review does not confirm this.) The authors note that, compared with DeepMind's IMPALA agent, the LSTM sits in a different position, and that combining the CNN features with proprioception before the LSTM worked well. In short, the policy network follows the order image → CNN → low-dimensional embedding + state → LSTM → MLP (see the accompanying architecture figure).
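To make the data flow concrete, the following sketch tracks tensor sizes through the pipeline described above. Only the layer counts and the 32-dimensional embedding come from the text; the kernel sizes, strides, proprioception layout, LSTM width, and action dimension are assumptions chosen purely for illustration.

```python
def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded convolution along one axis."""
    return (size - kernel) // stride + 1


# Hypothetical hyperparameters: the text only states "4 conv layers",
# "32-dim embedding", "2-layer LSTM" and "3-layer MLP".
CONV_LAYERS = [(8, 4), (4, 2), (3, 2), (3, 1)]  # (kernel, stride) per layer
EMBED_DIM = 32                # from the text
PROPRIO_DIM = 2 * (7 + 16)    # joint positions + velocities (assumed layout)
LSTM_HIDDEN = 512             # assumed
ACTION_DIM = 7 + 16           # one command per joint (assumed)


def policy_shapes(width: int, height: int) -> dict:
    """Return the size of each intermediate quantity for one depth frame."""
    w, h = width, height
    for kernel, stride in CONV_LAYERS:
        w, h = conv_out(w, kernel, stride), conv_out(h, kernel, stride)
    return {
        "cnn_feature_map": (h, w),                # spatial size after 4 convs
        "visual_embedding": EMBED_DIM,            # compressed image feature
        "lstm_input": EMBED_DIM + PROPRIO_DIM,    # concat(embedding, proprio)
        "lstm_hidden": LSTM_HIDDEN,
        "action": ACTION_DIM,
    }


shapes = policy_shapes(160, 120)
```

The point of the sketch is the ordering: the image is reduced to a small embedding before it meets the proprioceptive state, so the recurrent core only ever sees a vector of a few dozen values per step.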

The reinforcement learning algorithm is Proximal Policy Optimization (PPO). PPO is a widely used actor-critic algorithm whose relatively stable training has made it popular for robot control. For image-based PPO, however, a large batch is essential for the learning signal to be stable enough. The authors therefore run thousands of parallel environments in the simulator at once, collect a large amount of state-action-reward data per iteration, and use it for the PPO update. This allows the policy to obtain meaningful gradients despite the noise introduced by high-dimensional input.
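For readers less familiar with PPO, the clipped surrogate loss at its core is shown below in plain Python. This is the standard textbook form, not code from the paper, and the clip range of 0.2 is a common default rather than a value stated in the review.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (negated objective) over a batch."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Unchanged policy: ratios are 1, so the loss is minus the mean advantage.
loss_same = ppo_clip_loss([-1.0, -2.0], [-1.0, -2.0], [1.0, -1.0])
# Large policy change with positive advantage: the ratio is clipped at 1.2.
loss_clipped = ppo_clip_loss([0.0], [-1.0], [1.0])
```

Because the loss is an average over sampled transitions, its gradient is a noisy estimate; running more environments in parallel increases the number of samples behind each update, which is why the batch size matters so much here.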

To summarize, the training setup can be described as "image (CNN) + state fusion (LSTM) + PPO", a visuomotor policy architecture that considers visual information and the robot's internal state together. The next section looks at how this policy was trained with end-to-end RL and how training efficiency was maximized.

Algorithmic Approach: End-to-End RL Training Strategy

1. End-to-end RL on depth input and the RGB distillation pipeline

As noted above, the biggest obstacle to end-to-end visual RL is its enormous computational cost. Running RL directly on RGB camera images in particular requires realistic lighting and material rendering, which makes simulation very slow. As a practical compromise, the authors propose a two-stage training pipeline: RL on depth images, followed by distillation of the policy to RGB. In the first stage, depth maps, which represent objects and the scene in a simple form, serve as input, so end-to-end RL runs under a comparatively low rendering load. The resulting depth-based teacher policy is then distilled into a student policy that takes stereo RGB camera input, which yields the RGB policy that is finally deployed on the real robot. Put simply, the policy learns with a depth sensor in simulation and operates with a stereo camera in the real world.

The key advantage of this approach is that it greatly narrows the observation gap between teacher and student. Depth images and stereo RGB images are both visual sensor data, and depth can be recovered geometrically from a stereo pair, so the two provide essentially equivalent observations of the scene. In other words, the actions taken by a depth-trained policy can in principle be imitated from a stereo camera, and there is little information difference between the two inputs. This contrasts with the state-teacher versus vision-student problem described earlier. A state teacher exploits information the student cannot see, such as the object's 3D position, so the student cannot follow it completely. A depth teacher is optimized under visual constraints from the start, so the student can reproduce its behavior directly. The paper's results confirm this: distilling a depth teacher into an RGB student gives a much better student than distilling a state teacher into an RGB student (see the experimental results in the next section).

A second advantage is that the depth-based RL teacher is itself an end-to-end vision policy, so it has already internalized behavior that makes sense visually. In the earlier example of the arm blocking the view, a successfully trained depth teacher will have learned on its own to move the arm and grasp while keeping the object in sight. A state teacher never needed to do so and therefore never learns such behavior. As a result, a student trained from the depth teacher actively uses visual feedback. The paper reports that students of the depth teacher are better at active visual behaviors, such as adjusting the wrist angle so the object is not occluded and keeping the object in view for longer while manipulating it. Quantitatively, students of the vision teacher also achieve a higher consecutive success rate than students of the state teacher, which the review attributes to the smaller information asymmetry between teacher and student: the student policy is better matched to its own modality (the camera).

The strategy of "RL on depth, then knowledge transfer to RGB" is therefore a key technical contribution: it keeps the benefits of end-to-end RL while delivering both the efficiency and the performance needed for real-world use. It allowed the authors to obtain a complete RGB pixels-to-action policy with relatively modest hardware, and the experiments show that it largely removes the observation gap and the performance loss that comes with it.
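The argument that a stereo student can reproduce a depth teacher rests on stereo geometry, where depth equals focal length times baseline divided by disparity. The toy below illustrates this with a one-parameter student. Everything in it is hypothetical: the teacher is a fixed function standing in for a trained policy, the constant `FOCAL_TIMES_BASELINE` is made up, and the paper's actual distillation procedure is not described in this review.

```python
import random

FOCAL_TIMES_BASELINE = 0.5  # hypothetical stereo constant f*B


def teacher_action(depth: float) -> float:
    """Stand-in for the depth-based RL teacher: a fixed function of depth."""
    return 2.0 * depth


def disparity_from_depth(depth: float) -> float:
    """Stereo geometry: disparity = f*B / depth, so depth is recoverable."""
    return FOCAL_TIMES_BASELINE / depth


def distill(steps: int = 2000, lr: float = 0.05, seed: int = 0) -> float:
    """Fit student a_hat = w / disparity to teacher labels with SGD on MSE."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        depth = rng.uniform(0.3, 1.5)                # sampled scene depth
        feature = 1.0 / disparity_from_depth(depth)  # student-side input
        error = w * feature - teacher_action(depth)
        w -= lr * 2.0 * error * feature              # gradient of squared error
    return w


student_weight = distill()  # converges to 1.0: the student matches the teacher
```

The student never sees depth directly, yet it can match the teacher exactly because its input carries the same information. A state-based teacher that acts on quantities the camera cannot observe offers no such guarantee.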

2. Large-scale parallelization by splitting simulator and learner

Another challenge of end-to-end vision RL is that algorithms such as PPO do not converge properly without a sufficiently large batch (number of parallel environments). Image-based policies are noisy and have a low signal-to-noise ratio, so the learning signal has to be averaged over a very large number of episodes. This calls for parallelization on the order of thousands of simultaneous environments, but with GPU-accelerated simulators memory capacity becomes the bottleneck and the environment count cannot be raised freely.

In the conventional setup, training on 4 GPUs means each GPU runs an identical simulator instance containing many parallel environments. Each GPU performs rendering and physics for its own environments while also holding the RL experience memory and the policy network locally. Gradients from all GPUs are then gathered and averaged; this data-parallel scheme has been the norm. It wastes a great deal of memory, however. All four GPUs must hold the same simulation resources (for example the 3D models of objects and robot, physics engine state, and other asset caches) in their own memory. In addition, the RL experience buffer on each GPU grows with the number of environments, and with image observations this buffer quickly reaches several GB and squeezes GPU memory. The result is that GPU memory fills up before enough environments can be run to satisfy PPO.
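A back-of-the-envelope calculation shows how fast an image rollout buffer grows. The horizon of 16 steps and 4-byte (float32) pixels are illustrative assumptions, not values from the paper; only the resolutions and the figure of 1024 environments per GPU come from the text.

```python
def obs_buffer_gib(num_envs: int, horizon: int, width: int, height: int,
                   bytes_per_pixel: int = 4) -> float:
    """Size in GiB of a buffer holding one depth image per env per step."""
    total_bytes = num_envs * horizon * width * height * bytes_per_pixel
    return total_bytes / 2**30


# Horizon and pixel format are assumptions made for this example.
low_res = obs_buffer_gib(num_envs=1024, horizon=16, width=160, height=120)
high_res = obs_buffer_gib(num_envs=1024, horizon=16, width=320, height=240)
```

Under these assumptions the observations alone take about 1.17 GiB at 160×120 and about 4.69 GiB at 320×240, before counting simulator assets, network weights, or optimizer state. Doubling the resolution in each dimension quadruples the buffer.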

To solve this problem the paper proposes a new multi-GPU parallelization strategy called "Disaggregated Simulation and RL". The core idea is to dedicate some GPUs to simulation and others to learning, so that each uses its resources in the way best suited to its role. On a node with 4 GPUs, for example, 3 GPUs only simulate, and the remaining GPU handles only PPO training and experience storage. The three simulator GPUs then behave like one large combined simulator and can run far more environments at once than before. This works because the simulator GPUs no longer carry the learning workload (policy gradient computation, experience buffer storage), so all the freed memory can go toward more environments. Conversely, the learner GPU carries no environment simulation at all, so it can keep the policy network and a large buffer resident in memory while it receives state from the simulator GPUs and performs the PPO updates. Communication between GPUs is parallel and asynchronous, and the paper presents the communication and synchronization procedure in detail with pseudocode and a diagram.
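The division of roles can be mimicked with threads and a queue. This is only a structural sketch under simplifying assumptions: threads stand in for GPUs, tuples stand in for transitions, and sending updated policy weights back to the simulators is omitted. The names `sim_worker` and `learner` are hypothetical and do not come from the paper.

```python
import queue
import threading

NUM_SIM_WORKERS = 3      # stands in for the three simulation GPUs
ENVS_PER_WORKER = 4      # tiny number so the sketch runs instantly
NUM_ITERATIONS = 5


def sim_worker(worker_id: int, out_q: queue.Queue) -> None:
    """Simulation-only role: step environments and ship transitions out."""
    for step in range(NUM_ITERATIONS):
        batch = [(worker_id, env, step) for env in range(ENVS_PER_WORKER)]
        out_q.put(batch)
    out_q.put(None)  # sentinel: this worker is finished


def learner(in_q: queue.Queue) -> int:
    """Learner-only role: own the experience buffer and run the updates."""
    buffer, finished = [], 0
    while finished < NUM_SIM_WORKERS:
        batch = in_q.get()
        if batch is None:
            finished += 1
            continue
        buffer.extend(batch)  # a real learner would run PPO on this data
    return len(buffer)


transitions = queue.Queue()
workers = [threading.Thread(target=sim_worker, args=(i, transitions))
           for i in range(NUM_SIM_WORKERS)]
for w in workers:
    w.start()
total_collected = learner(transitions)  # 3 workers * 5 steps * 4 envs
for w in workers:
    w.join()
```

Note that only the learner holds the buffer. In the data-parallel layout every worker would keep its own copy of both the simulator state and the training state, which is the duplication the disaggregated layout removes.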

The effect of disaggregated simulation is striking. On the same 4-GPU machine, the number of concurrent environments more than doubles relative to the traditional data-parallel scheme. With 160×120 depth input, the data-parallel setup could run 1024 environments per GPU (4096 in total); the disaggregated setup runs 2800 per simulator GPU, for a total of 8400. At 320×240, the count rises from 256 per GPU (1024 in total) to 700 per simulator GPU (2100 in total), again slightly more than double. The table below summarizes these results and shows that a purely structural change roughly doubles parallel simulation capacity on identical hardware:

Table 1. Number of concurrent simulation environments by input resolution (on a node with 4 NVIDIA GPUs). In the data-parallel case every GPU runs both simulator and learner; in the disaggregated case 3 GPUs are used only as simulators and 1 GPU only as learner. The disaggregated scheme roughly doubles the number of environments on the same resources.
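The totals quoted above follow directly from the per-GPU counts. The short script below reproduces them; the per-GPU numbers are taken from the text, and the variable names are illustrative.

```python
SIM_GPUS_DISAGGREGATED = 3   # 3 simulators + 1 learner on a 4-GPU node
GPUS_DATA_PARALLEL = 4       # every GPU simulates and learns

# resolution -> (envs per GPU, data parallel; envs per sim GPU, disaggregated)
per_gpu = {"160x120": (1024, 2800), "320x240": (256, 700)}

totals = {}
for res, (dp, dis) in per_gpu.items():
    dp_total = dp * GPUS_DATA_PARALLEL
    dis_total = dis * SIM_GPUS_DISAGGREGATED
    # Store both totals and the capacity ratio between the two layouts.
    totals[res] = (dp_total, dis_total, dis_total / dp_total)
```

Both resolutions give the same ratio of about 2.05, even though one of the four GPUs no longer simulates at all.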

The increase in environment count is more than a numerical improvement: it makes high-resolution vision RL trainable where it previously was not. According to the paper, at 320×240 the policy did not learn at all with conventional data parallelism, whereas the disaggregated setup produced a policy with a meaningful level of performance. Specifically, under the conventional setup the memory limit shrinks the PPO batch, training is unstable, and the success rate stays at 0%; the proposed setup trains with a much larger batch and reaches a final success rate of about 35%. At 160×120, the disaggregated setup reached the maximum domain randomization difficulty (Full ADR) on every seed with a success rate of about 42%, while the conventional setup reached Full ADR on only 20% of seeds and stopped at a success rate of 37%. Under identical conditions, then, the disaggregated setup learns good policies more reliably. The authors emphasize that performance on complex end-to-end tasks is strongly limited by the number of environments that can be simulated, and that extending this limit, as their technique does, translates directly into better policies. Disaggregated simulation thus relieves the compute bottleneck that had blocked practical end-to-end RL, and it is one of the paper's core contributions.

Experimental Results: Simulation Gains and Real-Robot Validation

To validate the proposed approach, the paper measures several metrics both in simulation and in real-robot experiments. The main metric is the grasping success rate; in simulation the authors additionally track training progress through measures such as domain randomization progress (the degree of ADR attained).

Performance comparison in simulation

Looking first at how well the end-to-end RL policy trains in the simulator, there is a marked difference between the data-parallel and disaggregated setups. As mentioned above, with high-resolution input (320×240) no seed learned the final task under conventional data parallelism (ADR progress 0%, success rate 0%). Under the disaggregated setup, about 1 in 5 seeds reached the highest difficulty (20% of seeds), and on average the policies advanced to 90% of the ADR range. Not every run reached full difficulty, but training did progress, and a grasp success rate of around 35% shows that end-to-end vision RL is feasible at this resolution.

At the lower resolution (160×120) both setups learned, but the disaggregated setup was both better and more stable. With data parallelism only some seeds reached the final difficulty (Full ADR), for an attainment rate of 20%, whereas with the disaggregated setup every seed (100%) reached Full ADR and the success rate was also higher (42% vs 37%). In other words, the disaggregated setup reduces the variance of training outcomes and helps the policy work through the whole difficulty curriculum consistently. The review attributes this to the more stable PPO learning signal that comes with a larger batch.

The simulator comparison of student performance by teacher type is also instructive. The authors trained a state-based teacher and a depth-based teacher, distilled each into the same stereo RGB student architecture, and evaluated the final policies. In this experiment the success rate is defined as the fraction of environments in which the object is being held in the air at an arbitrary moment (an environment is reset once the object has been held aloft for 2 seconds), so a higher value means the policy grasps objects quickly and skillfully. The result is clear: the student of the depth teacher records a higher success fraction than the student of the state teacher over the entire run. The gap between the two curves opens in the early-to-middle part of training and persists to the end, which the paper interprets as the effect of reduced information asymmetry between teacher and student. Because the vision teacher uses the same type of input as the student, the behavior the student has to learn is reasonable within the limits of its own sensor, and the student acquires it far more effectively. For example, the student of the depth teacher learned to adjust its posture so that the arm and fingers do not occlude the object, while the student of the state teacher did not learn such a strategy. The simulation experiments thus support the claim that a teacher obtained through end-to-end vision RL can pass visually optimized behavior on to its student.
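The success metric used in this comparison is a snapshot statistic over parallel environments, which can be computed as sketched below. The helper name `holding_fraction`, the lift margin of 0.1, and the sample heights are assumptions for illustration; the review does not state how "in the air" is thresholded.

```python
def holding_fraction(object_heights, table_height=0.0, lift_margin=0.1):
    """Fraction of environments whose object is currently lifted."""
    lifted = [h > table_height + lift_margin for h in object_heights]
    return sum(lifted) / len(lifted)


# One snapshot of object heights above the table across 8 environments.
heights = [0.02, 0.35, 0.0, 0.28, 0.15, 0.01, 0.4, 0.03]
fraction = holding_fraction(heights)
```

Because environments reset shortly after a successful lift, this fraction cannot approach 100% even for a perfect policy: at any instant many environments are still in the reaching phase. That is worth keeping in mind when reading simulated values around 40%.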

Validation and performance on the real robot

The policies obtained in simulation were transferred to the real robot platform and validated there. The evaluation uses the "bin packing" grasping benchmark that the authors established in their earlier work. The benchmark tests sustained grasping ability across varied objects: 30 different objects are placed on the table in random arrangement, and the robot repeatedly picks them up one at a time and places them in an adjacent bin. The policy is set up to grasp one object at a time. Once an object has been grasped and lifted high enough, the robot drops it into the bin with a pre-programmed motion, returns to its initial pose, and proceeds to the next object. (This sequence is automated by a state machine without intervention from a human operator, and the policy network has an auxiliary head that estimates the object's current height, which is used to decide whether the object has been lifted.) The evaluation metric is the fraction of objects that end up in the bin, that is, the success rate (%) is defined by how many of the 30 objects were successfully picked up and moved.
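The evaluation loop described above can be pictured as a three-phase state machine. The sketch below is an assumption-laden illustration: the phase names, the lift threshold of 0.2 m, and the transition rules are hypothetical, and only the overall grasp, scripted drop, and reset cycle comes from the text.

```python
from enum import Enum, auto


class Phase(Enum):
    GRASP = auto()    # learned policy is in control
    DROP = auto()     # scripted motion carries the object to the bin
    RESET = auto()    # arm returns to its initial pose


LIFT_THRESHOLD_M = 0.2  # hypothetical height that counts as "lifted"


def next_phase(phase: Phase, predicted_height: float) -> Phase:
    """Advance the cycle; only GRASP depends on the height estimate."""
    if phase is Phase.GRASP:
        if predicted_height >= LIFT_THRESHOLD_M:
            return Phase.DROP
        return Phase.GRASP
    if phase is Phase.DROP:
        return Phase.RESET
    return Phase.GRASP


phase, trace = Phase.GRASP, []
for height in [0.0, 0.05, 0.25, 0.25, 0.0]:
    phase = next_phase(phase, height)
    trace.append(phase.name)
# trace == ["GRASP", "GRASP", "DROP", "RESET", "GRASP"]
```

The design keeps the learned policy responsible only for grasping and lifting, while the auxiliary height estimate is the single signal that hands control to the scripted part of the cycle.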

The comparison covers the existing DextrAH family of methods and the policies from this work. DextrAH-G, reported in 2024, combines a Geometric Fabrics based controller with RL and is a policy trained on state information. DextrAH-RGB, a follow-up reported in 2025, distills the previously trained state-based policy into an RGB vision model. In short, DextrAH-RGB is the latest result of the "state teacher → vision student" approach that this paper sets out to improve.

In the experiments, the paper's end-to-end RL policies achieve higher success rates than the earlier methods and set a new state of the art. DextrAH-G reached a success rate of about 87% (it partly uses state information, so the review treats it as something like an upper bound), while DextrAH-RGB dropped to 77%. This is an instance of the state-to-vision gap described earlier reducing real-world performance. By contrast, the RGB policy obtained through a depth teacher with the proposed method reaches 87%, well above the previous best for a vision policy (77%) and on par with the state-based policy. With disaggregated simulation added, so that the teacher is trained on a much larger volume of experience, the success rate rises to 93%. A 93% success rate corresponds to roughly 28 of the 30 objects being moved successfully, a very high level of performance that the review describes as approaching human level. It is the best result reported on this robot and environment and shows the effect of the end-to-end RL approach very clearly.

Looking at how consistent the improvement is, the depth-teacher policies are reported to hold a success rate of at least 80% in every experimental repetition, whereas the state-teacher policy (DextrAH-RGB) varied more and failed in some scenarios. Above all, the 93% of the depth teacher plus disaggregated training is a gain of 16 percentage points over the previous vision policy, obtained by changing only the training method, which makes it especially notable for practicing engineers. The result appears to come from two benefits working together: closing the teacher-student observation gap and training at much larger parallel scale. The paper likewise concludes that policies trained from a depth teacher overcome the performance limits of policies trained from a state teacher, and that the policy trained with the larger batch size is the best of all. In short, the work shows that learning vision-aware behavior through end-to-end RL, combined with better training infrastructure, improves performance on the real robot as well.
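The headline numbers are easy to check by hand. The snippet below converts each reported success rate into an expected object count out of 30 and computes the gain in percentage points; the rates are those quoted in the text, and the dictionary labels are informal.

```python
NUM_OBJECTS = 30

# Real-robot bin-packing success rates as quoted in the text.
rates = {
    "DextrAH-G": 0.87,
    "DextrAH-RGB": 0.77,
    "depth teacher": 0.87,
    "depth teacher + disaggregated": 0.93,
}

# Expected number of objects moved successfully out of 30.
expected_objects = {name: rate * NUM_OBJECTS for name, rate in rates.items()}

# Gain of the best policy over the previous vision policy, in points.
gain_pp = round(
    (rates["depth teacher + disaggregated"] - rates["DextrAH-RGB"]) * 100
)
```

This gives about 27.9 objects for the best policy against about 23.1 for DextrAH-RGB, which is where the "roughly 28 of 30" and "16 percentage points" figures come from.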

End-to-End RL vs Staged/Imitation Learning: Pros and Cons

Based on the discussion so far, comparing end-to-end reinforcement learning with traditional staged and imitation-based approaches from a practical standpoint brings out the following contrasts.

● Training pipeline and implementation complexity: The staged approach is split into teacher RL and student imitation learning, which makes it somewhat more complex to implement. The two stages use different algorithms and loss functions, and data collection and training must be kept consistent between teacher and student. After the teacher is trained, there is additional work such as collecting a dataset with large-scale rollouts to train the student. End-to-end RL, by contrast, consists of a single training loop and is structurally simple. One run of PPO (or another RL algorithm) yields the behavior policy directly, and the overall pipeline is short and intuitive. This paper does introduce measures to make end-to-end RL more efficient (depth input, multi-GPU disaggregation) and still distills the depth policy into an RGB student for deployment, but these serve to accelerate and stabilize training; the behavior itself is learned in a single RL stage.

● Data efficiency and required resources: Imitation-based methods have the advantage of minimizing the part that has to be learned with RL. Teacher RL runs in state space and is comparatively efficient, and the student is trained with supervised learning, which avoids RL's high sample complexity. A policy can therefore be obtained from less experience and with less simulation time. End-to-end RL has to explore an enormous observation space that extends down to individual pixels, so it needs far more data and time. For this paper the review estimates that 4 recent GPUs were used for several days and that tens of millions of simulation steps or more were run, and without the proposed disaggregation technique training was difficult in the first place. In industrial settings with tight resource constraints it may therefore be hard to adopt end-to-end RL right away, and the teacher-student method can remain the practical alternative.

● Behavior optimization and performance ceiling: The greatest strength of end-to-end RL is that the policy is optimized for its own sensor input from the outset. As the paper's results show, a policy trained directly on visual information discovers behaviors such as keeping the object in view on its own and, as a consequence, achieves a higher grasping success rate under the same conditions. This can be read as raising the performance ceiling beyond the structural limit of the staged approach, namely teacher behavior that the student cannot fully follow. In real environments, which are dynamic and highly uncertain, this sensorimotor alignment can make the policy more robust. With the staged approach, even a high-performing teacher does not help if the student cannot keep up, and overall performance suffers. Any limitations in the human-designed teacher reward or strategy also constrain what is learned. In summary, end-to-end RL has the higher final-performance potential, whereas staged approaches may be limited in how much of that potential they can realize.

● Safety and ease of debugging: Imitation-based methods have an explicit teacher policy in the middle, which makes the policy's intent and behavior somewhat easier to interpret and control. If the teacher stage fails, the cause can be found and fixed there; if the student stage fails, it can be corrected through data augmentation or changes to the network architecture. It is also relatively easy to build in safeguards against dangerous motions, by deriving the teacher from human expert demonstrations or by designing it conservatively. End-to-end RL is closer to a black box, and it is hard to control a policy that behaves erratically or attempts something dangerous during training. Applying RL directly on a real robot carries risks such as collisions, so good use of a simulator is essential. (In this work the policy is trained thoroughly in simulation with domain randomization before being brought to the real robot, so safety was not a major issue, but the heavy dependence of end-to-end RL on simulators should be kept in mind in general.)

● Other aspects: Imitation learning and staged RL make it easy to inject initial demonstration data or expert knowledge, whereas end-to-end RL relies entirely on autonomous learning. The former is easier to steer in the direction humans want, but it is less autonomous and can be biased toward human demonstrations. The latter can find good solutions without such data bias, but reward design and exploration are difficult. The two approaches therefore trade off against each other depending on task and environment. By demonstrating that end-to-end RL works for complex multi-fingered grasping, this paper shows that the approach can deliver excellent results when sufficient resources are available.

Conclusion and Implications

The paper "End-to-end RL Improves Dexterous Grasping Policies" shows that end-to-end reinforcement learning can be applied to complex manipulation with a multi-fingered robot hand, and that it can outperform existing methods. The authors train a depth-camera-based pixels-to-action policy in large-scale parallel simulation, distill it into a stereo RGB policy, and transfer that policy successfully to a real robot, which pushes vision-based robot grasping forward. The key infrastructure optimization is separating the simulator from the learner, which makes it possible to run thousands of environments on limited GPU resources. That in turn makes high-resolution training feasible where it previously failed, and the result is a virtuous cycle that ends in better real-world performance.
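The simulator-learner separation described above can be pictured as a producer-consumer arrangement. The following is only a toy sketch, not the authors' implementation: Python threads stand in for separate GPUs, a bounded queue stands in for the transfer channel, and every name (`simulator_worker`, `learner`, `run_disaggregated`) is hypothetical.

```python
import queue
import threading


def simulator_worker(worker_id, num_steps, rollout_queue):
    """Hypothetical simulator side: only produces transitions, never trains."""
    for step in range(num_steps):
        # Stand-in for a rendered depth frame from one environment step.
        observation = [float(worker_id), float(step)]
        rollout_queue.put((worker_id, step, observation))
    rollout_queue.put(None)  # sentinel: this worker has finished


def learner(num_workers, rollout_queue, batch_size):
    """Hypothetical learner side: only consumes transitions and forms batches."""
    batches, current, finished = [], [], 0
    while finished < num_workers:
        item = rollout_queue.get()
        if item is None:
            finished += 1
            continue
        current.append(item)
        if len(current) == batch_size:
            batches.append(current)  # a real learner would run a gradient step here
            current = []
    if current:
        batches.append(current)  # keep the final partial batch, if any
    return batches


def run_disaggregated(num_workers=4, num_steps=8, batch_size=16):
    """Run simulators and learner concurrently, coupled only by the queue."""
    rollout_queue = queue.Queue(maxsize=64)  # bounded, so producers cannot run far ahead
    workers = [
        threading.Thread(target=simulator_worker, args=(i, num_steps, rollout_queue))
        for i in range(num_workers)
    ]
    for worker in workers:
        worker.start()
    batches = learner(num_workers, rollout_queue, batch_size)
    for worker in workers:
        worker.join()
    return batches


if __name__ == "__main__":
    result = run_disaggregated()
    print(len(result), "batches collected")
```

The point of the sketch is the decoupling: neither side needs the other's memory or compute budget, so each can be sized independently. That is the property the paper exploits to fit more environments on the same hardware.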

This work carries several practical lessons for engineers. First, an end-to-end approach is not necessarily inefficient. With the right system optimization and a suitable intermediate representation (depth, in this case), even a complex robot task can be learned end to end. When designing future robot learning systems, it is worth considering unified training with minimal human intervention rather than splitting everything into stages by default. Second, simulation infrastructure matters. The algorithm is important, but engineering that maximizes the number of parallel environments or speeds up simulation, as in this paper, can be decisive. In practice, this kind of system optimization often yields large gains before any algorithmic change does, so it pays to look at the training stack as a whole. Third, the gap between sensor modalities should be kept small. The study shows that when the sensor information available differs between simulation and reality, or between teacher and student, that gap can degrade performance. For robot learning aimed at real deployment, it is therefore better to train under conditions as close as possible to the real sensors, or to add a step that compensates for the gap, as the depth → RGB distillation does in this paper.
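To make the depth → RGB distillation idea concrete, here is a minimal sketch under strong simplifying assumptions: observations are single scalars, the "teacher" is a fixed linear function of depth, and the "student" is a linear model trained by SGD to match the teacher's actions from a noisy, depth-correlated "RGB" feature. None of this reflects the paper's actual networks or losses. The function names and constants are hypothetical.

```python
import random


def teacher_action(depth):
    """Hypothetical depth-based teacher: a fixed linear map from depth to action."""
    return 2.0 * depth - 0.5


def render_rgb_feature(depth, rng):
    """Toy stand-in for the student's view of the same scene (depth-correlated plus noise)."""
    return 1.0 - depth + rng.gauss(0.0, 0.01)


def distill(num_steps=2000, lr=0.1, seed=0):
    """Train a linear student (w * x + b) to imitate the teacher's actions."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    losses = []
    for _ in range(num_steps):
        depth = rng.uniform(0.0, 1.0)      # the same scene is seen by both policies
        target = teacher_action(depth)     # teacher sees depth
        x = render_rgb_feature(depth, rng) # student sees only the RGB-like feature
        error = (w * x + b) - target
        losses.append(error * error)       # squared action-matching loss
        w -= lr * 2.0 * error * x          # SGD step on the squared error
        b -= lr * 2.0 * error
    return w, b, losses


if __name__ == "__main__":
    w, b, losses = distill()
    print("student weights:", w, b)
    print("mean loss, first 100 steps:", sum(losses[:100]) / 100)
    print("mean loss, last 100 steps:", sum(losses[-100:]) / 100)
```

In this toy setup the student can recover the teacher's behavior because its input still carries the depth information in a different form. That is the condition the third lesson above is about: distillation works when the modality gap is bridgeable.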

Finally, this study adds valuable evidence to the debate between end-to-end and two-stage approaches in robot learning. It shows that end-to-end RL is a hard path that pays off when implemented well, and it spells out what that implementation requires: massive parallelization, a suitable intermediate sensor modality, and so on. This gives robotics researchers a hint about which direction to take toward robots that, like humans, learn to act from camera input alone. In short, end-to-end RL is a promising way to improve both the performance and the behavioral quality of multi-fingered grasping policies, and the paper's technical contributions are likely to carry over to other manipulation domains and help build smarter, more proactive robots.

Copyright 2026, JungYeon Lee