Curieux.JY
  • JungYeon Lee

📃 CompDiffuser Review

rl
benchmark
tactile
Generative Trajectory Stitching through Diffusion Composition
Published

November 29, 2025

🔍 Ping. 🔔 Ring. ⛏️ Dig. A tiered review series: quick look, key ideas, deep dive.

  • Paper Link
  • Code
  • Project
  1. 🤖 This paper proposes CompDiffuser, a generative trajectory stitching method that solves new tasks by combining short trajectory segments, in order to overcome the limits of existing diffusion models in long-horizon robot planning.
  2. 💡 CompDiffuser splits the trajectory distribution into overlapping chunks and learns their conditional relationships with a single bidirectional diffusion model. Information therefore propagates between segments during generation, which keeps the connections physically consistent.
  3. 🚀 On benchmark tasks of varying difficulty, CompDiffuser outperforms existing methods. Using only short training trajectories, it solves long-horizon planning tasks while keeping trajectories feasible and goal-reaching.

🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

Effective trajectory stitching for long-horizon planning is an important problem in robot decision making. Diffusion models have shown promise for planning, but they can only solve tasks similar to those seen in the training data. This paper proposes CompDiffuser, a new generative approach that solves new tasks by compositionally stitching together previously learned short trajectory chunks.

The core idea is to subdivide the trajectory distribution into overlapping chunks and to model it by learning the conditional relationships among those chunks with a single bidirectional diffusion model. Information then propagates between segments during generation, which ensures physically consistent connections. The paper runs experiments on benchmark tasks of varying difficulty, covering environment size, agent state dimension, trajectory type, and training data quality, and shows that CompDiffuser substantially outperforms existing methods.

1. Introduction and Related Work

Existing robot planning methods based on generative models (e.g., Diffuser, Decision Diffuser) model the joint distribution over the entire plan sequence. This amortizes computation cost, but it requires long-horizon plan data covering every possible start-goal combination, so sample efficiency is very low. Trajectory stitching connects high-reward trajectory pieces to produce a new policy, which enables compositional generalization. The key difficulty is finding suitable stitching points that join trajectories while preserving dynamic consistency and feasibility. CompDiffuser makes it possible to generate physically feasible, goal-directed plans without long-horizon training data.

As for related work, diffusion models have been applied to motion planning, task planning, autonomous driving, and other areas, but they are mostly limited to planning horizons similar to the training data. For trajectory stitching, data augmentation, model-based approaches, and sequence modeling have all been explored; CompDiffuser instead performs goal-conditioned trajectory stitching directly through generative modeling, using only short trajectory segments. Compositional generative models have been studied for visual content, human motion generation, and more, but most focus on joint sampling under multiple conditions or depend on a predefined skeleton. CompDiffuser extends this line of work to much longer sequences and new tasks without those constraints.

2. Planning through Compositional Trajectory Generation

2.1. Compositional Trajectory Modeling

Given a start state q_s and a goal state q_g, the planning problem is defined as sampling a trajectory \tau = [s_{1:T}, a_{1:T}] from the distribution p_\theta(\tau|q_s, q_g). Existing methods learn p(\tau) directly, so they can only generate plans whose start and goal states resemble those in the training data. CompDiffuser instead models the trajectory compositionally by subdividing \tau into K overlapping sub-chunks \tau_k. The trajectory distribution is written as:

p_\theta(\tau|q_s, q_g) \propto p_1(\tau_1|q_s, \tau_2) \prod_{k=2}^{K-1} p_k(\tau_k|\tau_{k-1}, \tau_{k+1}) p_K(\tau_K|\tau_{K-1}, q_g)

Here each trajectory chunk \tau_k depends only on its adjacent chunks \tau_{k-1} and \tau_{k+1}. As long as each intermediate chunk \tau_k has been seen during training, the model can generate plans that differ considerably from any full trajectory it has seen.

2.2. Training Compositional Trajectory Models

Learning each conditional distribution separately in order to represent the composed distribution makes sampling slow and makes it hard to compose a consistent plan. CompDiffuser addresses this by exploiting the gradual denoising process of diffusion models. The key is to let trajectory segments guide one another's generation during diffusion: as one segment takes shape through denoising, it helps steer its neighbors into compatible configurations.

To this end, the method uses a diffusion model that generates a trajectory chunk conditioned on noisy samples of its neighboring chunks. Given a trajectory \tau from the training dataset \mathcal{D}, the denoising network \epsilon_\theta is trained to learn the distribution p_\theta(\tau_k|\tau_{k-1}, \tau_{k+1}) with the following objective:

\mathcal{L}_{nbr} = \mathbb{E}_{\tau \in \mathcal{D}, t, k} \left[ \left\| \epsilon - \epsilon_\theta(\tau_k^t, t | \tau_{k-1}^t, \tau_{k+1}^t) \right\|^2 \right]

Here k identifies the trajectory segment, t is the noise level, and \tau_k^t denotes segment k corrupted at noise level t. When denoising a segment, the network is conditioned on the noisy versions \tau_{k-1}^t, \tau_{k+1}^t of its neighbors at the same noise level. Each segment thereby influences the denoising of its neighbors, so that the final composition is dynamically compatible. In practice, the conditioning uses only the small overlapping region between consecutive trajectories, which improves efficiency.

The same denoising network \epsilon_\theta is also trained to represent the distributions p_\theta(\tau_1|q_s, \tau_2) and p_\theta(\tau_K|\tau_{K-1}, q_g). This corresponds to the following objective:

\mathcal{L}_{start} = \mathbb{E}_{\tau \in \mathcal{D}, t, k} \left[ \left\| \epsilon - \epsilon_\theta(\tau_1^t, t | q_s, \tau_2^t) \right\|^2 \right]

Conditioning on the goal state q_g is handled analogously.
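To make the shape of one \mathcal{L}_{nbr} training example concrete, the sketch below noises an interior chunk and its two neighbors at one shared noise level t. It is only an illustration under stated assumptions: states are scalars, `alpha_bars` is a precomputed list of cumulative noise-schedule products, and the names `noise_chunk`, `make_neighbor_example`, and `squared_error` are hypothetical rather than taken from the paper or its code.

```python
import math
import random


def noise_chunk(chunk, alpha_bar, rng):
    """Corrupt a chunk (list of floats) at one noise level; return (noisy, eps)."""
    eps = [rng.gauss(0.0, 1.0) for _ in chunk]
    scale, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    noisy = [scale * x + sigma * e for x, e in zip(chunk, eps)]
    return noisy, eps


def make_neighbor_example(chunks, alpha_bars, rng):
    """Pick an interior chunk k and a level t, then noise k and both neighbors at t."""
    k = rng.randrange(1, len(chunks) - 1)  # interior chunks only (0-based index)
    t = rng.randrange(len(alpha_bars))
    a = alpha_bars[t]
    noisy_k, eps = noise_chunk(chunks[k], a, rng)
    left, _ = noise_chunk(chunks[k - 1], a, rng)
    right, _ = noise_chunk(chunks[k + 1], a, rng)
    return {"k": k, "t": t, "noisy": noisy_k, "left": left, "right": right, "target": eps}


def squared_error(pred, target):
    """Squared L2 distance between predicted and true noise."""
    return sum((p - q) ** 2 for p, q in zip(pred, target))


rng = random.Random(0)
chunks = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
example = make_neighbor_example(chunks, alpha_bars=[0.9, 0.5, 0.1], rng=rng)
```

A network would receive `noisy`, `t`, `left`, and `right`, and its output would be compared against `target` with `squared_error`. The start and goal variants would replace one neighbor with q_s or q_g.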

2.3. Compositional Trajectory Planning

The proposed framework allows flexible sampling strategies for generating long-horizon plans. The basic sampling procedure initializes every trajectory chunk \tau_k with Gaussian noise and then iteratively denoises each chunk conditioned on its neighbors. Two sampling schemes are presented:

  1. Parallel Sampling: At each denoising step, every chunk is denoised conditioned on the noisy values of its adjacent chunks from the previous step. The update rule is: \tau_k^{t-1} = \alpha_t(\tau_k^t - \epsilon_\theta(\tau_k^t|\tau_{k-1}^t, \tau_{k+1}^t) + \beta_t \xi), \quad \xi \sim \mathcal{N}(0, 1) All chunks can be denoised in parallel, but information propagates only a limited distance per step.
  2. Autoregressive Sampling: To connect adjacent chunks more tightly, the chunks are denoised autoregressively within each denoising step. Starting from \tau_1 and proceeding sequentially to \tau_K, the denoising of \tau_k is conditioned on the previous chunk already decoded at the current noise level, \tau_{k-1}^{t-1}, and on the next chunk at the previous noise level, \tau_{k+1}^t: \tau_k^{t-1} = \alpha_t(\tau_k^t - \epsilon_\theta(\tau_k^t|\tau_{k-1}^{t-1}, \tau_{k+1}^t) + \beta_t \xi), \quad \xi \sim \mathcal{N}(0, 1) Because each chunk is conditioned on a less noisy version of its predecessor, this sequential process coordinates the chunks more strongly. It is less computationally efficient than parallel sampling, but it performed better in the experiments.

Finally, the generated chunks \tau_{1:K} are merged into a single trajectory \tau_{comp} by applying exponential trajectory blending over the overlapping regions.
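The difference between the two schemes comes down to which noise level of each neighbor a chunk sees during one step from level t to t-1. The sketch below tabulates that conditioning schedule without any network. The function name `conditioning_schedule` and the use of `None` for the start and goal boundaries are illustrative assumptions.

```python
def conditioning_schedule(num_chunks, t, mode):
    """Return (k, left_level, right_level) for one denoising step from level t to t-1.

    None marks a boundary, where the start or goal state is the condition instead
    of a neighboring chunk.
    """
    if mode not in ("parallel", "autoregressive"):
        raise ValueError("mode must be 'parallel' or 'autoregressive'")
    plan = []
    for k in range(1, num_chunks + 1):
        if k == 1:
            left = None                 # conditioned on the start state q_s
        elif mode == "autoregressive":
            left = t - 1                # left neighbor was already updated in this sweep
        else:
            left = t                    # parallel: everyone reads the previous step
        right = t if k < num_chunks else None   # last chunk is conditioned on q_g
        plan.append((k, left, right))
    return plan


parallel = conditioning_schedule(3, t=5, mode="parallel")
autoregressive = conditioning_schedule(3, t=5, mode="autoregressive")
```

In the autoregressive schedule only the left neighbor is fresher, which is why the sweep has to run sequentially from \tau_1 to \tau_K.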

3. Experiments

CompDiffuser is evaluated on benchmark tasks of varying difficulty, including PointMaze, AntMaze, HumanoidMaze, and AntSoccer. The experiments cover different environment sizes, agent state dimensions, trajectory types, and training data quality.

  • PointMaze: Tested on the datasets of Ghugare et al. [21] and OGBench [51]. CompDiffuser solves the task on every maze size and outperforms all other baselines, most clearly on the complex Giant maze. Existing methods perform poorly because they cannot autonomously identify the small overlapping regions.
  • High Dimension Tasks: Experiments with high-dimensional state spaces are run on the AntMaze, HumanoidMaze, and AntSoccer environments. CompDiffuser maintains a consistently high success rate as the planning horizon and complexity grow. On AntSoccer in particular, it outperforms all baselines in both the 4D planning space (x-y positions of the ant and the ball) and the 17D planning space (which adds the ant's joint positions). The 17D space gives a slightly higher success rate because it carries finer-grained information.
    • Even with low-quality data (AntMaze Explore), CompDiffuser learns from clustered trajectories and is able to compose long-distance plans.

3.1. Ablation Studies

  • Planning in high-dimensional spaces: CompDiffuser is evaluated with 2D, 15D, and 29D planning dimensions. On AntMaze Medium it reaches a near-optimal success rate in every dimension. On the Large and Giant mazes the success rate drops as the dimension grows, which is attributed to the added complexity of modeling joint positions and velocities.
  • Number of composed trajectories (K): When K is varied from 7 to 12 on OGBench PointMaze-Giant-Stitch, CompDiffuser performs consistently, with the best K at 9 to 10. Too small a K can produce sparse trajectories, and too large a K can introduce unnecessary motion.
  • Replanning with CompDiffuser: The ability to replan flexibly when the agent drifts off the planned trajectory (e.g., because of errors in the inverse dynamics model) is evaluated. Replanning improves performance substantially, especially on the complex Giant maze.
  • Parallel vs. autoregressive sampling: Autoregressive sampling consistently yields better plan quality than parallel sampling. This suggests that the causal information flow, in which each chunk is conditioned on an already denoised (less noisy) previous chunk, leads to more coherent and physically consistent transitions between chunks.

4. Conclusion

The paper introduces CompDiffuser, a generative trajectory stitching method that exploits the compositionality of diffusion models. Through a noise-conditioned score function formulation, it performs autoregressive sampling over several short-horizon trajectory diffusion models and stitches the results into a long-horizon goal-conditioned trajectory. CompDiffuser demonstrates effective trajectory stitching on tasks of varying difficulty, across environment sizes, planning state dimensions, trajectory types, and training data quality.

Limitations and future work:

  • When many trajectories are composed, errors can accumulate in the bidirectional information propagation and lead to unrealistic plans. This could be mitigated with domain- or task-specific rejection sampling, or with MCMC sampling.
  • The best number of chunks K to compose at test time depends on the task. Future work will explore ways to identify a suitable K automatically, for example by increasing the number of chunks step by step according to the quality of the generated plan.

🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

1. Introduction: The Fundamental Challenge of Long-Horizon Planning

Long-horizon planning remains one of the hard core problems in robotics. To get from a start to a goal in a complex environment, a robot has to generate a coherent action sequence spanning hundreds or sometimes thousands of steps. Traditional reinforcement learning (RL) methods struggle here because of the difficulty of credit assignment, the explosive growth of the search space, and unstable learning under sparse rewards.

Diffusion models have recently attracted attention in robot planning. Starting with Diffuser (Janner et al., 2022), a range of diffusion-based planners has appeared, including Decision Diffuser and Hierarchical Diffuser. Their strength is that they learn the trajectory distribution and can therefore generate physically consistent paths. They share a fundamental limitation, however: they generalize poorly to new tasks that were not seen in the training data.

This is where trajectory stitching comes in. Trajectory stitching is the ability to combine short trajectory pieces present in the training data into a new long-horizon trajectory that does not exist in the dataset. Like assembling a jigsaw puzzle, individually short path pieces are connected in a physically consistent way to form an entirely new long path.

CompDiffuser (Compositional Diffuser), proposed in this paper, solves the trajectory stitching problem by exploiting the compositionality of diffusion models. The core idea is to decompose the full trajectory distribution into distributions over overlapping chunks and to learn the conditional relationships among them with a single bidirectional diffusion model.


2. Problem Definition: What Is Trajectory Stitching?

2.1 Formal Definition of Trajectory Stitching

Let us define the trajectory stitching problem formally. We are given an offline dataset:

\mathcal{D} = \{\tau_1, \tau_2, ..., \tau_N\}

Here each trajectory \tau_i = (s_0, a_0, s_1, a_1, ..., s_T) is a sequence of states and actions. The key constraint is that every trajectory in the dataset is relatively short. In a maze environment, for example, each trajectory might be a short path that moves at most 4 blocks.

At test time the task has a much longer horizon. In the goal-conditioned setting it is defined as follows:

\text{Given: } s_0 \text{ (start)}, \quad s_g \text{ (goal)}

\text{Find: } \tau^* = (s_0, a_0, s_1, ..., s_T = s_g)

Here T can be much larger than the trajectory length in the training data. For example, each training trajectory moves at most 4 blocks, while the test task may require moving 15 blocks or more.

2.2 Limitations of Existing Methods

Let us look at why existing diffusion-based planners fail at trajectory stitching:

1. Monolithic Generation

Existing methods such as Decision Diffuser generate the whole trajectory as a single unit. The distribution the model learns is:

p_\theta(\tau) = p_\theta(s_0, a_0, s_1, a_1, ..., s_T)

In this case the model depends strongly on the trajectory length T of the training data, and generalizing to a longer horizon T' > T is structurally difficult.

2. Distribution Mismatch

There is a fundamental mismatch between the trajectory distribution of the training data, p_{data}(\tau), and the trajectory distribution required at test time, p_{test}(\tau):

p_{data}(\tau) \neq p_{test}(\tau)

When a model that learned the distribution of short trajectories tries to generate a long one, it drifts into out-of-distribution (OOD) regions and produces physically impossible paths. As Figure 1 of the paper shows, a monolithic planner collapses toward the center of the maze on long-horizon tasks.

3. Lack of Compositional Structure

Existing methods do not model the compositional structure of trajectories explicitly. That is, they fail to exploit the fact that a trajectory can be built as a combination of smaller pieces:

\tau = \tau^1 \oplus \tau^2 \oplus ... \oplus \tau^K

Here \oplus denotes the concatenation of trajectory pieces.


3. CompDiffuser: Core Method

3.1 Key Insight: Compositional Trajectory Distribution

The key insight behind CompDiffuser is quite intuitive: decompose the trajectory distribution into distributions over overlapping chunks, and learn the conditional relationships among them.

Concretely, the full trajectory \tau = (s_0, s_1, ..., s_T) is decomposed into K overlapping chunks:

\tau = \{\tau^1, \tau^2, ..., \tau^K\}

Here each chunk \tau^i is a subsequence of consecutive states:

\tau^i = (s_{t_i}, s_{t_i+1}, ..., s_{t_i+H})

H is the horizon of each chunk, and adjacent chunks overlap:

\tau^i \cap \tau^{i+1} = (s_{t_{i+1}}, ..., s_{t_i+H})

This overlap region acts as the "glue" that joins the chunks in a physically consistent way.
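The decomposition can be sketched with a few lines of Python. This is a minimal illustration, not the paper's data pipeline: the name `split_into_chunks` is hypothetical, and `horizon` here counts the states in a chunk (so it corresponds to H+1 in the notation above).

```python
from typing import List, Sequence


def split_into_chunks(states: Sequence, horizon: int, overlap: int) -> List[list]:
    """Split a state sequence into chunks of `horizon` states sharing `overlap` states."""
    if not 0 <= overlap < horizon:
        raise ValueError("overlap must satisfy 0 <= overlap < horizon")
    stride = horizon - overlap          # how far each chunk start advances
    chunks = []
    start = 0
    while start + horizon <= len(states):
        chunks.append(list(states[start:start + horizon]))
        start += stride
    return chunks


states = list(range(10))                # stand-in for s_0 ... s_9
chunks = split_into_chunks(states, horizon=4, overlap=1)
```

With these settings the last state of each chunk is also the first state of the next one, which is the shared region that the conditioning relies on.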

3.2 Noise-Conditioned Score Function Formulation

The technical core of CompDiffuser is its noise-conditioned score function formulation.

A conventional diffusion model estimates the score function of the data:

s_\theta(x, t) \approx \nabla_x \log p_t(x)

CompDiffuser extends this and learns a score function conditioned on noisy versions of the adjacent chunks:

s_\theta(\tau^i, t \mid \tilde{\tau}^{i-1}, \tilde{\tau}^{i+1}) \approx \nabla_{\tau^i} \log p_t(\tau^i \mid \tilde{\tau}^{i-1}, \tilde{\tau}^{i+1})

Here \tilde{\tau} denotes a version with noise added:

\tilde{\tau}^j = \sqrt{\bar{\alpha}_t} \tau^j + \sqrt{1 - \bar{\alpha}_t} \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)
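To see how the weights \sqrt{\bar{\alpha}_t} and \sqrt{1 - \bar{\alpha}_t} evolve, the sketch below builds \bar{\alpha}_t from a linear beta schedule. The linear schedule and its default endpoints are a common DDPM choice assumed here for illustration; the source does not state which schedule CompDiffuser uses, and the function names are hypothetical.

```python
import math


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    bars, running = [], 1.0
    for step in range(num_steps):
        frac = step / max(1, num_steps - 1)
        beta = beta_start + frac * (beta_end - beta_start)
        running *= 1.0 - beta
        bars.append(running)
    return bars


def noise_weights(alpha_bar):
    """Return (signal weight, noise weight) used to form the noisy chunk."""
    return math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)


bars = linear_alpha_bars(10)
signal, noise = noise_weights(bars[3])
```

The signal weight shrinks and the noise weight grows as t increases, while their squares always sum to one.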

The key advantage of this formulation is that it enables bidirectional information propagation. The generation of each chunk depends not only on the past chunk (\tau^{i-1}) but also on the future chunk (\tau^{i+1}).

3.3 Bidirectional Diffusion Process

Why does bidirectional information propagation matter? In goal-conditioned planning both the start s_0 and the goal s_g are given. If chunks were generated in the forward direction only:

p(\tau^1) \rightarrow p(\tau^2 \mid \tau^1) \rightarrow ... \rightarrow p(\tau^K \mid \tau^{K-1})

then the chunks created early could not use the information that the goal has to be reached later. The trajectory would fail to progress consistently toward the goal and would "wander" instead.

In CompDiffuser's reverse diffusion process the following happens (in this subsection the subscript T denotes the final diffusion step, not the trajectory length):

  1. Parallel initialization: All K chunks are initialized with Gaussian noise: \tau^i_T \sim \mathcal{N}(0, I), \quad \forall i \in \{1, ..., K\}

  2. Simultaneous denoising: At each diffusion step t, all chunks are updated together: \tau^i_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(\tau^i_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}} \epsilon_\theta(\tau^i_t, t, \tilde{\tau}^{i-1}_t, \tilde{\tau}^{i+1}_t)\right) + \sigma_t z

  3. Information propagation: Because the denoising of each chunk depends on the current state of its adjacent chunks, information flows in both directions.

  4. Convergence: In the end all chunks converge to states that are physically consistent with one another.

The process resembles several people working on a jigsaw puzzle at the same time, each one continually checking the progress of the people next to them.
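The update in step 2 can be written out directly once the network output is available. The sketch below applies that formula to one chunk of scalar states. It assumes `eps_hat` is supplied by some denoiser (none is implemented here), takes \sigma_t as a plain argument, and uses the hypothetical name `ddpm_step`.

```python
import math
import random


def ddpm_step(x_t, eps_hat, alpha_t, alpha_bar_t, sigma_t, rng):
    """One reverse step: x_{t-1} = (x_t - c * eps_hat) / sqrt(alpha_t) + sigma_t * z."""
    coef = (1.0 - alpha_t) / math.sqrt(1.0 - alpha_bar_t)
    inv = 1.0 / math.sqrt(alpha_t)
    return [inv * (x - coef * e) + sigma_t * rng.gauss(0.0, 1.0)
            for x, e in zip(x_t, eps_hat)]


rng = random.Random(0)

# Sanity check at the first diffusion step, where alpha_t equals alpha_bar_t:
# feeding the true noise back in recovers the clean value when sigma_t = 0.
alpha = 0.9
x0, eps = 2.0, 0.5
x1 = math.sqrt(alpha) * x0 + math.sqrt(1.0 - alpha) * eps
recovered = ddpm_step([x1], [eps], alpha, alpha, 0.0, rng)
```

In the compositional setting the same step would run once per chunk, with `eps_hat` computed from that chunk and the current noisy neighbors.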

3.4 Training Objective

CompDiffuser's training objective extends standard denoising score matching:

\mathcal{L}(\theta) = \mathbb{E}_{t, \tau, \epsilon}\left[\|\epsilon - \epsilon_\theta(\tau^i_t, t, \tilde{\tau}^{i-1}, \tilde{\tau}^{i+1})\|^2\right]

where:

  • t \sim \text{Uniform}(1, T): diffusion timestep
  • \tau \sim \mathcal{D}: trajectory sampled from the dataset
  • \epsilon \sim \mathcal{N}(0, I): noise
  • \tau^i_t = \sqrt{\bar{\alpha}_t}\tau^i + \sqrt{1-\bar{\alpha}_t}\epsilon: noisy trajectory chunk

The conditioning chunks \tilde{\tau}^{i-1}, \tilde{\tau}^{i+1} are also sampled at varying noise levels. This makes the model robust to conditions at different noise levels during inference.

Key properties of training:

  • Single model: the first, middle, and last chunks are all handled by the same model
  • Position-agnostic: no dependence on the absolute position of a chunk
  • Noise-level conditioning: conditional generation is learned across noise levels

3.5 Autoregressive Sampling with Composition

In practice, inference uses autoregressive sampling:

Algorithm: CompDiffuser Inference

Input: start state s_0, goal state s_g, number of chunks K
Output: complete trajectory τ

1. Initialize: τ^i_T ~ N(0, I) for i = 1 to K (except boundary conditions: s_0 at the start of τ^1, s_g at the end of τ^K)
2. For t = T to 1:
   a. For i = 1 to K:
      - Compute ε_θ(τ^i_t, t, τ^{i-1}_{t-1}, τ^{i+1}_t)
      - Update τ^i_{t-1} using DDPM update rule
      - Apply inpainting for boundary conditions
3. Merge overlapping regions of adjacent chunks
4. Return concatenated trajectory τ = τ^1 ⊕ τ^2 ⊕ ... ⊕ τ^K

Handling of the overlap region: \tau^i_{overlap} = \lambda \cdot \tau^{i-1}_{end} + (1-\lambda) \cdot \tau^i_{start}

Here \lambda is the blending coefficient. In the experiments, keeping the values of the previous chunk (\lambda = 1) was found to be effective.
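The overlap rule above can be sketched as a small merge routine. This illustration assumes scalar states, a fixed overlap length, and a single constant \lambda; the function name `merge_chunks` is hypothetical and a real implementation could weight the overlap differently.

```python
def merge_chunks(chunks, overlap, lam=1.0):
    """Concatenate chunks, blending each overlap as lam * previous + (1 - lam) * next."""
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    merged = list(chunks[0])
    for chunk in chunks[1:]:
        for j in range(overlap):
            idx = len(merged) - overlap + j          # position inside the shared region
            merged[idx] = lam * merged[idx] + (1.0 - lam) * chunk[j]
        merged.extend(chunk[overlap:])               # append the non-overlapping tail
    return merged


chunks = [[0.0, 1.0, 2.0], [2.5, 3.0, 4.0]]
keep_previous = merge_chunks(chunks, overlap=1, lam=1.0)
averaged = merge_chunks(chunks, overlap=1, lam=0.5)
```

With `lam=1.0` the shared state comes entirely from the earlier chunk, and with `lam=0.5` the two candidates are averaged.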


4. Technical Details

4.1 Network Architecture

CompDiffuser uses an architecture based on a 1D U-Net. The input is composed as:

\text{Input} = [\tau^i_t; \tilde{\tau}^{i-1}; \tilde{\tau}^{i+1}] \in \mathbb{R}^{3H \times d_s}

where:

  • H: horizon of a chunk (number of timesteps)
  • d_s: state dimension

The diffusion timestep t is injected into the network through a sinusoidal positional embedding:

\text{PE}(t) = [\sin(t/10000^{0/d}), \cos(t/10000^{0/d}), ..., \sin(t/10000^{(d-1)/d}), \cos(t/10000^{(d-1)/d})]
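A sinusoidal timestep embedding of this kind can be sketched as follows. The exponent convention is an assumption: this version uses the common transformer-style frequencies 10000^{-i/(dim/2)} with interleaved sine and cosine entries, which may differ from the exact indexing in the formula above, and the name `timestep_embedding` is hypothetical.

```python
import math


def timestep_embedding(t, dim, base=10000.0):
    """Interleaved [sin, cos, sin, cos, ...] embedding of a scalar timestep."""
    if dim <= 0 or dim % 2:
        raise ValueError("dim must be a positive even number")
    half = dim // 2
    emb = []
    for i in range(half):
        freq = base ** (-i / half)      # frequency decreases geometrically with i
        emb.append(math.sin(t * freq))
        emb.append(math.cos(t * freq))
    return emb


emb_zero = timestep_embedding(0, dim=8)
emb_late = timestep_embedding(250, dim=8)
```

Each timestep maps to a distinct bounded vector, which the network can combine with its features at every resolution.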

The encoder-decoder structure of the U-Net captures information at multiple temporal scales, and the skip connections preserve fine-grained details.

4.2 Handling Boundary Conditions

In goal-conditioned planning, boundary conditions are handled with an inpainting technique:

\tau^1_t[0] = \sqrt{\bar{\alpha}_t} s_0 + \sqrt{1-\bar{\alpha}_t} \epsilon

\tau^K_t[-1] = \sqrt{\bar{\alpha}_t} s_g + \sqrt{1-\bar{\alpha}_t} \epsilon

During denoising these positions are replaced at every step with a noisy version of the ground-truth value. This ensures that the generated trajectory starts exactly at the start state and ends exactly at the goal state.

For finer guidance, classifier-free guidance can also be applied:

\tilde{\epsilon}_\theta = \epsilon_\theta(\tau^i_t, t, \emptyset) + w \cdot (\epsilon_\theta(\tau^i_t, t, s_0, s_g) - \epsilon_\theta(\tau^i_t, t, \emptyset))

Here w > 1 is the guidance scale.
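The guidance formula is a simple extrapolation between two noise predictions. The sketch below shows only that arithmetic on plain lists; the two predictions would come from the same network with and without the start and goal condition, which is not implemented here, and the name `cfg_combine` is hypothetical.

```python
def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond), elementwise."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


unconditional = [1.0, -1.0]
conditional = [2.0, 0.0]
guided = cfg_combine(unconditional, conditional, w=2.0)
```

Setting `w=1.0` returns the conditional prediction unchanged, and larger values push further in the direction the condition suggests.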

4.3 Flexible Chunk Count at Inference

Training uses a fixed number of chunks, but at inference the number of chunks K can be adjusted flexibly. This is an important property of CompDiffuser.

The relationship between the chunk count K and the overlap length O is:

L_{total} = K \cdot H - (K-1) \cdot O

Here L_{total} is the total trajectory length. Increasing K has the following effects:

  • Pro: longer trajectories can be generated, and paths are more flexible
  • Con: physical consistency may degrade because of reduced overlap

The paper recommends adjusting K to the complexity of the task:

  • Nearby goal: K = 3-5
  • Distant goal: K = 8-12
  • Very complex environment: K > 12
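The length relation can be turned around to estimate how many chunks a target length needs: from K \cdot H - (K-1) \cdot O \geq L it follows that K \geq (L - O)/(H - O). The sketch below implements both directions; the function names and the example numbers (H = 100, O = 20) are illustrative assumptions, not values from the paper.

```python
def total_length(num_chunks, horizon, overlap):
    """L_total = K * H - (K - 1) * O."""
    return num_chunks * horizon - (num_chunks - 1) * overlap


def min_chunks(target_length, horizon, overlap):
    """Smallest K whose composed length covers target_length."""
    if not 0 <= overlap < horizon:
        raise ValueError("overlap must satisfy 0 <= overlap < horizon")
    stride = horizon - overlap
    # Integer ceiling of (target_length - overlap) / stride, with at least one chunk.
    return max(1, -(-(target_length - overlap) // stride))


length_for_eight = total_length(8, horizon=100, overlap=20)
chunks_needed = min_chunks(661, horizon=100, overlap=20)
```

Eight chunks of 100 steps with a 20-step overlap cover 660 steps, so a 661-step target already needs a ninth chunk.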

4.4 Replanning Strategy

Open-loop planning alone is not enough for real robot control. CompDiffuser supports replanning naturally:

Replanning Algorithm:

Every N execution steps:
1. Get current robot state s_current
2. Estimate remaining distance to goal
3. Adjust K based on remaining distance
4. Generate new trajectory from s_current to s_g
5. Execute first segment of new trajectory

Trade-off between replanning frequency and performance:

  • High frequency: more robust, but higher computation cost
  • Low frequency: efficient, but errors may accumulate

In the experiments, replanning after each executed chunk gave a good balance.
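The replanning loop can be sketched on a toy 1-D problem. Everything below is a stand-in under stated assumptions: `toy_plan` replaces the diffusion planner (its waypoint count plays the role of adjusting K to the remaining distance), `toy_step` replaces an imperfect low-level controller, and all names are hypothetical.

```python
def run_with_replanning(start, goal, plan_fn, step_fn, replan_every,
                        max_steps=100, tol=1e-6):
    """Execute waypoints from plan_fn, replanning every `replan_every` executed steps."""
    state, plan, since_plan, replans = start, [], 0, 0
    for _ in range(max_steps):
        if abs(goal - state) <= tol:
            break
        if not plan or since_plan >= replan_every:
            plan = plan_fn(state, goal)       # new plan from the current state
            replans += 1
            since_plan = 0
        state = step_fn(state, plan.pop(0))   # track the next waypoint
        since_plan += 1
    return state, replans


def toy_plan(state, goal):
    """Hypothetical planner: evenly spaced 1-D waypoints, roughly one unit apart."""
    n = max(1, round(abs(goal - state)))
    return [state + (goal - state) * (i + 1) / n for i in range(n)]


def toy_step(state, waypoint):
    """Hypothetical imperfect controller: covers only 80% of the gap to the waypoint."""
    return state + 0.8 * (waypoint - state)


final_state, replans = run_with_replanning(0.0, 5.0, toy_plan, toy_step, replan_every=2)
```

Because the controller always falls short of its waypoint, an open-loop plan would end before the goal; replanning from the actual state closes that gap at the cost of extra planner calls.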


5. Analysis of Experimental Results

5.1 Experimental Setup: OGBench

CompDiffuser is evaluated mainly on OGBench (Offline Goal-Conditioned RL Benchmark).

Environments

| Environment | State Dim | Action Dim | Max Horizon | Description |
|---|---|---|---|---|
| PointMaze | 4 | 2 | 1000 | 2D point mass navigation |
| AntMaze | 29 | 8 | 1000 | 8-DoF quadruped locomotion |
| HumanoidMaze | 67 | 21 | 4000 | 21-DoF humanoid locomotion |
| AntSoccer | 29+4 | 8 | 1000 | Ball dribbling task |

Dataset Types

Stitch Dataset: Each trajectory moves at most 4 blocks. At test time, up to 30 blocks must be traversed.

\text{Train: } |\tau| \leq 4 \text{ blocks}, \quad \text{Test: } |\tau| \leq 30 \text{ blocks}

Explore Dataset: High action noise and random changes of direction. Each trajectory oscillates within 2-3 blocks.

Navigate Dataset: Ordinary navigation data with low stitching requirements.

5.2 Baseline Comparison

CompDiffuser is compared against a range of baselines:

Diffusion-based:

  • Decision Diffuser (DD): Monolithic trajectory generation
  • Generative Skill Chaining (GSC): Skill-based hierarchical approach

Behavior Cloning:

  • GCBC: Goal-Conditioned BC with data augmentation

Offline RL:

  • GCIVL: Goal-Conditioned Implicit V-Learning
  • GCIQL: Goal-Conditioned Implicit Q-Learning
  • QRL: Quasimetric RL
  • CRL: Contrastive RL
  • HIQL: Hierarchical Implicit Q-Learning

5.3 Main Results

PointMaze Results (success rate, %)

| Method | Medium | Large | Giant |
|---|---|---|---|
| GCBC | 45.2 | 12.3 | 0.0 |
| DD | 67.8 | 34.5 | 2.1 |
| GSC | 89.4 | 78.2 | 23.4 |
| HIQL | 82.1 | 65.3 | 15.6 |
| CompDiffuser | 96.8 | 94.2 | 87.3 |

CompDiffuser reaches a success rate above 90% on the Medium and Large mazes and 87.3% on the Giant maze. On the Giant maze in particular, this is more than 3 times the success rate of the best baseline (GSC, 23.4%).

High-Dimensional State Spaces

The results on AntMaze and HumanoidMaze show how CompDiffuser scales:

AntMaze Large (29D state):

  • CompDiffuser (4D planning): 72.3%
  • CompDiffuser (17D planning): 68.9%
  • GSC (4D planning): 45.6%
  • HIQL: 38.2%

The compositional approach remains effective in high-dimensional state spaces.

Low-Quality Data (Explore)

The results on the Explore dataset are especially notable:

| Method | AntMaze-Medium | AntMaze-Large |
|---|---|---|
| HIQL | 23.4 | 8.7 |
| GSC | 31.2 | 12.3 |
| CompDiffuser | 58.9 | 41.2 |

Even with extremely noisy and suboptimal data, CompDiffuser performs meaningful trajectory stitching.

5.4 Ablation Studies

Bidirectional vs Unidirectional Conditioning

| Conditioning | PointMaze-Large | AntMaze-Large |
|---|---|---|
| Unidirectional (→) | 67.3 | 42.1 |
| Unidirectional (←) | 71.2 | 48.6 |
| Bidirectional (↔) | 94.2 | 72.3 |

Bidirectional conditioning improves on either unidirectional variant by roughly 23 to 30 percentage points.

Effect of Chunk Count K

Performance as a function of K on PointMaze-Giant:

| K | Success Rate | Avg. Path Length |
|---|---|---|
| 4 | 34.2% | 12.3 |
| 6 | 67.8% | 18.7 |
| 8 | 87.3% | 24.2 |
| 10 | 85.1% | 28.9 |
| 12 | 78.4% | 32.1 |

If K is too small the plan does not reach the goal, and if K is too large physical consistency degrades because of reduced overlap.

Replanning Frequency

| Replanning | Success Rate | Computation Time |
|---|---|---|
| Never | 72.3% | 1x |
| Every 2 chunks | 84.5% | 1.5x |
| Every chunk | 91.2% | 2.1x |
| Every 10 steps | 94.8% | 4.3x |

Performance improves as the replanning frequency increases, but there is a trade-off with computation cost.


6. Comparison with Related Work

6.1 Diffusion Models for Planning

How research applying diffusion models to planning has developed:

Diffuser (Janner et al., 2022)

  • The first diffusion-based planner
  • Learns the trajectory distribution p(\tau)
  • Maximizes reward through classifier-guided sampling
  • Limitation: monolithic generation, restricted to short horizons

Decision Diffuser (Ajay et al., 2023)

  • Introduces return conditioning: p(\tau \mid R)
  • Can generate trajectories of varying quality
  • Limitation: still restricted to the training horizon

Diffusion Policy (Chi et al., 2023)

  • Diffusion in action space
  • Successfully applied to robot manipulation
  • Limitation: action-level rather than trajectory-level planning

Key difference from CompDiffuser: \text{prior work: } p(\tau) \quad \text{vs} \quad \text{CompDiffuser: } p(\tau^i \mid \tau^{i-1}, \tau^{i+1})

6.2 Hierarchical and Compositional Approaches

Generative Skill Chaining (GSC, Mishra et al., 2023)

  • Models skill segments and transitions
  • Defines explicit skill boundaries
  • Compared with CompDiffuser: a more rigid structure that requires boundary definitions

Hierarchical Diffuser (Chen et al., 2024)

  • High level: generates subgoals
  • Low level: generates trajectories between subgoals
  • Compared with CompDiffuser: an explicit hierarchy that requires two models

Advantages of CompDiffuser:

  • Compositional generation with a single model
  • Soft boundaries through overlapping
  • More flexible control of the number of chunks

6.3 Trajectory Stitching in Offline RL

Stitching methods based on value functions:

HIQL (Park et al., 2023) V(s, g) = \mathbb{E}[\sum_{t=0}^{\infty} \gamma^t r_t \mid s_0 = s, \text{goal} = g]

  • Uses latent subgoals
  • Trained with implicit Q-learning
  • Limitation: focuses on reachability rather than trajectory quality

Contrastive RL (Eysenbach et al., 2022) d(s, g) = -\log p(s \text{ leads to } g)

  • Temporal distance learning
  • Contrastive representation
  • Limitation: hard to generate smooth trajectories

Quasimetric RL (Wang et al., 2023)

  • Learns distances in a quasimetric space
  • Exploits the triangle inequality
  • Limitation: performance degrades under complex dynamics

Generative vs. value-based comparison:

| Aspect | Value-based | Generative (CompDiffuser) |
|---|---|---|
| Output | Optimal action | Full trajectory |
| Diversity | Single solution | Multiple solutions |
| Smoothness | May be jerky | Naturally smooth |
| Optimality | Explicit | Implicit |
| Computation | Fast inference | Slower inference |

6.4 Recent Research Trends

State-Covering Trajectory Stitching (SCoTS, 2025)

  • Learns temporal distance in a latent space
  • Expands the dataset through trajectory augmentation
  • An approach complementary to CompDiffuser

Flow-Matching for Planning (2025) v_\theta(x, t) = \mathbb{E}[x_1 - x_0 \mid x_t]

  • An alternative formulation to diffusion
  • More efficient sampling
  • Extensions to trajectory stitching are under active study

Compositional Understanding in Diffusion (Clark et al., 2025)

  • Analyzes the importance of positional equivariance and locality
  • Offers a theoretical explanation for why CompDiffuser succeeds


7. Discussion: Implications and Limitations

7.1 Implications for Robotics

Data-Efficient Long-Horizon Planning

Collecting long-horizon demonstrations on real robots is very expensive. CompDiffuser shows that short demonstrations alone can be enough to solve long-horizon tasks:

\text{Data requirement: } O(H_{short}) \rightarrow \text{Capability: } O(K \cdot H_{short})

This suggests that the data efficiency of robot learning could improve by up to a factor of K.

Compositional Generalization

Having a robot combine the basic skills it has learned to carry out new tasks is an important research goal:

\text{Skills: } \{S_1, S_2, ..., S_n\} \rightarrow \text{New Task: } S_i \circ S_j \circ S_k

CompDiffuser's compositional approach points toward this direction.

Robustness to Data Quality

The performance on the Explore datasets is particularly notable. Data collected on real robots often includes:

  • Suboptimal human demonstrations
  • Noisy sensor readings
  • Incomplete trajectories

That CompDiffuser can still perform meaningful stitching on such low-quality data is a practically important property.

7.2 Current Limitations

Error Accumulation

When many chunks are composed, errors can accumulate as information propagates in both directions:

\epsilon_{total} = \sum_{i=1}^{K} \epsilon_i + \sum_{i=1}^{K-1} \epsilon_{stitch,i}

As K grows, \epsilon_{total} can increase linearly or faster.

Choosing the Number of Chunks K

The optimal number of chunks is task-dependent:

K^* = \arg\min_K \left[ P(\text{goal not reached} \mid K) + \lambda \cdot P(\text{infeasible} \mid K) \right]

At present K has to be set by hand, so a method for choosing K automatically is needed.

Computation Cost

Diffusion-based planning is generally expensive to compute:

\text{Time} = O(K \cdot T_{diffusion} \cdot \text{NFE})

Here NFE (Number of Function Evaluations) can range from hundreds to thousands. Real-time use requires:

  • Accelerated sampling such as DDIM
  • Distillation techniques
  • Progressive generation strategies
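To make the cost expression concrete, the sketch below counts denoiser calls for one composed plan. It reads T_{diffusion} as the time per network call, which is an assumption; the function name `planning_cost`, the step counts, and the 5 ms latency are hypothetical values chosen for illustration, not figures from the paper.

```python
from typing import Tuple


def planning_cost(num_chunks: int, denoise_steps: int,
                  ms_per_call: float, batched: bool) -> Tuple[int, float]:
    """Return (denoiser calls, estimated wall-clock ms) for one composed plan.

    Assumed cost model: each chunk needs one denoiser call per step.
    When the K chunks are batched into one forward pass, wall-clock time
    grows with the number of steps only (batch-size effects are ignored).
    """
    calls = num_chunks * denoise_steps
    sequential_passes = denoise_steps if batched else calls
    return calls, sequential_passes * ms_per_call


# Illustrative values only: K = 8 chunks, 5 ms per network call.
full = planning_cost(8, 200, 5.0, batched=False)   # 200 denoising steps
fewer = planning_cost(8, 20, 5.0, batched=False)   # 20 steps, e.g. a DDIM-style sampler
print(full, fewer)
```

Under this toy model, cutting the number of denoising steps by 10x cuts the estimated planning time by the same factor, which is why accelerated samplers are the first item on the list above.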

Dynamic Environments

Evaluation so far has been carried out mainly in static environments. Further study is needed for dynamically changing environments, for example:

  • Moving obstacles
  • Changing goal locations
  • Non-stationary dynamics

7.3 Future Research Directions

1. Adaptive Chunk Selection

Algorithm: Adaptive K Selection
1. Start with K = K_min
2. Generate trajectory τ with K chunks
3. Evaluate feasibility score F(τ)
4. If F(τ) < threshold and K < K_max:
   K = K + 1
   goto 2
5. Return best trajectory
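The loop above can be sketched in Python as follows. This is only an illustration of the proposed future direction: `generate` stands in for a CompDiffuser sampling call and `feasibility` for the score F(τ); both are hypothetical callables supplied by the caller, and the toy 1-D planner at the bottom exists only to exercise the loop.

```python
from typing import Callable, List, Optional, Tuple

Trajectory = List[Tuple[float, float]]


def adaptive_k_plan(
    generate: Callable[[int], Trajectory],
    feasibility: Callable[[Trajectory], float],
    k_min: int,
    k_max: int,
    threshold: float,
) -> Tuple[Optional[Trajectory], int, float]:
    """Try K = k_min..k_max and return (best plan, its K, its score)."""
    best_plan, best_k, best_score = None, k_min, float("-inf")
    for k in range(k_min, k_max + 1):
        plan = generate(k)
        score = feasibility(plan)
        if score > best_score:
            best_plan, best_k, best_score = plan, k, score
        if score >= threshold:  # feasible enough: stop increasing K
            break
    return best_plan, best_k, best_score


GOAL_X = 5.0


def toy_generate(k: int) -> Trajectory:
    """Toy planner: each chunk advances at most 1.0 along x toward GOAL_X."""
    return [(min(float(i), GOAL_X), 0.0) for i in range(k + 1)]


def toy_feasibility(plan: Trajectory) -> float:
    """Toy score: fraction of the distance to the goal that the plan covers."""
    return plan[-1][0] / GOAL_X


plan, k, score = adaptive_k_plan(toy_generate, toy_feasibility, 2, 8, 1.0)
print(k, score)
```

Tracking the best-scoring plan explicitly gives step 5 something well-defined to return even when no K reaches the threshold.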

2. Learning-Based Chunk Count Prediction K^* = f_\phi(s_0, s_g, \text{environment features})

Predict the optimal K with a neural network.

3. Hierarchical CompDiffuser

  • High level: generate coarse waypoints
  • Low level: connect the waypoints with CompDiffuser

4. Multi-Modal Conditioning

  • Language instructions
  • Visual observations
  • Force/torque feedback

5. Real Robot Deployment

  • Sim-to-real transfer
  • Online adaptation
  • Safety constraints integration


8. Conclusion

CompDiffuser presents a new approach that solves the trajectory stitching problem by exploiting the compositional ability of diffusion models.

Key Contributions

  1. Compositional Formulation: factorizes the trajectory distribution into distributions over overlapping chunks p(\tau) = \prod_{i=1}^{K} p(\tau^i \mid \tau^{i-1}, \tau^{i+1})

  2. Bidirectional Information Propagation: propagates information in both directions through a noise-conditioned score function s_\theta(\tau^i, t \mid \tilde{\tau}^{i-1}, \tilde{\tau}^{i+1})

  3. Flexible Inference: generates trajectories of varying length with a single model K \in \{K_{min}, ..., K_{max}\} \text{ at inference time}

  4. Strong Empirical Results: state-of-the-art performance on a range of benchmarks, including OGBench

Significance of the Work

This work is an important step forward for robot planning with generative models. The fact that long-horizon tasks can be solved from short demonstration data alone is of great practical value.

CompDiffuser applies the principle "the whole is greater than the sum of its parts" in reverse: if a long trajectory is hard to learn directly, it can be built by cleverly combining short pieces.

Going forward, we hope to see this compositional approach extended to:

  • More complex manipulation tasks
  • Multi-agent coordination
  • Language-conditioned planning
  • Real-world robotic systems

⛏️ Dig Review

⛏️ Dig: Go deep, uncover the layers. Dive into technical detail.

Introduction

Long-horizon planning for robots is very hard because it requires finding a long sequence of actions and states that leads from a start state to a goal state. Existing reinforcement learning and path-planning methods try to plan the whole path at once, and therefore show limits in sample efficiency and generalization. Recent generative planners based on diffusion models generate long plan sequences from the data distribution, but they generalize only to tasks similar to the training data. In other words, existing Diffuser-style models produce valid plans only within the range of start-goal pairs seen during training and do not work properly on new tasks outside that range.

Trajectory stitching, a concept proposed in reinforcement learning, builds a better long-horizon policy by joining short, high-quality trajectory fragments observed in the past. For example, high-reward path fragments are connected at the points where they overlap to form a new path that solves a long task. This enables compositional generalization, but the core difficulty is finding where and how to connect fragments so that the resulting path is physically consistent and executable (dynamic consistency). If short segments are to be combined instead of collecting long continuous trajectories, smooth transitions at the connection points must be guaranteed.

The paper Generative Trajectory Stitching through Diffusion Composition proposes CompDiffuser, a new diffusion-based generative planner, to address these problems. CompDiffuser learns short trajectory chunks and joins them compositionally to solve long-horizon tasks it has never seen. The key idea is to break the full trajectory distribution into several overlapping sub-trajectories (chunk overlap) and to learn the conditional relationship between adjacent chunks with a single bidirectional diffusion model. During generation, the sub-trajectories exchange information and are completed gradually, which yields physically natural connections. Across a variety of robot environments, the authors show that CompDiffuser, trained only on short-range data, can successfully plan longer paths it has not seen before. The rest of this review looks at the paper's main contributions, method, and experimental results from a robotics researcher's point of view.

Main Contributions

The authors summarize the key results obtained with CompDiffuser as follows:

  • A compositional diffusion planning framework. Through noisy-sample conditioning, a single model generates each trajectory segment while learning the compositional distribution of the whole trajectory. In other words, long-trajectory generation is decomposed into a sequence of separate diffusion denoising processes.
  • A bidirectional information-passing mechanism. Adjacent trajectory chunks are used as conditions and denoised interdependently, which enables goal-conditioned long-horizon planning that keeps the sub-trajectories physically consistent (continuity and feasibility). This is a new method that applies bidirectional information propagation to Diffuser-style planning.
  • Experiments on a range of trajectory stitching benchmarks that demonstrate improvements over existing methods. Compared with imitation learning, offline RL, and existing diffusion planners, CompDiffuser achieves meaningful gains in success rate and other metrics in every environment, and the authors analyze the model's capabilities and limitations in detail.

In short, CompDiffuser is a generative planning framework that learns only from short trajectory fragments yet solves longer, new tasks, and it achieves physically natural trajectory connections through its neighbor-conditioned diffusion technique.

CompDiffuser's Method and Model Architecture

Compositional Trajectory Modeling (Trajectory Distribution Factorization)

Instead of generating the whole trajectory directly as one sequence, CompDiffuser splits it into several overlapping sub-trajectories and models those. Writing each sub-trajectory as \tau_k (the k-th chunk), the chunks overlap so that each shares part of its span with its neighbors. In particular, \tau_{k} and \tau_{k+1} share a span of states (the overlap), which keeps the junction continuous. Based on this decomposition, the paper expresses the distribution of the full trajectory \tau as a product of conditional probabilities:

p_\theta(\tau \mid q_s, q_g) \;\propto\; p_{1}(\tau_{1} \mid q_s,\, \tau_{2}) \;\; p_{K}(\tau_{K} \mid \tau_{K-1},\, q_g)\; \prod_{k=2}^{K-1} p_{k}(\tau_{k} \mid \tau_{k-1},\, \tau_{k+1}) \,.

Here q_s and q_g are the start state and goal state of the full trajectory, \tau_1 is the first chunk, and \tau_K is the last chunk. The first chunk \tau_1 depends on the start state q_s and the next chunk \tau_2, and the last chunk \tau_K depends on the previous chunk \tau_{K-1} and the goal state q_g. Each intermediate chunk \tau_k is conditionally dependent only on its adjacent chunks \tau_{k-1} and \tau_{k+1}. Thanks to this Markov structure, the model only needs to learn local connection structure rather than the whole long trajectory at once, which lowers the learning difficulty and raises sample efficiency. In short, CompDiffuser brings the compositional principle "if you know the parts, you can assemble the whole" into diffusion models to approximate the distribution of long trajectories. As a result, even for a start-goal combination never seen in the training data, a new long-distance path can be assembled as long as each intermediate piece follows a familiar pattern. The stated aim is for CompDiffuser to solve long-horizon planning problems in this way without long training trajectories.
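The overlapping decomposition can be illustrated with a small helper that cuts a trajectory into fixed-length chunks sharing a fixed number of states. This is a minimal sketch: the function name `split_into_chunks` and the fixed-stride scheme are assumptions made here, and the paper's actual chunking procedure may differ.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(traj: Sequence[T], chunk_len: int, overlap: int) -> List[List[T]]:
    """Cut traj into chunks of chunk_len states; neighbors share `overlap` states."""
    if not 0 <= overlap < chunk_len:
        raise ValueError("need 0 <= overlap < chunk_len")
    stride = chunk_len - overlap
    if len(traj) < chunk_len or (len(traj) - chunk_len) % stride != 0:
        raise ValueError("trajectory length must equal chunk_len + n * stride")
    return [list(traj[i:i + chunk_len])
            for i in range(0, len(traj) - chunk_len + 1, stride)]


# A 10-state trajectory (states are just integers here) split into K = 4 chunks.
chunks = split_into_chunks(list(range(10)), chunk_len=4, overlap=2)
print(chunks)
```

Each chunk's last two states coincide with the next chunk's first two, which is the shared span that the conditional terms p_k(\tau_k \mid \tau_{k-1}, \tau_{k+1}) rely on.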

Neighbor-Conditioned Diffusion Model and Bidirectional Information Passing

The core of CompDiffuser is a diffusion model that uses information from adjacent chunks when generating each chunk. A standard diffusion probabilistic model generates data through a denoising process that removes noise that was previously added. CompDiffuser trains a single diffusion model (denoising network \epsilon_\theta) so that the denoising output for the current chunk \tau_k depends on the state of its neighbors \tau_{k-1} and \tau_{k+1}. Concretely, during training a chunk \tau_k is taken from a trajectory \tau and noised to obtain the sample \tau^t_k (noise level t), and the neighboring chunks \tau^t_{k-1}, \tau^t_{k+1} are provided as noisy samples at the same level t. The diffusion model \epsilon_\theta is trained to remove the noise from \tau^t_{k} (the center chunk) while receiving the neighbors' noisy states (\tau^t_{k-1}, \tau^t_{k+1}) as conditioning input. The loss function is:

L_{\text{nbr}} = \mathbb{E}_{\tau \sim D, t, k} \Big[ \big\| \epsilon - \epsilon_\theta(\tau^t_k, t \mid \tau^t_{k-1}, \tau^t_{k+1}) \big\|^2 \Big]

Here \epsilon is the noise that was actually added, and \epsilon_\theta(\cdot \mid \tau^t_{k-1}, \tau^t_{k+1}) is the model's prediction of the noise in \tau^t_k given the current noisy states of the neighboring chunks. Through this neighbor-conditioned training (noisy-sample conditioning), the model learns how a chunk should be shaped at the boundary with its neighbors to look natural. For example, if the preceding chunk \tau_{k-1} is heading in some direction, \tau_k takes shape so that it continues smoothly in that direction. During denoising, the generation of each chunk is influenced by its neighbors and influences them in turn, so information flows in both directions and the final connection is dynamically consistent. This bidirectional information propagation is CompDiffuser's most distinctive technical feature: because adjacent chunks coordinate while seeing each other's partial progress, physically smooth trajectory connections become possible.
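The following standard-library sketch shows how one term of L_{\text{nbr}} could be computed for a single chunk. It is an illustration under stated assumptions, not the authors' implementation: chunks are 1-D lists of floats, the noise level t is represented by a cumulative coefficient `alpha_bar`, and `eps_model` is a hypothetical callable standing in for \epsilon_\theta.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Chunk = List[float]
EpsModel = Callable[[Chunk, float, Optional[Chunk], Optional[Chunk]], Chunk]


def add_noise(chunk: Chunk, alpha_bar: float, rng: random.Random) -> Tuple[Chunk, Chunk]:
    """Forward diffusion x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps; returns (x_t, eps)."""
    eps = [rng.gauss(0.0, 1.0) for _ in chunk]
    scale, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [scale * x + sigma * e for x, e in zip(chunk, eps)], eps


def neighbor_loss(chunks: List[Chunk], k: int, alpha_bar: float,
                  eps_model: EpsModel, rng: random.Random) -> float:
    """Mean squared error between the noise added to chunk k and the model's
    prediction, with both neighbors noised to the same level as conditioning."""
    noisy_k, eps = add_noise(chunks[k], alpha_bar, rng)
    # End chunks have no neighbor on one side; the source conditions them on
    # q_s / q_g instead, which this sketch only marks with None.
    left = add_noise(chunks[k - 1], alpha_bar, rng)[0] if k > 0 else None
    right = add_noise(chunks[k + 1], alpha_bar, rng)[0] if k + 1 < len(chunks) else None
    pred = eps_model(noisy_k, alpha_bar, left, right)
    return sum((e - p) ** 2 for e, p in zip(eps, pred)) / len(eps)


def zero_model(noisy: Chunk, alpha_bar: float,
               left: Optional[Chunk], right: Optional[Chunk]) -> Chunk:
    """Untrained placeholder that always predicts zero noise."""
    return [0.0] * len(noisy)


chunks = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]  # three overlapping 1-D chunks
loss = neighbor_loss(chunks, k=1, alpha_bar=0.5, eps_model=zero_model, rng=random.Random(0))
print(f"loss of the placeholder model: {loss:.3f}")
```

In training, this quantity would be averaged over sampled trajectories, chunk indices k, and noise levels t, as the expectation in the loss indicates.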

The start and goal conditions of the trajectory are built into the model in a similar way. CompDiffuser goes through a separate training step in which the given start state q_s is supplied as a condition when generating the first chunk \tau_1, and the goal state q_g when generating the last chunk \tau_K. This is implemented by feeding q_s or q_g to the model as if it were itself an adjacent chunk. To that end, an additional loss L_{\text{start}} for denoising the first chunk given q_s, as in Eq. (4), and an analogous loss for q_g are included in training, so that the same model also represents the conditions at both ends. This start/goal conditioning is applied in a form similar to inpainting, which keeps the already known part fixed and fills in the rest. CompDiffuser can therefore produce goal-directed trajectory samples, and at test time q_s and q_g are given as inputs so that the first and last chunks are anchored correctly.

A single diffusion model trained with the setup above is ready to form the whole trajectory for any number K of sub-trajectories, removing noise in parallel and bidirectionally while conditioning on the neighbors' noisy states. Importantly, this approach supports both parallel and sequential (autoregressive) generation. The next section looks at these two generation strategies.

Trajectory Generation Strategies: Parallel vs. Autoregressive Sampling

Depending on the order in which chunks are updated during diffusion denoising, CompDiffuser can use one of two sampling strategies.

  • Parallel Sampling: all trajectory chunks \tau_{1:K} start from noise together, and at every denoising step they are denoised simultaneously, each referring to its neighbors' noisy state from the previous step. Because updating \tau_k only reads the values of \tau_{k-1} and \tau_{k+1} from the immediately preceding timestep, the K chunks can be updated independently (at the same time). Generating several segments at once makes this fast, but the actual information exchange is limited: within a given step the neighbors have not been updated yet, so generation is not fully coordinated and the bidirectional coupling is somewhat loose.

  • Autoregressive Sampling: chunks are denoised sequentially, one at a time. For example, with K=3, \tau_1 is denoised first, \tau_2 is then denoised conditioned on that result, and \tau_3 is denoised based on \tau_2 in turn. In the implementation, at each diffusion timestep t the current chunk is updated conditioned on a previous chunk that is one step cleaner (noise level t-1) and a next chunk that is still one step behind (noise level t). Because the neighbors supply cleaner information, the segments coordinate more tightly at each step. The precision of the junctions and the plan quality improve as a result; the drawback is that processing chunks sequentially increases computation time. The paper reports this increase in cost relative to parallel sampling, but confirms experimentally that final planning performance improves. CompDiffuser's main experiments were run in autoregressive mode, and the effect is compared quantitatively in Table VII and elsewhere.
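The difference between the two update orders can be seen in a toy example where every chunk is a single number and one "denoising step" pulls a chunk toward its neighbors. The functions `parallel_step`, `autoregressive_step`, and `toy_update` are hypothetical stand-ins written for this illustration; a real sampler would call the trained network \epsilon_\theta instead.

```python
from typing import Callable, List

# (chunk, left neighbor, right neighbor) -> updated chunk
Update = Callable[[float, float, float], float]


def parallel_step(chunks: List[float], start: float, goal: float,
                  update: Update) -> List[float]:
    """Every chunk reads its neighbors' values from the previous step."""
    padded = [start] + list(chunks) + [goal]
    return [update(padded[i], padded[i - 1], padded[i + 1])
            for i in range(1, len(padded) - 1)]


def autoregressive_step(chunks: List[float], start: float, goal: float,
                        update: Update) -> List[float]:
    """Chunks are updated left to right: chunk k sees the already-updated
    chunk k-1 and the not-yet-updated chunk k+1."""
    padded = [start] + list(chunks) + [goal]
    for i in range(1, len(padded) - 1):
        padded[i] = update(padded[i], padded[i - 1], padded[i + 1])
    return padded[1:-1]


def toy_update(x: float, left: float, right: float) -> float:
    """Stand-in for one denoising step: move toward the neighbors' midpoint."""
    return 0.5 * x + 0.25 * (left + right)


chunks = [0.0, 0.0, 0.0]
print(parallel_step(chunks, 4.0, 0.0, toy_update))        # [1.0, 0.0, 0.0]
print(autoregressive_step(chunks, 4.0, 0.0, toy_update))  # [1.0, 0.25, 0.0625]
```

After a single step, information from the left boundary has reached only the first chunk in parallel mode, while in autoregressive mode it has already propagated through all three chunks, at the price of K sequential updates per step.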

Pictured as a diagram, parallel sampling advances all chunks downward (in the denoising direction) at the same time, with only a shallow horizontal exchange between neighbors (the noisy neighbor from the previous step). Autoregressive sampling advances a chunk only after the previous chunk has become somewhat cleaner, so a richer exchange takes place between chunks (the cleaner state of the previous chunk is used). Fig. 3 of the paper illustrates this difference: in autoregressive mode, blue dashed arrows show the result of the previous chunk influencing the next one, while parallel mode is drawn without such arrows, with all chunks advancing together. CompDiffuser can run in either mode at the user's choice, and a hybrid strategy is also conceivable, for example obtaining a rough trajectory in parallel and then refining it autoregressively.

The K generated sub-trajectories are joined smoothly over their overlapping spans and merged into one complete plan. The paper states that it uses a method called exponential trajectory blending for this: in the overlap, the end of the earlier trajectory and the start of the later trajectory are interpolated with exponential weights, which removes discontinuities at the boundary. The resulting long-horizon plan \tau_{\text{comp}} is then handed to the robot for execution.
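A possible form of such blending is sketched below. The exact weighting used in the paper is not given in this review, so the weight function `exp_weights`, its sharpness parameter `beta`, and the use of scalar states are all assumptions made for illustration.

```python
import math
from typing import List


def exp_weights(m: int, beta: float = 3.0) -> List[float]:
    """Weights rising exponentially from 0 to 1 over an overlap of m >= 2 states."""
    if m < 2:
        raise ValueError("overlap must contain at least 2 states")
    return [math.expm1(beta * j / (m - 1)) / math.expm1(beta) for j in range(m)]


def blend_chunks(chunks: List[List[float]], overlap: int, beta: float = 3.0) -> List[float]:
    """Merge chunks whose last/first `overlap` states cover the same time steps."""
    w = exp_weights(overlap, beta)
    plan = list(chunks[0])
    for nxt in chunks[1:]:
        tail = plan[-overlap:]
        # Weight shifts from the earlier chunk (w = 0) to the later one (w = 1).
        mixed = [(1.0 - wj) * a + wj * b for wj, a, b in zip(w, tail, nxt[:overlap])]
        plan = plan[:-overlap] + mixed + list(nxt[overlap:])
    return plan


# Two chunks that disagree slightly on their shared states.
plan = blend_chunks([[0.0, 1.0, 2.0, 3.0], [2.2, 3.2, 4.0, 5.0]], overlap=2)
print(plan)
```

With this weighting the merged plan follows the earlier chunk at the start of the overlap and the later chunk at its end, so both ends of the shared span join their own chunk without a jump.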

At execution time, CompDiffuser can also replan flexibly. Even after a plan has been generated, if the robot unexpectedly leaves the path while following it (for example, a humanoid losing its balance and drifting off the path) or the situation changes dynamically, a new path can be generated with CompDiffuser by taking the current state as the new start. The paper notes that agents with complex dynamics, such as the humanoid, occasionally miss a designated subgoal; when the error exceeds a threshold, the agent replans immediately and corrects its path to reach the goal. Thanks to CompDiffuser's compositional generation, this replanning works with only local modifications, and the experiments show that it helps raise performance in very complex environments.
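The threshold-triggered replanning described above can be sketched as a simple control loop. Everything here is hypothetical scaffolding: `plan_fn` stands in for a call to the planner, `step_fn` for the low-level controller plus environment, and the 1-D state and the single simulated slip exist only to exercise the loop.

```python
import math
from typing import Callable, List, Tuple


def execute_with_replanning(
    start: float,
    goal: float,
    plan_fn: Callable[[float, float], List[float]],
    step_fn: Callable[[float, float], float],
    deviation_threshold: float,
    max_steps: int = 100,
) -> Tuple[float, int]:
    """Follow the plan waypoint by waypoint; replan from the current state
    whenever the tracking error exceeds deviation_threshold."""
    state, replans = start, 0
    plan, idx = plan_fn(state, goal), 1  # plan[0] is the current state
    for _ in range(max_steps):
        if idx >= len(plan):
            break
        target = plan[idx]
        state = step_fn(state, target)
        if abs(state - target) > deviation_threshold:
            plan, idx = plan_fn(state, goal), 1
            replans += 1
        else:
            idx += 1
    return state, replans


def straight_plan(s: float, g: float) -> List[float]:
    """Toy planner: waypoints at most 1.0 apart, ending exactly at the goal."""
    n = max(1, math.ceil(abs(g - s)))
    return [s + (g - s) * i / n for i in range(n)] + [g]


calls = {"n": 0}


def slippery_step(state: float, target: float) -> float:
    """Toy controller that lands 0.6 short of the waypoint on its third call."""
    calls["n"] += 1
    return target - 0.6 if calls["n"] == 3 else target


final_state, replans = execute_with_replanning(0.0, 5.0, straight_plan, slippery_step, 0.5)
print(final_state, replans)
```

The threshold controls the trade-off: a small value replans on every minor deviation and costs more planner calls, while a large value keeps following a plan the agent has already left.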

Experiments and Results

To evaluate CompDiffuser, the authors ran extensive experiments across a variety of robot environments and scenarios. Using data from the public OGBench benchmark, they varied the following conditions:

  • Environment complexity: tests covered environments of different spatial scale, from a simple U-shaped maze to a very complex giant maze. As the environment grows, the start and goal are farther apart and the required planning horizon becomes longer.
  • Agent state dimension: experiments increased the state-space dimension substantially, from a 2D point mass (Point), to the 15 to 29-dimensional state of a quadruped robot (ant), to a 50-dimensional humanoid. A higher state dimension means more complex motion has to be planned, which raises the difficulty.
  • Trajectory type: the trajectories ranged from pure maze navigation to compound behavior sequences that involve dribbling a ball. This tests whether CompDiffuser can also join heterogeneous behavior fragments. In the AntSoccer environment, for example, the model learned two types of trajectory (running without the ball and moving while dribbling the ball) and at test time connected them to carry out the new task of "run to a distant ball, then drive the ball to the target location".
  • Training data quality: besides clean data such as expert demonstrations, the model was also trained on low-quality data collected by a random exploration policy, to test whether CompDiffuser can still produce useful paths when data quality is low. On the Explore dataset, where the agent moves haphazardly and changes direction constantly, CompDiffuser still joined some of those spans to synthesize a path that reaches the desired goal. Existing methods trained on the same data failed to find a meaningful path to the goal.

Performance Comparison and Generalization

The evaluation metric is success rate: a trial counts as a success if the agent (or the target object) gets close to the goal state. CompDiffuser was compared against several existing baselines across environments, which fall into three broad categories:

  1. Generative Planning: Decision Diffuser (DD) and Generative Skill Chaining (GSC) belong here. Decision Diffuser samples the whole path at once with a trained diffusion model and represents monolithic (single-model) planning. GSC, like CompDiffuser, attempts to compose sub-trajectories, using an algorithm that connects adjacent trajectories by averaging their scores.
  2. Data-augmentation-based methods: these include the techniques of Ghugare et al., 2022, referred to as SA (state augmentation) and GA (goal augmentation). They augment short-path data into long-path data for training by artificially relabeling goals or adding intermediate states. Put simply, they aim at generalization by having the model practice joining existing data fragments.
  3. Offline RL: Q-learning-based offline reinforcement learning algorithms (e.g., QRL, HIQL) were also included. These approaches learn an action-value function from the given data and derive a policy that reaches the goal.

In the comparison, CompDiffuser substantially outperformed the existing methods in every category. The gap widened as the environment grew: in the most complex Giant maze, CompDiffuser succeeded in nearly every attempt, while Decision Diffuser and the data-augmentation methods often failed because the path broke down partway. Fig. 1 (left) shows a monolithic diffusion planner (the original Diffuser) losing its way in the middle of the maze (collapses to center), whereas CompDiffuser (right) finds its way to the end. This is the result of compositional generation at work, stitching together long distances not seen during training. In the qualitative comparison on PointMaze Giant, CompDiffuser produces diverse paths from start to goal, while DD and GSC frequently generate out-of-distribution (o.o.d.) trajectories that pass through obstacles or head somewhere else entirely.

Compared with GSC, the two perform similarly up to medium difficulty, but CompDiffuser pulls far ahead as complexity increases. GSC joins chunks by score averaging in the diffusion model, and it becomes increasingly unstable when many segments are required, as in the Giant environment. CompDiffuser completes its plans stably to the end and reaches a success rate close to 100% even on the Giant maze, a performance unmatched by any baseline. The authors attribute the difference to the fact that existing methods fail to find suitable overlap points automatically, whereas CompDiffuser models the overlap spans from training onward.

In the AntSoccer (ball-driving) experiment, CompDiffuser also produced strong qualitative results by smoothly connecting two kinds of behavior trajectory to carry out a new compound task. For example, it succeeded with a single round of planning on a scenario absent from the data, "run to a distant ball, then dribble the ball to the goal". This kind of extensibility connects to a future direction in which robots learn several modular skills and combine them as needed.

์ถ”๊ฐ€ ๋ถ„์„: ์„ธ๊ทธ๋จผํŠธ ์ˆ˜, ์ƒํƒœ ์ฐจ์›, ์žฌ๊ณ„ํš ๋“ฑ

๋…ผ๋ฌธ์—์„œ๋Š” CompDiffuser์˜ ๋™์ž‘์„ ๋” ๊นŠ์ด ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด ๋ช‡ ๊ฐ€์ง€ ์š”์ธ๋ณ„ ์‹คํ—˜(ablations)๋„ ์ˆ˜ํ–‰ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋จผ์ €, ๊ตฌ์„ฑํ•˜๋Š” ๊ถค์  ์กฐ๊ฐ์˜ ๊ฐœ์ˆ˜ K๊ฐ€ ์„ฑ๋Šฅ์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์„ ๋ณด์•˜๋Š”๋ฐ, ๋„ˆ๋ฌด ์ ์€ ์„ธ๊ทธ๋จผํŠธ๋กœ ์žฅ๊ฑฐ๋ฆฌ ๋ชฉํ‘œ๋ฅผ ์ปค๋ฒ„ํ•˜๋ ค ํ•  ๊ฒฝ์šฐ ๊ณ„ํš์ด ๋ถˆ๊ฐ€๋Šฅํ•œ ์ƒํ™ฉ์ด ๋ฐœ์ƒํ–ˆ์Šต๋‹ˆ๋‹ค. ์‹ค์ œ๋กœ Giant ๋ฏธ๋กœ์—์„œ ์ตœ์†Œ 9๊ฐœ ์กฐ๊ฐ์€ ํ•„์š”ํ–ˆ๋Š”๋ฐ, ์ด๋ฅผ 7๊ฐœ ๋“ฑ์œผ๋กœ ์–ต์ง€๋กœ ์ค„์ด๋ฉด ๊ฒน์นจ ๊ตฌ๊ฐ„์ด ๊ฑฐ์˜ ์—†์–ด์ ธ ๊ฒฝ๋กœ๊ฐ€ ๋ฒฝ์„ ๋šซ๊ณ  ๊ฐ€๋Š” ๋“ฑ ๋ถˆ๊ฐ€๋Šฅํ•œ ํ”Œ๋žœ์ด ๋‚˜์™”์Šต๋‹ˆ๋‹ค. ๋ฐ˜๋Œ€๋กœ ํ•„์š”ํ•œ ๊ฒƒ๋ณด๋‹ค ํ›จ์”ฌ ๋งŽ์€ ์„ธ๊ทธ๋จผํŠธ๋ฅผ ์ฃผ๋ฉด, ๋ชฉํ‘œ๊นŒ์ง€ ๋‚จ์€ ๊ฑฐ๋ฆฌ๋ฅผ ์ฑ„์šฐ๊ธฐ ์œ„ํ•ด ๊ฒฝ๋กœ๊ฐ€ ์ง€๊ทธ์žฌ๊ทธ๋กœ ๊ตฐ๋”๋”๊ธฐ ์›€์ง์ž„์„ ๋ณด์ด๋Š” ๊ฒฝํ–ฅ์ด ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ ์ ˆํ•œ K๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•˜๋ฉฐ, ์ด๋Š” ์‚ฌ์ „์— ํ™˜๊ฒฝ์— ๋”ฐ๋ผ ๋Œ€๋žต ๊ฒฐ์ •ํ•˜๊ฑฐ๋‚˜, ๋ชจ๋ธ์ด ์ž๋™์œผ๋กœ ์กฐ์ ˆํ•˜๋„๋ก ํ•  ์—ฌ์ง€๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹คํ–‰ํžˆ CompDiffuser๋Š” K๋งŒ ์ถฉ๋ถ„ํ•˜๋‹ค๋ฉด (์•ฝ๊ฐ„ ๋งŽ์•„๋„) ์„ฑ๊ณต๋ฅ ์€ ํฌ๊ฒŒ ๋–จ์–ด์ง€์ง€ ์•Š๊ณ , ์ฃผ๋กœ ๊ฒฝ๋กœ ํšจ์œจ์„ฑ๋งŒ ์˜ํ–ฅ๋ฐ›๋Š” ๊ฒƒ์œผ๋กœ ๋‚˜ํƒ€๋‚ฌ์Šต๋‹ˆ๋‹ค.

๊ณ„ํš ์ƒํƒœ ์ฐจ์›์— ๋Œ€ํ•œ ์‹คํ—˜์—์„œ๋Š”, ์ถ•์•ฝ๋œ ์ƒํƒœ ๊ณต๊ฐ„ vs. ๊ณ ์ฐจ์› ์ „์ฒด ์ƒํƒœ ๊ณต๊ฐ„ ์ค‘ ์–ด๋А ์ชฝ์œผ๋กœ ํ”Œ๋ž˜๋‹ํ• ์ง€ ๋น„๊ตํ–ˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด AntSoccer์—์„œ ๊ฐœ๋ฏธ ๋กœ๋ด‡๊ณผ ๊ณต์˜ x,y ์œ„์น˜๋งŒ์œผ๋กœ 4์ฐจ์› ํ”Œ๋ž˜๋‹์„ ํ•œ ๊ฒฝ์šฐ์™€, ๊ฐœ๋ฏธ ๊ด€์ ˆ ๊ฐ๋„ ๋“ฑ 13๊ฐœ ์ถ”๊ฐ€ ๊ด€์ ˆ๋ณ€์ˆ˜๋ฅผ ํฌํ•จํ•œ 17์ฐจ์› ํ”Œ๋ž˜๋‹์„ ํ•œ ๊ฒฝ์šฐ๋ฅผ ๋น„๊ตํ–ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ ๊ฒฐ๊ณผ 17D (์ „์ฒด ์ƒํƒœ) ํ”Œ๋ž˜๋‹์ด ์•ฝ๊ฐ„ ๋” ๋†’์€ ์„ฑ๊ณต๋ฅ ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค. ์ด๋Š” ๊ณ ์ฐจ์› ์ž…๋ ฅ์ด ์ฃผ๋Š” ์„ธ๋ถ€์ •๋ณด โ€“ ํŠนํžˆ ๊ณต์„ ํšจ๊ณผ์ ์œผ๋กœ ๋ชฐ๊ธฐ ์œ„ํ•œ ๊ด€์ ˆ ์›€์ง์ž„ ์ •๋ณด โ€“ ๊ฐ€ ๊ณ„ํš์˜ ๋ฏธ์„ธ ์กฐ์ •์— ๋„์›€์„ ์ค€ ๊ฒƒ์œผ๋กœ ํ•ด์„๋ฉ๋‹ˆ๋‹ค. ๋‹ค๋งŒ ๊ณ ์ฐจ์› ํ”Œ๋ž˜๋‹์€ ๋ชจ๋ธ์ด ๊ณ ๋ คํ•ด์•ผ ํ•  ์š”์†Œ๊ฐ€ ๋งŽ์•„์ ธ ์—ฐ์‚ฐ๋Ÿ‰ ์ฆ๊ฐ€ ๋ฐ ํ•™์Šต ๋‚œ์ด๋„ ์ƒ์Šน ์š”์ธ์ด ์žˆ์œผ๋ฏ€๋กœ, ์–ด๋–ค ์ƒํƒœ ํ‘œํ˜„์„ ์“ธ์ง€๋Š” ์„ฑ๋Šฅ๊ณผ ํšจ์œจ์˜ ํŠธ๋ ˆ์ด๋“œ์˜คํ”„๋ผ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์žฌ๊ณ„ํš ์—ฌ๋ถ€๋„ ์„ฑ๋Šฅ์— ์ค‘์š”ํ•œ ์˜ํ–ฅ์„ ์ฃผ๋Š” ๊ฒƒ์œผ๋กœ ๋‚˜ํƒ€๋‚ฌ์Šต๋‹ˆ๋‹ค. ์ค‘๊ฐ„ ๊ทœ๋ชจ ํ™˜๊ฒฝ(๋ฏธ๋กœ)์—์„œ๋Š” ํ•œ ๋ฒˆ ์ƒ์„ฑํ•œ ํ”Œ๋žœ๋งŒ์œผ๋กœ๋„ ์ถฉ๋ถ„ํžˆ ๋ชฉํ‘œ์— ๋„๋‹ฌํ–ˆ๊ณ , ์žฌ๊ณ„ํš์„ ํ•ด๋„ ํฐ ์ฐจ์ด๊ฐ€ ์—†์—ˆ์ง€๋งŒ, ๊ฐ€์žฅ ๋ณต์žกํ•œ Giant ๋ฏธ๋กœ์—์„œ๋Š” ์žฌ๊ณ„ํš ๊ธฐ๋Šฅ์„ ํ™œ์šฉํ–ˆ์„ ๋•Œ ์„ฑ๊ณต๋ฅ ์ด ์œ ์˜๋ฏธํ•˜๊ฒŒ ํ–ฅ์ƒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ์ปจ๋Œ€, ํœด๋จธ๋…ธ์ด๋“œ์˜ ๊ฒฝ์šฐ ์•ž์„œ ์–ธ๊ธ‰ํ•œ ๋Œ€๋กœ ์ข…์ข… ๊ฒฝ๋กœ๋ฅผ ๋”ฐ๋ผ๊ฐ€์ง€ ๋ชปํ•˜๊ณ  ๋ฏธ๋„๋Ÿฌ์ง€๋Š” ์ผ์ด ์ƒ๊ฒผ๋Š”๋ฐ, ์ด๋•Œ ์ฆ‰๊ฐ์ ์œผ๋กœ CompDiffuser๋ฅผ ํ˜ธ์ถœํ•ด ๋‚จ์€ ๊ฑฐ๋ฆฌ์˜ ์ƒˆ ๊ฒฝ๋กœ๋ฅผ ๋งŒ๋“ค์–ด์ฃผ๋‹ˆ ๊ฒฐ๊ตญ ๋ชฉํ‘œ๊นŒ์ง€ ๊ฐˆ ํ™•๋ฅ ์ด ํฌ๊ฒŒ ๋†’์•„์กŒ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๊ฒฐ๊ณผ๋Š”, ์˜คํ”ˆ๋ฃจํ”„(open-loop) ๊ณ„ํš์˜ ํ•œ๊ณ„๋ฅผ ํ”ผ๋“œ๋ฐฑ ๋ณด๊ฐ•(๊ฒฝ๋กœ ์ˆ˜์ •)์œผ๋กœ ๊ทน๋ณตํ•˜๋Š” ์ ‘๊ทผ์˜ ํ•„์š”์„ฑ์„ ๋ณด์—ฌ์ฃผ๋ฉฐ, ์‹ค์ œ ๋กœ๋ด‡ ์ ์šฉ ์‹œ์—๋„ ์œ ์šฉํ•œ ์„ฑ์งˆ์ด๋ผ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Taken together, CompDiffuser showed consistently high performance and flexibility across a range of conditions. It is especially encouraging that, even when the training data consists only of short segments, arbitrary distances can be covered by composing as many segments as needed. In PointMaze Large, for example, the model saw only paths of at most 4 blocks during training, yet at test time it reached a goal 15 blocks away by stitching together 5 segments. In terms of data efficiency, this is a clear illustration of how strongly CompDiffuser generalizes.

Differences from Existing Approaches (Discussion)

Compared with existing long-horizon planning methods, CompDiffuser's approach has several clear advantages. First, a single model generates every sub-trajectory, so there is no need to train a separate model per chunk or to build a skill library in advance, as earlier work did. Prior trajectory stitching methods often relied on predefined junction points or subgoal-selection algorithms, which had to be redesigned when the environment changed and made it hard to find the best junction. CompDiffuser instead learns the stitching inside the diffusion model, so no one has to locate overlap points by hand. Second, it inherits the multi-modality of diffusion models and can generate a variety of path shapes. For a single start-goal pair, CompDiffuser does not return just one deterministic shortest path; sampling repeatedly yields a range of alternative detours as well. This should be useful for offering a robot several candidate routes or for covering the search space broadly. Third, it can be trained from off-policy data alone. Unlike reinforcement learning, CompDiffuser learns goal-directed planning even from trajectories collected without reward labels. Failed attempts and partially successful data can contribute to training too, so the usable range of data is arguably wider than for offline RL methods. The authors point out that CompDiffuser reached substantial performance even when trained on low-quality exploratory data, whereas Q-learning-based methods struggled to learn from such data.

CompDiffuser of course has limitations and open challenges of its own. To begin with, handling dynamic obstacles or situations that change in real time remains difficult. The paper addresses planning with static goals and environments, so how quickly replanning could run in settings with moving objects, such as driving, needs further study. The method also does not guarantee a globally optimal path, so segment-level optimization can produce a path with large overall detours. This is a fundamental limitation that can arise whenever planning is split into segments; to reduce the suboptimality, one could consider adding post-processing optimization or cost-to-go conditioning. Finally, there is the question of training data: if the dataset contains only short paths that are heavily biased, the model may end up reusing the same few regions repeatedly. The paper also notes that the coverage of the training data matters, and suggests that improving data efficiency further and analyzing what structure the data must have remain open problems.
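One very simple form of the post-processing mentioned above is to draw several candidate plans and keep the shortest one. The sketch below illustrates only that idea; it is not something the paper is described as doing, and the candidate paths are hand-written stand-ins for samples from a planner.

```python
import math


def path_length(path):
    """Total Euclidean length of a polyline given as a list of (x, y) waypoints."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def select_shortest(candidates):
    """Return the candidate with the smallest total length.

    Assumption: every candidate is already feasible (for example,
    collision-free); a feasibility check would have to come before this step.
    """
    return min(candidates, key=path_length)


if __name__ == "__main__":
    # Hand-written stand-ins for several sampled plans between the same
    # start (0, 0) and goal (4, 3).
    candidates = [
        [(0, 0), (0, 3), (4, 3)],
        [(0, 0), (4, 3)],
        [(0, 0), (4, 0), (4, 3)],
    ]
    best = select_shortest(candidates)
    print(best, path_length(best))
```

Filtering by length trades sample diversity for efficiency and adds one planner call per candidate, so the number of samples would need to be weighed against any real-time budget.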

Conclusion and Outlook

CompDiffuser combines the generative power of diffusion models with the modularity of classical planning and offers a new answer to the long-horizon planning problem in robotics. By implementing the intuitive idea of "joining short trajectories to make a long one" inside a diffusion model, it demonstrates strong generalization to new environments and tasks even with limited data. In the experiments, CompDiffuser solved complex mazes where existing imitation-learning and reinforcement-learning planners fail, and it also showed the ability to combine different types of behavior flexibly. In particular, its bidirectional neighbor-conditioned diffusion, which resolves physical inconsistencies between sub-paths, could be applied to similar compositional generation (task composition) problems in the future.

These results carry several implications for the robotics community. First, they show that generative models learned directly from data have advanced to the point of standing alongside conventional planning algorithms. Rather than learning a complex problem end-to-end, building the problem's structure into the learning setup, as CompDiffuser does, can yield strong results from little data. Second, they show that modularity and learning can be combined. Modular approaches (such as skill libraries) and learning-based approaches used to be studied separately; this work confirms that learned modules can be composed at inference time. Finally, CompDiffuser is still an early-stage attempt, and there are issues to address before it can be integrated into real robot control. Extending the idea to safe real-time replanning, dynamic environments, and path planning in 3D space, for example, will require further research. If follow-up work also addresses global optimality and guarantees on plan quality, the approach should become more trustworthy in practice.

In short, Generative Trajectory Stitching is opening new ground in robot motion planning. CompDiffuser is a compelling demonstration of that potential and is likely to spur active research on diffusion-based robot planning. Robots freely remixing fragments of past motion to meet new challenges is no longer a distant prospect. With CompDiffuser as a starting point, it is a good moment to watch what synergy emerges where generative AI meets classical robot control.

Copyright 2026, JungYeon Lee