Curieux.JY
  • JungYeon Lee


📃 RoboCurate Review

neural-trajectory
data-curation
Harnessing Diversity with Action-Verified Neural Trajectory for Robot Learning
Published

February 18, 2026

  • Paper Link
  • Project Link

Includes experiments on the ALLEX platform.

  1. 🤖 RoboCurate is a new framework for generating synthetic data for robot learning. It filters generated actions by checking their motion consistency against a simulator replay.
  2. 🔄 The framework obtains scene diversity through image-to-image (I2I) editing and appearance diversity through video-to-video (V2V) transfer, which greatly expands observation diversity.
  3. 🚀 RoboCurate substantially improves success rates over prior methods on benchmarks such as GR-1 Tabletop and DexMimicGen, and it also shows strong generalization on the real ALLEX humanoid robot.

🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

This paper proposes RoboCurate, a synthetic data generation framework for robot learning. Synthetic data produced by video generative models (neural trajectories) is a promising, scalable pipeline for robot learning, but imperfect video generation leads to inconsistent action quality. Existing verification based on VLMs (Vision-Language Models) is limited in its ability to tell physically accurate videos apart, and it cannot directly evaluate the generated actions themselves.

RoboCurate addresses this with two core components. First, a controllable visual diversification pipeline expands scene and appearance diversity. Second, simulator-replay consistency is used to evaluate and filter the quality of the annotated actions.

1. Plausible Manipulation Scenarios ์ƒ์„ฑ (Diversity)

To generate diverse synthetic robot videos with a video generative model, RoboCurate controls two factors: the scene visuals and the task instruction.

  • Expanding visual diversity:
    • I2I (Image-to-Image) editing: I2I edits are applied to the initial image to greatly increase scene-level variation. So that the edited image remains a valid start state for the video generation model, a Canny edge map is used as a condition to preserve the original scene structure. Systematic prompts along four axes (table appearance, target object identity and appearance, lighting, and background) induce controlled visual variation.
    • V2V (Video-to-Video) transfer: V2V transfer is applied to successful synthetic videos to diversify appearance while preserving motion dynamics. Because transferred videos generally keep the robot motion, the action annotations labeled by the IDM (Inverse Dynamics Model) are reused. To preserve the structure of the original video, V2V transfer is conditioned on a Canny edge video, and a system prompt similar to the one in the I2I editing pipeline changes appearance along the same four axes. To keep action reuse valid, object identity and shape are left unchanged and only texture and color are modified.
  • Expanding task instructions:
    • Synthetic robot videos are generated conditioned on an initial frame and a language instruction. To obtain videos with meaningful robot-object interaction, a proprietary VLM generates plausible task instructions from the initial frame. Because a naive VLM query can produce malformed instruction templates or physically impossible robot actions, few-shot prompting with examples from existing datasets is used to keep the output consistent. New task instructions are designed along four axes: behavior, target object, placement, and robot hand type.

2. Action-level Filtering of Neural Trajectory (Quality Verification)

Generated neural trajectories can contain noisy action labels. Physically impossible video motion or IDM prediction errors can make the predicted actions disagree with the video. For each neural trajectory sample $(w_{\text{gen}}, a_{\text{IDM}})$ (generated video, IDM-predicted actions), RoboCurate verifies action quality by replaying $a_{\text{IDM}}$ in a simulator and rendering the corresponding rollout video $w_{\text{sim}}(a_{\text{IDM}})$. By construction, $w_{\text{sim}}(a_{\text{IDM}})$ shows robot motion that is consistent with $a_{\text{IDM}}$. This turns action verification into a motion-consistency comparison between two videos, $(w_{\text{gen}}, w_{\text{sim}}(a_{\text{IDM}}))$.

  • Attentive Probe:
    • To solve this motion-consistency comparison, a lightweight attentive probe is trained on top of a frozen pre-trained video encoder.
    • Training data construction: To avoid training the probe on noisy synthetic data, positive (aligned) and negative pairs are carefully constructed from real-world demonstrations $\mathcal{T} = \{(w_{\text{real}}, a_{\text{real}})\}$.
      • Positive pairs ($\mathcal{P}^+$): For each real action sequence $a_{\text{real}}$, the simulator rollout video $w_{\text{sim}}(a_{\text{real}})$ is rendered and paired over matching time windows: $\left\{ (w_{\text{real}, t:t+H}, w_{\text{sim}}(a_{\text{real}})_{t:t+H}) \right\}$.
      • Negative pairs ($\mathcal{P}^-$): These come in two types.
        • Temporally shifted negatives: The time windows are deliberately misaligned within the same episode: $\left\{ (w_{\text{real}, t:t+H}, w_{\text{sim}}(a_{\text{real}})_{t':t'+H}) \mid t' \neq t \right\}$.
        • Cross-episode negatives: A real clip is paired with a simulator rollout from a different episode: $\left\{ (w_{\text{real}, t:t+H}, w_{\text{sim}}(a'_{\text{real}})_{t:t+H}) \mid a'_{\text{real}} \neq a_{\text{real}} \right\}$.
    • Training: For a sampled pair $(w_1, w_2) \sim \mathcal{P}$ with label $y$, each clip is encoded with the pre-trained video encoder $f_\phi$: $z_1 = f_\phi(w_1)$, $z_2 = f_\phi(w_2)$. The embeddings are concatenated and fed to the attention-based probe $g_\theta(\cdot)$, which predicts a consistency logit $\ell = g_\theta([z_1, z_2])$. Finally, $g_\theta$ is trained with a binary cross-entropy loss: $\mathcal{L}(\theta; \mathcal{P}) = \mathbb{E}_{((w_1, w_2), y) \sim \mathcal{P}}\left[-y \log p - (1-y) \log(1-p)\right]$, where $p = \sigma(\ell)$.
    • Inference: Given a sample $(w_{\text{gen}}, a_{\text{IDM}})$, the video pair $(w_{\text{gen}}, w_{\text{sim}}(a_{\text{IDM}}))$ is formed and passed to the trained attentive probe $g_\theta$. The sample is kept only if the consistency probability $p$ exceeds a threshold $c$.
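The pair construction above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `build_pairs`, the list-of-frames representation, and the choice of one negative of each type per window are assumptions made for the example.

```python
import random

def build_pairs(real_clips, sim_clips, horizon, rng):
    """Build labeled window pairs from real demos and their simulator rollouts.

    real_clips[e] and sim_clips[e] are per-episode frame lists (hypothetical
    representation); sim_clips[e] stands for the rollout of episode e's real
    actions. Returns a list of ((real_window, sim_window), label) tuples.
    """
    pairs = []
    episodes = list(range(len(real_clips)))
    for e in episodes:
        n_windows = len(real_clips[e]) - horizon + 1
        for t in range(n_windows):
            real_win = real_clips[e][t:t + horizon]
            # Positive: same episode, same time window.
            pairs.append(((real_win, sim_clips[e][t:t + horizon]), 1))
            # Temporally shifted negative: same episode, different start t'.
            shifted = [s for s in range(n_windows) if s != t]
            if shifted:
                s = rng.choice(shifted)
                pairs.append(((real_win, sim_clips[e][s:s + horizon]), 0))
            # Cross-episode negative: same window, rollout of another episode.
            others = [o for o in episodes
                      if o != e and len(sim_clips[o]) >= t + horizon]
            if others:
                o = rng.choice(others)
                pairs.append(((real_win, sim_clips[o][t:t + horizon]), 0))
    return pairs

# Toy data: frames are just string identifiers.
real = [[f"r{e}_{i}" for i in range(6)] for e in range(2)]
sim = [[f"s{e}_{i}" for i in range(6)] for e in range(2)]
pairs = build_pairs(real, sim, horizon=4, rng=random.Random(0))
```

Because every label comes from how the pair was assembled, no human annotation is needed; the sampling ratio between positives and the two negative types is a free design choice here.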

3. Improve Neural Trajectory via Best-of-N Sampling

The filter is useful not only for selecting helpful synthetic data. It can also act as a critic for the video generative model at inference time and thereby improve the neural trajectories. N candidate videos and their IDM-predicted actions are sampled, and the video-action pair with the highest critic score (the attentive probe's consistency probability $p$) is selected. By choosing action-verified candidates, this strategy makes efficient use of the neural trajectory generation framework in data-scarce settings.

Experimental results:

RoboCurate shows strong gains both in the pre-training setting, on the GR-1 Tabletop and DexMimicGen benchmarks, and in the co-finetuning setting on the ALLEX humanoid.

  • Pre-training: Relative to a real-data-only baseline, RoboCurate improves the success rate by +70.1% on GR-1 Tabletop and +16.1% on DexMimicGen. The existing DreamGen pipeline (Jang et al., 2025) shows smaller gains of +26.6% and +4.0%, respectively, against the same baseline.
  • Co-finetuning (ALLEX humanoid): RoboCurate yields a +179.9% relative improvement in success rate, compared with +100.0% for DreamGen. Notably, it demonstrates OOD (out-of-distribution) generalization in the challenging real-world ALLEX humanoid dexterous manipulation setting: a +162.3% relative improvement on the novel-object pick-and-place task, and emergent success on the novel-action task, from 0.0% to 25.0%.
  • Ablation study:
    • Visual diversity augmentation (I2I, V2V) on its own substantially improves downstream task performance.
    • The proposed action-level filtering further improves VLA performance.
    • RoboCurate's filtering strategy outperforms VLM-based, video-level physical plausibility assessment (DreamGenBench, VideoCon-Physics).
    • The training strategy for the attentive probe (positive and negative pairs built automatically from real data) matters: it outperforms filtering based on human labeling or on plain cosine similarity between embeddings, because it provides consistent supervision that is effective at detecting subtle motion mismatches.

In conclusion, RoboCurate is an effective synthetic robot data generation framework that improves neural trajectories by verifying IDM-predicted actions through simulator-replay consistency and by expanding observation diversity through I2I editing and action-preserving V2V transfer.


🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

Introduction: In the age of "fake data," what is the real problem?

Data is always scarce in robotics. Just as autonomous driving accumulates millions of miles of driving data, robot manipulation also needs data at scale. But collecting data on real robots is slow, expensive, and risky. Gathering one hour of teleoperation data can take a full day.

So the robotics community has recently turned to an attractive idea: create synthetic robot data with a video generative model. NVIDIA's DreamGen popularized this approach, and large VLA (Vision-Language-Action) models such as GR00T N1 have achieved remarkable results by training on the resulting "Neural Trajectories."

But a fundamental problem is hiding here.

A video produced by a video generation model may "look plausible," yet nothing guarantees that the actions extracted from it are physically accurate.

Here is an analogy. On a film set, an actor mimes picking up a cup. On camera it looks perfect. But what if you tell a real robot, "Do exactly that motion"? The joint angles or end-effector trajectory extracted from the footage may be nonsense, because a video can be visually flawless while depicting physically impossible motion.

RoboCurate tackles this problem head-on. Published in February 2026 by Seungku Kim and colleagues (six authors in total), the paper presents a systematic and practical solution to the "quality curation" problem of synthetic robot data.

The core question is simple:

"Are the action labels on this synthetic data actually correct?"

To answer it, RoboCurate uses a simulator as a referee. The generated actions are replayed in the simulator, and the resulting video is compared with the originally generated video to see whether the motion matches. On top of that, an I2I/V2V pipeline maximizes the visual diversity of the data, so the framework pursues diversity and accuracy at the same time.


Background: What is a Neural Trajectory?

Before getting into the method, let's pin down the core concept.

Definition of Neural Trajectory

A Neural Trajectory is a pair consisting of a synthetic robot video produced by a video generative model (for example Cosmos or Wan) and the pseudo-actions extracted from that video. Unlike conventional simulation data, it can be thought of as a robot trajectory "imagined" purely by a neural network, with no physics simulator involved.

| Aspect | Real data | Simulation data | Neural Trajectory |
|---|---|---|---|
| Data source | Teleoperation | Physics simulator | Video generative model |
| Visual realism | Highest | Medium (sim-to-real gap) | High |
| Action accuracy | Accurate | Accurate | ⚠️ Uncertain |
| Scalability | Low | Medium | Very high |
| Diversity | Limited by collection environment | Limited by assets | High (leverages the generative model) |

Limitations of existing pipelines

NVIDIA's DreamGen is the representative Neural Trajectory generation pipeline. The basic flow is:

  1. Initial frame + language instruction → generate a robot video with an Image-to-Video (I2V) model
  2. Extract actions from the generated video with an IDM (Inverse Dynamics Model)
  3. Use the (video, action) pairs to train a VLA policy

There are two problems.

First, limited visual diversity. The initial frames fed to the I2V model come from existing datasets, so the scene diversity of the generated videos is constrained.

Second, no verification of action quality. There is no good way to check whether the actions predicted by the IDM actually match the motion in the video. Earlier work used a VLM (Vision-Language Model) only to judge whether "this video is physically plausible," but VLMs do not understand physics precisely and, more importantly, cannot evaluate the accuracy of the actions themselves.


Methodology: The structure of RoboCurate

The RoboCurate framework is built around three components:

flowchart TB
    subgraph GENERATION["1️⃣ Diverse Neural Trajectory generation"]
        A[Initial frames from real data] --> B["I2I editing<br/>(scene diversity)"]
        B --> C[Diverse initial frames]
        C --> D["I2V video generation"]
        D --> E[Generated robot videos]
        E --> F["V2V transfer<br/>(appearance diversity)"]
        F --> G[Visually diverse videos]
        G --> H["IDM action extraction"]
        H --> I["Neural Trajectory<br/>(video + actions)"]
    end

    subgraph FILTERING["2️⃣ Simulator-replay consistency filtering"]
        I --> J["Replay actions<br/>in the simulator"]
        J --> K[Simulator rollout video]
        K --> L{"Attentive Probe<br/>motion-consistency check"}
        G --> L
        L -->|match| M["✅ High-quality data"]
        L -->|mismatch| N["❌ Remove low-quality data"]
    end

    subgraph BESTOFN["3️⃣ Best-of-N sampling"]
        O["Generate N video candidates"] --> P["Compute a filter score for each"]
        P --> Q["Select the highest score"]
    end

    M --> R["VLA policy training<br/>(GR00T N1.5)"]
    Q --> R

    style GENERATION fill:#E8F4FD,stroke:#2196F3
    style FILTERING fill:#FFF3E0,stroke:#FF9800
    style BESTOFN fill:#E8F5E9,stroke:#4CAF50

RoboCurate framework overview

Let's take these apart one at a time.

3.1 Generating diverse Neural Trajectories

RoboCurate amplifies visual diversity along two dimensions.

Image-to-Image (I2I) editing: scene-level variation

Initial frames are taken from an existing dataset and edited with a diffusion-based I2I model. Typical edits are "change the kitchen background," "change the lighting," or "rearrange the objects on the table."

This yields dozens of varied initial frames from a single original frame, and generating a video from each with the I2V model greatly increases scene diversity.

Video-to-Video (V2V) transfer: appearance variation

V2V style transfer is applied to the whole generated video. The key is to preserve the motion while changing only the appearance: the robot arm's trajectory stays the same, and only the robot's color and texture and the visual style of the background change.

This matters for a reason. Because V2V references the motion structure of the source video, it can produce videos with higher physical consistency than generating from scratch with I2V. At the same time the visual appearance changes completely, which keeps the policy from overfitting to particular visual patterns.

Generating task instructions with a VLM

The other axis of diversity is task diversity. RoboCurate shows the initial frame to a VLM (for example a GPT-4-style model) and has it generate a variety of manipulation task instructions that are feasible in that scene. Instructions such as "move the cup to the right," "open the drawer," or "stand the bowl upright" are designed systematically as combinations of skill, target object, placement condition, and hand type.
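To see how quickly the four axes multiply, here is a small sketch that enumerates one instruction per combination. It only illustrates the combinatorial space; the source has a VLM propose instructions from the initial frame, and the axis values and the sentence template below are hypothetical placeholders.

```python
from itertools import product

# Hypothetical axis values, chosen only for illustration.
AXES = {
    "skill": ["move", "push"],
    "object": ["cup", "bowl"],
    "placement": ["to the right", "to the left"],
    "hand": ["left hand", "right hand"],
}

def instruction_grid(axes):
    """Enumerate one instruction string per combination of axis values."""
    keys = ["skill", "object", "placement", "hand"]
    return [
        f"{skill} the {obj} {placement} with the {hand}"
        for skill, obj, placement, hand in product(*(axes[k] for k in keys))
    ]

grid = instruction_grid(AXES)
print(len(grid), grid[0])
```

Even two values per axis give 16 distinct instructions, which is why controlling the axes explicitly is a cheap way to widen task coverage.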

3.2 Simulator-replay consistency filtering

This is RoboCurate's central contribution.

Core idea

The idea is quite intuitive:

  1. An IDM predicts an action sequence $\hat{a}_{1:T}$ from the generated video.
  2. That action sequence is replayed as-is in a simulator.
  3. The simulator renders a physically accurate rollout video for those actions.
  4. The motion patterns of the generated video and the simulator rollout video are compared.

If the actions predicted by the IDM are accurate, the robot motion in the simulator replay should resemble the motion in the original generated video. If the actions are inaccurate, the two motions will differ substantially.

By analogy, suppose someone watches a dance performance and writes down the choreography in notation. How do you check that the notation is right? Ask another dancer to perform from the notation and compare the result with the original footage. That is exactly what RoboCurate does, with the simulator playing the role of the "other dancer."

Attentive Probe: the motion-consistency judge

How do we decide whether the motion in two videos matches? Pixel-level comparison is meaningless, because a simulator render and a generated video look completely different.

RoboCurate trains a lightweight Attentive Probe on top of a pre-trained video encoder. Specifically:

  1. The generated video and the simulator rollout video are each passed through the video encoder to extract features.
  2. The Attentive Probe performs binary classification on the two feature vectors: do the motion pattern and the robot's geometric configuration match or not?

$$\text{score}(v_{\text{gen}}, v_{\text{sim}}) = f_{\text{probe}}\big(\phi(v_{\text{gen}}), \phi(v_{\text{sim}})\big) \in [0, 1]$$

Here $\phi$ is the pre-trained video encoder and $f_{\text{probe}}$ is the lightweight attention-based classifier.

  • High score → the IDM actions are likely accurate → keep the sample
  • Low score → the IDM actions are likely inaccurate → remove the sample
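The keep-or-remove rule can be written as a short filter loop. This is a sketch under stated assumptions: `replay_in_sim` and `probe_score` are hypothetical callables standing in for the simulator replay and the trained probe, and the toy versions below work on 1-D positions rather than real video.

```python
def filter_trajectories(samples, replay_in_sim, probe_score, threshold=0.5):
    """Keep (video, actions) samples whose simulator replay matches the video.

    replay_in_sim(actions) -> rollout video           (hypothetical callable)
    probe_score(video, rollout) -> value in [0, 1]    (hypothetical callable)
    """
    kept = []
    for video, actions in samples:
        rollout = replay_in_sim(actions)
        if probe_score(video, rollout) > threshold:
            kept.append((video, actions))
    return kept

def toy_replay(actions):
    # Toy "simulator": integrates 1-D displacements into positions.
    pos, rollout = 0.0, []
    for a in actions:
        pos += a
        rollout.append(pos)
    return rollout

def toy_probe(video, rollout):
    # Toy score: 1 / (1 + mean absolute position error).
    err = sum(abs(v - r) for v, r in zip(video, rollout)) / len(video)
    return 1.0 / (1.0 + err)

samples = [
    ([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]),  # actions reproduce the video
    ([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]),  # actions do not
]
kept = filter_trajectories(samples, toy_replay, toy_probe, threshold=0.5)
```

The threshold value 0.5 is arbitrary here; in practice it trades the amount of retained data against label quality.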

Training data construction

Training data for the Attentive Probe is easy to produce with a simulator:

  • Positive sample: a video of a real trajectory paired with the simulator replay of the same actions → motion matches
  • Negative sample: the same kind of video paired with a replay of different actions → motion does not match

This way, large-scale training data can be assembled automatically with the simulator alone, without separate human labeling.
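With labeled pairs like these, the probe is trained as a binary classifier. The following is a minimal sketch of the binary cross-entropy objective on probe logits, written in plain Python for clarity; the function names are illustrative and no particular deep learning library is implied.

```python
import math

def sigmoid(logit):
    """Map a consistency logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def bce_loss(logits, labels):
    """Mean binary cross-entropy; labels are 1 (match) or 0 (mismatch)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for logit, y in zip(logits, labels):
        p = sigmoid(logit)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)

# Confident and correct predictions give a small loss,
# confident and wrong predictions give a large one.
good = bce_loss([4.0, -4.0], [1, 0])
bad = bce_loss([4.0, -4.0], [0, 1])
```

Only the probe's parameters would receive gradients from this loss, since the video encoder stays frozen.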

3.3 Best-of-N sampling

As an extension of the filtering strategy, RoboCurate also uses the consistency score at generation time. For a single initial frame and instruction, N candidate videos are generated with different random seeds, a consistency score is computed for each, and only the highest-scoring video is selected.

$$v^* = \arg\max_{v_i \in \{v_1, \ldots, v_N\}} \text{score}(v_i, \text{SimReplay}(\text{IDM}(v_i)))$$

This is the same principle as the Best-of-N sampling commonly used in RLHF: pick the best candidate according to a reward model, which here is the consistency score. It is an efficient way to raise output quality considerably without retraining the generative model itself.
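The arg max above translates directly into a selection loop. The sketch below assumes three hypothetical callables (`label_actions` for the IDM, `replay_in_sim` for the simulator, `probe_score` for the probe) and uses toy 1-D stand-ins so that it runs on its own.

```python
def best_of_n(candidates, label_actions, replay_in_sim, probe_score):
    """Return (score, video, actions) for the highest-scoring candidate."""
    best = None
    for video in candidates:
        actions = label_actions(video)  # IDM stand-in
        score = probe_score(video, replay_in_sim(actions))
        if best is None or score > best[0]:
            best = (score, video, actions)
    return best

def toy_idm(video):
    # Toy IDM: rounds each displacement, so it mislabels fractional motion.
    actions, prev = [], 0.0
    for pos in video:
        actions.append(round(pos - prev))
        prev = pos
    return actions

def toy_replay(actions):
    # Toy simulator: integrates displacements into positions.
    pos, rollout = 0.0, []
    for a in actions:
        pos += a
        rollout.append(pos)
    return rollout

def toy_probe(video, rollout):
    # Toy score: 1 / (1 + mean absolute position error).
    err = sum(abs(v - r) for v, r in zip(video, rollout)) / len(video)
    return 1.0 / (1.0 + err)

candidates = [[0.4, 0.8, 1.2], [1.0, 2.0, 3.0], [1.3, 2.3, 3.3]]
best = best_of_n(candidates, toy_idm, toy_replay, toy_probe)
```

In this toy setup the second candidate wins because its motion is exactly what the rounded actions reproduce; the other two lose score to labeling error.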


Experiments: What is the evidence?

Experimental setup

Evaluation environments

RoboCurate is evaluated in three settings:

| Benchmark | Description | Number of tasks | Characteristics |
|---|---|---|---|
| GR-1 Tabletop | RoboCasa-based tabletop manipulation | Multiple | Standard VLA benchmark |
| DexMimicGen | Bimanual manipulation in simulation | Multiple | Dexterous hand manipulation |
| ALLEX Humanoid | Real humanoid robot | Multiple | Real-world |

Base policy model

NVIDIA's GR00T N1.5 is used as the base policy. GR00T N1.5 is a VLA model with a dual-system architecture, VLM (System 2) + Diffusion Transformer (System 1), and is one of the strongest open-source VLAs currently available.

Training setup: two settings

The experiments are run in two settings.

Pre-training setting:

  • ActionNet (real-robot data from Fourier Robotics) and Neural Trajectories are mixed at a 1:1 ratio
  • Training runs for 60K gradient steps
  • Key trick: the first 50K steps use all Neural Trajectories, and the last 10K steps use only the high-quality data filtered by RoboCurate
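The two-phase schedule can be expressed as a small sampling rule. This is a sketch, not the training code: the step counts follow the text, but implementing the 1:1 mix as a per-sample probability of 0.5 and the pool names are assumptions made for illustration.

```python
import random

def pool_for_step(step, total_steps=60_000, filtered_last=10_000):
    """Synthetic pool to draw from at a given gradient step."""
    if not 0 <= step < total_steps:
        raise ValueError("step out of range")
    return "filtered" if step >= total_steps - filtered_last else "all"

def sample_source(step, rng, real_fraction=0.5):
    """Pick the data source for one training sample (1:1 real/synthetic mix)."""
    if rng.random() < real_fraction:
        return "real"
    return "synthetic_" + pool_for_step(step)

rng = random.Random(0)
early = [sample_source(1_000, rng) for _ in range(4)]
late = [sample_source(55_000, rng) for _ in range(4)]
```

The switch happens at step 50,000, so unfiltered synthetic data never appears in the final 10K steps.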

Co-finetuning setting:

  • Task-specific fine-tuning on top of the pre-trained GR00T N1.5
  • Best-of-N sampled Neural Trajectories + real data

Main results

Headline numbers

RoboCurate's gains are substantial:

| Benchmark | Relative success-rate improvement over real data only |
|---|---|
| GR-1 Tabletop (300 demos) | +70.1% |
| DexMimicGen (pre-training) | +16.1% |
| ALLEX Humanoid (real-world) | +179.9% |

The +179.9% figure on the real ALLEX humanoid stands out. It means the success rate on the real robot rose to roughly 2.8 times the baseline, close to a threefold improvement.
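Relative improvements are easy to misread, so a short worked example helps. The helper names are illustrative, and the 20% and 30% success rates in the second call are hypothetical numbers, not results from the paper.

```python
def relative_gain(baseline, new):
    """Relative improvement in percent; undefined for a zero baseline."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (new - baseline) / baseline * 100.0

def multiplier(relative_gain_percent):
    """Convert a relative gain in percent to a ratio over the baseline."""
    return 1.0 + relative_gain_percent / 100.0

print(round(multiplier(179.9), 3))   # ratio implied by +179.9%
print(relative_gain(20.0, 30.0))     # hypothetical 20% -> 30% success rate
```

A gain of +179.9% therefore corresponds to about 2.8 times the baseline, and a change that starts from a 0.0% baseline has no finite relative gain, so it can only be reported in absolute terms.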

Comparison of filtering strategies

Compared with existing methods, RoboCurate's action-level filtering outperforms video-level quality assessment:

| Filtering method | Approach | Limitation |
|---|---|---|
| DreamGenBench (VLM-based) | Ask a VLM "Is this video physically plausible?" | Weak understanding of physics; cannot evaluate actions |
| Video quality metrics | Measure video quality with FVD, SSIM, etc. | Visual quality ≠ action accuracy |
| RoboCurate (this paper) | Simulator replay + motion-consistency classification | ✅ Verifies the actions themselves |

The key insight is clear: whether a video "looks good" and whether its "actions are right" are different questions, and verifying the latter requires the simulator as a source of physical ground truth.

Effect of diversity

xychart-beta
    title "Effect of task diversity and visual diversity"
    x-axis ["Low task diversity", "Medium task diversity", "High task diversity"]
    y-axis "Success rate (%)" 0 --> 80
    bar [35, 52, 65]
    bar [42, 60, 73]

Policy performance as diversity increases

In the chart above, the first bar in each group applies task diversity only, and the second bar applies both task diversity and I2I/V2V visual diversity.

Key findings from Table 5 of the paper:

  1. More task diversity → monotonically better performance: with the Neural Trajectory dataset fixed at 10K samples, VLA performance improves steadily as the number of unique tasks increases.
  2. Additional effect of visual diversity: at the same level of task diversity, applying the I2I/V2V pipeline raises performance further.
  3. I2I + V2V > plain I2V: I2I editing followed by V2V transfer is more effective than the existing I2V pipeline (the DreamGen approach).

The last finding is especially interesting. That transforming existing videos (I2I + V2V) works better than generating from scratch (I2V) reflects the fact that generative models cannot yet produce physically flawless robot videos. Leaning on existing data to obtain diversity is the safer strategy.


Technical deep dive

Role and limitations of the IDM (Inverse Dynamics Model)

In the Neural Trajectory pipeline, the IDM is a critical bottleneck. Given consecutive video frames $(o_t, o_{t+1})$, the IDM predicts the action $\hat{a}_t$ taken between them:

$$\hat{a}_t = \text{IDM}(o_t, o_{t+1})$$

The problem is that the IDM is trained on real robot data, so its accuracy can drop on videos produced by a generative model. Subtle artifacts, unrealistic motion blur, and physically impossible object deformations in generated videos all confuse the IDM's predictions.
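Applied along a whole video, the pairwise formula yields one action per frame transition. A minimal sketch, assuming a hypothetical `idm` callable that takes two consecutive observations; the toy IDM below treats frames as 1-D positions and returns their difference.

```python
def label_video(frames, idm):
    """Apply a pairwise inverse dynamics model along a video.

    idm(o_t, o_next) -> action is a hypothetical callable.
    Returns T - 1 actions for T frames.
    """
    return [idm(o_t, o_next) for o_t, o_next in zip(frames, frames[1:])]

def toy_idm(o_t, o_next):
    # Toy IDM: the action is the displacement between two positions.
    return o_next - o_t

frames = [0.0, 1.0, 3.0, 6.0]
actions = label_video(frames, toy_idm)
```

Any artifact in a single generated frame corrupts the two actions that touch it, which is one way per-frame errors can propagate into the label sequence.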

RoboCurate's simulator-replay filtering targets exactly this problem: it checks indirectly, through the simulator, whether the IDM got it wrong.

Design philosophy of the filtering probe

A notable aspect of the Attentive Probe design is that the video encoder is frozen and only the lightweight probe is trained. Rather than fine-tuning the whole video encoder, a thin classification layer is placed on top of already-learned visual representations.

This design is reasonable for three reasons:

  1. Efficiency: fine-tuning a video encoder is computationally expensive.
  2. Generalization: the general-purpose representations of a frozen encoder work better across diverse scenes.
  3. Interpretability: attention weights make it possible to analyze where the probe is looking when it decides.

A separate embodiment tag strategy

There is an interesting trick in training. Although the real data (ActionNet) and the Neural Trajectories are both GR-1 robot data, they are assigned different embodiment tags. The reason is that the statistical distribution of IDM-predicted actions differs from that of real teleoperation data.

This is an important practical point. Naively mixing synthetic and real data can actually hurt performance, whereas separating them with distinct embodiment tags lets the model learn the characteristics of each data source independently.
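One way to picture the tagging is as an extra field on each sample, with action statistics tracked per tag. The sketch below is illustrative only: the tag strings, the `Sample` class, and the use of a per-tag mean are assumptions, since the source states only that the two data sources receive different embodiment tags.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    video_id: str
    actions: tuple
    embodiment_tag: str

# Hypothetical tag names.
REAL_TAG = "gr1_real"
NEURAL_TAG = "gr1_neural"

def tag_samples(real, neural):
    """Attach a source-specific embodiment tag to every (video_id, actions) item."""
    tagged = [Sample(v, tuple(a), REAL_TAG) for v, a in real]
    tagged += [Sample(v, tuple(a), NEURAL_TAG) for v, a in neural]
    return tagged

def action_mean_by_tag(samples):
    """Mean action value per tag, e.g. for per-source normalization."""
    sums, counts = {}, {}
    for s in samples:
        for a in s.actions:
            sums[s.embodiment_tag] = sums.get(s.embodiment_tag, 0.0) + a
            counts[s.embodiment_tag] = counts.get(s.embodiment_tag, 0) + 1
    return {tag: sums[tag] / counts[tag] for tag in sums}

tagged = tag_samples([("r0", [1.0, 3.0])], [("n0", [0.0, 1.0])])
means = action_mean_by_tag(tagged)
```

Keeping statistics per tag means a shift in the IDM-labeled action distribution does not distort how the real teleoperation actions are scaled.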

Training schedule: a curriculum-learning-style approach

The strategy of using all data for the first 50K steps and only the filtered, high-quality data for the last 10K steps also deserves attention. It can be read as a form of curriculum learning:

  • Early training: learn general representations from data that is plentiful but mixed in quality
  • Late training: hone precise action generation on refined, high-quality data

The paper's hypothesis is that the model should focus on accurate actions once it already has sufficient visual and linguistic understanding, and the experiments support this.


Comparison with related work

graph LR
    A["Simulation-based<br/>(MimicGen, DexMimicGen)"] --> D["Synthetic robot data"]
    B["Video-generation-based<br/>(DreamGen, ROSIE)"] --> D
    C["Real-to-Sim-to-Real<br/>(ReBot, RialTo)"] --> D
    D --> E["VLA policy training"]

    F["RoboCurate"] --> D
    F -.->|"adds filtering"| B
    F -.->|"uses a simulator"| A

    style F fill:#FF9800,stroke:#E65100,color:#fff
    style D fill:#E3F2FD,stroke:#1976D2

Lineage of synthetic robot data generation methods

DreamGen (NVIDIA, 2025)

This is RoboCurate's most direct predecessor. DreamGen fine-tunes a video world model to generate Neural Trajectories and extracts actions with an IDM or with LAPA (a latent action model). It showed impressive results, including performing 22 new behaviors on the GR-1 humanoid.

Differences from RoboCurate:

  • DreamGen uses the generated data as-is without quality filtering, or applies only a simple VLM-based check.
  • RoboCurate introduces stricter filtering in the form of simulator-based action verification.
  • DreamGen's visual diversity is limited to the level of the initial frames, whereas RoboCurate expands it systematically with I2I + V2V.

ReBot (2025)

ReBot takes a Real-to-Sim-to-Real approach. It replays real trajectories in a simulator and inpaints real-world backgrounds into the simulated renders to create synthetic videos. Physical accuracy is guaranteed, but because the method is tied to real trajectories, its ability to produce new behaviors is limited.

RoboCurate combines the strengths of both approaches: it exploits the generative freedom of video generation models while using the simulator as a verification tool.

Cosmos Policy (NVIDIA, 2025-2026)

This is a policy learning framework built on NVIDIA's Cosmos World Foundation Model. It uses action-conditioned video generation and distillation. RoboCurate can be used as a complement to the Cosmos ecosystem: a pipeline that generates Neural Trajectories with Cosmos and filters them with RoboCurate follows naturally.


Critical Discussion

Strengths

✅ A clear and practical problem definition

"Are the actions in synthetic data accurate?" is a pressing question for every researcher who uses neural trajectories. Being the first to address it systematically is the paper's biggest contribution.

✅ Creative repurposing of the simulator

Using the simulator for data verification rather than data generation is a refreshing change of perspective. At a time when the sim-to-real gap makes simulator data hard to use directly, the paper gives the simulator a new role as a "referee".

✅ Real-world validation

The real-world experiments on the ALLEX humanoid show that the approach is not confined to simulation. The +179.9% improvement demonstrates that low-quality synthetic data can hurt real-world performance, and that proper curation can reverse this.
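To put the +179.9% figure in perspective, here is the arithmetic under the assumption that it is a relative improvement in success rate over a baseline. The symbols $s_{\text{base}}$ and $s_{\text{curated}}$ are illustrative names, not notation from the paper, and no absolute success rates are implied.

```latex
\[
\text{relative improvement}
  = \frac{s_{\text{curated}} - s_{\text{base}}}{s_{\text{base}}} \times 100\%
\]
\[
\frac{s_{\text{curated}} - s_{\text{base}}}{s_{\text{base}}} = 1.799
\quad\Longrightarrow\quad
s_{\text{curated}} = 2.799\, s_{\text{base}}
\]
```

Read this way, the curated setup succeeds roughly 2.8 times as often as the baseline it is compared against.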

✅ Modular design

I2I, V2V, filtering, and Best-of-N can each be applied independently. A researcher who already runs a DreamGen pipeline can bolt on just the filtering module.
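The "add only the filter" idea can be sketched as a thin wrapper around an existing generator. This is a minimal illustration, not RoboCurate's or DreamGen's actual API: `with_filter`, `toy_generator`, `toy_verifier`, and the `score` field are all hypothetical names, and the verifier stands in for the simulator-replay consistency check.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_filter(
    generate: Callable[[], Iterable[T]],
    is_consistent: Callable[[T], bool],
) -> Callable[[], Iterator[T]]:
    """Wrap an existing generator so that only verified trajectories pass."""

    def filtered() -> Iterator[T]:
        for trajectory in generate():
            if is_consistent(trajectory):
                yield trajectory

    return filtered


# Toy stand-ins (hypothetical): each "trajectory" carries a made-up
# consistency score in place of a real simulator-replay check.
def toy_generator():
    return [
        {"id": 0, "score": 0.9},
        {"id": 1, "score": 0.2},
        {"id": 2, "score": 0.7},
    ]


def toy_verifier(trajectory) -> bool:
    return trajectory["score"] >= 0.5


curated = list(with_filter(toy_generator, toy_verifier)())
kept_ids = [t["id"] for t in curated]
print(kept_ids)
```

Because the generator is passed in as a callable, the upstream pipeline stays untouched; swapping the verifier changes the filtering policy without affecting generation.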

Weaknesses and Limitations

⚠️ Simulator dependence

This is the most fundamental limitation. Requiring a simulator partially offsets a core advantage of neural trajectories, namely that data can be produced without one. The burden is lighter because the simulator is used only for "verification", not "generation", but the method is still hard to apply where setting up a simulator is not feasible.

⚠️ The simulator-to-real-world gap

If the simulator's physics engine does not reproduce the real world faithfully, the guarantee that "an action that replays well in the simulator" is also "a good action in the real world" weakens. The gap can be a problem especially for tasks involving deformable objects, fluids, or complex contact dynamics.

⚠️ Limits of binary classification

Because the Attentive Probe classifies each sample as "consistent" or "inconsistent", it may struggle to distinguish subtle differences in quality. Extending it to a continuous quality score (regression) would allow finer-grained filtering.
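The difference between a binary decision and a continuous score can be illustrated with a few lines of Python. This is a sketch under assumptions: the scores below are invented, and `binary_filter` and `top_k_filter` are hypothetical helpers, not part of the paper's probe.

```python
from typing import List, Sequence


def binary_filter(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Keep every sample whose score clears the threshold (pass/fail only)."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def top_k_filter(scores: Sequence[float], k: int) -> List[int]:
    """Keep the k highest-scoring samples, returned in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Invented scores: sample 0 barely passes, sample 1 passes comfortably.
scores = [0.51, 0.98, 0.49, 0.75]

binary_kept = binary_filter(scores)    # treats 0.51 and 0.98 identically
top2_kept = top_k_filter(scores, k=2)  # uses the ranking to drop the marginal one
print(binary_kept, top2_kept)
```

With a pass/fail decision the marginal sample (0.51) is indistinguishable from the strong one (0.98); a continuous score makes it possible to rank, keep a top fraction, or weight samples during training.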

⚠️ Dependence on the GR00T ecosystem

The experiments were run mainly on the GR00T N1.5 + GR-1/ALLEX combination. Whether the results generalize to other VLA architectures (π0, OpenVLA, etc.) or to other robot platforms needs further validation.

⚠️ Computational cost

Best-of-N sampling requires N times the video generation, N times the simulator replay, and N times the probe inference. With N=5, that is 5 times the cost. More analysis is needed on efficiency at large scale.
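The linear scaling can be made concrete with a small cost model. The per-stage numbers are placeholders chosen for illustration (arbitrary units), and `PerCandidateCost` and `best_of_n_cost` are hypothetical names; the only claim taken from the text is that all three stages are repeated for each of the N candidates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerCandidateCost:
    """Cost of producing and scoring one candidate, in arbitrary units."""

    video_generation: float
    simulator_replay: float
    probe_inference: float

    @property
    def total(self) -> float:
        return self.video_generation + self.simulator_replay + self.probe_inference


def best_of_n_cost(cost: PerCandidateCost, n: int) -> float:
    """Total cost when every stage is repeated for each of n candidates."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * cost.total


# Placeholder costs, not measurements.
cost = PerCandidateCost(video_generation=4.0, simulator_replay=1.0, probe_inference=0.5)
single = best_of_n_cost(cost, 1)
five = best_of_n_cost(cost, 5)
print(single, five, five / single)
```

The ratio depends only on N, not on how the cost splits across stages, which is why the overhead is exactly N-fold in this model.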


Implications and Future Directions

Key Lessons for Practitioners

  1. For synthetic data, "quality" matters as much as "quantity". Selecting well-formed data affects performance more than simply generating as much as possible.

  2. Visual diversity is close to a free lunch. Even relatively simple techniques such as I2I/V2V can yield substantial performance gains. They are especially effective at building up a VLA model's visual generalization.

  3. If you have a simulator, use it. Even if not for data generation, it is highly valuable for data verification.

  4. Separate synthetic data and real data with distinct embodiment tags. Even for the same robot, the distribution differs by data source.
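Lesson 4 can be sketched as a small tagging helper. The tag format `<robot>_<source>`, the `Episode` class, and the source labels `"real"` and `"neural"` are assumptions made for illustration; actual tag names depend on the training framework in use.

```python
from dataclasses import dataclass

VALID_SOURCES = {"real", "neural"}


@dataclass(frozen=True)
class Episode:
    robot: str   # e.g. "gr1" (hypothetical identifier)
    source: str  # "real" or "neural" (synthetic neural trajectory)


def embodiment_tag(episode: Episode) -> str:
    """Return a source-specific tag so real and synthetic data stay separate."""
    if episode.source not in VALID_SOURCES:
        raise ValueError(f"unknown data source: {episode.source!r}")
    return f"{episode.robot}_{episode.source}"


episodes = [Episode("gr1", "real"), Episode("gr1", "neural")]
tags = [embodiment_tag(e) for e in episodes]
print(tags)
```

The same robot ends up with two distinct tags, which lets the policy treat the two data sources as separate distributions instead of mixing them under one embodiment.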

Open Research Questions

  • Can action quality be verified without a simulator? For example, could a learned world model be used in place of the simulator?
  • Instead of filtering, can generation itself be improved? If a generative model could produce physically accurate video from the start, filtering would become unnecessary.
  • What are the scaling laws? How the quantity and quality of neural trajectories affect downstream policy performance has not yet been fully characterized.

Summary and Conclusion

RoboCurate presents a systematic solution to an important but relatively overlooked problem: quality control (curation) of synthetic robot data.

The core contribution in one line:

"Use the simulator as a referee to verify the action accuracy of synthetic data produced by a video generation model, while maximizing visual diversity with the I2I/V2V pipeline."

The approach shows consistent performance gains across three benchmarks, and the large improvement in the real-world humanoid experiment (+179.9%) in particular highlights the practical value of the work.

With neural trajectories becoming a central pillar of robot-learning data pipelines, the "generate-then-verify" paradigm proposed by RoboCurate is likely to be an important milestone for future synthetic-data research.

The most important message of the paper is this: good data comes from good filters.


Paper Information

  • Title: RoboCurate: Harnessing Diversity with Action-Verified Neural Trajectory for Robot Learning
  • Authors: Seungku Kim and 5 others (corresponding author: Suhyeok Jang)
  • Published: arXiv:2602.18742, February 21, 2026
  • Link: https://arxiv.org/abs/2602.18742

Copyright 2026, JungYeon Lee