📃 Play2Perfect

dexterity
rl
sim2real
assembly
in-hand-reorientation
contact
pretraining
IsaacSim
Play2Perfect: What Matters in Dexterous Play Pretraining for Precise Assembly?
Published

July 2, 2026

  • Paper Link (arXiv:2606.26428)

  • Project Page

  • Code (GitHub, MIT)

  • Authors: Tyler Ga Wei Lum*, Kushal Kedia*, C. Karen Liu†, Jeannette Bohg† (Stanford University, Cornell University)

  • arXiv preprint, 2026 (* equal contribution, † equal advising)

  1. 💡 A two-stage framework for precise assembly with a multi-fingered hand: instead of learning assembly directly, first pretrain the skill of playing with diverse objects via goal-conditioned RL, then finetune on assembly with sparse rewards.
  2. ⚙️ In IsaacSim, a play policy that moves procedurally generated primitive objects to random 6D goal poses is trained with massively parallel RL (24,576 envs). This prior is then specialized to contact-rich assembly on sparse goal sequences obtained by reversing the CAD model through "assembly-by-disassembly".
  3. 🎯 It is 33× more sample-efficient than from-scratch RL given dense, multi-stage rewards. With zero-shot sim-to-real transfer it reaches 60% success on tight insertion with 0.5 mm clearance and 50% or more on long-horizon multi-part assembly and screwing.

🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

Precise assembly is hard for a multi-fingered robot hand in two ways. Because it is contact-rich, demonstrations are difficult to collect through teleoperation, which blocks imitation learning. And because the reward is defined only by the final pose of the part, it is sparse, which makes RL exploration from scratch practically impossible. Prior work has therefore sidestepped the problem by "structuring" it with dedicated grippers, tools, and fixtures. The paper's claim is simple: before a robot can perfect an assembly, it must first learn to play with objects. It then systematically investigates which elements of play transfer to assembly.


Overview (Fig. 1): a single goal-conditioned play policy is pretrained to obtain a reusable prior for grasping, in-hand reorientation, and 6D pose control, and is then finetuned in CAD-derived sparse-reward assembly environments (tight insertion, screwing, multi-part assembly).

Key methodology:

(1) Dexterous Play Pretraining. Play is formulated as goal-conditioned RL. The policy $\pi_\theta(\bm{s}_t, \bm{o}_t, \bm{g}_t, \bm{\phi})$ takes robot proprioception $\bm{s}_t$, the current and goal object poses $\bm{o}_t, \bm{g}_t \in SE(3)$, and the geometry $\bm{\phi}$ encoded as 3D bounding-box dimensions, and controls the arm and hand jointly. The object must be moved through a randomly generated sequence of 6D goal poses: the first goal requires picking the object up (grasp + lift), and later goals force the pose to be changed inside the hand (in-hand reorientation). The reward is $r = r_{\mathrm{smooth}} + r_{\mathrm{grasp}} + \mathbb{I}_{\mathrm{grasped}}\, r_{\mathrm{goal}}$, and goal attainment is judged with a keypoint-based 6D pose distance. Four keypoints defined from the object dimensions $\mathbf{s}$ are transformed to the world frame, giving $d(\bm{o},\bm{g}) = \max_i \lVert \mathbf{o}_i - \mathbf{g}_i \rVert_2$. A sparse success bonus is paid when $d(\bm{o}_t,\bm{g}_t) < \epsilon$ with $\epsilon = 1\,\mathrm{cm}$. This single distance captures translation and rotation error at the same time.

(2) RL Finetuning on Assembly. The assembly CAD is reversed through assembly-by-disassembly to produce a goal sequence. The CAD gives the relative transform $\bm{T}^{f}_{p}$ of part $p^i$ inside its fixture $f^i$, so for the current fixture pose $\bm{f}_t^i$ the final goal is computed as $\bm{g}^i_M = \bm{f}^i_t\, \bm{T}^{f}_{p}$, which is invariant to randomization of the fixture placement. A small number of sparse intermediate contact goals are added, such as a pre-insertion pose just before insertion or goals at $90^\circ$ intervals along a screw thread. The finetuning reward removes all shaping for grasping, lifting, and alignment, leaving only the sparse bonus in $r_t = r_{\mathrm{smooth}} + r_{\mathrm{goal}}$. Approach, grasping, and alignment must all be inherited from the play prior.
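As a rough illustration of why the final goal $\bm{g}^i_M = \bm{f}^i_t\, \bm{T}^{f}_{p}$ does not depend on where the fixture is placed, the sketch below composes 4×4 homogeneous transforms in plain Python. The helper names (`pose`, `matmul4`, `inverse`, `final_goal`) and the numeric value of `T_F_P` are hypothetical and chosen only for the demo; this is not the authors' code.

```python
import math

def matmul4(A, B):
    """Product of two 4x4 homogeneous transforms stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def pose(yaw, x, y, z):
    """Rigid transform with a rotation about z and a translation (demo helper)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x], [s, c, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]

def inverse(T):
    """Inverse of a rigid transform: R^T and -R^T t."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(R_t[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return [R_t[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

# Hypothetical CAD-relative transform of the part inside its fixture.
T_F_P = pose(math.radians(90.0), 0.0, 0.0, 0.02)

def final_goal(fixture_pose):
    """g_M = f_t * T^f_p: the part's goal pose in the world frame."""
    return matmul4(fixture_pose, T_F_P)

# Two different (randomized) fixture placements.
fixture_a = pose(0.3, 0.50, 0.10, 0.0)
fixture_b = pose(-1.2, 0.35, -0.20, 0.0)

# Expressed in the fixture frame, both goals reduce to the same T^f_p.
rel_a = matmul4(inverse(fixture_a), final_goal(fixture_a))
rel_b = matmul4(inverse(fixture_b), final_goal(fixture_b))
```

The world-frame goal moves with the fixture, while the part-in-fixture relation stays fixed, which is the invariance the text refers to.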

Key results:

  • Sample efficiency. Even with dense, multi-stage rewards, from-scratch training produces zero successful rollouts on the four assembly tasks after 24 hours, whereas Play2Perfect reaches high success rates in 2–5 hours. On a simplified fixtured task, scratch (dense) needs 100+ hours to become near-perfect while Play2Perfect needs 4 hours, a 33× speedup.
  • Robustness. The strategy learned by scratch (dense) is a brittle shortcut that "balances" the object on the thumb: success is about 20% under a 10 N external force and 0% under larger forces. Play2Perfect stays above 75% even under the largest force.
  • Precision (the need for finetuning). Without finetuning, Play-only reaches 75% in sim at 40 mm clearance but nearly 0% at 4 mm. Play2Perfect reaches 95% at 4 mm, 92% at 1 mm, and still 80% at 0.2 mm, which is tighter than the training distribution.
  • Sim-to-real (zero-shot). Tight-Insertion: 10/10 at 10 mm, 9/10 at 2 mm, 6/10 at 0.5 mm. Assemble-Beam: Step 1 8/10, Step 2 7/10. Screw-Leg: insertion 7/10, screwing 5/10. Completion times, including approach, grasp, transport, and contact, are 6.8–15.6 s.

Conclusion: one lesson runs through all the ablations. Play pretraining transfers best to assembly when it teaches precise, finger-driven 6D in-hand control rather than "picking up and moving objects". Play that only moves the arm with a fixed grasp is of little use.


🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

In one line

"Play before you Perfect": a two-stage recipe that pretrains a task-agnostic dexterous play prior with RL and specializes it to precise assembly with sparse-reward RL, together with a systematic study of which design choices in that recipe matter for transfer.

Background: why precise assembly with a multi-fingered hand is hard

Multi-fingered robot hands that aim for human-level speed and dexterity must control many degrees of freedom through contact, so domains such as precise assembly have remained a blind spot for current robot learning. The authors point to two barriers.

  • Imitation learning: assembly is contact-rich, which makes teleoperation hard. The embodiment gap between operator and robot and the lack of tactile feedback make it difficult to collect high-quality demonstrations for contact-heavy tasks. As a result, most multi-fingered IL work stays with low-precision pick-and-place.
  • Reinforcement learning: the assembly reward is sparse, defined only by the final pose of the part. An agent starting from a random policy must stumble on grasp → in-hand reorientation → alignment → contact insertion, all in sequence, before it sees its first reward. Sim-to-real RL that depends on dense reward shaping stalls here.

Earlier progress came from "structuring" the problem: simplifying grasping and insertion with custom fixtures, easing control with dedicated tools and end-effectors, or using parallel grippers so that teleoperation becomes feasible and IL or RL finetuning can be added. These approaches, however, demand hardware and environment engineering for every assembly, and parallel grippers limit speed and dexterity. Instead of this structuring, Play2Perfect plants a general play prior to get around the exploration problem.

The idea of "learning from play" is not new in itself (MimicPlay, Learning latent plans from play, and others). What was unclear is which parts of that recipe matter for finetuning on precise assembly, and the paper's contribution is to answer exactly that question.

Method in detail

1. Dexterous Play Pretraining: four design axes

Play is set up as a goal-conditioned RL problem. The authors break "what matters" into four design axes (Fig. 2).


The four axes of play pretraining (Fig. 2): Object Diversity (diverse primitive objects), Training Objective (reaching 6D poses, with success indicator $\mathbb{I}[d(\bm{o}_t,\bm{g}_t)<\epsilon]$), Trajectory Diversity (random goal trajectories), and Goal Precision (a small threshold $\epsilon$).
  • Object Diversity. Cuboid and cylinder primitives are generated procedurally (more precisely, two cuboid/capsule primitives are rigidly joined). The primary component's length and cross-section are sampled from $[5,30]$ cm and the secondary component's length from $[1,15]$ cm. Density is randomized per component ($[300,600]$ and $[300,2000]\,\mathrm{kg/m^3}$) to perturb the center of mass and inertia. The intent is to force a control strategy that is not tied to a single geometry or mass.
  • Training Objective. The object must be moved through a sequence of 6D goal poses. The first goal is grasp + lift, and later goals are pose control inside the hand. Translation teaches workspace motion and rotation teaches in-hand reorientation. The default is the keypoint-based 6D pose distance $d_{\mathrm{pose}}$.
  • Trajectory Diversity. Goal sequences are generated randomly in every episode instead of following fixed trajectories. The first goal is sampled broadly over the workspace, and each later goal is sampled near the previous one with a large rotation ($\le 90^\circ$) and a small translation ($\le 0.1$ m). This induces repeated in-hand reorientation rather than arm motion with a fixed grasp.
  • Goal Precision. The success threshold $\epsilon$ (default 1 cm) sets the precision being trained. The smaller it is, the more precisely the pose has to be controlled in-hand, which produces a prior suited to tight-clearance assembly.
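The Trajectory Diversity rule (later goals near the previous one, rotation up to $90^\circ$, translation up to 0.1 m) can be sketched as below. The text gives only the bounds, so the uniform sampling of angle and shift, the quaternion convention `(w, x, y, z)`, and the function names are all assumptions made for this illustration.

```python
import math
import random

def random_unit_vector(rng):
    """Uniformly random direction via normalized Gaussian samples."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(x * x for x in v))
        if n > 1e-9:
            return [x / n for x in v]

def quat_mul(a, b):
    """Hamilton product of quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def next_goal(prev_pos, prev_quat, rng, max_angle=math.pi / 2, max_shift=0.1):
    """Sample a goal near the previous one: rotation <= 90 deg, shift <= 0.1 m."""
    axis = random_unit_vector(rng)
    angle = rng.uniform(0.0, max_angle)      # assumed uniform in angle
    half = angle / 2.0
    delta = (math.cos(half), *(math.sin(half) * a for a in axis))
    direction = random_unit_vector(rng)
    shift = rng.uniform(0.0, max_shift)      # assumed uniform in distance
    pos = tuple(p + shift * d for p, d in zip(prev_pos, direction))
    return pos, quat_mul(delta, prev_quat)

rng = random.Random(0)
start_pos, start_quat = (0.0, 0.0, 0.3), (1.0, 0.0, 0.0, 0.0)
samples = [next_goal(start_pos, start_quat, rng) for _ in range(200)]
```

Because the translation bound is small and the rotation bound is large, most of the change between consecutive goals has to come from reorienting the object in the hand rather than from moving the arm.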

Keypoint-based pose representation (the key equation). Each 6D pose is represented by four keypoints in the object's local frame. For dimensions $\mathbf{s}=[s_x,s_y,s_z]$,

$$\mathcal{K}(\mathbf{s}) = \left\{ \big[\tfrac{s_x}{2},\tfrac{s_y}{2},\tfrac{s_z}{2}\big],\ \big[\tfrac{s_x}{2},-\tfrac{s_y}{2},-\tfrac{s_z}{2}\big],\ \big[-\tfrac{s_x}{2},\tfrac{s_y}{2},-\tfrac{s_z}{2}\big],\ \big[-\tfrac{s_x}{2},-\tfrac{s_y}{2},\tfrac{s_z}{2}\big] \right\}$$

Each keypoint $\mathbf{k}_i \in \mathcal{K}(\mathbf{s})$ is mapped to the world frame by $\mathbf{o}_i = R_o \mathbf{k}_i + \mathbf{t}_o$ (and likewise $\mathbf{g}_i = R_g \mathbf{k}_i + \mathbf{t}_g$ for the goal), and the distance

$$d(\bm{o},\bm{g}) = \max_i \lVert \mathbf{o}_i - \mathbf{g}_i \rVert_2$$

merges translation and rotation into a single scalar. The keypoints used for observation are defined from the object's actual dimensions, while those used for the reward are defined from fixed dimensions $\mathbf{s}^{\mathrm{rew}}=[0.14,0.03,0.03]$ m, which keeps the translation/rotation trade-off constant across objects.
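A minimal sketch of this distance, assuming poses are given as a 3×3 rotation matrix plus a translation vector (the pose format, function names, and the demo poses are not from the paper):

```python
import math

REWARD_DIMS = (0.14, 0.03, 0.03)  # fixed reward box from the text, in metres

def keypoints(dims):
    """Four alternating corners of a box with side lengths `dims` (local frame)."""
    sx, sy, sz = (d / 2.0 for d in dims)
    return [(sx, sy, sz), (sx, -sy, -sz), (-sx, sy, -sz), (-sx, -sy, sz)]

def to_world(R, t, k):
    """o_i = R k_i + t for a 3x3 rotation R (nested lists) and translation t."""
    return tuple(sum(R[r][c] * k[c] for c in range(3)) + t[r] for r in range(3))

def keypoint_distance(pose_o, pose_g, dims=REWARD_DIMS):
    """d(o, g): largest distance between corresponding world-frame keypoints."""
    (R_o, t_o), (R_g, t_g) = pose_o, pose_g
    return max(math.dist(to_world(R_o, t_o, k), to_world(R_g, t_g, k))
               for k in keypoints(dims))

def rot_z(theta):
    """Rotation about the z axis, used only for the demo poses below."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

goal = (rot_z(0.0), (0.0, 0.0, 0.0))
shifted = (rot_z(0.0), (0.005, 0.0, 0.0))               # 5 mm translation error only
turned = (rot_z(math.radians(10.0)), (0.0, 0.0, 0.0))   # 10 degree yaw error only

d_shift = keypoint_distance(shifted, goal)
d_turn = keypoint_distance(turned, goal)
```

With the fixed reward dimensions, the 5 mm shift stays inside the 1 cm threshold, while a 10° yaw error alone already gives a distance of about 1.2 cm and fails it. That is one way to see how a single $\epsilon$ constrains orientation as well as position.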

Reward terms. Before the grasp, fingertip approach and lifting are rewarded. After the grasp ($\mathbb{I}_{\mathrm{grasped}}=1$, switched on once the object is lifted 10 cm), progress toward the current 6D goal is rewarded:

$$r_{\mathrm{goal}} = \lambda_{\mathrm{goal}} \max\!\big(d^{*} - d(\bm{o}_t,\bm{g}_t),\ 0\big) + B_{\mathrm{succ}}\,\mathbb{I}[d(\bm{o}_t,\bm{g}_t)<\epsilon]$$

Here $d^{*}$ is the smallest distance reached since the current goal was sampled (a potential-based form). The large sparse bonus $B_{\mathrm{succ}}=1000$ marks the goal as "hit", after which the next goal takes over.
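The role of $d^{*}$ is easiest to see in code. The sketch below tracks the best distance so far and pays only for new progress. The value of `lam` and the choice to initialize $d^{*}$ with the distance at the moment the goal is sampled are assumptions, and the class name is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GoalProgress:
    lam: float = 1.0            # hypothetical value for lambda_goal
    bonus: float = 1000.0       # B_succ from the text
    eps: float = 0.01           # success threshold, metres
    best: float = float("inf")  # d*: smallest distance reached for this goal

    def reset(self, initial_distance):
        """Call when a new goal is sampled (assumed initialization of d*)."""
        self.best = initial_distance

    def step(self, d):
        """Return (r_goal, reached) for the current keypoint distance d."""
        progress = max(self.best - d, 0.0)   # only improvement over d* pays
        self.best = min(self.best, d)
        reached = d < self.eps
        reward = self.lam * progress + (self.bonus if reached else 0.0)
        return reward, reached

tracker = GoalProgress()
tracker.reset(0.10)
r_closer, _ = tracker.step(0.08)       # new best: paid for 2 cm of progress
r_back, _ = tracker.step(0.09)         # moved away: no reward, d* unchanged
r_hit, reached = tracker.step(0.005)   # progress from 0.08 plus success bonus
```

Since moving away and coming back earns nothing, the policy cannot farm reward by oscillating around the goal, which is the practical benefit of the running-minimum form.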

2. RL Finetuning on Assembly: extracting sparse rewards from CAD


Assembly-by-Disassembly (Fig. 3): parts are removed one by one from the completed CAD assembly to build a disassembly order, which is then reversed to obtain a sparse goal sequence for each assembly step (the final assembled pose plus intermediate contact goals such as a pre-insert pose).

Each assembly task is defined by the $K$ rigid parts in the CAD, $\mathcal{A}=\{p^i\}_{i=1}^K$, and their final poses. Assembly-by-disassembly finds an order in which parts can be removed. Reversing it yields the assembly sequence, and each step becomes the problem of inserting part $p^i$ into the fixture $f^i$ formed by the parts that are already assembled. Each step is instantiated as an RL environment with randomized part and fixture poses.
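The ordering logic can be sketched as follows. In the paper the removability check is geometric. Here a hypothetical `blocked_by` map (which parts must come out before a given part) stands in for it, and the part names are invented for the example.

```python
def disassembly_order(parts, blocked_by):
    """Repeatedly remove a part that no remaining part blocks."""
    remaining = set(parts)
    order = []
    while remaining:
        free = sorted(p for p in remaining
                      if not (blocked_by.get(p, set()) & remaining))
        if not free:
            raise ValueError("no removable part: the assembly is interlocked")
        part = free[0]            # deterministic tie-break for the sketch
        order.append(part)
        remaining.remove(part)
    return order

def assembly_sequence(parts, blocked_by):
    """Assembly order is the disassembly order reversed."""
    return list(reversed(disassembly_order(parts, blocked_by)))

# Hypothetical three-part assembly: the pin holds the beam, the beam sits on the base.
parts = ["base", "beam", "pin"]
blocked_by = {"base": {"beam", "pin"}, "beam": {"pin"}}

removal = disassembly_order(parts, blocked_by)
build = assembly_sequence(parts, blocked_by)
```

Each entry in `build` after the first would then become one RL environment: insert that part into the fixture formed by the parts before it.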

  • Sparse reward. The relative transform $\bm{T}^{f}_{p}$ from the CAD gives the final goal $\bm{g}^i_M = \bm{f}^i_t \bm{T}^{f}_{p}$, which is invariant to the random fixture placement. Insertion tasks add an aligned pre-insertion pose at the point where contact begins, and screwing tasks add goals at $90^\circ$ intervals along the thread.
  • Shaping removed. The finetuning reward is $r_t = r_{\mathrm{smooth}} + r_{\mathrm{goal}}$: the grasp, lift, and pose-progress rewards are all dropped and only the sparse part remains. Bringing the part within $\epsilon=1$ cm of an intermediate goal advances to the next goal, and reaching the final goal counts as success. At the final goal a retraction bonus is added for letting go and backing away (paid when the palm is at least 0.2 m from the object), which blocks the shortcut of holding the pose by continuing to grip the part.
  • Contact geometry. Most geometry is approximated by convex decomposition, but this distorts the effective clearance of narrow holes and mating surfaces. A hybrid scheme therefore represents only the contact-critical holes and insertion features as a signed distance field (SDF) at resolution 256, which secures precise collision geometry while saving memory.
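A hedged sketch of the sparse goal-stepping logic described in the list above. The smoothness term is omitted, the retraction bonus magnitude is invented, reusing the play-stage bonus of 1000 for each goal is an assumption, and so is the requirement that the part still sits within $\epsilon$ of the final goal when the hand retracts.

```python
from dataclasses import dataclass

@dataclass
class SparseAssemblyReward:
    num_goals: int
    eps: float = 0.01             # 1 cm threshold from the text
    goal_bonus: float = 1000.0    # assumed equal to the play-stage B_succ
    retract_bonus: float = 100.0  # hypothetical magnitude
    retract_dist: float = 0.2     # palm-to-object distance from the text, metres
    idx: int = 0                  # index of the goal currently being pursued
    retracted: bool = False

    def step(self, dist_to_goal, palm_to_object):
        """dist_to_goal refers to the current goal, or the final one once done."""
        reward = 0.0
        if self.idx < self.num_goals and dist_to_goal < self.eps:
            reward += self.goal_bonus   # sparse bonus, then advance
            self.idx += 1
        done = self.idx == self.num_goals
        if (done and not self.retracted and dist_to_goal < self.eps
                and palm_to_object >= self.retract_dist):
            reward += self.retract_bonus  # part stays put with the hand away
            self.retracted = True
        return reward, done

r = SparseAssemblyReward(num_goals=2)         # e.g. pre-insertion pose, final pose
out_far = r.step(0.05, 0.0)                   # not within 1 cm yet
out_pre = r.step(0.005, 0.0)                  # pre-insertion goal reached
out_final = r.step(0.004, 0.05)               # final goal reached, still holding
out_retract = r.step(0.004, 0.25)             # hand withdrawn, part in place
out_after = r.step(0.004, 0.30)               # nothing further to collect
```

Nothing here rewards reaching, grasping, or aligning, so a policy can only collect these bonuses if those behaviors already exist, which is the role the play prior plays.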

3. Training and sim-to-real details

  • Algorithm. Both play and finetuning are trained with SAPG (Split and Aggregate Policy Gradients, a population-based variant of PPO), which prior work found to outperform PPO on dexterous play. The actor is an LSTM[1024] + MLP that integrates interaction history to infer unobserved object properties. Training uses an asymmetric actor–critic in which only the critic sees privileged information (noise-free and delay-free observations, velocities, progress features).
  • Hardware. A 22-DoF Sharpa five-fingered hand on a 7-DoF KUKA iiwa 14 arm (29 DoF in total). The policy takes a 140-dimensional observation and outputs 29 joint position commands (delta for the arm, absolute for the hand).
  • Compute. IsaacSim on a single NVIDIA RTX A6000. Physics runs at 120 Hz and the policy at 60 Hz. Play pretraining uses 24,576 parallel envs for 7 days, and assembly finetuning uses 12,228 envs for 1 day (contact modeling needs more memory, so the env count is reduced).
  • Domain randomization. Action latency, proprioception delay, noise on the current and goal object poses, object dimension scale, table height, and external forces and torques (20 N, 2 N·m) are all randomized.
  • Real-world perception. At deployment the CAD meshes are reused so that FoundationPose can track the 6D poses of the part and the fixture. The policy runs closed-loop at 60 Hz and pose tracking at 30 Hz. No separate scripted controllers for insertion, screwing, or recovery are used.
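The "delta for the arm, absolute for the hand" action convention from the Hardware item can be illustrated as follows. The text states only that convention and the 7 + 22 split, so the arm-first ordering, the $[-1, 1]$ action range, the `arm_step` scale, and the joint limits below are assumptions for the sketch.

```python
ARM_DOF, HAND_DOF = 7, 22

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def joint_targets(action, arm_q, arm_limits, hand_limits, arm_step=0.05):
    """Map a 29-dim action in [-1, 1] to joint position targets.

    Arm: target = current joint angle + scaled delta (relative command).
    Hand: target = action rescaled into the joint range (absolute command).
    """
    if len(action) != ARM_DOF + HAND_DOF:
        raise ValueError("expected a 29-dimensional action")
    arm = [clip(q + arm_step * clip(a, -1.0, 1.0), lo, hi)
           for a, q, (lo, hi) in zip(action[:ARM_DOF], arm_q, arm_limits)]
    hand = [lo + 0.5 * (clip(a, -1.0, 1.0) + 1.0) * (hi - lo)
            for a, (lo, hi) in zip(action[ARM_DOF:], hand_limits)]
    return arm + hand

# Hypothetical limits and state, for demonstration only.
arm_limits = [(-2.0, 2.0)] * ARM_DOF
hand_limits = [(0.0, 1.5)] * HAND_DOF
arm_q = [0.1] * ARM_DOF

hold = joint_targets([0.0] * 29, arm_q, arm_limits, hand_limits)
push = joint_targets([1.0] * 29, arm_q, arm_limits, hand_limits)
```

A zero action keeps the arm where it is but drives every finger joint to mid-range, which shows the difference between the two command types.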

Intuition: why play solves assembly exploration

The fundamental difficulty of sparse-reward assembly is the "distance to the first reward". A random policy has to chain grasp → reorient → align → insert by chance before it ever sees a signal. The play prior already makes the early links of that chain (a stable grasp, 6D pose control inside the hand) proficient, so finetuning can concentrate its exploration on the final contact and alignment interactions. In the authors' words, the prior "narrows exploration to the final contact-rich interactions required for success". As for why in-hand control in particular matters: a skill that only moves the arm with a fixed grasp does not contain the finger-level fine control needed for alignment, regrasping, and turning a screw during assembly. This is the core intuition running through every ablation.

Experiments

The experiments are organized around four questions: ① can dense rewards replace play, ② which design choices in play matter, ③ is RL finetuning really necessary for precise assembly, and ④ does the result transfer to the real world. The tasks are Tight-Insertion (T-peg), Assemble-Beam (a multi-part beam based on Fabrica), and Screw-Leg (screwing in a furniture leg, based on FurnitureBench). The original parts are small because they were designed for parallel grippers, so they were 3D-printed at 3× scale to suit the multi-fingered hand and visual tracking. The metrics are success rate (final pose reached within $\epsilon=1$ cm) and completion time, measured over 500 rollouts in sim and 10 rollouts per task in the real world.

4.1 Can dense rewards replace play? No


Learning efficiency (Fig. 4): on the four assembly tasks, Play2Perfect reaches high success rates in 2–5 hours starting from the shared prior. From-scratch training, whether sparse or dense, shows zero progress after 24 hours.

On all four tasks, neither scratch baseline (sparse or dense) produces a successful rollout after 24 hours. Scratch only becomes learnable on the simplified Tight-Insertion (Fixtured) task, where the T-peg starts upright in a fixture. Even there, scratch (dense) needs 100+ hours to become near-perfect while Play2Perfect needs 4 hours, a 33× speedup.


Robustness (Fig. 5): (left) 33× faster learning; (middle) scratch (dense) learns a shortcut that balances the object on the thumb and yields an unstable grasp, while Play2Perfect grasps stably with several fingers; (right) success rate under external force perturbations, where scratch collapses and Play2Perfect holds.

More important is the quality of the learned strategy. Scratch (dense) learns a brittle shortcut that "balances" the object on the thumb, and its success rate falls to about 20% under a 10 N external force and to 0% under larger forces. Play2Perfect stays above 75% even under the largest perturbation. The play prior is therefore not only faster but also instills more robust grasping and recovery strategies.

4.2 Which design choices matter?


Ablation (Fig. 6): downstream success rate averaged over four tasks and three seeds. All four axes (Object Diversity, 6D Objective, Trajectory Diversity, Goal Precision) affect transfer, and the blue curve (the default setting) is the fastest and highest.
  • Object Diversity (10/100/1000). Diversity improves transfer but with diminishing returns: 100 and 1000 objects give similar learning speed and final performance, so a "reasonably diverse" object set is enough for these downstream tasks.
  • Training Objective (6D vs Translation-only vs Rotation-only). Orientation control is decisive. Translation-only learns just grasp and lift, fails to build an in-hand reorientation prior, and fails at assembly. Rotation-only transfers fairly well but is slightly slower than full 6D, which the authors attribute to having fewer chances to practice translation and reorientation together.
  • Trajectory Diversity (random vs fixed 10/100). Fixed sets of 10 and 100 trajectories perform similarly, and online random trajectories are the fastest: broad coverage of goal-pose transitions matches downstream assembly finetuning better.
  • Goal Precision (1/5/10 cm). Precise goals matter. The loose 10 cm threshold can be satisfied without accurate pose control and does not transfer, and 5 cm eventually learns but more slowly than 1 cm. For tight-clearance assembly, precise play produces the right prior.

The per-task results in the appendix (Fig. 8) reconfirm the same conclusion: effective play means learning precise, finger-driven 6D object control, not "pick up and move".

4.3 Is finetuning needed for precise assembly? Yes


Tight Insertion (Fig. 7): Play2Perfect vs the frozen Play-only policy. (Left) Both succeed at loose clearances, but only Play2Perfect succeeds at tight ones. (Top right) In sim, Play2Perfect stays robust as clearance narrows while Play-only collapses. (Bottom right) The real-world trend is the same.

Without finetuning, Play-only solves only the loosest insertions: in sim it goes from 75% at 40 mm clearance to nearly 0% at 4 mm. Play2Perfect holds up as precision increases, with 95% at 4 mm, 92% at 1 mm, and 80% even at 0.2 mm, which is tighter than the training distribution. The real-world results match: at 10 mm Play2Perfect reaches 100% vs 60% for Play-only, at 2 mm 90% vs 20%, and at 0.5 mm 60% vs 0%. Qualitatively, Play-only heads straight for the goal and treats contact as a disturbance, whereas Play2Perfect searches locally around the hole, makes corrective motions under contact, and commits to insertion once aligned. Play provides useful grasping and reorientation, but finetuning is essential to turn the prior into a precise assembly policy.

4.4 Sim-to-Real (zero-shot)

Poses are tracked with FoundationPose and the policy is deployed without any real-world finetuning. Tight-Insertion: 10/10 at 10 mm, 9/10 at 2 mm, 6/10 at 0.5 mm. Assemble-Beam: Step 1 8/10, Step 2 7/10 (each under 7 s on average). Screw-Leg: insertion 7/10, full screwing 5/10 ($15.6\pm2.9$ s on success). Completion time covers approach from the home pose, grasping, reorientation, transport, and final contact. The fast execution illustrates both the advantage of multi-fingered assembly and that RL discovers efficient manipulation strategies. According to the qualitative analysis in the appendix, the policy produces closed-loop recovery behaviors without any scripting, such as regrasping after a drop, local search under contact, and screwing by rotating the leg directly inside the hand. With a parallel gripper these motions would have required repositioning and regrasping or rotating the whole arm. Most failures occur in the final contact phase, caused by perception degraded by occlusion and by sim-to-real mismatch in contact dynamics.

Supplementary: independent reproduction by claude-curio (offline eval)

⚙️ This block is not a result from the authors. It is an independent reproduction that claude-curio ran with the public checkpoints (RTX 5090, 256 parallel envs, headless offline evaluation). It differs in nature from the paper's sim-to-real table (n=10 per task) and should be read separately from the paper's claims.

Using the released checkpoints, each task was rolled out a few hundred times in sim to measure the success rate.

| Task | Reproduced success rate (offline, sim) | Paper, real (n=10) |
|---|---|---|
| Tight insertion (L-peg, 0.5 mm) | 96.9% (n≈229) | 60% (6/10) |
| Beam assembly step 1 | 98.8% (n≈241) | 80% (8/10) |
| Beam assembly step 2 | 93.6% (n≈220) | (not listed) |
| Screwing | 65.0% (n≈254) | 50% (5/10) |

Interpretation (not a criticism of the paper). The difficulty ranking across tasks matches the paper: screwing is the hardest, and insertion and beam assembly are easier. That this qualitative ranking was reproduced is the most meaningful signal. The generally higher absolute success rates, on the other hand, should not be read as superior performance. They are explained by (a) the paper's n=10 being a noisy point estimate (binomial standard deviation $\approx 13$ pp) with a wide confidence interval, (b) the possibility that the public checkpoints are the authors' best runs, and (c) differences between offline eval and the physical (sim-to-real) runs in initial pose distribution and success tolerance. Above all, these numbers are only in-sim policy success rates. They leave out the perception error, contact-dynamics gap, and occlusion that the paper's real-world table has to absorb. The physical clearance ablation (0.5/2/10 mm) was outside the scope of the reproduction because the only public checkpoint is for the L-peg.

In short, this reproduction only confirms that the prior → finetuned policy behaves in sim with the same difficulty structure as in the paper. It neither replaces nor refutes the paper's core contributions: sim-to-real transfer, 33× sample efficiency, and the design lessons for the play prior.
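To make the point about n=10 concrete, the sketch below computes the binomial standard error and a 95% Wilson score interval for the paper's real-world counts. The choice of the Wilson interval is this review's, not something the paper reports, and the function names are illustrative.

```python
import math

def standard_error(successes, n):
    """Binomial standard error of the observed success rate."""
    p = successes / n
    return math.sqrt(p * (1.0 - p) / n)

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half

# Real-world counts quoted from the paper's sim-to-real results.
real_counts = {
    "Tight insertion 0.5 mm": 6,
    "Beam assembly step 1": 8,
    "Screwing": 5,
}

for task, k in real_counts.items():
    lo, hi = wilson_interval(k, 10)
    se = standard_error(k, 10)
    print(f"{task}: {k}/10, SE={se:.3f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

For 6/10 the interval runs from roughly 31% to 83%, so a real-world 60% and an in-sim rate in the 90s are hard to compare directly even before accounting for the perception and contact gaps.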

A critical look

Strengths

  • A clean reframing of the problem. Decomposing "sparse-reward assembly exploration" into "play prior + narrow finetuning" is conceptually tidy, and the comparison showing a 33× speedup and better robustness against a scratch baseline given dense, multi-stage rewards is convincing. Giving the baseline the more favorable dense reward makes the comparison fairer.
  • A systematic ablation of "what matters". Varying each of the four axes in a controlled way, rechecking per task (Fig. 8), and converging on the single lesson of "precise in-hand 6D control" is the paper's real contribution. It offers portable insight rather than a bare SOTA claim.
  • Demanding sim-to-real evidence. Zero-shot insertion at 0.5 mm clearance with 60% success, plus screwing and multi-part assembly done closed-loop without scripts, are rare results in multi-fingered precise assembly. Details such as the hybrid SDF (high resolution only at contact features) and the retraction bonus show practical sense.

Weaknesses and limitations

  • Short-horizon skills that depend on external specification. As the authors acknowledge, task sequencing, the choice of the active part, and the goal poses are all supplied externally, and the policy is finetuned per task or benchmark family. This is learning of "short assembly skills", not a fully autonomous assembly pipeline.
  • Perception bottleneck. Real-world performance depends heavily on FoundationPose's 6D estimates. Tracking wavers under fast motion, occlusion, and visually similar objects, and the near-$90^\circ$ symmetry of the Screw-Leg part caused the rotation direction to be misidentified, so colored tape had to be added to break the symmetry. The policy does not directly observe the fixture or surrounding geometry beyond the goal pose, so it has no scene awareness (the authors suggest visual and tactile observations as future work).
  • Asymmetric compute cost. Play pretraining takes 24,576 envs and 7 days, and how well this prior can be reused for a new hand or arm embodiment, or for a very different family of objects, is outside the paper's scope. How far the benefit of "train once, reuse across assemblies" generalizes beyond a setup limited to 3×-enlarged, aligned CAD parts in sim remains open.
  • Contact sim-to-real gap. The real fixture is taped onto foam and moves under contact, whereas in sim it is rigid and fixed, so failures in which a corrective motion does not produce the expected relative motion are never observed in sim. Contact-dynamics modeling still sets the performance ceiling.

Positioning relative to related work

  • Play and foundation controllers. The closest line of work learns task-agnostic play controllers across diverse objects, as in DexterityGen and SimToolReal. Those are deployed zero-shot (combined with teleoperation or human demonstrations at test time) and are weak at precise contact assembly. Play2Perfect differs in treating play as pretraining and specializing it to precise assembly through sparse-reward finetuning.
  • Sim-to-real dexterous RL. The work extends the DexTreme line and ViserDex (in-hand reorientation from monocular RGB), which solve grasping and in-hand reorientation in free space, and tries to carry free-space skills over to contact-rich assembly. Moving to contact assembly previously required dense rewards or a warm start from human trajectories or teleoperation, and replacing these with a play prior is the key point.
  • Precise and contact-rich assembly. The work builds on dense-reward assembly such as IndustReal and AutoMate, the Fabrica and FurnitureBench benchmarks, and assembly-by-disassembly (Assemble Them All). To a line of work that relied on structuring with parallel grippers and fixtures, it adds the axis of "multi-fingered hand + task-agnostic prior".

Summary

Play2Perfect's message lies less in the method than in the dissection of the recipe. Instead of solving precise assembly directly with RL, it pretrains play (moving diverse objects to random 6D goals) with massively parallel RL, and that prior narrows sparse-reward finetuning down to the final contact interactions needed for success. It is 33× more efficient than a scratch baseline given dense, multi-stage rewards, and it achieves 60% on 0.5 mm clearance insertion as well as multi-part assembly and screwing, all zero-shot. Every ablation converges on one point: play that forces precise, finger-driven 6D in-hand control, rather than arm motion with a fixed grasp, transfers best to assembly. Autonomous sequencing, robust perception, and contact sim-to-real remain open problems, but the "play before you perfect" reframing offers practical leverage on the exploration problem in multi-fingered precise manipulation.

Copyright 2026, JungYeon Lee