📃 From Simple to Complex Skills Review

dexterous manipulation
hierarchical RL
From Simple to Complex Skills: The Case of In-Hand Object Reorientation
Published

March 23, 2026

  • Paper Link

  • Project Link

  • Haozhi Qi, Brent Yi, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik

  1. 🤖 For in-hand object reorientation, the paper proposes a hierarchical policy that reuses pre-trained low-level skills, addressing both the sim-to-real gap and training efficiency.
  2. 🦾 The system also introduces a generalizable object pose estimator, which uses the low-level skill policy's feedback and residual actions to estimate pose accurately in complex manipulation settings.
  3. ✨ In experiments, the proposed approach converges faster and performs better than a baseline trained from scratch, and it shows strong robustness and generalization in real-world sim-to-real transfer across a variety of objects.

๐Ÿ” Ping Review

๐Ÿ” Ping โ€” A light tap on the surface. Get the gist in seconds.

์ด ๋…ผ๋ฌธ์€ ๋กœ๋ด‡์˜ In-Hand Object Reorientation์ด๋ผ๋Š” ๋ณต์žกํ•œ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด ๊ธฐ์กด์— ํ•™์Šต๋œ(pre-trained) ์ €์ˆ˜์ค€(low-level) ์Šคํ‚ฌ์„ ์žฌ์‚ฌ์šฉํ•˜๋Š” ๊ณ„์ธต์  ์ •์ฑ…(hierarchical policy) ์‹œ์Šคํ…œ์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ์‹œ๋ฎฌ๋ ˆ์ด์…˜์—์„œ ์ •์ฑ…์„ ํ•™์Šตํ•˜๊ณ  ์‹ค์ œ ์„ธ๊ณ„๋กœ ์ „์ด(transfer)ํ•˜๋Š” ๊ฒƒ์ด dexterous manipulation์—์„œ ์œ ๋งํ•œ ์ ‘๊ทผ ๋ฐฉ์‹์ด์ง€๋งŒ, ๊ฐ ์ƒˆ๋กœ์šด ์ž‘์—…์— ๋Œ€ํ•ด sim-to-real gap์„ ๋ฉ”์šฐ๋Š” ๋ฐ์—๋Š” reward engineering, hyperparameter tuning, system identification๊ณผ ๊ฐ™์€ ์ƒ๋‹นํ•œ ์ธ๊ฐ„์˜ ๋…ธ๋ ฅ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ๋ณธ ์—ฐ๊ตฌ๋Š” ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๊ณ„์ธต์  ์ ‘๊ทผ ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

Core Methodology:

The system has two main components: a Planner Policy (\pi_{plan}) and a Skill Policy (\pi_{skill}).

  1. Skill Policy (\pi_{skill}):
    • This is a pre-trained low-level skill based on the In-Hand Object Rotation policy [6]. It learns to rotate an object about a specified rotation axis k.
    • The input o_{skill_t} of \pi_{skill} is a temporal sequence containing the robot's proprioception (joint positions \theta_{t-T:t} and commanded joint targets a_{skill_{t-T-1:t-1}}) and depth-image embeddings d_{t-T:t} from an RGB-D camera, using a history of T=30 timesteps.
    • \pi_{skill} outputs raw joint position targets a_{skill_t} for the robot, and additionally outputs a feature vector z_t that represents the object's physical properties and shape. This z_t is used as feedback to the high-level policy, and it encodes the object's geometry from visual information.
  2. Planner Policy (\pi_{plan}):
    • This is the high-level policy, defined as a_{plan_t} = \pi_{plan}(o_{plan_t}, q_{goal_t}, z_t).
    • The input o_{plan_t} of \pi_{plan} contains the following:
      • The object state sequence s_{t-5:t} (3D position p_t and orientation expressed as a unit quaternion q_t).
      • The relative transform between the object and the goal orientation, \zeta_{t-5:t} = \Delta(q_t, q_{goal_t}) = q_{goal_t} \cdot \bar{q}_t.
      • The planner actions from previous timesteps, a_{plan_{t-6:t-1}}.
      • Most importantly, the feedback z_t provided by the low-level skill policy \pi_{skill}. It lets \pi_{plan} perceive how the low-level skill is responding and correct errors.
    • \pi_{plan} is a 3-layer MLP (Multi-Layer Perceptron) with ELU (Exponential Linear Unit) activations.
    • The output of \pi_{plan} is a 7-dimensional categorical distribution: one of the six canonical rotation axes (\pm x, \pm y, \pm z), plus an additional STOP command. Quaternions are converted to 6D representations before being fed to the network.
    • Residual Actions (a_{res_t}): besides the selected rotation axis, \pi_{plan} outputs a residual action a_{res_t} that complements the output of the low-level skill. The action finally sent to the robot is a_t = a_{res_t} + a_{skill_t}. The residual lets the planner policy compensate for the limits of the low-level skill and apply additional error correction.
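The planner inputs described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes quaternions are stored as (w, x, y, z), takes the 6D representation to be the first two columns of the rotation matrix, and invents the ordering of the seven action labels in `ACTIONS`. The greedy `pick_action` is also an assumption (a trained policy would sample from the categorical distribution).

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def quat_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def relative_rotation(q_t, q_goal):
    """zeta = q_goal * conj(q_t): the rotation from current to goal orientation."""
    return quat_mul(q_goal, quat_conj(q_t))

def quat_to_6d(q):
    """First two columns of the rotation matrix, flattened to 6 numbers."""
    w, x, y, z = q
    col0 = (1 - 2 * (y * y + z * z), 2 * (x * y + w * z), 2 * (x * z - w * y))
    col1 = (2 * (x * y - w * z), 1 - 2 * (x * x + z * z), 2 * (y * z + w * x))
    return col0 + col1

# Hypothetical label order for the 7-way categorical output.
ACTIONS = ("+x", "-x", "+y", "-y", "+z", "-z", "STOP")

def pick_action(logits):
    """Greedy choice over the seven logits (assumption: argmax at deployment)."""
    return ACTIONS[max(range(len(logits)), key=lambda i: logits[i])]
```

When the object is already at the goal, `relative_rotation` returns the identity quaternion, whose 6D form is (1, 0, 0, 0, 1, 0); a planner seeing that input would be expected to put its mass on STOP.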

Training and Reward:

  • \pi_{plan} is trained using the ground-truth object state q_t provided by the simulator.
  • The reward function is r = 1/(d(q_t, q_{goal_t}) + \epsilon) + \lambda_s \mathbb{1}(Success). Here d(q_t, q_{goal_t}) is the rotation distance, so the first term rewards being close to the goal, and \mathbb{1}(Success) is the success bonus. The reward is much simpler than in prior work, which is possible because the low-level skill is already well tuned.
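A small sketch of this reward may help make the two terms concrete. The formula follows the text; the values of `eps`, `lambda_s`, and `success_thresh` are placeholders chosen for illustration, and the geodesic angle is one common choice for the rotation distance d, assumed here rather than taken from the paper.

```python
import math

def rotation_distance(q_a, q_b):
    """Geodesic angle in radians between two unit quaternions (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q_a, q_b)))
    return 2.0 * math.acos(min(1.0, dot))

def planner_reward(q_t, q_goal, eps=0.1, lambda_s=10.0, success_thresh=0.1):
    """r = 1 / (d + eps) + lambda_s * 1(Success).

    eps, lambda_s and success_thresh are illustrative values, not the paper's.
    Returns the reward and the success flag.
    """
    d = rotation_distance(q_t, q_goal)
    success = d < success_thresh
    return 1.0 / (d + eps) + lambda_s * float(success), success
```

With these placeholder values the reward is bounded above by 1/eps + lambda_s, reached exactly at the goal, and it decays smoothly as the object moves away from the goal orientation.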

Generalizable State Estimator:

  • To transfer the policy to the real world, the system needs a robust object pose estimator.
  • The proposed pose estimator is a recursive state estimator implemented as a neural network \phi.
  • Its inputs are proprioception, actions, control errors, the low-level skill feedback (z_t), and the sequence of previously estimated object states.
  • \phi outputs the object state \hat{s}_t for the next timestep.
  • The estimator uses a Transformer architecture and takes the feature sequence f_t = [q_t, a_{t-1}, q_t - a_{t-1}, \hat{s}_{t-1}, z_t] as input to predict \hat{s}_t. In this feature vector q_t denotes the joint positions (proprioception), not the object quaternion, so q_t - a_{t-1} is the control error.
  • Training rolls out \pi_{plan} in simulation and minimizes the \ell_2 distance to the ground-truth state. An episode is reset when the rotation distance between the predicted and ground-truth quaternions exceeds 0.8 radians or the predicted object position is off by 3 cm or more.

Experiments and Results:

  • Policy training performance: the hierarchical policy built on the low-level skill converges 8x faster in training than a baseline policy learned from scratch, and reaches a higher success rate. As noise in the object state grows, the baseline becomes unstable and fails to converge, while the proposed method stays stable. The reason is that the pre-trained model structures the exploration space and cuts down on meaningless random actions.
  • Out-of-Distribution Robustness: the proposed policy is considerably more robust than the baseline in out-of-distribution scenarios such as observation noise, physical randomizations, and changes in object shape.
  • Generalizable State Estimation: using the learned pose estimator, the policy is transferred to the real world on the basis of predicted object states, with the estimator trained in simulation. The proposed method outperforms the baseline on policy smoothness and energy metrics, which enables more stable object manipulation.
  • Ablation Experiments:
    • Residual Actions and Low-Level Skill Feedback: performance drops sharply without these two elements. Residual actions provide fine error correction, and the low-level skill feedback through z_t proves essential for \pi_{plan} to understand the internal state of the low-level skill and the properties of the object.
    • Planner Policy Inputs: starting from the quaternion difference alone, policy performance improves step by step as object position, observation history, previous planner actions, and proprioception are added in turn. This underlines the importance of closed-loop feedback between the planner and the low-level skill policy.
  • Real-world experiments: on an Allegro Hand, the authors demonstrate successful in-hand reorientation of six different real objects that were not in the training data (out-of-distribution). The method generalizes well even to objects that are hard to manipulate, such as a small cube.

Conclusion and Limitations:

The paper shows that building a hierarchical policy for in-hand object reorientation on top of a pre-trained low-level skill, and learning a robust, generalizable state estimator, can substantially improve training efficiency, robustness, and generalization. As limitations, the method depends on the effectiveness of the low-level policy and assumes that no slipping occurs between the fingers and the object. In addition, the pose estimation error can currently accumulate over time. As future work, the authors point to integrating tactile sensing and combining vision and touch to enable accurate, long-horizon pose tracking.

🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

Introduction: why in-hand reorientation "again"?

In-hand rotation looked solved. HORA solved z-axis rotation with the fingertips alone in 2022, RotateIt extended it to arbitrary axes in 2023, and multi-axis rotation itself no longer looked like an open problem. Yet the same group (Qi et al., Berkeley/Meta) put out another paper in 2025, titled "From Simple to Complex Skills: The Case of In-Hand Object Reorientation". The problem is not rotation but reorientation, that is, reaching a target pose.

At first glance the difference seems small, but in practice it is large. From an RL point of view, "keep rotating" and "stop at a given angle" are nearly different problems. The former only needs a reward proportional to angular velocity and can be left to spin forever. The latter has to know how far the object has turned, has to stop at some point, and has to turn back if it overshoots. It needs to know "exactly" what is happening inside the hand.

What makes the paper interesting is the elegance of its answer. It leaves the already-solved simple skill (a per-axis rotation policy) untouched and puts a thin planner on top of it to build the complex task. That is a head-on attack on the labor cost of the conventional sim-to-real workflow, where reward engineering, domain randomization, and system identification had to be redone from scratch for every new task.

Note: One-line summary

The low-level skill (per-axis in-hand rotation) is kept frozen, and only three things are newly trained on top of it: (1) a planner that chooses which axis to rotate about, (2) a residual policy that adds a residual correction action, and (3) a state estimator that estimates the pose.

Problem setup: the subtle but decisive difference between rotation and reorientation

The in-hand rotation task is written as follows.

\max_\pi \; \mathbb{E}\left[\sum_{t=0}^T r_{\text{rot}}(s_t, a_t)\right], \quad r_{\text{rot}} = \omega_{\text{obj}} \cdot \hat{k}

Here \hat{k} is the target rotation axis and \omega_{\text{obj}} is the angular velocity of the object. It is a simple reward that says "the faster the object spins about the target axis, the better". There is no termination.

Reorientation is different. The object has to go from a start pose q_0 to a goal pose q^*.

r_{\text{reorient}} = -\,\angle(q_t, q^*) + \mathbb{1}[\angle(q_t, q^*) < \epsilon] \cdot R_{\text{success}}

The reward ends with just two terms: the quaternion (angular) distance and a success bonus. One problem is that \angle(q_t, q^*) on its own is too sparse a learning signal for RL. The bigger problem is that in the real world nobody knows how the object is sitting in the hand. In simulation the physics engine hands over the pose for free, but a real hand has nothing but finger joint encoders. Even with a camera, the object is occluded by the fingers.
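A toy comparison makes the difference between the two rewards visible. The sketch below spins an object about z at constant speed and evaluates both rewards at every step; the spin rate, time step, goal angle, `eps`, and `r_success` are all made-up values for illustration.

```python
import math

def rotation_reward(omega_obj, k_hat):
    """r_rot = omega_obj . k_hat: angular velocity projected on the target axis."""
    return sum(w * k for w, k in zip(omega_obj, k_hat))

def angle_between(q_a, q_b):
    """Geodesic angle in radians between two unit quaternions (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q_a, q_b)))
    return 2.0 * math.acos(min(1.0, dot))

def reorientation_reward(q_t, q_star, eps=0.1, r_success=1.0):
    """r = -angle(q_t, q*) + 1[angle < eps] * R_success (illustrative constants)."""
    ang = angle_between(q_t, q_star)
    return -ang + (r_success if ang < eps else 0.0)

def z_rotation(theta):
    """Unit quaternion for a rotation of theta radians about the z axis."""
    return (math.cos(theta / 2.0), 0.0, 0.0, math.sin(theta / 2.0))

def spin_demo(omega=1.0, dt=0.1, steps=31, goal_angle=math.pi / 2.0):
    """Spin about z at constant speed and record both rewards per step."""
    q_star = z_rotation(goal_angle)
    rot, reo = [], []
    for i in range(steps):
        q_t = z_rotation(omega * dt * i)
        rot.append(rotation_reward((0.0, 0.0, omega), (0.0, 0.0, 1.0)))
        reo.append(reorientation_reward(q_t, q_star))
    return rot, reo
```

In this toy run the rotation reward is the same at every step, so spinning forever is optimal, while the reorientation reward peaks at the step closest to the 90° goal and falls again once the object overshoots. A policy therefore has to know where the object is in order to stop.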

This is why it is not a plain RL problem. The action policy and perception have to be solved at the same time.

Core idea: a thin planner layer on top of simple skills

In a word, the authors' approach is "don't reinvent the wheel": do not rebuild what you already have.

flowchart LR
    A[Goal pose q*] --> P[Planner pi_plan]
    O[Object state estimate] --> P
    F[Skill feedback] --> P
    Pr[Proprioception] --> P

    P -->|"axis (one-hot)"| S[Skill pi_skill - frozen]
    P -->|"residual a_res"| Sum[+]

    S -->|"a_skill"| Sum
    Sum -->|"a_t = a_skill + a_res"| R[Robot Hand]
    R --> Pr
    R --> O
    S --> F

The diagram above is the whole system. Each block is unpacked in turn below.

Low-level skill \pi_{\text{skill}}: RotateIt, taken as is

The base is RotateIt (Qi et al., CoRL 2023). Its input consists of the following.

  • Robot joint positions q_{\text{robot},t}
  • Previously commanded joint targets a_{t-1}
  • Palm-view depth image (encoded by a lightweight CNN)
  • Rotation axis command \hat{k} (one-hot or unit vector)

A Transformer processes this input and compresses it into a single vector, to which two heads are attached.

  1. Policy head: outputs the joint targets a_{\text{skill}}
  2. Property prediction head: predicts the object's physical properties (mass, friction, shape, and so on)

This is the key trick. The property prediction head is not just an auxiliary loss used during training. At inference time it is a feedback signal that tells the planner what kind of object is being handled. In typical hierarchical RL the low level only reports upward whether things went well, but here the low level passes up a representation of the object it is touching.

Tip์ง๊ด€

์ €์ˆ˜์ค€ ์Šคํ‚ฌ์„ ์ž˜ ๋งŒ๋“  ์†๊ฐ€๋ฝ ๋์˜ ๊ฐ๊ฐ ๋‰ด๋Ÿฐ์ด๋ผ๊ณ  ์ƒ๊ฐํ•˜๋ฉด ๋œ๋‹ค. ๊ทธ ๋‰ด๋Ÿฐ์€ โ€œ์ง€๊ธˆ ๋Œ๋ฆฌ๊ณ  ์žˆ๋Š” ๊ฒƒโ€์˜ ์ •์ฒด๋ฅผ ์–ด๋ ดํ’‹์ด ์•ˆ๋‹ค. ๊ตณ์ด ์œ„์ชฝ ๋‡Œ๊ฐ€ ์ฒ˜์Œ๋ถ€ํ„ฐ ๋‹ค์‹œ ์ถ”์ •ํ•  ํ•„์š”๊ฐ€ ์—†๋‹ค.

Planner \pi_{\text{plan}}: picks an axis and adds a fine correction

The planner takes the following inputs.

  • Estimated object pose \hat{p}_t, \hat{q}_t
  • Goal pose q^*, or the relative rotation \Delta q = q^* \otimes \hat{q}_t^{-1}
  • Feedback from the low-level skill (the property prediction embedding)
  • Proprioception q_{\text{robot},t}

์ถœ๋ ฅ์€ ๋‘ ๊ฐœ๋‹ค.

  1. ํšŒ์ „์ถ• ๋ช…๋ น \hat{k}_t: 6๊ฐœ ํ›„๋ณด ์ถ•(ยฑx, ยฑy, ยฑz) ์ค‘ ํ•˜๋‚˜๋ฅผ one-hot์œผ๋กœ ์„ ํƒ
  2. ์ž”์ฐจ ํ–‰๋™ a_{\text{res},t}: ๊ด€์ ˆ ๊ณต๊ฐ„์˜ ์ž‘์€ ๋ณด์ •

์ตœ์ข… ํ–‰๋™์€ ๋‹จ์ˆœํ•œ ํ•ฉ์ด๋‹ค.

a_t = \pi_{\text{skill}}(o_t, \hat{k}_t) + \alpha \cdot a_{\text{res},t}

\alpha is a scaling factor that limits the magnitude of the residual. This additive structure follows the tradition of the earlier cascaded compositional residual learning (Kumar et al.).
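As a rough sketch of this sum, the function below adds a bounded residual to the skill's joint targets. Clamping the raw residual to [-1, 1] before scaling and clamping the result to joint limits are assumptions added for illustration, as is the default value of `alpha`.

```python
def combine_actions(a_skill, a_res, alpha=0.1, joint_low=None, joint_high=None):
    """a_t = a_skill + alpha * a_res, per joint.

    The residual is clamped to [-1, 1] first (an assumption), so the
    correction can never exceed alpha radians per joint. Optional joint
    limits are applied to the summed target.
    """
    a_t = [s + alpha * max(-1.0, min(1.0, r)) for s, r in zip(a_skill, a_res)]
    if joint_low is not None and joint_high is not None:
        a_t = [max(lo, min(hi, a)) for a, lo, hi in zip(a_t, joint_low, joint_high)]
    return a_t
```

Keeping `alpha` small means the planner can nudge the fingers but cannot override the skill, which is the trade-off the text describes: enough authority to correct errors, not enough to break the skill's learned behavior.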

Important: Why "axis selection + residual" works well

The classic failure pattern of hierarchical RL is that when the low-level skill breaks down, the level above has no way to intervene. If the low level is assumed to be "gripping and rotating" while the object is actually slipping out of the hand, a command from above to "rotate again" is meaningless. The residual action bridges this disconnect. The upper level not only issues commands to the low level but also has the authority to move the finger joints slightly itself. There is a hierarchy, but the levels are not fully isolated.

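Simplifying the reward

The reward is just the two terms seen earlier, written here in the same schematic negative-angle form (the Ping summary above gives the inverse-distance form, 1/(d + \epsilon) plus a success bonus).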

r_t = -\angle(\hat{q}_t, q^*) + \mathbb{1}[\angle(\hat{q}_t, q^*) < \epsilon]\,R_{\text{success}}

This is possible because the low-level skill is already robust. There is no need to tune extra terms one by one, such as a drop penalty, a finger collision penalty, action regularization, or a torque penalty. The low level prevents drops on its own, and action smoothness is already built into the prior of the pre-trained policy. The reward above only has to say where to go.

This is what the paper's title implies. The best way to simplify the reward of a complex task is to build a good policy for the simple task and delegate the work to it.

์ž์„ธ ์ถ”์ •๊ธฐ: ์†๊ฐ€๋ฝ๋งŒ์œผ๋กœ ์†์•ˆ์˜ ๋ฌผ์ฒด๋ฅผ ๋ณธ๋‹ค

์—ฌ๊ธฐ๊ฐ€ ์ด ๋…ผ๋ฌธ์—์„œ ๊ฐ€์žฅ ์˜๋ฆฌํ•œ ๋ถ€๋ถ„์ด๋ผ๊ณ  ์ƒ๊ฐํ•œ๋‹ค. ์‹ค์„ธ๊ณ„ ๋ฐฐํฌ์˜ ๊ฐ€์žฅ ํฐ ๋ฒฝ์ด ์ธ์‹์ด๋ผ๋Š” ์ ์„ ์ •๋ฉด์œผ๋กœ ์ธ์ •ํ•˜๊ณ , ์‹œ๊ฐ์ด ์•„๋‹Œ ๊ณ ์œ ๊ฐ๊ฐ + ์Šคํ‚ฌ ๋‚ด๋ถ€ ์‹ ํ˜ธ๋กœ ์ž์„ธ๋ฅผ ์ถ”์ •ํ•˜๋Š” ๋ณ„๋„ ๋„คํŠธ์›Œํฌ๋ฅผ ๋งŒ๋“ ๋‹ค.

์ถ”์ •๊ธฐ ์ž…๋ ฅ

์ถ”์ •๊ธฐ g_\phi๋Š” ์‹œ๊ฐ„์— ๋”ฐ๋ผ ์ž์„ธ๋ฅผ ๊ฐฑ์‹ ํ•˜๋Š” ์žฌ๊ท€(recurrent) ๊ตฌ์กฐ๋กœ, ์ž…๋ ฅ์€ ๋‹ค์Œ์ด๋‹ค.

  • Proprioception q_{\text{robot},t}
  • Previous action a_{t-1}
  • Control error e_t = a_{t-1} - q_{\text{robot},t} (the difference between the commanded joint target and the position actually reached)
  • Low-level skill embedding z_t, an intermediate representation of \pi_{\text{skill}}(\cdot)
  • Previous pose estimate \hat{q}_{t-1}, \hat{p}_{t-1}

The output is (\hat{p}_t, \hat{q}_t), that is, the object's 3D position and its orientation as a unit quaternion.

Note: Why the control error is decisive

If a finger is commanded to a joint target and does not get there, something is in the way. That "something" is the object. The time series of control errors is effectively an implicit sense of touch. Even without a dedicated tactile sensor, the gap between command and reality leaks contact information. HORA's rapid motor adaptation used a similar insight.

Training strategy

The estimator is trained in simulation with the ground-truth (GT) pose as the supervision signal. The loss is as follows.

\mathcal{L}_{\text{pose}} = \|p_t - \hat{p}_t\|_2^2 + d_{\text{quat}}(q_t, \hat{q}_t)

Here d_{\text{quat}} is a quaternion distance (for example, 1 - |q \cdot \hat{q}|). During training, domain randomization over a variety of objects, friction values, and masses is used to encourage generalization. The estimator is decoupled from policy learning: it is pre-trained separately in simulation and then enters policy learning in a frozen state.
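The loss can be sketched directly from the formula, using the example distance 1 - |q · q̂| given in the text. The function names and the plain-tuple data layout are illustrative choices, not the authors' implementation.

```python
def quat_distance(q, q_hat):
    """d_quat = 1 - |q . q_hat| for unit quaternions (w, x, y, z).

    The absolute value makes q and -q, which encode the same rotation,
    have zero distance.
    """
    dot = sum(a * b for a, b in zip(q, q_hat))
    return 1.0 - abs(dot)

def pose_loss(p, p_hat, q, q_hat):
    """L_pose = ||p - p_hat||_2^2 + d_quat(q, q_hat)."""
    pos_term = sum((a - b) ** 2 for a, b in zip(p, p_hat))
    return pos_term + quat_distance(q, q_hat)
```

The absolute value matters in practice: without it, a prediction of -q for a ground truth of q would be penalized heavily even though the two quaternions describe the same orientation.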

Pseudocode

# Pseudocode for one timestep at deployment.
# skill_embedding, pose_estimator, planner, one_hot_argmax, skill_policy and
# send_joint_targets stand in for the trained modules and the hardware interface.
def step(q_robot, prev_action, prev_pose, goal_pose, alpha):
    # 1. Estimate object pose from proprioception and skill feedback
    e_t = prev_action - q_robot              # control error
    z_t = skill_embedding(q_robot, prev_action, depth=None)
    pose_t = pose_estimator(prev_pose, q_robot, prev_action, e_t, z_t)

    # 2. Planner decides axis + residual
    axis_logits, a_res = planner(pose_t, goal_pose, z_t, q_robot)
    axis = one_hot_argmax(axis_logits)        # 6 candidates (+/- x,y,z)

    # 3. Low-level skill (frozen) produces the base action for that axis
    a_skill = skill_policy(q_robot, prev_action, axis)

    # 4. Combine the base action with the scaled residual
    a_t = a_skill + alpha * a_res

    # 5. Send to PD controller
    send_joint_targets(a_t)
    return a_t, pose_t

์ด ์˜์‚ฌ ์ฝ”๋“œ ํ•œ ์žฅ์ด ์ „์ฒด ์‹œ์Šคํ…œ์˜ ์‹œ๊ฐ„ ํ•œ ์Šคํ…์ด๋‹ค. ๊นŠ์ด ์นด๋ฉ”๋ผ๊ฐ€ ์‹œ๋ฎฌ๋ ˆ์ด์…˜์—์„œ๋งŒ ์‚ฌ์šฉ๋˜๊ฑฐ๋‚˜, ์™ธ๋ถ€ RGB-D๋กœ ์ดˆ๊ธฐํ™” ๋‹จ๊ณ„์—๋งŒ ์“ฐ์ด๋Š” ์‹์œผ๋กœ ์šด์šฉ๋œ๋‹ค.

์‹œ์Šคํ…œ ๊ตฌ์กฐ ํ•œ๋ˆˆ์— ๋ณด๊ธฐ

flowchart TB
    subgraph Inputs["Inputs (per timestep)"]
        Q[q_robot]
        A_prev[a_t-1]
        E[error e_t = a_t-1 - q_robot]
    end

    subgraph PoseEst["Pose Estimator (RNN)"]
        EST[g_phi]
    end

    subgraph Skill["Low-level Skill - FROZEN"]
        TF[Transformer encoder]
        H1[Action head a_skill]
        H2[Property head z_t]
    end

    subgraph Planner["High-level Planner"]
        MLP[MLP / RNN]
        AXIS[axis one-hot]
        RES[residual a_res]
    end

    Inputs --> EST
    EST --> POSE[pose_t]

    Q --> TF
    A_prev --> TF
    AXIS --> TF
    TF --> H1
    TF --> H2

    POSE --> MLP
    GOAL[goal q*] --> MLP
    H2 --> MLP
    Q --> MLP
    MLP --> AXIS
    MLP --> RES

    H1 --> SUM[Sum]
    RES --> SUM
    SUM --> OUT[a_t]

Once its training is finished, pi_skill is frozen and no gradients flow through it. Only pi_plan and g_phi are trained for each new task. This is the key to the gain in sample efficiency.

Experiments: what is asked, and what is the answer

Experimental setup

| Item | Value |
| --- | --- |
| Hardware | 4-finger multi-fingered hand (Allegro Hand) + RGB-D camera |
| Simulator | IsaacGym (same as the HORA/RotateIt line) |
| Learning algorithm | PPO (high-level planner); the pose estimator is supervised |
| Evaluation objects | In-distribution + out-of-distribution (OOD) shapes, including symmetric and textureless objects |
| Baselines | from-scratch RL, no-residual, no-feedback |

Key comparison: learning from scratch vs. hierarchical

The first question the paper asks quantitatively is simple: "Is a hierarchical structure really necessary?"

The answer is clear. The policy trained from scratch collapses under OOD objects and noisy conditions, while the hierarchical policy keeps its success rate even for shapes and friction conditions outside the training distribution. On the learning curves, too, the hierarchical policy converges much faster (8x, according to the Ping summary above), because the pre-trained skill shrinks the exploration space considerably.

Tip: Insight

"A well-trained low-level skill acts as a kind of prior." The upper policy explores only within a distribution of meaningful actions. Failure modes such as dropping the object, tangled fingers, or pointless joint jitter are avoided automatically.

Effect of residual actions

In the ablation that removes the residual, the success rate drops by a wide margin. The low-level skill is good on average but not perfect for every pose transition, and the residual fills that gap. It is especially decisive in situations where the task is almost done but the last 5 degrees are missing.

Generalization of the pose estimator

The estimator produces usable pose estimates even for novel objects not used in training (an elephant toy, a plain cube, a piggy bank, and others). Its robustness on textureless and symmetric objects is what sets it apart from vision-based pose estimation. Vision cannot tell the six faces of a symmetric cube apart, but the contact sequence of the fingers is asymmetric.

Real-world transfer

The time to reach the goal pose is shortened, and the system succeeds not only on goals reachable with a single axis but also on goals that require switching between two skills (for example, -90° about z followed by 90° about y). The video results (dexhier.github.io) show reorientation of an apple, a cylinder, a tennis ball, an elephant, a cube, and other objects.
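The two-skill example can be checked with a little quaternion arithmetic. The sketch below composes -90° about z with 90° about y and reports the equivalent single rotation. It assumes (w, x, y, z) quaternions and a fixed world frame, so the rotation applied second is multiplied on the left; the paper's own frame convention may differ.

```python
import math

def axis_angle_quat(axis, deg):
    """Unit quaternion (w, x, y, z) for a rotation of `deg` degrees about `axis`."""
    half = math.radians(deg) / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)

def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

q_z = axis_angle_quat((0.0, 0.0, 1.0), -90.0)   # first skill: -90 deg about z
q_y = axis_angle_quat((0.0, 1.0, 0.0), 90.0)    # second skill: +90 deg about y
q_goal = quat_mul(q_y, q_z)                     # z first, then y (world frame)

# Angle of the single rotation equivalent to the two-step sequence.
total_deg = math.degrees(2.0 * math.acos(abs(q_goal[0])))
```

Under these assumptions the composite is a 120° rotation about a diagonal axis, q_goal = (0.5, -0.5, 0.5, -0.5). No single canonical-axis skill can reach that goal on its own, which is why such targets need the planner to switch skills.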

Critical discussion: what is strong and what is weak

Strengths

(1) System-level elegance. The reward is cut down to two terms. The reward engineering that used to be hand-crafted for every new task is gone. For practitioners this is the biggest value. Domain randomization is also finished at the low-level training stage, so adding a new goal pose does not require redoing system identification.

(2) Separation of perception and action (with the right amount of coupling). The pose estimator is trained separately and is insulated from policy learning. However, the signal the estimator uses (the internal embedding of the low-level skill) is shared with the policy. The design finds a good middle ground that is neither fully modular nor fully end-to-end.

(3) Clean integration of residual correction. The perennial weakness of hierarchical RL, that the upper level is powerless when the low level fails, is addressed with residual actions. The implementation cost is low and the effect is large.

(4) Handling textureless and symmetric objects. The system works even where vision alone leaves the pose ambiguous (a matte cube, a ball with a uniform surface). This matters for industrial applications, because these conditions are common in part alignment and assembly.

Weaknesses and open questions

(1) Discretized axis candidates. The planner picks one of six axes, ±x, ±y, ±z. A pose that needs rotation about an arbitrary axis (for example, 60° about a diagonal axis) has to be approximated with two or three axis switches. In theory the residual compensates, but if the residual is small the precision suffers, and if it is large the prior of the low-level skill breaks. Extending to continuous axis selection is the natural next step.
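To get a feel for what discretization costs, the sketch below approximates a 60° rotation about the diagonal axis using only canonical-axis steps. It is a hand-written greedy rule, not the learned planner: at each step it picks the canonical axis best aligned with the remaining rotation and turns by a fixed 5° increment, which stands in for the skill's incremental rotation. The step size, tolerance, and all names are assumptions.

```python
import math

AXES = {"+x": (1, 0, 0), "-x": (-1, 0, 0), "+y": (0, 1, 0),
        "-y": (0, -1, 0), "+z": (0, 0, 1), "-z": (0, 0, -1)}

def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def quat_conj(q):
    return (q[0], -q[1], -q[2], -q[3])

def axis_angle_quat(axis, rad):
    s = math.sin(rad / 2.0)
    return (math.cos(rad / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)

def to_axis_angle(q):
    """Axis and angle (radians) of the shorter rotation encoded by q."""
    w, x, y, z = q if q[0] >= 0 else tuple(-c for c in q)
    n = math.sqrt(x * x + y * y + z * z)
    if n < 1e-12:
        return (1.0, 0.0, 0.0), 0.0
    return (x / n, y / n, z / n), 2.0 * math.atan2(n, w)

def greedy_axis_plan(q_goal, step_rad=math.radians(5.0),
                     tol_rad=math.radians(5.0), max_steps=500):
    """Repeatedly rotate by step_rad about the best-aligned canonical axis."""
    q_cur = (1.0, 0.0, 0.0, 0.0)
    chosen = []
    for _ in range(max_steps):
        u, angle = to_axis_angle(quat_mul(q_goal, quat_conj(q_cur)))
        if angle < tol_rad:
            break
        name = max(AXES, key=lambda k: sum(a * b for a, b in zip(u, AXES[k])))
        q_cur = quat_mul(axis_angle_quat(AXES[name], step_rad), q_cur)
        chosen.append(name)
    _, remaining = to_axis_angle(quat_mul(q_goal, quat_conj(q_cur)))
    return chosen, remaining

diag = (1 / math.sqrt(3),) * 3
plan, remaining = greedy_axis_plan(axis_angle_quat(diag, math.radians(60.0)))
```

A 60° turn about a canonical axis would take exactly 12 of these steps with a single axis. For the diagonal goal the plan has to alternate between axes and needs more steps, and whatever error is left inside the tolerance is what the residual action would have to absorb.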

(2) Simulation dependence of the pose estimator. The estimator is trained in simulation on GT poses. In sim-to-real, the estimator's own drift can accumulate, and the paper's quantitative analysis of how estimation error accumulates over time is limited. If the friction coefficient of the fingers differs between the real world and simulation, the distribution of control errors shifts, and a failure mode becomes possible in which the estimator confidently reports a wrong pose.

(3) No tactile sensor. Like HORA, this paper deliberately relies on proprioception rather than tactile sensors (RotateIt itself combined vision and touch, as the comparison table below notes). That is clean, but AnyRotate (Yang et al., CoRL 2024) and GelSight integration studies report more robust results with explicit touch. There is ample room for the estimator here to generalize better with explicit tactile input. The catch is that simulation cost then explodes (TACTO or Taxim would have to be integrated).

(4) Scope of task generalization. Fixing the low level to "per-axis rotation" is a strength and a constraint at the same time. How the approach extends to tasks that need skills other than rotation, such as tool use, insertion, or throwing, is unresolved. How to assemble the library of low-level skills will be the central question of the next round.

(5) Breadth of evaluation. The results shown in the videos are impressive, but whether quantitative success-rate statistics are reported broadly enough across both object diversity and goal-pose diversity needs further verification (especially for cases that need two or more skill switches).

Related work map

flowchart LR
    HORA["HORA (CoRL '22)<br/>z-axis only<br/>proprioception"] --> RotateIt["RotateIt (CoRL '23)<br/>multi-axis<br/>vision+touch"]
    RotateIt --> FSC["From Simple to Complex (2025)<br/>reorientation<br/>hierarchical + pose est."]

    VD["Visual Dexterity (Sci.Rob. '23)<br/>D'Claw, full SO(3)<br/>depth only"] -.contrast.-> FSC
    AR["AnyRotate (CoRL '24)<br/>4-finger, gravity-invariant<br/>fingertip touch"] -.contrast.-> FSC

    OAI["OpenAI Cube/Rubik's<br/>('18, '19)<br/>Shadow Hand, RL"] -.predecessor.-> HORA

    FSC --> Future["Open directions:<br/>continuous axis,<br/>tactile fusion,<br/>tool use"]

| Paper | Hand | Task | Perception | Reward complexity |
| --- | --- | --- | --- | --- |
| HORA (2022) | Allegro | Continuous z-axis rotation | Proprioception | Medium |
| RotateIt (2023) | Allegro | Continuous multi-axis rotation | Vision + touch | Medium |
| Visual Dexterity (2023) | D'Claw | Full SO(3) reorientation | Depth | High (many terms) |
| AnyRotate (2024) | 4-finger | Arbitrary axis, arbitrary hand orientation | Fingertip touch | Medium |
| This paper (2025) | Allegro | Goal-pose reorientation | Proprioception + estimator | Low (2 terms) |

The key comparison is the contrast with Visual Dexterity. Visual Dexterity handles full SO(3) reorientation with a single policy (end-to-end plus an elaborate reward). This paper keeps the reward simple and introduces a hierarchy instead. Both reach the same destination (reorientation to an arbitrary pose) by different roads. The question is less which road is "right" and more which road costs less when extended to a new task. This paper's answer is "build the low-level skill well once, and everything after that is cheap", and that answer is the more attractive one for industrial settings.

Practical takeaways

From the standpoint of someone doing research on the Allegro Hand, the paper's implications can be summarized as follows.

  1. Reuse low-level skills instead of rebuilding them. Freezing an already-trained RotateIt policy and using it as the prior for a new task is highly cost-effective. It is especially decisive when the cost of in-house training infrastructure is high.
  2. Reward complexity and sim-to-real cost scale together. The more reward terms there are, the longer the simulation tuning cycle and the harder the real-world transfer. A hierarchy is a tool for simplifying the reward.
  3. The power of proprioception plus control-error signals for pose estimation. In in-hand tasks where the camera is occluded, the time series of the command-versus-actual gap is a strong signal that comes almost for free. It makes sense to squeeze this signal dry before adding explicit tactile sensors.
  4. Residual actions are a general remedy for the disconnect in hierarchical RL. They carry over almost unchanged when the domain changes, whether locomotion or manipulation.
  5. Robustness on textureless and symmetric objects matters industrially. Objects in real part alignment and assembly tasks have almost no texture and a high degree of symmetry. This is exactly where the weakness of vision-only approaches is exposed, and it is where this paper differentiates itself.

Closing

What this paper shows is less a new algorithm than a new way of decomposing a task. It does not retrain the rotation policy. It does not carve out an elaborate reward. It does not add tactile sensors. Instead it cleverly rearranges what already exists. The result is the combination of three parts: hierarchy, residual, and pose estimator.

The most impressive part is the pose estimator. The idea of estimating the pose of an object in the hand without a camera and without tactile sensors, purely from the difference between commanded and actual finger joint positions, is both intuitive and powerful. In that the robot turns the limits of its own body (its errors) into information about the outside world, the estimator is not just engineering but carries a small epistemological idea.

The next steps are clear: continuous axis selection, a more diverse library of low-level skills (grasping, sliding, throwing), and explicit tactile integration. Most important of all is showing that the same decomposition pattern extends to tasks beyond "rotation". If it does, an era is coming in which the sim-to-real cost of a new dexterous manipulation task drops by an order of magnitude.

Copyright 2026, JungYeon Lee