Curieux.JY
  • JungYeon Lee

📃 From Simple to Complex Skills Review

dexterous manipulation
hierarchical RL
From Simple to Complex Skills: The Case of In-Hand Object Reorientation
Published

March 23, 2026

  • Paper Link

  • Project Link

  • Haozhi Qi, Brent Yi, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik

  1. 🤖 For the in-hand object reorientation task, this paper proposes a hierarchical policy that reuses pre-trained low-level skills, addressing both the sim-to-real gap and training-efficiency problems.
  2. 🦾 The system also introduces a generalizable object pose estimator, which leverages feedback from the low-level skill policy and residual actions to produce accurate pose estimates in complex manipulation settings.
  3. ✨ Experiments show that the proposed approach converges faster and achieves higher performance than baselines trained from scratch, and demonstrates strong robustness and generalization in real-world sim-to-real transfer across diverse objects.

๐Ÿ” Ping Review

๐Ÿ” Ping โ€” A light tap on the surface. Get the gist in seconds.

์ด ๋…ผ๋ฌธ์€ ๋กœ๋ด‡์˜ In-Hand Object Reorientation์ด๋ผ๋Š” ๋ณต์žกํ•œ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด ๊ธฐ์กด์— ํ•™์Šต๋œ(pre-trained) ์ €์ˆ˜์ค€(low-level) ์Šคํ‚ฌ์„ ์žฌ์‚ฌ์šฉํ•˜๋Š” ๊ณ„์ธต์  ์ •์ฑ…(hierarchical policy) ์‹œ์Šคํ…œ์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ์‹œ๋ฎฌ๋ ˆ์ด์…˜์—์„œ ์ •์ฑ…์„ ํ•™์Šตํ•˜๊ณ  ์‹ค์ œ ์„ธ๊ณ„๋กœ ์ „์ด(transfer)ํ•˜๋Š” ๊ฒƒ์ด dexterous manipulation์—์„œ ์œ ๋งํ•œ ์ ‘๊ทผ ๋ฐฉ์‹์ด์ง€๋งŒ, ๊ฐ ์ƒˆ๋กœ์šด ์ž‘์—…์— ๋Œ€ํ•ด sim-to-real gap์„ ๋ฉ”์šฐ๋Š” ๋ฐ์—๋Š” reward engineering, hyperparameter tuning, system identification๊ณผ ๊ฐ™์€ ์ƒ๋‹นํ•œ ์ธ๊ฐ„์˜ ๋…ธ๋ ฅ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ๋ณธ ์—ฐ๊ตฌ๋Š” ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๊ณ„์ธต์  ์ ‘๊ทผ ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

Core Methodology:

๋ณธ ์‹œ์Šคํ…œ์€ ๋‘ ๊ฐ€์ง€ ์ฃผ์š” ๊ตฌ์„ฑ ์š”์†Œ๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค: Planner Policy (\pi_{plan})์™€ Skill Policy (\pi_{skill}).

  1. Skill Policy (\pi_{skill}):
    • This is a pre-trained low-level skill based on the in-hand object rotation policy of [6]. It learns to rotate an object about a given rotation axis k.
    • The input o_{skill_t} to \pi_{skill} is a temporal sequence containing the robot's proprioception (joint positions \theta_{t-T:t} and commanded joint targets a_{skill_{t-T-1:t-1}}) and depth-image embeddings d_{t-T:t} from an RGB-D camera (using a history of T = 30 timesteps).
    • \pi_{skill} outputs raw joint position targets a_{skill_t} for the robot, along with a feature vector z_t that represents the object's physical properties and shape. This z_t serves as feedback to the high-level policy; it encodes the object's geometric information from visual input.
  2. Planner Policy (\pi_{plan}):
    • This is the high-level policy, defined as a_{plan_t} = \pi_{plan}(o_{plan_t}, q_{goal_t}, z_t).
    • The input o_{plan_t} to \pi_{plan} contains the following:
      • The object state sequence s_{t-5:t} (3D position p_t and orientation expressed as a unit quaternion q_t).
      • The relative transformation between the object and the goal orientation, \zeta_{t-5:t} = \Delta(q_t, q_{goal_t}) = q_{goal_t} \cdot \bar{q}_t.
      • The planner actions from previous timesteps, a_{plan_{t-6:t-1}}.
      • Most importantly, the feedback z_t provided by the low-level skill policy \pi_{skill}, which helps \pi_{plan} perceive the low-level skill's response and correct errors.
    • \pi_{plan} is a 3-layer MLP (multi-layer perceptron) with ELU (exponential linear unit) activations.
    • The output of \pi_{plan} is a 7-way categorical distribution over the six canonical rotation axes (\pm x, \pm y, \pm z) plus an additional STOP command. Quaternions are converted to 6D representations before being fed to the network.
    • Residual Actions (a_{rest}): In addition to the selected rotation axis, \pi_{plan} outputs a residual action a_{rest} that complements the low-level skill's output. The final action sent to the robot is a_t = a_{rest} + a_{skill_t}. This a_{rest} lets the planner policy compensate for the low-level skill's limitations and perform additional error correction.
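The axis-selection-plus-residual composition above can be condensed into one control step. A minimal sketch, assuming hypothetical `planner` and `skill` callables standing in for \pi_{plan} and \pi_{skill}; their signatures, the greedy axis selection, and the action dimensions are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# The six canonical rotation axes (+-x, +-y, +-z) plus a STOP category,
# matching the planner's 7-way categorical output described above.
AXES = {0: (1, 0, 0), 1: (-1, 0, 0), 2: (0, 1, 0),
        3: (0, -1, 0), 4: (0, 0, 1), 5: (0, 0, -1)}
STOP = 6  # seventh category: terminate the episode

def hierarchical_step(planner, skill, o_plan, o_skill, q_goal, z_t):
    """One step: planner picks a rotation axis (or STOP) plus a residual,
    the low-level skill executes, and the actions are summed."""
    axis_logits, a_rest = planner(o_plan, q_goal, z_t)  # categorical + residual
    choice = int(np.argmax(axis_logits))                # greedy, for illustration
    if choice == STOP:
        return None, z_t                                # goal reached
    k = np.asarray(AXES[choice], dtype=float)           # canonical rotation axis
    a_skill, z_next = skill(o_skill, k)                 # joint targets + feedback
    a_t = a_rest + a_skill                              # final action: a_rest + a_skill_t
    return a_t, z_next
```

During training the axis would be sampled from the categorical rather than taken greedily; the STOP branch here simply ends the episode.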

ํ•™์Šต ๋ฐ ๋ณด์ƒ:

  • \pi_{plan} is trained using the ground-truth object states q_t provided by the simulator.
  • The reward function is r = 1/(d(q_t, q_{goal_t}) + \epsilon) + \lambda_s \mathbb{1}(Success), where d(q_t, q_{goal_t}) is the rotation-distance term and \mathbb{1}(Success) is a success bonus. This reward is much simpler than in prior work, which is possible because the low-level skill is already well tuned.

Generalizable State Estimator:

  • To transfer the policy to the real world, the system needs a robust object pose estimator.
  • The proposed pose estimator is a recursive state estimator implemented as a neural network \phi.
  • Its inputs are proprioception, actions, control errors, the low-level skill feedback z_t, and the sequence of previously estimated object states.
  • \phi outputs the object state \hat{s}_t at the next timestep.
  • The estimator uses a Transformer architecture, taking the feature sequence f_t = [q_t, a_{t-1}, q_t - a_{t-1}, \hat{s}_{t-1}, z_t] as input to predict \hat{s}_t.
  • Training minimizes the \ell_2 distance on rollouts of \pi_{plan} in simulation, resetting an episode whenever the rotation distance between the predicted and ground-truth quaternions exceeds 0.8 radians or the predicted object position drifts more than 3 cm.

์‹คํ—˜ ๋ฐ ๊ฒฐ๊ณผ:

  • Policy training performance: The hierarchical policy built on low-level skills converges 8x faster than baseline policies trained from scratch and achieves a higher success rate. In particular, as noise in the object state information increases, the baseline becomes unstable and fails to converge, while the proposed method maintains stable performance. This is because the pre-trained model structures the exploration space and reduces meaningless random behavior.
  • Out-of-Distribution Robustness: The proposed policy is far more robust than the baseline under out-of-distribution scenarios such as observation noise, physical randomizations, and changes in object shape.
  • Generalizable State Estimation: Using the learned pose estimator's predicted object states, the policy trained in simulation transfers successfully to the real world. The proposed method outperforms the baseline on policy smoothness and energy metrics, enabling more stable object manipulation.
  • Ablation Experiments:
    • Residual Actions and Low-Level Skill Feedback: Removing either component degrades performance substantially. Residual actions provide fine-grained error correction, and the low-level skill feedback z_t proves essential for \pi_{plan} to understand the low-level skill's internal state and the object's properties.
    • Planner Policy Inputs: Starting from the quaternion difference alone and progressively adding object position, observation history, previous planner actions, and proprioception, policy performance improves step by step, highlighting the importance of closed-loop feedback between the planner and the low-level skill policy.
  • Real-world experiments: Using the Allegro Hand robot, the system demonstrated successful in-hand reorientation on 6 diverse real objects that were out-of-distribution relative to the training data. Notably, it generalized well even to hard-to-manipulate objects such as small cubes.

๊ฒฐ๋ก  ๋ฐ ํ•œ๊ณ„:

This work demonstrates that building a hierarchical policy for in-hand object reorientation on top of pre-trained low-level skills, together with a robust and generalizable state estimator, can greatly improve training efficiency, robustness, and generalization. Its limitations are that it depends on the effectiveness of the low-level policy and assumes no slipping occurs between the fingers and the object. In addition, pose estimation errors can currently accumulate over time. As future work, the authors suggest integrating tactile sensing and combining vision and touch to enable accurate, long-horizon pose tracking.

🔔 Ring Review

🔔 Ring – An idea that echoes. Grasp the core and its value.

  • The core idea is a structure in which the planner receives feedback from the low-level policy and corrects errors via residual actions
  • Proposes a generalizable object pose estimator that leverages proprioceptive feedback and low-level skill predictions
  • Matches baseline performance under easy conditions while converging 8x faster
  • Maintains an 80% success rate under distribution shift, where the baseline fails completely
  • Tested on 6 diverse objects in the real world, achieving success rates of 37.5%–93.3%
  • Also supports manipulation of symmetric and textureless objects

Copyright 2026, JungYeon Lee