📃 RotateIt Review

tactile
dexterous manipulation
General In-Hand Object Rotation with Vision and Touch
Published

March 24, 2026

  • Paper Link

  • Project Link

  • Haozhi Qi, Brent Yi, Sudharshan Suresh, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik

  • UC Berkeley, Meta AI, CMU, TU Dresden

  • Conference on Robot Learning (CoRL), 2023

  1. 🚀 This work proposes RotateIt, a system that uses multimodal sensory input (vision, touch, proprioception) to rotate objects about multiple axes with the fingertips.
  2. 💡 RotateIt follows a two-stage approach: it first trains an oracle policy in simulation using privileged information, then learns a visuotactile policy in which a visuotactile transformer infers that information from realistic sensor input.
  3. ✅ It shows that vision and tactile sensing matter for both manipulation performance and OOD (out-of-distribution) generalization, and demonstrates that a policy trained in simulation transfers successfully to a variety of real-world objects.

🔍 Ping Review

🔍 Ping: A light tap on the surface. Get the gist in seconds.

In "General In-Hand Object Rotation with Vision and Touch", Haozhi Qi et al. introduce RotateIt, a system that integrates visual and tactile feedback to rotate diverse objects in-hand about multiple axes. The work targets the generalization and stability problems that earlier manipulation methods ran into, focusing in particular on the difficulty of maintaining a stable force closure across objects of varied shape.

Core Methodology

RotateIt follows a sim-to-real approach: it is trained in simulation and deployed directly in the real world. Training consists of two main stages.

  1. Oracle Policy Training:
    • Privileged Information: In simulation, the object's ground-truth physical properties and shape are used as "privileged information (extrinsics)" z_t. This mimics a policy that knows the object's properties perfectly.
    • Shape Information: N_p points are sampled from the object's 3D mesh and encoded with PointNet [72] into a c_p-dimensional feature vector z_{shape_t}. Unlike prior work, the authors find that feeding explicit shape information into the policy is important for manipulating complex objects.
    • Physical Property and Pose: A 7-dimensional physical-property vector (mass, center of mass, coefficient of friction, scale, restitution) and a 10-dimensional pose vector (position, orientation as a quaternion, angular velocity) are combined and mapped to an 8-dimensional encoding z_{phys_t}. The final privileged encoding z_t concatenates z_{phys_t} and z_{shape_t}: z_t = [z_{phys_t}, z_{shape_t}].
    • Observations and Outputs: The oracle policy \pi takes the robot's proprioception p_t and the encoded privileged information z_t as input. p_t contains a short temporal window of joint positions and previous actions. The policy outputs the action a_t, the PD controller targets for the 16 joints. That is, a_t = \pi(p_t, z_t).
    • Reward Function: The rotation reward r_{rotr} = \max(\min(\omega \cdot k, r_{max}), r_{min}) encourages the object's angular velocity \omega to align with the target rotation axis k. To discourage unintended rotation (especially for the x and y axes), a penalty r_{rotp} = -\|\omega \times k\|_1 is added. Further penalty terms on hand-pose deviation, torque, energy consumption, and object linear velocity steer the policy toward stable, efficient motion.
    • Policy Optimization: The oracle policy is optimized with PPO [75], training over a variety of objects with randomized physical properties and initial grasps.
  2. Visuotactile Policy Training:
    • Motivation: The privileged information z_t is not available in the real world, so an estimate \hat{z}_t of z_t has to be inferred from the robot's actual observations (vision, touch, proprioception).
    • Touch Sensing (Figure 4):
      • In simulation, the discretized contact location on a 2D plane, as reported by the simulator, serves as a proxy for tactile information (8 locations). The contact observation o_{touch_t} is an N_c × 9 array, where N_c is the number of contacts (8-dimensional contact location + 1-dimensional finger index). Each contact is processed by an MLP and the results are merged with average pooling.
      • In the real world, four omnidirectional vision-based touch sensors are mounted on the fingertips. On each sensor the pixel with the strongest deformation is tracked as a proxy for the contact location, and this 2D keypoint is fed directly to the policy.
    • Vision Sensing (Figure 5):
      • Object depth is used as the visual representation. It needs no human labeling in the real world, and while RGB images are hard to simulate realistically, depth abstracts object shape well.
      • At deployment, Segment Anything [12, 13] segments the object foreground out of the raw depth image to narrow the sim-to-real gap.
      • The object depth image o_{depth_t} is encoded by a 3-layer ConvNet into a feature vector f_{depth_t}. During training, camera position and orientation are randomized to make the policy more robust.
    • Visuotactile Transformer (Figure 2):
      • The transformer \phi models the multimodal sensor stream with the goal of accurately inferring \hat{z}_t, a learned representation of the privileged information.
      • The encoded depth image f_{depth_t}, the encoded tactile contact points f_{touch_t}, the joint positions q_t, and the previous-timestep action a_{t-1} are concatenated into a feature vector f_t.
      • The transformer takes the feature sequence f_T = \{f_{t-k}, ..., f_{t-1}, f_t\} as input and outputs the predicted extrinsics vector \hat{z}_t.
    • Training: The policy is rolled out using the predicted extrinsics vector \hat{z}_t, generating actions \hat{a}_t = \pi(p_t, \hat{z}_t), where a_t = \pi(p_t, z_t) denotes the oracle action. The ground-truth privileged information z_t is stored at the same time, forming the training dataset B = \{(f_T^{(i)}, z_t^{(i)}, \hat{z}_t^{(i)})\}_{i=1}^N. The transformer \phi is optimized with Adam [78] to minimize the l_2 distance between z_t and \hat{z}_t together with the difference between a_t and \hat{a}_t.
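The feature concatenation and the stage-two objective described above can be sketched in a few lines of NumPy. This is only an illustration, not the authors' implementation: the function names, the feature sizes for f_{depth_t} and f_{touch_t}, the `action_weight` factor, and the choice of a squared l_2 term for the action difference are all assumptions made for the example.

```python
import numpy as np

def build_feature(f_depth, f_touch, q, a_prev):
    """Concatenate the per-timestep inputs into f_t (all inputs are 1-D arrays)."""
    return np.concatenate([f_depth, f_touch, q, a_prev], axis=-1)

def distillation_loss(z, z_hat, a, a_hat, action_weight=1.0):
    """Latent-matching term plus an action-matching term, averaged over the batch.

    z, z_hat: (batch, latent_dim) ground-truth and predicted extrinsics.
    a, a_hat: (batch, action_dim) oracle actions and actions from z_hat.
    action_weight is a hypothetical balancing factor (assumption).
    """
    z, z_hat = np.asarray(z, dtype=float), np.asarray(z_hat, dtype=float)
    a, a_hat = np.asarray(a, dtype=float), np.asarray(a_hat, dtype=float)
    latent_term = np.mean(np.sum((z - z_hat) ** 2, axis=-1))
    action_term = np.mean(np.sum((a - a_hat) ** 2, axis=-1))  # squared l2 is an assumption
    return float(latent_term + action_weight * action_term)

# Hypothetical sizes: 32-dim depth and touch features, 16 joints, 16-dim action.
f_t = build_feature(np.zeros(32), np.zeros(32), np.zeros(16), np.zeros(16))
print(f_t.shape)  # (96,)
```

In a real training loop the loss would be computed on transformer outputs and backpropagated with Adam; the sketch only shows how the two terms are combined.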

Evaluation Setup

  • Hardware: AllegroHand (16 joints), an Intel RealSense D435 depth camera, and omnidirectional vision-based touch sensors on the fingertips.
  • Simulation: Based on IsaacGym [79]. The camera-robot extrinsics are calibrated with an ArUco tag [80], and random pose noise and realistic depth noise [81] are applied to the simulated images.
  • Object Set: A curated selection of objects from EGAD [30], Google Scanned Objects [31], YCB [32], and ContactDB [33], restricted to objects whose width/depth/height (w/d/h) ratio is below 2.0.
  • Evaluation Metrics:
    • Time-to-Fall (TTF): Average episode length before the object falls from the hand (higher is better).
    • Rotation Reward (RotR): Average \omega \cdot k per episode (higher is better).
    • Rotation Penalty (RotP): Average \|\omega \times k\| per timestep (lower is better; especially important for x- and y-axis rotation).
    • Radians Rotated (Rotations): Total rotation angle achieved in the real-world experiments.
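RotR and RotP are built from the same quantities as the rotation reward and penalty in the oracle-policy stage, so both can be illustrated together. The sketch below is a minimal NumPy version under stated assumptions: the function names are hypothetical, r_min and r_max are left as parameters because the review does not give their values, RotP uses the l_1 norm to match the reward term, and TTF is reported as a raw step count.

```python
import numpy as np

def rotation_reward(omega, k, r_min, r_max):
    """r_rotr: projection of angular velocity onto the target axis, clipped."""
    return float(np.clip(np.dot(omega, k), r_min, r_max))

def rotation_penalty_reward(omega, k):
    """r_rotp: negative l1 norm of omega x k, penalizing off-axis rotation."""
    return -float(np.sum(np.abs(np.cross(omega, k))))

def episode_metrics(omegas, k):
    """Summarize one episode.

    omegas: (T, 3) angular velocities logged until the object fell
            or the episode ended (assumed logging format).
    k:      (3,) unit vector of the target rotation axis.
    """
    omegas = np.asarray(omegas, dtype=float)
    k = np.asarray(k, dtype=float)
    rot_r = float(np.mean(omegas @ k))
    rot_p = float(np.mean(np.abs(np.cross(omegas, k)).sum(axis=-1)))
    return {"TTF_steps": len(omegas), "RotR": rot_r, "RotP": rot_p}

# Example with a z-axis target and hypothetical clipping bounds.
k = np.array([0.0, 0.0, 1.0])
print(rotation_reward(np.array([0.0, 0.0, 2.0]), k, r_min=-0.5, r_max=0.5))  # 0.5
```

Clipping keeps the policy from being rewarded for spinning the object arbitrarily fast, while the cross-product term is zero exactly when \omega is parallel to k.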

Results and Analysis

  • Object Shape Importance: Table 1, Figure 7, and Figure 8 show that explicitly using object shape information through PointNet when training the oracle policy improves performance substantially, with the largest benefit on irregular objects and objects with non-uniform w/d/h ratios. Without shape information the policy tends to treat objects as spheres or cuboids, which hurts generalization to OOD (out-of-distribution) objects.
  • Importance of Visuotactile Transformer: Figure 6, Figure 7, Figure 8, and Table 4 show that using either vision or touch alone already improves markedly over a proprioception-only baseline, and that combining the two improves performance further, approaching the oracle policy. The transformer architecture models sequences better than the temporal convolution used in prior work (Table 4). Visuotactile information is also essential for OOD generalization.
  • Finer Tactile Sensing: Table 2 shows that binary contact information (contact or no contact) provides no additional benefit, whereas discretized contact location is very important for performance. This may be because RotateIt already uses proprioception and action history.
  • Representation Learned in the Latent Space: Figure 9 shows that the learned z_t and \hat{z}_t encodings preserve the object's 3D shape information well. When shape is included in the privileged information, the policy captures the object's actual shape more accurately, and the visuotactile sensors help it understand the shape of irregular objects that are hard to tell apart from proprioception alone.
  • Real-world Evaluations: Figure 10 shows that RotateIt, unlike Hora [7], successfully rotates objects of varied geometry about the x-axis in the real world. RotateIt generalizes well despite objects absent from the training set and real-world physical differences. It also performs similarly with some touch sensors disabled, demonstrating the robustness of the algorithm.
  • Multi-axis Training: Table 3 shows that a single network can be trained to rotate objects about multiple rotation axes. When the desired rotation axis k is added to the observation space and the policy is trained with an imitation learning objective, the distilled multi-axis policy performs comparably to the single-axis oracle policies.

Limitations and Future Work

One limitation of this work is that objects must stay within the robot's mechanical limits and must not be too long. Another is that the policy is frozen after training and cannot make use of real-world experience during deployment. Proposed future directions include lifelong learning in the real world through cross-modal supervision, using the full information from image-based touch sensors, and improving the vision system (for example, through visual pre-training).

By integrating tactile and visual information so that a robot can manipulate diverse objects in-hand about multiple axes, RotateIt takes an important step toward general dexterous hand manipulation.

🔔 Ring Review

🔔 Ring: An idea that echoes. Grasp the core and its value.

  • The first work to achieve general in-hand object rotation by combining vision + touch sensing with learning
  • Achieves a performance gain of over 40% on regular objects and an even larger gain on irregular objects
  • Compared with using proprioception alone, the visuotactile approach cuts the OOD generalization performance drop from 41% to 8%
  • Sim2Real strategy: for vision, depth maps minimize the sim/real domain gap; touch is approximated by discrete contact locations, with pixel displacement measured by color tracking on hardware
  • Integrates a depth-camera vision pipeline built on Segment-Anything

Copyright 2026, JungYeon Lee