📃 Rotating without Seeing Review

paper
tactile
rl
hand
Towards In-hand Dexterity through Touch
Published

December 22, 2024


The paper reviewed in this post is Rotating without Seeing: Towards In-hand Dexterity through Touch, presented at RSS (Robotics: Science and Systems) 2023. Its goal is to give a robot hand the human ability to manipulate an object dexterously in the hand using touch alone, without vision. To that end it proposes Touch Dexterity, a system that uses low-cost binary touch sensors spread widely over the palm, finger links, and fingertips. A policy trained with reinforcement learning in simulation is deployed on a real robot hand, where it manipulates not only the objects seen in training but also new, unseen objects.

1 Introduction

Most prior work has focused on handling precise, fine-grained contact with ever higher-quality sensors. Such expensive sensors, however, can usually be mounted only on a gripper or on the fingertips of a hand, so they cannot sense contact over the whole manipulator, and this limits the range of tasks that can be performed. Complex tasks also need large amounts of training data, but the Sim2Real gap grows even wider with high-precision sensors, which makes it hard to rely on a simulator.

Touch Dexterity rotates and manipulates an object purely through contact instead of "seeing" it. The approach uses low-cost binary force sensors attached to one side of the hand (fingertips, links, and palm). Each sensor reports only whether contact is present, and this is what lets the hand "feel" the state of the object. Combining 16 sensors gives up to 2^16 = 65,536 contact patterns, so the representation is expressive despite its simplicity. The Sim2Real gap is addressed by collecting enough data in simulation, and the simple structure of a binary sensor has the added benefit of making it less sensitive to noise.
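To make the 2^16 figure concrete, here is a minimal sketch that packs 16 binary contact readings into a single integer. The function name `encode_contacts` and the bit order (sensor 0 as the lowest bit) are assumptions made for illustration and do not come from the paper.

```python
from typing import Sequence

NUM_SENSORS = 16  # number of binary touch sensors described in the post


def encode_contacts(contacts: Sequence[int]) -> int:
    """Pack 0/1 contact readings into one integer, sensor 0 as the lowest bit."""
    if len(contacts) != NUM_SENSORS:
        raise ValueError(f"expected {NUM_SENSORS} readings, got {len(contacts)}")
    code = 0
    for index, value in enumerate(contacts):
        if value not in (0, 1):
            raise ValueError("each reading must be 0 or 1")
        code |= int(value) << index
    return code


# Every sensor is either on or off, so there are 2**16 distinct patterns.
num_patterns = 2 ** NUM_SENSORS

# Hypothetical reading: only sensors 0 and 5 are in contact.
example = [0] * NUM_SENSORS
example[0] = 1
example[5] = 1

print(num_patterns)              # 65536
print(encode_contacts(example))  # 33 (bit 0 + bit 5)
```

The policy itself receives the 16 readings as a vector rather than as one integer; the encoding only illustrates how many distinct contact patterns exist.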

FSR sensors, sold on Amazon for about $8, attached to the hand

Touch Dexterity focuses on using a multi-fingered robot hand to rotate "unseen" objects about the x, y, and z axes, which can be viewed as a simplified version of the in-hand re-orientation task. Here "unseen" means not only that there is no vision sensor, but also that the objects were not encountered during training. The reinforcement learning (RL) policy takes the binary touch readings and the robot's internal state as input and predicts an action at every time step for closed-loop control. The RL agent implicitly learns the object's 3D structure and pose and rotates the object on that basis. Tests on the real robot system used 10 different objects. The figure below shows in-hand manipulation of a rubber duck, an unseen object, in the real world.

The hand rotates a rubber duck for two full cycles without dropping it, even though the duck never appeared in training

Dexterous Manipulation

Classical analytical, model-based approaches require too many assumptions about the object and the controller, which limits how far they scale to complex problems. Imitation learning relies mainly on visual input. Although object information can be inferred from the implicit tactile cues contained in proprioception data, existing work mostly addresses rotating an object at the fingertips or rotating a limited set of objects. Touch Dexterity, in contrast, uses touch sensors to perceive the interaction between hand and object explicitly, and it solves object rotation on the palm for a wide variety of objects. This involves complex object motion and is therefore a considerably harder problem, and the approach also generalizes to new objects that were not seen in training.

Tactile Robotic Manipulation

What type of touch information is essential?

Prior work has proposed ways to extract local geometry, force and torque, contact events, and material properties from a variety of sensors in order to support manipulation. Even simple binary contact signals from a sparse sensor array can be useful in high-dimensional manipulation tasks. For example, one study used a Shadow Hand with a dense sensor layout on the palm.

How can tactile events be simulated to facilitate Sim2Real transfer?

Tactile simulation usually models the normal and shear tactile force fields on the contact surface. Touch Dexterity, by contrast, needs no dedicated simulation design: it can use the built-in contact simulation of an existing physics simulator, which makes it efficient.

2 Learning Touch Dexterity

This section looks at how the learned component of Touch Dexterity proposed in the paper is trained.

2.1 Problem Formulation

Touch Dexterity is controlled with reinforcement learning, so we go through the elements of the MDP (Markov Decision Process), the standard way of defining a reinforcement learning problem, in the order State, Action, Reward.

2.1.1 State

The state consists of the quantities that describe the hand robot agent: the Allegro Hand's joint positions (16 dimensions), the sensor observations (16 dimensions), the previous position targets (16 dimensions), and the rotation axis (2 dimensions). The hand's joints are driven by 16 small motors, and a total of 16 FSR (Force Sensing Resistor) sensors are distributed over the fingers and palm as in the figure below, which gives the state vector the dimensions listed above. This state is the input to the policy network during training. Because a single time step does not carry enough information to learn from, the two preceding time steps are concatenated with the current one, so the policy network receives a stack of 3 time steps.

State components
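The state layout and the 3-step stacking described above can be sketched as follows. The dimensions are the ones given in this post. The names `make_step_state` and `StateStacker`, and the choice to fill the history with copies of the first state at reset, are assumptions for illustration rather than details from the paper.

```python
from collections import deque

import numpy as np

JOINT_DIM, TOUCH_DIM, TARGET_DIM, AXIS_DIM = 16, 16, 16, 2  # values from the post
STEP_DIM = JOINT_DIM + TOUCH_DIM + TARGET_DIM + AXIS_DIM     # 50 per time step
NUM_STEPS = 3                                                # current + 2 previous


def make_step_state(joint_pos, touch, prev_target, rot_axis):
    """Concatenate the four components into one per-step state vector."""
    state = np.concatenate([joint_pos, touch, prev_target, rot_axis])
    state = state.astype(np.float32)
    if state.shape != (STEP_DIM,):
        raise ValueError(f"expected shape ({STEP_DIM},), got {state.shape}")
    return state


class StateStacker:
    """Keeps the most recent NUM_STEPS states and returns them concatenated."""

    def __init__(self, num_steps=NUM_STEPS):
        self.frames = deque(maxlen=num_steps)

    def reset(self, first_state):
        # Assumption: pad the history with copies of the first state.
        self.frames.clear()
        self.frames.extend([first_state] * self.frames.maxlen)

    def push(self, state):
        # Oldest state is dropped automatically once the deque is full.
        self.frames.append(state)
        return np.concatenate(list(self.frames))


stacker = StateStacker()
first = make_step_state(np.zeros(16), np.zeros(16), np.zeros(16), np.zeros(2))
stacker.reset(first)
second = make_step_state(np.ones(16), np.ones(16), np.ones(16), np.ones(2))
policy_input = stacker.push(second)
print(policy_input.shape)  # (150,)
```

With these numbers the policy input has 3 x 50 = 150 dimensions, ordered from the oldest state to the newest.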

2.1.2 Action

The hand agent's action can be thought of as the movement of the motor at each joint of the robot, so the policy network outputs a 16-dimensional command for the motors. The policy output a_t is not used directly, however. It is first converted into a value that can be fed to the PD controller. The value the PD controller ends up using is \tilde{q}_{t+1} (the current time step is t, so the position to be commanded next carries the index t+1).

Applying the raw target directly causes a problem, though. If consecutive policy outputs differ by large jumps, the resulting motion cannot be smooth. The paper therefore smooths the targets with an exponential moving average.

How the action is applied

The plot below shows random points being smoothed with the parameters given in the paper (\eta, 2 consecutive steps).

import matplotlib.pyplot as plt
import numpy as np

# Parameters
eta = 0.8  # Smoothing factor
steps = 2  # Step size for x-axis
n_points = 200  # Number of points

# Generate data
x = np.arange(0, n_points, steps)
data = np.sin(x / 5) + np.random.normal(0, 0.3, len(x))  # Random data with noise
ema = []

# Calculate EMA
for i, point in enumerate(data):
    if i == 0:
        ema.append(point)  # Initialize EMA with the first data point
    else:
        ema.append(eta * point + (1 - eta) * ema[-1])

# Plot
plt.figure(figsize=(8, 3))
plt.plot(x, data, label="Data", marker="o", linestyle="--", alpha=0.6)
plt.plot(x, ema, label="Exponential Moving Average (EMA)", linewidth=2)
plt.xlabel("Step")
plt.ylabel("Value")
plt.title(f"Exponential Moving Average (eta={eta}, step={steps})")
plt.legend()
plt.grid(True)
plt.show()

The action value computed in this way is what finally produces the hand agent's motion.
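The path from policy output to motor torque might look like the sketch below. Only the exponential moving average and the use of a PD controller come from this post. Treating a_t as a scaled offset from the previous target, the value of `ACTION_SCALE`, and the gains `kp` and `kd` are all assumptions chosen for illustration.

```python
import numpy as np

NUM_JOINTS = 16      # Allegro Hand joints, as stated in the post
ETA = 0.8            # EMA factor, same value as the plotting example above
ACTION_SCALE = 0.1   # hypothetical scale, not taken from the paper


def action_to_target(action, prev_target, eta=ETA, scale=ACTION_SCALE):
    """Turn a raw policy output a_t into a smoothed position target."""
    # Assumption: the action is a bounded offset from the previous target.
    raw_target = prev_target + scale * np.clip(action, -1.0, 1.0)
    # Exponential moving average between the new and the previous target.
    return eta * raw_target + (1.0 - eta) * prev_target


def pd_torque(target, joint_pos, joint_vel, kp=3.0, kd=0.1):
    """Textbook PD law: pull each joint toward its target, damp its velocity."""
    return kp * (target - joint_pos) - kd * joint_vel


prev_target = np.zeros(NUM_JOINTS)
action = np.ones(NUM_JOINTS)
target = action_to_target(action, prev_target)
torque = pd_torque(
    target,
    joint_pos=np.zeros(NUM_JOINTS),
    joint_vel=np.zeros(NUM_JOINTS),
)
print(target[0], torque[0])
```

Smoothing the target rather than the torque keeps the PD controller itself unchanged, which is one plausible reason to place the filter at this stage.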

2.1.3 Reward

The reward function consists of the six terms below. The six terms are combined as a linear weighted sum to give the final reward at each timestep.

Reward Function
  1. Reward of rotation r_{rot}
    • A rotation reward defined by the rotation angle \Delta \theta of a unit vector sampled on \Pi, the plane normal to the rotation axis k.
    • How it is computed
      • Sample a random unit vector v on the normal plane \Pi; this vector can be regarded as attached to the object.
      • Take the corresponding vector v' in the next state and project it onto \Pi: v'_p = \text{Proj}(v', \Pi)
      • \Delta \theta \in [-\pi, \pi) is defined as the signed angular distance between v'_p and v about the axis k.
    • When the object's motion is very complex, the angular velocity reported by the simulator is very noisy, so using it in the reward can lead to undesirable motion patterns, such as the object oscillating around a particular pose.
    • Using this finite difference as the reward instead produces consistent rotation behavior across different runs.
  2. Penalty of object's velocity r_{vel}
    • Encourages the hand to rotate the object stably and improves the transferability of the trained policy.
  3. Penalty of falling r_{fall}
    • A negative penalty applied when the object falls off the palm.
  4. Penalty of the controller's work r_{work}
    • Penalizes the amount of work done by the controller. The torque \tau in this term is the torque output by the PD controller at time t. This penalty helps make the finger motion smoother.
  5. Penalty of torque r_{torque}
    • Penalizes large torque outputs.
  6. Reward of distance r_{dist}
    • A distance reward that encourages the fingertips to move close to the object and interact with it.
      • d(x_{\text{tip}}, x_{\text{obj}}) is the distance between the fingertip position x_{\text{tip}} and the object position x_{\text{obj}}.
      • \epsilon is a small positive constant that keeps the denominator from becoming 0.
      • c_2 and c_3 are the lower and upper bounds of the clipping range for the reward.
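The rotation and distance terms above can be sketched in a few lines of NumPy. The projection and signed-angle steps follow the description in the list. The inverse-distance form of `distance_reward`, its default constants, and the weights passed to `total_reward` are assumptions that are merely consistent with that description, not values from the paper.

```python
import numpy as np


def project_onto_plane(vec, axis):
    """Remove the component of vec along the unit rotation axis k."""
    return vec - np.dot(vec, axis) * axis


def signed_rotation_angle(v, v_next, axis):
    """Signed angle about `axis` from v to the projection of v_next.

    v is assumed to be a unit vector lying in the plane normal to `axis`.
    """
    axis = axis / np.linalg.norm(axis)
    v_p = project_onto_plane(v_next, axis)       # v'_p = Proj(v', Pi)
    sin_term = np.dot(axis, np.cross(v, v_p))
    cos_term = np.dot(v, v_p)
    return float(np.arctan2(sin_term, cos_term))


def distance_reward(tip_pos, obj_pos, eps=0.02, c2=0.0, c3=10.0):
    """Assumed form: clipped inverse distance between fingertip and object."""
    d = np.linalg.norm(tip_pos - obj_pos)
    return float(np.clip(1.0 / (d + eps), c2, c3))


def total_reward(terms, weights):
    """Linear weighted sum of the individual reward terms."""
    return sum(weights[name] * value for name, value in terms.items())


k = np.array([0.0, 0.0, 1.0])                        # rotation axis
v = np.array([1.0, 0.0, 0.0])                        # unit vector in the plane
v_next = np.array([np.cos(0.3), np.sin(0.3), 0.2])   # rotated, with some tilt
delta_theta = signed_rotation_angle(v, v_next, k)
print(round(delta_theta, 6))  # 0.3
```

The out-of-plane component of `v_next` (0.2 along k) is removed by the projection, so only the rotation about k contributes to \Delta \theta. Note that `np.arctan2` can return exactly +\pi, a boundary case that the half-open interval in the definition excludes.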

2.1.4 Reset

To cut down on unnecessary exploration and speed up training, the episode is reset when the object moves too far from its initial position (that is, the center of the palm). The episode is also reset when the object's main axis deviates too far from the rotation axis.
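The two reset conditions can be written as a single check, as in the sketch below. The post does not give the actual limits, so `max_dist` and `max_angle` are placeholder values, and the function name `should_reset` is hypothetical.

```python
import numpy as np


def should_reset(obj_pos, init_pos, obj_axis, rot_axis,
                 max_dist=0.1, max_angle=np.pi / 4):
    """Return True if the episode should end early.

    max_dist (metres) and max_angle (radians) are placeholder thresholds.
    """
    # Condition 1: object drifted too far from the palm centre.
    dist = np.linalg.norm(obj_pos - init_pos)

    # Condition 2: object's main axis tilted too far from the rotation axis.
    cos_angle = np.dot(obj_axis, rot_axis) / (
        np.linalg.norm(obj_axis) * np.linalg.norm(rot_axis)
    )
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))

    return bool(dist > max_dist or angle > max_angle)


palm_centre = np.zeros(3)
z_axis = np.array([0.0, 0.0, 1.0])
x_axis = np.array([1.0, 0.0, 0.0])

print(should_reset(palm_centre, palm_centre, z_axis, z_axis))             # False
print(should_reset(np.array([0.5, 0.0, 0.0]), palm_centre, z_axis, z_axis))  # True
print(should_reset(palm_centre, palm_centre, x_axis, z_axis))             # True
```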

2.2 Domain Randomization

Domain randomization is applied during training to narrow the Sim2Real gap of reinforcement learning. The paper uses two kinds of domain randomization.

  1. Physical randomization:

    • The initial position, mass, shape, and friction of the rotated object are randomized so that the learned policy can handle many kinds of objects.
    • The gains of the PD controller are randomized to model the uncertainty of the PD controller in the real world.
    • Each touch sensor is randomized as well. For an active contact sensor (output 1), the output is flipped to 0 with probability p.
    • The signal delay of the contact sensors is modeled with an exponential delay model.
  2. Non-physical randomization:

    • White noise is injected into the policy's observations and output actions so that the policy stays robust to small disturbances.
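Two of the randomizations above, the sensor dropout and the white noise, are simple enough to sketch directly. The function names, the noise scale, and the dropout probability used in the example are hypothetical; the exponential delay model is left out because this post does not describe its form.

```python
import numpy as np


def randomize_touch(touch, drop_prob, rng):
    """Flip active sensors (1) to 0 with probability drop_prob.

    Inactive sensors (0) are never changed.
    """
    touch = np.asarray(touch)
    drop = rng.random(touch.shape) < drop_prob
    return np.where((touch == 1) & drop, 0, touch)


def add_white_noise(values, std, rng):
    """Add zero-mean Gaussian noise to observations or actions."""
    values = np.asarray(values, dtype=float)
    return values + rng.normal(0.0, std, size=values.shape)


rng = np.random.default_rng(0)
touch = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1])

noisy_touch = randomize_touch(touch, drop_prob=0.2, rng=rng)
noisy_joints = add_white_noise(np.zeros(16), std=0.01, rng=rng)

print(noisy_touch)
print(noisy_joints.round(3))
```

Because only 1s are flipped, the randomization models missed contacts but never invents contact that did not happen.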

2.3 Training Procedure

Training uses the Proximal Policy Optimization (PPO) algorithm, with a multi-layer perceptron (MLP) for both the policy network and the value network.

  • Training settings:
    • Uses an advantage clip threshold \epsilon = 0.2 and a KL divergence threshold of 0.02
    • Uses ELU as the activation function in the networks
    • The policy network outputs a Gaussian distribution with a learnable, state-independent standard deviation
  • Asymmetric observation:
    • Asymmetric observation is used to make the policy and value networks easier to train
      • Value network: privileged information such as contact forces, the object's ground-truth pose, and physical parameters is added to the input
      • Policy network: takes the current state together with 3 past states as input and has no access to privileged information
  • Simulation settings:
    • In the IsaacGym simulation, the time step (dt) is set to 0.01667 s with 2 substeps
    • The simulation runs in 8192 parallel environments
    • Each action (control target) output by the policy network is executed for 6 steps, which corresponds to a control frequency of 10 Hz in the real world
Training Process
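The 10 Hz figure follows from the simulation settings, as the short calculation below shows. The variable names are illustrative, and the substep line assumes that the substeps divide dt evenly.

```python
SIM_DT = 0.01667     # simulation time step in seconds (from the post)
SUBSTEPS = 2         # physics substeps per simulation step (from the post)
ACTION_REPEAT = 6    # simulation steps executed per policy action (from the post)

# Assumption: substeps split each simulation step into equal parts.
substep_dt = SIM_DT / SUBSTEPS

# One policy action spans ACTION_REPEAT simulation steps.
control_period = SIM_DT * ACTION_REPEAT
control_hz = 1.0 / control_period

print(f"substep dt     : {substep_dt:.6f} s")
print(f"control period : {control_period:.5f} s")
print(f"control rate   : {control_hz:.2f} Hz")
```

Since 0.01667 s is roughly 1/60 s, six simulation steps last about 0.1 s, which gives the 10 Hz control rate used on the real robot.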

3 Tactile Dexterous Manipulation System

3.1 Real-world System Setup

Overview

ํ•˜๋“œ์›จ์–ด ๊ตฌ์„ฑ์€ XArm ๋กœ๋ด‡ ํŒ”๊ณผ 16์ž์œ ๋„(16-DOF)๋ฅผ ๊ฐ€์ง„ Allegro Hand์— ์ ‘์ด‰ ์„ผ์„œ ๋ฐฐ์—ด์„ ์žฅ์ฐฉํ•œ ํ˜•ํƒœ๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค. ์†๋ฐ”๋‹ฅ๊ณผ ์†๊ฐ€๋ฝ ๋์„ ํฌํ•จํ•œ Allegro Hand์˜ ์—ฌ๋Ÿฌ ๋ถ€์œ„์— ๋ถ€์ฐฉ๋œ 16๊ฐœ์˜ ์ ‘์ด‰ ์„ผ์„œ๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค.

The contact sensors are based on the Force-Sensing Resistor (FSR), whose resistance changes when an external force is applied to its surface. An STM32F microcontroller collects the analog voltage signal of each sensor, converts it to a digital signal, and sends it to the host. The sensors can output a continuous contact-force measurement, but the signal is generally nonlinear and noisy, so it needs suitable preprocessing before it is used for control. The measurement is binarized against a chosen threshold \theta_{\text{th}}, and the resulting signal is what the system uses.

Advantages of using a binary signal:

  • It reduces the difference between simulation and the real robot and simplifies the Sim2Real transfer procedure.
  • The binarized measurement is easy to calibrate by adjusting the threshold.
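Binarizing against a per-sensor threshold, and one possible way to calibrate that threshold, might look like this. The calibration rule (mean plus a margin of standard deviations over no-contact readings) is an assumption made for illustration; the post only says that the threshold can be adjusted.

```python
import numpy as np


def calibrate_threshold(idle_readings, margin=3.0):
    """Hypothetical calibration from readings taken with nothing touching.

    idle_readings has shape (num_samples, num_sensors).
    """
    idle = np.asarray(idle_readings, dtype=float)
    return idle.mean(axis=0) + margin * idle.std(axis=0)


def binarize(readings, threshold):
    """Return 1 where the reading exceeds its threshold, else 0."""
    return (np.asarray(readings, dtype=float) > threshold).astype(np.int8)


# Two sensors, two idle samples each (made-up values in arbitrary units).
idle = [[0.1, 0.2],
        [0.1, 0.2]]
theta_th = calibrate_threshold(idle)

contact = binarize([0.5, 0.1], theta_th)
print(theta_th)  # [0.1 0.2]
print(contact)   # [1 0]
```

Keeping one threshold per sensor means that a sensor with a higher resting signal does not register as permanently in contact.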

3.2 Simulation Setup

The paper uses the IsaacGym simulator. Each contact sensor is simulated as a fixed link attached to a finger or palm link. At every simulation step the simulator provides the net contact force F = [F_x, F_y, F_z] on each sensor link. \|F\| is used as the simulated contact-force measurement, and this measurement is then binarized with a separate threshold \tilde{\theta}_{\text{th}}.

Importantly, forces exerted by a sensor's parent link do not contribute to the net contact force. The sensor threshold \tilde{\theta}_{\text{th}} is tuned so that the simulated sensors behave like the real ones. The simulation uses \tilde{\theta}_{\text{th}} = 0.01 N.
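The simulated sensor model reduces to a norm and a comparison, as in the sketch below. The array layout (one force vector per sensor link) and the strict `>` comparison are assumptions; in practice the forces would come from the simulator's contact-force output rather than a hand-built array.

```python
import numpy as np

SIM_THRESHOLD_N = 0.01  # simulated threshold in newtons (from the post)


def simulated_touch(net_forces, threshold=SIM_THRESHOLD_N):
    """Binarize net contact forces of shape (num_sensors, 3)."""
    magnitudes = np.linalg.norm(net_forces, axis=-1)  # ||F|| per sensor link
    return (magnitudes > threshold).astype(np.int8)


# Made-up forces: only sensor 3 feels a noticeable push.
forces = np.zeros((16, 3))
forces[3] = [0.0, 0.0, 0.5]
forces[7] = [0.001, 0.0, 0.0]   # below the threshold, treated as no contact

touch = simulated_touch(forces)
print(touch)
```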

3.3 Benchmark: In-hand Rotation

To study the dexterity of the system, the paper targets the in-hand rotation task. The task starts with the object initialized on the palm, and the robot hand must rotate the object about a given rotation axis. During in-hand rotation the object's motion is much more complex than in fingertip rotation. In particular, the object can slide or roll on the palm while it is being manipulated.

Because of these complex motion patterns, successful manipulation needs explicit feedback from touch sensors or a vision system. Without it, the current state of the object cannot be inferred, and the hand may fail to push or rotate the object safely.

4 Reference

  • Original Paper
  • Project Homepage

Copyright 2024, Jung Yeon Lee