Curieux.JY
  • Jung Yeon Lee

📃 K-Accessibility Review

rl
clustering
quadruped
recovery
backflip
paper
Accessibility-Based Clustering for Efficient Learning of Locomotion Skills
Published

May 7, 2023

This post summarizes my reading of the paper Accessibility-Based Clustering for Efficient Learning of Locomotion Skills, presented at ICRA (International Conference on Robotics and Automation) 2022. The paper asks how an agent can be made to explore the initial state distribution efficiently when robot control is learned with reinforcement learning, and answers with K-Access, an algorithm similar to K-means++. It demonstrates learning of Recovery and Backflip motions on a quadruped robot.

1 Introduction

There are many kinds of robots in the world. Among them, legged robots, which move by interacting with the ground through leg mechanisms, have an advantage over wheeled robots: they can traverse rough or discontinuous terrain that is hard to cross on wheels. Both legged and wheeled robots exist to perform useful tasks for people, with mobility as the basic premise, so moving a legged robot is called the locomotion task and moving a wheeled robot is called the drive task.

Legged robots are typically divided into bipedal robots, which walk like humans, and quadrupedal robots, which walk like dogs. Before getting into the paper, I'd like to share an amusing news clip about Digit, a bipedal robot.

The flexibility to move freely in every direction, turn in place, and crouch … in short, it is a collection of highly advanced technology, … but it seems to slow down little by little, and then its legs give way and it collapses.

… about 20 hours of live testing over several days before the expo … despite a 99% success rate, an accident like this still happened in front of an audience …

As the clip shows, the demo was prepared and checked for a very long time, yet the robot still fell because of an unforeseen factor. If a robot can fall over something unanticipated even in a well-arranged demo environment, then once robots become commercial products and are placed in truly diverse environments, it will be hard to guarantee that they never fall.

As mentioned above, the basic premise of a legged robot is mobility, so Locomotion can be seen as the robot's main task. Walking is cyclic: the same motion repeats with a fixed period. Locomotion has also been studied for a long time through reference motions such as animal gaits, so its mathematical modeling is well established. Finally, viewed as a search space over the joint control that an RL policy must learn, Locomotion has a narrow search space. Recovery, by contrast, is a support task that underpins continuous operation, and it requires non-cyclic motion. It is hard to specify, timestep by timestep, how the robot should get from a fallen pose back to its normal state, so the task has a broader search space.

I also compared Locomotion and Recovery in a previous post, so please refer to it.

1.1 Initial state distributions

Having compared Locomotion and Recovery, there is one more major difference from the viewpoint of RL-based robot control: the Initial State Distribution, that is, the distribution of states in which the robot agent starts each training episode. In Locomotion the robot follows a command (the desired velocity set from a controller, or more simply arrow-key input), so every training episode starts from the robot's standing pose. In Recovery, the fallen situation (pose) is the initial state of each episode. Fallen poses vary widely: some are relatively easy to recover from, while others are hard. For the Recovery task it is therefore very important to let the RL (Reinforcement Learning) agent explore and learn the Initial State Distribution well.

As in the figure above, we can draw the whole State Space that the RL agent has to explore, and mark as an orange circle the Effective Exploration Region (EER): the set of states similar to a given initial state (or initial pose, the orange dot). The states inside an orange circle are ones the policy can learn without much difficulty once it has explored and learned the single initial state at the circle's center.

  • Case 1 trains on many initial states so that the State Space has as few gaps as possible, but the EERs overlap heavily, so training is very inefficient.
  • Case 2 trains on too few initial states, so the State Space is poorly covered and even the Target State is not learned well. The resulting policy performs very poorly.
  • Case 3 has the Target State inside an EER, so the policy has learned it, but parts of the State Space remain uncovered. These become corner cases (situations where the policy does not work well), so the policy's robustness suffers.
  • Case 4 is the ideal situation: the Target State lies inside an EER and the whole State Space is explored with an appropriate number of initial states, so the policy is robust.

1.2 Pose of Quadruped Robots

So how can we represent an initial state, that is, the pose of a quadruped robot? A fallen situation can be represented by the pose of the robot lying on the ground.

Assuming that the fallen situation is static, with no motion, the robot's state can be described by two pieces of information. The first is the projected gravity vector, which expresses the tilt of the body: taking the direction of gravity as (0, 0, -1), the gravity vector projected into the robot's body frame and normalized gives a 3-dimensional vector that captures the body's tilt. The second is the set of 12 revolute joint (motor) angles, three per leg.
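The pose representation described above can be sketched as follows. This is a minimal illustration, not the paper's code: the function names `projected_gravity` and `make_pose`, and the ZYX Euler convention with yaw fixed to 0, are assumptions made for this example.

```python
import numpy as np


def projected_gravity(roll: float, pitch: float) -> np.ndarray:
    """Gravity direction (0, 0, -1) seen from the body frame, with yaw fixed to 0."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    rot_y = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    body_to_world = rot_y @ rot_x  # ZYX Euler convention with yaw = 0 (assumed)
    gravity_world = np.array([0.0, 0.0, -1.0])
    gravity_body = body_to_world.T @ gravity_world  # world -> body is the transpose
    return gravity_body / np.linalg.norm(gravity_body)


def make_pose(roll: float, pitch: float, joint_angles) -> np.ndarray:
    """Stack the 3-D projected gravity vector and the joint angles into one vector."""
    return np.concatenate([projected_gravity(roll, pitch),
                           np.asarray(joint_angles, dtype=float)])


upright = make_pose(0.0, 0.0, np.zeros(12))    # standing: gravity along -z of the body
flipped = make_pose(np.pi, 0.0, np.zeros(12))  # rolled onto its back: gravity along +z
print(upright[:3], flipped[:3])
```

With 12 joints the pose is a 15-dimensional vector; the yaw angle never enters it, which matches the observation that heading does not matter on flat terrain.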

1.3 Distance between poses

With the pose defined, the next question is how to define the relationship between poses. To say that two poses are close (similar) or far apart, we need a notion of distance. The most intuitive choice is the Euclidean distance between the components of the pose: compute the Euclidean distance over the projected gravity vector and the joint angles that make up a static pose, and use that number to judge whether two poses are similar or very different.
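As a small sketch of this naive metric, assuming poses are stored as flat vectors (3 gravity components followed by 12 joint angles); the function name and the two example poses are hypothetical values chosen only for illustration:

```python
import numpy as np


def euclidean_pose_distance(pose_a, pose_b) -> float:
    """Plain L2 distance between two pose vectors."""
    pose_a = np.asarray(pose_a, dtype=float)
    pose_b = np.asarray(pose_b, dtype=float)
    return float(np.linalg.norm(pose_a - pose_b))


# Hypothetical 15-D poses: 3 gravity components followed by 12 joint angles.
pose_a = np.concatenate([[0.0, 0.0, -1.0], np.zeros(12)])
pose_b = np.concatenate([[0.0, 0.0, 1.0], np.full(12, 0.5)])
print(euclidean_pose_distance(pose_a, pose_b))
```

Note that this number depends only on the two endpoint vectors. It carries no information about whether the robot can physically move from one pose to the other.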

However, the example in the figure shows that Euclidean distance does not make sense here. Take three poses, Backward Leaning (B), Forward Leaning (F), and Lying (L). Computing Euclidean distances gives a B-F distance larger than the F-L distance. But if we actually control the robot to transition between poses, the transition from F to L is much harder than the transition from F to B. Defining the distance between poses simply as the Euclidean distance of their components is therefore not meaningful from a control point of view.


2 Motivation & Contribution

Taking as motivation the characteristics of the Recovery task discussed in the introduction and the need for a new metric for the distance between poses, the paper's contributions can be summarized as follows.

  • It proposes Accessibility, a distance metric between poses that is more meaningful than Euclidean distance from a control point of view.

  • It proposes the K-Access algorithm, which uses Accessibility to choose initial states so that reinforcement learning is efficient (that is, so that the State Space is explored well).


3 Method

Overview

The overall process consists of four main steps.

  1. Sampling Static Poses: sample static fallen poses.
  2. Estimating Accessibility Values: compute the Accessibility Matrix, the distance metric between the sampled fallen poses.
  3. Clustering: cluster the initial states based on the measured Accessibility.
  4. Learning: train Recovery (or Backflip) with a reinforcement learning algorithm, using the centroid pose of each cluster as an initial state.

3.1 Sampling Static Poses

To sample a variety of fallen poses, the roll and pitch angles of the robot's base frame are sampled at random within a given range, and the 12 joint positions are set to angles within their upper and lower limits, taking the robot's configuration into account. (The yaw angle is set to 0 because it has no meaning on flat terrain.) Each sampled pose is applied to the robot and checked for self-collision, and 2.4k poses without self-collision are collected.
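The sampling step can be sketched as simple rejection sampling. This is only an outline under stated assumptions: the roll and pitch ranges, the `max_attempts` safeguard, and the `is_self_colliding` callback (which in practice would query a physics simulator) are hypothetical and not taken from the paper or its code.

```python
import numpy as np


def sample_static_poses(n_poses, joint_lower, joint_upper, is_self_colliding,
                        roll_range=(-np.pi, np.pi),
                        pitch_range=(-np.pi / 2, np.pi / 2),
                        seed=0, max_attempts=None):
    """Draw random base tilts and joint angles, keeping only collision-free poses."""
    rng = np.random.default_rng(seed)
    max_attempts = 100 * n_poses if max_attempts is None else max_attempts
    poses = []
    for _ in range(max_attempts):
        candidate = {
            "roll": rng.uniform(*roll_range),
            "pitch": rng.uniform(*pitch_range),
            "yaw": 0.0,  # heading is irrelevant on flat terrain
            "joints": rng.uniform(joint_lower, joint_upper),  # per-joint limits
        }
        if not is_self_colliding(candidate):  # hypothetical simulator check
            poses.append(candidate)
        if len(poses) == n_poses:
            return poses
    raise RuntimeError("too many rejected samples; check limits or collision test")


# Usage with made-up joint limits and a stub that never reports a collision.
lower, upper = -1.5 * np.ones(12), 1.5 * np.ones(12)
poses = sample_static_poses(5, lower, upper, is_self_colliding=lambda pose: False)
print(len(poses))
```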

The example picture shows the paper's code applied to the AiDIN-VIII robot developed in our lab.

3.2 Estimating Accessibility Values

Of the 2.4k poses sampled in the previous step, only 1000 are used to measure Accessibility. The remaining 1.4k poses are held out so that they can serve as initial states for testing the policy after training. Since the earlier example showed that Euclidean distance is a poor metric, the paper proposes the concept of Accessibility as a replacement.
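The split can be sketched in a few lines. The post does not say how the 1000 poses are chosen, so the random permutation below (and the name `split_poses`) is an assumption for illustration.

```python
import numpy as np


def split_poses(n_total=2400, n_cluster=1000, seed=0):
    """Randomly split pose indices into a clustering set and a held-out test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_total)
    return order[:n_cluster], order[n_cluster:]


cluster_idx, test_idx = split_poses()
print(len(cluster_idx), len(test_idx))  # 1000 1400
```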

The example above shows how the Accessibility from a pose A to a pose B is computed. The transition from one pose to another is split into small timesteps along a "progress" axis, and a PD controller tracks the transition angle that corresponds to each timestep. The 12 joint positions (angles) that make up a pose are continuous values, so if the start and end angles are known we can interpolate linearly between them. If progress is a scaled timeline (normalized to 0~1) and one timestep along it is the variable t, the desired transition angle at each moment is $t \cdot \text{[joint angle of B]} + (1-t) \cdot \text{[joint angle of A]}$.

While the PD controller follows this desired transition angle, we check whether the robot has come close enough to pose B. The criterion is that the Euclidean joint position distance, the base height distance, and the gravity vector distance all fall within a very small tolerance. The time t at which the robot has come close enough to pose B is recorded, and we check whether it reaches equilibrium near pose B within 3 seconds. In the code released by the authors, the upper limit is set to 20 seconds, and the time to reach equilibrium is stored in a 1000 pose $\times$ 1000 pose Time matrix.
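The interpolation and PD tracking can be sketched with a toy model. Everything below is an illustrative assumption rather than the authors' implementation: joints are modeled as unit-inertia double integrators with no gravity or contact, only the joint-position criterion is checked (the base height and gravity vector criteria need a real simulator), and the gains, the 1-second ramp, and the tolerance are made-up values. Only the 20-second cap comes from the text above.

```python
import numpy as np


def desired_angles(q_a, q_b, progress):
    """Linear interpolation between two joint configurations, progress in [0, 1]."""
    progress = float(np.clip(progress, 0.0, 1.0))
    return (progress * np.asarray(q_b, dtype=float)
            + (1.0 - progress) * np.asarray(q_a, dtype=float))


def transition_time(q_a, q_b, ramp=1.0, kp=100.0, kd=20.0,
                    dt=0.001, tol=0.02, t_max=20.0):
    """Time until the toy joints settle at q_b, capped at t_max seconds."""
    q_a = np.asarray(q_a, dtype=float)
    q_b = np.asarray(q_b, dtype=float)
    q, dq = q_a.copy(), np.zeros_like(q_a)
    for step in range(1, int(round(t_max / dt)) + 1):
        t = step * dt
        target = desired_angles(q_a, q_b, t / ramp)  # progress = t / ramp
        torque = kp * (target - q) - kd * dq          # PD law on joint position
        dq += torque * dt                             # unit inertia (toy model)
        q += dq * dt
        if np.linalg.norm(q - q_b) < tol and np.linalg.norm(dq) < tol:
            return t
    return t_max


t_ab = transition_time(np.zeros(12), np.ones(12))
print(t_ab)
```

Filling a full Time matrix means calling such a routine for every ordered pair of poses, which is why the real computation is done once, offline, before any policy training.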

The earlier argument against Euclidean distance was that it is unsuitable for defining the relationship between poses that are sufficiently different. Using a very small Euclidean distance as the criterion for deciding whether the current pose has completed the transition is only a test of similarity, so it does not undermine the motivation.

Interpreting the State Space in terms of the measured transition time: if the time $t(s_0, s_1)$ from pose A ($s_0$) to pose B ($s_1$) is at most some threshold $t_0$, the two poses have High Accessibility. If $t(s_0, s_1)$ exceeds $t_0$, they have Low Accessibility. The threshold $t_0$ determines the boundary of the EER $R$. To create this radial boundary, Accessibility is defined as $e^{-t(s_i, s_j)}$, computed from the Time matrix $t(s_i, s_j)$ obtained above.
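As a minimal sketch of this definition, the Time matrix maps elementwise to an Accessibility matrix. The 2 x 2 matrix of transition times below is hypothetical; the 20-second entry stands for a pair that hit the upper limit.

```python
import numpy as np


def accessibility_matrix(time_matrix):
    """Elementwise exp(-t): 1 for an instant transition, near 0 for a slow one."""
    return np.exp(-np.asarray(time_matrix, dtype=float))


times = np.array([[0.0, 1.0],
                  [20.0, 0.0]])  # hypothetical transition times in seconds
acc = accessibility_matrix(times)
print(acc)
```

A pair that reaches the 20-second cap ends up with an accessibility of about 2e-9, so it is effectively treated as unreachable.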

3.3 Clustering

K-Access Algorithm

With Accessibility values defining the distance between poses in the State Space, the next question is how to cluster well. We need to choose the pose that becomes the centroid of each cluster and decide how many clusters are appropriate. The paper proposes the K-Access algorithm for this.

To state the conclusion first, the appropriate number of clusters is the one that maximizes an Index. This Index combines the Intra-cluster Accessibility, the Inter-cluster Accessibility, and finally a Regularization Term.

  1. Intra-cluster Accessibility: as the name suggests, this is the minimum of the Accessibility values between the samples (each sample is a pose) that belong to a cluster and the cluster's centroid sample. It enters the Index with a positive sign, so it can be interpreted as the cohesion that pulls a cluster's samples toward its centroid. Each of the 1000 samples gets a value computed against the centroid of its own cluster, so the Intra-cluster Accessibility has 1000 dimensions.
  2. Inter-cluster Accessibility: the mean Accessibility between centroid samples. It is used so that clusters do not overlap, keep an appropriate distance from each other, and let the EERs together cover the whole State Space.
  3. Regularization Term: keeps the number of clusters from growing too large, and enters the Index with a negative sign. The weight of the regularization can be raised through \alpha; the paper uses 1.
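To make the three terms concrete, here is a hedged sketch of one possible index. The exact formula is in the paper and is not reproduced in this post, so the details below are assumptions: the per-sample intra value is taken as the weaker of the two directions between a sample and its centroid (one way to reconcile "minimum" with "1000 dimensions"), the inter term is subtracted because overlap between clusters is undesirable, and the regularization is written as `alpha * K / N`. The function name `cluster_index` and the toy matrix are hypothetical.

```python
import numpy as np


def cluster_index(acc, centroids, labels, alpha=1.0):
    """Illustrative index: mean intra accessibility - mean inter accessibility - alpha*K/N."""
    acc = np.asarray(acc, dtype=float)
    centroids = np.asarray(centroids)
    labels = np.asarray(labels)
    n, k = acc.shape[0], len(centroids)
    own = centroids[labels]  # centroid sample index for every sample
    idx = np.arange(n)
    # ASSUMPTION: per-sample value is the weaker direction (sample->centroid, centroid->sample).
    intra = np.minimum(acc[idx, own], acc[own, idx])  # shape (n,)
    if k > 1:
        between = acc[np.ix_(centroids, centroids)]
        inter = between[~np.eye(k, dtype=bool)].mean()  # off-diagonal centroid pairs
    else:
        inter = 0.0
    # ASSUMPTION: inter term and cluster count are penalties.
    return intra.mean() - inter - alpha * k / n


# Toy accessibility matrix with two tight groups: {0, 1} and {2, 3}.
acc = np.full((4, 4), 0.01)
np.fill_diagonal(acc, 1.0)
acc[0, 1] = acc[1, 0] = acc[2, 3] = acc[3, 2] = 0.9

two_clusters = cluster_index(acc, centroids=[0, 2], labels=[0, 0, 1, 1])
one_cluster = cluster_index(acc, centroids=[0], labels=[0, 0, 0, 0])
print(two_clusters, one_cluster)
```

In this toy case the two-cluster solution scores higher than lumping everything into one cluster, which is the behavior the Index is meant to capture when it is maximized over the number of clusters.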

K-means++ VS. K-Access

K-Access is built on K-means++, a clustering algorithm widely used in ML. Like K-means++, it goes through (1) Initialize the centroids, (2) Assignment step, and (3) Update step. The difference is in step (3): K-means++ updates clusters based on the mean, whereas K-Access uses the maximal neighborhood accessibility in order to ensure robustness.
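One assignment-and-update iteration might look like the sketch below. It is a medoid-style reading of the description, not the paper's pseudo code: assigning each sample to the centroid with the highest accessibility toward it, and choosing as new centroid the member with the largest total accessibility to the other members of its cluster, are both assumptions about what "maximal neighborhood accessibility" means. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def k_access_step(acc, centroids):
    """One illustrative assignment + update pass over an accessibility matrix."""
    acc = np.asarray(acc, dtype=float)
    centroids = np.asarray(centroids)
    # Assignment: each sample joins the centroid with the highest accessibility to it.
    labels = np.argmax(acc[centroids], axis=0)
    new_centroids = centroids.copy()
    for c in range(len(centroids)):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue  # keep the old centroid if a cluster ends up empty
        within = acc[np.ix_(members, members)]
        # Update (assumed rule): member with the largest total accessibility to its cluster.
        new_centroids[c] = members[np.argmax(within.sum(axis=1))]
    return new_centroids, labels


# Toy matrix: samples 0-2 form one group with sample 1 in the middle, 3-4 another.
acc = np.full((5, 5), 0.01)
np.fill_diagonal(acc, 1.0)
acc[0, 1] = acc[1, 0] = acc[1, 2] = acc[2, 1] = 0.9
acc[0, 2] = acc[2, 0] = 0.3
acc[3, 4] = acc[4, 3] = 0.9

new_centroids, labels = k_access_step(acc, centroids=[0, 3])
print(new_centroids, labels)
```

Because the centroid is always one of the sampled poses rather than an average of them, it is guaranteed to be a valid, collision-free pose that can be used directly as an initial state.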

If you want to see the algorithm in more detail, please check the Pseudo Code below.

Pseudo Code of K-Access

Clustering Analysis

When clustering is run for the Bittle robot platform used in the paper, 43 clusters is chosen as the optimal number. A histogram of the number of samples in each cluster gives the left graph below. Taking the top 20 clusters by sample count and visualizing the inter-cluster accessibility between them as a Chord graph gives the right graph. The highlighted chords have an Accessibility of 0.15 or more (a transition time within about 1.9 seconds), and the faint chords have an Accessibility of 0.05 or less (a transition time of about 3 seconds or more).
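The two thresholds follow directly from inverting the definition of Accessibility as the exponential of the negative transition time:

```latex
\[
a(s_i, s_j) = e^{-t(s_i, s_j)}
\quad\Longleftrightarrow\quad
t(s_i, s_j) = -\ln a(s_i, s_j)
\]
\[
a \ge 0.15 \;\Rightarrow\; t \le -\ln 0.15 \approx 1.90\ \text{s},
\qquad
a \le 0.05 \;\Rightarrow\; t \ge -\ln 0.05 \approx 3.00\ \text{s}
\]
```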

For how to draw the Chord graph, see the next post, a hands-on code walkthrough in which I apply the paper's method to the robot platform I actually work on.

3.4 Reinforcement Learning Process

Just as feature engineering in machine learning demands a lot of care, we first went through the work of choosing initial states, and now we finally reach the reinforcement learning process. The paper uses SAC (Soft Actor-Critic), a well-known RL algorithm, built from simple MLP layers. The input and output design of the policy network does not differ much from the conventions of related work, so I will skip the details and look only at the distinctive parts.

Reward Functions w/ RBF

What sets this paper apart from the RL MDP design of other papers is the design of the reward function. Normally a reward function is a linear weighted sum of the individual reward terms. This paper instead passes each reward term through an RBF (Radial Basis Function) and computes the final reward as the weighted sum of those values. Reward engineering is a major issue in RL research, and reward design is something that calls for an explanation of the designer's intent. The paper does not explain it in detail, though, and perhaps because it is not considered a main contribution, there is no experiment comparing against a linear sum. It was therefore hard to tell why RBFs were used.

Lecture 16 - Radial Basis Functions Slides (Caltech)

So for this part I will add the reason I came up with after studying the RBF kernel. An RBF kernel is basically Gaussian in shape and weights data by its radial distance from a target value. Compared with a linear sum, it has the advantage of being less affected by data that lies very far away in the infinite-dimensional space. The intent of computing the reward through RBF kernels therefore seems to be this: rather than simply summing the reward terms to be maximized, the \alpha value (written as \gamma in the slides) controls how sensitively the reward responds around the target value of each category, which gives a reward space that serves as a good training signal.
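A minimal sketch of such a reward, assuming the standard Gaussian form exp(-alpha * ||x - target||^2) for each term. The paper's actual terms, targets, and coefficients are not reproduced here; the term names, values, and weights in the example are hypothetical.

```python
import numpy as np


def rbf(value, target, alpha):
    """Gaussian RBF: 1 at the target, decaying toward 0 with squared distance."""
    diff = np.asarray(value, dtype=float) - np.asarray(target, dtype=float)
    return float(np.exp(-alpha * np.sum(diff ** 2)))


def rbf_reward(values, targets, alphas, weights):
    """Weighted sum of RBF-shaped reward terms."""
    return sum(w * rbf(v, t, a) for v, t, a, w in zip(values, targets, alphas, weights))


# Hypothetical terms: base height (m) and projected gravity vector.
values = [0.08, np.array([0.0, 0.1, -0.99])]
targets = [0.10, np.array([0.0, 0.0, -1.0])]
alphas = [50.0, 5.0]   # larger alpha -> sharper peak around the target
weights = [0.5, 0.5]
print(rbf_reward(values, targets, alphas, weights))
```

Each term is bounded between 0 and 1, so the total reward never exceeds the sum of the weights, and a single term that is very far from its target contributes almost nothing instead of dominating the sum.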

If you are curious about the meaning of the symbols used in the reward terms, please check the table below.

Symbols of Reward Terms for DRL

Other Tasks - Backflip

To show that motions more dynamic than Locomotion can also be learned, the paper trains Backflip in addition to Recovery, again using the K-Access algorithm. (The choice of this task is in the same spirit as the WASABI paper I reviewed earlier, which showed Backflip as one of its four dynamic motions.)


4 Results

The experiments use the Bittle robot platform, which has 2 joints per leg for a total of 8 DoF, on two tasks, Recovery and Backflip. Five baselines (not counting the proposed method) are set up for comparison, as listed below.

Models

  • KA [the paper's proposal]: initial poses are the centroids of the 43 clusters obtained with the K-Access algorithm
  • KM: initial poses are the centroids of the 33 clusters obtained with the K-Means++ algorithm
  • WKM: initial poses are the centroids of the 14 clusters obtained with the K-Means++ algorithm (gravity vector weighted by 2)
  • 9-Pose: initial poses are 9 specific poses
  • 1-Pose: the initial pose is a single lying pose
  • RND: the initial pose is a random static pose in every episode

Recovery Task

The reward curves plotted during training with SAC (run with 3 different seeds) show that the proposed method, KA, reaches a very high reward with little variance across seeds. At around 180k steps, the baselines other than RND and the proposed method appear to converge to similar rewards. To bring out the advantage of the proposed method more clearly, the success rate was measured after training on 500 test poses, initial static poses that were not used during training. As Table 2 below shows, despite fewer training episodes (1200 versus 1600), the mean test episode reward is higher than that of the baselines and the variance is lower. The rate of successful recovery within 3 seconds is also higher than that of the baselines at 99.4%, meaning that 497 of the 500 initial poses succeeded.

The figure shows timestep snapshots of a successful Recovery case, together with a plot of the 8 joint angles over the roughly 1.2 seconds it takes to return to the recovered pose.

The same experiments were run for Backflip. The results do not differ much from Recovery, and this part seems to have been added to show that the proposed method is not limited to Recovery but also helps with learning a variety of dynamic motions, so I will skip a detailed explanation. Please see the results below.

Backflip Task

5 Conclusion

The paper reviewed here takes a deep look at initial poses in reinforcement learning and proposes how to design and train with them, and I think it is a very meaningful piece of work. The concept of Accessibility and the K-Access algorithm, which analyze the State Space and guide its exploration for training, are well focused as contributions. The lack of validation on a real robot was a little disappointing, though. As the paper suggests in its Future Work, improving the Accessibility estimation and the clustering method with learning-based techniques also seems like a good research direction.

If you want to learn more about the paper, I recommend the presentation video the authors published on YouTube.

6 Reference

  • Original Paper: Accessibility-Based Clustering for Efficient Learning of Locomotion Skills
  • Original Paper Presentation on Youtube
  • "Were you overworked?" Bipedal robot carrying loads at an expo takes a tumble / JTBC Sangam-dong Class
  • What is the Radial Basis Function (RBF) Kernel?
  • The Radial Basis Function Kernel
  • 106 RBF Kernel
  • Lecture 16 - Radial Basis Functions - Slides
  • Radial Basis Function (RBF) Kernel: The Go-To Kernel
  • k-means++ Wiki
  • Chord diagram

Copyright 2024, Jung Yeon Lee