  • Jung Yeon Lee

📃 GN-Block Review

gnn
system identification
mpc
rl
paper
Graph Networks as Learnable Physics Engines for Inference and Control
Published

August 7, 2022

This post reviews the paper Graph Networks as Learnable Physics Engines for Inference and Control.

Abstract

Understanding and interacting with everyday physical scenes requires rich knowledge about the structure of the world, represented either implicitly in a value or policy function, or explicitly in a transition model. Here we introduce a new class of learnable models, based on graph networks, which implement an inductive bias for object- and relation-centric representations of complex, dynamical systems. Our results show that as a forward model, our approach supports accurate predictions from real and simulated data, and surprisingly strong and efficient generalization, across eight distinct physical systems which we varied parametrically and structurally. We also found that our inference model can perform system identification. Our models are also differentiable, and support online planning via gradient-based trajectory optimization, as well as offline policy optimization. Our framework offers new opportunities for harnessing and exploiting rich knowledge about the world and takes a key step toward building machines with more human-like representations of the world.

1. Introduction

People do not think about friction or the law of action and reaction while they walk. Many physics problems and laws are hard to understand, yet we have no trouble walking, an activity full of complex physical interactions. This is because the countless experiences we accumulate from birth teach us how applying force moves our legs, without our having to pay much attention. The paper asks how an artificial agent could likewise understand a complex system and operate well in it, and argues that a graph-based idea called the GN block can answer that question.

❓ How can an intelligent agent understand and control such complex systems?

For an AI to understand and interact with the physical phenomena that people pick up naturally, it needs rich representations and knowledge of the world, whether implicit or explicit. In other words, it must represent the objects in a system and the relations between them, and learn by applying the same object-wise computation to every object and a relation-wise computation to the interactions between them. This can be thought of as a capacity for combinatorial generalization, much like understanding individual Lego bricks and knowing how to build a castle out of them.

The goal of the paper is to learn physical dynamics models with a graph-based method. The authors argue that a graph network (GN) can learn the dynamics of each body with its node update function, encode the dynamics of interactions with its edge update function, and encode the properties of the global system with its global update function. What sets it apart from other papers is that the global properties of the system are treated as a separate component.

The paper has three main contributions: a forward model, an inference model, and a control algorithm. (While reviewing, however, my impression was that the control algorithm is less a contribution in itself than a control pipeline attached neatly to the GN-based models.) Unlike other physics engines, the model assumes no prior knowledge of physical laws at all; instead it uses an object- and relation-centric inductive bias to learn from current-state/next-state pairs. Once the forward and inference models have learned the physical system on a graph basis, the control algorithm uses them for planning or policy learning.

Model | Role
GN-based forward models | Make accurate, generalizable predictions
GN-based inference models | Perform system identification from properties hidden in the observations
NOT GN-based control algorithms | Achieve better control performance than the other baselines

2. Model

Graph representation of a physical system

Let's go over a few terms and equations for representing a physical system as a graph.

  • A body of the physical system is represented as a node of the graph.
  • A joint of the physical system is represented as an edge of the graph.
  • Global properties of the physical system are represented as global features.

The half-cheetah example shows intuitively how such a graph can be drawn; we denote this graph by G.

In equation form, the description above reads:

G=\left(\mathbf{g},\left\{\mathbf{n}_{i}\right\}_{i=1 \cdots N_{n}},\left\{\mathbf{e}_{j}, s_{j}, r_{j}\right\}_{j=1 \cdots N_{e}}\right)

  • \mathbf{g} : global features, a vector of system-level properties such as gravity or the time step.
  • \mathbf{n}_{i} : the vector of node features.
  • \mathbf{e}_{j} : the vector of edge features.
  • s_{j} : the index of the sender node that sends a message along this edge.
  • r_{j} : the index of the receiver node that receives a message along this edge.
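
As a minimal sketch, the tuple G can be held in a small Python container. The class name `Graph`, the `validate` helper, and the three-body chain with one directed edge per direction are illustrative assumptions, not something taken from the paper or its code; the feature values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Graph:
    """Plain container mirroring G = (g, {n_i}, {e_j, s_j, r_j})."""
    g: List[float]            # global feature vector
    nodes: List[List[float]]  # nodes[i] is the node feature vector n_i
    edges: List[List[float]]  # edges[j] is the edge feature vector e_j
    senders: List[int]        # senders[j] is s_j, the index of the sending node
    receivers: List[int]      # receivers[j] is r_j, the index of the receiving node

    def validate(self) -> None:
        """Check that every edge has a sender and a receiver that exist."""
        assert len(self.edges) == len(self.senders) == len(self.receivers)
        for idx in self.senders + self.receivers:
            assert 0 <= idx < len(self.nodes)


# Hypothetical 3-body chain (0 - 1 - 2) with one directed edge per direction.
# All feature values are made up for illustration.
chain = Graph(
    g=[9.81],
    nodes=[[1.0], [1.5], [0.5]],
    edges=[[0.0], [0.0], [0.0], [0.0]],
    senders=[0, 1, 1, 2],
    receivers=[1, 0, 2, 1],
)
chain.validate()
```

Storing `senders` and `receivers` as index lists keeps the structure separate from the features, so the same structure can carry different feature values.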

Static & Dynamic properties

There are two kinds of graph here: a static graph G_s and a dynamic graph G_d. They hold different kinds of information, depending on whether a system property changes over time (dynamic, time-variant) or not (static, time-invariant). (Appendix G of the paper lists which MuJoCo quantities make up each graph.)

  • A static graph G_s: the graph that holds the static information of the system
    • global parameters: the time step, viscosity, gravity, etc.
    • body/node parameters: mass, inertia tensor, etc.
    • joint/edge parameters: joint type and properties, motor type and properties, etc.
  • A dynamic graph G_d: the graph that holds the instantaneous state information of the system
    • body/node: 3D Cartesian position, 4D quaternion orientation, 3D linear velocity, 3D angular velocity
    • joint/edge: magnitude of the actions applied at the joint

Graph networks

  • A GN is a graph2graph module: it takes a graph as input and returns a graph as output. The output graph therefore carries edge, node, and global features that differ from those of the input graph.

Let's look at the structure of the GN block, the core idea of the paper.

  • A core GN block
    • It consists of three sub-functions, each an MLP.
      • edge-wise $f_e$ : updates all of the edges.
      • node-wise $f_n$ : updates all of the nodes.
      • global $f_g$ : finally updates the global features.

ํ•˜๋‚˜์˜ feedforward GN pass๋Š” ๊ทธ๋ž˜ํ”„ ์ƒ์—์„œ message-passing ๋‹จ๊ณ„์˜ ํ•œ ์Šคํ…์œผ๋กœ ๊ฐ„์ฃผํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ GN-block ๋‚ด์—์„œ์˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

Forward models

The purpose of the forward model is to predict the state at the next step from the current information. (Because the English words have similar meanings, this is easy to confuse with the purpose of the inference model that comes next, so it is worth pinning down the definition before moving on.) There are two types of forward model, depending on whether an RNN (GRU) is used.

Type 1. GNN feed-forward

This is the simplest GNN feed-forward model. The graph first passes through GN_1 and becomes the latent graph G'. The input to the next block, GN_2, is the concatenation of G, the graph before GN_1, with G'. The authors say they designed it this way so that all the nodes and edges of the graph can communicate with one another. The node features of G^*, the final output after GN_1 and GN_2, are the predicted states of each body.
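
The wiring of the two blocks can be sketched as below. I assume here that "concatenate" means feature-wise concatenation of two graphs with the same structure; `double_features` and `identity` are toy stand-ins for the trained GN_1 and GN_2, and the dict layout is hypothetical.

```python
from typing import Callable, Dict

GraphDict = Dict[str, list]


def concat_graphs(a: GraphDict, b: GraphDict) -> GraphDict:
    """Feature-wise concatenation of two graphs that share one structure."""
    assert a["senders"] == b["senders"] and a["receivers"] == b["receivers"]
    return {
        "g": a["g"] + b["g"],
        "nodes": [na + nb for na, nb in zip(a["nodes"], b["nodes"])],
        "edges": [ea + eb for ea, eb in zip(a["edges"], b["edges"])],
        "senders": a["senders"],
        "receivers": a["receivers"],
    }


def forward_model(graph: GraphDict,
                  gn1: Callable[[GraphDict], GraphDict],
                  gn2: Callable[[GraphDict], GraphDict]) -> GraphDict:
    latent = gn1(graph)                       # G' = GN_1(G)
    return gn2(concat_graphs(graph, latent))  # G* = GN_2(concat(G, G'))


def double_features(graph: GraphDict) -> GraphDict:
    """Toy stand-in for a trained GN: doubles every feature, keeps the structure."""
    return {
        "g": [2 * x for x in graph["g"]],
        "nodes": [[2 * x for x in n] for n in graph["nodes"]],
        "edges": [[2 * x for x in e] for e in graph["edges"]],
        "senders": graph["senders"],
        "receivers": graph["receivers"],
    }


def identity(graph: GraphDict) -> GraphDict:
    """Toy stand-in that returns its input unchanged."""
    return graph


G = {"g": [1.0], "nodes": [[1.0], [2.0]], "edges": [[3.0]],
     "senders": [0], "receivers": [1]}
G_star = forward_model(G, double_features, identity)
```

With the identity stand-in for GN_2, `G_star` simply exposes what GN_2 would receive: each original feature next to its latent counterpart.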

Type 2. RNN+GNN

The second type adds a G-GRU to the basic model above. Like Type 1, it uses both the skip connection and the latent graph, but it brings in G-GRU, a GRU version of the GN block, and with it a hidden graph G_h that plays the same role as the hidden vector in an RNN. An RNN is applied separately to the edge, node, and global features, giving three RNN sub-modules in total.

Points common to both types of GNN forward model

  1. The model learns to predict state differences, and the absolute state prediction is computed from them (the input state plus the predicted difference). The state is then updated with this absolute state prediction.

  2. To produce a long-range rollout trajectory, the state prediction and the control input are fed back into the model repeatedly, generating a trajectory of many steps.

  3. The inputs and outputs of the GN model are normalized.
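
The three points above can be combined in a short rollout loop. This is only a sketch: `predict_delta_norm` is a hypothetical stand-in for the trained GN (here a toy function), the state is a flat list rather than a graph, and the normalization statistics are made-up constants.

```python
from typing import Callable, List

Vec = List[float]


def rollout(x0: Vec, actions: List[float],
            predict_delta_norm: Callable[[Vec, float], Vec],
            in_mean: Vec, in_std: Vec,
            out_mean: Vec, out_std: Vec) -> List[Vec]:
    """Feed each prediction back into the model to build a multi-step trajectory."""
    states = [x0]
    x = x0
    for a in actions:
        # Point 3: normalize the model input.
        x_norm = [(xi - m) / s for xi, m, s in zip(x, in_mean, in_std)]
        # Point 1: the model outputs a normalized state difference ...
        d_norm = predict_delta_norm(x_norm, a)
        # ... which is de-normalized and added to the current state.
        delta = [d * s + m for d, m, s in zip(d_norm, out_mean, out_std)]
        x = [xi + di for xi, di in zip(x, delta)]
        # Point 2: the new absolute state becomes the next input.
        states.append(x)
    return states


def toy_model(x_norm: Vec, action: float) -> Vec:
    """Toy stand-in: the normalized state difference equals the action."""
    return [action]


trajectory = rollout([0.0], [1.0, 1.0, -2.0], toy_model,
                     in_mean=[0.0], in_std=[1.0],
                     out_mean=[0.0], out_std=[0.5])
```

Since every step consumes the previous prediction, one-step errors compound over the rollout, which is why rollout error is reported separately from one-step error later in the post.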

To be honest, while reviewing I did not properly understand the distinction between the forward model and the inference model, or the concrete process of each, until I read the pseudocode. It sits in the appendix, so it is easy to skip, but once you have a rough grasp of the paper's concepts I strongly recommend reading it line by line.

The first piece of pseudocode shows how the forward model is trained. To recall the purpose of this model once more: given the current state x^{t_0} together with the action a^{t_0}, it predicts x^{t_0+1}. The points covered above, such as learning the residual of the state and normalization, appear clearly in the algorithm.

The next algorithm shows how the trained forward model is used to predict the next state x^{t_0+1}.

The last algorithm, like the one just described, shows how the trained forward model predicts the next state x^{t_0+1}, but with system identification added through GN_p, which is learned by the inference model. (It replaces what the previous algorithm denoted as the system parameter p.)

Inference Models

In a word, the purpose of the inference model is system identification. System identification means inferring the unobserved properties of a dynamic system from its observed behavior. That is, the elements that implicitly make up the system cannot be measured or observed directly because they are not explicit, but they can be inferred through latent representations.

The inference model is also a recurrent GN-based model. Unlike the forward model, it takes only the dynamic states of a trajectory as input, that is, the dynamic state graph G_d and the control inputs. Its output is G^*(T) after a fixed number of time steps T; the experiments in the paper use 20 steps.

The training pseudocode for the inference model is likewise given in the paper.

Control algorithm

The control algorithms are not graph-based themselves; they show how the graph-based forward and inference models described above can be put to use for control. The paper uses two main control algorithms. Reviewing this as someone who mainly studies reinforcement learning, I found it very interesting that, while most progress in reinforcement learning has come from model-free algorithms, building a GN-based model that predicts the next state makes model-based reinforcement learning algorithms applicable.

  1. MPC(Model Predictive Control)

    Because the GN is differentiable, it supports model-based planning with gradient-based trajectory optimization methods such as MPC. MPC is the representative example; it is an optimization algorithm rather than a learning-based one. Its flow is to predict a trajectory with the model, evaluate the total reward, and update the actions with the gradient of that reward.

  2. SVG(Stochastic Value Gradients)

    SVG is a reinforcement learning algorithm; here control is done by an agent that learns the GN-based model and the SVG policy function at the same time. SVG(1) can be understood as control with a reinforcement learning algorithm that uses a GN model predicting one step ahead (model-based), and SVG(0) as model-free control without a predictive GN model.

MPC and SVG are in fact very similar in one respect. In MPC, the control inputs are optimized within a single episode given its initial conditions, whereas in SVG a policy function that maps states to controls is optimized over the states experienced during training.

3. Methods

Environments

  • The experiments use MuJoCo simulation environments.
    • Pendulum, Cartpole, Acrobot, Swimmer, Cheetah, Walker2d, JACO (robotic arm)
  • Training data for the forward models was generated by applying simulated random controls to the systems and recording the state transitions.
  • For the generalization and system identification experiments:
    • A dataset was created with versions of several of the systems (Pendulum, Cartpole, Swimmer, Cheetah, and JACO) whose parameters and structure were varied procedurally.
    • The varied properties were link lengths, body masses, and motor gears. The number of links in the Swimmer's structure was also varied, from 3 to 15 (a swimmer with N links is referred to as SwimmerN).

MPC planning

  • The GN forward model predicts an N-step trajectory (N: planning horizon) and the total reward that executing that trajectory would yield.
  • The action sequence (= trajectory) is then optimized with the gradient of the total reward, obtained by backpropagation.
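
The planning step can be illustrated on a toy problem. This sketch makes several assumptions that are not from the paper: the model is a 1-d stand-in `step(x, a) = x + a` instead of a trained GN, the reward is the negative squared distance of the final state from a target, and central finite differences stand in for backpropagation so that the example needs no autodiff library.

```python
from typing import Callable, List


def simulate(x0: float, actions: List[float],
             step: Callable[[float, float], float]) -> float:
    """Roll the model forward over the whole action sequence."""
    x = x0
    for a in actions:
        x = step(x, a)
    return x


def total_reward(x0: float, actions: List[float],
                 step: Callable[[float, float], float], target: float) -> float:
    """Toy reward: negative squared distance of the final state from the target."""
    return -(simulate(x0, actions, step) - target) ** 2


def plan_actions(x0: float, actions: List[float],
                 step: Callable[[float, float], float], target: float,
                 lr: float = 0.05, iters: int = 50, eps: float = 1e-5) -> List[float]:
    """Gradient ascent on the action sequence.

    Central finite differences stand in for the backpropagated gradient.
    """
    actions = list(actions)
    for _ in range(iters):
        grads = []
        for k in range(len(actions)):
            up = actions[:k] + [actions[k] + eps] + actions[k + 1:]
            dn = actions[:k] + [actions[k] - eps] + actions[k + 1:]
            grads.append((total_reward(x0, up, step, target)
                          - total_reward(x0, dn, step, target)) / (2 * eps))
        actions = [a + lr * g for a, g in zip(actions, grads)]
    return actions


def toy_step(x: float, a: float) -> float:
    """Stand-in for the learned forward model."""
    return x + a


horizon = 5
initial_plan = [0.0] * horizon
plan = plan_actions(0.0, initial_plan, toy_step, target=1.0)
final_state = simulate(0.0, plan, toy_step)
```

In a receding-horizon MPC loop, only the first action of `plan` would be executed before planning again from the newly observed state.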

Model-based reinforcement learning

  • The GN-based model is applied to reinforcement learning.
  • SVG is the algorithm used.
  • Because the GN forward model is differentiable, the gradient of the expected return can be computed through the next state generated by the GN model.
  • SVG(1), which uses one-step predictions, is compared against model-free RL baselines: SVG(0) and DDPG (Deep Deterministic Policy Gradients), a deterministic policy algorithm.

Baseline comparisons

  1. constant prediction baseline: uses the input state unchanged as the output state
  2. MLP baseline: trains an MLP on the same data as the GN model, flattened and concatenated
  3. MPC baseline: runs the Differential Dynamic Programming algorithm on the physics model, which gives ground-truth values
  4. SVG(0): model-free RL agents
  5. DDPG: model-free RL agents

Prediction performance evaluation

  • Errors are calculated independently for position, orientation, linear velocity, and angular velocity, using two measures:
    1. squared one-step dynamic state differences (one-step error)
    2. squared trajectory differences (rollout error) between the prediction and the ground truth.
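
The two measures can be sketched as follows. Averaging over state dimensions and over time steps is an assumption on my part; the post only says the differences are squared, and the function names are hypothetical.

```python
from typing import List

Vec = List[float]


def one_step_error(pred_next: Vec, true_next: Vec) -> float:
    """Mean squared difference between a predicted and a true next state."""
    return sum((p - t) ** 2 for p, t in zip(pred_next, true_next)) / len(true_next)


def rollout_error(pred_traj: List[Vec], true_traj: List[Vec]) -> float:
    """Mean of the per-step squared differences over a whole trajectory."""
    per_step = [one_step_error(p, t) for p, t in zip(pred_traj, true_traj)]
    return sum(per_step) / len(per_step)


# Made-up numbers for illustration.
step_err = one_step_error([1.0, 2.0], [1.0, 4.0])
traj_err = rollout_error([[0.0], [1.0]], [[0.0], [3.0]])
```

To get independent errors per quantity, the same functions would be applied separately to the position, orientation, linear velocity, and angular velocity slices of the state.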

4. Results

Prediction

Learning a forward model for a single system

ํ•˜๋‚˜์˜ ์‹œ์Šคํ…œ์„ ๊ฐ€์ง€๊ณ  ํ•™์Šตํ•œ forward model์˜ Prediction ์„ฑ๋Šฅ ์‚ดํŽด๋ณด๊ธฐ

  • random control๋กœ ๋งŒ๋“ค์–ด์ง„ ๋ฐ์ดํ„ฐ๋“ค์„ ๊ฐ€์ง€๊ณ  ํ•™์Šต๋œ GN-based model

    • [Visually] Swimmer6์—์„œ ๊ทธ๋ฆผ์—์„œ ์ฒ˜๋Ÿผ ground truth์™€ ์˜ˆ์ธก ๊ฒฐ๊ณผ๊ฐ€ ๊ตฌ๋ถ„์ด ์•ˆ ๊ฐˆ ์ •๋„๋กœ ํก์‚ฌํ•˜๋‹ค.(์˜์ƒ์—์„œ๋„ ๊ฑฐ์˜ ๊ตฌ๋ถ„์ด ์•ˆ ๊ฐˆ ์ •๋„๋กœ ์ž˜ ์˜ˆ์ธกํ•˜๊ณ  ์žˆ์Œ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.)

    • [Quantitatively] 100 step์—์„œ 3์ถ• ๋ฐฉํ–ฅ์œผ๋กœ์˜ ์œ„์น˜, ์„ ์†๋„, ๊ฐ์†๋„, ์ฟผํ„ฐ๋‹ˆ์•ˆ ๋ฐฉํ–ฅ ๋น„๊ต

  • constant prediction baseline์€ ์•„์›ƒํ’‹์œผ๋กœ ์ธํ’‹์„ ๊ทธ๋Œ€๋กœ ๋ณต์‚ฌํ•ด์„œ ์‚ฌ์šฉํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— ์• ๋Ÿฌ ์ตœ๋Œ€์น˜๋กœ normalization ํ•˜๊ธฐ ์œ„ํ•ด ๊ฒ€์€์ƒ‰ ์ ์„ ์œผ๋กœ ํ‘œ๊ธฐ

  • ์šฐ์„  ๊ฒ€์€ ์ ์„ ๊ณผ ๋ง‰๋Œ€๊ธฐ๋“ค์„ ๋ญ‰๋šฑ๊ทธ๋ ค์„œ ๋ณด๋ฉด,

  • 1 step๊ณผ 100 step์˜ rollout ๊ฒฐ๊ณผ๋ฅผ ๋น„๊ตํ–ˆ์„ ๋•Œ ๊ฒ€์€ ์ ์„ ์— ๋น„ํ•ด ํŒŒ๋ž€์ƒ‰ ๋ง‰๋Œ€๊ธฐ๋“ค์˜ error ๊ฐ’์ด ๋‚ฎ์Œ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

  • GN ๋ชจ๋ธ์ด MLP-based ๋ณด๋‹ค ๋” ๋‚ฎ์€ ์• ๋Ÿฌ๋ฅผ ๊ฐ€์ง€๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. ์ด๋Š” ํŠน๋ณ„ํžˆ Swimmer6์ฒ˜๋Ÿผ ์—์ด์ „ํŠธ์˜ ๊ตฌ์กฐ๊ฐ€ ๋ฐ˜๋ณต์ ์ธ ๊ฒฝ์šฐ์— ๋”์šฑ ๋ˆˆ์— ๋„๊ฒŒ ๋‚ฎ์Œ์„ ์•Œ ์ˆ˜ ์žˆ์—ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด GN-based forward ๋ชจ๋ธ์ด ๋‹ค์–‘ํ•œ ๋ฌผ๋ฆฌ ์‹œ์Šคํ…œ๋“ค์—์„œ dynamics๋ฅผ ์ž˜ ์˜ˆ์ธกํ•จ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

  • GN์ด MLP๋ณด๋‹ค ๋” generalization์ด ์ž˜ ๋จ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋Š”๋ฐ, Swimmer6๋ฅผ ์ง‘์ค‘์ ์œผ๋กœ train, valid, test ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด 1-step, rollout error๋ฅผ ๊ฐ๊ฐ ํ™•์ธํ•ด๋ดค์„ ๋•Œ, Best GN์˜ error ๊ฐ’์ด Best MLP๋ณด๋‹ค ๋‚ฎ์Œ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ test data์˜ error ์ฆ๊ฐ€์œจ์„ ๋ดค์„ ๋•Œ์—๋„ GN ๋ชจ๋ธ์˜ test data์˜ error๊ฐ€ ๋” ์ ๊ฒŒ ์ฆ๊ฐ€ํ•จ์„ ๊ด€์ฐฐํ•  ์ˆ˜ ์žˆ์—ˆ๊ณ  ์ด๋Š” agent์˜ bodies์™€ joints๋“ค์— ๋Œ€ํ•œ inductive bias๊ฐ€ GN์„ ํ†ตํ•ด ์ž˜ ํ•™์Šต๋˜์—ˆ์Œ์„ ์ฆ๋ช…ํ•  ์ˆ˜ ์žˆ๋‹ค.

Learning a forward model for multiple systems

Having looked at a forward model for a single system, let's now look at the performance of forward models across multiple systems. The hypothesis was that a GN can also handle the varied parameters of multiple systems. To check this, the authors varied static parameters continuously (mass, body length, joint angle, and so on) and examined how the forward dynamics were learned.

Inference

Control

5. Discussion

Review

My subjective pros and cons after reviewing the paper are as follows.

  • Pros 👍
  • Cons 👎
    • MLP comparison

Reference

  • Original paper Graph Networks as Learnable Physics Engines for Inference and Control
  • Official code https://github.com/fxia22/gn.pytorch

Copyright 2024, Jung Yeon Lee