
On this page

  • 1 Introduction
  • 2 Variational Bound
    • 2.1 Regularization term
    • 2.2 Reconstruction error term
      • 2.2.1 Reparametrization Trick
  • 3 VAE Structure
  • 4 Experiment
  • 5 Conclusion
  • 6 Improved Work

📃 VAE Review

generative · vae · paper · Auto-Encoding Variational Bayes

Published: October 2, 2022

์ด๋ฒˆ ํฌ์ŠคํŠธ๋Š” ์ƒ์„ฑ๋ชจ๋ธ์—์„œ ์œ ๋ช…ํ•œ Variational Auto-Encoder(VAE)๋ฅผ ๋‹ค๋ฃจ๊ณ  ์žˆ๋Š” Auto-Encoding Variational Bayes๋ผ๋Š” ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ์ž…๋‹ˆ๋‹ค. ์ด๋ฒˆ ํฌ์ŠคํŠธ๋ฅผ ์ •๋ฆฌํ•˜๋ฉด์„œ ๊ฐ€์žฅ ๋งŽ์ด ์ธ์šฉํ•˜๊ณ  ๋„์›€์„ ๋ฐ›์€ ์˜คํ†  ์ธ์ฝ”๋”์˜ ๋ชจ๋“  ๊ฒƒ๋ฅผ ๋ณด์‹œ๋ฉด ํ›จ์”ฌ ๋” ์ž์„ธํ•˜๊ณ  ๊นŠ์€ ์ดํ•ด๋ฅผ ํ•˜์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํฌ์ŠคํŠธ์˜ ์ˆœ์„œ๋Š” ์•„๋ž˜์™€ ๊ฐ™์ด ์ง„ํ–‰๋ฉ๋‹ˆ๋‹ค.

1 Introduction

VAE is a famous model among generative models. So what exactly is a generative model? Suppose we want to create a photo of a dog that was never actually taken. The generated photo must not feel out of place next to photos of real dogs. In this context, what we want is the distribution of the photos in the train database, that is, the photos taken of real dogs. The reason we want the distribution is that only once we know the distribution can we generate data by sampling from it. To restate the goal: in order to generate new data similar to the current data, we want to know the distribution p(x) of the data in the current train DB.

๋ฐ์ดํ„ฐ x๋ฅผ ์ƒ์„ฑํ•˜๋Š” Generator๋ฅผ ์ž‘๋™์‹œํ‚ฌ controller๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์–ด๋–ค ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•˜๋„๋ก Generator๋ฅผ trigger ํ•ด์ฃผ๋Š” ๋ถ€๋ถ„์ด๊ธฐ ๋•Œ๋ฌธ์— ์šฐ๋ฆฌ๊ฐ€ ๋‹ค๋ฃจ๊ธฐ ์‰ฝ๊ฒŒ ๋งŒ๋“ค์–ด ์ค˜์•ผ ์ดํ›„ ์ƒ์„ฑ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•  ๋•Œ ํŽธ๋ฆฌํ•  ๊ฒƒ ์ž…๋‹ˆ๋‹ค. controller ์—ญํ• ์„ ํ•˜๋Š” latent variable z๋Š” p(z)์—์„œ ์ƒ˜ํ”Œ๋ง๋˜๋ฉฐ ๋ฐ์ดํ„ฐ x๋ณด๋‹ค ์ฐจ์›์ด ์ž‘๊ฒŒ ๋งŒ๋“ญ๋‹ˆ๋‹ค.

Figure: Generative Model [6]

Returning to our goal p(x): it can be viewed as the integral of the product of the prior probability p(z) and the conditional probability p(x|z). Could we not simply approximate this integral by a summation over sampled data points and run maximum likelihood estimation directly? One might think so, but this approach cannot be used, because the sampling process draws far too many samples that we do not want.
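
Written out (a standard formulation; L here denotes the number of prior samples), the quantity in question and its naive Monte Carlo estimate are

$$p(x) = \int p(x \mid z)\, p(z)\, dz \;\approx\; \frac{1}{L} \sum_{l=1}^{L} p(x \mid z^{(l)}), \qquad z^{(l)} \sim p(z)$$

Most z drawn from the prior explain any particular x poorly, so L would have to be impractically large; the example below makes this failure concrete.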

Figure: Why it is hard to measure the likelihood of images under a model using only sampling [11]

Let us look at an example of this phenomenon, where unwanted samples are drawn more often. Take an MNIST image of the digit 2 and call it (a); let (b) be (a) with part of the digit erased, and let (c) be (a) shifted one pixel to the right. We want data similar to (a) to be drawn more often, and since (c) is closer to (a) than (b) is, we would like samples like (c) to come out more often. However, the Generator is usually designed as a Normal distribution whose mean is moved toward whichever sample is "closer" to (a) under an MSE distance. Because (b) has a smaller MSE to (a) than (c) does, something like (b) becomes the mean of the normal distribution, and samples resembling (b) come out more often, even though, in the end, samples like (c) that resemble (a) would have been the better sampling outcome.

์ข‹์€ sampling function์ด๋ž€, ์•ž์„  ์˜ˆ์‹œ์—์„œ ๋ณผ ์ˆ˜ ์žˆ์—ˆ๋“ฏ์ด train DB์— ์žˆ๋Š” data x์™€ ์œ ์‚ฌํ•œ ์ƒ˜ํ”Œ์ด ๋‚˜์˜ฌ ์ˆ˜ ์žˆ๋Š” ํ™•๋ฅ ๋ถ„ํฌ๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๊ทธ๋ƒฅ ์ƒ˜ํ”Œ๋ง ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“ค๊ธฐ ๋ณด๋‹ค evidence๋กœ x๋ฅผ given(์กฐ๊ฑด)์œผ๋กœ ํ•˜์—ฌ z๋ฅผ ๋ฝ‘์•„๋‚ด๋Š” ํ™•๋ฅ ๋ถ„ํฌ p(z\|x)๋ฅผ ๋งŒ๋“ค์–ด๋‚ด๋Š” ๊ฒƒ์ด ๋ชฉ์ ์ž…๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์—ฌ๊ธฐ์„œ ๋˜ ๋ฌธ์ œ์ธ ์ ์€ true distridution์ธ ํ•ด๋‹น ๋ถ„ํฌ๋ฅผ ๋งŒ๋“ค์–ด๋‚ด๊ธฐ์œ„ํ•ด Variational Inference ๋ฐฉ๋ฒ•์„ ์ด์šฉํ•ฉ๋‹ˆ๋‹ค. ๋ถ„ํฌ ์ถ”์ •์„ ์œ„ํ•œ family, ์˜ˆ๋ฅผ ๋“ค๋ฉด guassian ๋ถ„ํฌ๋“ค์„ Approximation Class๋กœ ๋‘๊ณ  true distribution์„ ์ถ”์ •ํ•ฉ๋‹ˆ๋‹ค. ์ด๋•Œ gaussian ๋ถ„ํฌ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ์ธ ฯ•๋Š” mean๊ณผ std ๊ฐ’์ด ๋  ๊ฒƒ ์ด๊ณ  ์ด๋Ÿฐ ์—ฌ๋Ÿฌ gaussian ๋ถ„ํฌ๋“ค๊ณผ true posterior ๊ฐ„์˜ KL divergence๋ฅผ ๊ตฌํ•˜์—ฌ ์ถ”์ •ํ•ด๊ฐ‘๋‹ˆ๋‹ค.

Figure: Variational Inference [12]

๋”ฐ๋ผ์„œ ์ •๋ฆฌํ•ด๋ณด๋ฉด ์•„๋ž˜ ๊ทธ๋ฆผ๊ณผ ๊ฐ™์ด ์ƒ์„ฑ๋ชจ๋ธ์ธ Generator๋ฅผ ํ•™์Šตํ•˜๊ธฐ ์œ„ํ•ด Variational Inference ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•˜๊ฒŒ ๋˜์—ˆ๊ณ  ๊ทธ๋Ÿฌ๋‹ค ๋ณด๋‹ˆ AutoEncoder์™€ ๋น„์Šทํ•œ ๋ชจ๋ธ ๊ตฌ์กฐ๊ฐ€ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋ฐ์ดํ„ฐ ์••์ถ•์ด ๋ชฉํ‘œ์ธ AutoEncoder์™€ ๋ฐ์ดํ„ฐ ์ƒ์„ฑ์ด ๋ชฉํ‘œ์ธ VAE๋Š” ๊ฐ์ž์˜ ๋ชฉํ‘œ์— ๋งž์ถฐ ํ•„์š”ํ•œ ๋ฐฉ๋ฒ•๋ก ์„ ๋”ํ•˜๊ฒŒ ๋˜๋ฉด์„œ ๊ทธ ๋ชจ๋ธ ๊ตฌ์กฐ๊ฐ€ ๋น„์Šทํ•ด๋ณด์ด๊ฒŒ ๋œ ๊ฒƒ์ด์ง€ ๊ฐ™์ง€์•Š์Šต๋‹ˆ๋‹ค.

VAE์˜ ์ „์ฒด ๊ตฌ์กฐ๋Š” [1] Decoder, Generator, Generation Network ๋ผ๊ณ  ๋ถ€๋ฅด๋Š” ๋ถ€๋ถ„๊ณผ [2] Encoder, Posterior, Inference Network๋ผ๊ณ  ๋ถ€๋ฅด๋Š” ๋ถ€๋ถ„, ํฌ๊ฒŒ 2๊ฐ€์ง€ ํŒŒํŠธ๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค.

Figure: VAE Structure [6]

2 Variational Bound

์œ„์˜ ํ๋ฆ„์„ ์ด์–ด๊ฐ€๋ณด๋ฉด, ์ฒ˜์Œ์— ์•Œ๊ณ  ์‹ถ์—ˆ๋˜ ๊ฒƒ์€ (1) p(x)์˜€์œผ๋‚˜ ์šฐ๋ฆฌ๊ฐ€ ์›ํ•˜๋Š” ๋ฐ์ดํ„ฐ๋“ค๋กœ ์ƒ˜ํ”Œ๋ง(์ปจํŠธ๋กค)ํ•˜๊ธฐ ์œ„ํ•ด (2) p(z\|x) (true posterior)๊ฐ€ ํ•„์š”ํ•ด์กŒ๊ณ , true posterior๋ฅผ ์•Œ ์ˆ˜ ์—†์œผ๋‹ˆ ์ด๋ฅผ ์ถ”์ •(Variational Inference)ํ•˜๊ธฐ ์œ„ํ•ด์„œ (3) qฯ•(z\|x)๊ฐ€ ํ•„์š”ํ–ˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์šฐ๋ฆฌ๋Š” ์ด 3๊ฐœ์˜ ๋ถ„ํฌ๋“ค์˜ ๊ด€๊ณ„๋ฅผ ์ข€ ๋” ์‚ดํŽด๋ณด๊ณ  ์–ด๋–ป๊ฒŒ ์ƒ์„ฑ๋ชจ๋ธ์„ ํ•™์Šตํ•ด๋‚˜๊ฐˆ ๊ฒƒ์ธ์ง€ ๊ณ ๋ฏผํ•ด๋ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

์ฒ˜์Œ์˜ ๋ชฉํ‘œ์˜€๋˜ p(x) ์— log๋ฅผ ์”Œ์›Œ์„œ ์•„๋ž˜์™€ ๊ฐ™์€ ์‹ ๋ณ€ํ˜•์„ ์ง„ํ–‰ํ•˜๋ฉด 2๊ฐœ์˜ term์œผ๋กœ ๋‚˜๋ˆ ์ง‘๋‹ˆ๋‹ค. ์ฒซ๋ฒˆ์งธ term์€ ์ด๋ฒˆ ์žฅ์˜ ์ฃผ์ธ๊ณต์ธ Evidence LowerBOund๋ผ๋Š” ELBO์ด๊ณ  ๋‘๋ฒˆ์งธ term์€ Variational Inference์—์„œ ๋ดค์—ˆ๋˜ true posterior์™€ approximator ์‚ฌ์ด์˜ ๊ฑฐ๋ฆฌ๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” KL ๊ฐ’์ž…๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ log(p(x)) ๊ฐ€ ์ผ์ •ํ•  ๋•Œ KL ๊ฐ’์„ ์ค„์ด๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ(=true posterior๋ฅผ ์ž˜ approximationํ•˜๋Š” ๊ฒƒ)์ด๊ณ  KL์€ ํ•ญ์ƒ ์–‘์ˆ˜์ด๊ธฐ ๋•Œ๋ฌธ์—, ์—ญ์œผ๋กœ ์ƒ๊ฐํ•ด๋ณด๋ฉด ์ฒซ๋ฒˆ์งธ term์ด์—ˆ๋˜ ELBO ๊ฐ’์„ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ๊ฒƒ์ด๋ผ๊ณ  ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ๊ฐ„๋‹จํžˆ ๊ทธ๋ž˜ํ”„๋กœ ๋‚˜ํƒ€๋‚ด๋ณด๋ฉด ์˜ค๋ฅธ์ชฝ ๊ทธ๋ฆผ์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ์ด ์šฐ๋ฆฌ๊ฐ€ ์›ํ•˜๋Š” ๊ฒƒ์€ ELBO๊ฐ’์ด ์ปค์งˆ ์ˆ˜ ์žˆ๋Š” ฯ•๋ฅผ ์ฐพ์•„๊ฐ€๋Š” ๊ณผ์ •์ด๋ผ๊ณ  ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Figure: Decomposition of log p(x) [6]
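
Written out, the decomposition is

$$\log p(x) = \underbrace{\mathbb{E}_{q_{\phi}(z \mid x)}\left[\log \frac{p(x, z)}{q_{\phi}(z \mid x)}\right]}_{\text{ELBO}(\phi)} + \underbrace{D_{KL}\big(q_{\phi}(z \mid x) \,\|\, p(z \mid x)\big)}_{\geq\, 0}$$

Because the KL term is non-negative and log p(x) does not depend on φ, raising the ELBO over φ necessarily lowers the KL to the true posterior.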

๋”ฐ๋ผ์„œ ELBO๊ฐ’์ด ์ปค์งˆ ์ˆ˜ ์žˆ๋Š” ฯ•๋ฅผ ์ฐพ์•„๊ฐ€๋Š” ์ตœ์ ํ™”๋ฅผ ์ˆ˜์‹์„ ๋ณ€ํ˜•ํ•˜์—ฌ ๋˜ ๋‹ค์‹œ 2๊ฐœ์˜ term ์ฆ‰, (1) Reconstructino Error์™€ (2) Regularization ์œผ๋กœ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Figure: The relationship of log p(x), KL, and ELBO [6]

The Reconstruction Error is the term that makes the model perform reconstruction, producing x back out when the training datum x goes in; the Regularization term plays the role of constraining the shape of the approximate posterior q so that it stays close to the prior distribution p(z).
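
Concretely, expanding p(x, z) = p(x|z) p(z) inside the ELBO gives

$$\text{ELBO}(\phi) = \underbrace{\mathbb{E}_{q_{\phi}(z \mid x)}\big[\log p(x \mid z)\big]}_{\text{Reconstruction}} - \underbrace{D_{KL}\big(q_{\phi}(z \mid x) \,\|\, p(z)\big)}_{\text{Regularization}}$$

The next two subsections treat these two terms in turn.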

2.1 Regularization term

Let us look at the Regularization term that came out of splitting the ELBO. To make the KL value easy to compute, q_φ(z|x), the approximator of the true posterior, is designed as a multivariate Gaussian distribution. Also, as discussed earlier, p(z), the controller part, must be a distribution that is easy to handle, so it is made a standard normal distribution. Then, as shown in Appendix B of the paper, the KL value between the Gaussian distributions can be computed easily from the mean and std as follows.

Figure: Regularization term [6]
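
With q_φ(z|x) = N(μ, σ²I) and p(z) = N(0, I), the closed form is

$$D_{KL}\big(q_{\phi}(z \mid x) \,\|\, p(z)\big) = \frac{1}{2} \sum_{j=1}^{J} \left( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \right)$$

where J is the dimensionality of z; this is exactly what the KL_divergence line computes in the code further below.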

2.2 Reconstruction error term

ELBO์˜ ๋‘๋ฒˆ์งธ term์ธ Reconstruction error์— ๋Œ€ํ•ด ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. Reconstruction error์˜ expectation ํ‘œํ˜„์„ integral๋กœ ํ‘œํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๊ณ  ์ด๋Š” ๋ชฌํ…Œ์นด๋ฅผ๋กœ ์ƒ˜ํ”Œ๋ง์„ ํ†ตํ•ด L๊ฐœ์˜ z_{i,โ€†l}๋ฅผ ๊ฐ€์ง€๊ณ  ํ‰๊ท ์„ ๋‚ด์„œ ๊ตฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์—์„œ index i๋Š” ๋ฐ์ดํ„ฐ x์˜ ๋„˜๋ฒ„๋ง์ด๊ณ  index l์€ generator์˜ distribution์—์„œ ์ƒ˜ํ”Œ๋งํ•˜๋Š” ํšŸ์ˆ˜์— ๋Œ€ํ•œ ๋„˜๋ฒ„๋ง์ž…๋‹ˆ๋‹ค. VAE๋Š” ํ•œ์ •๋œ ๋ชฌํ…Œ์นด๋ฅผ๋กœ ์ƒ˜ํ”Œ๋ง์„ ํ†ตํ•ด ํšจ๊ณผ์ ์œผ๋กœ optimization์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

Figure: Reconstruction error term [6]
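
In formula form, the estimator is

$$\mathbb{E}_{q_{\phi}(z \mid x_i)}\big[\log p(x_i \mid z)\big] \approx \frac{1}{L} \sum_{l=1}^{L} \log p\big(x_i \mid z_{i,l}\big), \qquad z_{i,l} \sim q_{\phi}(z \mid x_i)$$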

2.2.1 Reparametrization Trick

์œ„์—์„œ Reconstruction error๋ฅผ ๊ตฌํ•˜๊ธฐ ์œ„ํ•ด ์ƒ˜ํ”Œ๋งํ•˜๋Š” ๊ณผ์ •์—์„œ backpropation์„ ํ•˜๊ธฐ ์œ„ํ•ด Reparametrization trick์„ ์‚ฌ์šฉํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋‹จ์ˆœํžˆ ์ •๊ทœ๋ถ„ํฌ์—์„œ ์ƒ˜ํ”Œ๋ง ํ•˜๋ฉด random node์ธ z์— ๋Œ€ํ•ด์„œ gradient๋ฅผ ๊ณ„์‚ฐํ•  ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์— random์„ฑ์„ ์ •๊ทœ๋ถ„ํฌ์—์„œ ์ƒ˜ํ”Œ๋ง ๋˜๋Š” ฯต์œผ๋กœ ๋งŒ๋“ค์–ด์ฃผ๊ณ  ์ด๋ฅผ reparametrization์„ ํ•ด์ฃผ์–ด์„œ deterministic node๊ฐ€ ๋œ z๋ฅผ backpropagation ํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

Figure: Reparametrization trick [6]
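
In symbols, direct sampling z ~ N(μ, σ²) is replaced by

$$z = \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$

so all randomness lives in ε while the gradients with respect to μ and σ flow deterministically. This is the one-liner used throughout the code below: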
# sampling by re-parameterization technique
z = mu + sigma * tf.random_normal(tf.shape(mu), 0, 1, dtype=tf.float32)

If the generator distribution p(x|z), which produces x from the sampled z, is designed as a Bernoulli distribution, its NLL becomes Cross Entropy; if designed as a Gaussian distribution, it becomes MSE, so one of these two easy-to-compute distributions is normally used. The design choice is decided by the data distribution: if the data distribution is continuous it is close to a Gaussian, so a Gaussian is used; if the data distribution is discrete it is close to a Bernoulli, so a Bernoulli is used.

Figure: Types of generator distributions [6]
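
A minimal NumPy sketch of the two resulting reconstruction losses (illustrative only; the names bernoulli_nll and gaussian_nll, and the fixed-σ assumption in the Gaussian case, are mine, not the post's):

import numpy as np

def bernoulli_nll(x, y, eps=1e-8):
    # Bernoulli decoder: y holds predicted pixel probabilities in (0, 1),
    # x is a batch of flattened binary images. The NLL summed over pixels
    # is exactly the binary cross entropy.
    return -np.sum(x * np.log(y + eps) + (1 - x) * np.log(1 - y + eps), axis=1)

def gaussian_nll(x, mu, sigma=1.0):
    # Gaussian decoder with a fixed std: up to additive constants the NLL
    # reduces to a scaled squared error, which is why "Gaussian decoder"
    # and MSE coincide.
    return np.sum(0.5 * ((x - mu) / sigma) ** 2
                  + np.log(sigma) + 0.5 * np.log(2 * np.pi), axis=1)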

3 VAE Structure

In the VAE structure we have examined so far, only the formulas for computing the Reconstruction error and the Regularization change slightly, depending on which distribution the Encoder and the Decoder are each designed with. The Encoder part uses a Gaussian distribution in every variant, for ease of computing the error terms, and only the Decoder part varies, giving the several types below. First, here is the scaffolding every VAE variant shares: the gateway that wires together encoding, reparametrized sampling, decoding, and the two loss terms.

# Gateway
def autoencoder(x_hat, x, dim_img, dim_z, n_hidden, keep_prob):

    # encoding
    mu, sigma = gaussian_MLP_encoder(x_hat, n_hidden, dim_z, keep_prob)

    # sampling by re-parameterization technique
    z = mu + sigma * tf.random_normal(tf.shape(mu), 0, 1, dtype=tf.float32)

    # decoding
    y = bernoulli_MLP_decoder(z, n_hidden, dim_img, keep_prob)
    y = tf.clip_by_value(y, 1e-8, 1 - 1e-8)

    # loss
    marginal_likelihood = tf.reduce_sum(x * tf.log(y) + (1 - x) * tf.log(1 - y), 1)
    KL_divergence = 0.5 * tf.reduce_sum(tf.square(mu) + tf.square(sigma) - tf.log(1e-8 + tf.square(sigma)) - 1, 1)

    marginal_likelihood = tf.reduce_mean(marginal_likelihood)
    KL_divergence = tf.reduce_mean(KL_divergence)

    ELBO = marginal_likelihood - KL_divergence

    loss = -ELBO

    return y, z, loss, -marginal_likelihood, KL_divergence

def decoder(z, dim_img, n_hidden):

    y = bernoulli_MLP_decoder(z, n_hidden, dim_img, 1.0, reuse=True)

    return y

(1) Encoder: Gaussian / Decoder: Bernoulli

Figure: VAE type 1 [6]

์œ„ ๋ชจ๋ธ์„ ์ฝ”๋“œ๋กœ ๊ตฌํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

# Bernoulli MLP as decoder
def bernoulli_MLP_decoder(z, n_hidden, n_output, keep_prob, reuse=False):

    with tf.variable_scope("bernoulli_MLP_decoder", reuse=reuse):
        # initializers
        w_init = tf.contrib.layers.variance_scaling_initializer()
        b_init = tf.constant_initializer(0.)

        # 1st hidden layer
        w0 = tf.get_variable('w0', [z.get_shape()[1], n_hidden], initializer=w_init)
        b0 = tf.get_variable('b0', [n_hidden], initializer=b_init)
        h0 = tf.matmul(z, w0) + b0
        h0 = tf.nn.tanh(h0)
        h0 = tf.nn.dropout(h0, keep_prob)

        # 2nd hidden layer
        w1 = tf.get_variable('w1', [h0.get_shape()[1], n_hidden], initializer=w_init)
        b1 = tf.get_variable('b1', [n_hidden], initializer=b_init)
        h1 = tf.matmul(h0, w1) + b1
        h1 = tf.nn.elu(h1)
        h1 = tf.nn.dropout(h1, keep_prob)

        # output layer: Bernoulli mean (pixel probabilities via sigmoid)
        wo = tf.get_variable('wo', [h1.get_shape()[1], n_output], initializer=w_init)
        bo = tf.get_variable('bo', [n_output], initializer=b_init)
        y = tf.sigmoid(tf.matmul(h1, wo) + bo)

    return y

(2) Encoder: Gaussian / Decoder: Gaussian

Figure: VAE type 2 [6]

์œ„ ๋ชจ๋ธ์„ ์ฝ”๋“œ๋กœ ๊ตฌํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

# Gaussian MLP as encoder
def gaussian_MLP_encoder(x, n_hidden, n_output, keep_prob):
    with tf.variable_scope("gaussian_MLP_encoder"):
        # initializers
        w_init = tf.contrib.layers.variance_scaling_initializer()
        b_init = tf.constant_initializer(0.)

        # 1st hidden layer
        w0 = tf.get_variable('w0', [x.get_shape()[1], n_hidden], initializer=w_init)
        b0 = tf.get_variable('b0', [n_hidden], initializer=b_init)
        h0 = tf.matmul(x, w0) + b0
        h0 = tf.nn.elu(h0)
        h0 = tf.nn.dropout(h0, keep_prob)

        # 2nd hidden layer
        w1 = tf.get_variable('w1', [h0.get_shape()[1], n_hidden], initializer=w_init)
        b1 = tf.get_variable('b1', [n_hidden], initializer=b_init)
        h1 = tf.matmul(h0, w1) + b1
        h1 = tf.nn.tanh(h1)
        h1 = tf.nn.dropout(h1, keep_prob)

        # output layer
        # borrowed from https://github.com/altosaar/vae/blob/master/vae.py
        wo = tf.get_variable('wo', [h1.get_shape()[1], n_output * 2], initializer=w_init)
        bo = tf.get_variable('bo', [n_output * 2], initializer=b_init)
        gaussian_params = tf.matmul(h1, wo) + bo

        # The mean parameter is unconstrained
        mean = gaussian_params[:, :n_output]
        # The standard deviation must be positive. Parametrize with a softplus and
        # add a small epsilon for numerical stability
        stddev = 1e-6 + tf.nn.softplus(gaussian_params[:, n_output:])

    return mean, stddev
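
A minimal sketch of the Gaussian decoder for type 2, assuming it mirrors gaussian_MLP_encoder above (the function name gaussian_MLP_decoder and the single hidden layer are my assumptions, not code from the post):

# Gaussian MLP as decoder (sketch): parameterizes p(x|z) with a mean and std
def gaussian_MLP_decoder(z, n_hidden, n_output, keep_prob, reuse=False):
    with tf.variable_scope("gaussian_MLP_decoder", reuse=reuse):
        w_init = tf.contrib.layers.variance_scaling_initializer()
        b_init = tf.constant_initializer(0.)

        # hidden layer
        w0 = tf.get_variable('w0', [z.get_shape()[1], n_hidden], initializer=w_init)
        b0 = tf.get_variable('b0', [n_hidden], initializer=b_init)
        h0 = tf.nn.dropout(tf.nn.elu(tf.matmul(z, w0) + b0), keep_prob)

        # output layer: mean and std of the Gaussian over pixels
        wo = tf.get_variable('wo', [h0.get_shape()[1], n_output * 2], initializer=w_init)
        bo = tf.get_variable('bo', [n_output * 2], initializer=b_init)
        params = tf.matmul(h0, wo) + bo

        mean = tf.sigmoid(params[:, :n_output])                # pixel means in [0, 1]
        stddev = 1e-6 + tf.nn.softplus(params[:, n_output:])   # std must be positive

    return mean, stddev

With this decoder, the marginal_likelihood term in the gateway would be the Gaussian log-density of x under (mean, stddev) rather than the Bernoulli cross entropy.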

(3) Encoder: Gaussian / Decoder: Gaussian w/ Identity Covariance

Figure: VAE type 3 [6]

Using the MNIST data as an example, the VAE structure can be drawn as follows.

Figure: VAE with MNIST [6]

4 Experiment

The paper runs a total of two experiments. This post has kept calling the model VAE, but the paper refers to the algorithm as AEVB, so read the experimental results with AEVB standing for the VAE algorithm. First, performance is compared against the wake-sleep algorithm as the baseline, using the MNIST and Frey Face datasets.

ELBO๊ฐ’์„ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ด๋ฏ€๋กœ y์ถ•์˜ ๊ฐ’์ด ํด์ˆ˜๋ก ์ข‹์€ ๊ฒƒ์œผ๋กœ ํ•ด์„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์•„๋ž˜์˜ ๊ทธ๋ž˜ํ”„๋“ค์—์„œ ์‹ค์„ ๊ณผ ์ ์„ ์€ ๊ฐ๊ฐ train๊ณผ test ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•ด ELBO ๊ฐ’์„ plottingํ•œ ๊ฒƒ์œผ๋กœ latent variable์ธ z์˜ ์ฐจ์›์˜ ํฌ๊ธฐ์— ๋”ฐ๋ผ ELBO ๊ฐ’์ด ์–ด๋–ค ์–‘์ƒ์„ ๋‚˜ํƒ€๋‚ด๋Š”์ง€ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. Experiment I์„ ๋ณด๋ฉด ํŠธ๋ ˆ์ด๋‹ ํฌ์ธํŠธ๊ฐ€ ๋งŽ์„ ๋•Œ ์ฆ‰ x์ถ• ๊ฐ’์ด ํด๋•Œ test์™€ training์˜ ELBO๊ฐ’์ด ์ ์  ๋ฒŒ์–ด์ง€๋Š”๊ฒƒ์ด ๊ด€์ฐฐ์ด ๋˜๋Š”๋ฐ ์ด๋Š” ์˜ค๋ฒ„ํ”ผํŒ…์œผ๋กœ ๋ณด์ž…๋‹ˆ๋‹ค. ์ €์ž๋Š” ์˜ค๋ฒ„ํ”ผํŒ…์„ ๋ฐฉ์ง€ํ•˜๊ธฐ์œ„ํ•ด ๋ฐ์ดํ„ฐ์…‹์˜ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ˆ˜์ •ํ•˜๋Š” ์ž‘์—…์„ ํ–ˆ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

Second, for the MNIST dataset, the estimated marginal log-likelihood of each algorithm is plotted against the number of training samples seen, for training-set sizes of 1000 and 50000. This experiment uses Wake-Sleep and MCEM as baselines, and here AEVB (=VAE) shows better performance than the baselines in terms of convergence speed.

5 Conclusion

This paper proposes Stochastic Gradient VB (SGVB), an estimator of the variational lower bound, for efficient inference of continuous latent variables. Because naive random sampling from a Gaussian makes backpropagation impossible, VAE uses the Reparametrization trick, so that the estimator built from Gaussian samples is differentiable and can be optimized with SGD. For i.i.d. datasets where each datapoint has a continuous latent variable, that is, for high-dimensional data, the Auto-Encoding VB (AEVB) algorithm applies SGVB to solve the task. VAE is thus a generative model that learns a low-dimensional Gaussian distribution from an image and then restores the original image from it. But this way of learning, reducing an image drawn from a complex distribution down to a Gaussian, has a problem: squeezing the essential information into a Gaussian shape is very hard, and some loss is unavoidable. This failure is called posterior collapse. Since VAE, a variety of studies have worked on resolving posterior collapse; go meet the many latent-variable models that upgrade VAE, such as VQ-VAE!

6 Improved Work

One follow-up study is the Adversarial AutoEncoder (AAE), which performs variational inference by using a VAE together with a Generative Adversarial Network (GAN) to match the Auto-Encoder's posterior to an arbitrary prior distribution. The GAN's Discriminator performs this matching by discriminating between the distribution of codes the encoder produces and samples coming from the true prior. By matching the posterior distribution to the prior distribution, we can know with what meaning the samples are laid out in the latent space (the prior distribution). The AAE's encoder shapes the data distribution into the desired prior distribution, and the decoder can then generate meaningful samples from that prior.

Figure: Adversarial AutoEncoder [13]

References

[1] original paper: https://arxiv.org/abs/1312.6114

[2] https://di-bigdata-study.tistory.com/5

[3] https://di-bigdata-study.tistory.com/4?category=848869

[4] https://ratsgo.github.io/generative%20model/2017/12/19/vi/

[5] https://taeu.github.io/paper/deeplearning-paper-vae/

[6] https://medium.com/humanscape-tech/paper-review-vae-ac918509a9ba

[7] https://www.youtube.com/watch?v=o_peo6U7IRM

[8] https://youtu.be/SAfJz_uzaa8

[9] https://youtu.be/GbCAwVVKaHY

[10] https://youtu.be/7t_3dNs4QK4

[11] https://arxiv.org/abs/1606.05908

[12] https://cs.stanford.edu/~sunfanyun/talks/vi_discrete.pdf

[13] https://arxiv.org/abs/1511.05644

Copyright 2024, Jung Yeon Lee