📃 ManipulationNet Review

manipulation
benchmark
Benchmarking Real-World Robot Manipulation at Scale through Physical Skill Challenges and Embodied Multimodal Reasoning
Published

February 13, 2026

🔍 Ping. 🔔 Ring. ⛏️ Dig. A tiered review series: quick look, key ideas, deep dive.

  • Paper Link
  • Project
  • Code
  1. 🤖 Robotic manipulation research has been fragmented by the lack of a real-world benchmark. ManipulationNet addresses this by presenting a benchmarking infrastructure for real-world robot manipulation.
  2. 🛠️ ManipulationNet uses standardized object sets and a client-server architecture to enable reproducible task setups and distributed performance evaluation, and it measures a range of robot capabilities through a Physical Skills Track and an Embodied Reasoning Track.
  3. 📈 The platform balances accessibility, authenticity, and realism in real-world manipulation research, laying a foundation for measuring systematic development and long-term scientific progress in the field.


🔍 Ping Review

🔍 Ping: a light tap on the surface. Get the gist in seconds.

This paper introduces ManipulationNet, a new global infrastructure for benchmarking robot manipulation. It is designed to address a limitation shared by existing approaches: none of them balances realism, accessibility, and authenticity. Simulation-based benchmarks offer high scalability and reproducibility, but they do not fully capture real-world physics and therefore lack realism. Real-world competitions, by contrast, score high on authenticity and realism, but they are resource-intensive and tied to a specific place and time, which limits accessibility. Standardized object sets and protocols provide reproducibility, but they make it hard to guarantee that a reported execution is authentic.

To address these problems, ManipulationNet proposes a hybrid centralized-decentralized architecture. Participation is decentralized: research groups around the world can take part in benchmarking tasks anytime and anywhere, using their own robot systems. Verification is centralized, which keeps the results trustworthy and comparable.

Core methodology and technical details:

ManipulationNet organizes its tasks into two main tracks:

  1. Physical Skills Track: evaluates a robot's ability to execute robust sensorimotor skills under real physical constraints. The emphasis is on the intensity of physical interaction.
  2. Embodied Reasoning Track: minimizes physical difficulty and instead tests high-level reasoning, that is, the robot's ability to interpret natural language instructions and visual inputs and turn them into executable actions through multimodal grounding.

How it works:

ManipulationNet centrally designs the standardized object sets and task protocols, manufactures the object sets, and distributes them worldwide. This ensures reproducible task setups, so that every participant performs the tasks under the same conditions.

Benchmarking protocol:

Every task follows the same general benchmarking protocol:

  1. Setup Phase: the participant receives the standardized object set, configures the robot system, and connects an external camera, independent of the robot hardware, to mnet-client to record the execution. As soon as mnet-client starts, the trial is registered with mnet-server. Each participant is allotted a limited number of trials to prevent cherry-picking (reporting only the best runs).
  2. Execution Phase: mnet-server generates a random one-time submission code and sends it to mnet-client, and the participant must display this code within the camera's field of view. mnet-client and mnet-server keep a secure connection open: mnet-client reports the task execution status in real time, and mnet-server delivers the task instructions (language or visual prompts, for example). During execution, mnet-server requests the hash of a video frame at random moments and receives it from mnet-client in real time, which it later uses to confirm the video's authenticity.
  3. Verification Phase: when the task is complete, mnet-client sends the recorded video and the execution logs to mnet-server. mnet-server verifies authenticity against three criteria: (a) the submitted video clearly shows the one-time submission code, (b) the uploaded video and its frames match the previously registered hashes, and (c) the video's length and content agree with the reported task status and key frames. Only after these integrity checks pass does the official committee apply the task-specific metrics, evaluate the trial, and publish the result.
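
The frame-hash check that links the execution and verification phases can be sketched in a few lines of Python. This is only an illustration of the idea, not the ManipulationNet implementation: the choice of SHA-256, the names `frame_hash`, `TrialRecord`, and `run_trial`, and the treatment of frames as opaque byte strings are all assumptions made for the example. The real system issues its requests live over the network, which this simulation collapses into a single loop.

```python
import hashlib
import random


def frame_hash(frame: bytes) -> str:
    """SHA-256 digest of one frame (the hash function is an assumption)."""
    return hashlib.sha256(frame).hexdigest()


class TrialRecord:
    """Server-side record of the hashes registered while a trial is running."""

    def __init__(self) -> None:
        self.registered: dict[int, str] = {}

    def register(self, index: int, digest: str) -> None:
        self.registered[index] = digest

    def verify(self, uploaded_frames: list[bytes]) -> bool:
        """Check every registered hash against the uploaded recording."""
        for index, digest in self.registered.items():
            if index >= len(uploaded_frames):
                return False  # the upload is shorter than the recorded trial
            if frame_hash(uploaded_frames[index]) != digest:
                return False  # the frame was changed after the trial
        return True


def run_trial(frames: list[bytes], n_challenges: int, seed: int) -> TrialRecord:
    """Simulate the execution phase: the server asks for hashes at random frames."""
    rng = random.Random(seed)
    record = TrialRecord()
    for index in sorted(rng.sample(range(len(frames)), n_challenges)):
        record.register(index, frame_hash(frames[index]))
    return record


if __name__ == "__main__":
    frames = [f"frame-{i}".encode() for i in range(100)]
    record = run_trial(frames, n_challenges=5, seed=0)
    print(record.verify(frames))  # the untouched recording passes

    tampered = list(frames)
    tampered[next(iter(record.registered))] = b"edited"
    print(record.verify(tampered))  # an edited frame is detected
```

Because the participant cannot know in advance which frames will be requested, a video that is re-recorded or edited after the trial is very unlikely to reproduce every registered hash.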

Technical implementation:

mnet-server is hosted on Amazon Web Services (AWS) so that mnet-client instances anywhere in the world can connect, and communication runs over the Transmission Control Protocol (TCP). mnet-client is implemented as a package compatible with the Robot Operating System (ROS), which makes it easy to integrate with robot platforms. It manages execution-status reporting, task monitoring, video processing (using the OpenCV library, with x264 encoding), and file submission. mnet-client reports status to mnet-server through ROS services and provides the information needed by the robot or the user through ROS topics.

mnet-server assigns a dedicated thread to each mnet-client connection. Its architecture consists of a task manager and a storage center. The task manager handles all message-level operations: checking team eligibility, issuing submission codes, logging execution status, and delivering task-specific instructions. The storage center uses AWS S3 to store large files (videos) reliably; mnet-client receives a pre-signed upload URL and uploads the data directly to S3 with an HTTP PUT. All trial metadata, team records, and submission statuses are kept in a MySQL database. This split keeps bulk video traffic off the message channel, which improves bandwidth efficiency, and it is intended to preserve data integrity regardless of network conditions.
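
To make the thread-per-connection design and the one-time submission code more concrete, here is a minimal sketch built on Python's standard `socketserver` module. It is not the mnet-server code: the class names `TrialHandler` and `TrialServer`, the helper `request_code`, the line-based message format, and the 8-character hex code are all hypothetical. The sketch also omits the secure channel, S3 upload, and MySQL storage described above.

```python
import secrets
import socket
import socketserver
import threading


class TrialHandler(socketserver.StreamRequestHandler):
    """Handles one client connection; ThreadingTCPServer runs it in its own thread."""

    def handle(self) -> None:
        team = self.rfile.readline().decode().strip()
        # One-time code the participant would show to the camera (format is hypothetical).
        code = secrets.token_hex(4)
        with self.server.lock:
            self.server.issued[code] = team
        self.wfile.write(f"{code}\n".encode())


class TrialServer(socketserver.ThreadingTCPServer):
    """TCP server that spawns a dedicated thread per connection."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address):
        super().__init__(address, TrialHandler)
        self.issued: dict[str, str] = {}  # code -> team name
        self.lock = threading.Lock()      # protects `issued` across threads


def request_code(host: str, port: int, team: str) -> str:
    """Client side: open a TCP connection, send the team name, read the code back."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(f"{team}\n".encode())
        return conn.makefile().readline().strip()


if __name__ == "__main__":
    server = TrialServer(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    print(request_code(host, port, "team-a"))
    server.shutdown()
    server.server_close()
```

A thread per connection keeps each trial's message exchange isolated and simple to reason about. Shared state, here the table of issued codes, needs a lock, which is the role the MySQL database plays for persistent records in the real system.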

Example tasks:

  • Physical Skills Track: Peg-in-Hole Assembly: the object set consists of pegs in 5 distinct shapes (symmetric and asymmetric) and a board. For each shape there are holes at 4 clearance levels: 3 mm, 1 mm, 0.1 mm, and 0.02 mm. Manufacturing tolerance is very tight, within 20 microns. The board is made of transparent acrylic, which raises the perceptual difficulty for vision systems. The insertion order is fixed: from the largest clearance to the smallest, and from the lowest geometric complexity to the highest. There are three execution modes: fully autonomous, human-in-the-loop (high-level assistance), and teleoperation (direct control).
  • Embodied Reasoning Track: Block Arrangement: the object set consists of blocks in 5 colors (red, yellow, orange, blue, green), with 10 blocks per color. All blocks are identical in shape, size, and material. The task is to reproduce a block arrangement according to a prompt received from mnet-server. It consists of 10 rounds of increasing difficulty, and each round contains three independent tasks: one with a language prompt, one with a visual prompt, and one with a visual-language prompt. Difficulty is defined along 5 factors: color understanding, long-horizon tasks, spatial reasoning, physical understanding, and inference from hidden information or abstract instructions.
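
The sizes of the two example tasks follow directly from the numbers above, and a short Python sketch makes the arithmetic explicit. Two things here are assumptions: the shape names are placeholders, since the five peg shapes are not named in this summary, and clearance is treated as the outer sort key of the insertion order, which the description leaves ambiguous.

```python
from itertools import product

# Values taken from the task descriptions above.
CLEARANCES_MM = [3.0, 1.0, 0.1, 0.02]  # largest to smallest
PROMPT_TYPES = ["language", "visual", "visual-language"]
N_ROUNDS = 10
COLORS = ["red", "yellow", "orange", "blue", "green"]
BLOCKS_PER_COLOR = 10

# Hypothetical placeholders, ordered from lowest to highest geometric complexity.
SHAPES_BY_COMPLEXITY = ["shape-1", "shape-2", "shape-3", "shape-4", "shape-5"]


def insertion_schedule():
    """Fixed peg-in-hole order, assuming clearance is the outer sort key."""
    return list(product(CLEARANCES_MM, SHAPES_BY_COMPLEXITY))


def reasoning_tasks():
    """Every (round, prompt type) pair of the block-arrangement task."""
    return [(r, p) for r in range(1, N_ROUNDS + 1) for p in PROMPT_TYPES]


if __name__ == "__main__":
    print(len(insertion_schedule()))       # 20 insertions (4 clearances x 5 shapes)
    print(len(reasoning_tasks()))          # 30 tasks (10 rounds x 3 prompt types)
    print(len(COLORS) * BLOCKS_PER_COLOR)  # 50 blocks
```

If shape complexity turned out to be the outer key instead, swapping the two arguments to `product` would give the alternative order with the same 20 insertions.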

Ultimately, ManipulationNet aims to build a sustainable foundation that supports a systematic understanding of robot manipulation capabilities, records the trajectory of scientific progress, and helps set research priorities by identifying systems that are ready for real-world deployment.

Copyright 2026, JungYeon Lee