{"id":339,"date":"2020-08-31T06:54:16","date_gmt":"2020-08-31T06:54:16","guid":{"rendered":"https:\/\/www.icomputinglabs.in\/blog\/?p=339"},"modified":"2023-08-03T10:14:36","modified_gmt":"2023-08-03T10:14:36","slug":"leveraging-artificial-intelligence-in-unity-using-ml-agents-series-post","status":"publish","type":"post","link":"https:\/\/www.icomputinglabs.in\/blog\/leveraging-artificial-intelligence-in-unity-using-ml-agents-series-post\/","title":{"rendered":"Leveraging Artificial Intelligence in Unity using ML-Agents (Series Post)"},"content":{"rendered":"\n<p>The Unity ML-Agents Toolkit is a plugin for the Unity game engine that lets us use Unity as an environment builder for training agents.<\/p>\n\n\n\n<p>From playing football and learning to walk to jumping over walls and teaching a dog to catch sticks, the Unity ML-Agents Toolkit provides a wide range of impressive pre-made environments.<\/p>\n\n\n\n<p>We can also create our own learning environments.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How does Unity ML-Agents work?<\/strong><\/h2>\n\n\n\n<p>Unity ML-Agents lets us create new environments, or use pre-made ones, to train our agents.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The three components<\/strong><\/h3>\n\n\n\n<p>With Unity ML-Agents, we have three important components.<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"177\" src=\"https:\/\/lh4.googleusercontent.com\/vOSRD_TP8Tl1khQ-D69Ms3BPrrtr3nAnUCdTcj_m13IJztik18hm_VcytsSOxto8cd3-3KaWI1fJgN85ADcCVGH1-dWdX2GCTtui4n-FKR0OF5AArXrZ0w92SK_pPqMT2u8GJhxD\"><\/p>\n\n\n\n<p><em>Source: Unity ML-Agents Documentation<\/em><\/p>\n\n\n\n<p>The first is the <em>Learning Environment<\/em> (built in Unity), which <strong>contains the Unity scene and the environment elements.<\/strong><\/p>\n\n\n\n<p>The second is the <em>Python API<\/em>, which contains the <strong>reinforcement learning (RL) algorithms<\/strong> (such
as PPO and SAC). We use this API to launch training, run tests, and so on. It communicates with the Learning Environment through the third component, the <em>External Communicator<\/em>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Inside the Learning Environment<\/strong><\/h2>\n\n\n\n<p>Inside the Learning Environment, we have several elements:<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"281\" src=\"https:\/\/lh3.googleusercontent.com\/nGOrZVHV5uzPz0DQn9WVhr3L5Hk5hLDLvoGVzL8mFKqEnfEnu0riRX4tcNrZc-j5hORsBFpB0IKhPYxH2oYEHs9z5n4Rr5vopavHwNrxEbW0LMhMYk2WJ-V4515WcSmV57V13Vpg\"><\/p>\n\n\n\n<p><em>Source: Unity ML-Agents Documentation<\/em><\/p>\n\n\n\n<p>The first is the Agent, <strong>the actor of the scene.<\/strong> It\u2019s the Agent we\u2019re going to train, by optimizing its policy, <strong>called the Brain<\/strong>, which tells it what action to take in each state.<\/p>\n\n\n\n<p>The second is the Academy, which <strong>orchestrates agents and their decision-making process.<\/strong> Think of the Academy as a maestro that handles requests from the Python API.<\/p>\n\n\n\n<p>To better understand its role, let\u2019s recall the RL process, which can be modeled as a loop:<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"212\" src=\"https:\/\/lh4.googleusercontent.com\/o_blwA8LsjGPL8cxI1WKiGOp2GnhhsHMPspKEkqyORHVu-swpY3ZG0u_ihE2yB8mnuZksAhZ5OnUik7MLcxQrLV0u2yyenB09hyezpDhwEqx4-qXiGyiZRo8QLNXC0GXCcDTqBgZ\"><\/p>\n\n\n\n<p><em>Source: Sutton and Barto, Reinforcement Learning: An Introduction<\/em><\/p>\n\n\n\n<p>Now, let\u2019s imagine an agent learning to play a platform game. 
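Before breaking this loop into steps, here it is as Python-style pseudocode (a minimal sketch; <code>env<\/code> and <code>agent<\/code> are hypothetical stand-ins, not the actual ML-Agents API):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># One episode of the agent-environment loop (pseudocode)\nstate = env.reset()                        # initial state S0\ndone = False\ncumulative_reward = 0\nwhile not done:\n    action = agent.select_action(state)    # the policy (Brain) picks an action\n    state, reward, done = env.step(action) # the environment transitions and rewards\n    cumulative_reward += reward            # the quantity the agent tries to maximize\n<\/code><\/pre>\n\n\n\n<p>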
The RL process looks like this:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Our agent receives state S0 from the environment: the first frame of our game.<\/li><li>Based on state S0, the agent takes action A0: our agent moves to the right.<\/li><li>The environment transitions to a new state S1.<\/li><li>The environment gives reward R1 to the agent: we\u2019re not dead <em>(positive reward +1)<\/em>.<\/li><\/ul>\n\n\n\n<p>This RL loop outputs a sequence of states, actions, and rewards. The goal of the agent is to <strong>maximize the expected cumulative reward.<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/YZvkl_jP2wpxXZEs16t6h4VppTXUpB1G3DATUGSDBkuMixun8f4Z2LAIxZUeDLTPoiTzOj1wzpBTo2KyJjo95HIu_XBgZGje_AejC_KATkeFM-jRZESOXP0C6sFnM9uPXxQvETq4\" alt=\"\"\/><\/figure>\n\n\n\n<p>The Academy is <strong>the component that sends orders to our Agents and ensures that they stay in sync<\/strong>. For each Agent, it also drives these steps:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Collect observations<\/li><li>Select an action using the policy<\/li><li>Take the action<\/li><li>Reset if the max step count is reached or the episode is done<\/li><\/ul>\n\n\n\n<p>In the next post, we will explain in detail how to train agents for various scenarios. Stay tuned.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Unity ML-Agents toolkit is a new plugin based on the game engine Unity that allows us to use the Unity Game Engine as an environment builder to train agents. 
From playing football, learning to walk, to jumping big walls, to training a cute doggy to catch sticks, Unity ML-Agents Toolkit provides a ton of amazing [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":346,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[30,47,17,48,39],"tags":[58,65,66],"class_list":["post-339","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","category-engineering","category-technology","category-unity","category-xr","tag-ai","tag-ml-agents","tag-unity","grid-item","grid-item-portrait"],"_links":{"self":[{"href":"https:\/\/www.icomputinglabs.in\/blog\/wp-json\/wp\/v2\/posts\/339","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.icomputinglabs.in\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.icomputinglabs.in\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.icomputinglabs.in\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.icomputinglabs.in\/blog\/wp-json\/wp\/v2\/comments?post=339"}],"version-history":[{"count":4,"href":"https:\/\/www.icomputinglabs.in\/blog\/wp-json\/wp\/v2\/posts\/339\/revisions"}],"predecessor-version":[{"id":344,"href":"https:\/\/www.icomputinglabs.in\/blog\/wp-json\/wp\/v2\/posts\/339\/revisions\/344"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.icomputinglabs.in\/blog\/wp-json\/wp\/v2\/media\/346"}],"wp:attachment":[{"href":"https:\/\/www.icomputinglabs.in\/blog\/wp-json\/wp\/v2\/media?parent=339"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.icomputinglabs.in\/blog\/wp-json\/wp\/v2\/categories?post=339"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.icomputinglabs.in\/blog\/wp-json\/wp\/v2\/tags?post=339"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","tem
plated":true}]}}