8?22????????????IEEE CIG2017???????Doom AI??(ViZDoom)????IEG NEXT????Axon???????????
IEEE CIG2017
ViZDoom???????????????????????????Doom????????????????????????????????AI?????????????????AI?????????????Facebook?FAIR??Track1????Intel??Track2????CMU????Arnold????Track????
CIG2016 ViZDoom Final
ViZDoom??????????8???????????DeathMatch10?????Frags?????(Frags = ???? - ????)???????Track?Track1??????????Track2?????????????????????????????????????????10??????????????
Axon???????????AI?????????????AI??????????????????????????????????????ViZDoom??????????????????????4??ViZDoom??????????????????????????????????????????????Deep Learning????Reinforcement Learning???????????????????????????????????AI??????????AI???????????AI???????????????
Axon's Slogan: Hack AI for Game
??????????????????RL???Paper?????????????????????????????????????????RL????????????????(Trial-and-Error)?????????????????CPU?GPU????????????????RL?????????????????????????(???Error?Recall)??????????????????????????????????Gaia???????????????????????????DeepMind??????UNREAL??????????A3C??Auxiliary Tasks?????????????????????????UNREAL??????????????????????
????
???????GroupStudy???????RL???????????RL????????????MDP?Bellman Equation, TD, PO?Actor-Critic?etc. ???????????????????Approximator??????????????????????????Train from scratch??????????Reward??Sparse??????Reward Function??????????????????????????????????????????????????????Navigation??????????????????????????????????????????Imitation Learning?Train??????Reward????????
??????????Agent?????????????????????????????????????????????????????GroundTruth?Train??Policy Net????Alphago??????????????????5?????????????????PostRules????
Imitation Learning
Input: Static Attention
6????????UNREAL???Imitation??????????????????????????????????????????????????????????A3C????Actor?Critic?????????????????????????(Actor)???????(Critic)???Critic???????????????????????????????????????????????6????????????????????????????????????????????F1?
?????A3C
7????????????????????????????????????????(Tactics)???????“??”???????????????????????????Doom???????Track1????Rocket????????????????????????????????????????????????Fancy??
??????????????????????????DeepMind?EWC???????????????Tactics???????Feasible Configuration?????EWC?????????????????????????????????????????Tactics??????????Hack???????Bot????Imitation Learning???RL???Overall Polish???????????Agent????Dodge??????????Dodge???????????????????????????????????
Advanced Policy
?????????????????????????????4???Axon??????????DL?RL???????????????Gaia???????Training???????????????????
?????????AGI, RL??????????Agent???????????????Reward Engineering?Sample???High Variance?Delayed Reward???Temporal Credit Assignment???Prediction?????MultiTask Training????Catastrophic Forgetting; ??????????Explore?mid-term/long-term?Tactics???????Agent????????????Action???Decentralized-Centralized????MultiAgent Credit Assignment?long-term Planning?????AI?????DRL???????????????????????????????Racing?????Steering?Fighting????????RTS??Micromanagement??????????????????????Game AI?????Rule Based???Planners??????????Axon?????????????????
??IEG-NEXT????????????????????TEG???????????????