Mi的微博

突然好羡慕 reinforcement learning 中的 agent，可以与环境交互多个 episodes，而我的人生却只能来一次。