Jinan University School of Economics Statistics Seminar Series, No. 142: Jin Zhu (London School of Economics and Political Science)

Posted by: Sijie Xu | Posted on: 2024-09-26

Title: Sequential Knockoffs for Variable Selection in Reinforcement Learning

Speaker: Jin Zhu, London School of Economics and Political Science

Host: Yunlu Jiang, Jinan University

Time: Thursday, September 26, 2024, 10:30-11:30 a.m.

Venue: Room 102, School of Economics Building (Zhonghui Building), Shipai Campus, Jinan University

Abstract

In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge. Consequently, it is common practice to construct a state which is larger than necessary, e.g., by concatenating measurements over contiguous time points. However, needlessly increasing the dimension of the state can slow learning and obfuscate the learned policy. We introduce the notion of a minimal sufficient state in a Markov decision process (MDP) as the smallest subvector of the original state under which the process remains an MDP and shares the same optimal policy as the original process. We propose a novel sequential knockoffs (SEEK) algorithm that estimates the minimal sufficient state in a system with high-dimensional complex nonlinear dynamics. In large samples, the proposed method controls the false discovery rate and selects all sufficient variables with probability approaching one. As the method is agnostic to the reinforcement learning algorithm being applied, it benefits downstream tasks such as policy optimization. Empirical experiments verify the theoretical results and show that the proposed approach outperforms several competing methods in terms of variable selection accuracy and regret.

About the Speaker

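Jin Zhu is a postdoctoral researcher at the London School of Economics and Political Science. He received his Ph.D. from Sun Yat-sen University. His research interests include reinforcement learning and high-dimensional data analysis, and his work has been published in top international journals and conferences such as PNAS, JASA, JMLR, ICML, and AISTATS.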

All interested faculty and students are welcome to attend.


Proofreading | Yunlu Jiang

Editor | Yi Peng

First review | Yunlu Jiang

Final review and publication | Lingyun He

(Source: WeChat official account of the School of Economics, Jinan University)