Statistics Seminar Series of the School of Economics, Jinan University, No. 168: Shirong Xu (University of California, Los Angeles)

Published by Sijie Xu on 2025-05-26

Title: Golden Ratio Weighting Prevents Model Collapse

Speaker: Shirong Xu, University of California, Los Angeles

Host: Guochang Wang, Jinan University

Time: 16:30-17:30, Friday, May 30, 2025

Venue: Room 306, School of Economics Building (Zhonghui Building), Shipai Campus, Jinan University

Abstract

Recent studies have identified an intriguing phenomenon in recursive generative model training known as model collapse, where models trained on data generated by previous models exhibit severe performance degradation. Addressing this issue and developing more effective training strategies have become central challenges in generative model research. In this paper, we investigate this phenomenon theoretically within a novel framework in which generative models are iteratively trained on a combination of newly collected real data and synthetic data from the previous training step. To develop an optimal training strategy for integrating real and synthetic data, we evaluate the performance of a weighted training scheme in various scenarios, including Gaussian distribution estimation and linear regression. We theoretically characterize the impact of the mixing proportion and the weighting scheme of synthetic data on the final model's performance. Our key finding is that, across different settings, the optimal weighting scheme under different proportions of synthetic data asymptotically follows a unified expression, revealing a fundamental trade-off between leveraging synthetic data and preserving generative model performance. Notably, in some cases, the optimal weight assigned to real data corresponds to the reciprocal of the golden ratio. Finally, we validate our theoretical results on extensive simulated datasets and a real tabular dataset.

About the Speaker

Dr. Shirong Xu is currently a fixed-term assistant professor at the University of California, Los Angeles. He received his bachelor's degree from Jinan University and his Ph.D. from City University of Hong Kong. His research interests include statistical machine learning and its applications to synthetic data, differential privacy, and graph data. His work has appeared in statistics journals such as JBES, EJS, and JASA.

All interested faculty and students are welcome to attend!

Proofread by | Guochang Wang

Executive editor | Yi Peng

First review | Yunlu Jiang

Final review and release | Lingyun He

(Source: WeChat official account of the School of Economics, Jinan University)