Dense Reward Generation for RL Fine-Tuning

Unsupervised Dense Reward Generation for RL Fine-Tuning of BC Policies. CS234 project poster.

CS234 · Stanford, Spring 2026 · with Yubo Ruan & Ryan Zhang