What Makes Safety Fine-tuning Methods Safe? A Mechanistic Study
2024
Conference Paper
Author(s): | Jain, S. and Lubana, E. S. and Oksuz, K. and Joy, T. and Torr, P. and Sanyal, A. and Dokania, P. K. |
Book Title: | Advances in Neural Information Processing Systems 37 (NeurIPS 2024) |
Year: | 2024 |
Month: | December |
Department(s): | Empirical Inference |
Bibtex Type: | Conference Paper (conference) |
Event Name: | 38th Annual Conference on Neural Information Processing Systems |
Event Place: | Vancouver, Canada |
State: | Accepted |
BibTex:
@conference{Jainetal24,
  title = {What Makes Safety Fine-tuning Methods Safe? A Mechanistic Study},
  author = {Jain, S. and Lubana, E. S. and Oksuz, K. and Joy, T. and Torr, P. and Sanyal, A. and Dokania, P. K.},
  booktitle = {Advances in Neural Information Processing Systems 37 (NeurIPS 2024)},
  month = dec,
  year = {2024},
  doi = {},
  month_numeric = {12}
}