Empirical Inference

What Makes Safety Fine-tuning Methods Safe? A Mechanistic Study

2024

Conference Paper

Author(s): Jain, S. and Lubana, E. S. and Oksuz, K. and Joy, T. and Torr, P. and Sanyal, A. and Dokania, P. K.
Book Title: Advances in Neural Information Processing Systems 37 (NeurIPS 2024)
Year: 2024
Month: December

Department(s): Empirical Inference
Bibtex Type: Conference Paper (conference)

Event Name: 38th Annual Conference on Neural Information Processing Systems
Event Place: Vancouver, Canada

State: Accepted

BibTeX

@conference{Jainetal24,
  title = {What Makes Safety Fine-tuning Methods Safe? A Mechanistic Study},
  author = {Jain, S. and Lubana, E. S. and Oksuz, K. and Joy, T. and Torr, P. and Sanyal, A. and Dokania, P. K.},
  booktitle = {Advances in Neural Information Processing Systems 37 (NeurIPS 2024)},
  month = dec,
  year = {2024},
  month_numeric = {12}
}