Results for "LLM auditing"

Projects

Mechanistic Interpretability and Human-like Tendencies of Generative AI

Project Lead:
Description:

Our research investigates the neural sociology of LLMs: how different social preferences factorize within the model’s latent space. We seek to identify interpretable latent units…