Results for "LLM auditing"

Projects

Mechanistic Interpretability and Human-like Tendencies of Generative AI

Project Lead:

Description:

Our research aims to provide insights into the neural sociology of LLMs, examining how different social preferences factorize within the model’s latent space. We seek to identify interpretable latent units…

School of Undergraduate Studies

The University of Texas
