During a summer fellowship at the Center on Long-Term Risk, I investigated how philosophical and neuroscientific perspectives on agency could inform AI safety research.
This work explored Dennett’s intentional stance, a strategy for predicting a system’s behavior by treating it as a rational agent with beliefs and desires, and examined insights from computational neuroscience about how the human brain implements agency. The goal was to develop better conceptual tools for reasoning about agent-like behavior in AI systems and for understanding what it means for a system to have goals or objectives.
The research aimed to connect philosophical clarity about agency with practical questions about building and aligning AI systems that exhibit goal-directed behavior.
Publications & Writing
- Grokking the Intentional Stance (LessWrong)
- Integrating Three Models of Human Cognition (LessWrong)