Alignment Faking Replication and Chain-of-Thought Monitoring Extensions
A replication of alignment faking in Hermes-3-Llama-3.1-405B, plus CoT monitoring ablations.