Sleeper Agents: Training Deceptive LLMs that Persist Throug..:
Hubinger, Evan; Denison, Carson; Mu, Jesse; ...
2024
Link: http://arxiv.org/abs/2401.05566
RT Journal
T1 Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
UL https://suche.suub.uni-bremen.de/peid=base-ftarxivpreprints:oai:arXiv.org:2401.05566&Exemplar=1&LAN=DE
A1 Hubinger, Evan
A1 Denison, Carson
A1 Mu, Jesse
A1 Lambert, Mike
A1 Tong, Meg
A1 MacDiarmid, Monte
A1 Lanham, Tamera
A1 Ziegler, Daniel M
A1 Maxwell, Tim
A1 Cheng, Newton
A1 Jermyn, Adam
A1 Askell, Amanda
A1 Radhakrishnan, Ansh
A1 Anil, Cem
A1 Duvenaud, David
A1 Ganguli, Deep
A1 Barez, Fazl
A1 Clark, Jack
A1 Ndousse, Kamal
A1 Sachan, Kshitij
A1 Sellitto, Michael
A1 Sharma, Mrinank
A1 DasSarma, Nova
A1 Grosse, Roger
A1 Kravec, Shauna
A1 Bai, Yuntao
A1 Witten, Zachary
A1 Favaro, Marina
A1 Brauner, Jan
A1 Karnofsky, Holden
A1 Christiano, Paul
A1 Bowman, Samuel R
A1 Graham, Logan
A1 Kaplan, Jared
A1 Mindermann, Sören
A1 Greenblatt, Ryan
A1 Shlegeris, Buck
A1 Schiefer, Nicholas
A1 Perez, Ethan
YR 2024
K1 Computer Science - Cryptography and Security
K1 Computer Science - Artificial Intelligence
K1 Computer Science - Computation and Language
K1 Computer Science - Machine Learning
K1 Computer Science - Software Engineering
JF http://arxiv.org/abs/2401.05566
LK http://arxiv.org/abs/2401.05566
DO http://arxiv.org/abs/2401.05566
SF ELIB - SuUB Bremen