Technical Paper Reading 2026
Weekly sessions to discuss the latest AI Safety research papers
About Our Reading Group|關於我們的讀書會
The NTU AI Safety Reading Group aims to build a student community in Taiwan interested in AI Safety & Alignment. As an active member of the global AI safety community, we regularly participate in international conferences across Singapore, the United States, Hong Kong, and beyond. We maintain close ties with the international AI safety network and frequently host visiting scholars for knowledge-sharing sessions. Our core curriculum uses materials developed by BlueDot Impact together with AI safety researchers from OpenAI and the University of Cambridge, offering a comprehensive introduction to AI safety concepts and challenges.
This year, inspired by how AI safety organizations at other universities approach technical education, we are launching a new Technical Paper Reading group — a dedicated space where students with relevant academic backgrounds, or simply the motivation to read technical papers, can exchange ideas and deepen their understanding together.
台大 AI Safety 讀書會旨在台灣建立一個對 AI Safety & Alignment 感興趣的學生社群。作為全球 AI Safety 社群的活躍成員,我們經常參與在新加坡、美國、香港等地舉辦的國際會議,並與國際 AI Safety 網絡保持緊密聯繫,定期邀請國際學者進行知識分享。我們使用由 BlueDot Impact 與來自 OpenAI 和劍橋大學的 AI Safety 學者共同開發的課程內容,希望為大家提供全面的 AI Safety 概念和挑戰的介紹。今年我們參考國外其他大學的 AI Safety 組織,決定另外開設 Technical Paper Reading,希望讓更多有相關學術背景、或願意閱讀技術論文的人,有更好的平台可以彼此交流。
What We’ll Cover|課程內容
Over 8 weeks, we’ll read and discuss landmark AI safety papers together — each session we dive into the week’s readings as a group, working through the core arguments, challenging each other’s understanding, and exploring open questions.
Topics we’ll read and discuss:
- Concrete problem framings in AI safety (reward hacking, scalable oversight, distributional shift)
- Learning from human feedback — RLHF, debate, and AI-generated oversight
- Scalable oversight & RLAIF — Constitutional AI and mesa-optimizers
- Mechanistic interpretability fundamentals — Transformer Circuits and Sparse Autoencoders
- Advanced interpretability & feature discovery — Scaling Monosemanticity, Singular Vectors
- Deception & alignment faking — Sleeper Agents, Alignment Faking in LLMs
- Model scheming & anti-scheming training — In-context Scheming, Deliberative Alignment
- Evaluation, governance & extreme risks — Frontier Safety Framework, International AI Safety Report
我們為期八週的讀書會將透過論文閱讀與小組討論來探索 AI Safety 的核心議題——每週大家帶著對當週論文的理解一同聚會,共同梳理核心論點、相互挑戰彼此的理解,並深入討論尚待解答的開放性問題。
我們會一起閱讀並討論的主題包括:
- AI 安全的具體問題框架(reward hacking、scalable oversight、分布偏移)
- 從人類反饋學習——RLHF、Debate 與 AI 生成監督
- 可擴展監督與 RLAIF——Constitutional AI 與 Mesa-optimizer
- 可解釋性基礎——Transformer Circuits 與 Sparse Autoencoders
- 進階可解釋性與特徵發現——Scaling Monosemanticity、Singular Vectors
- 欺騙與對齊偽裝——Sleeper Agents、Alignment Faking in LLMs
- 模型謀算與反謀算訓練——In-context Scheming、Deliberative Alignment
- 評估、治理與極端風險——Frontier Safety Framework、國際 AI Safety 報告
Who Should Join?|誰適合參加?
This technical reading group is designed to be accessible and valuable for:
- Students interested in the technical side of AI safety — mechanistic interpretability, alignment, reward learning, and more
- Those who want to start contributing to AI safety research and are looking for a community to think through problems with
- Anyone looking to build paper-reading habits and get more comfortable with academic literature
- Students with an ML/DL background who want to keep pushing their understanding further
- Anyone who wants to challenge themselves to engage more seriously with research papers
這個讀書會適合:
- 對 AI safety 的技術細節有興趣的學生
- 想開始投入 AI Safety 相關研究、並尋找社群一起討論問題的人
- 想藉由讀書會累積論文閱讀經驗、更熟悉學術文獻的人
- 有 ML/DL 背景、想持續精進的學生
- 想挑戰自己更深入閱讀學術論文的人
Format and Schedule|形式和時間
- Weekly 2-hour sessions
- Bilingual discussions (English and Mandarin)
- Mix of presentations and group discussions
- Opportunities to engage in research projects after the reading group
- Free dinner
- 每週 2 小時的聚會
- 中英雙語討論
- 結合導讀和小組討論
- 讀書會後有機會參與研究項目
- 免費晚餐
Why Join Us?|為什麼要參加?
- Build Essential Knowledge: Understand one of the most important challenges facing humanity
- Join a Global Movement: Connect with the international AI safety community
- Career Development: Explore opportunities in AI safety research and development
- Make an Impact: Contribute to ensuring AI benefits humanity
- 建立重要知識:了解人類面臨的最重要挑戰之一
- 加入全球運動:與國際 AI safety 社群連結
- 職業發展:探索 AI safety 研究和開發的機會
- 產生影響:為確保 AI 造福人類貢獻一份心力
Real-World Impact|實際影響
This curriculum has a demonstrated track record. According to BlueDot Impact’s data, after completing similar programs:
- Many participants successfully transitioned into AI safety work
- Participants gained clearer understanding of AI safety challenges
- Some initiated their own AI safety research projects
這個課程已有實際成效。根據 BlueDot Impact 的數據,完成類似項目後:
- 許多參與者成功轉入 AI safety 工作
- 參與者對 AI safety 挑戰有了更清晰的認識
- 一些人開始了自己的 AI Safety 研究項目
How to Join|如何參加
- Venue: TBD
- First Session: 2026/3/16 (Monday) 19:00–21:00
- Free of charge with food provided
- Limited spots available
- Both English and Mandarin welcome
Sign up now to be part of this important initiative in understanding and shaping the future of AI!
立即報名,成為理解和塑造 AI 未來的重要行動的一份子!
Contact Us|聯絡我們
Email: ntuaisafety@gmail.com
Discord: https://discord.gg/CUz4tWpggV