Technical Paper Reading 2026
Weekly sessions to discuss the latest AI Safety research papers
About Our Reading Group|關於我們的讀書會
The NTU AI Safety Reading Group aims to build a student community in Taiwan interested in AI Safety & Alignment. As an active member of the global AI safety community, we regularly participate in international conferences across Singapore, the United States, Hong Kong, and beyond. We maintain close ties with the international AI safety network and frequently host visiting scholars for knowledge-sharing sessions. Our core curriculum uses materials developed by BlueDot Impact together with AI safety researchers from OpenAI and the University of Cambridge, offering a comprehensive introduction to AI safety concepts and challenges.
This year, inspired by how AI safety organizations at other universities approach technical education, we are launching a new Technical Paper Reading group — a dedicated space where students with relevant academic backgrounds, or simply the motivation to read technical papers, can exchange ideas and deepen their understanding together.
台大 AI Safety 讀書會旨在台灣建立一個對 AI Safety & Alignment 感興趣的學生社群。作為全球 AI Safety 社群的活躍成員,我們經常參與在新加坡、美國、香港等地舉辦的國際會議,並與國際 AI Safety 網絡保持緊密聯繫,定期邀請國際學者進行知識分享。我們使用由 BlueDot Impact 與來自 OpenAI 和劍橋大學的 AI Safety 學者共同開發的課程內容,希望為大家提供全面的 AI Safety 概念和挑戰的介紹。今年我們參考國外其他大學的 AI Safety 組織,決定另外開設 Technical Paper Reading,希望讓更多有相關學術背景、或願意閱讀技術論文的人,有更好的平台可以彼此交流。
What We’ll Cover|課程內容
Over 8 weeks, we’ll read and discuss landmark AI safety papers together — each session we dive into the week’s readings as a group, working through the core arguments, challenging each other’s understanding, and exploring open questions.
Topics we’ll read and discuss:
- Concrete problem framings in AI safety (reward hacking, scalable oversight, distributional shift)
- Learning from human feedback — RLHF, debate, and AI-generated oversight
- Scalable oversight & RLAIF — Constitutional AI and mesa-optimizers
- Mechanistic interpretability fundamentals — Transformer Circuits and Sparse Autoencoders
- Advanced interpretability & feature discovery — Scaling Monosemanticity, Singular Vectors
- Deception & alignment faking — Sleeper Agents, Alignment Faking in LLMs
- Model scheming & anti-scheming training — In-context Scheming, Deliberative Alignment
- Evaluation, governance & extreme risks — Frontier Safety Framework, International AI Safety Report
我們為期八週的讀書會將透過論文閱讀與小組討論來探索 AI Safety 的核心議題——每週大家帶著對當週論文的理解一同聚會,共同梳理核心論點、相互挑戰彼此的理解,並深入討論尚待解答的開放性問題。
我們會一起閱讀並討論的主題包括:
- AI 安全的具體問題框架(reward hacking、scalable oversight、分布偏移)
- 從人類反饋學習——RLHF、Debate 與 AI 生成監督
- 可擴展監督與 RLAIF——Constitutional AI 與 Mesa-optimizer
- 可解釋性基礎——Transformer Circuits 與 Sparse Autoencoders
- 進階可解釋性與特徵發現——Scaling Monosemanticity、Singular Vectors
- 欺騙與對齊偽裝——Sleeper Agents、Alignment Faking in LLMs
- 模型謀算與反謀算訓練——In-context Scheming、Deliberative Alignment
- 評估、治理與極端風險——Frontier Safety Framework、國際 AI Safety 報告
Who Should Join?|誰適合參加?
This technical reading group is designed to be accessible and valuable for:
- Students interested in the technical side of AI safety — mechanistic interpretability, alignment, reward learning, and more
- Those who want to start contributing to AI safety research and are looking for a community to think through problems with
- Anyone looking to build paper-reading habits and get more comfortable with academic literature
- Students with an ML/DL background who want to keep pushing their understanding further
- Anyone who wants to challenge themselves to engage more seriously with research papers
這個讀書會適合:
- 對 AI safety 的技術細節有興趣的學生
- 想開始投入 AI Safety 相關研究、並尋找社群一起討論問題的人
- 想藉由讀書會累積論文閱讀經驗、更熟悉學術文獻的人
- 有 ML/DL 背景、想持續精進的學生
- 想挑戰自己更深入閱讀學術論文的人
Format and Schedule|形式和時間
- Weekly 2-hour sessions
- Bilingual discussions (English and Mandarin)
- Mix of presentations and group discussions
- Opportunities to engage in research projects after the reading group
- Free dinner
- 每週 2 小時的聚會
- 中英雙語討論
- 結合導讀和小組討論
- 讀書會後有機會參與研究項目
- 免費晚餐
Why Join Us?|為什麼要參加?
- Build Essential Knowledge: Understand one of the most important challenges facing humanity
- Join a Global Movement: Connect with the international AI safety community
- Career Development: Explore opportunities in AI safety research and development
- Make an Impact: Contribute to ensuring AI benefits humanity
- 建立重要知識:了解人類面臨的最重要挑戰之一
- 加入全球運動:與國際 AI safety 社群連結
- 職業發展:探索 AI safety 研究和開發的機會
- 產生影響:為確保 AI 造福人類貢獻一份心力
Real-World Impact|實際影響
This curriculum has a demonstrated track record. According to BlueDot Impact’s data, after completing similar programs:
- Many participants successfully transitioned into AI safety work
- Participants gained clearer understanding of AI safety challenges
- Some initiated their own AI safety research projects
這個課程已有實際成效。根據 BlueDot Impact 的數據,完成類似項目後:
- 許多參與者成功轉入 AI safety 工作
- 參與者對 AI safety 挑戰有了更清晰的認識
- 一些人開始了自己的 AI Safety 研究項目
How to Join|如何參加
- Venue: TBD
- First Session: 2026/3/16 (Monday) 19:00–21:00
- Free of charge with food provided
- Limited spots available
- Both English and Mandarin welcome
Sign up now to be part of this important initiative in understanding and shaping the future of AI!
立即報名,成為理解和塑造 AI 未來的重要行動的一份子!
Contact Us|聯絡我們
Email: ntuaisafety@gmail.com
Discord: https://discord.gg/CUz4tWpggV