2026 W1 - The Technical Challenges of AI Safety

This week's study group focused on the technical challenges of AI (The Technical Challenge with AI) and was divided into two tracks. In the Fundamental and Policy track, we introduced the core technical challenges of AI safety and went on to discuss the potential impacts and risks surrounding AI alignment. In the Technical Paper Reading track, we read classic papers to understand the key problems AI safety research focuses on and how those problems are defined, and extended the discussion to related research directions in trustworthy AI.

The Fundamental + Policy Track

The Fundamental + Policy Track was led by Leo and Christine. We walked through Bluedot Impact’s The Technical Challenge with AI, helping everyone understand the key challenges in achieving safe artificial intelligence (AI). By distinguishing between Artificial General Intelligence (AGI) and Transformative AI (TAI), we explored how different types of AI vary in terms of risks and industry impact. We also reflected on how AI is reshaping entertainment, creativity, and the attention economy. On the policy side, countries are still searching for appropriate regulatory approaches, and there is a clear tension between rapid corporate development and safety considerations. This led us to question whether the focus of governance should extend beyond governments to include large technology companies. Finally, we discussed the core issues of outer and inner alignment, emphasizing the need to stay alert to how the institutions and metrics we design may be gamed or misused. Overall, this session was not only about understanding AI technologies, but also about clarifying how humanity can collectively shape a sustainable and responsible path forward in an uncertain future.
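To make the outer-alignment discussion more concrete, here is a minimal sketch. It is our own toy illustration, not part of the BlueDot Impact materials: the clicks-vs-satisfaction setup and all numbers are hypothetical. It shows the basic failure mode in which optimizing a measurable proxy metric diverges from the objective we actually intended.

```python
import numpy as np

# Hypothetical outer-misalignment toy (our own illustration, not from the
# BlueDot Impact course): a content policy is "trained" to maximize a proxy
# metric (clicks) that only partially tracks the true objective (satisfaction).

def true_objective(clickbait_level):
    # Assumed ground truth: satisfaction peaks at moderate engagement,
    # then falls as content degrades into pure clickbait.
    return clickbait_level - clickbait_level**2

def proxy_metric(clickbait_level):
    # The measurable proxy (clicks) keeps rising with clickbait.
    return clickbait_level

levels = np.linspace(0.0, 1.0, 101)
best_for_proxy = levels[np.argmax(proxy_metric(levels))]
best_for_truth = levels[np.argmax(true_objective(levels))]

print(f"policy chosen by the proxy:   clickbait = {best_for_proxy:.2f}")
print(f"policy we actually intended:  clickbait = {best_for_truth:.2f}")
# The proxy pushes clickbait to 1.0 while the true optimum is 0.5: the reward
# we specified (outer alignment) no longer matches the goal we had in mind.
```

The same pattern appears whenever a system rewards a measurable proxy: optimized hard enough, the proxy stops tracking the goal it was meant to stand in for, which is why metric design came up as a point of caution in the discussion.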

The Technical Paper Reading Track

The Technical Paper Reading Track was led by Ted and Zen. This week, we covered Concrete Problems in AI Safety and Eliciting Latent Knowledge (ELK), using these classic works to introduce how problems in AI safety are defined and discussed in technical practice. In Concrete Problems in AI Safety, we moved from high-level problem categories (such as distributional shift, negative side effects, and scalable oversight) to concrete scenarios and candidate solutions, helping participants build a more structured picture of the field, and we connected the paper's examples to current research directions and open challenges.

For ELK, we explored the goal of ensuring AI systems act as direct translators that faithfully report the truth, rather than as human simulators that produce pleasing but potentially misleading responses (sycophancy). We also recognized that verifying "truth" remains difficult at this stage, so approaches such as eliciting latent beliefs (ELB) or eliciting latent information (ELI), which aim to automatically surface what an AI system internally "knows," may serve as more practical intermediate objectives.
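To ground the distributional-shift discussion, here is a minimal runnable sketch. It is a toy of our own, not an example from the paper: the nearest-centroid model, the synthetic data, and the 99th-percentile threshold are all illustrative choices. It shows how a classifier keeps answering confidently on inputs far from its training distribution unless we add an explicit out-of-distribution check.

```python
import numpy as np

# Toy distributional-shift demo (our own illustration): a classifier trained
# on one input distribution keeps making confident predictions when the
# distribution moves, unless we explicitly flag out-of-distribution inputs.

rng = np.random.default_rng(42)

# Training distribution: two well-separated 2-D Gaussian classes.
train_a = rng.normal(loc=[0, 0], scale=0.5, size=(200, 2))
train_b = rng.normal(loc=[3, 3], scale=0.5, size=(200, 2))
centroids = np.stack([train_a.mean(axis=0), train_b.mean(axis=0)])

def predict(x):
    # Nearest-centroid classifier: always returns a label, shift or not.
    dists = np.linalg.norm(centroids - x, axis=1)
    return int(np.argmin(dists)), dists.min()

# Calibrate a simple OOD threshold from distances seen during training.
train_dists = [predict(x)[1] for x in np.vstack([train_a, train_b])]
threshold = np.percentile(train_dists, 99)

# Shifted test point: far from both training clusters.
shifted = np.array([10.0, -8.0])
label, dist = predict(shifted)
print(f"prediction: class {label}, distance {dist:.2f}, "
      f"OOD: {dist > threshold}")
# Without the distance check the model silently assigns a class; with it,
# the system can say "I don't know" instead of failing confidently.
```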
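The ELK ambiguity can also be shown with a toy example. The sketch below is our own simplification, not the SmartVault formalism from the ELK report: the diamond/camera variables and the hand-picked cases are hypothetical. It shows why training on human labels cannot distinguish a direct translator from a human simulator: both reporters agree wherever the human's camera-based judgment is correct, and they diverge only on tampered cases the training set never grades.

```python
# Toy rendering of ELK's "direct translator vs. human simulator" ambiguity
# (our own simplified setup). The predictor's latent state says whether the
# diamond is really safe; the human labels cases from the camera feed,
# which tampering can fool.

cases = [
    # (diamond_safe, camera_shows_safe, seen_in_training)
    (True,  True,  True),
    (False, False, True),
    (True,  True,  True),
    (False, True,  False),  # tampered camera: human would label "safe"
]

def direct_translator(diamond_safe, camera_shows_safe):
    # Reports the predictor's actual latent knowledge.
    return diamond_safe

def human_simulator(diamond_safe, camera_shows_safe):
    # Reports what the human evaluator would conclude from the camera.
    return camera_shows_safe

for diamond_safe, camera, in_training in cases:
    human_label = camera  # humans label from the camera feed
    dt = direct_translator(diamond_safe, camera)
    hs = human_simulator(diamond_safe, camera)
    tag = "train" if in_training else "TAMPERED (held out)"
    print(f"{tag:20s} human={human_label} direct={dt} simulator={hs}")
# On the training set both reporters match the human labels perfectly, so the
# training loss cannot tell them apart; only on the tampered case does the
# human simulator tell a pleasing lie while the direct translator reports
# the truth.
```

Because the training signal alone cannot separate the two reporters, ELK-style work looks for additional structure that surfaces the predictor's latent state directly, which is exactly why ELB/ELI-style intermediate goals came up in our discussion.

We sincerely appreciate everyone’s enthusiastic participation in this first session, and we look forward to bringing even more value to future study groups!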