Andrew YAO


姚期智



About the author Andrew Yao received the A.M. Turing Award in 2000 for his contributions to the theory of computation. He is also the recipient of the 2021 Kyoto Prize in Advanced Technology. He is a member of the US National Academy of Sciences, the American Academy of Arts and Sciences, and the Chinese Academy of Sciences. He is Dean of the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University. His research interests include analysis of algorithms, computational complexity, cryptography, and quantum computing.



关于作者姚期智在2000年因其对计算理论的贡献而获得图灵奖。他还是2021年京都奖得主。他是美国国家科学院、美国艺术与科学学院和中国科学院院士。现任清华大学交叉信息研究院院长。他的研究兴趣包括算法分析、计算复杂性、密码学和量子计算。





The following excerpt is a translation of Andrew Yao’s opening speech at the 2020 World Artificial Intelligence Conference, titled “New Directions in Artificial Intelligence Theory” (人工智能理论的新方向).

▶ Cite Our Translation: Concordia AI. “Andrew Yao — Chinese Perspectives on AI Safety.” Chineseperspectives.ai, 29 Mar. 2024, chineseperspectives.ai/Andrew-Yao.

▶ Cite This Work 姚期智(2020-7-9). “人工智能理论的新方向”. 在世界人工智能大会云端峰会上的演讲. https://mp.weixin.qq.com/s/HaMG9N4tV721SGSpf1YkSw




Selected Excerpt


The topic I'm discussing today is "New Directions in AI Theory." AI has already achieved widespread real-world use, with new advances on display at this conference. However, these applications all originate from foundational scientific research. The theoretical bedrock of AI was laid years ago, underscoring the need to sustain theoretical progress. I will focus on three key points: 1) AI theory is vital, enabling analysis of current challenges to clarify their essence and solutions. 2) AI is interdisciplinary, with breakthroughs arising in seemingly unrelated fields. 3) We must explore new theoretical directions like neural topologies, privacy-preserving learning, and, crucially, designing beneficial superintelligence. While AI holds great promise, pursuing it responsibly demands proactive efforts to ensure its alignment with human values.

原文


我今天讨论的话题是“人工智能理论新的方向”。AI在现实世界已经有了广泛应用,在这场大会中也能看到AI应用的新进展。但是我想说明的是,所有的这些进展都来自于基础科学。也就是说,AI领域在很多年前就已经打下了理论基础。这给我们的启示是:一定要让理论研究不断发展。在这次演讲中,我想讨论三个要点:1、AI理论很重要。当前AI面临的很多问题和挑战,都可以用理论来进行分析。通过理论分析,我们能更清楚的知道我们面临的挑战的本质,以及解决这些挑战的方法。2、AI是跨学科的行业。当前在AI中获得的一些成果,其所处的领域很多是和AI几乎不搭边的学科。3、探讨AI领域的新理论方向。 我列举了三个例子来进行讨论,分别是:1、神经拓扑结构:神经网络研究的新视角;2、隐私保护学习:人工智能+多方计算;3、可控的超级人工智能:如何设计有益的超级智能。


Another critical direction for AI theory is determining when artificial general superintelligence might arrive. The timeline remains unpredictable – while recent systems like AlphaZero and facial recognition are impressive, their specialized scope differs vastly from artificial general superintelligence. As John McCarthy noted¹ in 1977, fundamental conceptual breakthroughs are needed (“1.7 Einsteins + 0.3 Manhattan Projects”), perhaps requiring 5 to 500 years. The most recent “super AI theory” came in 2019, when UC Berkeley's Stuart Russell argued in his book that, although we cannot know when artificial general superintelligence will emerge, we must prepare for it. He set out three principles, each to be implemented with mathematical rigor: 1) Altruism – human interests take precedence over machine interests; 2) Humility – machines must not be overconfident in their own judgment; 3) Learning – machines must be able to learn human preferences. He also proposed methods drawing on probability theory and game theory to realize these principles and responsibly steer superintelligent systems.
第三个需要讨论的AI理论方向是:通用的超级人工智能何时到来?答案是不可预知,因为现在的AlphaZero、人脸识别虽然很牛,但仅适用于特定领域。正如1977年,John McCarthy曾经说过:“我们需要概念上的突破,1.7爱因斯坦+0.3曼哈顿项目,可能需要5~500年时间。”最新的“超级AI理论”提出是在2019年,当时伯克利大学的Stuart Russell在书中提到,虽然超级人工智能不知道什么时候到来,但是我们必须做好准备。他在书中设定了三个原则,每一个原则都要用严格的数学方法来实现,这三个原则分别是:1、利他的:人的利益凌驾机器利益;2、谦卑的:机器不能自以为是;3、尽心的:机器能学懂人的偏好。此外,他还提出了许多方法论,涉及概率理论和博弈论。
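To make the game-theoretic flavor of these principles more concrete, here is a minimal illustrative sketch, not taken from the speech, in the spirit of the “off-switch” style assistance games Russell describes: a machine that is uncertain about human preferences and lets the human veto its action never does worse in expectation than one that acts unilaterally. All numbers and names below are invented for illustration.

```python
import numpy as np

# Illustrative sketch (not from the talk): a simplified "off-switch game" in the
# spirit of Russell's assistance-game framing. The robot is uncertain about the
# human's utility U for its proposed action; the human is assumed to switch the
# robot off exactly when U < 0. All numbers are made up for illustration.

rng = np.random.default_rng(0)
U_samples = rng.normal(loc=0.2, scale=1.0, size=100_000)  # robot's belief over U

# Option A: act immediately, ignoring the human.
ev_act = U_samples.mean()

# Option B: defer to the human, who allows the action only when U >= 0.
ev_defer = np.where(U_samples >= 0, U_samples, 0.0).mean()

# Option C: switch itself off (do nothing).
ev_off = 0.0

print(f"act immediately : {ev_act:.3f}")
print(f"defer to human  : {ev_defer:.3f}")
print(f"switch off      : {ev_off:.3f}")
# Because the robot is uncertain about U, deferring is never worse in this model
# than acting or shutting down outright: the human filters out the bad outcomes.
```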



The following excerpt is from an expert dialogue between Andrew Yao and UC Berkeley Professor Stuart Russell at the BAAI Conference AI Safety and Alignment Forum. The dialogue took place in English and was translated into Chinese in the article, Russell对话姚期智:AI应被设计服务全人类,正如墨子兼爱思想.

The excerpt has been lightly edited for clarity and Stuart Russell’s portions have been summarized for brevity. 

▶ Cite Our Translation: Concordia AI. “Andrew Yao — Chinese Perspectives on AI Safety.” Chineseperspectives.ai, 29 Mar. 2024, chineseperspectives.ai/Andrew-Yao.

▶ Cite This Work 姚期智(2023-6-10). “AI应被设计服务全人类,正如墨子兼爱思想”. 在2023智源大会AI 安全与对齐论坛上的发言. https://mp.weixin.qq.com/s/LiosCT34gHh_41ctpBfZRw




Selected Excerpt

On how to align AGI with human preferences:


Andrew Yao: You proposed suggestions to make Artificial General Intelligence (AGI) safer, but how can this idea be implemented? Humans and machines are different species. Unless we understand ourselves very well, it is difficult to control human-machine interactions. Human thinking is not uniform - how can we prevent people from creating machines that are too powerful? Should we sacrifice the interests of others for personal gain? Machines may try to alter human behavior. How do we coordinate human thought? What do we want? What should an ideal world look like? We may not have given enough thought to this question. In fact, machines are like harmless creatures, capable of doing anything humans ask. Therefore, the most important thing is to clarify what human demands actually are.
中文

如何使AGI与人类价值观对齐:


姚期智:你提出让通用人工智能(Artificial General Intelligence, AGI)更加安全的建议,如何实现这个想法?人和机器是不同物种,除非我们对自己非常了解,否则很难把控人机互动。而人类的想法不尽相同,如何防止人类制造过于强大的机器?我们是否该为了个人利益而牺牲其他人的利益?机器可能试图改变人类的行为。如何协调人类的思想?我们想要什么?理想的世界应该是怎样的?我们并不一定有好好思考过这个问题。实际上,机器就像是无害的物种,只要人类提出要求,机器可以做任何事,因此最重要的是要明确人类的需求是什么。



Stuart Russell: I agree we cannot easily articulate goals for AI. Humans have implicit preferences about the future, like preferring one movie over another. We make decisions, or have the potential to do so, before events occur. This raises the question of whether AI should serve individuals or humanity. We could start with simple principal-agent models of one person and one AI. With multiple people and AIs, ensuring cooperation while helping humans becomes a core moral dilemma - should an AI prioritize an individual's wishes even if doing so harms others' interests? Since an AI that influences society affects many people, it should account for all human preferences equally. This aligns with the centuries-old philosophical exploration of universal love, from Mozi in ancient China to utilitarianism in the 18th-century West. Though open challenges remain, some form of preference utilitarianism could reasonably weigh all interests. We must resolve these issues to ensure future powerful AI uses sound moral reasoning.

Stuart Russell:我们很难明确表达人类的目标和偏好,它们往往是隐性的和个人的。人类有能力对未来事件做出判断和决策。这样就产生了一个问题,机器应该服务于个人还是整个人类?如果机器只服务于个人,在追求个人利益的同时可能会损害他人利益。我认为设计良好的AI系统应该默认为服务全人类。这可以追溯到道德哲学中的“兼爱”思想,也体现在效益主义中,即决策时应平等考虑每个人的利益和偏好。但是效益主义还存在一些未解决的问题,如权衡不同国家的总体幸福程度。尽管存在这些挑战,为确保未来AI系统以正确的方式运作,我们仍需回答这些道德哲学的核心问题。
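As a toy illustration of the preference-utilitarian point above (not part of the dialogue), the sketch below contrasts an AI that optimizes for a single principal with one that weighs everyone's preferences equally; the action names and utility numbers are made up.

```python
# Illustrative sketch: rows are candidate actions, columns are people.
utilities = {
    "action_A": [9, -4, -4, -4],   # great for person 0, harmful to the rest
    "action_B": [2,  2,  2,  2],   # modestly good for everyone
}

def best_action(weights):
    """Pick the action maximizing the weighted sum of individual utilities."""
    return max(utilities, key=lambda a: sum(w * u for w, u in zip(weights, utilities[a])))

print(best_action([1, 0, 0, 0]))              # serve only person 0 -> action_A
print(best_action([0.25, 0.25, 0.25, 0.25]))  # weigh everyone equally -> action_B
```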

Andrew Yao: I agree with your viewpoint - we need to differentiate individual preferences from things affecting the whole of society. But I'm somewhat pessimistic about the latter, because this is not an AI systems issue; it is a problem of the modern world, partly due to the emergence of powerful technologies like biotech or nuclear energy. I think this is the most serious issue we face - the need to use AGI to truly solve problems for all humanity. But many problems remain. I believe society in many parts of the world is severely polarized into two camps, both firmly believing they are right. With AI systems now assisting advocacy, it is easy to generate ten thousand written pieces and submit them to newspapers, possibly shifting the power balance in serious debates. We urgently need to address these issues, but I see little hope of resolving them. We do not even know human preferences on these urgent questions, and since these are sometimes matters of life and death, we cannot pretend they do not exist. What do you think? In many places, society seems to constantly struggle with this. I think it happens less in China, but it is very common in many other countries and regions. Humans have many different goals and wants - how should we confront this? If we don't solve this problem, I don't think we can even begin to control AGI, because this is the first thing people think of.

姚期智:我同意你的观点,我们需要把个人的偏好和影响整个社会的事情区分开来。但我对后者有些悲观,因为这不是AI系统的问题,而是现代世界的问题,而且部分原因是生物技术或核能源等强大技术的出现。我认为这是目前最严肃的问题,也就是我们需要去使用AGI来真正解决全人类的问题。但是目前仍有很多的问题,我认为在世界上的许多地方,社会是严重分裂为两个阵营的,双方都坚信自己是对的。现在有了AI系统来帮助做宣传,借助AI技术可以轻易写出一万封稿件并提交给报社,而这可能会影响一场严肃辩论中的力量平衡。我们现在亟需解决这些问题,但我认为这似乎没有任何解决这些问题的希望。如果我们甚至不知道人类在这些紧迫问题上的偏好是什么——因为这些问题有时是生死攸关的问题,那么我们就不能假装它们不存在。请问你怎么看?在许多地方,社会似乎一直在与之斗争。我认为在中国这种现象会少一些,但在其他很多国家和地区这种现象是很普遍的。人类有很多不同的目标,人类想要的东西有很多。我们该如何面对?如果我们不解决这个问题,我认为控制AGI的事情甚至都无法开始,因为这是人们首先想到的事情。


Stuart Russell: The advent of utilitarianism in the 18th century was an important step forward. Previously, public decisions aimed to benefit a ruler's inner circle rather than ordinary people. Today, well-organized governments generally see their role as improving overall societal welfare. However, substantial disagreement remains on what “happiness” entails – beyond just GDP, it could encompass various freedoms or privileges for some groups over others.

Lingering problems with utilitarianism relate directly to current disputes. For example, should sadists who enjoy inflicting pain have their interests considered? I believe not – AI systems representing society should not work for those who seek pleasure from others' suffering.

Another issue is "positional goods" in economics, where value stems from having something others lack. For instance, the Nobel Prize derives worth by indicating intellectual superiority over almost everyone else. By definition, not everyone can be in the top 1%. If individuals gain pride from such elite status, it cannot be universalized. When making societal decisions, should AI systems factor in these positional goods? If not, it would profoundly transform how our society functions. I believe much friction arises from the inherent limits on positional status.
Stuart Russell: 18世纪效益主义的出现是人类社会进步的一个重要进步。以前的公共决策旨在使统治阶层受益,而不是普通民众。今天,大多数政府认为其角色是提高整体社会福祉。但是,关于“幸福”的内涵存在很大分歧,它可能不仅仅是GDP,还可能包含不同群体享有的各种自由或特权。

效益主义留下的一些问题直接相关到当前的争议。例如,应该考虑虐待狂的利益吗?我认为不应该。代表社会的AI系统不应该为那些以他人痛苦为乐的人服务。

另一个问题是经济学中的“地位商品“,其价值在于拥有其他人没有的东西。例如,诺贝尔奖的价值在于显示出比其他人更高的智力。按定义,不可能人人都能进入前1%。如果个人从这种精英地位获得骄傲感,那这种感觉就不可能被普遍化。在做出社会决策时,AI系统是否应考虑这些“地位商品“?如果不考虑,会深刻改变我们社会的运作方式。我认为许多社会摩擦源自这些“地位商品“的内在局限。



On building AI systems that solve specific problems rather than AGI


Andrew Yao: You proposed constructing beneficial AI systems and requiring that critical systems use proof-carrying code. Could we come up with a "whitelist" of tasks that do not involve machines having human thoughts but that actively improve human welfare? For example, we might fully support using AI to design drugs and tackle cancer. Could we find ways to leverage AI systems to solve whitelisted tasks? I think this resembles internet security - most universities do not teach students how to hack. Before we figure out comprehensive, rigorous, and systematic approaches, is it possible to explore beneficial AI techniques in this way? As you said, we are really just at an experimental stage at present, uncertain what immense difficulties may lie ahead.

与其构建AGI,不如构建解决特定问题的AI系统


姚期智:你提出了构建有益的AI系统以及关于重要AI系统的建议,即系统需要使用携带证明的编码。我们能否拟定出一些任务的白名单,这些任务不涉及机器是否具有人类的思想,而是积极的提升人类福祉的事情?例如,我们可能百分百支持使用AI技术来设计药物和解决癌症问题。我们是否能够找到一种可以利用AI系统去解决那些白名单上任务的方式?我认为这和网络安全一样,在大部分大学里不会教学生如何入侵互联网。在我们弄清楚什么是全面、严谨和系统的方法之前,有可能以这种方式探索有益的AI技术吗?正如你所说,目前我们真的只是在实验阶段,不确定前面会出现什么巨大的困难。
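One very literal, minimal reading of the "whitelist" idea (our own sketch, not a design from the dialogue) is a default-deny dispatcher that only invokes a narrow model for pre-approved task types; the task names and the run_narrow_model hook below are hypothetical placeholders.

```python
from typing import Callable

# Pre-approved, narrowly scoped task types; everything else is refused.
WHITELIST: dict[str, str] = {
    "protein_folding": "predict a protein structure from an amino-acid sequence",
    "drug_screening": "rank candidate molecules against a fixed target",
    "traffic_forecast": "predict congestion on a known road network",
}

def dispatch(task_type: str, payload: dict,
             run_narrow_model: Callable[[str, dict], dict]) -> dict:
    """Run a request only if its task type is explicitly whitelisted."""
    if task_type not in WHITELIST:
        # Default-deny: anything outside the approved narrow tasks is rejected.
        raise PermissionError(f"task '{task_type}' is not on the approved whitelist")
    return run_narrow_model(task_type, payload)

# Example: an approved request goes through; anything else raises an error.
result = dispatch("protein_folding", {"sequence": "MKTAYIAK"},
                  lambda task, data: {"status": "ok", "task": task})
print(result)
```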



Stuart Russell: I believe there is still a long way to go in understanding how to create AI systems that can solve large-scale multi-agent problems while ensuring human steering. Eric Drexler, a pioneer of nanotechnology, has recently focused on AI safety. He proposed a whitelist of tasks for “comprehensive AI services” rather than pursuing general AI – constructing systems to solve specific problems like protein folding, traffic prediction, etc., without the general capabilities or scope that could pose wide-scale risks. This seems to me a sensible near-term approach.

But in my view, systems that have already been released carry huge risks, like AI using simple conversations to persuade billions of people to go to war or to ignore climate change. Social media algorithms already filter information and shape public views without people's awareness. Deploying these systems at large scale just to see what happens, as some suggest, is dangerously misguided - like testing vaccines without rigor. I believe the AI community needs a fundamentally different mindset, like that applied to vaccines, considering these systems’ reach and impact before unleashing them.
Stuart Russell:我认为,要创建既能解决大规模多主体问题又确保人类监管的AI系统,我们还有很长的路要走。纳米技术先驱Eric Drexler最近专注于AI安全,他提出通过“综合AI服务”的任务白名单方法来构建解决特定问题的AI系统,而不是通用AI。这似乎是一个明智的短期方法。但我认为已经发布的系统存在巨大风险,如通过简单对话说服数十亿人发动战争或忽视气候变化。社交媒体算法已经在无知觉下过滤信息和塑造公众观点。如某些人所建议的那样,大规模部署这些系统仅为观察结果,是危险的误导 - 如在没有严谨测试的情况下使用疫苗。我认为,AI社区需要一个与疫苗研发类似的思维转变,在释放这些影响广泛的系统之前,充分考虑其影响。



On how to control AI when it seems unstoppable


Andrew Yao: From a more optimistic perspective, even if large AI systems may seem like uncontrollable monsters, we can find ways to "tame" them through proper protocols. Like with quantum computing, I believe similar protocols may quickly emerge in AI in the coming years. Theorists have already discovered many methods to control quantum systems, which operate in a very different space that humans cannot easily reason about intuitively. This is analogous to medicine, where we may not fully grasp how a drug works on a molecular level, yet can still test it. What you mentioned gives hope that even as a very weak species, humans may still be able to control things that do not exist in the universe. Perhaps by following your suggestions, we will see some light in this field, truly making AI systems our servants - I'm not sure if that is the right metaphor.

势不可挡,人类如何管控AI



姚期智:从更乐观的角度来看,即使大型AI系统可能是一个我们无法控制的怪物,但我们有办法通过适当的协议来“驯服”它们。就像量子计算,我认为在AI领域,类似的协议可能在未来几年内很快问世。理论学家们已经发现有很多方法可以控制量子系统。有趣的是,量子机器可以在一个非常不同的空间里工作。目前人类并不能凭直觉很好地处理它。这与医学非常相似,我们可能不完全了解药物是如何在分子层面起作用的,但是我们可以进行一些测试。你提到的这类事情给了我们希望,即使人类是一个非常弱小的种族,我们仍也许能够控制宇宙中不存在的东西。也许通过遵循你的建议,我们会看到这一领域的一些希望,能够真正使AI系统成为我们的仆人,我不知道这样的说法是否恰当。

Stuart Russell: I believe there will be a control mechanism for AI technology similar to the control of nuclear weapons. If a group obtains nuclear weapons, they can threaten the whole world and blackmail us to achieve their goals. If this technology is more powerful than nuclear weapons, we may need to control it in a similar way. But I think nuclear weapons now need better control. The first patent for the nuclear bomb was applied for in France in 1939, but the atomic bomb was not successfully tested until 1945. However, some physicists had calculated in the 1910s that this was possible. So during World War I, some physicists were talking about the threat of nuclear war. Their view was that before the technology is developed, we need a control structure to ensure the technology is only used for human benefit, not as weapons. Unfortunately, most physicists, the establishment, and governments did not listen. If they had listened, world history might have taken a completely different and perhaps better path. Before real AGI systems are created, before a serious arms race emerges, we now have a window of opportunity. I think the concept of an arms race is very harmful because it leads to lack of cooperation, distrust, and failure of security efforts. For all these reasons, I think we should strive to establish these cooperative protocols as soon as possible. Sam Altman rightly pointed out that we can agree to share AI safety technology because it benefits every country.
Stuart Russell:我认为将会有一种类似管控核武器的管控方式来管控AI技术。如果一群人获得了核武器,他们可以威胁整个世界,并勒索我们实现他们的目标。如果这项技术比核武器更强大,我们可能需要以类似的方式来管控。但我认为现在需要更好地管控核武器。在核武器真正被制造出来之前,第一个关于核弹的专利是1939年在法国申请的,然而原子弹是在1945年首次试验成功的。但实际上一些物理学家在20世纪10年代计算出这是可能的。所以在第一次世界大战期间,一些物理学家在谈论核战争的威胁。他们的观点是,在技术开发之前,我们需要有一个管控结构,以确保技术只用于人类利益,而不是以武器的形式使用。不幸的是,大部分物理学家,建制派和政府都不听他们的。如果他们听了,世界的历史可能会朝着一个完全不同的方向发展,也许是一个更好的方向。在真正的AGI系统被创造出来之前,在出现严重的军备竞赛之前,我们现在有一个窗口期。我认为军备竞赛的概念是非常有害的,因为它导致缺乏合作,导致不信任,导致安全工作的失败。基于所有这些原因,我认为我们应该努力尽快建立起这些合作协议。Sam Altman正确地指出了我们可以同意分享AI安全技术,因为共享这些信息对每个国家都有好处。






Translator’s notes


1. In 1978, Dr. McCarthy wrote, “human-level A.I. might require 1.7 Einsteins, 2 Maxwells, 5 Faradays and 0.3 Manhattan Projects.” https://www.nytimes.com/2009/12/08/science/08sail.html




