Wen GAO
高文

About the author

Wen Gao is Director of Peng Cheng Laboratory, a research lab in Shenzhen that has contributed to Chinese advances in pre-trained language models, and Dean of the School of Information Science and Technology at Peking University. He is also an academician of the Chinese Academy of Engineering.

He has extensive experience advising on state-led science and technology (S&T) initiatives. In 2018 he delivered a presentation at a Politburo study session on AI. He is a member of China’s Science and Technology Ethics Committee; Deputy Chief of the Advisory Group of the Ministry of Education’s AI Technology Innovation Expert Group, which was established in 2018 to advise on topics including talent development and academia-industry collaboration; and one of 27 members of the Ministry of Science and Technology’s Next Generation Artificial Intelligence Strategic Advisory Committee. He was formerly deputy director of the National Natural Science Foundation of China, and from 1996 to 2000 he was chief of the National High-Tech R&D Program (863 Program) Intelligent Computing Expert Group.

关于作者

高文，中国工程院院士、北京大学博雅讲席教授，鹏城实验室主任，新一代人工智能产业技术创新战略联盟理事长，全国信息技术标准化技术委员会委员，数字音视频编解码技术标准(AVS)工作组组长，国际电气电子工程师协会会士（IEEE Fellow）、国际计算机协会会士（ACM Fellow）。他曾任国家自然科学基金委员会副主任。1996年担任国家863计划信息领域智能计算机主题专家组组长。主要从事人工智能应用和多媒体技术、计算机视觉、模式识别与图像处理、虚拟现实方面的研究。主要著作有《数字视频编码技术原理》、《Advanced Video Coding Systems》等。

The following excerpts are from Technical Countermeasures for Security Risks of Artificial General Intelligence, Strategic Study of Chinese Academy of Engineering, 2021, Volume 23, Issue 3.

Gao has mentioned this paper at least three times during 2021-22: at the China Electronics and Information Technology Conference (May 2022) and at the AI Cooperation and Governance international conference hosted by Tsinghua University (December 2021 and December 2022).

Cite Our Translation: Concordia AI. “Wen Gao — Chinese Perspectives on AI Safety.” Chineseperspectives.ai, 29 Mar. 2024, chineseperspectives.ai/Wen-Gao.

Cite This Work: 刘宇擎,张玉槐,段沛奇,施柏鑫,余肇飞,黄铁军 & 高文(2021). 针对强人工智能安全风险的技术应对策略. 中国工程科学(03), 75-81.

Selected excerpts

Abstract:

“Human beings might face significant security risks after entering into the artificial general intelligence (AGI) era. By summarizing the difference between AGI and traditional artificial intelligence, we analyze the sources of the security risks of AGI from the aspects of model uninterpretability, unreliability of algorithms and hardware, and uncontrollability over autonomous consciousness. Moreover, we propose a security risk assessment system for AGI from the aspects of ability, motivation, and behavior. Subsequently, we discuss the defense countermeasures in the research and application stages. In the research stage, theoretical verification should be improved to develop interpretable models, the basic values of AGI should be rigorously constrained, and technologies should be standardized. In the application stage, man-made risks should be prevented, motivations should be selected for AGI, and human values should be given to AGI. Furthermore, it is necessary to strengthen international cooperation and the education of AGI professionals, to well prepare for the unknown coming era of AGI.”

精选原文

摘要：

“未来进入强人工智能（AGI）时代，人类可能面临重大安全风险。本文归纳了 AGI 与传统人工智能的区别，从模型的不可解释性、算法及硬件的不可靠性、自主意识的不可控性三方面研判了 AGI 安全风险的来源，从能力、动机、行为3 个维度提出了针对 AGI 的安全风险评估体系。为应对安全风险，从理论及技术研究、应用两个层面分别探讨相应风险的防御策略：在理论技术研究阶段，完善理论基础验证，实现模型可解释性，严格限制 AGI 底层价值取向，促进技术标准化；在应用阶段，预防人为造成的安全问题，对 AGI 进行动机选择，为 AGI 赋予人类价值观。此外，建议加强国际合作，培养强AI 研究人才，为迎接未知的强AI 时代做好充分准备。”

Consciousness is core to AGI:

“In terms of cognitive theory, the concept of AGI emphasizes the existence of consciousness and highlights systems of values and worldviews.”

意识是强人工智能的核心：

“在认知论方面，AGI 强调意识的存在，突出价值观和世界观体系，认为智能体可以拥有生物的本能。”

China is behind in research relating to AGI security:

“Evaluating and formulating strategies for coping with potential AGI security risks, and finding measures that will ensure that AGI is beneficial to humanity rather than harmful to society, have become research topics worldwide. For instance, in 2016, the U.S. lab OpenAI analyzed potential security problems that might arise in the development of AI¹. In 2018, the U.S. government established the National Security Commission on Artificial Intelligence². In addition, the EU set up the High-Level Expert Group on AI to help it strive for discourse and rule-making power in technological development³. AI has also become a subject of significant attention in the field of national defense. For example, AI is being adopted to improve the capability of defense systems, and AI anomaly detection technology is being developed to prevent malicious tampering of private data. AI theories and technologies, including algorithms integrating multiple disciplines, self-adaptive situational awareness, and human-machine trust, are also being studied⁴.”

中国在与强人工智能安全相关的研究方面落后：

“对 AGI 可能的安全性风险进行评估并制定适宜对策，探讨有效驾驭 AGI 并使之既造福于人类又不对社会造成危害的举措，已经成为世界性的研究议题。例如，美国 OpenAI 团队 2016 年分析了 AI 发展过程中可能遇到的安全问题，随后美国政府成立了人工智能安全委员会；欧盟设立了人工智能高级别专家组，争取技术发展的话语权和规则制定权。此外，AI 也成为国防领域的重点关注对象，如采用 AI 手段提高防御系统能力，发展 AI 异常检测技术用于防止隐私数据被恶意篡改，研究涉及多学科融合算法、自适应态势感知能力、人机信任等方面的 AI 理论与技术。”

“It should be noted that, in terms of research related to AGI security issues, there is a gap between China and the international frontier of progress. Chinese academic and industry circles are paying more attention to the development of AI and less to the value of and need for AGI security.”

“也要注意到，针对 AGI 安全问题，我国相比国际前沿进展存在一定差距；国内学术界、产业界较多专注于 AI 的发展，很少关注 AGI 安全性保障的价值和需求。”

Uncontrollability of autonomous consciousness is one of three main sources of AGI security risk:

“The construction of an initial intelligent agent and effective principles for evolution are key to the design of an AGI system that can conduct self-development and self-iteration. Although human beings can control the initial intelligent agent well, AGI can design rules of evolution autonomously, possibly much more efficiently than human beings. After it undergoes recursive self-improvement, AGI will have a higher development efficiency in the subsequent stages and will surpass the cognition of human beings by a long way through recursive self-improvement.”

自主意识是强人工智能安全风险的三大主要来源之一：

“构建初始智能体、有效进化准则，是能够自我发展、自我迭代 AGI 系统设计的关键。人类可以很好地控制初始智能，但是 AGI 可以自主设计进化规则，这种设计进化规则的效率可能足以碾压人类。自我发展后的 AGI，在后续阶段的发展效率将会更高，通过递归地自我改进而使其远超人类认知。”

“AGI with autonomous consciousness⁵ carries potential risks. Unlike those of the human brain, the computational and analytical abilities of AGI are theoretically limitless. AGI has efficient data collection, processing, and analysis abilities and can understand all the information it sees, hears, and receives. Once it achieves consciousness, AGI will be able to share and exchange information through communication and significantly improve its understanding of the world and the efficiency through which it can transform reality. Accordingly, AI may gradually take over various human activities. With the emergence of autonomous consciousness, the legal status of AGI becomes unclear: should it be seen as a subject with consciousness or as personal property? This may lead to disagreements at the legal, ethical, and political levels, and cause unexpected consequences.”

“具有自主意识的 AGI 具有潜在风险。不同于人脑，AGI 的计算和分析能力在理论上是没有边界的，具有高效的数据收集、处理、分析能力，可理解看到、听到、接收到的所有信息。AGI 被赋予自主意识后，可通过交流、沟通的方式进行信息的分享与交换，显著提高对世界的认知、理解与改造效率。相应地，人类的各种活动都有可能逐步被 AI 取代。由于自主意识的呈现，AGI 的法律定位出现了模糊：将其视为有意识的主体，还是个人的私有财产？这可能在法律、伦理、政治层面引入分歧，从而引发难以预料的后果。”

AGI might undergo a “treacherous turn” after its capabilities grow and it develops consciousness:

“There is no need to worry that AI might cause harm to human beings when it is weak and can be controlled by them. However, once AI completely surpasses humans in all abilities and possesses consciousness, it will become difficult to assess whether AI will necessarily continue to obey the orders of human beings. This situation has been called the “treacherous turn”⁶. Although the questions of whether AI has human consciousness and how it will realize humanlike consciousness remain unanswered, they are worthy of attention and research.”

强人工智能在能力提升、发展出意识之后可能经历一次“背叛转折”：

“就 AI 而言，在其能力弱小、可被人类控制的阶段，不必担心对人类造成危害。当 AI 的各方面能力超过人类、和人类一样拥有意识后，就很难判断是否必然继续听从人类命令，这种情况称为“背叛转折”。AI 是否具有人类意识、依靠何种方式实现类人意识，尽管尚属未知，但同样值得关注和研究。”

Monitoring the behavior of AGI during testing is not sufficient to provide reliability guarantees:

“The supervision and control of AGI behaviors can be regarded as a “principal-agent” problem, where humans are the principals and AGI systems are the agents. However, this differs from the current “principal-agent” problems between people, because AGI can formulate differential strategies and actions based on its analytical capabilities and knowledge reserves. Therefore, the monitoring of AGI behavior during testing at an early stage of research and development cannot support humans in making rational inferences about the future reliability of AGI. This means that behaviorist methods may fail.”

在测试过程中监督强人工智能的行为不足以提供可靠保障：

“对 AGI 行为的监督和控制，可视为一类“委托–代理”问题，即人类是委托方， AGI 系统是代理方。这与当前人类实体的“委托–代理”问题性质不同，即 AGI 可根据自己的分析能力、知识储备来自行制定差异化的策略与行动。因此，监测 AGI 在研发初期的测试行为，并不能支持人类合理推测 AGI 未来的可靠性。如此，行为主义方法可能失效。”

AGI based on cognitive neuroscience and meta learning may have advantages for verification and interpretability:

“Improving the verification of theoretical foundations and exploring the interpretability of models constitute the foundations of AGI accuracy and the formal guarantees of AGI security.”

基于认知神经科学和元学习的强人工智能可能有助于可验证性和可解释性：

“完善理论基础验证、探索模型的可解释性，是 AGI 正确性的构建基础，也是 AGI 安全的形式化保障。”

“The model design of AGI should be explored based on cognitive neuroscience, the discipline that studies the brain’s structure and investigates the brain’s mode of operation based on its biological structure and the cognitive ability of human beings. A suitable AGI model can be designed based on the structure and mode of operation of the human brain.”

“应以认知神经科学为基础，探索 AGI 的模型设计。认知神经科学是基于大脑的生物结构、人类的认知能力，研究脑构造、探索脑运行方式的学科；借鉴人脑结构和运行方式，可设计适当的 AGI 模型。”

“The implementation of AGI should be based on meta learning, a method of learning how to learn⁷ that enables AI to think and reason… As one of the implementation methods of semi-supervised and unsupervised learning, meta learning is an important mathematical implementation for simulating human learning processes. Seeking methods for such simulations can improve the model interpretability, explore ways to enable AGI to “learn to learn,” and develop consciousness similar to that of human beings.”

“应以元学习为基础，探索 AGI 的实现方法。元学习是学习“学习方法”的方法，可赋予 AI 思考和推理的能力；……元学习则是经验导向，基于过去的经验去学习新任务的解决办法，可使 AI 掌握更多技能、更好适应复杂的实际环境。元学习作为半监督、无监督学习的实现方式之一，是模拟人类学习过程的重要数学实现；寻求通过数学方法模拟人类学习过程的手段，据此提高模型的可解释性，探索让 AGI “学会学习”，像人类一样“产生自主意识”。”

The underlying value orientation of AGI must be constrained and monitored:

“Explicit rules should be designed to limit the range of action of AI. In view of the complexity and uninterpretability of AI, it is difficult to constrain and monitor its value orientation using the source code. Constraining the value orientation of AI from a behavioral perspective and limiting the behavioral ability and action permissions of AGI using explicit rules are key research objectives.”

强人工智能的底层价值取向需要被控制和监督：

“应设计明文规则，限制 AI 的行动范围。鉴于 AI 的复杂性、不可解释性，很难从源代码角度对其价值取向进行限制和监控。从行为角度对 AGI 的价值取向进行限制，通过明文规则来限制 AGI 的行为能力和动作权限，是重要的研究目标。”

“An underlying value network can be constructed during the process of meta learning to accelerate inference and guide the action network to take action.⁸ The algorithm for the underlying value network is complex, and the dataset cannot be controlled, making it extremely difficult to adopt measures to limit the inference process of the network. For the action network, explicit rules can be manually added to ensure that each action is in line with the correct values (i.e., limiting the occurrence of incorrect behaviors for every independent action).”

“在元学习的过程中，可构建底层的价值观网络来加速推理，指导行动网络采取行为。关于底层的价值观网络，算法具有复杂性，数据集存在不可控性，很难采取措施对其推理过程进行限制。关于行动网络，可人为加入明文规则，确保在原子行动上符合正确的价值观（即针对每一个独立动作，限制错误行为的出现）。”
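
The idea of adding explicit rules at the level of each independent action can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and does not come from the paper: the `Action` type, the two example rules, and the `filter_action` helper are hypothetical, and a real action network would propose far richer actions than a name and a target.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Action:
    """A single atomic action proposed by a (hypothetical) action network."""
    name: str
    target: str


# A rule returns True when the action is permitted.
Rule = Callable[[Action], bool]


def no_protected_targets(action: Action) -> bool:
    # Hypothetical rule: never act on protected targets.
    return action.target not in {"human", "safety_system"}


def only_whitelisted_verbs(action: Action) -> bool:
    # Hypothetical rule: only a fixed set of action types is allowed.
    return action.name in {"observe", "report", "move"}


RULES: Sequence[Rule] = (no_protected_targets, only_whitelisted_verbs)


def filter_action(proposed: Action, rules: Sequence[Rule] = RULES) -> Optional[Action]:
    """Return the action if every explicit rule permits it, otherwise None."""
    return proposed if all(rule(proposed) for rule in rules) else None


if __name__ == "__main__":
    for a in (Action("move", "warehouse"), Action("move", "human"), Action("delete", "log")):
        print(a, "->", "allowed" if filter_action(a) is not None else "blocked")
```

The sketch checks each action in isolation, which mirrors the per-action constraint in the excerpt. It says nothing about the value network itself, which the authors describe as much harder to constrain.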

“Trusted computing technology should be applied to monitor AI actions. Trusted computing is a mechanism for defending against malicious code and attacks and can be regarded as an “immune system” for computers. Additional supervision is introduced to build a complete, trustworthy, and quantifiable evaluation mechanism for various computer behaviors, and then judge whether these behaviors meet the expectations of human beings, thus preventing and handling actions that cannot be trusted.”

“要应用可信计算技术，监控 AI 的行动内容。可信计算是一种针对恶意代码、恶意攻击的防御机制，可视为计算机的“免疫系统”；引入额外监督，对计算机的各种行为建立完整、可信、可量化的评价机制，据此判断各种行为是否符合人类的预期、对不可信的行动进行防治。”

“The operation process of AGI should be monitored and analyzed. A time series analysis can be used to determine if the current behavior has a reasonable value orientation. If it does not conform to such an orientation, an external intervention method should be adopted to interrupt the current action of the AGI and ensure that the AGI will not act contrary to values.”

“应用于 AI 的行动过程监控，即可认为具备正确价值观的行为是合理可信的。监控并分析 AGI 行为的运行过程，通过时间序列来判断当前行为是否具备合理的价值取向；如不符合，采用外部干预的方式干扰或打断 AGI 的当前行动，确保 AGI 不会做出违背价值观的行为。”
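
The monitoring loop described above (judge the current behavior from a time series, then intervene externally if it drifts) might be sketched as follows. This is a simplified illustration under stated assumptions: the per-step "value score" in the range 0 to 1 is assumed to come from some external evaluator that the paper does not specify, and the `ValueMonitor` class, the window size, and the threshold are all hypothetical choices.

```python
from collections import deque
from typing import Deque, Iterable


class ValueMonitor:
    """Sliding-window monitor over hypothetical per-step value scores."""

    def __init__(self, window: int = 3, threshold: float = 0.5) -> None:
        self.window = window
        self.threshold = threshold
        self.scores: Deque[float] = deque(maxlen=window)

    def observe(self, score: float) -> bool:
        """Record a score; return True if the agent should be interrupted."""
        self.scores.append(score)
        if len(self.scores) < self.window:
            return False  # not enough history to judge yet
        return sum(self.scores) / len(self.scores) < self.threshold


def run_with_monitor(scores: Iterable[float], monitor: ValueMonitor) -> int:
    """Return the step index at which the run is interrupted, or -1 if never."""
    for step, score in enumerate(scores):
        if monitor.observe(score):
            return step  # external intervention: stop the current action
    return -1


if __name__ == "__main__":
    trace = [0.9, 0.8, 0.9, 0.2, 0.1, 0.1]
    print("interrupted at step:", run_with_monitor(trace, ValueMonitor()))
```

Using a windowed average rather than a single reading is one possible design choice: it tolerates an isolated low score but reacts to a sustained decline.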

Motivation selection for AGI should happen in advance so that superintelligence does not wish to harm humans:

“At the “treacherous turn” stage, AI has already developed cognitive abilities far exceeding those of human beings in various fields. This can be referred to as superintelligence.⁹ Given the reasonable assumption that superintelligence may betray human beings, humans should select the motivation of intelligent agents in advance to fully prevent undesirable results and equip superintelligence with the innate wish to not harm human beings.”

应该提前对强人工智能进行动机选择，使超级AI不具有对人类造成危害的自发意愿：

“‘背叛转折’阶段的 AI 已经具有在各个领域都远超人类的认知能力，可称为超级 AI 。基于超级 AI 可能会背叛人类的合理猜想，人类应当提前对智能体的动机进行选择，全力制止不良结果的出现；应使超级 AI 具有不对人类造成危害的自发意愿。”

Four approaches to motivation selection have been proposed: direct specification, domesticity, augmentation, and indirect normativity.¹⁰ [An explanation of these four approaches follows.]

针对动机选择问题，当前研究讨论提出了直接规定、驯化、扩增、间接规范 4 种应对方式。【后续解释了这四种应对方式】

Loading AGI with human values may be a more reliable approach than motivation selection, and requires whole-brain emulation:

“Although motivation selection improves the effectiveness of human control over AI, compared to limiting its ability, several problems still exist. For example, AI may face an infinite number of situations, making it impossible to discuss solutions for every situation, and it is infeasible for human beings to continuously monitor the motivation of AI. In this case, one feasible solution is to endow AGI with human values (by loading them into the AGI), thereby allowing it to consciously execute actions that will not pose a threat to human beings. It is impossible to fully represent the motivation systems present under all situations in a table (which would lead to an infinitely large table)...”

与动机选择相比，为强人工智能赋予人类价值可能是更可靠的方法，且需要全脑仿真：

“相比限制 AI 的能力，动机选择已经在一定程度上提升了人类控制 AI 的有效性，但仍面临一些问题。例如，AI 可能面对无穷多种情况，不可能具体讨论每一种情况下的对策，而人类本身不可能持续监视 AI 的动机。可行的思路之一是将人类的价值观赋予 AI（加载到 AGI 内部），让其自觉地执行那些不对人类构成威胁的事件。无法将各种情况下的动机系统均完整具象为可以查询的表格（导致无穷大的表格）......”

“The use of evolutionary algorithms is a feasible route for value loading… However, the accumulation of human values is the result of our genetic mechanism evolving over millions of years, and imitating or reproducing this process would be extremely difficult. Because this mechanism has adapted to the neural cognitive architecture of human beings, it can only be realized through whole-brain emulation.¹¹ The premise of whole-brain emulation is that the brain can be simulated and computed. However, whole-brain emulation faces three challenges: scanning, translation, and simulation.¹² The required precision can only be achieved using high-throughput microscopy and supercomputing systems.”

“进化算法可能是加载价值观的可行途径之一......然而，人类价值观的积累过程是人类相关基因机理经历成千上万年进化的结果，模仿并复现这一过程非常困难；这一机理与人类神经认知体系结构相适应，因而只能应用于全脑仿真。全脑仿真的前提是大脑可被模拟、可以计算，面临着扫描、翻译、模拟 3 类条件的制约，采用高通量显微镜、超级计算系统才能达到所需精确度。”

International cooperation for AGI is necessary:

“AGI research has become a subject of international attention. Only by concentrating the scientific and technological strengths of the whole of humanity can we ensure that AGI better serves society. The process of AGI research and its gradual application involve many unknown problems. Strengthening international AGI cooperation and promoting the sharing of research will be necessary to improve the ability to respond to emerging situations and guarantee the implementation and expansion of AGI applications….”

强人工智能国际合作是必要的：

“AGI 研究已经成为国际性的关注点，集中全人类的科技力量来推进 AGI 的深化研究，才能使 AGI 更好服务人类社会。相关研究和逐步应用的过程，将面临许多未知问题。加强 AGI 国际合作、促进研究成果共享，才能根本性地提高应对突发情况的能力，也才能真正保障 AGI 的应用落地和拓展......”

A controlled intelligence explosion and dynamically responding to risks are needed to prevent catastrophic outcomes:

“The intelligence and behavior of AGI cannot simply be equated to those of human beings. The motivation for creating AGI is to benefit human society. However, to protect the privacy of individuals and society as a whole, AGI should be controlled such that it only serves human beings passively, rather than learning on its own initiative. If there is an intelligence explosion once AI has evolved to a certain level, the default result will inevitably be catastrophic. In view of such potential threats, humanity should continue to monitor the risks and search for countermeasures to avoid the occurrence of this default ending. Humanity should design a controlled intelligence explosion and set the proper initial foundations, all while achieving humanity’s desired results and ensuring that all consequences remain within an acceptable range.”

受控制的智能爆发与对风险的动态应对对于防范灾难性后果是不可或缺的：

“AGI 的智慧与行为不能简单地与人类划等号，创造 AGI 的动机是为了更好地造福人类社会。对于人类社会的隐私，应控制 AGI 只能给人类提供被动的服务，而不是主动的学习。如果 AI 进化到一定水平后出现智能爆发，默认后果必然是造成确定性灾难。面对这样的潜在威胁，人类应持续关注并着力寻求应对方法，坚决避免这种默认结局的出现；设计出受控制的智能爆发，设置必要的初始条件，在获得人类想要的特定结果的同时，至少保证结果始终处于人类能接受的范围。”

“In the future, we recommend paying close attention to the technological evolution of AGI and proposing dynamic strategies for responding to potential security risks. We should examine international discussions and drafting of AGI policies, integrate cutting-edge legal and ethical findings, and explore the elements of China’s AGI policymaking in a deeper and more timely manner.”

“着眼未来发展，建议持续关注 AGI 的技术演进路线，对技术伴生的潜在安全风险提出动态的应对策略；参考国际性的 AGI 政策研讨和制定过程，结合法律、伦理方面的前沿成果，更为及时、深刻地探讨我国 AGI 政策的制定要素。”

Notes

1. Amodei D, Olah C, Steinhardt J, et al. Concrete problems in AI safety [EB/OL]. (2016-07-25) [2021-02-15]. https://arxiv.org/abs/1606.06565.

2. Congress of the United States. H.R.5356-National security commission artificial intelligence act of 2018 [EB/OL]. (2018-03-20) [2021-02-15]. https://www.congress.gov/bill/115th-congress/house-bill/5356.

3. China Academy of Information and Communications Technology. Global AI governance report [EB/OL]. (2020-12-30) [2021-02-15]. https://pdf.dfcfw.com/pdf/H3_AP202012301445361107_1.pdf?1609356816000.pdf. Chinese.

4. Jin J, Qin H, Dai Z X. Top-level strategy of artificial intelligence security and the research status of key institutions in the United States [J]. Civil-Military Integration on Cyberspace, 2020 (5): 45–48. Chinese.

5. Translator’s note: The meaning of the word ‘consciousness’ (意识) in this piece is ambiguous. Due to the association with the words ‘uncontrollability’ (不可控性) and ‘autonomous’ (自主), we interpret it as emphasising an ability to control one’s own development.

6. Bostrom N. Superintelligence: Paths, dangers, strategies [M]. Oxford: Oxford University Press, 2015.

7. Vilalta R, Drissi Y. A perspective view and survey of meta-learning [J]. Artificial Intelligence Review, 2002, 18(2): 77–95.

8. Translator’s note: The value network (价值观网络, jiazhiguan wangluo) and action network (行动网络, xingdong wangluo) referred to here are distinct from the value network (数值网络, shuzhi wangluo) and policy network (策略网络, celue wangluo) of reinforcement learning.

9. Bostrom N. Superintelligence: Paths, dangers, strategies [M]. Oxford: Oxford University Press, 2015.

10. Bostrom N. Superintelligence: Paths, dangers, strategies [M]. Oxford: Oxford University Press, 2015.

11. Huang T J. Can human build “super brain”? [N]. China Reading Weekly, 2015-01-07(5). Chinese.

12. Bostrom N. Superintelligence: Paths, dangers, strategies [M]. Oxford: Oxford University Press, 2015.
