Market Mechanisms for Multiple Minds
Christos Dimitrakakis. From Chalmers University of Technology to Harvard University, USA.
Most problems in artificial intelligence are too complex to be tackled by a monolithic agent mind. In this project, we shall investigate mechanisms for collaboration between multiple minds that wish to achieve a common goal. However, each mind has only partial knowledge and a partial view of the world, and the minds must communicate to ensure that they coordinate successfully. In particular, we shall investigate simple signaling mechanisms, such as side-payments, to ensure that locally optimal choices of individual minds result in near-optimal behavior of the complete agent. The applicants have considerable experience in decision theory, game theory, statistics and reinforcement learning. The project will develop state-of-the-art algorithms for effective coordination of distributed learning and acting, which are important in applications such as smart grids and vehicular networks.
The project will start a long-term collaboration between Chalmers and Harvard in the field of distributed learning and decision making. This is important for problems such as smart grids, crowdsourcing and vehicular networks, where information and decision making are inherently distributed, as well as for big data problems which are too large to solve in a centralized manner. The partners complement each other perfectly. The applicant is a well-known researcher in decision theory and reinforcement learning, while Prof. Parkes is an authority on multi-agent systems and mechanism design. We expect Chalmers to benefit from this collaboration: while the field is severely underdeveloped in Swedish universities, companies such as Microsoft, Google, Adobe and Facebook are investing heavily in the area. Further collaboration is envisaged in the form of PhD student exchange, larger projects and workshops.
Project Summary: Results
We envision a future with a massive number of AIs, owned, operated, designed, and deployed by a diverse array of entities. This multiplicity of interacting AIs, apart from or together with people, will constitute a social system. We are interested in coordinating the behavior of these AIs with the goal of promoting beneficial outcomes, both when AIs interact with each other and when they interact with people. We expect that a successful theory will need to take into account incentives, privacy and fairness considerations.
We are conducting fundamental research to understand the role of mechanism design, multi-agent dynamical models, and privacy-preserving algorithms, especially in the context of multi-agent systems in which the AIs are built through reinforcement learning (RL). We have made good progress in the project, and have advanced research on the following problems:
a) Helper-AI. This problem considers a two-agent sequential decision-making problem, with one "helper agent" (the AI) and one human. The human is assumed to have an incorrect understanding of the way the environment works. The AI knows the correct model and knows that the human has a possibly incorrect belief. The AI and human act together in the same environment. Both agents have the same payoff structure, but because their models differ, this is a general-sum Stackelberg game rather than a common-payoff game.
We have developed algorithms to compute a policy for the AI, assuming that the human may select the worst-case policy amongst those that best-respond to the AI's policy. In this sense, the AI's behavior is "value aligned": it is aligned with the response of the human given his or her potentially incorrect view of the world. Recognizing that the problem is NP-hard, we adopt a backwards-induction approach and develop bounds on the performance of the resulting policy relative to the optimal AI policy. We show in simulation that the AI benefits from knowing the human model ("Multi-View Decision Processes: The Helper Agent Problem"). In ongoing work, we are developing a method to infer the belief model of the human, assuming multiple interactions between the AI and the human, with the AI adopting a different policy each time.
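The worst-case best-response principle can be illustrated in a one-shot toy version of the setting (a sketch with assumed payoff numbers, not the paper's sequential algorithm): the AI commits to an action, the human best-responds under their own, possibly incorrect, payoff model, and ties among the human's best responses are broken adversarially with respect to the AI's true payoff.

```python
# Toy sketch of worst-case best response in a one-shot common-payoff
# game where the human holds an incorrect payoff model.
# All payoff numbers are hypothetical.

# True common payoff U_true[ai_action][human_action].
U_true = [[4.0, 0.0],
          [3.0, 3.0]]

# The human's (incorrect) belief about the payoffs.
U_human = [[4.0, 4.0],
           [0.0, 3.0]]

def best_responses(row):
    """Indices of the human actions maximizing the human's believed payoff."""
    m = max(row)
    return [j for j, v in enumerate(row) if v == m]

def worst_case_value(a):
    """AI's true payoff under the human's worst (for the AI) best response."""
    return min(U_true[a][j] for j in best_responses(U_human[a]))

# The AI commits to the action with the best worst-case guarantee.
ai_action = max(range(len(U_true)), key=worst_case_value)
```

Note that an AI which ignored the human's incorrect belief would pick the first row (true value 4 under a correct best response), but the human's belief makes that row risky; knowing the human model changes the AI's choice.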
b) Cooperative-AI. This problem considers a multi-agent architecture for an AI system and appeals to mechanism design to design the reward architecture for individual agents (e.g., perception, planning, reasoning) so that the system has good overall behavior. Each agent is modeled with its own action space, state space, and reward function. This is the agent's intrinsic reward, and it depends on the agent's local state as well as the actions of the other agents. The challenge is to define transfers ("extrinsic reward") such that local optimality implies global optimality, and such that sufficient statistics about local state information are accurately reported.
In ongoing work, we are using mechanism design theory to develop a characterization of which joint policies are supported, or otherwise constrained, by this "intrinsic + extrinsic" reward architecture. For example, we can show that (i) without transfers, the only incentive-aligned policies are trivial; (ii) stability requires that an agent's extrinsic reward for different states is invariant to its own choice of policy; and (iii) only monotone policies are supported (roughly: as an agent's reward moves more towards one state over another, the system should visit that state more often), and for a sufficiently rich space of possible reward functions, policies that maximize a linear-affine sum of agent rewards can be supported ("Reward Transfer Mechanisms for Multi-agent AI").
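A minimal sketch of the local-implies-global idea, assuming a simple team-style transfer (not necessarily the mechanism analyzed in the paper): if each agent's extrinsic reward equals the sum of the other agents' intrinsic rewards, every agent's total reward coincides with the social welfare, so locally optimal action choices are also globally optimal.

```python
# Toy sketch: a transfer scheme in which each agent's extrinsic reward
# is the sum of the other agents' intrinsic rewards. Each agent's total
# then equals the social welfare. All reward numbers are hypothetical.

from itertools import product

# Intrinsic rewards r_i(a1, a2) for two agents with two actions each.
r = [
    lambda a1, a2: [[2, 0], [4, 1]][a1][a2],  # agent 1
    lambda a1, a2: [[2, 5], [0, 1]][a1][a2],  # agent 2
]

def welfare(a):
    """Sum of intrinsic rewards under joint action a."""
    return sum(r[i](*a) for i in range(len(r)))

def total_reward(i, a):
    """Agent i's intrinsic reward plus its transfer (others' intrinsic rewards)."""
    intrinsic = r[i](*a)
    extrinsic = sum(r[j](*a) for j in range(len(r)) if j != i)
    return intrinsic + extrinsic

# With this transfer, each agent's objective coincides with welfare,
# so the welfare-maximizing joint action is individually optimal.
best_joint = max(product([0, 1], repeat=2), key=welfare)
```

This alignment comes at the cost of budget balance, which is one reason characterizing exactly which policies can be supported by a given transfer scheme is a non-trivial design question.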
Related to this, we are developing bounds on the performance of agents with differentially-private policies. These policies prevent external observers from obtaining knowledge of the variables the agent wants to keep private, and can also be used to create incentive-compatible mechanisms ("Differentially-private Mechanisms for Multi-agent AI").
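To illustrate the flavor of such policies (a hypothetical sketch using the standard Laplace mechanism, assuming rewards in [0, 1]; this is not the exact algorithm of the cited papers): the agent perturbs the statistics it acts on, so that its observable behavior reveals little about any single private observation.

```python
# Sketch: a greedy bandit policy acting on Laplace-perturbed empirical
# means. Adding Laplace(1/epsilon) noise to each arm's reward sum gives
# epsilon-differential privacy per release with respect to a single
# reward observation (sensitivity 1 for rewards in [0, 1]).

import math
import random

def laplace(scale):
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def private_greedy_action(sums, counts, epsilon):
    """Pick the arm with the highest Laplace-perturbed empirical mean."""
    noisy_means = [
        (s + laplace(1.0 / epsilon)) / max(n, 1)
        for s, n in zip(sums, counts)
    ]
    return max(range(len(sums)), key=lambda i: noisy_means[i])
```

Smaller epsilon means stronger privacy but noisier decisions, which is exactly the privacy-performance trade-off that the bounds mentioned above aim to quantify.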
c) Fair decision making
An additional area of research developed during the course of the project is how to ensure fairness in decision making, particularly when AI decisions have a societal impact. Traditional game-theoretic notions of fairness are not immediately applicable because they relate to a joint decision (a "social choice"), whereas many settings of fair decision making affect a single individual; e.g., approving a loan, offering a job, providing car insurance. We are building on recent notions of fairness in the statistical machine learning literature based on smoothness, independence of decision variables conditioned on latent variables, and notions of meritocracy.
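The smoothness notion can be made concrete as an individual-fairness (Lipschitz) condition: similar individuals should receive similar decisions. The sketch below uses a hypothetical one-feature applicant score and metric.

```python
# Sketch of the "smoothness" fairness condition: a decision rule f is
# fair with respect to a similarity metric d if
#     |f(x) - f(y)| <= L * d(x, y)   for all individuals x, y.
# All applicants, scores and the metric below are hypothetical.

def is_lipschitz_fair(f, individuals, d, L):
    """Check the smoothness condition on every pair of individuals."""
    return all(
        abs(f(x) - f(y)) <= L * d(x, y)
        for x in individuals
        for y in individuals
    )

applicants = [0.2, 0.5, 0.9]          # one feature per applicant
d = lambda x, y: abs(x - y)           # similarity metric

f = lambda x: 0.5 * x                 # a smooth scoring rule
g = lambda x: 0.0 if x < 0.5 else 1.0 # a hard threshold rule
```

Here the smooth rule f satisfies the condition with L = 1, while the threshold rule g violates it: two applicants with nearly identical features can receive opposite decisions.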
Relevant published papers
1. Algorithms for Differentially Private Multi-Armed Bandits (A. C. Y. Tossou and C. Dimitrakakis), In Proc. 30th AAAI Conf. on Artificial Intelligence (AAAI 2016), 2016.
2. On the Differential Privacy of Bayesian Inference (Z. Zhang, B. I. P. Rubinstein, and C. Dimitrakakis), In Proc. 30th AAAI Conf. on Artificial Intelligence (AAAI 2016), 2016.
3. Achieving Privacy in the Adversarial Multi-Armed Bandit (Aristide C. Y. Tossou and Christos Dimitrakakis), In Proc. 31st AAAI Conf. on Artificial Intelligence (AAAI 2017), 2017.
4. Calibrated Fairness in Bandits (Christos Dimitrakakis, Yang Liu, Debmalya Mandal, David Parkes, and Goran Radanovic), In Fairness, Accountability and Transparency in Machine Learning Workshop, at KDD, 2017.
5. Differential Privacy for Bayesian Inference through Posterior Sampling (Christos Dimitrakakis, Blaine Nelson, Zuhe Zhang, Aikaterini Mitrokotsa, and Benjamin I. P. Rubinstein), In Journal of Machine Learning Research, volume 18, 2017.
6. Multi-View Decision Processes: The Helper Agent Problem (C. Dimitrakakis, G. Radanovic, D. Parkes, and P. Tylkin), In NIPS 2017.
Papers in progress or planned
1. Reward Transfer Mechanisms for Multi-agent AI (C. Dimitrakakis, D. Parkes, and P. Tylkin) (in progress).
2. Differentially-private Mechanisms for Multi-agent AI (C. Dimitrakakis, A. Tossou, D. Parkes, and G. Radanovic) (planned).
3. Subjective Fairness (C. Dimitrakakis, Y. Liu, D. Parkes, and G. Radanovic) (arXiv:1706.00119).