The recent advent of personalized services, which are tailored to the interests of their users, has led to the massive collection of personal data and the construction of detailed profiles of these users. However, in the current Internet paradigm, there is a strong asymmetry, in terms of both power and information, between the entities that gather and process personal data (e.g., major Internet companies, telecom operators, and cloud providers) and the individuals to whom those data relate. As a consequence, users of information technologies have no choice but to trust the entities collecting and using their data. This lack of transparency gives rise to ethical issues such as loss of control over personal data, discrimination, and unfair processing. For instance, controversial practices have been observed, such as price discrimination (i.e., customizing prices for users according to their browsing history or their location), gender discrimination, and price steering (i.e., changing the order of search results to highlight specific products). Unfair treatment can result from biases in the data or from erroneous predictions made by the machine learning process used to perform the personalization. In addition, the structure of these machine learning algorithms is often complex, and producing intelligible explanations of their results is quite challenging. All these issues become even more critical when this type of system is used to support decision making (or to make decisions) in sectors such as justice or health.
To address these issues, data and algorithmic transparency is emerging as a growing research area. Increasing the transparency of algorithms and opening them to the scrutiny of the public or of independent authorities such as the CNIL is, however, only the first step toward making them more accountable. In particular, once the opacity of personalization algorithms has been lifted, one of the long-term objectives is to be able to measure and reduce the discrimination in these algorithms. We believe that data and algorithmic transparency and accountability should address three aspects: (1) the traceability of the collection of personal data; (2) the verification of the compliance of algorithms with critical properties (and legal requirements) such as non-discrimination and fairness; and (3) the capacity to explain the results of algorithmic systems. The proposed Associate Team will build on the expertise of the partners to address these three complementary aspects.
The main scientific objective of this collaboration is to advance research on data and algorithmic transparency and accountability by studying it in concrete but diverse contexts. The first context that we propose to investigate thoroughly is how personal information related to the physical world, such as the presence and mobility data of users, is collected and exploited by companies to perform behavioral targeting.
The challenges are multiple. First, cyber-physical tracking systems often collect data passively or use deceptive techniques, and are thus hard to detect. Second, collecting and analyzing the data is itself a challenging task, as we need to capture enough data to infer and model the link between the activities of the users (in particular their mobility) and the personalized content they receive. Finally, even when enough data are available, inferring the behavior of an algorithm is not trivial: we may only be able to interact with it as a black box, and thus may need to approximate its behavior following a machine learning approach.
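To make this last point concrete, the sketch below illustrates one possible black-box approximation workflow: probing a system that can only be queried, then fitting an interpretable surrogate model to mimic its responses and measuring the surrogate's fidelity. The function query_black_box, its input features, and its decision rule are hypothetical placeholders for a real personalization system, not part of the proposal itself.

    import numpy as np
    from sklearn.metrics import accuracy_score
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(seed=0)

    def query_black_box(features):
        # Hypothetical stand-in for the real system under study: returns 1
        # when a targeted ad would be shown, 0 otherwise. In practice this
        # would be replaced by actual probes of the deployed system.
        visits, distance_km = features[:, 0], features[:, 1]
        return ((visits > 3) & (distance_km < 5)).astype(int)

    # Probe the black box on synthetic inputs (store visits, distance to store).
    X = np.column_stack([rng.integers(0, 10, 1000), rng.uniform(0, 20, 1000)])
    y = query_black_box(X)

    # Fit a shallow decision tree as an intelligible surrogate of the black box.
    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

    # Fidelity: how often the surrogate agrees with the black box on fresh probes.
    X_test = np.column_stack([rng.integers(0, 10, 500), rng.uniform(0, 20, 500)])
    fidelity = accuracy_score(query_black_box(X_test), surrogate.predict(X_test))
    print(f"Surrogate fidelity on fresh probes: {fidelity:.2%}")
    print(export_text(surrogate, feature_names=["visits", "distance_km"]))

A shallow tree is used here only because its rules can be read directly; the general point is that the surrogate's fidelity on fresh probes bounds how much of the black box's logic has actually been recovered.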
To overcome these challenges, we will split the research program of the Associate Team into three phases. During the first phase, we will identify and study existing tracking systems and reconstruct their underlying mechanisms. Then, during the second phase, we will design a methodology to quantify how well a particular system meets requirements such as non-discrimination and fairness. Finally, during the third phase, we will develop methods to improve the intelligibility of these systems, for example by generating global explanations of a system's overall logic and local explanations of specific results. We will put particular emphasis on interaction with users to enhance their understanding of the system.
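As an illustration of the kind of quantification targeted in the second phase, the following sketch computes a simple demographic parity (disparate impact) ratio over the decisions of a queryable black box. The black box, the protected attribute, and the 80% threshold are illustrative assumptions, not the specific methodology the project commits to.

    import numpy as np

    rng = np.random.default_rng(seed=1)

    def query_black_box(features):
        # Hypothetical pricing decision: 1 = higher price shown, 0 = lower.
        return (features[:, 0] + 0.5 * features[:, 1] > 7.5).astype(int)

    # Synthetic probe population with a protected attribute (e.g., inferred gender).
    n = 2000
    X = np.column_stack([rng.uniform(0, 10, n), rng.uniform(0, 10, n)])
    protected = rng.integers(0, 2, n)  # group membership: 0 or 1

    decisions = query_black_box(X)

    # Rate of the disadvantageous outcome within each group.
    rate_g0 = decisions[protected == 0].mean()
    rate_g1 = decisions[protected == 1].mean()

    # Disparate impact ratio: values well below 1 suggest one group is
    # systematically treated differently (the "80% rule" is a common heuristic).
    ratio = min(rate_g0, rate_g1) / max(rate_g0, rate_g1)
    print(f"Group outcome rates: {rate_g0:.2f} vs {rate_g1:.2f} (ratio {ratio:.2f})")
    if ratio < 0.8:
        print("Potential disparate impact: further investigation warranted.")

Demographic parity is only one of several candidate fairness criteria; part of the second phase is precisely to determine which criteria are appropriate for each system under study.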
We will then build on these results to develop broader mechanisms that could be applied to other areas such as predictive justice and health. The investigation of these areas will be done in collaboration with researchers in law from Québec and with experts in algorithmic decision making in the medical sector (such as the designers and operators of the algorithms used in France to make decisions about organ transplantation).