Liars, Outliers, and Algorithmic Fairness
This past week, I was at a pair of workshops: the Workshop on Fairness, Accountability, and Transparency in Machine Learning and the Data and Algorithm Transparency workshop. Both were great meetings.
For obvious reasons, the election hung as something of a cloud over the meetings. It wasn't constantly discussed, but we kept returning to it from time to time. It's pretty sad, in my opinion, when 'what does this work look like if the rule of law collapses?' is a live question. Regulation is a key outcome of fairness research, and representatives from a number of regulatory agencies were in attendance. There's a very real concern that regulation and policy will not be available levers for the next several years.
Some of the discussion, therefore, was about ways to supplement or compensate for the lack of regulatory mechanisms. Far more questions were raised than answered, I think, but the topic came up both in the panels and in hallway conversations.
As we were talking about this, I couldn't help but think of Bruce Schneier's Liars and Outliers. I think this book provides a very helpful framework and language for reasoning about what, exactly, we might be trying to do as we promote fairness and nondiscrimination. Schneier describes four broad kinds of societal pressure that keep people cooperating rather than defecting: moral, social (reputational), institutional, and security.

Institutional Pressures

These are the pressures that we are concerned will be unavailable, at least at the national level. If new fairness, privacy, and nonmaleficence regulation is unlikely to be enacted, then institutional pressures are off the table for discouraging people from building racist machine learning tools or AI systems that produce deleterious effects, whether intentionally or not.

Moral Pressures

Moral pressure was definitely discussed, most notably by Kate Crawford but also by others throughout the meetings. The need for developers to think through the ethics of what they are building was brought up several times, but it was also preaching to the choir. We did discuss a little bit how to get information about detecting and mitigating unfairness out to working data scientists, but we do not currently have much by way of solutions. We need to work on this, though: we do not have the luxury of ignoring the ethical implications of our work. Too many focus just on the technical and consider ethics and external validity to be out of scope, and down that path lies a great deal of harm and wrongdoing. I've talked to some of them; one time, when I raised a concern about the validity of the scientific underpinnings of a health-related system, the response I got was basically 'not my problem, I just build the system'.

Mark Van Hollebeke gave a great presentation about adopting data ethics as a corporate value. I see this as partly a kind of corporate moral pressure, and a starting point for using education and company loyalty to encourage individual developers to feel the need for ethical data-driven applications.

Security Pressures

Security pressure was actually brought up once or twice, in the form of 'what can tools like SciKit-Learn do to make fairness easier/automatic/default?'. But it will always be possible, and not very hard, to build machine learning systems with no regard for their social or ethical impact. Given how computers work, it seems very difficult to conceive of what security pressures to prevent maleficent machine learning might look like. The closest relevant touchstone I can think of is Asimov's Laws of Robotics: in the Robot universe, they were baked into robots at the lowest level, and no one knew how to build a robot that did not obey them. It wasn't impossible, but it would require the rather substantial undertaking of re-deriving the entire workings of a positronic brain from a different set of fundamental principles. For better or worse, though, this is not how our computers actually work. We have the math for morally oblivious machine learning, and that math won't go away; fairness is built on top of it as an add-on. So I think that security pressures will be difficult to implement on a widespread scale.

They can certainly be implemented on a smaller scale, though. Individual companies can design their data access and machine learning infrastructure to limit the kinds of queries and models that developers can run. I understand Google does this to keep employees from snooping around in users' data, and Apple's rollout of differential privacy may also play this role. But it doesn't really help get people and companies to consider data ethics if they aren't doing so already.
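To make the 'fairness easier/automatic/default' idea a bit more concrete, here is a minimal sketch of the kind of check a toolkit could run for you by default: compare a trained classifier's positive-prediction rates across a protected group and flag large gaps. This is not an existing scikit-learn API; the data is synthetic, the protected attribute is hypothetical, and the 0.8 threshold is the common 'four-fifths' rule of thumb for disparate impact.

```python
# Sketch of a default fairness check (not an existing scikit-learn API):
# train a classifier, then compare positive-prediction rates across a
# protected attribute and warn if the gap is large.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic data: two features, a binary protected attribute, and a label.
# The features deliberately leak group membership so the check has
# something to find.
n = 5000
group = rng.integers(0, 2, size=n)             # hypothetical protected attribute
X = rng.normal(size=(n, 2)) + group[:, None]   # features shifted by group
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0.5).astype(int)

X_train, X_test, g_train, g_test, y_train, y_test = train_test_split(
    X, group, y, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# Demographic-parity style check: ratio of positive-prediction rates
# between the two groups (the "disparate impact ratio").
rate_0 = pred[g_test == 0].mean()
rate_1 = pred[g_test == 1].mean()
ratio = min(rate_0, rate_1) / max(rate_0, rate_1)

print(f"positive rate, group 0: {rate_0:.2f}")
print(f"positive rate, group 1: {rate_1:.2f}")
print(f"disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:  # illustrative 'four-fifths rule' threshold
    print("warning: predictions differ substantially across groups")
```

And a toy illustration of the smaller-scale, infrastructure-level idea: answer aggregate queries through a differentially private interface (here, the basic Laplace mechanism) instead of exposing raw records. The function, epsilon value, and data below are illustrative assumptions, not a description of Google's or Apple's actual systems.

```python
# Toy differentially private query interface: callers get noisy aggregate
# answers rather than access to the underlying records.
import numpy as np

def dp_count(values, predicate, epsilon=0.5, rng=None):
    """Noisy count of records matching `predicate`.

    A counting query has sensitivity 1, so adding Laplace noise with
    scale 1/epsilon gives epsilon-differential privacy for this query.
    """
    if rng is None:
        rng = np.random.default_rng()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(scale=1.0 / epsilon)

ages = [23, 35, 41, 29, 52, 37, 44, 31]    # pretend user records
print(dp_count(ages, lambda a: a > 40))    # noisy answer instead of the raw count
```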
Social Pressures

Social pressures were probably the second most-discussed, after the need for individual ethics. Once or twice the ideas of codes of ethics and possibly licensure were raised, but it seems to me that a code of ethics needs enforcement, and that licensure is a pretty heavy-handed regime that won't necessarily fit well with machine learning, AI, and data science.

I would love to see an extension of Cantrell, Salido, and Van Hollebeke's thinking to look at what companies that have adopted data ethics values can do to encourage or pressure other companies to do likewise.

Social pressures are a complicated thing, too. We need to earnestly and regularly argue for the need for AI researchers and developers to think about these issues, but we also need to encourage them to get on board rather than just bop them with social censure. I think there is probably a role for censure in egregious cases, but we need to find other ways to strongly encourage developers to think critically about their systems.

Wrapup

Encouraging and incentivizing more widespread consideration of ethics in AI systems is not an easy task, and it isn't made any easier by current swings in the U.S. political climate. Liars and Outliers may, however, provide a useful roadmap for helping us navigate this world.