Biased data sets
In law enforcement:
The problem is that crime statistics do not reflect the crimes actually occurring; rather, they provide a picture of the state's response to crime.
– [[Future Histories]]
The data on which we train technology 'uncritically ingests yesterday's mistakes', as James Bridle puts it, encoding the barbarism of the past into the future.
– [[Future Histories]]
One big problem is that we have a tendency to trust decisions made by a computer, so we have to be really aware of the biases in these systems. Part of this bias comes from a bigger problem endemic in the tech industry: it's dominated by white men with a limited worldview and a particular set of biases. The system is often going to be made in the image of its creator, right.
But aside from that, ML can also be biased: if the data that goes into it is biased, so will the outcomes be. Garbage in, garbage out. And there are a lot of biases and garbage statistics in the world. So if policing disproportionately targets a particular group in arrests, and the justice system treats them differently in sentencing, then that group is more likely to be targeted by an algorithm trained on existing policing and crime stats (see the sketch below). You have to actively challenge existing biases, not build them into the system.
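A minimal sketch of this feedback loop, using scikit-learn and entirely synthetic data. Every number here (a 10% offending rate, detection rates of 40% vs 80%) is an illustrative assumption, not a real statistic: both groups behave identically, but one is policed twice as heavily, and a model trained on the resulting arrest records reports the disparity back as 'risk'.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Two equally sized groups with an identical true offending rate (10%).
group = rng.integers(0, 2, size=n)
offended = rng.random(n) < 0.10

# Biased observation: group 1 is policed twice as heavily, so its
# offences are twice as likely to show up as arrests in the data.
detection_rate = np.where(group == 1, 0.8, 0.4)
arrested = offended & (rng.random(n) < detection_rate)

# Train on the biased arrest labels. Group membership is used directly
# as a feature here; in practice it leaks in via proxies like postcode.
X = group.reshape(-1, 1)
model = LogisticRegression().fit(X, arrested)

for g in (0, 1):
    risk = model.predict_proba([[g]])[0, 1]
    print(f"group {g}: true offending rate 10.0%, predicted risk {risk:.1%}")

# Despite identical behaviour, group 1 scores roughly twice as 'risky',
# because the labels encode the policing bias, not the behaviour.
```

Nothing in the model is broken in the usual sense: it faithfully learns the arrest data it was given. That's exactly the problem the quotes above describe, and it gets worse if the model's predictions then direct where policing happens next.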