With the growing prevalence of AI/ML, it’s vital we understand how data may be biased and what this means for business decisioning.
Ethics is a big part of working with data. You want to ensure that data is used responsibly to make decisions that are justifiable and accountable. However, bias enters the equation when that data turns out to be skewed toward certain subsets, or simply not representative of the environment as a whole.
For example, a published study revealed some years back that a major healthcare risk algorithm used on over 200 million people in the U.S. exhibited racial bias. The algorithm, designed to identify patients who would benefit from high-risk care management programs, relied on a flawed metric for determining medical needs. By using previous patients’ healthcare spending as a proxy for need, the algorithm failed to recognize the higher level of medical intervention required by black patients compared to white patients with similar spending. This algorithm’s design led to unequal allocation of resources, resulting in disparities in care for black patients.
This brings us to the subject of addressing these biases in algorithms and pushing for ethical data practices instead. That's what we'll explore in this piece.
Data bias occurs when systematic errors favor one group over another. Whether intentional or not, it leads to unfair decisions because the available data is not an accurate representation of the population. In artificial intelligence/machine learning (AI/ML), bias shows up as systematic differences between a model's predicted and actual values, and these differences are often rooted in stereotypes rather than definite knowledge of individuals or circumstances.
AI is gaining widespread adoption and this will continue, with global spending projected to reach $110 billion annually by 2024. No industry is spared in this wave, from healthcare and banking to retail and manufacturing. Companies are using AI software to make important (rule-driven) decisions about health, employment, insurance, creditworthiness and more, and the benefits are clear: improved efficiency, reduced costs and accelerated development.
However, there's a drawback embedded in these processes: the data used to feed these systems is not neutral. There's always some form of inherent bias derived from the way the data was created. AI reflects and amplifies the biases of its human creators, and this raises ethical concerns around privacy, security, bias and human judgment.
Progress recently carried out an extensive study of data bias as a hidden risk in AI and ML, surveying 640 business and IT professionals at director level and above who use data to make decisions in tandem with AI or ML. The results were revealing: the survey showed that data bias is a much-discussed topic and a bigger issue than many realize.
AI systems are only as good as the data we feed them. If the data is skewed, the AI will be too. When the data used to train AI models lacks diversity, the AI inherits those blind spots and biases.
Sometimes bias comes from relying too heavily on proxies. AI may use attributes like ZIP code or education level as proxies for determining things like creditworthiness or job performance. But those proxies can reflect and amplify societal biases.
Data bias also creeps in when we make assumptions about what's "normal" or "average." AI models trained on skewed data may see some groups as deviations from a norm that doesn't actually exist, and they end up systematically disadvantaging those groups.
Left unaddressed, the consequences of data bias and flawed AI can accumulate and worsen over time. As these systems are increasingly integrated into critical processes, their biases may become further entrenched and amplified. The good news is we can take practical steps to immediately address data bias.
Look at how data is collected and who is included or excluded. The data you feed into your AI systems is the foundation that algorithms build upon, so look closely at where it comes from and how it was collected. Consider context: what seems like an objective decision in one setting could be problematic in another. Are there biases or skews in who or what is being measured? For example, if you're building a healthcare diagnostic tool but your data comes only from patients at a single hospital, it may not generalize well to the broader population.
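As a rough illustration, here is a minimal sketch (using pandas, with hypothetical file and column names) of how you might compare the makeup of a training dataset against the population you actually intend to serve:

```python
import pandas as pd

# Hypothetical patient records; the file and column names are illustrative.
df = pd.read_csv("patients.csv")

# Share of each demographic group in the training data vs. the share
# you expect in the population the model will be used on.
sample_share = df["ethnicity"].value_counts(normalize=True)
population_share = pd.Series({"group_a": 0.60, "group_b": 0.25, "group_c": 0.15})

audit = pd.DataFrame({"sample": sample_share, "population": population_share})
audit["gap"] = audit["sample"] - audit["population"]

# Large negative gaps flag groups that are under-represented in the data.
print(audit.sort_values("gap"))
```

The same idea applies to any attribute that matters for your use case: geography, age, device type and so on.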
Diversity and inclusion in AI development isn't just a "nice to have": it's crucial for building systems that are fair and unbiased and that serve the needs of all groups. Homogenous teams often lack the breadth of experience to identify their own blind spots, which lets regional or cultural stereotypes (around appearance, class or gender) creep in as prejudice bias. Including diverse voices helps address this and leads to AI that benefits and empowers all of humanity.
The metrics you choose to measure success and optimize for can also introduce bias. For example, if you’re building an AI for recruiting and you only measure the hiring rates of candidates as a metric, you may end up with bias against minority groups that face discrimination in the hiring process. Consider metrics that account for fairness and equity, not just raw numbers.
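One way to go beyond raw hiring rates is to compare selection rates across groups, for example using the "four-fifths" rule of thumb from employment analytics. Here is a minimal sketch, assuming a hypothetical table of candidates with a group label and a hired flag:

```python
import pandas as pd

# Hypothetical candidate records; in practice this would come from your own data.
candidates = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b", "b", "b"],
    "hired": [1, 1, 0, 1, 0, 0, 0, 0],
})

# Selection rate per group: the fraction of candidates hired in each group.
rates = candidates.groupby("group")["hired"].mean()

# Disparate-impact ratio: lowest selection rate divided by the highest.
# Values below roughly 0.8 (the "four-fifths" rule of thumb) warrant review.
ratio = rates.min() / rates.max()
print(rates)
print(f"Disparate-impact ratio: {ratio:.2f}")
```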
Look for attributes in your data that could act as proxies for sensitive attributes like race, gender or sexual orientation. For example, ZIP codes and income levels are often correlated with race. If your AI makes decisions based on proxies like these, it can negatively impact marginalized groups. Remove or account for proxy attributes whenever possible.
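A simple way to spot proxy attributes is to check how strongly a candidate feature predicts a sensitive attribute. The sketch below (hypothetical file and column names) flags ZIP codes whose rows are dominated by a single racial group:

```python
import pandas as pd

# Hypothetical applicant data; file and column names are illustrative only.
df = pd.read_csv("applicants.csv")

# If the racial makeup varies sharply across ZIP codes, ZIP code can act as
# a proxy for race even when race itself is excluded from the model.
makeup_by_zip = pd.crosstab(df["zip_code"], df["race"], normalize="index")

# A crude red flag: ZIP codes where a single group accounts for >90% of rows.
flagged = makeup_by_zip[makeup_by_zip.max(axis=1) > 0.9]
print(f"{len(flagged)} ZIP codes are dominated by a single group and may leak race")
```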
Once your AI system is built, test it to make sure there are no unintended consequences for different groups. See if error rates, false positives/negatives or other impacts differ significantly based on attributes like race, age, gender, etc. If you find disparate impact, you’ll need to re-examine your data and algorithms to address the root causes.
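A basic version of this check is to compute error rates separately for each group on a held-out evaluation set. The sketch below uses hypothetical data and simple per-group rates; in practice you would plug in your own evaluation results:

```python
import pandas as pd

# Hypothetical evaluation results: true labels, model predictions and a
# demographic attribute recorded for auditing purposes only.
results = pd.DataFrame({
    "group":     ["a", "a", "a", "a", "b", "b", "b", "b"],
    "actual":    [1, 0, 1, 0, 1, 0, 1, 0],
    "predicted": [1, 0, 1, 1, 0, 0, 1, 1],
})

def per_group_rates(g: pd.DataFrame) -> pd.Series:
    # Overall error rate plus the share of rows that are false positives
    # or false negatives, computed within a single group.
    return pd.Series({
        "error_rate": (g["actual"] != g["predicted"]).mean(),
        "false_pos_share": ((g["predicted"] == 1) & (g["actual"] == 0)).mean(),
        "false_neg_share": ((g["predicted"] == 0) & (g["actual"] == 1)).mean(),
    })

# Large gaps between the rows of this table point to disparate impact.
print(results.groupby("group")[["actual", "predicted"]].apply(per_group_rates))
```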
Explain how your AI systems work to stakeholders. Be open about data sources, decision processes and potential issues. Only by acknowledging and understanding the problem can we make progress toward fair and ethical AI.
A policy-driven approach is essential for preventing bias in big data as AI and ML are increasingly integrated into business operations. This approach involves establishing ethical standards, best practices for data collection and model development, regular model evaluation, ongoing monitoring and collaboration among stakeholders.
Progress Corticon, a business rules management system, can help businesses implement this by providing a platform for defining and managing business rules and policies. By leveraging Corticon’s intuitive interface, businesses can design, model and maintain operational decisions and the business rules behind them, ensuring transparency and fairness in their decision-making processes.
Test drive Corticon today.
John Iwuozor is a freelance writer for cybersecurity and B2B SaaS brands. He has written for a host of top brands, the likes of ForbesAdvisor, Technologyadvice and Tripwire, among others. He’s an avid chess player and loves exploring new domains.