Ethics in Data Science

Ethics in Data Science#

Exercise#

Understand Data Ethics#

Exercise#

Split in 3 groups and open chapter 1 from the book.

Group 1: Data Science by Whom?

Group 2: Data Science for Whom?

Group 3: Data Science with Whose Interests and Goals?

https://data-feminism.mitpress.mit.edu/pub/vi8obxh7/release/4

Summarise the text as answers to these 3 questions:

1. The problem
2. One example
3. A possible solution


Definition

Data ethics refers to the moral and responsible use of data. It involves considering the implications and consequences of collecting, storing, analyzing, and disseminating data, especially in the context of technology, data science, and artificial intelligence. The goal of data ethics is to ensure that data practices are aligned with principles such as fairness, transparency, accountability, and privacy.

From: https://atlan.com/data-ethics-101/

Key aspects

Fairness
Transparency
Privacy
Security
Accountability
Consent
Data Governance
Societal Impact

»Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that’s something only humans can provide. We have to explicitly embed better values into our algorithms, creating Big Data models that follow our ethical lead. Sometimes that will mean putting fairness ahead of profit.«

Cathy O’Neil, Weapons of Math Destruction

Examples of ethical concerns#

Fairness:

Issue: Ensuring that data and algorithms are not biased and do not discriminate against individuals or groups.

Example: A loan application model trained on historical lending data approves applicants from one demographic group at a higher rate than equally qualified applicants from another group.

Transparency:

Issue: Making data processes and decisions understandable and visible to stakeholders, reducing opacity in algorithms and data usage.

Example: A loan application system uses a black-box machine learning model that denies loans, but customers are not told how decisions were made.

Privacy:

Issue: Respecting individuals’ rights to control their personal information and ensuring that data is handled securely.

Example: A fitness app collects users’ location and health data without clearly informing them of the extent of data collection.

Security:

Issue: Protecting data from unauthorized access, ensuring data integrity, and implementing measures to prevent breaches.

Example: A healthcare provider experiences a data breach, exposing patients’ personal and medical records.

Accountability:

Issue: Holding individuals and organizations responsible for the ethical use of data, including being able to explain and justify decisions made using data.

Example: An AI-powered resume screening tool unfairly rejects qualified candidates, and no team or individual takes responsibility for the error.

Consent:

Issue: Obtaining informed consent from individuals before collecting, processing, or sharing their data.

Example: A smart home device records conversations without users’ explicit permission.

Data Governance:

Issue: Establishing frameworks and policies for how data is managed, including defining roles, responsibilities, and processes.

Example: A company shares customer data with third-party vendors without proper tracking, leading to unauthorized access.

Societal impact:

Issue: Considering the broader social implications of data use and ensuring that it contributes positively to society, considering unintended consequences.

Example: A predictive policing algorithm disproportionately targets certain neighborhoods, leading to over-policing and reinforcing negative stereotypes.

What can we do?#

Build better models: fairness over accuracy.

Collect better data: garbage in, garbage out.

Hold algorithms accountable and revise them.

Challenge institutions that misuse data.

Demand more regulation.

… and more

»How do we start to regulate the mathematical models that run more and more of our lives? I would suggest that the process begin with the modelers themselves. Like doctors, data scientists should pledge a Hippocratic Oath, one that focuses on the possible misuses and misinterpretations of their models. Following the market crash of 2008, two financial engineers, Emanuel Derman and Paul Wilmott, drew up such an oath. It reads:

I will remember that I didn’t make the world, and it doesn’t satisfy my equations.

Though I will use models boldly to estimate value, I will not be overly impressed by mathematics.

I will never sacrifice reality for elegance without explaining why I have done so.

Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights.

I understand that my work may have enormous effects on society and the economy, many of them beyond my comprehension.«

Cathy O’Neil, Weapons of Math Destruction

Practical examples of possible solutions to some ethical concerns#

Fairness:

Perform bias audits on the dataset and algorithm to identify biases.
Use re-sampling or re-weighting techniques to correct for imbalanced data.
Introduce fairness constraints into the algorithm, such as equal opportunity or demographic parity.
Use balanced data.
Regularly monitor the model’s performance to detect any unfair patterns after deployment.
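
The audit and re-weighting steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a full fairness toolkit: the toy data and the function names (`selection_rates`, `true_positive_rates`, `max_gap`, `reweighing_weights`) are assumptions made for this example. The demographic parity gap compares how often each group receives a positive decision, the equal opportunity gap compares true positive rates, and the re-weighting follows the common "expected over observed frequency" scheme.

```python
from collections import defaultdict


def selection_rates(groups, predictions):
    """Share of positive predictions (1 = approved) per group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for g, p in zip(groups, predictions):
        totals[g] += 1
        positives[g] += p
    return {g: positives[g] / totals[g] for g in totals}


def true_positive_rates(groups, labels, predictions):
    """Share of truly positive cases that were predicted positive, per group."""
    totals, hits = defaultdict(int), defaultdict(int)
    for g, y, p in zip(groups, labels, predictions):
        if y == 1:
            totals[g] += 1
            hits[g] += p
    return {g: hits[g] / totals[g] for g in totals}


def max_gap(rates):
    """Largest difference between any two groups; 0.0 means parity."""
    return max(rates.values()) - min(rates.values())


def reweighing_weights(groups, labels):
    """Weight each (group, label) cell by expected / observed frequency,
    so that group and label look independent after weighting."""
    n = len(labels)
    group_counts, label_counts = defaultdict(int), defaultdict(int)
    cell_counts = defaultdict(int)
    for g, y in zip(groups, labels):
        group_counts[g] += 1
        label_counts[y] += 1
        cell_counts[(g, y)] += 1
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * count)
        for (g, y), count in cell_counts.items()
    }


# Hypothetical toy data: group membership, true outcome, model decision.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 0, 0, 1, 1, 1, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

dp_gap = max_gap(selection_rates(groups, predictions))
eo_gap = max_gap(true_positive_rates(groups, labels, predictions))
weights = reweighing_weights(groups, labels)

print(f"demographic parity gap: {dp_gap:.2f}")
print(f"equal opportunity gap:  {eo_gap:.2f}")
print("training weights per (group, label):", weights)
```

A gap close to zero suggests the groups are treated similarly on that metric. What counts as an acceptable gap is a judgment call that depends on the application and cannot be read off the code.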

Transparency:

Use explainable AI techniques, such as SHAP (SHapley Additive exPlanations) or LIME (Local Interpretable Model-agnostic Explanations), to provide users with insights into why a particular decision was made.
Create clear documentation for stakeholders and end-users about how the model works and what data it relies on.
Provide appeal mechanisms for customers to challenge decisions.
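
To make the idea of an explanation concrete, here is a hand-rolled sketch for the simplest possible case, a linear scoring model. It does not use the SHAP or LIME libraries. It only shows the additive idea behind such tools: a decision is split into a baseline score plus one contribution per feature. The model weights, feature names, and applicant values are all hypothetical.

```python
def explain_linear_score(weights, bias, baseline, applicant):
    """Split a linear score into a baseline score plus one contribution per feature.

    contribution[f] = weights[f] * (applicant[f] - baseline[f])
    """
    baseline_score = bias + sum(weights[f] * baseline[f] for f in weights)
    contributions = {f: weights[f] * (applicant[f] - baseline[f]) for f in weights}
    score = baseline_score + sum(contributions.values())
    return baseline_score, contributions, score


# Hypothetical loan-scoring model and values, for illustration only.
weights = {"income_k": 0.02, "debt_ratio": -2.0, "years_employed": 0.1}
bias = 0.5
baseline = {"income_k": 50, "debt_ratio": 0.3, "years_employed": 5}  # "average" applicant
applicant = {"income_k": 40, "debt_ratio": 0.6, "years_employed": 2}

baseline_score, contributions, score = explain_linear_score(
    weights, bias, baseline, applicant
)

# List features from the most negative to the most positive influence.
for feature, value in sorted(contributions.items(), key=lambda item: item[1]):
    print(f"{feature:>15}: {value:+.2f}")
print(f"score {score:.2f} vs. baseline {baseline_score:.2f}")
```

An output like this could back a plain-language message to the customer, for example that the debt ratio lowered the score the most. Black-box models need dedicated tooling to produce comparable per-feature attributions.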

Privacy:

Implement data minimization principles—only collect data necessary for the service provided.
Use encryption to secure sensitive data in storage and during transmission.
Provide privacy policies in plain language and let users easily access, manage, and delete their data.
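
Data minimization can be enforced in code before anything is stored. The sketch below assumes a hypothetical fitness service that only needs a user identifier, a date, and a step count. It drops every other field and replaces the identifier with a keyed hash (HMAC-SHA256 from the Python standard library). The field names and the key are placeholders.

```python
import hashlib
import hmac

# Assumption: these are the only fields the (hypothetical) service needs.
REQUIRED_FIELDS = {"user_id", "date", "step_count"}


def pseudonymize(user_id, secret_key):
    """Replace an identifier with a keyed hash so raw IDs are never stored."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(record, secret_key):
    """Keep only the required fields and pseudonymize the user identifier."""
    kept = {key: value for key, value in record.items() if key in REQUIRED_FIELDS}
    kept["user_id"] = pseudonymize(str(kept["user_id"]), secret_key)
    return kept


# Placeholder key; a real key would come from a secrets manager, not source code.
SECRET_KEY = b"replace-with-a-key-from-a-secrets-manager"

raw_record = {
    "user_id": "alice@example.com",
    "date": "2024-05-01",
    "step_count": 8421,
    "gps_trace": [(52.52, 13.40), (52.53, 13.41)],  # not needed, so dropped
    "heart_rate": [62, 75, 90],                     # not needed, so dropped
}

stored_record = minimize(raw_record, SECRET_KEY)
print(sorted(stored_record))
```

Pseudonymized records can still be linked back to a person by whoever holds the key, so this reduces risk but does not make the data anonymous.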

Security:

Apply multi-factor authentication and encryption to protect sensitive data.
Conduct regular penetration testing and security audits to identify vulnerabilities.
Ensure incident response plans are in place to address and mitigate breaches quickly.

Accountability:

Establish clear accountability frameworks, assigning roles and responsibilities to specific teams or individuals for the design, deployment, and monitoring of models.
Require model documentation and audits to track who made what changes and why.
Implement audit logs and make performance reports available to both internal and external stakeholders.
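
One way to picture an audit log that records who changed what and why is an append-only list in which every entry carries a hash of the previous one, so later edits to the history become detectable. This is a minimal sketch under that assumption. The entry fields and function names are invented for the example, and a real log would also record timestamps and live in protected storage.

```python
import hashlib
import json


def _digest(body):
    """Stable SHA-256 hash of an entry's content."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log, actor, action, reason):
    """Append an entry whose hash covers its content and the previous entry's hash."""
    previous_hash = log[-1]["hash"] if log else ""
    body = {
        "actor": actor,
        "action": action,
        "reason": reason,
        "previous_hash": previous_hash,
    }
    log.append({**body, "hash": _digest(body)})


def verify(log):
    """Return True if no entry was altered, removed, or reordered."""
    previous_hash = ""
    for entry in log:
        body = {key: value for key, value in entry.items() if key != "hash"}
        if entry["previous_hash"] != previous_hash or entry["hash"] != _digest(body):
            return False
        previous_hash = entry["hash"]
    return True


# Hypothetical model-change history.
audit_log = []
append_entry(audit_log, "data-team", "retrained model v2", "quarterly refresh")
append_entry(audit_log, "risk-team", "raised approval threshold", "bias audit finding")

print("log intact:", verify(audit_log))
```

If someone later rewrites the `reason` of the first entry, `verify` returns `False`, which gives auditors a simple integrity check.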

Consent:

Ensure informed and explicit consent by giving users clear information about what data will be collected, how it will be used, and with whom it will be shared.
Use opt-in mechanisms rather than opt-out for data collection.
Provide easy-to-use dashboards where users can manage their data preferences.
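
The difference between opt-in and opt-out shows up directly in the default value. In the sketch below, a user's consent record starts empty, so nothing is collected until the user explicitly grants a purpose, and revoking it stops collection again. The class, function, and purpose names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Purposes a user has explicitly agreed to; empty by default (opt-in)."""

    granted: set = field(default_factory=set)

    def grant(self, purpose):
        self.granted.add(purpose)

    def revoke(self, purpose):
        self.granted.discard(purpose)

    def allows(self, purpose):
        return purpose in self.granted


def collect(record, purpose, consent):
    """Return the record only if the user consented to this purpose, else None."""
    return record if consent.allows(purpose) else None


consent = ConsentRecord()
recording = {"device": "smart-speaker", "audio_seconds": 12}

before_opt_in = collect(recording, "voice_recording", consent)
consent.grant("voice_recording")
after_opt_in = collect(recording, "voice_recording", consent)
consent.revoke("voice_recording")
after_revoke = collect(recording, "voice_recording", consent)

print(before_opt_in, after_opt_in, after_revoke)
```

A preferences dashboard would then only need to call `grant` and `revoke` for each purpose the user toggles.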

Data Governance:

Implement a data governance framework that defines how data is collected, stored, accessed, and shared.
Use data catalogs and metadata management tools to track data flows and ensure compliance with policies.
Assign data stewards or data owners to oversee data usage and enforce governance rules.

Societal impact:

Assess the societal impact of the model using impact assessments that involve diverse stakeholders, including community representatives.
Regularly evaluate the system’s outcomes and unintended consequences and adjust the algorithm as necessary.

Terminology#

Cultural Agility

Cultural agility is the ability to understand multiple local contexts and work within them to obtain consistent organizational results.

Diversity

The presence of different and multiple characteristics that make up individual and collective identities, including race, gender, age, religion, sexual orientation, ethnicity, national origin, socioeconomic status, language, and physical ability.

Inclusion

Is creating environments in which any individual or group can be and feel welcomed, respected, supported and valued to participate fully.

Equity

The process of identifying and removing the barriers that create disparities in the access to resources and means, and the achievement of fair treatment and equal opportunities to thrive.

Intersectionality

The intertwining of social identities such as gender, race, ethnicity, social class, religion, sexual orientation or gender identity, which result in unique experiences, opportunities, barriers or social inequality.

Be Inspired, J. K. Nelson: Building a Foundation of Shared Language, Cultural Awareness, & Belonging

»Intersectionality is a lens, a prism, for seeing the way in which various forms of inequality often operate together and exacerbate each other. We tend to talk about race inequality as separate from inequality based on gender, class, sexuality or immigrant status. What’s often missing is how some people are subject to all of these, and the experience is not just the sum of its parts.«

Kimberlé Williams Crenshaw

Terminology#

Oppression/Subjugation

A system of supremacy and discrimination for the benefit of a limited dominant group perpetuated through differential or unjust treatment, ideology and institutional control.

Privilege

Unearned access to resources (social power) only readily available to some people as a result of their advantaged social group membership.

Be Inspired, J. K. Nelson: Building a Foundation of Shared Language, Cultural Awareness, & Belonging

Privilege

Advantage
Opportunity
Inclusion

Subjugation

Disadvantage
Humiliation
Exclusion

Oppression vs. Privilege#

Oppression

→ Identity that has historically experienced more barriers/stigma

→ Identity is a source of exclusion

→ Identity is very present

Privilege

→ Identity that has historically experienced more power

→ Identity is a source of strength

→ Identity seems unimportant

Some of the privileges out there#

→ Identity that has historically experienced more barriers/stigma

→ Identity is a source of exclusion

→ Identity is very present


	← Education →
	← Gender →
	← Ethnicity & race →
	← Socioeconomic status →
	← Disability status →
	← Sexual orientation →
	← Age →
	← National origin →

→ Identity that has historically experienced more power

→ Identity is a source of strength

→ Identity seems unimportant

Axes of privilege#


Low	← Education →	High
Non-binary	← Gender →	Male
Non white passing	← Ethnicity & race →	White or white passing
Low	← Socioeconomic status →	High
Disabled	← Disability status →	Not disabled
LGBTQI	← Sexual orientation →	Hetero
Old	← Age →	Young
Stigmatised foreign country	← National origin →	Same as current country

Exercise#

Identify your privilege#

Axes of privilege#

-1	0	1
Low	← Education →	High
Non-binary	← Gender →	Male
Non white passing	← Ethnicity & race →	White or white passing
Low	← Socioeconomic status →	High
Disabled	← Disability status →	Not disabled
LGBTQI	← Sexual orientation →	Hetero
Old	← Age →	Young
Stigmatised foreign country	← National origin →	Same as current country

»Data is not going away. Nor are computers - much less mathematics. Predictive models are, increasingly, the tools we will be relying on to run our institutions, deploy our resources, and manage our lives. But as I’ve tried to show throughout this book, these models are constructed not just from data but from the choices we make about the type of data to pay attention to - and which to leave out. Those choices are not just about logistics, profits, and efficiency. They are fundamentally moral.«

Cathy O’Neil, Weapons of Math Destruction
