Simpson's paradox

by jhleeatl 2024. 4. 24.

How Data Trends Can Mislead

Simpson's Paradox is a fascinating statistical phenomenon that can lead to misleading conclusions if not properly understood. I'll explore what Simpson's Paradox is and why it's important for data analysts and researchers to be aware of it.

 


What is Simpson's Paradox?

Simpson's Paradox occurs when a trend that appears in different groups of data disappears or reverses when the groups are combined. In other words, the relationship between two variables observed within subgroups can be different from the relationship observed when the subgroups are combined.

 


Making Sense of the Paradox

Let's consider an example involving admission rates at two departments within a university. Suppose Department A has a higher overall admission rate than Department B. However, when we break the data down by gender, both male and female applicants turn out to have a higher admission rate in Department B than in Department A. This seems paradoxical at first glance, because a department that admits a larger share of every subgroup would be expected to admit a larger share overall.
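
To make the reversal concrete, here is a minimal Python sketch with made-up counts (not real admissions data) in which Department A has the higher overall admission rate while Department B has the higher rate for both women and men. The counts, the `data` layout, and the helper names `rate` and `overall_rate` are all assumptions chosen for illustration.

```python
# Hypothetical counts, invented for illustration only.
# Layout: data[department][gender] = (applicants, admitted)
data = {
    "A": {"women": (90, 72), "men": (10, 2)},
    "B": {"women": (10, 9), "men": (90, 27)},
}


def rate(applicants, admitted):
    """Admission rate as a fraction between 0 and 1."""
    return admitted / applicants


def overall_rate(dept):
    """Pool both genders within one department and compute its admission rate."""
    applicants = sum(n for n, _ in data[dept].values())
    admitted = sum(k for _, k in data[dept].values())
    return rate(applicants, admitted)


for dept in data:
    by_gender = {g: round(rate(*counts), 2) for g, counts in data[dept].items()}
    print(dept, by_gender, "overall:", round(overall_rate(dept), 2))
```

With these numbers, Department B admits 90% of women and 30% of men, against 80% and 20% in Department A, yet the pooled rates are 74% for A and 36% for B. The reversal comes entirely from who applies where: most of A's applicants belong to the group with the higher admission rate, and most of B's belong to the group with the lower one.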

Understanding the Cause

The paradox arises from a lurking variable, or confounding factor, that influences the relationship between the variables being studied. In the admissions example, the lurking variable could be the distribution of applicants' qualifications, or gender-based preferences for certain departments, so that each department draws a very different mix of applicants.
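
One way to see the lurking variable at work is to write each department's overall rate as a weighted average of its per-gender rates, where the weights are each gender's share of that department's applicants. The sketch below uses hypothetical rates and shares, and the 50/50 reference mix is an arbitrary assumption (a simple form of direct standardization), not something taken from real data.

```python
# Hypothetical per-gender admission rates and applicant shares per department.
rates = {
    "A": {"women": 0.80, "men": 0.20},
    "B": {"women": 0.90, "men": 0.30},
}
shares = {
    "A": {"women": 0.90, "men": 0.10},
    "B": {"women": 0.10, "men": 0.90},
}


def weighted_overall(dept, mix):
    """Overall rate = sum over genders of (share of applicants * admission rate)."""
    return sum(mix[g] * rates[dept][g] for g in rates[dept])


# Using each department's own applicant mix, A looks better overall.
actual = {d: weighted_overall(d, shares[d]) for d in rates}

# Applying the same 50/50 mix to both departments removes the effect of the mix,
# and B comes out ahead, in line with the per-gender rates.
equal_mix = {"women": 0.5, "men": 0.5}
standardized = {d: weighted_overall(d, equal_mix) for d in rates}

for d in rates:
    print(d, "actual:", round(actual[d], 2), "standardized:", round(standardized[d], 2))
```

The per-gender rates never change between the two calculations. Only the weights do, which is why a difference in applicant mix alone is enough to flip the aggregate comparison.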

Implications for Data Analysis

Simpson's Paradox serves as a reminder that aggregated data can sometimes mask important underlying trends and relationships. It underscores the importance of checking whether a comparison still holds within relevant subgroups before drawing conclusions from the combined data.
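
As a rough check before trusting an aggregate comparison, one could compare the direction of the pooled result with the direction inside each subgroup. The helper below is a minimal sketch under stated assumptions: the name `is_reversed` and the `counts[group][stratum] = (trials, successes)` layout are hypothetical, and it only flags the strict case where every subgroup points one way and the pooled data points the other.

```python
def sign(x):
    """Return 1, 0, or -1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def is_reversed(counts, group_x, group_y):
    """Return True when every stratum favours one group but the pooled data favours the other.

    counts[group][stratum] = (trials, successes)
    """

    def pooled(group):
        trials = sum(t for t, _ in counts[group].values())
        successes = sum(s for _, s in counts[group].values())
        return successes / trials

    def stratum_rate(group, stratum):
        trials, successes = counts[group][stratum]
        return successes / trials

    overall = sign(pooled(group_x) - pooled(group_y))
    strata = counts[group_x].keys() & counts[group_y].keys()
    within = {sign(stratum_rate(group_x, s) - stratum_rate(group_y, s)) for s in strata}
    # All strata must agree with each other and disagree with the pooled direction.
    return overall != 0 and within == {-overall}


# Hypothetical counts: pooled data favours A, each stratum favours B.
example = {
    "A": {"women": (90, 72), "men": (10, 2)},
    "B": {"women": (10, 9), "men": (90, 27)},
}
print(is_reversed(example, "A", "B"))
```

A check like this only covers the variables you thought to stratify by. It cannot reveal a lurking variable that was never recorded.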

Conclusion

In conclusion, Simpson's Paradox highlights the complexity of data analysis and the potential pitfalls of oversimplified interpretations. By understanding this paradox and staying alert to it, data analysts can make their conclusions more robust and less likely to mislead.
