Pearson correlation coefficient
Pearson correlation coefficient, also known as Pearson's r, is a measure of the strength and direction of association that exists between two continuous variables. It is a method of correlation: a statistical technique used to determine the degree to which two variables are related. The coefficient values range from -1 to 1, where -1 indicates a perfect negative linear relationship, 0 indicates no linear relationship, and 1 indicates a perfect positive linear relationship.
Definition[edit | edit source]
The Pearson correlation coefficient is defined as the covariance of the two variables divided by the product of their standard deviations. Mathematically, it is represented as:
\[ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \]
where:
- \(n\) is the number of data points,
- \(x\) and \(y\) are the variables,
- \(\sum\) denotes the summation.
Interpretation[edit | edit source]
The value of the Pearson correlation coefficient indicates the strength and direction of the linear relationship between two variables. A value close to 1 implies a strong positive relationship, a value close to -1 implies a strong negative relationship, and a value around 0 implies no linear relationship.
Assumptions[edit | edit source]
The calculation and interpretation of Pearson's r assume that:
- Both variables are normally distributed.
- The relationship between the variables is linear.
- The data is homoscedastic, meaning the variance within each variable is the same.
Applications[edit | edit source]
Pearson's r is widely used in the fields of statistics, psychology, medicine, and social sciences to measure the linear relationship between variables. It is particularly useful in research studies that aim to determine the strength and direction of relationships among continuous variables.
Limitations[edit | edit source]
While Pearson's r is a powerful tool for measuring linear relationships, it has limitations:
- It can only measure linear relationships and may not accurately represent non-linear relationships.
- It is sensitive to outliers, which can significantly affect the coefficient value.
- It assumes that the relationship between variables is linear and may not accurately reflect the complexity of real-world data.
See also[edit | edit source]
References[edit | edit source]
Search WikiMD
Ad.Tired of being Overweight? Try W8MD's physician weight loss program.
Semaglutide (Ozempic / Wegovy and Tirzepatide (Mounjaro / Zepbound) available.
Advertise on WikiMD
WikiMD's Wellness Encyclopedia |
Let Food Be Thy Medicine Medicine Thy Food - Hippocrates |
Translate this page: - East Asian
中文,
日本,
한국어,
South Asian
हिन्दी,
தமிழ்,
తెలుగు,
Urdu,
ಕನ್ನಡ,
Southeast Asian
Indonesian,
Vietnamese,
Thai,
မြန်မာဘာသာ,
বাংলা
European
español,
Deutsch,
français,
Greek,
português do Brasil,
polski,
română,
русский,
Nederlands,
norsk,
svenska,
suomi,
Italian
Middle Eastern & African
عربى,
Turkish,
Persian,
Hebrew,
Afrikaans,
isiZulu,
Kiswahili,
Other
Bulgarian,
Hungarian,
Czech,
Swedish,
മലയാളം,
मराठी,
ਪੰਜਾਬੀ,
ગુજરાતી,
Portuguese,
Ukrainian
WikiMD is not a substitute for professional medical advice. See full disclaimer.
Credits:Most images are courtesy of Wikimedia commons, and templates Wikipedia, licensed under CC BY SA or similar.
Contributors: Prab R. Tumpati, MD