Matthews correlation coefficient


The Matthews correlation coefficient (MCC) is a measure used in machine learning and bioinformatics to assess the quality of binary (two-class) classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure that can be used even when the classes are of very different sizes. The MCC is in essence a correlation coefficient between the observed and predicted binary classifications, and it returns a value between -1 and +1. A coefficient of +1 represents a perfect prediction, 0 a prediction no better than random, and -1 total disagreement between prediction and observation.
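The correlation interpretation can be checked numerically. The following is a minimal sketch, assuming both label vectors are encoded as 0 and 1 and that each contains both classes. The helper name `pearson` and the sample labels are illustrative and are not taken from any library or dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant, so the denominator is non-zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical labels: 1 = positive class, 0 = negative class.
observed  = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

# Correlation between observation and prediction, which is the MCC.
mcc_as_correlation = pearson(observed, predicted)

# Extreme cases: identical labels and fully inverted labels.
perfect = pearson(observed, observed)
inverted = pearson(observed, [1 - y for y in observed])

print(mcc_as_correlation, perfect, inverted)
```

For these labels there are 2 true positives, 4 true negatives, 1 false positive and 1 false negative, and the correlation works out to 7/15 (about 0.47). Identical labels give +1 and fully inverted labels give -1.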

Definition

The Matthews Correlation Coefficient is calculated using the formula:

\[ MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP) \times (TP+FN) \times (TN+FP) \times (TN+FN)}} \]

where:

  • TP = True Positives
  • TN = True Negatives
  • FP = False Positives
  • FN = False Negatives
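The formula translates directly into code. The sketch below uses a hypothetical function name, `mcc_from_counts`, and made-up counts. Returning 0 when the denominator is zero is a common convention assumed here; the formula itself leaves that case undefined.

```python
import math

def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: a zero row or column sum yields 0.
        return 0.0
    return numerator / denominator

# Hypothetical counts for a reasonably good classifier.
score = mcc_from_counts(tp=90, tn=80, fp=20, fn=10)
print(round(score, 3))
```

With these counts the numerator is 90 × 80 - 20 × 10 = 7000 and the denominator is the square root of 110 × 100 × 100 × 90, which gives a score of roughly 0.70.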

Application

The MCC is used in several fields. In bioinformatics it is used to evaluate the performance of sequence search methods, and in machine learning and data mining it is used to assess classification models. It is particularly useful when the classes are imbalanced, that is, when instances of one class significantly outnumber instances of the other.

Advantages

  • Balanced: The MCC uses all four cells of the confusion matrix, so it reflects performance on both the positive and the negative class regardless of their relative sizes.
  • Interpretable: The value lies on a fixed scale, where +1 is a perfect prediction, 0 is no better than random, and -1 is total disagreement, so scores are easy to read and compare.
  • Applicable to imbalanced datasets: Unlike accuracy, the MCC is not inflated by a classifier that simply favors the majority class.
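The behavior on imbalanced data can be illustrated with a small sketch. The counts are invented for illustration: 1000 samples with only 50 positives, and a classifier that labels almost everything negative. The function name `mcc_from_counts` is hypothetical.

```python
import math

def mcc_from_counts(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0 if undefined (assumed convention)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0

# Hypothetical imbalanced test set: 50 positives, 950 negatives.
# The classifier finds only 5 of the 50 positives.
tp, fn = 5, 45
fp, tn = 5, 945

accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = mcc_from_counts(tp, tn, fp, fn)

print(f"accuracy = {accuracy:.2f}, MCC = {mcc:.2f}")
```

Accuracy is 0.95 because the many negatives dominate the count of correct predictions, while the MCC is only about 0.21, which reflects how poorly the positive class is recovered.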

Limitations

  • Sensitivity to small sample sizes: With very few samples, a single changed prediction can shift the MCC substantially, so the value may be overly optimistic or pessimistic. The formula is also undefined when any of the four sums in the denominator is zero.
  • Binary formulation only: The formula above is defined only for binary classification tasks. Multi-class problems require a multi-class generalization of the MCC or other measures derived from the multi-class confusion matrix.
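One multi-class version of the MCC is the generalization sometimes called the R_K statistic, which is computed from a K × K confusion matrix. The sketch below is an illustration under stated assumptions and not a reference implementation: the function name `multiclass_mcc` is hypothetical, rows are taken to be true classes and columns predicted classes, and a zero denominator is mapped to 0 by assumed convention.

```python
import math

def multiclass_mcc(confusion):
    """Multi-class MCC from a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)            # all samples
    correct = sum(confusion[i][i] for i in range(k))      # diagonal
    true_counts = [sum(confusion[i]) for i in range(k)]   # row sums
    pred_counts = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # column sums

    numerator = correct * total - sum(p * t for p, t in zip(pred_counts, true_counts))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    return numerator / denominator if denominator else 0.0

# With two classes the result matches the binary formula:
# TN = 4, FP = 1, FN = 1, TP = 2 gives (2*4 - 1*1) / sqrt(3*3*5*5) = 7/15.
binary_score = multiclass_mcc([[4, 1], [1, 2]])

# A hypothetical three-class matrix with no errors scores +1.
perfect_score = multiclass_mcc([[3, 0, 0], [0, 2, 0], [0, 0, 4]])

print(binary_score, perfect_score)
```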

Comparison with Other Metrics

The MCC is often compared with other classification metrics such as precision, recall, the F1 score, and accuracy. Accuracy is the most intuitive performance measure, but it can be misleading when the classes are imbalanced. The F1 score is the harmonic mean of precision and recall and balances the two, but it does not take true negatives into account. The MCC, by considering all four cells of the confusion matrix, provides a more comprehensive measure of classification performance.
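The effect of ignoring true negatives can be seen in a short sketch. The two scenarios below use invented counts that share the same TP, FP and FN and differ only in TN. The function names `f1_from_counts` and `mcc_from_counts` are hypothetical.

```python
import math

def f1_from_counts(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall (TN is not used)."""
    return 2 * tp / (2 * tp + fp + fn)

def mcc_from_counts(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0 if undefined (assumed convention)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0

tp, fp, fn = 90, 10, 10

# Scenario A: almost no negatives are recognized (TN = 5).
f1_a = f1_from_counts(tp, fp, fn)
mcc_a = mcc_from_counts(tp, 5, fp, fn)

# Scenario B: many negatives are recognized (TN = 900).
f1_b = f1_from_counts(tp, fp, fn)
mcc_b = mcc_from_counts(tp, 900, fp, fn)

print(f"A: F1 = {f1_a:.2f}, MCC = {mcc_a:.2f}")
print(f"B: F1 = {f1_b:.2f}, MCC = {mcc_b:.2f}")
```

The F1 score is 0.90 in both scenarios, while the MCC rises from about 0.23 to about 0.89 once the negative class is predicted well.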




Contributors: Prab R. Tumpati, MD