{"id":17226,"date":"2026-06-23T05:23:04","date_gmt":"2026-06-23T05:23:04","guid":{"rendered":"https:\/\/www.smilefoundationindia.org\/blog\/?p=17226"},"modified":"2026-06-26T06:03:00","modified_gmt":"2026-06-26T06:03:00","slug":"pc-mahalanobis-distance","status":"publish","type":"post","link":"https:\/\/www.smilefoundationindia.org\/blog\/pc-mahalanobis-distance\/","title":{"rendered":"Understanding PC Mahalanobis&#8217; Contribution to Data Science: Mahalanobis Distance"},"content":{"rendered":"\n<p><strong>PC Mahalanobis<\/strong> gave the world a statistical tool that is, by almost any measure, more useful today than when he first proposed it in the 1930s. The Mahalanobis distance \u2014 a method for measuring how far a data point lies from a distribution, while accounting for the shape and spread of that data \u2014 has become a foundational technique in modern data science, machine learning and anomaly detection.<\/p>\n\n\n\n<p>This blog explains what the Mahalanobis distance is, how it works, where it is used and why the man behind it deserves a more prominent place in the history of data science than he typically receives.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Who is PC Mahalanobis \u2014 Life and Legacy<\/strong><\/h2>\n\n\n\n<p>Prasanta Chandra Mahalanobis was born on 29 June 1893 in Calcutta, into a family with deep roots in Bengal&#8217;s intellectual and reform traditions. He studied physics at Presidency College, Calcutta, where his teachers included Jagadish Chandra Bose, before travelling to King&#8217;s College, Cambridge, to continue his education in mathematics and physics.<\/p>\n\n\n\n<p>It was during this period in England that a passing introduction to statistics changed the course of his life. 
He returned to India with a new conviction: that rigorous data collection and statistical reasoning could be applied to real, pressing problems in agriculture, anthropology, economics and public policy.<\/p>\n\n\n\n<p>What followed was a career of extraordinary breadth. <strong>PC Mahalanobis<\/strong> founded the Indian Statistical Institute in Calcutta in 1931, established India&#8217;s first statistical journal and built the survey methods that laid the groundwork for India&#8217;s National Sample Survey \u2014 systems that continue to inform government policy to this day. He also played a central role in shaping India&#8217;s Second Five Year Plan, where his macro-economic model prioritised heavy industry investment as the foundation for long-term growth.<\/p>\n\n\n\n<p>He received the Weldon Medal from Oxford University in 1944, was elected a Fellow of the Royal Society in 1945 and was awarded the Padma Vibhushan in 1968. He died in June 1972, one day before what would have been his seventy-ninth birthday.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Founding of the Indian Statistical Institute<\/strong><\/h3>\n\n\n\n<p>The Indian Statistical Institute began in December 1931, growing out of an informal statistical laboratory that Mahalanobis had set up in his own room at Presidency College. Initially registered as a non-profit learned society in April 1932, it operated on a budget of just 238 rupees in its first year.<\/p>\n\n\n\n<p>Over the following decades, ISI expanded into a globally recognised centre for statistical and quantitative research. In 1933, it launched Sankhya \u2014 India&#8217;s first academic statistics journal, still published today \u2014 which carried many of Mahalanobis&#8217;s landmark papers. His 1936 paper on the generalised distance, the formulation now known as the Mahalanobis distance, appeared in the Proceedings of the National Institute of Sciences of India.<\/p>\n\n\n\n<p>ISI&#8217;s Kolkata campus became a hub for interdisciplinary research spanning economics, genetics, computer science and sociology. 
It trained generations of statisticians who went on to shape statistical practice in India and internationally. The Institute remains one of the most rigorous quantitative research institutions in the world.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What is Mahalanobis Distance \u2014 Simple Explanation<\/strong><\/h2>\n\n\n\n<p>The <a href=\"https:\/\/en.wikipedia.org\/wiki\/Mahalanobis_distance\" rel=\"nofollow noopener\" target=\"_blank\">Mahalanobis distance<\/a> answers a deceptively simple question: <\/p>\n\n\n\n<p>how unusual is this data point, given everything we know about the distribution it comes from?<\/p>\n\n\n\n<p>Imagine you are looking at the heights and weights of a large group of people. You want to know whether a specific individual&#8217;s measurements are typical or unusual. The challenge is that height and weight are correlated \u2014 taller people tend to weigh more \u2014 and they are measured in completely different units. A simple measurement of distance from the average would ignore both of these facts.<\/p>\n\n\n\n<p>The Mahalanobis distance solves this by accounting for the correlation between variables and the scale of each variable simultaneously. It transforms the space in which the measurement exists, stretching and rotating it to reflect the actual shape of the data, and then measures distance within that corrected space. The result is a single number that tells you, in a statistically meaningful way, how far a point lies from the centre of its distribution.<\/p>\n\n\n\n<p>A Mahalanobis distance of zero means the point is exactly at the mean of the distribution. 
The larger the value, the more unusual the observation, though what counts as a large value grows with the number of variables involved.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The Formula and How It Works<\/strong><\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img fetchpriority=\"high\" decoding=\"async\" width=\"540\" height=\"360\" src=\"https:\/\/www.smilefoundationindia.org\/blog\/wp-content\/uploads\/2026\/06\/Understanding-Mahalanobis-Distance-in-Data.png\" alt=\"PC Mahalanobis\" class=\"wp-image-17229\" srcset=\"https:\/\/www.smilefoundationindia.org\/blog\/wp-content\/uploads\/2026\/06\/Understanding-Mahalanobis-Distance-in-Data.png 540w, https:\/\/www.smilefoundationindia.org\/blog\/wp-content\/uploads\/2026\/06\/Understanding-Mahalanobis-Distance-in-Data-300x200.png 300w\" sizes=\"(max-width: 540px) 100vw, 540px\" \/><\/figure>\n\n\n\n<p>The Mahalanobis distance between a point x and a distribution with mean vector mu and covariance matrix S is defined as:<\/p>\n\n\n\n<p>D(x) = sqrt [ (x &#8211; mu)^T * S^-1 * (x &#8211; mu) ]<\/p>\n\n\n\n<p>Breaking this down:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>(x &#8211; mu)<\/strong> is the difference between the point and the mean of the distribution, a vector when multiple variables are involved.<\/li>\n\n\n\n<li><strong>S^-1<\/strong> is the inverse of the covariance matrix \u2014 the key term that accounts for how variables are spread and correlated. This is what distinguishes the Mahalanobis distance from simpler distance measures.<\/li>\n\n\n\n<li><strong>The transpose and multiplication<\/strong> combine these into a single non-negative number, and its square root is the distance.<\/li>\n<\/ul>\n\n\n\n<p>In practical terms, the covariance matrix describes the shape of the data cloud. If two variables are highly correlated, the data cloud is elongated in a particular direction. 
The inverse covariance matrix essentially normalises this shape, so that what appears to be a large distance in one direction is correctly recognised as not unusual if it follows the natural spread of the data.<\/p>\n\n\n\n<p>The result is unitless, meaning it does not depend on the original scale of any individual variable and can be compared directly across observations in the same dataset.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Mahalanobis Distance vs Euclidean Distance<\/strong><\/h3>\n\n\n\n<p>The Euclidean distance is the straightforward geometric distance between two points \u2014 the kind taught in secondary school geometry. It works reliably when variables are uncorrelated and measured on the same scale.<\/p>\n\n\n\n<p>Real-world data almost never meets both conditions simultaneously.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Feature<\/th><th>Euclidean Distance<\/th><th>Mahalanobis Distance<\/th><\/tr><\/thead><tbody><tr><td>Accounts for correlation<\/td><td>No<\/td><td>Yes<\/td><\/tr><tr><td>Scale-independent<\/td><td>No<\/td><td>Yes<\/td><\/tr><tr><td>Works with multiple variables<\/td><td>Partly<\/td><td>Fully<\/td><\/tr><tr><td>Sensitive to variable units<\/td><td>Yes<\/td><td>No<\/td><\/tr><tr><td>Handles elongated data clouds<\/td><td>No<\/td><td>Yes<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Consider a dataset where one variable ranges from 0 to 1 and another ranges from 0 to 10,000. Euclidean distance would be dominated almost entirely by the larger-scale variable, making the smaller one irrelevant. Mahalanobis distance normalises for this automatically.<\/p>\n\n\n\n<p>Similarly, if two variables are highly correlated, a Euclidean measurement might flag a point as an outlier simply because it deviates from the mean in both variables simultaneously \u2014 even if that joint deviation is perfectly consistent with the correlation structure of the data. 
Mahalanobis distance would correctly identify this as a typical observation.<\/p>\n\n\n\n<p>This is why data scientists prefer Mahalanobis distance for multivariate problems and why it has remained a standard tool more than 85 years after it was first proposed.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Applications of Mahalanobis Distance in Data Science<\/strong><\/h2>\n\n\n\n<p>The Mahalanobis distance has found a wide range of practical applications in modern data science, reflecting both its mathematical rigour and its intuitive logic.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Outlier Detection, Clustering and Machine Learning<\/strong><\/h3>\n\n\n\n<p><strong>Outlier and anomaly detection<\/strong> is perhaps the most common application. In any dataset, an observation with a high Mahalanobis distance from the rest of the data is, by definition, statistically unusual. This makes the measure particularly useful in:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Financial fraud detection<\/strong> \u2014 identifying transactions that deviate significantly from a customer&#8217;s normal pattern of behaviour, accounting for correlations between transaction amount, time, location and merchant type simultaneously.<\/li>\n\n\n\n<li><strong>Medical diagnostics<\/strong> \u2014 detecting patients whose combination of clinical markers falls unusually far from reference population norms, even when each individual marker might appear borderline.<\/li>\n\n\n\n<li><strong>Manufacturing quality control<\/strong> \u2014 identifying products whose combination of measurable properties falls outside acceptable limits, even when no single property is individually defective.<\/li>\n\n\n\n<li><strong>Cybersecurity<\/strong> \u2014 flagging network activity that deviates from established behavioural baselines in multivariate ways that simple threshold rules would miss.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" 
width=\"493\" height=\"395\" src=\"https:\/\/www.smilefoundationindia.org\/blog\/wp-content\/uploads\/2026\/06\/Fraud-detection-using-Mahalanobis-distance.png\" alt=\"\" class=\"wp-image-17230\" srcset=\"https:\/\/www.smilefoundationindia.org\/blog\/wp-content\/uploads\/2026\/06\/Fraud-detection-using-Mahalanobis-distance.png 493w, https:\/\/www.smilefoundationindia.org\/blog\/wp-content\/uploads\/2026\/06\/Fraud-detection-using-Mahalanobis-distance-300x240.png 300w\" sizes=\"(max-width: 493px) 100vw, 493px\" \/><\/figure>\n\n\n\n<p><strong>In machine learning<\/strong>, Mahalanobis distance appears across several core techniques:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>In <strong>Linear Discriminant Analysis (LDA)<\/strong>, classifying a new observation into one of several groups involves comparing its Mahalanobis distance from each group&#8217;s centroid, using a covariance matrix pooled across the groups. When the groups are equally likely in advance, the observation is assigned to the group whose centroid it is closest to, in the Mahalanobis sense.<\/li>\n\n\n\n<li>In <strong>k-Nearest Neighbour algorithms<\/strong>, replacing Euclidean distance with Mahalanobis distance often improves classification accuracy on correlated, multi-scale datasets.<\/li>\n\n\n\n<li>In <strong>Gaussian mixture models and cluster analysis<\/strong>, Mahalanobis distance is used to assign observations to clusters in a way that respects the shape of each cluster rather than assuming circular symmetry.<\/li>\n<\/ul>\n\n\n\n<p><strong>In clinical research<\/strong>, the Mahalanobis distance is used to pair treated and untreated individuals who are genuinely comparable. This technique, known as Mahalanobis distance matching, is often used alongside or instead of propensity score matching, and it helps ensure that treated and control groups in observational studies are appropriately balanced across multiple background characteristics simultaneously.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"540\" height=\"360\" 
src=\"https:\/\/www.smilefoundationindia.org\/blog\/wp-content\/uploads\/2026\/06\/Propensity-score-matching-explained-visually.png\" alt=\"\" class=\"wp-image-17233\" srcset=\"https:\/\/www.smilefoundationindia.org\/blog\/wp-content\/uploads\/2026\/06\/Propensity-score-matching-explained-visually.png 540w, https:\/\/www.smilefoundationindia.org\/blog\/wp-content\/uploads\/2026\/06\/Propensity-score-matching-explained-visually-300x200.png 300w\" sizes=\"(max-width: 540px) 100vw, 540px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>PC Mahalanobis and the Second Five-Year Plan<\/strong><\/h2>\n\n\n\n<p>Beyond his statistical contributions, PC Mahalanobis played a defining role in shaping India&#8217;s post-independence economic strategy. As a member of the Planning Commission, he developed what became known as the Mahalanobis Model \u2014 the strategic framework behind India&#8217;s Second Five Year Plan (1956 to 1961).<\/p>\n\n\n\n<p>The model argued that sustained economic growth required prioritising investment in heavy industry and capital goods, rather than concentrating resources on consumer goods production. By building domestic productive capacity \u2014 steel mills, machine tools, power generation \u2014 India would create the foundation for long-term industrialisation rather than remaining dependent on imported capital equipment.<\/p>\n\n\n\n<p>The model drew directly on his statistical background: it was built on input-output analysis and aggregate growth accounting, applying quantitative reasoning to economic planning at a national scale. 
While economic historians have debated its consequences, the model reflected a genuine and sophisticated attempt to apply statistical thinking to one of the most consequential policy decisions of newly independent India.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Why India Celebrates National Statistics Day on His Birthday<\/strong><\/h2>\n\n\n\n<p>In 2006, the Government of India announced that 29 June, the birthday of PC Mahalanobis, would be observed annually as National Statistics Day. The first official celebration took place in 2007.<\/p>\n\n\n\n<p>The choice acknowledges not only his specific contributions \u2014 the Mahalanobis distance, the Indian Statistical Institute, the National Sample Survey methodology \u2014 but the broader idea he embodied: that rigorous, evidence-based statistical reasoning is essential for good governance, sound policy and equitable development.<\/p>\n\n\n\n<p>National Statistics Day is observed each year with seminars, academic events and policy discussions organised by the Ministry of Statistics and Programme Implementation, typically focused on a theme relevant to current data and development priorities.<\/p>\n\n\n\n<p>As India moves deeper into a data-driven economy where decisions in agriculture, <a href=\"https:\/\/www.smilefoundationindia.org\/health\">public health<\/a>, urban planning and financial services increasingly rely on statistical inference, the foundations that PC Mahalanobis built become more relevant, not less.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>FAQs \u2014 Mahalanobis Distance and PC Mahalanobis<\/strong><\/h2>\n\n\n\n<p><strong>Who is PC Mahalanobis and what is he known for?<\/strong> <\/p>\n\n\n\n<p>Prasanta Chandra Mahalanobis was an Indian statistician who founded the Indian Statistical Institute, developed the Mahalanobis distance and played a central role in India&#8217;s post-independence economic planning. 
He is widely regarded as the father of statistics in India and his birthday is celebrated as National Statistics Day on 29 June each year.<\/p>\n\n\n\n<p><strong>What is Mahalanobis distance in simple terms?<\/strong> <\/p>\n\n\n\n<p>It is a measure of how unusual a data point is relative to a distribution, accounting for the correlation and scale of all variables involved. Unlike simpler distance measures, it correctly handles situations where variables are correlated or measured in different units.<\/p>\n\n\n\n<p><strong>How is Mahalanobis distance different from Euclidean distance?<\/strong> <\/p>\n\n\n\n<p>Euclidean distance measures straight-line distance between points but ignores correlation between variables and differences in scale. Mahalanobis distance corrects for both, making it more accurate and meaningful for multivariate, real-world datasets.<\/p>\n\n\n\n<p><strong>What is the formula for Mahalanobis distance?<\/strong> <\/p>\n\n\n\n<p>D(x) = sqrt [ (x &#8211; mu)^T * S^-1 * (x &#8211; mu) ], where x is the data point, mu is the mean vector of the distribution, and S^-1 is the inverse of the covariance matrix. The covariance matrix is what accounts for the correlation structure and scale of the data.<\/p>\n\n\n\n<p><strong>How is Mahalanobis distance used in machine learning?<\/strong> <\/p>\n\n\n\n<p>It is used in outlier detection, Linear Discriminant Analysis, k-Nearest Neighbour classification and cluster analysis. In each case, it provides a more accurate measure of statistical distance than Euclidean distance when variables are correlated or on different scales.<\/p>\n\n\n\n<p><strong>What is the Indian Statistical Institute and who founded it?<\/strong> <\/p>\n\n\n\n<p>The Indian Statistical Institute was founded by PC Mahalanobis in Calcutta in 1931. 
It grew from a small statistical laboratory at Presidency College into a globally recognised centre for statistical and quantitative research, and remains one of the leading such institutions in the world.<\/p>\n\n\n\n<p><strong>How did PC Mahalanobis contribute to India&#8217;s economic planning?<\/strong> <\/p>\n\n\n\n<p>He developed the Mahalanobis Model, which shaped India&#8217;s Second Five Year Plan (1956 to 1961) by arguing for prioritised investment in heavy industry as the foundation for long-term economic growth. He also pioneered the large-scale sample survey methods used by India&#8217;s National Sample Survey to collect economic and demographic data.<\/p>\n\n\n\n<p><strong>Why is National Statistics Day celebrated on 29 June?<\/strong> <\/p>\n\n\n\n<p>29 June is the birthday of PC Mahalanobis. The Government of India designated it as National Statistics Day in 2006 to recognise his contributions to statistical science, survey methodology and evidence-based governance.<\/p>\n\n\n\n<p><strong>Where is Mahalanobis distance used in real-world data science?<\/strong> <\/p>\n\n\n\n<p>Applications include financial fraud detection, medical diagnostics, manufacturing quality control, cybersecurity anomaly detection, matching of treated and control groups in clinical research, and classification tasks in machine learning wherever multiple correlated variables need to be assessed jointly.<\/p>\n\n\n\n<p><strong>What is the significance of PC Mahalanobis in modern statistics?<\/strong> <\/p>\n\n\n\n<p>His development of the Mahalanobis distance gave statisticians and data scientists a tool for multivariate analysis that remains widely used nearly 90 years later. 
His institutional legacies, including ISI, Sankhya and the National Sample Survey, built the statistical infrastructure that modern India relies on for governance, planning and research.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Mahalanobis distance is one of data science&#8217;s most enduring tools \u2014 and most people have never heard of the Indian statistician who created it. This blog explains how the formula works, why it outperforms Euclidean distance for real-world data and what PC Mahalanobis contributed to modern statistics.<\/p>\n","protected":false},"author":3,"featured_media":2435,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16],"tags":[],"class_list":["post-17226","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-smile"],"_links":{"self":[{"href":"https:\/\/www.smilefoundationindia.org\/blog\/wp-json\/wp\/v2\/posts\/17226","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.smilefoundationindia.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.smilefoundationindia.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.smilefoundationindia.org\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.smilefoundationindia.org\/blog\/wp-json\/wp\/v2\/comments?post=17226"}],"version-history":[{"count":0,"href":"https:\/\/www.smilefoundationindia.org\/blog\/wp-json\/wp\/v2\/posts\/17226\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.smilefoundationindia.org\/blog\/wp-json\/wp\/v2\/media\/2435"}],"wp:attachment":[{"href":"https:\/\/www.smilefoundationindia.org\/blog\/wp-json\/wp\/v2\/media?parent=17226"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.smilefoundationindia.org\/blog\/wp-json\/wp\/v2\/categories?post=17226"},{"taxonomy":"post_tag","embeddable":true,"
href":"https:\/\/www.smilefoundationindia.org\/blog\/wp-json\/wp\/v2\/tags?post=17226"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}