The scatter chart (AKA scatter plot, scatter graph, scattergram, XY plot) shows the relationship between two variables. The convention is to place the independent variable on the X (horizontal) axis and the dependent variable (the thing that changes in response to X) on the Y (vertical) axis.
When the X and Y variables are highly correlated, the points in the scatter chart form a tight, predictable pattern (commonly a straight line). Figure 1 below shows a strong positive relationship between research income and publications (see this post for the full analysis). When the X and Y variables are poorly correlated, the points look like a random cloud.
Figure 1: Scatter chart showing the average cost of research publications in Australian Universities (data from research income and publications data for 2014)
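The difference between a tight pattern and a random cloud can be put into a number with the Pearson correlation coefficient, which runs from -1 to 1. The sketch below is only an illustration: the data is synthetic (it is not the research income data behind Figure 1), and the names pearson_r, tight and cloud are made up for this example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic data only; a fixed seed keeps the sketch repeatable.
rng = random.Random(42)
x = [rng.uniform(0, 100) for _ in range(200)]

# "tight": Y follows X closely, with a little noise (a near-straight line).
tight = [2.0 * xi + rng.gauss(0, 5) for xi in x]

# "cloud": Y is drawn independently of X (a random cloud of points).
cloud = [rng.uniform(0, 200) for _ in x]

r_tight = pearson_r(x, tight)
r_cloud = pearson_r(x, cloud)

print(f"tight pattern: r = {r_tight:.2f}")
print(f"random cloud:  r = {r_cloud:.2f}")
```

A value of r close to 1 or -1 corresponds to the tight straight-line pattern, while a value near 0 corresponds to the random cloud.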
Scatter plots are typically used when exploring data and testing hypotheses of correlation / causation between variables. What is really annoying in this exploratory phase is having to mash together different data sets into a single file before you can create a scatter chart. Truii lets you select data from across different files and instantly compare. Truii looks for ways to relate the data in the different files by looking for common ‘keys’ such as date or location columns.
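For readers who want to see what relating two files on a common key involves, here is a minimal sketch in plain Python. It is not Truii's implementation. The records, the column names and the helpers common_keys and join_on_key are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical records, as if loaded from two separate files.
income = [
    {"university": "Uni A", "income_m": 120.0},
    {"university": "Uni B", "income_m": 45.5},
    {"university": "Uni C", "income_m": 80.0},
]
publications = [
    {"university": "Uni A", "publications": 3100},
    {"university": "Uni B", "publications": 900},
    # "Uni C" is missing here, so it cannot be plotted.
]


def common_keys(left, right):
    """Column names that appear in both data sets (candidate join keys)."""
    return sorted(set(left[0]) & set(right[0]))


def join_on_key(left, right, key):
    """Inner join: keep only rows whose key value appears in both sets."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


keys = common_keys(income, publications)
joined = join_on_key(income, publications, keys[0])

# Each joined row gives one (X, Y) point for the scatter chart.
points = [(row["income_m"], row["publications"]) for row in joined]
print(keys)
print(points)
```

Rows without a match in both data sets are dropped, because a scatter point needs both an X and a Y value.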
Causation Vs Correlation
A common groan from statistically inclined folks is how often correlation is mistaken for causation. Just because there is a tight relationship between two variables doesn’t mean they directly interact. This matters a lot for scientific study because research is focused on explaining the ‘why’. The warning here is against using data exploration as a hypothesis generation process.
From a business perspective, the distinction between correlation and causation is less problematic. Even if you don’t fully understand the ‘why’, a strong correlation may be enough basis for a business decision, unless of course the relationship is obviously spurious (Figure 2).
There is a great ‘spurious correlations’ site that catalogues a very amusing collection of highly correlated but unrelated variables.
Figure 2: Spurious correlation example – Shark Attack Vs Body Mass Index (BMI) – it looks like sharks prefer their victims a little more portly
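One common way a spurious correlation can arise is when two unrelated variables both happen to trend in the same direction over time. The sketch below assumes that mechanism purely for illustration. It uses synthetic series (not the shark attack or BMI data in Figure 2), and the names series_a, series_b and changes are made up. The two series correlate strongly as plotted, but their period-to-period changes do not.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def changes(series):
    """Period-to-period differences, which remove a steady trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Synthetic data only: two series with independent noise that both
# drift upward over 100 time periods.
rng = random.Random(7)
periods = range(100)
series_a = [50 + 2.0 * t + rng.gauss(0, 3) for t in periods]
series_b = [20 + 0.5 * t + rng.gauss(0, 1) for t in periods]

# Correlation of the raw values is dominated by the shared trend.
r_levels = pearson_r(series_a, series_b)

# Correlation of the changes reflects only the independent noise.
r_changes = pearson_r(changes(series_a), changes(series_b))

print(f"raw values: r = {r_levels:.2f}")
print(f"changes:    r = {r_changes:.2f}")
```

Comparing changes rather than raw values is one simple check, among several, for whether a strong correlation is being driven by a shared trend.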
Making a scatter chart in Truii