It can be a little daunting to select the best way to visualise your data.
If you already know what chart type you want to use then the following links might help with some tips and tricks:
If you don’t know what chart to use – where do you start?
The following sections give you a little guidance on selecting the right chart based on what you are trying to achieve.
1) Compare values between different groups
To compare values in different collections – use a column chart or bar chart (we collectively call them category charts).
See this post on how to create great column / bar charts.
Figure 1: Vertical column chart comparing home and away win rates for international cricket (see table 1 for the data)
Table 1: The data used in the next few charts – International Test match cricket home and away win percentage (source: ESPN Cricinfo)
|Country||Home Win||Away Win||Overall Win rate|
|Australia (n=366 wins from 781 matches)||56.6416||36.64921||46.863|
|South Africa (n=144 wins from 392 matches)||43.12796||29.28177||36.73469|
|England (n=344 wins from 964 matches)||40.68136||30.32258||35.68465|
|West Indies (n=164 wins from 509 matches)||36.0515||28.98551||32.22004|
|Pakistan (n=124 wins from 393 matches)||37.41007||28.34646||31.55216|
|Sri Lanka (n=74 wins from 242 matches)||41.6||18.80342||30.57851|
|India (n=124 wins from 491 matches)||33.73494||16.52893||25.25458|
|New Zealand (n=81 wins from 401 matches)||25.3886||15.38462||20.1995|
|Zimbabwe (n=11 wins from 97 matches)||17.30769||4.444444||11.34021|
|Bangladesh (n=7 wins from 93 matches)||6.779661||8.823529||7.526882|
|all (n=1439 wins from 2182 matches)||39.55535||26.54265|
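The Overall Win rate column in Table 1 can be reproduced from the win and match counts shown next to each country name. The following is a minimal Python sketch of that arithmetic; the function name `win_rate` and the decision to round to one decimal place for display are assumptions made for illustration, not something taken from the data source.

```python
# Win and match counts copied from the Country column of Table 1.
records = {
    "Australia": (366, 781),
    "South Africa": (144, 392),
    "India": (124, 491),
}


def win_rate(wins, matches):
    """Return wins as a percentage of matches played."""
    return 100.0 * wins / matches


for country, (wins, matches) in records.items():
    # One decimal place is usually plenty of precision for a chart label.
    print(f"{country}: {win_rate(wins, matches):.1f}%")
```

For Australia, 366 wins from 781 matches works out to 46.863%, which matches the table. Rounding before charting keeps axis labels and tooltips readable.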
2) Look for relationships between two different collections
To look for relationships or correlations between collections, a scatter chart (X,Y plot, scattergram) is the place to start. Check out this post on how to make scatter charts.
Figure 2 plots the home win rate of international cricket teams against their away win rate (data from Table 1). There is a clear, strong positive relationship (a line through the points would go up and to the right), indicating that winning at home and winning away are closely correlated.
Figure 2: Scatter chart comparing home and away win rates for international cricket (see table 1 for the data)
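If you want a number to go with the picture, one common option is the Pearson correlation coefficient, which ranges from -1 (perfect negative relationship) to +1 (perfect positive relationship). The sketch below computes it for the home and away columns of Table 1 using only the Python standard library. The choice of Pearson's coefficient and the helper name `pearson_r` are assumptions added here for illustration; the chart itself does not depend on them.

```python
import math

# Home and away win percentages for the ten countries in Table 1,
# in table order (Australia first, Bangladesh last).
home = [56.6416, 43.12796, 40.68136, 36.0515, 37.41007,
        41.6, 33.73494, 25.3886, 17.30769, 6.779661]
away = [36.64921, 29.28177, 30.32258, 28.98551, 28.34646,
        18.80342, 16.52893, 15.38462, 4.444444, 8.823529]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(home, away)
print(f"Pearson r = {r:.2f}")
```

A value close to +1 backs up what the scatter chart shows visually. With only ten points, treat the number as a rough guide rather than a precise measurement.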
Correlation vs Causation
Figure 2 shows that teams that win at home are more likely to win away. Does winning at home make you more successful away? Nope, better teams win whether they are at home or away. This is classic correlation vs causation: just because two things are correlated doesn’t mean one causes the other.
Statisticians often get worked into a lather about correlation vs causation. The difference only really matters in how you present or use your findings. For example, if I were trying to explain ‘why’ a touring team was successful I would need to consider all the features that make up a great cricket team (coaches, players, national development programs etc) and not just home form. However, if I want to have a punt on an upcoming match, I don’t really care about the underlying cause of a team’s success, just the likelihood of success. In that case, simple correlation might be enough to tell me to back the team with recent home form.
At the risk of trivialising the correlation vs causation discussion, many business decisions can be made without fully understanding ‘why’. The ‘beer and nappies’ parable is often used to illustrate the value of correlation. The original story relates to shopping basket analysis (this analysis is literally for detecting trends in the contents of people’s shopping baskets). The story goes that men’s shopping baskets that contained nappies (diapers) also contained beer. Nobody is suggesting that babies drive you to drink (i.e. not a causative relationship), but the result does suggest that retailers can sell more beer if they include a stack of beer cases in the nappy aisle.
Show the distribution of values in a collection
The shape or distribution of data within a collection can be informative. There are two common charts for showing distributions: histograms and box plots.
Histograms divide the total range of the data into equal-sized ‘bins’ and then count how often a value falls into each bin. See this page on how to create histograms.
It is pretty clear from Figure 3 that most countries have a good home win rate (over 50% of the countries win at home at least 35% of the time): that is the tall green bar. The ‘away’ success is not so great: the blue bars show a lower win rate.
Figure 3: Histogram (distribution chart) comparing home and away win rates for international cricket (see table 1 for the data)
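The binning step behind a histogram is simple enough to sketch by hand. The example below counts the Table 1 home and away win rates into equal-width bins using plain Python. The bin width of 10 percentage points, the 0 to 60 range and the function name `bin_counts` are assumptions chosen for illustration; the bins used to draw Figure 3 may well be different.

```python
# Home and away win percentages from Table 1, in table order.
home = [56.6416, 43.12796, 40.68136, 36.0515, 37.41007,
        41.6, 33.73494, 25.3886, 17.30769, 6.779661]
away = [36.64921, 29.28177, 30.32258, 28.98551, 28.34646,
        18.80342, 16.52893, 15.38462, 4.444444, 8.823529]


def bin_counts(values, bin_width=10.0, lower=0.0, upper=60.0):
    """Count how many values fall into each equal-width bin."""
    n_bins = int((upper - lower) / bin_width)
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so a value equal to `upper` lands in the last bin.
        idx = min(int((v - lower) // bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


home_counts = bin_counts(home)
away_counts = bin_counts(away)

# Print a rough text histogram, one row per bin.
for i, (h, a) in enumerate(zip(home_counts, away_counts)):
    start = i * 10
    print(f"{start:2d}-{start + 10:2d}%  home {'#' * h:<4} away {'#' * a}")
```

With these assumed bins, the home values pile up in the 30 to 50% range while the away values sit mostly below 30%. Changing the bin width can change the apparent shape, so it is worth trying a couple of widths before settling on one.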
Box plots summarise key statistics from the collection as a box and whisker. The box part shows the middle half of the data (from the first quartile to the third quartile) and the line in the middle of the box shows the median (the middle value if you sort the values). See this page on how to create box plots.
The box plot below shows the same data as the previous plots. The green box is clearly higher than the blue: countries win more home games than away games.
Figure 3: Box plot comparing home and away win rates for international cricket (see Table 1 for the data)
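The statistics that a box plot draws can be computed directly. The sketch below uses Python's standard `statistics` module to get the quartiles and median for the Table 1 home and away win rates. The helper name `five_number_summary` and the use of the minimum and maximum as whisker ends are assumptions for illustration; charting tools differ in how they calculate quartiles and where they place whiskers, so the values may not match any particular chart exactly.

```python
import statistics

# Home and away win percentages from Table 1, in table order.
home = [56.6416, 43.12796, 40.68136, 36.0515, 37.41007,
        41.6, 33.73494, 25.3886, 17.30769, 6.779661]
away = [36.64921, 29.28177, 30.32258, 28.98551, 28.34646,
        18.80342, 16.52893, 15.38462, 4.444444, 8.823529]


def five_number_summary(values):
    """Return (min, first quartile, median, third quartile, max)."""
    # quantiles(n=4) returns the three cut points that split the
    # data into four equal-sized groups.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


home_summary = five_number_summary(home)
away_summary = five_number_summary(away)

for label, summary in (("home", home_summary), ("away", away_summary)):
    low, q1, median, q3, high = summary
    print(f"{label}: min={low:.1f} Q1={q1:.1f} median={median:.1f} "
          f"Q3={q3:.1f} max={high:.1f}")
```

The box spans Q1 to Q3 and the line inside it sits at the median, so comparing the two printed rows is the numeric equivalent of comparing the two boxes.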
Show parts of a whole
A pie chart is the ubiquitous way to show parts of a whole. Pie charts look good but they are a pretty inefficient way to show data. Figure 4 takes up a lot of space to show just three values. The slices of a pie chart have to add up to 100%. See this page on how to make pie charts. Stacked bar charts or stacked column charts (we call them category charts) like Figure 5 are often a better data visualisation than a pie chart because they are easier to read and can contain more information.
Figure 4: Pie Chart showing game outcomes for test cricket home teams (see table 1 for the data)
Compare parts of different collections
You can stack the columns (or bars if you want a horizontal version) to show the component parts of a category. Stacked column charts have the advantage over pie charts in that they make more efficient use of space and the component parts don’t necessarily need to add to 100%.
Figure 5: Stacked column chart showing game outcomes for test cricket nations (see Table 1 for the data)
Look for a trend
To show a trend you should start with a line chart (time series chart), which is a type of scatter chart with date values along the X axis. Figure 6 shows the increase in the number of points scored in international Rugby (note that there were no games during the First and Second World Wars). This page describes making a line chart.
Figure 6: Line chart showing the rise in points scored in International Rugby games (Data source – ESPN)
Compare parts through time
Stacked line charts are a really appealing way to show both a trend and how the underlying components contribute to the trend. Figure 7 shows what a dramatic effect penalties (the grey section) have had on the overall total score in International Rugby. To further emphasise the relative importance of the different ways of scoring in Rugby, we can range standardise the Y axis (Figure 8), so that each component is shown as a proportion of the total rather than as a raw value. It used to be a game of conversions and now it is a game of long range penalties.
Figure 7: Stacked line chart showing the rise in points scored in International Rugby games (Data source – ESPN)
Figure 8: Stacked line chart showing relative scaling on Y axis which shows how the proportion of points scored in a rugby game has changed (Data source – ESPN)
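Range standardising simply means dividing each component by the total for that time period so that every stack adds up to 100%. The sketch below shows the calculation in Python. The numbers, the period labels, the scoring categories and the function name `range_standardise` are all made-up placeholders for illustration; they are not the ESPN data behind Figures 7 and 8.

```python
# Placeholder points-per-game figures for illustration only.
# These are NOT the ESPN values plotted in Figures 7 and 8.
periods = {
    "period A": {"tries": 6.0, "conversions": 4.0, "penalties": 2.0},
    "period B": {"tries": 10.0, "conversions": 6.0, "penalties": 9.0},
}


def range_standardise(components):
    """Convert raw component values to percentages of their total."""
    total = sum(components.values())
    return {name: 100.0 * value / total
            for name, value in components.items()}


shares = {label: range_standardise(parts) for label, parts in periods.items()}

for label, parts in shares.items():
    row = ", ".join(f"{name} {pct:.0f}%" for name, pct in parts.items())
    print(f"{label}: {row}")
```

The raw version (as in Figure 7) answers "how much was scored?", while the standardised version (as in Figure 8) answers "where did the points come from?". It is usually worth showing both, because standardising hides any change in the overall total.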