
Is it time to MoneyBall the environment?

This post is based on a paper and presentation that I gave at the recent Australian Stream Management Conference. Get the full paper here…

 

The term Moneyball comes from Lewis (2003), “Moneyball: The Art of Winning an Unfair Game”, which was the basis for the movie “Moneyball” (2011). The Moneyball story is about how, in 2002, the Oakland Athletics baseball team, with a payroll budget of US$41M, was able to compete successfully against much wealthier teams (such as the New York Yankees, with a US$125M payroll budget).

The approach used by the Oakland Athletics was to move away from traditional metrics of a baseball player’s value and to use a data driven statistical approach: digging deep into the data to buy great players that had been undervalued by the traditional metrics.

Like cricket, with its batting averages and bowling economy rates, baseball has a long-established tradition of using statistics to summarise a player’s performance. These traditional player statistics have played a major role in determining a player’s value.

However, when Oakland analysed the data, the traditional metrics of each player’s value, taken together, didn’t match the team’s overall performance. That is, a team of players that scored well in the traditional metrics didn’t necessarily translate into a successful team on the field. The Oakland Athletics developed their own more insightful metrics to identify valuable team players, bought undervalued players, and went on to make the playoffs in 2002 and 2003. The approach is now widely adopted in the US baseball scene.

So what does Moneyballing have to do with the Environment?

To use the Oakland Moneyball analogy, environmental management in Australia is a lot like baseball before Moneyball. There are lots of earnest activities going on based on our historic preconceived notions of what works and how to get the best environmental outcomes. Loads of semi-government environmental organisations are very busy working on their own projects and lobbying hard for funding, but very few are putting their heads up to see if this approach is getting the best environmental outcomes for the meagre resources in this sector.

This is not meant to be a criticism of the natural resource management sector. An enormous amount is achieved with ever diminishing resources. This is really a suggestion about how we make those meagre resources stretch a bit farther by taking advantage of online data resources being made available.

I reckon it is time to take a more data driven approach to natural resource management – it’s time to “moneyball” the environment. For example, it is good to know if we are throwing good money after bad trying to restore some clapped out farm when for a fraction of the cost, we could preserve an area that isn’t already irreparably damaged.

To do this we need to quantify the benefits and compare them with the costs. To moneyball we need data and we need to be able to analyse it.
Whilst cash for on-ground works seems ever harder to find, one resource that is fortunately increasing for natural resource management is the availability of structured environmental data to help drive our decisions (some open data resources…). To be more data driven in our decisions we need:

  1. Easy access to relevant data,
  2. Easy ways to manage and share data within the project team, and
  3. Easy ways of combining, analysing and reporting our results.

Access to relevant data

We all see how business is becoming ever more data driven. Just think about those direct marketing emails you are getting because your grocery store has analysed your buying habits through the store card that you happily swipe every time you shop.

The great thing about the environment is that we already have a ‘store card’. Various levels of government and research organisations have been collecting detailed information about the environment for the past 100 years. This public asset environmental data is effectively our ‘store card’ (more on this in the next blog). To use it, we need to be able to get to the data and make sense of it.

In times past, the availability of useful data sources wasn’t well publicised. Even if a data source was known, access to that data required identifying the data custodian, sending data requests, accepting data licensing agreements and eventually receiving the data.

Once we had the data, it invariably had to be converted from some proprietary data structure into something readable by the software that we had at hand. Government initiatives to make publicly funded data sets more open have resulted in the development of some great data portals where the available data is easy to search, the data licensing is straightforward (often adopting a Creative Commons licence) and much more uniform data structures are being used, such as simple column based .csv files or standard spatial data formats (examples…). This collective openness of environmental data is bringing us to the point where we can rapidly access and mash up different datasets from our environmental store card programs to create better insights into our key customer: the environment.
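As a rough illustration of what "mashing up" two column based .csv datasets can look like, here is a minimal sketch using only the Python standard library. The two extracts, the column names (`site_id`, `turbidity_ntu`, `fenced_km`) and all values are invented for the example; real portal downloads will have their own structure.

```python
import csv
import io

# Hypothetical extracts from two open data portals. In practice these would be
# downloaded .csv files; every column name and value here is made up.
WATER_QUALITY_CSV = """site_id,turbidity_ntu
A1,12.0
B2,30.5
C3,8.2
"""

FENCING_CSV = """site_id,fenced_km
A1,4.0
B2,0.5
D4,2.0
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_site(left_rows, right_rows, key="site_id"):
    """Inner join: keep only the sites that appear in both datasets."""
    right_by_key = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


combined = join_on_site(read_rows(WATER_QUALITY_CSV), read_rows(FENCING_CSV))
for row in combined:
    print(row["site_id"], row["turbidity_ntu"], row["fenced_km"])
```

Only sites present in both files survive the join, which is usually the first thing to check when combining datasets from different custodians.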

The ideal data for moneyballing the environment is not yet in the public domain (Figure 1). However, many useful components are, and for others there are publicly available proxies that we can use.

 


Figure 1. To Moneyball the environment we require industry level data about program delivery, environmental condition and activity level data about the benefits of different NRM activities to combine with local project level data.

 

Share and manage data (data democratisation)

The key to being data driven is to democratise the data management and analysis process. Natural Resource Management groups are often eclectic associations of earnest environmental custodians who are rarely housed in the same corporate structure. Data handling and management is often a one-off, short-term, project specific operation, with the solution devised by the local practitioner.

This approach is at odds with our traditional view of data management, where the preferred approach is an integrated corporate data management system with tight data management protocols and institutionalised data custodian protocols. Whilst corporate databases are entirely appropriate for large scale ongoing data collection and management systems, they simply become a burdensome overhead for small project based operations. This often results in practitioners circumventing the corporate database system in favour of a quick local solution. A result of the quick local solution is that datasets are disconnected, have no point of truth, and have no recorded lineage, which prevents them from being updated or added to. The trick for data management is to ensure data lineage whilst embracing the distributed nature of NRM organisations.

Truii.com is specifically tailored for cross-organisational project teams to share and manage their data whilst maintaining version control.

 

Combining, analysing and reporting

To moneyball the environment we need data, we need data democratisation to empower the team to take collective responsibility for the data, and lastly we require tools for data analysis. This role has traditionally been the domain of the data analyst or statistician. However, in many instances complicated statistics and difficult to use software packages are not required for pulling out key relationships in data. Two very powerful techniques are filtering and bivariate visualisations.

Filtering is simply subsampling a large data set to allow you to focus on the important data. For example, from a water quality data set you may want to create a subset of only those samples that were not taken during a storm event, so you simply extract the water quality data from the main file for days of low stream flow. The second powerful technique is simple visualisation of one variable against another in a standard scatter plot. Environmental systems are complicated and there are many interacting elements, but as a first pass, being able to plot the dry weather water quality against the number of upstream kilometres of riparian fencing will give you an indication of the strength and importance of keeping livestock out of the stream before getting too carried away with tests of significance. These combinations sound simple, but their implementation requires both spatial and temporal data analysis. Until recently, spatial and temporal analysis were very separate domains, with spatial analysis being an expensive and specialist skill due largely to the complicated software used to create and analyse spatial data.
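To make the two techniques concrete, here is a minimal sketch in plain Python. It is not how Truii.com does it; the sample records, the field names, the low flow threshold and the fencing lengths are all invented for illustration. It filters out storm samples, averages dry weather turbidity per site, pairs each site with its upstream fenced length (the points you would put on a scatter plot), and computes a simple Pearson correlation as a first-pass indication of the relationship.

```python
from math import sqrt

# Hypothetical water quality samples, one row per site visit (values invented).
samples = [
    {"site": "A1", "flow_ml_day": 5.0, "turbidity_ntu": 10.0},
    {"site": "A1", "flow_ml_day": 80.0, "turbidity_ntu": 95.0},   # storm event
    {"site": "B2", "flow_ml_day": 4.0, "turbidity_ntu": 30.0},
    {"site": "B2", "flow_ml_day": 6.0, "turbidity_ntu": 34.0},
    {"site": "C3", "flow_ml_day": 3.0, "turbidity_ntu": 20.0},
    {"site": "C3", "flow_ml_day": 120.0, "turbidity_ntu": 150.0},  # storm event
]

# Hypothetical upstream kilometres of riparian fencing for each site.
fenced_km = {"A1": 6.0, "B2": 1.0, "C3": 3.0}

# Assumed cut-off for "low stream flow"; a real threshold is site specific.
LOW_FLOW_THRESHOLD = 10.0


def dry_weather(rows, threshold):
    """Filtering: keep only samples taken at or below the flow threshold."""
    return [r for r in rows if r["flow_ml_day"] <= threshold]


def mean_by_site(rows, field):
    """Average one field per site."""
    values = {}
    for r in rows:
        values.setdefault(r["site"], []).append(r[field])
    return {site: sum(v) / len(v) for site, v in values.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


dry = dry_weather(samples, LOW_FLOW_THRESHOLD)
site_means = mean_by_site(dry, "turbidity_ntu")

# Bivariate pairs: (fenced km, mean dry weather turbidity), one point per site.
pairs = [(fenced_km[s], site_means[s]) for s in sorted(site_means)]
r = pearson([x for x, _ in pairs], [y for _, y in pairs])
print(pairs, round(r, 2))
```

With this made-up data the correlation comes out negative (more fencing, lower turbidity). With only three sites it is an indication to explore further, not evidence, which is exactly the "first pass" role described above.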

Truii.com has been developed for just this style of data analysis: you can rapidly filter datasets and overlay one dataset with another. Truii.com embraces the spatio-temporal nature of environmental data without the traditional segregation of spatial and temporal analysis.

Conclusions

All the key elements are in place for the NRM industry to be more data driven: data is increasingly available and in accessible formats, and Truii.com provides an online data collaboration platform that allows you to both manage your data and perform data overlays.

The visualisations are easy to make. All you need to do is create a free Truii account to create and publish your own.