Truii data wrangling feature

Data Wrangling

So you work in a team that generates data. Whether it’s environmental observations like water quality or sales figures or contact referrals – it is all data and it is constantly coming at you.

You might have started off with a simple spreadsheet that grew into a multipage spreadsheet. You shared that spreadsheet with someone in your team, who then shared it with someone else, and now there are two or three parallel spreadsheets.

Those spreadsheets might have names like ‘water quality_final’, ‘water quality final_final’, ‘water quality_final_datechange’ and so on. You have no real idea which is the most complete dataset.

Trying to keep on top of data within a team is a thankless and time-consuming process (about 80% of your data analysis time, according to Dasu and Johnson (2003)). In the good/bad old days we had database custodians who were the ultimate gatekeepers to all data. If you wanted data, you sent a request and eventually you were issued with the ‘latest’ set from the corporate database, which may have been months or sometimes a year out of date.

Data is generated at an enormous rate these days, and we all expect to be empowered to see, access and analyse the latest data. What this means is that there are a lot of hands in the data pot at the same time. Whilst we want instant access, we don’t want to let go of the data traceability and accountability that we came to expect in the good/bad old days of the single-point-of-truth data gatekeeper. That gets us to the question of how to wrangle data in a way that lets a team have access to the latest dataset, even live data, whilst you are still in the process of cleaning that data.

Truii is designed around this data wrangling problem. The approach in Truii is to provide a large and growing collection of data wrangling functionality that lets you search for outliers, change the timestep, fill missing values, correct drift in data, append new data and merge two datasets, all while your team has access to the data. It is a bit like trying to build a house around owners who move in as soon as the slab is laid.
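To make a couple of those operations concrete, here is a minimal sketch in plain Python of what 'search for outliers' and 'fill missing values' can mean in practice. This is not Truii's implementation: the function names, the sample readings, the modified z-score rule (with its conventional 3.5 cut-off) and the linear interpolation are all assumptions chosen for illustration.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indexes of values whose modified z-score exceeds threshold.

    Assumption: outliers are judged against the median and the median
    absolute deviation (MAD). Missing values (None) are ignored.
    """
    present = [v for v in values if v is not None]
    med = median(present)
    mad = median(abs(v - med) for v in present)
    if mad == 0:
        return []  # no spread to measure against, so flag nothing
    return [
        i
        for i, v in enumerate(values)
        if v is not None and 0.6745 * abs(v - med) / mad > threshold
    ]


def fill_missing(values):
    """Linearly interpolate gaps (None) that sit between two known values.

    Leading and trailing gaps are left as None rather than guessed.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical pH readings with one gap and one obvious typo (70.3).
readings = [7.1, 7.2, None, 7.4, 70.3, 7.3]

outliers = flag_outliers(readings)
# Blank out the flagged readings, then fill every gap in one pass.
cleaned = [None if i in outliers else v for i, v in enumerate(readings)]
repaired = fill_missing(cleaned)

print(outliers)
print(repaired)
```

In this sketch the suspect reading is blanked out before the gaps are filled, so the bad value never feeds into the interpolation. A real workflow would also record who made the change and when, which is the lineage problem described next.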

Truii allows you to control which members of your team can modify the data, roll back to previous versions, and generally track the lineage of a dataset so you know where the data has come from and who has worked on it.

To take the building analogy further, this means that you can let your team watch the building process or even give them a wheelbarrow to help out (knowing you can always undo their work if things don’t go well). We like to think of this as a form of data empowerment or data democratization, encouraging the team to get their hands dirty with data wrangling and analysis.

As a team of crack data wranglers, you will inevitably do a few common tasks.

  • Filtering: This is where you remove data that doesn’t fit your requirements. For example, I only want end-of-month total sales figures and not daily progressive values.
  • Transforming: This is where you append new data or modify values, like converting from degrees Fahrenheit to degrees Celsius.
  • Aggregating / disaggregating: This is where data is lumped together or divided apart. It is most commonly done with temporal data where, for example, you want the monthly totals, not the daily values. Similarly, you may do the same operation over space: I want the total for the state, not for the post/zip code.
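The three tasks above can be sketched in a few lines of plain Python. The records, field layout and function names below are hypothetical and only illustrate the ideas; they are not how Truii does it.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical daily records: (day, temperature in °F, sales for that day).
records = [
    (date(2024, 1, 30), 68.0, 120),
    (date(2024, 1, 31), 50.0, 80),
    (date(2024, 2, 28), 59.0, 100),
    (date(2024, 2, 29), 32.0, 50),
]


def is_month_end(day):
    """True when the following day falls in a different month."""
    return (day + timedelta(days=1)).month != day.month


def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


# Filtering: keep only the end-of-month rows.
month_end = [row for row in records if is_month_end(row[0])]

# Transforming: same rows, with the temperature converted to Celsius.
in_celsius = [
    (day, fahrenheit_to_celsius(temp_f), sales)
    for day, temp_f, sales in records
]

# Aggregating: lump the daily sales together into monthly totals.
monthly_sales = defaultdict(int)
for day, _, sales in records:
    monthly_sales[(day.year, day.month)] += sales

print(month_end)
print(in_celsius)
print(dict(monthly_sales))
```

Note that each step builds a new list or dictionary and leaves `records` untouched. Keeping the original data intact is what makes it possible to roll back if a wrangling step turns out to be wrong.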

Truii is designed to embrace the data wrangling process and let you stay on top of managing your data while your team simultaneously accesses the data.

The visualizations are easy to make. All you need to do is create a free Truii account to create and publish your own data visualizations.