The rise of big data has made businesses realize that, to meet their organization’s needs, they must not only generate data to act on but also “wrangle” that data into a usable state so they can make accurate business decisions. However, not all data that comes into a company is actionable.
Turning data into actionable leads comes with a learning curve. If businesses are careless about how they prepare their data, the result can be costly mistakes. In 2016, IBM estimated that poor-quality data cost the U.S. alone over $3.1 trillion a year. These costs can stem from many sources, such as incorrectly sanitizing, formatting, or transforming incoming data. When that happens, poor-quality data can drive business decisions that hurt the company more than they help it.
It’s a cliché, but it holds true: approximately 80% of a data analyst’s time is spent on data wrangling. Data wrangling is the process of transforming, mapping, preparing, and enriching data into a format that is compatible with both the end system and business intelligence tools. Digital media data is abundant, and it keeps growing. To make this staggering amount of data work for us, we need to process and transform it properly so that we can focus on the information that is useful for our analysis and aligns with our measurement framework.
Our checklist for data wrangling:
Get to know your data
- Do we have what is needed to report on our KPIs?
- What data is available and how does it align with our measurement framework?
- Work with Media Planners and Channel Specialists to align on business decisions.
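The first question above, whether the data can support the KPIs, can be checked mechanically once the measurement framework lists the fields each KPI depends on. The Python sketch below assumes a hypothetical framework (CTR, CPC, and CPA with made-up input field names) and a CSV-style header; none of these names come from a particular tool.

```python
# Hypothetical measurement framework: each KPI mapped to the source fields it needs.
KPI_REQUIREMENTS = {
    "CTR": {"clicks", "impressions"},
    "CPC": {"spend", "clicks"},
    "CPA": {"spend", "conversions"},
}


def kpi_coverage(available_fields, requirements=KPI_REQUIREMENTS):
    """Map each KPI to the required fields missing from the data (empty set = reportable)."""
    available = set(available_fields)
    return {kpi: needed - available for kpi, needed in requirements.items()}


if __name__ == "__main__":
    header = ["date", "channel", "impressions", "clicks", "spend"]  # e.g. a CSV header row
    for kpi, missing in kpi_coverage(header).items():
        status = "ready" if not missing else f"missing {sorted(missing)}"
        print(f"{kpi}: {status}")
```

With that header, CTR and CPC are reportable, while CPA would need a conversions field from another source, which is the kind of gap the "Join and merge" step addresses.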
Structure and clean
- Look for outliers and inconsistencies.
- Implement standard formatting across your data tables.
- Analyze missing values: express them as a percentage of total observations, and replace missing and not-a-number (NaN) values with other values as needed.
- Not all columns are of interest; keep only the fields needed for the final analysis and visualization.
- Apply the same practice to dates and time frames: keep only the periods the analysis needs.
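The cleaning steps above can be sketched in plain Python (standard library only). This is a minimal illustration, assuming a hypothetical media export with date, channel, impressions, clicks, and spend fields: it standardizes text and numbers, drops unneeded columns, filters to a reporting window, reports missing values as a percentage of observations, fills them, and flags outliers with the common 1.5 times interquartile range (IQR) rule, which is one choice among several rather than a prescribed method.

```python
import math
from datetime import date
from statistics import quantiles

# Hypothetical raw export (e.g. rows from csv.DictReader); field names are illustrative.
RAW_ROWS = [
    {"date": "2021-03-01", "channel": " search ", "impressions": "1200", "clicks": "48", "spend": "60.0", "notes": "brand"},
    {"date": "2021-03-02", "channel": "SEARCH", "impressions": "1100", "clicks": "", "spend": "55.0", "notes": ""},
    {"date": "2021-03-03", "channel": "Social", "impressions": "900", "clicks": "27", "spend": "40.0", "notes": ""},
    {"date": "2021-03-04", "channel": "social ", "impressions": "n/a", "clicks": "30", "spend": "45.0", "notes": ""},
    {"date": "2021-03-05", "channel": "Display", "impressions": "5000", "clicks": "25", "spend": "950.0", "notes": ""},
    {"date": "2021-04-10", "channel": "Display", "impressions": "4800", "clicks": "24", "spend": "50.0", "notes": ""},
]
NUMERIC_FIELDS = ("impressions", "clicks", "spend")


def to_number(value):
    """Parse a numeric string; blanks and unparseable text become NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def standardize(row):
    """Keep only the fields needed downstream and apply one consistent format."""
    clean = {
        "date": date.fromisoformat(row["date"].strip()),
        "channel": row["channel"].strip().title(),  # " search " and "SEARCH" -> "Search"
    }
    for field in NUMERIC_FIELDS:
        clean[field] = to_number(row.get(field))
    return clean


def clean_rows(raw_rows, start, end):
    """Standardize every row, then keep only rows inside the reporting window."""
    rows = [standardize(r) for r in raw_rows]
    return [r for r in rows if start <= r["date"] <= end]


def missing_pct(rows, field):
    """Missing (NaN) values in a field as a percentage of total observations."""
    missing = sum(1 for r in rows if math.isnan(r[field]))
    return 100.0 * missing / len(rows)


def fill_missing(rows, field, value):
    """Replace NaNs in one field, in place, with a value chosen by the caller."""
    for r in rows:
        if math.isnan(r[field]):
            r[field] = value
    return rows


def iqr_outliers(rows, field, k=1.5):
    """Rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [r[field] for r in rows if not math.isnan(r[field])]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if not math.isnan(r[field]) and not low <= r[field] <= high]


if __name__ == "__main__":
    march = clean_rows(RAW_ROWS, date(2021, 3, 1), date(2021, 3, 31))
    for field in NUMERIC_FIELDS:
        print(f"{field}: {missing_pct(march, field):.0f}% missing")
    print("spend outliers:", iqr_outliers(march, "spend"))
    fill_missing(march, "clicks", 0.0)
```

Whether a missing value should become 0, a median, or a dropped row depends on the field, so the sketch leaves that choice to the caller rather than hard-coding one.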
Enrich and transform
- Use conditional filtering and if-then-else logic.
- Group rows by categorical features to aggregate them.
- Create new calculated columns from existing fields, based on the KPIs of interest.
- Concatenate fields to arrive at the final dimension specified in the measurement framework.
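A comparable sketch of the enrichment steps, again in standard-library Python with hypothetical field names and a made-up rule that labels Search as performance media: rows with no impressions are filtered out, the rest are grouped by channel and campaign, CTR and CPC are calculated from the summed fields (with if-then-else guards against division by zero), and channel and campaign are concatenated into one reporting dimension.

```python
from collections import defaultdict

# Hypothetical cleaned rows; channel, campaign, and metric names are illustrative.
ROWS = [
    {"channel": "Search", "campaign": "Spring Sale", "impressions": 1200, "clicks": 48, "spend": 60.0},
    {"channel": "Search", "campaign": "Spring Sale", "impressions": 800, "clicks": 32, "spend": 40.0},
    {"channel": "Social", "campaign": "Brand", "impressions": 900, "clicks": 27, "spend": 45.0},
    {"channel": "Social", "campaign": "Brand", "impressions": 0, "clicks": 0, "spend": 0.0},
    {"channel": "Display", "campaign": "Brand", "impressions": 5000, "clicks": 0, "spend": 50.0},
]
KEYS = ("channel", "campaign")
METRICS = ("impressions", "clicks", "spend")


def aggregate(rows, keys=KEYS, metrics=METRICS):
    """Group rows by categorical keys and sum the numeric metrics."""
    totals = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for row in rows:
        group = tuple(row[k] for k in keys)
        for m in metrics:
            totals[group][m] += row[m]
    return [{**dict(zip(keys, group)), **sums} for group, sums in totals.items()]


def enrich(row):
    """Add calculated KPI columns, an if-then-else label, and a concatenated dimension."""
    out = dict(row)
    # Conditional expressions avoid dividing by zero.
    out["ctr"] = row["clicks"] / row["impressions"] if row["impressions"] else None
    out["cpc"] = row["spend"] / row["clicks"] if row["clicks"] else None
    # Hypothetical rule: Search counts as performance media, everything else as awareness.
    out["funnel_stage"] = "Performance" if row["channel"] == "Search" else "Awareness"
    # Concatenate fields into the final reporting dimension.
    out["dimension"] = f"{row['channel']} - {row['campaign']}"
    return out


def build_report(rows):
    """Filter out rows with no delivery, aggregate, then enrich each group."""
    delivered = [r for r in rows if r["impressions"] > 0]  # conditional filtering
    return [enrich(group) for group in aggregate(delivered)]


if __name__ == "__main__":
    for line in build_report(ROWS):
        print(line["dimension"], line["funnel_stage"], line["ctr"], line["cpc"])
```

Aggregating before calculating ratios matters: CTR computed from summed clicks and impressions is not the same as the average of per-row CTRs.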
Join and merge
- Is there a need for data integration?
- Combine tables on a unique matching dimension to retrieve the information needed.
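For the join step, a minimal left-join sketch on a hypothetical campaign_id key might look like the following. It refuses duplicate keys in the right-hand table, since a non-unique key would silently multiply rows, and fills None where a campaign has no match. Both tables and their fields are invented for illustration.

```python
# Hypothetical tables: media delivery plus conversions from a second source.
MEDIA = [
    {"campaign_id": "C1", "channel": "Search", "spend": 100.0},
    {"campaign_id": "C2", "channel": "Social", "spend": 45.0},
    {"campaign_id": "C3", "channel": "Display", "spend": 50.0},
]
CONVERSIONS = [
    {"campaign_id": "C1", "conversions": 8},
    {"campaign_id": "C2", "conversions": 3},
]


def index_unique(rows, key):
    """Index rows by a join key, refusing duplicates so the join cannot fan out."""
    index = {}
    for row in rows:
        if row[key] in index:
            raise ValueError(f"duplicate join key {row[key]!r}")
        index[row[key]] = row
    return index


def left_join(left, right, key):
    """Keep every left row; attach right-hand fields, or None where there is no match."""
    lookup = index_unique(right, key)
    right_fields = sorted({f for r in right for f in r if f != key})
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        merged.append({**row, **{f: match.get(f) for f in right_fields}})
    return merged


if __name__ == "__main__":
    for row in left_join(MEDIA, CONVERSIONS, "campaign_id"):
        cpa = row["spend"] / row["conversions"] if row["conversions"] else None
        print(row["campaign_id"], row["channel"], row["conversions"], cpa)
```

In pandas, DataFrame.merge with how="left" and validate="many_to_one" performs a comparable join and uniqueness check.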
As the age-old battle of ‘quantity vs. quality’ rages on, businesses face the reality that, when it comes to digital data, far more ‘quantity’ is available. To harness the value buried in this sometimes overwhelming amount of data, businesses must develop systems and processes that let ‘quality’ information be easily found and readily used. Finding the best way to extract the most useful data will be crucial for businesses that want to find, connect with, and appeal to their audiences in the future.