Blog

The Dos and Don’ts of Choosing a Data Visualization – (1/2)

Data visualization is all about telling your analytics story in the most convincing and straightforward way. Finding the right balance between an attractive visualization and clearly communicating an idea or result is not an easy job. What I have found is that the most effective approach depends on what your story is about. I keep using the term story because a visualization is a journey that your audience needs to be part of. Keep in mind that:

  • If you need to convey something simple, make it pretty.
  • When a complex idea or result is on the table, make it simple.
  • Never overdo it; “detrend” your visualization based on the context at hand.
  • Know when to stop adding graphs 😉

The following examples demonstrate two different cases of analytics results.

In the first case, a simple idea of model performance and time series is shown, which allows for a more “invested” graph with plenty of room for visual appeal.

In the second case, a complex observation of a physical system is shown. Accuracy matters in these types of systems, so it is important to display all data points in high detail without overdoing it.

In the second part of this piece we will look at specific ways to visualize different types of data and the challenges involved.

Data Quality and Tiers of Extractable Information


Ask any data scientist, expert, or CEO in the field and they will tell you that data quality is paramount to running a data-driven business efficiently. Still, according to a recent survey, over 60% of businesses suffer from poor data quality, and that number is even higher for enterprises with limited data governance.

But what exactly is data quality, and how does it affect model, forecasting, and training performance? In layman’s terms, good data quality means minimal loss of the information needed to complete a task or make a decision. To quantify this, a number of metrics can be evaluated; currently they include:

  • Consistency
  • Accuracy
  • Completeness
  • Auditability
  • Orderliness
  • Uniqueness
  • Timeliness
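
As a rough illustration of how two of these metrics might be quantified, here is a minimal sketch in Python. The records, the field names, and the two formulas (share of populated cells for completeness, share of distinct keys for uniqueness) are assumptions made for this example, not definitions taken from a standard or from the table in this post.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "email": "ana@example.com", "country": "GR"},
    {"id": 2, "email": None, "country": "GR"},
    {"id": 3, "email": "bob@example.com", "country": None},
    {"id": 3, "email": "bob@example.com", "country": None},  # duplicate row
]


def completeness(rows):
    """Share of cells that are populated (not None)."""
    cells = [value for row in rows for value in row.values()]
    return sum(value is not None for value in cells) / len(cells)


def uniqueness(rows, key):
    """Share of rows that carry a distinct value for the given key."""
    return len({row[key] for row in rows}) / len(rows)


print(f"completeness: {completeness(records):.2f}")
print(f"uniqueness:   {uniqueness(records, 'id'):.2f}")
```

Both functions return a value between 0 and 1, which makes the metrics easy to compare with each other and to combine later.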

The table below takes the case of customer data and shows quantifiable results for each of these data quality attributes. Although this is a customer data example, data quality can be measured for any data source, whether structured or unstructured.

[Table: data quality attributes measured on a customer data example]

Having applied these kinds of metrics to those attributes, we can get a good idea of the quality of our data set or source and can take steps to improve it. In a paper I am currently writing on quantifying these metrics, I examine the categorization of data quality into tiers for evaluating datasets. Depending on the dataset in question, the importance of each of these factors can differ, and so can its effect on overall data quality.

A data quality tier system can look something like this, with the end result being a score that characterizes the data set:

[Figure: example data quality tier system and the resulting score]
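
To make the idea of a single characterizing score more concrete, here is a minimal sketch, assuming the score is a weighted average of per-metric scores between 0 and 1. The metric values, the weights, and the tier cut-offs are all hypothetical and chosen only for illustration; they are not the scheme from the figure or from the paper mentioned above.

```python
# Hypothetical per-metric scores in [0, 1] and illustrative weights.
scores = {"completeness": 0.75, "uniqueness": 0.75, "timeliness": 0.90}
weights = {"completeness": 0.5, "uniqueness": 0.3, "timeliness": 0.2}

# Hypothetical tier cut-offs, checked from best to worst.
TIERS = [(0.9, "Tier 1"), (0.7, "Tier 2"), (0.0, "Tier 3")]


def quality_score(scores, weights):
    """Weighted average of the metric scores."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def tier(score):
    """Return the first tier whose threshold the score reaches."""
    for threshold, label in TIERS:
        if score >= threshold:
            return label
    raise ValueError("score must be non-negative")


overall = quality_score(scores, weights)
print(f"score: {overall:.2f} -> {tier(overall)}")
```

Changing the weights is one way to express that the importance of each factor differs from dataset to dataset.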

In the next blog post we will discuss how to improve data quality based on this analysis.