Episode 100: David Hand

Order Books:

Dark Data: Why What You Don’t Know Matters

We like to think we have everything we need to make decisions based on the numbers presented to us in a data set. But any large data set is bound to have problems, and it's often the data we are missing that can unexpectedly lead us off course.

David Hand has written many books, including The Improbability Principle: Why Coincidences, Miracles, and Rare Events Happen Every Day and, more recently, Dark Data: Why What You Don’t Know Matters. He is also an emeritus professor of mathematics at Imperial College.

Today David and Greg talk about bias in statistics, interpreting data sets, whether we are simply more aware of global events than we were in the past, and how that affects statistics.

Episode Quotes:

Interpreting data sets:

“You need an element of caution, skepticism about the data, because, let's face it, any large data set is likely to have some problems: measurement error problems, duplications and missing values; in time, missing records. It's likely to have some problems. So a skeptical attitude, I think, is a healthy attitude.”

Observational data:

“I think observational data is particularly risky, and it has to be said that the data science revolution we are currently living through is in large part driven by big observational administrative data sets. Data sets which arise in the normal practice of everyday life: running a credit card or a retail operation, for example, or a transport company, a hospital or whatever. You're just observing what happens. You're not manipulating or intervening. And in that case, I think the opportunities for distortions are very severe. Now, whether those distortions will impact your conclusions depends on what question you're asking, but there is a great risk.”

Misconceptions of big data sets:

“People have this belief that with big data, massive data sets, billions of data points, there's no need to worry: the size of the data will wash all the problems away. What I say is that big data has all the problems of small data and extra problems of its own, because I think there are more opportunities for glitches to occur and problems to arise.”
