Cracking the data science interview

by Randy Gingeleski

6 minutes to read

"How can I crack the data science interview?" For interview prep or just studying data science, here are answers and additional resources.

While I’ll probably never directly work as a data scientist, skills of that nature are becoming more prevalent throughout the tech scene. There was definitely a presence at Uncubed.

Nir Kaldero gave a solid list of interview questions on Quora, kind of an answer to “how can I crack the data science interview?”

As a study tool, here are answers and additional resources. Credit to Nir for the questions obviously. Reference links for everything are at the end of attributed statements.

[Data Science Disciplines](/wp-content/uploads/2015/02/Data-Science-Disciplines.png)

(Image credit - Wikimedia)


Entry- or Mid-Level Position Interview Questions

What is P value?

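The P value (or calculated probability) is the estimated probability of rejecting the null hypothesis H0 when that hypothesis is true. Usually the null hypothesis is one of “no difference” - like “no difference in blood pressure between groups A and B.” - StatsDirect

Put more precisely, the P value is the probability, assuming H0 is true, of observing a result at least as extreme as the one actually observed. The chance of wrongly rejecting a true H0 is the significance level you pick, not the P value itself.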

Wikipedia entry on P value.
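
To make this concrete, here's a minimal sketch of an exact binomial test in Python. The scenario (15 heads in 20 flips of a supposedly fair coin) is made up for illustration, and `binomial_p_value` is a hypothetical helper name. It adds up the probability, under the null hypothesis of a fair coin, of every outcome at least as far from 10 heads as the one observed.

```python
from math import comb

def binomial_p_value(heads: int, flips: int) -> float:
    """Two-sided exact P value for H0: the coin is fair (p = 0.5)."""
    expected = flips / 2
    observed_distance = abs(heads - expected)
    # Count every outcome at least as far from the expected value as the observed one.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )
    return extreme / 2 ** flips

# Hypothetical data: 15 heads in 20 flips.
p = binomial_p_value(15, 20)
print(f"P value = {p:.4f}")
```

With these numbers the P value comes out around 0.041, so at the usual 0.05 cutoff you would reject "the coin is fair", though not by much.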

What is regularization? Which problem does regularization try to solve?

Regularization is tuning or selecting the preferred level of model complexity so your models are better at predicting (generalizing). If you don’t do this your models may be too complex and overfit or too simple and underfit, either way giving poor predictions.

To regularize you need 2 things:

  1. A way of testing how good your models are at prediction, for example using cross-validation or a set of validation data (you can’t use the fitting error for this).
  2. A tuning parameter which lets you change the complexity or smoothness of the model, or a selection of models of differing complexity/smoothness.

Basically you adjust the complexity parameter (or change the model) and find the value which gives the best model predictions. - Toby Kelsey
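
Here's a rough sketch of those two ingredients in Python, using a deliberately tiny model: a one-variable regression through the origin with an L2 (ridge) penalty. The penalty strength `lam` plays the role of the tuning parameter, and a held-out validation set is the test of prediction quality. The data, the candidate values, and the function names are all made up for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Slope w minimising sum((y - w*x)**2) + lam * w**2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mean_squared_error(xs, ys, w):
    """Average squared prediction error of the line y = w*x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data: a small, noisy training set and a separate validation set.
train_x, train_y = [1.0, 2.0, 3.0], [2.9, 3.1, 7.4]
valid_x, valid_y = [1.5, 2.5, 3.5, 4.0], [3.0, 5.1, 6.8, 8.1]

candidates = [0.0, 0.1, 1.0, 10.0]  # tuning parameter values to try
scores = {}
for lam in candidates:
    w = ridge_slope(train_x, train_y, lam)            # fit on training data only
    scores[lam] = mean_squared_error(valid_x, valid_y, w)  # judge on validation data

best_lam = min(scores, key=scores.get)
print(f"best lambda = {best_lam}, validation MSE = {scores[best_lam]:.4f}")
```

The point is the loop: fit on one set, score on another, keep the setting that predicts best. Real problems would use more data and usually cross-validation instead of a single split.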

Wikipedia entry on regularization.

Quora page on regularization.

How can you fit a non-linear relationship between X and Y into a linear model?

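As an example, X could be age and Y could be income. A common approach is to transform the predictor (for instance, add X² or log X as a new variable); the model stays linear in its coefficients, so ordinary linear regression still applies.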
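
A minimal sketch of one way to do this: transform X (here with a logarithm), then fit an ordinary straight line to the transformed values. The data is synthetic, generated from income = 2 + 3 * log(age) purely so the result is easy to check, and `fit_line` is a hypothetical helper.

```python
from math import log

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic data: income grows with the logarithm of age, not linearly with age.
ages = [20, 25, 30, 40, 50, 60]
incomes = [2 + 3 * log(age) for age in ages]

# Transform the predictor, then fit an ordinary linear model on the transformed values.
log_ages = [log(age) for age in ages]
a, b = fit_line(log_ages, incomes)
print(f"income ~ {a:.2f} + {b:.2f} * log(age)")
```

The relationship between age and income is curved, but the model is still linear in the coefficients `a` and `b`, which is all a linear model requires.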

What is the probability of getting a sum of 2 from 2 equally weighted dice?

To arrive at a sum of 2…

What about a sum of 4?

For a sum of 7…
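
All three dice questions can be checked by brute force, since two fair six-sided dice have only 36 equally likely outcomes. A quick sketch (the function name is just for illustration):

```python
from fractions import Fraction
from itertools import product

def probability_of_sum(target: int) -> Fraction:
    """Probability that two fair six-sided dice add up to `target`."""
    outcomes = list(product(range(1, 7), repeat=2))  # all 36 equally likely rolls
    favourable = [roll for roll in outcomes if sum(roll) == target]
    return Fraction(len(favourable), len(outcomes))

for target in (2, 4, 7):
    print(target, probability_of_sum(target))
```

This prints 1/36 for a sum of 2 (only 1+1), 1/12 for a sum of 4 (1+3, 2+2, 3+1), and 1/6 for a sum of 7 (six different rolls).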

What is gradient descent method?

“Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.” - Wikipedia
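
In code, the idea is only a few lines. Here's a minimal sketch for a one-dimensional function; the example function f(x) = (x - 3)^2, the learning rate, and the step count are arbitrary choices for illustration.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)  # move proportionally to the negative gradient
    return x

# Example function f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(f"minimum found near x = {minimum:.4f}")
```

Each step moves `x` a little way downhill, and the steps shrink as the gradient flattens out near the minimum at x = 3.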

Read Eren Golge’s answer

Wikipedia entry on gradient descent.

Quora page on gradient descent.

Which clustering methods are you familiar with?

Cluster Analysis: Basic Concepts & Algorithms (pdf) - University of Minnesota

Statistical Clustering (pdf slides) - Texas A&M University

Wikipedia entry on cluster analysis.

Which libraries for analytics / data science are you familiar with in Python?

Probably the most prevalent is pandas, the Python data analysis library.

9 Python Analytics Libraries - Data Science Central

Python for Data Analysis via O’Reilly

What is an eigenvalue?

To really understand eigenvalues, you need a grasp on eigenvectors.

“In linear algebra, an eigenvector or characteristic vector of a square matrix is a vector that does not change its direction under the associated linear transformation.” - Wikipedia

[eigenvector-grid](/wp-content/uploads/2015/02/eigenvector-grid.png)

_Image credit - Wikipedia_

“The eigenvalue… tells whether the special vector x is stretched or shrunk or reversed or left unchanged…” - MIT
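
A small worked example helps here. The sketch below finds the eigenvalues of a 2x2 matrix from its characteristic polynomial and then checks what the matrix does to two vectors. The matrix is an arbitrary pick, and both function names are hypothetical.

```python
from math import sqrt

def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of [[a, b], [c, d]] from its characteristic polynomial."""
    trace = a + d
    determinant = a * d - b * c
    discriminant = trace ** 2 - 4 * determinant
    if discriminant < 0:
        raise ValueError("matrix has complex eigenvalues")
    root = sqrt(discriminant)
    return (trace + root) / 2, (trace - root) / 2

def transform(matrix, vector):
    """Multiply a 2x2 matrix by a 2-vector."""
    (a, b), (c, d) = matrix
    x, y = vector
    return (a * x + b * y, c * x + d * y)

A = ((2, 1), (1, 2))
print(eigenvalues_2x2(2, 1, 1, 2))  # (3.0, 1.0)
print(transform(A, (1, 1)))         # (3, 3): same direction, stretched by 3
print(transform(A, (1, -1)))        # (1, -1): same direction, left unchanged
```

The vectors (1, 1) and (1, -1) keep their direction under `A`, so they are eigenvectors, and the factors 3 and 1 are their eigenvalues.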

Eigenvalues & Eigenvectors (pdf) from MIT

Khan Academy: Introduction to eigenvalues & eigenvectors

Wikipedia entry on eigenvalues and eigenvectors.

A question on Bayes’ Law

“Bayes’ Law relates current probability to prior probability.” - Wikipedia

[Bayes’ Theorem](/wp-content/uploads/2015/02/Bayes-Theorem.gif)

Image credit - Air & Space Power Journal
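
Interview questions on Bayes’ Law are often a variation on the "medical test" problem, so here's a minimal sketch of that calculation. All of the numbers are hypothetical, and `posterior` is just an illustrative name.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' law: P(E | H) * P(H) / P(E)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% of people have a condition, the test detects it
# 95% of the time, and it wrongly flags 5% of healthy people.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {p:.3f}")
```

Even with a fairly accurate test, the probability of having the condition after a positive result is only about 16% here, because the prior (1%) is so low.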

A question of time series

If you have a data set with 100 observations for each Xi, and 3 lag-effect variables of X1, how many predictions will you have if you run any simple linear regression?
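
One reading of this question, and it is only my interpretation: each lag needs earlier observations that don't exist at the start of the series, so 3 lags of X1 cost the first 3 rows and leave 100 - 3 = 97 complete rows, and therefore 97 fitted values. A quick sketch of that bookkeeping with a placeholder series:

```python
def lagged_rows(series, max_lag):
    """Build rows (x_t, x_{t-1}, ..., x_{t-max_lag}), skipping rows with a missing lag."""
    rows = []
    for t in range(max_lag, len(series)):
        rows.append(tuple(series[t - lag] for lag in range(max_lag + 1)))
    return rows

x1 = list(range(100))             # placeholder series with 100 observations
rows = lagged_rows(x1, max_lag=3)
print(len(rows))                  # 97 complete rows remain
print(rows[0])                    # (3, 2, 1, 0): the first row that has all three lags
```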


Advanced-Level Position Interview Questions

What is the difference in the outcome (coefficients) between the L1 and L2 norms?

Read YunFang Juan’s answer to “What is the difference between L1 and L2 regularization?” on Quora.
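
The usual short answer is that an L1 penalty tends to push some coefficients exactly to zero (a sparse model), while an L2 penalty shrinks every coefficient toward zero without zeroing any. Here's a minimal sketch for a single coefficient, where both penalized problems have closed-form solutions. The starting estimates and `lam` are made up, and the function names are hypothetical.

```python
def l1_shrink(z, lam):
    """Minimiser of 0.5 * (b - z)**2 + lam * abs(b): soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def l2_shrink(z, lam):
    """Minimiser of 0.5 * (b - z)**2 + 0.5 * lam * b**2: proportional shrinkage."""
    return z / (1 + lam)

unpenalised = [3.0, -0.4, 0.2, -2.0]  # hypothetical unpenalised coefficient estimates
lam = 0.5
l1_result = [l1_shrink(z, lam) for z in unpenalised]
l2_result = [l2_shrink(z, lam) for z in unpenalised]
print(l1_result)  # [2.5, 0.0, 0.0, -1.5]: small coefficients become exactly 0
print(l2_result)  # every coefficient shrinks, none reaches 0
```

This is the simplest possible setting. With correlated predictors there's no per-coefficient formula like this, but the qualitative difference between the two penalties carries over.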

Wikipedia entry on regularization.

Quora page on regularization.

What is Box-Cox transformation?

“… Many real data sets are in fact not approximately normal. However, an appropriate transformation of a data set can often yield a data set that does follow approximately a normal distribution. … The Box-Cox transformation is a particularly useful family of transformations.” - Engineering Statistics Handbook

[box-cox](/wp-content/uploads/2015/02/box-cox.png)

Image credit - Online Stat Book
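
The transformation itself is a one-liner: (y^lambda - 1) / lambda for lambda not equal to 0, and log(y) when lambda is 0. Here's a minimal sketch; the sample data is made up, and in practice lambda is chosen to make the transformed data look as normal as possible, which this snippet doesn't attempt.

```python
from math import log

def box_cox(y, lam):
    """Box-Cox transform of a positive value y for a given lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive data")
    if lam == 0:
        return log(y)  # the limiting case as lambda approaches 0
    return (y ** lam - 1) / lam

skewed = [1, 2, 4, 8, 16, 32]  # made-up right-skewed data
print([round(box_cox(y, 0), 3) for y in skewed])    # lambda = 0 is the log transform
print([round(box_cox(y, 0.5), 3) for y in skewed])  # lambda = 0.5 is a shifted, scaled square root
```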

Engineering Statistics Handbook page on Box-Cox transformation.

Online Stat Book page on Box-Cox transformation.

Wikipedia entry on Box-Cox distribution.

What is multicollinearity?

Multicollinearity (also collinearity) is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of accuracy. - Wikipedia

It can cause strange results when attempting to study how well individual independent variables contribute to an understanding of the dependent variable. - Investopedia

Examples of contexts in which multicollinearity arises: survival analysis, interest rates for different terms to maturity. - Wikipedia

The simplest way to resolve multicollinearity problems is to reduce the number of collinear variables until there is only one remaining out of the set. - Investopedia
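
A quick way to spot the problem is to check how strongly the predictors correlate with each other. Here's a rough sketch with two made-up interest-rate series. With only two predictors, the variance inflation factor (VIF) reduces to 1 / (1 - r^2), where r is their correlation; values above roughly 5 to 10 are a common rule-of-thumb warning sign.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up predictors: the 2-year rate tracks the 1-year rate almost exactly.
rate_1y = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
rate_2y = [1.3, 1.7, 2.3, 2.7, 3.3, 3.7]

r = pearson(rate_1y, rate_2y)
vif = 1 / (1 - r ** 2)  # variance inflation factor when there are only two predictors
print(f"correlation = {r:.3f}, VIF = {vif:.1f}")
```

Here the correlation is above 0.99 and the VIF is in the hundreds, so keeping both rates in one regression would make their individual coefficients very unstable.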

Quora page on multicollinearity.

Will gradient descent methods always converge on the same point?

Wikipedia entry on gradient descent.

Quora page on gradient descent.

Is it necessary that gradient descent methods will always find the global minima?

A guide through the intuition that explains some of the math.

Quora page on gradient descent.
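
The short answer to both questions is no, unless the function is convex: where gradient descent ends up depends on the starting point (and the step size). Here's a minimal sketch on a function with two valleys, f(x) = x^4 - 3x^2 + x, which I picked only because it has one local and one global minimum.

```python
def f(x):
    """A non-convex function with two minima."""
    return x ** 4 - 3 * x ** 2 + x

def descend(start, learning_rate=0.01, steps=2000):
    """Gradient descent on f, starting from `start`."""
    x = start
    for _ in range(steps):
        gradient = 4 * x ** 3 - 6 * x + 1  # derivative of f
        x -= learning_rate * gradient
    return x

left = descend(start=-2.0)   # settles in the deeper valley
right = descend(start=2.0)   # settles in the shallower valley
print(f"from -2.0: x = {left:.3f}, f(x) = {f(left):.3f}")
print(f"from  2.0: x = {right:.3f}, f(x) = {f(right):.3f}")
```

The two runs stop near x = -1.30 and x = 1.13. Only the first is the global minimum; the second run is stuck in a local one and has no way of knowing it.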

Questions about natural language processing (NLP)

Wikipedia entry on natural language processing.

Quora page on natural language processing.

A question in combinatorics

Combinatorics is a branch of mathematics concerning the study of finite or countable discrete structures. Aspects of combinatorics include:

  • Counting the structures of a given kind and size, deciding when certain criteria can be met, and constructing and analyzing objects meeting the criteria.
  • Finding “largest”, “smallest”, or “optimal” objects.
  • Studying combinatorial structures arising in an algebraic context, or applying algebraic techniques to combinatorial problems. - Wikipedia

Quora page on combinatorics.

[Statistics](/wp-content/uploads/2015/02/Statistics.jpg)

(Image credit - University of Miami)


Bonus: Top 3 Algorithms for Data Scientists

Credit to William Chen; these are expanded from his Quora answer.

Logistic Regression / Linear Regression

“For binary classification and regression.”

Quora page on logistic regression.

Quora page on linear regression.

Random Forests

“For classification.”

Quora page on random forests.

TF-IDF

“For textual analysis.”
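
TF-IDF scores a word highly when it is frequent in one document but rare across the collection. Here's a minimal sketch of one common variant (raw term frequency times the natural log of N / document frequency); libraries often use slightly different smoothing, and the corpus below is made up.

```python
from math import log

def tf_idf(term, document, corpus):
    """Term frequency times inverse document frequency for one term in one document."""
    tf = document.count(term) / len(document)
    containing = sum(1 for doc in corpus if term in doc)
    idf = log(len(corpus) / containing) if containing else 0.0
    return tf * idf

# Tiny made-up corpus, already split into lowercase tokens.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
print(round(tf_idf("the", corpus[0], corpus), 3))  # 0.0: "the" appears in every document
print(round(tf_idf("mat", corpus[0], corpus), 3))  # 0.183: "mat" appears in only one document
```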

Quora page on TF-IDF.


If you have no idea what Quora is, check it out and follow me. It’s like a more intelligent Yahoo Answers.