Cracking the data science interview
by Randy Gingeleski
6 minutes to read
"How can I crack the data science interview?" For interview prep or just studying data science, here are answers and additional resources.
While I’ll probably never directly work as a data scientist, skills of that nature are becoming more prevalent throughout the tech scene. There was definitely a presence at Uncubed.
As a study tool, here are answers and additional resources. Credit to Nir for the questions obviously. Reference links for everything are at the end of attributed statements.
(Image credit - Wikimedia)
Entry- or Mid-Level Position Interview Questions
What is P value?
The P value (or calculated probability) is the estimated probability of rejecting the null hypothesis H0 when that hypothesis is true. Usually the null hypothesis is one of “no difference” - like “no difference in blood pressure between groups A and B.” - StatsDirect
What is regularization? Which problem does regularization try to solve?
Regularization is tuning or selecting the preferred level of model complexity so your models are better at predicting (generalizing). If you don’t do this your models may be too complex and overfit or too simple and underfit, either way giving poor predictions.
To regularize you need 2 things:
- A way of testing how good your models are at prediction, for example using cross-validation or a set of validation data (you can’t use the fitting error for this).
- A tuning parameter which lets you change the complexity or smoothness of the model, or a selection of models of differing complexity/smoothness.
Basically you adjust the complexity parameter (or change the model) and find the value which gives the best model predictions. - Toby Kelsey
How can you fit a non-linear relationship between X and Y into a linear model?
As an example - X could be age, Y could be income.
What is the probability of getting a sum of 2 from 2 equally weighted dice?
To arrive at a sum of 2…
What about a sum of 4?
For a sum of 7…
What is gradient descent method?
“Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.” - Wikipedia
Which clustering methods are you familiar with?
Cluster Analysis: Basic Concepts & Algorithms (pdf) - University of Minnesota
Statistical Clustering (pdf slides) - Texas A&M University
Which libraries for analytics / data science are you familiar with in Python?
9 Python Analytics Libraries - Data Science Central
Python for Data Analysis via O’Reilly
What is an eigenvalue?
To really understand eigenvalues, you need a grasp on eigenvectors.
“In linear algebra, an eigenvector or characteristic vector of a square matrix is a vector that does not change its direction under the associated linear transformation.” - Wikipedia
_Image credit - Wikipedia_
“The eigenvalue… tells whether the special vector x is stretched or shrunk or reversed or left unchanged…” - MIT
Eigenvalues & Eignvectors (pdf) from MIT
A question on Baye’s Law
“Bayes’ Law relates current probability to prior probability.“ - Wikipedia
Image credit - Air & Space Power Journal
A question of time series
If you have a data set with 100 observations for each Xi, and 3 lag-effect variables of X1, how many predictions will you have if you run any simple linear regression?
Advanced-Level Position Interview Questions
What is the difference in the outcome (coefficients) between the L1 and L2 norms?
What is Box-Cox transformation?
“… Many real data sets are in fact not approximately normal. However, an appropriate transformation of a data set can often yield a data set that does follow approximately a normal distribution. … The Box-Cox transformation is a particularly useful family of transformations.” - Engineering Statistics Handbook
Image credit - Online Stat Book
What is multicollinearity?
Multicollinearity (also collinearity) is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of accuracy. - Wikipedia
It can cause strange results when attempting to study how well individual independent variables contribute to an understanding of the dependent variable. - Investopedia
Examples of contexts in which multicollinearity arises - survival analysis, interest rates for different terms to maturity. - Wikipedia
The simplest way to resolve multicollinearity problems is to reduce the number of collinear variables until there is only one remaining out of the set. - Investopedia
Will gradient descent methods always converge on the same point?
Is it necessary that gradient descent methods will always find the global minima?
guide through the intuition and explain some of the math
Questions about natural language processing (NLP)
A question in combinatorics
Combinatorics is a branch of mathematics concerning the study of finite or countable discrete structures. Aspects of combinatorics include:
- Counting the structures of a given kind and size, deciding when certain criteria can be met, and constructing and analyzing objects meeting the criteria.
- Finding “largest”, “smallest”, or “optimal” objects.
- Studying combinatorial structures arising in an algebraic context, or applying algebraic techniques to combinatorial problems. - Wikipedia
(Image credit - University of Miami)
Bonus: Top 3 Algorithms for Data Scientists
Logistic Regression / Linear Regression
“For binary classification and regression.”
“For textual analysis.”