It can be exciting when your data analysis suggests a surprising or counterintuitive prediction. But the result might be due to overfitting, which occurs when a statistical model describes random noise rather than the underlying relationship you need to capture. You can guard against this trap by keeping your analysis simple. Be on guard against spurious correlations, and look for relationships that measure important effects related to clear, logical hypotheses. Test for overfitting by randomly dividing the data into a training set, with which you’ll estimate the model, and a validation set, with which you’ll test the accuracy of the model’s predictions. An overfit model might be great at making predictions within the training set but raise warning flags by performing poorly in the validation set. You might also consider alternative narratives: Is there another story you could tell with the same data? If so, you cannot be confident that the relationship you have uncovered is the right — or only — one.
Adapted from the HBR Guide to Data Analytics Basics for Managers