Prediction with Python

Machine Learning is a powerful tool for your predictions

Many recruiters ask me if I know how to do prediction. I ask myself: shouldn't they be asking whether I know and understand Linear Regression?!

Python has an excellent library for that, scikit-learn:

from sklearn.linear_model import LinearRegression

One of the pillars of a good analysis is your dataset, and prediction is no different. If you want a good prediction you need a good dataset, one that contains both the input variables and the value you want to predict.

The bigger your dataset, and the more accurate its historical data, the better your prediction will be. For sales prediction, you will normally have the marketing channels where the money is spent and the quantity or value of sales for the same period. Example below:

image.png
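
As a rough sketch of that layout, the snippet below builds a small pandas DataFrame with one column per marketing channel and a 'Sales' column. The channel names follow the features used later in this article; the numbers are made up for illustration and are not the data in the image.

```python
import pandas as pd

# Made-up numbers for illustration only; they are NOT the article's dataset.
# One row per period: spend per channel plus the sales for that period.
df = pd.DataFrame({
    "TV": [200.0, 50.0, 120.0, 300.0, 10.0],
    "Radio": [30.0, 40.0, 20.0, 10.0, 45.0],
    "Newspaper": [60.0, 45.0, 10.0, 25.0, 70.0],
    "Sales": [20.5, 10.1, 13.4, 18.9, 7.3],
})

# Basic sanity checks before modelling: size, missing values, summary statistics
print(df.shape)
print(df.isna().sum())
print(df.describe())
```

Checking for missing values early is a cheap way to catch the inaccurate or incomplete history that would hurt the prediction later.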

There are two important steps before you create your prediction:

  1. Create a scatter plot with a trend line and the mean, to see how your data is distributed around the mean.
  2. Check the correlation to see which variable is most strongly related to sales, that is, which variable has a correlation closest to 1.00. Correlation is a number from -1 to 1, where 1 is a perfect positive correlation, -1 is a perfect negative correlation, and 0 means no linear relationship.
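
The two steps above could be sketched roughly as follows. This assumes pandas, NumPy and matplotlib are installed, and it uses synthetic data generated for illustration (not the dataset in this article): sales are built from TV and Radio spend plus noise, while Newspaper is given no effect, so the correlation ranking should reflect that.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic data for illustration only (an assumption, not the article's dataset):
# sales are built from TV and Radio spend plus noise; Newspaper has no effect.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1, n)

# Step 1: scatter plot of one channel against sales, with a least-squares
# trend line and a horizontal line at the mean of sales.
slope, intercept = np.polyfit(df["TV"], df["Sales"], 1)
xs = np.linspace(df["TV"].min(), df["TV"].max(), 50)
fig, ax = plt.subplots()
ax.scatter(df["TV"], df["Sales"], alpha=0.6, label="observations")
ax.plot(xs, slope * xs + intercept, color="red", label="trend line")
ax.axhline(df["Sales"].mean(), color="gray", linestyle="--", label="mean of sales")
ax.set_xlabel("TV spend")
ax.set_ylabel("Sales")
ax.legend()
# plt.show()  # uncomment to display the figure

# Step 2: Pearson correlation of every channel with sales, strongest first.
corr_with_sales = df.corr()["Sales"].drop("Sales").sort_values(ascending=False)
print(corr_with_sales)
```

Repeating the plot for each channel gives one picture per variable, while the sorted correlation series puts the same comparison into a single list.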

image.png

That already gives you an idea of where to invest your marketing budget, but now comes the tricky part: using Machine Learning to test your model. We already saw the LinearRegression import above; the other function to use is 'train_test_split':

from sklearn.model_selection import train_test_split
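
To make the role of train_test_split concrete, here is a minimal sketch on a toy array (made up for illustration). With test_size=0.2, 20% of the rows are held out for testing and the model is trained on the remaining 80%; random_state fixes the shuffle so the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data for illustration: 10 rows, 3 feature columns, 1 target value per row
x = np.arange(30).reshape(10, 3)
y = np.arange(10)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

print(x_train.shape, x_test.shape)  # (8, 3) (2, 3)

# The same random_state gives the same split on every run
_, x_test_again, _, _ = train_test_split(x, y, test_size=0.2, random_state=42)
print(np.array_equal(x_test, x_test_again))  # True
```

Scoring the model on rows it never saw during training is what makes the score an honest estimate of how it will behave on new budgets.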

Combining the two, and based on your sales history, you will be able to predict sales from your budget per channel.

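import numpy as np

# df is the sales-history DataFrame: one column per channel plus 'Sales'
x = np.array(df.drop(columns=['Sales']))  # features: spend per channel
y = np.array(df['Sales'])                 # target: sales
# Hold out 20% of the rows to test the model on data it has not seen
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(x_train, y_train)
print(model.score(x_test, y_test))  # R^2 on the test set (1.0 is a perfect fit)

# Predict sales for a new budget, in column order: ['TV', 'Radio', 'Newspaper']
features = np.array([[500.1, 10.8, 8.4]])
print(model.predict(features))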
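
To see what the fitted model has learned, its coefficients can be inspected. The sketch below uses synthetic, noise-free data (an assumption for illustration, not the dataset in this article) in which sales are an exact linear function of the three channels, so LinearRegression should recover the weights used to generate them. It also shows that a prediction is simply the intercept plus each channel's budget multiplied by its coefficient.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data for illustration only:
# sales = 3 + 0.05*TV + 0.2*Radio + 0.0*Newspaper (weights chosen arbitrarily)
rng = np.random.default_rng(0)
x = rng.uniform(0, 300, size=(50, 3))  # columns: TV, Radio, Newspaper
true_coef = np.array([0.05, 0.2, 0.0])
y = 3 + x @ true_coef

model = LinearRegression().fit(x, y)

# One coefficient per channel: the expected change in sales per extra unit of spend
for name, coef in zip(["TV", "Radio", "Newspaper"], model.coef_):
    print(f"{name}: {coef:.3f}")
print(f"intercept: {model.intercept_:.3f}")

# A prediction is the intercept plus the weighted sum of the budgets
budget = np.array([[500.1, 10.8, 8.4]])
manual = model.intercept_ + budget @ model.coef_
print(np.allclose(manual, model.predict(budget)))  # True
```

On real sales history the coefficients will not be this clean, but reading them next to the correlations is a quick check that the model agrees with what the earlier analysis suggested.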