Machine Learning Data Preparation

Split Data Into Features & Target

Create a DataFrame containing the features (X) that will be used to predict the target, and a separate Series containing just the target (y):

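import pandas as pd

# `data` is assumed to be an already-loaded DataFrame that includes these columns
X = data[['neighbourhood_group', 'latitude', 'longitude', 'room_type']]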

y = data['price']
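
As a minimal self-contained sketch of this step, the example below builds a small made-up DataFrame (the values are illustrative assumptions, not the tutorial's dataset) and shows that double brackets give a DataFrame while single brackets give a Series:

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
data = pd.DataFrame({
    'neighbourhood_group': ['Manhattan', 'Brooklyn', 'Queens', 'Brooklyn'],
    'latitude': [40.75, 40.68, 40.74, 40.69],
    'longitude': [-73.98, -73.95, -73.85, -73.94],
    'room_type': ['Entire home/apt', 'Private room', 'Private room', 'Shared room'],
    'price': [225, 89, 70, 45],
})

# Double brackets select several columns and return a DataFrame
X = data[['neighbourhood_group', 'latitude', 'longitude', 'room_type']]

# Single brackets select one column and return a Series
y = data['price']

print(type(X).__name__, X.shape)  # DataFrame (4, 4)
print(type(y).__name__, y.shape)  # Series (4,)
```

Keeping the target out of X matters here: if 'price' were left among the features, a model could trivially "predict" it.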

Create Dummy Variables Using Pandas get_dummies

Convert the categorical features "neighbourhood_group" and "room_type" into dummy variables, dropping the first level of each feature so that the dummy columns are not perfectly collinear:

X = pd.get_dummies(X, columns=['neighbourhood_group','room_type'], drop_first=True)
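
To see what drop_first=True does in practice, here is a small sketch on made-up rows (the values are assumptions for illustration). The first level of each feature in sorted order becomes the baseline and gets no column of its own:

```python
import pandas as pd

# Hypothetical toy features (values are illustrative only)
X = pd.DataFrame({
    'neighbourhood_group': ['Brooklyn', 'Manhattan', 'Queens', 'Brooklyn'],
    'latitude': [40.68, 40.75, 40.74, 40.69],
    'room_type': ['Private room', 'Entire home/apt', 'Private room', 'Shared room'],
})

# One dummy column per level, minus the first (baseline) level of each feature
X_encoded = pd.get_dummies(X, columns=['neighbourhood_group', 'room_type'], drop_first=True)

print(list(X_encoded.columns))
# ['latitude', 'neighbourhood_group_Manhattan', 'neighbourhood_group_Queens',
#  'room_type_Private room', 'room_type_Shared room']
```

In this sketch a row with both neighbourhood_group dummies unset is a 'Brooklyn' row, and a row with both room_type dummies unset is an 'Entire home/apt' row.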

Split Data Into Train & Test Sets Using Sklearn train_test_split

Split the data into train and test sets, with the train set taking 70% of the data and the test set the remaining 30% (random_state fixes the shuffle so the split is reproducible):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
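
A quick way to sanity-check the split is to look at the resulting shapes and confirm that features and target are still aligned. The sketch below uses a made-up 10-row dataset (an assumption for illustration) with the same test_size and random_state as above:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 rows, 2 features
X = pd.DataFrame({'a': np.arange(10), 'b': np.arange(10) * 2.0})
y = pd.Series(np.arange(10) * 10, name='price')

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

# 70% of the rows go to train, 30% to test
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)

# X and y are shuffled together, so their indexes still match
print(X_train.index.equals(y_train.index))  # True
```

Note that the rows are shuffled before splitting, so the index of X_train is no longer 0, 1, 2, ... in order.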

Scale Data Using Standard Scaler from Sklearn

Fit the standard scaler to the X_train data only, then transform both X_train and X_test using the fitted scaler. Because the transform function returns a NumPy array, we convert the result back to a Pandas DataFrame.

from sklearn.preprocessing import StandardScaler

# Initialise the standard scaler
scaler = StandardScaler()

# Fit the scaler using X_train data only
scaler.fit(X_train)

# Transform X_train and X_test with the fitted scaler and convert back to DataFrames,
# keeping the original index so the rows stay aligned with y_train and y_test
X_train = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns, index=X_train.index)

X_test = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)
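
The effect of the scaler can be checked on a small example. The sketch below uses made-up coordinates with deliberately shuffled indexes (both are assumptions for illustration) and passes index= when rebuilding the DataFrames so that the row labels produced by the train/test split are preserved:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy train/test features with shuffled (non-default) indexes
X_train = pd.DataFrame(
    {'latitude': [40.60, 40.70, 40.80], 'longitude': [-74.00, -73.90, -73.80]},
    index=[7, 2, 5],
)
X_test = pd.DataFrame({'latitude': [40.75], 'longitude': [-73.95]}, index=[3])

# Learn the mean and standard deviation from the training data only
scaler = StandardScaler().fit(X_train)

X_train_scaled = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns, index=X_train.index)
X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)

# Each training column now has mean ~0 and population standard deviation ~1
print(np.allclose(X_train_scaled.mean(), 0))       # True
print(np.allclose(X_train_scaled.std(ddof=0), 1))  # True

# Row labels survive the round trip through NumPy
print(list(X_train_scaled.index))  # [7, 2, 5]
```

The test set is scaled with the training statistics, so its columns will generally not have mean 0 and standard deviation 1. That is expected, and it avoids leaking information from the test set into the preprocessing.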
