Split Data Into Features & Target
Create a DataFrame containing the features (X) that will be used to predict the target, and a separate Series containing just the target (y):
X = data[['neighbourhood_group', 'latitude',
'longitude', 'room_type']]
y = data['price']
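To show what the two selections return, here is a minimal sketch on a few hypothetical rows that only mimic the column names used above (the values are made up, not real listings). Double brackets return a DataFrame, while single brackets return a Series.

```python
import pandas as pd

# Hypothetical rows that mimic the columns used above (not real listings)
data = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Queens", "Manhattan"],
    "latitude": [40.647, 40.753, 40.721, 40.809],
    "longitude": [-73.972, -73.984, -73.794, -73.944],
    "room_type": ["Private room", "Entire home/apt", "Shared room", "Private room"],
    "price": [149, 225, 60, 80],
})

# A list of column names (double brackets) selects a DataFrame
X = data[["neighbourhood_group", "latitude", "longitude", "room_type"]]
# A single column name selects a Series
y = data["price"]

print(type(X).__name__, X.shape)  # DataFrame (4, 4)
print(type(y).__name__, y.shape)  # Series (4,)
```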
Create Dummy Variables Using Pandas get_dummies
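Convert the categorical features "neighbourhood_group" and "room_type" into dummy variables, dropping the first level of each feature. The dropped level becomes the baseline, which avoids a set of dummy columns that always sums to 1 (the "dummy variable trap" for linear models):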
X = pd.get_dummies(X, columns=['neighbourhood_group','room_type'], drop_first=True)
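A small sketch of what drop_first=True does, using three hypothetical rows. pandas sorts the levels of each encoded column, so the alphabetically first level ("Brooklyn" and "Entire home/apt" here) is the one dropped, and a row of all zeros for a feature means "the baseline level". Non-encoded columns such as latitude are kept and placed first.

```python
import pandas as pd

# Hypothetical rows; only the column names match the tutorial's data
X = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Queens"],
    "latitude": [40.647, 40.753, 40.721],
    "room_type": ["Private room", "Entire home/apt", "Shared room"],
})

# Encode both categorical columns, dropping the first (sorted) level of each
X = pd.get_dummies(X, columns=["neighbourhood_group", "room_type"], drop_first=True)

print(list(X.columns))
# ['latitude', 'neighbourhood_group_Manhattan', 'neighbourhood_group_Queens',
#  'room_type_Private room', 'room_type_Shared room']
```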
Split Data Into Train & Test Sets Using Sklearn train_test_split
Split the data into training and test sets, with 70% of the rows used for training and 30% held out for testing (test_size=0.3). Setting random_state fixes the shuffle so the split is reproducible:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
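To check the 70/30 proportions and the effect of random_state, here is a sketch on 10 hypothetical rows (the frame and column names "a" and "b" are stand-ins, not the tutorial's data). With 10 rows, a test_size of 0.3 gives 3 test rows and 7 training rows, and X and y stay paired through their shared index.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for X and y: 10 rows, 2 features
X = pd.DataFrame({"a": np.arange(10), "b": np.arange(10) * 2.0})
y = pd.Series(np.arange(10) * 10.0, name="price")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
print(len(X_train), len(X_test))  # 7 3

# Rows are shuffled, but each X row keeps its matching y value (same index labels)
print(X_train.index.equals(y_train.index))  # True

# The same random_state reproduces exactly the same split
X_train2, _, _, _ = train_test_split(X, y, test_size=0.3, random_state=101)
print(X_train.index.equals(X_train2.index))  # True
```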
Scale Data Using Standard Scaler from Sklearn
Fit the StandardScaler on X_train only, then use the fitted scaler to transform both X_train and X_test, so that no information from the test set leaks into the scaling. Because transform returns a NumPy array, we convert each result back into a pandas DataFrame.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
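scaler.fit(X_train)  # learn each column's mean and standard deviation from the training set only
# Keep the original index so the scaled rows stay aligned with y_train and y_test
X_train = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns, index=X_train.index)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)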
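The sketch below makes the scaling arithmetic explicit on a hypothetical single-column frame with a shuffled, non-default index like the one train_test_split produces. StandardScaler computes z = (x - mean_) / scale_, where mean_ and scale_ (the population standard deviation, ddof=0) come from the training rows only, so the scaled training column has mean 0 and standard deviation 1 while the test column generally does not.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical train/test frames with shuffled index labels, as after train_test_split
X_train = pd.DataFrame({"latitude": [40.6, 40.7, 40.8, 40.9]}, index=[7, 2, 9, 4])
X_test = pd.DataFrame({"latitude": [40.75, 41.0]}, index=[0, 5])

scaler = StandardScaler()
scaler.fit(X_train)  # mean_ and scale_ are learned from the training rows only

X_train_s = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns, index=X_train.index)
X_test_s = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)

# Reproduce the transform by hand: z = (x - mean_) / scale_
manual = (X_test - scaler.mean_) / scaler.scale_
print(np.allclose(manual.to_numpy(), X_test_s.to_numpy()))  # True

# The training column is centred and has unit (population) standard deviation
col = X_train_s["latitude"]
print(np.isclose(col.mean(), 0.0), np.isclose(col.std(ddof=0), 1.0))  # True True
```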