Question about Task Type 2, Mock Problem 3

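Hello. I solved Task Type 2, Mock Problem 3 (Airbnb price) on my own and entered the code below.

I applied min-max scaling from sklearn.preprocessing, but when I compared my code with your solution I noticed that you did not use it, and my results also came out different.

I apply scaling every time I solve a Task Type 2 problem. Are there cases where min-max scaling should be used and cases where it should not? If so, how can I tell them apart?

I would also like to know the specific reason you did not apply it in the problem above.

Thank you!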

import pandas as pd
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# print(train.head())
# print(test.head())

# print(train.info())
# print(test.info())

# Keep the test ids for the submission file, then drop columns not used as features
test_id = test.pop('id')
train = train.drop(columns = 'id')
drop_cols = ['name', 'host_id', 'host_name', 'neighbourhood', 'neighbourhood_group', 'last_review']
train = train.drop(columns = drop_cols)
test = test.drop(columns = drop_cols)

# print(train.info())
# print(test.info())

# print(train.isnull().sum())
# print(test.isnull().sum())
# Columns with missing values: last_review, reviews_per_month

train['reviews_per_month'] = train['reviews_per_month'].fillna(0)
test['reviews_per_month'] = test['reviews_per_month'].fillna(0)

# print(train.isnull().sum())
# print(test.isnull().sum())

# print(train.info()) room_type,last_review
# print(test.info())

from sklearn.preprocessing import LabelEncoder
cols = train.select_dtypes(include = 'object').columns
for col in cols:
  encoder = LabelEncoder()
  train[col] = encoder.fit_transform(train[col])
  test[col] = encoder.transform(test[col])

# print(train.describe())
# print(test.describe())

from sklearn.preprocessing import MinMaxScaler
# Scale feature columns only; 'price' is the target and must stay in its original units
cols2 = train.select_dtypes(exclude = 'object').columns.drop('price')
scaler = MinMaxScaler()
# Learn min/max from train, then apply the same transform to test
train[cols2] = scaler.fit_transform(train[cols2])
test[cols2] = scaler.transform(test[cols2])

# print(train.describe())
# print(test.describe())

# print(train.info())
# print(test.info())

from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(train.drop('price', axis = 1), train['price'], test_size=0.2, random_state = 20)

from sklearn.ensemble import RandomForestRegressor
# Fix the seed so repeated runs give the same model and predictions
rf = RandomForestRegressor(random_state = 0)
rf.fit(X_train, y_train)
pred_val = rf.predict(X_val)

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
# print(mean_squared_error(y_val, pred_val))
# print(mean_absolute_error(y_val, pred_val))
# print(r2_score(y_val, pred_val))

pred = rf.predict(test)

pd.DataFrame({'id': test_id, 'price': pred}).to_csv('5959.csv', index=False)

Inflearn Community Q&A