Applying Bayesian Optimization to LightGBM

import numpy as np
from lightgbm import LGBMClassifier  # the objective below uses LGBMClassifier, not XGBClassifier
from bayes_opt import BayesianOptimization
from sklearn.model_selection import cross_val_score

# Search range (lower, upper) for each hyperparameter
pbounds = {
    'learning_rate': (0.01, 0.5),
    'n_estimators': (100, 1000),
    'max_depth': (3, 10),
    'min_child_weight': (0, 10),
    'subsample': (0.5, 1.0),
    'colsample_bytree': (0.5, 1.0)
    # 'reg_lambda': (0, 1000),
    # 'reg_alpha': (0, 1.0)
}

def lgbm_hyper_param(learning_rate, n_estimators, max_depth, min_child_weight, subsample, colsample_bytree):
    # The optimizer proposes floats, so cast the integer-valued parameters
    max_depth = int(max_depth)
    n_estimators = int(n_estimators)
    clf = LGBMClassifier(
        max_depth=max_depth,
        min_child_weight=min_child_weight,
        learning_rate=learning_rate,
        n_estimators=n_estimators,
        subsample=subsample,
        colsample_bytree=colsample_bytree,
        random_state=1
        # reg_lambda=reg_lambda,
        # reg_alpha=reg_alpha
    )
    # train_importance and train_answer are assumed to be defined earlier in the notebook.
    # Passing an integer for cv makes cross_val_score use (Stratified)KFold internally.
    return np.mean(cross_val_score(clf, train_importance, train_answer, cv=5, scoring='accuracy'))

optimizer = BayesianOptimization(f=lgbm_hyper_param, pbounds=pbounds, verbose=1, random_state=1)
optimizer.maximize(init_points=10, n_iter=100, acq='ei', xi=0.01)
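The int(...) casts in the objective are needed because BayesianOptimization proposes every parameter as a float. A minimal sketch of the same idea as a reusable helper follows; cast_int_params and the sample values are hypothetical and not part of the original code.

```python
def cast_int_params(params, int_keys=("max_depth", "n_estimators")):
    """Return a copy of params with the integer-valued entries truncated to int."""
    return {
        name: int(value) if name in int_keys else value
        for name, value in params.items()
    }

# Made-up example of what the optimizer might propose (all floats)
proposed = {"learning_rate": 0.21, "max_depth": 7.9, "n_estimators": 412.3}
print(cast_int_params(proposed))
```

After the search, the best trial is available as optimizer.max['params']. Those values are floats as well, so the same casting would apply before training a final model with them.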

Running the code above produces the output below, which looks like an error...

| iter | target | colsam... | learni... | max_depth | min_ch... | n_esti... | subsample |
-------------------------------------------------------------------------------------------------
[LightGBM] [Info] Number of positive: 273, number of negative: 439
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000185 seconds.
You can set force_row_wise=true to remove the overhead.
And if memory is not enough, you can set force_col_wise=true.
[LightGBM] [Info] Total Bins 90
[LightGBM] [Info] Number of data points in the train set: 712, number of used features: 45
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.383427 -> initscore=-0.475028
[LightGBM] [Info] Start training from score -0.475028
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf

Hello, this is the answer assistant.

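That output looks like a warning rather than an error. Warnings like this appear frequently in data-related libraries. The run itself does take a considerable amount of time, though. Today we tested the entire code against the pandas and scikit-learn versions bundled with the latest Anaconda release, and updated every piece of code that raised errors. It runs correctly, but it takes so long that we reduced the relevant parameters for testing purposes. (Even on our PC, which is close to top-of-the-line, it took quite a while.)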
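The "No further splits with positive gain" lines indicate that a tree stopped growing because no remaining split improved the objective, which is common on a small dataset such as the 712 rows shown in the log. If the goal is only to keep the optimizer's table readable, one option is to lower LightGBM's verbosity. The sketch below assumes a LightGBM version that accepts verbose=-1 in the LGBMClassifier constructor; the hyperparameter values are placeholders, not tuned results.

```python
# Placeholder hyperparameters; in the objective these come from the optimizer
params = {
    "max_depth": 5,
    "learning_rate": 0.1,
    "n_estimators": 200,
    "random_state": 1,
    "verbose": -1,  # a value below 0 limits LightGBM logging to fatal errors
}

try:
    from lightgbm import LGBMClassifier
    clf = LGBMClassifier(**params)
except ImportError:
    clf = None  # LightGBM is not installed in this environment
```

Inside lgbm_hyper_param, the equivalent change would be adding verbose=-1 to the existing LGBMClassifier(...) call. This only hides the messages; it does not change how the model is trained.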

The updated version can be downloaded from the course materials. It would be good to refer to that code as you test.
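As a rough way to see why the original settings take so long and what a reduced test run saves, the number of model fits can be counted directly: each evaluation of the objective runs one cross-validation. The helper name and the reduced values (2 initial points, 5 iterations) are illustrative assumptions, not the values used in the updated course materials.

```python
def total_model_fits(init_points, n_iter, cv_folds):
    """Each optimizer step evaluates the objective once; each evaluation fits cv_folds models."""
    return (init_points + n_iter) * cv_folds

# Settings from the question: 10 random probes + 100 guided iterations, cv=5
full_run = total_model_fits(init_points=10, n_iter=100, cv_folds=5)

# A hypothetical scaled-down run for quick testing
test_run = total_model_fits(init_points=2, n_iter=5, cv_folds=5)

print(full_run, test_run)  # prints: 550 35
```

Each of those fits can train up to 1000 trees under the n_estimators range in pbounds, so shrinking init_points and n_iter in optimizer.maximize (or narrowing the n_estimators range) shortens a test run considerably.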

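Also, if you post a follow-up question as a comment on an existing question, the thread already shows as resolved, so we may not see it. We only came across this question by chance, so please keep that in mind.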

Thank you.

Inflearn Community Q&A