In the lecture you used an eval metric that maximizes AUC, but I would like to maximize the F1 score instead.
I have been searching online and trying to pass an F1 score function to eval_metric, but I can't get it to work. Could I ask for some help?
Thank you.
5 Answers
Looking at the end of the error output,
raise ValueError("Found input variables with inconsistent numbers of"
ValueError: Found input variables with inconsistent numbers of samples: [650, 325]
it says that 650 and 325 are inconsistent.
Since 325 is exactly half of 650, I suspect the problem comes from the reshape step inside evaluate_macroF1_lgb (see the small sketch further down), but I'm not sure how to fix it...
print(re_train_x.shape[0], re_train_y.shape[0])
print(valid_x.shape[0], valid_y.shape[0])
print(test_x.shape[0], test_y.shape[0])
Running those gives
1947 1947
650 650
649 649
as the output.
So the numbers of samples in the features and labels do seem to match..
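For reference, a minimal sketch (illustrative values, not the lecture data) of why the mismatch is exactly a factor of two: for a binary problem, LightGBM's sklearn wrapper hands the custom eval metric a 1D array with one probability per sample, so reshape(2, -1) splits 650 values into 2 rows of 325 and argmax(axis=0) returns only 325 labels.

import numpy as np

preds = np.random.rand(650)                    # 650 predicted probabilities, one per sample
truth = np.random.randint(0, 2, size=650)      # 650 binary labels

# The multiclass reshape trick: (650,) -> (2, 325) -> argmax over axis=0 -> (325,)
pred_labels = preds.reshape(len(np.unique(truth)), -1).argmax(axis=0)
print(truth.shape[0], pred_labels.shape[0])    # 650 325 -> the "inconsistent numbers of samples" error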
The lecture dataset takes a while to run, so I tried applying this to another, simpler example instead.
It seems to run there. This must have been a tedious question, so thank you very much for the answer!
With only the validation set in eval_set, i.e.
clf.fit(re_train_x, re_train_y, eval_set=[(valid_x, valid_y)],
        eval_metric=evaluate_macroF1_lgb, verbose=100, early_stopping_rounds=1000)
fitting as above produces the following error..
Traceback (most recent call last):
File "C:\Users\user\PycharmProjects\mb\MB_final_code_11_05.py", line 122, in <module>
clf.fit(re_train_x, re_train_y, eval_set=[(valid_x, valid_y)],
File "C:\Users\user\.conda\envs\mb\lib\site-packages\lightgbm\sklearn.py", line 890, in fit
super().fit(X, _y, sample_weight=sample_weight, init_score=init_score, eval_set=valid_sets,
File "C:\Users\user\.conda\envs\mb\lib\site-packages\lightgbm\sklearn.py", line 683, in fit
self._Booster = train(params, train_set,
File "C:\Users\user\.conda\envs\mb\lib\site-packages\lightgbm\engine.py", line 256, in train
evaluation_result_list.extend(booster.eval_valid(feval))
File "C:\Users\user\.conda\envs\mb\lib\site-packages\lightgbm\basic.py", line 2888, in eval_valid
return [item for i in range(1, self.__num_dataset)
File "C:\Users\user\.conda\envs\mb\lib\site-packages\lightgbm\basic.py", line 2889, in <listcomp>
for item in self.__inner_eval(self.name_valid_sets[i - 1], i, feval)]
File "C:\Users\user\.conda\envs\mb\lib\site-packages\lightgbm\basic.py", line 3402, in __inner_eval
feval_ret = eval_function(self.__inner_predict(data_idx), cur_data)
File "C:\Users\user\.conda\envs\mb\lib\site-packages\lightgbm\sklearn.py", line 168, in __call__
return self.func(labels, preds)
File "C:\Users\user\PycharmProjects\mb\MB_final_code_11_05.py", line 49, in evaluate_macroF1_lgb
f1 = f1_score(truth, pred_labels, average='macro')
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\metrics\_classification.py", line 1071, in f1_score
return fbeta_score(y_true, y_pred, beta=1, labels=labels,
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\metrics\_classification.py", line 1195, in fbeta_score
_, _, f, _ = precision_recall_fscore_support(y_true, y_pred,
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\metrics\_classification.py", line 1464, in precision_recall_fscore_support
labels = _check_set_wise_labels(y_true, y_pred, average, labels,
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\metrics\_classification.py", line 1277, in _check_set_wise_labels
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\metrics\_classification.py", line 83, in _check_targets
check_consistent_length(y_true, y_pred)
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\utils\validation.py", line 319, in check_consistent_length
raise ValueError("Found input variables with inconsistent numbers of"
ValueError: Found input variables with inconsistent numbers of samples: [650, 325]
def evaluate_macroF1_lgb(truth, predictions):
    pred_labels = predictions.reshape(len(np.unique(truth)), -1).argmax(axis=0)
    f1 = f1_score(truth, pred_labels, average='macro')
    return ('macroF1', f1, True)

clf = LGBMClassifier(
    n_jobs=-1,
    n_estimators=1000000,
    learning_rate=0.0001,
    num_leaves=48,
    subsample=0.8,
    max_depth=20,
    silent=-1,
    verbose=-1)

clf.fit(re_train_x, re_train_y, eval_set=[(re_train_x, re_train_y), (valid_x, valid_y)],
        eval_metric=evaluate_macroF1_lgb, verbose=100, early_stopping_rounds=1000)
I wrote the code as above and applied that function only as the eval_metric, but it keeps raising an error.
If I set the eval metric to loss or auc instead, it runs fine.
Traceback (most recent call last):
File "C:/Users/user/PycharmProjects/mb/MB_final_code_11_03.py", line 116, in <module>
clf.fit(re_train_x, re_train_y, eval_set=[(re_train_x, re_train_y), (valid_x, valid_y)],
File "C:\Users\user\.conda\envs\mb\lib\site-packages\lightgbm\sklearn.py", line 890, in fit
super().fit(X, _y, sample_weight=sample_weight, init_score=init_score, eval_set=valid_sets,
File "C:\Users\user\.conda\envs\mb\lib\site-packages\lightgbm\sklearn.py", line 683, in fit
self._Booster = train(params, train_set,
File "C:\Users\user\.conda\envs\mb\lib\site-packages\lightgbm\engine.py", line 255, in train
evaluation_result_list.extend(booster.eval_train(feval))
File "C:\Users\user\.conda\envs\mb\lib\site-packages\lightgbm\basic.py", line 2856, in eval_train
return self.__inner_eval(self._train_data_name, 0, feval)
File "C:\Users\user\.conda\envs\mb\lib\site-packages\lightgbm\basic.py", line 3402, in __inner_eval
feval_ret = eval_function(self.__inner_predict(data_idx), cur_data)
File "C:\Users\user\.conda\envs\mb\lib\site-packages\lightgbm\sklearn.py", line 168, in __call__
return self.func(labels, preds)
File "C:/Users/user/PycharmProjects/mb/MB_final_code_11_03.py", line 38, in evaluate_macroF1_lgb
f1 = f1_score(truth, pred_labels, average='macro')
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\metrics\_classification.py", line 1071, in f1_score
return fbeta_score(y_true, y_pred, beta=1, labels=labels,
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\metrics\_classification.py", line 1195, in fbeta_score
_, _, f, _ = precision_recall_fscore_support(y_true, y_pred,
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\metrics\_classification.py", line 1464, in precision_recall_fscore_support
labels = _check_set_wise_labels(y_true, y_pred, average, labels,
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\metrics\_classification.py", line 1277, in _check_set_wise_labels
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\metrics\_classification.py", line 83, in _check_targets
check_consistent_length(y_true, y_pred)
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\utils\validation.py", line 316, in check_consistent_length
lengths = [_num_samples(X) for X in arrays if X is not None]
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\utils\validation.py", line 316, in <listcomp>
lengths = [_num_samples(X) for X in arrays if X is not None]
File "C:\Users\user\.conda\envs\mb\lib\site-packages\sklearn\utils\validation.py", line 259, in _num_samples
raise TypeError("Singleton array %r cannot be considered"
TypeError: Singleton array 80 cannot be considered a valid collection.
How can I fix this?
Every other F1 score eval function I found online also throws an error.
The task is a binary classification problem with labels 0 and 1..
Thank you.
Hello,
You will need to write a custom function so that scikit-learn's f1_score() can be used as the eval metric.
The example below is taken from https://www.kaggle.com/mlisovyi/lighgbm-hyperoptimisation-with-f1-macro.
from sklearn.metrics import f1_score

def evaluate_macroF1_lgb(truth, predictions):
    # this follows the discussion in https://github.com/Microsoft/LightGBM/issues/1483
    pred_labels = predictions.reshape(len(np.unique(truth)), -1).argmax(axis=0)
    f1 = f1_score(truth, pred_labels, average='macro')
    return ('macroF1', f1, True)

import lightgbm as lgb

fit_params = {"early_stopping_rounds": 300,
              "eval_metric": evaluate_macroF1_lgb,
              "eval_set": [(X_test, y_test)],
              'eval_names': ['valid'],
              # 'callbacks': [lgb.reset_parameter(learning_rate=learning_rate_010_decay_power_099)],
              'verbose': False,
              'categorical_feature': 'auto'}
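The fit_params dict above is then unpacked into fit() as keyword arguments; a minimal usage sketch, assuming a classifier clf and training arrays X_train, y_train are already defined as in your code (hypothetical names here):

clf.fit(X_train, y_train, **fit_params)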
The dataset you are using doesn't seem to be the one from the lecture, is that right?
Since your problem is binary classification, try modifying the function along the lines shown below.
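One possible binary version (a sketch, under the assumption that the sklearn wrapper passes the eval function a 1D array of positive-class probabilities for a binary objective): threshold at 0.5 instead of reshaping by the number of classes.

import numpy as np
from sklearn.metrics import f1_score

def evaluate_macroF1_lgb(truth, predictions):
    # Binary case: predictions is a 1D array of positive-class probabilities,
    # so convert to 0/1 labels by thresholding rather than reshape/argmax.
    pred_labels = (predictions > 0.5).astype(int)
    f1 = f1_score(truth, pred_labels, average='macro')
    return ('macroF1', f1, True)   # (eval_name, eval_result, is_higher_better)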