Some columns are not drawn when the data has missing values ... - Inflearn

Kaggle Advanced Machine Learning: Hands-On Practice (캐글 Advanced 머신러닝 실전 박치기)

EDA of key features in the application dataset - 01 (continuous value analysis)

When drawing plots, some columns are not drawn if the data contains missing values.

Resolved question

Calling show_hist_by_target() raises 'ValueError: cannot convert float NaN to integer'. Could this be a seaborn version issue, or do I need to remove the missing values first?
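
For readers checking the missing-value side of the question, here is a minimal sketch of counting and dropping NaN in one column before plotting. The small DataFrame is a made-up stand-in for app_train, and splitting on TARGET == 0 / TARGET == 1 is an assumption about what cond_0 and cond_1 do in the course function. Whether missing values are actually the cause is not settled in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for app_train; the real data comes from the course.
df = pd.DataFrame({
    "TARGET": [0, 1, 0, 0, 1, 1],
    "AMT_REQ_CREDIT_BUREAU_HOUR": [0.0, np.nan, 1.0, np.nan, 0.0, 2.0],
})
column = "AMT_REQ_CREDIT_BUREAU_HOUR"

# Count missing values in the column before plotting.
n_missing = int(df[column].isna().sum())
print(f"{column}: {n_missing} missing of {len(df)} rows")

# Keep only rows where the column is present, then split by TARGET
# (assumed to mirror cond_0 / cond_1 in show_hist_by_target).
clean = df.dropna(subset=[column])
values_0 = clean.loc[clean["TARGET"] == 0, column]
values_1 = clean.loc[clean["TARGET"] == 1, column]
```

The two resulting Series could then be passed to the plotting calls in place of df[cond_0][column] and df[cond_1][column].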

Answers: 10

권 철민

Instructor

The exercises use scipy 1.5.0. seaborn is drawing the KDE through statsmodels here, and that seems to be where the error occurs. A version upgrade looks necessary.

Rather than upgrading individual packages one by one in your existing personal environment, how about downloading Anaconda as shown in the lecture video, setting up a conda-based environment, and running the exercises there?

groov

Asker

Yes. I think it happened because I already had my own environment set up, so I skipped the early environment setup lectures. Thank you for the answer.

groov

Asker

Yes. My scipy version is 1.4.1.

권 철민

Instructor

It looks like the scipy version does not match. Could you tell me which version you have?
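
One way to collect the versions being asked about is sketched below. The helper name installed_version is made up for illustration. It relies only on the `__version__` attribute that these packages conventionally expose, and it reports None for anything that cannot be imported.

```python
import importlib


def installed_version(package_name):
    """Return the package's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(package_name)
    except Exception:
        # A missing package raises ImportError; a broken install can
        # raise other errors, so both cases are reported as None.
        return None
    return getattr(module, "__version__", None)


# Packages that appear in this thread and its traceback.
for name in ("scipy", "statsmodels", "seaborn"):
    print(name, installed_version(name))
```

Posting all three lines helps, since the traceback in this thread passes through both seaborn and statsmodels.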

groov

Asker

Yes. It is as follows.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/statsmodels/nonparametric/kde.py in kdensityfft(X, kernel, bw, weights, gridsize, adjust, clip, cut, retgrid)
    450     try:
--> 451         bw = float(bw)
    452     except:

ValueError: could not convert string to float: 'scott'

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-8-9527850a704a> in <module>
      1 columns = ['AMT_REQ_CREDIT_BUREAU_HOUR']
----> 2 show_hist_by_target(app_train, columns)

<ipython-input-7-551ac81feeb7> in show_hist_by_target(df, columns)
      6         fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(12, 4), squeeze=False)
      7         sns.violinplot(x='TARGET', y=column, data=df, ax=axs[0][0] )
----> 8         sns.distplot(df[cond_0][column], ax=axs[0][1], label='0', color='blue')
      9         sns.distplot(df[cond_1][column], ax=axs[0][1], label='1', color='red')

/opt/conda/lib/python3.7/site-packages/seaborn/distributions.py in distplot(a, bins, hist, kde, rug, fit, hist_kws, kde_kws, rug_kws, fit_kws, color, vertical, norm_hist, axlabel, label, ax)
    231     if kde:
    232         kde_color = kde_kws.pop("color", color)
--> 233         kdeplot(a, vertical=vertical, ax=ax, color=kde_color, **kde_kws)
    234         if kde_color != color:
    235             kde_kws["color"] = kde_color

/opt/conda/lib/python3.7/site-packages/seaborn/distributions.py in kdeplot(data, data2, shade, vertical, kernel, bw, gridsize, cut, clip, legend, cumulative, shade_lowest, cbar, cbar_ax, cbar_kws, ax, **kwargs)
    703         ax = _univariate_kdeplot(data, shade, vertical, kernel, bw,
    704                                  gridsize, cut, clip, legend, ax,
--> 705                                  cumulative=cumulative, **kwargs)
    706 
    707     return ax

/opt/conda/lib/python3.7/site-packages/seaborn/distributions.py in _univariate_kdeplot(data, shade, vertical, kernel, bw, gridsize, cut, clip, legend, ax, cumulative, **kwargs)
    293         x, y = _statsmodels_univariate_kde(data, kernel, bw,
    294                                            gridsize, cut, clip,
--> 295                                            cumulative=cumulative)
    296     else:
    297         # Fall back to scipy if missing statsmodels

/opt/conda/lib/python3.7/site-packages/seaborn/distributions.py in _statsmodels_univariate_kde(data, kernel, bw, gridsize, cut, clip, cumulative)
    365     fft = kernel == "gau"
    366     kde = smnp.KDEUnivariate(data)
--> 367     kde.fit(kernel, bw, fft, gridsize=gridsize, cut=cut, clip=clip)
    368     if cumulative:
    369         grid, y = kde.support, kde.cdf

/opt/conda/lib/python3.7/site-packages/statsmodels/nonparametric/kde.py in fit(self, kernel, bw, fft, weights, gridsize, adjust, cut, clip)
    138             density, grid, bw = kdensityfft(endog, kernel=kernel, bw=bw,
    139                     adjust=adjust, weights=weights, gridsize=gridsize,
--> 140                     clip=clip, cut=cut)
    141         else:
    142             density, grid, bw = kdensity(endog, kernel=kernel, bw=bw,

/opt/conda/lib/python3.7/site-packages/statsmodels/nonparametric/kde.py in kdensityfft(X, kernel, bw, weights, gridsize, adjust, clip, cut, retgrid)
    451         bw = float(bw)
    452     except:
--> 453         bw = bandwidths.select_bandwidth(X, bw, kern) # will cross-val fit this pattern?
    454     bw *= adjust
    455 

/opt/conda/lib/python3.7/site-packages/statsmodels/nonparametric/bandwidths.py in select_bandwidth(x, bw, kernel)
    172         # eventually this can fall back on another selection criterion.
    173         err = "Selected KDE bandwidth is 0. Cannot estimate density."
--> 174         raise RuntimeError(err)
    175     else:
    176         return bandwidth

RuntimeError: Selected KDE bandwidth is 0. Cannot estimate density.
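
For context on the final RuntimeError, the sketch below shows how a Scott-style rule of thumb can produce a bandwidth of exactly 0. It is a pure-Python illustration, not statsmodels' code. It assumes the 'scott' selector uses 1.059 * A * n**(-1/5) with A = min(std, IQR / 1.349), which is how the rule is commonly described; the installed version may differ. The sample values are invented, and the thread does not show what AMT_REQ_CREDIT_BUREAU_HOUR actually contains.

```python
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def scott_bandwidth(values):
    """Scott-style rule: 1.059 * A * n**(-1/5), A = min(std, IQR / 1.349)."""
    data = sorted(values)
    n = len(data)
    std = statistics.stdev(data)  # sample standard deviation
    iqr = percentile(data, 75) - percentile(data, 25)
    spread = min(std, iqr / 1.349)
    return 1.059 * spread * n ** (-0.2)


# Hypothetical column in which more than 75% of the rows are exactly 0.
mostly_zero = [0.0] * 95 + [1.0, 1.0, 2.0, 3.0, 4.0]
print(scott_bandwidth(mostly_zero))  # IQR is 0, so this prints 0.0

# Hypothetical column with more spread: the bandwidth is positive.
varied = [float(i % 10) for i in range(100)]
print(scott_bandwidth(varied))
```

Under these assumptions, a column dominated by one repeated value has an interquartile range of 0, which forces the bandwidth to 0 regardless of missing values. A check like this could be used to decide whether to skip the KDE curve for such a column.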