Python Web Crawling & Automation to Do My Work for Me (feat. Stock and Real Estate Data / Instagram)

15.4 Data reformatting part 2 - apartment complex information

How to fix the InvalidIndexError raised by pd.concat(df, ignore_index=True)

Hello.

The last part of the code below raises an error, and I have not been able to solve it even after searching.

# Combine the final data

df1 = pd.concat(df, ignore_index=True)

---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
Cell In[89], line 2
      1 # Combine the final data
----> 2 df1 = pd.concat(df, ignore_index=True)

File ~\anaconda3\Lib\site-packages\pandas\core\reshape\concat.py:393, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    378     copy = False
    380 op = _Concatenator(
    381     objs,
    382     axis=axis,
   (...)
    390     sort=sort,
    391 )
--> 393 return op.get_result()

File ~\anaconda3\Lib\site-packages\pandas\core\reshape\concat.py:676, in _Concatenator.get_result(self)
    674         obj_labels = obj.axes[1 - ax]
    675         if not new_labels.equals(obj_labels):
--> 676             indexers[ax] = obj_labels.get_indexer(new_labels)
    678     mgrs_indexers.append((obj._mgr, indexers))
    680 new_data = concatenate_managers(
    681     mgrs_indexers, self.new_axes, concat_axis=self.bm_axis, copy=self.copy
    682 )

File ~\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:3875, in Index.get_indexer(self, target, method, limit, tolerance)
   3872 self._check_indexing_method(method, limit, tolerance)
   3874 if not self._index_as_unique:
-> 3875     raise InvalidIndexError(self._requires_unique_msg)
   3877 if len(target) == 0:
   3878     return np.array([], dtype=np.intp)

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

============================

import pandas as pd
import requests

# `headers` (the request headers) is assumed to be defined earlier in the notebook; it is not shown in this post.
df = []
articleNos = ['2433459189', '2433504511']

for articleNo in articleNos:
    ind_url = f'https://new.land.naver.com/api/articles/{articleNo}?complexNo='
    res = requests.get(ind_url, headers=headers)
    ind_dict = res.json()

    # One-row DataFrame built from the articleDetail section
    article_df = pd.Series(ind_dict['articleDetail']).to_frame().T
    # articleDetail_df = articleDetail_df[['articleNo','articleName','buildingTypeName','realestateTypeName', 'tradeTypeName', 'cityName','divisionName', 'sectionName', 'etcAddress', 'monthlyManagementCost', 'buildingName']]

    # Each optional section is appended side by side (axis=1) when present
    if 'articleFloor' in ind_dict.keys():
        articleFloor_df = pd.Series(ind_dict['articleFloor']).to_frame().T
        article_df = pd.concat(
            [
                article_df,
                articleFloor_df,
            ],
            axis=1
        )
    else:
        print(articleNo, '/', 'articleFloor')
    # articleFloor_df = articleFloor_df[['totalFloorCount','correspondingFloorCount']]

    if 'articlePrice' in ind_dict.keys():
        articlePrice = pd.Series(ind_dict['articlePrice']).to_frame().T
        article_df = pd.concat(
            [
                article_df,
                articlePrice,
            ],
            axis=1
        )
    else:
        print(articleNo, '/', 'articlePrice')
    # articlePrice_df = articlePrice_df[['dealPrice','allWarrantPrice','allRentPrice']]

    if 'articleRealtor' in ind_dict.keys():
        articleRealtor = pd.Series(ind_dict['articleRealtor']).to_frame().T
        article_df = pd.concat(
            [
                article_df,
                articleRealtor,
            ],
            axis=1
        )
    else:
        print(articleNo, '/', 'articleRealtor')
    # articleRealtor_df = articleRealtor_df[['realtorName','representativeName','cellPhoneNo','representativeTelNo']]

    if 'articleSpace' in ind_dict.keys():
        articleSpace = pd.Series(ind_dict['articleSpace']).to_frame().T
        article_df = pd.concat(
            [
                article_df,
                articleSpace,
            ],
            axis=1
        )
    else:
        print(articleNo, '/', 'articleSpace')
    # articleSpace_df = articleSpace_df[['supplySpace','exclusiveSpace']]

    # article_df = pd.concat(
    #     [
    #         articleDetail_df,
    #         articleFloor_df,
    #         articlePrice_df,
    #         articleRealtor_df,
    #         articleSpace_df,
    #     ],
    #     axis=1
    # )

    df.append(article_df)

# Combine the final data
df1 = pd.concat(df, ignore_index=True)

python web-crawling pandas concat

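2 Answers

DeepingSauce

Instructor

Hello. First of all, I apologize for the very late reply.

It looks like the individual article_df DataFrames that you put into the list df contain duplicate index labels.

ignore_index only discards index labels that collide between the article_df frames that make up df, so it cannot fix duplicate labels inside a single DataFrame.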
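
To make the point above concrete, here is a minimal sketch that reproduces the same message with made-up data. The frames and key names (`price`, `dealPrice`, `floor`) are hypothetical stand-ins, not the real API fields. Since each article_df in the question is a single row assembled with axis=1, the non-unique labels are most likely column labels, and that is the case this sketch assumes.

```python
import pandas as pd

# Hypothetical one-row frames standing in for pieces of two article_df objects.
detail = pd.DataFrame([{"articleNo": "A1", "price": 100}])
price = pd.DataFrame([{"price": 100, "dealPrice": 90}])  # shares the "price" key

# axis=1 concat keeps both "price" columns, so the column labels are not unique.
first = pd.concat([detail, price], axis=1)
second = pd.DataFrame([{"articleNo": "A2", "price": 200, "floor": 3}])

try:
    # ignore_index=True only renumbers the rows of the result;
    # it does not touch the duplicated column labels inside `first`.
    pd.concat([first, second], ignore_index=True)
    error_message = None
except pd.errors.InvalidIndexError as exc:
    error_message = str(exc)

# Keeping only the first occurrence of each column label makes them unique.
first_unique = first.loc[:, ~first.columns.duplicated()]
combined = pd.concat([first_unique, second], ignore_index=True)

print(error_message)
print(combined)
```

Dropping the later duplicates is only one option and it discards data; whether that is acceptable depends on whether the duplicated columns really hold the same values.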

Inflearn AI Intern

Hello, this is the Inflearn AI Intern.

초칼라, I reviewed your question and the information provided to help resolve the InvalidIndexError raised by pd.concat(df, ignore_index=True). Among the information provided, the post below covers the situation most similar to yours:

to_csv() garbled Korean text: This post explains how to fix Korean encoding problems when saving a DataFrame to a CSV file with the to_csv method. Your problem, however, concerns an InvalidIndexError raised when combining DataFrames with concat, so that post does not directly help here.

You were trying to combine several DataFrames with pd.concat, and this error usually indicates an index problem. In particular, the message InvalidIndexError: Reindexing only valid with uniquely valued Index objects can occur when one or more of the DataFrames being combined has duplicate index values.
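
One way to find out which frame is affected is to check both axes of every frame in the list before concatenating. The helper name `report_duplicate_labels` and the sample frames below are hypothetical; this is a sketch of the check, not code from the course.

```python
import pandas as pd

def report_duplicate_labels(frames):
    """Return (position, axis name, duplicated labels) for each non-unique axis."""
    problems = []
    for pos, frame in enumerate(frames):
        for axis_name, labels in (("index", frame.index), ("columns", frame.columns)):
            if not labels.is_unique:
                # duplicated() marks every repeat after the first occurrence.
                dupes = labels[labels.duplicated()].unique().tolist()
                problems.append((pos, axis_name, dupes))
    return problems

# Made-up frames: one with a repeated column label, one with a repeated row label.
frames = [
    pd.DataFrame([[1, 2, 3]], columns=["a", "b", "b"]),
    pd.DataFrame({"a": [1, 2]}, index=[0, 0]),
    pd.DataFrame({"a": [1], "b": [2]}),
]

problems = report_duplicate_labels(frames)
print(problems)
```

Running the same helper on the list `df` from the question should show which article and which labels are duplicated.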

As a fix, it may help to check each DataFrame for duplicate index values and, if necessary, reset the index with reset_index(). Please understand, however, that it is difficult to give a direct solution from the information provided.
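
Note that reset_index() renumbers rows only, so it does not help when the repeated labels are column names. One alternative, which is not from the answers in this thread, is to prefix each section's columns with the section name before the axis=1 concat so that keys shared between sections cannot collide. The payloads and the helper `flatten_sections` below are hypothetical, and the real API's key names may differ.

```python
import pandas as pd

# Hypothetical payloads; "name" appears in two sections of the first one.
payloads = [
    {"articleDetail": {"articleNo": "A1", "name": "x"},
     "articlePrice": {"dealPrice": 90, "name": "y"}},
    {"articleDetail": {"articleNo": "A2", "name": "z"},
     "articleFloor": {"totalFloorCount": 10}},
]
SECTIONS = ["articleDetail", "articleFloor", "articlePrice"]

def flatten_sections(payload, sections):
    """Build one row whose column labels are prefixed with their section name."""
    parts = [
        pd.Series(payload[name]).to_frame().T.add_prefix(f"{name}_")
        for name in sections
        if name in payload
    ]
    return pd.concat(parts, axis=1)

rows = [flatten_sections(payload, SECTIONS) for payload in payloads]
combined = pd.concat(rows, ignore_index=True)

print(combined.columns.tolist())
```

The prefixes make the column names longer, but every value stays traceable to the section it came from, and missing sections simply become NaN in the combined frame.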

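To help you, I recommend links to similar answers by referring to the instructor's previous answers.
This is currently a beta feature, so please understand that the answer may not be fully satisfactory. 🙏
If you have any further questions, please ask a follow-up. The instructor will reply soon.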
