Posts
Q&A
Error during the embedding process: message length too large
I handled it by looping over the documents in batches and calling add_documents(batch) on each one, as shown below.

from langchain_pinecone import PineconeVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# When storing the data for the first time
index_name = 'tax-upstage-index'

# Split documents into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunked_documents = text_splitter.split_documents(document_list)
print(f"Chunked documents length: {len(chunked_documents)}")

# Initialize the PineconeVectorStore
database = PineconeVectorStore.from_documents(
    documents=[],  # Start with an empty list
    embedding=embedding,
    index_name=index_name
)

# Upload documents in batches
batch_size = 100
for i in range(0, len(chunked_documents), batch_size):
    print(f'index: {i}, batch size: {batch_size}')
    batch = chunked_documents[i:i + batch_size]
    database.add_documents(batch)  # Add documents to the existing database
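If a fixed batch of 100 chunks still hits the error, one option is to cap each batch by total text size as well as by item count. The sketch below assumes the error comes from a single request payload being too large; `iter_batches`, `size_of`, and the `max_bytes` default are hypothetical names and values chosen for illustration, not part of LangChain or Pinecone, and the real limit should be checked against the service you are using.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(
    items: Iterable[T],
    size_of: Callable[[T], int],
    max_items: int = 100,          # assumed cap on items per request
    max_bytes: int = 1_000_000,    # assumed cap on text bytes per request
) -> Iterator[List[T]]:
    """Yield batches that stay within both the item cap and the size cap.

    An item that is larger than max_bytes on its own is still yielded,
    alone in its own batch, so nothing is silently dropped.
    """
    batch: List[T] = []
    batch_bytes = 0
    for item in items:
        size = size_of(item)
        # Flush the current batch if adding this item would break either cap.
        if batch and (len(batch) >= max_items or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(item)
        batch_bytes += size
    if batch:
        yield batch
```

With the variables from the post, this could be used as `for batch in iter_batches(chunked_documents, lambda d: len(d.page_content.encode("utf-8"))): database.add_documents(batch)`. Measuring only the chunk text is an approximation, since the embedding vectors and metadata also add to the request size, so it may be worth leaving some headroom in `max_bytes`.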
Seeking advice
I'm a working professional covering a broad range of web development, CI/CD, and infrastructure engineering