LangSmith Evaluator로 Ollama 모델 설정

강의 전반을 개인 데스크탑에 도커를 설치하여 Ollama에 한국어로 파인튜닝된 llama3.2 모델을 사용하였었습니다.

streamlit으로 만든 프로젝트도 문제없이 돌아갔었는데,

Langsmith에서 Evaluator로 해당 모델을 설정하니 아래와 같은 에러가 나옵니다.

langsmith에서는 ollama 모델로 평가를 할 수 없는건가요?

ollama._types.ResponseError: llama3.2-ko does not support tools
Error running evaluator <DynamicRunEvaluator answer_hallucination_evaluator> on run 38c51823-def2-4eb1-8347-c019874622eb: KeyError('contexts')
Traceback (most recent call last):
  File "E:\PythonProject\rag_streamlit\.venv\Lib\site-packages\langsmith\evaluation\_runner.py", line 1573, in _run_evaluators
    evaluator_response = evaluator.evaluate_run(
                         ^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\PythonProject\rag_streamlit\.venv\Lib\site-packages\langsmith\evaluation\evaluator.py", line 331, in evaluate_run
    result = self.func(
             ^^^^^^^^^^
  File "E:\PythonProject\rag_streamlit\.venv\Lib\site-packages\langsmith\run_helpers.py", line 617, in wrapper
    raise e
  File "E:\PythonProject\rag_streamlit\.venv\Lib\site-packages\langsmith\run_helpers.py", line 614, in wrapper
    function_result = run_container["context"].run(func, *args, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\Temp\ipykernel_10064\4107259675.py", line 12, in answer_hallucination_evaluator
    contexts = run.outputs["contexts"]
               ~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'contexts'
Error running evaluator <DynamicRunEvaluator answer_evaluator> on run 4a5e6612-1a97-4efc-9da4-ee0b4f113b70: ResponseError('llama3.2-ko does not support tools')

# Prompt
# hallucination 판단을 위한 프롬프트
grade_prompt_hallucinations = prompt = hub.pull("langchain-ai/rag-answer-hallucination")

def answer_hallucination_evaluator(run, example) -> dict:
    """
    hallucination 판단을 위한 Evaluator
    """

    # 데이터셋에 있는 질문과, LLM이 답변을 생성할 때 사용한 context를 활용
    input_question = example.inputs["input_question"]
    contexts = run.outputs["contexts"]

    # LLM의 답변
    prediction = run.outputs["answer"]

    # LLM Judge로 사용될 LLM
    llm = ChatOllama(
        model="llama3.2-ko",
        base_url=os.getenv("LLM_BASE_URL"),
        temperature=0
    )


# LLM 응답을 위한 LCEL 활용
    # 3.6 `dictionary_chain`의 `prompt | llm | StrOutputParser()`` 의 구조와 유사함
    answer_grader = grade_prompt_hallucinations | llm

    # Evaluator 실행
    score = answer_grader.invoke({"documents": contexts,
                                  "student_answer": prediction})
    score = score["Score"]

    return {"key": "answer_hallucination", "score": score}

코드는 llm 부분만 ChatOllama를 사용하고 나머지 부분은 모두 동일합니다

인프런 커뮤니티 질문&답변