Q&A
2024.06.05
java.lang.IllegalArgumentException error when running spark_kafka
Just in case, I'm attaching my Docker container list as well!

```
% docker ps
CONTAINER ID   IMAGE                      COMMAND                  CREATED        STATUS                  PORTS                                                                       NAMES
64ab0ab44e45   bitnami/spark:3.4          "/opt/bitnami/script…"   12 hours ago   Up 12 hours                                                                                         dockertest-spark-worker-2
a105f410fb7e   bitnami/spark:3.4          "/opt/bitnami/script…"   13 hours ago   Up 12 hours             0.0.0.0:4040->4040/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:18080->18080/tcp    dockertest-spark-1
8bd9e59d9d2d   bitnami/cassandra:4.0.11   "/opt/bitnami/script…"   13 hours ago   Up 12 hours             7000/tcp, 0.0.0.0:9042->9042/tcp                                            dockertest-cassandra-1
26427ab96f20   bitnami/kafka:3.4          "/opt/bitnami/script…"   13 hours ago   Up 11 hours             0.0.0.0:9092->9092/tcp, 0.0.0.0:9094->9094/tcp                              dockertest-kafka-1
c2c01171c8e6   jupyter/pyspark-notebook   "tini -g -- start-no…"   13 hours ago   Up 12 hours (healthy)   4040/tcp, 0.0.0.0:8888->8888/tcp                                            dockertest-pyspark-1
bb193b55b622   bitnami/spark:3.4          "/opt/bitnami/script…"   13 hours ago   Up 12 hours                                                                                         dockertest-spark-worker-1
```

I googled the issue and re-ran with lower package versions, but the result is the same.

https://stackoverflow.com/questions/76920944/pyspark-structured-streaming-error-related-to-allowautotopiccreation

Also, every time I run it, a new app folder keeps being created (see attached image).
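A side note on the piling-up folders: if they are the `app-...` directories under `/opt/bitnami/spark/work`, that by itself is expected, since a standalone Spark worker creates one such directory (executor logs and jars) per submitted application; `spark.worker.cleanup.enabled` can be turned on to prune old ones. For comparison, here is a minimal sketch of what `spark_kafka.py` presumably looks like in this setup. The bootstrap server (`kafka:9092`), the topic (`quickstart`), and the checkpoint path are assumptions inferred from the logs in this thread, not the course's actual file.

```python
# Minimal sketch (assumed, not the course file): stream the "quickstart" topic
# from the "kafka" container and echo records to the console.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark_kafka").getOrCreate()

# Streaming read from Kafka; this produces the key/value/topic/partition/
# offset/timestamp/timestampType schema printed in the logs in this thread.
df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # assumed service name
    .option("subscribe", "quickstart")                # topic name taken from the logs
    .option("startingOffsets", "earliest")
    .load()
)
df.printSchema()

query = (
    df.writeStream
    .format("console")
    .option("checkpointLocation", "/opt/bitnami/spark/work/checkpoint")  # assumed path
    .start()
)
query.awaitTermination()
```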
Q&A
2024.06.05
java.lang.IllegalArgumentException error when running spark_kafka
I seem to be getting the same error. I'm on an M1 MacBook and ran the course material without any modifications, and it still happens. The exact same error occurs on a Windows laptop as well...

First, here is the topic part in the Kafka container:

```
I have no name!@26427ab96f20:/opt/bitnami/kafka/bin$ ./kafka-console-producer.sh --bootstrap-server 127.0.0.1:9092 --topic quickstart --producer.config /opt/bitnami/kafka/config/producer.properties
>123 1234
>asdd fdsfsf
```

Here is the check of port 9092 from the Spark container:

```
root@a105f410fb7e:/opt/bitnami/spark/work# nc -vz kafka 9092
Connection to kafka (172.19.0.2) 9092 port [tcp/*] succeeded!
```

And here is the error when running from the Spark container:

```
root@a105f410fb7e:/opt/bitnami/spark/work# spark-submit --master spark://spark:7077 --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1 spark_kafka.py
:: loading settings :: url = jar:file:/opt/bitnami/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-b98230a9-4cd2-4a18-8308-db54fe90bc3e;1.0
        confs: [default]
        found org.apache.spark#spark-sql-kafka-0-10_2.12;3.4.1 in central
        found org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.4.1 in central
        found org.apache.kafka#kafka-clients;3.3.2 in central
        found org.lz4#lz4-java;1.8.0 in central
        found org.xerial.snappy#snappy-java;1.1.10.1 in central
        found org.slf4j#slf4j-api;2.0.6 in central
        found org.apache.hadoop#hadoop-client-runtime;3.3.4 in central
        found org.apache.hadoop#hadoop-client-api;3.3.4 in central
        found commons-logging#commons-logging;1.1.3 in central
        found com.google.code.findbugs#jsr305;3.0.0 in central
        found org.apache.commons#commons-pool2;2.11.1 in central
:: resolution report :: resolve 244ms :: artifacts dl 6ms
        :: modules in use:
        com.google.code.findbugs#jsr305;3.0.0 from central in [default]
        commons-logging#commons-logging;1.1.3 from central in [default]
        org.apache.commons#commons-pool2;2.11.1 from central in [default]
        org.apache.hadoop#hadoop-client-api;3.3.4 from central in [default]
        org.apache.hadoop#hadoop-client-runtime;3.3.4 from central in [default]
        org.apache.kafka#kafka-clients;3.3.2 from central in [default]
        org.apache.spark#spark-sql-kafka-0-10_2.12;3.4.1 from central in [default]
        org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.4.1 from central in [default]
        org.lz4#lz4-java;1.8.0 from central in [default]
        org.slf4j#slf4j-api;2.0.6 from central in [default]
        org.xerial.snappy#snappy-java;1.1.10.1 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   11  |   0   |   0   |   0   ||   11  |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-b98230a9-4cd2-4a18-8308-db54fe90bc3e
        confs: [default]
        0 artifacts copied, 11 already retrieved (0kB/5ms)
24/06/04 23:46:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
root
 |-- key: binary (nullable = true)
 |-- value: binary (nullable = true)
 |-- topic: string (nullable = true)
 |-- partition: integer (nullable = true)
 |-- offset: long (nullable = true)
 |-- timestamp: timestamp (nullable = true)
 |-- timestampType: integer (nullable = true)
24/06/04 23:46:39 WARN ResolveWriteToStream: spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets and will be disabled.
24/06/04 23:46:39 WARN OffsetSeqMetadata: Updating the value of conf 'spark.sql.shuffle.partitions' in current session from '3' to '200'.
24/06/04 23:46:40 ERROR MicroBatchExecution: Query [id = 4e479a54-3766-412f-ba6c-c5879ecc0e00, runId = a0b6261a-59a5-4f3b-a214-db6dad504d90] terminated with error
java.lang.IllegalArgumentException: Expected e.g. {"topicA":{"0":23,"1":-1},"topicB":{"0":-2}}, got {"logOffset":0}
        at org.apache.spark.sql.kafka010.JsonUtils$.partitionOffsets(JsonUtils.scala:75)
        at org.apache.spark.sql.kafka010.KafkaMicroBatchStream.deserializeOffset(KafkaMicroBatchStream.scala:216)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$getStartOffset$1(MicroBatchExecution.scala:454)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.getStartOffset(MicroBatchExecution.scala:454)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$constructNextBatch$4(MicroBatchExecution.scala:489)
        at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:411)
        at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:409)
        at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$constructNextBatch$2(MicroBatchExecution.scala:488)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$constructNextBatch$1(MicroBatchExecution.scala:477)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:802)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.constructNextBatch(MicroBatchExecution.scala:473)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:266)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:411)
        at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:409)
        at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:247)
        at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:67)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:237)
        at org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$runStream$1(StreamExecution.scala:306)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
        at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:284)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:207)
Traceback (most recent call last):
  File "/opt/bitnami/spark/work/spark_kafka.py", line 44, in <module>
    query.awaitTermination()
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/sql/streaming/query.py", line 201, in awaitTermination
  File "/opt/bitnami/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 175, in deco
pyspark.errors.exceptions.captured.StreamingQueryException: [STREAM_FAILED] Query [id = 4e479a54-3766-412f-ba6c-c5879ecc0e00, runId = a0b6261a-59a5-4f3b-a214-db6dad504d90] terminated with exception: Expected e.g. {"topicA":{"0":23,"1":-1},"topicB":{"0":-2}}, got {"logOffset":0}
```
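The failing parse narrows this down: `{"logOffset":0}` is the offset JSON written by Spark's file-based streaming sources, while the Kafka source expects a per-topic/partition map like `{"topicA":{"0":23}}`. So `KafkaMicroBatchStream.deserializeOffset` is almost certainly reading a checkpoint directory left over from an earlier run that used a different source. A minimal sketch of the usual fix, assuming a hypothetical checkpoint path (substitute whatever `spark_kafka.py` actually passes as `checkpointLocation`):

```python
# Sketch: remove the stale checkpoint so the Kafka source starts clean.
# CHECKPOINT is a hypothetical path; match it to spark_kafka.py.
import shutil

CHECKPOINT = "/opt/bitnami/spark/work/checkpoint"
shutil.rmtree(CHECKPOINT, ignore_errors=True)  # deletes offsets written by the old source

# Alternatively, point the query at a brand-new location inside spark_kafka.py:
# query = (df.writeStream
#            .format("console")
#            .option("checkpointLocation", "/tmp/checkpoint-kafka-fresh")  # new dir
#            .start())
```

After clearing (or relocating) the checkpoint, re-running the same `spark-submit` command should let the query write, and subsequently read back, Kafka-format offsets such as `{"quickstart":{"0":2}}` instead of `{"logOffset":0}`.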