When I run a plain word-count program that doesn't use any DataFrame (the code below), spark-submit runs the application without problems.
import org.apache.spark.{SparkConf, SparkContext}
object wordCount {
  def main(args: Array[String]): Unit = {
    val logFile = "path/thread.txt"
    val sparkConf = new SparkConf().setAppName("Spark Word Count")
    val sc = new SparkContext(sparkConf)
    val file = sc.textFile(logFile)
    // Split each line into words and count the occurrences of each word
    val counts = file.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.saveAsTextFile("path/output1234")
    sc.stop()
  }
}
But when I run the following code, which also turns the counts into a DataFrame and queries it with Spark SQL,
import scala.reflect.runtime.universe
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD.rddToPairRDDFunctions
object wordCount {
  def main(args: Array[String]): Unit = {
    val logFile = "path/thread.txt"
    val sparkConf = new SparkConf().setAppName("Spark Word Count")
    val sc = new SparkContext(sparkConf)
    val file = sc.textFile(logFile)
    val counts = file.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    case class count1(key:String,value:Int)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._;
    counts.toDF.registerTempTable("count1")
    val counts1 = sqlContext.sql("select * from count1")
    counts.saveAsTextFile("path/output1234")
    sc.stop()
  }
}
I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at com.cadillac.spark.sparkjob.wordCount$.main(wordCount.scala:18)
I'm not sure what I'm missing.
This is the pom.xml I'm using:
  <name>sparkjob</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>1.6.1</version>
    </dependency>
    <dependency>
      <groupId>org.scala-tools</groupId>
      <artifactId>maven-scala-plugin</artifactId>
      <version>2.10</version>
    </dependency>
  </dependencies>
</project>
Please suggest what I should change.
My cluster runs:
Spark version 2.1.0-mapr-1703, Scala version 2.11.8
Thanks in advance.
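If you look at this documentation, it explains the cause of this error:
This means that the libraries used in your code mix Scala versions. The collections API differs between Scala 2.10 and 2.11, and this is the most common error when a Scala 2.10 library is loaded into a Scala 2.11 runtime. To fix it, make sure each artifact name carries the Scala version suffix that matches your Scala version.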
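The suffix rule in that quote can be checked mechanically. Below is a minimal sketch, assuming the artifactIds have been copied out of pom.xml by hand; ScalaSuffixCheck and its methods are hypothetical names, not part of Maven or Spark. It treats a trailing _X.Y in an artifactId as the Scala binary suffix and compares it with the major.minor part of the cluster's Scala version (2.11 for 2.11.8).

```scala
// Hypothetical helper (not a Maven or Spark API): flags artifactIds whose
// _X.Y Scala suffix does not match the Scala binary version of the cluster.
object ScalaSuffixCheck {

  // A trailing "_<digits>.<digits>" is treated as the Scala binary suffix
  private val Suffix = """.*_(\d+\.\d+)""".r

  // "2.11.8" -> "2.11": artifact suffixes encode only major.minor
  def binaryVersion(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  // "spark-core_2.10" -> Some("2.10"); "junit" -> None (no Scala suffix)
  def suffixOf(artifactId: String): Option[String] = artifactId match {
    case Suffix(v) => Some(v)
    case _         => None
  }

  // Artifacts that carry a Scala suffix different from the target version
  def mismatched(artifactIds: Seq[String], scalaVersion: String): Seq[String] = {
    val target = binaryVersion(scalaVersion)
    artifactIds.filter(a => suffixOf(a).exists(_ != target))
  }

  def main(args: Array[String]): Unit = {
    // artifactIds from the question's pom.xml, checked against the cluster's Scala 2.11.8
    val fromPom = Seq("junit", "spark-core_2.10", "spark-sql_2.10", "maven-scala-plugin")
    println(mismatched(fromPom, "2.11.8").mkString(", "))
  }
}
```

Run against the question's pom, it reports spark-core_2.10 and spark-sql_2.10; junit and maven-scala-plugin carry no suffix and are skipped.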
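The plain word count most likely ran only because it never touches Scala reflection: toDF needs a TypeTag, and the code the compiler generates for it calls runtimeMirror, the method named in the stack trace. So change your dependencies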
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.1</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.6.1</version>
</dependency>
to
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.1.0</version>
</dependency>
and add a dependency on the matching Scala library:
<!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.11.8</version>
</dependency>
With that, I'd expect the error to go away.
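If a mismatch slips in again, the driver could check the Scala library it is actually running on before touching Spark, so it fails with a readable message rather than a NoSuchMethodError. This is a sketch under stated assumptions: ScalaVersionGuard, requireBinary and ExpectedBinary are hypothetical names, ExpectedBinary must be kept equal to the _2.xx suffix used in pom.xml by hand, and only scala.util.Properties.versionNumberString comes from the Scala standard library.

```scala
import scala.util.Properties

// Hypothetical fail-fast guard (not a Spark API): call requireBinary at the
// top of main so a Scala mismatch produces a clear error message.
object ScalaVersionGuard {

  // Assumption: must match the _2.xx suffix of the Spark artifacts in pom.xml
  val ExpectedBinary = "2.11"

  // "2.11.8" -> "2.11"; uses java.lang.String.split and array indexing only
  def binaryVersion(fullVersion: String): String = {
    val parts = fullVersion.split("\\.")
    if (parts.length >= 2) parts(0) + "." + parts(1) else fullVersion
  }

  // None when the versions agree, otherwise Some(error message)
  def mismatch(runtimeVersion: String, expected: String): Option[String] =
    if (binaryVersion(runtimeVersion) == expected) None
    else Some("Jar built for Scala " + expected + " but the runtime is Scala " + runtimeVersion)

  // Properties.versionNumberString is the version of the scala-library loaded
  // at runtime, e.g. "2.11.8" on the cluster described in the question
  def requireBinary(expected: String): Unit =
    mismatch(Properties.versionNumberString, expected).foreach { msg =>
      throw new IllegalStateException(msg)
    }

  def main(args: Array[String]): Unit = {
    requireBinary(ExpectedBinary)
    println("Scala " + Properties.versionNumberString + " matches " + ExpectedBinary)
  }
}
```

Calling ScalaVersionGuard.requireBinary("2.11") as the first statement of wordCount.main would throw an IllegalStateException on a Scala 2.10 runtime and do nothing on 2.11.x.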