I am trying to run some Spark Scala code:
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable.ListBuffer
object EzRecoMRjobs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.setMaster("local")
    conf.setAppName("Product Cardinalities")
    val sc = new SparkContext(conf)
    val dataset = sc.textFile(args(0))
    // Load parameters
    val customerIndex = args(1).toInt - 1
    val ProductIndex = args(2).toInt - 1
    val outputPath = args(3).toString
    val resu = dataset.map( line => {
        val orderId = line.split("\t")(0)
        val cols = line.split("\t")(1).split(";")
        cols(ProductIndex)
      })
      .map( x => (x,1) )
      .reduceByKey(_ + _)
      .saveAsTextFile(outputPath)
    sc.stop()
  }
}
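This code works in IntelliJ and writes the result to the "outputPath" folder. From my IntelliJ project I generated a .jar file, and I want to run this code with spark-submit. So in my terminal I launched: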
spark-submit \
--jars /Users/users/Documents/TestScala/ezRecoPreBuild/target/ezRecoPreBuild-1.0-SNAPSHOT.jar \
--class com.np6.scala.EzRecoMRjobs \
--master local \
/Users/users/Documents/DATA/data.txt 1 2 /Users/users/Documents/DATA/dossier
But I get this error:
Exception in thread "main" java.lang.NumberFormatException: For input string: "/Users/users/Documents/DATA/dossier"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:569)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
at com.np6.scala.EzRecoMRjobs$.main(ezRecoMRjobs.scala:51)
at com.np6.scala.EzRecoMRjobs.main(ezRecoMRjobs.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
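What is the cause of this error? Thanks.
See the documentation: https://spark.apache.org/docs/latest/submitting-applications.html
spark-submit treats the first positional argument (the first one that is not an option) as the path to the application jar and passes everything after it to your main method. In your command the jar is given only through --jars, so /Users/users/Documents/DATA/data.txt is taken as the application jar and your program receives just three arguments: 1, 2 and /Users/users/Documents/DATA/dossier. args(2).toInt is then called on the output path, which is a String that is not a number, hence the NumberFormatException.
The --jars flag is for specifying additional jars that your application depends on, not the application jar itself.
You have to run the spark-submit command this way: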
spark-submit \
--class com.np6.scala.EzRecoMRjobs \
--master local[*] \
/Users/users/Documents/TestScala/ezRecoPreBuild/target/ezRecoPreBuild-1.0-SNAPSHOT.jar \
/Users/users/Documents/DATA/data.txt 1 2 /Users/users/Documents/DATA/dossier
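To make this kind of mistake easier to spot, the job could check its arguments before touching Spark. The following is only a sketch of that idea, not part of the original code: `JobArgs`, `JobArgs.parse` and `JobArgsDemo` are hypothetical names, and the usage message is an assumption about how you want the four arguments described.

```scala
import scala.util.Try

// Hypothetical container for the four arguments the job expects.
final case class JobArgs(inputPath: String,
                         customerIndex: Int,
                         productIndex: Int,
                         outputPath: String)

object JobArgs {
  private val usage =
    "usage: <inputPath> <customerColumn> <productColumn> <outputPath>"

  // Returns Right(parsed arguments) when exactly four arguments are given and
  // both column numbers are integers, otherwise Left(error message).
  def parse(args: Array[String]): Either[String, JobArgs] = args match {
    case Array(input, customer, product, output) =>
      val parsed = for {
        c <- Try(customer.toInt - 1) // same 1-based to 0-based shift as the job
        p <- Try(product.toInt - 1)
      } yield JobArgs(input, c, p, output)
      parsed.toOption.toRight(
        s"column numbers must be integers, got '$customer' and '$product'; $usage")
    case _ =>
      Left(s"expected 4 arguments but got ${args.length}; $usage")
  }
}

object JobArgsDemo {
  def main(args: Array[String]): Unit = {
    // What main received in the failing run: the data file had been
    // consumed as the application jar, so only three arguments were left.
    val shifted = Array("1", "2", "/Users/users/Documents/DATA/dossier")
    // What main receives with the corrected command.
    val correct = Array("/Users/users/Documents/DATA/data.txt", "1", "2",
                        "/Users/users/Documents/DATA/dossier")

    println(JobArgs.parse(shifted)) // a Left with the "expected 4 arguments" message
    println(JobArgs.parse(correct)) // a Right holding the parsed values
  }
}
```

In the real job you would call `JobArgs.parse(args)` at the top of `main` and print the message and exit on a `Left`, so a shifted argument list fails with a readable message instead of a stack trace.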
Hope this helps.