There are two ways:
Run the Python script with spark-submit
Run the Python script with the python interpreter
1. Running the Python script with spark-submit
The Python script must import the relevant Spark modules at the top, and it is submitted with spark-submit. Sample code:
===========================================================
"""odflow.py"""
from pyspark import SparkContext

fileDir = "/TripChain3_Demo.txt"
# sc = SparkContext("local", "ODFlow")
sc = SparkContext("spark://ITS-Hadoop10:7077", "ODFlow")
lines = sc.textFile(fileDir)

# A Python lambda cannot span multiple statements, so the mapping is wrapped in a function
def toKV(line):
    arr = line.split(",")
    t = arr[5].split(" ")[1].split(":")
    return (t[0] + t[1] + "," + arr[11] + "," + arr[18], 1)

r1 = lines.map(toKV).reduceByKey(lambda a, b: a + b)
# Sort by key in descending order and save the result
# (pass numPartitions=1 to sortByKey if a single output file is wanted)
r1.sortByKey(False).saveAsTextFile("/pythontest/output")
===========================================================
The submit command is:
spark-submit \
  --master spark://ITS-Hadoop10:7077 \
  odflow.py
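
As a rough illustration of what the job computes, the sketch below reproduces the toKV mapping and the reduceByKey count in plain Python, without Spark. The sample lines and the make_line helper are made up for this example (the real layout of TripChain3_Demo.txt is not shown here); the only things taken from the script are that field 5 holds a "date HH:MM:SS" timestamp and that fields 11 and 18 are joined into the key.

```python
from collections import Counter


def to_kv(line):
    """Same mapping as toKV in odflow.py: key is "HHMM,field11,field18", value is 1."""
    arr = line.split(",")
    t = arr[5].split(" ")[1].split(":")
    return (t[0] + t[1] + "," + arr[11] + "," + arr[18], 1)


def make_line(timestamp, field11, field18):
    """Hypothetical helper: build a 19-field record; only fields 5, 11 and 18 matter."""
    fields = ["x"] * 19
    fields[5], fields[11], fields[18] = timestamp, field11, field18
    return ",".join(fields)


# Made-up sample data, not taken from the real input file.
sample_lines = [
    make_line("2014-06-01 08:15:30", "A", "B"),
    make_line("2014-06-01 08:15:59", "A", "B"),
    make_line("2014-06-01 09:00:00", "A", "C"),
]

# Counter plays the role of reduceByKey(lambda a, b: a + b).
counts = Counter()
for key, value in map(to_kv, sample_lines):
    counts[key] += value

# Equivalent of sortByKey(False): descending order by key.
result = sorted(counts.items(), reverse=True)
print(result)  # [('0900,A,C', 1), ('0815,A,B', 2)]
```

Each key is the hour and minute of the timestamp plus the two fields, so the count is the number of records that share the same minute and the same pair of values.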
2. Running the Python script with the python interpreter
Running the script directly with python fails with errors such as:
ImportError: No module named pyspark
ImportError: No module named py4j.java_gateway
The pyspark and py4j modules cannot be found. Both packages ship inside the Spark installation directory, so PYTHONPATH has to be defined in the environment. Editing either ~/.bashrc or /etc/profile works.
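vi ~/.bashrc    # or: sudo vi /etc/profile
# Add the following line (match the py4j zip name to the file in $SPARK_HOME/python/lib)
export PYTHONPATH=$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
# Apply the change (source is a shell builtin, so it is run without sudo)
source ~/.bashrc    # or: source /etc/profile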
Then close the terminal, open a new one, and run the script with python:
python odflow.py
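
If editing PYTHONPATH is not convenient, the same two locations can be added to sys.path from inside the script before pyspark is imported. This is only a sketch under assumptions: spark_python_paths and add_spark_to_sys_path are hypothetical helper names, SPARK_HOME is assumed to be set, and the py4j zip is located with a glob pattern because its version number differs between Spark releases.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the entries that make pyspark and py4j importable.

    Hypothetical helper: mirrors the two PYTHONPATH entries used above.
    """
    python_dir = os.path.join(spark_home, "python")
    # The zip name carries the py4j version (e.g. py4j-0.8.2.1-src.zip), so glob for it.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home=None):
    """Prepend the Spark Python paths to sys.path and return the entries added."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set")
    added = [p for p in spark_python_paths(spark_home) if p not in sys.path]
    sys.path[:0] = added
    return added
```

With this approach, add_spark_to_sys_path() would be called at the top of odflow.py, before the from pyspark import SparkContext line, so that plain python odflow.py can resolve both modules.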