hadoop - Doing multiple MapReduce jobs in Python
I am writing code to run on Hadoop Streaming in Python. I am trying to run one mapping job and two reducing jobs. When I run the code with the following command, only one reducer (the first one) works:

    hadoop jar /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-streaming.jar -Dmapreduce.job.queuename=user -Dmapreduce.map.memory.mb=4096 -Dmapreduce.map.java.opts=-Xmx3276m -Dmapred.output.compress=false -file mapper.py -file reducer_tf_hcuot.py -mapper mapper.py -reducer reducer_tf_hcuot.py -input text -output o_text

Can anyone please tell me how to make this work?

In Hadoop Streaming you can run only one map and one reduce phase per job (at present). You can, however, run two mappers (or any number of mappers) in one job by piping the output of the first map function into the second:

    hadoop jar $HADOOP_JAR -mapper 'map1.py | map2.py | map3.py' -reducer 'reduce.py' ...

For multiple reducers, as Ned Rockson said, you need two independent jobs, with the identity mapper in the second job:

    hadoop jar $HADOOP_JAR -mapper ...
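This pure-Python simulation (no Hadoop involved) shows why extra mappers compose inside one job while a second reducer needs a second job. It assumes tab-separated `key\tvalue` records, as streaming uses by default. `map1`, `map2`, `reduce_sum_inverted` and `reduce_join` are hypothetical stand-ins, not the poster's scripts. `shuffle` mimics the framework's sort by key between map and reduce, comparing keys as text.

```python
from itertools import groupby

# Hypothetical stand-in for map1.py: split each line into lowercase words.
def map1(lines):
    for line in lines:
        for word in line.lower().split():
            yield word

# Hypothetical stand-in for map2.py: emit tab-separated "word\t1" records.
def map2(words):
    for word in words:
        yield f"{word}\t1"

# Job 2's mapper: pass records through unchanged, like `-mapper cat`.
def identity(records):
    return records

def shuffle(records):
    # Simulates the framework's sort by key (compared as text) before reduce.
    return sorted(records, key=lambda r: r.split("\t", 1)[0])

def parse(records):
    return (r.split("\t", 1) for r in records)

# Hypothetical first reducer: sum counts per word and emit "count\tword".
def reduce_sum_inverted(records):
    for word, group in groupby(parse(records), key=lambda kv: kv[0]):
        yield f"{sum(int(v) for _, v in group)}\t{word}"

# Hypothetical second reducer: list every word that shares the same count.
def reduce_join(records):
    for count, group in groupby(parse(records), key=lambda kv: kv[0]):
        yield f"{count}\t{','.join(w for _, w in group)}"

def run_job(lines, mappers, reducer):
    # Piping mappers ('map1.py | map2.py') is plain composition inside one job...
    stream = lines
    for mapper in mappers:
        stream = mapper(stream)
    # ...but each job has exactly one shuffle and one reducer.
    return list(reducer(shuffle(stream)))

text = ["the cat sat", "the dog sat", "a cat"]
stage1 = run_job(text, [map1, map2], reduce_sum_inverted)  # job 1
result = run_job(stage1, [identity], reduce_join)          # job 2
print(result)  # ['1\ta,dog', '2\tcat,sat,the']
```

The second reducer groups by count, so it needs a fresh shuffle on job 1's output. That re-sort is exactly what the second job with an identity mapper provides.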
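This minimal driver sketch chains the two jobs from Python. It reuses the jar path, `-D` settings and `-file`/`-mapper`/`-reducer` options from the question. It assumes `reducer2.py` (a hypothetical name) is your second reducer and `o_text_stage1` (also hypothetical) is a scratch directory for job 1's output. The second job uses `cat` as the identity mapper.

```python
import subprocess

# Jar path and -D settings copied from the question; adjust for your cluster.
STREAMING_JAR = "/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-streaming.jar"
GENERIC_OPTS = [
    "-Dmapreduce.job.queuename=user",
    "-Dmapreduce.map.memory.mb=4096",
    "-Dmapreduce.map.java.opts=-Xmx3276m",
    "-Dmapred.output.compress=false",
]


def streaming_cmd(mapper, reducer, input_path, output_path, files=()):
    """Build the argv list for one `hadoop jar` streaming job."""
    # Generic -D options go before the streaming-specific options.
    cmd = ["hadoop", "jar", STREAMING_JAR, *GENERIC_OPTS]
    for f in files:
        cmd += ["-file", f]  # ship the script to the task nodes
    cmd += ["-mapper", mapper, "-reducer", reducer,
            "-input", input_path, "-output", output_path]
    return cmd


def build_pipeline(input_path, intermediate_path, output_path):
    # Job 1: the original mapper and first reducer write to an intermediate dir.
    job1 = streaming_cmd("mapper.py", "reducer_tf_hcuot.py", input_path,
                         intermediate_path,
                         files=["mapper.py", "reducer_tf_hcuot.py"])
    # Job 2: identity mapper (`cat`) feeds job 1's output to the second reducer.
    # "reducer2.py" is a hypothetical name for your second reducer script.
    job2 = streaming_cmd("cat", "reducer2.py", intermediate_path, output_path,
                         files=["reducer2.py"])
    return [job1, job2]


def run_pipeline(commands):
    # check=True raises CalledProcessError if a job fails, so job 2 never
    # starts on missing or partial input.
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

On a node with the `hadoop` client installed, you would call `run_pipeline(build_pipeline("text", "o_text_stage1", "o_text"))`. Neither output directory may exist beforehand, because Hadoop refuses to write into an existing output path.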