Why doesn't hadoop see the python command?

I start hadoop from docker with the command

docker exec -it namenode /bin/bash

I moved the files mapper.py -

#!/usr/bin/env python
"""mapper.py"""

import sys

# Input comes from standard input (stdin)
for line in sys.stdin:
    # Remove leading and trailing whitespace
    line = line.strip()
    # Split the line into words
    words = line.split()
    # Increase counters
    for word in words:
        # Write the results to standard output (stdout)
        print(f'{word}\t1')

and reducer.py -

#!/usr/bin/env python
"""reducer.py"""
import sys

current_word = None
current_count = 0
word = None

# Input comes from standard input
for line in sys.stdin:
    # Remove leading and trailing whitespace
    line = line.strip()
    # Parse the input we got from mapper.py
    word, count = line.split('\t', 1)
    # Convert count (currently a string) to an integer
    try:
        count = int(count)
    except ValueError:
        continue

    if current_word == word:
        current_count += count
    else:
        if current_word:
            print(f'{current_word}\t{current_count}')
        current_count = count
        current_word = word

if current_word == word:
    print(f'{current_word}\t{current_count}')

into the tmp folder (on the local filesystem) inside the namenode container.
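For reference, the two scripts together implement a word count. A minimal in-process sketch of the same mapper → sort → reducer pipeline on the sample input below (plain Python, no Hadoop assumed) shows what the job is expected to produce:

```python
# In-process simulation of the mapper -> sort -> reducer word-count pipeline.
lines = ["hello world", "hello friend", "hello mom", "hello father"]

# Mapper phase: emit "word\t1" for every word.
mapped = [f"{word}\t1" for line in lines for word in line.split()]

# Shuffle phase: Hadoop sorts mapper output by key before the reducer runs,
# which is what lets the reducer detect key changes with a simple comparison.
mapped.sort()

# Reducer phase: sum the counts for each word.
counts = {}
for pair in mapped:
    word, count = pair.split("\t", 1)
    counts[word] = counts.get(word, 0) + int(count)

print(counts)  # → {'father': 1, 'friend': 1, 'hello': 4, 'mom': 1, 'world': 1}
```

The sort step is essential: the streaming reducer only compares the current key with the previous one, so unsorted input would fragment the counts.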

The file input.txt (already moved into HDFS) -

hello world
hello friend
hello mom
hello father

But when I try to apply the mapper to input.txt, bash does not recognize the python command and nothing works.

And if I try to run hadoop streaming with the command

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar -file /tmp/mapper.py -mapper mapper.py -file /tmp/reducer.py -reducer reducer.py -input /user/hduser/input.txt  -output /user/hduser/output

then the following error is produced -

root@afa15008d868:/# hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar -file /tmp/mapper.py -mapper mapper.py -file /tmp/reducer.py -reducer reducer.py -input /user/hduser/input.txt  -output /user/hduser/output
2024-10-11 11:29:50,843 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/tmp/mapper.py, /tmp/reducer.py, /tmp/hadoop-unjar2358119599702684140/] [] /tmp/streamjob6545005124994330395.jar tmpDir=null
2024-10-11 11:29:51,694 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager/172.18.0.3:8032
2024-10-11 11:29:51,923 INFO client.AHSProxy: Connecting to Application History server at historyserver/172.18.0.6:10200
2024-10-11 11:29:51,950 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager/172.18.0.3:8032
2024-10-11 11:29:51,950 INFO client.AHSProxy: Connecting to Application History server at historyserver/172.18.0.6:10200
2024-10-11 11:29:52,116 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1728639985711_0003
2024-10-11 11:29:52,204 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2024-10-11 11:29:52,296 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2024-10-11 11:29:52,324 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2024-10-11 11:29:52,385 INFO mapred.FileInputFormat: Total input files to process : 1
2024-10-11 11:29:52,418 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2024-10-11 11:29:52,447 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2024-10-11 11:29:52,463 INFO mapreduce.JobSubmitter: number of splits:2
2024-10-11 11:29:52,615 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2024-10-11 11:29:52,630 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1728639985711_0003
2024-10-11 11:29:52,630 INFO mapreduce.JobSubmitter: Executing with tokens: []
2024-10-11 11:29:52,785 INFO conf.Configuration: resource-types.xml not found
2024-10-11 11:29:52,786 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2024-10-11 11:29:53,043 INFO impl.YarnClientImpl: Submitted application application_1728639985711_0003
2024-10-11 11:29:53,077 INFO mapreduce.Job: The url to track the job: http://resourcemanager:8088/proxy/application_1728639985711_0003/
2024-10-11 11:29:53,079 INFO mapreduce.Job: Running job: job_1728639985711_0003
2024-10-11 11:29:58,137 INFO mapreduce.Job: Job job_1728639985711_0003 running in uber mode : false
2024-10-11 11:29:58,138 INFO mapreduce.Job:  map 0% reduce 0%
2024-10-11 11:30:02,180 INFO mapreduce.Job: Task Id : attempt_1728639985711_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:326)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:539)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
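For context (my own observation, not something stated in the log): exit code 127 is the standard POSIX shell status for "command not found", which is consistent with bash not seeing python. The code itself can be reproduced without Hadoop:

```python
import subprocess

# A POSIX shell returns status 127 when the requested command does not exist;
# Hadoop streaming surfaces the same code when a node cannot launch the mapper.
result = subprocess.run(["sh", "-c", "definitely_no_such_command"],
                        capture_output=True)
print(result.returncode)  # → 127
```

So the failure happens before the mapper logic ever runs: the node simply cannot execute the script it was handed.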

Operating system - Win 10. The line-ending type is set to LF, i.e. the correct Linux-style line endings are selected.
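Since the scripts were edited on Windows, one way to double-check that the copies inside the container really have LF endings is a small helper (`has_crlf` is a hypothetical name, not part of any Hadoop tooling). A CRLF-terminated shebang line makes the kernel look for `python\r`, which also fails with code 127:

```python
import os
import tempfile

def has_crlf(path):
    """Return True if the file contains Windows CRLF line endings."""
    with open(path, "rb") as f:
        return b"\r\n" in f.read()

# Demo on a throwaway file written with LF endings only.
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".py") as f:
    f.write(b"#!/usr/bin/env python\nprint('ok')\n")
    name = f.name
print(has_crlf(name))  # → False
os.unlink(name)
```

Running the same check against /tmp/mapper.py and /tmp/reducer.py inside the container would confirm whether the LF setting actually survived the transfer.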


Answers (0):