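Problem: use MapReduce to merge input files A and B, remove duplicate lines, and write the result to a new output file C.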
A sample of input file A is shown below. The file uses tabs as separators, so check this when pasting.
20170101	x
20170102	y
20170103	x
20170104	y
20170105	z
20170106	x
A sample of input file B:
20170101	y
20170102	y
20170103	x
20170104	z
20170105	y
Merging A and B gives output file C, shown below:
20170101	x
20170101	y
20170102	y
20170103	x
20170104	y
20170104	z
20170105	y
20170105	z
20170106	x
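File C is simply the union of the lines of A and B, with duplicates removed and the lines sorted. As a sanity check that does not need Hadoop, the minimal sketch below reproduces C from the sample data in a single JVM. The class name `MergeDedupSketch` and the method `mergeDistinct` are hypothetical, and the sketch assumes every record is one line of the form date, tab, letter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class MergeDedupSketch {
    // Union of all lines from all inputs; TreeSet drops duplicates and keeps the lines sorted.
    static List<String> mergeDistinct(List<List<String>> files) {
        TreeSet<String> unique = new TreeSet<>();
        for (List<String> file : files) {
            unique.addAll(file);
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Sample records from A and B; each line is "date<TAB>letter".
        List<String> fileA = Arrays.asList("20170101\tx", "20170102\ty", "20170103\tx",
                "20170104\ty", "20170105\tz", "20170106\tx");
        List<String> fileB = Arrays.asList("20170101\ty", "20170102\ty", "20170103\tx",
                "20170104\tz", "20170105\ty");
        for (String line : mergeDistinct(Arrays.asList(fileA, fileB))) {
            System.out.println(line);
        }
    }
}
```

Running it should print the nine lines of C in the order listed above, since "20170102 y" and "20170103 x" occur in both inputs and are kept only once.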
Implementation:
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Task1 {

    // Mapper: emit each whole input line as the key with an empty value,
    // so identical lines from A and B land in the same reduce group.
    public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, new Text(""));
        }
    }

    // Reducer: called once per distinct line; writing the key once removes duplicates.
    // Note: with TextOutputFormat an empty Text value still leaves a trailing tab on each line.
    public static class ReduceClass extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text(""));
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf); // replaces the deprecated new Job(conf)
        job.setJarByClass(Task1.class);
        job.setMapperClass(MapClass.class);
        job.setReducerClass(ReduceClass.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("C:\\Users\\Administrator\\Desktop\\新建文件夹\\input2.txt"));
        FileInputFormat.addInputPath(job, new Path("C:\\Users\\Administrator\\Desktop\\新建文件夹\\input1.txt"));
        FileOutputFormat.setOutputPath(job, new Path("C:\\Users\\Administrator\\Desktop\\新建文件夹\\output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
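Writing only the key in the reducer removes duplicates because the map step uses the whole line as the key and the shuffle groups identical keys, so `reduce` runs once per distinct line. The sketch below imitates those three phases in memory. It is plain Java, not the Hadoop API, and the names `ShuffleSketch`, `map`, `shuffle`, and `reduce` are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    // Map phase (mirrors MapClass): emit each input line as the key with an empty value.
    static List<Map.Entry<String, String>> map(List<String> lines) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(new AbstractMap.SimpleEntry<>(line, ""));
        }
        return pairs;
    }

    // Shuffle phase: group values by key; TreeMap keeps keys sorted, like the sorted map output.
    static TreeMap<String, List<String>> shuffle(List<Map.Entry<String, String>> pairs) {
        TreeMap<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase (mirrors ReduceClass): one call per distinct key, which is written once.
    static List<String> reduce(TreeMap<String, List<String>> groups) {
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : groups.entrySet()) {
            output.add(group.getKey());
        }
        return output;
    }

    public static void main(String[] args) {
        // Lines of file A followed by lines of file B.
        List<String> input = new ArrayList<>(Arrays.asList("20170101\tx", "20170102\ty",
                "20170103\tx", "20170104\ty", "20170105\tz", "20170106\tx"));
        input.addAll(Arrays.asList("20170101\ty", "20170102\ty", "20170103\tx",
                "20170104\tz", "20170105\ty"));
        TreeMap<String, List<String>> groups = shuffle(map(input));
        // "20170102\ty" occurs in both files, so its group holds two values.
        System.out.println(groups.get("20170102\ty").size());
        for (String line : reduce(groups)) {
            System.out.println(line);
        }
    }
}
```

The group for a duplicated line simply holds more than one empty value; the reducer ignores the values, which is why each line appears once in the output.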
Result:
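Pitfalls I ran into:
Reasons why reduce may not run:
1. The program threw an exception; check the logs to debug it.
2. The parameter types do not match, for example the map output types differ from what the reducer declares, or Mapper/Reducer is extended without its generic type parameters, so your map/reduce method overloads the framework's method instead of overriding it.
And so on.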
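Point 2 is easy to hit silently: when `Reducer` is extended as a raw type, a method such as `reduce(Text, Iterable, Context)` only overloads the inherited `reduce`, so the framework keeps calling its default identity implementation and your code never runs. The plain-Java sketch below reproduces this with a hypothetical `BaseReducer` standing in for Hadoop's `Reducer`; it is an illustration of Java overriding rules, not Hadoop code.

```java
import java.util.Arrays;

public class OverrideSketch {
    // Hypothetical stand-in for Hadoop's Reducer. The real default reduce is an
    // identity pass-through; here it just returns a marker string.
    static class BaseReducer<K, V> {
        String reduce(K key, Iterable<V> values) {
            return "default reduce";
        }

        // Stand-in for the framework, which always calls reduce(K, Iterable<V>).
        String run(K key, Iterable<V> values) {
            return reduce(key, values);
        }
    }

    // Raw supertype: the inherited method erases to reduce(Object, Iterable),
    // so this method is only an overload and run() never calls it.
    // Adding @Override here would make the compiler reject the class.
    @SuppressWarnings("rawtypes")
    static class RawReducer extends BaseReducer {
        String reduce(String key, Iterable values) {
            return "custom reduce";
        }
    }

    // Parameterized supertype: this is a real override, so run() dispatches to it.
    static class TypedReducer extends BaseReducer<String, String> {
        @Override
        String reduce(String key, Iterable<String> values) {
            return "custom reduce";
        }
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static String callRaw() {
        BaseReducer reducer = new RawReducer();
        return reducer.run("20170101\tx", Arrays.asList(""));
    }

    static String callTyped() {
        BaseReducer<String, String> reducer = new TypedReducer();
        return reducer.run("20170101\tx", Arrays.asList(""));
    }

    public static void main(String[] args) {
        System.out.println("raw:   " + callRaw());   // default reduce
        System.out.println("typed: " + callTyped()); // custom reduce
    }
}
```

Putting `@Override` on your own `map` and `reduce` methods is a cheap guard: with a raw supertype the mistake becomes a compile error instead of a job that quietly skips your reducer.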