
Java Target Class Code Examples


This article collects typical usage examples of the Java class org.apache.crunch.Target. If you have been wondering what the Target class is for, how to use it, or where to find working examples, the curated code samples below should help.



The Target class belongs to the org.apache.crunch package. Thirteen code examples of the class are shown below, sorted by popularity by default.
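
Before diving into the examples, here is a minimal, self-contained sketch of the basic pattern they all share: reading a PCollection and writing it to a Target with an explicit WriteMode. This sketch is not taken from any of the projects below; the input and output paths are placeholders.

import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.Target;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.To;

public class TargetWriteModeSketch {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(TargetWriteModeSketch.class);
    PCollection<String> lines = pipeline.readTextFile("/tmp/in");  // placeholder path

    // A Target describes where a PCollection is written; To.textFile is one of
    // Crunch's built-in factory methods. Target.WriteMode controls what happens
    // when the target already exists: DEFAULT fails, OVERWRITE replaces the
    // existing output, APPEND adds to it, and CHECKPOINT reuses it if current.
    Target target = To.textFile("/tmp/out");                       // placeholder path
    pipeline.write(lines, target, Target.WriteMode.APPEND);

    pipeline.done();
  }
}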

Example 1: testGeneric

import org.apache.crunch.Target; // import the required package/class
@Test
public void testGeneric() throws IOException {
  Dataset<Record> inputDataset = repo.create("in", new DatasetDescriptor.Builder()
      .schema(USER_SCHEMA).build());
  Dataset<Record> outputDataset = repo.create("out", new DatasetDescriptor.Builder()
      .schema(USER_SCHEMA).build());

  // write two files, each of 5 records
  writeTestUsers(inputDataset, 5, 0);
  writeTestUsers(inputDataset, 5, 5);

  Pipeline pipeline = new MRPipeline(TestCrunchDatasets.class);
  PCollection<GenericData.Record> data = pipeline.read(
      CrunchDatasets.asSource(inputDataset, GenericData.Record.class));
  pipeline.write(data, CrunchDatasets.asTarget(outputDataset), Target.WriteMode.APPEND);
  pipeline.run();

  checkTestUsers(outputDataset, 10);
}
 
Developer: cloudera | Project: cdk | Lines: 20 | Source: TestCrunchDatasets.java


Example 2: testGenericParquet

import org.apache.crunch.Target; // import the required package/class
@Test
public void testGenericParquet() throws IOException {
  Dataset<Record> inputDataset = repo.create("in", new DatasetDescriptor.Builder()
      .schema(USER_SCHEMA).format(Formats.PARQUET).build());
  Dataset<Record> outputDataset = repo.create("out", new DatasetDescriptor.Builder()
      .schema(USER_SCHEMA).format(Formats.PARQUET).build());

  // write two files, each of 5 records
  writeTestUsers(inputDataset, 5, 0);
  writeTestUsers(inputDataset, 5, 5);

  Pipeline pipeline = new MRPipeline(TestCrunchDatasets.class);
  PCollection<GenericData.Record> data = pipeline.read(
      CrunchDatasets.asSource(inputDataset, GenericData.Record.class));
  pipeline.write(data, CrunchDatasets.asTarget(outputDataset), Target.WriteMode.APPEND);
  pipeline.run();

  checkTestUsers(outputDataset, 10);
}
 
Developer: cloudera | Project: cdk | Lines: 20 | Source: TestCrunchDatasets.java


Example 3: asTarget

import org.apache.crunch.Target; // import the required package/class
/**
 * Expose the given {@link Dataset} as a Crunch {@link Target}.
 *
 * Only the FileSystem {@code Dataset} implementation is supported and the
 * file format must be {@code Formats.PARQUET} or {@code Formats.AVRO}. In
 * addition, the given {@code Dataset} must not be partitioned,
 * <strong>or</strong> must be a leaf partition in the partition hierarchy.
 *
 * <strong>The {@code Target} returned by this method will not write to
 * sub-partitions.</strong>
 *
 * @param dataset the dataset to write to
 * @return the {@link Target}, or <code>null</code> if the dataset is not
 * filesystem-based.
 */
public static Target asTarget(Dataset dataset) {
  Path directory = Accessor.getDefault().getDirectory(dataset);
  if (directory != null) {
    final Format format = dataset.getDescriptor().getFormat();
    if (Formats.PARQUET.equals(format)) {
      return new AvroParquetFileTarget(directory);
    } else if (Formats.AVRO.equals(format)) {
      return new AvroFileTarget(directory);
    } else {
      throw new UnsupportedOperationException(
          "Not a supported format: " + format);
    }
  }
  return null;
}
 
Developer: cloudera | Project: cdk | Lines: 31 | Source: CrunchDatasets.java
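
The Javadoc above points out that asTarget returns null when the dataset is not filesystem-based, a case none of the examples here checks explicitly. The fragment below is a hedged sketch of one defensive calling pattern; repo, records, and the "events" dataset name are hypothetical, and the imports for Dataset and CrunchDatasets are omitted because their package differs between CDK (com.cloudera.cdk) and its successor Kite (org.kitesdk).

// Hedged sketch: guard against the null that asTarget returns for
// datasets without a filesystem directory. repo, records, and the
// dataset name "events" are hypothetical placeholders.
Dataset dataset = repo.load("events");
Target target = CrunchDatasets.asTarget(dataset);
if (target == null) {
  throw new IllegalArgumentException(
      "Dataset 'events' is not filesystem-based; cannot create a Target");
}
pipeline.write(records, target, Target.WriteMode.APPEND);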


Example 4: testPartitionedSourceAndTarget

import org.apache.crunch.Target; // import the required package/class
@Test
@SuppressWarnings("deprecation")
public void testPartitionedSourceAndTarget() throws IOException {
  PartitionStrategy partitionStrategy = new PartitionStrategy.Builder().hash(
      "username", 2).build();

  Dataset<Record> inputDataset = repo.create("in", new DatasetDescriptor.Builder()
      .schema(USER_SCHEMA).partitionStrategy(partitionStrategy).build());
  Dataset<Record> outputDataset = repo.create("out", new DatasetDescriptor.Builder()
      .schema(USER_SCHEMA).partitionStrategy(partitionStrategy).build());

  writeTestUsers(inputDataset, 10);

  PartitionKey key = partitionStrategy.partitionKey(0);
  Dataset<Record> inputPart0 = inputDataset.getPartition(key, false);
  Dataset<Record> outputPart0 = outputDataset.getPartition(key, true);

  Pipeline pipeline = new MRPipeline(TestCrunchDatasets.class);
  PCollection<GenericData.Record> data = pipeline.read(
      CrunchDatasets.asSource(inputPart0, GenericData.Record.class));
  pipeline.write(data, CrunchDatasets.asTarget(outputPart0), Target.WriteMode.APPEND);
  pipeline.run();

  Assert.assertEquals(5, datasetSize(outputPart0));
}
 
Developer: cloudera | Project: cdk | Lines: 26 | Source: TestCrunchDatasets.java


Example 5: run

import org.apache.crunch.Target; // import the required package/class
@Override
public int run(String[] args) throws Exception {

  // Construct a local filesystem dataset repository rooted at /tmp/data
  DatasetRepository fsRepo = DatasetRepositories.open("repo:hdfs:/tmp/data");

  // Construct an HCatalog dataset repository using external Hive tables
  DatasetRepository hcatRepo = DatasetRepositories.open("repo:hive:/tmp/data");

  // Turn debug on while in development.
  getPipeline().enableDebug();
  getPipeline().getConfiguration().set("crunch.log.job.progress", "true");

  // Load the events dataset and get the correct partition to sessionize
  Dataset<StandardEvent> eventsDataset = fsRepo.load("events");
  Dataset<StandardEvent> partition;
  if (args.length == 0 || (args.length == 1 && args[0].equals("LATEST"))) {
    partition = getLatestPartition(eventsDataset);
  } else {
    partition = getPartitionForURI(eventsDataset, args[0]);
  }

  // Create a parallel collection from the working partition
  PCollection<StandardEvent> events = read(
      CrunchDatasets.asSource(partition, StandardEvent.class));

  // Group events by user and cookie id, then create a session for each group
  PCollection<Session> sessions = events
      .by(new GetSessionKey(), Avros.strings())
      .groupByKey()
      .parallelDo(new MakeSession(), Avros.specifics(Session.class));

  // Write the sessions to the "sessions" Dataset
  getPipeline().write(sessions, CrunchDatasets.asTarget(hcatRepo.load("sessions")),
      Target.WriteMode.APPEND);

  return run().succeeded() ? 0 : 1;
}
 
Developer: cloudera | Project: cdk-examples | Lines: 39 | Source: CreateSessions.java


Example 6: run

import org.apache.crunch.Target; // import the required package/class
@Override
public int run(String... args) throws Exception {
  if (args.length != 3) {
    System.err.println("Usage: " + CombinedLogFormatConverter.class.getSimpleName() +
        " <input> <dataset_uri> <dataset name>");
    return 1;
  }
  String input = args[0];
  String datasetUri = args[1];
  String datasetName = args[2];

  Schema schema = new Schema.Parser().parse(
      Resources.getResource("combined_log_format.avsc").openStream());

  // Create the dataset
  DatasetRepository repo = DatasetRepositories.open(datasetUri);
  DatasetDescriptor datasetDescriptor = new DatasetDescriptor.Builder()
      .schema(schema).build();
  Dataset<Object> dataset = repo.create(datasetName, datasetDescriptor);

  // Run the job
  final String schemaString = schema.toString();
  AvroType<GenericData.Record> outputType = Avros.generics(schema);
  PCollection<String> lines = readTextFile(input);
  PCollection<GenericData.Record> records = lines.parallelDo(
      new ConvertFn(schemaString), outputType);
  getPipeline().write(records, CrunchDatasets.asTarget(dataset),
      Target.WriteMode.APPEND);
  run();
  return 0;
}
 
Developer: cloudera | Project: cdk | Lines: 32 | Source: CombinedLogFormatConverter.java


Example 7: run

import org.apache.crunch.Target; // import the required package/class
@Override
public int run(String[] args) throws Exception {
  final long startOfToday = startOfDay();

  // the destination dataset
  Dataset<Record> persistent = Datasets.load(
      "dataset:file:/tmp/data/logs", Record.class);

  // the source: anything before today in the staging area
  Dataset<Record> staging = Datasets.load(
      "dataset:file:/tmp/data/logs_staging", Record.class);
  View<Record> ready = staging.toBefore("timestamp", startOfToday);

  ReadableSource<Record> source = CrunchDatasets.asSource(ready);

  PCollection<Record> stagedLogs = read(source);

  getPipeline().write(stagedLogs,
      CrunchDatasets.asTarget(persistent), Target.WriteMode.APPEND);

  PipelineResult result = run();

  if (result.succeeded()) {
    // remove the source data partition from staging
    ready.deleteAll();
    return 0;
  } else {
    return 1;
  }
}
 
Developer: kite-sdk | Project: kite-examples | Lines: 31 | Source: StagingToPersistent.java


Example 8: compressedTextOutput

import org.apache.crunch.Target; // import the required package/class
protected final Target compressedTextOutput(Configuration conf, String outputPathKey) {
  // The way this is used, it doesn't seem like we can just set the object in getConf(). Need
  // to set the copy in the MRPipeline directly?
  conf.setClass(FileOutputFormat.COMPRESS_CODEC, GzipCodec.class, CompressionCodec.class);
  conf.setClass(MRJobConfig.MAP_OUTPUT_COMPRESS_CODEC, SnappyCodec.class, CompressionCodec.class);
  return To.textFile(Namespaces.toPath(outputPathKey));
}
 
Developer: apsaltis | Project: oryx | Lines: 8 | Source: JobStep.java


Example 9: run

import org.apache.crunch.Target; // import the required package/class
@Override
public int run(String[] args) throws Exception {
  JCommander jc = new JCommander(this);
  try {
    jc.parse(args);
  } catch (ParameterException e) {
    jc.usage();
    return 1;
  }

  if (paths == null || paths.size() != 2) {
    jc.usage();
    return 1;
  }

  String inputPathString = paths.get(0);
  String outputPathString = paths.get(1);

  Configuration conf = getConf();
  Path inputPath = new Path(inputPathString);
  Path outputPath = new Path(outputPathString);
  outputPath = outputPath.getFileSystem(conf).makeQualified(outputPath);

  Pipeline pipeline = new MRPipeline(getClass(), conf);

  VariantsLoader variantsLoader;
  if (dataModel.equals("GA4GH")) {
    variantsLoader = new GA4GHVariantsLoader();
  } else if (dataModel.equals("ADAM")) {
    variantsLoader = new ADAMVariantsLoader();
  } else {
    jc.usage();
    return 1;
  }

  Set<String> sampleSet = samples == null ? null :
      Sets.newLinkedHashSet(Splitter.on(',').split(samples));

  PTable<String, SpecificRecord> partitionKeyedRecords =
      variantsLoader.loadPartitionedVariants(inputFormat, inputPath, conf, pipeline,
          variantsOnly, flatten, sampleGroup, sampleSet, redistribute, segmentSize,
          numReducers);

  if (FileUtils.sampleGroupExists(outputPath, conf, sampleGroup)) {
    if (overwrite) {
      FileUtils.deleteSampleGroup(outputPath, conf, sampleGroup);
    } else {
      LOG.error("Sample group already exists: " + sampleGroup);
      return 1;
    }
  }

  pipeline.write(partitionKeyedRecords, new AvroParquetPathPerKeyTarget(outputPath),
      Target.WriteMode.APPEND);

  PipelineResult result = pipeline.done();
  return result.succeeded() ? 0 : 1;
}
 
Developer: cloudera | Project: quince | Lines: 59 | Source: LoadVariantsTool.java


Example 10: run

import org.apache.crunch.Target; // import the required package/class
@Override
public int run(String[] args) throws Exception {
  // Turn debug on while in development.
  getPipeline().enableDebug();
  getPipeline().getConfiguration().set("crunch.log.job.progress", "true");

  Dataset<StandardEvent> eventsDataset = Datasets.load(
      "dataset:hdfs:/tmp/data/default/events", StandardEvent.class);

  View<StandardEvent> eventsToProcess;
  if (args.length == 0 || (args.length == 1 && args[0].equals("LATEST"))) {
    // get the current minute
    Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
    cal.set(Calendar.SECOND, 0);
    cal.set(Calendar.MILLISECOND, 0);
    long currentMinute = cal.getTimeInMillis();
    // restrict events to before the current minute
    // in the workflow, this also has a lower bound for the timestamp
    eventsToProcess = eventsDataset.toBefore("timestamp", currentMinute);

  } else if (isView(args[0])) {
    eventsToProcess = Datasets.load(args[0], StandardEvent.class);
  } else {
    eventsToProcess = FileSystemDatasets.viewForPath(eventsDataset, new Path(args[0]));
  }

  if (eventsToProcess.isEmpty()) {
    LOG.info("No records to process.");
    return 0;
  }

  // Create a parallel collection from the working partition
  PCollection<StandardEvent> events = read(
      CrunchDatasets.asSource(eventsToProcess));

  // Group events by user and cookie id, then create a session for each group
  PCollection<Session> sessions = events
      .by(new GetSessionKey(), Avros.strings())
      .groupByKey()
      .parallelDo(new MakeSession(), Avros.specifics(Session.class));

  // Write the sessions to the "sessions" Dataset
  getPipeline().write(sessions,
      CrunchDatasets.asTarget("dataset:hive:/tmp/data/default/sessions"),
      Target.WriteMode.APPEND);

  return run().succeeded() ? 0 : 1;
}
 
Developer: kite-sdk | Project: kite-examples | Lines: 49 | Source: CreateSessions.java


Example 11: avroOutput

import org.apache.crunch.Target; // import the required package/class
protected final Target avroOutput(String outputPathKey) {
  return To.avroFile(Namespaces.toPath(outputPathKey));
}
 
Developer: apsaltis | Project: oryx | Lines: 4 | Source: JobStep.java


Example 12: output

import org.apache.crunch.Target; // import the required package/class
protected final Target output(String outputPathKey) {
  return avroOutput(outputPathKey);
}
 
Developer: apsaltis | Project: oryx | Lines: 4 | Source: JobStep.java


Example 13: outputConf

import org.apache.crunch.Target; // import the required package/class
@Override
public Target outputConf(final String key, final String value) {
  extraConf.put(key, value);
  return this;
}
 
Developer: spotify | Project: hdfs2cass | Lines: 6 | Source: CQLTarget.java



Note: the org.apache.crunch.Target examples in this article were collected from GitHub and other source-code hosting platforms. The snippets come from open-source projects and their copyright remains with the original authors; consult each project's license before redistributing or reusing the code. Do not reproduce this article without permission.

