Java Avros Class Code Examples


This article collects typical usage examples of the Java class org.apache.crunch.types.avro.Avros. If you are wondering what the Avros class does, how to use it, or where to find usage examples, the curated class code examples below may help.



The Avros class belongs to the org.apache.crunch.types.avro package. Twenty code examples of the Avros class are presented below, sorted by popularity.
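Before turning to the examples, here is a minimal sketch assembled only from the Avros factory calls that recur in the examples below; the class and variable names are illustrative (they appear in none of the cited projects), and only the crunch-core dependency is assumed:

import java.util.Collection;

import org.apache.crunch.Pair;
import org.apache.crunch.types.PTableType;
import org.apache.crunch.types.PType;
import org.apache.crunch.types.avro.Avros;

// Minimal sketch: Avros is a static factory for Avro-backed Crunch PTypes.
public class AvrosPTypeSketch {
  public static void main(String[] args) {
    // Primitive PTypes serialized via Avro
    PType<String> strings = Avros.strings();
    PType<Long> longs = Avros.longs();

    // Composite PTypes: pairs and collections (compare Example 2)
    PType<Pair<String, Long>> pairType = Avros.pairs(strings, longs);
    PType<Collection<Long>> collectionType = Avros.collections(longs);

    // A key/value table type, typically passed to parallelDo(...) to
    // produce a PTable (compare Examples 1 and 12)
    PTableType<String, Pair<String, Long>> tableType = Avros.tableOf(strings, pairType);

    // For Avro records, the examples below also use Avros.specifics(SomeRecord.class),
    // Avros.records(SomeClass.class), and Avros.generics(schema).
    System.out.println(tableType);
  }
}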

Example 1: loadKeyedRecords

import org.apache.crunch.types.avro.Avros; // import the required package/class
@Override
public PTable<Tuple3<String, Long, String>, SpecificRecord>
  loadKeyedRecords(String inputFormat, Path inputPath, Configuration conf,
      Pipeline pipeline, boolean variantsOnly, boolean flatten, String sampleGroup,
      Set<String> samples)
      throws IOException {
  PCollection<Pair<org.bdgenomics.formats.avro.Variant, Collection<Genotype>>> adamRecords
      = readVariants(inputFormat, inputPath, conf, pipeline, sampleGroup);
  // The data are now loaded into ADAM variant objects; convert to keyed SpecificRecords
  ADAMToKeyedSpecificRecordFn converter =
      new ADAMToKeyedSpecificRecordFn(variantsOnly, flatten, sampleGroup, samples);
  @SuppressWarnings("unchecked")
  PType<SpecificRecord> specificPType = Avros.specifics(converter.getSpecificRecordType());
  return adamRecords.parallelDo("Convert to keyed SpecificRecords",
      converter, Avros.tableOf(KEY_PTYPE, specificPType));
}
 
Developer: cloudera, Project: quince, Lines: 17, Source: ADAMVariantsLoader.java


Example 2: readVariants

import org.apache.crunch.types.avro.Avros; // import the required package/class
private static PCollection<Pair<Variant, Collection<Genotype>>>
    readVariants(String inputFormat, Path inputPath, Configuration conf,
    Pipeline pipeline, String sampleGroup) throws IOException {
  PCollection<Pair<Variant, Collection<Genotype>>> adamRecords;
  if (inputFormat.equals("VCF")) {
    TableSource<LongWritable, VariantContextWritable> vcfSource =
        From.formattedFile(
            inputPath, VCFInputFormat.class, LongWritable.class, VariantContextWritable.class);
    PCollection<VariantContextWritable> vcfRecords = pipeline.read(vcfSource).values();
    PType<Pair<Variant, Collection<Genotype>>> adamPType =
        Avros.pairs(Avros.specifics(org.bdgenomics.formats.avro.Variant.class),
            Avros.collections(Avros.specifics(Genotype.class)));
    adamRecords =
        vcfRecords.parallelDo("VCF to ADAM Variant", new VCFToADAMVariantFn(), adamPType);
  } else if (inputFormat.equals("AVRO")) {
    throw new UnsupportedOperationException("Unsupported input format: " + inputFormat);
  } else if (inputFormat.equals("PARQUET")) {
    throw new UnsupportedOperationException("Unsupported input format: " + inputFormat);
  } else {
    throw new IllegalStateException("Unrecognized input format: " + inputFormat);
  }
  return adamRecords;
}
 
Developer: cloudera, Project: quince, Lines: 24, Source: ADAMVariantsLoader.java


Example 3: testDetach

import org.apache.crunch.types.avro.Avros; // import the required package/class
@Test
public void testDetach() {
  Collection<TestAvroRecord> expected = Lists.newArrayList(
          new TestAvroRecord(new Utf8("something"), new Utf8("*"), 1L),
          new TestAvroRecord(new Utf8("something"), new Utf8("**"), 1L),
          new TestAvroRecord(new Utf8("something"), new Utf8("***"), 1L)
  );
  DoFn<Pair<String, Iterable<TestAvroRecord>>, Collection<TestAvroRecord>> doFn =
          DoFns.detach(new CollectingMapFn(), Avros.specifics(TestAvroRecord.class));
  Pair<String, Iterable<TestAvroRecord>> input = Pair.of("key", (Iterable<TestAvroRecord>) new AvroIterable());
  InMemoryEmitter<Collection<TestAvroRecord>> emitter = new InMemoryEmitter<Collection<TestAvroRecord>>();

  doFn.configure(new Configuration());
  doFn.initialize();
  doFn.process(input, emitter);
  doFn.cleanup(emitter);

  assertEquals(expected, emitter.getOutput().get(0));
}
 
Developer: spotify, Project: crunch-lib, Lines: 20, Source: DoFnsTest.java


Example 4: asSource

import org.apache.crunch.types.avro.Avros; // import the required package/class
/**
 * Expose the given {@link Dataset} as a Crunch {@link ReadableSource}.
 *
 * Only the FileSystem {@code Dataset} implementation is supported and the
 * file format must be {@code Formats.PARQUET} or {@code Formats.AVRO}.
 *
 * @param dataset the dataset to read from
 * @param type    the Java type of the entities in the dataset
 * @param <E>     the type of entity produced by the source
 * @return the {@link ReadableSource}, or <code>null</code> if the dataset is not
 * filesystem-based.
 */
@SuppressWarnings("unchecked")
public static <E> ReadableSource<E> asSource(Dataset<E> dataset, Class<E> type) {
  Path directory = Accessor.getDefault().getDirectory(dataset);
  if (directory != null) {
    List<Path> paths = Lists.newArrayList(
        Accessor.getDefault().getPathIterator(dataset));

    AvroType<E> avroType;
    if (type.isAssignableFrom(GenericData.Record.class)) {
      avroType = (AvroType<E>) Avros.generics(dataset.getDescriptor().getSchema());
    } else {
      avroType = Avros.records(type);
    }
    final Format format = dataset.getDescriptor().getFormat();
    if (Formats.PARQUET.equals(format)) {
      return new AvroParquetFileSource<E>(paths, avroType);
    } else if (Formats.AVRO.equals(format)) {
      return new AvroFileSource<E>(paths, avroType);
    } else {
      throw new UnsupportedOperationException(
          "Not a supported format: " + format);
    }
  }
  return null;
}
 
Developer: cloudera, Project: cdk, Lines: 38, Source: CrunchDatasets.java


Example 5: createPipeline

import org.apache.crunch.types.avro.Avros; // import the required package/class
@Override
protected MRPipeline createPipeline() throws IOException {

  JobStepConfig config = getConfig();
  String instanceDir = config.getInstanceDir();
  long generationID = config.getGenerationID();

  String inputKey = Namespaces.getTempPrefix(instanceDir, generationID) + "partialRecommend/";
  String outputKey = Namespaces.getInstanceGenerationPrefix(instanceDir, generationID) + "recommend/";
  if (!validOutputPath(outputKey)) {
    return null;
  }

  MRPipeline p = createBasicPipeline(CollectRecommendFn.class);
  p.getConfiguration().set(IDMappingState.ID_MAPPING_KEY,
                           Namespaces.getInstanceGenerationPrefix(instanceDir, generationID) + "idMapping/");
  PTables.asPTable(p.read(input(inputKey, ALSTypes.VALUE_MATRIX)))
      .groupByKey(groupingOptions())
      .parallelDo("collectRecommend", new CollectRecommendFn(), Avros.strings())
      .write(compressedTextOutput(p.getConfiguration(), outputKey));
  return p;
}
 
Developer: apsaltis, Project: oryx, Lines: 23, Source: CollectRecommendStep.java


Example 6: createPipeline

import org.apache.crunch.types.avro.Avros; // import the required package/class
@Override
protected final MRPipeline createPipeline() throws IOException {

  JobStepConfig config = getConfig();

  IterationState iterationState = getIterationState();
  String iterationKey = iterationState.getIterationKey();
  String xOrY = isX() ? "X/" : "Y/";
  String outputKeyPath =
      Namespaces.getInstanceGenerationPrefix(config.getInstanceDir(), config.getGenerationID()) + xOrY;

  if (!validOutputPath(outputKeyPath)) {
    return null;
  }

  MRPipeline p = createBasicPipeline(PublishMapFn.class);
  p.read(input(iterationKey + xOrY, ALSTypes.DENSE_ROW_MATRIX))
      .parallelDo("publish", new PublishMapFn(), Avros.strings())
      .write(compressedTextOutput(p.getConfiguration(), outputKeyPath));
  return p;
}
 
Developer: apsaltis, Project: oryx, Lines: 22, Source: PublishStep.java


Example 7: createPipeline

import org.apache.crunch.types.avro.Avros; // import the required package/class
@Override
protected MRPipeline createPipeline() throws IOException {

  JobStepConfig config = getConfig();
  String instanceDir = config.getInstanceDir();
  long generationID = config.getGenerationID();

  String outputKey = Namespaces.getInstanceGenerationPrefix(instanceDir, generationID) + "knownItems/";

  if (!validOutputPath(outputKey)) {
    return null;
  }

  MRPipeline p = createBasicPipeline(CollectKnownItemsFn.class);
  // Really should read in and exclude tag IDs but doesn't really hurt much
  p.read(input(Namespaces.getTempPrefix(instanceDir, generationID) + "userVectors/", ALSTypes.SPARSE_ROW_MATRIX))
      .parallelDo("collectKnownItems", new CollectKnownItemsFn(), Avros.strings())
      .write(compressedTextOutput(p.getConfiguration(), outputKey));
  return p;
}
 
Developer: apsaltis, Project: oryx, Lines: 21, Source: CollectKnownItemsStep.java


Example 8: createPipeline

import org.apache.crunch.types.avro.Avros; // import the required package/class
@Override
protected final MRPipeline createPipeline() throws IOException {
  JobStepConfig config = getConfig();
  String tempPrefix = Namespaces.getTempPrefix(config.getInstanceDir(), config.getGenerationID());
  String outputPathKey = tempPrefix + getPopularPathDir() + '/';
  if (!validOutputPath(outputPathKey)) {
    return null;
  }

  MRPipeline p = createBasicPipeline(PopularMapFn.class);
  p.read(input(tempPrefix + getSourceDir() + '/', ALSTypes.SPARSE_ROW_MATRIX))
      .parallelDo("popularMap", new PopularMapFn(), Avros.tableOf(ALSTypes.INTS, ALSTypes.ID_SET))
      .groupByKey(groupingOptions())
      //.combineValues(new FastIDSetAggregator())
      .parallelDo("popularReduce", new PopularReduceFn(), ALSTypes.LONGS)
      .write(output(outputPathKey));
  return p;
}
 
Developer: apsaltis, Project: oryx, Lines: 19, Source: AbstractPopularStep.java


Example 9: createPipeline

import org.apache.crunch.types.avro.Avros; // import the required package/class
@Override
protected MRPipeline createPipeline() throws IOException {

  JobStepConfig config = getConfig();

  String instanceDir = config.getInstanceDir();
  long generationID = config.getGenerationID();
  String tempPrefix = Namespaces.getTempPrefix(instanceDir, generationID);
  String outputPathKey = Namespaces.getInstanceGenerationPrefix(instanceDir, generationID) + "similarItems/";

  if (!validOutputPath(outputPathKey)) {
    return null;
  }

  MRPipeline p = createBasicPipeline(SimilarReduceFn.class);
  p.getConfiguration().set(IDMappingState.ID_MAPPING_KEY,
                           Namespaces.getInstanceGenerationPrefix(instanceDir, generationID) + "idMapping/");
  PTables.asPTable(p.read(input(tempPrefix + "distributeSimilar/", ALSTypes.VALUE_MATRIX)))
      .groupByKey(groupingOptions())
      .parallelDo("similarReduce", new SimilarReduceFn(), Avros.strings())
      .write(compressedTextOutput(p.getConfiguration(), outputPathKey));
  return p;
}
 
Developer: apsaltis, Project: oryx, Lines: 24, Source: SimilarStep.java


Example 10: testCategorical

import org.apache.crunch.types.avro.Avros; // import the required package/class
@Test
public void testCategorical() {
  PCollection<String> input = MemPipeline.typedCollectionOf(
      Avros.strings(),
      "1.0,a,3.0,y",
      "0.4,b,1.0,x",
      "3.2,c,29.0,z");
  PCollection<Record> elems = StringSplitFn.apply(input);
  Summary s = new Summarizer()
    .categoricalColumns(1, 3)
    .build(elems).getValue();
  PCollection<RealVector> vecs = elems.parallelDo(new StandardizeFn(s), MLAvros.vector());
  assertEquals(ImmutableList.of(
      Vectors.of(1.0, 1, 0, 0, 3.0, 0.0, 1.0, 0.0),
      Vectors.of(0.4, 0, 1, 0, 1.0, 1.0, 0.0, 0.0),
      Vectors.of(3.2, 0, 0, 1, 29.0, 0, 0, 1)),
      vecs.materialize());
}
 
Developer: apsaltis, Project: oryx, Lines: 19, Source: StringParsingTest.java


Example 11: createPipeline

import org.apache.crunch.types.avro.Avros; // import the required package/class
@Override
protected MRPipeline createPipeline() throws IOException {

  JobStepConfig config = getConfig();
  String instanceGenerationPrefix =
      Namespaces.getInstanceGenerationPrefix(config.getInstanceDir(), config.getGenerationID());
  String outputPathKey = instanceGenerationPrefix + "trees/";
  if (!validOutputPath(outputPathKey)) {
    return null;
  }

  MRPipeline p = createBasicPipeline(DistributeExampleFn.class);
  p.read(textInput(instanceGenerationPrefix + "inbound/"))
      .parallelDo("distributeData",
                  new DistributeExampleFn(),
                  Avros.tableOf(Avros.ints(), Avros.strings()))
      .groupByKey(groupingOptions())
      .parallelDo("buildTrees", new BuildTreeFn(), Avros.strings())
      .write(compressedTextOutput(p.getConfiguration(), outputPathKey));
  return p;
}
 
Developer: apsaltis, Project: oryx, Lines: 22, Source: BuildTreesStep.java


Example 12: run

import org.apache.crunch.types.avro.Avros; // import the required package/class
public int run(String[] args) throws Exception {

    String fooInputPath = args[0];
    String barInputPath = args[1];
    String outputPath = args[2];
    int fooValMax = Integer.parseInt(args[3]);
    int joinValMax = Integer.parseInt(args[4]);
    int numberOfReducers = Integer.parseInt(args[5]);

    Pipeline pipeline = new MRPipeline(JoinFilterExampleCrunch.class, getConf()); //<1>
    
    PCollection<String> fooLines = pipeline.readTextFile(fooInputPath);  //<2>
    PCollection<String> barLines = pipeline.readTextFile(barInputPath);

    PTable<Long, Pair<Long, Integer>> fooTable = fooLines.parallelDo(  //<3>
        new FooIndicatorFn(),
        Avros.tableOf(Avros.longs(),
        Avros.pairs(Avros.longs(), Avros.ints())));

    fooTable = fooTable.filter(new FooFilter(fooValMax));  //<4>

    PTable<Long, Integer> barTable = barLines.parallelDo(new BarIndicatorFn(),
        Avros.tableOf(Avros.longs(), Avros.ints()));

    DefaultJoinStrategy<Long, Pair<Long, Integer>, Integer> joinStrategy =   //<5>
        new DefaultJoinStrategy
          <Long, Pair<Long, Integer>, Integer>
          (numberOfReducers);

    PTable<Long, Pair<Pair<Long, Integer>, Integer>> joinedTable = joinStrategy //<6>
        .join(fooTable, barTable, JoinType.INNER_JOIN);

    PTable<Long, Pair<Pair<Long, Integer>, Integer>> filteredTable = joinedTable.filter(new JoinFilter(joinValMax));

    filteredTable.write(At.textFile(outputPath), WriteMode.OVERWRITE); //<7>

    PipelineResult result = pipeline.done();

    return result.succeeded() ? 0 : 1;
  }
 
Developer: amitchmca, Project: hadooparchitecturebook, Lines: 41, Source: JoinFilterExampleCrunch.java


Example 13: loadKeyedRecords

import org.apache.crunch.types.avro.Avros; // import the required package/class
@Override
public PTable<Tuple3<String, Long, String>, SpecificRecord>
    loadKeyedRecords(String inputFormat, Path inputPath, Configuration conf,
        Pipeline pipeline, boolean variantsOnly, boolean flatten, String sampleGroup,
        Set<String> samples)
    throws IOException {
  PCollection<Variant> variants = readVariants(inputFormat, inputPath,
      conf, pipeline, sampleGroup);

  GA4GHToKeyedSpecificRecordFn converter =
      new GA4GHToKeyedSpecificRecordFn(variantsOnly, flatten, sampleGroup, samples);
  @SuppressWarnings("unchecked")
  PType<SpecificRecord> specificPType = Avros.specifics(converter
      .getSpecificRecordType());
  return variants.parallelDo("Convert to keyed SpecificRecords",
      converter, Avros.tableOf(KEY_PTYPE, specificPType));
}
 
Developer: cloudera, Project: quince, Lines: 18, Source: GA4GHVariantsLoader.java


Example 14: readVariants

import org.apache.crunch.types.avro.Avros; // import the required package/class
private static PCollection<Variant> readVariants(String inputFormat, Path inputPath,
    Configuration conf, Pipeline pipeline, String sampleGroup) throws IOException {
  PCollection<Variant> variants;
  if (inputFormat.equals("VCF")) {
    VCFToGA4GHVariantFn.configureHeaders(
        conf, FileUtils.findVcfs(inputPath, conf), sampleGroup);
    TableSource<LongWritable, VariantContextWritable> vcfSource =
        From.formattedFile(
            inputPath, VCFInputFormat.class, LongWritable.class, VariantContextWritable.class);
    PCollection<VariantContextWritable> vcfRecords = pipeline.read(vcfSource).values();
    variants = vcfRecords.parallelDo(
        "VCF to GA4GH Variant", new VCFToGA4GHVariantFn(), Avros.specifics(Variant.class));
  } else if (inputFormat.equals("AVRO")) {
    variants = pipeline.read(From.avroFile(inputPath, Avros.specifics(Variant.class)));
  } else if (inputFormat.equals("PARQUET")) {
    @SuppressWarnings("unchecked")
    Source<Variant> source =
        new AvroParquetFileSource(inputPath, Avros.specifics(Variant.class));
    variants = pipeline.read(source);
  } else {
    throw new IllegalStateException("Unrecognized input format: " + inputFormat);
  }
  return variants;
}
 
Developer: cloudera, Project: quince, Lines: 25, Source: GA4GHVariantsLoader.java


Example 15: loadPartitionedVariants

import org.apache.crunch.types.avro.Avros; // import the required package/class
/**
 * Load and partition variants.
 * key = (contig, pos, sample_group); value = Variant/Call Avro object
 * @param inputFormat the format of the input data (VCF, AVRO, or PARQUET)
 * @param inputPath the input data path
 * @param conf the Hadoop configuration
 * @param pipeline the Crunch pipeline
 * @param variantsOnly whether to ignore samples and only load variants
 * @param flatten whether to flatten the data types
 * @param sampleGroup an identifier for the group of samples being loaded
 * @param samples the samples to include
 * @param redistribute whether to repartition the data by locus/sample group
 * @param segmentSize the number of base pairs in each segment partition
 * @param numReducers the number of reducers to use
 * @return the keyed variant or call records
 * @throws IOException if an I/O error is encountered during loading
 */
public PTable<String, SpecificRecord> loadPartitionedVariants(
    String inputFormat, Path inputPath, Configuration conf,
    Pipeline pipeline, boolean variantsOnly, boolean flatten, String sampleGroup,
    Set<String> samples, boolean redistribute, long segmentSize, int numReducers)
    throws IOException {
  PTable<Tuple3<String, Long, String>, SpecificRecord> locusSampleKeyedRecords =
      loadKeyedRecords(inputFormat, inputPath, conf, pipeline, variantsOnly, flatten,
          sampleGroup, samples);

  // execute a DISTRIBUTE BY operation if requested
  PTable<Tuple3<String, Long, String>, SpecificRecord> sortedRecords;
  if (redistribute) {
    // partitionKey(chr, chrSeg, sampleGroup), Pair(secondaryKey/pos, originalDatum)
    PTableType<Tuple3<String, Long, String>,
        Pair<Long,
            Pair<Tuple3<String, Long, String>, SpecificRecord>>> reKeyedPType =
        Avros.tableOf(Avros.triples(Avros.strings(), Avros.longs(), Avros.strings()),
            Avros.pairs(Avros.longs(),
                Avros.pairs(locusSampleKeyedRecords.getKeyType(),
                    locusSampleKeyedRecords.getValueType())));
    PTable<Tuple3<String, Long, String>,
        Pair<Long, Pair<Tuple3<String, Long, String>, SpecificRecord>>> reKeyed =
        locusSampleKeyedRecords.parallelDo("Re-keying for redistribution",
            new ReKeyDistributeByFn(segmentSize), reKeyedPType);
    // repartition and sort by pos
    sortedRecords = SecondarySort.sortAndApply(
        reKeyed, new UnKeyForDistributeByFn(),
        locusSampleKeyedRecords.getPTableType(), numReducers);
  } else {
    // input data assumed to be already globally sorted
    sortedRecords = locusSampleKeyedRecords;
  }

  // generate the partition keys
  return sortedRecords.mapKeys("Generate partition keys",
      new LocusSampleToPartitionFn(segmentSize, sampleGroup), Avros.strings());
}
 
Developer: cloudera, Project: quince, Lines: 55, Source: VariantsLoader.java


Example 16: run

import org.apache.crunch.types.avro.Avros; // import the required package/class
@Override
public int run(String[] args) throws Exception {

  // Construct a local filesystem dataset repository rooted at /tmp/data
  DatasetRepository fsRepo = DatasetRepositories.open("repo:hdfs:/tmp/data");

  // Construct an HCatalog dataset repository using external Hive tables
  DatasetRepository hcatRepo = DatasetRepositories.open("repo:hive:/tmp/data");

  // Turn debug on while in development.
  getPipeline().enableDebug();
  getPipeline().getConfiguration().set("crunch.log.job.progress", "true");

  // Load the events dataset and get the correct partition to sessionize
  Dataset<StandardEvent> eventsDataset = fsRepo.load("events");
  Dataset<StandardEvent> partition;
  if (args.length == 0 || (args.length == 1 && args[0].equals("LATEST"))) {
    partition = getLatestPartition(eventsDataset);
  } else {
    partition = getPartitionForURI(eventsDataset, args[0]);
  }

  // Create a parallel collection from the working partition
  PCollection<StandardEvent> events = read(
      CrunchDatasets.asSource(partition, StandardEvent.class));

  // Group events by user and cookie id, then create a session for each group
  PCollection<Session> sessions = events
      .by(new GetSessionKey(), Avros.strings())
      .groupByKey()
      .parallelDo(new MakeSession(), Avros.specifics(Session.class));

  // Write the sessions to the "sessions" Dataset
  getPipeline().write(sessions, CrunchDatasets.asTarget(hcatRepo.load("sessions")),
      Target.WriteMode.APPEND);

  return run().succeeded() ? 0 : 1;
}
 
Developer: cloudera, Project: cdk-examples, Lines: 39, Source: CreateSessions.java


Example 17: run

import org.apache.crunch.types.avro.Avros; // import the required package/class
@Override
public int run(String... args) throws Exception {
  if (args.length != 3) {
    System.err.println("Usage: " + CombinedLogFormatConverter.class.getSimpleName() +
        " <input> <dataset_uri> <dataset name>");
    return 1;
  }
  String input = args[0];
  String datasetUri = args[1];
  String datasetName = args[2];

  Schema schema = new Schema.Parser().parse(
      Resources.getResource("combined_log_format.avsc").openStream());

  // Create the dataset
  DatasetRepository repo = DatasetRepositories.open(datasetUri);
  DatasetDescriptor datasetDescriptor = new DatasetDescriptor.Builder()
      .schema(schema).build();
  Dataset<Object> dataset = repo.create(datasetName, datasetDescriptor);

  // Run the job
  final String schemaString = schema.toString();
  AvroType<GenericData.Record> outputType = Avros.generics(schema);
  PCollection<String> lines = readTextFile(input);
  PCollection<GenericData.Record> records = lines.parallelDo(
      new ConvertFn(schemaString), outputType);
  getPipeline().write(records, CrunchDatasets.asTarget(dataset),
      Target.WriteMode.APPEND);
  run();
  return 0;
}
 
Developer: cloudera, Project: cdk, Lines: 32, Source: CombinedLogFormatConverter.java


Example 18: createPipeline

import org.apache.crunch.types.avro.Avros; // import the required package/class
@Override
protected MRPipeline createPipeline() throws IOException {

  IterationState iterationState = getIterationState();
  String iterationKey = iterationState.getIterationKey();
  JobStepConfig config = getConfig();
  String instanceDir = config.getInstanceDir();
  long generationID = config.getGenerationID();

  String outputPathKey = Namespaces.getTempPrefix(instanceDir, generationID) + "distributeRecommend/";
  if (!validOutputPath(outputPathKey)) {
    return null;
  }

  MRPipeline p = createBasicPipeline(DistributeRecommendWorkFn.class);

  String knownItemsKey = Namespaces.getInstanceGenerationPrefix(instanceDir, generationID) + "knownItems/";
  PTable<Long, LongSet> knownItems = p.read(textInput(knownItemsKey))
      .parallelDo("knownItems", new KnownItemsFn(), Avros.tableOf(ALSTypes.LONGS, ALSTypes.ID_SET));
  PTable<Long, float[]> userFeatures = p.read(input(iterationKey + "X/", ALSTypes.DENSE_ROW_MATRIX))
      .parallelDo("asPair", MatrixRow.AS_PAIR, Avros.tableOf(Avros.longs(), ALSTypes.FLOAT_ARRAY));

  JoinStrategy<Long, float[], LongSet> joinStrategy = new DefaultJoinStrategy<Long, float[], LongSet>(
      getNumReducers());
  PTable<Long, Pair<float[], LongSet>> joined = joinStrategy.join(userFeatures, knownItems, JoinType.INNER_JOIN);

  joined.parallelDo(
      "distribute", new DistributeRecommendWorkFn(), ALSTypes.REC_TYPE)
      .write(output(outputPathKey));

  return p;
}
 
Developer: apsaltis, Project: oryx, Lines: 33, Source: DistributeRecommendWorkStep.java


Example 19: createPipeline

import org.apache.crunch.types.avro.Avros; // import the required package/class
@Override
protected MRPipeline createPipeline() throws IOException {

  JobStepConfig jobConfig = getConfig();

  String instanceDir = jobConfig.getInstanceDir();
  long generationID = jobConfig.getGenerationID();
  long lastGenerationID = jobConfig.getLastGenerationID();

  String outputKey = Namespaces.getInstanceGenerationPrefix(instanceDir, generationID) + "idMapping/";
  if (!validOutputPath(outputKey)) {
    return null;
  }

  MRPipeline p = createBasicPipeline(MergeNewOldValuesFn.class);

  String inboundKey = Namespaces.getInstanceGenerationPrefix(instanceDir, generationID) + "inbound/";

  PTable<Long, String> parsed = p.read(textInput(inboundKey))
      .parallelDo("inboundParseForMapping", new MappingParseFn(),
          Avros.tableOf(ALSTypes.LONGS, Avros.strings()));

  if (lastGenerationID >= 0) {
    String idMappingPrefix = Namespaces.getInstanceGenerationPrefix(instanceDir, lastGenerationID) + "idMapping/";
    Preconditions.checkState(Store.get().exists(idMappingPrefix, false), "Input path does not exist: %s", idMappingPrefix);
    PTable<Long,String> joinBefore = p.read(textInput(idMappingPrefix))
        .parallelDo("lastGeneration", new ExistingMappingsMapFn(),
                    Avros.tableOf(ALSTypes.LONGS, Avros.strings()));
    parsed = parsed.union(joinBefore);
  }

  parsed.groupByKey(groupingOptions())
      .parallelDo("mergeNewOldMappings", new CombineMappingsFn(), Avros.strings())
      .write(compressedTextOutput(p.getConfiguration(), outputKey));
  return p;
}
 
Developer: apsaltis, Project: oryx, Lines: 37, Source: MergeIDMappingStep.java


Example 20: readExpectedIDs

import org.apache.crunch.types.avro.Avros; // import the required package/class
private static LongSet readExpectedIDs(String key,
                                         Progressable progressable,
                                         Configuration conf) throws IOException {
  LongSet ids = new LongSet();
  long count = 0;
  for (long id : new AvroFileSource<Long>(Namespaces.toPath(key), Avros.longs()).read(conf)) {
    ids.add(id);
    if (++count % 10000 == 0) {
      progressable.progress();
    }
  }
  log.info("Read {} IDs from {}", ids.size(), key);
  return ids;
}
 
Developer: apsaltis, Project: oryx, Lines: 15, Source: ComputationDataUtils.java



Note: the org.apache.crunch.types.avro.Avros examples in this article were compiled from source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects contributed by their respective authors; copyright remains with the original authors, and distribution and use should follow each project's License. Do not reproduce without permission.

