
Java AvroParquetReader Class Code Examples


This article collects typical usage examples of the Java class parquet.avro.AvroParquetReader. If you are wondering what AvroParquetReader does or how to use it, the curated examples below should help.



The AvroParquetReader class belongs to the parquet.avro package. Fifteen code examples are shown below, sorted by popularity by default.
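Two construction patterns recur in the examples below: the older `new AvroParquetReader<T>(path)` constructor and the `AvroParquetReader.<T>builder(path).build()` factory. The basic read loop they all share can be sketched as follows. This is an illustrative sketch, not code from any of the projects cited below: it assumes the parquet-avro, avro, and hadoop-client jars are on the classpath, and the class name `ReadParquetSketch` is my own.

```java
import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;

import parquet.avro.AvroParquetReader;
import parquet.hadoop.ParquetReader;

public class ReadParquetSketch {
  public static void main(String[] args) throws IOException {
    Path path = new Path(args[0]); // path to an existing Parquet file

    // Builder style, as used by the quince examples below; the
    // one-argument constructor seen in other examples is equivalent
    // but deprecated in newer parquet-mr versions.
    ParquetReader<GenericRecord> reader =
        AvroParquetReader.<GenericRecord>builder(path).build();
    try {
      GenericRecord record;
      // read() returns null once the file is exhausted
      while ((record = reader.read()) != null) {
        System.out.println(record);
      }
    } finally {
      reader.close();
    }
  }
}
```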

Example 1: checkSortedByStart

import parquet.avro.AvroParquetReader; // import the required package/class
private void checkSortedByStart(File file, int expectedCount) throws IOException {
  // check records are sorted by start position
  ParquetReader<FlatVariantCall> parquetReader =
      AvroParquetReader.<FlatVariantCall>builder(new Path(file.toURI())).build();

  int actualCount = 0;

  FlatVariantCall flat1 = parquetReader.read();
  actualCount++;

  Long previousStart = flat1.getStart();
  while (true) {
    FlatVariantCall flat = parquetReader.read();
    if (flat == null) {
      break;
    }
    Long start = flat.getStart();
    assertTrue("Should be sorted by start",
        previousStart.compareTo(start) <= 0);
    previousStart = start;
    actualCount++;
  }

  parquetReader.close();

  assertEquals(expectedCount, actualCount);
}
 
Developer: cloudera | Project: quince | Lines: 26 | Source: LoadVariantsToolIT.java


Example 2: getSchema

import parquet.avro.AvroParquetReader; // import the required package/class
@Override
public String getSchema(FileSystem fs, Path path) {
	String schema = null;
	try {
		AvroParquetReader<GenericRecord> parquetReader = 
				new AvroParquetReader<GenericRecord>(path);
		GenericRecord record = parquetReader.read();
		if (record == null) {
			return null;
		}
		Schema avroSchema = record.getSchema();
		AvroSchemaConverter converter = new AvroSchemaConverter();
		schema = converter.convert(avroSchema).toString();
	}
	catch (IOException e) {
		logger.warn("Cannot get schema for file: " + path.toUri().getPath());
		return null;
	}

	return schema;
}
 
Developer: JasonBian | Project: azkaban | Lines: 22 | Source: ParquetFileViewer.java


Example 3: run

import parquet.avro.AvroParquetReader; // import the required package/class
/**
 * Read the file.
 *
 * @param args the command-line arguments
 * @return the process exit code
 * @throws Exception if something goes wrong
 */
public int run(final String[] args) throws Exception {

  Cli cli = Cli.builder().setArgs(args).addOptions(CliCommonOpts.MrIOpts.values()).build();
  int result = cli.runCmd();

  if (result != 0) {
    return result;
  }

  Path inputFile = new Path(cli.getArgValueAsString(CliCommonOpts.MrIOpts.INPUT));

  ParquetReader<Stock> reader = new AvroParquetReader<Stock>(inputFile);

  Stock stock;
  while ((stock = reader.read()) != null) {
    System.out.println(ToStringBuilder.reflectionToString(stock,
        ToStringStyle.SIMPLE_STYLE
    ));
  }

  reader.close();

  return 0;
}
 
Developer: Hanmourang | Project: hiped2 | Lines: 32 | Source: ParquetAvroStockReader.java


Example 4: getSchema

import parquet.avro.AvroParquetReader; // import the required package/class
@Override
public String getSchema(FileSystem fs, Path path) {
  String schema = null;
  try {
    AvroParquetReader<GenericRecord> parquetReader =
        new AvroParquetReader<GenericRecord>(path);
    GenericRecord record = parquetReader.read();
    if (record == null) {
      return null;
    }
    Schema avroSchema = record.getSchema();
    AvroSchemaConverter converter = new AvroSchemaConverter();
    schema = converter.convert(avroSchema).toString();
  } catch (IOException e) {
    logger.warn("Cannot get schema for file: " + path.toUri().getPath());
    return null;
  }

  return schema;
}
 
Developer: azkaban | Project: azkaban-plugins | Lines: 21 | Source: ParquetFileViewer.java


Example 5: writeParquet

import parquet.avro.AvroParquetReader; // import the required package/class
@Test
public void writeParquet() throws Exception {
  Map<String, Object> paramMap = new HashMap<>();
  paramMap.put(FileSystemOutput.FORMAT_CONFIG, "parquet");
  paramMap.put(FileSystemOutput.PATH_CONFIG, results.getPath());
  config = ConfigFactory.parseMap(paramMap);

  FileSystemOutput fileSystemOutput = new FileSystemOutput();
  fileSystemOutput.configure(config);
  fileSystemOutput.applyBulkMutations(plannedRows);

  File[] files = results.listFiles(new FilenameFilter() {
    @Override
    public boolean accept(File dir, String name) {
      return name.endsWith("parquet");
    }
  });
  assertEquals("Incorrect number of Parquet files", 1, files.length);

  Path path = new Path(files[0].toURI());
  AvroParquetReader<GenericRecord> reader = new AvroParquetReader<>(path);

  int i = 0;
  GenericRecord record = reader.read();
  while (null != record) {
    i++;
    record = reader.read();
  }
  assertEquals("Invalid record count", 4, i);
}
 
Developer: cloudera-labs | Project: envelope | Lines: 32 | Source: TestFileSystemOutput.java


Example 6: testAvroInput

import parquet.avro.AvroParquetReader; // import the required package/class
@Test
public void testAvroInput() throws Exception {
  String baseDir = "target/datasets";

  FileUtil.fullyDelete(new File(baseDir));

  String sampleGroup = "default";
  String input = "datasets/variants_avro";
  String output = "target/datasets/variants_out";

  int exitCode = tool.run(new String[]{"--flatten", "--data-model", "GA4GH",
      "--input-format", "AVRO", "--sample-group", sampleGroup, input, output});
  assertEquals(0, exitCode);
  File partition = new File(baseDir, "variants_out/chr=1/pos=0/sample_group=default");
  File[] dataFiles = partition.listFiles(new HiddenFileFilter());

  ParquetReader<FlatVariantCall> parquetReader =
      AvroParquetReader.<FlatVariantCall>builder(new Path(dataFiles[0].toURI())).build();

  // first record has first sample (call set) ID
  FlatVariantCall flat1 = parquetReader.read();
  assertEquals(".", flat1.getId());
  assertEquals("1", flat1.getReferenceName());
  assertEquals(14396L, flat1.getStart().longValue());
  assertEquals(14400L, flat1.getEnd().longValue());
  assertEquals("CTGT", flat1.getReferenceBases());
  assertEquals("C", flat1.getAlternateBases1());
  assertEquals("NA12878", flat1.getCallSetId());
  assertEquals(0, flat1.getGenotype1().intValue());
  assertEquals(1, flat1.getGenotype2().intValue());

  checkSortedByStart(dataFiles[0], 15);
}
 
Developer: cloudera | Project: quince | Lines: 34 | Source: LoadVariantsToolIT.java


Example 7: testSmallVCF

import parquet.avro.AvroParquetReader; // import the required package/class
@Test
public void testSmallVCF() throws Exception {
  String baseDir = "target/datasets";

  FileUtil.fullyDelete(new File(baseDir));

  String sampleGroup = "default";
  String input = "datasets/variants_vcf";
  String output = "target/datasets/variants_out";

  int exitCode = tool.run(new String[]{"--flatten", "--data-model", "GA4GH", "--sample-group",
      sampleGroup, input, output});
  assertEquals(0, exitCode);
  File partition = new File(baseDir, "variants_out/chr=1/pos=0/sample_group=default");
  File[] dataFiles = partition.listFiles(new HiddenFileFilter());

  ParquetReader<FlatVariantCall> parquetReader =
      AvroParquetReader.<FlatVariantCall>builder(new Path(dataFiles[0].toURI())).build();

  // first record has first sample (call set) ID
  FlatVariantCall flat1 = parquetReader.read();
  assertEquals(".", flat1.getId());
  assertEquals("1", flat1.getReferenceName());
  assertEquals(14396L, flat1.getStart().longValue());
  assertEquals(14400L, flat1.getEnd().longValue());
  assertEquals("CTGT", flat1.getReferenceBases());
  assertEquals("C", flat1.getAlternateBases1());
  assertEquals("NA12878", flat1.getCallSetId());
  assertEquals(0, flat1.getGenotype1().intValue());
  assertEquals(1, flat1.getGenotype2().intValue());

  checkSortedByStart(dataFiles[0], 15);
}
 
Developer: cloudera | Project: quince | Lines: 34 | Source: LoadVariantsToolIT.java


Example 8: testRestrictSamples

import parquet.avro.AvroParquetReader; // import the required package/class
@Test
public void testRestrictSamples() throws Exception {
  String baseDir = "target/datasets";

  FileUtil.fullyDelete(new File(baseDir));

  String sampleGroup = "default";
  String input = "datasets/variants_vcf";
  String output = "target/datasets/variants_out";

  int exitCode = tool.run(new String[]{"--flatten", "--data-model", "GA4GH", "--sample-group",
      sampleGroup, "--samples", "NA12878,NA12892", input, output});
  assertEquals(0, exitCode);
  File partition = new File(baseDir, "variants_out/chr=1/pos=0/sample_group=default");
  File[] dataFiles = partition.listFiles(new HiddenFileFilter());

  ParquetReader<FlatVariantCall> parquetReader =
      AvroParquetReader.<FlatVariantCall>builder(new Path(dataFiles[0].toURI())).build();

  // first record has first sample (call set) ID
  FlatVariantCall flat1 = parquetReader.read();
  assertEquals(".", flat1.getId());
  assertEquals("1", flat1.getReferenceName());
  assertEquals(14396L, flat1.getStart().longValue());
  assertEquals(14400L, flat1.getEnd().longValue());
  assertEquals("CTGT", flat1.getReferenceBases());
  assertEquals("C", flat1.getAlternateBases1());
  assertEquals("NA12878", flat1.getCallSetId());
  assertEquals(0, flat1.getGenotype1().intValue());
  assertEquals(1, flat1.getGenotype2().intValue());

  checkSortedByStart(dataFiles[0], 10);
}
 
Developer: cloudera | Project: quince | Lines: 34 | Source: LoadVariantsToolIT.java


Example 9: testVariantsOnly

import parquet.avro.AvroParquetReader; // import the required package/class
@Test
public void testVariantsOnly() throws Exception {

  String baseDir = "target/datasets";

  FileUtil.fullyDelete(new File(baseDir));

  String input = "datasets/variants_vcf";
  String output = "target/datasets/variants_flat_locuspart";

  int exitCode = tool.run(new String[]{"--variants-only", "--flatten", input, output});

  assertEquals(0, exitCode);
  File partition = new File(baseDir,
      "variants_flat_locuspart/chr=1/pos=0");
  assertTrue(partition.exists());

  File[] dataFiles = partition.listFiles(new HiddenFileFilter());

  assertEquals(1, dataFiles.length);
  assertTrue(dataFiles[0].getName().endsWith(".parquet"));

  ParquetReader<FlatVariantCall> parquetReader =
      AvroParquetReader.<FlatVariantCall>builder(new Path(dataFiles[0].toURI())).build();

  // first record has no sample (call set)
  FlatVariantCall flat1 = parquetReader.read();
  assertEquals(".", flat1.getId());
  assertEquals("1", flat1.getReferenceName());
  assertEquals(14396L, flat1.getStart().longValue());
  assertEquals(14400L, flat1.getEnd().longValue());
  assertEquals("CTGT", flat1.getReferenceBases());
  assertEquals("C", flat1.getAlternateBases1());
  assertNull(flat1.getCallSetId());
  assertNull(flat1.getGenotype1());
  assertNull(flat1.getGenotype2());

  checkSortedByStart(dataFiles[0], 5);
}
 
Developer: cloudera | Project: quince | Lines: 40 | Source: LoadVariantsToolIT.java


Example 10: main

import parquet.avro.AvroParquetReader; // import the required package/class
public static void main(String[] args) throws IOException {
  if (args.length == 0) {
    System.out.println("AvroReader {dataFile} {max.lines.to.read.optional}");
    return; // exit early so args[0] below cannot throw
  }

  String dataFile = args[0];
  int recordsToRead = Integer.MAX_VALUE;
  if (args.length > 1) {
    recordsToRead = Integer.parseInt(args[1]);
  }

  Path dataFilePath = new Path(dataFile);

  AvroParquetReader<GenericRecord> reader = new AvroParquetReader<GenericRecord>(dataFilePath);

  GenericRecord record;
  int counter = 0;
  while ((record = reader.read()) != null && counter++ < recordsToRead) {
    System.out.println(counter + " : " + record);
  }

  reader.close();
}
 
Developer: tmalaska | Project: HBase-ToHDFS | Lines: 32 | Source: ParquetReader.java


Example 11: open

import parquet.avro.AvroParquetReader; // import the required package/class
@Override
public void open() {
  Preconditions.checkState(state.equals(ReaderWriterState.NEW),
    "A reader may not be opened more than once - current state:%s", state);

  logger.debug("Opening reader on path:{}", path);

  try {
    reader = new AvroParquetReader<E>(fileSystem.makeQualified(path));
  } catch (IOException e) {
    throw new DatasetReaderException("Unable to create reader path:" + path, e);
  }

  state = ReaderWriterState.OPEN;
}
 
Developer: cloudera | Project: cdk | Lines: 16 | Source: ParquetFileSystemDatasetReader.java


Example 12: displayFile

import parquet.avro.AvroParquetReader; // import the required package/class
@Override
public void displayFile(FileSystem fs, Path path, OutputStream outputStream,
		int startLine, int endLine) throws IOException {
	if (logger.isDebugEnabled()) {
		logger.debug("Display Parquet file: " + path.toUri().getPath());
	}

	JsonGenerator json = null;
	AvroParquetReader<GenericRecord> parquetReader = null;
	try {
		parquetReader = new AvroParquetReader<GenericRecord>(path);

		// Initialize JsonGenerator.
		json = new JsonFactory().createJsonGenerator(outputStream, JsonEncoding.UTF8);
		json.useDefaultPrettyPrinter();

		// Declare the avroWriter encoder that will be used to output the records
		// as JSON but don't construct them yet because we need the first record
		// in order to get the Schema.
		DatumWriter<GenericRecord> avroWriter = null;
		Encoder encoder = null;

		long endTime = System.currentTimeMillis() + STOP_TIME;
		int line = 1;
		while (line <= endLine && System.currentTimeMillis() <= endTime) {
			GenericRecord record = parquetReader.read();
			if (record == null) {
				break;
			}

			if (avroWriter == null) {
				Schema schema = record.getSchema();
				avroWriter = new GenericDatumWriter<GenericRecord>(schema);
				encoder = EncoderFactory.get().jsonEncoder(schema, json);
			}

			if (line >= startLine) {
				String recordStr = "\n\nRecord " + line + ":\n";
				outputStream.write(recordStr.getBytes("UTF-8"));
				avroWriter.write(record, encoder);
				encoder.flush();
			}
			++line;
		}
	}
	catch (IOException e) {
		outputStream.write(("Error in displaying Parquet file: " + 
				e.getLocalizedMessage()).getBytes("UTF-8"));
		throw e;
	}
	catch (Throwable t) {
		logger.error(t.getMessage());
		return;
	}
	finally {
		if (json != null) {
			json.close();
		}
		parquetReader.close();
	}
}
 
Developer: JasonBian | Project: azkaban | Lines: 62 | Source: ParquetFileViewer.java


Example 13: testRedistribute

import parquet.avro.AvroParquetReader; // import the required package/class
@Test
public void testRedistribute() throws Exception {
  String baseDir = "target/datasets";

  FileUtil.fullyDelete(new File(baseDir));

  String sampleGroup = "default";
  String input = "datasets/variants_vcf";
  String output = "target/datasets/variants_out";

  int exitCode = tool.run(new String[]{"--redistribute", "--flatten", "--data-model", "GA4GH",
      "--sample-group", sampleGroup, input, output});
  assertEquals(0, exitCode);
  File partition = new File(baseDir, "variants_out/chr=1/pos=0/sample_group=default");
  assertTrue(partition.exists());
  File[] dataFiles = partition.listFiles(new HiddenFileFilter());
  assertEquals(1, dataFiles.length);
  assertTrue(dataFiles[0].getName().endsWith(".parquet"));

  ParquetReader<FlatVariantCall> parquetReader =
      AvroParquetReader.<FlatVariantCall>builder(new Path(dataFiles[0].toURI())).build();

  FlatVariantCall fvc = parquetReader.read();
  // variants should be sorted, so this is the first one that we should see
  assertEquals(14396L, fvc.getStart().longValue());
  assertEquals(14400L, fvc.getEnd().longValue());
  assertEquals("CTGT", fvc.getReferenceBases());
  assertEquals("C", fvc.getAlternateBases1());

  Set<String> observedSamples = new HashSet<>();
  int numCalls = 0;
  while (fvc != null) {
    observedSamples.add(fvc.getCallSetId().toString());
    numCalls += 1;
    fvc = parquetReader.read();
  }
  Set<String> expectedSamples = new HashSet<>(Arrays.asList("NA12878", "NA12891", "NA12892"));
  assertEquals(expectedSamples, observedSamples);
  assertEquals(15, numCalls);

  checkSortedByStart(dataFiles[0], 15);
}
 
Developer: cloudera | Project: quince | Lines: 43 | Source: LoadVariantsToolIT.java


Example 14: testGVCF

import parquet.avro.AvroParquetReader; // import the required package/class
@Test
public void testGVCF() throws Exception {

  // Note that sites with no variant calls are ignored, see https://github.com/cloudera/quince/issues/19

  String baseDir = "target/datasets";

  FileUtil.fullyDelete(new File(baseDir));

  String sampleGroup = "sample1";
  String input = "datasets/variants_gvcf";
  String output = "target/datasets/variants_flat_locuspart_gvcf";

  int exitCode = tool.run(new String[]{"--flatten", "--sample-group", sampleGroup, input, output});

  assertEquals(0, exitCode);
  File partition = new File(baseDir,
      "variants_flat_locuspart_gvcf/chr=20/pos=10000000/sample_group=sample1");
  assertTrue(partition.exists());

  File[] dataFiles = partition.listFiles(new HiddenFileFilter());

  assertEquals(1, dataFiles.length);
  assertTrue(dataFiles[0].getName().endsWith(".parquet"));

  ParquetReader<FlatVariantCall> parquetReader =
      AvroParquetReader.<FlatVariantCall>builder(new Path(dataFiles[0].toURI())).build();

  // first record has first sample (call set) ID
  FlatVariantCall flat1 = parquetReader.read();
  assertEquals(".", flat1.getId());
  assertEquals("20", flat1.getReferenceName());
  assertEquals(10000116L, flat1.getStart().longValue());
  assertEquals(10000117L, flat1.getEnd().longValue());
  assertEquals("C", flat1.getReferenceBases());
  assertEquals("T", flat1.getAlternateBases1());
  assertEquals("NA12878", flat1.getCallSetId());
  assertEquals(0, flat1.getGenotype1().intValue());
  assertEquals(1, flat1.getGenotype2().intValue());

  checkSortedByStart(dataFiles[0], 30);
}
 
Developer: cloudera | Project: quince | Lines: 43 | Source: LoadVariantsToolIT.java


Example 15: displayFile

import parquet.avro.AvroParquetReader; // import the required package/class
@Override
public void displayFile(FileSystem fs, Path path, OutputStream outputStream,
    int startLine, int endLine) throws IOException {
  if (logger.isDebugEnabled()) {
    logger.debug("Display Parquet file: " + path.toUri().getPath());
  }

  JsonGenerator json = null;
  AvroParquetReader<GenericRecord> parquetReader = null;
  try {
    parquetReader = new AvroParquetReader<GenericRecord>(path);

    // Initialize JsonGenerator.
    json =
        new JsonFactory()
            .createJsonGenerator(outputStream, JsonEncoding.UTF8);
    json.useDefaultPrettyPrinter();

    // Declare the avroWriter encoder that will be used to output the records
    // as JSON but don't construct them yet because we need the first record
    // in order to get the Schema.
    DatumWriter<GenericRecord> avroWriter = null;
    Encoder encoder = null;

    long endTime = System.currentTimeMillis() + STOP_TIME;
    int line = 1;
    while (line <= endLine && System.currentTimeMillis() <= endTime) {
      GenericRecord record = parquetReader.read();
      if (record == null) {
        break;
      }

      if (avroWriter == null) {
        Schema schema = record.getSchema();
        avroWriter = new GenericDatumWriter<GenericRecord>(schema);
        encoder = EncoderFactory.get().jsonEncoder(schema, json);
      }

      if (line >= startLine) {
        String recordStr = "\n\nRecord " + line + ":\n";
        outputStream.write(recordStr.getBytes("UTF-8"));
        avroWriter.write(record, encoder);
        encoder.flush();
      }
      ++line;
    }
  } catch (IOException e) {
    outputStream.write(("Error in displaying Parquet file: " + e
        .getLocalizedMessage()).getBytes("UTF-8"));
    throw e;
  } catch (Throwable t) {
    logger.error(t.getMessage());
    return;
  } finally {
    if (json != null) {
      json.close();
    }
    parquetReader.close();
  }
}
 
Developer: azkaban | Project: azkaban-plugins | Lines: 61 | Source: ParquetFileViewer.java



Note: the parquet.avro.AvroParquetReader examples in this article were collected from open-source projects on GitHub, MSDocs, and similar code and documentation platforms. The snippets were selected from projects contributed by open-source developers; copyright remains with the original authors. Follow each project's license when using or redistributing the code, and do not reproduce it without permission.

