This article collects typical usage examples of the Java class parquet.avro.AvroParquetReader. If you are wondering what AvroParquetReader is for, or how to use it, the curated examples below may help.
The AvroParquetReader class belongs to the parquet.avro package. 15 code examples of the class are shown below, sorted by popularity by default. You can upvote the examples you like or find useful; your feedback helps the system recommend better Java code examples.
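Before the examples, it helps to see the two ways the class is typically constructed: the deprecated single-argument constructor (used in several examples below) and the builder API. The following is a minimal sketch, not taken from any of the listed projects; the file name `data.parquet` is a hypothetical input, and running it requires the parquet-avro and Hadoop client jars on the classpath.

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import parquet.avro.AvroParquetReader;
import parquet.hadoop.ParquetReader;

public class ReadParquetSketch {
  public static void main(String[] args) throws Exception {
    Path path = new Path("data.parquet"); // hypothetical input file

    // Builder style, preferred over the deprecated constructor
    // `new AvroParquetReader<GenericRecord>(path)` seen in the examples below:
    ParquetReader<GenericRecord> reader =
        AvroParquetReader.<GenericRecord>builder(path).build();
    try {
      GenericRecord record;
      // read() returns the next record, or null at end of file
      while ((record = reader.read()) != null) {
        System.out.println(record);
      }
    } finally {
      reader.close();
    }
  }
}
```

Most of the examples below follow exactly this read-until-null loop; they differ mainly in the record type (a generated Avro class versus GenericRecord) and in what is done with each record.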
Example 1: checkSortedByStart
import parquet.avro.AvroParquetReader; // import the required package/class
private void checkSortedByStart(File file, int expectedCount) throws IOException {
  // check records are sorted by start position
  ParquetReader<FlatVariantCall> parquetReader =
      AvroParquetReader.<FlatVariantCall>builder(new Path(file.toURI())).build();
  int actualCount = 0;
  FlatVariantCall flat1 = parquetReader.read();
  actualCount++;
  Long previousStart = flat1.getStart();
  while (true) {
    FlatVariantCall flat = parquetReader.read();
    if (flat == null) {
      break;
    }
    Long start = flat.getStart();
    assertTrue("Should be sorted by start",
        previousStart.compareTo(start) <= 0);
    previousStart = start;
    actualCount++;
  }
  assertEquals(expectedCount, actualCount);
}
Author: cloudera | Project: quince | Lines: 26 | Source: LoadVariantsToolIT.java
Example 2: getSchema
import parquet.avro.AvroParquetReader; // import the required package/class
@Override
public String getSchema(FileSystem fs, Path path) {
  String schema = null;
  try {
    AvroParquetReader<GenericRecord> parquetReader =
        new AvroParquetReader<GenericRecord>(path);
    GenericRecord record = parquetReader.read();
    if (record == null) {
      return null;
    }
    Schema avroSchema = record.getSchema();
    AvroSchemaConverter converter = new AvroSchemaConverter();
    schema = converter.convert(avroSchema).toString();
  } catch (IOException e) {
    logger.warn("Cannot get schema for file: " + path.toUri().getPath());
    return null;
  }
  return schema;
}
Author: JasonBian | Project: azkaban | Lines: 22 | Source: ParquetFileViewer.java
Example 3: run
import parquet.avro.AvroParquetReader; // import the required package/class
/**
 * Read the file.
 *
 * @param args the command-line arguments
 * @return the process exit code
 * @throws Exception if something goes wrong
 */
public int run(final String[] args) throws Exception {
  Cli cli = Cli.builder().setArgs(args).addOptions(CliCommonOpts.MrIOpts.values()).build();
  int result = cli.runCmd();
  if (result != 0) {
    return result;
  }
  Path inputFile = new Path(cli.getArgValueAsString(CliCommonOpts.MrIOpts.INPUT));
  ParquetReader<Stock> reader = new AvroParquetReader<Stock>(inputFile);
  Stock stock;
  while ((stock = reader.read()) != null) {
    System.out.println(ToStringBuilder.reflectionToString(stock,
        ToStringStyle.SIMPLE_STYLE));
  }
  reader.close();
  return 0;
}
Author: Hanmourang | Project: hiped2 | Lines: 32 | Source: ParquetAvroStockReader.java
Example 4: getSchema
import parquet.avro.AvroParquetReader; // import the required package/class
@Override
public String getSchema(FileSystem fs, Path path) {
  String schema = null;
  try {
    AvroParquetReader<GenericRecord> parquetReader =
        new AvroParquetReader<GenericRecord>(path);
    GenericRecord record = parquetReader.read();
    if (record == null) {
      return null;
    }
    Schema avroSchema = record.getSchema();
    AvroSchemaConverter converter = new AvroSchemaConverter();
    schema = converter.convert(avroSchema).toString();
  } catch (IOException e) {
    logger.warn("Cannot get schema for file: " + path.toUri().getPath());
    return null;
  }
  return schema;
}
Author: azkaban | Project: azkaban-plugins | Lines: 21 | Source: ParquetFileViewer.java
Example 5: writeParquet
import parquet.avro.AvroParquetReader; // import the required package/class
@Test
public void writeParquet() throws Exception {
  Map<String, Object> paramMap = new HashMap<>();
  paramMap.put(FileSystemOutput.FORMAT_CONFIG, "parquet");
  paramMap.put(FileSystemOutput.PATH_CONFIG, results.getPath());
  config = ConfigFactory.parseMap(paramMap);
  FileSystemOutput fileSystemOutput = new FileSystemOutput();
  fileSystemOutput.configure(config);
  fileSystemOutput.applyBulkMutations(plannedRows);
  File[] files = results.listFiles(new FilenameFilter() {
    @Override
    public boolean accept(File dir, String name) {
      return name.endsWith("parquet");
    }
  });
  assertEquals("Incorrect number of Parquet files", 1, files.length);
  Path path = new Path(files[0].toURI());
  AvroParquetReader<GenericRecord> reader = new AvroParquetReader<>(path);
  //AvroParquetReader.Builder<GenericRecord> reader = AvroParquetReader.builder(path);
  int i = 0;
  GenericRecord record = reader.read();
  while (null != record) {
    i++;
    record = reader.read();
  }
  assertEquals("Invalid record count", 4, i);
}
Author: cloudera-labs | Project: envelope | Lines: 32 | Source: TestFileSystemOutput.java
Example 6: testAvroInput
import parquet.avro.AvroParquetReader; // import the required package/class
@Test
public void testAvroInput() throws Exception {
  String baseDir = "target/datasets";
  FileUtil.fullyDelete(new File(baseDir));
  String sampleGroup = "default";
  String input = "datasets/variants_avro";
  String output = "target/datasets/variants_out";
  int exitCode = tool.run(new String[]{"--flatten", "--data-model", "GA4GH",
      "--input-format", "AVRO", "--sample-group", sampleGroup, input, output});
  assertEquals(0, exitCode);
  File partition = new File(baseDir, "variants_out/chr=1/pos=0/sample_group=default");
  File[] dataFiles = partition.listFiles(new HiddenFileFilter());
  ParquetReader<FlatVariantCall> parquetReader =
      AvroParquetReader.<FlatVariantCall>builder(new Path(dataFiles[0].toURI())).build();
  // first record has first sample (call set) ID
  FlatVariantCall flat1 = parquetReader.read();
  assertEquals(".", flat1.getId());
  assertEquals("1", flat1.getReferenceName());
  assertEquals(14396L, flat1.getStart().longValue());
  assertEquals(14400L, flat1.getEnd().longValue());
  assertEquals("CTGT", flat1.getReferenceBases());
  assertEquals("C", flat1.getAlternateBases1());
  assertEquals("NA12878", flat1.getCallSetId());
  assertEquals(0, flat1.getGenotype1().intValue());
  assertEquals(1, flat1.getGenotype2().intValue());
  checkSortedByStart(dataFiles[0], 15);
}
Author: cloudera | Project: quince | Lines: 34 | Source: LoadVariantsToolIT.java
Example 7: testSmallVCF
import parquet.avro.AvroParquetReader; // import the required package/class
@Test
public void testSmallVCF() throws Exception {
  String baseDir = "target/datasets";
  FileUtil.fullyDelete(new File(baseDir));
  String sampleGroup = "default";
  String input = "datasets/variants_vcf";
  String output = "target/datasets/variants_out";
  int exitCode = tool.run(new String[]{"--flatten", "--data-model", "GA4GH", "--sample-group",
      sampleGroup, input, output});
  assertEquals(0, exitCode);
  File partition = new File(baseDir, "variants_out/chr=1/pos=0/sample_group=default");
  File[] dataFiles = partition.listFiles(new HiddenFileFilter());
  ParquetReader<FlatVariantCall> parquetReader =
      AvroParquetReader.<FlatVariantCall>builder(new Path(dataFiles[0].toURI())).build();
  // first record has first sample (call set) ID
  FlatVariantCall flat1 = parquetReader.read();
  assertEquals(".", flat1.getId());
  assertEquals("1", flat1.getReferenceName());
  assertEquals(14396L, flat1.getStart().longValue());
  assertEquals(14400L, flat1.getEnd().longValue());
  assertEquals("CTGT", flat1.getReferenceBases());
  assertEquals("C", flat1.getAlternateBases1());
  assertEquals("NA12878", flat1.getCallSetId());
  assertEquals(0, flat1.getGenotype1().intValue());
  assertEquals(1, flat1.getGenotype2().intValue());
  checkSortedByStart(dataFiles[0], 15);
}
Author: cloudera | Project: quince | Lines: 34 | Source: LoadVariantsToolIT.java
Example 8: testRestrictSamples
import parquet.avro.AvroParquetReader; // import the required package/class
@Test
public void testRestrictSamples() throws Exception {
  String baseDir = "target/datasets";
  FileUtil.fullyDelete(new File(baseDir));
  String sampleGroup = "default";
  String input = "datasets/variants_vcf";
  String output = "target/datasets/variants_out";
  int exitCode = tool.run(new String[]{"--flatten", "--data-model", "GA4GH", "--sample-group",
      sampleGroup, "--samples", "NA12878,NA12892", input, output});
  assertEquals(0, exitCode);
  File partition = new File(baseDir, "variants_out/chr=1/pos=0/sample_group=default");
  File[] dataFiles = partition.listFiles(new HiddenFileFilter());
  ParquetReader<FlatVariantCall> parquetReader =
      AvroParquetReader.<FlatVariantCall>builder(new Path(dataFiles[0].toURI())).build();
  // first record has first sample (call set) ID
  FlatVariantCall flat1 = parquetReader.read();
  assertEquals(".", flat1.getId());
  assertEquals("1", flat1.getReferenceName());
  assertEquals(14396L, flat1.getStart().longValue());
  assertEquals(14400L, flat1.getEnd().longValue());
  assertEquals("CTGT", flat1.getReferenceBases());
  assertEquals("C", flat1.getAlternateBases1());
  assertEquals("NA12878", flat1.getCallSetId());
  assertEquals(0, flat1.getGenotype1().intValue());
  assertEquals(1, flat1.getGenotype2().intValue());
  checkSortedByStart(dataFiles[0], 10);
}
Author: cloudera | Project: quince | Lines: 34 | Source: LoadVariantsToolIT.java
Example 9: testVariantsOnly
import parquet.avro.AvroParquetReader; // import the required package/class
@Test
public void testVariantsOnly() throws Exception {
  String baseDir = "target/datasets";
  FileUtil.fullyDelete(new File(baseDir));
  String input = "datasets/variants_vcf";
  String output = "target/datasets/variants_flat_locuspart";
  int exitCode = tool.run(new String[]{"--variants-only", "--flatten", input, output});
  assertEquals(0, exitCode);
  File partition = new File(baseDir,
      "variants_flat_locuspart/chr=1/pos=0");
  assertTrue(partition.exists());
  File[] dataFiles = partition.listFiles(new HiddenFileFilter());
  assertEquals(1, dataFiles.length);
  assertTrue(dataFiles[0].getName().endsWith(".parquet"));
  ParquetReader<FlatVariantCall> parquetReader =
      AvroParquetReader.<FlatVariantCall>builder(new Path(dataFiles[0].toURI())).build();
  // first record has no sample (call set)
  FlatVariantCall flat1 = parquetReader.read();
  assertEquals(".", flat1.getId());
  assertEquals("1", flat1.getReferenceName());
  assertEquals(14396L, flat1.getStart().longValue());
  assertEquals(14400L, flat1.getEnd().longValue());
  assertEquals("CTGT", flat1.getReferenceBases());
  assertEquals("C", flat1.getAlternateBases1());
  assertNull(flat1.getCallSetId());
  assertNull(flat1.getGenotype1());
  assertNull(flat1.getGenotype2());
  checkSortedByStart(dataFiles[0], 5);
}
Author: cloudera | Project: quince | Lines: 40 | Source: LoadVariantsToolIT.java
Example 10: main
import parquet.avro.AvroParquetReader; // import the required package/class
public static void main(String[] args) throws IOException {
  if (args.length == 0) {
    System.out.println("AvroReader {dataFile} {max.lines.to.read.optional}");
  }
  String dataFile = args[0];
  int recordsToRead = Integer.MAX_VALUE;
  if (args.length > 1) {
    recordsToRead = Integer.parseInt(args[1]);
  }
  //Schema.Parser parser = new Schema.Parser();
  //Configuration config = new Configuration();
  //FileSystem fs = FileSystem.get(config);
  //Schema schema = parser.parse(fs.open(new Path(schemaFile)));
  Path dataFilePath = new Path(dataFile);
  AvroParquetReader<GenericRecord> reader = new AvroParquetReader<GenericRecord>(dataFilePath);
  Object tmpValue;
  int counter = 0;
  while ((tmpValue = reader.read()) != null && counter++ < recordsToRead) {
    GenericRecord r = (GenericRecord) tmpValue;
    System.out.println(counter + " : " + r);
  }
}
Author: tmalaska | Project: HBase-ToHDFS | Lines: 32 | Source: ParquetReader.java
Example 11: open
import parquet.avro.AvroParquetReader; // import the required package/class
@Override
public void open() {
  Preconditions.checkState(state.equals(ReaderWriterState.NEW),
      "A reader may not be opened more than once - current state:%s", state);
  logger.debug("Opening reader on path:{}", path);
  try {
    reader = new AvroParquetReader<E>(fileSystem.makeQualified(path));
  } catch (IOException e) {
    throw new DatasetReaderException("Unable to create reader path:" + path, e);
  }
  state = ReaderWriterState.OPEN;
}
Author: cloudera | Project: cdk | Lines: 16 | Source: ParquetFileSystemDatasetReader.java
Example 12: displayFile
import parquet.avro.AvroParquetReader; // import the required package/class
@Override
public void displayFile(FileSystem fs, Path path, OutputStream outputStream,
    int startLine, int endLine) throws IOException {
  if (logger.isDebugEnabled()) {
    logger.debug("Display Parquet file: " + path.toUri().getPath());
  }
  JsonGenerator json = null;
  AvroParquetReader<GenericRecord> parquetReader = null;
  try {
    parquetReader = new AvroParquetReader<GenericRecord>(path);
    // Initialize JsonGenerator.
    json = new JsonFactory().createJsonGenerator(outputStream, JsonEncoding.UTF8);
    json.useDefaultPrettyPrinter();
    // Declare the avroWriter encoder that will be used to output the records
    // as JSON but don't construct them yet because we need the first record
    // in order to get the Schema.
    DatumWriter<GenericRecord> avroWriter = null;
    Encoder encoder = null;
    long endTime = System.currentTimeMillis() + STOP_TIME;
    int line = 1;
    while (line <= endLine && System.currentTimeMillis() <= endTime) {
      GenericRecord record = parquetReader.read();
      if (record == null) {
        break;
      }
      if (avroWriter == null) {
        Schema schema = record.getSchema();
        avroWriter = new GenericDatumWriter<GenericRecord>(schema);
        encoder = EncoderFactory.get().jsonEncoder(schema, json);
      }
      if (line >= startLine) {
        String recordStr = "\n\nRecord " + line + ":\n";
        outputStream.write(recordStr.getBytes("UTF-8"));
        avroWriter.write(record, encoder);
        encoder.flush();
      }
      ++line;
    }
  } catch (IOException e) {
    outputStream.write(("Error in displaying Parquet file: " +
        e.getLocalizedMessage()).getBytes("UTF-8"));
    throw e;
  } catch (Throwable t) {
    logger.error(t.getMessage());
    return;
  } finally {
    if (json != null) {
      json.close();
    }
    parquetReader.close();
  }
}
Author: JasonBian | Project: azkaban | Lines: 62 | Source: ParquetFileViewer.java
Example 13: testRedistribute
import parquet.avro.AvroParquetReader; // import the required package/class
@Test
public void testRedistribute() throws Exception {
  String baseDir = "target/datasets";
  FileUtil.fullyDelete(new File(baseDir));
  String sampleGroup = "default";
  String input = "datasets/variants_vcf";
  String output = "target/datasets/variants_out";
  int exitCode = tool.run(new String[]{"--redistribute", "--flatten", "--data-model", "GA4GH",
      "--sample-group", sampleGroup, input, output});
  assertEquals(0, exitCode);
  File partition = new File(baseDir, "variants_out/chr=1/pos=0/sample_group=default");
  assertTrue(partition.exists());
  File[] dataFiles = partition.listFiles(new HiddenFileFilter());
  assertEquals(1, dataFiles.length);
  assertTrue(dataFiles[0].getName().endsWith(".parquet"));
  ParquetReader<FlatVariantCall> parquetReader =
      AvroParquetReader.<FlatVariantCall>builder(new Path(dataFiles[0].toURI())).build();
  FlatVariantCall fvc = parquetReader.read();
  // variants should be sorted, so this is the first one that we should see
  assertEquals(14396L, fvc.getStart().longValue());
  assertEquals(14400L, fvc.getEnd().longValue());
  assertEquals("CTGT", fvc.getReferenceBases());
  assertEquals("C", fvc.getAlternateBases1());
  Set<String> observedSamples = new HashSet<>();
  int numCalls = 0;
  while (fvc != null) {
    observedSamples.add(fvc.getCallSetId().toString());
    numCalls += 1;
    fvc = parquetReader.read();
  }
  Set<String> expectedSamples = new HashSet<>(Arrays.asList("NA12878", "NA12891", "NA12892"));
  assertEquals(expectedSamples, observedSamples);
  assertEquals(15, numCalls);
  checkSortedByStart(dataFiles[0], 15);
}
Author: cloudera | Project: quince | Lines: 43 | Source: LoadVariantsToolIT.java
Example 14: testGVCF
import parquet.avro.AvroParquetReader; // import the required package/class
@Test
public void testGVCF() throws Exception {
  // Note that sites with no variant calls are ignored, see https://github.com/cloudera/quince/issues/19
  String baseDir = "target/datasets";
  FileUtil.fullyDelete(new File(baseDir));
  String sampleGroup = "sample1";
  String input = "datasets/variants_gvcf";
  String output = "target/datasets/variants_flat_locuspart_gvcf";
  int exitCode = tool.run(new String[]{"--flatten", "--sample-group", sampleGroup, input, output});
  assertEquals(0, exitCode);
  File partition = new File(baseDir,
      "variants_flat_locuspart_gvcf/chr=20/pos=10000000/sample_group=sample1");
  assertTrue(partition.exists());
  File[] dataFiles = partition.listFiles(new HiddenFileFilter());
  assertEquals(1, dataFiles.length);
  assertTrue(dataFiles[0].getName().endsWith(".parquet"));
  ParquetReader<FlatVariantCall> parquetReader =
      AvroParquetReader.<FlatVariantCall>builder(new Path(dataFiles[0].toURI())).build();
  // first record has first sample (call set) ID
  FlatVariantCall flat1 = parquetReader.read();
  assertEquals(".", flat1.getId());
  assertEquals("20", flat1.getReferenceName());
  assertEquals(10000116L, flat1.getStart().longValue());
  assertEquals(10000117L, flat1.getEnd().longValue());
  assertEquals("C", flat1.getReferenceBases());
  assertEquals("T", flat1.getAlternateBases1());
  assertEquals("NA12878", flat1.getCallSetId());
  assertEquals(0, flat1.getGenotype1().intValue());
  assertEquals(1, flat1.getGenotype2().intValue());
  checkSortedByStart(dataFiles[0], 30);
}
Author: cloudera | Project: quince | Lines: 43 | Source: LoadVariantsToolIT.java
Example 15: displayFile
import parquet.avro.AvroParquetReader; // import the required package/class
@Override
public void displayFile(FileSystem fs, Path path, OutputStream outputStream,
    int startLine, int endLine) throws IOException {
  if (logger.isDebugEnabled()) {
    logger.debug("Display Parquet file: " + path.toUri().getPath());
  }
  JsonGenerator json = null;
  AvroParquetReader<GenericRecord> parquetReader = null;
  try {
    parquetReader = new AvroParquetReader<GenericRecord>(path);
    // Initialize JsonGenerator.
    json = new JsonFactory().createJsonGenerator(outputStream, JsonEncoding.UTF8);
    json.useDefaultPrettyPrinter();
    // Declare the avroWriter encoder that will be used to output the records
    // as JSON but don't construct them yet because we need the first record
    // in order to get the Schema.
    DatumWriter<GenericRecord> avroWriter = null;
    Encoder encoder = null;
    long endTime = System.currentTimeMillis() + STOP_TIME;
    int line = 1;
    while (line <= endLine && System.currentTimeMillis() <= endTime) {
      GenericRecord record = parquetReader.read();
      if (record == null) {
        break;
      }
      if (avroWriter == null) {
        Schema schema = record.getSchema();
        avroWriter = new GenericDatumWriter<GenericRecord>(schema);
        encoder = EncoderFactory.get().jsonEncoder(schema, json);
      }
      if (line >= startLine) {
        String recordStr = "\n\nRecord " + line + ":\n";
        outputStream.write(recordStr.getBytes("UTF-8"));
        avroWriter.write(record, encoder);
        encoder.flush();
      }
      ++line;
    }
  } catch (IOException e) {
    outputStream.write(("Error in displaying Parquet file: " +
        e.getLocalizedMessage()).getBytes("UTF-8"));
    throw e;
  } catch (Throwable t) {
    logger.error(t.getMessage());
    return;
  } finally {
    if (json != null) {
      json.close();
    }
    parquetReader.close();
  }
}
Author: azkaban | Project: azkaban-plugins | Lines: 61 | Source: ParquetFileViewer.java
Note: the parquet.avro.AvroParquetReader examples in this article were collected from open-source projects hosted on platforms such as GitHub and MSDocs. The code snippets were contributed by the authors of the respective open-source projects, and copyright remains with the original authors. For redistribution and use, please refer to the license of the corresponding project; do not reproduce without permission.