• 设为首页
  • 点击收藏
  • 手机版
    手机扫一扫访问
    迪恩网络手机版
  • 关注官方公众号
    微信扫一扫关注
    迪恩网络公众号

Java Dictionary类代码示例

原作者: [db:作者] 来自: [db:来源] 收藏 邀请

本文整理汇总了Java中org.apache.parquet.column.Dictionary的典型用法代码示例。如果您正苦于以下问题:Java Dictionary类的具体用法?Java Dictionary怎么用?Java Dictionary使用的例子?那么恭喜您, 这里精选的类代码示例或许可以为您提供帮助。



Dictionary类属于org.apache.parquet.column包,在下文中一共展示了Dictionary类的20个代码示例,这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞,您的评价将有助于我们的系统推荐出更棒的Java代码示例。

示例1: createGlobalDictionaries

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
/**
 * Builds a global dictionary for parquet table for BINARY or FIXED_LEN_BYTE_ARRAY column types.
 * It will remove exiting dictionaries if present and create new ones.
 * @param fs filesystem
 * @param tableDir root directory for given table that has parquet files
 * @param bufferAllocator memory allocator
 * @return GlobalDictionariesInfo that has dictionary version, root path and columns along with path to dictionary files.
 * @throws IOException
 */
public static GlobalDictionariesInfo createGlobalDictionaries(FileSystem fs, Path tableDir, BufferAllocator bufferAllocator) throws IOException {
  final FileStatus[] statuses = fs.listStatus(tableDir, PARQUET_FILES_FILTER);
  final Map<ColumnDescriptor, Path> globalDictionaries = Maps.newHashMap();
  final Map<ColumnDescriptor, List<Dictionary>> allDictionaries = readLocalDictionaries(fs, statuses, bufferAllocator);
  final long dictionaryVersion = getDictionaryVersion(fs, tableDir) + 1;
  final Path tmpDictionaryRootDir = createTempRootDir(fs, tableDir, dictionaryVersion);
  logger.debug("Building global dictionaries for columns {} with version {}", allDictionaries.keySet(), dictionaryVersion);

  // Sort all local dictionaries and write it to file with an index if needed
  for (Map.Entry<ColumnDescriptor, List<Dictionary>> entry : allDictionaries.entrySet()) {
    final ColumnDescriptor columnDescriptor = entry.getKey();
    final Path dictionaryFile = dictionaryFilePath(tmpDictionaryRootDir, columnDescriptor);
    logger.debug("Creating a new global dictionary for {} with version {}", columnDescriptor.toString(), dictionaryVersion);
    createDictionaryFile(fs, dictionaryFile, columnDescriptor, entry.getValue(), null, bufferAllocator);
    globalDictionaries.put(columnDescriptor, dictionaryFile);
  }
  final Path finalDictionaryRootDir = createDictionaryVersionedRootPath(fs, tableDir, dictionaryVersion, tmpDictionaryRootDir);
  return new GlobalDictionariesInfo(globalDictionaries, finalDictionaryRootDir,  dictionaryVersion);
}
 
开发者ID:dremio,项目名称:dremio-oss,代码行数:29,代码来源:GlobalDictionaryBuilder.java


示例2: readDictionaries

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
/**
 * Return dictionary per row group for all binary columns in given parquet file.
 * @param fs filesystem object.
 * @param filePath parquet file to scan
 * @return pair of dictionaries found for binary fields and list of binary fields which are not dictionary encoded.
 * @throws IOException
 */
public static Pair<Map<ColumnDescriptor, Dictionary>, Set<ColumnDescriptor>> readDictionaries(FileSystem fs, Path filePath, CodecFactory codecFactory) throws IOException {
  final ParquetMetadata parquetMetadata = ParquetFileReader.readFooter(fs.getConf(), filePath, ParquetMetadataConverter.NO_FILTER);
  if (parquetMetadata.getBlocks().size() > 1) {
    throw new IOException(
      format("Global dictionaries can only be built on a parquet file with a single row group, found %d row groups for file %s",
        parquetMetadata.getBlocks().size(), filePath));
  }
  final BlockMetaData rowGroupMetadata = parquetMetadata.getBlocks().get(0);
  final Map<ColumnPath, ColumnDescriptor> columnDescriptorMap = Maps.newHashMap();

  for (ColumnDescriptor columnDescriptor : parquetMetadata.getFileMetaData().getSchema().getColumns()) {
    columnDescriptorMap.put(ColumnPath.get(columnDescriptor.getPath()), columnDescriptor);
  }

  final Set<ColumnDescriptor> columnsToSkip = Sets.newHashSet(); // columns which are found in parquet file but are not dictionary encoded
  final Map<ColumnDescriptor, Dictionary> dictionaries = Maps.newHashMap();
  try(final FSDataInputStream in = fs.open(filePath)) {
    for (ColumnChunkMetaData columnChunkMetaData : rowGroupMetadata.getColumns()) {
      if (isBinaryType(columnChunkMetaData.getType())) {
        final ColumnDescriptor column = columnDescriptorMap.get(columnChunkMetaData.getPath());
        // if first page is dictionary encoded then load dictionary, otherwise skip this column.
        final PageHeaderWithOffset pageHeader = columnChunkMetaData.getPageHeaders().get(0);
        if (PageType.DICTIONARY_PAGE == pageHeader.getPageHeader().getType()) {
          dictionaries.put(column, readDictionary(in, column, pageHeader, codecFactory.getDecompressor(columnChunkMetaData.getCodec())));
        } else {
          columnsToSkip.add(column);
        }
      }
    }
  }
  return new ImmutablePair<>(dictionaries, columnsToSkip);
}
 
开发者ID:dremio,项目名称:dremio-oss,代码行数:40,代码来源:LocalDictionariesReader.java


示例3: testLocalDictionaries

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
@Test
public void testLocalDictionaries() throws IOException {
  try (final BufferAllocator bufferAllocator = new RootAllocator(SabotConfig.getMaxDirectMemory())) {
    final CodecFactory codecFactory = CodecFactory.createDirectCodecFactory(fs.getConf(), new ParquetDirectByteBufferAllocator(bufferAllocator), 0);
    Pair<Map<ColumnDescriptor, Dictionary>, Set<ColumnDescriptor>> dictionaries1 =
      LocalDictionariesReader.readDictionaries(fs, new Path(tableDirPath, "phonebook1.parquet"), codecFactory);
    Pair<Map<ColumnDescriptor, Dictionary>, Set<ColumnDescriptor>> dictionaries2 =
      LocalDictionariesReader.readDictionaries(fs, new Path(tableDirPath, "phonebook2.parquet"), codecFactory);
    Pair<Map<ColumnDescriptor, Dictionary>, Set<ColumnDescriptor>> dictionaries3 =
      LocalDictionariesReader.readDictionaries(fs, new Path(tableDirPath, "phonebook3.parquet"), codecFactory);
    Pair<Map<ColumnDescriptor, Dictionary>, Set<ColumnDescriptor>> dictionaries4 =
      LocalDictionariesReader.readDictionaries(fs, new Path(partitionDirPath, "phonebook4.parquet"), codecFactory);

    assertEquals(2, dictionaries1.getKey().size()); // name and kind have dictionaries
    assertEquals(1, dictionaries2.getKey().size());
    assertEquals(1, dictionaries3.getKey().size());
    assertEquals(1, dictionaries4.getKey().size());

    assertEquals(0, dictionaries1.getValue().size());
    assertEquals(1, dictionaries2.getValue().size()); // skip name
    assertEquals(1, dictionaries3.getValue().size()); // skip name
    assertEquals(1, dictionaries4.getValue().size()); // skip name
  }
}
 
开发者ID:dremio,项目名称:dremio-oss,代码行数:25,代码来源:TestGlobalDictionaryBuilder.java


示例4: readLocalDictionaries

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
private static Map<ColumnDescriptor, List<Dictionary>> readLocalDictionaries(FileSystem fs, FileStatus[] statuses, BufferAllocator allocator) throws IOException{
  final Set<ColumnDescriptor> columnsToSkip = Sets.newHashSet(); // These columns are not dictionary encoded in at least one file.
  final Map<ColumnDescriptor, List<Dictionary>> allDictionaries = Maps.newHashMap();
  final CodecFactory codecFactory = CodecFactory.createDirectCodecFactory(fs.getConf(), new ParquetDirectByteBufferAllocator(allocator), 0);
  for (FileStatus status : statuses) {
    logger.debug("Scanning file {}", status.getPath());
    final Pair<Map<ColumnDescriptor, Dictionary>, Set<ColumnDescriptor>> localDictionaries = LocalDictionariesReader.readDictionaries(
      fs, status.getPath(), codecFactory);

    // Skip columns which are not dictionary encoded
    for (ColumnDescriptor skippedColumn : localDictionaries.getRight()) {
      columnsToSkip.add(skippedColumn);
      allDictionaries.remove(skippedColumn);
    }

    for (final Map.Entry<ColumnDescriptor, Dictionary> entry : localDictionaries.getLeft().entrySet()) {
      if (!columnsToSkip.contains(entry.getKey())) {
        if (allDictionaries.containsKey(entry.getKey())) {
          allDictionaries.get(entry.getKey()).add(entry.getValue());
        } else {
          allDictionaries.put(entry.getKey(), Lists.newArrayList(entry.getValue()));
        }
      }
    }
  }
  logger.debug("Skipping columns {}", columnsToSkip);
  return allDictionaries;
}
 
开发者ID:dremio,项目名称:dremio-oss,代码行数:29,代码来源:GlobalDictionaryBuilder.java


示例5: buildIntegerGlobalDictionary

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
private static VectorContainer buildIntegerGlobalDictionary(List<Dictionary> dictionaries, VectorContainer existingDict, ColumnDescriptor columnDescriptor, BufferAllocator bufferAllocator) {
  final Field field = new Field(SchemaPath.getCompoundPath(columnDescriptor.getPath()).getAsUnescapedPath(), true, new ArrowType.Int(32, true), null);
  final VectorContainer input = new VectorContainer(bufferAllocator);
  final NullableIntVector intVector = input.addOrGet(field);
  intVector.allocateNew();
  final SortedSet<Integer> values = Sets.newTreeSet();
  for (Dictionary dictionary : dictionaries) {
    for (int i = 0; i <= dictionary.getMaxId(); ++i) {
      values.add(dictionary.decodeToInt(i));
    }
  }
  if (existingDict != null) {
    final NullableIntVector existingDictValues = existingDict.getValueAccessorById(NullableIntVector.class, 0).getValueVector();
    for (int i = 0; i < existingDict.getRecordCount(); ++i) {
      values.add(existingDictValues.getAccessor().get(i));
    }
  }
  final Iterator<Integer> iter = values.iterator();
  int recordCount = 0;
  while (iter.hasNext()) {
    intVector.getMutator().setSafe(recordCount++, iter.next());
  }
  intVector.getMutator().setValueCount(recordCount);
  input.setRecordCount(recordCount);
  input.buildSchema(BatchSchema.SelectionVectorMode.NONE);
  return input;
}
 
开发者ID:dremio,项目名称:dremio-oss,代码行数:28,代码来源:GlobalDictionaryBuilder.java


示例6: buildLongGlobalDictionary

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
private static VectorContainer buildLongGlobalDictionary(List<Dictionary> dictionaries, VectorContainer existingDict, ColumnDescriptor columnDescriptor, BufferAllocator bufferAllocator) {
  final Field field = new Field(SchemaPath.getCompoundPath(columnDescriptor.getPath()).getAsUnescapedPath(), true, new ArrowType.Int(64, true), null);
  final VectorContainer input = new VectorContainer(bufferAllocator);
  final NullableBigIntVector longVector = input.addOrGet(field);
  longVector.allocateNew();
  SortedSet<Long> values = Sets.newTreeSet();
  for (Dictionary dictionary : dictionaries) {
    for (int i = 0; i <= dictionary.getMaxId(); ++i) {
      values.add(dictionary.decodeToLong(i));
    }
  }
  if (existingDict != null) {
    final NullableBigIntVector existingDictValues = existingDict.getValueAccessorById(NullableBigIntVector.class, 0).getValueVector();
    for (int i = 0; i < existingDict.getRecordCount(); ++i) {
      values.add(existingDictValues.getAccessor().get(i));
    }
  }
  final Iterator<Long> iter = values.iterator();
  int recordCount = 0;
  while (iter.hasNext()) {
    longVector.getMutator().setSafe(recordCount++, iter.next());
  }
  longVector.getMutator().setValueCount(recordCount);
  input.setRecordCount(recordCount);
  input.buildSchema(BatchSchema.SelectionVectorMode.NONE);
  return input;
}
 
开发者ID:dremio,项目名称:dremio-oss,代码行数:28,代码来源:GlobalDictionaryBuilder.java


示例7: buildDoubleGlobalDictionary

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
private static VectorContainer buildDoubleGlobalDictionary(List<Dictionary> dictionaries, VectorContainer existingDict, ColumnDescriptor columnDescriptor, BufferAllocator bufferAllocator) {
  final Field field = new Field(SchemaPath.getCompoundPath(columnDescriptor.getPath()).getAsUnescapedPath(), true, new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE), null);
  final VectorContainer input = new VectorContainer(bufferAllocator);
  final NullableFloat8Vector doubleVector = input.addOrGet(field);
  doubleVector.allocateNew();
  SortedSet<Double> values = Sets.newTreeSet();
  for (Dictionary dictionary : dictionaries) {
    for (int i = 0; i <= dictionary.getMaxId(); ++i) {
      values.add(dictionary.decodeToDouble(i));
    }
  }
  if (existingDict != null) {
    final NullableFloat8Vector existingDictValues = existingDict.getValueAccessorById(NullableFloat8Vector.class, 0).getValueVector();
    for (int i = 0; i < existingDict.getRecordCount(); ++i) {
      values.add(existingDictValues.getAccessor().get(i));
    }
  }
  final Iterator<Double> iter = values.iterator();
  int recordCount = 0;
  while (iter.hasNext()) {
    doubleVector.getMutator().setSafe(recordCount++, iter.next());
  }
  doubleVector.getMutator().setValueCount(recordCount);
  input.setRecordCount(recordCount);
  input.buildSchema(BatchSchema.SelectionVectorMode.NONE);
  return input;
}
 
开发者ID:dremio,项目名称:dremio-oss,代码行数:28,代码来源:GlobalDictionaryBuilder.java


示例8: buildFloatGlobalDictionary

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
private static VectorContainer buildFloatGlobalDictionary(List<Dictionary> dictionaries, VectorContainer existingDict, ColumnDescriptor columnDescriptor, BufferAllocator bufferAllocator) {
  final Field field = new Field(SchemaPath.getCompoundPath(columnDescriptor.getPath()).getAsUnescapedPath(), true, new ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE), null);
  final VectorContainer input = new VectorContainer(bufferAllocator);
  final NullableFloat4Vector floatVector = input.addOrGet(field);
  floatVector.allocateNew();
  SortedSet<Float> values = Sets.newTreeSet();
  for (Dictionary dictionary : dictionaries) {
    for (int i = 0; i <= dictionary.getMaxId(); ++i) {
      values.add(dictionary.decodeToFloat(i));
    }
  }
  if (existingDict != null) {
    final NullableFloat4Vector existingDictValues = existingDict.getValueAccessorById(NullableFloat4Vector.class, 0).getValueVector();
    for (int i = 0; i < existingDict.getRecordCount(); ++i) {
      values.add(existingDictValues.getAccessor().get(i));
    }
  }
  final Iterator<Float> iter = values.iterator();
  int recordCount = 0;
  while (iter.hasNext()) {
    floatVector.getMutator().setSafe(recordCount++, iter.next());
  }
  floatVector.getMutator().setValueCount(recordCount);
  input.setRecordCount(recordCount);
  input.buildSchema(BatchSchema.SelectionVectorMode.NONE);
  return input;
}
 
开发者ID:dremio,项目名称:dremio-oss,代码行数:28,代码来源:GlobalDictionaryBuilder.java


示例9: buildBinaryGlobalDictionary

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
private static VectorContainer buildBinaryGlobalDictionary(List<Dictionary> dictionaries, VectorContainer existingDict, ColumnDescriptor columnDescriptor, BufferAllocator bufferAllocator) {
  final Field field = new Field(SchemaPath.getCompoundPath(columnDescriptor.getPath()).getAsUnescapedPath(), true, new ArrowType.Binary(), null);
  final VectorContainer input = new VectorContainer(bufferAllocator);
  final NullableVarBinaryVector binaryVector = input.addOrGet(field);
  binaryVector.allocateNew();
  final SortedSet<Binary> values = new TreeSet<>();
  for (Dictionary dictionary : dictionaries) {
    for (int i = 0; i <= dictionary.getMaxId(); ++i) {
      values.add(dictionary.decodeToBinary(i));
    }
  }
  if (existingDict != null) {
    final NullableVarBinaryVector existingDictValues = existingDict.getValueAccessorById(NullableVarBinaryVector.class, 0).getValueVector();
    for (int i = 0; i < existingDict.getRecordCount(); ++i) {
      values.add(Binary.fromConstantByteArray(existingDictValues.getAccessor().get(i)));
    }
  }
  final Iterator<Binary> iter = values.iterator();
  int recordCount = 0;
  while (iter.hasNext()) {
    final byte[] data = iter.next().getBytes();
    binaryVector.getMutator().setSafe(recordCount++, data, 0, data.length);
  }
  binaryVector.getMutator().setValueCount(recordCount);
  input.setRecordCount(recordCount);
  input.buildSchema(BatchSchema.SelectionVectorMode.NONE);
  return input;
}
 
开发者ID:dremio,项目名称:dremio-oss,代码行数:29,代码来源:GlobalDictionaryBuilder.java


示例10: readDictionary

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
public static Dictionary readDictionary(FSDataInputStream in, ColumnDescriptor column, PageHeaderWithOffset pageHeader, BytesDecompressor decompressor) throws IOException {
  in.seek(pageHeader.getOffset());
  final byte[] data = new byte[pageHeader.getPageHeader().getCompressed_page_size()];
  int read = in.read(data);
  if (read != data.length) {
    throw new IOException(format("Failed to read dictionary page, read %d bytes, expected %d", read, data.length));
  }
  final DictionaryPage dictionaryPage = new DictionaryPage(
    decompressor.decompress(BytesInput.from(data), pageHeader.getPageHeader().getUncompressed_page_size()),
    pageHeader.getPageHeader().getDictionary_page_header().getNum_values(),
    CONVERTER.getEncoding(pageHeader.getPageHeader().getDictionary_page_header().getEncoding()));
  return dictionaryPage.getEncoding().initDictionary(column, dictionaryPage);
}
 
开发者ID:dremio,项目名称:dremio-oss,代码行数:14,代码来源:LocalDictionariesReader.java


示例11: main

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
public static void main(String[] args) {
  try (final BufferAllocator bufferAllocator = new RootAllocator(SabotConfig.getMaxDirectMemory())) {
    final FileSystem fs = FileSystem.getLocal(new Configuration());
    final Path filePath = new Path(args[0]);
    final CodecFactory codecFactory = CodecFactory.createDirectCodecFactory(fs.getConf(), new ParquetDirectByteBufferAllocator(bufferAllocator), 0);
    final Pair<Map<ColumnDescriptor, Dictionary>, Set<ColumnDescriptor>> dictionaries = readDictionaries(fs, filePath, codecFactory);
    for (Map.Entry<ColumnDescriptor, Dictionary> entry :  dictionaries.getLeft().entrySet()) {
      printDictionary(entry.getKey(), entry.getValue());
    }
    System.out.println("Binary columns which are not dictionary encoded: " + dictionaries.getRight());
  } catch (IOException ioe) {
    logger.error("Failed ", ioe);
  }
}
 
开发者ID:dremio,项目名称:dremio-oss,代码行数:15,代码来源:LocalDictionariesReader.java


示例12: printDictionary

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
public static void printDictionary(ColumnDescriptor columnDescriptor, Dictionary localDictionary) {
  System.out.println("Dictionary for column " + columnDescriptor.toString());
  for (int i = 0; i < localDictionary.getMaxId(); ++i) {
    switch (columnDescriptor.getType()) {
      case INT32:
        System.out.println(format("%d: %d", i, localDictionary.decodeToInt(i)));
        break;
      case INT64:
        System.out.println(format("%d: %d", i, localDictionary.decodeToLong(i)));
        break;
      case INT96:
      case BINARY:
      case FIXED_LEN_BYTE_ARRAY:
        System.out.println(format("%d: %s", i, new String(localDictionary.decodeToBinary(i).getBytesUnsafe())));
        break;
      case FLOAT:
        System.out.println(format("%d: %f", i, localDictionary.decodeToFloat(i)));
        break;
      case DOUBLE:
        System.out.println(format("%d: %f", i, localDictionary.decodeToDouble(i)));
        break;
      case BOOLEAN:
        System.out.println(format("%d: %b", i, localDictionary.decodeToBoolean(i)));
        break;
      default:
        break;
    }
  }
}
 
开发者ID:dremio,项目名称:dremio-oss,代码行数:30,代码来源:LocalDictionariesReader.java


示例13: setDictionary

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
@Override
public void setDictionary(Dictionary dictionary)
{
    expandedDictionary = new Value[dictionary.getMaxId() + 1];
    for (int id = 0; id <= dictionary.getMaxId(); id++) {
        // This is copied array. Copying at ValueFactory#newString is not necessary.
        byte[] bytes = dictionary.decodeToBinary(id).getBytes();
        expandedDictionary[id] = ValueFactory.newString(bytes);
    }
}
 
开发者ID:CyberAgent,项目名称:embulk-input-parquet_hadoop,代码行数:11,代码来源:ParquetStringConverter.java


示例14: setDictionary

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
@Override
public void setDictionary(Dictionary dictionary)
{
    expandedDictionary = new Value[dictionary.getMaxId() + 1];
    for (int id = 0; id <= dictionary.getMaxId(); id++) {
        expandedDictionary[id] = decimalFromLong(dictionary.decodeToInt(id));
    }
}
 
开发者ID:CyberAgent,项目名称:embulk-input-parquet_hadoop,代码行数:9,代码来源:ParquetDecimalConverter.java


示例15: setDictionary

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
@Override
public void setDictionary(Dictionary dictionary) {
  _dict = new String[dictionary.getMaxId() + 1];
  for (int i = 0; i <= dictionary.getMaxId(); i++) {
    _dict[i] = dictionary.decodeToBinary(i).toStringUsingUTF8();
  }
}
 
开发者ID:h2oai,项目名称:h2o-3,代码行数:8,代码来源:ChunkConverter.java


示例16: bindToDictionary

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
private void bindToDictionary(final Dictionary dictionary) {
  binding =
      new Binding() {
        void read() {
          dictionaryId = dataColumn.readValueDictionaryId();
        }
        public void skip() {
          dataColumn.skip();
        }
        public int getDictionaryId() {
          return dictionaryId;
        }
        void writeValue() {
          converter.addValueFromDictionary(dictionaryId);
        }
        public int getInteger() {
          return dictionary.decodeToInt(dictionaryId);
        }
        public boolean getBoolean() {
          return dictionary.decodeToBoolean(dictionaryId);
        }
        public long getLong() {
          return dictionary.decodeToLong(dictionaryId);
        }
        public Binary getBinary() {
          return dictionary.decodeToBinary(dictionaryId);
        }
        public float getFloat() {
          return dictionary.decodeToFloat(dictionaryId);
        }
        public double getDouble() {
          return dictionary.decodeToDouble(dictionaryId);
        }
      };
}
 
开发者ID:apache,项目名称:parquet-mr,代码行数:36,代码来源:ColumnReaderImpl.java


示例17: initDicReader

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
private DictionaryValuesReader initDicReader(ValuesWriter cw, PrimitiveTypeName type)
    throws IOException {
  final DictionaryPage dictionaryPage = cw.toDictPageAndClose().copy();
  final ColumnDescriptor descriptor = new ColumnDescriptor(new String[] {"foo"}, type, 0, 0);
  final Dictionary dictionary = PLAIN.initDictionary(descriptor, dictionaryPage);
  final DictionaryValuesReader cr = new DictionaryValuesReader(dictionary);
  return cr;
}
 
开发者ID:apache,项目名称:parquet-mr,代码行数:9,代码来源:TestDictionary.java


示例18: expandDictionary

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
@SuppressWarnings("unchecked")
private <T extends Comparable<T>> Set<T> expandDictionary(ColumnChunkMetaData meta) throws IOException {
  ColumnDescriptor col = new ColumnDescriptor(meta.getPath().toArray(), meta.getPrimitiveType(), -1, -1);
  DictionaryPage page = dictionaries.readDictionaryPage(col);

  // the chunk may not be dictionary-encoded
  if (page == null) {
    return null;
  }

  Dictionary dict = page.getEncoding().initDictionary(col, page);

  Set dictSet = new HashSet<T>();

  for (int i=0; i<=dict.getMaxId(); i++) {
    switch(meta.getType()) {
      case BINARY: dictSet.add(dict.decodeToBinary(i));
        break;
      case INT32: dictSet.add(dict.decodeToInt(i));
        break;
      case INT64: dictSet.add(dict.decodeToLong(i));
        break;
      case FLOAT: dictSet.add(dict.decodeToFloat(i));
        break;
      case DOUBLE: dictSet.add(dict.decodeToDouble(i));
        break;
      default:
        LOG.warn("Unknown dictionary type{}", meta.getType());
    }
  }

  return (Set<T>) dictSet;
}
 
开发者ID:apache,项目名称:parquet-mr,代码行数:34,代码来源:DictionaryFilter.java


示例19: setDictionary

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
@Override
public void setDictionary(Dictionary dictionary) {
  dict = new  Descriptors.EnumValueDescriptor[dictionary.getMaxId() + 1];
  for (int i = 0; i <= dictionary.getMaxId(); i++) {
    Binary binaryValue = dictionary.decodeToBinary(i);
    dict[i] = translateEnumValue(binaryValue);
  }
}
 
开发者ID:apache,项目名称:parquet-mr,代码行数:9,代码来源:ProtoMessageConverter.java


示例20: setDictionary

import org.apache.parquet.column.Dictionary; //导入依赖的package包/类
@Override
public void setDictionary(Dictionary dictionary) {
  dict = new String[dictionary.getMaxId() + 1];
  for (int i = 0; i <= dictionary.getMaxId(); i++) {
    dict[i] = dictionary.decodeToBinary(i).toStringUsingUTF8();
  }
}
 
开发者ID:apache,项目名称:parquet-mr,代码行数:8,代码来源:TupleConverter.java



注:本文中的org.apache.parquet.column.Dictionary类示例整理自Github/MSDocs等源码及文档管理平台,相关代码片段筛选自各路编程大神贡献的开源项目,源码版权归原作者所有,传播和使用请参考对应项目的License;未经允许,请勿转载。


鲜花

握手

雷人

路过

鸡蛋
该文章已有0人参与评论

请发表评论

全部评论

专题导读
上一篇:
Java CodeCacheVisitor类代码示例发布时间:2022-05-23
下一篇:
Java ProviderBinding类代码示例发布时间:2022-05-23
热门推荐
阅读排行榜

扫描微信二维码

查看手机版网站

随时了解更新最新资讯

139-2527-9053

在线客服(服务时间 9:00~18:00)

在线QQ客服
地址:深圳市南山区西丽大学城创智工业园
电邮:jeky_zhao#qq.com
移动电话:139-2527-9053

Powered by 互联科技 X3.4© 2001-2213 极客世界.|Sitemap