
Java PathType Class Code Examples


This article collects typical usage examples of the Java class org.apache.mahout.common.iterator.sequencefile.PathType. If you are wondering what the Java PathType class does, how to use it, or where to find usage examples, the curated code examples below should help.



The PathType class belongs to the org.apache.mahout.common.iterator.sequencefile package. Twenty code examples of the PathType class are shown below, sorted by popularity by default. You can upvote the examples you like or find useful; your ratings help recommend better Java code examples.
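
To make the distinction concrete before diving into the examples, here is a minimal sketch (the /data/vectors paths are hypothetical, not taken from any project below) of the two PathType modes with the same SequenceFileDirValueIterable used throughout these examples: PathType.LIST treats the path as a directory whose files are listed and read, while PathType.GLOB expands the path as a glob pattern first.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.common.iterator.sequencefile.PathFilters;
import org.apache.mahout.common.iterator.sequencefile.PathType;
import org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable;
import org.apache.mahout.math.VectorWritable;

public class PathTypeSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // PathType.LIST: treat the path as a directory and read every file in it.
    // "/data/vectors" is a hypothetical location.
    int listed = 0;
    for (VectorWritable vw : new SequenceFileDirValueIterable<VectorWritable>(
        new Path("/data/vectors"), PathType.LIST, PathFilters.logsCRCFilter(), conf)) {
      listed++;
    }

    // PathType.GLOB: expand the path as a glob pattern, then read the matching files.
    int globbed = 0;
    for (VectorWritable vw : new SequenceFileDirValueIterable<VectorWritable>(
        new Path("/data/vectors/part-*"), PathType.GLOB, PathFilters.partFilter(), conf)) {
      globbed++;
    }

    System.out.println(listed + " values via LIST, " + globbed + " via GLOB");
  }
}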

Example 1: runSequential

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
/**
 * PPCA: sequential PPCA based on the paper from Tipping and Bishop
 * 
 * @param conf
 *          the configuration
 * @param input
 *          the path to the input matrix Y
 * @param output
 *          the output path (not used currently)
 * @param nRows
 *          number of rows in Y
 * @param nCols
 *          number of columns in Y
 * @param nPCs
 *          number of desired principal components
 * @return the error
 * @throws Exception
 */
double runSequential(Configuration conf, Path input, Path output,
    final int nRows, final int nCols, final int nPCs) throws Exception {
  Matrix centralY = new DenseMatrix(nRows, nCols);
  FileSystem fs = FileSystem.get(input.toUri(), conf);
  if (fs.listStatus(input).length == 0) {
    System.err.println("No file under " + input);
    return 0;
  }
  int row = 0;
  for (VectorWritable vw : new SequenceFileDirValueIterable<VectorWritable>(
      input, PathType.LIST, null, conf)) {
    centralY.assignRow(row, vw.get());
    row++;
  }
  Matrix centralC = PCACommon.randomMatrix(nCols, nPCs);
  double ss = PCACommon.randSS();
  InitialValues initVal = new InitialValues(centralC, ss);
  // Matrix sampledYe = sample(centralY);
  // runSequential(conf, sampledYe, initVal, 100);
  double error = runSequential(conf, centralY, initVal, 100);
  return error;
}
 
Developer ID: SiddharthMalhotra, Project: sPCA, Lines: 41, Source: SPCADriver.java


Example 2: runSequential_JacobVersion

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
/**
 * PPCA: sequential PPCA based on the matlab implementation of Jacob Verbeek
 * 
 * @param conf
 *          the configuration
 * @param input
 *          the path to the input matrix Y
 * @param output
 *          the output path (not used currently)
 * @param nRows
 *          number of rows in Y
 * @param nCols
 *          number of columns in Y
 * @param nPCs
 *          number of desired principal components
 * @return the error
 * @throws Exception
 */
double runSequential_JacobVersion(Configuration conf, Path input,
    Path output, final int nRows, final int nCols, final int nPCs) throws Exception {
  Matrix centralY = new DenseMatrix(nRows, nCols);
  FileSystem fs = FileSystem.get(input.toUri(), conf);
  if (fs.listStatus(input).length == 0) {
    System.err.println("No file under " + input);
    return 0;
  }
  int row = 0;
  for (VectorWritable vw : new SequenceFileDirValueIterable<VectorWritable>(
      input, PathType.LIST, null, conf)) {
    centralY.assignRow(row, vw.get());
    row++;
  }
  Matrix C = PCACommon.randomMatrix(nCols, nPCs);
  double ss = PCACommon.randSS();
  InitialValues initVal = new InitialValues(C, ss);
  double error = runSequential_JacobVersion(conf, centralY, initVal, 100);
  return error;
}
 
Developer ID: SiddharthMalhotra, Project: sPCA, Lines: 39, Source: SPCADriver.java


Example 3: process

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
/**
 * This method takes the clustered points output by the clustering algorithms as input and writes them into
 * their respective clusters.
 */
public void process() throws IOException {
  createPostProcessDirectory();
  for (Pair<?,WeightedVectorWritable> record : 
       new SequenceFileDirIterable<Writable,WeightedVectorWritable>(clusteredPoints,
                                                                    PathType.GLOB,
                                                                    PathFilters.partFilter(),
                                                                    null,
                                                                    false,
                                                                    conf)) {
    String clusterId = record.getFirst().toString().trim();
    putVectorInRespectiveCluster(clusterId, record.getSecond());
  }
  IOUtils.close(writersForClusters.values());
  writersForClusters.clear();
}
 
Developer ID: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines: 20, Source: ClusterOutputPostProcessor.java


Example 4: getNumberOfClusters

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
/**
 * Reads the number of clusters present by reading the clusters-*-final file.
 * 
 * @param clusterOutputPath
 *          The output path provided to the clustering algorithm.
 * @param conf
 *          The hadoop configuration.
 * @return the number of final clusters.
 */
public static int getNumberOfClusters(Path clusterOutputPath, Configuration conf) throws IOException {
  FileSystem fileSystem = clusterOutputPath.getFileSystem(conf);
  FileStatus[] clusterFiles = fileSystem.listStatus(clusterOutputPath, PathFilters.finalPartFilter());
  int numberOfClusters = 0;
  Iterator<?> it = new SequenceFileDirValueIterator<Writable>(clusterFiles[0].getPath(),
                                                              PathType.LIST,
                                                              PathFilters.partFilter(),
                                                              null,
                                                              true,
                                                              conf);
  while (it.hasNext()) {
    it.next();
    numberOfClusters++;
  }
  return numberOfClusters;
}
 
Developer ID: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines: 26, Source: ClusterCountReader.java


Example 5: configureWithClusterInfo

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
/**
 * Create a list of SoftClusters from whatever type is passed in as the prior
 * 
 * @param conf
 *          the Configuration
 * @param clusterPath
 *          the path to the prior Clusters
 * @param clusters
 *          a List<Cluster> to put values into
 */
public static void configureWithClusterInfo(Configuration conf, Path clusterPath, List<Cluster> clusters) {
  for (Writable value : new SequenceFileDirValueIterable<Writable>(clusterPath, PathType.LIST,
      PathFilters.partFilter(), conf)) {
    Class<? extends Writable> valueClass = value.getClass();
    
    if (valueClass.equals(ClusterWritable.class)) {
      ClusterWritable clusterWritable = (ClusterWritable) value;
      value = clusterWritable.getValue();
      valueClass = value.getClass();
    }
    
    if (valueClass.equals(Kluster.class)) {
      // get the cluster info
      Kluster cluster = (Kluster) value;
      clusters.add(new SoftCluster(cluster.getCenter(), cluster.getId(), cluster.getMeasure()));
    } else if (valueClass.equals(SoftCluster.class)) {
      // get the cluster info
      clusters.add((SoftCluster) value);
    } else if (valueClass.equals(Canopy.class)) {
      // get the cluster info
      Canopy canopy = (Canopy) value;
      clusters.add(new SoftCluster(canopy.getCenter(), canopy.getId(), canopy.getMeasure()));
    } else {
      throw new IllegalStateException("Bad value class: " + valueClass);
    }
  }
  
}
 
Developer ID: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines: 39, Source: FuzzyKMeansUtil.java


Example 6: configureWithClusterInfo

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
/**
 * Create a list of Klusters from whatever Cluster type is passed in as the prior
 * 
 * @param conf
 *          the Configuration
 * @param clusterPath
 *          the path to the prior Clusters
 * @param clusters
 *          a List<Cluster> to put values into
 */
public static void configureWithClusterInfo(Configuration conf, Path clusterPath, Collection<Cluster> clusters) {
  for (Writable value : new SequenceFileDirValueIterable<Writable>(clusterPath, PathType.LIST,
      PathFilters.partFilter(), conf)) {
    Class<? extends Writable> valueClass = value.getClass();
    if (valueClass.equals(ClusterWritable.class)) {
      ClusterWritable clusterWritable = (ClusterWritable) value;
      value = clusterWritable.getValue();
      valueClass = value.getClass();
    }
    log.debug("Read 1 Cluster from {}", clusterPath);
    
    if (valueClass.equals(Kluster.class)) {
      // get the cluster info
      clusters.add((Kluster) value);
    } else if (valueClass.equals(Canopy.class)) {
      // get the cluster info
      Canopy canopy = (Canopy) value;
      clusters.add(new Kluster(canopy.getCenter(), canopy.getId(), canopy.getMeasure()));
    } else {
      throw new IllegalStateException("Bad value class: " + valueClass);
    }
  }
}
 
Developer ID: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines: 34, Source: KMeansUtil.java


Example 7: readPerplexity

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
/**
 * @param topicModelStateTemp
 * @param iteration
 * @return the perplexity of the documents sampled during the perplexity computation (total
 *         perplexity divided by total model weight), or {@code Double.NaN} if no perplexity
 *         data exists for the given iteration.
 * @throws IOException
 */
public static double readPerplexity(Configuration conf, Path topicModelStateTemp, int iteration)
  throws IOException {
  Path perplexityPath = perplexityPath(topicModelStateTemp, iteration);
  FileSystem fs = FileSystem.get(perplexityPath.toUri(), conf);
  if (!fs.exists(perplexityPath)) {
    log.warn("Perplexity path {} does not exist, returning NaN", perplexityPath);
    return Double.NaN;
  }
  double perplexity = 0;
  double modelWeight = 0;
  long n = 0;
  for (Pair<DoubleWritable, DoubleWritable> pair : new SequenceFileDirIterable<DoubleWritable, DoubleWritable>(
      perplexityPath, PathType.LIST, PathFilters.partFilter(), null, true, conf)) {
    modelWeight += pair.getFirst().get();
    perplexity += pair.getSecond().get();
    n++;
  }
  log.info("Read {} entries with total perplexity {} and model weight {}", new Object[] { n,
          perplexity, modelWeight });
  return perplexity / modelWeight;
}
 
Developer ID: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines: 30, Source: CVB0Driver.java


Example 8: populateClusterModels

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
/**
 * Populates a list with clusters present in clusters-*-final directory.
 * 
 * @param clusterOutputPath
 *            The output path of the clustering.
 * @param conf
 *            The Hadoop Configuration
 * @return The list of clusters found by the clustering.
 * @throws IOException
 */
private static List<Cluster> populateClusterModels(Path clusterOutputPath,
		Configuration conf) throws IOException {
	List<Cluster> clusterModels = Lists.newArrayList();
	Path finalClustersPath = finalClustersPath(conf, clusterOutputPath);
	Iterator<?> it = new SequenceFileDirValueIterator<Writable>(
			finalClustersPath, PathType.LIST, PathFilters.partFilter(),
			null, false, conf);
	while (it.hasNext()) {
		ClusterWritable next = (ClusterWritable) it.next();
		Cluster cluster = next.getValue();
		cluster.configure(conf);
		clusterModels.add(cluster);
	}
	return clusterModels;
}
 
Developer ID: pgorecki, Project: visearch, Lines: 26, Source: ImageToTextDriver.java


Example 9: selectCluster

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
/**
 * Classifies the vector into its respective cluster.
 * 
 * @param input
 *            the path containing the input vector.
 * @param clusterModels
 *            the clusters
 * @param clusterClassifier
 *            used to classify the vectors into different clusters
 * @param output
 *            the path to store classified data
 * @param clusterClassificationThreshold
 * @param emitMostLikely
 *            TODO
 * @throws IOException
 */
private static void selectCluster(Path input, List<Cluster> clusterModels,
		ClusterClassifier clusterClassifier, Path output,
		Double clusterClassificationThreshold, boolean emitMostLikely)
		throws IOException {
	Configuration conf = new Configuration();
	SequenceFile.Writer writer = new SequenceFile.Writer(
			input.getFileSystem(conf), conf,
			new Path(output, "part-m-" + 0), IntWritable.class, Text.class);
	for (Pair<Text, VectorWritable> entry : new SequenceFileDirIterable<Text, VectorWritable>(
			input, PathType.LIST, PathFilters.logsCRCFilter(), conf)) {
		Vector pdfPerCluster = clusterClassifier.classify(entry.getSecond()
				.get());
		if (shouldClassify(pdfPerCluster, clusterClassificationThreshold)) {
			classifyAndWrite(clusterModels, clusterClassificationThreshold,
					emitMostLikely, writer, entry.getFirst(), pdfPerCluster);
		}
	}
	writer.close();
}
 
Developer ID: pgorecki, Project: visearch, Lines: 36, Source: MyClusterClassificationDriver.java


Example 10: readClusters

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
public static List<List<Cluster>> readClusters(Configuration conf, Path output)
		throws IOException {
	List<List<Cluster>> Clusters = Lists.newArrayList();
	FileSystem fs = FileSystem.get(output.toUri(), conf);

	for (FileStatus s : fs.listStatus(output, new ClustersFilter())) {
		List<Cluster> clusters = Lists.newArrayList();
		for (ClusterWritable value : new SequenceFileDirValueIterable<ClusterWritable>(
				s.getPath(), PathType.LIST, PathFilters.logsCRCFilter(),
				conf)) {
			Cluster cluster = value.getValue();
			clusters.add(cluster);
		}
		Clusters.add(clusters);
	}
	return Clusters;
}
 
Developer ID: tknandu, Project: recommender_pilot, Lines: 18, Source: ClusterHelper.java


Example 11: iterateAll

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
@Override
public Iterator<MatrixSlice> iterateAll() {
  try {
    Path pathPattern = rowPath;
    if (FileSystem.get(conf).getFileStatus(rowPath).isDir()) {
      pathPattern = new Path(rowPath, "*");
    }
    return Iterators.transform(
        new SequenceFileDirIterator<IntWritable, VectorWritable>(pathPattern,
            PathType.GLOB, PathFilters.logsCRCFilter(), null, true, conf),
        new Function<Pair<IntWritable, VectorWritable>, MatrixSlice>() {
          @Override
          public MatrixSlice apply(Pair<IntWritable, VectorWritable> from) {
            return new MatrixSlice(from.getSecond().get(), from.getFirst()
                .get());
          }
        });
  } catch (IOException ioe) {
    throw new IllegalStateException(ioe);
  }
}
 
Developer ID: millecker, Project: applications, Lines: 22, Source: DistributedRowMatrix.java


Example 12: crossTestIterationOfMapReducePPCASequentialPPCA

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
@Test
public void crossTestIterationOfMapReducePPCASequentialPPCA() throws Exception {
  Matrix C_central = PCACommon.randomMatrix(D, d);
  double ss = PCACommon.randSS();
  InitialValues initValSeq = new InitialValues(C_central, ss);
  InitialValues initValMR = new InitialValues(C_central.clone(), ss);

  //1. run sequential
  Matrix Ye_central = new DenseMatrix(N, D);
  int row = 0;
  for (VectorWritable vw : new SequenceFileDirValueIterable<VectorWritable>(
      input, PathType.LIST, null, conf)) {
    Ye_central.assignRow(row, vw.get());
    row++;
  }
  double bishopSeqErr = ppcaDriver.runSequential(conf, Ye_central, initValSeq, 1);
  
  //2. run mapreduce
  DistributedRowMatrix Ye = new DistributedRowMatrix(input, tmp, N, D);
  Ye.setConf(conf);
  double bishopMRErr = ppcaDriver.runMapReduce(conf, Ye, initValMR, output, N, D, d, 1, 1, 1, 1);
  
  Assert.assertEquals(
      "ss value is different in sequential and mapreduce PCA", initValSeq.ss,
      initValMR.ss, EPSILON);
  double seqCTrace = PCACommon.trace(initValSeq.C);
  double mrCTrace = PCACommon.trace(initValMR.C);
  Assert.assertEquals(
      "C value is different in sequential and mapreduce PCA", seqCTrace,
      mrCTrace, EPSILON);
  Assert.assertEquals(
      "The PPCA error between sequntial and mapreduce methods is too different: "
          + bishopSeqErr + "!= " + bishopMRErr, bishopSeqErr, bishopMRErr, EPSILON);
}
 
Developer ID: SiddharthMalhotra, Project: sPCA, Lines: 35, Source: PCATest.java


Example 13: buildClustersSeq

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
/**
 * Build a directory of Canopy clusters from the input vectors and other
 * arguments. Run sequential execution
 * 
 * @param input
 *          the Path to the directory containing input vectors
 * @param output
 *          the Path for all output directories
 * @param measure
 *          the DistanceMeasure
 * @param t1
 *          the double T1 distance metric
 * @param t2
 *          the double T2 distance metric
 * @param clusterFilter
 *          the int minimum size of canopies produced
 * @return the canopy output directory Path
 */
private static Path buildClustersSeq(Path input, Path output,
    DistanceMeasure measure, double t1, double t2, int clusterFilter)
    throws IOException {
  CanopyClusterer clusterer = new CanopyClusterer(measure, t1, t2);
  Collection<Canopy> canopies = Lists.newArrayList();
  Configuration conf = new Configuration();
  FileSystem fs = FileSystem.get(input.toUri(), conf);

  for (VectorWritable vw : new SequenceFileDirValueIterable<VectorWritable>(
      input, PathType.LIST, PathFilters.logsCRCFilter(), conf)) {
    clusterer.addPointToCanopies(vw.get(), canopies);
  }

  Path canopyOutputDir = new Path(output, Cluster.CLUSTERS_DIR + '0' + Cluster.FINAL_ITERATION_SUFFIX);
  Path path = new Path(canopyOutputDir, "part-r-00000");
  SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path,
      Text.class, ClusterWritable.class);
  ClusterWritable clusterWritable = new ClusterWritable();
  try {
    for (Canopy canopy : canopies) {
      canopy.computeParameters();
      if (log.isDebugEnabled()) {
        log.debug("Writing Canopy:{} center:{} numPoints:{} radius:{}",
            new Object[] { canopy.getIdentifier(),
                AbstractCluster.formatVector(canopy.getCenter(), null),
                canopy.getNumObservations(),
                AbstractCluster.formatVector(canopy.getRadius(), null) });
      }
      if (canopy.getNumObservations() > clusterFilter) {
        clusterWritable.setValue(canopy);
        writer.append(new Text(canopy.getIdentifier()), clusterWritable);
      }
    }
  } finally {
    Closeables.closeQuietly(writer);
  }
  return canopyOutputDir;
}
 
Developer ID: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines: 57, Source: CanopyDriver.java


Example 14: populateClusterModels

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
/**
 * Populates a list with clusters present in clusters-*-final directory.
 * 
 * @param clusterOutputPath
 *          The output path of the clustering.
 * @param conf
 *          The Hadoop Configuration
 * @return The list of clusters found by the clustering.
 * @throws IOException
 */
private static List<Cluster> populateClusterModels(Path clusterOutputPath, Configuration conf) throws IOException {
  List<Cluster> clusterModels = new ArrayList<Cluster>();
  Path finalClustersPath = finalClustersPath(conf, clusterOutputPath);
  Iterator<?> it = new SequenceFileDirValueIterator<Writable>(finalClustersPath, PathType.LIST,
      PathFilters.partFilter(), null, false, conf);
  while (it.hasNext()) {
    ClusterWritable next = (ClusterWritable) it.next();
    Cluster cluster = next.getValue();
    cluster.configure(conf);
    clusterModels.add(cluster);
  }
  return clusterModels;
}
 
Developer ID: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines: 24, Source: ClusterClassificationDriver.java


Example 15: readFromSeqFiles

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
public void readFromSeqFiles(Configuration conf, Path path) throws IOException {
  Configuration config = new Configuration();
  List<Cluster> clusters = Lists.newArrayList();
  for (ClusterWritable cw : new SequenceFileDirValueIterable<ClusterWritable>(path, PathType.LIST,
      PathFilters.logsCRCFilter(), config)) {
    Cluster cluster = cw.getValue();
    cluster.configure(conf);
    clusters.add(cluster);
  }
  this.models = clusters;
  modelClass = models.get(0).getClass().getName();
  this.policy = readPolicy(path);
}
 
Developer ID: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines: 14, Source: ClusterClassifier.java


Example 16: populateClusterModels

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
public static List<Cluster> populateClusterModels(Path clusterOutputPath, Configuration conf) throws IOException {
  List<Cluster> clusters = new ArrayList<Cluster>();
  FileSystem fileSystem = clusterOutputPath.getFileSystem(conf);
  FileStatus[] clusterFiles = fileSystem.listStatus(clusterOutputPath, PathFilters.finalPartFilter());
  Iterator<?> it = new SequenceFileDirValueIterator<Writable>(
      clusterFiles[0].getPath(), PathType.LIST, PathFilters.partFilter(),
      null, false, conf);
  while (it.hasNext()) {
    ClusterWritable next = (ClusterWritable) it.next();
    Cluster cluster = next.getValue();
    cluster.configure(conf);
    clusters.add(cluster);
  }
  return clusters;
}
 
Developer ID: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines: 16, Source: ClusterClassificationMapper.java


Example 17: getCanopies

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
public static List<MeanShiftCanopy> getCanopies(Configuration conf) {
  String statePath = conf.get(MeanShiftCanopyDriver.STATE_IN_KEY);
  List<MeanShiftCanopy> canopies = Lists.newArrayList();
  Path path = new Path(statePath);
  for (ClusterWritable clusterWritable 
       : new SequenceFileDirValueIterable<ClusterWritable>(path, PathType.LIST, PathFilters.logsCRCFilter(), conf)) {
    MeanShiftCanopy canopy = (MeanShiftCanopy)clusterWritable.getValue();
    canopies.add(canopy);
  }
  return canopies;
}
 
Developer ID: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines: 12, Source: MeanShiftCanopyClusterMapper.java


Example 18: iterateSeq

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
/**
 * Iterate over data using a prior-trained ClusterClassifier, for a number of iterations using a sequential
 * implementation
 * 
 * @param conf
 *          the Configuration
 * @param inPath
 *          a Path to input VectorWritables
 * @param priorPath
 *          a Path to the prior classifier
 * @param outPath
 *          a Path of output directory
 * @param numIterations
 *          the int number of iterations to perform
 */
public static void iterateSeq(Configuration conf, Path inPath, Path priorPath, Path outPath, int numIterations)
  throws IOException {
  ClusterClassifier classifier = new ClusterClassifier();
  classifier.readFromSeqFiles(conf, priorPath);
  Path clustersOut = null;
  int iteration = 1;
  while (iteration <= numIterations) {
    for (VectorWritable vw : new SequenceFileDirValueIterable<VectorWritable>(inPath, PathType.LIST,
        PathFilters.logsCRCFilter(), conf)) {
      Vector vector = vw.get();
      // classification yields probabilities
      Vector probabilities = classifier.classify(vector);
      // policy selects weights for models given those probabilities
      Vector weights = classifier.getPolicy().select(probabilities);
      // training causes all models to observe data
      for (Iterator<Vector.Element> it = weights.iterateNonZero(); it.hasNext();) {
        int index = it.next().index();
        classifier.train(index, vector, weights.get(index));
      }
    }
    // compute the posterior models
    classifier.close();
    // update the policy
    classifier.getPolicy().update(classifier);
    // output the classifier
    clustersOut = new Path(outPath, Cluster.CLUSTERS_DIR + iteration);
    classifier.writeToSeqFiles(clustersOut);
    FileSystem fs = FileSystem.get(outPath.toUri(), conf);
    iteration++;
    if (isConverged(clustersOut, conf, fs)) {
      break;
    }
  }
  Path finalClustersIn = new Path(outPath, Cluster.CLUSTERS_DIR + (iteration - 1) + Cluster.FINAL_ITERATION_SUFFIX);
  FileSystem.get(clustersOut.toUri(), conf).rename(clustersOut, finalClustersIn);
}
 
Developer ID: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines: 52, Source: ClusterIterator.java


Example 19: countRecords

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
/**
 * Count all the records in a directory using a {@link org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator}
 * @param path The {@link org.apache.hadoop.fs.Path} to count
 * @param pt The {@link org.apache.mahout.common.iterator.sequencefile.PathType}
 * @param filter Apply the {@link org.apache.hadoop.fs.PathFilter}.  May be null
 * @param conf The Hadoop {@link org.apache.hadoop.conf.Configuration}
 * @return The number of records
 * @throws IOException if there was an IO error
 */
public static long countRecords(Path path, PathType pt, PathFilter filter, Configuration conf) throws IOException {
  long count = 0;
  Iterator<?> iterator = new SequenceFileDirValueIterator<Writable>(path, pt, filter, null, true, conf);
  while (iterator.hasNext()) {
    iterator.next();
    count++;
  }
  return count;
}
 
Developer ID: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines: 19, Source: HadoopUtil.java
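
As a usage sketch only: calling countRecords to count part files under a job output directory might look like the following, assuming HadoopUtil sits in org.apache.mahout.common as in stock Mahout, and with a hypothetical output path.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.common.HadoopUtil;
import org.apache.mahout.common.iterator.sequencefile.PathFilters;
import org.apache.mahout.common.iterator.sequencefile.PathType;

public class CountRecordsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "/jobs/clustering/output" is a hypothetical job output directory;
    // partFilter() keeps only the part-* files a job writes.
    long n = HadoopUtil.countRecords(new Path("/jobs/clustering/output"),
        PathType.LIST, PathFilters.partFilter(), conf);
    System.out.println(n + " records");
  }
}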


Example 20: getFileStatus

import org.apache.mahout.common.iterator.sequencefile.PathType; // import the required package/class
public static FileStatus[] getFileStatus(Path path, PathType pathType, PathFilter filter, Comparator<FileStatus> ordering, Configuration conf) throws IOException {
  FileStatus[] statuses;
  FileSystem fs = path.getFileSystem(conf);
  if (filter == null) {
    statuses = pathType == PathType.GLOB ? fs.globStatus(path) : listStatus(fs, path);
  } else {
    statuses = pathType == PathType.GLOB ? fs.globStatus(path, filter) : listStatus(fs, path, filter);
  }
  if (ordering != null) {
    Arrays.sort(statuses, ordering);
  }
  return statuses;
}
 
Developer ID: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines: 14, Source: HadoopUtil.java



Note: The org.apache.mahout.common.iterator.sequencefile.PathType class examples in this article are collected from GitHub, MSDocs, and other source-code and documentation platforms. The code snippets are drawn from open-source projects contributed by their respective developers, and copyright remains with the original authors; consult each project's license before distributing or using the code, and do not reproduce without permission.

