
Python rdd.RDD Class Code Examples


This article collects typical usage examples of the Python class pyspark.rdd.RDD. If you are wondering what the RDD class does, how it is used, or what real-world code built on it looks like, the selected class code examples below should help.



The following presents 15 code examples of the RDD class, ordered by popularity.
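
Most of the examples below share one recurring pattern: a Java RDD handle obtained through Py4J is wrapped in a Python RDD with a matching serializer, then transformed with map or mapPartitions. As a baseline for reading them, here is a minimal, self-contained sketch of the plain RDD API (a hedged example assuming a local Spark installation; the master setting and app name are arbitrary):

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "rdd-examples")  # local context with two worker threads
    rdd = sc.parallelize([1, 2, 3, 4], 2)          # distribute a Python list as an RDD
    squares = rdd.map(lambda x: x * x)             # lazy transformation
    print(squares.collect())                       # [1, 4, 9, 16]
    sc.stop()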

Example 1: loadLabeledPoints

    def loadLabeledPoints(sc, path, minPartitions=None):
        """
        Load labeled points saved using RDD.saveAsTextFile.

        @param sc: Spark context
        @param path: file or directory path in any Hadoop-supported file
                     system URI
        @param minPartitions: min number of partitions
        @return: labeled data stored as an RDD of LabeledPoint

        >>> from tempfile import NamedTemporaryFile
        >>> from pyspark.mllib.util import MLUtils
        >>> examples = [LabeledPoint(1.1, Vectors.sparse(3, [(0, -1.23), (2, 4.56e-7)])), \
                        LabeledPoint(0.0, Vectors.dense([1.01, 2.02, 3.03]))]
        >>> tempFile = NamedTemporaryFile(delete=True)
        >>> tempFile.close()
        >>> sc.parallelize(examples, 1).saveAsTextFile(tempFile.name)
        >>> loaded = MLUtils.loadLabeledPoints(sc, tempFile.name).collect()
        >>> type(loaded[0]) == LabeledPoint
        True
        >>> print(examples[0])
        (1.1,(3,[0,2],[-1.23,4.56e-07]))
        >>> type(examples[1]) == LabeledPoint
        True
        >>> print(examples[1])
        (0.0,[1.01,2.02,3.03])
        """
        minPartitions = minPartitions or min(sc.defaultParallelism, 2)
        jSerialized = sc._jvm.PythonMLLibAPI().loadLabeledPoints(sc._jsc, path, minPartitions)
        serialized = RDD(jSerialized, sc, NoOpSerializer())
        return serialized.map(lambda bytes: _deserialize_labeled_point(bytearray(bytes)))
Developer: AndyHu19900119 | Project: spark | Lines: 31 | Source: util.py


Example 2: __init__

	def __init__(self, ctx, resource_read=None, query=None, **kwargs):
		kwargs = make_es_config(kwargs, resource_read=resource_read, query=query)
		kwargs = as_java_object(ctx._gateway, kwargs)
		jrdd = helper(ctx).esJsonRDD(ctx._jsc, kwargs)
		rdd = RDD(jrdd, ctx, NoOpSerializer())

		# read the rdd in batches of two (first key then value / doc)
		def pairwise(iterable):
			iterator = iter(iterable)
			return zip(iterator, iterator)  # itertools.izip in the original Python 2 code
		kvRdd = rdd.mapPartitions(pairwise, True)

		super(EsRDD, self).__init__(kvRdd._jrdd, ctx)
Developer: TargetHolding | Project: pyspark-elastic | Lines: 13 | Source: rdd.py
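
The pairwise helper above can be verified outside Spark. A standalone sketch (using Python 3's zip in place of itertools.izip; the sample data is made up) shows how a flat stream of key1, doc1, key2, doc2, ... is regrouped into (key, document) tuples:

    def pairwise(iterable):
        iterator = iter(iterable)
        return zip(iterator, iterator)  # both zip arguments share one iterator

    flat = ["id-1", {"title": "a"}, "id-2", {"title": "b"}]
    print(list(pairwise(flat)))
    # [('id-1', {'title': 'a'}), ('id-2', {'title': 'b'})]

Passing the same iterator to zip twice is what pairs up consecutive elements: each tuple pulls one item for the key slot and the next for the value slot.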


Example 3: rdd

    def rdd(self):
        """Returns the content as an :class:`pyspark.RDD` of :class:`Row`.
        """
        if self._lazy_rdd is None:
            jrdd = self._jdf.javaToPython()
            rdd = RDD(jrdd, self.sql_ctx._sc, BatchedSerializer(PickleSerializer()))
            schema = self.schema

            def applySchema(it):
                cls = _create_cls(schema)
                return map(cls, it)

            self._lazy_rdd = rdd.mapPartitions(applySchema)

        return self._lazy_rdd
Developer: ZhangQingcheng | Project: spark | Lines: 15 | Source: dataframe.py


Example 4: rdd

    def rdd(self):
        """
        Return the content of the :class:`DataFrame` as an :class:`RDD`
        of :class:`Row` s.
        """
        if not hasattr(self, '_lazy_rdd'):
            jrdd = self._jdf.javaToPython()
            rdd = RDD(jrdd, self.sql_ctx._sc, BatchedSerializer(PickleSerializer()))
            schema = self.schema

            def applySchema(it):
                cls = _create_cls(schema)
                return map(cls, it)  # itertools.imap in the original Python 2 source

            self._lazy_rdd = rdd.mapPartitions(applySchema)

        return self._lazy_rdd
Developer: Sunyifan | Project: spark | Lines: 17 | Source: dataframe.py
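
For context, a hedged usage sketch of the rdd property through the modern SparkSession entry point (the snippets above predate it and go through sql_ctx; the column names and data here are made up). df.rdd exposes the DataFrame content as an RDD of Row objects:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").appName("rdd-prop").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    rows = df.rdd                              # pyspark.RDD of Row objects
    print(rows.map(lambda r: r.id).collect())  # [1, 2]
    spark.stop()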


Example 5: normalVectorRDD

    def normalVectorRDD(sc, numRows, numCols, numPartitions=None, seed=None):
        """
        Generates an RDD comprised of vectors containing i.i.d. samples drawn
        from the standard normal distribution.

        >>> import numpy as np
        >>> mat = np.matrix(RandomRDDs.normalVectorRDD(sc, 100, 100, seed=1).collect())
        >>> mat.shape
        (100, 100)
        >>> abs(mat.mean() - 0.0) < 0.1
        True
        >>> abs(mat.std() - 1.0) < 0.1
        True
        """
        jrdd = sc._jvm.PythonMLLibAPI().normalVectorRDD(sc._jsc, numRows, numCols, numPartitions, seed)
        normal = RDD(jrdd, sc, NoOpSerializer())
        return normal.map(lambda bytes: _deserialize_double_vector(bytearray(bytes)))
Developer: hcook | Project: spark | Lines: 17 | Source: random.py


Example 6: uniformVectorRDD

    def uniformVectorRDD(sc, numRows, numCols, numPartitions=None, seed=None):
        """
        Generates an RDD comprised of vectors containing i.i.d. samples drawn
        from the uniform distribution U(0.0, 1.0).

        >>> import numpy as np
        >>> mat = np.matrix(RandomRDDs.uniformVectorRDD(sc, 10, 10).collect())
        >>> mat.shape
        (10, 10)
        >>> mat.max() <= 1.0 and mat.min() >= 0.0
        True
        >>> RandomRDDs.uniformVectorRDD(sc, 10, 10, 4).getNumPartitions()
        4
        """
        jrdd = sc._jvm.PythonMLLibAPI().uniformVectorRDD(sc._jsc, numRows, numCols, numPartitions, seed)
        uniform = RDD(jrdd, sc, NoOpSerializer())
        return uniform.map(lambda bytes: _deserialize_double_vector(bytearray(bytes)))
Developer: hcook | Project: spark | Lines: 17 | Source: random.py


Example 7: poissonRDD

    def poissonRDD(sc, mean, size, numPartitions=None, seed=None):
        """
        Generates an RDD comprised of i.i.d. samples from the Poisson
        distribution with the input mean.

        >>> mean = 100.0
        >>> x = RandomRDDGenerators.poissonRDD(sc, mean, 1000, seed=1)
        >>> stats = x.stats()
        >>> stats.count()
        1000
        >>> abs(stats.mean() - mean) < 0.5
        True
        >>> from math import sqrt
        >>> abs(stats.stdev() - sqrt(mean)) < 0.5
        True
        """
        jrdd = sc._jvm.PythonMLLibAPI().poissonRDD(sc._jsc, mean, size, numPartitions, seed)
        poisson = RDD(jrdd, sc, NoOpSerializer())
        return poisson.map(lambda bytes: _deserialize_double(bytearray(bytes)))
Developer: Cindy-Guo | Project: perrier | Lines: 19 | Source: random.py


Example 8: poissonVectorRDD

    def poissonVectorRDD(sc, mean, numRows, numCols, numPartitions=None, seed=None):
        """
        Generates an RDD comprised of vectors containing i.i.d. samples drawn
        from the Poisson distribution with the input mean.

        >>> import numpy as np
        >>> mean = 100.0
        >>> rdd = RandomRDDs.poissonVectorRDD(sc, mean, 100, 100, seed=1)
        >>> mat = np.mat(rdd.collect())
        >>> mat.shape
        (100, 100)
        >>> abs(mat.mean() - mean) < 0.5
        True
        >>> from math import sqrt
        >>> abs(mat.std() - sqrt(mean)) < 0.5
        True
        """
        jrdd = sc._jvm.PythonMLLibAPI().poissonVectorRDD(sc._jsc, mean, numRows, numCols, numPartitions, seed)
        poisson = RDD(jrdd, sc, NoOpSerializer())
        return poisson.map(lambda bytes: _deserialize_double_vector(bytearray(bytes)))
Developer: hcook | Project: spark | Lines: 20 | Source: random.py


Example 9: normalRDD

    def normalRDD(sc, size, numPartitions=None, seed=None):
        """
        Generates an RDD comprised of i.i.d. samples from the standard normal
        distribution.

        To transform the distribution in the generated RDD from standard normal
        to some other normal N(mean, sigma), use
        C{RandomRDDGenerators.normalRDD(sc, n, p, seed)\
          .map(lambda v: mean + sigma * v)}

        >>> x = RandomRDDGenerators.normalRDD(sc, 1000, seed=1)
        >>> stats = x.stats()
        >>> stats.count()
        1000
        >>> abs(stats.mean() - 0.0) < 0.1
        True
        >>> abs(stats.stdev() - 1.0) < 0.1
        True
        """
        jrdd = sc._jvm.PythonMLLibAPI().normalRDD(sc._jsc, size, numPartitions, seed)
        normal = RDD(jrdd, sc, NoOpSerializer())
        return normal.map(lambda bytes: _deserialize_double(bytearray(bytes)))
Developer: Cindy-Guo | Project: perrier | Lines: 22 | Source: random.py
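
The shift-and-scale trick from the docstring is easy to check. A hedged sketch using the current RandomRDDs entry point (the snippet above predates the rename from RandomRDDGenerators; the target mean and sigma here are arbitrary):

    from pyspark import SparkContext
    from pyspark.mllib.random import RandomRDDs

    sc = SparkContext("local[2]", "normal-shift")
    mean, sigma = 5.0, 2.0
    x = RandomRDDs.normalRDD(sc, 1000, seed=1).map(lambda v: mean + sigma * v)
    stats = x.stats()  # sample mean should land near 5.0, stdev near 2.0
    print(abs(stats.mean() - mean) < 0.5, abs(stats.stdev() - sigma) < 0.5)
    sc.stop()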


Example 10: asDataFrames

	def asDataFrames(self, *index_by):
		'''
			Reads the spanned rows as pandas DataFrames if pandas is available,
			as a dict of numpy arrays if only numpy is available, or as a dict
			of primitives and objects otherwise.
			
			@param index_by If pandas is available, the dataframes will be
			indexed by the given columns.
		'''
		for c in index_by:
			if c in self.columns:
				raise ValueError('column %s cannot be used as an index for the data '
					'frames as it is a column by which the rows are spanned.' % c)
		
		columns = as_java_array(self.ctx._gateway, "String", (str(c) for c in self.columns))
		jrdd = self._helper.spanBy(self._cjrdd, columns)
		rdd = RDD(jrdd, self.ctx)
		
		global pd
		if index_by and pd:
			return rdd.mapValues(lambda df: df.set_index([str(c) for c in index_by]))
		else:
			return rdd
Developer: dancingnarwhals | Project: pyspark-cassandra | Lines: 23 | Source: rdd.py
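
What the mapValues call does to each spanned group can be reproduced with plain pandas. A standalone sketch (column names and data are made up): set_index turns the chosen columns into the DataFrame index, the same reshaping applied per group above:

    import pandas as pd

    df = pd.DataFrame({"ts": [1, 2, 3], "value": [10.0, 11.5, 9.8]})
    indexed = df.set_index(["ts"])  # index by the chosen column(s)
    print(indexed.loc[2, "value"])  # 11.5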


Example 11: createRDD

    def createRDD(sc, kafkaParams, offsetRanges, leaders={},
                  keyDecoder=utf8_decoder, valueDecoder=utf8_decoder):
        """
        .. note:: Experimental

        Create an RDD from Kafka using offset ranges for each topic and partition.

        :param sc:  SparkContext object
        :param kafkaParams: Additional params for Kafka
        :param offsetRanges:  list of offsetRange to specify topic:partition:[start, end) to consume
        :param leaders: Kafka brokers for each TopicAndPartition in offsetRanges.  May be an empty
            map, in which case leaders will be looked up on the driver.
        :param keyDecoder:  A function used to decode key (default is utf8_decoder)
        :param valueDecoder:  A function used to decode value (default is utf8_decoder)
        :return: An RDD object
        """
        if not isinstance(kafkaParams, dict):
            raise TypeError("kafkaParams should be dict")
        if not isinstance(offsetRanges, list):
            raise TypeError("offsetRanges should be list")

        try:
            helperClass = sc._jvm.java.lang.Thread.currentThread().getContextClassLoader() \
                .loadClass("org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper")
            helper = helperClass.newInstance()
            joffsetRanges = [o._jOffsetRange(helper) for o in offsetRanges]
            jleaders = dict([(k._jTopicAndPartition(helper),
                              v._jBroker(helper)) for (k, v) in leaders.items()])
            jrdd = helper.createRDD(sc._jsc, kafkaParams, joffsetRanges, jleaders)
        except Py4JJavaError as e:
            if 'ClassNotFoundException' in str(e.java_exception):
                KafkaUtils._printErrorMsg(sc)
            raise e

        ser = PairDeserializer(NoOpSerializer(), NoOpSerializer())
        rdd = RDD(jrdd, sc, ser)
        return rdd.map(lambda k_v: (keyDecoder(k_v[0]), valueDecoder(k_v[1])))
Developer: 308306362 | Project: spark | Lines: 37 | Source: kafka.py
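
A hedged usage sketch for createRDD (this is the Kafka 0.8 integration, deprecated in Spark 2.3 and later removed, so it assumes a matching spark-streaming-kafka package on the classpath; the broker address and topic name are placeholders):

    from pyspark import SparkContext
    from pyspark.streaming.kafka import KafkaUtils, OffsetRange

    sc = SparkContext("local[2]", "kafka-rdd")
    offset_ranges = [OffsetRange("my-topic", 0, 0, 100)]  # topic, partition, [start, end)
    kafka_params = {"metadata.broker.list": "localhost:9092"}
    rdd = KafkaUtils.createRDD(sc, kafka_params, offset_ranges)
    print(rdd.take(3))  # a list of decoded (key, value) pairs
    sc.stop()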


Example 12: uniformRDD

    def uniformRDD(sc, size, numPartitions=None, seed=None):
        """
        Generates an RDD comprised of i.i.d. samples from the
        uniform distribution on [0.0, 1.0].

        To transform the distribution in the generated RDD from U[0.0, 1.0]
        to U[a, b], use
        C{RandomRDDGenerators.uniformRDD(sc, n, p, seed)\
          .map(lambda v: a + (b - a) * v)}

        >>> x = RandomRDDGenerators.uniformRDD(sc, 100).collect()
        >>> len(x)
        100
        >>> max(x) <= 1.0 and min(x) >= 0.0
        True
        >>> RandomRDDGenerators.uniformRDD(sc, 100, 4).getNumPartitions()
        4
        >>> parts = RandomRDDGenerators.uniformRDD(sc, 100, seed=4).getNumPartitions()
        >>> parts == sc.defaultParallelism
        True
        """
        jrdd = sc._jvm.PythonMLLibAPI().uniformRDD(sc._jsc, size, numPartitions, seed)
        uniform = RDD(jrdd, sc, NoOpSerializer())
        return uniform.map(lambda bytes: _deserialize_double(bytearray(bytes)))
Developer: Cindy-Guo | Project: perrier | Lines: 24 | Source: random.py


Example 13: __init__

 def __init__(self, jrdd, ctx, jrdd_deserializer):
     RDD.__init__(self, jrdd, ctx, jrdd_deserializer)
Developer: BeforeRain | Project: spark | Lines: 2 | Source: kafka.py


Example 14: update_dictionary

from pyspark import SparkContext
from pyspark import SparkConf
import sys
from pyspark.rdd import RDD

def update_dictionary(map):
        map.update(test="alex")
        return map

if __name__ == "__main__":
        print("here1")
        conf = SparkConf()
        sc = SparkContext(appName="alex_test_app")
        print("here2")
        print("here2b: " + conf.get("spark.aleph2_job_config"))
        aleph2 = sc._jvm.java.lang.Thread.currentThread().getContextClassLoader().loadClass("com.ikanow.aleph2.analytics.spark.utils.SparkPyTechnologyUtils").newInstance().getAleph2(sc._jsc, sys.argv[1])
        print("here3")
        print(aleph2.getRddInputNames())
        print("here4")
        # print(RDD(sc._jvm.SerDe.javaToPython(aleph2.getAllRddInputs()), sc).count())
        print("here5")
        to_output = RDD(sc._jvm.SerDe.javaToPython(aleph2.getAllRddInputs()), sc).map(lambda m: update_dictionary(m))
        aleph2.emitRdd(to_output._to_java_object_rdd())
Developer: Alex-At-Home | Project: Aleph2-contrib | Lines: 23 | Source: low_level_sample.py


Example 15: __init__

 def __init__(self, jrdd, ctx, jrdd_deserializer):
     warnings.warn(
         "Deprecated in 2.3.0. Kafka 0.8 support is deprecated as of Spark 2.3.0. "
         "See SPARK-21893.",
         DeprecationWarning)
     RDD.__init__(self, jrdd, ctx, jrdd_deserializer)
Developer: BaiBenny | Project: spark | Lines: 6 | Source: kafka.py
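
Note that Python filters DeprecationWarning out by default in most contexts, so callers who want to see notices like the one emitted above must opt in. A standalone sketch:

    import warnings

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # surface every warning in this block
        warnings.warn("Deprecated in 2.3.0. See SPARK-21893.", DeprecationWarning)

    print(caught[0].category.__name__, "-", caught[0].message)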



Note: The pyspark.rdd.RDD class examples in this article were compiled by 纯净天空 from GitHub, MSDocs, and other source-code and documentation platforms. The snippets are drawn from open-source projects contributed by their respective developers; copyright remains with the original authors, and any distribution or use must follow the corresponding project's license. Please do not republish without permission.

