Python functions.pandas_udf function code examples

This article collects typical usage examples of the pyspark.sql.functions.pandas_udf function in Python. If you are wondering how pandas_udf is used in practice, or are looking for working examples of it, the selected code examples below may help.

The 20 code examples of pandas_udf shown below are sorted by popularity by default.

Example 1: test_vectorized_udf_timestamps_respect_session_timezone

    def test_vectorized_udf_timestamps_respect_session_timezone(self):
        schema = StructType([
            StructField("idx", LongType(), True),
            StructField("timestamp", TimestampType(), True)])
        data = [(1, datetime(1969, 1, 1, 1, 1, 1)),
                (2, datetime(2012, 2, 2, 2, 2, 2)),
                (3, None),
                (4, datetime(2100, 3, 3, 3, 3, 3))]
        df = self.spark.createDataFrame(data, schema=schema)

        f_timestamp_copy = pandas_udf(lambda ts: ts, TimestampType())
        internal_value = pandas_udf(
            lambda ts: ts.apply(lambda ts: ts.value if ts is not pd.NaT else None), LongType())

        timezone = "America/New_York"
        with self.sql_conf({
                "spark.sql.execution.pandas.respectSessionTimeZone": False,
                "spark.sql.session.timeZone": timezone}):
            df_la = df.withColumn("tscopy", f_timestamp_copy(col("timestamp"))) \
                .withColumn("internal_value", internal_value(col("timestamp")))
            result_la = df_la.select(col("idx"), col("internal_value")).collect()
            # Correct result_la by adjusting 3 hours difference between Los Angeles and New York
            diff = 3 * 60 * 60 * 1000 * 1000 * 1000
            result_la_corrected = \
                df_la.select(col("idx"), col("tscopy"), col("internal_value") + diff).collect()

        with self.sql_conf({
                "spark.sql.execution.pandas.respectSessionTimeZone": True,
                "spark.sql.session.timeZone": timezone}):
            df_ny = df.withColumn("tscopy", f_timestamp_copy(col("timestamp"))) \
                .withColumn("internal_value", internal_value(col("timestamp")))
            result_ny = df_ny.select(col("idx"), col("tscopy"), col("internal_value")).collect()

            self.assertNotEqual(result_ny, result_la)
            self.assertEqual(result_ny, result_la_corrected)

Developer: q977734161, project: spark, lines of code: 35, source file: test_pandas_udf_scalar.py
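The `diff` constant in Example 1 is three hours expressed in nanoseconds, which is the unit of pandas `Timestamp.value` (nanoseconds since the Unix epoch). The standard-library sketch below reproduces that arithmetic without Spark. The names `NANOS_PER_SECOND` and `hours_to_nanos` are illustrative and do not appear in the Spark test.

```python
from datetime import timedelta

NANOS_PER_SECOND = 1000 * 1000 * 1000  # hypothetical helper constant


def hours_to_nanos(hours):
    """Convert a whole number of hours to nanoseconds."""
    return hours * 60 * 60 * NANOS_PER_SECOND


# Same expression as in the test: 3 h * 60 min * 60 s * 10^9 ns.
diff = 3 * 60 * 60 * 1000 * 1000 * 1000

# Cross-check against timedelta, which has microsecond resolution.
micros = timedelta(hours=3) // timedelta(microseconds=1)
```

Multiplying `micros` by 1000 gives the same number of nanoseconds as `diff`.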

Example 2: test_vectorized_udf_wrong_return_type

    def test_vectorized_udf_wrong_return_type(self):
        from pyspark.sql.functions import pandas_udf
        with QuietTest(self.sc):
            with self.assertRaisesRegex(
                    NotImplementedError,
                    'Invalid returnType.*scalar Pandas UDF.*MapType'):
                pandas_udf(lambda x: x * 1.0, MapType(LongType(), LongType()))

Developer: JingchengDu, project: spark, lines of code: 7, source file: test_pandas_udf_scalar.py

Example 3: test_udf_wrong_arg

    def test_udf_wrong_arg(self):
        with QuietTest(self.sc):
            with self.assertRaises(ParseException):
                @pandas_udf('blah')
                def foo(x):
                    return x
            with self.assertRaisesRegex(ValueError, 'Invalid returnType.*None'):
                @pandas_udf(functionType=PandasUDFType.SCALAR)
                def foo(x):
                    return x
            with self.assertRaisesRegex(ValueError, 'Invalid functionType'):
                @pandas_udf('double', 100)
                def foo(x):
                    return x

            with self.assertRaisesRegex(ValueError, '0-arg pandas_udfs.*not.*supported'):
                pandas_udf(lambda: 1, LongType(), PandasUDFType.SCALAR)
            with self.assertRaisesRegex(ValueError, '0-arg pandas_udfs.*not.*supported'):
                @pandas_udf(LongType(), PandasUDFType.SCALAR)
                def zero_with_type():
                    return 1

            with self.assertRaisesRegex(TypeError, 'Invalid returnType'):
                @pandas_udf(returnType=PandasUDFType.GROUPED_MAP)
                def foo(df):
                    return df
            with self.assertRaisesRegex(TypeError, 'Invalid returnType'):
                @pandas_udf(returnType='double', functionType=PandasUDFType.GROUPED_MAP)
                def foo(df):
                    return df
            with self.assertRaisesRegex(ValueError, 'Invalid function'):
                @pandas_udf(returnType='k int, v double', functionType=PandasUDFType.GROUPED_MAP)
                def foo(k, v, w):
                    return k

Developer: Lewuathe, project: spark, lines of code: 34, source file: test_pandas_udf.py
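Example 3 exercises several argument checks, including the number of parameters the wrapped function accepts. As a rough illustration of those arity rules only (this is not Spark's implementation), the sketch below uses `inspect.signature` to reject zero-argument scalar functions and grouped-map functions that take anything other than one or two parameters. The function name `check_udf_arity` and the string constants are hypothetical, and the error wording is shortened.

```python
import inspect

# Hypothetical stand-ins for the PandasUDFType values used in the test.
SCALAR = "scalar"
GROUPED_MAP = "grouped_map"


def check_udf_arity(func, function_type):
    """Raise ValueError if func takes an unsupported number of parameters."""
    n_params = len(inspect.signature(func).parameters)
    if function_type == SCALAR and n_params == 0:
        raise ValueError("0-arg pandas_udfs are not supported")
    if function_type == GROUPED_MAP and n_params not in (1, 2):
        raise ValueError("Invalid function: expected (data) or (key, data)")
    return n_params


# A one-argument scalar function and a two-argument grouped-map function pass.
scalar_arity = check_udf_arity(lambda x: x, SCALAR)
grouped_arity = check_udf_arity(lambda key, pdf: pdf, GROUPED_MAP)
```

Passing `lambda: 1` as a scalar function, or a three-parameter function as a grouped map, raises `ValueError` in this sketch, mirroring the cases the test expects to fail.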

Example 4: test_vectorized_udf_struct_type

    def test_vectorized_udf_struct_type(self):
        df = self.spark.range(10)
        return_type = StructType([
            StructField('id', LongType()),
            StructField('str', StringType())])

        def func(id):
            return pd.DataFrame({'id': id, 'str': id.apply(str)})

        f = pandas_udf(func, returnType=return_type)

        expected = df.select(struct(col('id'), col('id').cast('string').alias('str'))
                             .alias('struct')).collect()

        actual = df.select(f(col('id')).alias('struct')).collect()
        self.assertEqual(expected, actual)

        g = pandas_udf(func, 'id: long, str: string')
        actual = df.select(g(col('id')).alias('struct')).collect()
        self.assertEqual(expected, actual)

        struct_f = pandas_udf(lambda x: x, return_type)
        actual = df.select(struct_f(struct(col('id'), col('id').cast('string').alias('str'))))
        if LooseVersion(pa.__version__) < LooseVersion("0.10.0"):
            with QuietTest(self.sc):
                from py4j.protocol import Py4JJavaError
                with self.assertRaisesRegex(
                        Py4JJavaError,
                        'Unsupported type in conversion from Arrow'):
                    self.assertEqual(expected, actual.collect())
        else:
            self.assertEqual(expected, actual.collect())

Developer: q977734161, project: spark, lines of code: 32, source file: test_pandas_udf_scalar.py

Example 5: test_vectorized_udf_unsupported_types

    def test_vectorized_udf_unsupported_types(self):
        from pyspark.sql.functions import pandas_udf
        with QuietTest(self.sc):
            with self.assertRaisesRegex(
                    NotImplementedError,
                    'Invalid returnType.*scalar Pandas UDF.*MapType'):
                pandas_udf(lambda x: x, MapType(StringType(), IntegerType()))

Developer: JingchengDu, project: spark, lines of code: 7, source file: test_pandas_udf_scalar.py

Example 6: test_vectorized_udf_chained

    def test_vectorized_udf_chained(self):
        from pyspark.sql.functions import pandas_udf, col
        df = self.spark.range(10)
        f = pandas_udf(lambda x: x + 1, LongType())
        g = pandas_udf(lambda x: x - 1, LongType())
        res = df.select(g(f(col('id'))))
        self.assertEqual(df.collect(), res.collect())

Developer: JingchengDu, project: spark, lines of code: 7, source file: test_pandas_udf_scalar.py

Example 7: test_wrong_return_type

    def test_wrong_return_type(self):
        with QuietTest(self.sc):
            with self.assertRaisesRegex(
                    NotImplementedError,
                    'Invalid returnType.*grouped map Pandas UDF.*MapType'):
                pandas_udf(
                    lambda pdf: pdf,
                    'id long, v map<int, int>',
                    PandasUDFType.GROUPED_MAP)

Developer: CodingCat, project: spark, lines of code: 9, source file: test_pandas_udf_grouped_map.py

Example 8: test_vectorized_udf_unsupported_types

    def test_vectorized_udf_unsupported_types(self):
        with QuietTest(self.sc):
            with self.assertRaisesRegex(
                    NotImplementedError,
                    'Invalid returnType.*scalar Pandas UDF.*MapType'):
                pandas_udf(lambda x: x, MapType(StringType(), IntegerType()))
            with self.assertRaisesRegex(
                    NotImplementedError,
                    'Invalid returnType.*scalar Pandas UDF.*ArrayType.StructType'):
                pandas_udf(lambda x: x, ArrayType(StructType([StructField('a', IntegerType())])))

Developer: q977734161, project: spark, lines of code: 10, source file: test_pandas_udf_scalar.py

Example 9: test_mixed_scalar_udfs_followed_by_grouby_apply

    def test_mixed_scalar_udfs_followed_by_grouby_apply(self):
        df = self.spark.range(0, 10).toDF('v1')
        df = df.withColumn('v2', udf(lambda x: x + 1, 'int')(df['v1'])) \
            .withColumn('v3', pandas_udf(lambda x: x + 2, 'int')(df['v1']))

        result = df.groupby() \
            .apply(pandas_udf(lambda x: pd.DataFrame([x.sum().sum()]),
                              'sum int',
                              PandasUDFType.GROUPED_MAP))

        self.assertEqual(result.collect()[0]['sum'], 165)

Developer: q977734161, project: spark, lines of code: 11, source file: test_pandas_udf_grouped_map.py
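The expected value 165 in Example 9 can be checked by hand. The grouped-map function sums every cell of the single group, that is, the three columns v1, v2 = v1 + 1 and v3 = v1 + 2 for v1 = 0..9. A plain-Python check of that arithmetic, with no Spark involved:

```python
v1 = list(range(0, 10))        # 0..9, the values produced by spark.range(0, 10)
v2 = [x + 1 for x in v1]       # what the plain udf adds
v3 = [x + 2 for x in v1]       # what the scalar pandas_udf adds

# x.sum().sum() in the test adds up all columns of the one group.
total = sum(v1) + sum(v2) + sum(v3)
print(total)  # 165
```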

Example 10: test_stopiteration_in_udf

    def test_stopiteration_in_udf(self):
        from pyspark.sql.functions import udf, pandas_udf, PandasUDFType
        from py4j.protocol import Py4JJavaError

        def foo(x):
            raise StopIteration()

        def foofoo(x, y):
            raise StopIteration()

        exc_message = "Caught StopIteration thrown from user's code; failing the task"
        df = self.spark.range(0, 100)

        # plain udf (test for SPARK-23754)
        self.assertRaisesRegex(
            Py4JJavaError,
            exc_message,
            df.withColumn('v', udf(foo)('id')).collect
        )

        # pandas scalar udf
        self.assertRaisesRegex(
            Py4JJavaError,
            exc_message,
            df.withColumn(
                'v', pandas_udf(foo, 'double', PandasUDFType.SCALAR)('id')
            ).collect
        )

        # pandas grouped map
        self.assertRaisesRegex(
            Py4JJavaError,
            exc_message,
            df.groupBy('id').apply(
                pandas_udf(foo, df.schema, PandasUDFType.GROUPED_MAP)
            ).collect
        )

        self.assertRaisesRegex(
            Py4JJavaError,
            exc_message,
            df.groupBy('id').apply(
                pandas_udf(foofoo, df.schema, PandasUDFType.GROUPED_MAP)
            ).collect
        )

        # pandas grouped agg
        self.assertRaisesRegex(
            Py4JJavaError,
            exc_message,
            df.groupBy('id').agg(
                pandas_udf(foo, 'double', PandasUDFType.GROUPED_AGG)('id')
            ).collect
        )

Developer: JingchengDu, project: spark, lines of code: 54, source file: test_pandas_udf.py
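Example 10 checks that a `StopIteration` escaping from user code fails the task instead of being swallowed. Why this matters can be shown in plain Python: when a function passed to `map` raises `StopIteration`, the consumer treats it as the normal end of iteration and silently drops the remaining items. The wrapper below is a minimal sketch of one remedy. `fail_on_stop_iteration` and `stop_at_three` are hypothetical names, and only the error message is taken from the test.

```python
import functools


def stop_at_three(x):
    if x == 3:
        raise StopIteration()  # a bug in "user code"
    return x * 10


# list() interprets the StopIteration as the end of the map iterator,
# so the items after 2 are silently lost.
truncated = list(map(stop_at_three, [1, 2, 3, 4, 5]))


def fail_on_stop_iteration(func):
    """Re-raise StopIteration from func as RuntimeError so it cannot end iteration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except StopIteration as exc:
            raise RuntimeError(
                "Caught StopIteration thrown from user's code; failing the task"
            ) from exc
    return wrapper


try:
    list(map(fail_on_stop_iteration(stop_at_three), [1, 2, 3, 4, 5]))
    error_message = None
except RuntimeError as err:
    error_message = str(err)
```

With the wrapper in place the failure surfaces as an explicit error rather than as a shorter result.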

Example 11: test_vectorized_udf_complex

    def test_vectorized_udf_complex(self):
        df = self.spark.range(10).select(
            col('id').cast('int').alias('a'),
            col('id').cast('int').alias('b'),
            col('id').cast('double').alias('c'))
        add = pandas_udf(lambda x, y: x + y, IntegerType())
        power2 = pandas_udf(lambda x: 2 ** x, IntegerType())
        mul = pandas_udf(lambda x, y: x * y, DoubleType())
        res = df.select(add(col('a'), col('b')), power2(col('a')), mul(col('b'), col('c')))
        expected = df.select(expr('a + b'), expr('power(2, a)'), expr('b * c'))
        self.assertEqual(expected.collect(), res.collect())

Developer: q977734161, project: spark, lines of code: 11, source file: test_pandas_udf_scalar.py

Example 12: test_vectorized_udf_nested_struct

    def test_vectorized_udf_nested_struct(self):
        nested_type = StructType([
            StructField('id', IntegerType()),
            StructField('nested', StructType([
                StructField('foo', StringType()),
                StructField('bar', FloatType())
            ]))
        ])

        with QuietTest(self.sc):
            with self.assertRaisesRegex(
                    Exception,
                    'Invalid returnType with scalar Pandas UDFs'):
                pandas_udf(lambda x: x, returnType=nested_type)

Developer: q977734161, project: spark, lines of code: 14, source file: test_pandas_udf_scalar.py

Example 13: test_unsupported_types

    def test_unsupported_types(self):
        common_err_msg = 'Invalid returnType.*grouped map Pandas UDF.*'
        unsupported_types = [
            StructField('map', MapType(StringType(), IntegerType())),
            StructField('arr_ts', ArrayType(TimestampType())),
            StructField('null', NullType()),
            StructField('struct', StructType([StructField('l', LongType())])),
        ]

        for unsupported_type in unsupported_types:
            schema = StructType([StructField('id', LongType(), True), unsupported_type])
            with QuietTest(self.sc):
                with self.assertRaisesRegex(NotImplementedError, common_err_msg):
                    pandas_udf(lambda x: x, schema, PandasUDFType.GROUPED_MAP)

Developer: apache, project: spark, lines of code: 14, source file: test_pandas_udf_grouped_map.py

Example 14: test_vectorized_udf_basic

    def test_vectorized_udf_basic(self):
        from pyspark.sql.functions import pandas_udf, col, array
        df = self.spark.range(10).select(
            col('id').cast('string').alias('str'),
            col('id').cast('int').alias('int'),
            col('id').alias('long'),
            col('id').cast('float').alias('float'),
            col('id').cast('double').alias('double'),
            col('id').cast('decimal').alias('decimal'),
            col('id').cast('boolean').alias('bool'),
            array(col('id')).alias('array_long'))
        f = lambda x: x
        str_f = pandas_udf(f, StringType())
        int_f = pandas_udf(f, IntegerType())
        long_f = pandas_udf(f, LongType())
        float_f = pandas_udf(f, FloatType())
        double_f = pandas_udf(f, DoubleType())
        decimal_f = pandas_udf(f, DecimalType())
        bool_f = pandas_udf(f, BooleanType())
        array_long_f = pandas_udf(f, ArrayType(LongType()))
        res = df.select(str_f(col('str')), int_f(col('int')),
                        long_f(col('long')), float_f(col('float')),
                        double_f(col('double')), decimal_f('decimal'),
                        bool_f(col('bool')), array_long_f('array_long'))
        self.assertEqual(df.collect(), res.collect())

Developer: JingchengDu, project: spark, lines of code: 25, source file: test_pandas_udf_scalar.py

Example 15: test_vectorized_udf_null_binary

    def test_vectorized_udf_null_binary(self):
        if LooseVersion(pa.__version__) < LooseVersion("0.10.0"):
            with QuietTest(self.sc):
                with self.assertRaisesRegex(
                        NotImplementedError,
                        'Invalid returnType.*scalar Pandas UDF.*BinaryType'):
                    pandas_udf(lambda x: x, BinaryType())
        else:
            data = [(bytearray(b"a"),), (None,), (bytearray(b"bb"),), (bytearray(b"ccc"),)]
            schema = StructType().add("binary", BinaryType())
            df = self.spark.createDataFrame(data, schema)
            str_f = pandas_udf(lambda x: x, BinaryType())
            res = df.select(str_f(col('binary')))
            self.assertEqual(df.collect(), res.collect())

Developer: q977734161, project: spark, lines of code: 14, source file: test_pandas_udf_scalar.py

Example 16: test_vectorized_udf_null_int

    def test_vectorized_udf_null_int(self):
        data = [(None,), (2,), (3,), (4,)]
        schema = StructType().add("int", IntegerType())
        df = self.spark.createDataFrame(data, schema)
        int_f = pandas_udf(lambda x: x, IntegerType())
        res = df.select(int_f(col('int')))
        self.assertEqual(df.collect(), res.collect())

Developer: q977734161, project: spark, lines of code: 7, source file: test_pandas_udf_scalar.py

Example 17: test_vectorized_udf_null_array

    def test_vectorized_udf_null_array(self):
        data = [([1, 2],), (None,), (None,), ([3, 4],), (None,)]
        array_schema = StructType([StructField("array", ArrayType(IntegerType()))])
        df = self.spark.createDataFrame(data, schema=array_schema)
        array_f = pandas_udf(lambda x: x, ArrayType(IntegerType()))
        result = df.select(array_f(col('array')))
        self.assertEqual(df.collect(), result.collect())

Developer: q977734161, project: spark, lines of code: 7, source file: test_pandas_udf_scalar.py

Example 18: test_vectorized_udf_null_string

    def test_vectorized_udf_null_string(self):
        data = [("foo",), (None,), ("bar",), ("bar",)]
        schema = StructType().add("str", StringType())
        df = self.spark.createDataFrame(data, schema)
        str_f = pandas_udf(lambda x: x, StringType())
        res = df.select(str_f(col('str')))
        self.assertEqual(df.collect(), res.collect())

Developer: q977734161, project: spark, lines of code: 7, source file: test_pandas_udf_scalar.py

Example 19: test_vectorized_udf_string_in_udf

    def test_vectorized_udf_string_in_udf(self):
        import pandas as pd
        df = self.spark.range(10)
        str_f = pandas_udf(lambda x: pd.Series(map(str, x)), StringType())
        actual = df.select(str_f(col('id')))
        expected = df.select(col('id').cast('string'))
        self.assertEqual(expected.collect(), actual.collect())

Developer: Brett-A, project: spark, lines of code: 7, source file: test_pandas_udf_scalar.py

Example 20: test_manual

    def test_manual(self):
        df = self.data
        sum_udf = self.pandas_agg_sum_udf
        mean_udf = self.pandas_agg_mean_udf
        mean_arr_udf = pandas_udf(
            self.pandas_agg_mean_udf.func,
            ArrayType(self.pandas_agg_mean_udf.returnType),
            self.pandas_agg_mean_udf.evalType)

        result1 = df.groupby('id').agg(
            sum_udf(df.v),
            mean_udf(df.v),
            mean_arr_udf(array(df.v))).sort('id')
        expected1 = self.spark.createDataFrame(
            [[0, 245.0, 24.5, [24.5]],
             [1, 255.0, 25.5, [25.5]],
             [2, 265.0, 26.5, [26.5]],
             [3, 275.0, 27.5, [27.5]],
             [4, 285.0, 28.5, [28.5]],
             [5, 295.0, 29.5, [29.5]],
             [6, 305.0, 30.5, [30.5]],
             [7, 315.0, 31.5, [31.5]],
             [8, 325.0, 32.5, [32.5]],
             [9, 335.0, 33.5, [33.5]]],
            ['id', 'sum(v)', 'avg(v)', 'avg(array(v))'])

        self.assertPandasEqual(expected1.toPandas(), result1.toPandas())

Developer: Brett-A, project: spark, lines of code: 27, source file: test_pandas_udf_grouped_agg.py
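The expected rows in Example 20 are consistent with each group `id` holding the ten values `id + 20.0` through `id + 29.0`. The test fixture `self.data` is not shown above, so that data layout is an assumption made here for illustration. Under it, the sums and means in `expected1` can be reproduced in plain Python:

```python
from statistics import mean

# Assumed fixture (not shown in the example): for each id in 0..9,
# v takes the ten values id + 20.0, id + 21.0, ..., id + 29.0.
data = {i: [i + 20.0 + k for k in range(10)] for i in range(10)}

# One row per group: [id, sum(v), avg(v), [avg(v)]], the shape of expected1.
rows = [[i, sum(vs), mean(vs), [mean(vs)]] for i, vs in data.items()]
```

Under this assumption the first row is `[0, 245.0, 24.5, [24.5]]` and the last is `[9, 335.0, 33.5, [33.5]]`, matching the table in the test.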

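Note: The pyspark.sql.functions.pandas_udf examples in this article were compiled by 纯净天空 from source-code and documentation platforms such as GitHub and MSDocs. The code snippets were selected from open-source projects, and copyright in the source code belongs to the original authors. For redistribution and use, refer to the license of the corresponding project. Do not reproduce without permission.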
