
Access a global variable from a UDF (User Defined Function) in Python in Spark

Data Science Asked by give_it_a_bit on July 5, 2021

I am trying to alter a global variable from inside a pyspark.sql.functions.udf function in Python, but the change is not reflected in the global variable.

The reproducible example along with outputs is:

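from pyspark.sql.functions import udf
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Global counter that the UDF is meant to update
counter = 0

schema2 = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True)
])

data2 = [(1, "A"), (2, "B")]

# `spark` is the active SparkSession (predefined in notebook environments)
df = spark.createDataFrame(data=data2, schema=schema2)

def myFunc(column):
    global counter
    counter = counter + 1  # intended to count the calls
    return column + 5

myFuncUDF = udf(myFunc, IntegerType())

# display() is a notebook helper (e.g. on Databricks)
display(df.withColumn('id1', myFuncUDF(df.id)))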

Output:

id  name  id1
1   A     6
2   B     7

When I print the counter variable, it remains 0.
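
A likely explanation is that PySpark serializes the UDF (together with the globals it references) and runs it in separate Python worker processes, so the increment lands on the worker's copy of the variable rather than on the driver's. The sketch below imitates that with plain Python and no Spark at all: it rebuilds the function over a copy of its globals. The names `worker_globals` and `worker_func` are hypothetical, and the copy step is only a simplified stand-in for what Spark's serialization does.

```python
import types

counter = 0

def my_func(column):
    global counter
    counter = counter + 1
    return column + 5

# Stand-in for shipping the function to a worker: the function is rebuilt
# from its code object over a *copy* of the globals it uses, so it no
# longer shares state with the defining module.
worker_globals = {"counter": counter}
worker_func = types.FunctionType(my_func.__code__, worker_globals)

# The "worker" processes the two ids from the example.
results = [worker_func(value) for value in (1, 2)]

print(results)                    # values returned by the worker copy
print(worker_globals["counter"])  # the worker's copy of the counter
print(counter)                    # the driver-side variable
```

Only the copy held in `worker_globals` is incremented. The `counter` in the defining scope is never touched, which matches the behaviour seen in the question.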

Can anyone explain how to access a global variable inside a UDF and update it on each call to the UDF, or whether this is not possible?
