Data Science Asked by give_it_a_bit on July 5, 2021
I am trying to modify a global variable from inside a pyspark.sql.functions.udf
function in Python, but the change is not reflected in the global variable.
A reproducible example, along with its output, is:
from pyspark.sql.functions import udf
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# `spark` and `display` are provided by the notebook environment.
counter = 0

schema2 = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True)
])
data2 = [(1, "A"), (2, "B")]
df = spark.createDataFrame(data=data2, schema=schema2)

def myFunc(column):
    global counter
    counter = counter + 1
    return column + 5

myFuncUDF = udf(myFunc, IntegerType())
display(df.withColumn('id1', myFuncUDF(df.id)))
Output:
id | name | id1 |
---|---|---|
1 | A | 6 |
2 | B | 7 |
When I print the counter variable, it remains 0.
Can anyone explain how to access a global variable inside a UDF and modify it on each call to the UDF, or whether this is not possible?
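For what it's worth, the likely reason is that a PySpark UDF does not run in the driver's Python process. The function, together with the globals it references, is serialized and shipped to separate Python worker processes, so `counter = counter + 1` updates a copy there and the driver's `counter` is never touched. The sketch below is only an analogy using the standard library, not Spark itself: it assumes the "fork" start method is available (Linux or macOS), and `my_func`, `_worker` and `run_demo` are hypothetical names made up for the illustration.

```python
import multiprocessing as mp

counter = 0  # global in the parent process (plays the role of the Spark driver)


def my_func(value):
    """Mimics the UDF body: bump the global, then transform the value."""
    global counter
    counter += 1
    # Also return the counter as seen inside the worker process.
    return value + 5, counter


def _worker(values, conn):
    """Runs in the child process (plays the role of a Spark Python worker)."""
    conn.send([my_func(v) for v in values])
    conn.close()


def run_demo():
    # Assumption: the "fork" start method exists on this platform.
    ctx = mp.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=_worker, args=([1, 2], child_conn))
    proc.start()
    results = parent_conn.recv()  # what the worker computed
    proc.join()
    # `counter` here is the parent's copy, which the worker never modified.
    return results, counter


if __name__ == "__main__":
    results, parent_counter = run_demo()
    print(results)         # worker-side values and worker-side counter
    print(parent_counter)  # parent-side counter
```

The worker sees its own counter go up, while the parent's copy stays at 0, which matches what the question observes. If the goal is just to count UDF calls, the mechanism Spark provides for this is an accumulator: create one on the driver with `spark.sparkContext.accumulator(0)`, call `.add(1)` on it inside the UDF, and read `.value` on the driver after an action has run. Note that accumulator updates made inside transformations are not guaranteed to be applied exactly once, since re-executed tasks can apply them again.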