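How do I fix this "name is not defined" error? Name not defined in Spark when creating a Python UDF
The steps are:
First, I define the convert_time function.
File "/tmp/ipykernel_8245/3893918262.py", line 16, in convert_time
NameError: name 'value' is not defined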
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType, IntegerType
# Select the column and collect the values
#column_values = test_df5.select("TOI").collect()
#convert every row from TOI to appropriate Time of Day
def convert_time(hhmm):
    print("initiated convert time")
    if hhmm == None:
        print("The variable is of Nonetype")
    # Extract hours and minutes from the input string
    elif value.isdigit() == True:
        #print("stripping strings")
        #int(hhmm.strip())
        print(value)
        hours = int(hhmm[:2])
        minutes = int(hhmm[2:])
        # Determine the time of day
        if 5 <= hours < 12:
            period = "Morning"
        elif 12 <= hours < 17:
            period = "Afternoon"
        elif 17 <= hours < 21:
            period = "Evening"
        elif hours > 21:
            period = "Night"
        elif hours < 5:
            period = "Night"
        else:
            period = ""
        # Format the time in a readable way
        formatted_time = f"{period}"
        return formatted_time
    else:
        return
        print("is another type")
print("creating the UDF")
convtime_udf = udf(convert_time, IntegerType())
print("applying the UDF to the TOI Column")
df_with_convtime = test_df5.withColumn("convtime", convtime_udf(test_df5["Time"]))
#df_with_convtime.show()
display(df_with_convtime)
You never define value, but instead try to use it (on lines 16 and 20).
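A minimal sketch of one way to fix it, assuming the Time column holds strings such as "0930": use the parameter name hhmm wherever the function referred to value. Two choices below are my own assumptions and not from the question: unparseable input returns None, and hour 21 is treated as Night (the original condition hours > 21 left it as an empty string). The function is plain Python, so it can be checked without a Spark session.

```python
from typing import Optional


def convert_time(hhmm: Optional[str]) -> Optional[str]:
    """Map an 'HHMM' string such as '0930' to a time-of-day label."""
    # Nulls and non-numeric strings yield None, which Spark stores as null.
    if hhmm is None or not hhmm.isdigit():
        return None

    # Use the parameter name `hhmm` everywhere; `value` was never defined.
    hours = int(hhmm[:2])

    if 5 <= hours < 12:
        return "Morning"
    if 12 <= hours < 17:
        return "Afternoon"
    if 17 <= hours < 21:
        return "Evening"
    # Assumption: everything else, including hour 21, counts as night.
    return "Night"


# Quick local check before wrapping the function in a UDF.
print(convert_time("0930"))
print(convert_time("2100"))
print(convert_time(None))
```

When registering it, the declared return type should probably match what the function returns, i.e. udf(convert_time, StringType()). The question passes IntegerType(), and as far as I know Spark fills the column with nulls when a Python UDF returns a value that does not match the declared type.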