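I use a feature-column dataset in my code, and in newer TensorFlow versions (2.16.1 and later) there is no keras.layers.DenseFeatures class to prepare the input layer for a DNN. Is there an alternative? Since I'm using Python 3.11.7, I can't install TensorFlow 2.15 or earlier.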
INPUT_COLS = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
]
feature_columns = {colname: tf.feature_column.numeric_column(colname) for colname in INPUT_COLS}
# Build a keras DNN model using Sequential API
model = Sequential(
    [
        keras.layers.DenseFeatures(feature_columns=feature_columns.values()),
        keras.layers.Dense(units=32, activation="relu", name="h1"),
        keras.layers.Dense(units=8, activation="relu", name="h2"),
        keras.layers.Dense(units=1, activation="linear", name="output"),
    ]
)
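I assume you are doing this lab from Google Cloud. I also completed the training and prediction parts with TensorFlow 2.16.1.
First, according to this migration guide, when we migrate from TensorFlow 2.15 to TensorFlow 2.16.1, DenseFeatures is deprecated in favor of preprocessing layers.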
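For purely numeric columns, DenseFeatures essentially concatenated the raw column values into one dense tensor, so the closest replacement without any normalization is one Keras Input per column followed by a Concatenate layer. The following is only a minimal sketch under that assumption; the random batch is made-up data for illustration and is not the lab's dataset.

```python
import numpy as np
import tensorflow as tf

INPUT_COLS = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
]

# One scalar input per feature, keyed by column name.
inputs = {col: tf.keras.Input(shape=(1,), name=col) for col in INPUT_COLS}

# Concatenate the raw values into a single (batch, 5) tensor.
features = tf.keras.layers.Concatenate()([inputs[col] for col in INPUT_COLS])

x = tf.keras.layers.Dense(32, activation="relu", name="h1")(features)
x = tf.keras.layers.Dense(8, activation="relu", name="h2")(x)
output = tf.keras.layers.Dense(1, activation="linear", name="output")(x)
model = tf.keras.Model(inputs=inputs, outputs=output)

# Made-up batch of 4 rows, only to check that the model accepts a feature dict.
batch = {col: np.random.rand(4, 1).astype("float32") for col in INPUT_COLS}
preds = model.predict(batch, verbose=0)
print(preds.shape)
```

Unscaled longitudes and latitudes tend to train poorly, which is one reason to prefer the normalized variants described in the rest of this answer.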
During the lab, I also got a warning similar to the following:
WARNING:tensorflow:From /tmpfs/tmp/ipykernel_19805/3124623333.py:2: numeric_column (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.
<tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[ 0., -1., 1.]], dtype=float32)>
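It clearly states that we should use preprocessing layers or the FeatureSpace utility. I tried both approaches and can offer the following solutions.
For the first approach (preprocessing layers), start by defining the constants for batch size, number of epochs, and number of examples, and prepare the datasets. I do this step before creating the model, which differs from the order in the lab but does not change the result. The reason is that the normalizer in the next step takes the dataset as an argument.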
TRAIN_BATCH_SIZE = 1000
NUM_TRAIN_EXAMPLES = 10000 * 5 # training dataset will repeat, wrap around
NUM_EVALS = 50 # how many times to evaluate
NUM_EVAL_EXAMPLES = 10000 # enough to get a reasonable sample
trainds = create_dataset(
    pattern="../data/taxi-train.csv", batch_size=TRAIN_BATCH_SIZE, mode="train"
)
evalds = create_dataset(
    pattern="../data/taxi-valid.csv", batch_size=TRAIN_BATCH_SIZE, mode="eval"
).take(NUM_EVAL_EXAMPLES // TRAIN_BATCH_SIZE)
Then define the normalizer as a function:
def get_normalization_layer(name, dataset):
    # Create a Normalization layer for our feature.
    normalizer = tf.keras.layers.Normalization(axis=None)
    # Prepare a Dataset that only yields our feature.
    feature_ds = dataset.map(lambda x, y: x[name])
    # Learn the statistics of the data.
    normalizer.adapt(feature_ds)
    return normalizer
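To make the effect of adapt() concrete, here is a small sketch on made-up data (the values below are invented for illustration and do not come from the taxi dataset). With axis=None the layer learns a single scalar mean and variance for the feature, so the normalized values end up with roughly zero mean and unit standard deviation.

```python
import numpy as np
import tensorflow as tf

# Made-up (features, label) dataset with a single numeric feature.
values = np.array([1.0, 2.0, 3.0, 6.0], dtype="float32")
features = {"passenger_count": values}
labels = np.array([5.0, 7.0, 9.0, 12.0], dtype="float32")
ds = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# Same pattern as get_normalization_layer: keep only the feature, then adapt.
normalizer = tf.keras.layers.Normalization(axis=None)
normalizer.adapt(ds.map(lambda x, y: x["passenger_count"]))

# Apply the adapted layer to the same values and inspect the result.
out_np = np.asarray(normalizer(values))
print("normalized mean:", out_np.mean())
print("normalized std:", out_np.std())
```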
Then build the model with the Functional API as follows:
INPUT_COLS = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
]
# One named scalar Keras Input per feature; the names match the keys of the dataset's feature dict
inputs = [tf.keras.Input(shape=(1,), name=col) for col in INPUT_COLS]
# Normalize each input with a layer adapted on the training data
encoded_features = [
    get_normalization_layer(col, trainds)(inputs[idx])
    for (idx, col) in enumerate(INPUT_COLS)
]
# Concatenate the preprocessed tensors into one unnamed feature tensor for the dense layers
x = tf.keras.layers.Dense(32, activation="relu")(
    tf.keras.layers.concatenate(encoded_features)
)
x = tf.keras.layers.Dense(8, "relu")(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
Compile your model:
model.compile(optimizer = "rmsprop", loss = "mse", metrics = ["mse", "root_mean_squared_error"])
Finally, run the training part as the lab requires:
%%time
steps_per_epoch = NUM_TRAIN_EXAMPLES // NUM_EVALS
LOGDIR = "./taxi_trained"
# trainds is already batched, so batch_size is not passed to fit()
history = model.fit(
    x=trainds,
    epochs=NUM_EVALS,
    validation_data=evalds,
    steps_per_epoch=steps_per_epoch,
)
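As a sanity check on the step arithmetic, the sketch below works out how many examples the settings above consume. It assumes that each step draws one batch of TRAIN_BATCH_SIZE examples. The alternative formula is only my reading of the constants (that NUM_TRAIN_EXAMPLES is meant to be the total seen across all NUM_EVALS epochs); the answer itself does not state this.

```python
TRAIN_BATCH_SIZE = 1000
NUM_TRAIN_EXAMPLES = 10000 * 5
NUM_EVALS = 50

# As written above: steps per epoch ignore the batch size.
steps_as_written = NUM_TRAIN_EXAMPLES // NUM_EVALS
examples_seen_as_written = steps_as_written * TRAIN_BATCH_SIZE * NUM_EVALS

# Hypothetical alternative: spread NUM_TRAIN_EXAMPLES over all epochs.
steps_alternative = NUM_TRAIN_EXAMPLES // (TRAIN_BATCH_SIZE * NUM_EVALS)
examples_seen_alternative = steps_alternative * TRAIN_BATCH_SIZE * NUM_EVALS

print(steps_as_written, examples_seen_as_written)
print(steps_alternative, examples_seen_alternative)
```

Either choice runs; the difference is training time and how often the validation metrics are reported relative to the amount of data seen.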
The second approach uses the FeatureSpace utility:
INPUT_COLS = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
]
from tensorflow.keras.utils import FeatureSpace
# Each numeric column becomes a normalized float feature; all features are concatenated into one vector
feature_space = FeatureSpace(
    features={col: FeatureSpace.float_normalized() for col in INPUT_COLS},
    output_mode="concat",
)
# The normalization statistics are learned when the feature space is adapted to sample data
feature_space.adapt(tf.data.Dataset.from_tensor_slices({
    'dropoff_latitude': [40.751293, 40.75003],
    'dropoff_longitude': [-73.99051, -73.974396],
    'passenger_count': [1., 1.],
    'pickup_latitude': [40.7661, 40.753353],
    'pickup_longitude': [-73.97977, -73.98125],
}))
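Note that adapt() here only sees two rows, and passenger_count is identical in both, so the learned mean and variance are unlikely to be representative. A possible improvement (an assumption on my part, not verified against the lab data) is to adapt on the features of the training dataset instead. The sketch below uses synthetic data in place of the lab's CSV files; the distributions are invented for illustration. The take(10) bound is there in case the training dataset repeats indefinitely.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.utils import FeatureSpace

INPUT_COLS = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
]

# Synthetic stand-in for the taxi data (made-up distributions).
rng = np.random.default_rng(0)
n = 1000
data = {
    "pickup_longitude": rng.normal(-73.98, 0.02, n).astype("float32"),
    "pickup_latitude": rng.normal(40.75, 0.02, n).astype("float32"),
    "dropoff_longitude": rng.normal(-73.98, 0.02, n).astype("float32"),
    "dropoff_latitude": rng.normal(40.75, 0.02, n).astype("float32"),
    "passenger_count": rng.integers(1, 7, n).astype("float32"),
}
labels = rng.uniform(3.0, 50.0, n).astype("float32")

# Batched (features, label) dataset that repeats forever.
train_ds = tf.data.Dataset.from_tensor_slices((data, labels)).batch(100).repeat()

feature_space = FeatureSpace(
    features={col: FeatureSpace.float_normalized() for col in INPUT_COLS},
    output_mode="concat",
)

# Drop the labels and bound the number of batches before adapting.
feature_space.adapt(train_ds.map(lambda x, y: x).take(10))

# Encode all rows and check that every column is roughly standardized.
encoded = np.asarray(feature_space({k: tf.constant(v) for k, v in data.items()}))
print(encoded.shape)
print(encoded.mean(axis=0), encoded.std(axis=0))
```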
Then we can build our model using Sequential:
model = tf.keras.Sequential(
    [
        feature_space,
        tf.keras.layers.Dense(32, "relu"),
        tf.keras.layers.Dense(8, "relu"),
        tf.keras.layers.Dense(1),
    ]
)
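An alternative to placing the feature space inside the model is to apply it in the tf.data pipeline, so the model only sees the already-encoded vector. This is a minimal sketch on synthetic data with two of the columns (made-up values, not the lab's dataset), assuming an adapted FeatureSpace can be called inside Dataset.map as the Keras documentation describes.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.utils import FeatureSpace

COLS = ["pickup_longitude", "passenger_count"]  # subset, for brevity

# Synthetic features and labels (made-up values).
rng = np.random.default_rng(0)
n = 256
data = {
    "pickup_longitude": rng.normal(-73.98, 0.02, n).astype("float32"),
    "passenger_count": rng.integers(1, 7, n).astype("float32"),
}
labels = rng.uniform(3.0, 50.0, (n, 1)).astype("float32")
ds = tf.data.Dataset.from_tensor_slices((data, labels)).batch(32)

feature_space = FeatureSpace(
    features={col: FeatureSpace.float_normalized() for col in COLS},
    output_mode="concat",
)
feature_space.adapt(ds.map(lambda x, y: x))

# Encode the features in the input pipeline; labels pass through unchanged.
encoded_ds = ds.map(lambda x, y: (feature_space(x), y))

model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(len(COLS),)),
        tf.keras.layers.Dense(32, "relu"),
        tf.keras.layers.Dense(8, "relu"),
        tf.keras.layers.Dense(1),
    ]
)
model.compile(optimizer="rmsprop", loss="mse")
history = model.fit(encoded_ds, epochs=1, verbose=0)
print(history.history["loss"])
```

With this layout the saved model expects encoded vectors, so the same feature space has to be applied to raw inputs at prediction time.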
Model compilation, dataset preparation, and training can be repeated as described above.
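After trying both approaches, I found that I may not fully understand how to use FeatureSpace.adapt(), which is why the first approach performs better (that model gives better metrics). Maybe someone can help me with this as well :)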
Hope this helps!