What is the alternative to keras.layers.DenseFeatures in TensorFlow 2.16+?

Problem description

My code uses feature columns for the dataset, and in newer TensorFlow versions (2.16.1 and later) there is no keras.layers.DenseFeatures class to prepare the input layer for a DNN. Is there an alternative? Since I am using Python 3.11.7, I cannot install TensorFlow 2.15 or earlier.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential

INPUT_COLS = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
]

# One numeric feature column per input (the now-deprecated tf.feature_column API)
feature_columns = {
    colname: tf.feature_column.numeric_column(colname) for colname in INPUT_COLS
}

# Build a keras DNN model using Sequential API
model = Sequential(
    [
        keras.layers.DenseFeatures(feature_columns=feature_columns.values()),
        keras.layers.Dense(units=32, activation="relu", name="h1"),
        keras.layers.Dense(units=8, activation="relu", name="h2"),
        keras.layers.Dense(units=1, activation="linear", name="output"),
    ]
)
1 Answer

I assume you are doing this lab from Google Cloud. I also completed the training and prediction parts with TensorFlow 2.16.1.

First, according to the migration guide, DenseFeatures was deprecated in favor of preprocessing layers when moving from TensorFlow 2.15 to TensorFlow 2.16.1.
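
Concretely, the correspondence looks roughly like this (a sketch: numeric_column by itself is only a float passthrough, so the Normalization layer is its equivalent only when you also want the scaling used later in this answer):

# Old, deprecated as of TF 2.16:
fc = tf.feature_column.numeric_column("passenger_count")

# New: a Keras preprocessing layer whose statistics are learned via adapt()
norm = tf.keras.layers.Normalization(axis=None)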

During the lab, I also received a warning similar to the following:

WARNING:tensorflow:From /tmpfs/tmp/ipykernel_19805/3124623333.py:2: numeric_column (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.

It clearly indicates that we should use either preprocessing layers or the FeatureSpace utility. I tried both approaches and can offer the following solutions.

  1. Using preprocessing layers: in this lab, I found that using Normalization is reasonable, since the inputs appear to be float values.

First, define constants for the batch size, the number of evaluations, and the number of examples, and prepare the datasets. I do this step before creating the model, which differs from the lab's design but does not change the result; the reason is that the normalizer needs the dataset as an argument in the next step.

TRAIN_BATCH_SIZE = 1000
NUM_TRAIN_EXAMPLES = 10000 * 5  # training dataset will repeat, wrap around
NUM_EVALS = 50  # how many times to evaluate
NUM_EVAL_EXAMPLES = 10000  # enough to get a reasonable sample

# create_dataset is the CSV-reading helper defined earlier in the lab notebook
trainds = create_dataset(
    pattern="../data/taxi-train.csv", batch_size=TRAIN_BATCH_SIZE, mode="train"
)

evalds = create_dataset(
    pattern="../data/taxi-valid.csv", batch_size=TRAIN_BATCH_SIZE, mode="eval"
).take(NUM_EVAL_EXAMPLES // TRAIN_BATCH_SIZE)

Then define a normalizer as a function:

def get_normalization_layer(name, dataset):
  # Create a Normalization layer for our feature.
  normalizer = tf.keras.layers.Normalization(axis=None)

  # Prepare a Dataset that only yields our feature.
  feature_ds = dataset.map(lambda x, y: x[name])
  
  # Learn the statistics of the data.
  normalizer.adapt(feature_ds)

  return normalizer
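
As a quick sanity check (hypothetical values), the adapted layer returns its inputs rescaled by the statistics it learned from the training data:

norm = get_normalization_layer("passenger_count", trainds)
print(norm(tf.constant([1.0, 2.0, 4.0])))  # scaled by the learned mean/variance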

Then build the model using the functional API, as follows:

INPUT_COLS = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
]

# Here you need a KerasTensor with one named element per feature
inputs = [tf.keras.Input(shape=(1,), name=col) for col in INPUT_COLS]

# Apply the adapted normalizers; these KerasTensors reference the named inputs
encoded_features = [
    get_normalization_layer(col, trainds)(inputs[idx])
    for idx, col in enumerate(INPUT_COLS)
]

# Concatenate the preprocessed features into a single unnamed KerasTensor
x = tf.keras.layers.Dense(32, activation="relu")(
    tf.keras.layers.concatenate(encoded_features)
)
x = tf.keras.layers.Dense(8, activation="relu")(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)

Compile your model:

model.compile(
    optimizer="rmsprop", loss="mse", metrics=["mse", "root_mean_squared_error"]
)

Finally, do the training part as the lab requires:

%%time
# One epoch per evaluation; account for the batch size so that a total of
# NUM_TRAIN_EXAMPLES examples are seen across the NUM_EVALS epochs
steps_per_epoch = NUM_TRAIN_EXAMPLES // (TRAIN_BATCH_SIZE * NUM_EVALS)

LOGDIR = "./taxi_trained"

history = model.fit(
    x=trainds,  # the dataset is already batched, so no batch_size argument here
    epochs=NUM_EVALS,
    validation_data=evalds,
    steps_per_epoch=steps_per_epoch,
)
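
For the prediction part mentioned above, the functional model expects one named input per column, so a dict works (a sketch with made-up coordinates):

model.predict(
    {
        "pickup_longitude": tf.constant([[-73.98]]),
        "pickup_latitude": tf.constant([[40.77]]),
        "dropoff_longitude": tf.constant([[-73.96]]),
        "dropoff_latitude": tf.constant([[40.78]]),
        "passenger_count": tf.constant([[1.0]]),
    }
)
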
  2. Using the FeatureSpace utility: with this approach, we initialize a FeatureSpace, which then does the normalization for us under the hood:

INPUT_COLS = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
]

from tensorflow.keras.utils import FeatureSpace

# One normalized float feature per input column; "concat" merges them into a
# single dense vector, much like DenseFeatures used to do
feature_space = FeatureSpace(
    features={col: FeatureSpace.float_normalized() for col in INPUT_COLS},
    output_mode="concat",
)

# The normalization statistics are learned when we adapt the features to sample data
feature_space.adapt(
    tf.data.Dataset.from_tensor_slices(
        {
            "dropoff_latitude": [40.751293, 40.75003],
            "dropoff_longitude": [-73.99051, -73.974396],
            "passenger_count": [1.0, 1.0],
            "pickup_latitude": [40.7661, 40.753353],
            "pickup_longitude": [-73.97977, -73.98125],
        }
    )
)
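
Note that adapting on only two sample rows gives very rough normalization statistics. A sketch that adapts on the full training set instead (assuming trainds yields (features, label) pairs as above, so we strip the label first):

feature_space.adapt(trainds.map(lambda x, y: x))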

Then we can build our model with the Sequential API:

model = tf.keras.Sequential(
    [
        feature_space,
        tf.keras.layers.Dense(32, "relu"),
        tf.keras.layers.Dense(8, "relu"),
        tf.keras.layers.Dense(1),
    ]
)

Model compilation, dataset preparation, and training can then be repeated as above.
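
For completeness, a minimal sketch of that, reusing the constants and datasets defined for the first approach:

model.compile(
    optimizer="rmsprop", loss="mse", metrics=["mse", "root_mean_squared_error"]
)
history = model.fit(
    trainds,
    epochs=NUM_EVALS,
    validation_data=evalds,
    steps_per_epoch=steps_per_epoch,
)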

Having tried both approaches, I suspect I do not fully understand how to use FeatureSpace.adapt() properly (likely because I adapted it on only two sample rows), which led to the first approach performing better (that model produced better metrics). Maybe someone can help me with this, too :)

Hope this helps!
