Getting output from VAE training

Question

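I am trying to train a VAE on a toy dataset to predict differential gene expression profiles from input SMILES strings. It is implemented in Keras. Here is the code: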

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from rdkit import Chem
from rdkit.Chem import AllChem

# Generate toy data
num_samples = 1000
num_genes = 100
num_molecules = 10

# Generate random gene expression profiles
x_train = np.random.rand(num_samples, num_genes)

# Generate random SMILES strings and convert them to RDKit molecules
smiles_list = [Chem.MolToSmiles(Chem.MolFromSmiles(smiles)) for smiles in np.random.choice(["C", "O", "N", "CCC", "O=C(c1ccccc1)N"], size=num_samples)]
mol_list = [Chem.MolFromSmiles(smiles) for smiles in smiles_list]

# Convert molecules to Morgan fingerprints (radius 2, 2048 bits to match mol_inputs)
fps = np.array([AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048) for mol in mol_list])
x_mol = np.array([np.array(fp) for fp in fps])


# Define the encoder model
input_shape = (num_genes,)
latent_dim = 10
encoder_inputs = keras.Input(shape=input_shape, name="encoder_inputs")
mol_inputs = keras.Input(shape=(2048,), name="mol_inputs")
x = layers.Dense(64, activation="relu")(encoder_inputs)
x = layers.Concatenate()([x, mol_inputs])
x = layers.Dense(32, activation="relu")(x)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)

# Define the sampling layer
class Sampling(layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch_size = tf.shape(z_mean)[0]
        epsilon = tf.keras.backend.random_normal(shape=(batch_size, latent_dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

z = Sampling()([z_mean, z_log_var])

# Define the decoder model
latent_inputs = keras.Input(shape=(latent_dim,), name="z_sampling")
x = layers.Dense(32, activation="relu")(latent_inputs)
x = layers.Dense(64, activation="relu")(x)
decoder_outputs = layers.Dense(num_genes, activation="sigmoid")(x)

# Define the VAE model
vae = keras.Model(inputs=[encoder_inputs, mol_inputs, latent_inputs], outputs=decoder_outputs)

# Define the loss function
reconstruction_loss = keras.losses.mse(encoder_inputs, decoder_outputs)
reconstruction_loss *= num_genes
kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
kl_loss = tf.reduce_mean(kl_loss) * -0.5
vae_loss = reconstruction_loss + kl_loss

# Compile the VAE model
vae.add_loss(vae_loss)
vae.compile(optimizer="adam")


# Train the VAE model (no targets are passed because the loss is attached with add_loss)
history = vae.fit(x=[x_train, x_mol, np.random.normal(size=(num_samples, latent_dim))],
                  epochs=1000,
                  batch_size=32)
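For reference, the loss terms defined in the code above can be written out in plain NumPy. This is a minimal sketch on toy arrays, not the Keras graph itself; the helper names `sample_z`, `kl_term`, and `reconstruction_term` are hypothetical. It mirrors the code above in averaging the KL term over both the batch and the latent dimensions, whereas many VAE write-ups sum over the latent dimensions instead.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_z(z_mean, z_log_var, rng):
    """Reparameterization: z = mean + std * epsilon, with epsilon ~ N(0, I)."""
    epsilon = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * epsilon

def kl_term(z_mean, z_log_var):
    """KL(q(z|x) || N(0, I)), averaged over batch and latent dimensions."""
    return -0.5 * np.mean(1.0 + z_log_var - z_mean ** 2 - np.exp(z_log_var))

def reconstruction_term(x, x_hat):
    """Per-sample mean squared error scaled by the number of features."""
    return np.mean((x - x_hat) ** 2, axis=-1) * x.shape[-1]

# Toy stand-ins for one batch (shapes follow the question: 100 genes, 10 latent dims)
batch, num_genes, latent_dim = 4, 100, 10
x = rng.random((batch, num_genes))
z_mean = np.zeros((batch, latent_dim))
z_log_var = np.zeros((batch, latent_dim))

z = sample_z(z_mean, z_log_var, rng)
kl_at_prior = kl_term(z_mean, z_log_var)    # posterior equals the prior here
perfect_recon = reconstruction_term(x, x)   # reconstruction equals the input here
```

When the approximate posterior matches the standard normal prior the KL term vanishes, and a perfect reconstruction gives a zero reconstruction term, which makes both helpers easy to sanity-check.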

Then, to generate a differential gene expression profile for a given data point, I do the following:

# Define the encoder model
encoder = keras.Model(inputs=[encoder_inputs, mol_inputs], outputs=z_mean)

# Get the latent representation of a new sample
new_sample=x_train[1]
new_fp=x_mol[1]
latent_rep = encoder.predict([new_sample.reshape(1, -1), new_fp.reshape(1, -1)])

# Define the decoder model
decoder = keras.Model(inputs=latent_inputs, outputs=decoder_outputs)
# Generate a new sample from a given latent representation
new_sample = decoder.predict(latent_rep).reshape(-1)


# Encode the training data to obtain latent representations
latent_train = encoder.predict([x_train, x_mol])

# Decode the latent representations to obtain reconstructed gene expression patterns
reconstructed_train = decoder.predict(latent_train)

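I am unable to reconstruct the original data. These are the possible reasons I can think of for why it is not working: 1) If the decoder cannot recreate the training examples, there could be a variety of causes. Some common ones include:

2) The model architecture may not suit the dataset. In that case, a different architecture or different hyperparameters may be needed.

3) The model may not have been trained for long enough. The number of epochs or the batch size may need to be increased.

4) The training data may be noisy or may not represent the true data distribution. In that case, more data may need to be collected, or the data may need to be preprocessed to remove noise.

5) The loss function may not suit the problem. A different loss function may be needed.

6) The optimizer may be failing to find the global minimum of the loss function. A different optimizer or learning rate may be needed.

7) I may have made a mistake in how I defined the problem.

In my case, every row of reconstructed_train contains the same values. What am I doing wrong?

Any help would be greatly appreciated. Thanks in advance!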
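One thing that may be worth checking, offered as a hypothesis rather than a confirmed diagnosis: in the code above, `decoder_outputs` is built from `latent_inputs`, and during `fit` that input is fed fresh `np.random.normal` noise rather than the sampled `z`. If the decoder's input carries no information about `x_train`, the output that minimizes mean squared error is the same vector for every row, namely the feature-wise mean. The NumPy sketch below uses stand-in arrays (not the model's actual output) and a hypothetical helper `rows_are_constant` to show how to test for that collapse and what error level it would correspond to.

```python
import numpy as np

def rows_are_constant(reconstructed, atol=1e-6):
    """Return True if every row equals the first row within `atol`."""
    return bool(np.allclose(reconstructed, reconstructed[0], atol=atol))

rng = np.random.default_rng(0)

# Stand-in for x_train: uniform values in [0, 1), like np.random.rand
x = rng.random((1000, 100))

# Stand-in for a collapsed decoder: every row is the feature-wise mean of x
collapsed = np.tile(x.mean(axis=0), (x.shape[0], 1))

is_collapsed = rows_are_constant(collapsed)    # collapsed output has identical rows
is_data_constant = rows_are_constant(x)        # the data itself varies across rows

# Mean squared error of the constant prediction; for Uniform(0, 1) data this is
# close to the variance of the distribution, 1/12.
mse_of_mean = float(np.mean((x - collapsed) ** 2))
```

Applied to the real arrays, the same check would be `rows_are_constant(reconstructed_train)`. A training loss that plateaus near `num_genes / 12` for the reconstruction term would be consistent with a decoder that ignores its input, though it would not prove it.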
