Getting output from VAE training


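I am trying to train a VAE on a toy dataset to predict differential gene expression profiles from input SMILES strings. It is implemented in Keras. Here is the code: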

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from rdkit import Chem
from rdkit.Chem import AllChem

# Generate toy data
num_samples = 1000
num_genes = 100
num_molecules = 10

# Generate random gene expression profiles
x_train = np.random.rand(num_samples, num_genes)

# Generate random SMILES strings and convert them to RDKit molecules
smiles_list = [Chem.MolToSmiles(Chem.MolFromSmiles(smiles)) for smiles in np.random.choice(["C", "O", "N", "CCC", "O=C(c1ccccc1)N"], size=num_samples)]
mol_list = [Chem.MolFromSmiles(smiles) for smiles in smiles_list]

# Convert molecules to Morgan fingerprints
fps = np.array([AllChem.GetMorganFingerprintAsBitVect(mol, 2) for mol in mol_list])
x_mol = np.array([np.array(fp) for fp in fps])


# Define the encoder model
input_shape = (num_genes,)
latent_dim = 10
encoder_inputs = keras.Input(shape=input_shape, name="encoder_inputs")
mol_inputs = keras.Input(shape=(2048,), name="mol_inputs")
x = layers.Dense(64, activation="relu")(encoder_inputs)
x = layers.Concatenate()([x, mol_inputs])
x = layers.Dense(32, activation="relu")(x)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)

# Define the sampling layer
class Sampling(layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch_size = tf.shape(z_mean)[0]
        epsilon = tf.keras.backend.random_normal(shape=(batch_size, latent_dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

z = Sampling()([z_mean, z_log_var])

# Define the decoder model
latent_inputs = keras.Input(shape=(latent_dim,), name="z_sampling")
x = layers.Dense(32, activation="relu")(latent_inputs)
x = layers.Dense(64, activation="relu")(x)
decoder_outputs = layers.Dense(num_genes, activation="sigmoid")(x)

# Define the VAE model
vae = keras.Model(inputs=[encoder_inputs, mol_inputs, latent_inputs], outputs=decoder_outputs)

# Define the loss function
reconstruction_loss = keras.losses.mse(encoder_inputs, decoder_outputs)
reconstruction_loss *= num_genes
kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
kl_loss = tf.reduce_mean(kl_loss) * -0.5
vae_loss = reconstruction_loss + kl_loss

# Compile the VAE model
vae.add_loss(vae_loss)
vae.compile(optimizer="adam")


# Train the VAE model; the loss comes from add_loss, so no target array is passed
history = vae.fit(x=[x_train, x_mol, np.random.normal(size=(num_samples, latent_dim))],
                  epochs=1000,
                  batch_size=32)
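
For reference, the arithmetic done by the Sampling layer and the KL term can be checked outside Keras. The following is a minimal NumPy sketch, not the code from the post: the helper names `sample_latent` and `kl_to_standard_normal` are made up for illustration, and the KL term here is summed over latent dimensions and then averaged over the batch, which is the usual convention. The post's `tf.reduce_mean` over both axes gives the same value divided by `latent_dim`.

```python
import numpy as np

def sample_latent(z_mean, z_log_var, rng):
    """Reparameterization: z = mean + std * eps, with eps drawn from N(0, I)."""
    eps = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * eps

def kl_to_standard_normal(z_mean, z_log_var):
    """KL(N(mean, var) || N(0, I)): summed over latent dims, averaged over the batch."""
    per_dim = 1.0 + z_log_var - np.square(z_mean) - np.exp(z_log_var)
    return float(np.mean(-0.5 * np.sum(per_dim, axis=1)))

rng = np.random.default_rng(0)
batch, latent_dim = 4, 10

zeros = np.zeros((batch, latent_dim))
# Posterior equal to the prior: the KL term vanishes.
kl_prior = kl_to_standard_normal(zeros, zeros)
# Mean shifted by 1 in each of the 10 dimensions: KL = 0.5 * 10 = 5.
kl_shifted = kl_to_standard_normal(zeros + 1.0, zeros)
# One sample per row, same shape as z_mean.
z = sample_latent(zeros, zeros, rng)

print(kl_prior, kl_shifted, z.shape)
```

The KL term is minimized when `z_mean` and `z_log_var` are both zero for every input, so an encoder that receives gradients only from this term would be pushed toward the same output for every sample.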

Then, to generate a differential gene expression profile for a given data point, I do the following:

# Define the encoder model
encoder = keras.Model(inputs=[encoder_inputs, mol_inputs], outputs=z_mean)

# Get the latent representation of a new sample
new_sample=x_train[1]
new_fp=x_mol[1]
latent_rep = encoder.predict([new_sample.reshape(1, -1), new_fp.reshape(1, -1)])

# Define the decoder model
decoder = keras.Model(inputs=latent_inputs, outputs=decoder_outputs)
# Generate a new sample from a given latent representation
new_sample = decoder.predict(latent_rep).reshape(-1)


# Encode the training data to obtain latent representations
latent_train = encoder.predict([x_train, x_mol])

# Decode the latent representations to obtain reconstructed gene expression patterns
reconstructed_train = decoder.predict(latent_train)
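
A quick way to quantify whether the reconstructions differ between samples is to look at the per-gene standard deviation across rows. This is a small NumPy sketch; `summarize_variation` is a hypothetical helper, and the two arrays below are stand-ins with the same shape as `reconstructed_train`, not outputs of the model above.

```python
import numpy as np

def summarize_variation(reconstructed, atol=1e-6):
    """Report how much the reconstructed profiles differ from one sample to the next."""
    per_gene_std = reconstructed.std(axis=0)  # spread of each gene across samples
    return {
        "max_std_across_samples": float(per_gene_std.max()),
        "rows_identical": bool(np.allclose(reconstructed, reconstructed[0], atol=atol)),
    }

# Stand-in for a collapsed decoder: every sample gets the same profile.
collapsed = np.tile(np.full(100, 0.5), (1000, 1))
# Stand-in for reconstructions that do vary with the input.
varied = np.random.default_rng(0).random((1000, 100))

collapsed_report = summarize_variation(collapsed)
varied_report = summarize_variation(varied)
print(collapsed_report)
print(varied_report)
```

Calling `summarize_variation(reconstructed_train)` and `summarize_variation(latent_train)` on the real arrays would show whether the collapse already happens in the encoder output or only in the decoder.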

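I am unable to reconstruct the original data. When a decoder cannot recreate the training examples, common reasons include:

1) The model architecture may not suit the dataset. In that case, a different architecture or different hyperparameters may be needed.

2) The model may not have been trained long enough. The number of epochs or the batch size may need to change.

3) The training data may be noisy or may not represent the true data distribution. In that case, more data or preprocessing to remove noise may be needed.

4) The loss function may not suit the problem. A different loss function may be needed.

5) The optimizer may fail to find the global minimum of the loss function. A different optimizer or learning rate may be needed.

6) I may have made a mistake in defining the problem.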

In my case, every row of reconstructed_train contains the same values. What am I doing wrong?
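
One possible cause worth checking, stated as a hypothesis and not verified by running the code above: `decoder_outputs` is built from `latent_inputs`, a separate `keras.Input` fed with `np.random.normal`, and not from the sampled `z`. If so, the reconstruction loss never reaches the encoder, the decoder can only learn the average profile, and the KL term pushes `z_mean` toward zero for every input. The NumPy sketch below illustrates the effect with a linear least-squares decoder as a stand-in for the network; `fit_linear_decoder`, `decode`, and all arrays are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, num_genes, latent_dim = 1000, 100, 10

x = rng.random((num_samples, num_genes))                # stand-in for x_train
noise = rng.standard_normal((num_samples, latent_dim))  # stand-in for the latent_inputs feed

def with_bias(codes):
    """Append a constant column so the linear decoder has a bias term."""
    return np.hstack([codes, np.ones((len(codes), 1))])

def fit_linear_decoder(codes, targets):
    """Least-squares linear map from codes to targets (illustrative decoder)."""
    weights, *_ = np.linalg.lstsq(with_bias(codes), targets, rcond=None)
    return weights

def decode(weights, codes):
    return with_bias(codes) @ weights

# Decoder trained on codes that are independent of the data.
w_noise = fit_linear_decoder(noise, x)
# Decoder trained on codes that are computed from the data.
codes_from_x = x @ rng.standard_normal((num_genes, latent_dim))
w_from_x = fit_linear_decoder(codes_from_x, x)

mse_mean = np.mean((x - x.mean(axis=0)) ** 2)           # baseline: predict the mean
mse_noise = np.mean((x - decode(w_noise, noise)) ** 2)
mse_from_x = np.mean((x - decode(w_from_x, codes_from_x)) ** 2)

# If the encoder has collapsed to z_mean = 0, every sample decodes to the same row.
recon = decode(w_noise, np.zeros((num_samples, latent_dim)))

print(mse_mean, mse_noise, mse_from_x)
print(np.allclose(recon, recon[0]))
```

In this sketch the noise-fed decoder does barely better than predicting the per-gene mean, while the decoder fed with codes derived from `x` does better. Since `x_train` in the post is `np.random.rand` with independent genes, even a correctly wired model with a 10-dimensional bottleneck could reconstruct only a small part of the variation.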

Any help would be greatly appreciated. Thanks in advance!

python tensorflow keras deep-learning