This is not a duplicate: the other questions about even/odd classification don't try to learn with this particular function; they use the usual ReLU or sigmoid activations.
As a self-assigned exercise, I'm trying to use PyTorch to estimate the parameter w in the function x -> sin(w*x)^2 so that it classifies integers as even or odd. Of course there are several correct values of w, including w = pi/2. I initialized my network (a linear layer without bias, then a sine activation, then squaring) with w = 1.5, close to pi/2, hoping it would converge to pi/2 = 1.5707..., but it doesn't, no matter how I tune the learning rate or which optimizer I use.
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # linear without bias
        self.fc1 = torch.nn.Linear(1, 1, bias=False)
        # Initialize close to the theoretical solution
        with torch.no_grad():
            self.fc1.weight.data.fill_(1.5)  # Close to π/2 ≈ 1.57

    def forward(self, x):
        x = self.fc1(x)
        return torch.sin(x)**2

net = Net()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.0004)

weights = []
for epoch in range(100):
    net.train()
    optimizer.zero_grad()
    output = net(train_x)
    loss = criterion(output, train_y)
    loss.backward()
    optimizer.step()
    w = net.fc1.weight.item()  # record the current weight
    weights.append(w)
A plot of the weights shows no tendency to converge toward anything.
I believe I've avoided the usual pitfalls: the inputs and targets are float32, the target function is exactly learnable by this model, and I've also tried other loss functions, without success.
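As a quick sanity check that the function really is exactly learnable (a minimal numpy sketch I ran on the side, not part of the training code): with w = pi/2, the model reproduces the parity labels on integers.

import numpy as np

x = np.arange(0, 1000)
pred = np.sin((np.pi / 2) * x) ** 2  # the model's output with w = pi/2
labels = (x % 2).astype(float)
print(np.allclose(pred, labels))     # True, up to floating-point roundoff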
Please help me figure out where I'm going wrong. Here is the full code (exported from a Jupyter notebook):
# %%
import torch
import numpy as np
import pandas as pd
# %%
# Generate data and scale inputs
def generate_data(size):
    x = np.random.randint(0, size, size)  # Smaller range for better visualization
    return x.astype(float), (x % 2).astype(float)
# %%
# Generate datasets
train_x, train_y = generate_data(1000)
val_x, val_y = generate_data(1000)
# Convert to tensors
train_x = torch.tensor(train_x, dtype=torch.float32).reshape(-1, 1)
train_y = torch.tensor(train_y, dtype=torch.float32).reshape(-1, 1)
val_x = torch.tensor(val_x, dtype=torch.float32).reshape(-1, 1)
val_y = torch.tensor(val_y, dtype=torch.float32).reshape(-1, 1)
# %%
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # linear without bias
        self.fc1 = torch.nn.Linear(1, 1, bias=False)
        # Initialize close to the theoretical solution
        with torch.no_grad():
            self.fc1.weight.data.fill_(1.5)  # Close to π/2 ≈ 1.57

    def forward(self, x):
        x = self.fc1(x)
        return torch.sin(x)**2
# %%
net = Net()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.0004)
# %%
weights = []
for epoch in range(100):
    net.train()
    optimizer.zero_grad()
    output = net(train_x)
    loss = criterion(output, train_y)
    loss.backward()
    optimizer.step()

    net.eval()
    with torch.no_grad():
        val_output = net(val_x)
        val_loss = criterion(val_output, val_y)

    if epoch % 1 == 0:
        print(f"Epoch {epoch}")
        print(f"Loss: {loss.item():.8f} Val Loss: {val_loss.item():.8f}")
        w = net.fc1.weight.item()
        print(f"Weight: {w:.8f} (target: {np.pi/2:.8f})")
        print("---")
    weights.append(w)
# %%
# plot weights
import matplotlib.pyplot as plt
plt.plot(weights)
plt.plot([np.pi/2]*len(weights))
# %%
# Test the model
w = net.fc1.weight.item()
print("\nFinal parameters:")
print(f"Weight: {w:.8f} (target: {np.pi/2:.8f})")

# Test on even and odd numbers
test_numbers = np.arange(0, 1500, 1)
net.eval()
with torch.no_grad():
    for x in test_numbers:
        test_input = torch.tensor([[float(x)]], dtype=torch.float32)
        pred = net(test_input).item()
        print("✅" if (pred < 0.5) == (x % 2 == 0) else "❌", end="")
        if (x + 1) % 60 == 0:
            print()
I've since realized that the gradient of this function with respect to w is 2*x*cos(w*x)*sin(w*x), which depends linearly on x. Since my training set mixes many large and small inputs, the resulting gradients are essentially random.
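Here is a minimal sketch of that effect (per-sample gradients computed with plain autograd; the setup is mine, not taken from the training code above): the gradient magnitude grows roughly in proportion to x.

import torch

w = torch.tensor(1.5, requires_grad=True)
for x in [1.0, 10.0, 100.0, 1000.0]:
    # per-sample squared error of sin(w*x)^2 against the parity label
    loss = (torch.sin(w * x) ** 2 - (x % 2)) ** 2
    grad, = torch.autograd.grad(loss, w)
    print(f"x = {x:6.0f}   dloss/dw = {grad.item():+.3f}")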
I was able to train the "network" successfully using only 6 sample points (0-5).
To use all the data, I think I'd have to come up with a well-behaved gradient, one that doesn't depend so strongly on the variable x.
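For reference, here is roughly the small-sample variant that converged for me, assuming the same Net class as above; the learning rate and epoch count are just values I picked that happened to work, not tuned choices.

# Train only on x in 0..5, where the gradient stays well-behaved
train_x = torch.arange(0, 6, dtype=torch.float32).reshape(-1, 1)
train_y = train_x % 2

net = Net()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

for epoch in range(5000):
    optimizer.zero_grad()
    loss = criterion(net(train_x), train_y)
    loss.backward()
    optimizer.step()

print(net.fc1.weight.item())  # drifts toward pi/2 ≈ 1.5708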