R optimize function does not reach the optimum

Question

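Something very unexpected is happening with my call to R's optimize() function. I have 2 very similar data sets; optimize() works as expected on one of them but not on the other.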

Here is the function I am trying to minimise, along with the 2 data sets:

library(dplyr)  # provides %>%, mutate(), filter(), summarise(), pull()

my_beta_fc <- function(data, par){
  
  # data is a dataframe of the ndays distribution with 2 columns:
  # x = number of days ranging from 0 to x+
  # f_x = reach (absolute)
  
  a <- par

  data <- data[,1:2]
  names(data) <- c("x", "f_x")
  
  n <- max(data$x)
  
  p0 <- data %>% 
    mutate(reach_pc = f_x / sum(f_x)) %>% 
    filter(x == 0) %>% 
    pull(reach_pc)
    
  xbar <- data %>% 
    summarise(mean = sum(f_x*x)/sum(f_x)) %>% 
    pull(mean)
  
  b <- a*(n-xbar)/xbar
    
  # we want to minimise the difference between observed and calculated p0:
  result <- abs(p0 - (gamma(n+b)*gamma(a+b))/(gamma(a+n+b)*gamma(b)))
  
  # if the evaluated function returns NaN, replace with a high penalty
  # to steer the optimization away from regions where the function returns NaN:
  if(is.nan(result)|is.na(result)){
    return(1e10)
  }
  
  return(result)
  
}

t1 <- data.frame(
  n_days = 0:7,
  reach = c(40979971, 2110778, 1126387, 729457, 541512, 346607, 236263, 198262)
) 

t2 <- data.frame(
  n_days = 0:7,
  reach = c(41233610, 2017354, 1063684, 694576, 518144, 330215, 223006, 188648)
)

The result for t1 is then:

param_solution <- optimize(
  f = function(param) my_beta_fc(data = t1, param),
  interval = c(0,10),
  tol = 0.00001
)

which gives:

> param_solution
$minimum
[1] 0.05495449

$objective
[1] 0.0000003038929

But running the same thing on t2:

param_solution <- optimize(
  f = function(param) my_beta_fc(data = t2, param),
  interval = c(0,10),
  tol = 0.00001
)

gives:

> param_solution
$minimum
[1] 9.999995

$objective
[1] 10000000000

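This is clearly not the solution, since simply plugging in the solution found for t1 would give a better objective value...

Any idea what might be going wrong here?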

Answer 1

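The value returned is your penalty for going outside the allowed range. The way you have defined the penalty makes the function look completely flat there, so optimize() cannot tell which way to move and eventually gives up.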
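The flat-penalty effect can be reproduced on a toy problem. The sketch below is not the function from the question: toy_flat is a hypothetical objective that is assumed to be valid only for x <= 3 and that returns the same constant penalty everywhere else.

```r
# Hypothetical toy objective (not from the question): valid only for x <= 3,
# with its true minimum at x = 1. Outside that range it returns a constant
# penalty, the same way my_beta_fc returns 1e10 whenever it sees NaN.
toy_flat <- function(x) {
  if (x > 3) {
    return(1e10)
  }
  (x - 1)^2
}

flat_fit <- optimize(toy_flat, interval = c(0, 10))

# Compare the objective that optimize() reports with the value at the
# true minimum of the valid region.
flat_fit$objective
toy_flat(1)
```

With the interval c(0, 10), the first point optimize() evaluates is at about 3.82, which is already on the plateau. Every comparison after that is between equal penalty values, so the search gets no information pointing it back towards x = 1.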

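Constrained optimization is hard. One approach that I have found sometimes works is this:

When the proposed parameters fall outside the allowed range, find values that are inside the range, evaluate the function there, and return that value plus a penalty based on how much you had to change the parameters. That at least gives optimize() a hint about which direction to move in.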
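A minimal sketch of that idea on a toy problem, under the assumption that the objective is only valid for x <= 3. The names toy_valid and toy_graded are hypothetical and are not part of the question's code.

```r
# Hypothetical toy objective (not from the question): treated as valid only
# for x <= 3, with its true minimum at x = 1.
toy_valid <- function(x) (x - 1)^2

toy_graded <- function(x, upper = 3) {
  x_inside <- min(x, upper)       # nearest value inside the allowed range
  moved_by <- x - x_inside        # 0 when x was already allowed
  toy_valid(x_inside) + moved_by  # value at the clamped point plus the penalty
}

graded_fit <- optimize(toy_graded, interval = c(0, 10))
graded_fit$minimum
```

Because the penalty grows with the distance from the allowed range, two out-of-range points no longer look identical, and the search can tell that smaller x is better.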

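In your case that is hard, because gamma() returns NaN in a variety of circumstances: at any integer argument of 0 or less. Maybe you would get reasonable results by increasing b so that n+b, a+b, a+n+b, and b are all greater than some small positive number (e.g. 0.00001), and then adding a penalty for how much you needed to change it.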

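I just tried this, and it didn't work. The NaN values occur because the arguments are too large, not too small: gamma() overflows to Inf for large arguments, and Inf/Inf is NaN. You should use lgamma() instead of gamma(), i.e. change your function as follows: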

my_beta_fc <- function(data, par){
  
  # data is a dataframe of the ndays distribution with 2 columns:
  # x = number of days ranging from 0 to x+
  # f_x = reach (absolute)
  
  a <- par
  
  data <- data[,1:2]
  names(data) <- c("x", "f_x")
  
  n <- max(data$x)
  
  p0 <- data %>% 
    mutate(reach_pc = f_x / sum(f_x)) %>% 
    filter(x == 0) %>% 
    pull(reach_pc)
  
  xbar <- data %>% 
    summarise(mean = sum(f_x*x)/sum(f_x)) %>% 
    pull(mean)
  
  b <- a*(n-xbar)/xbar
  
  bounds_violation <- min(n+b, a+b, a+n+b, b)
  if (bounds_violation <= 0) {
    bounds_violation <- abs(bounds_violation)
    b <- b + bounds_violation + 0.000001
  } else
    bounds_violation <- 0
  
  # we want to minimise the difference between observed and calculated p0:
  result <- abs(p0 - exp(lgamma(n+b) + lgamma(a+b) - lgamma(a+n+b) - lgamma(b)))
  
  # if the evaluated function returns NaN, stop!
  if(is.nan(result)|is.na(result)){
    stop("Still get NaN")
  }
  
  return(result + bounds_violation)
  
}
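
To see why working on the log scale helps, here is a small stand-alone comparison of the two ways of evaluating the same ratio of gamma functions. The helper names p0_gamma and p0_lgamma and the argument values are made up for illustration; only the two formulas are taken from the code above.

```r
# Hypothetical helpers for illustration; a, b and n play the same roles as
# in my_beta_fc, but the values passed in below are made up.
p0_gamma <- function(a, b, n) {
  (gamma(n + b) * gamma(a + b)) / (gamma(a + n + b) * gamma(b))
}

p0_lgamma <- function(a, b, n) {
  exp(lgamma(n + b) + lgamma(a + b) - lgamma(a + n + b) - lgamma(b))
}

# Moderate arguments: both forms give the same answer.
small_direct <- p0_gamma(a = 0.05, b = 1.2, n = 7)
small_log    <- p0_lgamma(a = 0.05, b = 1.2, n = 7)

# Large arguments: gamma() overflows to Inf, so the direct form is Inf / Inf,
# while the log-scale form stays finite.
big_direct <- p0_gamma(a = 4, b = 200, n = 7)
big_log    <- p0_lgamma(a = 4, b = 200, n = 7)

is.nan(big_direct)
is.finite(big_log)
```

Since b is proportional to a in my_beta_fc, larger trial values of a push every gamma() argument up, which is how the direct form can end up as NaN partway through the search interval.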
