How to search for a value that satisfies a condition


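I'm looking for the best X% of the rows in a data set, where "best" is defined as having the smallest sum of values. I can get the result I want by running a series of tests: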

SELECT
  -- Analyze results manually, looking for a testXX value which is close to X%.
  -- If needed, edit the query for higher precision and try again.
  1.0 * count_if(f1+f2 < 0.1)/count(1) AS test01,
  1.0 * count_if(f1+f2 < 0.2)/count(1) AS test02,
  ...
FROM table1

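I tried joining against a SEQUENCE to cut down on the copy-and-paste, but apart from making the query use more memory, I couldn't get it to work. This is what I tried: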

SELECT 1.0 * count_if(f1+f2 < threshold)/count(1) AS test
FROM table1
JOIN (SELECT t.v/100.0 AS threshold FROM UNNEST(SEQUENCE(20, 80, 1)) t(v))
  ON true

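What I really want is a query that automatically finds a threshold whose share of rows equals X +/- some epsilon or, better yet, a threshold whose share is as close to X as possible.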

Simplified sample data

f1    f2
0.04  0.05
0.02  0.07
0.02  0.69
0.1   0.1
0.1   0.3
0.1   0.4
0.1   0.5
0.1   0.6
0.1   0.7
0.1   0.8

If my target X is 0.3, I would expect a threshold of around 0.09, because 30% of the f1+f2 values are <= 0.09. The real data set has tens of millions of rows with far more random values. If I want a 30% slice, it's okay if it's actually 30.2% or 29.8%.

sql presto
1 Answer
CREATE TABLE sample (
  f1   DECIMAL(4,3),
  f2   DECIMAL(4,3)
);

INSERT INTO
  sample
VALUES
  (0.04, 0.05),
  (0.02, 0.07),
  (0.02, 0.069), -- I changed this value (0.69 in the question's sample)
  (0.1,  0.1),
  (0.1,  0.3),
  (0.1,  0.4),
  (0.1,  0.5),
  (0.1,  0.6),
  (0.1,  0.7),
  (0.1,  0.8);

WITH
  ranked AS
(
  SELECT
    *,
    f1+f2 AS x,
    ROW_NUMBER()
      OVER (ORDER BY f1+f2, f1, f2)
    *
    1.0
    /
    COUNT(*) OVER ()
      AS percentile
  FROM
    sample
)
SELECT
  MAX(CASE WHEN percentile <= 0.3 THEN x END),
  MIN(CASE WHEN percentile >  0.3 THEN x END)
FROM
  ranked

max    min
0.090  0.200

The 30% cutoff can therefore be any value from 0.090 up to (but not including) 0.200.
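
Since the question is tagged presto and allows a small error (30.2% or 29.8% instead of exactly 30%), another option that may be worth trying on the full table is Presto's approx_percentile aggregate, which estimates the cutoff without ranking every row. The following is only a sketch and has not been run against the real data: table1, f1 and f2 are the names from the question, 0.3 is the example target X, and the CAST to DOUBLE is an assumption about the column types. The outer query reports the share of rows the estimated threshold actually selects, so the result can be checked against the acceptable tolerance.

```sql
-- Sketch: estimate the 30% cutoff, then measure the share of rows it selects.
-- Assumes table1(f1, f2) as in the question; the CAST is a guess at the column types.
WITH cutoff AS (
  SELECT approx_percentile(CAST(f1 + f2 AS DOUBLE), 0.3) AS threshold
  FROM table1
)
SELECT
  c.threshold,
  1.0 * count_if(t.f1 + t.f2 <= c.threshold) / count(*) AS actual_share
FROM table1 t
CROSS JOIN cutoff c
GROUP BY c.threshold
```

Because the percentile is approximate, actual_share may come out slightly above or below 0.3. If it falls outside the acceptable range, the exact ROW_NUMBER() approach above remains the fallback.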

