I use this query to find the total cost of slot usage. The total_cost it calculates is completely different from what I see in the BigQuery Reservation API. I'm on the Standard edition. Am I missing something?
I also estimated storage costs, but that comes nowhere near covering the difference. Any help would be appreciated.
SELECT (SUM(TOTAL_SLOT_MS)*0.04)/(1000*60*60) AS TOTAL_COST
, MAX(jobstage_max_slots) AS MAX_SLOTS
, AVG(job_avg_slots) AS AVG_SLOTS
FROM
(
SELECT
project_id,
job_id,
reservation_id,
EXTRACT(DATE FROM creation_time) AS creation_date,
TIMESTAMP_DIFF(end_time, start_time, SECOND) AS job_duration_seconds,
job_type,
user_email,
total_bytes_billed,
-- Average slot utilization per job
SAFE_DIVIDE(job.total_slot_ms,(TIMESTAMP_DIFF(job.end_time, job.start_time, MILLISECOND))) AS job_avg_slots,
query,
-- Determine the max number of slots used at ANY stage in the query.
-- The average slots might be 55. But a single stage might spike to 2000 slots.
-- This is important to know when estimating number of slots to purchase.
job.total_slot_ms,
MAX(SAFE_DIVIDE(unnest_job_stages.slot_ms,unnest_job_stages.end_ms - unnest_job_stages.start_ms)) AS jobstage_max_slots,
-- Check if there's a job that requests more units of works (slots). If so you need more slots.
-- estimated_runnable_units = Units of work that can be scheduled immediately.
-- Providing additional slots for these units of work accelerates the query,
-- if no other query in the reservation needs additional slots.
MAX(unnest_timeline.estimated_runnable_units) AS estimated_runnable_units
FROM `region-us`.INFORMATION_SCHEMA.JOBS AS job
CROSS JOIN UNNEST(job_stages) as unnest_job_stages
CROSS JOIN UNNEST(timeline) AS unnest_timeline
WHERE project_id = 'open-bridge-bg'
-- and job_type = 'QUERY'
-- AND statement_type != 'SCRIPT'
AND DATE(creation_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()
GROUP BY 1,2,3,4,5,6,7,8,9,10,11
ORDER BY job_id
);
https://cloud.google.com/bigquery/docs/information-schema-jobs#calculate_average_slot_utilization
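The core of the cost calculation in the query above is just a unit conversion: total_slot_ms accumulates slot-milliseconds, and dividing by 1000·60·60 yields slot-hours, which are then multiplied by the Standard edition rate of $0.04 per slot-hour (the rate used in the query). A minimal Python sketch of that arithmetic, with a made-up job for illustration:

```python
def slot_cost(total_slot_ms: float, rate_per_slot_hour: float = 0.04) -> float:
    """Convert cumulative slot-milliseconds into a dollar cost.

    rate_per_slot_hour defaults to the $0.04 Standard edition rate
    assumed in the SQL above.
    """
    slot_hours = total_slot_ms / (1000 * 60 * 60)  # ms -> slot-hours
    return slot_hours * rate_per_slot_hour

# A job that consumed 180,000,000 slot-ms = 50 slot-hours -> $2.00
print(round(slot_cost(180_000_000), 2))  # -> 2.0
```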
I expected the total cost to be roughly the same as what I see in my bill.
One reason for the difference may be that BigQuery's slot-based pricing scales up and down in increments of 100 slots. So even if the total_slot_ms value indicates that a query averaged fewer than 100 slots over its duration, the autoscaler still scales up to 100. I've been able to get a closer cost estimate by rounding slot usage up to the nearest multiple of 100. Here's an example query:
DECLARE standard_edition_cost_per_slot_hour FLOAT64 DEFAULT 0.04;
DECLARE ms_per_hour INT64 DEFAULT 1000 * 3600;
WITH raw_data AS (
SELECT
job_id,
total_slot_ms,
TIMESTAMP_DIFF(end_time, start_time, millisecond) as job_duration_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
creation_time BETWEEN
TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)
AND TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
AND IFNULL(cache_hit, FALSE) != TRUE -- cache_hit can be NULL; a bare "cache_hit != true" would also drop those rows
AND total_slot_ms IS NOT NULL
),
job_w_avg_slot_usage AS (
SELECT
*,
total_slot_ms / job_duration_ms AS avg_slot_usage
FROM raw_data
),
job_w_avg_slot_usage_100 AS (
SELECT
*,
CEIL(avg_slot_usage / 100) * 100 AS avg_slot_usage_100
FROM job_w_avg_slot_usage
),
job_w_slot_hr_100 AS (
SELECT
*,
job_duration_ms * avg_slot_usage_100 / ms_per_hour AS slot_hr_100
FROM job_w_avg_slot_usage_100
),
job_w_cost AS (
SELECT
*,
slot_hr_100 * standard_edition_cost_per_slot_hour AS standard_edition_cost
FROM job_w_slot_hr_100
)
SELECT SUM(standard_edition_cost) as total_cost FROM job_w_cost
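To see numerically what the rounding step changes, here's a small Python sketch of the same per-job calculation (the 100-slot increment and $0.04/slot-hour rate come from the query above; the job numbers are made up for illustration):

```python
import math

def billed_cost(total_slot_ms: float, job_duration_ms: float,
                rate_per_slot_hour: float = 0.04) -> float:
    """Estimate a job's cost assuming the autoscaler bills in 100-slot
    increments, mirroring the CTE chain in the SQL above."""
    avg_slots = total_slot_ms / job_duration_ms              # avg_slot_usage
    billed_slots = math.ceil(avg_slots / 100) * 100          # round up to next 100
    slot_hours = job_duration_ms * billed_slots / (1000 * 3600)
    return slot_hours * rate_per_slot_hour

# A 10-minute job (600,000 ms) averaging 55 slots is billed as if it used 100:
# raw:     55 slots * (1/6) hr * $0.04 ~= $0.37
# rounded: 100 slots * (1/6) hr * $0.04 ~= $0.67
print(round(billed_cost(33_000_000, 600_000), 2))  # -> 0.67
```

The gap between the raw and rounded figures is exactly the kind of discrepancy described in the question, and it compounds across many short, lightly-parallel jobs.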