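I'm trying to follow a data-cleaning tutorial on YouTube. The instructor uses SQL and I'm using BigQuery (installing SQL on a Mac is too much hassle, and I want to learn how to adapt SQL syntax to BigQuery).
When the tutorial shows how to populate the property address data, the SQL syntax is: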
Select *
From PortfolioProject.dbo.NashvilleHousing
--Where PropertyAddress is null
order by ParcelID
Select a.ParcelID, a.PropertyAddress, b.ParcelID, b.PropertyAddress, ISNULL(a.PropertyAddress,b.PropertyAddress)
From PortfolioProject.dbo.NashvilleHousing a
JOIN PortfolioProject.dbo.NashvilleHousing b
on a.ParcelID = b.ParcelID
AND a.[UniqueID ] <> b.[UniqueID ]
Where a.PropertyAddress is null
Update a
SET PropertyAddress = ISNULL(a.PropertyAddress,b.PropertyAddress)
From PortfolioProject.dbo.NashvilleHousing a
JOIN PortfolioProject.dbo.NashvilleHousing b
on a.ParcelID = b.ParcelID
AND a.[UniqueID ] <> b.[UniqueID ]
Where a.PropertyAddress is null
I have managed to adapt some of it to BigQuery:
Select *
From `sturdy-filament-415311.NashvilleHousing.NHData`
order by ParcelID
;
Select a.ParcelID, a.PropertyAddress, b.ParcelID, b.PropertyAddress, COALESCE(a.PropertyAddress,b.PropertyAddress)
From `sturdy-filament-415311.NashvilleHousing.NHData` a
JOIN `sturdy-filament-415311.NashvilleHousing.NHData` b
on a.ParcelID = b.ParcelID
AND a.UniqueID_ <> b.UniqueID_
Where a.PropertyAddress is null
;
But the following query fails with an error:
Update a
SET PropertyAddress = COALESCE(a.PropertyAddress,b.PropertyAddress)
From sturdy-filament-415311.NashvilleHousing.NHData a
JOIN sturdy-filament-415311.NashvilleHousing.NHData b
on a.ParcelID = b.ParcelID
AND a.UniqueID_ <> b.UniqueID_
Where a.PropertyAddress is null
The error I got was: Table "a" must be qualified with a dataset (e.g. dataset.table).
I then tried to fix it with this:
Update sturdy-filament-415311.NashvilleHousing.NHData
SET PropertyAddress = COALESCE(a.PropertyAddress,b.PropertyAddress)
From sturdy-filament-415311.NashvilleHousing.NHData a
JOIN sturdy-filament-415311.NashvilleHousing.NHData b
on a.ParcelID = b.ParcelID
AND a.UniqueID_ <> b.UniqueID_
Where a.PropertyAddress is null
and now I get: UPDATE/MERGE must match at most one source row for each target row.
How can I fix this?
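To avoid referencing the table twice in the FROM clause, consider putting the alias on the UPDATE target and dropping the JOIN, and use backticks consistently, because the hyphen (-) in the table name can trigger a syntax error. Also, a bad habit to kick: table aliases like (a, b, c) or (t1, t2, t3). Use more informative aliases: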
UPDATE `sturdy-filament-415311.NashvilleHousing.NHData` nh1
SET nh1.PropertyAddress = COALESCE(nh1.PropertyAddress, nh2.PropertyAddress)
FROM `sturdy-filament-415311.NashvilleHousing.NHData` nh2
WHERE nh1.ParcelID = nh2.ParcelID
AND nh1.UniqueID_ <> nh2.UniqueID_
AND nh1.PropertyAddress IS NULL
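If the "must match at most one source row" error still appears, a likely cause is that some ParcelID values have more than one other row, so a single NULL row matches several candidates. One possible workaround, sketched below, is to collapse the source to one row per ParcelID before updating. This is an untested sketch that assumes all non-null addresses sharing a ParcelID are interchangeable, so picking one with MAX() is acceptable. The aliases nh and known_address are illustrative names, not part of the original query.

```sql
-- Sketch only: assumes any non-null PropertyAddress for a given ParcelID
-- is a valid fill value, so MAX() is used to pick one of them.
UPDATE `sturdy-filament-415311.NashvilleHousing.NHData` AS nh
SET PropertyAddress = known_address.PropertyAddress
FROM (
  -- Reduce the table to exactly one row per ParcelID so that each
  -- target row can match at most one source row.
  SELECT ParcelID, MAX(PropertyAddress) AS PropertyAddress
  FROM `sturdy-filament-415311.NashvilleHousing.NHData`
  WHERE PropertyAddress IS NOT NULL
  GROUP BY ParcelID
) AS known_address
WHERE nh.ParcelID = known_address.ParcelID
  AND nh.PropertyAddress IS NULL;
```

Because the subquery keeps only non-null addresses, the COALESCE is no longer needed here. If the addresses within a ParcelID can genuinely differ, it would be worth inspecting those parcels before choosing which value to keep.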