唯一标识符问题
我们有一个现有的数据库,它广泛使用唯一标识符(不幸的是!)作为主键和一些表的一些可为空的列。我们遇到过这样的情况:在这些表上运行的一些报告对这些唯一标识符进行排序,因为表中没有其他列可以提供有意义的排序(这不是很讽刺吗!)。目的是进行排序,以便按照插入的顺序显示项目,但它们不是使用
NewSequentialId()
插入的 - 因此浪费时间。
有关排序算法的事实
无论如何,考虑到 SQL Server 根据字节组对 uniqueidentifiers 进行排序,从结尾的第 5 个字节组(6 个字节)开始,向第 1 个字节组(4 个字节)移动,从右侧反转第 3 个字节组(2 个字节)的顺序 -从左到左到右,
我的问题
我很想知道这种类型在现实生活中是否有帮助。
SQL Server 如何在内部存储唯一标识符,这可以提供以下方面的见解: 为什么它有这种奇怪的排序算法?
参考:
Alberto Ferrari 对 SQL Server GUID 排序的发现
示例
当您在具有以下数据的 uniqueidentifier 列上使用 Order By 时,Uniqueidentifier 会按如下所示进行排序。
请注意,以下数据按升序排序,最高排序优先级是从第 5 个字节组到第 1 个字节组(向后)。
-- 1st byte group of 4 bytes sorted in the reverse (left-to-right) order below --
01000000-0000-0000-0000-000000000000
10000000-0000-0000-0000-000000000000
00010000-0000-0000-0000-000000000000
00100000-0000-0000-0000-000000000000
00000100-0000-0000-0000-000000000000
00001000-0000-0000-0000-000000000000
00000001-0000-0000-0000-000000000000
00000010-0000-0000-0000-000000000000
-- 2nd byte group of 2 bytes sorted in the reverse (left-to-right) order below --
00000000-0100-0000-0000-000000000000
00000000-1000-0000-0000-000000000000
00000000-0001-0000-0000-000000000000
00000000-0010-0000-0000-000000000000
-- 3rd byte group of 2 bytes sorted in the reverse (left-to-right) order below --
00000000-0000-0100-0000-000000000000
00000000-0000-1000-0000-000000000000
00000000-0000-0001-0000-000000000000
00000000-0000-0010-0000-000000000000
-- 4th byte group of 2 bytes sorted in the straight (right-to-left) order below --
00000000-0000-0000-0001-000000000000
00000000-0000-0000-0010-000000000000
00000000-0000-0000-0100-000000000000
00000000-0000-0000-1000-000000000000
-- 5th byte group of 6 bytes sorted in the straight (right-to-left) order below --
00000000-0000-0000-0000-000000000001
00000000-0000-0000-0000-000000000010
00000000-0000-0000-0000-000000000100
00000000-0000-0000-0000-000000001000
00000000-0000-0000-0000-000000010000
00000000-0000-0000-0000-000000100000
00000000-0000-0000-0000-000001000000
00000000-0000-0000-0000-000010000000
00000000-0000-0000-0000-000100000000
00000000-0000-0000-0000-001000000000
00000000-0000-0000-0000-010000000000
00000000-0000-0000-0000-100000000000
代码:
Alberto 的代码扩展为表示排序是在字节上而不是在各个位上。
With Test_UIDs As (-- 0 1 2 3 4 5 6 7 8 9 A B C D E F
Select ID = 1, UID = cast ('00000000-0000-0000-0000-100000000000' as uniqueidentifier)
Union Select ID = 2, UID = cast ('00000000-0000-0000-0000-010000000000' as uniqueidentifier)
Union Select ID = 3, UID = cast ('00000000-0000-0000-0000-001000000000' as uniqueidentifier)
Union Select ID = 4, UID = cast ('00000000-0000-0000-0000-000100000000' as uniqueidentifier)
Union Select ID = 5, UID = cast ('00000000-0000-0000-0000-000010000000' as uniqueidentifier)
Union Select ID = 6, UID = cast ('00000000-0000-0000-0000-000001000000' as uniqueidentifier)
Union Select ID = 7, UID = cast ('00000000-0000-0000-0000-000000100000' as uniqueidentifier)
Union Select ID = 8, UID = cast ('00000000-0000-0000-0000-000000010000' as uniqueidentifier)
Union Select ID = 9, UID = cast ('00000000-0000-0000-0000-000000001000' as uniqueidentifier)
Union Select ID = 10, UID = cast ('00000000-0000-0000-0000-000000000100' as uniqueidentifier)
Union Select ID = 11, UID = cast ('00000000-0000-0000-0000-000000000010' as uniqueidentifier)
Union Select ID = 12, UID = cast ('00000000-0000-0000-0000-000000000001' as uniqueidentifier)
Union Select ID = 13, UID = cast ('00000000-0000-0000-0001-000000000000' as uniqueidentifier)
Union Select ID = 14, UID = cast ('00000000-0000-0000-0010-000000000000' as uniqueidentifier)
Union Select ID = 15, UID = cast ('00000000-0000-0000-0100-000000000000' as uniqueidentifier)
Union Select ID = 16, UID = cast ('00000000-0000-0000-1000-000000000000' as uniqueidentifier)
Union Select ID = 17, UID = cast ('00000000-0000-0001-0000-000000000000' as uniqueidentifier)
Union Select ID = 18, UID = cast ('00000000-0000-0010-0000-000000000000' as uniqueidentifier)
Union Select ID = 19, UID = cast ('00000000-0000-0100-0000-000000000000' as uniqueidentifier)
Union Select ID = 20, UID = cast ('00000000-0000-1000-0000-000000000000' as uniqueidentifier)
Union Select ID = 21, UID = cast ('00000000-0001-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 22, UID = cast ('00000000-0010-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 23, UID = cast ('00000000-0100-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 24, UID = cast ('00000000-1000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 25, UID = cast ('00000001-0000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 26, UID = cast ('00000010-0000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 27, UID = cast ('00000100-0000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 28, UID = cast ('00001000-0000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 29, UID = cast ('00010000-0000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 30, UID = cast ('00100000-0000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 31, UID = cast ('01000000-0000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 32, UID = cast ('10000000-0000-0000-0000-000000000000' as uniqueidentifier)
)
Select * From Test_UIDs Order By UID, ID
该算法由 SQL Server 人员记录在此处:如何在 SQL Server 2005 中比较 GUID?我在这里引用它(因为这是一篇旧文章,几年后可能会永远消失)
一般来说,相等比较很有意义 唯一标识符值。但是,如果您发现自己需要一般 排序,那么您可能会看到错误的数据类型,并且应该 请考虑各种整数类型。
如果经过深思熟虑,您决定使用唯一标识符进行订购 专栏,您可能会对得到的结果感到惊讶。
给定这两个唯一标识符值:
@g1='55666BEE-B3A0-4BF5-81A7-86FF976E763F'@g2= '8DD5BCA5-6ABE-4F73-B4B7-393AE6BBB849'
许多人认为@g1小于@g2,因为'55666BEE'是 当然比“8DD5BCA5”小。然而,这不是 SQL Server 的方式 2005 比较 uniqueidentifier 值。
比较是通过从右到左查看字节“组”来进行的,并且 在字节“组”内从左到右。字节组是分隔的 通过“-”字符。从技术上讲,我们查看字节 {10 到 15} 首先,然后是 {8-9},然后是 {6-7},然后是 {4-5},最后是 {0 到 3}。
在这个具体示例中,我们首先比较“86FF976E763F” 与“393AE6BBB849”。我们立即看到@g2确实更大 比@g1。
请注意,在 .NET 语言中,Guid 值具有不同的默认排序 顺序比 SQL Server 中的顺序要多。如果您发现需要订购数组或 使用 SQL Server 比较语义的 Guid 列表,您可以使用 相反,SqlGuid 的数组或列表,它在 a 中实现 IComparable 与 SQL Server 语义一致的方式。
此外,排序遵循字节组字节顺序(请参见此处:全局唯一标识符)。组10-15和组8-9存储为大端(对应维基百科文章中的Data4),因此它们作为大端进行比较。其他组使用小端进行比较。
为那些发现接受的答案有点模糊的人提供特殊服务。代码本身就说明了一切;神奇的部分是:
System.Guid g
g.ToByteArray();
int[] m_byteOrder = new int[16] // 16 Bytes = 128 Bit
{10, 11, 12, 13, 14, 15, 8, 9, 6, 7, 4, 5, 0, 1, 2, 3};
public int Compare(Guid x, Guid y)
{
byte byte1, byte2;
//Swap to the correct order to be compared
for (int i = 0; i < NUM_BYTES_IN_GUID; i++)
{
byte1 = x.ToByteArray()[m_byteOrder[i]];
byte2 = y.ToByteArray()[m_byteOrder[i]];
if (byte1 != byte2)
return (byte1 < byte2) ? (int)EComparison.LT : (int)EComparison.GT;
} // Next i
return (int)EComparison.EQ;
}
完整代码:
namespace BlueMine.Data
{
public class SqlGuid
: System.IComparable
, System.IComparable<SqlGuid>
, System.Collections.Generic.IComparer<SqlGuid>
, System.IEquatable<SqlGuid>
{
private const int NUM_BYTES_IN_GUID = 16;
// Comparison orders.
private static readonly int[] m_byteOrder = new int[16] // 16 Bytes = 128 Bit
{10, 11, 12, 13, 14, 15, 8, 9, 6, 7, 4, 5, 0, 1, 2, 3};
private byte[] m_bytes; // the SqlGuid is null if m_value is null
public SqlGuid(byte[] guidBytes)
{
if (guidBytes == null || guidBytes.Length != NUM_BYTES_IN_GUID)
throw new System.ArgumentException("Invalid array size");
m_bytes = new byte[NUM_BYTES_IN_GUID];
guidBytes.CopyTo(m_bytes, 0);
}
public SqlGuid(System.Guid g)
{
m_bytes = g.ToByteArray();
}
public byte[] ToByteArray()
{
byte[] ret = new byte[NUM_BYTES_IN_GUID];
m_bytes.CopyTo(ret, 0);
return ret;
}
int CompareTo(object obj)
{
if (obj == null)
return 1; // https://msdn.microsoft.com/en-us/library/system.icomparable.compareto(v=vs.110).aspx
System.Type t = obj.GetType();
if (object.ReferenceEquals(t, typeof(System.DBNull)))
return 1;
if (object.ReferenceEquals(t, typeof(SqlGuid)))
{
SqlGuid ui = (SqlGuid)obj;
return this.Compare(this, ui);
} // End if (object.ReferenceEquals(t, typeof(UInt128)))
return 1;
} // End Function CompareTo(object obj)
int System.IComparable.CompareTo(object obj)
{
return this.CompareTo(obj);
}
int CompareTo(SqlGuid other)
{
return this.Compare(this, other);
}
int System.IComparable<SqlGuid>.CompareTo(SqlGuid other)
{
return this.Compare(this, other);
}
enum EComparison : int
{
LT = -1, // itemA precedes itemB in the sort order.
EQ = 0, // itemA occurs in the same position as itemB in the sort order.
GT = 1 // itemA follows itemB in the sort order.
}
public int Compare(SqlGuid x, SqlGuid y)
{
byte byte1, byte2;
//Swap to the correct order to be compared
for (int i = 0; i < NUM_BYTES_IN_GUID; i++)
{
byte1 = x.m_bytes[m_byteOrder[i]];
byte2 = y.m_bytes[m_byteOrder[i]];
if (byte1 != byte2)
return (byte1 < byte2) ? (int)EComparison.LT : (int)EComparison.GT;
} // Next i
return (int)EComparison.EQ;
}
int System.Collections.Generic.IComparer<SqlGuid>.Compare(SqlGuid x, SqlGuid y)
{
return this.Compare(x, y);
}
public bool Equals(SqlGuid other)
{
return Compare(this, other) == 0;
}
bool System.IEquatable<SqlGuid>.Equals(SqlGuid other)
{
return this.Equals(other);
}
}
}
这是一种不同的方法。 GUID 被简单地打乱,准备好进行正常的字符串比较,就像在 SQL Server 中发生的那样。这是 Javascript,但很容易转换为任何语言。
function guidForComparison(guid) {
/*
character positions:
11111111112222222222333333
012345678901234567890123456789012345
00000000-0000-0000-0000-000000000000
byte positions:
111111111111
00112233 4455 6677 8899 001122334455
*/
return guid.substr(24, 12) +
guid.substr(19, 4) +
guid.substr(16, 2) +
guid.substr(14, 2) +
guid.substr(11, 2) +
guid.substr(9, 2) +
guid.substr(6, 2) +
guid.substr(4, 2) +
guid.substr(2, 2) +
guid.substr(0, 2);
};