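I have a list of 50k possible domain names. I'd like to find out which ones are available and, if possible, how much they cost. The list looks like this: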
presumptuous.ly
principaliti.es
procrastinat.es
productivene.ss
professional.ly
profession.ally
professorshi.ps
prognosticat.es
prohibitioni.st
I have tried whois, but it runs far too slowly to finish within the next 100 years.
import whois


def check_domain(domain):
    try:
        # Get the WHOIS information for the domain
        w = whois.whois(domain)
        if w.status == "free":
            return True
        else:
            return False
    except Exception as e:
        print("Error: ", e)
        print(domain + " had an issue")
        return False


def check_available(matches):
    print('checking availability')
    available = []
    for match in matches:
        if check_domain(match):
            print("found " + match + " available!")
            available.append(match)
    return available
I also tried the names.com/names bulk upload tool, but it doesn't seem to work at all.
How can I determine the availability of these domains?
You can use, for example, the multiprocessing package to speed up the process, i.e.:
import os
import sys
from multiprocessing import Pool

import pandas as pd
from tqdm import tqdm
from whois import whois


# https://stackoverflow.com/a/8391735/10035985
def blockPrint():
    sys.stdout = open(os.devnull, "w")


def enablePrint():
    sys.stdout = sys.__stdout__


def check_domain(domain):
    try:
        blockPrint()
        result = whois(domain)
    except Exception:
        # a failed lookup is reported as "no status"
        return domain, None
    finally:
        enablePrint()
    return domain, result.status


if __name__ == "__main__":
    domains = [
        "google.com",
        "yahoo.com",
        "facebook.com",
        "xxxnonexistentzzz.domain",
    ] * 100

    results = []
    with Pool(processes=16) as pool:  # <-- choose how many processes you want here
        for domain, status in tqdm(
            pool.imap_unordered(check_domain, domains), total=len(domains)
        ):
            # an empty or missing status is treated as "free"
            results.append((domain, not bool(status)))

    df = pd.DataFrame(results, columns=["domain", "is_free"])
    print(df.drop_duplicates())
Prints:
100%|██████████████████████████████████████████████| 400/400 [00:07<00:00, 55.67it/s]
                     domain  is_free
0  xxxnonexistentzzz.domain     True
5              facebook.com    False
11               google.com    False
14                yahoo.com    False
You can see that it checks about 55 domains per second. If that rate holds, a 50k list would take roughly 15 minutes.
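Since each lookup mostly waits on the network, a thread pool may work about as well as separate processes. Below is a minimal sketch of that variant, not a tested replacement for the code above. It assumes the same rule as before (no status, or a failed lookup, counts as free). `check_all` and `fake_lookup` are hypothetical names I made up for illustration; `fake_lookup` only stands in for the real WHOIS call so the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(domains, lookup, workers=16):
    """Run lookup(domain) for every domain on a thread pool.

    Returns a dict mapping each domain to True when the lookup reported
    no status (treated as "probably free"), otherwise False.
    """

    def is_free(domain):
        try:
            status = lookup(domain)
        except Exception:
            # A failed lookup is treated as "no status", as in the code above.
            status = None
        return domain, not bool(status)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input list.
        return dict(pool.map(is_free, domains))


def fake_lookup(domain):
    """Hypothetical stand-in for a real WHOIS status lookup."""
    registered = {"google.com": "ok", "yahoo.com": "ok"}
    if domain == "broken.example":
        raise RuntimeError("lookup failed")
    return registered.get(domain)


if __name__ == "__main__":
    sample = ["google.com", "presumptuous.ly", "broken.example"]
    for name, free in check_all(sample, fake_lookup).items():
        print(name, "free" if free else "taken")
```

To try it against real data you would pass something like `lambda d: whois(d).status` as `lookup`, using the same `whois` import as above. Keep in mind that a failed lookup also shows up as free here, so the "free" list is better read as candidates to double-check than as a final answer.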