I have a cookies txt file containing data generated by a Chrome extension, as shown below:
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This is a generated file! Do not edit.
.site.net TRUE / FALSE 1701453620 _ga GA1.2.10834324067.1638446981
.site.net TRUE / FALSE 1638123020 _gid GA1.2.25433025264.1638446981
.site.net TRUE / FALSE 1646432624 _fbp fb.1.1643546988197.973328968
I need to load it into a HashMap and use it in a Jsoup connection:
HashMap<String, String> cookies = load.file; // pseudocode: load the cookies from the file
Document doc = Jsoup.connect(mainUrl).cookies(cookies).get();
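Is it possible to load the txt file and convert it into a HashMap?
I would first preprocess the text file to get a list of key-value pairs. Something like this: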
grep "^[^#]" cookies.txt | awk '{print $6 " " $7}'
_ga GA1.2.10834324067.1638446981
_gid GA1.2.25433025264.1638446981
_fbp fb.1.1643546988197.973328968
The command above drops empty lines and lines that start with #. The result is then filtered to keep only column 6 (the cookie name) and column 7 (the cookie value).
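If you would rather stay in Java, the same filtering can be sketched without grep and awk. This is only a rough equivalent under a few assumptions: `NetscapeCookieLines` and `toMap` are hypothetical names made up for illustration, fields are taken to be separated by tabs or spaces (as awk's default splitting assumes), and lines with fewer than seven fields are skipped rather than printed empty.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Jsoup or the JDK.
final class NetscapeCookieLines {

    // Roughly mirrors the grep/awk pipeline: skip empty lines and lines
    // starting with '#', then keep field 6 (name) and field 7 (value).
    static Map<String, String> toMap(List<String> lines) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Assumption: whitespace-separated fields, like awk's default.
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 7) {
                cookies.put(fields[5], fields[6]);
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "# Netscape HTTP Cookie File",
            "",
            ".site.net TRUE / FALSE 1701453620 _ga GA1.2.10834324067.1638446981",
            ".site.net TRUE / FALSE 1638123020 _gid GA1.2.25433025264.1638446981");
        System.out.println(toMap(lines));
    }
}
```

To read the real file, the lines could come from `Files.readAllLines(Paths.get("cookies.txt"))`, which throws `IOException`. Like the awk command, this sketch would truncate a cookie value that itself contains whitespace.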
If you save the output of the bash command above to filtered.txt, you can parse the cookie information in Java like this:
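// Needs java.nio.file.Files, java.nio.file.Paths, java.util.HashMap, java.util.Map, java.util.stream.Stream
Map<String, String> cookies = new HashMap<>();
// Files.lines throws IOException, so the enclosing method must handle or declare it
try (Stream<String> stream = Files.lines(Paths.get("filtered.txt"))) {
    stream.forEach(line -> {
        // Split into at most two parts: the cookie name, then the cookie value
        String[] columns = line.split(" ", 2);
        cookies.put(columns[0], columns[1]);
    });
}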
We simply grab the key and value from each line to populate our cookies map. I think the code could be shorter, but at the cost of readability.
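For comparison, here is one way the shorter version might look, using `Collectors.toMap` instead of `forEach`. Treat it as a sketch: `FilteredCookies` and `load` are hypothetical names, and the input is assumed to be in the "name value" format of filtered.txt.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper name, used only for illustration.
final class FilteredCookies {

    // Reads "name value" lines (the format of filtered.txt) into a map.
    static Map<String, String> load(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines
                .map(line -> line.split(" ", 2))      // name, then the rest as the value
                .filter(parts -> parts.length == 2)   // skip lines that have no value
                .collect(Collectors.toMap(
                    parts -> parts[0],
                    parts -> parts[1],
                    (first, second) -> second));      // duplicate names: keep the last one
        }
    }
}
```

The merge function keeps the last value for a repeated name, which matches what repeated `put` calls on a HashMap would do. The resulting map can then be passed to `Jsoup.connect(mainUrl).cookies(cookies).get()` as in the question.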