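My emoji-removal method has a problem.
This method is used when downloading chat transcripts: it checks the username for emojis and removes any it finds. It works in my local and staging environments. In production, however, it sometimes works for one transcript and fails for others with the same username.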
2024-09-26 16:17:20.877 ERROR [whatsapp-api-service,64b76bd04750daa2,64b76bd04750daa2] 3200902 --- [http-nio-8002-exec-2] z.c.e.p.s.UUIDAuthenticationFilter : Request processing failed; nested exception is Error downloading transcript, U+1F413 ('.notdef') is not available in the font Helvetica, encoding: WinAnsiEncoding
org.springframework.web.util.NestedServletException: Request processing failed; nested exception is Error downloading transcript, U+1F413 ('.notdef') is not available in the font Helvetica, encoding: WinAnsiEncoding
After manually deleting the emoji in the database, I was able to download that user's transcript.
Here is the code.
private static final List<UnicodeRange> EMOJI_RANGES = Arrays.asList(
    new UnicodeRange(0x1F600, 0x1F64F), // Emoticons
    new UnicodeRange(0x1F300, 0x1F5FF), // Misc Symbols and Pictographs
    new UnicodeRange(0x1F680, 0x1F6FF), // Transport and Map Symbols
    new UnicodeRange(0x2600, 0x26FF), // Misc Symbols
    new UnicodeRange(0x2700, 0x27BF), // Dingbats
    new UnicodeRange(0x1F900, 0x1F9FF), // Supplemental Symbols and Pictographs
    new UnicodeRange(0x1FA70, 0x1FAFF), // Symbols and Pictographs Extended-A
    new UnicodeRange(0x1F1E6, 0x1F1FF), // Regional Indicator Symbols
    new UnicodeRange(0xFE00, 0xFE0F), // Variation Selectors
    new UnicodeRange(0x1F000, 0x1F02F), // Mahjong Tiles
    new UnicodeRange(0x1F0A0, 0x1F0FF), // Playing Cards
    new UnicodeRange(0x1F700, 0x1F77F), // Alchemical Symbols
    new UnicodeRange(0x1F780, 0x1F7FF), // Geometric Shapes Extended
    new UnicodeRange(0x1F800, 0x1F8FF) // Supplemental Arrows and Symbols
);

private boolean isEmoji(int codePoint) {
    for (UnicodeRange range : EMOJI_RANGES) {
        if (range.contains(codePoint)) {
            return true;
        }
    }
    return false;
}

private String removeEmojis(String input) {
    StringBuilder result = new StringBuilder();
    int length = input.length();
    int i = 0;
    while (i < length) {
        int codePoint = input.codePointAt(i);
        if (Character.isSupplementaryCodePoint(codePoint)) {
            i += Character.charCount(codePoint);
            continue;
        }
        if (!isEmoji(codePoint)) {
            result.append(Character.toChars(codePoint));
        }
        i += Character.charCount(codePoint);
    }
    return result.toString();
}

private byte[] generatePdfTranscript() throws IOException {
    if (ticketNumber != null) {
        fileName = ticketNumber + "_transcript.pdf";
    } else {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss");
        fileName = removeEmojis(customerName) + "_" + LocalDateTime.now().format(formatter) + "_transcript.pdf";
    }
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    try (PDDocument document = new PDDocument()) {
        initializeFonts(document);
        PDPage page = new PDPage(PDRectangle.A4);
        document.addPage(page);
        try (PDPageContentStream initialContentStream = new PDPageContentStream(document, page)) {
            float yPosition = page.getMediaBox().getHeight() - MARGIN_Y;
            yPosition = setupPdfHeader(initialContentStream, yPosition, document);
            yPosition = addTranscriptTitle(initialContentStream, yPosition);
            PDPageContentStream currentContentStream = initialContentStream;
            for (Message message : messages) {
                Pair<Float, PDPageContentStream> result = addMessage(document, currentContentStream, message, yPosition - 15);
                yPosition = result.getLeft();
                if (result.getRight() != currentContentStream) {
                    if (currentContentStream != initialContentStream) {
                        currentContentStream.close();
                    }
                    currentContentStream = result.getRight();
                }
            }
            if (currentContentStream != initialContentStream) {
                currentContentStream.close();
            }
        }
        document.save(byteArrayOutputStream);
    }
    return byteArrayOutputStream.toByteArray();
}

private float setupPdfHeader(PDPageContentStream contentStream, float yPosition, PDDocument document) throws IOException {
    contentStream.saveGraphicsState();
    contentStream.setFont(PDType1Font.HELVETICA_BOLD, FONT_SIZE);
    float pageWidth = document.getPage(0).getMediaBox().getWidth();
    float headerStartY = yPosition + 10;
    float contentWidth = pageWidth - 2 * MARGIN_X;
    if (ticketNumber != null) {
        yPosition = addTextLine(contentStream, "Ticket Number: " + ticketNumber, yPosition);
        yPosition = addTextLine(contentStream, "Date & Time Ticket Issued: " + issuedTime, yPosition);
        yPosition = addTextLine(contentStream, "Date & Time Ticket Closed: " + closedTime, yPosition);
    } else {
        yPosition = addTextLine(contentStream, "No ticket issued", yPosition);
    }
    if (clientWhatsAppNumber != null) {
        yPosition = addTextLine(contentStream, "Service's WhatsApp number: " + clientWhatsAppNumber, yPosition);
    }
    if (customerWhatsAppNumber != null) {
        yPosition = addTextLine(contentStream, "Customer's WhatsApp number: " + customerWhatsAppNumber, yPosition);
    }
    yPosition = addTextLine(contentStream, "Agent's Name & Surname: " + removeEmojis(agentName), yPosition);
    yPosition = addTextLine(contentStream, "Customer's Name & Surname: " + removeEmojis(customerName), yPosition);
    float headerHeight = headerStartY - yPosition + FONT_SIZE; // Add FONT_SIZE to give some bottom padding
    contentStream.setLineWidth(1f);
    contentStream.setStrokingColor(Color.BLACK);
    contentStream.addRect(MARGIN_X - 5, yPosition, contentWidth + 10, headerHeight);
    contentStream.stroke();
    contentStream.restoreGraphicsState();
    yPosition -= FONT_SIZE;
    return yPosition;
}

private float addTranscriptTitle(PDPageContentStream contentStream, float yPosition) throws IOException {
    contentStream.setFont(PDType1Font.HELVETICA_BOLD, FONT_SIZE + 2);
    return addTextLine(contentStream, "Ticket Transcript:", yPosition - 20);
}
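I wrote unit tests for the usernames that caused the error, and all of them passed. I deployed the code to the staging server, where it worked as expected and successfully downloaded transcripts for usernames containing emojis. In production, however, it works for some tickets and fails for others with the same username. After deleting the emoji directly in the database, I was able to download the transcripts for the failing tickets.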
Character.isEmoji
As g00se commented, you can greatly simplify your emoji-removal code.
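Java 21+ provides several methods on the Character class for identifying emoji: isEmoji, isEmojiPresentation, isEmojiModifier, isEmojiModifierBase, and isEmojiComponent. I have not studied these, and I suggest you do. But as an example, I would guess the code below does the job.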
String strippedOfEmoji =
        input
                .codePoints( )
                .filter( codepoint -> !Character.isEmoji( codepoint ) )
                .filter( codepoint -> !Character.isEmojiPresentation( codepoint ) )
                .filter( codepoint -> !Character.isEmojiModifier( codepoint ) )
                .filter( codepoint -> !Character.isEmojiComponent( codepoint ) )
                .collect( StringBuilder :: new , StringBuilder :: appendCodePoint , StringBuilder :: append )
                .toString( );
Let's try that code.
String input = "abc🐓xyz";
strippedOfEmoji = abcxyz
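One caveat worth checking before relying on these methods for names: the ASCII digits 0-9, '#' and '*' carry the Unicode Emoji and Emoji_Component properties (they are keycap bases), so a filter built directly on Character.isEmoji would also remove them. The following is a minimal sketch, not a definitive implementation, that keeps every ASCII code point and strips the rest. It requires Java 21 or later; the class name EmojiStripper and the helper isStrippable are hypothetical names chosen for illustration.

```java
// Requires Java 21+ for Character.isEmoji and Character.isEmojiComponent.
final class EmojiStripper {

    // Hypothetical helper: true for non-ASCII code points that Unicode
    // flags as an emoji or as an emoji component (ZWJ, variation selector, ...).
    private static boolean isStrippable(int codePoint) {
        if (codePoint < 0x80) {
            // Keep all ASCII, including digits, '#' and '*',
            // even though they carry the Emoji property.
            return false;
        }
        return Character.isEmoji(codePoint) || Character.isEmojiComponent(codePoint);
    }

    // Returns the input with every strippable code point removed.
    static String strip(String input) {
        return input.codePoints()
                .filter(cp -> !isStrippable(cp))
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        String rooster = Character.toString(0x1F413); // the character from the error message
        System.out.println(strip("abc" + rooster + "xyz 42"));
    }
}
```

Non-ASCII characters that default to text presentation but still have the Emoji property, such as the copyright sign, are removed by this sketch as well; whether that is acceptable depends on the names you expect.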
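The Character class offers many such category-detection methods, so study it before writing your own.
UTF-8
Change from WinAnsiEncoding to UTF-8. You should then be able to render all of the roughly 150,000 characters defined in Unicode, provided you have also switched to a font that covers them.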
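If switching fonts is not practical, another option is to drop every character the current encoding cannot represent, rather than only emoji. The following is a minimal sketch under a loud assumption: it uses Java's windows-1252 charset as a stand-in for the PDF WinAnsiEncoding named in the error message. The two are closely related but not identical, so treat the result as an approximation. Control characters are dropped too, on the assumption that they should not reach a single-line text call. The class name WinAnsiFilter and the method keepEncodable are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class WinAnsiFilter {

    // Assumption: windows-1252 approximates the PDF WinAnsiEncoding.
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Returns the input without control characters and without any
    // code point that windows-1252 cannot encode (emoji, CJK, ...).
    static String keepEncodable(String input) {
        CharsetEncoder encoder = WINDOWS_1252.newEncoder();
        StringBuilder result = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            String asString = new String(Character.toChars(cp));
            if (!Character.isISOControl(cp) && encoder.canEncode(asString)) {
                result.append(asString);
            }
        });
        return result.toString();
    }
}
```

Applying a filter like this to every string written to the PDF, not only to the agent and customer names, would also cover fields that removeEmojis never sees.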