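I am developing a C program that is supposed to count the occurrences of a list of substrings in a given file. When I test the program with a file that I created and filled with text by hand, it works fine and counts each substring correctly. However, when I run it on a file created with the echo command, the program does not seem to find any occurrences, even though I can see the substrings in the file when I open it in a text editor.
I have checked the program's logic and I believe it is correct, but I am not sure why it does not work with files created by echo.
Here is a simplified version of the program: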
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

#define BUFFER_SIZE 1024

int num_substrings = 0;
int use_systemcall = 0;

void search_file(char *filename, char *substring) {
    // Open the file with the given filename in read mode
    FILE *file = fopen(filename, "r");
    // Check if the file was successfully opened
    if (file == NULL) {
        // Print an error message and exit the program with an error code
        fprintf(stderr, "Error: could not open file '%s'\n", filename);
        exit(1);
    }
    int count = 0;
    char buffer[BUFFER_SIZE];
    char *line;
    size_t len = 0;
    ssize_t read;
    // Read the file line by line until the end
    while ((read = getline(&line, &len, file)) != -1) {
        // Skip the last line if it is empty
        if (read == 1 && line[0] == '\n') {
            continue;
        }
        // Strip any newline characters from the end of the line
        if (line[read - 1] == '\n') {
            line[read - 1] = '\0';
            read--;
        }
        // Find the first occurrence of the given substring in the current line
        char *match = strstr(line, substring);
        // While there are still occurrences of the substring in the current line
        while (match != NULL) {
            // Increment the counter and find the next occurrence of the substring
            count++;
            match = strstr(match + 1, substring);
        }
    }
    // Close the file
    fclose(file);
    // Print the number of occurrences of the substring found in the file
    printf("Found %d occurrences of substring '%s' in file '%s'\n",
           count, substring, filename);
}

int main(int argc, char *argv[]) {
    // Get the filename from the first command-line argument
    char *filename = argv[1];
    // Initialize an array to store the substrings and a counter for the number of substrings
    char substrings[10][100];
    int num_substrings = 0;
    // Loop through the remaining command-line arguments (starting from the second one)
    for (int i = 2; i < argc; i++) {
        // Copy the current argument (substring) into the substrings array
        strcpy(substrings[num_substrings], argv[i]);
        // Increment the counter for the number of substrings
        num_substrings++;
    }
    // Ask the user if they want to use a system call
    printf("Do you want to use system call? (y/n): ");
    char answer[10];
    fgets(answer, 10, stdin);
    // Check if the user answered yes (y or Y) and set the use_systemcall variable accordingly
    int use_systemcall = 0;
    if (answer[0] == 'y' || answer[0] == 'Y') {
        use_systemcall = 1;
    }
    printf("Filename: %s\n", filename);
    printf("Substrings: ");
    for (int i = 0; i < num_substrings; i++) {
        printf("%s ", substrings[i]);
    }
    printf("\n");
    // Open the file for reading
    FILE *file = fopen(filename, "rb");
    if (file == NULL) {
        printf("Error: Cannot open file %s\n", filename);
        return 1;
    }
    // Initialize a buffer to read the file in blocks of 100 characters
    char buffer[101];
    // Loop through each substring and search for it in the file
    for (int i = 0; i < num_substrings; i++) {
        // Reset the file pointer to the beginning of the file
        fseek(file, 0, SEEK_SET);
        // Initialize a counter for the number of occurrences of the substring
        int count = 0;
        // Loop through the file in blocks of 100 characters
        while (fread(buffer, sizeof(char), 100, file) > 0) {
            // Add a null terminator at the end of the buffer
            buffer[100] = '\0';
            // Search for the substring in the buffer
            char *result = strstr(buffer, substrings[i]);
            // If the substring is found, increment the count
            while (result != NULL) {
                count++;
                // Move the result pointer to the next character after the match
                result++;
                // Search for the substring again starting from the result pointer
                result = strstr(result, substrings[i]);
            }
        }
        // Print the number of occurrences of the substring
        printf("'%s' appears %d times in the file.\n", substrings[i], count);
    }
    return 0;
}
Commands:
echo "hello world" > foo.txt            # creates the file
./substring_search foo.txt world        # searches for the substring
Output:
'world' appears 0 times in the file.
Can anyone help me figure out what might be causing this problem and how to fix it?
The code has several problems:
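Incomplete block reads
If the file is, say, 50 bytes long, then buffer[100] = '\0' does not make buffer[] a proper string: the bytes between the end of the data and index 100 are left uninitialized. It is better to use the length returned by fread(). I suspect this is OP's key issue.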
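As a minimal sketch of that fix, the loop below terminates each block at the number of bytes fread() actually returned. The names count_in_string, count_per_block and BLOCK_SIZE are hypothetical and not part of OP's program, and the sketch assumes a text file without embedded null characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 100  /* same block size as OP's main(); the name is hypothetical */

/* Hypothetical helper: counts (possibly overlapping) matches in a C string. */
size_t count_in_string(const char *haystack, const char *needle)
{
    size_t count = 0;
    if (*needle == '\0')
        return 0;  /* an empty needle would match everywhere; treat as 0 */
    for (const char *p = strstr(haystack, needle); p != NULL;
         p = strstr(p + 1, needle))
        count++;
    return count;
}

/* Reads the file block by block and terminates each block at the
 * length fread() reported, not at a fixed index. */
size_t count_per_block(FILE *file, const char *needle)
{
    assert(file != NULL && needle != NULL);
    char buffer[BLOCK_SIZE + 1];
    size_t n, count = 0;
    while ((n = fread(buffer, 1, BLOCK_SIZE, file)) > 0) {
        buffer[n] = '\0';  /* n <= BLOCK_SIZE, so this index is in range */
        count += count_in_string(buffer, needle);
    }
    return count;
}
```

This only addresses the termination problem. Matches that straddle two blocks are still missed.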
Substring spanning a block boundary
strstr(buffer, substrings[i]); will not detect the substring when part of it is in one block and the rest is in the next.
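One possible way to handle this, sketched under the assumption that the file is text without embedded null characters, is to carry the last strlen(needle) - 1 bytes of each block over to the front of the next one. The function name count_across_blocks and the BLOCK_SIZE constant are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 100  /* hypothetical name; same block size as OP's main() */

/* Counts (possibly overlapping) matches of needle in file, including
 * matches that straddle a block boundary.
 * Returns (size_t)-1 if the buffer cannot be allocated. */
size_t count_across_blocks(FILE *file, const char *needle)
{
    assert(file != NULL && needle != NULL);
    size_t len = strlen(needle);
    if (len == 0)
        return 0;

    size_t keep_max = len - 1;  /* longest tail that cannot hold a full match */
    char *buffer = malloc(keep_max + BLOCK_SIZE + 1);
    if (buffer == NULL)
        return (size_t)-1;

    size_t kept = 0, n, count = 0;
    while ((n = fread(buffer + kept, 1, BLOCK_SIZE, file)) > 0) {
        size_t total = kept + n;
        buffer[total] = '\0';
        for (const char *p = strstr(buffer, needle); p != NULL;
             p = strstr(p + 1, needle))
            count++;
        /* Keep the tail so a match split across blocks is seen next time.
         * The tail is shorter than needle, so nothing is counted twice. */
        kept = total < keep_max ? total : keep_max;
        memmove(buffer, buffer + total - kept, kept);
    }
    free(buffer);
    return count;
}
```

Reading the whole file into memory and searching once would also work. The carry-over approach keeps memory use bounded for large files.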
Out-of-range access
With char substrings[10][100];, accessing substrings[num_substrings] is invalid when num_substrings >= 10.
strcpy(substrings[num_substrings], argv[i]); is invalid when the source string has a length of 100 or more.
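A minimal sketch of a checked copy follows. The helper collect_substrings and the two constants are hypothetical names; the limits mirror OP's char substrings[10][100]. Using the argv[i] pointers directly, without copying, would avoid both limits entirely.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SUBSTRINGS 10      /* mirrors OP's first array dimension */
#define MAX_SUBSTRING_LEN 100  /* mirrors OP's second dimension, incl. '\0' */

/* Hypothetical helper: copies argv[2..argc-1] into out after checking
 * both limits. Returns the number of substrings stored, or -1 if there
 * are too many arguments or one of them is too long. */
int collect_substrings(int argc, char *argv[],
                       char out[MAX_SUBSTRINGS][MAX_SUBSTRING_LEN])
{
    assert(argv != NULL && out != NULL);
    int n = 0;
    for (int i = 2; i < argc; i++) {
        if (n >= MAX_SUBSTRINGS) {
            fprintf(stderr, "Error: more than %d substrings\n", MAX_SUBSTRINGS);
            return -1;
        }
        if (strlen(argv[i]) >= MAX_SUBSTRING_LEN) {
            fprintf(stderr, "Error: substring %d is too long\n", i - 1);
            return -1;
        }
        strcpy(out[n], argv[i]);  /* safe: the length was checked above */
        n++;
    }
    return n;
}
```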
Null characters in the file?
If the source file contains '\0', strstr(buffer, substrings[i]) will stop at that character rather than searching the entire buffer[].
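If null characters are a possibility, one option is a length-based search that never relies on a terminator. The sketch below uses memcmp(); the function name count_in_bytes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts (possibly overlapping) matches of needle
 * in a byte range of known length, so embedded '\0' bytes do not end
 * the search early. */
size_t count_in_bytes(const char *data, size_t data_len,
                      const char *needle, size_t needle_len)
{
    assert(data != NULL && needle != NULL);
    if (needle_len == 0 || needle_len > data_len)
        return 0;
    size_t count = 0;
    for (size_t i = 0; i + needle_len <= data_len; i++) {
        if (memcmp(data + i, needle, needle_len) == 0)
            count++;
    }
    return count;
}
```

The data_len argument would be the value returned by fread(), and needle_len would be strlen() of the substring.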
Check argc first
int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "Error: Insufficient arguments\n");
        return EXIT_FAILURE;
    }
    // OK now to save the argument for later fopen() use.
    char* filename = argv[1];
    ...
    FILE *file = fopen(filename, "rb");
"\n" versus "\r\n"
If the manually created file and the echo-created file have different line endings, I do not think that affects OP's result, but keep it in mind when debugging.
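For a line-based reader such as search_file, a common idiom that handles both endings is to cut the line at the first '\r' or '\n' with strcspn(). This is only a sketch; the name strip_line_ending is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: truncates line at the first '\r' or '\n', so it
 * handles "\n", "\r\n", and a last line with no ending at all.
 * Returns the new length of the line. */
size_t strip_line_ending(char *line)
{
    assert(line != NULL);
    size_t len = strcspn(line, "\r\n");
    line[len] = '\0';
    return len;
}
```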
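The simplified version of the program does not produce the posted output: the posted output is missing the system-call prompt and the filename line. This is what I get for a foo.txt file created with echo: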
Do you want to use system call? (y/n): y
Filename: foo.txt
Substrings: world
'world' appears 1 times in the file.
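The program has some problems, but none that would prevent the expected output:
search_file uses a different method to read the file and should also produce the expected output.
The main function reads the file 100 bytes at a time, so it will not count matches that straddle a block boundary.
You should simplify the posted code and make sure it still shows the problem.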