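I have been trying to write a program in Python 3 that reads a text file containing 23005 words. The user then enters a 9-character string, and the program uses those letters to form words and compares them with the words in the text file.
I want to print the words that are 4-9 letters long and that also contain the letter in the middle of the list. For example, if the user enters the string "anitskem", the fifth letter, "s", must be present in the word.
This is how far I have got on my own: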
# Open selected file & read
filen = open("svenskaOrdUTF-8.txt", "r")
# Read all rows and store them in a list
wordList = filen.readlines()
# Close File
filen.close()
# letterList index
i = 0
# List of letters that user will input
letterList = []
# List of words that are our correct answers
solvedList = []
# User inputs 9 letters that will be stored in our letterList
string = input(str("Ange Nio Bokstäver: "))
userInput = False
# Checks if user input is correct
while userInput == False:
    # if the string is equal to 9 letters
    # insert letter into our letterList.
    # also set userInput to True
    if len(string) == 9:
        userInput = True
        for char in string:
            letterList.insert(i, char)
            i += 1
    # If string not equal to 9 ask user for a new input
    elif len(string) != 9:
        print("Du har inte angivit nio bokstäver")
        string = input(str("Ange Nio Bokstäver: "))
# For each word in wordList
# and for each char within that word
# check if said word contains a letter from our letterList
# if it does and meets the requirements to be a correct answer
# add said word to our solvedList
for word in wordList:
    for char in word:
        if char in letterList:
            if len(word) >= 4 and len(word) <= 9 and letterList[4] in word:
                print("Char:", word)
                solvedList.append(word)
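The problem I run into is that instead of printing words that contain only letters from my letterList, it prints words that contain at least one letter from my letterList. This also means that some words are printed several times, for example when they contain more than one letter from letterList.
I have been trying to solve these problems for a while, but I can't seem to figure it out. I also tried using permutations to create every possible combination of the letters in the list and then comparing them with my wordlist, but I think that solution was too slow given the number of combinations that have to be generated.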
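For a sense of scale, the number of ordered arrangements can be counted directly. This is a minimal sketch, assuming the nine letters are all distinct and that Python 3.8 or later is available for `math.perm`; the function name `count_arrangements` is hypothetical.

```python
import math


def count_arrangements(letters=9, shortest=4):
    """Count ordered selections of length shortest..letters from distinct letters."""
    # math.perm(n, k) is n! / (n - k)!, the number of k-length arrangements
    return sum(math.perm(letters, k) for k in range(shortest, letters + 1))


print(count_arrangements())
```

Under those assumptions the permutation approach has to look up 985824 candidate strings, whereas scanning the word list once touches only 23005 words.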
Also, since I am still fairly new to Python, I would really appreciate any general tips you can share.
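You get the same word several times mainly because you iterate over every character of a given word, and each time that character is in letterList you append and print the word.
Instead, iterate per word rather than per character, and use the with context manager so that the file is closed automatically: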
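with open('american-english') as f:
    for w in f:
        w = w.strip()
        # every character of the word must come from letterList,
        # and the middle letter (index 4) must be present
        cond = all(i in letterList for i in w) and letterList[4] in w
        # 4 to 9 letters inclusive, as the question asks
        if 4 <= len(w) <= 9 and cond:
            print(w)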
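Here cond is used to shorten the if statement, all(..) checks that every character in the word is in letterList, and w.strip() removes surrounding whitespace, including the trailing newline.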
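One caveat: a membership test of this kind accepts a word that repeats a letter more often than the input supplies it. If each of the nine letters may be used only once (an assumption; the question does not say so explicitly), a count-based check is one way to handle it. The names `can_build` and `solve` below are hypothetical helpers, and the nine-letter input in the usage note is made up for illustration.

```python
from collections import Counter


def can_build(word, letters):
    """Return True if word uses each letter no more often than letters provides it."""
    have = Counter(letters)
    # Counter returns 0 for missing keys, so unknown letters fail the comparison
    return all(have[ch] >= n for ch, n in Counter(word).items())


def solve(words, letters):
    """Return the 4-9 letter words that contain the middle letter and can be built."""
    middle = letters[len(letters) // 2]  # index 4 for a 9-letter input
    stripped = (raw.strip() for raw in words)
    return [w for w in stripped
            if 4 <= len(w) <= 9 and middle in w and can_build(w, letters)]
```

With the input "anitskemo", `solve(["skin\n", "sass\n"], "anitskemo")` keeps "skin" but rejects "sass", because the input contains only one "s".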
Also, to fill letterList once the input is 9 letters long, don't use insert. Instead, just pass the string to list; the list is created in a similar but noticeably faster way.
This:
if len(string) == 9:
    userInput = True
    for char in string:
        letterList.insert(i, char)
        i += 1
can be written as:
if len(string) == 9:
    userInput = True
    letterList = list(string)
With these changes, the initial open and readlines are no longer needed, and neither is the initialization of letterList.
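Put together, the whole program might look roughly like the sketch below. The function names `ask_for_letters` and `find_words` are hypothetical, and the `read` parameter is an assumption added only so that the prompt can be swapped out when testing; the prompts and file name are the ones from the question.

```python
def ask_for_letters(read=input, count=9):
    """Keep prompting until the user supplies exactly `count` characters."""
    text = read("Ange Nio Bokstäver: ")
    while len(text) != count:
        print("Du har inte angivit nio bokstäver")
        text = read("Ange Nio Bokstäver: ")
    # list(text) splits the string into single characters
    return list(text)


def find_words(lines, letter_list):
    """Yield each stripped word that is 4-9 letters long, contains the
    middle letter, and uses only characters from letter_list."""
    for line in lines:
        w = line.strip()
        if (4 <= len(w) <= 9 and letter_list[4] in w
                and all(c in letter_list for c in w)):
            yield w


# Possible usage, left commented out so the sketch runs without the file:
# with open("svenskaOrdUTF-8.txt", encoding="utf-8") as f:
#     solvedList = list(find_words(f, ask_for_letters()))
```

Passing the open file straight to `find_words` reads it line by line, so the word list never has to be held in memory as a whole.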
You can try this logic:
for word in wordList:
    # readlines() keeps the trailing newline, so strip it before measuring the word
    word = word.strip()
    # if not a valid word, skip; moving this check outside the inner loop improves performance
    if len(word) < 4 or len(word) > 9 or letterList[4] not in word:
        continue
    # count the characters of the word that appear in letterList
    match_count = 0
    for char in word:
        if char in letterList:
            match_count += 1
    # accept the word only if every one of its characters matched
    if match_count == len(word):
        print("Char:", word)
        solvedList.append(word)
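Counting matches and comparing the count with the word length amounts to asking whether every character of the word is among the input letters, which can also be expressed as a subset test on sets. A minimal sketch; the helper name `is_candidate` is hypothetical:

```python
def is_candidate(word, letters):
    """Return True if word is 4-9 letters long, contains the middle letter,
    and consists only of characters found in letters."""
    word = word.strip()  # drop the newline left by readlines()
    return (4 <= len(word) <= 9
            and letters[4] in word
            and set(word) <= set(letters))  # <= on sets means "is a subset of"
```

It could then be used as `solvedList = [w.strip() for w in wordList if is_candidate(w, letterList)]`. Like the loop, this ignores how many times a letter occurs.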
You can do this with a list comprehension. I am only putting a POC here; turning it into a complete solution is left to you.
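# read the whole file and split it into words; the with block closes the file
with open("test.text", "r") as filen:
    word_list = filen.read().split()
print("Enter your string")
# input() replaces Python 2's raw_input(); index 4 is the middle letter
search_letter = input()[4]
solved_list = [word for word in word_list if 4 <= len(word) <= 9 and search_letter in word]
print(solved_list)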
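The same filter can also be written with the built-in `filter` and a lambda. This is only a sketch: the function name `solve_with_filter` is hypothetical, and like the POC it checks just the length and the middle letter, not whether the remaining characters come from the input.

```python
def solve_with_filter(word_list, letters):
    """Return the words of 4-9 letters that contain the middle input letter."""
    search_letter = letters[4]
    # the lambda is the predicate that filter() applies to every word
    is_match = lambda word: 4 <= len(word) <= 9 and search_letter in word
    return list(filter(is_match, word_list))
```

In Python 3, `filter` returns a lazy iterator, which is why the result is wrapped in `list` before being returned.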