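I am looking for a good way to find all words that occur more than once in a string. Some restrictions apply: I cannot use $(shell), because it is expensive and the makefile has to run on Windows (on pure Linux, sort|uniq -d would solve my problem nicely). Also, the number of duplicates will be small, and the words only contain well-behaved characters, i.e. they match [-_+a-zA-Z0-9]+.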
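I have tried two strategies:
Force $(sort) to keep duplicates (append a unique suffix to each word, sort, then strip the suffix), and then find adjacent identical words in the sorted list: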
# given 0 1 0 1 0 1 0 1 ... , return 0 0 1 1 0 0 1 1 ...
double=$(wordlist 1,$(words $(1)),$(subst 0,0 0,$(subst 1,1 1,$(1))))
# Produce a list of N unique strings. $(1) contains N words, with a
# repetition cycle of length M, and $(2) contains N words, either 0 or
# 1, alternating between 0 and 1 every Mth word.
binseq=$(if $(findstring 1,$(2)),$(call binseq,$(join $(2),$(1)),$(call double,$(2))),$(1))
# return 0 1 0 1 ..., as many words as $(1)
alternating_bits=$(wordlist 1,$(words $(1)),$(patsubst %,0 1,$(1)))
# Produce as many unique words as there are words in $(1)
unique=$(call binseq,,$(call alternating_bits,$(1)))
# Sort $(1) without eliminating duplicates. $(1) may not contain /.
sorted_keep_dups=$(subst /,,$(dir $(sort $(join $(1:=/),$(call unique,$(1))))))
dups_from_sorted2=$(filter $(patsubst %0,%,$(filter %0,$(1))),$(patsubst %1,%,$(filter %1,$(1))))
# Given a sorted list, return all duplicates.
dups_from_sorted=$(sort $(call dups_from_sorted2,$(join $(1),$(call alternating_bits,$(1)))))
dups=$(call dups_from_sorted,$(call sorted_keep_dups,$(1)))
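To make the suffix trick easier to follow, here is a minimal trace of the intermediate values for a four-word list. The helper definitions are copied from the listing above; WORDS and TAGGED are names made up for this illustration only.

```make
double = $(wordlist 1,$(words $(1)),$(subst 0,0 0,$(subst 1,1 1,$(1))))
binseq = $(if $(findstring 1,$(2)),$(call binseq,$(join $(2),$(1)),$(call double,$(2))),$(1))
alternating_bits = $(wordlist 1,$(words $(1)),$(patsubst %,0 1,$(1)))
unique = $(call binseq,,$(call alternating_bits,$(1)))

# Illustrative input (an assumption, not from the original post)
WORDS  := a b a c
# Append "/" plus a unique binary tag to every word
TAGGED := $(join $(WORDS:=/),$(call unique,$(WORDS)))

# prints: tagged: a/00 b/01 a/10 c/11
$(info tagged: $(TAGGED))
# prints: sorted: a/00 a/10 b/01 c/11
$(info sorted: $(sort $(TAGGED)))
# prints: words: a a b c
$(info words: $(subst /,,$(dir $(sort $(TAGGED)))))

all::
.PHONY: all
```

Because every tagged word is distinct, $(sort) has nothing to drop, and stripping the tags afterwards leaves a sorted list in which equal words sit next to each other.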
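Apply $(filter) repeatedly to different partitions of the word list, chosen so that every pair of words ends up in different arguments of $(filter) at least once: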
# given 0 1 0 1 0 1 0 1 ... , return 0 0 1 1 0 0 1 1 ...
double=$(wordlist 1,$(words $(1)),$(subst 0,0 0,$(subst 1,1 1,$(1))))
# given words with suffix 0 or 1, remove suffixes and return the words
# that occur both with 0 and 1 as suffix
filter_dups=$(filter $(patsubst %0,%,$(filter %0,$(1))),$(patsubst %1,%,$(filter %1,$(1))))
_dups=$(if $(findstring 1,$(2)),$(call filter_dups,$(join $(1),$(2))) $(call _dups,$(1),$(call double,$(2))))
# return 0 1 0 1 ..., as many words as $(1)
alternating_bits=$(wordlist 1,$(words $(1)),$(patsubst %,0 1,$(1)))
# given a list of words, return the list of words that occur twice
dups=$(sort $(call _dups,$(1),$(call alternating_bits,$(1))))
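As a rough illustration of the partitioning idea, the sketch below runs the first two rounds by hand on a four-word list. The helpers double and filter_dups are copied from the listing above; WORDS, BITS1 and BITS2 are names invented for this example.

```make
double = $(wordlist 1,$(words $(1)),$(subst 0,0 0,$(subst 1,1 1,$(1))))
filter_dups = $(filter $(patsubst %0,%,$(filter %0,$(1))),$(patsubst %1,%,$(filter %1,$(1))))

# Illustrative input (an assumption, not from the original post)
WORDS := a b a c
# Round 1 splits the list by position parity, round 2 by pairs of positions
BITS1 := 0 1 0 1
BITS2 := $(call double,$(BITS1))

# prints: round 1: a0 b1 a0 c1 -> []
$(info round 1: $(join $(WORDS),$(BITS1)) -> [$(call filter_dups,$(join $(WORDS),$(BITS1)))])
# prints: round 2: a0 b0 a1 c1 -> [a]
$(info round 2: $(join $(WORDS),$(BITS2)) -> [$(call filter_dups,$(join $(WORDS),$(BITS2)))])

all::
.PHONY: all
```

In round 1 both occurrences of a land in the same half, so nothing is reported. Round 2 separates them, and a shows up as a duplicate. Doubling the period each round means roughly log2(N) rounds are enough to separate every pair of positions.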
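Both approaches work and are fast enough, but they are rather hard to read and understand. Is there a simpler way with acceptable (sub-quadratic) speed?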
I am not sure about the complexity, but I would suggest a more readable function:
define __duplicates__func
undefine __duplicates__seen
undefine __duplicates__result
$$(foreach _v,$1,\
$$(eval __duplicates__result += $$(filter $$(__duplicates__seen),$$(_v))\
$$(eval __duplicates__seen += $$(_v))))
endef
duplicates = $(eval $(__duplicates__func))$(sort $(__duplicates__result))
TEST:= $(file <test.txt)
DUPS:= $(call duplicates,$(TEST))
$(info $(DUPS))
all::
.PHONY: all
With this randomly generated 1000-word test.txt:
Rule male saw said life fourth said void were creepeth thing theyre be fowl which wherein their day rule to seed multiply male beast sixth you Winged void fill face upon First you saying unto Appear shall God yielding is male face kind was blessed waters sea blessed void creepeth called youll beginning darkness over you it may years his second of moveth beginning earth very together day Divided creepeth fly open wont signs day is created Winged male fill Heaven saw dont For upon replenish Gathering i gathering living void Were under and form night seas bearing youre days saw tree fruitful days it unto day deep Tree Be form beginning youre replenish winged dominion grass man years youre Youre lights seasons third yielding fruit fifth for together after itself and youll itself kind without bring heaven itself firmament together their created tree All shed lesser made Stars him without gathering whales whose may itself may without image herb sixth Dominion us is their two from heaven shed brought Whales creeping us us together so forth female set fruitful fly seasons life deep let heaven wherein set wont You beast image two Gathering all so God cant itself Seasons image itself cant herb that brought appear likeness greater shall blessed place two own fourth earth Had greater you morning living unto seed male Every Had made days own face meat under youll grass for creepeth Meat so life divide for multiply blessed youre yielding beast be subdue Fruit greater Us them Meat darkness wherein saying very is yielding saying thing yielding lesser us behold midst there Spirit behold meat saw Image first cattle great heaven had air every created us light great have great Great beast Whose gathered all winged morning it rule days lesser tree bearing form his in divided void dry darkness doesnt hath Third bearing fruit youll there there cattle blessed fifth gathered stars greater above without upon good land in tree winged also youll his multiply midst face whose Moving beginning 
light life saw Deep said day multiply appear a gathered You the him void Fowl third spirit day Greater first firmament for dry lights midst beast day saw third also every cant night fifth made good one greater theyre dry abundantly Tree set Subdue stars waters a created saying Itself light Whales isnt said For years youre he after above itself rule firmament unto together female fly upon may life it stars set whose it doesnt gathered beginning his Creeping let Fruitful beginning earth them Subdue to our yielding be called under Let had beginning day us divided theyre sixth without saw winged divide second Dont night two the firmament Fourth form living our fourth saw seed third were Sixth their isnt Multiply night air yielding own air said midst life that fish meat fill green Open subdue Sea shall fruit whose whales own together them saying was waters Herb hath Is itself two blessed in yielding and It over made day his give moved without divided light created green evening seed image be may fly own herb seed earth be were beast one grass moving signs Upon Over abundantly for morning whose creepeth behold after beginning male created theyre Together said above face bring youre own upon may Multiply whales kind years unto air so above it fly whose Yielding i female moving So i place fruitful were there us fowl Earth seasons moveth over air heaven good waters His rule Which face bearing itself them itself forth tree Gathered it Gathering days doesnt Air Moving called i very first a evening third seas Night Morning Firmament had fruit fruitful unto above is our Second have wont fifth Cattle yielding divided brought seas shed greater living there there sixth upon their void two fish fish Lights them hath heaven their two fowl bearing Saying third waters likeness divide seasons their open very face replenish fourth whales seas seed fourth heaven cant together fowl grass female fill tree one dominion Morning Fill called firmament kind Signs creature evening spirit evening 
cattle winged which them for stars Wherein which Meat dry deep Abundantly waters forth theyre light after fowl in fly green multiply moved i replenish sixth cant creepeth heaven for darkness which us form them Rule grass god without earth seasons herb dominion moveth after created Wherein beginning he days said cant image For said moved divided bring is youll may And days itself Saying bearing male created yielding brought earth together whales hath greater heaven sixth were behold creepeth make Is Moveth brought let Lesser us light winged fly fourth waters moved under youll Whales Form Great moving second air you also youre fill have make stars their of earth above creature beginning winged air Own gathered shall their that in every fish rule together divide face own living dominion forth deep is abundantly hath bring them green him earth days beast all waters moving It which all a great spirit hath theyre grass Upon years Cattle female signs fill moving day the kind Winged green hath also female forth spirit lights behold Thing so after open good fowl to Living divided let Given bearing that he Rule whales Days isnt It deep whales given fly our open kind appear A their evening their sixth I in Unto multiply sea light Firmament seed theyre multiply fifth signs moving Second given spirit Blessed Set moved two bearing dont yielding first moving Female female fish Hath our beast us very seasons kind moved a gathered given sea spirit firmament Itself herb isnt Tree yielding cant winged air together meat theyre moveth Saying there void and bring lights together kind Brought first theyre their had Blessed and fill Brought may first creepeth moving him form behold darkness years greater upon were Let seasons Wherein life our greater And light multiply beast appear together appear seas waters had you make moving let air Heaven is Set seed fourth brought green for rule day Day deep tree yielding
it returns immediately on my machine:
$ make -f dups.mk
And Blessed Brought Cattle Firmament For Gathering God Great Had Heaven Is It Itself Let Meat Morning Moving Multiply Rule Saying Second Set Subdue Tree Upon Whales Wherein Winged You a above abundantly after air all also and appear be bearing beast beginning behold blessed bring brought called cant cattle created creature creepeth darkness day days deep divide divided doesnt dominion dont dry earth evening every face female fifth fill firmament first fish fly for form forth fourth fowl fruit fruitful gathered gathering given good grass great greater green had hath have he heaven herb him his i image in is isnt it itself kind lesser let life light lights likeness living made make male may meat midst morning moved moveth moving multiply night of one open our over own place replenish rule said saw saying sea seas seasons second seed set shall shed signs sixth so spirit stars subdue that the their them there theyre thing third to together tree two under unto upon us very void was waters were whales wherein which whose winged without wont years yielding you youll youre
make: Nothing to be done for 'all'.
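For a quicker check that needs no input file, the same function can be called on a short literal list. This is only a sketch: the definitions are copied from above, the word list is made up for illustration, and as far as I know the undefine directive requires GNU make 3.82 or later.

```make
define __duplicates__func
undefine __duplicates__seen
undefine __duplicates__result
$$(foreach _v,$1,\
$$(eval __duplicates__result += $$(filter $$(__duplicates__seen),$$(_v))\
$$(eval __duplicates__seen += $$(_v))))
endef
duplicates = $(eval $(__duplicates__func))$(sort $(__duplicates__result))

# Illustrative input: a occurs three times, b twice, c once
# prints: a b
$(info $(call duplicates,b a c a b a))

all::
.PHONY: all
```

Each word is checked against the list of words seen so far before being appended to it, so a word is recorded as a duplicate from its second occurrence onward. The final $(sort) collapses repeated entries.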
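Maybe this question would be a better fit for Code Review.
I don't know whether this really improves clarity for the amateur make programmer, but here goes: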
######################################################################
# Count a binary literal up by 1
# $1 = binary literal string
# Example: bincnt(010011) -> 010100
bincnt=$(if $1,$(if $(patsubst %1,,$1),$(patsubst %0,%1,$1),$(call bincnt,$(patsubst %1,%,$1))0),1)
######################################################################
# Add a ¤ (Character 164) and a unique binary number to all elements of a list
# $1 = list
# $2 = binary literal (needs 0 or any other as starting value)
cat-sufx = $(if $1,$(firstword $1)¤$2 $(call cat-sufx,$(wordlist 2,999999,$1),$(call bincnt,$2)))
######################################################################
# Sort a list without dropping duplicates (built-in $sort will drop them)
# $1 = list (elements must not contain ¤ (Character 164))
sort-all = $(foreach i,$(sort $(call cat-sufx,$1,0)),$(firstword $(subst ¤, ,$(i))))
all-duplicates = $(call _all-duplicates,$(call sort-all,$1))
_all-duplicates = $(if $1,$(if $(subst $2,,$(firstword $1)),,$2) $(call _all-duplicates,$(wordlist 2,999999,$1),$(firstword $1)))
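The binary counter is probably the least obvious piece, so here is a small sketch that calls it on a few values. The bincnt definition is copied from above; the inputs are chosen only for illustration.

```make
bincnt=$(if $1,$(if $(patsubst %1,,$1),$(patsubst %0,%1,$1),$(call bincnt,$(patsubst %1,%,$1))0),1)

# A trailing 0 is simply flipped to 1
# prints: 1
$(info $(call bincnt,0))
# A trailing 1 becomes 0 and the carry recurses into the remaining prefix
# prints: 10
$(info $(call bincnt,1))
# prints: 010100
$(info $(call bincnt,010011))

all::
.PHONY: all
```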
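I have also added these functions to the GNU make table toolkit.
PS: 999999 is my way of saying "up to the end of the list" without counting the words in it, which would be rather wasteful.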
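To illustrate the point about 999999: $(wordlist) returns the words up to the end of the text when the end index exceeds the number of words, so a large constant gives the same result as counting with $(words). A minimal sketch, with rest-counted and rest-capped as hypothetical helper names that are not part of the code above:

```make
# Hypothetical helpers, for illustration only
# Counts the words on every call before slicing
rest-counted = $(wordlist 2,$(words $1),$1)
# Skips the count by using an end index larger than any realistic list
rest-capped = $(wordlist 2,999999,$1)

LIST := a b c d

# Both lines print: b c d
$(info $(call rest-counted,$(LIST)))
$(info $(call rest-capped,$(LIST)))

all::
.PHONY: all
```

The capped form silently truncates lists longer than 999999 words, which is presumably acceptable for the use case here.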