How can I select a random file from a directory in bash?

Question

I have a directory with about 2000 files. How can I select a random sample of N files, using either a bash script or a list of piped commands?

Answer 1

Here's a script that uses GNU sort's random option:

ls | sort -R | tail -n "$N" | while IFS= read -r file; do
    # Something involving $file, or you can leave
    # off the while to just get the filenames
    printf '%s\n' "$file"
done

Answer 2

I use this: it uses a temporary file, but it descends into a directory tree until it finds a regular file and returns it.

# find a quasi-random file in a directory tree:

# directory to start search from:
ROOT="/"

tmp=$(mktemp)    # private temp file instead of a fixed, guessable name
TARGET="$ROOT"
FILE=""
n=
r=
while [ -e "$TARGET" ]; do
    TARGET="$(readlink -f "${TARGET}/$FILE")"
    if [ -d "$TARGET" ]; then
      ls -1 "$TARGET" 2> /dev/null > "$tmp" || break
      n=$(wc -l < "$tmp")
      if [ "$n" -ne 0 ]; then
        FILE=$(shuf -n 1 "$tmp")
# or if you don't have/want to use shuf:
#       r=$((RANDOM % n))
#       FILE=$(tail -n +$((r + 1)) "$tmp" | head -n 1)
      fi
    else
      if [ -f "$TARGET" ]; then
        echo "$TARGET"
        break
      else
        # is not a regular file, restart:
        TARGET="$ROOT"
        FILE=""
      fi
    fi
done
rm -f "$tmp"

Answer 3

If you have more files in the folder, you can use the piped command below, which I found on Unix Stack Exchange.

find /some/dir/ -type f -print0 | xargs -0 shuf -e -n 8 -z | xargs -0 cp -vt /target/dir/

Here I wanted to copy the files, but if you want to move them or do something else, just change the last command, where I used cp.
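
One caveat with the pipeline above: when the file list is very long, xargs may split it across several shuf invocations, and each invocation prints its own 8 names. A minimal sketch that sidesteps this by letting shuf read the NUL-delimited list from standard input. It assumes GNU find and GNU shuf, and the function name sample_files is made up for illustration:

```shell
#!/usr/bin/env bash
# sample_files DIR N: print N random regular files under DIR, NUL-delimited.
# Hypothetical helper; assumes GNU find (-print0) and GNU shuf (-z).
sample_files() {
    local dir=$1 n=$2
    # shuf reads the whole list itself, so there is exactly one shuffle.
    find "$dir" -type f -print0 | shuf -z -n "$n"
}

# Example, mirroring the copy command from the answer:
# sample_files /some/dir 8 | xargs -0 cp -vt /target/dir/
```

Because the output stays NUL-delimited, it can be fed to xargs -0 just like before.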

Answer 4

How about a Perl solution, slightly doctored from Mr. Kang's answer here: How can I shuffle the lines of a text file on the Unix command line or in a shell script?

$ ls | perl -MList::Util=shuffle -e '@lines = shuffle(<>); print @lines[0..4]'

Answer 5

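You can use shuf (from the GNU coreutils package) for that. Just feed it a list of file names and ask it to return the first line from a random permutation: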

ls dirname | shuf -n 1
# probably faster and more flexible:
find dirname -type f | shuf -n 1
# etc..

Adjust the -n, --head-count=COUNT value to return the number of lines you want. For example, to return 5 random filenames you would use:

find dirname -type f | shuf -n 5
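
If the chosen names need to be processed further, they can be collected into an array. The following is only a sketch under some assumptions: bash 4.4 or later (for mapfile -d), GNU find and shuf, and an illustrative function name pick_into_array with a global array picked. NUL delimiters are used so that names containing newlines survive:

```shell
#!/usr/bin/env bash
# pick_into_array DIR N: fill the global array "picked" with N random files.
# Hypothetical helper; assumes bash 4.4+ and GNU find/shuf.
pick_into_array() {
    local dir=$1 n=$2
    # -d '' splits on NUL bytes, -t drops the delimiter from each element.
    mapfile -d '' -t picked < <(find "$dir" -type f -print0 | shuf -z -n "$n")
}

# Example:
# pick_into_array dirname 5
# printf '%s\n' "${picked[@]}"
```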

Answer 6

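Here are a few possibilities that don't parse the output of ls and that are 100% safe for files with spaces and funny symbols in their names. All of them populate an array randf with a list of random files. If needed, this array is easily printed with printf '%s\n' "${randf[@]}".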

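This one can output the same file several times, and N needs to be known in advance. Here I chose N=42.

a=( * )
randf=( "${a[RANDOM%${#a[@]}]"{1..42}"}" )

This feature is not very well documented.

If N is not known in advance, but you really liked the previous possibility, you can use eval. But it's evil, and you must make sure that N doesn't come directly from user input without being thoroughly checked!

N=42
a=( * )
eval randf=( \"\${a[RANDOM%\${#a[@]}]\"\{1..$N\}\"}\" )

I personally dislike eval, hence this answer!

The same using a more straightforward method (a loop):

N=42
a=( * )
randf=()
for((i=0;i<N;++i)); do
    randf+=( "${a[RANDOM%${#a[@]}]}" )
done

If you don't want to possibly get the same file several times:

N=42
a=( * )
randf=()
for((i=0;i<N && ${#a[@]};++i)); do
    ((j=RANDOM%${#a[@]}))
    randf+=( "${a[j]}" )
    a=( "${a[@]:0:j}" "${a[@]:j+1}" )
done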
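
The no-duplicates loop above rebuilds the array on every iteration. As a possible alternative, here is a sketch of a partial Fisher-Yates shuffle that swaps elements in place instead. The function name sample_without_replacement is made up for illustration, and it fills the same randf array used in this answer:

```shell
#!/usr/bin/env bash
# sample_without_replacement N ITEM...: fill the global array randf with up to
# N distinct items, using a partial Fisher-Yates shuffle. Hypothetical helper.
sample_without_replacement() {
    local n=$1; shift
    local -a a=( "$@" )
    local len=${#a[@]} i j tmp
    if (( n > len )); then n=$len; fi     # cannot pick more than we have
    randf=()
    for (( i = 0; i < n; ++i )); do
        # Pick an index j in i..len-1 and swap that element into slot i.
        j=$(( i + RANDOM % (len - i) ))
        tmp=${a[i]}; a[i]=${a[j]}; a[j]=$tmp
        randf+=( "${a[i]}" )
    done
}

# Example:
# sample_without_replacement 42 ./*
# printf '%s\n' "${randf[@]}"
```

Like the original loops, this relies on $RANDOM, so it shares the same range limit and slight modulo bias.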

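Note. This is a late answer to an old post, but the accepted answer links to an external page that shows terrible bash practice, and the other answer is not much better, as it also parses the output of ls. A comment on the accepted answer points to an excellent answer by Lhunath, which obviously shows good practice but doesn't exactly answer the OP.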

Answer 7

ls | shuf -n 10 # ten random files

Answer 8

A simple solution for selecting 5 random files while avoiding parsing ls. It also works with files containing spaces, newlines and other special characters:

shuf -ezn 5 * | xargs -0 -n1 echo

Replace echo with the command you want to execute for your files.

Answer 9

If you have Python installed (works with either Python 2 or Python 3):

To select one file (or a line from an arbitrary command), use

ls -1 | python -c "import sys; import random; print(random.choice(sys.stdin.readlines()).rstrip())"

To select N files/lines, use (note that N is at the end of the command; replace it with a number)

ls -1 | python -c "import sys; import random; print(''.join(random.sample(sys.stdin.readlines(), int(sys.argv[1]))).rstrip())" N

Answer 10

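This is an even later response to @gniourf_gniourf's late answer, which I just upvoted because it's by far the best answer, twice over. (Once for avoiding eval and once for safe filename handling.)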

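But it took me a few minutes to untangle the "not very well documented" feature(s) this answer uses. If your Bash skills are solid enough that you saw immediately how it works, then skip this comment. But I didn't, and having untangled it, I think it's worth explaining.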

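Feature #1 is the shell's own file globbing. a=(*) creates an array, $a, whose members are the files in the current directory. Bash understands all the weirdnesses of filenames, so that list is guaranteed correct, guaranteed escaped, and so on. There is no need to worry about properly parsing the textual file names returned by ls.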

Feature #2 is Bash parameter expansions for arrays, one nested within another. This starts with ${#ARRAY[@]}, which expands to the length of $ARRAY.

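That expansion is then used to subscript the array. The standard way to get a random number between 0 and N-1 is to take a random number modulo N. Here N is the length of the array, so the result is always a valid index. Here's the approach, broken into two lines for clarity's sake: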

LENGTH=${#a[@]}
RANDOM_ITEM=${a[RANDOM%LENGTH]}   # not RANDOM=..., which would reseed $RANDOM

But this solution does it in a single line, removing the unnecessary variable assignment.

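Feature #3 is Bash brace expansion, although I have to confess I don't entirely understand it. Brace expansion is used, for instance, to generate a list of 25 files named filename1.txt, filename2.txt, etc.: echo "filename"{1..25}".txt".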

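The expression "${a[RANDOM%${#a[@]}]"{1..42}"}" in the array assignment above uses that trick to produce 42 separate expansions. The brace expansion places a single number between the ] and the }, which at first I thought was subscripting the array, but if so it would be preceded by a colon. (It would also have returned 42 consecutive items from a random spot in the array, which is not at all the same thing as returning 42 random items from the array.) I think it just makes the shell run the expansion 42 times, thereby returning 42 random items from the array. (But if someone can explain it more fully, I'd love to hear it.)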

The reason N has to be hardcoded (to 42) is that brace expansion happens before variable expansion.
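
A small demonstration of that ordering may help. This is only an illustrative sketch, and the variable name N simply follows the answer being discussed:

```shell
#!/usr/bin/env bash
N=3
echo {1..3}     # literal bounds: brace expansion produces 1 2 3
echo {1..$N}    # braces are handled before $N is expanded, so this prints {1..3}

# When the count lives in a variable, a C-style loop avoids the problem.
for (( i = 1; i <= N; ++i )); do
    printf '%s\n' "$i"
done
```

This is why the eval variant exists at all: eval forces a second pass in which $N has already been substituted before the braces are expanded.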

Finally, here's Feature #4, if you want to do this recursively for a directory hierarchy:

shopt -s globstar
a=( ** )

This turns on a shell option that causes ** to match recursively. Now your $a array contains every file (and directory) in the entire hierarchy.
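
Since ** matches directories as well as files, a recursive pick usually needs a filter. Here is a minimal sketch, assuming bash 4 or later for globstar. The function name random_file_recursive is made up for illustration, and its body runs in a subshell so the shopt and cd changes do not leak into the caller:

```shell
#!/usr/bin/env bash
# random_file_recursive DIR: print one random regular file below DIR,
# as a path relative to DIR. Hypothetical helper; assumes bash 4+.
random_file_recursive() (
    shopt -s globstar nullglob
    cd -- "$1" || exit 1
    files=()
    for f in **; do
        # Keep regular files only; ** also yields directories.
        if [[ -f $f ]]; then files+=( "$f" ); fi
    done
    (( ${#files[@]} > 0 )) || exit 1      # nothing to choose from
    printf '%s\n' "${files[RANDOM % ${#files[@]}]}"
)

# Example:
# random_file_recursive /path/to/tree
```

Hidden files are skipped unless dotglob is also enabled.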

Answer 11

This is the only script I could get to play nicely with bash on MacOS. I combined and edited snippets from the following two links:

ls command: how can I get a recursive full-path listing, one line per file?

http://www.linuxquestions.org/questions/linux-general-1/is-there-a-bash-command-for-picking-a-random-file-678687/

#!/bin/bash

# Reads a given directory and picks a random file.

# The directory you want to use. You could use "$1" instead if you
# wanted to parametrize it.
DIR="/path/to/"
# DIR="$1"

# Internal Field Separator set to newline, so file names with
# spaces do not break our script.
IFS='
'

if [[ -d "${DIR}" ]]
then
  # Runs ls on the given dir, and dumps the output into a matrix,
  # it uses the new lines character as a field delimiter, as explained above.
  #  file_matrix=($(ls -LR "${DIR}"))

  file_matrix=($(ls -R "$DIR" | awk '; /:$/&&f{s=$0;f=0}; /:$/&&!f{sub(/:$/,"");s=$0;f=1;next}; NF&&f{ print s"/"$0 }'))
  num_files=${#file_matrix[*]}

  # This is the command you want to run on a random file.
  # Change "ls -l" by anything you want, it's just an example.
  ls -l "${file_matrix[$((RANDOM%num_files))]}"
fi

exit 0

Answer 12

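MacOS does not have the sort -R and shuf commands, so I needed a bash-only solution that randomizes all files without duplicates, and I did not find one here. This solution is similar to gniourf_gniourf's solution #4, but hopefully adds better comments.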

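The script should be easy to modify to stop after N samples, either by using a counter with if or by using gniourf_gniourf's for loop with N. $RANDOM only yields values from 0 to 32767, so this is limited to ~32000 files, but that should do for most cases.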

#!/bin/bash

array=(*)  # this is the array of files to shuffle
# echo ${array[@]}
for dummy in "${array[@]}"; do  # do loop length(array) times; once for each file
    length=${#array[@]}
    randomi=$(( $RANDOM % $length ))  # select a random index

    filename=${array[$randomi]}
    echo "Processing: '$filename'"  # do something with the file

    unset -v "array[$randomi]"  # set the element at index $randomi to NULL
    array=("${array[@]}")  # remove NULL elements introduced by unset; copy array
done
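
Both suggested modifications can be sketched together: stopping after N samples, and combining two $RANDOM draws (15 bits each) into a 30-bit value so that the index is not capped at 32767. This is only an illustration under those assumptions, and the function name shuffle_first_n is made up:

```shell
#!/usr/bin/env bash
# shuffle_first_n N ITEM...: print up to N distinct items in random order,
# one per line. Hypothetical helper built on the unset-and-repack idea above.
shuffle_first_n() {
    local n=$1; shift
    local -a array=( "$@" )
    local count=0 idx
    while (( count < n && ${#array[@]} > 0 )); do
        # Two 15-bit draws glued together give a value in 0..2^30-1.
        idx=$(( ((RANDOM << 15) | RANDOM) % ${#array[@]} ))
        printf '%s\n' "${array[$idx]}"
        unset -v "array[$idx]"        # remove the chosen element
        array=( "${array[@]}" )       # re-pack so indices are contiguous again
        count=$(( count + 1 ))
    done
}

# Example:
# shuffle_first_n 10 ./*
```

Printing one name per line keeps the sketch short, but it means names containing newlines are not handled safely here.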
