Reading a large file line by line

Question

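I'm trying to write a reader for large files in Clojure, based on iteration. But how can I return strings line by line in Clojure? I want to do something like this:

(println (do_something (readFile (:file opts)))) ; process and print the first line
(println (do_something (readFile (:file opts)))) ; process and print the second line

Code: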

(ns testapp.core
  (:gen-class)
  (:require [clojure.tools.cli :refer [cli]])
  (:require [clojure.java.io]))


(defn readFile [file, cnt]
  ; Iterate over opened file (read line by line)
  (with-open [rdr (clojure.java.io/reader file)]
    (let [seq (line-seq rdr)]
      ; how do I return only one line here, and then take the next line when needed?
    )))

(defn -main [& args]
  ; Main function for project 
  (let [[opts args banner] 
        (cli args
          ["-h" "--help" "Print this help" :default false :flag true]
          ["-f" "--file" "REQUIRED: File with data"]
          ["-c" "--clusters" "Count of clusters" :default 3]
          ["-g" "--hamming" "Use Hamming algorithm"]
          ["-e" "--evklid" "Use Evklid algorithm"]
          )]
    ; Print help, when no typed args
    (when (:help opts)
      (println banner)
      (System/exit 0))
    ; Or process args and start work
    (if (and (:file opts) (or (:hamming opts) (:evklid opts)))
      (if (:hamming opts)
        ; Use Hamming algorithm
        (do
          (println (readFile (:file opts)))
          (println (readFile (:file opts))))
          ;(count (readFile (:file opts)))
        ; Use Evklid algorithm
        (println "Evklid"))
      (println "Please, type path for file and algorithm!"))))

Answer 1

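Maybe I don't understand what you mean by "return line by line", but I'd suggest writing a function that takes a file and a processing function, and prints the result of the processing function for every line of your big file. Or, more generally, let it accept both a processing function and an output function (println by default), in case we want not just to print the result but to send it over the network, save it somewhere, pass it to another thread, and so on: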

(defn process-file-by-lines
  "Process file reading it line-by-line"
  ([file]
   (process-file-by-lines file identity))
  ([file process-fn]
   (process-file-by-lines file process-fn println))
  ([file process-fn output-fn]
   (with-open [rdr (clojure.java.io/reader file)]
     (doseq [line (line-seq rdr)]
       (output-fn
         (process-fn line))))))

So:

(process-file-by-lines "/tmp/tmp.txt") ;; Prints the file line by line
(process-file-by-lines "/tmp/tmp.txt"
                       reverse) ;; Prints each line reversed, as a seq of characters
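If the goal is a computed value rather than output for each line, the same with-open and line-seq pattern can feed reduce instead of doseq. The sketch below is only an illustration: reduce-file-lines is a hypothetical helper name, not part of the answer above, and the path in the usage form is a placeholder.

```clojure
(require '[clojure.java.io :as io])

(defn reduce-file-lines
  "Hypothetical helper: folds f over the lines of file, starting from init.
  reduce is eager, so the whole reduction finishes before with-open
  closes the reader, and only one line needs to be held at a time."
  [f init file]
  (with-open [rdr (io/reader file)]
    (reduce f init (line-seq rdr))))

(comment
  ;; Count the lines of a file (placeholder path)
  (reduce-file-lines (fn [n _line] (inc n)) 0 "/tmp/tmp.txt"))
```

Because the fold runs entirely inside with-open, nothing lazy escapes the scope in which the reader is open.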

Answer 2

Try doseq:

(defn readFile [file]
  (with-open [rdr (clojure.java.io/reader file)]
    (doseq [line (line-seq rdr)]
      (println line))))

Answer 3

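You can also try reading lazily from the reader, which is different from the lazy list of strings returned by line-seq. The details are discussed in an answer to this very similar question, but the gist is here: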

(defn lazy-file-lines [file]
  (letfn [(helper [rdr]
            (lazy-seq
              (if-let [line (.readLine rdr)]
                (cons line (helper rdr))
                (do (.close rdr) nil))))]
    (helper (clojure.java.io/reader file))))

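You can then map over these lines, and they will only be read as needed. As discussed in detail in the linked answer, the downside is that if you don't read all the way to the end of the file, (.close rdr) will never run, which may cause resource problems.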
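One way to sidestep the unclosed-reader problem is to keep the lazy sequence inside with-open and hand it to a caller-supplied function. This is a minimal sketch under that assumption: with-file-lines is a hypothetical helper name, and the path in the usage form is a placeholder.

```clojure
(require '[clojure.java.io :as io])

(defn with-file-lines
  "Hypothetical helper: opens file, calls f on the lazy seq of its lines,
  and closes the reader when f returns, even if f stops early or throws.
  f must fully realize whatever it returns, because the reader is
  closed as soon as f is done."
  [file f]
  (with-open [rdr (io/reader file)]
    (f (line-seq rdr))))

(comment
  ;; Read only the first two lines; doall realizes them before the reader closes
  (with-file-lines "/tmp/tmp.txt" #(doall (take 2 %))))
```

The trade-off is that the lines cannot be consumed lazily after the call returns; all the work has to happen inside f.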

Answer 4

Here is how I implemented it using a reducer function:

(defn lazy-reduce
  "Reduces lazily, produces `next` value and passes it to callback `cb`"
  ([next cb] (lazy-reduce next cb nil))
  ([next cb accumulator]
   (lazy-seq
    (when-let [value (next)]
      (let [result (cb value accumulator)]
        (cons result (lazy-reduce next cb result)))))))

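(defn lazy-read-file
  "Reads the file `filename` by passing each line to `process-line`"
  [filename process-line]
  (with-open [file (clojure.java.io/reader filename)]
    ;; `.readLine` returns one line per call and nil at end of file;
    ;; `doall` forces every line to be processed before `with-open` closes the reader
    (doall (lazy-reduce #(.readLine file) process-line))))

Use it like this: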

(defn read-a-line
  "Reads a line"
  [first-line & _state]
  (prn first-line)
  true)

(lazy-read-file filename read-a-line)
