Hadoop has the configuration parameter hadoop.tmp.dir which, according to the documentation, is "A base for other temporary directories." I assumed this path refers to the local file system.
I set this value to /mnt/hadoop-tmp/hadoop-${user.name}. After formatting the namenode and starting all the services, I see that exactly the same path has been created on HDFS.
Does this mean that hadoop.tmp.dir refers to a temporary location on HDFS?
It's confusing, but hadoop.tmp.dir is used as the base for local temporary directories and is also used within HDFS. The documentation isn't great on this point, but mapred.system.dir is set by default to "${hadoop.tmp.dir}/mapred/system", which defines the path on HDFS where the Map/Reduce framework stores its system files.
If you don't want the two tied together, you can edit your mapred-site.xml so that the definition of mapred.system.dir no longer references ${hadoop.tmp.dir}, as sketched below.
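A minimal mapred-site.xml sketch of such an override (the /mapred/system path is a placeholder of my choosing, not a value the answer prescribes):
<property>
<name>mapred.system.dir</name>
<value>/mapred/system</value>
<description>HDFS path for Map/Reduce system files, decoupled from hadoop.tmp.dir.</description>
</property>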
Let me add a bit to kkrugler's answer:
There are three HDFS properties that contain hadoop.tmp.dir in their values:
dfs.name.dir: the directory where the namenode stores its metadata, with the default value ${hadoop.tmp.dir}/dfs/name.
dfs.data.dir: the directory where HDFS data blocks are stored, with the default value ${hadoop.tmp.dir}/dfs/data.
fs.checkpoint.dir: the directory where the secondary namenode stores its checkpoints, with the default value ${hadoop.tmp.dir}/dfs/namesecondary.
This is why you saw /mnt/hadoop-tmp/hadoop-${user.name} in HDFS after formatting the namenode. If you want these locations to be independent of hadoop.tmp.dir, you can likewise set each of them explicitly, as sketched below.
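A minimal hdfs-site.xml sketch (the /data/dfs/... paths are placeholders for illustration; note these are the older, pre-2.x property names used in the answer above):
<property>
<name>dfs.name.dir</name>
<value>/data/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/dfs/data</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/data/dfs/namesecondary</value>
</property>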
I looked around for information on this. The only thing I could find is this passage from the Amazon Elastic MapReduce Developer Guide:
In hadoop-site.xml, we set hadoop.tmp.dir to /mnt/var/lib/hadoop/tmp. /mnt is where we mount the "extra" EC2 volumes, which can contain much more data than the default volume. (The exact amount depends on the instance type.) Hadoop's RunJar.java (the module that unpacks the input JARs) interprets hadoop.tmp.dir as a Hadoop file system path rather than a local path, so it writes to the path in HDFS instead of a local path. HDFS is mounted under /mnt (specifically /mnt/var/lib/hadoop/dfs/). So you can write lots of data to it.
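You can check where such a path actually ended up with a quick comparison (a sketch; substitute your own hadoop.tmp.dir value, and note that Hadoop expands ${user.name} to the running user's name, approximated here with $USER):
# Is the path present in HDFS?
hadoop fs -ls /mnt/hadoop-tmp/hadoop-$USER
# Is it present on the local file system?
ls -ld /mnt/hadoop-tmp/hadoop-$USER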
hadoop.tmp.dir is Hadoop's temporary directory. It is a local directory (not on HDFS), and as of Hadoop 3.4.0 its default is (core-default.xml):
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
Different processes/services use subfolders of hadoop.tmp.dir for their temporary data.
# cd $HADOOP_HOME;grep -lrH --include="*.xml" "hadoop.tmp.dir"
share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
All properties that directly depend on hadoop.tmp.dir can be extracted with:
for f in $(grep -lrH --include="*.xml" "hadoop.tmp.dir" $HADOOP_HOME);do
basename $f
xmllint --xpath '/configuration/property[contains(value,"hadoop.tmp.dir")]' $f
echo
done
hdfs-default.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>file://${hadoop.tmp.dir}/dfs/name</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table(fsimage). If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy. </description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file://${hadoop.tmp.dir}/dfs/data</value>
<description>Determines where on the local filesystem an DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices. The directories should be tagged
with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]/[NVDIMM]) for HDFS
storage policies. The default storage type will be DISK if the directory does
not have a storage type tagged explicitly. Directories that do not exist will
be created if local filesystem permission allows.
</description>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file://${hadoop.tmp.dir}/dfs/namesecondary</value>
<description>Determines where on the local filesystem the DFS secondary
name node should store the temporary images to merge.
If this is a comma-delimited list of directories then the image is
replicated in all of the directories for redundancy.
</description>
</property>
core-default.xml
<property>
<name>io.seqfile.local.dir</name>
<value>${hadoop.tmp.dir}/io/local</value>
<description>The local directory where sequence file stores intermediate
data files during merge. May be a comma-separated list of
directories on different devices in order to spread disk i/o.
Directories that do not exist are ignored.
</description>
</property>
<property>
<name>fs.s3a.buffer.dir</name>
<value>${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/s3a</value>
<description>Comma separated list of directories that will be used to buffer file
uploads to.
Yarn container path will be used as default value on yarn applications,
otherwise fall back to hadoop.tmp.dir
</description>
</property>
<property>
<name>fs.azure.buffer.dir</name>
<value>${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/abfs</value>
<description>Directory path for buffer files needed to upload data blocks
in AbfsOutputStream.
Yarn container path will be used as default value on yarn applications,
otherwise fall back to hadoop.tmp.dir </description>
</property>
mapred-default.xml
<property>
<name>mapreduce.cluster.local.dir</name>
<value>${hadoop.tmp.dir}/mapred/local</value>
<description>
The local directory where MapReduce stores intermediate
data files. May be a comma-separated list of
directories on different devices in order to spread disk i/o.
Directories that do not exist are ignored.
</description>
</property>
<property>
<name>mapreduce.jobhistory.recovery.store.fs.uri</name>
<value>${hadoop.tmp.dir}/mapred/history/recoverystore</value>
<!--value>hdfs://localhost:9000/mapred/history/recoverystore</value-->
<description>The URI where history server state will be stored if
HistoryServerFileSystemStateStoreService is configured as the recovery
storage class.</description>
</property>
<property>
<name>mapreduce.jobhistory.recovery.store.leveldb.path</name>
<value>${hadoop.tmp.dir}/mapred/history/recoverystore</value>
<description>The URI where history server state will be stored if
HistoryServerLeveldbSystemStateStoreService is configured as the recovery
storage class.</description>
</property>
yarn-default.xml
<property>
<description>URI pointing to the location of the FileSystem path where
RM state will be stored. This must be supplied when using
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
as the value for yarn.resourcemanager.store.class</description>
<name>yarn.resourcemanager.fs.state-store.uri</name>
<value>${hadoop.tmp.dir}/yarn/system/rmstore</value>
<!--value>hdfs://localhost:9000/rmstore</value-->
</property>
<property>
<description>Local path where the RM state will be stored when using
org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore
as the value for yarn.resourcemanager.store.class</description>
<name>yarn.resourcemanager.leveldb-state-store.path</name>
<value>${hadoop.tmp.dir}/yarn/system/rmstore</value>
</property>
<property>
<description>List of directories to store localized files in. An
application's localized file directory will be found in:
${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
Individual containers' work directories, called container_${contid}, will
be subdirectories of this.
</description>
<name>yarn.nodemanager.local-dirs</name>
<value>${hadoop.tmp.dir}/nm-local-dir</value>
</property>
<property>
<description>The local filesystem directory in which the node manager will
store state when recovery is enabled.</description>
<name>yarn.nodemanager.recovery.dir</name>
<value>${hadoop.tmp.dir}/yarn-nm-recovery</value>
</property>
<property>
<description>Store file name for leveldb timeline store.</description>
<name>yarn.timeline-service.leveldb-timeline-store.path</name>
<value>${hadoop.tmp.dir}/yarn/timeline</value>
</property>
<property>
<description>Store file name for leveldb state store.</description>
<name>yarn.timeline-service.leveldb-state-store.path</name>
<value>${hadoop.tmp.dir}/yarn/timeline</value>
</property>
<property>
<description>
The storage path for LevelDB implementation of configuration store,
when yarn.scheduler.configuration.store.class is configured to be
"leveldb".
</description>
<name>yarn.scheduler.configuration.leveldb-store.path</name>
<value>${hadoop.tmp.dir}/yarn/system/confstore</value>
</property>
<property>
<description>
The file system directory to store the configuration files. The path
can be any format as long as it follows hadoop compatible schema,
for example value "file:///path/to/dir" means to store files on local
file system, value "hdfs:///path/to/dir" means to store files on HDFS.
If resource manager HA is enabled, recommended to use hdfs schema so
it works in fail-over scenario.
</description>
<name>yarn.scheduler.configuration.fs.path</name>
<value>file://${hadoop.tmp.dir}/yarn/system/schedconf</value>
</property>
Beyond these, there are second-order dependencies, e.g. dfs.namenode.checkpoint.edits.dir depends on dfs.namenode.checkpoint.dir:
<property>
<name>dfs.namenode.checkpoint.edits.dir</name>
<value>${dfs.namenode.checkpoint.dir}</value>
<description>Determines where on the local filesystem the DFS secondary
name node should store the temporary edits to merge.
If this is a comma-delimited list of directories then the edits is
replicated in all of the directories for redundancy.
Default value is same as dfs.namenode.checkpoint.dir
</description>
</property>
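The same loop can be repurposed to trace such indirect dependencies, e.g. to list every property whose value references dfs.namenode.checkpoint.dir (a sketch following the pattern above, not part of the original answer):
for f in $(grep -lrH --include="*.xml" "dfs.namenode.checkpoint.dir" $HADOOP_HOME);do
basename $f
xmllint --xpath '/configuration/property[contains(value,"dfs.namenode.checkpoint.dir")]' $f
echo
done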
The default value of any of these properties can be overridden in the corresponding -site.xml file, as sketched below.
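For example, to move hadoop.tmp.dir itself off /tmp, a core-site.xml entry like the following would do (a sketch; /data/hadoop-tmp is a placeholder path):
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop-tmp/hadoop-${user.name}</value>
</property>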