我正在使用 Apache Ray 创建自定义集群来运行我的逻辑。但是,当我使用 ray.remote 提交任务时,它们在驱动程序节点上执行,而不是在我在 Ray 初始化期间配置的工作节点上执行。如何确保我的逻辑只在工作节点上运行?
import os
os.environ["RAY_PROFILING"] = "1"
os.environ["RAY_task_events_report_interval_ms"] = "0"
import ray
from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster
from ray.util import get_node_ip_address
RAY_LOGS_VOLUME_PATH = "***/temp/apache_ray_logs"
RAY_TIMELINE_VOLUME_PATH = "***/temp/apache_ray_logs/timeline.json"
MAX_WORKER_NODES = 2
try:
shutdown_ray_cluster()
except:
pass
try:
ray.shutdown()
except:
pass
if not os.path.exists(RAY_LOGS_VOLUME_PATH):
os.makedirs(RAY_LOGS_VOLUME_PATH)
_, cluster_address = setup_ray_cluster(
max_worker_nodes = MAX_WORKER_NODES,
memory_worker_node = 28 * 2**30,
num_cpus_per_node = 4,
num_gpus_worker_node = 0,
collect_log_to_path = RAY_LOGS_VOLUME_PATH
)
ray.init(
address = cluster_address,
ignore_reinit_error = True
)
ray.timeline(filename=RAY_TIMELINE_VOLUME_PATH)
@ray.remote
def some_function(input):
print(f"Node used: {get_node_ip_address()}")
# Some Logic
sources = ["input1", "input2"]
results = ray.get([some_function.remote(x) for x in sources])
使用命令
0
启动具有 ray start --head --num-cpus=0
逻辑 CPU 的头节点指示 Ray 调度程序不要在头节点上调度任何需要逻辑 CPU 资源的任务或参与者。
此配置仅将头节点专用于 Ray 系统进程,从而保持头节点稳定。
欲了解更多信息,请参阅以下链接: