
Last updated: 2025-03-04 GMT+08:00

Commands Supported by ma-cli ma-job for Training Jobs

Use the ma-cli ma-job command to submit training jobs, query their logs, events, AI engines, and resource flavors, and stop training jobs.

```
$ ma-cli ma-job -h
usage: ma-cli ma-job [options] command [args]...
  modelarts job submission and query jod details.
options:
  -H, -h, --help  show this message and exit.
commands:
  delete      delete training job by job id.
  get-engine  get job engines.
  get-event   get job running event.
  get-flavor  get job flavors.
  get-job     get job details.
  get-log     get job log details.
  get-pool    get job engines.
  stop        stop training job by job id.
  submit      submit training job.
```

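Table 1 Commands supported for training jobs
Command	Description
get-job	Queries the list and details of ModelArts training jobs.
get-log	Queries the run logs of a ModelArts training job.
get-engine	Queries the AI engines available for ModelArts training.
get-event	Queries the events of a ModelArts training job.
get-flavor	Queries the resource flavors available for ModelArts training.
get-pool	Queries the dedicated pools available for ModelArts training.
stop	Stops a ModelArts training job.
submit	Submits a ModelArts training job.
delete	Deletes the training job with the specified job ID.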

Using ma-cli ma-job get-job to Query ModelArts Training Jobs

Run the ma-cli ma-job get-job command to view the training job list or the details of a specific job.

```
$ ma-cli ma-job get-job -h
usage: ma-cli ma-job get-job [options]
  get job details.
  example:
  # get train job details by job name
  ma-cli ma-job get-job -n ${job_name}
  # get train job details by job id
  ma-cli ma-job get-job -i ${job_id}
  # get train job list
  ma-cli ma-job get-job --page-size 5 --page-num 1
options:
  -i, --job-id text               get training job details by job id.
  -n, --job-name text             get training job details by job name.
  -pn, --page-num integer         specify which page to query.  [x>=1]
  -ps, --page-size integer range  the maximum number of results for this query.  [1<=x<=50]
  -v, --verbose                   show detailed information about training job details.
  -c, --config-file text          configure file path for authorization.
  -d, --debug                     debug mode. shows full stack trace when error occurs.
  -p, --profile text              cli connection profile to use. the default profile is "default".
  -H, -h, --help                  show this message and exit.
```

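Table 2 Parameters
Parameter	Type	Required	Description
-i / --job-id	string	No	Queries the details of the training job with the specified job ID.
-n / --job-name	string	No	Queries the training job with the specified name, or filters training jobs by a job-name keyword.
-pn / --page-num	int	No	Page index. The default is page 1.
-ps / --page-size	int	No	Number of training jobs shown per page. The default is 10.
-v / --verbose	bool	No	Switch for showing detailed information. Off by default.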
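
The paging limits listed above (page number at least 1, page size between 1 and 50) can be checked before the CLI is called. The following Python sketch assembles an argument list for ma-cli ma-job get-job. The helper name build_get_job_args is hypothetical and not part of ma-cli, and leaving out the paging options when a job ID is given is an assumption made for illustration.

```python
from typing import List, Optional


def build_get_job_args(job_id: Optional[str] = None,
                       job_name: Optional[str] = None,
                       page_num: int = 1,
                       page_size: int = 10,
                       verbose: bool = False) -> List[str]:
    """Assemble the argument list for `ma-cli ma-job get-job` (hypothetical helper)."""
    # Limits taken from the help text: page-num >= 1, 1 <= page-size <= 50.
    if page_num < 1:
        raise ValueError("page_num must be >= 1")
    if not 1 <= page_size <= 50:
        raise ValueError("page_size must be between 1 and 50")

    args = ["ma-cli", "ma-job", "get-job"]
    if job_id:
        # Assumption: paging is irrelevant when a single job is requested by ID.
        args += ["--job-id", job_id]
    else:
        if job_name:
            args += ["--job-name", job_name]
        args += ["--page-num", str(page_num), "--page-size", str(page_size)]
    if verbose:
        args.append("--verbose")
    return args
```

On a machine where ma-cli is installed and configured, the returned list could be passed to subprocess.run(args, check=True).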

Example: query the training job with the specified job ID.
```
ma-cli ma-job get-job -i b63e90xxx
```

Example: filter training jobs by the job-name keyword "auto".
```
ma-cli ma-job get-job -n auto
```

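Using ma-cli ma-job submit to Submit a ModelArts Training Job

Run the ma-cli ma-job submit command to submit a ModelArts training job.

The ma-cli ma-job submit command takes one positional argument, yaml_file, which is the path to the job configuration file. If it is omitted, the configuration is treated as empty. The configuration file is in YAML format, and its keys are the command's option names. If both a yaml_file and command-line options are given, the command-line values override the values of the same keys in the configuration file.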
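
The precedence rule can be pictured as a simple dictionary merge. The sketch below is only an illustration of that rule, not ma-cli's actual implementation. The function name merge_job_config is hypothetical, and it assumes that options not given on the command line are represented as None.

```python
from typing import Any, Dict


def merge_job_config(yaml_config: Dict[str, Any],
                     cli_options: Dict[str, Any]) -> Dict[str, Any]:
    """Combine YAML settings with command-line options (hypothetical helper).

    Values given on the command line win over values from the YAML file.
    """
    merged = dict(yaml_config)  # copy so the caller's dict is not modified
    for key, value in cli_options.items():
        # Assumption: None means the option was not given on the command line.
        if value is not None:
            merged[key] = value
    return merged


yaml_config = {"train-instance-count": 1, "log-url": "obs://your-bucket/mnist/logs/"}
cli_options = {"train-instance-count": 2, "log-url": None}
effective = merge_job_config(yaml_config, cli_options)
```

Here effective keeps log-url from the file and takes train-instance-count from the command line.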

```
$ ma-cli ma-job submit -h
usage: ma-cli ma-job submit [options] [yaml_file]...
  submit training job.
  example:
  ma-cli ma-job submit --code-dir obs://your_bucket/code/
                       --boot-file main.py
                       --framework-type pytorch
                       --working-dir /home/ma-user/modelarts/user-job-dir/code
                       --framework-version pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64
                       --data-url obs://your_bucket/dataset/
                       --log-url obs://your_bucket/logs/
                       --train-instance-type modelarts.vm.cpu.8u
                       --train-instance-count 1
options:
  --name text                     job name.
  --description text              job description.
  --image-url text                full swr custom image path.
  --uid text                      uid for custom image (default: 1000).
  --working-dir text              modelarts training job working directory.
  --local-code-dir text           modelarts training job local code directory.
  --user-command text             execution command for custom image.
  --pool-id text                  dedicated pool id.
  --train-instance-type text      train worker specification.
  --train-instance-count integer  number of workers.
  --data-url text                 obs path for training data.
  --log-url text                  obs path for training log.
  --code-dir text                 obs path for source code.
  --output text                   training output parameter with obs path.
  --input text                    training input parameter with obs path.
  --env-variables text            env variables for training job.
  --parameters text               training job parameters (only keyword parameters are supported).
  --boot-file text                training job boot file path behinds `code_dir`.
  --framework-type text           training job framework type.
  --framework-version text        training job framework version.
  --workspace-id text             the workspace where you submit training job (default "0")
  --policy [regular|economic|turbo|auto]
                                  training job policy, default is regular.
  --volumes text                  information about the volumes attached to the training job.
  -q, --quiet                     exit without waiting after submit successfully.
  -c, --config-file path          configure file path for authorization.
  -d, --debug                     debug mode. shows full stack trace when error occurs.
  -p, --profile text              cli connection profile to use. the default profile is "default".
  -H, -h, --help                  show this message and exit.
```

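Table 3 Parameters
Parameter	Type	Required	Description
yaml_file	string	No	Configuration file of the training job. If omitted, the configuration is treated as empty.
--code-dir	string	Yes	OBS path of the training source code.
--data-url	string	Yes	OBS path of the training data.
--log-url	string	Yes	OBS path for storing the logs generated during training.
--train-instance-count	int	Yes	Number of training job instances. The default is 1, meaning a single node.
--boot-file	string	No	Can be omitted when a custom image or a custom command is used. Required when the job is submitted with a preset command.
--name	string	No	Training job name.
--description	string	No	Training job description.
--image-url	string	No	SWR address of the custom image, in the format organization/image_name:tag.
--uid	string	No	UID used to run the custom image. The default is 1000.
--working-dir	string	No	Working directory in which the algorithm runs.
--local-code-dir	string	No	Local path in the training container to which the algorithm's code directory is downloaded.
--user-command	string	No	Command executed by the custom image. The directory must be under /home. This field does not take effect when code-dir is prefixed with file://.
--pool-id	string	No	ID of the resource pool selected for the training job. To find it, click "Dedicated Resource Pools" in the left pane of the ModelArts management console and check the pool list.
--train-instance-type	string	No	Resource flavor selected for the training job.
--output	string	No	Training output. When specified, the training job uploads the container output directory that corresponds to the output parameter named in the training script to the given OBS path. To specify several outputs, repeat the option: --output output1=obs://bucket/output1 --output output2=obs://bucket/output2
--input	string	No	Training input. When specified, the training job downloads the data from the given OBS path to the training container and passes the local storage path to the training script through the named parameter. To specify several inputs, repeat the option: --input data_path1=obs://bucket/data1 --input data_path2=obs://bucket/data2
--env-variables	string	No	Environment variables passed to the training job. To specify several variables, repeat the option: --env-variables env1=env1 --env-variables env2=env2
--parameters	string	No	Training arguments. Several arguments can be given in one value, for example --parameters "--epoch 0 --pretrained".
--framework-type	string	No	Engine type selected for the training job.
--framework-version	string	No	Engine version selected for the training job.
-q / --quiet	bool	No	Exits immediately after the training job is submitted successfully, without continuing to print the job status.
--workspace-id	string	No	Workspace that the job belongs to. The default is "0".
--policy	string	No	Training resource flavor mode. Valid values: regular, economic, turbo, auto.
--volumes	string	No	Mounts EFS. To specify several volumes, repeat the option: --volumes "local_path=/xx/yy/zz;read_only=false;nfs_server_path=xxx.xxx.xxx.xxx:/" --volumes "local_path=/xxx/yyy/zzz;read_only=false;nfs_server_path=xxx.xxx.xxx.xxx:/"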
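
Several options in the table (--input, --output, --env-variables) are repeated once per name=value pair. A minimal Python sketch of expanding a mapping into such repeated options follows. The helper name repeated_option is hypothetical and not part of ma-cli, and the bucket paths are the placeholders used in the table.

```python
from typing import Dict, List


def repeated_option(flag: str, pairs: Dict[str, str]) -> List[str]:
    """Expand a mapping into repeated `flag name=value` arguments (hypothetical helper)."""
    args: List[str] = []
    for name, value in pairs.items():
        # One occurrence of the flag per pair, as in the table's examples.
        args += [flag, f"{name}={value}"]
    return args


input_args = repeated_option("--input", {
    "data_path1": "obs://bucket/data1",
    "data_path2": "obs://bucket/data2",
})
env_args = repeated_option("--env-variables", {"env1": "env1", "env2": "env2"})
```

The resulting lists can be appended to the rest of a submit argument list before it is handed to a process launcher.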

Example: Submitting a Training Job Based on a ModelArts Preset Image

Submit the training job by specifying options on the command line:

```
ma-cli ma-job submit --code-dir obs://your-bucket/mnist/code/ \
                  --boot-file main.py \
                  --framework-type pytorch \
                  --working-dir /home/ma-user/modelarts/user-job-dir/code \
                  --framework-version pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64 \
                  --data-url obs://your-bucket/mnist/dataset/mnist/ \
                  --log-url obs://your-bucket/mnist/logs/ \
                  --train-instance-type modelarts.vm.cpu.8u \
                  --train-instance-count 1 \
                  -q
```

Sample train.yaml for a preset image:

```
# .ma/train.yaml sample (preset image)
# pool_id: pool_xxxx
train-instance-type: modelarts.vm.cpu.8u
train-instance-count: 1
data-url: obs://your-bucket/mnist/dataset/mnist/
code-dir: obs://your-bucket/mnist/code/
working-dir: /home/ma-user/modelarts/user-job-dir/code
framework-type: pytorch
framework-version: pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64
boot-file: main.py
log-url: obs://your-bucket/mnist/logs/
##[optional] uncomment to set uid when use custom image mode
uid: 1000
##[optional] uncomment to upload output file/dir to obs from training platform
output:
    - name: output_dir
      obs_path: obs://your-bucket/mnist/output1/
##[optional] uncomment to download input file/dir from obs to training platform
input:
    - name: data_url
      obs_path: obs://your-bucket/mnist/dataset/mnist/
##[optional] uncomment pass hyperparameters
parameters:
    - epoch: 10
    - learning_rate: 0.01
    - pretrained:
##[optional] uncomment to use dedicated pool
pool_id: pool_xxxx
##[optional] uncomment to use volumes attached to the training job
volumes:
  - efs:
      local_path: /xx/yy/zz
      read_only: false
      nfs_server_path: xxx.xxx.xxx.xxx:/
```

Example: Creating a Training Job Based on a Custom Image

Submit the training job by specifying options on the command line:

```
ma-cli ma-job submit --image-url atelier/pytorch_1_8:pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64-20220926104358-041ba2e \
                  --code-dir obs://your-bucket/mnist/code/ \
                  --user-command "export LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH && cd /home/ma-user/modelarts/user-job-dir/code && /home/ma-user/anaconda3/envs/pytorch-1.8/bin/python main.py" \
                  --data-url obs://your-bucket/mnist/dataset/mnist/ \
                  --log-url obs://your-bucket/mnist/logs/ \
                  --train-instance-type modelarts.vm.cpu.8u \
                  --train-instance-count 1 \
                  -q
```

Sample train.yaml for a custom image:

```
# .ma/train.yaml sample (custom image)
image-url: atelier/pytorch_1_8:pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64-20220926104358-041ba2e
user-command: export LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH && cd /home/ma-user/modelarts/user-job-dir/code && /home/ma-user/anaconda3/envs/pytorch-1.8/bin/python main.py
train-instance-type: modelarts.vm.cpu.8u
train-instance-count: 1
data-url: obs://your-bucket/mnist/dataset/mnist/
code-dir: obs://your-bucket/mnist/code/
log-url: obs://your-bucket/mnist/logs/
##[optional] uncomment to set uid when use custom image mode
uid: 1000
##[optional] uncomment to upload output file/dir to obs from training platform
output:
    - name: output_dir
      obs_path: obs://your-bucket/mnist/output1/
##[optional] uncomment to download input file/dir from obs to training platform
input:
    - name: data_url
      obs_path: obs://your-bucket/mnist/dataset/mnist/
##[optional] uncomment pass hyperparameters
parameters:
    - epoch: 10
    - learning_rate: 0.01
    - pretrained:
##[optional] uncomment to use dedicated pool
pool_id: pool_xxxx
##[optional] uncomment to use volumes attached to the training job
volumes:
  - efs:
      local_path: /xx/yy/zz
      read_only: false
      nfs_server_path: xxx.xxx.xxx.xxx:/
```

Using ma-cli ma-job get-log to Query ModelArts Training Job Logs

Run the ma-cli ma-job get-log command to query the logs of a ModelArts training job.

```
$ ma-cli ma-job get-log -h
usage: ma-cli ma-job get-log [options]
  get job log details.
  example:
  # get job log by job id
  ma-cli ma-job get-log --job-id ${job_id}
options:
  -i, --job-id text       get training job details by job id.  [required]
  -t, --task-id text      get training job details by task id (default "worker-0").
  -c, --config-file text  configure file path for authorization.
  -d, --debug             debug mode. shows full stack trace when error occurs.
  -p, --profile text      cli connection profile to use. the default profile is "default".
  -H, -h, --help          show this message and exit.
```

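Parameter	Type	Required	Description
-i / --job-id	string	Yes	Queries the logs of the training job with the specified job ID.
-t / --task-id	string	No	Queries the logs of the specified task. The default is worker-0.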
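
For a job that runs on more than one instance, one get-log call per task may be needed. The Python sketch below builds those calls. It assumes that tasks are named worker-0, worker-1, and so on, which the source confirms only for the default worker-0. The helper name get_log_commands is hypothetical and not part of ma-cli.

```python
from typing import List


def get_log_commands(job_id: str, instance_count: int = 1) -> List[List[str]]:
    """Build one `ma-cli ma-job get-log` argument list per task (hypothetical helper)."""
    if instance_count < 1:
        raise ValueError("instance_count must be >= 1")
    return [
        # Assumption: task IDs follow the worker-<index> pattern of the default.
        ["ma-cli", "ma-job", "get-log", "--job-id", job_id, "--task-id", f"worker-{index}"]
        for index in range(instance_count)
    ]


commands = get_log_commands("b63e90baxxx", instance_count=2)
```

Each inner list is a complete command line that can be run separately.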

Example: query the logs of the training job with the specified job ID.

```
ma-cli ma-job get-log --job-id b63e90baxxx
```


Using ma-cli ma-job get-event to Query ModelArts Training Job Events

Run the ma-cli ma-job get-event command to view the events of a ModelArts training job.

```
$ ma-cli ma-job get-event -h
usage: ma-cli ma-job get-event [options]
  get job running event.
  example:
  # get training job running event
  ma-cli ma-job get-event --job-id ${job_id}
options:
  -i, --job-id text       get training job event by job id.  [required]
  -c, --config-file text  configure file path for authorization.
  -d, --debug             debug mode. shows full stack trace when error occurs.
  -p, --profile text      cli connection profile to use. the default profile is "default".
  -H, -h, --help          show this message and exit.
```

Parameter	Type	Required	Description
-i / --job-id	string	Yes	Queries the events of the training job with the specified job ID.

Example: view the event details of the training job with the specified ID.

```
ma-cli ma-job get-event --job-id b63e90baxxx
```


Using ma-cli ma-job get-engine to Query ModelArts Training AI Engines

Run the ma-cli ma-job get-engine command to query the AI engines available for ModelArts training.

```
$ ma-cli ma-job get-engine -h
usage: ma-cli ma-job get-engine [options]
  get job engine info.
  example:
  # get training job engines
  ma-cli ma-job get-engine
options:
  -v, --verbose           show detailed information about training engines.
  -c, --config-file text  configure file path for authorization.
  -d, --debug             debug mode. shows full stack trace when error occurs.
  -p, --profile text      cli connection profile to use. the default profile is "default".
  -H, -h, --help          show this message and exit.
```

Table 4 Parameters
Parameter	Type	Required	Description
-v / --verbose	bool	No	Switch for showing detailed information. Off by default.

Example: view the AI engines available for training jobs.

```
ma-cli ma-job get-engine
```


Using ma-cli ma-job get-flavor to Query ModelArts Training Resource Flavors

Run the ma-cli ma-job get-flavor command to query the resource flavors available for ModelArts training.

```
$ ma-cli ma-job get-flavor -h
usage: ma-cli ma-job get-flavor [options]
  get job flavor info.
  example:
  # get training job flavors
  ma-cli ma-job get-flavor
options:
  -t, --flavor-type [cpu|gpu|ascend]
                                  type of training job flavor.
  -v, --verbose                   show detailed information about training flavors.
  -c, --config-file text          configure file path for authorization.
  -d, --debug                     debug mode. shows full stack trace when error occurs.
  -p, --profile text              cli connection profile to use. the default profile is "default".
  -H, -h, --help                  show this message and exit.
```

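Table 5 Parameters
Parameter	Type	Required	Description
-t / --flavor-type	string	No	Resource flavor type. If omitted, all resource flavors are returned.
-v / --verbose	bool	No	Switch for showing detailed information. Off by default.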

Example: view the resource flavors and types available for training jobs.

```
ma-cli ma-job get-flavor
```


Using ma-cli ma-job stop to Stop a ModelArts Training Job

Run the ma-cli ma-job stop command to stop the training job with the specified job ID.

```
$ ma-cli ma-job stop -h
usage: ma-cli ma-job stop [options]
  stop training job by job id.
  example:
  stop training job by job id
  ma-cli ma-job stop --job-id ${job_id}
options:
  -i, --job-id text       get training job event by job id.  [required]
  -y, --yes               confirm stop operation.
  -c, --config-file text  configure file path for authorization.
  -d, --debug             debug mode. shows full stack trace when error occurs.
  -p, --profile text      cli connection profile to use. the default profile is "default".
  -H, -h, --help          show this message and exit.
```

Table 6 Parameters
Parameter	Type	Required	Description
-i / --job-id	string	Yes	ID of the ModelArts training job.
-y / --yes	bool	No	Forcibly stops the specified training job.

Example: stop a running training job.

```
ma-cli ma-job stop --job-id efd3e2f8xxx
```
