Batch-Migrating GitLab Intranet Repositories to CodeArts Repo

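Background

The built-in repository migration of CodeArts Repo currently supports only migration between platforms that are reachable over the public network. It offers no quick way to migrate repositories from a code hosting platform that a customer hosts on its own intranet. This document therefore provides a script that batch-migrates repositories from an intranet code hosting platform to Repo.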

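Configuring an SSH Public Key for Accessing CodeArts Repo

Before batch-migrating GitLab repositories to CodeArts Repo, install the Git Bash client and add a locally generated SSH public key to CodeArts Repo. The procedure is as follows:

Run Git Bash and check whether an SSH key has already been generated on this computer.
If you use the RSA algorithm, run the following command in Git Bash: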
```
cat ~/.ssh/id_rsa.pub
```
If you use the ED25519 algorithm, run the following command in Git Bash:
```
cat ~/.ssh/id_ed25519.pub
```
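- If the output is "No such file or directory", no SSH key has been generated on this computer. Continue with the key generation step.
- If the output is a string that starts with ssh-rsa or ssh-ed25519, an SSH key already exists on this computer. To use the existing key, skip ahead to the step that copies the SSH public key to the clipboard. To generate a new key, continue with the key generation step.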
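The check above can also be scripted. The following is a minimal Python sketch, not part of the original procedure, that reports which of the two default public key files exist and which key type each one holds. The function name find_public_keys is hypothetical, and the sketch assumes the default file names id_rsa.pub and id_ed25519.pub used in this document.

```python
from pathlib import Path

# Default public key file names used in this document.
DEFAULT_KEY_FILES = ("id_rsa.pub", "id_ed25519.pub")


def find_public_keys(ssh_dir):
    """Return {file name: key type} for each default public key found in ssh_dir.

    Hypothetical helper: the key type is the first whitespace-separated field
    of the file, for example "ssh-rsa" or "ssh-ed25519".
    """
    found = {}
    for file_name in DEFAULT_KEY_FILES:
        key_path = Path(ssh_dir) / file_name
        if not key_path.is_file():
            continue  # same situation as "No such file or directory"
        fields = key_path.read_text(encoding="utf-8").split()
        if fields:
            found[file_name] = fields[0]
    return found
```

Calling find_public_keys(Path.home() / ".ssh") checks the default key location. An empty result means that no key has been generated under the default names.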
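Generate an SSH key. If you use the RSA algorithm, run the following command in Git Bash:
```
ssh-keygen -t rsa -b 4096 -C your_email@example.com
```
Here, -t rsa sets the key type to RSA, -b 4096 sets the key length to 4096 bits (a longer RSA key is more secure), and -C your_email@example.com adds a comment to the generated public key file so that you can tell what this key pair is used for.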

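If you use the ED25519 algorithm, run the following command in Git Bash:
```
ssh-keygen -t ed25519 -C your_email@example.com
```
Here, -t ed25519 sets the key type to ED25519, and -C your_email@example.com adds a comment to the generated public key file so that you can tell what this key pair is used for. ED25519 keys have a fixed length, so no -b option is needed; ssh-keygen ignores -b for this key type.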

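After you enter the key generation command, press Enter to accept the default location. The key is stored at ~/.ssh/id_rsa or ~/.ssh/id_ed25519, and the corresponding public key file is ~/.ssh/id_rsa.pub or ~/.ssh/id_ed25519.pub.
Copy the SSH public key to the clipboard. Run the command for your operating system. The commands below use the RSA public key; for an ED25519 key, replace id_rsa.pub with id_ed25519.pub.
- Windows:
```
clip < ~/.ssh/id_rsa.pub
```
- Mac:
```
pbcopy < ~/.ssh/id_rsa.pub
```
- Linux (xclip required):
```
xclip -sel clip < ~/.ssh/id_rsa.pub
```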
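Log in and go to the Repo repository list page. Click your nickname in the upper right corner and choose "Personal Settings" > "Code Hosting" > "SSH Keys" to open the SSH key configuration page.
Alternatively, on the Repo repository list page, click "Set My SSH Keys" in the upper right corner to open the same page.
In "Title", enter a name for the new key, and paste the SSH public key that you copied to the clipboard into "Key". Click OK. A page then appears with the message "The key has been set successfully. Click Return Now; if there is no operation, the page redirects automatically after 3s", which indicates that the key is configured.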

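Batch-Migrating GitLab Intranet Repositories to CodeArts Repo

Download and install Python 3.
Log in to GitLab and obtain a private_token: in "User Settings", choose "Access Tokens" > "Add new token".
Generate an SSH public key locally and add it to both GitLab and CodeArts Repo. For CodeArts Repo, see the section above on configuring the SSH public key for accessing CodeArts Repo.
Debug the API that obtains a user token with the username and password of your Huawei Cloud account. To see how to fill in the parameters, click "Request Example" on the right side of the API debugging page. After filling in the parameters, click "Debug", then copy the returned user token and save it locally.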
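Before running the migration, it can be useful to confirm that GitLab accepts the private_token. The sketch below builds the same kind of paged request that the migration script in this document sends: the projects URL with page and per_page appended, and the token in the private-token header. build_gitlab_request is a hypothetical helper name, and the host and token values are placeholders.

```python
import urllib.request


def build_gitlab_request(source_host_url, private_token, page=1, per_page=100):
    """Build a GET request for one page of the GitLab project list.

    Hypothetical helper. source_host_url is expected to already contain a
    query string, as in "http://{source_host}/api/v4/projects?simple=true".
    """
    url = "%s&page=%s&per_page=%s" % (source_host_url, page, per_page)
    headers = {"private-token": private_token}
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder values; replace them with your intranet GitLab address and token.
request = build_gitlab_request(
    "http://gitlab.example.internal/api/v4/projects?simple=true",
    "your_private_token",
)
print(request.get_method(), request.full_url)
# Sending the request needs network access to GitLab:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The sketch only constructs the request, so it runs without network access. Uncomment the last lines on a machine that can reach your GitLab instance to actually send it.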

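Use the user token to configure the "config.json" file. source_host_url is the API address of your intranet GitLab, and repo_api_prefix is the OpenAPI address of CodeArts Repo.

```
{
	"source_host_url": "http://{source_host}/api/v4/projects?simple=true",
	"private_token": "private_token obtained from GitLab",
	"repo_api_prefix": "https://${open_api}",
	"x_auth_token": "user token"
}
```

Log in to the CodeArts homepage, create a project, and save your project ID.
Use the project ID to configure the "plan.json" file. The following example configures the migration of two repositories; adjust it as needed. The first element of each entry is the path_with_namespace of the source repository in GitLab. Here g1/g2/g3 is the repository group path. If the groups have not been created on the page beforehand, they are created automatically according to this configuration.
```
[
	["path_with_namespace", "project ID", "g1/g2/g3/target repository name 1"],
	["path_with_namespace", "project ID", "g1/g2/g3/target repository name 2"]
]
```
- To create a repository group, go to the CodeArts Repo homepage, click the drop-down list next to "New Repository", and choose "New Repository Group".
- A repository name must start with a letter, digit, or underscore. It may contain letters, digits, hyphens, underscores, and periods, but must not end with .git, .atom, or a period.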
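The format rules above can be checked before the migration starts. The following sketch validates plan.json entries against the rules stated in this section: three elements per entry and a target repository name that follows the naming rule. It also assumes the limit of at most three group levels that the migration script in this document enforces. The names validate_plan and is_valid_repo_name are hypothetical.

```python
import json
import re

# Starts with a letter, digit, or underscore; then letters, digits,
# hyphens, underscores, or periods.
REPO_NAME_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]*")
FORBIDDEN_SUFFIXES = (".git", ".atom", ".")
MAX_GROUP_LEVELS = 3


def is_valid_repo_name(name):
    """Check a target repository name against the naming rule above."""
    if not REPO_NAME_PATTERN.fullmatch(name):
        return False
    return not name.endswith(FORBIDDEN_SUFFIXES)


def validate_plan(entries):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, list) or len(entry) != 3:
            problems.append("entry %d: expected 3 elements" % index)
            continue
        segments = entry[2].split("/")
        groups, repo_name = segments[:-1], segments[-1]
        if len(groups) > MAX_GROUP_LEVELS:
            problems.append("entry %d: more than %d group levels" % (index, MAX_GROUP_LEVELS))
        if not is_valid_repo_name(repo_name):
            problems.append("entry %d: invalid repository name %r" % (index, repo_name))
    return problems


# Illustrative entries only; "project-id" is a placeholder.
sample = json.loads('[["team/app", "project-id", "g1/g2/g3/app_repo"],'
                    ' ["team/lib", "project-id", "g1/lib.git"]]')
for problem in validate_plan(sample):
    print(problem)
```

To check a real file, load it with json.load and pass the result to validate_plan.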

On your local machine, create the migrate_to_repo.py file with the following content.

```
#!/usr/bin/python
# -*- coding: utf-8 -*-
import json
import logging
import os
import subprocess
import time
import urllib.error
import urllib.parse
import urllib.request
from logging import handlers

# Whether to skip a repository when one with the same name already exists
skip_same_name_repo = True
status_ok = 200
status_created = 201
status_internal_server_error = 500
status_not_found = 404
http_method_post = "POST"
code_utf8 = 'utf-8'
file_source_repo_info = 'source_repos.json'
file_target_repo_info = 'target_repos.json'
file_config = 'config.json'
file_plan = 'plan.json'
file_log = 'migrate.log'
x_auth_token = 'x-auth-token'


class logger(object):
    # Logs to the console and to a daily-rotated log file
    def __init__(self, filename):
        format_str = logging.Formatter('%(asctime)s - %(pathname)s[line:%(lineno)d] - %(levelname)s: %(message)s')
        self.logger = logging.getLogger(filename)
        self.logger.setLevel(logging.INFO)
        sh = logging.StreamHandler()
        sh.setFormatter(format_str)
        th = handlers.TimedRotatingFileHandler(filename=filename, when='D', backupCount=3, encoding=code_utf8)
        th.setFormatter(format_str)
        self.logger.addHandler(sh)
        self.logger.addHandler(th)


log = logger(file_log)


def make_request(url, data=None, headers=None, method='GET'):
    # Copy the arguments so that the caller's objects are never mutated
    data = data if data is not None else {}
    headers = dict(headers) if headers else {}
    headers["content-type"] = 'application/json'
    headers['accept-charset'] = code_utf8
    params = json.dumps(data)
    params = bytes(params, 'utf8')
    try:
        import ssl
        # Note: this disables HTTPS certificate verification for this process
        ssl._create_default_https_context = ssl._create_unverified_context
        request = urllib.request.Request(url, data=params, headers=headers, method=method)
        r = urllib.request.urlopen(request)
        if r.status != status_ok and r.status != status_created:
            log.logger.error('request error: ' + str(r.status))
            return r.status, ""
    except urllib.error.HTTPError as e:
        log.logger.error('request with code: ' + str(e.code))
        msg = str(e.read().decode(code_utf8))
        log.logger.error('request error: ' + msg)
        return status_internal_server_error, msg
    content = r.read().decode(code_utf8)
    return status_ok, content


def read_migrate_plan():
    log.logger.info('read_migrate_plan start')
    with open(file_plan, 'r') as f:
        migrate_plans = json.load(f)
    plans = []
    for m_plan in migrate_plans:
        if len(m_plan) != 3:
            log.logger.error("line format not match \"source_path_with_namespace\",\"project_id\",\"target_namespace\"")
            return status_internal_server_error, []
        namespace = m_plan[2].split("/")
        if len(namespace) < 1 or len(namespace) > 4:
            log.logger.error("group level support 0 to 3")
            return status_internal_server_error, []
        l = len(namespace)
        plan = {
            "path_with_namespace": m_plan[0],
            "project_id": m_plan[1],
            "groups": namespace[0:l - 1],
            "repo_name": namespace[l - 1]
        }
        plans.append(plan)
    return status_ok, plans


def get_repo_by_plan(namespace, repos):
    if namespace not in repos:
        log.logger.info("%s not found in gitlab, skip" % namespace)
        return status_not_found, {}
    repo = repos[namespace]
    return status_ok, repo


def repo_info_from_source(config):
    if os.path.exists(file_source_repo_info):
        log.logger.info('get_repos skip: %s already exist' % file_source_repo_info)
        return status_ok
    log.logger.info('get_repos start')
    headers = {'private-token': config['private_token']}
    url = config['source_host_url']
    per_page = 100
    page = 1
    data = {}
    while True:
        url_with_page = "%s&page=%s&per_page=%s" % (url, page, per_page)
        status, content = make_request(url_with_page, headers=headers)
        if status != status_ok:
            return status
        repos = json.loads(content)
        for repo in repos:
            namespace = repo['path_with_namespace']
            repo_info = {'name': repo['name'], 'id': repo['id'], 'path_with_namespace': namespace,
                         'ssh_url': repo['ssh_url_to_repo']}
            data[namespace] = repo_info
        if len(repos) < per_page:
            break
        page = page + 1
    with open(file_source_repo_info, 'w') as f:
        json.dump(data, f, indent=4)
    log.logger.info('get_repos end with %s' % len(data))
    return status_ok


def get_repo_dir(repo):
    return "repo_%s" % repo['id']


def exec_cmd(cmd, ssh_url, dir_name):
    log.logger.info("will exec %s %s" % (cmd, ssh_url))
    pr = subprocess.Popen(cmd + " " + ssh_url, cwd=dir_name, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (out, error) = pr.communicate()
    log.logger.info("stdout of %s is:%s" % (cmd, str(out)))
    log.logger.info("stderr of %s is:%s" % (cmd, str(error)))
    if "error" in str(error) or "err" in str(error) or "failed" in str(error):
        log.logger.error("%s failed" % cmd)
        return status_internal_server_error
    return status_ok


def clone_from_source(config, plans):
    log.logger.info('clone_repos start')
    with open(file_source_repo_info, 'r') as f:
        repos = json.load(f)
    for plan in plans:
        status, repo = get_repo_by_plan(plan["path_with_namespace"], repos)
        if status == status_not_found:
            return status
        name = repo["name"]
        dir_name = get_repo_dir(repo)
        folder = os.path.exists(dir_name)
        if folder:
            log.logger.info("skip clone " + name)
            continue
        os.makedirs(dir_name)
        status = exec_cmd("git clone --mirror", repo['ssh_url'], dir_name)
        if status != status_ok:
            return status
    log.logger.info('clone_repos end')
    return status_ok


def get_groups(config, project_id):
    log.logger.info('get_groups start')
    headers = {x_auth_token: config['x_auth_token']}
    api_prefix = config['repo_api_prefix']
    limit = 100
    offset = 0
    data = {}
    while True:
        url_with_page = "%s/v4/%s/manageable-groups?offset=%s&limit=%s" % (api_prefix, project_id, offset, limit)
        status, content = make_request(url_with_page, headers=headers)
        if status != status_ok:
            return status, dict()
        rows = json.loads(content)
        for row in rows:
            full_name = row['full_name']
            data[full_name] = row
        if len(rows) < limit:
            break
        offset = offset + len(rows)
    log.logger.info('get_groups end with %s' % len(data))
    return status_ok, data


def create_group(config, project_id, name, parent, has_parent):
    log.logger.info('create_group start')
    headers = {x_auth_token: config['x_auth_token']}
    api_prefix = config['repo_api_prefix']
    data = {
        'name': name,
        'visibility': 'private',
        'description': ''
    }
    if has_parent:
        data['parent_id'] = parent['id']
    url = "%s/v4/%s/groups" % (api_prefix, project_id)
    status, content = make_request(url, data=data, headers=headers, method='POST')
    if status != status_ok:
        log.logger.error('create_group error: %s', str(status))
        return status
    return status_ok


# Create a repository in the specified repository group
def create_repo(config, project_id, name, parent, has_parent):
    log.logger.info('create_repo start')
    headers = {x_auth_token: config['x_auth_token']}
    api_prefix = config['repo_api_prefix']
    data = {
        'name': name,
        'project_uuid': project_id,
        'enable_readme': 0
    }
    if has_parent:
        data['group_id'] = parent['id']
    url = "%s/v1/repositories" % api_prefix
    status, content = make_request(url, data=data, headers=headers, method='POST')
    # The response is matched against this Chinese text, which means
    # "a repository or repository group with the same name"
    if "同名仓库或代码组" in content:
        log.logger.info("repo %s already exist. %s" % (name, content))
        log.logger.info("skip same name repo %s: %s" % (name, skip_same_name_repo))
        return check_repo_conflict(config, project_id, parent, name)
    elif status != status_ok:
        log.logger.error('create_repo error: %s', str(status))
        return status, ""
    response = json.loads(content)
    repo_uuid = response["result"]["repository_uuid"]
    # Check the repository after creation, retrying up to three times
    for retry in range(1, 4):
        status, ssh_url = get_repo_detail(config, repo_uuid)
        if status != status_ok:
            if retry == 3:
                return status, ""
            time.sleep(retry * 2)
            continue
        break
    return status_ok, ssh_url


def check_repo_conflict(config, project_id, group, name):
    if not skip_same_name_repo:
        return status_internal_server_error, ""
    log.logger.info('check_repo_conflict start')
    headers = {x_auth_token: config['x_auth_token']}
    api_prefix = config['repo_api_prefix']
    url_with_page = "%s/v2/projects/%s/repositories?search=%s" % (api_prefix, project_id, name)
    status, content = make_request(url_with_page, headers=headers)
    if status != status_ok:
        return status, ""
    rows = json.loads(content)
    for row in rows["result"]["repositories"]:
        if "full_name" in group and "group_name" in row:
            g = group["full_name"].replace(" ", "")
            if row["group_name"].endswith(g):
                return status_ok, row["ssh_url"]
        elif "full_name" not in group and name == row['repository_name']:
            # The repository is not in any repository group
            return status_ok, row["ssh_url"]
    log.logger.info('check_repo_conflict end, failed to find: %s' % name)
    return status_internal_server_error, ""


def get_repo_detail(config, repo_uuid):
    log.logger.info('get_repo_detail start')
    headers = {x_auth_token: config['x_auth_token']}
    api_prefix = config['repo_api_prefix']
    url_with_page = "%s/v2/repositories/%s" % (api_prefix, repo_uuid)
    status, content = make_request(url_with_page, headers=headers)
    if status != status_ok:
        return status, ""
    rows = json.loads(content)
    log.logger.info('get_repo_detail end')
    return status_ok, rows["result"]["ssh_url"]


def process_plan(config, plan):
    # Get the list of repository groups in the project
    project_id = plan["project_id"]
    status, group_dict = get_groups(config, project_id)
    if status != status_ok:
        return status, ""
    group = ""
    last_group = {}
    has_group = False
    for g in plan["groups"]:
        # Check the target repository group; if it exists, move on to the next level
        if group == "":
            group = " %s" % g
        else:
            group = "%s / %s" % (group, g)
        if group in group_dict:
            last_group = group_dict[group]
            has_group = True
            continue
        # If it does not exist, create it and refresh the group list
        status = create_group(config, project_id, g, last_group, has_group)
        if status != status_ok:
            return status, ""
        status, group_dict = get_groups(config, project_id)
        if status != status_ok:
            return status, ""
        last_group = group_dict[group]
        has_group = True
    status, ssh_url = create_repo(config, project_id, plan["repo_name"], last_group, has_group)
    if status != status_ok:
        return status, ""
    return status, ssh_url


def create_group_and_repos(config, plans):
    if os.path.exists(file_target_repo_info):
        log.logger.info('create_group_and_repos skip: %s already exist' % file_target_repo_info)
        return status_ok
    log.logger.info('create_group_and_repos start')
    with open(file_source_repo_info, 'r') as f:
        repos = json.load(f)
    target_repo_info = {}
    for plan in plans:
        status, ssh_url = process_plan(config, plan)
        if status != status_ok:
            return status
        status, repo = get_repo_by_plan(plan["path_with_namespace"], repos)
        if status == status_not_found:
            return status
        repo['codehub_sshurl'] = ssh_url
        target_repo_info[repo['path_with_namespace']] = repo
    with open(file_target_repo_info, 'w') as f:
        json.dump(target_repo_info, f, indent=4)
    log.logger.info('create_group_and_repos end')
    return status_ok


def push_to_target(config, plans):
    log.logger.info('push_repos start')
    with open(file_target_repo_info, 'r') as f:
        repos = json.load(f)
    for r in repos:
        repo = repos[r]
        name = repo["name"]
        dir_name = get_repo_dir(repo)
        status = exec_cmd("git config remote.origin.url", repo['codehub_sshurl'], dir_name + "/" + name + ".git")
        if status != status_ok:
            log.logger.error("%s git config failed" % name)
            return
        status = exec_cmd("git push --mirror -f", "", dir_name + "/" + name + ".git")
        if status != status_ok:
            log.logger.error("%s git push failed" % name)
            return
    log.logger.info('push_repos end')


def main():
    with open(file_config, 'r') as f:
        config = json.load(f)
    # Read the migration plan
    status, plans = read_migrate_plan()
    if status != status_ok:
        return
    # Get the repository list of the self-hosted GitLab and write it to file_source_repo_info
    if repo_info_from_source(config) != status_ok:
        return
    # Clone the repositories to the local machine
    status = clone_from_source(config, plans)
    if status != status_ok:
        return
    # Call the API to create the repositories and record their addresses in file_target_repo_info
    if create_group_and_repos(config, plans) != status_ok:
        return
    # The push uses SSH; configure the SSH key in CodeArts Repo beforehand
    push_to_target(config, plans)


if __name__ == '__main__':
    main()
```

Run the following command to start the script and complete the batch migration of the repositories.
```
python migrate_to_repo.py
```
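The script's exec_cmd decides whether a git command failed by searching its stderr output for the strings "error", "err", or "failed". Git also writes informational messages to stderr, so a repository whose name happens to contain "err" could be reported as a failure even though the command succeeded. An alternative, shown here only as a sketch and not as part of the provided script, is to check the process exit code. run_command is a hypothetical name, and the example runs the Python interpreter instead of git so that it works without a repository.

```python
import subprocess
import sys

STATUS_OK = 200
STATUS_INTERNAL_SERVER_ERROR = 500


def run_command(args, cwd=None):
    """Run a command given as an argument list and map its exit code to a status.

    Hypothetical alternative to exec_cmd: no shell is involved, and failure
    is detected from the exit code rather than from the text on stderr.
    """
    completed = subprocess.run(args, cwd=cwd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if completed.returncode != 0:
        return STATUS_INTERNAL_SERVER_ERROR, completed.stderr.decode("utf-8", "replace")
    return STATUS_OK, completed.stdout.decode("utf-8", "replace")


# Stand-ins for git: one command that succeeds while writing text containing
# "err" to stderr, and one that fails with a non-zero exit code.
noisy_ok = [sys.executable, "-c", "import sys; sys.stderr.write('terraform.git')"]
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(run_command(noisy_ok)[0])
print(run_command(failing)[0])
```

With git, the argument list would take a form such as ["git", "clone", "--mirror", ssh_url], with cwd set to the working directory that exec_cmd receives as dir_name.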
