Source: https://gitlab.com/gitlab-org/gitlab-runner/issues/1583#note_93170156
OK, I've experimented a lot getting this going with the docker+machine executor (specifically with the amazonec2 driver, which I suspect is quite common for people looking at this thread!). What I found may also be helpful to others when debugging what's going on for them.
docker+machine is interesting because it has several relevant contexts (i.e. a file system and environment variables), which I shall refer to as:
- "runner": what is running the `gitlab-runner` binary - in my case this is an ECS-managed docker container for the `gitlab/gitlab-runner` image on docker hub, but it could be the `systemd` service configuration if you're running directly on the machine.
- "job host": the docker-machine created machine (e.g. EC2 instance) that runs the docker daemon
- "job container": the docker container for the image specified in the project `.gitlab-ci.yml` (or the default in `config.toml`)
Of course, if you're not using the docker machine (or ssh?) executor, then the runner and job host context are on the same physical machine.
With some experimenting, and spelunking through this project, I found out the following:
- The `gitlab-runner` binary is what calls `docker-credential-ecr-login`, so make sure `docker-credential-ecr-login version` in the runner context succeeds, and that the runner context is the one with IAM permissions for ECR.
- `gitlab-runner` uses the docker go client library to talk to the docker daemon, not the `docker` CLI, so it must re-implement configuration parsing and authentication. In particular, this means that `credsStore` is implemented (by !501, now merged), but not `credHelpers`.
- `DOCKER_AUTH_CONFIG` is defined and used by `gitlab-runner`, not by docker, so don't expect setting that to make the `docker` CLI work.
- `DOCKER_AUTH_CONFIG` should still be specified as a job-visible environment variable, e.g. in `config.toml` `environment`, or pipeline secret variables etc., even though it's actually read by `gitlab-runner` in the runner context, not the job container. That one is weird. I suspect using `engine-env` in `MachineOptions` to set this would not work because of this?
- `gitlab-runner` uses the provided credsStore `list` command for... some reason? Unfortunately, at some point AWS added the requirement to `docker-credential-ecr-login list` that the AWS region is provided. The simplest way to do this is to set the `AWS_REGION` environment variable - but unlike `DOCKER_AUTH_CONFIG` this must be in the runner context.
- Test the final call that actually gets the token with `echo $REGISTRY_NAME | docker-credential-ecr-login get`, where `$REGISTRY_NAME` should look like `123456789012.dkr.ecr.my-region-1.amazonaws.com` (the part of the repository name before the first `/`).
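The helper calls described in the list above can be strung together for debugging in the runner context. This is only a sketch: the function names `registry_of` and `check_ecr_helper` are made up for illustration, and it assumes `docker-credential-ecr-login` is installed, `AWS_REGION` is set and AWS credentials are available.

```shell
#!/bin/sh
# Hypothetical debugging helpers; run these in the runner context.

# Print the registry part of an image reference:
# everything before the first "/".
registry_of() {
  printf '%s\n' "${1%%/*}"
}

# Make the same helper calls discussed above, stopping at the first failure.
check_ecr_helper() {
  image="$1"
  docker-credential-ecr-login version || return 1
  # Per the notes above, 'list' needs to know the AWS region.
  docker-credential-ecr-login list || return 1
  # 'get' reads the registry name on stdin and prints the credentials,
  # token included, so don't paste the output anywhere public.
  registry_of "$image" | docker-credential-ecr-login get
}
```

For example, `check_ecr_helper 123456789012.dkr.ecr.my-region-1.amazonaws.com/my-image:latest` should end by printing credentials if everything is wired up correctly.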
Unrelated to gitlab, but also:
- By default the EC2 instance profile is exposed to docker containers that are run in it. You can test this with `curl http://169.254.169.254/latest/meta-data/iam/security-credentials/<iam-role-name>`, which will return the access key id and secret key along with other metadata. You can lock this down further with ECS task roles, but I haven't looked into that myself. This applies both to running gitlab-runner as a docker container, and to `docker-machine` created EC2 instances with the `amazonec2-iam-instance-profile` machine option.
- The only ECR permission involved in logging in with docker is `ecr:GetAuthorizationToken`, which doesn't distinguish between read and write, nor between individual repositories (it only applies at the registry level). The pull and push themselves are authorized by separate repository-level actions (e.g. `ecr:BatchGetImage` and `ecr:PutImage`), so that is the place to lock down permission to push to ECR.
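To see whether the instance profile really is visible from a given context (for example from inside a job container), the `curl` check on the instance profile can be wrapped up as below. This is a sketch under assumptions: `imds_role_visible` and the `IMDS_BASE` override are hypothetical names, `curl` must be present in the image, and the unauthenticated request only works where the metadata service still answers without a session token.

```shell
#!/bin/sh
# Hypothetical check: succeeds if the instance metadata service lists an
# IAM role from wherever this is run.
# IMDS_BASE is an illustrative override for testing, not a standard variable.
imds_role_visible() {
  base="${IMDS_BASE:-http://169.254.169.254}"
  # --fail turns HTTP errors into a non-zero exit; --max-time avoids
  # hanging when the endpoint is unreachable.
  curl --silent --fail --max-time 2 \
    "$base/latest/meta-data/iam/security-credentials/" >/dev/null
}
```

Calling `imds_role_visible && echo "instance role is exposed here"` from a job script gives a quick yes/no answer.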
In summary, to pull the job image from ECR:
- ensure the runner context has credentials with ECR permissions - including via IAM profiles if it's on EC2, but the default profile in `~/.aws/config` / `~/.aws/credentials` should also work?
- put `docker-credential-ecr-login` on the PATH for `gitlab-runner` (and don't forget to +x, of course)
- set `AWS_REGION` to the region of your ECR repository (don't think it's possible to be cross-region yet)
- `config.toml` should have `environment = ["DOCKER_AUTH_CONFIG={\"credsStore\":\"ecr-login\"}"]` in `[[runners]]`, or if you have multiple private registries(?), as a runner pipeline variable or in `.gitlab-ci.yml` variables.
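A quick pre-flight check for the runner context, covering the PATH and `AWS_REGION` points in the summary, might look like the sketch below. `check_runner_context` is a made-up name, and it does not verify the AWS credentials themselves.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the runner context.
# Reports every problem it finds and returns non-zero if there was any.
check_runner_context() {
  status=0
  if ! command -v docker-credential-ecr-login >/dev/null 2>&1; then
    echo "docker-credential-ecr-login is not on PATH" >&2
    status=1
  fi
  if [ -z "${AWS_REGION:-}" ]; then
    echo "AWS_REGION is not set" >&2
    status=1
  fi
  return "$status"
}
```

It has to run with the same environment as `gitlab-runner` (same container or service user), otherwise the result says nothing about what the runner sees.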
This won't get you the ability to use ECR in your CI job scripts though. For that you have a few options, but it's easy enough to extend the solution:
- grant the docker client in the job container access to the docker daemon on the job host (installed by docker-machine) by sharing `/var/run/docker.sock`
- make sure that in the job container `/root/.docker/config.json` (remember, `DOCKER_AUTH_CONFIG` is not read by the `docker` CLI) has `{"credsStore":"ecr-login"}`, and that `docker-credential-ecr-login` is on the path
- make sure the job container context has AWS credentials with ECR permissions, so `docker-credential-ecr-login` can get the token, same as above
- make sure you have the `docker` client binary, of course! You can use the `docker` image, or also mount the job host `docker` binary.
Note that docker doesn't require `AWS_REGION`, it only uses `get` with the actually accessed registry.
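One option for the `config.json` requirement is to have the job script write the file itself, sketched below. `ensure_docker_config` is a hypothetical helper; it relies on the docker CLI looking in `$DOCKER_CONFIG` or, by default, `~/.docker`, and it deliberately leaves any existing config alone.

```shell
#!/bin/sh
# Hypothetical job-script helper: make sure the docker CLI's own config
# names the ecr-login credential store.
ensure_docker_config() {
  dir="${DOCKER_CONFIG:-$HOME/.docker}"
  mkdir -p "$dir"
  # Only write the file when it is missing or empty, so an existing
  # config (e.g. one mounted from the job host) is not clobbered.
  if [ ! -s "$dir/config.json" ]; then
    printf '%s\n' '{"credsStore":"ecr-login"}' > "$dir/config.json"
  fi
}
```

In a job this would be followed by something like `docker push "$REGISTRY_NAME/my-image:latest"`, with the helper binary and AWS credentials still needed as described in the list.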
The way I did this is update `config.toml` to have:
```toml
[[runners]]
  [runners.docker]
    volumes = [
      "/cache",
      # So 'docker' client works in CI
      "/var/run/docker.sock:/var/run/docker.sock",
      # So 'docker push <ECR image>' works in CI
      "/root/.docker:/root/.docker",
      "/usr/local/bin/docker-credential-ecr-login:/usr/local/bin/docker-credential-ecr-login"
    ]
  [runners.machine]
    MachineOptions = [
      "amazonec2-iam-instance-profile=RUNNER_INSTANCE_PROFILE_NAME",
      "amazonec2-userdata=/path/to/userdata"
    ]
```
where `/path/to/userdata` contains something like:
```bash
#!/bin/bash
set -eu

# Fetch the credential helper onto the job host and make it executable
curl --fail \
  https://MY_BUCKET.s3-MY_REGION.amazonaws.com/SOME_PREFIX/docker-credential-ecr-login \
  -o /usr/local/bin/docker-credential-ecr-login
chmod +x /usr/local/bin/docker-credential-ecr-login

# Tell the docker CLI on the job host to use it
mkdir -p ~/.docker
echo > ~/.docker/config.json '{ "credsStore": "ecr-login" }'
```
And the URL to `docker-credential-ecr-login` works because the object was uploaded with `--acl public-read`.
Thanks to all the above commenters for helping me nail this down!