You are an enterprise engineer!
Because GitHub Enterprise Server (GHES) drives our revenue and supports our largest and most recognizable clients, every engineer at GitHub, including yourself, is an enterprise engineer! This lab is an opportunity to practice a few of the concepts you'll need to test code that you write in a GHES environment.
As a prerequisite to this lab, you should watch each part of the Engineering for Enterprise Lecture(TODO). The lecture provides an overview of the tools and concepts that we will be practicing during this self-directed exercise. After watching the lecture, you should be familiar with the key concepts required to complete this lab:
- Deploying your pull request to GitHub.com is not the end of the road.
- The GHES Release schedule and how it impacts your daily work.
- What GHES Backports are and how they are applied to a GHES release branch
- What the Build Pipelines Development Environment (bp-dev) is and how it is used to create a development environment for GHES.
During the Sparkles lab, you created and deployed your very own user profile page to Sparkles. In this lab, we will take that one step further: we will add Sparkles as a service to GHES and view our change in a GHES instance. This is a somewhat contrived example, since Sparkles is not currently part of GHES and likely never will be. We are going to use a Sparkles container that was built for you during CI when you deployed your profile page. This build will be incorporated into a bp-dev environment that you will create.
Because this is just an exercise, there are a few things to keep in mind:
- Sparkles is not a customer service
- Nothing Sparkles-related should make it to production, and preferably not to the GHES repo either. Please don't commit your changes.
- We would suggest that you try to follow the steps below first, but if you get stuck, a video of the process can be found on ReWatch(TODO)
- You have completed the Sparkles Lab onboarding to moda.
- You have set up the dev VPN.
- Spin up a bp-dev instance
  - Run the `.bp-dev launch ghe --enable_ui --tags onboarding` chatops command in the #sparkle-ops Slack channel.
  - Hubot will spin up your environment and return the hostname of a bp-dev instance after 10-15 minutes.
    - NOTE: There may be a bug impacting the notification, so if after 15 minutes Hubot has not notified you that the instance is ready, you can use the command `.bp-dev list ghe` to see if your instance has been built and is running. If it is, you can continue and use the hostname to connect via SSH in the next step.
- Connect via the dev VPN you set up earlier.
- Use the following command to open an environment workspace in VS Code. (This information can also be found in the Slack message that informed you of the successful creation of the bp-dev instance.)
  code --new-window --remote ssh-remote+build@{HANDLE-RANDOM}.service.bpdev-us-east-1.github.net /workspace/enterprise2

Welcome to the enterprise2 world! Don't feel overwhelmed by the folders; we will guide you to the essential parts in the next steps.
Troubleshooting your connection
- If you fail to connect, you may have missed setting up your SSH keypair in the prerequisites (a consolidated sketch of these steps follows below).
  - Create an SSH keypair (if you don't already have one).
  - Add the key to GitHub.com.
  - Make sure you can see the public key by navigating to https://github.com/{USERNAME}.keys.
  - Make sure your key is active in your local SSH agent.
    - From a terminal prompt, run `ssh-add -l` to list your active keys.
    - If your keypair is not active, run `ssh-add /path/to/your/private-key` to add it. When you run this, you will be prompted to enter the passphrase you used when you created the key.
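  If you need to set the keys up from scratch, here is a minimal sketch of the whole flow, assuming an ed25519 key at the default path (adjust paths and comment to taste):

  ```bash
  # generate a new keypair; accept the default location and pick a passphrase
  ssh-keygen -t ed25519 -C "your-github-handle"
  # make sure an agent is running, then load the key into it
  eval "$(ssh-agent -s)"
  ssh-add ~/.ssh/id_ed25519
  # confirm the key is loaded
  ssh-add -l
  ```

  Remember to add the public key (`~/.ssh/id_ed25519.pub`) to your GitHub.com account so it shows up at https://github.com/{USERNAME}.keys.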
- If it has been a while since you created your instance, it is possible that it has been shut down. Idle instances are turned off after a period of time to save resources.
  - To see your instances, run `.bp-dev list ghe`.
  - Then restart your instance by running `.bp-dev restart ghe {YOUR_INSTANCE_NAME}`.
- The developer VPN does not always set up the right gateways needed to route traffic properly. If the SSH command simply hangs and times out with no response, you may need to manually tell Viscosity about the IP of your instance.
  - Find the IP address of your bp-dev instance by using `.bp-dev list ghe`.
  - Follow these instructions to instruct Viscosity to route traffic to that IP through the VPN.
- Open a terminal in VS Code and start the enterprise2 instance with the following command. Wait for the build and configuration to finish. (NOTE: This could take 20-30 minutes for the first run.)
  ./chroot-stop.sh && ./chroot-reset.sh && ./chroot-build.sh && ./chroot-start.sh && ./chroot-configure.sh
- Run `sudo update-reverse-proxy` in the VS Code terminal to allow external access to your enterprise instance.
- Now you can reach your GitHub Enterprise instance at `www.{HANDLE-RANDOM}.service.bpdev-us-east-1.github.net` with user `ghe-admin` and password `passworD1`.
- In a terminal inside VS Code, run `./chroot-ssh.sh`. Now you are inside the GHES instance over an SSH connection. We will use this connection to interact with the instance in the following tasks.
GHES is made up of a collection of services that are included in the instance as Linux containers. The next few steps will enable a Sparkles container within the instance. We'll use the latest container built from main in the Sparkles repository.
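To get a feel for what that looks like, you can peek at the existing services from inside the instance. This is only an optional sketch, assuming you are still in the `./chroot-ssh.sh` session:

```bash
# the existing GHES services run as Docker containers inside the instance
sudo docker ps --format "table {{.Names}}\t{{.Image}}" | head -n 15
# Nomad tracks each of them as a job
nomad job status | head -n 15
```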
Find the latest Sparkles image tag and container digest SHA256 on the Sparkles Packages page. Click on the version of the image you want to use in the list "Recent tagged image versions" to get the SHA256. Note down the tag and SHA of your Docker image.
We use Nomad to orchestrate the services and containers needed to run a GHES instance. In order to run a nomad service, we need a nomad job specification file.
- Use `./chroot-ssh.sh` to SSH inside the GHES bp-dev instance in VS Code.
- Use `nomad job status` to list all Nomad jobs and their status. There is more to read about the `nomad` command line tool.
- Use `ls -al /etc/nomad-jobs/` to list all Nomad job specification files.
- Let's create a job specification for our Sparkles application with:
  sudo mkdir /etc/nomad-jobs/sparkles/
  sudo vi /etc/nomad-jobs/sparkles/sparkles.hcl
- Adjust the `image-name` with the information you noted previously in the following template and copy it into vim.
Sparkles Nomad Template V1
job "sparkles" {
datacenters = [ "default" ]
type = "service"
group "sparkles" {
restart {
attempts = 50
delay = "5s"
mode = "delay"
interval = "5m"
}
constraint {
attribute = "${meta.cluster_roles}"
operator = "set_contains_any"
value = "web-server"
}
count = 1
update {
canary = 1
max_parallel = 1
min_healthy_time = "5s"
healthy_deadline = "5m"
progress_deadline = "10m"
auto_revert = true
auto_promote = true
}
spread {
attribute = "${meta.node_uuid}"
}
network {
port "http" {
static = 3000
}
}
task "sparkles" {
driver = "docker"
config {
image = "<your-sparkles-image-name>"
memory_hard_limit = "${meta.memory_mb}"
network_mode = "host"
logging {
type = "journald"
config {
tag = "sparkles"
}
}
}
kill_timeout = "120s"
kill_signal = "SIGTERM"
shutdown_delay = "30s"
service {
name = "sparkles"
port = "http"
check {
type = "http"
path = "/_ping"
interval = "15s"
timeout = "2s"
}
}
resources {
cpu = 100
memory = 32
}
env {
LOGGING_LEVEL = "info"
HTTP_ADDR = ":${NOMAD_PORT_http}"
STATS_PERIOD = "1s"
STATS_ADDR = "127.0.0.1:8125"
RAILS_ENV = "production"
SECRET_KEY_BASE = "1234567890"
MYSQL_TLS = "false"
MYSQL_RW_HOST = "127.0.0.1"
MYSQL_RW_PORT = "3308"
MYSQL_RW_DB_NAME = "sparkles"
RAILS_SERVE_STATIC_FILES = "true"
DISABLE_AUTH = "true"
TAGLESS_METRICS = "true"
}
}
}
}
- Save the job specification file and run the Nomad job with:
sudo nomad run -detach /etc/nomad-jobs/sparkles/sparkles.hcl
- Query the status of the job with:
  nomad job status sparkles
- We see that the application stays in a `pending` state:
Allocations
ID Node ID Task Group Version Desired Status Created Modified
9ae0bfe8 434492b0 sparkles 0 run pending 15h5m ago 15h11s ago
- Can you find out why?
TIP: Nomad jobs run in Docker containers, which are called allocations in the Nomad context. You can find the IDs of the allocations using `nomad job status sparkles` and use the `nomad alloc` command to inspect the container.
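For example, a short sketch of that flow, using the allocation ID from the table above (substitute your own):

```bash
# list the job and its allocations; note the allocation ID
nomad job status sparkles
# inspect one allocation: its recent events explain why it is stuck in pending
nomad alloc status 9ae0bfe8
# once the task actually starts, its logs are available too
nomad alloc logs 9ae0bfe8 sparkles
```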
- With `nomad alloc status <alloc-id>` you should see an error similar to the following, showing that the Docker image cannot be pulled:
2024-12-24T10:04:03Z Driver Failure Failed to pull `sparkles:9cb3f38b09325f1fd595e1d65706a3333b6a5035`: API error (500): Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on [::1]:53: server misbehaving
GHES is delivered to customers as a complete solution; any image used inside the instance is packaged into the server. There should be no need to contact a Docker registry to pull images.
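As a quick sanity check (a sketch; run inside the `./chroot-ssh.sh` session), you can confirm that a Sparkles image was never packaged into the instance:

```bash
# images that shipped with this GHES build
sudo docker images | head -n 10
# no sparkles image yet, which is why Nomad cannot start the allocation
sudo docker images | grep sparkles || echo "no sparkles image packaged"
```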
Now, instead of directly messing with the instance, we will do it the right way and build and deploy with the scripts provided in GHES.
Let's exit the SSH connection with `exit`.
- Locate `docker-image-list-ghe` in the enterprise2 folder in VS Code.
- Add the image tag and SHA for Sparkles. This ensures the image is copied to the instance during the build.
sparkles=containers.pkg.github.com/github/sparkles/sparkles-bionic:<image-tag>@sha256:<image-sha>
- Find the folder `vm_files/etc/consul-templates/etc/nomad-jobs`. All job specifications are stored here.
- Create a Nomad template for Sparkles under the path `vm_files/etc/consul-templates/etc/nomad-jobs/sparkles/sparkles.hcl.ctmpl`. Copy the following template without changing anything.
Sparkles Nomad Template V2
{{ $configapply := file "/etc/github/configapply.json" | parseJSON }}
job "sparkles" {
datacenters = [ {{ $configapply.nomad_primary_datacenter | toJSON }} ]
type = "service"
group "sparkles" {
{{ file "/etc/consul-templates/etc/nomad-jobs/restart_policy.hcl" }}
constraint {
attribute = "${meta.cluster_roles}"
operator = "set_contains_any"
value = "web-server"
}
count = {{ $configapply.web_server_count }}
update {
canary = {{ $configapply.web_server_count }}
max_parallel = {{ $configapply.web_server_count }}
min_healthy_time = "5s"
healthy_deadline = "5m"
progress_deadline = "10m"
auto_revert = true
auto_promote = true
}
spread {
attribute = "${meta.node_uuid}"
}
network {
port "http" {
static = 3000
}
}
task "sparkles" {
driver = "docker"
user = "{{ plugin "id" "-u" "sparkles" }}:{{ plugin "id" "-g" "sparkles" }}"
config {
image = "sparkles:{{ file "/data/docker-image-tags/sparkles_image_tag" | trimSpace }}"
memory_hard_limit = "${meta.memory_mb}"
network_mode = "host"
logging {
type = "journald"
config {
tag = "sparkles"
}
}
}
kill_timeout = "120s"
kill_signal = "SIGTERM"
shutdown_delay = "30s"
service {
name = "sparkles"
port = "http"
check {
type = "http"
path = "/_ping"
interval = "15s"
timeout = "2s"
}
}
resources {
cpu = 100
memory = 32
}
env {
LOGGING_LEVEL = "info"
HTTP_ADDR = ":${NOMAD_PORT_http}"
STATS_PERIOD = "1s"
STATS_ADDR = "127.0.0.1:8125"
RAILS_ENV = "production"
SECRET_KEY_BASE = "1234567890"
MYSQL_TLS = "false"
MYSQL_RW_HOST = "127.0.0.1"
MYSQL_RW_PORT = "3308"
MYSQL_RW_USER = "{{ $configapply.mysql_username }}"
MYSQL_RW_PASS = "{{ $configapply.mysql_password }}"
MYSQL_RW_DB_NAME = "sparkles"
RAILS_SERVE_STATIC_FILES = "true"
DISABLE_AUTH = "true"
TAGLESS_METRICS = "true"
GITHUB_CONFIG_SUM = "{{ file "/data/user/common/github.conf" | sha256Hex }}"
# If the cluster topology is updated, the job should be re-rendered
{{ if $configapply.is_cluster_enabled }}
CLUSTER_CONFIG_SUM = "{{ file "/data/user/common/cluster.conf" | sha256Hex }}"
{{ end }}
{{ plugin "ghe-app-config-for" "sparkles" }}
}
}
}
}
We see values being dynamically populated in this template:
- Instead of a hardcoded tag, the image information is now read from the file `/data/docker-image-tags/sparkles_image_tag`.
- `$configapply` shows up in multiple places: this variable comes from running `ghe-config-apply`, which is an essential step to build and run GHES. During config apply, important system information is injected into the values needed by Nomad, e.g. how many instances should be running, MySQL credentials, and so on.
- It is good practice to restrict the application's access with a dedicated user. Find the file `chroot-prepare-base.sh` in the enterprise2 folder and add:
  setup_system_user sparkles <replace-with-a-unique-3-digit-uid>
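Once you have rebuilt the instance (a few steps below), you can verify these pieces from inside it. This is only a sketch: the paths are the ones referenced in the template above, and the `id` check assumes the system user was created by `chroot-prepare-base.sh`:

```bash
# the image tag that was baked into the build
cat /data/docker-image-tags/sparkles_image_tag
# the job spec that consul-template rendered from the .ctmpl file
cat /etc/nomad-jobs/sparkles/sparkles.hcl
# the dedicated system user for the sparkles task
id sparkles
```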
Now we have the image and the job specification template. We still need to tell the configuration system to start the Sparkles application. The GHES configuration system starts with a Bash script, ghe-config-apply, and works through the Ruby modules under ConfigApply, SSH calls (HA/cluster), and invocations of Bash scripts and external tools. It is one of the most important parts of GHES. The essential Ruby modules can be found under the folder vm_files/usr/local/share/lib. More reading here(TODO).
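For orientation, you can trigger a config run by hand from inside a running instance and watch its log. A sketch, assuming `ghe-config-apply` is on the admin shell's path as on a regular GHES appliance (the log path is the same one used later in this lab):

```bash
# inside the GHES instance (./chroot-ssh.sh): run a full config apply
ghe-config-apply
# in another terminal, follow its progress
tail -f /data/user/common/ghe-config.log
```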
We will add sparkles app to be queued in the config-apply process in following steps.
- Add `sparkles` to the list of `APP_NOMAD_SERVICES` in `vm_files/usr/local/share/lib/config.rb`. It should look like this:
APP_NOMAD_SERVICES = %w[
<...other services...>
notebooks
sparkles
treelights
<...other services...>
]
- Add `queue_nomad_job("/etc/nomad-jobs/sparkles/sparkles.hcl")` to the method `nomad_run_app_jobs` in `vm_files/usr/local/share/lib/configapply/nomad.rb`, e.g. here. (A quick check of both edits is sketched below.)
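Before rebuilding, a quick way to confirm both edits landed (a sketch, run from the enterprise2 workspace root):

```bash
# the service should appear in the APP_NOMAD_SERVICES list
grep -n "sparkles" vm_files/usr/local/share/lib/config.rb
# and the job should be queued in nomad_run_app_jobs
grep -n "sparkles" vm_files/usr/local/share/lib/configapply/nomad.rb
```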
- Rebuild enterprise2 in VS Code with:
  ./chroot-stop.sh; ./chroot-reset.sh && ./chroot-build.sh && ./chroot-start.sh && ./chroot-configure.sh
- You can SSH into the enterprise2 instance and watch the configuration logs:
  ./chroot-ssh.sh
  tail -f /data/user/common/ghe-config.log
- After a while, you might see `sparkles critical` appear in the logs. This is because the `sparkles` app needs a database which doesn't exist yet. But that's OK; we will recover in the next step.
The configuration will eventually fail with this error message:
2024-12-30T16:44:24.616076Z Reloading application services...
2024-12-30T17:05:52.511404Z ERROR: ghe-nomad-jobs wait-health-checks
2024-12-30T17:05:52.512426Z Error while running config-apply, caught an exception: Failure: ghe-nomad-jobs wait-health-checks
Let's fix the missing database and get Sparkles running.
GHES normally has a single database for all the services. But Sparkles isn't tuned for that, so we need to create a new database and populate it with the schema.
- Go to https://github.com/github/sparkles/blob/main/db/structure.sql and copy the content to a file `structure.sql` in the root of the enterprise2 folder in VS Code. Then run the following command to copy the file to the enterprise2 instance.
  # copy structure.sql from the bp-dev instance into the running enterprise2 instance
  ./chroot-scp.sh structure.sql .
- Use `./chroot-ssh.sh` to SSH into the enterprise2 instance.
- Use the following command to create a new MySQL database for Sparkles and set the correct permissions needed to access it.
docker exec -i $(docker ps --format {{.Names}} | grep mysql) mysql -t <<EOF
CREATE DATABASE sparkles;
GRANT ALL PRIVILEGES ON sparkles.* TO 'github'@'%';
EOF
- Next, we use the `structure.sql` file we copied into the container to populate the tables needed to run Sparkles.
  docker exec -i $(docker ps --format {{.Names}} | grep mysql) mysql -t sparkles < ~/structure.sql

Because of the missing database, the command we used earlier to build the GHES instance didn't successfully start the Sparkles service. Let's do that now.
nomad job stop sparkles
sudo nomad job run /etc/nomad-jobs/sparkles/sparkles.hcl
- You should see the application being deployed successfully this time:
Deployed
Task Group  Auto Revert  Desired  Placed  Healthy  Unhealthy  Progress Deadline
sparkles true 1 1 1 0
- Can you find the health check endpoint in `/etc/nomad-jobs/sparkles/sparkles.hcl`? Let's try to ping it.
- You should be able to ping the service with the following:
admin@{USER-RANDOM}-service-bpdev-us-east-1-github-net:~$ curl http://localhost:3000/_ping
pong
- In order to see sparkles within the bp-dev instance, you will need to set up port forwarding twice.
  - First, from your local machine to the bp-dev instance (run in your local terminal): `ssh -L 3000:localhost:3000 build@{USER-RANDOM}.service.bpdev-us-east-1.github.net`, where USER-RANDOM is replaced by your handle and the random host string that Hubot provided you.
  - Second, from the bp-dev instance itself to the GitHub Enterprise container inside it: in VS Code, make sure you have exited the enterprise2 container (`exit` so you get the `build@something` prompt again) and run `./chroot-ssh.sh -L 3000:localhost:3000`.
Check out your hard work by opening localhost:3000 in a browser on your own machine.
You should be able to navigate to http://localhost:3000/users/your-handle and see the profile that you added previously during the Sparkles deployment lab. This means that the change you have made works from within an Enterprise environment! 🎉
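If you prefer the terminal over a browser, the same check works with curl while both port forwards from the previous step are still active (a sketch; replace your-handle with your GitHub handle):

```bash
# fetch your Sparkles profile page through the two SSH tunnels
curl -s http://localhost:3000/users/your-handle | head -n 20
```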
Of course, we can't manually forward ports for every new service. We will use haproxy, a TCP/HTTP load-balancing proxy that sits in front of most of the user-facing services we have.
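To see how existing services are wired up, it can help to skim the current frontends before editing the template. A sketch, assuming the rendered config ends up at /etc/haproxy/haproxy-frontend.cfg inside the instance:

```bash
# inside the GHES instance: list the frontends haproxy already serves
grep -n "^frontend" /etc/haproxy/haproxy-frontend.cfg
# and the backends they route to
grep -n "^backend" /etc/haproxy/haproxy-frontend.cfg
```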
- Let's define `3030` as the port for Sparkles to be accessed from outside. We need to wire all requests on port `3030` to `3000`. To enable this configuration, we need to change the template for `haproxy-frontend.cfg` in `templates/etc/haproxy/haproxy-frontend.cfg.erb`.
- Let's first define a `backend` for the Sparkles application, which is running on `localhost:3000`:
backend sparkles
server localhost 127.0.0.1:3000 check
- Then we define a `frontend` to receive the requests:
frontend sparkles
bind :3030
bind :::3030 v6only
default_backend sparkles
This tells haproxy to listen on port 3030 and direct all requests to the backend named sparkles.
- And let's add them to `templates/etc/haproxy/haproxy-frontend.cfg.erb` like here.
- We also need to open the port in the firewall. Create a new ufw configuration file at `vm_files/etc/ufw/applications.d/ghe-3030` with the following content:
[ghe-3030]
title=Sparkles
description=Sparkles in GHES
ports=3030/tcp
- Then let's build the application again with:
./chroot-stop.sh && ./chroot-reset.sh && ./chroot-build.sh && ./chroot-start.sh && ./chroot-configure.sh
- Because we have rebuilt enterprise2, the previous error we faced when starting the sparkles app will appear again, since there is no database for sparkles. We need to repeat the steps to copy the database schema into the GHES container.
- After the database is fixed, check whether you can access port `3030` of the enterprise2 instance from the bp-dev machine. You can find the IP of your enterprise2 instance with `./chroot-info.sh`. Then ping the sparkles app with `curl http://10.0.1.246:3030/_ping`.
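If the ping fails, one thing worth checking (a sketch; run inside the instance via `./chroot-ssh.sh`) is whether ufw actually picked up the new application profile and opened the port:

```bash
# the profile we added under /etc/ufw/applications.d should be registered
sudo ufw app list | grep -i ghe-3030
# and port 3030 should show up in the active rules
sudo ufw status verbose | grep 3030
```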
- Clean up the GHES container with the `.bp-dev destroy ghe all` chatops command in Slack.
- The `all` in the command above refers to all of your bp-dev instances, so it is safe to run assuming you are not actively working on any other GHES projects.
- Alternatively, you can replace `all` with `YOUR_GHES_HOST_FQDN` if you only want a specific instance to be destroyed.
- These instances are cleaned up automatically for you after 7 days of being idle, but running the destroy command releases the resources immediately.
If you need to restart the Sparkles job manually:
nomad stop sparkles
# Stopping the job quite possibly does NOT stop the Docker container, even though Nomad thinks it has,
# so you might have to `docker stop sparkles-something` as well.
nomad job run /etc/nomad-jobs/sparkles/sparkles.hcl

Inside of the LXC container:
docker exec -it $(docker ps --format {{.Names}} | grep sparkles) bash

Have the settings for the Sparkles DB connection been set up correctly?
./bin/rails console
ActiveRecord::Base.configurations.configurations.select{|c| c.env_name == "production"}
# Also try to query the User tables:
User.all

Build the container locally and tag it with the container registry:
cd ~/sparkles
SPARKLES_TAG=$(git rev-parse HEAD); echo $SPARKLES_TAG
./script/build-sparkles-bionic
Sending build context to Docker daemon 24.71MB
Step 1/49 : FROM containers.pkg.github.com/github/gh-base-image/gh-builder-bionic:latest AS install-gems
latest: Pulling from github/gh-base-image/gh-builder-bionic
etc.
...
Step 49/49 : LABEL com.github.repo.branch=$BRANCH
---> Running in 459ac0bd5a0e
Removing intermediate container 459ac0bd5a0e
---> 17d1d3c7ade9
Successfully built 17d1d3c7ade9
Successfully tagged containers.pkg.github.com/github/sparkles/sparkles-bionic:8eaefb9ed98a7f3d6d87ea07ee883b702d730cfd

Get the SHA256 of the image:
SPARKLES_SHA=$(docker inspect 17d1d3c7ade9 -f "{{.Id}}" | cut -f 2 -d :); echo $SPARKLES_SHA
(Use the image ID from your own build instead of 17d1d3c7ade9.) Then use enterprise2/script/update-service.rb to update your service.
- The Nomad Integration Guide was used to add Sparkles to GHES. Any reference to committing files to the GHES repo was skipped, since we do not want these updates to persist.
- The GitHub Enterprise Server Development document provides comprehensive documentation on how to develop for GHES.