GitHub Enterprise Server (GHES)

You are an enterprise engineer!

Introduction

Because GitHub Enterprise Server (GHES) drives our revenue and supports our largest and most recognizable clients, every engineer at GitHub, including yourself, is an enterprise engineer! This lab is an opportunity to practice a few of the concepts you'll need to test code that you write in a GHES environment.

As a prerequisite to this lab, you should watch each part of the Engineering for Enterprise Lecture(TODO). The lecture provides an overview of the tools and concepts that we will be practicing during this self-directed exercise. After watching it, you should be familiar with the key concepts required to complete this lab.

What We'll Accomplish

During the Sparkles lab, you created and deployed your very own user profile page to Sparkles. In this lab, we will take that one step further by adding Sparkles as a service to GHES and viewing our change in a GHES instance. This is a somewhat contrived example, since Sparkles is not part of GHES currently and likely never will be. We are going to use a Sparkles container that was built for you during CI when you deployed your profile page. This build will be incorporated into a bp-dev environment that you will create.

Because this is just an exercise, there are a few things to keep in mind:

  • Sparkles is not a customer service
  • Anything Sparkles-related should not make it to production, and preferably not to the GHES repo either. Please don't commit your changes.
  • We suggest you try to follow the steps below first, but if you get stuck, a video of the process can be found on ReWatch(TODO).

Assumptions/Prerequisites

Let's get started

Create a bp-dev instance

  • Spin up a bp-dev instance

    • Run the .bp-dev launch ghe --enable_ui --tags onboarding chatops command in the #sparkle-ops Slack channel.
    • Hubot will spin up your environment and return the hostname of a bp-dev instance after 10-15 minutes.
      • NOTE: There may be a bug impacting the notification. If Hubot has not notified you after 15 minutes that the instance is ready, you can use the command .bp-dev list ghe to check whether your instance has been built and is running. If it is, continue and use the hostname to connect via SSH in the next step.
  • Connect via the dev VPN you set up earlier

  • Use the following command to open an environment workspace in VSCode. (This information can also be found in the Slack message that notified you of the successful creation of the bp-dev instance.)

code --new-window --remote ssh-remote+build@{HANDLE-RANDOM}.service.bpdev-us-east-1.github.net /workspace/enterprise2

Welcome to the enterprise2 world! Don't feel overwhelmed by the folders; we will guide you to the essential parts in the next steps.

Troubleshooting your connection
  • If you fail to connect, you may have missed setting up your SSH keypair in the prerequisites (a consolidated sketch of these checks follows this troubleshooting list).

    • Create an SSH keypair (if you don't already have one)
    • Add the key to GitHub.com
    • Make sure you can see the public key by navigating to https://github.com/{USERNAME}.keys
    • Make sure your key is active in your local SSH agent.
      • From a terminal prompt run ssh-add -l to list your active keys
      • If your keypair is not active, run ssh-add /path/to/your/private-key to add it. When you run this, you will be prompted to enter the password you used when you created the key.
  • If it has been a while since you created your instance, it is possible that it has been shut down. Idle instances are turned off after a period of time to save resources.

    • To see your instances, run .bp-dev list ghe
    • Then, restart your instance by running .bp-dev restart ghe {YOUR_INSTANCE_NAME}
  • The developer VPN does not always set up the right gateways needed to route traffic properly. If the SSH command simply hangs and times out with no response, you may need to manually tell Viscosity about the IP of your instance.

    • Find the IP address of your bp-dev instance by using .bp-dev list ghe
    • Follow these instructions to tell Viscosity to route traffic to that IP through the VPN.
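
A consolidated sketch of the SSH checks above (assuming an Ed25519 key; adjust the path if yours differs):

ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519      # only if you don't already have a keypair
curl -s https://github.com/{USERNAME}.keys      # your public key should appear here once added to GitHub.com
ssh-add -l || ssh-add ~/.ssh/id_ed25519         # make sure the key is loaded into your local SSH agent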

Build and start your enterprise2 GHES instance

  • Open a terminal in VSCode and start the enterprise2 instance with the following command. Wait for the build and configuration to finish. (NOTE: This could take 20-30 minutes for the first run.)
./chroot-stop.sh && ./chroot-reset.sh && ./chroot-build.sh && ./chroot-start.sh && ./chroot-configure.sh
  • Run sudo update-reverse-proxy in the VSCode terminal to allow external access to your enterprise instance.
  • Now you can reach your GitHub Enterprise instance at www.{HANDLE-RANDOM}.service.bpdev-us-east-1.github.net with user ghe-admin and password passworD1 (a quick connectivity check follows this list).
  • In a terminal inside VSCode, run ./chroot-ssh.sh. Now you are inside the GHES instance over an SSH connection. We will use this connection to interact with the instance in the following tasks.
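
A quick connectivity check for the reverse proxy (the -k flag is used because the appliance certificate is self-signed; adjust the scheme if your instance serves plain HTTP):

curl -skI https://www.{HANDLE-RANDOM}.service.bpdev-us-east-1.github.net | head -n 1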

Challenge: Create Sparkles Service in GHES

Find your Sparkles build

GHES is made up of a collection of services that are included in the instance as Linux containers. The next few steps will enable a Sparkles container within the instance. We'll use the latest container built from main in the Sparkles repository.

Find the latest sparkles image tag and container digest SHA256 on the Sparkles Packages page. Click on the version of the image you want to use in the "Recent tagged image versions" list to get the SHA256. Note down the tag and SHA of your Docker image.
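
If it helps, you can stash the two values in shell variables right away; these names are our own convenience and nothing in GHES requires them:

SPARKLES_TAG="<image-tag>"    # e.g. the commit SHA tag from the Packages page
SPARKLES_SHA="<image-sha>"    # the sha256 digest, without the "sha256:" prefix
echo "sparkles image: ${SPARKLES_TAG} @ sha256:${SPARKLES_SHA}"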

Create Nomad job directly inside the instance

We use Nomad to orchestrate the services and containers needed to run a GHES instance. In order to run a nomad service, we need a nomad job specification file.

  • Use ./chroot-ssh.sh in the VSCode terminal to SSH into the GHES bp-dev instance.
  • Use nomad job status to list all Nomad jobs and their status. More to read about the Nomad command line tool.
  • Use ls -al /etc/nomad-jobs/ to list all Nomad job specification files.
  • Let's create a job specification for our sparkles application with
sudo mkdir /etc/nomad-jobs/sparkles/
sudo vi /etc/nomad-jobs/sparkles/sparkles.hcl
  • Adjust the image name in the following template with the tag and SHA you noted earlier, then copy it into vim.
Sparkles Nomad Template V1
job "sparkles" {
  datacenters = [ "default" ]
  type = "service"

  group "sparkles" {

    restart {
      attempts = 50
      delay = "5s"
      mode = "delay"
      interval = "5m"
    }
    
    constraint {
      attribute = "${meta.cluster_roles}"
      operator = "set_contains_any"
      value = "web-server"
    }

    count = 1

    update {
      canary            = 1
      max_parallel      = 1
      min_healthy_time  = "5s"
      healthy_deadline  = "5m"
      progress_deadline = "10m"
      auto_revert       = true
      auto_promote      = true
    }

    spread {
      attribute = "${meta.node_uuid}"
    }

    network {
      port "http" {
        static = 3000
      }
    }

    task "sparkles" {
      driver = "docker"
      config {
        image = "<your-sparkles-image-name>"
        memory_hard_limit = "${meta.memory_mb}"
        network_mode = "host"

        logging {
          type = "journald"
          config {
            tag = "sparkles"
          }
        }
      }

      kill_timeout = "120s"
      kill_signal = "SIGTERM"
      shutdown_delay = "30s"

      service {
        name = "sparkles"
        port = "http"
        check {
          type     = "http"
          path     = "/_ping"
          interval = "15s"
          timeout  = "2s"
        }
      }

      resources  {
        cpu = 100
        memory = 32
      }

      env {
        LOGGING_LEVEL = "info"
        HTTP_ADDR = ":${NOMAD_PORT_http}"

        STATS_PERIOD = "1s"
        STATS_ADDR = "127.0.0.1:8125"

        RAILS_ENV = "production"
        SECRET_KEY_BASE = "1234567890"
        MYSQL_TLS = "false"
        MYSQL_RW_HOST = "127.0.0.1"
        MYSQL_RW_PORT = "3308"
        MYSQL_RW_DB_NAME = "sparkles"
        RAILS_SERVE_STATIC_FILES = "true"
        DISABLE_AUTH = "true"

        TAGLESS_METRICS = "true"
      }

    }
  }
}
  • Save the job specification file and run the Nomad job with
    sudo nomad run -detach /etc/nomad-jobs/sparkles/sparkles.hcl
  • Query the status of the job with
    nomad job status sparkles
  • We see that the application stays in a pending state:
Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
9ae0bfe8  434492b0  sparkles    0        run      pending  15h5m ago  15h11s ago
  • Can you find out why?

TIP: Nomad runs jobs in Docker containers, which are called allocations in the Nomad context. You can find the IDs of the allocations using nomad job status sparkles and use the nomad alloc command to inspect the container.

  • With
nomad alloc status <alloc-id>

you should see an error similar to the following, showing that the Docker image cannot be pulled:

 2024-12-24T10:04:03Z  Driver Failure  Failed to pull `sparkles:9cb3f38b09325f1fd595e1d65706a3333b6a5035`: API error (500): Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on [::1]:53: server misbehaving

GHES is delivered to customers as a complete solution; any images used inside the instance are packaged into the server. There should be no need to contact a Docker registry to pull images.
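
You can confirm this from inside the instance: only images shipped with the appliance are present in its local Docker daemon, and sparkles is not yet one of them (a small sanity check, assuming you are still in the ./chroot-ssh.sh session):

docker images | grep -i sparkles || echo "no sparkles image packaged yet"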

Now, instead of directly messing with the instance, we will do it the right way and build and deploy with the scripts provided in GHES.
Let's exit the SSH connection with exit.

Create Sparkles application with GHES scripts

Ship the Docker image

  • Locate docker-image-list-ghe in the enterprise2 folder in VSCode.
  • Add the image tag and SHA for sparkles. This ensures the image is copied to the instance during the build.
  sparkles=containers.pkg.github.com/github/sparkles/sparkles-bionic:<image-tag>@sha256:<image-sha>

Add the Nomad template

  • Find the folder vm_files/etc/consul-templates/etc/nomad-jobs. All job specifications are stored here.
  • Create a Nomad template for sparkles at vm_files/etc/consul-templates/etc/nomad-jobs/sparkles/sparkles.hcl.ctmpl. Copy the following template without changing anything.
Sparkles Nomad Template V2
  {{ $configapply := file "/etc/github/configapply.json" | parseJSON  }}
job "sparkles" {
  datacenters = [ {{ $configapply.nomad_primary_datacenter | toJSON }} ]
  type = "service"

  group "sparkles" {

    {{ file "/etc/consul-templates/etc/nomad-jobs/restart_policy.hcl" }}

    constraint {
      attribute = "${meta.cluster_roles}"
      operator = "set_contains_any"
      value = "web-server"
    }

    count = {{ $configapply.web_server_count }}

    update {
      canary            = {{ $configapply.web_server_count }}
      max_parallel      = {{ $configapply.web_server_count }}
      min_healthy_time  = "5s"
      healthy_deadline  = "5m"
      progress_deadline = "10m"
      auto_revert       = true
      auto_promote      = true
    }

    spread {
      attribute = "${meta.node_uuid}"
    }

    network {
      port "http" {
        static = 3000
      }
    }

    task "sparkles" {
      driver = "docker"

      user = "{{ plugin "id" "-u" "sparkles" }}:{{ plugin "id" "-g" "sparkles" }}"

      config {
        image = "sparkles:{{ file "/data/docker-image-tags/sparkles_image_tag" | trimSpace }}"
        memory_hard_limit = "${meta.memory_mb}"
        network_mode = "host"

        logging {
          type = "journald"
          config {
            tag = "sparkles"
          }
        }
      }

      kill_timeout = "120s"
      kill_signal = "SIGTERM"
      shutdown_delay = "30s"

      service {
        name = "sparkles"
        port = "http"
        check {
          type     = "http"
          path     = "/_ping"
          interval = "15s"
          timeout  = "2s"
        }
      }

      resources  {
        cpu = 100
        memory = 32
      }

      env {
        LOGGING_LEVEL = "info"
        HTTP_ADDR = ":${NOMAD_PORT_http}"

        STATS_PERIOD = "1s"
        STATS_ADDR = "127.0.0.1:8125"

        RAILS_ENV = "production"
        SECRET_KEY_BASE = "1234567890"
        MYSQL_TLS = "false"
        MYSQL_RW_HOST = "127.0.0.1"
        MYSQL_RW_PORT = "3308"
        MYSQL_RW_USER = "{{ $configapply.mysql_username }}"
        MYSQL_RW_PASS = "{{ $configapply.mysql_password }}"
        MYSQL_RW_DB_NAME = "sparkles"
        RAILS_SERVE_STATIC_FILES = "true"
        DISABLE_AUTH = "true"

        TAGLESS_METRICS = "true"

        GITHUB_CONFIG_SUM = "{{ file "/data/user/common/github.conf" | sha256Hex }}"
        # If cluster topology got updated, job should be rendered
        {{ if $configapply.is_cluster_enabled }}
        CLUSTER_CONFIG_SUM = "{{ file "/data/user/common/cluster.conf" | sha256Hex }}"
        {{ end }}
        {{ plugin "ghe-app-config-for" "sparkles" }}
      }

    }
  }
}

We see values are being dynamically populated in this template:

  • Instead of a hardcoded tag, the image information is now read from the file /data/docker-image-tags/sparkles_image_tag.
  • $configapply shows up in multiple places: this variable comes from running ghe-config-apply, which is an essential step in building and running GHES. During config apply, important system information is injected into the values Nomad needs, e.g. how many instances should be running, the MySQL credentials, and so on. (You can peek at both inputs with the sketch below.)
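
A small sketch for inspecting both inputs from inside the instance after a config apply (assuming jq is available; otherwise just cat the JSON file):

cat /data/docker-image-tags/sparkles_image_tag                             # the image tag injected at build time
jq '.web_server_count, .is_cluster_enabled' /etc/github/configapply.json   # values exposed to the template as $configapply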

Create a user for the application

  • It is good practice to restrict the application's access with a dedicated user. Find the file chroot-prepare-base.sh in the enterprise2 folder and add (see the sketch below for picking a free UID):
setup_system_user sparkles <replace-with-a-unique-3-digit-uid>
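
A quick way to pick a UID that isn't already taken is to look at the existing setup_system_user calls (run from the enterprise2 folder; after the next rebuild you can verify with id sparkles inside the instance):

grep -n "setup_system_user" chroot-prepare-base.sh   # lists the UIDs already in use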

Adjust the GHES configuration

Now we have the image and the job specification template. We still need to tell the configuration system to start the sparkles application. The GHES configuration system starts with a Bash script, ghe-config-apply, and works through the Ruby modules under ConfigApply, SSH calls (HA/cluster), and invocations of Bash scripts and external tools. It is one of the most important parts of GHES. The essential Ruby modules can be found under vm_files/usr/local/share/lib. More reading here(TODO).
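
For reference, the same entry point can be invoked by hand from inside the instance when you only want to re-run the configuration phase; the lab itself drives this through ./chroot-configure.sh, so treat this as an optional sketch:

# Inside the GHES instance (./chroot-ssh.sh); may require sudo depending on the user you are logged in as.
ghe-config-apply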

We will queue the sparkles app in the config-apply process in the following steps.

  • Add sparkles to the APP_NOMAD_SERVICES list in vm_files/usr/local/share/lib/config.rb. It should look like this:
APP_NOMAD_SERVICES = %w[
  <...other services...>
  notebooks
  sparkles
  treelights
  <...other services...>
]
  • add
queue_nomad_job("/etc/nomad-jobs/sparkles/sparkles.hcl")

to the method nomad_run_app_jobs in vm_files/usr/local/share/lib/configapply/nomad.rb.

Rebuild the GHES environment to activate Sparkles

  • Rebuild enterprise2 in VSCode with:
  ./chroot-stop.sh; ./chroot-reset.sh && ./chroot-build.sh && ./chroot-start.sh && ./chroot-configure.sh
  • You can SSH into the enterprise2 instance and watch the configuration logs:
./chroot-ssh.sh
tail -f /data/user/common/ghe-config.log
  • After a while, you might see sparkles critical appear in the logs. This is because the sparkles app needs a database that doesn't exist yet. That's OK; we will recover in the next step.

The configuration will eventually fail with this error message:

2024-12-30T16:44:24.616076Z Reloading application services...
2024-12-30T17:05:52.511404Z ERROR: ghe-nomad-jobs wait-health-checks
2024-12-30T17:05:52.512426Z Error while running config-apply, caught an exception: Failure: ghe-nomad-jobs wait-health-checks

Copy the database schema into the GHES container

Let's fix the missing database and get Sparkles running.

GHES normally has a single database for all the services, but Sparkles isn't tuned for that, so we need to create a new database and populate it with its schema.

# copy structure.sql from bp-dev instance into the running enterprise2 instance
./chroot-scp.sh structure.sql .
  • Use ./chroot-ssh.sh to SSH into the enterprise2 instance.

  • Use the following command to create a new MySQL database for Sparkles and set the permissions needed to access it.

docker exec -i $(docker ps --format {{.Names}} | grep mysql) mysql -t <<EOF
CREATE DATABASE sparkles;
GRANT ALL PRIVILEGES ON sparkles.* TO 'github'@'%';
EOF
  • Next, use the structure.sql file we copied over to populate the tables needed to run Sparkles (a quick verification sketch follows).
docker exec -i $(docker ps --format {{.Names}} | grep mysql) mysql -t sparkles < ~/structure.sql
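
A quick sanity check that the schema import worked (run inside the enterprise2 instance):

docker exec -i $(docker ps --format {{.Names}} | grep mysql) mysql -t sparkles -e "SHOW TABLES;"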

Start the Sparkles service again

Because of the missing database, the command we previously used to build the GHES instance didn't successfully start the Sparkles service. Let's do that now.

nomad job stop sparkles
sudo nomad job run /etc/nomad-jobs/sparkles/sparkles.hcl
  • You should see the application being deployed successfully this time:
    Deployed
    Task Group  Auto Revert  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    sparkles    true         1        1       1        0
  • Can you find the health check endpoint from /etc/nomad-jobs/sparkles/sparkles.hcl? Let's try to ping it.
  • You should be able to ping the service with the following
admin@{USER-RANDOM}-service-bpdev-us-east-1-github-net:~$ curl http://localhost:3000/_ping
pong

Set up port forwarding

  • In order to see sparkles running inside the bp-dev instance from your own machine, you will need to set up port forwarding twice.
  • First, from your local machine to the bp-dev instance (run in your local terminal):
    • ssh -L 3000:localhost:3000 build@{USER-RANDOM}.service.bpdev-us-east-1.github.net, where USER-RANDOM is replaced by your handle and the random host string that Hubot provided you.
  • And second, from the bp-dev instance itself to the GitHub Enterprise container inside it:
    • In VSCode, make sure you have exited the enterprise2 container (type exit so you get the build@something prompt again) and run
    ./chroot-ssh.sh -L 3000:localhost:3000
    

Check out your hard work by opening localhost:3000 in a browser on your own machine.

You should be able to navigate to http://localhost:3000/users/your-handle and see the profile that you added previously during the Sparkles deployment lab. This means that the change you made works from within an Enterprise environment! 🎉
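
From your local machine, with both forwards active, you can also verify from the command line (your-handle is a placeholder for your own GitHub handle):

curl -s http://localhost:3000/_ping                  # should return "pong"
curl -sI http://localhost:3000/users/your-handle     # should return an HTTP 200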

Optional task: Set up Haproxy

Of course, we can't manually forward ports for every new service. We will use haproxy, a TCP/HTTP load-balancing proxy that sits in front of most of the user-facing services we have.

  • Let's define 3030 as the port through which sparkles is accessed from outside. We need to wire all requests on port 3030 through to port 3000. To enable this configuration, we need to change the template for haproxy-frontend.cfg in templates/etc/haproxy/haproxy-frontend.cfg.erb.
  • Let's first define a backend for the sparkles application, which is running on localhost:3000:
backend sparkles
  server localhost 127.0.0.1:3000 check
  • Then we define a frontend to receive the requests:
frontend sparkles
  bind :3030
  bind :::3030 v6only
  default_backend sparkles

This tells haproxy to listen on port 3030 and direct all requests to a backend named sparkles.

  • Add both snippets to templates/etc/haproxy/haproxy-frontend.cfg.erb.

  • We also need to open the port in the firewall. Create a new ufw configuration file at vm_files/etc/ufw/applications.d/ghe-3030 with the following content:

[ghe-3030]
title=Sparkles
description=Sparkles in GHES
ports=3030/tcp
  • Then let's build the application again with
./chroot-stop.sh && ./chroot-reset.sh && ./chroot-build.sh && ./chroot-start.sh && ./chroot-configure.sh
  • Because we have rebuilt enterprise2, the error we previously hit when starting the sparkles app will appear again, since there is no database for sparkles yet. Repeat the steps for copying the database schema into the GHES container.
  • After the database is fixed, check whether you can reach port 3030 of the enterprise2 instance from the bp-dev machine. You can find the IP of your enterprise2 instance with ./chroot-info.sh, then ping the sparkles app with curl http://10.0.1.246:3030/_ping (substituting your instance's IP). A short verification sketch follows this list.
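
A short verification sketch from inside the enterprise2 instance after the rebuild and database fix (the ufw profile name matches the file created above):

sudo ufw app info ghe-3030            # the firewall profile we just added
curl -s http://localhost:3030/_ping   # should return "pong" through haproxy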

When done with this exercise

  • Clean up the GHES container with the .bp-dev destroy ghe all chatops command in Slack.
  • The all in the command above refers to all of your bp-dev instances, so it is safe to run assuming you are not actively working on any other GHES projects.
  • Alternatively, you can replace all with YOUR_GHES_HOST_FQDN if you only want a specific instance to be destroyed.
  • These instances are cleaned up automatically for you after 7 days of being idle, but running the destroy command releases the resources immediately.

References / Guides

FAQ/Troubleshooting

Optional: Reload sparkles nomad job

nomad stop sparkles
# Stopping the job quite possibly does NOT stop the Docker container, even though Nomad thinks it has,
# so you might have to run a `docker stop sparkles-something` as well (see the sketch below).
nomad job run /etc/nomad-jobs/sparkles/sparkles.hcl
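
If the old container lingers after the stop, this sketch finds and stops it explicitly, using the same docker ps pattern as elsewhere in this doc:

docker ps --format {{.Names}} | grep sparkles | xargs -r docker stop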

Get a shell in Sparkles

Inside the LXC container:

docker exec -it $(docker ps --format {{.Names}} | grep sparkles)  bash

Rails DB check

Have the settings for Sparkles DB connection been set up correctly?

./bin/rails console
ActiveRecord::Base.configurations.configurations.select{|c| c.env_name == "production"}
# Also try to query the User tables:
User.all

Building sparkles on bp-dev

Build the container locally and tag it with the container registry:

cd ~/sparkles
SPARKLES_TAG=$(git rev-parse HEAD); echo $SPARKLES_TAG
./script/build-sparkles-bionic
Sending build context to Docker daemon  24.71MB
Step 1/49 : FROM containers.pkg.github.com/github/gh-base-image/gh-builder-bionic:latest AS install-gems
latest: Pulling from github/gh-base-image/gh-builder-bionic
etc.
...
Step 49/49 : LABEL com.github.repo.branch=$BRANCH
 ---> Running in 459ac0bd5a0e
Removing intermediate container 459ac0bd5a0e
 ---> 17d1d3c7ade9
Successfully built 17d1d3c7ade9
Successfully tagged containers.pkg.github.com/github/sparkles/sparkles-bionic:8eaefb9ed98a7f3d6d87ea07ee883b702d730cfd

Get the SHA256 of the image: SPARKLES_SHA=$(docker inspect 17d1d3c7ade9 -f "{{.Id}}" | cut -f 2 -d :) ; echo $SPARKLES_SHA (using the image ID from your own build instead of 17d1d3c7ade9). Then use enterprise2/script/update-service.rb to update your service.
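
A hedged one-liner to turn the tag and SHA from this local build into the docker-image-list-ghe entry format used earlier (double-check the registry path against your repository):

echo "sparkles=containers.pkg.github.com/github/sparkles/sparkles-bionic:${SPARKLES_TAG}@sha256:${SPARKLES_SHA}"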

References
