Contractor Onboarding Checklist - Nebula Aurora (Updated Jan 2026)
Welcome to Nebula Aurora! This checklist will get you fully set up to start contributing.
```mermaid
flowchart LR
    subgraph "Week 1"
        A[Admin Setup] --> B[Communication]
        B --> C[Platform Access]
        C --> D[Environment Setup]
    end
    D --> E[Getting Started]
    E --> F[Task Workflow]
```
Complete these as emails arrive (usually within 1-2 days of starting):
- Sign NDA (email notification)
- Sign statement of work contract via Remote.com or platform.a.team* (if applicable)
- Install Insightful for time tracking*
  - You'll receive an invite from support@insightful.io (check spam)
  - If no invite, your account may already be active—download from app.insightful.io and log in
  - Setup guide: Insightful onboarding instructions

*May not apply to all team members—confirm with your manager before proceeding.
- Join Discord (invite will be sent to you)
- Attend onboarding call: Daily at 10:00–10:30 PM IST — Google Meet link
- Note: Office hours (Tuesdays and Fridays at 10:30 PM IST) are mandatory
These require an admin to grant access first. Reach out on Discord to Kartik, Chai, Vaibhav, or Nikos if blocked.
- Review the Apex Setup Guide for reference
- Log into Apex UI with your Google account
- Retrieve your API key from the Apex UI dashboard
Install the CLI tool using the install script in the Nebula repo:

```shell
cd Nebula
bash apex-arena-install.sh
```

You'll need an API key from Apex UI to complete the installation.
Authenticate with the same Google account you used for Apex UI:

```shell
gcloud auth login
```

Configure Docker to pull from Google Artifact Registry:

```shell
gcloud auth configure-docker us-central1-docker.pkg.dev
```

| Command | Purpose |
|---|---|
| `apex-arena init <task_name>` | Create a new task from template |
| `apex-arena check-anatomy <folder>` | Validate task folder structure |
| `apex-arena check-quality <task>` | AI-powered quality review (also runs during push/update) |
| `apex-arena validate-grader <grader.py>` | Check grader for issues (also runs during push/update) |
| `apex-arena test-solution <task_id>` | Run solution.sh and verify score |
| `apex-arena tasks list` | List your tasks |
| `apex-arena tasks push <dir>` | Push a new task to Apex (returns a UUID) |
| `apex-arena tasks update <uuid> <dir>` | Update an existing task by UUID |
| `apex-arena tasks download <id>` | Download task(s) by ID |
| `apex-arena eval --tasks <ids>` | Run evaluations locally (multiple tasks) |
| `apex-arena evaluations run <task_id>` | Run evaluations locally (single/remote tasks) |
| `apex-arena grade` | Grade a specific problem |
| `apex-arena version` | Show current apex-arena version |
| `apex-arena update` | Update apex-arena to latest version |
Run `apex-arena --help` or `apex-arena <command> --help` for the full list of commands and options.
- Confirm you can access NebulaAuroras/Nebula
- Confirm you can access the Task Tracking Board
Important: Local development is only supported on Linux or in a Linux VM. Running the Nebula container directly on macOS is not supported due to container/k3s compatibility issues.
The Nebula image uses immutable versioned tags — there is no :latest tag. Always use the fully qualified image name with a version tag. Check the CHANGELOG and releases for the current version.
```shell
docker pull us-central1-docker.pkg.dev/bespokelabs/nebula-devops-registry/nebula-devops:1.0.1
```

Tag it locally for convenience:

```shell
docker tag us-central1-docker.pkg.dev/bespokelabs/nebula-devops-registry/nebula-devops:1.0.1 nebula-devops:latest
```

Test that it runs:

```shell
docker run -d \
  --name nebula-test-container \
  --privileged \
  --cgroupns=private \
  nebula-devops
```

Verify the environment:

```shell
docker exec -it -u ubuntu nebula-test-container bash
watch -n 1 kubectl get pods
```

If you're not certain you'll be running exclusively on local Linux or hosted environments, request a VM in the #vm-request Discord channel.
Once your VM is being provisioned:
- Add your SSH public key to the pubkey spreadsheet
- Wait for VM connection details from admin
- SSH into VM and verify access
- Start Here / Single Source of Truth — check the "Last Updated" date; some sections may be outdated
- Nebula Aurora Instructions — task creation workflow, environment details, debugging
- Apex Arena Documentation — CLI reference and task format
Note: The Nebula Aurora Real Scenarios spreadsheet is deprecated. Task tracking has moved to the GitHub Project Board.
- Receive your task category: SRE / DevOps / Platform Engineering / CloudOps
- Check the Task Tracking Board for available tasks and assignments
Create a simple task to verify your setup:
```mermaid
flowchart LR
    A[init] --> B[edit]
    B --> C[check-anatomy]
    C --> D[check-quality]
    D --> E[test-solution]
    E --> F[push]
    D -.->|issues| B
    E -.->|fails| B
```
1. Initialize a new task:

   ```shell
   apex-arena init my-first-task
   ```

2. Edit the generated files in `tasks/my-first-task/`

3. Validate your task:

   ```shell
   apex-arena check-anatomy tasks/my-first-task
   apex-arena check-quality tasks/my-first-task
   ```

4. Test your solution:

   ```shell
   apex-arena test-solution my-first-task
   ```

5. Push to Apex using the spec ID for your category. The first push returns a UUID for your task:

   ```shell
   apex-arena tasks push tasks/my-first-task --spec <spec-id>
   ```

   You can view your task in the Apex UI at https://apex-ui-v2-319533213591.us-central1.run.app/tasks/<uuid>. For all subsequent updates, use `update` with that UUID:

   ```shell
   apex-arena tasks update <task-uuid> tasks/my-first-task
   ```
| Category | Spec ID |
|---|---|
| DevOps | b407a435-9dc1-4cc3-950c-3194a8f08fde |
| SRE | 46394e31-2a74-47c1-8359-51e1b678146d |
| Platform Engineering | 9e4d158e-96ff-4435-ab39-4d1e389f4b47 |
| CloudOps | 450f2e9c-ba04-429c-bf80-e22be0065313 |
This section covers the ongoing process for creating, reviewing, and evaluating tasks. You are expected to complete at least 1 approved task per week, including all reviews and iterations.
All tasks go through bot review, then two layers of human review before acceptance:
```mermaid
flowchart TD
    A[Approved Task Ideas] --> B[In Progress]
    B --> B1[Bot Review]
    B1 --> C[Ready for Primary Review]
    B1 -.->|address feedback| B
    C --> D[In Primary Review]
    D --> D1[Implementing Primary Feedback]
    D1 --> D
    D --> E[Ready for Secondary Review]
    E --> F[In Secondary Review]
    F --> F1[Implementing Secondary Feedback]
    F1 --> F
    F --> G[Approved]
    G --> H[Done]
```
- Bot Review — Before requesting human review, run `@nebula-reviewer <apex-task-uuid>` in your task-feedback thread. Address all valid points raised by the bot. You can also use `@nebula-reviewer improve <task_id>` for suggestions on increasing difficulty. The bot supports version and model selection:

  ```
  @nebula-reviewer abc123...                 # latest version, biggie-nebula (default)
  @nebula-reviewer abc123... 2               # version 2, biggie-nebula
  @nebula-reviewer abc123... smalli-nebula   # latest version, smalli-nebula
  @nebula-reviewer abc123... 2 smalli-nebula # version 2, smalli-nebula
  ```

  Limit bot usage to once per version of your task — don't spam it.
- Primary Review — A team member reviews your task for correctness and clarity
- Secondary Review — A category lead performs final approval
Note: If a reviewer requests changes, move the task to the corresponding "Implementing Feedback" column while you address it, then back to the review column when ready. Don't move it all the way back to In Progress. See this Discord thread for context.
Use the GitHub project board to track your task through these stages.
- Use the `biggie-nebula` model for all evaluations
- Run at least 8 rollouts per evaluation
- Target a score of < 0.7 across rollouts
- Evaluations can be run locally via `apex-arena eval` or `apex-arena evaluations run`, or hosted through the Apex UI web interface
- Review rollout transcripts in Apex UI — ensure failures are due to task difficulty, not grader bugs or environment instability
- If you see inconsistent pass/fail across rollouts with similar agent behavior, add a short sleep (e.g., 60s) before the first grader check to let the environment stabilize
Important: Your task Dockerfiles must reference the fully qualified versioned image for hosted evals to work:

```dockerfile
FROM us-central1-docker.pkg.dev/bespokelabs/nebula-devops-registry/nebula-devops:1.0.1
```

Do not use `FROM nebula-devops:latest` — it will not resolve in hosted environments.
Partial grading is preferred. Break your task into functional subscores, each representing a real milestone toward the goal.
Rules for subscores:
- Incremental — represents real progress toward the goal
- Objective — deterministic and measurable
- Not gameable — can't earn the reward without actual work
- Equal weights — all subscores should have equal weight
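Because all subscores carry equal weight, combining them reduces to a simple mean. A minimal sketch of that arithmetic (the `final_score` helper is illustrative, not part of the apex-arena grader API):

```python
def final_score(subscores: dict[str, float]) -> float:
    """Combine equally weighted subscores into a single score in [0, 1]."""
    if not subscores:
        return 0.0
    return sum(subscores.values()) / len(subscores)

# Two of three milestones reached -> score of ~0.67
print(final_score({"database_running": 1, "app_responds": 1, "db_connection_works": 0}))
```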
Quick self-test for each subscore:
- "If the agent ONLY gets this subscore, did they make real progress?" → should be yes
- "Can the agent get this reward without working toward the actual goal?" → should be no
Bad subscores: file_exists, no_syntax_errors, config_valid, pods_ready
Good subscores: database_running, app_responds, backup_restores_data, canary_routes_traffic
Example for a task "deploy flask app with postgres database":

```python
# Bad - agent can score well without anything working
subscores = {"requirements_exists": 1, "dockerfile_exists": 1, "no_syntax_errors": 1, "config_file_valid": 1}

# Good - each subscore means something actually works
subscores = {"database_running": 1, "app_responds": 1, "db_connection_works": 1}
weights = {"database_running": 0.33, "app_responds": 0.33, "db_connection_works": 0.33}
```

| Resource | Link |
|---|---|
| Apex UI | https://apex-ui-v2-319533213591.us-central1.run.app/ |
| Apex Setup Guide | Google Slides |
| GitHub Repo | https://github.com/NebulaAuroras/Nebula |
| Task Tracking Board | GitHub Project |
| Nebula CHANGELOG | CHANGELOG.md |
| Instructions Doc | Google Doc |
| Daily Onboarding Call | Google Meet — 10:00 PM IST |
| Office Hours (Tue/Fri) | Google Meet — 10:30 PM IST |
Post in Discord or attend office hours. Kartik, Vaibhav, and Nikos can help with access and operational issues. Shahryar can assist with technical and task-feedback support. For infrastructure issues, ask Greg or Dylan in #nebula-infra.
This is an unofficial onboarding document maintained by Dylan. Bespoke is working on improving their official docs, which may supersede this one over time.