@SeanMooney
Created November 7, 2025 16:07

Comprehensive Summary of Changes: ResourceProviderWeigher Spec

Overview

This document summarizes the changes between commit 38b5e63cf3d56df4568a9640e40ea3283ada7b95 (previous iteration) and the current HEAD (10487db642cf5580d20016dde25ab1423b0c1ad7). The spec underwent significant expansion with 491 insertions and 168 deletions, adding substantial detail on future work, implementation details, and use cases.


Major Additions

1. New Use Case Added

  • Added: Use case for expressing preferred and avoided traits in scheduler and flavor
  • Purpose: Enables optimization of placement decisions based on soft constraints
  • Impact: Expands the scope of the weigher beyond hard requirements

2. Extensive "Future Work" Section (New)

Added a comprehensive section covering potential enhancements:

2.1 Preferred and Avoided Traits

  • Mechanism: Traits expressed as trait:$TRAIT_NAME=preferred or trait:$TRAIT_NAME=avoided in flavor extra specs
  • Scoring Algorithm: Detailed explanation of cost calculation with configurable multipliers (default: 1000.0)
  • Examples: Provided detailed examples showing how preferred traits subtract from cost and avoided traits add to cost
  • Complexity: O(n) operation where n is the number of requested traits
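
The scoring described above can be sketched as follows. This is a minimal illustration, not the spec's code: the function names, the separate `preferred_multiplier` and `avoided_multiplier` parameters, and the rule that a trait only contributes when the host actually exposes it are assumptions made for the example. Only the `trait:$TRAIT_NAME=preferred|avoided` syntax, the 1000.0 default, and the subtract/add direction come from the summary.

```python
def parse_soft_traits(extra_specs):
    """Split trait:<NAME>=preferred|avoided extra specs into two sets.

    Other trait values (for example "required") and unrelated keys are
    ignored here; they are not soft constraints.
    """
    preferred, avoided = set(), set()
    for key, value in extra_specs.items():
        if not key.startswith("trait:"):
            continue
        name = key[len("trait:"):]
        if value == "preferred":
            preferred.add(name)
        elif value == "avoided":
            avoided.add(name)
    return preferred, avoided


def trait_cost(host_traits, preferred, avoided,
               preferred_multiplier=1000.0, avoided_multiplier=1000.0):
    """Return the trait contribution to a host's cost (lower is better).

    host_traits is a set, so each membership test is O(1) and the whole
    loop is O(n) in the number of requested traits.
    """
    cost = 0.0
    for trait in preferred:
        if trait in host_traits:
            cost -= preferred_multiplier  # preferred trait present: cheaper
    for trait in avoided:
        if trait in host_traits:
            cost += avoided_multiplier  # avoided trait present: more costly
    return cost


extra_specs = {
    "trait:HW_CPU_X86_AVX2": "preferred",
    "trait:CUSTOM_NOISY": "avoided",
    "trait:CUSTOM_X": "required",
    "hw:cpu_policy": "dedicated",
}
preferred, avoided = parse_soft_traits(extra_specs)
```

With these inputs, a host exposing only `HW_CPU_X86_AVX2` gets a trait cost of -1000.0, a host exposing only `CUSTOM_NOISY` gets +1000.0, and a host exposing both nets out to 0.0.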

2.2 PRESSURE_* Traits

  • Proposed Standard Traits: PRESSURE_CPU, PRESSURE_MEMORY, PRESSURE_DISK, PRESSURE_NETWORK
  • Purpose: Allow external monitoring systems to report host pressure without requiring nova logic
  • Configuration: New resource_provider_avoided_traits config option that defaults to the PRESSURE_* traits
  • Override Behavior: Config avoided traits can be overridden by flavor preferred traits (similar to Kubernetes tolerations)
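
One way to read the override behavior is as set arithmetic: a flavor that prefers a trait removes it from the configured avoided set. The sketch below assumes exactly that. The helper name `effective_avoided_traits` is hypothetical, the default is modeled as the four listed traits, and whether an overridden trait is then also scored as preferred is left open because the summary does not say.

```python
# Modeled default for resource_provider_avoided_traits (assumption: the
# four proposed PRESSURE_* traits listed above).
DEFAULT_AVOIDED_TRAITS = frozenset({
    "PRESSURE_CPU",
    "PRESSURE_MEMORY",
    "PRESSURE_DISK",
    "PRESSURE_NETWORK",
})


def effective_avoided_traits(config_avoided, flavor_preferred, flavor_avoided):
    """Combine config-level and flavor-level avoided traits.

    A trait the flavor marks as preferred is dropped from the configured
    avoided set, much like a Kubernetes toleration; traits the flavor
    itself marks as avoided are always kept.
    """
    return (set(config_avoided) - set(flavor_preferred)) | set(flavor_avoided)


# A flavor that tolerates CPU pressure but avoids a custom trait.
avoided = effective_avoided_traits(
    DEFAULT_AVOIDED_TRAITS,
    flavor_preferred={"PRESSURE_CPU"},
    flavor_avoided={"CUSTOM_NOISY"},
)
```

Here `avoided` ends up containing the three remaining PRESSURE_* traits plus `CUSTOM_NOISY`, while `PRESSURE_CPU` no longer adds to the cost for this flavor.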

2.3 CPU Performance Tier Levels (Four Approaches)

Added extensive documentation on four different approaches to express CPU performance tiers:

Approach 1: Custom Resource Classes

  • Uses provider.yaml to define custom resource classes (e.g., CUSTOM_CPU_PERFORMANCE_TIER1)
  • Requires setting resources:VCPU=0 in flavor extra specs
  • Requires custom quotas via unified limits
  • Pros: Capacity enforced by Placement
  • Cons: Requires new resource class per tier, not enforceable by Placement

Approach 2: Traits-Based

  • Uses traits to express CPU performance tier levels
  • Simple approach, no custom quotas needed
  • Pros: No unified limits configuration required
  • Cons: Requires partitioning cloud by CPU tier levels (aligns to server SKUs)

Approach 3: CPU Time Microseconds

  • Uses quota:cpu_period flavor extra spec with CUSTOM_CPU_TIME_MICROSECONDS inventory
  • More flexible than Approach 1 (no pre-planning required)
  • Pros: Capacity enforced by Placement, flexible tier creation
  • Cons: More complex configuration

Approach 4: Hybrid (Traits + Resource Classes/Time)

  • Combines traits for general CPU classes/features with resource class or time-based tiers
  • Most flexible approach

3. Enhanced Algorithm Documentation

3.1 Resource Calculation Clarification

  • Added: Detailed explanation of why requested / free_capacity is used instead of (used + requested) / total_capacity
  • Rationale: More sensitive to consuming available capacity, supporting "most boring host" strategy
  • Example: Shows difference between 33.3% (requested/free) vs 50% (absolute utilization)
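
The two formulas can be compared directly. The numbers below (8 VCPUs of capacity, 2 used, 2 requested) are chosen only because they reproduce the 33.3% and 50% figures; the spec's own example may use different values, and the function names are illustrative.

```python
def requested_over_free(requested, capacity, used):
    """Fraction of the remaining free capacity this request would consume."""
    free = capacity - used
    if free <= 0:
        raise ValueError("no free capacity left on this provider")
    return requested / free


def absolute_utilization(requested, capacity, used):
    """Utilization of the provider after the request is placed."""
    return (used + requested) / capacity


capacity, used, requested = 8, 2, 2
ratio_free = requested_over_free(requested, capacity, used)    # 2 / 6
ratio_abs = absolute_utilization(requested, capacity, used)    # 4 / 8

# The same request on a fuller host: the free-capacity ratio jumps to 1.0
# (the request takes everything that is left), while absolute utilization
# moves only from 0.5 to 1.0 on a fixed 0..1 scale tied to total size.
ratio_free_full = requested_over_free(requested, capacity, used=6)
```

The free-capacity ratio measures how much of what remains the request eats, so it rises sharply as a host fills up, which is the sensitivity the rationale above describes.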

3.2 Expanded Example Calculation

  • Previous: Simple example with 2 VCPUs and one trait
  • Current: Comprehensive example with:
    • Multiple resources (VCPU, MEMORY_MB, DISK_GB)
    • Detailed step-by-step calculations
    • Explanation of each calculation step
    • Final weight calculation: 0.542 (vs previous 0.585)

4. Implementation Reference

5. Testing Enhancements

  • Added: Mention of potential scheduler simulator for functional testing
  • Purpose: Validate performance impact and behavior at scale
  • Format: May be a new tool (python -m nova.tests.functional.scheduler_simulator) or test suite

Major Modifications

1. Sequence Diagram Simplification

  • Removed: Detailed code blocks showing exact implementation lines
  • Simplified: Diagram formatting and spacing
  • Retained: Core flow information
  • Impact: More readable, less implementation-specific

2. HostState Modifications Section

  • Removed: Code block showing class definition with __init__ method
  • Simplified: Text description only
  • Impact: Less implementation detail, more conceptual

3. Input Data Structure Format

  • Changed: Resource format from tuple (capacity, used) to dict {'capacity': int, 'used': int}
  • Added: Reference to Placement API documentation for complete example
  • Added: Example trait COMPUTE_MANAGED_PCI_DEVICE in PCI provider example
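
The format change can be illustrated with a small converter. The outer mapping keyed by resource class and the sample values are assumptions for the example; only the inner shapes, `(capacity, used)` before and `{'capacity': int, 'used': int}` after, come from the summary.

```python
def to_dict_format(resources):
    """Convert {resource_class: (capacity, used)} to the dict-based form."""
    return {
        resource_class: {"capacity": capacity, "used": used}
        for resource_class, (capacity, used) in resources.items()
    }


# Previous iteration: positional tuples, where the field order must be
# remembered by every reader.
old_style = {"VCPU": (16, 4), "MEMORY_MB": (65536, 8192)}

# Current iteration: self-describing dicts.
new_style = to_dict_format(old_style)
free_vcpus = new_style["VCPU"]["capacity"] - new_style["VCPU"]["used"]
```

Named keys make call sites such as the `free_vcpus` line readable without knowing the tuple order, and leave room for extra fields later without breaking existing unpacking.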

4. Scheduler Flow Details

  • Removed: Detailed code examples showing:
    • Exact line numbers (~207, ~260-263, ~357-365, ~382-386)
    • Code snippets for fetching allocation candidates
    • Code snippets for indexing by RP UUID
    • Code snippets for extending HostState objects
    • Code snippets for accessing in weigher
  • Simplified: High-level description with reference to key points
  • Impact: Less brittle to code changes, more maintainable

5. Performance Impact Section

  • Removed: Detailed bullet points about overhead scaling factors
  • Simplified: General statement about typical deployments
  • Added: Note that new weigher may replace existing weighers, potentially offsetting performance impact
  • Changed: "allocation candidates" → "compute nodes" (more accurate terminology)

6. Developer Impact Section

  • Removed: Mention of copy.deepcopy() usage
  • Simplified: "generator functions using copy.deepcopy()" → "generator functions"
  • Impact: Less implementation detail in spec

Minor Changes

1. Use Case Wording

  • Changed: "custom resource classes (such as CUSTOM_VCPU_SHARES for performance tiers)"
  • To: "custom resource classes such as those used by pci in placement CUSTOM_PCI_<vendor_id>_<product_id> or vGPUs CUSTOM_<type>"
  • Impact: More concrete examples

2. Algorithm Section Formatting

  • Improved: Numbering and formatting consistency
  • Added: More detailed explanations of each step

3. Pseudocode Formatting

  • Fixed: Line continuation formatting for better readability

4. Key Points Section

  • Simplified: Removed mention of "deep-copied data"
  • Clarified: "Generator wrappers populate HostState objects"
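
As a rough picture of what "generator wrappers populate HostState objects" could look like, the sketch below attaches per-provider data to each host as it is yielded. Everything here is hypothetical: `FakeHostState`, its `provider_data` attribute, and `with_provider_data` are stand-ins invented for the example, not nova's actual classes or the spec's design.

```python
from dataclasses import dataclass, field


@dataclass
class FakeHostState:
    """Hypothetical stand-in for a scheduler host state object."""
    uuid: str
    provider_data: dict = field(default_factory=dict)


def with_provider_data(host_states, data_by_rp_uuid):
    """Lazily attach resource provider data, indexed by RP UUID.

    Hosts are populated one at a time as the consumer iterates, so no
    work is done for hosts the scheduler never reaches.
    """
    for host in host_states:
        host.provider_data = data_by_rp_uuid.get(host.uuid, {})
        yield host


hosts = [FakeHostState("rp-a"), FakeHostState("rp-b")]
index = {"rp-a": {"traits": {"COMPUTE_MANAGED_PCI_DEVICE"}}}
populated = list(with_provider_data(hosts, index))
```

A weigher consuming `populated` would then read `host.provider_data` directly; hosts missing from the index simply carry an empty dict.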

Summary Statistics

  • Total Changes: 491 insertions, 168 deletions
  • Net Addition: +323 lines
  • Files Changed: 1 file (specs/2026.1/approved/resource-provider-weigher.rst)
  • Primary Focus: Future work and implementation details

Key Themes

  1. Future-Proofing: Extensive documentation of potential enhancements (preferred/avoided traits, PRESSURE_* traits, CPU performance tiers)

  2. Clarification: Better explanations of algorithms, calculations, and rationale

  3. Simplification: Removed implementation-specific details (line numbers, code snippets) that could become outdated

  4. Practical Examples: More concrete examples with real-world scenarios (PCI devices, vGPUs, CPU performance tiers)

  5. Testing Strategy: Added mention of scheduler simulator for scale testing
