This document summarizes the changes between commit 38b5e63cf3d56df4568a9640e40ea3283ada7b95 (previous iteration) and the current HEAD (10487db642cf5580d20016dde25ab1423b0c1ad7). The spec underwent significant expansion with 491 insertions and 168 deletions, adding substantial detail on future work, implementation details, and use cases.
- Added: Use case for expressing preferred and avoided traits in scheduler and flavor
- Purpose: Enables optimization of placement decisions based on soft constraints
- Impact: Expands the scope of the weigher beyond hard requirements
Added a comprehensive section covering potential enhancements:
- Mechanism: Traits expressed as `trait:$TRAIT_NAME=preferred` or `trait:$TRAIT_NAME=avoided` in flavor extra specs
- Scoring Algorithm: Detailed explanation of the cost calculation with configurable multipliers (default: 1000.0); a rough sketch follows this list
- Examples: Provided detailed examples showing how preferred traits subtract from cost and avoided traits add to cost
- Complexity: O(n) operation where n is the number of requested traits
- Proposed Standard Traits: `PRESSURE_CPU`, `PRESSURE_MEMORY`, `PRESSURE_DISK`, `PRESSURE_NETWORK`
- Purpose: Allow external monitoring systems to report host pressure without requiring nova logic
- Configuration: New `resource_provider_avoided_traits` config option defaulting to the `PRESSURE_*` traits
- Override Behavior: Config-avoided traits can be overridden by flavor preferred traits (similar to Kubernetes tolerations)
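The change summary does not reproduce the scoring code, so the following is only a minimal sketch of the behavior described above, assuming a single shared multiplier and a simple merge in which flavor preferences win over config-avoided traits; the function name `score_traits` and the exact data shapes are hypothetical.

```python
# Minimal sketch (hypothetical names) of the described soft-trait scoring:
# preferred traits a host exposes subtract from its cost, avoided traits add
# to it, and config-avoided defaults (e.g. PRESSURE_*) can be overridden by
# the flavor, similar to Kubernetes tolerations.

CONF_AVOIDED_TRAITS = {"PRESSURE_CPU", "PRESSURE_MEMORY",
                       "PRESSURE_DISK", "PRESSURE_NETWORK"}
SOFT_TRAIT_MULTIPLIER = 1000.0  # the default multiplier quoted above


def score_traits(flavor_prefs: dict[str, str], host_traits: set[str]) -> float:
    """Return a cost delta for one host; O(n) in the requested traits."""
    # Start from the config-avoided defaults, then let the flavor's
    # trait:$TRAIT_NAME=preferred|avoided entries override them.
    prefs = {trait: "avoided" for trait in CONF_AVOIDED_TRAITS}
    prefs.update(flavor_prefs)

    cost = 0.0
    for trait, pref in prefs.items():
        if trait not in host_traits:
            continue
        if pref == "preferred":
            cost -= SOFT_TRAIT_MULTIPLIER
        elif pref == "avoided":
            cost += SOFT_TRAIT_MULTIPLIER
    return cost


# Example: a flavor that prefers PRESSURE_CPU overrides the config-avoided
# default, turning a +1000 penalty into a -1000 bonus for that host.
print(score_traits({"PRESSURE_CPU": "preferred"}, {"PRESSURE_CPU"}))  # -1000.0
```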
Added extensive documentation on four different approaches to express CPU performance tiers:
Approach 1: Custom Resource Classes
- Uses `provider.yaml` to define custom resource classes (e.g., `CUSTOM_CPU_PERFORMANCE_TIER1`)
- Requires setting `resources:VCPU=0` in flavor extra specs
- Requires custom quotas via unified limits
- Pros: Capacity enforced by Placement
- Cons: Requires new resource class per tier, not enforceable by Placement
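As a purely illustrative sketch of the flavor side of this approach (the tier name comes from the example above, the requested amount is made up), the extra specs might look like this:

```python
# Hypothetical flavor extra specs for Approach 1: zero out the standard VCPU
# request and request the custom resource class defined via provider.yaml.
tier1_extra_specs = {
    "resources:VCPU": "0",
    "resources:CUSTOM_CPU_PERFORMANCE_TIER1": "4",  # illustrative amount
}
```

Quota for the custom class would then be registered separately via unified limits, as noted above.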
Approach 2: Traits-Based
- Uses traits to express CPU performance tier levels
- Simple approach, no custom quotas needed
- Pros: No unified limits configuration required
- Cons: Requires partitioning cloud by CPU tier levels (aligns to server SKUs)
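A flavor under this approach could simply require a tier trait; the trait name below is invented for the sketch:

```python
# Hypothetical flavor extra specs for Approach 2: the tier is expressed as a
# required trait on hosts of the matching SKU partition, no custom quotas.
gold_tier_extra_specs = {
    "trait:CUSTOM_CPU_TIER_GOLD": "required",
}
```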
Approach 3: CPU Time Microseconds
- Uses the `quota:cpu_period` flavor extra spec with a `CUSTOM_CPU_TIME_MICROSECONDS` inventory
- More flexible than Approach 1 (no pre-planning required)
- Pros: Capacity enforced by Placement, flexible tier creation
- Cons: More complex configuration
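The summary only names the extra spec and the inventory, so the following is a loose, assumption-laden sketch of how a flavor might combine them; the values and the pairing with `quota:cpu_quota` are guesses, not taken from the spec.

```python
# Loose sketch for Approach 3 (illustrative values): the libvirt quota extra
# specs shape the guest's CPU time, while a CUSTOM_CPU_TIME_MICROSECONDS
# request lets Placement track and enforce the host's CPU-time capacity.
cpu_time_extra_specs = {
    "quota:cpu_period": "100000",                       # period in microseconds
    "quota:cpu_quota": "50000",                         # runtime allowed per period
    "resources:CUSTOM_CPU_TIME_MICROSECONDS": "50000",  # tracked in Placement
}
```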
Approach 4: Hybrid (Traits + Resource Classes/Time)
- Combines traits for general CPU classes/features with resource class or time-based tiers
- Most flexible approach
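Combining the mechanisms from the previous approaches might look like the short sketch below; names and values are again illustrative.

```python
# Hybrid sketch: a trait for the general CPU class or feature set plus a
# time-based request for capacity accounting of the tier.
hybrid_extra_specs = {
    "trait:CUSTOM_CPU_TIER_GOLD": "required",
    "resources:CUSTOM_CPU_TIME_MICROSECONDS": "80000",
}
```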
- Added: Detailed explanation of why `requested / free_capacity` is used instead of `(used + requested) / total_capacity`
- Rationale: More sensitive to consuming the remaining available capacity, supporting the "most boring host" strategy
- Example: Shows the difference between 33.3% (requested/free) and 50% (absolute utilization); worked through below
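To make the comparison concrete, the numbers below are chosen so that the two formulas reproduce the percentages quoted above; the capacity/used/requested values themselves are illustrative.

```python
# Host with 8 units of capacity, 2 already used, and a request for 2 more.
capacity, used, requested = 8, 2, 2
free = capacity - used                            # 6

ratio_relative = requested / free                 # 2 / 6  -> 0.333 (33.3%)
ratio_absolute = (used + requested) / capacity    # 4 / 8  -> 0.5   (50%)

# The requested/free form grows faster as a host fills up, which is what
# steers the weigher toward the "most boring host".
```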
- Previous: Simple example with 2 VCPUs and one trait
- Current: Comprehensive example with:
- Multiple resources (VCPU, MEMORY_MB, DISK_GB)
- Detailed step-by-step calculations
- Explanation of each calculation step
- Final weight calculation: 0.542 (vs previous 0.585)
- Added: Note referencing initial implementation in review: https://review.opendev.org/c/openstack/nova/+/953131
- Purpose: Links spec to actual code implementation
- Added: Mention of potential scheduler simulator for functional testing
- Purpose: Validate performance impact and behavior at scale
- Format: May be a new tool (`python -m nova.tests.functional.scheduler_simulator`) or test suite
- Removed: Detailed code blocks showing exact implementation lines
- Simplified: Diagram formatting and spacing
- Retained: Core flow information
- Impact: More readable, less implementation-specific
- Removed: Code block showing class definition with `__init__` method
- Simplified: Text description only
- Impact: Less implementation detail, more conceptual
- Changed: Resource format from tuple `(capacity, used)` to dict `{'capacity': int, 'used': int}`
- Added: Reference to Placement API documentation for complete example
- Added: Example trait `COMPUTE_MANAGED_PCI_DEVICE` in PCI provider example
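To illustrate the data-structure change, here is a rough before/after sketch; the per-resource `{'capacity': ..., 'used': ...}` shape and the `COMPUTE_MANAGED_PCI_DEVICE` trait come from the summary, while the surrounding layout, the custom PCI resource class name, and all numbers are assumptions.

```python
# Old per-resource shape: a bare (capacity, used) tuple.
old_resources = {"VCPU": (32, 8), "MEMORY_MB": (65536, 16384)}

# New per-resource shape: an explicit dict, closer to the provider summaries
# returned by the Placement API.
new_resources = {
    "VCPU": {"capacity": 32, "used": 8},
    "MEMORY_MB": {"capacity": 65536, "used": 16384},
}

# Hypothetical PCI child provider carrying the example trait.
pci_provider_summary = {
    "resources": {"CUSTOM_PCI_8086_1563": {"capacity": 2, "used": 0}},
    "traits": ["COMPUTE_MANAGED_PCI_DEVICE"],
}
```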
- Removed: Detailed code examples showing:
- Exact line numbers (`~207`, `~260-263`, `~357-365`, `~382-386`)
- Code snippets for fetching allocation candidates
- Code snippets for indexing by RP UUID
- Code snippets for extending HostState objects
- Code snippets for accessing in weigher
- Simplified: High-level description with reference to key points
- Impact: Less brittle to code changes, more maintainable
- Removed: Detailed bullet points about overhead scaling factors
- Simplified: General statement about typical deployments
- Added: Note that new weigher may replace existing weighers, potentially offsetting performance impact
- Changed: "allocation candidates" → "compute nodes" (more accurate terminology)
- Removed: Mention of `copy.deepcopy()` usage
- Simplified: "generator functions using copy.deepcopy()" → "generator functions"
- Impact: Less implementation detail in spec
- Changed: "custom resource classes (such as CUSTOM_VCPU_SHARES for performance tiers)"
- To: "custom resource classes such as those used by pci in placement `CUSTOM_PCI_<vendor_id>_<product_id>` or vGPUs `CUSTOM_<type>`"
- Impact: More concrete examples
- Improved: Numbering and formatting consistency
- Added: More detailed explanations of each step
- Fixed: Line continuation formatting for better readability
- Simplified: Removed mention of "deep-copied data"
- Clarified: "Generator wrappers populate HostState objects"
- Total Changes: 491 insertions, 168 deletions
- Net Addition: +323 lines
- Files Changed: 1 file (`specs/2026.1/approved/resource-provider-weigher.rst`)
- Primary Focus: Future work and implementation details
- Future-Proofing: Extensive documentation of potential enhancements (preferred/avoided traits, `PRESSURE_*` traits, CPU performance tiers)
- Clarification: Better explanations of algorithms, calculations, and rationale
- Simplification: Removed implementation-specific details (line numbers, code snippets) that could become outdated
- Practical Examples: More concrete examples with real-world scenarios (PCI devices, vGPUs, CPU performance tiers)
- Testing Strategy: Added mention of a scheduler simulator for scale testing