This document summarizes the changes between commit 38b5e63cf3d56df4568a9640e40ea3283ada7b95 (previous iteration) and the current HEAD (10487db642cf5580d20016dde25ab1423b0c1ad7). The spec underwent significant expansion with 491 insertions and 168 deletions, adding substantial detail on future work, implementation details, and use cases.
- Added: Use case for expressing preferred and avoided traits in scheduler and flavor
- Purpose: Enables optimization of placement decisions based on soft constraints
- Impact: Expands the scope of the weigher beyond hard requirements
Added a comprehensive section covering potential enhancements:
- Mechanism: Traits expressed as `trait:$TRAIT_NAME=preferred` or `trait:$TRAIT_NAME=avoided` in flavor extra specs
- Scoring Algorithm: Detailed explanation of the cost calculation with configurable multipliers (default: 1000.0); a rough sketch follows this list
- Examples: Provided detailed examples showing how preferred traits subtract from cost and avoided traits add to cost
- Complexity: O(n) operation where n is the number of requested traits
- Proposed Standard Traits: `PRESSURE_CPU`, `PRESSURE_MEMORY`, `PRESSURE_DISK`, `PRESSURE_NETWORK`
- Purpose: Allow external monitoring systems to report host pressure without requiring nova logic
- Configuration: New `resource_provider_avoided_traits` config option defaulting to the `PRESSURE_*` traits
- Override Behavior: Config-avoided traits can be overridden by flavor preferred traits (similar to Kubernetes tolerations)
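The change summary does not reproduce the scoring code, so the following is only a minimal sketch of the behavior described above, assuming a single shared multiplier and a simple merge in which flavor preferences win over config-avoided traits; the function name `score_traits` and the exact data shapes are hypothetical.

```python
# Minimal sketch (hypothetical names) of the described soft-trait scoring:
# preferred traits a host exposes subtract from its cost, avoided traits add
# to it, and config-avoided defaults (e.g. PRESSURE_*) can be overridden by
# the flavor, similar to Kubernetes tolerations.

CONF_AVOIDED_TRAITS = {"PRESSURE_CPU", "PRESSURE_MEMORY",
                       "PRESSURE_DISK", "PRESSURE_NETWORK"}
SOFT_TRAIT_MULTIPLIER = 1000.0  # the default multiplier quoted above


def score_traits(flavor_prefs: dict[str, str], host_traits: set[str]) -> float:
    """Return a cost delta for one host; O(n) in the requested traits."""
    # Start from the config-avoided defaults, then let the flavor's
    # trait:$TRAIT_NAME=preferred|avoided entries override them.
    prefs = {trait: "avoided" for trait in CONF_AVOIDED_TRAITS}
    prefs.update(flavor_prefs)

    cost = 0.0
    for trait, pref in prefs.items():
        if trait not in host_traits:
            continue
        if pref == "preferred":
            cost -= SOFT_TRAIT_MULTIPLIER
        elif pref == "avoided":
            cost += SOFT_TRAIT_MULTIPLIER
    return cost


# Example: a flavor that prefers PRESSURE_CPU overrides the config-avoided
# default, turning a +1000 penalty into a -1000 bonus for that host.
print(score_traits({"PRESSURE_CPU": "preferred"}, {"PRESSURE_CPU"}))  # -1000.0
```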
Added extensive documentation on four different approaches to express CPU performance tiers:
Approach 1: Custom Resource Classes
- Uses `provider.yaml` to define custom resource classes (e.g., `CUSTOM_CPU_PERFORMANCE_TIER1`)
- Requires setting `resources:VCPU=0` in flavor extra specs
- Requires custom quotas via unified limits
- Pros: Capacity enforced by Placement
- Cons: Requires new resource class per tier, not enforceable by Placement
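As a purely illustrative sketch of the flavor side of this approach (the tier name comes from the example above, the requested amount is made up), the extra specs might look like this:

```python
# Hypothetical flavor extra specs for Approach 1: zero out the standard VCPU
# request and request the custom resource class defined via provider.yaml.
tier1_extra_specs = {
    "resources:VCPU": "0",
    "resources:CUSTOM_CPU_PERFORMANCE_TIER1": "4",  # illustrative amount
}
```

Quota for the custom class would then be registered separately via unified limits, as noted above.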
Approach 2: Traits-Based
- Uses traits to express CPU performance tier levels
- Simple approach, no custom quotas needed
- Pros: No unified limits configuration required
- Cons: Requires partitioning cloud by CPU tier levels (aligns to server SKUs)
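A flavor under this approach could simply require a tier trait; the trait name below is invented for the sketch:

```python
# Hypothetical flavor extra specs for Approach 2: the tier is expressed as a
# required trait on hosts of the matching SKU partition, no custom quotas.
gold_tier_extra_specs = {
    "trait:CUSTOM_CPU_TIER_GOLD": "required",
}
```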
Approach 3: CPU Time Microseconds
- Uses the `quota:cpu_period` flavor extra spec with a `CUSTOM_CPU_TIME_MICROSECONDS` inventory
- More flexible than Approach 1 (no pre-planning required)
- Pros: Capacity enforced by Placement, flexible tier creation
- Cons: More complex configuration
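The summary only names the extra spec and the inventory, so the following is a loose, assumption-laden sketch of how a flavor might combine them; the values and the pairing with `quota:cpu_quota` are guesses, not taken from the spec.

```python
# Loose sketch for Approach 3 (illustrative values): the libvirt quota extra
# specs shape the guest's CPU time, while a CUSTOM_CPU_TIME_MICROSECONDS
# request lets Placement track and enforce the host's CPU-time capacity.
cpu_time_extra_specs = {
    "quota:cpu_period": "100000",                       # period in microseconds
    "quota:cpu_quota": "50000",                         # runtime allowed per period
    "resources:CUSTOM_CPU_TIME_MICROSECONDS": "50000",  # tracked in Placement
}
```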
Approach 4: Hybrid (Traits + Resource Classes/Time)
- Combines traits for general CPU classes/features with resource class or time-based tiers
- Most flexible approach
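Combining the mechanisms from the previous approaches might look like the short sketch below; names and values are again illustrative.

```python
# Hybrid sketch: a trait for the general CPU class or feature set plus a
# time-based request for capacity accounting of the tier.
hybrid_extra_specs = {
    "trait:CUSTOM_CPU_TIER_GOLD": "required",
    "resources:CUSTOM_CPU_TIME_MICROSECONDS": "80000",
}
```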
- Added: Detailed explanation of why `requested / free_capacity` is used instead of `(used + requested) / total_capacity`
- Rationale: More sensitive to consuming the remaining available capacity, supporting the "most boring host" strategy
- Example: Shows the difference between 33.3% (requested/free) and 50% (absolute utilization); worked through below
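To make the comparison concrete, the numbers below are chosen so that the two formulas reproduce the percentages quoted above; the capacity/used/requested values themselves are illustrative.

```python
# Host with 8 units of capacity, 2 already used, and a request for 2 more.
capacity, used, requested = 8, 2, 2
free = capacity - used                            # 6

ratio_relative = requested / free                 # 2 / 6  -> 0.333 (33.3%)
ratio_absolute = (used + requested) / capacity    # 4 / 8  -> 0.5   (50%)

# The requested/free form grows faster as a host fills up, which is what
# steers the weigher toward the "most boring host".
```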
- Previous: Simple example with 2 VCPUs and one trait
- Current: Comprehensive example with:
- Multiple resources (VCPU, MEMORY_MB, DISK_GB)
- Detailed step-by-step calculations
- Explanation of each calculation step
- Final weight calculation: 0.542 (vs previous 0.585)
- Added: Note referencing initial implementation in review: https://review.opendev.org/c/openstack/nova/+/953131
- Purpose: Links spec to actual code implementation
- Added: Mention of potential scheduler simulator for functional testing
- Purpose: Validate performance impact and behavior at scale
- Format: May be a new tool (`python -m nova.tests.functional.scheduler_simulator`) or test suite
- Removed: Detailed code blocks showing exact implementation lines
- Simplified: Diagram formatting and spacing
- Retained: Core flow information
- Impact: More readable, less implementation-specific
- Removed: Code block showing class definition with `__init__` method
- Simplified: Text description only
- Impact: Less implementation detail, more conceptual
- Changed: Resource format from tuple `(capacity, used)` to dict `{'capacity': int, 'used': int}`
- Added: Reference to Placement API documentation for complete example
- Added: Example trait `COMPUTE_MANAGED_PCI_DEVICE` in PCI provider example
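To illustrate the data-structure change, here is a rough before/after sketch; the per-resource `{'capacity': ..., 'used': ...}` shape and the `COMPUTE_MANAGED_PCI_DEVICE` trait come from the summary, while the surrounding layout, the custom PCI resource class name, and all numbers are assumptions.

```python
# Old per-resource shape: a bare (capacity, used) tuple.
old_resources = {"VCPU": (32, 8), "MEMORY_MB": (65536, 16384)}

# New per-resource shape: an explicit dict, closer to the provider summaries
# returned by the Placement API.
new_resources = {
    "VCPU": {"capacity": 32, "used": 8},
    "MEMORY_MB": {"capacity": 65536, "used": 16384},
}

# Hypothetical PCI child provider carrying the example trait.
pci_provider_summary = {
    "resources": {"CUSTOM_PCI_8086_1563": {"capacity": 2, "used": 0}},
    "traits": ["COMPUTE_MANAGED_PCI_DEVICE"],
}
```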
- Removed: Detailed code examples showing:
- Exact line numbers (`~207`, `~260-263`, `~357-365`, `~382-386`)
- Code snippets for fetching allocation candidates
- Code snippets for indexing by RP UUID
- Code snippets for extending HostState objects
- Code snippets for accessing in weigher
- Simplified: High-level description with reference to key points
- Impact: Less brittle to code changes, more maintainable
- Removed: Detailed bullet points about overhead scaling factors
- Simplified: General statement about typical deployments
- Added: Note that new weigher may replace existing weighers, potentially offsetting performance impact
- Changed: "allocation candidates" → "compute nodes" (more accurate terminology)
- Removed: Mention of `copy.deepcopy()` usage
- Simplified: "generator functions using copy.deepcopy()" → "generator functions"
- Impact: Less implementation detail in spec
- Changed: "custom resource classes (such as CUSTOM_VCPU_SHARES for performance tiers)"
- To: "custom resource classes such as those used by pci in placement `CUSTOM_PCI_<vendor_id>_<product_id>` or vGPUs `CUSTOM_<type>`"
- Impact: More concrete examples
- Improved: Numbering and formatting consistency
- Added: More detailed explanations of each step
- Fixed: Line continuation formatting for better readability
- Simplified: Removed mention of "deep-copied data"
- Clarified: "Generator wrappers populate HostState objects"
- Total Changes: 491 insertions, 168 deletions
- Net Addition: +323 lines
- Files Changed: 1 file (`specs/2026.1/approved/resource-provider-weigher.rst`)
- Primary Focus: Future work and implementation details
- Future-Proofing: Extensive documentation of potential enhancements (preferred/avoided traits, `PRESSURE_*` traits, CPU performance tiers)
- Clarification: Better explanations of algorithms, calculations, and rationale
- Simplification: Removed implementation-specific details (line numbers, code snippets) that could become outdated
- Practical Examples: More concrete examples with real-world scenarios (PCI devices, vGPUs, CPU performance tiers)
- Testing Strategy: Added mention of a scheduler simulator for scale testing