@nerdalert
Created January 26, 2026 03:50
Deployment and validation stdout of vSR (vLLM Semantic Router) / KServe / multi-model simulator.

$ ./deploy/openshift/deploy-to-openshift.sh --kserve --simulator
[SUCCESS] Logged in as kube:admin
[INFO] Creating namespace: vllm-semantic-router-system
namespace/vllm-semantic-router-system configured
[SUCCESS] Namespace ready
[INFO] KServe CRD missing; installing KServe dependencies...
[INFO] cert-manager CRDs already present.
[INFO] Installing KServe (v0.15.2)...
namespace/kserve serverside-applied
customresourcedefinition.apiextensions.k8s.io/clusterservingruntimes.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/clusterstoragecontainers.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/inferencegraphs.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/inferenceservices.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/localmodelcaches.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/localmodelnodegroups.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/localmodelnodes.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/servingruntimes.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/trainedmodels.serving.kserve.io serverside-applied
serviceaccount/kserve-controller-manager serverside-applied
serviceaccount/kserve-localmodel-controller-manager serverside-applied
serviceaccount/kserve-localmodelnode-agent serverside-applied
role.rbac.authorization.k8s.io/kserve-leader-election-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kserve-localmodel-manager-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kserve-localmodelnode-agent-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kserve-manager-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kserve-proxy-role serverside-applied
rolebinding.rbac.authorization.k8s.io/kserve-leader-election-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kserve-localmodel-manager-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kserve-localmodelnode-agent-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kserve-manager-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kserve-proxy-rolebinding serverside-applied
configmap/inferenceservice-config serverside-applied
secret/kserve-webhook-server-secret serverside-applied
service/kserve-controller-manager-metrics-service serverside-applied
service/kserve-controller-manager-service serverside-applied
service/kserve-webhook-server-service serverside-applied
deployment.apps/kserve-controller-manager serverside-applied
deployment.apps/kserve-localmodel-controller-manager serverside-applied
Warning: would violate PodSecurity "restricted:latest": restricted volume types (volume "models" uses restricted volume type "hostPath"), seccompProfile (pod or container "manager" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
daemonset.apps/kserve-localmodelnode-agent serverside-applied
certificate.cert-manager.io/serving-cert serverside-applied
issuer.cert-manager.io/selfsigned-issuer serverside-applied
mutatingwebhookconfiguration.admissionregistration.k8s.io/inferenceservice.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/clusterservingruntime.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/inferencegraph.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/inferenceservice.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/localmodelcache.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/servingruntime.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/trainedmodel.serving.kserve.io serverside-applied
clusterstoragecontainer.serving.kserve.io/default created
Error from server (InternalError): error when creating "https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": failed to call webhook: Post "https://kserve-webhook-server-service.kserve.svc:443/validate-serving-kserve-io-v1alpha1-clusterservingruntime?timeout=10s": no endpoints available for service "kserve-webhook-server-service"
Error from server (InternalError): error when creating "https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": failed to call webhook: Post "https://kserve-webhook-server-service.kserve.svc:443/validate-serving-kserve-io-v1alpha1-clusterservingruntime?timeout=10s": no endpoints available for service "kserve-webhook-server-service"
Error from server (InternalError): error when creating "https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": failed to call webhook: Post "https://kserve-webhook-server-service.kserve.svc:443/validate-serving-kserve-io-v1alpha1-clusterservingruntime?timeout=10s": no endpoints available for service "kserve-webhook-server-service"
Error from server (InternalError): error when creating "https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": failed to call webhook: Post "https://kserve-webhook-server-service.kserve.svc:443/validate-serving-kserve-io-v1alpha1-clusterservingruntime?timeout=10s": no endpoints available for service "kserve-webhook-server-service"
Error from server (InternalError): error when creating "https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": failed to call webhook: Post "https://kserve-webhook-server-service.kserve.svc:443/validate-serving-kserve-io-v1alpha1-clusterservingruntime?timeout=10s": no endpoints available for service "kserve-webhook-server-service"
Error from server (InternalError): error when creating "https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": failed to call webhook: Post "https://kserve-webhook-server-service.kserve.svc:443/validate-serving-kserve-io-v1alpha1-clusterservingruntime?timeout=10s": no endpoints available for service "kserve-webhook-server-service"
Error from server (InternalError): error when creating "https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": failed to call webhook: Post "https://kserve-webhook-server-service.kserve.svc:443/validate-serving-kserve-io-v1alpha1-clusterservingruntime?timeout=10s": no endpoints available for service "kserve-webhook-server-service"
Error from server (InternalError): error when creating "https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": failed to call webhook: Post "https://kserve-webhook-server-service.kserve.svc:443/validate-serving-kserve-io-v1alpha1-clusterservingruntime?timeout=10s": no endpoints available for service "kserve-webhook-server-service"
Error from server (InternalError): error when creating "https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": failed to call webhook: Post "https://kserve-webhook-server-service.kserve.svc:443/validate-serving-kserve-io-v1alpha1-clusterservingruntime?timeout=10s": no endpoints available for service "kserve-webhook-server-service"
Error from server (InternalError): error when creating "https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": failed to call webhook: Post "https://kserve-webhook-server-service.kserve.svc:443/validate-serving-kserve-io-v1alpha1-clusterservingruntime?timeout=10s": no endpoints available for service "kserve-webhook-server-service"
Error from server (InternalError): error when creating "https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": failed to call webhook: Post "https://kserve-webhook-server-service.kserve.svc:443/validate-serving-kserve-io-v1alpha1-clusterservingruntime?timeout=10s": no endpoints available for service "kserve-webhook-server-service"
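
The errors above are a startup race: the cluster-scoped resources were applied before the KServe webhook pod had any endpoints, so the validating webhook rejected every ClusterServingRuntime. Re-running the deploy script below succeeded once the webhook was up. A minimal manual recovery sketch, assuming the same kserve namespace and release URL shown above:

# Wait for the controller deployment that typically backs the webhook service
oc -n kserve rollout status deployment/kserve-controller-manager --timeout=300s

# Confirm the webhook service now has endpoints
oc -n kserve get endpoints kserve-webhook-server-service

# Re-apply the cluster-scoped resources that failed validation
oc apply --server-side -f https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml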
$  ./^Cripts/deploy-rhoai-stable.sh
$  ./deploy/openshift/deploy-to-openshift.sh --kserve --simulator
[SUCCESS] Logged in as kube:admin
[INFO] Creating namespace: vllm-semantic-router-system
namespace/vllm-semantic-router-system configured
[SUCCESS] Namespace ready
[INFO] Creating KServe simulator InferenceService: model-a
inferenceservice.serving.kserve.io/model-a created
[INFO] Creating KServe simulator InferenceService: model-b
inferenceservice.serving.kserve.io/model-b created
[INFO] Waiting for KServe simulator InferenceServices to be ready...
inferenceservice.serving.kserve.io/model-a condition met
inferenceservice.serving.kserve.io/model-b condition met
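
The two "condition met" lines are the script blocking on the InferenceService Ready condition; the equivalent manual check, assuming the same namespace, is roughly:

oc wait --for=condition=Ready inferenceservice/model-a -n vllm-semantic-router-system --timeout=300s
oc wait --for=condition=Ready inferenceservice/model-b -n vllm-semantic-router-system --timeout=300s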
[INFO] KServe mode: Deploying semantic-router with KServe backend...

==================================================
  vLLM Semantic Router - KServe Deployment
==================================================

Configuration:
  Namespace:              vllm-semantic-router-system
  Simulator Mode:         true
  InferenceService A:     model-a
  InferenceService B:     model-b
  Model A Name:           Model-A
  Model B Name:           Model-B
  Embedding Model:        all-MiniLM-L12-v2
  Storage Class:          <cluster default>
  Models PVC Size:        10Gi
  Cache PVC Size:         5Gi
  Dry Run:                false

Step 1: Validating prerequisites...
✓ OpenShift CLI found
✓ Logged in as kube:admin
✓ Namespace exists: vllm-semantic-router-system
✓ InferenceService exists: model-a
✓ InferenceService is ready
✓ InferenceService exists: model-b
✓ InferenceService is ready
Creating stable ClusterIP service for predictor: model-a
✓ Predictor service ClusterIP A: 172.30.14.160 (stable across pod restarts)
Creating stable ClusterIP service for predictor: model-b
✓ Predictor service ClusterIP B: 172.30.233.137 (stable across pod restarts)
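
The script fronts each predictor with a stable ClusterIP Service so the router keeps a fixed address across predictor pod restarts. One way to inspect those services and their ClusterIPs (namespace taken from the output above):

oc get svc -n vllm-semantic-router-system
oc get svc -n vllm-semantic-router-system -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.clusterIP}{"\n"}{end}'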

Step 2: Generating manifests...
✓ Generated: configmap-router-config.yaml
✓ Generated: configmap-envoy-config.yaml
✓ Generated: serviceaccount.yaml
✓ Generated: pvc.yaml
✓ Generated: peerauthentication.yaml
✓ Generated: deployment.yaml
✓ Generated: service.yaml
✓ Generated: route.yaml

Step 3: Deploying to OpenShift...
serviceaccount/semantic-router created
persistentvolumeclaim/semantic-router-models created
persistentvolumeclaim/semantic-router-cache created
configmap/semantic-router-kserve-config created
configmap/semantic-router-envoy-kserve-config created
Skipping PeerAuthentication (Istio CRD not found).
deployment.apps/semantic-router-kserve created
service/semantic-router-kserve created
route.route.openshift.io/semantic-router-kserve created
route.route.openshift.io/semantic-router-kserve-api created
✓ Resources deployed successfully
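
Two Routes were created: semantic-router-kserve (Envoy / chat traffic) and semantic-router-kserve-api (classification API). Their hostnames can be listed or extracted the same way the validation steps further down do:

oc get routes -n vllm-semantic-router-system
oc get route semantic-router-kserve-api -n vllm-semantic-router-system -o jsonpath='{.spec.host}'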

Step 4: Waiting for deployment to be ready...
This may take a few minutes while models are downloaded...

  Waiting for pod... (28/60)✓ Pod is ready: semantic-router-kserve-69749f4496-tsgr6
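
The pod came up on the 28th poll. If the rollout stalls while models download, it can be watched directly; the deployment name and pod label below are taken from the apply output above and the log command in the next-steps list:

oc -n vllm-semantic-router-system rollout status deployment/semantic-router-kserve --timeout=600s
oc -n vllm-semantic-router-system get pods -l app=semantic-router -w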


✓ External URL: https://semantic-router-kserve-vllm-semantic-router-system.apps.ci-ln-l739mdk-76ef8.aws-2.ci.openshift.org

==================================================
  Deployment Complete!
==================================================

Next steps:

1. Test the deployment:
   curl -k "https://semantic-router-kserve-vllm-semantic-router-system.apps.ci-ln-l739mdk-76ef8.aws-2.ci.openshift.org/v1/models"

2. Try a chat completion:
   curl -k "https://semantic-router-kserve-vllm-semantic-router-system.apps.ci-ln-l739mdk-76ef8.aws-2.ci.openshift.org/v1/chat/completions" \
     -H 'Content-Type: application/json' \
     -d '{"model": "Model-B", "messages": [{"role": "user", "content": "Hello!"}]}'

3. Run validation tests:
   NAMESPACE=vllm-semantic-router-system MODEL_NAME=Model-B /home/brent/prs/svr/kserve-sim/semantic-router/deploy/kserve/test-semantic-routing.sh

4. View logs:
   oc logs -l app=semantic-router -c semantic-router -n vllm-semantic-router-system -f

5. Monitor metrics:
   oc port-forward -n vllm-semantic-router-system svc/semantic-router-kserve 9190:9190
   curl http://localhost:9190/metrics


Run validation tests now? (y/n) n

For more information, see: /home/brent/prs/svr/kserve-sim/semantic-router/deploy/kserve/README.md

[SUCCESS] KServe deployment complete
[INFO] KServe mode finished; skipping vLLM model deployment and OpenShift routes.
$  API_ROUTE=$(oc get route semantic-router-kserve-api -n vllm-semantic-router-system -o jsonpath='{.spec.host}')
$  ENVOY_ROUTE=$(oc get route semantic-router-kserve -n vllm-semantic-router-system -o jsonpath='{.spec.host}')
$  curl -k -X POST https://$API_ROUTE/api/v1/classify/intent \
  -H "Content-Type: application/json" \
  -d '{"text": "What is machine learning?"}'
{"classification":{"category":"other","confidence":0,"processing_time_ms":55},"recommended_model":"Model-B","routing_decision":"low_confidence_general","matched_signals":{}}

$  curl -k -X POST https://$ENVOY_ROUTE/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"What is 2+2?"}]}'
{"id":"chatcmpl-2d8d2b0f-699e-497f-9fbb-805fcd7a19d1","created":1769399097,"model":"Model-A","usage":{"prompt_tokens":12,"completion_tokens":49,"total_tokens":61},"object":"chat.completion","do_remote_decode":false,"do_remote_prefill":false,"remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Today it is partially cloudy and raining. Today is a nice sunny day. I am fine, how are you today? The temperature here is twenty-five degrees centigrade. To be or not to be that is the question. I am your AI "}}]}

$  k get inferenceservice -A
NAMESPACE                     NAME      URL                                                      READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION   AGE
vllm-semantic-router-system   model-a   http://model-a-vllm-semantic-router-system.example.com   True                                                                  21m
vllm-semantic-router-system   model-b   http://model-b-vllm-semantic-router-system.example.com   True                                                                  21m
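
Both InferenceServices report READY=True, with no Knative revisions listed, consistent with KServe's raw deployment mode. The predictor pods and full InferenceService status can be inspected in the same namespace:

oc get pods -n vllm-semantic-router-system
oc get inferenceservice model-a -n vllm-semantic-router-system -o yaml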