Deployment and validation stdout for vLLM Semantic Router (vSR) on KServe with the multi-model simulator.
$ ./deploy/openshift/deploy-to-openshift.sh --kserve --simulator
[SUCCESS] Logged in as kube:admin
[INFO] Creating namespace: vllm-semantic-router-system
namespace/vllm-semantic-router-system configured
[SUCCESS] Namespace ready
[INFO] KServe CRD missing; installing KServe dependencies...
[INFO] cert-manager CRDs already present.
[INFO] Installing KServe (v0.15.2)...
namespace/kserve serverside-applied
customresourcedefinition.apiextensions.k8s.io/clusterservingruntimes.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/clusterstoragecontainers.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/inferencegraphs.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/inferenceservices.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/localmodelcaches.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/localmodelnodegroups.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/localmodelnodes.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/servingruntimes.serving.kserve.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/trainedmodels.serving.kserve.io serverside-applied
serviceaccount/kserve-controller-manager serverside-applied
serviceaccount/kserve-localmodel-controller-manager serverside-applied
serviceaccount/kserve-localmodelnode-agent serverside-applied
role.rbac.authorization.k8s.io/kserve-leader-election-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kserve-localmodel-manager-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kserve-localmodelnode-agent-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kserve-manager-role serverside-applied
clusterrole.rbac.authorization.k8s.io/kserve-proxy-role serverside-applied
rolebinding.rbac.authorization.k8s.io/kserve-leader-election-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kserve-localmodel-manager-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kserve-localmodelnode-agent-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kserve-manager-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/kserve-proxy-rolebinding serverside-applied
configmap/inferenceservice-config serverside-applied
secret/kserve-webhook-server-secret serverside-applied
service/kserve-controller-manager-metrics-service serverside-applied
service/kserve-controller-manager-service serverside-applied
service/kserve-webhook-server-service serverside-applied
deployment.apps/kserve-controller-manager serverside-applied
deployment.apps/kserve-localmodel-controller-manager serverside-applied
Warning: would violate PodSecurity "restricted:latest": restricted volume types (volume "models" uses restricted volume type "hostPath"), seccompProfile (pod or container "manager" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
daemonset.apps/kserve-localmodelnode-agent serverside-applied
certificate.cert-manager.io/serving-cert serverside-applied
issuer.cert-manager.io/selfsigned-issuer serverside-applied
mutatingwebhookconfiguration.admissionregistration.k8s.io/inferenceservice.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/clusterservingruntime.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/inferencegraph.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/inferenceservice.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/localmodelcache.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/servingruntime.serving.kserve.io serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/trainedmodel.serving.kserve.io serverside-applied
clusterstoragecontainer.serving.kserve.io/default created
Error from server (InternalError): error when creating "https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": failed to call webhook: Post "https://kserve-webhook-server-service.kserve.svc:443/validate-serving-kserve-io-v1alpha1-clusterservingruntime?timeout=10s": no endpoints available for service "kserve-webhook-server-service"
(the same webhook error was reported ten more times, once for each remaining ClusterServingRuntime in kserve-cluster-resources.yaml)
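The webhook failures above occur because the cluster-scoped resources are applied before the kserve-controller-manager pod, which backs kserve-webhook-server-service, has any ready endpoints. One way to recover, assuming the same v0.15.2 release manifest used above, is to wait for the controller deployment and re-apply only the cluster resources:
# Wait until the webhook backend is available, then retry the cluster resources.
$ oc -n kserve wait deployment/kserve-controller-manager --for=condition=Available --timeout=300s
$ oc apply --server-side -f https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml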
$ ./scripts/deploy-rhoai-stable.sh
$ ./deploy/openshift/deploy-to-openshift.sh --kserve --simulator
[SUCCESS] Logged in as kube:admin
[INFO] Creating namespace: vllm-semantic-router-system
namespace/vllm-semantic-router-system configured
[SUCCESS] Namespace ready
[INFO] Creating KServe simulator InferenceService: model-a
inferenceservice.serving.kserve.io/model-a created
[INFO] Creating KServe simulator InferenceService: model-b
inferenceservice.serving.kserve.io/model-b created
[INFO] Waiting for KServe simulator InferenceServices to be ready...
inferenceservice.serving.kserve.io/model-a condition met
inferenceservice.serving.kserve.io/model-b condition met
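As a spot check outside the script, the simulator InferenceServices and their predictor pods can be listed directly; the label selector below is the one KServe normally adds to predictor pods, so treat it as an assumption about this environment:
$ oc get inferenceservice model-a model-b -n vllm-semantic-router-system
$ oc get pods -n vllm-semantic-router-system -l serving.kserve.io/inferenceservice=model-a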
[INFO] KServe mode: Deploying semantic-router with KServe backend...
==================================================
vLLM Semantic Router - KServe Deployment
==================================================
Configuration:
Namespace: vllm-semantic-router-system
Simulator Mode: true
InferenceService A: model-a
InferenceService B: model-b
Model A Name: Model-A
Model B Name: Model-B
Embedding Model: all-MiniLM-L12-v2
Storage Class: <cluster default>
Models PVC Size: 10Gi
Cache PVC Size: 5Gi
Dry Run: false
Step 1: Validating prerequisites...
✓ OpenShift CLI found
✓ Logged in as kube:admin
✓ Namespace exists: vllm-semantic-router-system
✓ InferenceService exists: model-a
✓ InferenceService is ready
✓ InferenceService exists: model-b
✓ InferenceService is ready
Creating stable ClusterIP service for predictor: model-a
✓ Predictor service ClusterIP A: 172.30.14.160 (stable across pod restarts)
Creating stable ClusterIP service for predictor: model-b
✓ Predictor service ClusterIP B: 172.30.233.137 (stable across pod restarts)
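The "stable ClusterIP" services exist so the router configuration can point at fixed addresses that survive predictor pod restarts. A quick confirmation (the exact generated service names are not shown, so the grep pattern here is only a guess) is:
$ oc get svc -n vllm-semantic-router-system -o wide | grep -i predictor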
Step 2: Generating manifests...
✓ Generated: configmap-router-config.yaml
✓ Generated: configmap-envoy-config.yaml
✓ Generated: serviceaccount.yaml
✓ Generated: pvc.yaml
✓ Generated: peerauthentication.yaml
✓ Generated: deployment.yaml
✓ Generated: service.yaml
✓ Generated: route.yaml
Step 3: Deploying to OpenShift...
serviceaccount/semantic-router created
persistentvolumeclaim/semantic-router-models created
persistentvolumeclaim/semantic-router-cache created
configmap/semantic-router-kserve-config created
configmap/semantic-router-envoy-kserve-config created
Skipping PeerAuthentication (Istio CRD not found).
deployment.apps/semantic-router-kserve created
service/semantic-router-kserve created
route.route.openshift.io/semantic-router-kserve created
route.route.openshift.io/semantic-router-kserve-api created
✓ Resources deployed successfully
Step 4: Waiting for deployment to be ready...
This may take a few minutes while models are downloaded...
Waiting for pod... (28/60)
✓ Pod is ready: semantic-router-kserve-69749f4496-tsgr6
✓ External URL: https://semantic-router-kserve-vllm-semantic-router-system.apps.ci-ln-l739mdk-76ef8.aws-2.ci.openshift.org
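Independent of the script's own wait loop, readiness can be confirmed with a rollout check against the deployment and label shown earlier in this output:
$ oc rollout status deployment/semantic-router-kserve -n vllm-semantic-router-system --timeout=300s
$ oc get pods -n vllm-semantic-router-system -l app=semantic-router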
==================================================
Deployment Complete!
==================================================
Next steps:
1. Test the deployment:
curl -k "https://semantic-router-kserve-vllm-semantic-router-system.apps.ci-ln-l739mdk-76ef8.aws-2.ci.openshift.org/v1/models"
2. Try a chat completion:
curl -k "https://semantic-router-kserve-vllm-semantic-router-system.apps.ci-ln-l739mdk-76ef8.aws-2.ci.openshift.org/v1/chat/completions" \
-H 'Content-Type: application/json' \
-d '{"model": "Model-B", "messages": [{"role": "user", "content": "Hello!"}]}'
3. Run validation tests:
NAMESPACE=vllm-semantic-router-system MODEL_NAME=Model-B /home/brent/prs/svr/kserve-sim/semantic-router/deploy/kserve/test-semantic-routing.sh
4. View logs:
oc logs -l app=semantic-router -c semantic-router -n vllm-semantic-router-system -f
5. Monitor metrics:
oc port-forward -n vllm-semantic-router-system svc/semantic-router-kserve 9190:9190
curl http://localhost:9190/metrics
Run validation tests now? (y/n) n
For more information, see: /home/brent/prs/svr/kserve-sim/semantic-router/deploy/kserve/README.md
[SUCCESS] KServe deployment complete
[INFO] KServe mode finished; skipping vLLM model deployment and OpenShift routes.
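Expanding on next steps 4 and 5 from the script output above, a minimal smoke check of logs and metrics looks like the following; the grep filter is only a guess at the metric names the router exposes:
# Tail recent router logs, then skim the Prometheus endpoint on port 9190.
$ oc logs -l app=semantic-router -c semantic-router -n vllm-semantic-router-system --tail=50
$ oc port-forward -n vllm-semantic-router-system svc/semantic-router-kserve 9190:9190 &
$ sleep 2 && curl -s http://localhost:9190/metrics | grep -i -E 'routing|model' | head -20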
$ API_ROUTE=$(oc get route semantic-router-kserve-api -n vllm-semantic-router-system -o jsonpath='{.spec.host}')
$ ENVOY_ROUTE=$(oc get route semantic-router-kserve -n vllm-semantic-router-system -o jsonpath='{.spec.host}')
$ curl -k -X POST https://$API_ROUTE/api/v1/classify/intent \
-H "Content-Type: application/json" \
-d '{"text": "What is machine learning?"}'
{"classification":{"category":"other","confidence":0,"processing_time_ms":55},"recommended_model":"Model-B","routing_decision":"low_confidence_general","matched_signals":{}}
$ curl -k -X POST https://$ENVOY_ROUTE/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"auto","messages":[{"role":"user","content":"What is 2+2?"}]}'
{"id":"chatcmpl-2d8d2b0f-699e-497f-9fbb-805fcd7a19d1","created":1769399097,"model":"Model-A","usage":{"prompt_tokens":12,"completion_tokens":49,"total_tokens":61},"object":"chat.completion","do_remote_decode":false,"do_remote_prefill":false,"remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Today it is partially cloudy and raining. Today is a nice sunny day. I am fine, how are you today? The temperature here is twenty-five degrees centigrade. To be or not to be that is the question. I am your AI "}}]}
$ k get inferenceservice -A
NAMESPACE NAME URL READY PREV LATEST PREVROLLEDOUTREVISION LATESTREADYREVISION AGE
vllm-semantic-router-system model-a http://model-a-vllm-semantic-router-system.example.com True 21m
vllm-semantic-router-system model-b http://model-b-vllm-semantic-router-system.example.com True 21m