Troubleshooting Monitoring and Service Issues
To troubleshoot the cb-ai-service, follow these guidelines. If issues persist, contact PTC Support.
Debugging guidance
Use the following steps to identify the underlying issue and determine which logs to collect.
1. Every API response includes a traceId, and every log line from that request is tagged with the same ID. Use the traceId to filter the logs down to the exact failing request.
2. Interpret the HTTP status code.
Status    Likely cause
401       Missing or incorrect token, or misconfigured audience or tenant ID
422       Malformed request or threat detection triggered
429       Azure OpenAI rate limit reached
500       Internal error; review pod logs for level="error"
504       Timeout connecting to Azure OpenAI; check network connectivity
3. Check pod logs. Logs are structured JSON emitted to stdout. Key fields include event, level, logger, dd.trace_id, and timestamp.
Note: Stack traces are suppressed in production mode. For more detail, set LOG_LEVEL: DEBUG and restart the pods. Customer-hosted deployments always enforce an INFO-level floor on modules that expose sensitive data.
4. Call the GET /cb-ai-service/ping/v1 endpoint, which actively calls Azure OpenAI and returns the response status and latency. The result tells you whether the service can reach Azure OpenAI.
5. Use Kubernetes commands to inspect pod status.
GET / returns OK if the pod is healthy. If this request fails, run kubectl describe pod to check for restart reasons.
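Steps 1 to 3 can be sketched in Python. This is a minimal illustration, not part of the service: it assumes the pod logs have been saved with one JSON object per line and that the trace ID appears in the dd.trace_id field named in step 3. The function names and the sample log lines are hypothetical.

```python
import json

# Likely causes by HTTP status, copied from the status list in step 2.
LIKELY_CAUSES = {
    401: "Missing or incorrect token, or misconfigured audience or tenant ID",
    422: "Malformed request or threat detection triggered",
    429: "Azure OpenAI rate limit reached",
    500: 'Internal error; review pod logs for level="error"',
    504: "Timeout connecting to Azure OpenAI; check network connectivity",
}


def likely_cause(status):
    """Return the likely cause for a status code, or a fallback message."""
    return LIKELY_CAUSES.get(status, "Not listed; collect logs and contact PTC Support")


def filter_by_trace(lines, trace_id, field="dd.trace_id"):
    """Yield parsed log entries whose trace field equals trace_id.

    Lines that are not JSON objects (for example, plain-text banners)
    are skipped rather than treated as errors.
    """
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and str(entry.get(field)) == str(trace_id):
            yield entry


if __name__ == "__main__":
    # Hypothetical sample lines; real entries carry more fields.
    sample = [
        '{"event": "request failed", "level": "error", "dd.trace_id": "abc123"}',
        '{"event": "request ok", "level": "info", "dd.trace_id": "def456"}',
        "not json",
    ]
    for entry in filter_by_trace(sample, "abc123"):
        print(entry["level"], entry["event"])
    print(likely_cause(429))
```

To run the filter against a real log file, pass an open file handle as lines. If your deployment tags log lines under a different field name, set the field argument accordingly.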
Logs to collect before contacting PTC Support
Collect the following data from AKS pods:
Pod logs for the last two hours
kubectl logs -n <namespace> deployment/cb-ai-service --since=2h > cbai_logs.txt
Logs from previously crashed pods
kubectl logs -n <namespace> <pod-name> --previous >> cbai_logs.txt # if pod crashed
Pod descriptions
kubectl describe pod -n <namespace> <pod-name> > pod_describe.txt
Kubernetes events sorted by timestamp
kubectl get events -n <namespace> --sort-by='.lastTimestamp' > k8s_events.txt
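The collection commands above can be wrapped in a small script so that nothing is missed during an incident. The following Python sketch builds the same four kubectl invocations for a given namespace and pod. The function names and the tuple layout are illustrative, and running the commands still requires kubectl with access to the cluster.

```python
import subprocess


def build_collection_commands(namespace, pod_name, crashed=False):
    """Return (argv, output_file, append) tuples mirroring the commands above."""
    commands = [
        (["kubectl", "logs", "-n", namespace, "deployment/cb-ai-service",
          "--since=2h"], "cbai_logs.txt", False),
    ]
    if crashed:
        # --previous returns logs from the prior container instance,
        # so it is only useful after a crash or restart.
        commands.append(
            (["kubectl", "logs", "-n", namespace, pod_name, "--previous"],
             "cbai_logs.txt", True)
        )
    commands.append(
        (["kubectl", "describe", "pod", "-n", namespace, pod_name],
         "pod_describe.txt", False)
    )
    commands.append(
        (["kubectl", "get", "events", "-n", namespace,
          "--sort-by=.lastTimestamp"], "k8s_events.txt", False)
    )
    return commands


def run_collection(commands):
    """Run each command and write its output to the paired file."""
    for argv, path, append in commands:
        with open(path, "a" if append else "w") as out:
            # check=False: keep collecting even if one command fails.
            subprocess.run(argv, stdout=out, stderr=subprocess.STDOUT, check=False)
```

For example, run_collection(build_collection_commands("my-namespace", "my-pod", crashed=True)) writes the same three files as the manual commands. Both argument values here are placeholders.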
Extract the following information from logs:
All the level="error" and level="warning" logs near the incident time.
All the logs that match the dd.trace_id from the failing request.
Startup log entries that confirm application configuration, such as AZURE_CUSTOMER_HOSTED_DEPLOYMENT, AUTH_MODE, and SERVICE_VERSION.
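The first extraction above, error and warning entries near the incident time, can be sketched as follows. The sketch assumes one JSON object per log line with the level and timestamp fields listed earlier. It also assumes the timestamp is ISO 8601, which this document does not confirm. The 15-minute window and the function names are illustrative choices.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp; treat naive values as UTC."""
    # fromisoformat() in older Python versions rejects a trailing "Z".
    parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def near_incident(lines, incident, window_minutes=15, levels=("error", "warning")):
    """Return entries at the given levels within the window around incident.

    incident must be a timezone-aware datetime.
    """
    window = timedelta(minutes=window_minutes)
    matches = []
    for line in lines:
        try:
            entry = json.loads(line)
            when = parse_timestamp(entry["timestamp"])
        except (ValueError, KeyError, TypeError):
            continue  # skip lines that are not JSON or lack a usable timestamp
        if str(entry.get("level", "")).lower() in levels and abs(when - incident) <= window:
            matches.append(entry)
    return matches
```

Widen window_minutes if the failure built up gradually, for example under sustained rate limiting.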
Collect the following outputs, if available:
ConfigMap details
kubectl get configmap <name> -n <namespace> -o yaml
The relevant fields are: LOG_LEVEL, AZURE_CUSTOMER_HOSTED_DEPLOYMENT, AZURE_OPENAI_BASE_URL, AUTH_MODE, and SERVICE_VERSION.
Ping endpoint response
curl -H "Authorization: Bearer <token>" https://<host>/cb-ai-service/ping/v1
If OTEL or Grafana is configured, include observability screenshots of http_requests_total (by status) and http_request_duration_seconds (p95/p99) covering the incident window. Also include all traces for the failing traceId.
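The ConfigMap and ping outputs above can also be gathered with a short Python sketch. It assumes the ConfigMap was dumped with -o json instead of -o yaml, so that the standard library can parse it. It makes no assumption about the ping response body and returns it as raw text. The function names are hypothetical, and the latency measured here is client-side, separate from the latency the endpoint itself reports.

```python
import json
import time
import urllib.error
import urllib.request

# Relevant ConfigMap fields, as listed above.
RELEVANT_FIELDS = (
    "LOG_LEVEL",
    "AZURE_CUSTOMER_HOSTED_DEPLOYMENT",
    "AZURE_OPENAI_BASE_URL",
    "AUTH_MODE",
    "SERVICE_VERSION",
)


def relevant_config(configmap_json):
    """Pick the relevant fields out of a ConfigMap dumped with -o json."""
    data = json.loads(configmap_json).get("data") or {}
    # Missing keys are reported as None so gaps are visible to the reader.
    return {name: data.get(name) for name in RELEVANT_FIELDS}


def build_ping_request(host, token):
    """Build the same GET request as the curl command above."""
    url = f"https://{host}/cb-ai-service/ping/v1"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def ping(host, token, timeout=30):
    """Send the ping and return (http_status, body_text, client_side_seconds)."""
    request = build_ping_request(host, token)
    started = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses still carry a body worth attaching to the case.
        status = err.code
        body = err.read().decode("utf-8", "replace")
    return status, body, time.monotonic() - started
```

Connection failures raise urllib.error.URLError instead of returning a status. That outcome is itself useful evidence that the host cannot be reached from where the script runs.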