Troubleshooting Monitoring and Service Issues

To troubleshoot the cb-ai-service, follow these guidelines. If issues persist, contact PTC Support.

Debugging guidance

Use the following steps to identify the underlying issue and determine which logs to collect.

1. Every API response includes a traceId, and every log line written for that request is tagged with the same ID (the dd.trace_id field). Use the traceId to filter the logs down to the exact failing request.

2. Interpret the HTTP status code.

Status	Likely cause
401	Missing or incorrect token, or misconfigured audience or tenant ID
422	Malformed request or threat detection triggered
429	Azure OpenAI rate limit reached
500	Internal error; review pod logs for level="error"
504	Timeout connecting to Azure OpenAI; check network connectivity

3. Check pod logs. Logs are structured JSON emitted to stdout. Key fields include event, level, logger, dd.trace_id, and timestamp.

Stack traces are suppressed in production mode. For more detail, set LOG_LEVEL: DEBUG and restart the pods. Customer-hosted deployments always enforce an INFO-level floor on modules that expose sensitive data, so those modules do not emit DEBUG output.

4. Call the GET /cb-ai-service/ping/v1 endpoint, which actively calls Azure OpenAI and returns the response status and latency. This tells you whether the service can reach Azure OpenAI.

5. Use Kubernetes commands to inspect pod status.

GET / returns OK if the pod is healthy. If this request fails, run kubectl describe pod and check the output for restart reasons.
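
Step 5 does not name specific commands. As a minimal sketch, the helper below uses only standard kubectl subcommands to list pods and describe one of them. The function name inspect_pod and its arguments are illustrative and are not part of cb-ai-service.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name, not shipped with cb-ai-service).
# Usage: inspect_pod <namespace> [pod-name]
inspect_pod() {
  local namespace="$1" pod="${2:-}"

  # List pods in the namespace; the output includes the READY, STATUS,
  # and RESTARTS columns.
  kubectl get pods -n "$namespace" -o wide

  # If a pod name is given, print its details, including container
  # state and recent events.
  if [ -n "$pod" ]; then
    kubectl describe pod -n "$namespace" "$pod"
  fi
}
```

A high RESTARTS count or a STATUS other than Running in the first command's output is usually the cue to describe that pod.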

Logs to collect before contacting PTC Support

Collect the following data from AKS pods:

kubectl logs -n <namespace> deployment/cb-ai-service --since=2h > cbai_logs.txt
kubectl logs -n <namespace> <pod-name> --previous >> cbai_logs.txt   # if pod crashed
kubectl describe pod -n <namespace> <pod-name> > pod_describe.txt
kubectl get events -n <namespace> --sort-by='.lastTimestamp' > k8s_events.txt

• Pod logs for the last two hours

• Logs from previously crashed pods

• Pod descriptions

• Kubernetes events sorted by timestamp

Extract the following information from logs:

• All level="error" and level="warning" log entries near the incident time.

• All log entries whose dd.trace_id matches the failing request.

• Startup log entries that confirm application configuration, such as AZURE_CUSTOMER_HOSTED_DEPLOYMENT, AUTH_MODE, and SERVICE_VERSION.
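
The first two extractions above can be sketched with grep. This is a rough sketch, not a supported tool: it assumes each log line is a single JSON object (as in cbai_logs.txt) and that the trace ID contains no regular-expression metacharacters. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names). Assumption: one JSON object
# per log line, with "level" and "dd.trace_id" fields.

# Print log lines whose "level" is error or warning.
# Usage: filter_errors_and_warnings [file...]
filter_errors_and_warnings() {
  grep -E '"level"[[:space:]]*:[[:space:]]*"(error|warning)"' "$@"
}

# Print log lines whose "dd.trace_id" equals the given ID, whether the
# value is quoted or not. The trailing [,}] prevents prefix matches.
# Usage: filter_by_trace_id <trace-id> [file...]
filter_by_trace_id() {
  local trace_id="$1"
  shift
  grep -E "\"dd\.trace_id\"[[:space:]]*:[[:space:]]*\"?${trace_id}\"?[[:space:]]*[,}]" "$@"
}
```

For example, filter_by_trace_id 123 cbai_logs.txt prints only the entries for trace 123. Both functions return a non-zero exit status when nothing matches, which is grep's normal behavior.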

Collect the following outputs, if available:

• ConfigMap details

kubectl get configmap <name> -n <namespace> -o yaml

The relevant fields are: LOG_LEVEL, AZURE_CUSTOMER_HOSTED_DEPLOYMENT, AZURE_OPENAI_BASE_URL, AUTH_MODE, and SERVICE_VERSION.

• Ping endpoint response

curl -H "Authorization: Bearer <token>" https://<host>/cb-ai-service/ping/v1

• If OTEL or Grafana is configured, include screenshots of the http_requests_total (by status) and http_request_duration_seconds (p95/p99) metrics around the incident window, along with all traces for the failing traceId.
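
To pick out the relevant ConfigMap fields listed above from the full YAML output, a simple filter may be enough. The sketch below assumes the keys appear one per line under data, as kubectl normally prints them; the function name extract_config_fields is hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): keep only the ConfigMap keys
# named in this article. Reads YAML on stdin.
extract_config_fields() {
  grep -E '^[[:space:]]*(LOG_LEVEL|AZURE_CUSTOMER_HOSTED_DEPLOYMENT|AZURE_OPENAI_BASE_URL|AUTH_MODE|SERVICE_VERSION):'
}
```

It can be placed at the end of a pipeline that starts with the kubectl get configmap command shown above. Attach the full YAML output as well if PTC Support requests it.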
