Design and Architecture
This topic describes the design and architecture of a customer-hosted deployment of Codebeamer AI.
Core Components
A customer-hosted deployment of Codebeamer AI consists of the following core components.
1. Virtual network
The virtual network provides private network isolation for all deployed components.
It ensures that internal communication between AKS, Azure OpenAI, and other Azure services occurs within a secure Azure network boundary rather than the public internet.
The resource group acts as the logical container for all infrastructure deployed as part of the customer-hosted deployment.
Grouping networking, computing, monitoring, and identity resources simplifies lifecycle management and allows clean removal using Terraform.
Azure resources created using Terraform include:
Resource group
Virtual network with Network Security Group (NSG) flow logs
AKS subnet with the Cognitive Services service endpoint
OpenAI subnet
Network Security Group associated with the AKS subnet
NSG inbound allow rule (IP allowlist)
NSG inbound deny rule (internet block)
Private DNS zone (privatelink.openai.azure.com)
Private DNS zone virtual network link
Private endpoint (OpenAI)
User-assigned managed identity
Federated identity credential (AKS workload identity)
Role assignment: Cognitive Services OpenAI User
Role assignment: Network Contributor (AKS subnet)
Azure OpenAI account
Responsible AI content filter policy
OpenAI model deployment: GPT-5-mini
OpenAI model deployment: GPT-5-nano
AKS cluster (system + user node pools)
Kubernetes namespace
Log Analytics workspace
Diagnostic settings: AKS
Diagnostic settings: OpenAI
Azure AD application registration
Storage account network rules (flow logs)
2. Service
Runs cb-ai-service, deployed on Kubernetes using Helm
Uses Azure OpenAI models created in the customer-hosted deployment infrastructure
3. Azure policy
Azure policies are applied at the resource group (customer-hosted deployment infrastructure) level.
The policies protect the infrastructure from accidental changes, such as:
Resource deletion
Network configuration changes
Identity configuration changes
Azure cloud infrastructure for Codebeamer AI service deployment
Azure resource group
The resource group contains all infrastructure resources related to the customer-hosted deployment. This structure simplifies operational management and enables controlled cleanup using Terraform.
Azure virtual network (VNet)
The virtual network provides private network isolation for all deployed components.
It ensures that internal communication between AKS, Azure OpenAI, and other services occurs within a secure Azure network boundary rather than the public internet.
Subnets
Two dedicated subnets are created within the virtual network:
AKS subnet—Hosts AKS node pool virtual machines used by the Kubernetes cluster. This subnet enables secure communication between the AKS cluster and other Azure resources.
Azure OpenAI subnet—Ensures that Azure OpenAI API traffic flows through the private Azure network rather than public endpoints. This subnet is used for the Azure OpenAI private endpoint connection.
A Network Security Group (NSG) is created and associated with the AKS subnet. The NSG controls inbound and outbound network traffic at the subnet level.
When an IP allowlist is configured, two custom inbound rules are applied:
Allow rule (priority 100): Permits TCP traffic on ports 80, 443, and 8000 only from the specified IP addresses, for example, Codebeamer AI plugin egress IPs, VPN, or Zscaler addresses.
Deny rule (priority 200): Blocks all remaining internet-originated TCP traffic on the same ports.
When an IP allowlist is not provided, custom rules are not created and Azure default NSG rules apply.
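To illustrate how the two custom rules interact, the following Python sketch models NSG inbound evaluation in simplified form: rules are checked in ascending priority order and the first matching rule decides. The allowlist CIDRs and rule names are hypothetical placeholders, the Internet source is approximated as any IPv4 address, and Azure's built-in default rules are reduced to a single fallback message, so treat this as an aid to reasoning rather than a model of Azure networking.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class InboundRule:
    name: str                 # hypothetical rule name
    priority: int             # lower number is evaluated first
    access: str               # "Allow" or "Deny"
    sources: tuple[str, ...]  # source CIDR ranges
    ports: frozenset[int]     # destination TCP ports


PORTS = frozenset({80, 443, 8000})
# Hypothetical allowlist, e.g. plugin egress IPs and a VPN range.
ALLOWLIST = ("203.0.113.10/32", "198.51.100.0/24")

RULES = [
    InboundRule("allow-allowlist", 100, "Allow", ALLOWLIST, PORTS),
    # Simplification: internet-originated traffic approximated as any IPv4 source.
    InboundRule("deny-internet", 200, "Deny", ("0.0.0.0/0",), PORTS),
]


def evaluate(src: str, port: int, rules=RULES) -> str:
    """Return the decision of the first matching rule, in priority order."""
    addr = ip_address(src)
    for rule in sorted(rules, key=lambda r: r.priority):
        if port in rule.ports and any(addr in ip_network(c) for c in rule.sources):
            return f"{rule.access} ({rule.name})"
    return "No custom rule matched: Azure default rules apply"


if __name__ == "__main__":
    print(evaluate("203.0.113.10", 443))  # allowlisted source
    print(evaluate("192.0.2.55", 443))    # other internet source
    print(evaluate("192.0.2.55", 22))     # port not covered by custom rules
```

Because priority 100 is evaluated before priority 200, an allowlisted source never reaches the deny rule, while every other source on ports 80, 443, and 8000 does.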
Azure Kubernetes Service (AKS)
The AKS cluster hosts the cb-ai-service application, the OpenTelemetry Collector, and any observability and monitoring tools. The cluster provides:
Container orchestration.
Workload scaling.
Kubernetes-native deployment and management.
The cluster is configured with:
Workload identity.
Managed identity integration.
Azure CNI networking.
Container Insights monitoring.
Azure role-based access control (RBAC) and authorized IP ranges for the AKS API server.
Managed identity
A user-assigned managed identity is created for the AKS cluster. This identity allows workloads running inside Kubernetes to securely authenticate with Azure services without storing credentials or secrets.
The managed identity is later connected to the Kubernetes service account using federated identity credentials.
Federated identity credential
The federated identity credential links the Kubernetes service account used by cb-ai-service to the Azure managed identity.
This credential enables Azure workload identity authentication, allowing the application to securely access Azure services, such as Azure OpenAI.
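For AKS workload identity, a federated identity credential pairs the cluster's OIDC issuer URL with a subject of the form system:serviceaccount:<namespace>:<service-account> and the audience api://AzureADTokenExchange. The Python sketch below builds that triple and approximates the match check; the namespace, service account name, and issuer URL are hypothetical, and the real exchange also verifies the token signature against the issuer's published keys, which is omitted here.

```python
from dataclasses import dataclass

# Audience used for AKS workload identity token exchange.
TOKEN_EXCHANGE_AUDIENCE = "api://AzureADTokenExchange"


@dataclass(frozen=True)
class FederatedCredential:
    issuer: str    # the AKS cluster's OIDC issuer URL
    subject: str   # identifies exactly one Kubernetes service account
    audience: str


def credential_for(oidc_issuer: str, namespace: str, service_account: str) -> FederatedCredential:
    """Build the trust relationship for one Kubernetes service account."""
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return FederatedCredential(oidc_issuer, subject, TOKEN_EXCHANGE_AUDIENCE)


def is_trusted(cred: FederatedCredential, iss: str, sub: str, aud: str) -> bool:
    """Simplified check: issuer, subject, and audience must all match.

    Signature verification of the projected token is intentionally omitted.
    """
    return (iss, sub, aud) == (cred.issuer, cred.subject, cred.audience)
```

This also shows why the governance policies lock the issuer, subject, and audience: changing any one of the three values means tokens from the cb-ai-service pod no longer match the credential.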
Azure OpenAI Cognitive Services
An Azure OpenAI Cognitive Services account provides access to large language models. The customer-hosted deployment includes the following model deployments:
GPT-5-mini
GPT-5-nano
Public network access is disabled so that requests reach the account only through private network connectivity.
OpenAI content filter (RAI policy)
A Responsible AI (RAI) policy is configured for the Azure OpenAI service. The policy defines content safety rules for prompts and responses, including filtering categories such as:
Hate
Violence
Sexual content
Self-harm
This helps ensure that prompts and AI responses comply with Microsoft Responsible AI policies.
Private endpoint
A private endpoint is created for the Azure OpenAI service. This endpoint enables the AKS cluster to access Azure OpenAI through the internal Azure network, so requests do not route through the public internet.
Benefits include:
Improved security
Network isolation
Compliance with enterprise network policies
Private DNS zone
A private DNS zone is created for privatelink.openai.azure.com.
This zone allows resources inside the VNet, such as AKS, to resolve the private endpoint address of the Azure OpenAI service.
Without this DNS configuration, the Azure OpenAI hostname does not resolve to the private endpoint's IP address, and private endpoint connectivity does not function correctly.
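One way to verify the DNS setup is to resolve the Azure OpenAI hostname from a pod inside the VNet and confirm that every returned address is private. The following standard-library Python sketch is one possible check, not a PTC-provided tool; the hostname you pass in is your own account's endpoint.

```python
import socket
from ipaddress import ip_address


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the unique IPv4 addresses the local resolver returns for hostname."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def uses_private_endpoint(addresses: list[str]) -> bool:
    """True only if at least one address was returned and all are private
    (for example, RFC 1918 ranges such as 10.0.0.0/8)."""
    return bool(addresses) and all(ip_address(a).is_private for a in addresses)
```

Run from a pod in the AKS cluster, uses_private_endpoint(resolve_ipv4("<account>.openai.azure.com")) is expected to return True when the private DNS zone and its virtual network link are in place.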
Log Analytics workspace
A Log Analytics workspace is deployed to collect logs and metrics from the AKS cluster.
This workspace enables:
Cluster monitoring
Operational diagnostics
Log querying and troubleshooting
Container insights
Container insights is enabled for the AKS cluster using the Log Analytics workspace.
These insights provide visibility into:
Container health
Node performance
Resource utilization
Kubernetes events
Diagnostic settings
Diagnostic settings are configured on the AKS cluster.
These settings forward the following details to the Log Analytics workspace for monitoring and troubleshooting:
Platform logs
Metrics
Cluster audit information
Role assignment
A role assignment is created to grant the required permissions to the managed identity on the Azure OpenAI resource.
Specifically, the identity receives the Cognitive Services OpenAI User role.
This assignment allows the application to call Azure OpenAI APIs using its managed identity.
Azure AD application registration
This registration provides an OAuth client identity for external systems to authenticate against the service. The Codebeamer AI plugin uses the service principal created for this application registration to authenticate to cb-ai-service.
AI service
The AI service, deployed on the cluster created by the customer-hosted deployment IaC, uses Microsoft Entra ID to authenticate requests from the Codebeamer AI plugin. You must deploy an OpenTelemetry (OTel) Collector on the customer-hosted deployment infrastructure. The AI service sends OTel-compatible logs, traces, spans, and metrics to this collector. You can configure the collector with any exporter to capture data in your chosen observability and monitoring tools.
The sample Helm chart templates in the image archive provide guidance on production-ready defaults, including:
Resource requests and limits for the Horizontal Pod Autoscaler (HPA)
Pod affinity policy
Use of read-only root file systems where possible
Service configuration of the LoadBalancer type
Managed identity tenant ID and object ID placeholders
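Resource requests matter for the HPA because Kubernetes measures CPU utilization as a percentage of each pod's request and computes the desired replica count as ceil(currentReplicas × currentUtilization / targetUtilization), skipping changes within a tolerance (0.1 by default). The Python sketch below reproduces that core rule; the pod counts, millicore values, and targets are hypothetical, and real HPA behavior also includes stabilization windows and readiness handling that are omitted here.

```python
import math


def cpu_utilization_percent(usage_millicores: list[int], request_millicores: int) -> float:
    """Average CPU usage across pods as a percentage of the per-pod CPU request.

    HPA utilization targets are relative to requests, so a pod without a CPU
    request cannot be scaled on CPU utilization.
    """
    return 100 * sum(usage_millicores) / (len(usage_millicores) * request_millicores)


def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Core HPA rule: ceil(current * current_util / target_util), clamped to bounds."""
    ratio = current_util / target_util
    # Within tolerance of the target, keep the current replica count.
    desired = current if abs(ratio - 1.0) <= tolerance else math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    util = cpu_utilization_percent([400, 450, 500], request_millicores=500)
    print(util, desired_replicas(3, util, target_util=60, min_replicas=2, max_replicas=10))
```

With three pods requesting 500m each and using 400m, 450m, and 500m, utilization is 90%; against a 60% target, the sketch scales from 3 to 5 replicas.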
Governance of infrastructure provisioned for the Codebeamer AI service using Azure policies
The customer-hosted deployment of Codebeamer AI implements Azure Policy-based governance to enforce security and compliance controls at the resource group level. The design allows controlled initial deployment, updates, and upgrades, while preventing long-term bypass of governance controls.
Deny policies reduce the risk of unintentional resource modification or deletion and support compliance requirements.
Policy definitions
Created per environment.
Include required governance rules.
Policy initiative (policy set)
Groups multiple policy definitions.
Provides a single governance baseline.
Simplifies assignment and management.
Policy assignment
Assigned at the resource group level.
Enforces all policies defined in the initiative.
Time-bound exemption
Created automatically during policy deployment.
Configured with a short expiration window.
Key design principles
Policies are always enforced by default.
Exemptions are:
Temporary
Controlled
Time-bound
Governance layer is:
Independent
Applied after infrastructure and service deployment
A policy definition is created for each environment to include the required governance rules.
Resource upgrade behavior: A temporary waiver exemption module is included for upgrade windows. After the window expires, policies return to normal enforcement.
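The time-bound exemption can be reasoned about as a waiver with an expiry timestamp. The Python sketch below models that lifecycle under stated assumptions: the field names exemptionCategory and expiresOn mirror Azure Policy exemption properties, but the window length and the helper functions are hypothetical and do not represent the actual exemption module.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def make_exemption(now: datetime, window: timedelta) -> dict:
    """Describe a hypothetical waiver exemption that expires after the upgrade window."""
    return {
        "exemptionCategory": "Waiver",
        "expiresOn": (now + window).isoformat(),
    }


def policies_enforced(exemption: Optional[dict], at: datetime) -> bool:
    """Policies are enforced unless an unexpired exemption exists."""
    if exemption is None:
        return True
    return at >= datetime.fromisoformat(exemption["expiresOn"])


if __name__ == "__main__":
    start = datetime(2025, 1, 1, tzinfo=timezone.utc)
    waiver = make_exemption(start, timedelta(hours=2))
    print(policies_enforced(waiver, start + timedelta(hours=1)))  # inside window
    print(policies_enforced(waiver, start + timedelta(hours=3)))  # after expiry
```

Because enforcement resumes automatically at the expiry time, no manual step is needed to close the upgrade window, which matches the principle that exemptions are temporary and time-bound.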
List of Policies
| # | Category | Policy | Effect | Purpose | Error message |
|---|---|---|---|---|---|
| 1 | Cognitive Services | deny_cognitive_account_sku_downgrade | Deny | Prevents SKU downgrade below the allowed tier. | SKU changes for Cognitive Services accounts are not permitted. |
| 2 | Cognitive Services | deny_cognitive_account_kind_change | Deny | Locks the account kind to OpenAI. | Changing the kind (service type) of existing Cognitive Services accounts is not permitted. |
| 3 | Cognitive Services | deny_public_network_access | Deny | Blocks enabling public access. | Public network access must be disabled for Cognitive Services accounts. Enable private endpoints and set publicNetworkAccess to Disabled to comply with network security requirements. |
| 4 | Cognitive Services | deny_cognitive_deployment_deletion | DenyAction | Prevents AI model deployment deletion. | Deletion of Cognitive Services model deployments is not permitted. |
| 5 | Cognitive Services | deny_cognitive_deployment_model_change | DenyAction | Locks model configuration. | Creation or modification of Cognitive Services deployments is not permitted. |
| 6 | General | deny_resource_deletion | DenyAction | Blocks deletion of resources and the resource group. | Deletion of this resource or resource group is not permitted in this protected scope. |
| 7 | Network | deny_vnet_address_space_change | Deny | Locks the VNet Classless Inter-Domain Routing (CIDR) range. | Modification of Virtual Network address space is not permitted. |
| 8 | Network | deny_vnet_peering | Deny | Restricts VNet peering initiation from customer-hosted deployment-managed VNets while allowing inbound (reverse) peering from customer-managed VNets. | Creation of VNet peering is not permitted. |
| 9 | Network | deny_subnet_address_prefix_change | Deny | Locks subnet CIDRs. | Modification of subnet address prefixes is not permitted. |
| 10 | Network | deny_subnet_service_endpoint_removal | Deny | Prevents service endpoint removal. | The aks-subnet must have the Microsoft.CognitiveServices service endpoint configured. Removing it breaks AKS-to-OpenAI connectivity. |
| 11 | Network | deny_private_endpoint_subnet_change | Deny | Restricts the private endpoint to an allowed subnet. | Changing the subnet of a private endpoint is denied by policy. The endpoint must remain on its designated subnet to maintain network segmentation and routing integrity. |
| 12 | AKS | deny_aks_rbac_disabled | Deny | Enforces Kubernetes RBAC. | Deny blocks RBAC disabling, Audit logs violations, Disabled turns off the policy. |
| 13 | AKS | deny_aks_network_policy_removal | Deny | Prevents network policy removal. | AKS clusters must have a network policy plugin configured. |
| 14 | AKS | deny_aks_workload_identity_disabled | Deny | Enforces workload identity. | AKS clusters must have workload identity and OIDC issuer enabled. Disabling these breaks pod-to-Azure authentication (federated credentials) and forces insecure alternatives. |
| 15 | AKS | deny_aks_azure_policy_addon_disabled | Deny | Ensures that the Azure Policy add-on stays enabled. | The Azure Policy add-on must remain enabled on AKS clusters. |
| 16 | AKS In-Cluster | deny_aks_privileged_containers | Deny | Ensures that Kubernetes pods in AKS cannot run privileged containers. | Blocks privileged containers. |
| 17 | Managed Identity | deny_federated_credential_modification | Deny | Locks the issuer, subject, or audience. | Modification of federated identity credentials is denied by policy. These credentials establish the trust chain between AKS workload identity and Azure AD. Tampering with the issuer, subject, or audience severs pod-to-Azure authentication. |
| 18 | Managed Identity | deny_federated_credential_deletion | DenyAction | Prevents federated credential deletion. | Deletion of federated identity credentials is denied by policy. These credentials enable AKS workload identity federation. Removing them breaks pod authentication to Azure services including OpenAI. |
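To show how a Deny policy such as deny_public_network_access decides whether a request is blocked, the Python sketch below encodes a plausible rule in Azure Policy's if/then shape and evaluates it against a resource payload. The rule body is a hypothetical reconstruction (the actual definitions are not shown in this topic), the alias-to-property mapping is a simplified stand-in for Azure's alias resolution, and only the allOf, equals, and notEquals constructs are supported.

```python
from typing import Any, Optional

# Hypothetical reconstruction of a public-network-access deny rule.
POLICY_RULE: dict[str, Any] = {
    "if": {
        "allOf": [
            {"field": "type", "equals": "Microsoft.CognitiveServices/accounts"},
            {"field": "Microsoft.CognitiveServices/accounts/publicNetworkAccess",
             "notEquals": "Disabled"},
        ]
    },
    "then": {"effect": "deny"},
}

# Simplified alias resolution: policy field name -> path in the resource JSON.
ALIASES = {
    "type": ("type",),
    "Microsoft.CognitiveServices/accounts/publicNetworkAccess": ("properties", "publicNetworkAccess"),
}


def field_value(resource: dict, field: str) -> Optional[str]:
    """Follow the alias path; return None if any segment is missing."""
    value: Any = resource
    for key in ALIASES[field]:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(condition: dict, resource: dict) -> bool:
    """Evaluate a condition; string comparisons are case-insensitive here."""
    if "allOf" in condition:
        return all(matches(c, resource) for c in condition["allOf"])
    actual = (field_value(resource, condition["field"]) or "").lower()
    if "equals" in condition:
        return actual == condition["equals"].lower()
    if "notEquals" in condition:
        return actual != condition["notEquals"].lower()
    raise ValueError(f"unsupported condition: {condition}")


def effect_for(resource: dict, rule: dict = POLICY_RULE) -> str:
    """Return the rule's effect if the resource matches, otherwise 'no effect'."""
    return rule["then"]["effect"] if matches(rule["if"], resource) else "no effect"
```

In this sketch, an account payload with publicNetworkAccess set to Enabled, or with the property missing, yields deny, while a payload with Disabled, or a resource of another type, yields no effect.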
Key Concepts
Separation of infrastructure and service
Infrastructure and services are deployed independently, enabling flexible upgrades and maintenance.
Terraform as the source of truth
Infrastructure and policy configuration are managed through Terraform using separate configurations with distinct roles and permissions.
Full customer control
You fully own provisioning, security, operations, and monitoring of all Azure resources and the AI service deployment.
Authentication
Codebeamer AI plugin to Codebeamer AI services in your environment
The Codebeamer AI plugin in your PTC product deployment uses a Microsoft Entra ID service principal to authenticate with the Codebeamer AI service.
Authentication is enforced using JSON Web Tokens (JWTs). The Codebeamer AI service validates the token claims (OID, audience) before processing requests.
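The claim check can be sketched in Python with only the standard library. This example decodes the JWT payload and checks the aud and oid claims; it deliberately skips signature, issuer, and expiry validation, which a real service must perform against Microsoft Entra ID signing keys (typically with a JWT library). The audience value and object IDs used below are hypothetical.

```python
import base64
import json


def decode_claims(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature (illustration only)."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(padded))


def authorize(claims: dict, expected_audience: str, allowed_oids: set[str]) -> bool:
    """Accept only tokens issued for this service and for an allowed caller object ID."""
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return expected_audience in audiences and claims.get("oid") in allowed_oids
```

For example, a token whose payload is {"aud": "api://cb-ai-service", "oid": "1111"} passes authorize(claims, "api://cb-ai-service", {"1111"}) and fails if either the audience or the object ID differs.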
Authentication between the Codebeamer AI service and the Azure OpenAI deployment in your tenant
The AI service communicates with OpenAI models using an Azure user-assigned managed identity over a private link. This setup is backed by a private DNS zone to ensure that traffic remains within the private network boundary.
Telemetry, monitoring, and observability
The AI service sends OTel-compatible logs, traces, spans, and metrics to an OTel collector. This collector can be configured with any exporter to capture data in your chosen observability and monitoring tools.
Azure resources include activity logs by default. For resources managed using IaC, logging is also configured for AKS container logs in the Log Analytics workspace, diagnostic settings for AKS and Azure OpenAI, and VNet flow logs. Logs are retained for 90 days.
For service troubleshooting, you can view logs, traces, and spans in your chosen observability and monitoring tool.
For resource troubleshooting, you can view activity logs and persisted logs in the Log Analytics workspace. You can also access VNet flow logs in the configured storage account.
The product plugin may send data to the preconfigured PTC SaaS Platform Analytics (or, in the case of Codebeamer AI, continue to do so). PTC SaaS Platform Analytics is not part of the customer-hosted deployment solution.
Sending Telemetry to PTC
You can optionally choose to send telemetry and usage data to PTC. To enable this, create a PTC SaaS Platform organization and the required product entries (links to be added).
Sharing telemetry provides the following benefits:
Connect telemetry to help us improve the AI you use.
Faster and better outcomes: Your usage patterns highlight what to fix and what to enhance next, accelerating quality improvements.
Roadmap influence: Opt-in data carries more weight in prioritization, shaping features that directly benefit your teams.
Low effort, high impact: No manual reports, only anonymous, aggregated signals from real workflows.
Privacy-first: Only usage telemetry is collected. Your content and IP are not collected. Data is used solely to improve product quality and reliability.
Infrastructure logs and audit logs
Infrastructure and audit visibility is centralized through Azure Monitor and Azure Policy in your subscription. Operational logs for AKS and Azure OpenAI are sent to the configured Log Analytics workspace, where teams can query and monitor platform activity. Audit and governance evidence is available through the Azure activity log (resource and control-plane changes), Azure Policy compliance data (deny and audit results), and Network Watcher flow logs, when enabled.
Assumptions
You have a skilled DevOps team that can perform the following tasks:
AKS or Kubernetes: Operate AKS and workloads, including upgrades, image pull configuration, Helm deployments, and day-2 operations such as rollouts, rollbacks, and configuration updates.
Terraform: Run and maintain Terraform, including secure state management (for example, a Blob backend with RBAC, versioning, and soft delete), and align IaC changes with internal SDLC or governance processes.
Container Registry: Manage an ACR or OCI-compatible registry, including image publishing or mirroring, access control, lifecycle, and AKS connectivity.
Microsoft Entra ID administration: Manage application registrations, service principals, managed identities, and required role or group assignments and permissions.
Azure readiness: You have an Azure subscription and sufficient permissions or RBAC rights to deploy and operate the solution.
Regional support
To meet compliance requirements such as data residency and governance, the entire infrastructure is created in a single region of your choice. You must confirm that the required Azure OpenAI models are available in that region. You can configure Data Zone Standard models as pay-as-you-go or PTU.
For more information, see the Microsoft Learn article "Foundry Models sold directly by Azure".
Disaster Recovery
Customer-hosted deployments are completely stateless. Neither the Azure resources nor the deployed Codebeamer AI service persist any state, so backup and storage do not apply. For disaster recovery, you must recreate the entire setup using the fresh deployment guidelines.