In Cloudera deployments on cloud, one of the key configuration elements is the DNS. Get it wrong and your deployment may become wholly unusable, with users unable to access the Cloudera data services. If the DNS setup is less than ideal, connectivity and performance issues may arise. In this blog, we’ll take you through our tried and tested best practices for setting up your DNS for use with Cloudera on Azure.
To get started and give you a feel for the dependencies for the DNS, in an Azure deployment for Cloudera, these are the Azure managed services being used:
AKS cluster: Data Warehouse, Data Engineering, Artificial Intelligence, and Data Flow
Azure Database for MySQL: Data Engineering
Storage account: all services
Azure Database for PostgreSQL: Data Lake, Data Hub clusters, Data Warehouse, Artificial Intelligence, Data Flow
Key vault: all services
Most Azure users use private networks with a firewall as egress control. Most users have restrictions on firewalls for wildcard rules. Cloudera resources are created on the fly, which means wildcard rules may be declined by the security team.
Most Azure users use hub-spoke network topology. DNS servers are usually deployed in the hub virtual network or an on-prem data center instead of in the Cloudera VNET. That means if DNS is not configured correctly, the deployment will fail.
Most Cloudera customers deploying on Azure allow the use of service endpoints; a smaller set of organizations do not. Service endpoints are the simpler way to allow resources on a private network to access managed services on Azure. If service endpoints are not allowed, firewall rules and private endpoints are the other two options. Most cloud users do not like opening firewall rules because that introduces the risk of exposing private data on the internet. That leaves private endpoints as the only option, which in turn requires additional DNS configuration for the private endpoints.
Route from private VNET to firewall, and then to Azure managed service endpoint on the internet directly.
Azure provides service endpoints for resources on private networks to access the managed services on the internet without going through the firewall. That can be configured at a subnet level. Since Cloudera resources are deployed in different subnets, this configuration must be enabled on all subnets.
The DNS records of the managed services using service endpoints will be on the internet and managed by Microsoft. The IP address of this service will be a public IP, and routable from the subnet. Please refer to the Microsoft documentation for details.
Not all managed services support service endpoints. In a Cloudera deployment scenario, only storage accounts, PostgreSQL DB, and Key Vault support service endpoints.
Fortunately, most users allow service endpoints. If a customer doesn’t allow service endpoints, they have to go with a private endpoint, which is similar to what needs to be configured in the following content.
Creating a private endpoint creates a network interface with a private IP address, and a private link service is associated with that network interface, so that other resources in the private network can access the service through the private IP address.
The key here is for the private resources to resolve the FQDN of the service to that private IP address. There are two options for storing the DNS record:
Azure managed public DNS zones will always be there, but they store different types of IP addresses for the private endpoint. For example:
Storage account private endpoint—the public DNS zone stores the public IP address of that service.
AKS API server private endpoint—the public DNS zone stores the private IP of that service.
Azure Private DNS zone: the DNS records will be synchronized to the Azure Default DNS of the linked VNET.
Private endpoints are available for all Azure managed services that are used in Cloudera deployments.
As a consequence, for storage accounts, users use either service endpoints or private endpoints. Because the public DNS zone will always return a public IP, the private DNS zone becomes mandatory when a private endpoint is used.
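To see which kind of address a name currently resolves to from inside a given subnet, a quick check along these lines can help. This is a minimal sketch rather than a Cloudera or Azure tool: the helper names `resolve_addresses` and `classify` are hypothetical, and it relies only on Python's standard `socket` and `ipaddress` modules.

```python
import ipaddress
import socket


def resolve_addresses(fqdn, port=443):
    """Return the distinct IP addresses the local resolver gives for fqdn."""
    infos = socket.getaddrinfo(fqdn, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def classify(addresses):
    """Label each address 'private' or 'public' using the stdlib's definition."""
    return {
        addr: "private" if ipaddress.ip_address(addr).is_private else "public"
        for addr in addresses
    }
```

Running `classify(resolve_addresses(fqdn))` on a VM in the Cloudera subnet, with `fqdn` set to your own storage account or AKS API server name, shows whether the resolver in use hands back a private endpoint address or a public one.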
For AKS, these two DNS alternatives are both suitable. The challenges of private DNS zones will be discussed next.
As mentioned above for the typical scenario, most Azure users are using a hub-and-spoke network architecture, and deploy custom private DNS on hub VNET.
The DNS records will be synchronized to Azure default DNS of linked VNET.
When a private endpoint is created, Cloudera on Azure will register the private endpoint to the private DNS zone. The DNS record will be synchronized to Azure Default DNS of linked VNET.
If users use custom private DNS, they can configure a conditional forwarder to the Azure Default DNS for the domain suffix of the FQDN.
With Azure default DNS, that is still acceptable. The only problem is that the resources on the un-linked VNET will not be able to access the AKS. But since AKS is used by Cloudera resources on the same VNET, that does not pose any major issues.
The most popular network architecture among Azure consumers is hub-spoke network with custom private DNS servers deployed either on hub-VNET or on-premises network.
Since DNS records are not synchronized to the Azure Default DNS of the hub VNET, the custom private DNS server cannot find the DNS record for the private endpoint. And because the Cloudera VNET is using the custom private DNS server on hub VNET, the Cloudera resources on Cloudera VNET will go to a custom private DNS server for DNS resolution of the FQDN of the private endpoint. The provisioning will fail.
With the DNS server deployed in the on-prem network, there is no Azure default DNS associated with the on-prem network, so the DNS server cannot find the DNS record for the FQDN of the private endpoint.
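The failure modes above can be summarized in a small toy model. It is a sketch under simplifying assumptions, not a description of Azure internals: the function name and the `dns_setting` strings are hypothetical, a custom DNS server inside a VNET is assumed to forward to the Azure Default DNS of its own VNET, and an on-premises server is assumed to have no forwarding path into Azure.

```python
def can_resolve_private_record(client_vnet, dns_setting, linked_vnets):
    """Toy model: can a client in client_vnet resolve a private DNS zone record?

    dns_setting is one of (hypothetical labels):
      "azure-default"       the client VNET uses Azure Default DNS
      "custom:<vnet-name>"  custom DNS server hosted in that VNET
      "custom:on-premises"  custom DNS server outside Azure
    linked_vnets is the set of VNETs linked to the private DNS zone.
    """
    if dns_setting == "azure-default":
        # Records are only visible through Azure Default DNS of linked VNETs.
        return client_vnet in linked_vnets
    if not dns_setting.startswith("custom:"):
        raise ValueError(f"unknown dns_setting: {dns_setting!r}")
    location = dns_setting.split(":", 1)[1]
    if location == "on-premises":
        # No Azure Default DNS is associated with the on-premises network.
        return False
    # The custom server sees only what its own VNET's default DNS can see.
    return location in linked_vnets
```

In this model, a Cloudera VNET that points at a hub DNS server fails to resolve the record when only the Cloudera VNET is linked, and succeeds once the hub VNET is linked.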
Different Azure managed services have different DNS attributes, and based on the different use cases, the DNS configuration is different.
Azure Storage Account supports both service endpoint and private endpoint. All the consumers of the Azure storage account are on Azure in the same region, which makes service endpoints good enough for the storage account.
When an on-premises workload is involved, the on-premises network doesn’t support service endpoints, so the public IP returned by a DNS lookup of the storage account FQDN will inevitably route the traffic over the internet. In this scenario, a private endpoint for the storage account is required. Fortunately, we don’t need to consider this when creating Cloudera on Azure services. This use case arises when replicating data from on-premises storage to the storage account.
Use Azure Storage service endpoint for all Azure subnets.
Create a private endpoint for Azure storage if there is on-premises data to be loaded to Azure Storage Account.
Do not use private endpoint FQDN in Cloudera on Azure configuration no matter whether the private endpoint exists or not.
Cloudera on Azure supports three types of deployment for Azure Postgres DB: Single Server (to be deprecated), Flexible Server with delegated network, and Flexible Server with private link.
Azure Postgres Single Server will be deprecated. We don’t recommend using the single server option. For users who have to use a single server, Azure service endpoint is good enough.
With the delegated subnet option for Postgres, a dedicated subnet is required. A /27 CIDR is recommended for the subnet, and a /28 is the minimum. A private DNS zone with any domain name is required.
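The sizing guidance can be checked before the subnet is created. The following is a minimal sketch using Python's standard `ipaddress` module; the function name, the return labels, and the example CIDR ranges are illustrative assumptions, while the /27 and /28 thresholds come from the text above.

```python
import ipaddress

RECOMMENDED_PREFIX = 27  # /27 is the recommended size
MINIMUM_PREFIX = 28      # /28 is the smallest acceptable size


def check_delegated_subnet(cidr):
    """Classify a candidate delegated subnet as 'ok', 'minimum', or 'too small'."""
    net = ipaddress.ip_network(cidr, strict=True)
    if net.prefixlen > MINIMUM_PREFIX:
        return "too small"
    if net.prefixlen > RECOMMENDED_PREFIX:
        return "minimum"
    return "ok"
```

For example, `check_delegated_subnet("10.10.1.0/27")` returns `"ok"`, while a /29 is reported as `"too small"`.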
With the privatelink option, a private DNS zone with name “privatelink.postgres.database.azure.com” is required.
Azure stores a CNAME record for PostgresDB in the Azure public DNS zone. The CNAME record points to an A record in the private DNS zone.
Choose delegated subnet or privatelink. Private DNS zone for Postgres DB is mandatory.
Azure Key Vault supports service endpoint and doesn’t need a private DNS zone.
Azure MySQL Flexible server supports privatelink. A private DNS zone is required. Azure stores a CNAME record for Azure MySQL in Azure public DNS zone, and points that CNAME record to an A record in private DNS zone.
Private DNS zone for MySQL DB is mandatory.
AKS API server supports privatelink. AKS stores an A record in the Azure Public DNS zone. Users can choose to use a private DNS zone, or not use a private DNS zone. When using a private DNS zone, another A record is created in the private DNS zone. These two A records are independent of each other.
Since Azure stores two A records in Azure Public DNS zone and Private DNS zone respectively, users can choose to disable Private DNS zone for AKS if necessary.
Users do not need to do anything if Azure Default DNS is used in the Cloudera VNET configuration. The steps below cover the best practices and options when custom private DNS is used in the VNET configuration.
Since different Azure managed services have different DNS resolution options, the first step is to collect the private DNS zone requirements. Normally, there are three types of private DNS zone to consider: the Postgres DB private DNS zone, the AKS private DNS zone, and the MySQL private DNS zone.
Pre-create all required private DNS zones. DNS configuration is an important part of IT governance. Even though Cloudera can create these private DNS zones on your behalf, it’s better to pre-create them so that the DNS configuration can be managed properly.
Scenario 1: When using custom private DNS with the DNS server on the Cloudera VNET
Link the Cloudera VNET to all the private DNS zones.
On the custom private DNS server, create a conditional forwarder that forwards DNS requests for the Azure managed services to the Azure Default DNS server IP address (168.63.129.16).
Scenario 2: When using custom private DNS with the DNS server on the hub VNET
Link the hub VNET to all the private DNS zones.
On the custom private DNS server, create a conditional forwarder that forwards DNS requests for the Azure managed services to the Azure Default DNS server IP address (168.63.129.16).
Scenario 3: When using custom private DNS with the DNS server on the on-premises network
Link Cloudera VNET to the private DNS zones.
Create Azure DNS resolver delegated subnet on Cloudera VNET.
Create Azure DNS resolver and inbound endpoint on the Azure DNS resolver delegated subnet. A private IP will be associated with the inbound endpoint.
On the custom private DNS server, create a conditional forwarder that forwards DNS requests for the Azure managed services to the private IP address of the Azure DNS Resolver inbound endpoint.
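The three scenarios can be condensed into a small lookup, which may be handy as a checklist in provisioning scripts. This is an illustrative sketch only: `DnsPlan`, `plan_for`, and the location labels are hypothetical names, and the inbound endpoint IP has to be supplied by you after the Azure DNS resolver is created.

```python
from dataclasses import dataclass

AZURE_DEFAULT_DNS = "168.63.129.16"


@dataclass(frozen=True)
class DnsPlan:
    link_vnet: str             # VNET to link to the private DNS zones
    needs_dns_resolver: bool   # whether an Azure DNS resolver is required
    forward_target: str        # where the conditional forwarder points


def plan_for(dns_server_location, resolver_inbound_ip=None):
    """Return the linking and forwarding plan for a custom DNS server location."""
    if dns_server_location == "cloudera-vnet":   # Scenario 1
        return DnsPlan("cloudera-vnet", False, AZURE_DEFAULT_DNS)
    if dns_server_location == "hub-vnet":        # Scenario 2
        return DnsPlan("hub-vnet", False, AZURE_DEFAULT_DNS)
    if dns_server_location == "on-premises":     # Scenario 3
        if resolver_inbound_ip is None:
            raise ValueError("Scenario 3 needs the resolver inbound endpoint IP")
        return DnsPlan("cloudera-vnet", True, resolver_inbound_ip)
    raise ValueError(f"unknown DNS server location: {dns_server_location!r}")
```

The on-premises case is the only one where the forward target is not the Azure Default DNS address, because that address is not reachable from outside Azure.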
This section introduces the key DNS related configurations when creating Cloudera services.
If DataFlow, Data Engineering, or Artificial Intelligence is required, the AKS private DNS zone ID for Liftie AKS clusters can only be configured at this step, so make sure this decision has been made beforehand.
Cloudera supports both delegated subnet and privatelink for Azure PostgreSQL DB. Decide which one to use before this step. Data Warehouse doesn’t support privatelink for Azure PostgreSQL DB, so if a Data Warehouse is to be created, it’s better to use a delegated subnet.
If an AKS private DNS zone ID is to be used, the configuration can only be specified with the Cloudera CLI.
If an AKS private DNS zone ID is not to be used, the Cloudera UI can be used to create the environment.
The network configuration for Azure PostgreSQL DB can be selected in the UI or with the Cloudera CLI.
When using a delegated subnet for Postgres DB, select Flexible Server with Delegated Subnet, select the delegated subnet in the subnet selection, and select the private DNS zone for Postgres DB.
When using a private link for Postgres DB, select Flexible Server with Private Link and select the private DNS zone for Postgres DB.
NOTE: the private DNS zone name for delegated subnet can be customized. But the private DNS zone name for private link cannot be customized, and has to be “privatelink.postgres.database.azure.com”.
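These constraints are easy to get wrong when the environment is scripted, so a pre-flight check may be worthwhile. The sketch below is not part of the Cloudera CLI: the function name and option labels are hypothetical, and it encodes only the rules stated in this post (fixed zone name for private link, any name for delegated subnet, no private link with Data Warehouse).

```python
PRIVATELINK_POSTGRES_ZONE = "privatelink.postgres.database.azure.com"


def check_postgres_dns_choice(option, zone_name, data_warehouse=False):
    """Return a list of problems with the chosen PostgreSQL network option."""
    problems = []
    # DNS names are case-insensitive; ignore a trailing root dot as well.
    name = zone_name.strip().rstrip(".").lower()
    if option == "private-link":
        if name != PRIVATELINK_POSTGRES_ZONE:
            problems.append(f"zone must be named {PRIVATELINK_POSTGRES_ZONE}")
        if data_warehouse:
            problems.append("Data Warehouse does not support private link")
    elif option == "delegated-subnet":
        if not name:
            problems.append("a private DNS zone name is required")
    else:
        raise ValueError(f"unknown option: {option!r}")
    return problems
```

An empty list means the choice is consistent with the rules above; anything else should be resolved before the environment is created.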
As mentioned in the AKS private DNS zone features, Azure stores two A records for AKS API server. One in the public DNS zone, and another in the private DNS zone. They both point to the same private IP address. From AKS perspective, the external resources use the A record on the Public DNS zone, and the AKS internal resources use the A record on the Private DNS zone.
Azure provides a way to disable the Private DNS zone for AKS API server, so that both AKS external and internal resources can use the A record on the public DNS zone to access the AKS API server.
Cloudera Data Services can also leverage this feature to simplify the DNS forwarding process.
Use the Cloudera CLI to activate Cloudera Data Warehouse. Use ‘--private-dns-zone-aks’ to set the private DNS zone ID to ‘None’.
An Entitlement is needed to disable the private DNS zone for AKS: “LIFTIE_AKS_DISABLE_PRIVATE_DNS_ZONE”
Otherwise, the AKS private DNS zone configuration will be inherited from the environment setting.
Postgres DB private DNS zone configuration will be inherited from the environment setting.
The AKS private DNS zone configuration is the same as for Data Flow and Machine Learning.
Data Engineering uses Azure MySQL DB, so the MySQL private DNS zone has to be configured with the Cloudera CLI.
Bringing all things together, consider these best practices for setting up your DNS with Cloudera on Azure:
For more background reading on network and DNS specifics for Azure, have a look at our documentation for the various data services: DataFlow, Data Engineering, Data Warehouse, and Machine Learning. We’re also happy to discuss your specific needs; in that case please reach out to your Cloudera account manager or get in touch.