Strange issue with AWS VPC PrivateLink endpoint

Hi guys, sorry for neglecting this blog for such a long time; I got distracted by work pressure and a lot of changes in my life! The Covid-19 lockdown made me think about my blog again, and I plan to restart it.

Recently I was working with one of my clients to set up an AWS account for their internal product. Meanwhile, one of my team members ran into a strange issue while creating a PrivateLink interface endpoint from an endpoint service. I got involved and found that it was an issue with AWS Availability Zone assignment! I will explain how we noticed the issue and what AWS asked us to do to resolve it.

Before getting into the issue, let me explain what exactly we were trying to achieve. My client has multiple products and multiple teams working on different projects on the AWS platform. One of the projects (in Account A) needed to access a service running in a different AWS account (Account B), which runs entirely on a private network and is not exposed to the public network.

To achieve this connectivity, we used the AWS PrivateLink service, with a VPC endpoint service and an interface VPC endpoint. The high-level architecture looks like this.

[Figure: AWS Pvt Link]

How to create an endpoint service in AWS VPC:

  • Create a Network Load Balancer for your application in your VPC and configure it for each subnet (Availability Zone az1, az2, az3) in which the service should be available.
  • Create a VPC endpoint service configuration and specify the Network Load Balancer created above.
  • Grant permissions to specific service consumers (AWS accounts) to create a connection to the endpoint service.
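For readers who prefer code to console clicks, the provider-side steps above can be sketched roughly as follows. This is only an illustration, not what we ran: it assumes `ec2` is a boto3 EC2 client for Account B (for example `boto3.client("ec2")`), and the function name, the ARN, and the account IDs are hypothetical placeholders.

```python
def create_endpoint_service(ec2, nlb_arn, consumer_account_ids,
                            acceptance_required=False):
    """Create an endpoint service in front of an existing NLB.

    `ec2` is assumed to be a boto3 EC2 client for the provider account.
    Returns the service name and the AZs the service reports.
    """
    # Step 2: create the endpoint service configuration for the NLB.
    resp = ec2.create_vpc_endpoint_service_configuration(
        NetworkLoadBalancerArns=[nlb_arn],
        AcceptanceRequired=acceptance_required,
    )
    config = resp["ServiceConfiguration"]

    # Step 3: allow the consumer accounts to connect to the service.
    principals = [f"arn:aws:iam::{account_id}:root"
                  for account_id in consumer_account_ids]
    ec2.modify_vpc_endpoint_service_permissions(
        ServiceId=config["ServiceId"],
        AddAllowedPrincipals=principals,
    )
    return config["ServiceName"], config.get("AvailabilityZones", [])
```

The returned Availability Zone list is worth noting down right away, since it reflects the zones the service picked up from the NLB at creation time.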

Steps to enable service consumers to connect to endpoint service:

  • Create an interface endpoint using the endpoint service name.
  • Choose the respective VPC and Availability Zones. We used CloudFormation with the default option, which means it should create the endpoint in all the zones of the Account B NLB; Account A has 3 subnets, in az1, az2 and az3.
  • To activate the connection, accept the interface endpoint connection request. In Account B it is set to be accepted automatically, so no action was required in our case.
  • Attach a security group with outgoing traffic enabled for the service ports on the VPC CIDR.
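The consumer-side steps can be sketched in a similar way. Note that we actually used CloudFormation for this; the snippet below is a hedged boto3 equivalent that passes the subnets explicitly instead of relying on a default. The function name and all IDs are hypothetical, and `ec2` is assumed to be a boto3 EC2 client for Account A.

```python
def create_interface_endpoint(ec2, vpc_id, service_name,
                              subnet_ids, security_group_ids):
    """Create an interface endpoint for an endpoint service.

    `ec2` is assumed to be a boto3 EC2 client for the consumer account.
    Returns the endpoint ID and its initial state.
    """
    resp = ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId=vpc_id,
        ServiceName=service_name,
        # One subnet per Availability Zone the endpoint should live in.
        SubnetIds=list(subnet_ids),
        SecurityGroupIds=list(security_group_ids),
    )
    endpoint = resp["VpcEndpoint"]
    return endpoint["VpcEndpointId"], endpoint.get("State")
```

Listing the subnets explicitly makes it obvious, at creation time, which zones the endpoint is expected to cover.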

So far everything looks good, but it’s not! When we tried to access or telnet to the endpoint DNS name on the service port from Account A, we got a timeout error.
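If telnet is not installed on the test instance, the same reachability check can be done with a few lines of Python. This is a minimal sketch using only the standard library; the function name is my own, and the host and port in the usage note are placeholders.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False
```

For example, `can_connect("<endpoint DNS name>", 443)` run from an instance in Account A would return False in the timeout situation described above.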

Root cause:

When I validated the setup, I noticed that the interface endpoint in Account A had been created in only 2 Availability Zones. As per the AWS documentation, CloudFormation should have created the interface endpoint in 3 Availability Zones, the same as the NLB in Account B, which spans 3 Availability Zones!
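A check like this can also be scripted, so that a missing zone shows up before anyone starts debugging timeouts. The sketch below is an assumption-laden illustration: `ec2` is assumed to be a boto3 EC2 client for Account A, both function names are hypothetical, and it compares zone names as reported inside the consumer account.

```python
def missing_zones(service_azs, endpoint_azs):
    """Zones the endpoint service offers but the endpoint does not cover."""
    return sorted(set(service_azs) - set(endpoint_azs))


def endpoint_zone_gap(ec2, service_name, endpoint_id):
    """Compare the AZs of an endpoint service with those of an interface endpoint.

    `ec2` is assumed to be a boto3 EC2 client for the consumer account.
    """
    services = ec2.describe_vpc_endpoint_services(ServiceNames=[service_name])
    service_azs = services["ServiceDetails"][0]["AvailabilityZones"]

    endpoints = ec2.describe_vpc_endpoints(VpcEndpointIds=[endpoint_id])
    subnet_ids = endpoints["VpcEndpoints"][0]["SubnetIds"]

    # Map each endpoint subnet back to the zone it lives in.
    subnets = ec2.describe_subnets(SubnetIds=subnet_ids)["Subnets"]
    endpoint_azs = [subnet["AvailabilityZone"] for subnet in subnets]

    return missing_zones(service_azs, endpoint_azs)
```

An empty list means the endpoint covers every zone of the service; in our case it would have returned the one zone that was skipped.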

I raised this issue with AWS, and they came back with the following reply:

When creating endpoint service CloudFormation do not have the option to give AZs. It takes AZs from the NLBs attached.

If you add a subnet later to the NLB in different AZ that change wont take effect on endpoint service. i.e. when you add a subnet to the NLB AFTER you created the Endpoint Service.

But we didn’t add or update any subnet in either account; it was the same old VPC and subnets in both accounts! AWS also asked us to delete and recreate the endpoint service and the interface endpoint.

I have also noticed that when we create the interface endpoint from the AWS console, there is no issue: it takes the AZs from the attached NLBs and works as expected.