Overview
This article describes how to establish a VPC peering connection to allow traffic from another VPC in either the same or a different AWS account. This allows services located in other VPCs to query Imply without requiring the Imply Cloud clusters to be reachable publicly over the Internet.
We will first go through the straightforward case, where none of the peered VPCs have overlapping CIDR blocks (the IP address ranges for all peered VPCs are disjoint). Following this, we will describe a scenario where there is an overlap, and present a strategy for setting up a network topology to handle this.
This article does not cover the following two cases:
- Setting up a peering connection between different AWS regions: AWS now provides limited support for managed peering between some regions, but this is an advanced case as there are a number of restrictions that must be considered. Please consult the AWS documentation on multi-region VPC peering for more details.
- Setting up a peering connection if the Imply Cloud VPC (the one created in your AWS account) has an overlapping CIDR block with the VPC you're trying to peer: VPC peering is not supported by AWS in this case. You may contact us to request that your Imply Cloud VPC use a different CIDR block to remove the conflict. Changing the CIDR block of the VPC will incur downtime of your Imply Cloud cluster.
No Overlapping VPCs
There are two main steps in peering VPCs when a given VPC does not have multiple peering connections which have overlapping network ranges:
- Establish the peering connection between the VPCs.
- Configure the route tables so that packets are routed to the correct destination.
Create a Peering Connection
Using the AWS web interface, go to the VPC Dashboard and inspect your configured VPCs. Identify the Imply Cloud VPC (the one tagged as 'imply-{accountId}-network-vpc') and the VPC you wish to peer with:
Here, we wish to peer the Imply VPC (vpc-ce78dbb7) with non-overlapping-vpc (vpc-641fee1c). Note the CIDR ranges, 10.50.0.0/21 and 172.20.0.0/20 respectively, and ensure that they are not overlapping.
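As a quick sanity check, Python's standard ipaddress module can confirm that two CIDR blocks are disjoint before you attempt to peer them. A minimal sketch using the example ranges above:

```python
# Sketch: verifying that two CIDR blocks are disjoint before peering,
# using Python's standard-library ipaddress module. The CIDRs below are
# the example values from this article.
import ipaddress

imply_vpc = ipaddress.ip_network("10.50.0.0/21")   # Imply Cloud VPC
other_vpc = ipaddress.ip_network("172.20.0.0/20")  # non-overlapping-vpc

# overlaps() returns True if the two networks share any addresses.
print(imply_vpc.overlaps(other_vpc))  # False: safe to peer
```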
Go to the Peering Connections section of the VPC Dashboard and note the active peering connections:
You should see at least one entry, which is the peering connection used by Imply to enable communication between Imply's servers (the Imply Cloud Manager) and the Imply clusters running in your account. If the VPC you wish to peer with has a CIDR range that overlaps with this entry, you fall into the overlapping VPC category and should continue to read that section after understanding the basic case.
In this example, the Imply VPC (vpc-ce78dbb7) is peered to a VPC in Imply's AWS account (vpc-828a54fb) which has a CIDR block of 172.19.0.0/20. This corresponds to an IP address range of 172.19.0.0 - 172.19.15.255 which does not overlap the 172.20.0.0/20 block of non-overlapping-vpc so no special configuration will be required.
Click on Create Peering Connection, optionally provide a name tag, and put the Imply VPC (here vpc-ce78dbb7) as the requester and the other VPC (here vpc-641fee1c) as the accepter. You will see a new entry in the peering table with a status of Pending Acceptance:
To accept the peering request, right-click on the peering connection and choose Accept Request. Upon acceptance, the status of the peering connection will change to Active.
You will also want to configure the peering connection to allow DNS names to be resolved over the link. This will allow you to address the query elastic load balancer using its assigned DNS name from the other VPC. To do this, right-click on the peering connection and select Edit DNS Settings. On this page, ensure that the option allowing your VPC to resolve DNS requests for the Imply VPC hosts is checked. In this example, we want non-overlapping-vpc (vpc-641fee1c) to be able to resolve DNS requests of Imply VPC (vpc-ce78dbb7):
Click on Save.
Set Up Route Tables
Route tables provide instructions to your subnet's virtual router about where network traffic destined for a given IP address should be sent. Common destinations include other instances within the same VPC, an Internet Gateway for external traffic over the Internet, and a peering connection for traffic destined for a different peered VPC. Creating a VPC creates a default route table, which is used implicitly if no other route table is associated with a subnet. Strictly speaking, however, route tables are associated with subnets and not with VPCs; this is an important distinction that we will use when handling the overlapping peered VPC scenario below. For more information on route tables, see Amazon's documentation.
Click on the Subnets section to view the available subnets for your account in this region:
The first 6 subnets belong to the Imply VPC (vpc-ce78dbb7). Imply typically creates two subnets (one for static and one for dynamically allocated addresses) in three different availability zones to provide protection against failures of an availability zone. Note that they are all associated with the route table rtb-7122e509. The subnet labelled non-overlapping is a subnet in the non-overlapping-vpc (vpc-641fee1c) which, for this example, will be the subnet used by the EC2 instance that requires connectivity to Imply Cloud. This subnet uses the route table rtb-8c4492f7. We will need to configure these two route tables to route traffic destined for the other subnet to the VPC peering connection.
Starting with the route table for the Imply VPC (here rtb-7122e509):
Click on the link in the Route table column and then click on the Routes tab:
Routes are matched by finding the most specific entry that matches the destination address. In this example:
- Traffic destined for an address in 10.50.0.0/21 (10.50.0.0 - 10.50.7.255) remains local and is routed to other EC2 instances in this VPC.
- Traffic destined for an address in 172.19.0.0/20 (172.19.0.0 - 172.19.15.255) is routed to pcx-d3c1e8ba, which is the peering connection link to a VPC in Imply's AWS account. From here, this traffic is routed to one of the EC2 instances powering the Imply Cloud Manager.
- All other traffic (0.0.0.0/0 matches all IPs) is routed to igw-acd20cca which is the Internet Gateway allowing the instances to make outbound requests externally over the Internet.
To this table, we will need to add an entry to route traffic destined for 172.20.0.0/22 (the range of the non-overlapping subnet) to pcx-6ca76b04 (the VPC peering connection we established). Alternatively, we can specify the range for the entire non-overlapping-vpc VPC (172.20.0.0/20) to allow other subnets within that VPC to also communicate over the peering link.
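The longest-prefix-match behavior can be sketched with Python's standard ipaddress module. The route entries and targets below are the example values from this article, including the new entry for the peered subnet:

```python
# Sketch: how a route table resolves a destination by longest prefix match.
# Route entries and targets are the example values from this article.
import ipaddress

routes = {
    "10.50.0.0/21": "local",          # instances within this VPC
    "172.19.0.0/20": "pcx-d3c1e8ba",  # peering to Imply's AWS account
    "172.20.0.0/22": "pcx-6ca76b04",  # peering connection we just created
    "0.0.0.0/0": "igw-acd20cca",      # Internet Gateway (default route)
}

def resolve(dest: str) -> str:
    """Return the target of the most specific route containing dest."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    # Most specific route = longest prefix length.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(resolve("172.20.2.51"))  # pcx-6ca76b04, not the 0.0.0.0/0 default
print(resolve("8.8.8.8"))      # igw-acd20cca
```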
Click on Edit, click on Add another route, and using this example, set:
- Destination: 172.20.0.0/22
- Target: pcx-6ca76b04
Click on Save.
We will now need to do a similar configuration for the route table rtb-8c4492f7. Here, we will want to add an entry to route all traffic destined for the Imply Cloud VPC (10.50.0.0/21) to the peering link pcx-6ca76b04. Select this route table, click on the Routes tab, click on Edit, click on Add another route, and using this example, set:
- Destination: 10.50.0.0/21
- Target: pcx-6ca76b04
Click on Save.
This completes the setup of the peering connection and configuration of the route tables.
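The two route-table entries configured above can also be expressed as parameters for the EC2 API, for example boto3's create_route call. This is a sketch only; the IDs are the example values from this article, and the actual API call is shown in a comment:

```python
# Sketch: the two route-table entries configured above, expressed as the
# parameter dicts you could pass to boto3's EC2 create_route call (one per
# route table). IDs and CIDRs are the example values from this article.

def peering_routes(pcx_id: str, imply_rtb: str, imply_cidr: str,
                   peer_rtb: str, peer_cidr: str) -> list[dict]:
    """Build create_route parameter dicts for both sides of the peering."""
    return [
        # Imply VPC route table: send peer-bound traffic over the peering link.
        {"RouteTableId": imply_rtb,
         "DestinationCidrBlock": peer_cidr,
         "VpcPeeringConnectionId": pcx_id},
        # Peer VPC route table: send Imply-bound traffic over the same link.
        {"RouteTableId": peer_rtb,
         "DestinationCidrBlock": imply_cidr,
         "VpcPeeringConnectionId": pcx_id},
    ]

for params in peering_routes("pcx-6ca76b04",
                             "rtb-7122e509", "10.50.0.0/21",
                             "rtb-8c4492f7", "172.20.0.0/22"):
    print(params)
    # With AWS credentials configured, you would call, for example:
    # boto3.client("ec2").create_route(**params)
```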
Testing the Peering Connection
We can test the peering connection by making an API request to Druid from an instance in the peered VPC. We will do this by making a call to the query load balancer rather than to an individual instance. You can get the hostname of the load balancer from the API section of the Cloud Manager:
Note that for VPC peering connections, we use the Private endpoints rather than the Public ones (which are used when you want the cluster to be accessible over the public Internet).
Even though the peering connection is in place, the security group for the load balancer will not by default permit traffic from your EC2 instance. In order to allow the connection to be established, we will need to add an entry to the Inbound table of the Imply Cloud ELB Unmanaged security group.
First, determine which security group is associated with the EC2 instance the request will originate from. From the EC2 Dashboard, click on Instances, and find the details of your instance:
Here, the security group is launch-wizard-2. Clicking on the security group name will bring you to the Security Groups section where, in this example, we find that launch-wizard-2 is identified with group ID: sg-dee0b1ae.
Staying in the Security Groups section, look for the security group with the description 'Imply Cloud ELB Unmanaged'. You may have to clear your filter for it to appear in the table. Select this entry, click on the Inbound tab, click Edit, and add a rule as follows:
- Type: All TCP
- Protocol: TCP
- Port Range: 0 - 65535
- Source: Custom: sg-dee0b1ae
Click on Save.
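The inbound rule above maps onto the EC2 API's IpPermissions structure, used by calls such as boto3's authorize_security_group_ingress. A hedged sketch, where sg-dee0b1ae is the example source security group from this article:

```python
# Sketch: the inbound rule added above, as the IpPermissions structure
# used by the EC2 API (e.g. boto3's authorize_security_group_ingress).
# sg-dee0b1ae is the example source security group from this article.

def elb_ingress_rule(source_sg_id: str) -> dict:
    """Allow all TCP ports from instances in the given security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 0,
        "ToPort": 65535,
        "UserIdGroupPairs": [{"GroupId": source_sg_id}],
    }

rule = elb_ingress_rule("sg-dee0b1ae")
print(rule)
# With AWS credentials configured, you would pass it as, for example:
# ec2.authorize_security_group_ingress(GroupId=<ELB Unmanaged sg id>,
#                                      IpPermissions=[rule])
```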
You should now be able to make requests from your instance to the Imply Cloud query load balancer, for example:
ubuntu@ip-172-20-2-51:~$ curl -k -u admin https://internal-imply-e04-elbinter-qqsuirj9f9ep-901811362.us-west-2.elb.amazonaws.com:9088/druid/v2/datasources
Enter host password for user 'admin':
["wikipedia"]
ubuntu@ip-172-20-2-51:~$ curl -k -u admin -XPOST -H'Content-Type:application/json' -d '{"queryType": "timeBoundary", "dataSource": "wikipedia"}' https://internal-imply-e04-elbinter-qqsuirj9f9ep-901811362.us-west-2.elb.amazonaws.com:9088/druid/v2/
Enter host password for user 'admin':
[{"timestamp":"2016-06-27T00:00:11.080Z","result":{"maxTime":"2016-06-27T21:31:02.498Z","minTime":"2016-06-27T00:00:11.080Z"}}]
Overlapping VPCs
We now move on to the case where the Imply VPC needs to be peered to multiple VPCs that have overlapping CIDR blocks. This happens most frequently when the VPC you wish to create a peering connection to has a network range that overlaps the VPC in Imply's AWS account. Continuing the previous example, we now want to establish a peering connection between the Imply VPC (vpc-ce78dbb7, 10.50.0.0/21) and overlapping-vpc (vpc-3607f64e, 172.19.0.0/20). Note that while these two VPCs don't overlap (if they did, a peering connection would not be allowed), the Imply VPC already has a peering connection to vpc-828a54fb which has a CIDR block of 172.19.0.0/20, conflicting with overlapping-vpc. vpc-828a54fb is a VPC in Imply's AWS account, and this peering connection is crucial for the operation of Imply Cloud and cannot be modified or removed.
AWS allows peering connections to be made to multiple VPCs which have overlapping CIDR blocks, but there is an additional complication when trying to resolve the route table entries: given a packet that should be sent to 172.19.0.3 (an address within the 172.19.0.0/20 CIDR), how does the router know which of the two VPC peering connections to route the message to?
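The ambiguity can be sketched with the standard ipaddress module: the destination falls inside both peered ranges, and because the two prefixes are equally specific, longest-prefix matching cannot break the tie within a single route table.

```python
# Sketch: why overlapping peered CIDRs break plain route resolution.
# Both peering connections advertise the same 172.19.0.0/20 block, so
# longest-prefix matching has no way to pick between them.
import ipaddress

dest = ipaddress.ip_address("172.19.0.3")
imply_peer = ipaddress.ip_network("172.19.0.0/20")  # via pcx-d3c1e8ba
your_peer = ipaddress.ip_network("172.19.0.0/20")   # via pcx-ceb27ea6

# The destination is in both ranges, and the prefixes are equally
# specific, so a single route table cannot disambiguate.
print(dest in imply_peer, dest in your_peer)   # True True
print(imply_peer.prefixlen == your_peer.prefixlen)  # True
```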
As mentioned previously, we will need to leave the subnets and route table entries set up by Imply Cloud unmodified, since modifying or removing these will cause your clusters to be unreachable by the Imply Cloud Manager. Remembering that the routers (and their corresponding route tables) are associated with subnets rather than VPCs, our strategy here will be to set up additional subnets with a separate route table that will route traffic for 172.19.0.0/20 to your other VPC instead of Imply's VPC. Into these subnets, we will add a new elastic load balancer which will act as a proxy to the Imply query servers, and the route table rules will ensure that traffic passing through this load balancer will be routed to your VPC instead of Imply's.
This works particularly nicely in AWS because an auto-scaling group (which is used to manage the query EC2 instances) can be associated with multiple elastic load balancers. Hence, as the EC2 instances are created, fail, and are replaced, the elastic load balancer will continue pointing at the correct online set of instances instead of pointing at instances which have died. When we are done with this setup, the query auto-scaling group will be associated with three load balancers:
- one for communication internally within the VPC and with the VPC in Imply's AWS account (created by Imply)
- one for communication externally over the internet (created by Imply, but by default blocked using firewall rules)
- one for communication internally with your peered VPC using the VPC peering connection (this is what we will be setting up here)
Create a Peering Connection
The first step is to create a peering connection using the steps detailed in the similarly named section for the non-overlapping case. In our example, we will be peering the Imply VPC (vpc-ce78dbb7) and overlapping-vpc (vpc-3607f64e):
Note that the peering connection we created (pcx-ceb27ea6) links us to a VPC with a CIDR 172.19.0.0/20 which is the same network range as the peering connection created by Imply (pcx-d3c1e8ba) and that this is a permitted operation.
Create New Subnets
We will need to create an additional subnet for each availability zone used by the Imply Cloud cluster. There will be 2 or 3 availability zones used, depending on which region you are in. To identify the availability zones, inspect the Availability Zone column (or view the detail descriptions) for each Imply Cloud subnet (beginning with imply-...).
On the Subnet page, click Create subnet. Give your subnet a recognizable name, and ensure that it is created in the Imply VPC (in our example, vpc-ce78dbb7). For the IPv4 CIDR block, you will need to choose a block of IP addresses within the Imply VPC block that doesn't conflict with any of the existing 6 subnets; in our example, we used 10.50.6.0/26. Set the Availability Zone to correspond to one of the zones used by Imply Cloud. Clicking Create gives us the following subnets:
The one that we created is called imply-overlap (subnet-d5eea2ac). Note that it uses a different route table (rtb-a822e5d0) than the subnets created by Imply Cloud. This is actually the default route table for the Imply VPC which was previously unused, since the 6 subnets created by Imply were explicitly associated with a custom route table. It is fine to use the default route table, but alternatively, you can also create an additional route table and then explicitly associate the imply-overlap subnet with this route table.
Repeat the previous step until you have created a subnet for each availability zone used by Imply Cloud.
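Choosing a non-conflicting block can be sketched programmatically with the ipaddress module. Note that the six existing subnet CIDRs below are illustrative assumptions, not values taken from your account; with this assumed layout the first free /26 is 10.50.6.0/26, the block used in the example above.

```python
# Sketch: finding a /26 block inside the Imply VPC (10.50.0.0/21) that
# doesn't conflict with the six existing subnets. The existing subnet
# CIDRs here are illustrative assumptions, not values from your account.
import ipaddress

vpc = ipaddress.ip_network("10.50.0.0/21")
existing = [ipaddress.ip_network(c) for c in (
    "10.50.0.0/24", "10.50.1.0/24", "10.50.2.0/24",  # assumed existing
    "10.50.3.0/24", "10.50.4.0/24", "10.50.5.0/24",  # Imply subnets
)]

# Enumerate candidate /26 blocks within the VPC and pick the first free one.
free = next(c for c in vpc.subnets(new_prefix=26)
            if not any(c.overlaps(e) for e in existing))
print(free)  # 10.50.6.0/26 with the assumed layout above
```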
Set Up Route Tables
This is performed similarly to the non-overlapping case, but using the route table associated with the newly created subnets instead of the one used by the existing 6 subnets. In this case, our new route table entries look like this:
For the route table associated with the new subnets in the Imply VPC (rtb-a822e5d0):
- Destination: 172.19.0.0/20
- Target: pcx-ceb27ea6
For the route table associated with the subnet in the other VPC that was peered with the Imply VPC (in this example, the route table for overlapping subnet-567e171d, which is rtb-48598f33):
- Destination: 10.50.6.0/26
- Target: pcx-ceb27ea6
Create a New Elastic Load Balancer
Now that we have created new subnets which route traffic from 172.19.0.0/20 to your peered VPC, we will need to create an elastic load balancer which uses those subnets and proxies requests to the Imply Cloud query nodes. We will inspect the existing load balancer set up by Imply and create an additional one with a similar configuration:
In the Load Balancers section of the EC2 Dashboard:
- Click on Create Load Balancer
- For this example, we are using the Classic Load Balancer, but a similar setup can be done using one of the newer generation load balancers. Click on Create.
- Load Balancer name: provide a recognizable name for the load balancer
- Create LB Inside: select the Imply Cloud VPC, in this example vpc-ce78dbb7
- Create an internal load balancer: this should be checked so that the load balancer can be used over the peering connection
- Listener Configuration: This should have the same entries as the reference load balancer:
- TCP 8888 -> TCP 8888
- TCP 9088 -> TCP 9088
- TCP 9095 -> TCP 9095
- Select Subnets: select the subnets you created previously (in this example we only have one, subnet-d5eea2ac, but you should have one for each availability zone)
- Click on Next: Assign Security Groups
- Select the security groups with description 'Imply Cloud ELB Unmanaged' and 'Imply Cloud Default'
- Click on Next: Configure Security Settings
- Click on Next: Configure Health Check
- Use the same entries as the reference load balancer:
- Ping Protocol: HTTPS
- Ping Port: 9095
- Ping Path: /health
- Click on Next: Add EC2 Instances
- We will not select any EC2 instances on this page, but will later associate this elastic load balancer with our auto-scaling group.
- Click on Next: Add Tags
- Click on Review and Create and then click Create.
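The listener configuration above maps onto the Classic ELB API's Listeners structure (used by, for example, boto3's "elb" client and its create_load_balancer call). A hedged sketch; the load balancer name and placeholder IDs in the comment are illustrative:

```python
# Sketch: the listener configuration above, expressed as the Listeners
# structure used by the Classic ELB API (e.g. boto3's create_load_balancer
# on the "elb" client). Ports are the ones listed above.

def query_listeners() -> list[dict]:
    """TCP pass-through listeners matching the reference load balancer."""
    return [
        {"Protocol": "TCP", "LoadBalancerPort": p,
         "InstanceProtocol": "TCP", "InstancePort": p}
        for p in (8888, 9088, 9095)
    ]

print(query_listeners())
# With AWS credentials configured, the call might look like:
# boto3.client("elb").create_load_balancer(
#     LoadBalancerName="imply-query-my-peering",
#     Listeners=query_listeners(),
#     Subnets=[<your new subnet ids>],
#     SecurityGroups=[<ELB Unmanaged and Default sg ids>],
#     Scheme="internal")
```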
Note the load balancer name in the AWS console, in our example it is imply-query-my-peering:
Add Load Balancer to Query Auto Scaling Group
Go to the Auto Scaling Groups section of the EC2 Dashboard and locate the auto-scaling group tagged with a name of the form imply-{clusterId}-Query-... Right-click on this auto-scaling group and select Edit.
Modify the Classic Load Balancers section to include an additional entry for the load balancer you created above, and then click on Save.
To confirm this was done correctly, return to the Load Balancers section and inspect the load balancer you created. Status should show 'x of x instances in service' where x is non-zero, and the Instances tab should show a list of all the query instances in your cluster.
Testing the Peering Connection
Similarly to the non-overlapping case, we should now be able to make query requests from our VPC to the Imply VPC, but we will be communicating with the load balancer we created instead of the one listed in the API section of the Imply Cloud Manager. For this example, our load balancer has the DNS name of internal-imply-query-my-peering-1218651714.us-west-2.elb.amazonaws.com. You will still have to add an entry to the Imply Cloud ELB Unmanaged security group to permit inbound access from the origin instance's security group as described in the non-overlapping section.
If everything was correctly set up, issuing Druid API calls to the new load balancer should return a response:
ubuntu@ip-172-19-2-170:~$ curl -k -u admin https://internal-imply-query-my-peering-1218651714.us-west-2.elb.amazonaws.com:9088/druid/v2/datasources
Enter host password for user 'admin':
["wikipedia"]
ubuntu@ip-172-19-2-170:~$ curl -k -u admin -XPOST -H'Content-Type:application/json' -d '{"queryType": "timeBoundary", "dataSource": "wikipedia"}' https://internal-imply-query-my-peering-1218651714.us-west-2.elb.amazonaws.com:9088/druid/v2/
Enter host password for user 'admin':
[{"timestamp":"2016-06-27T00:00:11.080Z","result":{"maxTime":"2016-06-27T21:31:02.498Z","minTime":"2016-06-27T00:00:11.080Z"}}]
Additional Notes about the New Query Load Balancer
This strategy works by binding an additional load balancer to the query auto-scaling group to allow requests made through this load balancer to be routed back to your peered VPC instead of the VPC in Imply's AWS account. At first glance, this may seem to only allow queries to be made from your VPC, while not supporting calls to Druid's other APIs such as to submit indexing tasks. Actually, the same query load balancer can be used to access Druid's other APIs for the coordinator and overlord services (running on the master nodes) by using the router's management proxy functionality which is enabled by default in Imply Cloud. For more information on this, see the router documentation here.
One caveat about the router management proxy is that it cannot handle requests to serve the coordinator or overlord consoles. If you need access to these consoles over the peering connection, one strategy is to create additional load balancers (one for the coordinator and one for the overlord) according to the steps described above. You would then add these load balancers to the Classic Load Balancers section of the master auto-scaling groups (you will have 1 of these in the non-HA configuration and 3 of them in the HA configuration). For the health check endpoints, use the following:
- Coordinator: HTTPS:8281/druid/coordinator/v1/isLeader
- Overlord: HTTPS:8290/druid/indexer/v1/isLeader
This will cause only the active coordinator and overlord to be in service, while the standby instances are removed from the load balancers, preventing traffic from being routed to them. As leadership changes in a highly available cluster, instances will be registered with and removed from the load balancer to ensure that all traffic is routed to the active coordinator/overlord. This configuration will support loading of the coordinator and overlord web consoles over the peered connection.
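The health-check targets above use the Target string format of the Classic ELB API (e.g. boto3's configure_health_check). A sketch; the interval and threshold values are illustrative assumptions:

```python
# Sketch: the leader health checks above, in the Target string format the
# Classic ELB API uses (e.g. boto3's configure_health_check). Interval and
# threshold values are illustrative assumptions.

def leader_health_check(port: int, path: str) -> dict:
    """Health check that only the current leader (isLeader) passes."""
    return {
        "Target": f"HTTPS:{port}{path}",
        "Interval": 30, "Timeout": 5,             # assumed values
        "HealthyThreshold": 2, "UnhealthyThreshold": 2,
    }

print(leader_health_check(8281, "/druid/coordinator/v1/isLeader")["Target"])
print(leader_health_check(8290, "/druid/indexer/v1/isLeader")["Target"])
```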