Disk Provisioning on AWS


If you are running on Kubernetes, see Portworx on Kubernetes on AWS.

This guide explains how Portworx dynamic disk provisioning works on AWS and what it requires. It is typically useful when an autoscaling group (ASG) manages your AWS instances.

AWS Requirements

Granting Portworx the needed AWS permissions

Portworx creates and attaches EBS volumes, so it needs the AWS permissions to do so. The following sample policy describes these permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "<stmt-id>",
            "Effect": "Allow",
            "Action": [
                "ec2:AttachVolume",
                "ec2:DetachVolume",
                "ec2:CreateTags",
                "ec2:CreateVolume",
                "ec2:DeleteTags",
                "ec2:DeleteVolume",
                "ec2:DescribeTags",
                "ec2:DescribeVolumeAttribute",
                "ec2:DescribeVolumesModifications",
                "ec2:DescribeVolumeStatus",
                "ec2:DescribeVolumes",
                "ec2:DescribeInstances"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
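
As a rough illustration, the sketch below checks whether a policy document grants every action listed in the sample policy above. It is not part of Portworx or the AWS SDK: the function name missing_actions is hypothetical, and the check is deliberately simplified (it only reads explicit Allow statements and ignores wildcards such as "ec2:*", Deny statements, and Resource restrictions).

```python
import json

# Actions taken from the sample policy above.
REQUIRED_ACTIONS = {
    "ec2:AttachVolume",
    "ec2:DetachVolume",
    "ec2:CreateTags",
    "ec2:CreateVolume",
    "ec2:DeleteTags",
    "ec2:DeleteVolume",
    "ec2:DescribeTags",
    "ec2:DescribeVolumeAttribute",
    "ec2:DescribeVolumesModifications",
    "ec2:DescribeVolumeStatus",
    "ec2:DescribeVolumes",
    "ec2:DescribeInstances",
}


def missing_actions(policy_json):
    """Return the required actions not explicitly allowed by the policy.

    Hypothetical helper: wildcards, Deny statements, and Resource
    restrictions are not evaluated.
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]

    granted = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a string
            actions = [actions]
        granted.update(actions)
    return REQUIRED_ACTIONS - granted
```

Calling missing_actions with the text of your policy returns an empty set when every action from the sample is explicitly allowed.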

You can provide these permissions to Portworx in one of the following ways:

  1. Instance Privileges: Grant the above permissions to all instances in the autoscaling cluster by applying the corresponding IAM role. See the AWS documentation on IAM roles and policies for more information.
  2. Environment Variables: Create an IAM user with the above policy and provide its security credentials (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) to Portworx.

EBS volume template

An EBS volume template defines the EBS volume properties that Portworx uses as a reference when creating new volumes. You provide the template to Portworx during installation, in one of the two ways described below.

1. Using a template specification

You can specify a template spec that Portworx uses to create new EBS volumes.

The spec has the following format:

"type=<EBS volume type>,size=<size of EBS volume>,iops=<IOPS value>,enc=<true/false>,kms=<CMK>"
  • type: The following two types are supported:
    • gp2
    • io1 (for io1 volumes, specifying the iops value is mandatory)
  • size: The size of the EBS volume in GB.
  • iops: The required I/O operations per second (IOPS) from the EBS volume.
  • enc: Set to true if the EBS volumes need to be encrypted. Default: false
  • kms: The Customer Master Key (CMK) used to encrypt the EBS volume.

See the AWS EBS documentation for more details on the above parameters.

Examples

  • "type=gp2,size=200"
  • "type=gp2,size=100","type=io1,size=200,iops=1000"
  • "type=gp2,size=100,enc=true,kms=AKXXXXXXXX123","type=io1,size=200,iops=1000,enc=true,kms=AKXXXXXXXXX123"
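
To make the format concrete, here is a minimal sketch of how a single spec string could be parsed and checked against the rules listed above. It is an illustration only, not Portworx's own parser: the function name parse_volume_spec is hypothetical, and requiring the type field is an assumption, since the text above only states that iops is mandatory for io1 and that enc defaults to false.

```python
SUPPORTED_TYPES = {"gp2", "io1"}
KNOWN_KEYS = {"type", "size", "iops", "enc", "kms"}


def parse_volume_spec(spec):
    """Parse one spec such as "type=io1,size=200,iops=1000" into a dict.

    Hypothetical helper for illustration; raises ValueError on specs that
    break the rules described in the text.
    """
    fields = {}
    for pair in spec.split(","):
        key, sep, value = pair.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or key not in KNOWN_KEYS or not value:
            raise ValueError("bad field %r in spec %r" % (pair, spec))
        fields[key] = value

    # Assumption: a spec must name one of the two supported types.
    if fields.get("type") not in SUPPORTED_TYPES:
        raise ValueError("type must be one of %s" % sorted(SUPPORTED_TYPES))
    # From the text: io1 volumes must specify an iops value.
    if fields["type"] == "io1" and "iops" not in fields:
        raise ValueError("io1 volumes require an iops value")
    if fields.get("enc", "false") not in ("true", "false"):
        raise ValueError("enc must be true or false")

    return {
        "type": fields["type"],
        "size": int(fields["size"]) if "size" in fields else None,
        "iops": int(fields["iops"]) if "iops" in fields else None,
        "enc": fields.get("enc", "false") == "true",  # default is false
        "kms": fields.get("kms"),
    }
```

The function handles one spec at a time. For the multi-spec examples above, each quoted string would be passed separately.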

2. Using existing EBS volumes as templates

You can also reference an existing EBS volume as a template. Create at least one EBS volume using the AWS console or AWS CLI. This volume (or set of volumes) serves as the template. On every node where PX is brought up as a storage node, PX creates new EBS volumes identical to the template volumes.

For example, create two volumes as:

vol-0743df7bf5657dad8: 1000 GiB provisioned IOPS
vol-0055e5913b79fb49d: 1000 GiB GP2

Ensure that these EBS volumes are created in the same region as the auto scaling group.

Record the EBS volume ID (e.g. vol-04e2283f1925ec9ee). It will be passed to PX as a storage device parameter.

Limiting storage nodes

PX allows you to create a heterogeneous cluster where some of the nodes are storage nodes and the rest are storageless.

You can specify the number of storage nodes in your cluster by setting the max_storage_nodes_per_zone input argument. This instructs PX to limit the number of storage nodes in each zone to the specified value. The total number of storage nodes in your cluster will be:

Total Storage Nodes = (Num of Zones) * max_storage_nodes_per_zone.

When planning capacity for your autoscaling cluster, make sure the minimum size of your cluster is equal to the total number of storage nodes in PX. This ensures that when you scale up your cluster, only storageless nodes are added, and that when you scale down, the cluster shrinks only to the minimum size, so all PX storage nodes stay online and available.

You can omit the max_storage_nodes_per_zone argument. In that case, the new nodes added when you scale up will also be storage nodes, but when you scale down you will lose storage nodes, causing PX to lose quorum.

Examples:

  • "-s", "type=gp2,size=200", "-max_storage_nodes_per_zone", "1"

For a cluster of 6 nodes spanning 3 zones (us-east-1a, us-east-1b, us-east-1c), in the above example PX will have 3 storage nodes (one in each zone) and 3 storageless nodes. PX will create a total of 3 disks of size 200 each and attach one disk to each storage node.

  • "-s", "type=gp2,size=200", "-s", "type=io1,size=100,iops=1000", "-max_storage_nodes_per_zone", "2"

For a cluster of 9 nodes spanning 2 zones (us-east-1a, us-east-1b), in the above example PX will have 4 storage nodes and 5 storageless nodes. PX will create a total of 8 disks (4 of size 200 and 4 of size 100). PX will attach a set of 2 disks (one of size 200 and one of size 100) to each of the 4 storage nodes.
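
The arithmetic in the two examples above can be reproduced with a small sketch. The function plan_storage and its parameter names are hypothetical and not part of PX. It assumes each zone has enough nodes to reach the per-zone limit, and it rejects a cluster smaller than the computed number of storage nodes rather than guessing how PX would behave in that case.

```python
def plan_storage(cluster_size, num_zones, max_storage_nodes_per_zone, num_specs):
    """Estimate storage nodes, storageless nodes, and disks for a cluster.

    Hypothetical helper. num_specs is the number of "-s" volume specs
    passed to PX; every storage node gets one disk per spec.
    """
    # Total Storage Nodes = (Num of Zones) * max_storage_nodes_per_zone
    storage_nodes = num_zones * max_storage_nodes_per_zone
    if cluster_size < storage_nodes:
        # Assumption: treat an undersized cluster as a planning error.
        raise ValueError("cluster is smaller than the number of storage nodes")
    return {
        "storage_nodes": storage_nodes,
        "storageless_nodes": cluster_size - storage_nodes,
        "total_disks": storage_nodes * num_specs,
    }


# First example: 6 nodes, 3 zones, limit 1 per zone, one spec.
first = plan_storage(6, 3, 1, 1)
# Second example: 9 nodes, 2 zones, limit 2 per zone, two specs.
second = plan_storage(9, 2, 2, 2)
```

Here first describes 3 storage nodes, 3 storageless nodes, and 3 disks, and second describes 4 storage nodes, 5 storageless nodes, and 8 disks, matching the figures in the text. The storage_nodes value is also the minimum cluster size recommended above.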

EC2 Instance types

A PX cluster can be deployed with a heterogeneous makeup of EC2 instance types. Some of your nodes can be used for converged compute and storage, some for compute only and some for storage only.

Follow this guide to select your appropriate instance type. Once you create an AMI template for an instance type, you will create multiple instances from that AMI. Make sure your AMIs are available in each region that you want to run the PX cluster in.

Since PX is a replicated block device, you can also use instance local store volumes for maximum performance. However, you must have PX replication turned on.

Multi Zone Availability

Since PX is a replicated storage solution, we recommend using multiple availability zones when creating your EC2-based cluster. See the AWS documentation on regions and availability zones for more information on the geographical availability of your instances.


Last edited: Tuesday, Dec 4, 2018