Postgres on Cloud: 10 mistakes and how to avoid them

We all make mistakes and we learn from them; it's human nature. While it's always good to learn from your own mistakes, it never hurts to learn from someone else's.

The past few years of working with and managing databases in the cloud have brought a lot of insight and helped me design better database systems, not just for performance but also to meet cost, security, and business requirements.

Whichever cloud option you choose, here are 10 common mistakes to avoid if you are running Postgres on an EC2 instance.

1. You don't need to be a cloud expert:
The most important thing in the cloud is to understand the technology and how it works. All of the leading cloud providers make it really simple to create a login and get started, but understanding the features and functionality becomes crucial once you are managing your database on cloud infrastructure. Having said that, you don't need to be an expert in every service on offer. A working knowledge of services like VPC, IAM, EC2, EBS volumes and snapshots, CloudWatch, ELB, and SNS is a good starting point.
2. Incorrect instance type:
There are a lot of instance classes available, based on network, disk, and CPU performance, and each class offers different instance types. Choosing the right instance for your workload, whether memory optimized or compute optimized, is really important. Cloud providers do allow you to change the instance type, but that becomes tricky if you have made a long-term commitment (known as an RI, or Reserved Instance). Also, the configuration that works for you today will not necessarily meet your hardware needs in the cloud. For example, if you run your production database on premises with 16 CPUs and 64 GB of RAM, you might need a bigger machine in the cloud. Before you make the move, choose an instance type, run your application against it, and confirm the EC2 instance is sized to take the load; if it isn't, you can resize it through the API, as sketched below.
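
A minimal boto3 sketch of a resize, assuming a hypothetical instance ID and target type (note the instance must be stopped first, so this implies a short outage):

    import boto3

    ec2 = boto3.client("ec2")
    instance_id = "i-0123456789abcdef0"  # hypothetical instance ID

    # The instance type can only be changed while the instance is stopped.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Move to a larger, memory-optimized type, then start it back up.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": "r5.2xlarge"},  # hypothetical target type
    )
    ec2.start_instances(InstanceIds=[instance_id])
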
3. Not using LVM for the data mount:
Always use LVM (Logical Volume Manager) for your data mount point. You can't afford downtime to increase the space of your data directory every time it reaches the threshold you have defined. With LVM, adding space becomes really easy: all you have to do is create another volume, attach it to the instance, and add it to the volume group, as sketched below.
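
A minimal boto3 sketch of the cloud side, assuming hypothetical IDs, device name, and volume group layout; the in-guest LVM steps are shown as comments:

    import boto3

    ec2 = boto3.client("ec2")

    # Create a new EBS volume in the same AZ as the database instance.
    vol = ec2.create_volume(AvailabilityZone="us-east-1a", Size=100, VolumeType="gp2")
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

    # Attach it to the instance (IDs and device name are hypothetical).
    ec2.attach_volume(
        VolumeId=vol["VolumeId"],
        InstanceId="i-0123456789abcdef0",
        Device="/dev/sdf",
    )

    # Then, on the instance itself, grow the LVM stack online (no downtime):
    #   pvcreate /dev/xvdf
    #   vgextend data_vg /dev/xvdf
    #   lvextend -l +100%FREE /dev/data_vg/pg_data
    #   xfs_growfs /pgdata      # or resize2fs for ext4
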
4. Using a public IP:
If you are running your database on AWS and want to access it from your home or office, set up a VPN or use jump boxes. Assigning a public IP to the database instance is not a best practice. Also make sure the database's security group is unique to it and that only known IP addresses and ports are allowed to communicate with the instance, as in the sketch below.
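
A minimal boto3 sketch that opens the Postgres port only to a known network; the security group ID and CIDR are hypothetical:

    import boto3

    ec2 = boto3.client("ec2")

    # Allow 5432 only from a known office/VPN range, never from 0.0.0.0/0.
    ec2.authorize_security_group_ingress(
        GroupId="sg-0123456789abcdef0",  # hypothetical DB security group
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 5432,
            "ToPort": 5432,
            "IpRanges": [{"CidrIp": "203.0.113.0/24", "Description": "office VPN"}],
        }],
    )
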
5. Not doing enough logging and auditing:
Many people don't capture, or don't believe in capturing, enough system and database logs for future auditing needs. All the cloud providers have made it really easy to capture logs and retain them in an S3 bucket. Once the logs are in S3, you can either archive them for a longer period or apply an automatic retention policy, depending on your organization's needs; a sketch follows. At the same time, make sure only authorized people have access to the logs, so you can control any modifications to them. Alerting on any infrastructure change also helps: if a member of the team adds or changes an IP or port in a security group, it can punch a hole in your security.
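
A minimal boto3 sketch of an S3 lifecycle policy that archives logs to Glacier and later expires them; the bucket name, prefix, and day counts are hypothetical:

    import boto3

    s3 = boto3.client("s3")

    # Move Postgres logs to Glacier after 30 days, delete them after a year.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-pg-log-archive",  # hypothetical bucket
        LifecycleConfiguration={"Rules": [{
            "ID": "archive-pg-logs",
            "Filter": {"Prefix": "pg-logs/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }]},
    )
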
6. No idea how to migrate out of the cloud:
When we move to the cloud, our thinking is always restricted to moving in. But have you thought about what happens if you need to move out? Have you thought about moving to another cloud platform, or maybe back on-prem?
7. Not committing to reserved capacity:
One of the reasons to move to the cloud is to save on infrastructure cost. Moving to the cloud may or may not save you money; however, if you are certain you will be running your production workload for a year or more without changing the instance type, committing to RIs (reserved capacity) can cut the cost by 40-50%. There are multiple commitment options: standard RIs, which are fixed and usually save more money, or convertible RIs, which let you change your instance type within the same class.
Also, if you run a large organization where different teams or departments use sub-accounts, it makes more sense to buy RIs under the parent account, so the discounted hours can also be applied to instances not covered by reserved capacity. In that case it also makes sense for the infrastructure department to publish guidelines steering users toward specific instance types or classes. AWS can even recommend purchases based on past usage, as sketched below.
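
A minimal boto3 sketch that asks Cost Explorer for RI purchase recommendations; the lookback window, term, and payment option are just one reasonable combination:

    import boto3

    ce = boto3.client("ce")  # Cost Explorer

    # Ask which Reserved Instance purchases would pay off,
    # based on the last 60 days of EC2 usage.
    resp = ce.get_reservation_purchase_recommendation(
        Service="Amazon Elastic Compute Cloud - Compute",
        LookbackPeriodInDays="SIXTY_DAYS",
        TermInYears="ONE_YEAR",
        PaymentOption="NO_UPFRONT",
    )
    for rec in resp.get("Recommendations", []):
        print(rec.get("RecommendationSummary"))
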
8. Not tracking usage:
Have you ever wondered why your cloud cost is higher than expected? Make sure you look at your forecast and continuously keep track of every resource that is running. The cloud makes it really easy to spin up instances without understanding the impact on cost, so the best way to control cost is to stop or terminate resources you are not using. Checking the forecast can be automated, as in the sketch below.
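
A minimal boto3 sketch that pulls a cost forecast from Cost Explorer; the dates are placeholders (the start date must be today or later):

    import boto3

    ce = boto3.client("ce")

    # Forecast unblended cost through the end of the month.
    forecast = ce.get_cost_forecast(
        TimePeriod={"Start": "2021-06-15", "End": "2021-07-01"},  # placeholders
        Metric="UNBLENDED_COST",
        Granularity="MONTHLY",
    )
    print(forecast["Total"]["Amount"], forecast["Total"]["Unit"])
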
9. Not segregating production from dev/test:
One feature I really like about the cloud is the option to create sub-accounts or sub-organizations. If you are not doing that, think of a scenario where a developer wants to stop, reboot, or terminate a dev cluster and accidentally terminates a production database instance with the same name. Segregating environments really helps to avoid human error and also makes the infrastructure easier to manage. Another way to avoid this type of mistake is to create custom policies that give each user access only to specific resources, as sketched below.
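
A minimal boto3 sketch of a custom IAM policy that denies terminating anything tagged as production; the policy name and tag key/value are hypothetical:

    import json

    import boto3

    iam = boto3.client("iam")

    # Deny TerminateInstances on any EC2 instance tagged env=production.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "ec2:TerminateInstances",
            "Resource": "*",
            "Condition": {"StringEquals": {"ec2:ResourceTag/env": "production"}},
        }],
    }
    iam.create_policy(
        PolicyName="deny-terminate-production",  # hypothetical name
        PolicyDocument=json.dumps(policy),
    )
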
10. Not using provisioned IOPS:
If you need consistent disk performance, use provisioned IOPS volumes. If you are running a very large data warehousing database, paying for provisioned IOPS may not make sense, and gp2 volumes, which can burst to high IOPS when needed during reporting, may be enough. However, if you are running an OLTP or e-commerce environment, make sure you have sufficient IOPS assigned to your volume for consistent throughput. One mistake we often make is not monitoring the IOPS of our volumes; monitor them, and if your system needs more IOPS you can always go back and increase the value, as in the sketch below. Keep in mind, though, that no cloud provider applies a new IOPS value immediately; it can easily take 4-6 hours to optimize the volume.
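
A minimal boto3 sketch that raises a volume's provisioned IOPS and then polls the modification state; the volume ID and IOPS value are hypothetical:

    import boto3

    ec2 = boto3.client("ec2")
    vol_id = "vol-0123456789abcdef0"  # hypothetical volume ID

    # Raise the provisioned IOPS on an io1 volume.
    ec2.modify_volume(VolumeId=vol_id, VolumeType="io1", Iops=6000)

    # The change is not instant: the volume passes through an
    # "optimizing" state that can take hours to complete.
    mods = ec2.describe_volumes_modifications(VolumeIds=[vol_id])
    for m in mods["VolumesModifications"]:
        print(m["ModificationState"], m.get("Progress"))
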
