Accelerate innovation by shifting left FinOps

In parts 1 and 2 we covered the importance of cost models and how to create and refine them. In the subsequent parts, we learned how to optimize workload components across infrastructure, applications, and data. In this final part, we present the impact and results of applying these cost optimization and shift-left FinOps techniques to a cloud-native application, using the sample workload described in this article.
The cloud-native application ingests and processes data, enriches and analyzes it, and then outputs the results along with reports and insights for the user. We take a cloud-agnostic approach and break the workload and its optimization techniques down into infrastructure, application, and data components. The following sections show the gains and impact of applying these techniques across the various layers.
For building and refining the cost model, refer to part 2 of this series, where we created and refined the FinOps cost model.
We need to find a balance between model complexity and the granularity of the performance levels expected from the solution. The example table below helps determine compute requirements based on the data ingestion timelines for the workload.
| Monthly source data size (GiB) | Data processing time in hours (approx.) | Number of instances (processing hours / 24) | Compute instance type | Storage size (GiB) |
| --- | --- | --- | --- | --- |
| >= 300 | 196 | 9 | 4 vCPU, 32 GB RAM | 50 |
| 200–300 | 42 | 2 | 4 vCPU, 32 GB RAM | 50 |
| 100–200 | 20 | 1 | 4 vCPU, 32 GB RAM | 20 |
| 50–100 | 17 | 1 | 4 vCPU, 32 GB RAM | 10 |
| 20–50 | 10 | 1 | 4 vCPU, 32 GB RAM | 5 |
| 5–20 | 6 | 1 | 4 vCPU, 32 GB RAM | 2 |
| < 5 | 2 | 1 | 4 vCPU, 32 GB RAM | 0.5 |
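To make the sizing rule above reproducible, here is a minimal sketch that derives the instance count from the estimated processing hours, using the tier values from the table. The function name and data structure are illustrative assumptions, not part of the original workload.

```python
import math

# (max data size in GiB, approx. processing hours, storage in GiB) per tier,
# taken from the sizing table above; float("inf") covers the ">= 300" row.
SIZING_TIERS = [
    (5, 2, 0.5),
    (20, 6, 2),
    (50, 10, 5),
    (100, 17, 10),
    (200, 20, 20),
    (300, 42, 50),
    (float("inf"), 196, 50),
]

def size_compute(monthly_gib: float) -> dict:
    """Return instance count and storage for a monthly data volume.

    Instances = ceil(processing hours / 24), i.e. enough 4 vCPU / 32 GB RAM
    instances to finish the month's processing within roughly a day.
    """
    for max_gib, hours, storage_gib in SIZING_TIERS:
        if monthly_gib < max_gib:
            return {
                "instances": max(1, math.ceil(hours / 24)),
                "instance_type": "4 vCPU, 32 GB RAM",
                "storage_gib": storage_gib,
            }

print(size_compute(250))  # -> 2 instances of 4 vCPU / 32 GB RAM, 50 GiB storage
```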
The databases are sized based on the monthly source data volume and the response time needed for the queries.
| Query type | Elapsed time | Slots consumed | Slot time consumed |
| --- | --- | --- | --- |
| Query for line items | 19 s | 81.95 | 25 min 57 sec |
| Query for aggregated records | 15 s | 90.47 | 22 min 37 sec |
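As a sanity check on the table, "slot time consumed" is simply the average number of slots in use multiplied by the elapsed time; the quick arithmetic below reproduces both rows.

```python
# Slot time consumed = average slots in use x elapsed seconds.
for name, elapsed_s, avg_slots in [
    ("line items", 19, 81.95),
    ("aggregated records", 15, 90.47),
]:
    slot_seconds = elapsed_s * avg_slots
    print(f"{name}: {slot_seconds:.0f} slot-seconds "
          f"(~{int(slot_seconds // 60)} min {int(slot_seconds % 60)} sec)")
# line items: 1557 slot-seconds (~25 min 57 sec)
# aggregated records: 1357 slot-seconds (~22 min 37 sec)
```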
A workload-planning capability, such as that provided by Cloudability, is recommended to create the cost model and to track, monitor, and analyze your spending through development, deployment, and operations.
Cost optimization techniques for infrastructure
The following table lists potential optimizations and techniques for the infrastructure layer, applied to the example workload.
| S.No | Optimization objective | Area to optimize | Approx. cost savings | Techniques/considerations |
| --- | --- | --- | --- | --- |
| 1 | Implement a controller to scale the EC2 instances (required to run the application) in and out dynamically | Compute | 10% of compute cost | There is an additional cost to maintain the job statistics, but the overall compute cost is reduced. Note: this optimization may look costly for a single data domain because of that additional cost, but it is beneficial when the workload is large. |
| 2 | Split the data processing jobs so that the processing runs on spot instances | Compute | 60% of compute cost | Spot instances can bring substantial savings, but the splitting should not inflate the total number of jobs. |
| 3 | Implement an S3/cloud storage clean-up policy | Storage | 50% of storage cost | Adding a data archival rule after a certain period (based on the business requirement) is cost-beneficial over the longer run of the application (see the sketch after this table). |
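As an illustration of row 3, a clean-up/archival policy can be expressed as an S3 lifecycle configuration. The sketch below uses boto3; the bucket name, prefix, storage class, and retention windows are assumptions to be tuned to your business requirements.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket, prefix, and retention windows; adjust per business need.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-ingestion-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-processed-data",
                "Filter": {"Prefix": "processed/"},
                "Status": "Enabled",
                # Move objects to cheaper archival storage after 90 days...
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                # ...and delete them entirely after a year.
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```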
Cost optimization techniques for applications
The following table shows the figures and potential optimizations, with their impact and savings, from applying the application-component techniques discussed earlier to the example workload.
| S.No | Optimization objective | Area to optimize | Approx. cost optimization | Techniques/considerations |
| --- | --- | --- | --- | --- |
| 1 | Use EMR to process the data instead of plain EC2 instances | Applications | 50% of compute cost | EMR processes the data much faster with a higher configuration, so cluster uptime and running time are lower than with ASG-based compute infrastructure. Note: this optimization may look somewhat costly for a single customer because of the additional cluster-level cost, but it is beneficial when the workload is high. |
| 2 | Review job slicing and job scheduling | Applications | 40% of compute cost | Categorize jobs into T-shirt sizes (big, medium, and small) and allocate compute accordingly. Split the incoming file into smaller chunks, adjust the job size and schedule to leverage spot instances with optimal duration and availability, and introduce parallel processing (see the sketch after this table). Use spot instances in place of on-demand instances to lower overall cost. |
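To make the job slicing in row 2 concrete, here is a minimal sketch of T-shirt sizing: slice an incoming file into chunks, then map each chunk to a size class for compute allocation. The thresholds, chunk size, and function names are illustrative assumptions.

```python
# Illustrative thresholds (MiB) for T-shirt sizing; tune per workload.
SIZES = [(100, "small"), (1024, "medium"), (float("inf"), "big")]
CHUNK_MIB = 256  # slice incoming files so jobs fit short spot windows

def tshirt_size(job_mib: float) -> str:
    """Map a job's input size to a T-shirt size for compute allocation."""
    return next(label for limit, label in SIZES if job_mib <= limit)

def slice_file(file_mib: float) -> list[float]:
    """Split an incoming file into chunks that can run in parallel on spot."""
    full, rest = divmod(file_mib, CHUNK_MIB)
    return [CHUNK_MIB] * int(full) + ([rest] if rest else [])

chunks = slice_file(900)
print([(c, tshirt_size(c)) for c in chunks])
# [(256, 'medium'), (256, 'medium'), (256, 'medium'), (132, 'medium')]
```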
Cost optimization techniques for data
The following table shows the figures and potential optimizations, with their impact and savings, from applying the data-component techniques discussed earlier to the example workload.
| S.No | Optimization objective | Area to optimize | Approx. cost savings | Techniques/considerations |
| --- | --- | --- | --- | --- |
| 1 | Read from file storage instead of DynamoDB | Data/read API calls | 30% of data cost | Make the enrichment data available as files in S3 rather than in DynamoDB; file storage does not incur the additional RCU costs that DynamoDB does. This is helpful when fetching multiple records from the data store at a time. |
| 2 | Use a single partition instead of two partitions to fetch the data | Data/schema changes | 80% of data cost | Implement the available filters to limit data fetches to a single partition only. |
| 3 | Optimize BigQuery slot usage and requirements | Data/query optimization | 15% of data cost | Estimate the maximum number of slots needed for a given period; this can be derived from existing functionality, or from other projects using similar functionality on GCP BigQuery (see the sketch after this table). The goal should be to buy the optimal number of slots for a longer period, based on past experience and the current requirements of the example cloud-native application. |
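For row 3, past slot consumption can be read from BigQuery's INFORMATION_SCHEMA job metadata to estimate a slot commitment. Below is a minimal sketch using the google-cloud-bigquery client; the US region, 30-day window, and 95th-percentile choice are assumptions to adapt to your own project.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses default project and credentials

# Approximate average slot usage per query job over the last 30 days:
# total_slot_ms spread over each job's runtime, then take the 95th percentile.
sql = """
SELECT
  APPROX_QUANTILES(
    total_slot_ms / (1000 * TIMESTAMP_DIFF(end_time, start_time, SECOND)),
    100)[OFFSET(95)] AS p95_avg_slots
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND TIMESTAMP_DIFF(end_time, start_time, SECOND) > 0
"""
row = next(iter(client.query(sql).result()))
print(f"Commit roughly {row.p95_avg_slots:.0f} slots for this workload")
```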
Conclusion: Accelerate innovation by shifting left FinOps
This is the final part of the series, in which we shared a worked-out example to illustrate the cost optimization techniques for the functional components and their related impact. Our objective in this series has been to articulate the value of shifting FinOps left, into the application solution design phase itself. I hope this information helps you.