How to store data in S3 and save?

Kamil Porembiński

In today’s business and technology world, data collection and management play a key role. Storing data in the cloud has become the standard, and one popular provider of this service is Amazon Web Services (AWS) with its Amazon S3 (Simple Storage Service). It’s worth learning how to use this tool effectively, because doing so lets you organize your data efficiently and also save money. Amazon S3 provides an affordable, reliable, and extremely durable solution for data storage.

What is Amazon S3?

Amazon Simple Storage Service is a storage service offered by Amazon Web Services. Its main purpose is to store any amount of data and make it easy to access. The S3 service can be used independently or in conjunction with other AWS services. It can also integrate with third-party software. Amazon S3 is most often used for data storage and distribution, static file sharing, Big Data analysis, disaster recovery, mobile applications, backup, and archiving.

Benefits of Amazon Simple Storage Service

The benefits of Amazon S3 include immediate service configuration and startup. We do not need to declare the required storage space upfront, as the service is entirely scalable, and there are no storage limits.

Amazon S3 is designed for 99.99% data availability and 99.999999999% (eleven nines) durability. This means that AWS’s S3 classes are designed with durability levels that correspond to an average annual expected loss of 0.000000001% of objects. For example, if you store 10 million objects in Amazon S3, you can expect to lose a single object once every 10,000 years on average.
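The arithmetic behind that example can be checked in a few lines. This is only a sketch of the calculation quoted above; the function names are made up for illustration.

```python
# Annual loss rate implied by eleven nines of durability (0.000000001% of objects).
ANNUAL_LOSS_RATE = 1e-11


def expected_annual_loss(object_count: int) -> float:
    """Average number of objects expected to be lost per year."""
    return object_count * ANNUAL_LOSS_RATE


def years_per_lost_object(object_count: int) -> float:
    """Average number of years between the loss of one object."""
    return 1 / expected_annual_loss(object_count)


if __name__ == "__main__":
    objects = 10_000_000
    print(f"Expected loss per year: {expected_annual_loss(objects):.4f} objects")
    print(f"About one object every {years_per_lost_object(objects):,.0f} years")
```

With 10 million objects, the expected loss is one ten-thousandth of an object per year, which is where the figure of 10,000 years comes from.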

Combined with SSL/TLS encryption of data in transit, this means we can be confident that the data we entrust to Amazon is safe.

Amazon S3 is also inexpensive to run: we pay only for the space we actually use. In addition, we have various options for accessing data stored in the cloud, such as the AWS Management Console, SDKs, the CLI, and the API.

Amazon S3 storage classes

Within Amazon Simple Storage Service, there are different storage classes. The choice depends on the required data access speed, frequency, and financial expectations.

S3 Standard	General purpose storage for active, frequently accessed data
S3 Intelligent-Tiering	Automatic savings for data with unpredictable or changing access patterns
S3 Standard-Infrequent Access	For long-lived but infrequently accessed data that requires millisecond access
S3 Glacier Instant Retrieval	For long-lived archived data that can be accessed quarterly and retrieved instantly in milliseconds
S3 Glacier Flexible Retrieval (Formerly S3 Glacier)	For long-term backups and archives with retrieval options from 1 minute to 12 hours
S3 Glacier Deep Archive	For long-term archiving of data with access once or twice a year and retrieval within 12 hours
S3 One Zone-Infrequent Access	For rarely accessed data that requires millisecond access with storage in a single Availability Zone and reduced cost
S3 Outposts	For object storage on AWS Outposts hardware running inside your own data center or other on-premises location
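When working through the SDK or API rather than the console, each class in the table is selected with a short identifier. The sketch below maps the names above to the values accepted by the S3 `StorageClass` parameter and shows how an upload with boto3 (the AWS SDK for Python) might look. The helper names, bucket, and key are hypothetical, and the upload function assumes boto3 is installed and credentials are configured.

```python
# Names from the table above mapped to S3 API StorageClass identifiers.
STORAGE_CLASS_IDS = {
    "S3 Standard": "STANDARD",
    "S3 Intelligent-Tiering": "INTELLIGENT_TIERING",
    "S3 Standard-Infrequent Access": "STANDARD_IA",
    "S3 Glacier Instant Retrieval": "GLACIER_IR",
    "S3 Glacier Flexible Retrieval": "GLACIER",
    "S3 Glacier Deep Archive": "DEEP_ARCHIVE",
    "S3 One Zone-Infrequent Access": "ONEZONE_IA",
    "S3 Outposts": "OUTPOSTS",  # only valid for buckets on AWS Outposts
}


def upload_args(bucket: str, key: str, class_name: str) -> dict:
    """Build keyword arguments for put_object with the chosen storage class."""
    return {
        "Bucket": bucket,
        "Key": key,
        "StorageClass": STORAGE_CLASS_IDS[class_name],
    }


def upload(bucket: str, key: str, body: bytes, class_name: str) -> None:
    """Upload one object directly into the given storage class."""
    import boto3  # third-party dependency, imported lazily

    boto3.client("s3").put_object(Body=body, **upload_args(bucket, key, class_name))
```

For example, `upload("example-bucket", "reports/2023.csv", data, "S3 Standard-Infrequent Access")` would store the object in Standard-IA from the start instead of moving it there later.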

Storage locations vary in cost

The region in which we store data affects both the cost of the service and data access speed. Depending on whether we care more about the price of the service or the speed of transferring data, we can choose between specific regions in Europe and around the world. However, it is important to keep in mind the legal regulations that apply to our files, as well as to the region in question. Objects stored in a selected region stay in that region unless you explicitly move or replicate them elsewhere.

Understanding data access patterns

A crucial step in optimal data storage is to understand data access patterns. How often is data retrieved? Is it accessed regularly, or is it primarily archived for safety? Analyzing these patterns helps correctly assign data to the appropriate storage classes.
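These questions can be turned into a rough decision rule. The sketch below is purely illustrative: the thresholds are assumptions loosely based on the class descriptions in the table above (frequent, infrequent, quarterly, once or twice a year), not AWS recommendations, and the function name is hypothetical.

```python
from typing import Optional


def suggest_storage_class(accesses_per_year: Optional[float],
                          instant_retrieval: bool = True) -> str:
    """Suggest a storage class from a rough access frequency.

    All thresholds are illustrative assumptions, not official guidance.
    """
    if accesses_per_year is None:
        # Unknown or changing access pattern.
        return "S3 Intelligent-Tiering"
    if accesses_per_year > 12:
        # Accessed more than about once a month.
        return "S3 Standard"
    if accesses_per_year > 4:
        # Infrequent, but more often than quarterly.
        return "S3 Standard-Infrequent Access"
    if instant_retrieval:
        # About quarterly or less, still needed in milliseconds.
        return "S3 Glacier Instant Retrieval"
    if accesses_per_year > 2:
        # Archive data that can wait minutes to hours.
        return "S3 Glacier Flexible Retrieval"
    # Once or twice a year, retrieval within hours is acceptable.
    return "S3 Glacier Deep Archive"


if __name__ == "__main__":
    for rate in (None, 365, 6, 4):
        print(rate, "->", suggest_storage_class(rate))
    print(1, "->", suggest_storage_class(1, instant_retrieval=False))
```

In practice the access frequencies would come from measured request data rather than guesses, and the thresholds would be tuned against actual prices.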

A good example is the path taken by Canva, an online design tool that allows users to create, edit, and share various types of projects. With more than 100 million monthly active users and more than 15 billion projects created, Canva demonstrates the importance of effective data management in Amazon Web Services (AWS). Its experience offers valuable lessons for companies seeking to optimize their data storage strategies while minimizing costs.

Canva’s success story is based on the utilization of AWS services, including Amazon S3, Amazon ECS, Amazon RDS, and Amazon DynamoDB. AWS enabled Canva to scale its infrastructure to meet the rapid user base growth. However, this growth presented challenges in efficiently managing user-generated content, including templates, stock photos, graphics, and more. Canva’s unique data landscape required careful consideration of data storage options to balance availability, costs, and performance.

Canva’s path to cost optimization began with a comprehensive analysis of data access patterns. The newly available Amazon S3 tool, Storage Class Analysis, provided valuable insights by presenting graphs showing data access requests at specific intervals. This analysis allowed Canva to identify trends in data access and adjust its storage strategy accordingly.

Cost savings by choosing the right classes

Amazon Simple Storage Service offers various storage classes with different pricing, so choosing the right class for specific data types can lead to significant savings. Data that is accessed infrequently over time can be moved to cheaper storage classes, saving costs while maintaining accessibility when needed.

S3 Glacier and S3 Standard-Infrequent Access, among others, offer an opportunity to save on the S3 service. These classes offer a lower storage price in exchange for access terms tailored to data that is used less often.

S3 Standard-Infrequent Access – instant access for infrequently retrieved data

Data that users want to access daily is stored in the S3 Standard class. Data that is accessed less often can be moved to the Standard-Infrequent Access class. This results in a reduction of about 40% in storage costs.

S3 Standard-IA is a good fit for data that is retrieved infrequently but must be available quickly when it is needed. It offers the same exceptional durability, throughput, and low latency as S3 Standard, with a lower storage price per GB in exchange for a per-GB retrieval fee. This balance of affordability and performance makes S3 Standard-IA an excellent choice for long-term data storage, backup, and disaster recovery.

Amazon S3 Glacier – saving money through archiving

Amazon S3 Glacier Instant Retrieval provides millisecond retrieval for data that is accessed about once a quarter. S3 Glacier Flexible Retrieval is intended for archives that do not need instant access, with retrieval times ranging from 1 minute to 12 hours.

S3 Glacier Flexible Retrieval gives users greater flexibility in choosing how to access data from the Glacier service, allowing them to tailor the retrieval process to specific project requirements. This allows users to optimize costs and performance as needed.

The Amazon S3 Glacier classes can reduce storage costs by approximately 80%. However, it’s important to note that there are costs associated with moving data between Amazon S3 classes. That cost is based on the number of objects you move ($0.02 per 1,000 objects), not on their size. Objects are all the files we upload to the cloud, such as photos, databases, or videos, and a single object can be as large as 5 TB. The potential savings from S3 Glacier, on the other hand, depend mainly on the total amount of stored data. It’s worth noting that the cost of moving all data is a one-time fee, while the savings from cheaper storage are continuous.
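To make the contrast between the one-time fee and the continuous saving concrete, here is a minimal sketch. The per-GB storage prices are assumed example values and do not come from this article, so check the current S3 pricing for your region. The bucket in the usage section is hypothetical, and the calculation ignores retrieval fees and minimum billable object sizes.

```python
# Assumed example prices in USD; verify against current S3 pricing.
STANDARD_PER_GB_MONTH = 0.023       # assumption: S3 Standard storage
GLACIER_IR_PER_GB_MONTH = 0.004     # assumption: S3 Glacier Instant Retrieval storage
TRANSITION_PER_1000_OBJECTS = 0.02  # transition fee quoted in the text


def one_time_transition_fee(object_count: int) -> float:
    """Fee for moving the given number of objects, paid once."""
    return object_count / 1000 * TRANSITION_PER_1000_OBJECTS


def monthly_storage_saving(total_gb: float) -> float:
    """Recurring monthly saving from the lower per-GB storage price."""
    return total_gb * (STANDARD_PER_GB_MONTH - GLACIER_IR_PER_GB_MONTH)


if __name__ == "__main__":
    # Hypothetical bucket: 50 million objects holding 100,000 GB in total.
    objects, total_gb = 50_000_000, 100_000
    print(f"One-time transition fee: ${one_time_transition_fee(objects):,.2f}")
    print(f"Monthly storage saving:  ${monthly_storage_saving(total_gb):,.2f}")
```

The fee scales with the object count while the saving scales with the volume of data, so the same amount of data costs more to move when it is split into many small objects.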

Thus, based on the average object size in the bucket, we can calculate the approximate time needed for the transition from S3 Standard to S3 Glacier Instant Retrieval to pay for itself. If we move a small number of large objects, the investment pays off quickly. If we move a large number of small objects, it may take months before the move becomes profitable. It’s important to mention that objects in S3 Standard-IA and S3 Glacier Instant Retrieval are always billed as if they occupied at least 128 KB. Hence, very small objects (below roughly 20 KB) are cheaper to keep in S3 Standard than in S3 Standard-IA or S3 Glacier Instant Retrieval.
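The break-even reasoning above can be sketched per object. The storage prices below are assumed example values rather than figures from this article; the 128 KB minimum and the $0.02 per 1,000 objects transition fee are the ones mentioned in the text. Retrieval fees and minimum storage durations are left out for simplicity.

```python
from typing import Optional

KB = 1024
GB = 1024 ** 3
MIN_BILLABLE_BYTES = 128 * KB  # minimum billable object size in Glacier IR

# Assumed example prices in USD; verify against current S3 pricing.
STANDARD_PER_GB_MONTH = 0.023
GLACIER_IR_PER_GB_MONTH = 0.004
TRANSITION_FEE_PER_OBJECT = 0.02 / 1000  # $0.02 per 1,000 objects


def monthly_saving_per_object(size_bytes: int) -> float:
    """Monthly saving from keeping one object in Glacier IR instead of Standard."""
    standard_cost = size_bytes / GB * STANDARD_PER_GB_MONTH
    billed_bytes = max(size_bytes, MIN_BILLABLE_BYTES)
    glacier_ir_cost = billed_bytes / GB * GLACIER_IR_PER_GB_MONTH
    return standard_cost - glacier_ir_cost


def break_even_months(size_bytes: int) -> Optional[float]:
    """Months until the transition fee is recovered, or None if it never is."""
    saving = monthly_saving_per_object(size_bytes)
    if saving <= 0:
        return None
    return TRANSITION_FEE_PER_OBJECT / saving


if __name__ == "__main__":
    for size in (16 * KB, 64 * KB, 1024 * KB, 100 * 1024 * KB):
        months = break_even_months(size)
        label = "never pays off" if months is None else f"{months:.1f} months"
        print(f"{size // KB:>7} KB: {label}")
```

Under these assumed prices the saving turns negative a little above 20 KB, because the object is billed as 128 KB in the cheaper class, while objects of a megabyte or more recover the fee in roughly a month.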

Understanding the differences between these classes and matching them to the stored data is crucial for optimizing both cost and data availability in Amazon S3.

Automating class transitions

To optimize storage costs, it’s advisable to take advantage of automating the process of moving data between classes based on their characteristics and access patterns. Tools like lifecycle policies allow for dynamic assignment of data to appropriate classes based on predefined rules.
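As a sketch of what such a rule might look like, the code below builds a lifecycle configuration in the structure accepted by boto3's `put_bucket_lifecycle_configuration` call. The rule ID, prefix, and day counts are hypothetical, and the apply function assumes boto3 is installed and credentials are configured. Note that lifecycle rules act on object age; they do not observe access directly.

```python
def lifecycle_configuration(rule_id: str, prefix: str,
                            ia_after_days: int, glacier_ir_after_days: int) -> dict:
    """Build a lifecycle configuration with two age-based transitions."""
    if ia_after_days < 30:
        # S3 rejects transitions to Standard-IA earlier than 30 days after creation.
        raise ValueError("transition to STANDARD_IA needs at least 30 days")
    if glacier_ir_after_days <= ia_after_days:
        raise ValueError("the GLACIER_IR transition must come after STANDARD_IA")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # apply only to keys under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_ir_after_days, "StorageClass": "GLACIER_IR"},
                ],
            }
        ]
    }


def apply_lifecycle(bucket: str, configuration: dict) -> None:
    """Attach the configuration to a bucket, replacing any existing rules."""
    import boto3  # third-party dependency, imported lazily

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )
```

A call such as `apply_lifecycle("example-bucket", lifecycle_configuration("tier-down-logs", "logs/", 30, 90))` would move objects under `logs/` to Standard-IA after 30 days and to Glacier Instant Retrieval after 90 days.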

Calculating transition costs and savings – Canva’s example

Canva has realized that not all data is used equally often. While the templates, images, and graphics that users frequently access require the S3 Standard storage class, the content they generate, such as designs and uploaded media, has varying access patterns. For such content, Canva has used the S3 Standard-Infrequent Access storage class, which offers cost savings without compromising access times. Additionally, for use cases like log archiving and backups, Canva utilized the flexible retrieval of S3 Glacier, which was suitable for data that could be retrieved within minutes or hours. This approach ensured that data access aligned with actual usage needs.

Given the costs associated with moving data between S3 classes, Canva’s substantial data inventory required a thorough evaluation of costs versus potential savings. By considering factors such as object size and transfer fees, Canva determined a balance point for each transition between storage classes.

The transition from other S3 classes to S3 Glacier Instant Retrieval proceeded smoothly. Approximately 130 petabytes out of Canva’s total 230 petabytes of data in S3 now reside in S3 Glacier Instant Retrieval. The company’s proactive approach to understanding data access patterns and strategically transitioning between storage classes has led to savings of about $300,000 per month, or $3.6 million per year.

QLOS Case Study

In the described case, the client was a leader in the field of online terrain maps, and the main objective was to conduct an audit of their IT infrastructure. The client’s service is a complex application that is used daily by thousands of users. They reported a problem with running out of server space, which could have led to costly infrastructure expansion.

As part of the solution, a series of actions were taken to optimize the client’s IT environment. The performance of the MySQL (MariaDB) database was improved by adjusting its configuration, resulting in better query performance and shorter response times. In addition, robust SQL logging mechanisms were implemented and resource-intensive components were moved to the cloud, improving backup efficiency and reducing loading times. The process of automatic database backups has been improved, and the resilience of the test environment on the server has been increased through appropriate security and backup measures.

As a result of these actions, the client avoided the need to expand their server infrastructure. Additionally, information security was increased and website performance was improved (shorter page load times affect SEO and reduce bounce rates). The audit also helped the client meet regulatory requirements related to data privacy and protection.

Summary

Efficient data storage on Amazon S3 is a critical element of many companies’ strategies. Choosing the right storage classes based on access patterns and data characteristics can lead to significant financial savings. Accurate analysis and monitoring of costs, automation of processes, and understanding of business needs are key factors that maximize the potential of Amazon S3 and achieve optimal data management results.

This is how we approach optimization at QLOS. A properly conducted IT infrastructure audit, in-depth analysis of the results, and skillful implementation of changes result in benefits such as improved performance, increased security, and cost optimization.

As part of the infrastructure audit, the client receives a comprehensive report each time, which includes a detailed analysis of the current state, problem identification, and recommendations for improvements, as well as an action plan for the client.
