AWS Tip Of The Day: Get S3 disk usage per bucket

Getting disk usage statistics for Amazon S3 buckets is not that easy. Things are quite simple when your object count and data size are reasonable, and by reasonable I mean something around < 10 TB and < 1000 objects per bucket.

You can either use s4cmd (fast, multi-threaded)

s4cmd du -r s3://dbbackups/
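s4cmd lives on PyPI, so assuming a standard Python setup it installs with:

pip install s4cmd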

or the AWS CLI

aws s3api list-objects --bucket dbbackups --output json --query "[sum(Contents[].Size), length(Contents[])]"
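If you just need a quick total for a single bucket, the AWS CLI can also summarize a recursive listing. Keep in mind this still enumerates every object, so it is just as slow on huge buckets:

aws s3 ls s3://dbbackups --recursive --human-readable --summarize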

But when your buckets have > 30000 objects totaling hundreds of terabytes, those API calls are slow. Really slow. One solution that works for me is to use AWS Usage Reports to generate a report for the last full day and then parse that report with a Bash one-liner.

awk -F, '{printf "%.2f GB %s %s\n", $7/(1024**3)/24, $4, $2}' report.csv | sort -n

This converts the CSV file into a list that displays disk usage in GB, one line per bucket. The values in the original report are in byte-hours per day; you can read more about this in the official AWS S3 billing FAQ entry.
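As a sanity check on the conversion: a single 1 GB object stored for a whole 24-hour day accounts for 1024³ × 24 byte-hours, so dividing by 1024³ and then by 24, as the one-liner does, recovers exactly 1.00 GB:

awk 'BEGIN {printf "%.2f GB\n", (1024**3 * 24)/(1024**3)/24}'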

To obtain an S3 usage report, follow these steps:

  1. Go to AWS Usage Reports
  2. Select Amazon Simple Storage Service
  3. Select TimedStorage-ByteHrs
  4. Select Custom date range and fill in the last full day
  5. Download CSV
  6. Apply the Bash one-liner above to the downloaded report file

[Screenshot: S3 Usage Report]

I’ll update this post if I come up with an automated solution.
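In the meantime, here is a rough sketch of what such automation could look like. This is my own assumption, not part of the workflow above: S3 publishes a free daily BucketSizeBytes metric to CloudWatch, so you can loop over all buckets without listing a single object (assumes the AWS CLI and GNU date):

for bucket in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
    # BucketSizeBytes is reported once a day, so look back two days to be safe
    aws cloudwatch get-metric-statistics \
        --namespace AWS/S3 \
        --metric-name BucketSizeBytes \
        --dimensions Name=BucketName,Value="$bucket" Name=StorageType,Value=StandardStorage \
        --start-time "$(date -u -d '2 days ago' +%Y-%m-%dT00:00:00)" \
        --end-time "$(date -u +%Y-%m-%dT00:00:00)" \
        --period 86400 \
        --statistics Average \
        --query 'Datapoints[0].Average' \
        --output text |
        awk -v b="$bucket" '{printf "%.2f GB %s\n", $1/(1024**3), b}'
done | sort -n

Note that StorageType=StandardStorage only covers standard-class objects; buckets holding infrequent-access or Glacier data report those sizes under separate StorageType dimensions.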
