
Efficient File Counting in AWS S3 with Go Concurrency
While working on a task assigned to me at work, I needed to search through a range of folders in AWS S3 and count the total number of files. At first glance, this seemed straightforward.
The Initial Approach: Bash and AWS CLI
Initially, I tried using a bash script with the AWS CLI to count files by listing each folder one by one. However, this approach quickly became inefficient due to its sequential nature: processing each folder individually took a very long time, especially for folders containing many files.
Moving to Go for Faster Processing
To streamline the process, I turned to Go (Golang), which offers robust concurrency support. By leveraging Go’s goroutines, I was able to count files across multiple folders in parallel, dramatically reducing the overall time needed for the task.
Go Implementation with Concurrency
I developed a tool called countS3 in Go that uses the AWS SDK for Go, goroutines, and sync.WaitGroup to manage concurrent tasks. The code is available on GitHub: countS3.
Here’s a breakdown of the main components of the Go implementation:
- Goroutines: A pool of worker goroutines counts folders in parallel.
- WaitGroup: Ensures that the main function waits until all file-counting goroutines have completed.
- Channel: Carries the folder paths that need to be counted, allowing multiple worker goroutines to pick up folders concurrently (see the sketch below).
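Before walking through the actual implementation, here is a minimal, self-contained sketch of the worker-pool pattern on its own, with no S3 involved, just to show how the channel, the goroutines, and the WaitGroup fit together. The folder names and pool size are placeholders.

package main

import (
    "fmt"
    "sync"
)

func main() {
    jobs := make(chan string)
    var wg sync.WaitGroup

    // Start a small pool of workers; each one pulls jobs off the channel until it is closed.
    for i := 1; i <= 3; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            for job := range jobs {
                fmt.Printf("worker %d processing %s\n", id, job)
            }
        }(i)
    }

    // Send the jobs, then close the channel so the workers' range loops terminate.
    for _, folder := range []string{"folder-a/", "folder-b/", "folder-c/"} {
        jobs <- folder
    }
    close(jobs)

    // Block until every worker has finished.
    wg.Wait()
}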
Code Structure and Functions
1. Counting Files in a Single S3 Folder
The CountFilesInS3Folder function counts the files in a single folder by listing objects under a specific prefix (folder path). It excludes the folder itself (often represented by an object of size 0) and, because a single ListObjectsV2 call returns at most 1,000 objects, pages through the results with a continuation token.
// internal/count/count.go
func CountFilesInS3Folder(client *s3.Client, bucket string, prefix string) {
    var count int
    var continuationToken *string
    for {
        input := &s3.ListObjectsV2Input{
            Bucket:            aws.String(bucket),
            Prefix:            aws.String(prefix),
            ContinuationToken: continuationToken,
        }
        result, err := client.ListObjectsV2(context.TODO(), input)
        if err != nil {
            log.Fatal(err)
        }
        for _, object := range result.Contents {
            // Exclude the folder itself (usually represented as an object with size 0)
            if *object.Size > 0 {
                count++
            }
        }
        if !*result.IsTruncated {
            break // No more objects to retrieve
        }
        // Fetch the next page of results
        continuationToken = result.NextContinuationToken
    }
    fmt.Printf("Total files in folder %s: %d\n", prefix, count)
}
2. Queueing Folder Paths as Jobs
The QueueJob function reads a file containing folder paths (one per line) and sends each path to the jobs channel.
// internal/queue/queue.go
func QueueJob(fileName string, jobs chan string) {
    file, err := os.Open(fileName)
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        job := scanner.Text()
        jobs <- job
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
}
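For illustration, the input file is just a plain-text list of S3 prefixes, one per line. The folder names below are hypothetical:

team-a/raw/
team-a/processed/
team-b/2024/reports/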
3. Worker Pool for Concurrent Execution
The Worker function retrieves folder paths from the jobs channel and runs CountFilesInS3Folder for each one. The number of concurrent workers is set by workerPool.
// internal/worker/worker.go
func Worker(s3c *s3.Client, bucket string, jobs chan string, wg *sync.WaitGroup) {
    defer wg.Done()
    for job := range jobs {
        count.CountFilesInS3Folder(s3c, bucket, job)
    }
}
4. Starting the Workers
In the main function, workerPool is the maximum number of goroutines that will be created to run the count jobs. This value is passed in via the -w flag.
// cmd/counts3/main.go
for i := 1; i <= workerPool; i++ {
    wg.Add(1)
    go worker.Worker(s3Client, bucketName, jobs, &wg)
}
Benefits and Results
Using Go’s concurrency model, I reduced the file-counting time significantly. Instead of waiting for each folder to be processed one by one, the worker pool counts many folders at once, so the overall run finishes far sooner.
Efficient File Counting in AWS S3 with Go Concurrency was originally published in Government Digital Products, Singapore on Medium.