A very happy Golang memory profiling story at CloudQuery
Introduction #
We recently investigated an OutOfMemory issue in one of our popular sync configurations: syncing from an AWS source to an S3 destination.

How to run a memory profile? #
The standard way to profile memory in Go is pprof. The easiest setup is to add this import to main.go:

import _ "net/http/pprof"

and start an HTTP server that exposes the profiling endpoints:
go func() {
    log.Println(http.ListenAndServe("localhost:6060", nil))
}()
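Put together, a minimal main.go wiring this up could look like the sketch below; the select {} is just a placeholder for your application's real work:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof handlers on the default mux
)

func main() {
    // Expose the profiling endpoints on a local port.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // ... your application's real work goes here ...
    select {} // placeholder: block so the process (and its pprof server) stays alive
}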
While the binary is running, fetch a heap profile with:

go tool pprof http://localhost:6060/debug/pprof/heap
That gives you a pprof profile you can explore and chart. For a quick chart:

go tool pprof -png http://localhost:6060/debug/pprof/heap > out.png
GC Debug Logs #
Another lightweight way to observe memory behavior is the Go runtime's GC trace. Run the binary with GODEBUG=gctrace=1:

GODEBUG=gctrace=1 cloudquery sync ...
This prints a line to stderr every time your binary runs a GC:

gc 1966 @134.752s 2%: 0.76+5.7+0.036 ms clock, 6.1+0.52/10/2.4+0.29 ms cpu, 68->70->35 MB, 71 MB goal, 0 MB stacks, 0 MB globals, 8 P

The fields we care about are:
- When GC ran, in seconds, since the binary started.
- CPU % spent in GC.
- Heap size before GC, heap size after GC, and live heap after GC.
There are existing tools that chart this output for you, like gcvis. I created a similar tool for more customization (the parse step is sketched below):
- Pipe the stderr output to a file.
- Write a regex-based parse script to extract metrics to a CSV file.
- Plot it with matplotlib.
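As an illustration, here is a minimal Go version of the parse step; the regex and CSV column names are my own and are tailored to the gctrace line format shown above:

package main

// Rough sketch: read gctrace lines from stdin and emit a CSV of the metrics
// we care about (elapsed seconds, GC CPU %, heap before/after/live in MB).

import (
    "bufio"
    "fmt"
    "os"
    "regexp"
)

var gcLine = regexp.MustCompile(`@([0-9.]+)s (\d+)%:.* (\d+)->(\d+)->(\d+) MB`)

func main() {
    fmt.Println("elapsed_s,gc_cpu_pct,heap_before_mb,heap_after_mb,heap_live_mb")
    sc := bufio.NewScanner(os.Stdin)
    for sc.Scan() {
        m := gcLine.FindStringSubmatch(sc.Text())
        if m == nil {
            continue // not a gctrace line
        }
        fmt.Printf("%s,%s,%s,%s,%s\n", m[1], m[2], m[3], m[4], m[5])
    }
    if err := sc.Err(); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}

Assuming the sketch is saved as parse.go, usage looks like: GODEBUG=gctrace=1 cloudquery sync ... 2> gctrace.log, then go run parse.go < gctrace.log > gc.csv.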
Profiling CloudQuery syncs with gctrace #
Charting the gctrace output of the problematic sync showed:
- An initial spike of 2.5GB, leading to an OOM at peak, with a long tail.
- The total heap was twice as high as the live heap, meaning 50% of the total heap was garbage.
GOMEMLIMIT to the rescue? #
An obvious candidate was GOMEMLIMIT. The GOMEMLIMIT variable sets a soft memory limit for the runtime, running GC more often when this threshold is crossed (read more here). It has some drawbacks, though:
- Setting an absolute GOMEMLIMIT value for each customer setup is challenging.
- Legitimate high memory use will still cause OOM.
- Frequent GC increases CPU usage, slowing down sync time.
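For reference, GOMEMLIMIT is just an environment variable (there is also an equivalent runtime/debug.SetMemoryLimit call); the value below is only an example and would need tuning per setup:

GOMEMLIMIT=1GiB cloudquery sync ...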
We tried it on the problematic sync anyway:
- Total heap usage decreased by over 50%.
- GC ran more frequently, consuming more CPU.
- Sync time increased significantly.
Revisiting pprof #
The heap profiles pointed at the SDK, which left three possibilities:
- A bug in the SDK.
- Our misuse or misconfiguration of the SDK.
- Legitimate but inefficient use of the SDK.
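As an aside, this is the kind of pprof invocation that shows which functions hold in-use memory (standard pprof flags; the endpoint assumes the net/http/pprof setup from earlier):

go tool pprof -top -sample_index=inuse_space http://localhost:6060/debug/pprof/heap

Switching -sample_index to alloc_space shows cumulative allocations instead of live memory.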
Three engineers look at three processes 👀 #
Source back pressure #
Applying back pressure on the source side (a generic sketch of the idea follows the results) gave us:
- 42% reduction in memory usage.
- Total sync time remained the same.
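This is not CloudQuery's actual implementation, just a minimal illustration of the general technique: a bounded channel between a fast producer and a slower consumer caps how much data can pile up in memory.

package main

// Illustrative only: back pressure via a bounded channel. The producer blocks
// on send once the buffer is full, so at most maxInFlight items sit in memory.

import (
    "fmt"
    "time"
)

func main() {
    const maxInFlight = 4 // upper bound on buffered items
    items := make(chan int, maxInFlight)

    // Producer: stops and waits whenever the channel buffer is full.
    go func() {
        defer close(items)
        for i := 0; i < 20; i++ {
            items <- i // blocks when maxInFlight items are pending
        }
    }()

    // Consumer: slower than the producer, so the bound matters.
    for item := range items {
        time.Sleep(10 * time.Millisecond) // stand-in for real write work
        fmt.Println("wrote item", item)
    }
}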
Apache Arrow's RecordBatch #
S3 SDK's memory over-allocation #
One of the engineers found that the S3 SDK buffered 5MB per uploaded chunk if the supplied io.Reader didn't support seeking, only freeing the memory when the upload succeeded. He fixed this by providing an io.Reader that supported seeking (a rough sketch follows the results below).
- 84% memory usage reduction!
- Total sync time remained the same.
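This isn't the actual patch, but the shape of the fix with the AWS SDK for Go v2 upload manager looks roughly like this: hand the uploader a seekable body (bytes.Reader implements io.ReadSeeker) instead of a non-seekable one such as bytes.Buffer. Bucket and key names are placeholders.

package main

// Sketch only: upload with a seekable body so the S3 upload manager can read
// parts directly from the reader instead of copying each chunk into its own buffer.

import (
    "bytes"
    "context"
    "log"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/feature/s3/manager"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
    ctx := context.Background()
    cfg, err := config.LoadDefaultConfig(ctx)
    if err != nil {
        log.Fatal(err)
    }

    uploader := manager.NewUploader(s3.NewFromConfig(cfg))
    data := []byte("file contents")

    // bytes.NewReader returns an io.ReadSeeker; bytes.NewBuffer would not.
    _, err = uploader.Upload(ctx, &s3.PutObjectInput{
        Bucket: aws.String("my-bucket"),
        Key:    aws.String("my-key"),
        Body:   bytes.NewReader(data),
    })
    if err != nil {
        log.Fatal(err)
    }
}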
Conclusion #
In the end, we were able to:
- Fix our customers' memory issues.
- Better understand how our concurrency model impacts memory consumption.
- Develop a streamlined mechanism for profiling our popular source/destination matrix.
- Have a lot of fun working on it!

Written by Mariano Gappa
Mariano is a software engineer at CloudQuery with 15 years of industry experience. He specializes in performance work, and his optimizations have significantly reduced CloudQuery's sync times.