可能是为了避免重复造轮子,Go官方库推出了sync.Pool:一个thread-safe、可回收/重用对象的内存池。对性能优化狂魔而言,sync.Pool无疑是一个优化GC的好工具,因为理论上重用对象会减少了GC次数,缩短latency。这篇文章是sync.Pool的性能验证报告:sync.Pool确实能极大减少GC次数。
Benchmark关注什么?
在写Benchmark代码之前,要先确定如何衡量GC效果。很直观地,GC次数越少,效果越好。但GC次数的粒度太大,说服力不够,还需要其他的指标。
这篇文章Golang real time gc 给我了答案。不断往一个size固定的buffer里覆盖写入数据,记录写入耗时。被覆盖掉的数据会变成垃圾,继而触发GC,所以耗时就是latency。
原文引述如下:
The benchmark program repeatedly pushes messages into a size-limited buffer. Old messages constantly expire and become garbage.
于是,Benchmark的实现,以及关注的指标就确定了:
- GC次数
- 数据写入耗时
代码实现
不用sync.Pool的实现
见https://play.golang.org/p/049Xmy1lTfV
package main
import (
"fmt"
"time"
)
const (
windowSize = 200000
msgCount = 100000000
)
type (
message []byte
buffer map[int]message
)
var worst time.Duration
func mkMessage(n int) message {
m := make(message, 1024)
for i := range m {
m[i] = byte(n)
}
return m
}
func pushMsg(b *buffer, highID int) {
start := time.Now()
m := mkMessage(highID)
(*b)[highID%windowSize] = m
elapsed := time.Since(start)
if elapsed > worst {
worst = elapsed
}
}
func main() {
b := make(buffer, windowSize)
for i := 0; i < msgCount; i++ {
pushMsg(&b, i)
}
fmt.Println("Worst push time: ", worst)
}
用sync.Pool的实现
见https://play.golang.org/p/Wop29wN7Dp
package main
import (
"fmt"
"sync"
"time"
)
const (
windowSize = 200000
msgCount = 100000000
)
type (
message []byte
buffer map[int]message
)
var worst time.Duration
//pool for statistics model
var statModelPool = sync.Pool{
New: func() interface{} {
return make(message, 1024)
},
}
func mkMessage(n int) message {
m := statModelPool.Get().(message)
for i := range m {
m[i] = byte(n)
}
return m
}
func pushMsg(b *buffer, highID int) {
start := time.Now()
m := mkMessage(highID)
if highID > windowSize {
statModelPool.Put((*b)[highID%windowSize])
}
(*b)[highID%windowSize] = m
elapsed := time.Since(start)
if elapsed > worst {
worst = elapsed
}
}
func main() {
b := make(buffer, windowSize)
for i := 0; i < msgCount; i++ {
pushMsg(&b, i)
}
fmt.Println("Worst push time: ", worst)
}
运行代码
因为要观察GC次数,我们需要打开GODEBUG的GCTRACE开关`GODEBUG=gctrace=1`。
(下面的数据是在我的Thinkpad T450上跑出来的。)
不使用sync.Pool的实现
触发454次GC,最差写入耗时50.40ms。
摘录一部分output:
$ GODEBUG=gctrace=1 go run benchmark_gc.go
gc 1 @0.041s 0%: 0.044+0.39+0.037 ms clock, 0.13+0.19/0.26/0.40+0.11 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
...
gc 454 @106.994s 4%: 0.012+29+0.045 ms clock, 0.048+1.6/26/39+0.18 ms cpu, 422->437->219 MB, 439 MB goal, 4 P
Worst push time: 50.401524ms
用sync.Pool的实现
触发22次GC,最差写入耗时36.14ms
摘录一部分output:
GODEBUG=gctrace=1 go run benchmark_gc_pool.go
gc 1 @0.045s 0%: 0.047+1.2+0.077 ms clock, 0.19+0.12/1.1/0.50+0.30 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
# command-line-arguments
gc 1 @0.007s 6%: 0.051+2.2+0.024 ms clock, 0.15+0.32/2.1/1.5+0.073 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
.....
gc 22 @76.006s 0%: 0.015+53+0.039 ms clock, 0.062+1.7/32/0.62+0.15 ms cpu, 401->401->205 MB, 411 MB goal, 4 P
Worst push time: 36.141858ms
结论
sync.Pool的效果很不错,值得尝试。