
runtime: -msan / -asan stack corruption with CPU profiling and SetCgoTraceback context callback #71395

Closed
@prattmic

Description


msancall and asancall are used to call into the MSAN and ASAN C runtimes, respectively.

These wrappers need to handle stack switching, similar to asmcgocall.

If the caller is already running on g0, they simply make the call; otherwise they switch SP to g0.sched.sp and then make the call. This is normally fine, but in a signal context we are running on gsignal (not g0!), while the code the signal interrupted may itself have been running on g0. By switching to g0.sched.sp, the MSAN/ASAN call will scribble all over the stack that the interrupted code is still using.
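The stack selection described above can be modeled in a few lines of ordinary Go. This is a toy sketch, not runtime code: the g and m types, the schedSP field (standing in for g0.sched.sp), the addresses, and the helpers callSPBuggy, callSPFixed, and clobbersG0 are all hypothetical. callSPFixed shows one possible fix, also skipping the switch when already on gsignal (similar to how asmcgocall treats gsignal); it is an assumption, not necessarily the change that landed.

```go
package main

import "fmt"

// g is a toy model of a runtime goroutine (hypothetical, not runtime.g).
type g struct {
	name    string
	sp      uintptr // current stack pointer while this g is running
	schedSP uintptr // models g.sched.sp, the SP used when switching to this g
	m       *m
}

// m is a toy model of an OS thread (hypothetical, not runtime.m).
type m struct {
	g0      *g // system stack
	gsignal *g // signal-handling stack
}

// callSPBuggy models the wrapper as described: stay on the current stack
// only when already on g0, otherwise switch to g0.sched.sp.
func callSPBuggy(cur *g) uintptr {
	if cur == cur.m.g0 {
		return cur.sp
	}
	return cur.m.g0.schedSP
}

// callSPFixed is a hypothetical fix: also stay on the current stack when
// running on gsignal, so an interrupted g0 is never touched.
func callSPFixed(cur *g) uintptr {
	if cur == cur.m.g0 || cur == cur.m.gsignal {
		return cur.sp
	}
	return cur.m.g0.schedSP
}

// clobbersG0 reports whether a call starting at sp would write over live
// g0 frames. Stacks grow down, so interrupted g0 frames occupy
// [g0.sp, g0.schedSP), and a call starting at g0.schedSP pushes its
// frames directly on top of them.
func clobbersG0(sp uintptr, g0 *g, g0Active bool) bool {
	return g0Active && sp <= g0.schedSP && sp > g0.sp
}

func main() {
	mp := &m{}
	g0 := &g{name: "g0", sp: 0x1800, schedSP: 0x2000, m: mp}
	gs := &g{name: "gsignal", sp: 0x5800, schedSP: 0x6000, m: mp}
	mp.g0, mp.gsignal = g0, gs

	// SIGPROF arrived while the thread was on g0 (for example inside an
	// allocator systemstack section), so g0 has live frames below 0x2000.
	fmt.Println("buggy clobbers g0:", clobbersG0(callSPBuggy(gs), g0, true))
	fmt.Println("fixed clobbers g0:", clobbersG0(callSPFixed(gs), g0, true))
}
```

Running it prints `buggy clobbers g0: true` followed by `fixed clobbers g0: false`.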

As far as I know, MSAN/ASAN calls are possible from signal context in only one case:

  • runtime.cgoContextPCs contains msanwrite/asanwrite calls.
  • runtime.cgoContextPCs is reachable from the SIGPROF signal handler: runtime.sigprof -> runtime.tracebackPCs -> runtime.(*unwinder).cgoCallers -> runtime.cgoContextPCs.
  • This is only reachable if the application has registered cgo traceback via runtime.SetCgoTraceback. Note that both the traceback and context handlers must be registered. The latter is required because runtime.cgoContextPCs only calls the traceback function if gp.cgoCtxt is active, which requires a context handler.
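The conditions in the list above can be summarized as a small predicate. This is a hypothetical Go model for illustration only: the config type and sanitizerCallFromSignal are not runtime APIs. They just encode the issue's statement that the SIGPROF path reaches msanwrite/asanwrite only in a -msan/-asan build with CPU profiling active and both a traceback and a context function registered via runtime.SetCgoTraceback.

```go
package main

import "fmt"

// config captures the conditions described in the issue (hypothetical model).
type config struct {
	sanitizer           bool // built with -msan or -asan
	cpuProfiling        bool // SIGPROF is being delivered
	tracebackRegistered bool // SetCgoTraceback traceback function is non-nil
	contextRegistered   bool // SetCgoTraceback context function is non-nil
}

// sanitizerCallFromSignal reports whether the path
// sigprof -> tracebackPCs -> (*unwinder).cgoCallers -> cgoContextPCs
// can reach msanwrite/asanwrite under the given configuration.
func sanitizerCallFromSignal(c config) bool {
	if !c.sanitizer || !c.cpuProfiling || !c.tracebackRegistered {
		return false
	}
	// cgoContextPCs only calls the traceback function when gp.cgoCtxt is
	// active, which requires a registered context handler.
	return c.contextRegistered
}

func main() {
	for _, tb := range []bool{false, true} {
		for _, ctx := range []bool{false, true} {
			c := config{sanitizer: true, cpuProfiling: true, tracebackRegistered: tb, contextRegistered: ctx}
			fmt.Printf("traceback=%v context=%v -> reachable=%v\n", tb, ctx, sanitizerCallFromSignal(c))
		}
	}
}
```

Only the combination with both handlers registered reports `reachable=true`.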

https://go.dev/cl/643875 contains a reproducer. The allocator runs portions on the system stack, so with MSAN/ASAN plus profiling, we see crashes due to stack corruption in the allocator.

$ GOFLAGS=-msan CC=clang go test -run CgoTracebackContextProfile -v runtime
=== RUN   TestCgoTracebackContextProfile
=== PAUSE TestCgoTracebackContextProfile
=== CONT  TestCgoTracebackContextProfile
    crash_test.go:172: running /usr/local/google/home/mpratt/src/go/bin/go build -o /tmp/go-build4253652554/testprogcgo.exe
    crash_test.go:194: built testprogcgo in 1.417734407s
    crash_cgo_test.go:292: /tmp/go-build4253652554/testprogcgo.exe TracebackContextProfile: exit status 2
    crash_cgo_test.go:295: expected "OK\n" got SIGSEGV: segmentation violation
        PC=0x50d8e2 m=7 sigcode=1 addr=0x1b
        
        goroutine 0 gp=0xc0003021c0 m=7 mp=0xc000300008 [idle]:
        runtime.callers.func1()
                /usr/local/google/home/mpratt/src/go/src/runtime/traceback.go:1100 +0xc2 fp=0x7f6637ffed40 sp=0x7f6637ffec78 pc=0x50d8e2
        msancall()
                /usr/local/google/home/mpratt/src/go/src/runtime/msan_amd64.s:87 +0x2d fp=0x7f6637ffed50 sp=0x7f6637ffed40 pc=0x525c2d
        
        goroutine 24 gp=0xc000103180 m=7 mp=0xc000300008 [running, locked to thread]:
        runtime.systemstack_switch()
                /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:479 +0x8 fp=0xc00051abb0 sp=0xc00051aba0 pc=0x522728
        runtime.callers(0x7f6684100788?, {0xc00030e000?, 0x219cd20?, 0x7f6684e18470?})
                /usr/local/google/home/mpratt/src/go/src/runtime/traceback.go:1097 +0x92 fp=0xc00051ac18 sp=0xc00051abb0 pc=0x5215f2
        runtime.mProf_Malloc(0xc000300008, 0xc000330880, 0x80)
                /usr/local/google/home/mpratt/src/go/src/runtime/mprof.go:447 +0x74 fp=0xc00051ac98 sp=0xc00051ac18 pc=0x4db374
        runtime.profilealloc(0xc000300008?, 0xc000330880?, 0x80?)
                /usr/local/google/home/mpratt/src/go/src/runtime/malloc.go:1802 +0x9b fp=0xc00051acc8 sp=0xc00051ac98 pc=0x4be47b
        runtime.mallocgcSmallNoscan(0xc000330800?, 0x80?, 0x0?)
                /usr/local/google/home/mpratt/src/go/src/runtime/malloc.go:1327 +0x23c fp=0xc00051ad20 sp=0xc00051acc8 pc=0x4bd61c
        runtime.mallocgc(0x80, 0x688f80, 0x1)
                /usr/local/google/home/mpratt/src/go/src/runtime/malloc.go:1055 +0xb9 fp=0xc00051ad58 sp=0xc00051ad20 pc=0x51b4f9
        runtime.makeslice(0x0?, 0xc000103180?, 0x4b3c45?)
                /usr/local/google/home/mpratt/src/go/src/runtime/slice.go:116 +0x49 fp=0xc00051ad80 sp=0xc00051ad58 pc=0x51f449
        main.TracebackContextProfileGoFunction(...)
                /usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprogcgo/tracebackctxt.go:176
        _cgoexp_b32fe38f1ae6_TracebackContextProfileGoFunction(0x0?)
                _cgo_gotypes.go:868 +0x27 fp=0xc00051adb0 sp=0xc00051ad80 pc=0x658227
        runtime.cgocallbackg1(0x658200, 0x7f6637ffedd0, 0x1)
                /usr/local/google/home/mpratt/src/go/src/runtime/cgocall.go:444 +0x28b fp=0xc00051ae68 sp=0xc00051adb0 pc=0x4b3b8b
        runtime.cgocallbackg(0x658200, 0x7f6637ffedd0, 0x1)
                /usr/local/google/home/mpratt/src/go/src/runtime/cgocall.go:350 +0x133 fp=0xc00051aed0 sp=0xc00051ae68 pc=0x4b3833
        runtime.cgocallbackg(0x658200, 0x7f6637ffedd0, 0x1)
                <autogenerated>:1 +0x29 fp=0xc00051aef8 sp=0xc00051aed0 pc=0x526cc9
        runtime.cgocallback(0xc00051af58, 0x51a8f5, 0x662270)
                /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:1084 +0xcc fp=0xc00051af20 sp=0xc00051aef8 pc=0x5244ec
        cFunction
                tracebackctxt.go:65792 pc=0x100
        cFunction
                tracebackctxt.go:256 pc=0x100
        runtime.systemstack_switch()
                /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:479 +0x8 fp=0xc00051af30 sp=0xc00051af20 pc=0x522728
        runtime.cgocall(0x662270, 0xc00051af90)
                /usr/local/google/home/mpratt/src/go/src/runtime/cgocall.go:185 +0x75 fp=0xc00051af68 sp=0xc00051af30 pc=0x51a8f5
        main._Cfunc_TracebackContextProfileCallGo()
                _cgo_gotypes.go:267 +0x3a fp=0xc00051af90 sp=0xc00051af68 pc=0x6478fa
        main.TracebackContextProfile.func1()
                /usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprogcgo/tracebackctxt.go:161 +0x7e fp=0xc00051afe0 sp=0xc00051af90 pc=0x6574be
        runtime.goexit({})
                /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00051afe8 sp=0xc00051afe0 pc=0x524741
        created by main.TracebackContextProfile in goroutine 1
                /usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprogcgo/tracebackctxt.go:158 +0x10e
...

I haven't tested older versions, but this code hasn't changed in a while, so I suspect that 1.22 and 1.23 are also affected.

Activity

Labels compiler/runtime and NeedsFix added on Jan 22, 2025.
Label BugReport added on Jan 22, 2025.
gopherbot commented on Jan 22, 2025:

Change https://go.dev/cl/643875 mentions this issue: runtime: MSAN/ASAN + SIGPROF regression test

gopherbot commented on Jan 22, 2025:

Change https://go.dev/cl/643897 mentions this issue: runtime: pass through -asan/-msan/-race to testprog tests

gopherbot commented on Jan 23, 2025:

Change https://go.dev/cl/643918 mentions this issue: main.star: add linux-arm64 ASAN/MSAN builders

Added to the Backlog milestone on Jan 29, 2025.
Moved from Todo to In Progress in Go Compiler / Runtime on Jan 29, 2025.
Milestone changed from Backlog to Go1.25 on Jan 29, 2025.


Metadata

Labels: BugReport, NeedsFix, compiler/runtime
Status: Done
Participants: @mknyszek, @prattmic, @gopherbot, @gabyhelp
