This is a tool that converts assembly generated by a C/C++ compiler into Go assembly. It is meant to be used in combination with asm2plan9s to automatically generate pure Go wrappers for C/C++ code (which may, for instance, take advantage of compiler SIMD intrinsics or template<> code).
Mode of operation:
$ c2goasm -a /path/to/some/great/c-code.s /path/to/now/great/golang-code_amd64.s
Pass the -f flag to additionally format the generated code with asmfmt.
This project was developed as part of a Go wrapper around Simd, but it should also work with other projects and libraries. Keep in mind that it is not intended to 'port' a complete C/C++ project in a single action; rather, work on a case-by-case basis per function or source file (and create accompanying high-level Go code to call into the assembly code).
Command line options
$ c2goasm --help
Usage of c2goasm:
-a Immediately invoke asm2plan9s
-c Compact byte codes
-f Format using asmfmt
-s Strip comments
A simple example
Here is a simple C function doing an AVX2 intrinsics computation:
Note that the amd64.go file needs to be in place so that the argument names can be derived (and so that go vet succeeds).
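As a rough illustration of the Go side, the sketch below assumes a routine named MultiplyAndAdd (the name is borrowed from the benchmark in this document) that takes four pointer arguments and computes result = vec1*vec2 + vec3 over eight float32 values. Those semantics, the argument names, and the wrapper shape are assumptions made for illustration. In a real amd64.go file the underscore-prefixed function would be a body-less declaration whose implementation lives in the generated _amd64.s file; here a pure Go stand-in is used so that the example runs on its own.

```go
package main

import (
	"fmt"
	"unsafe"
)

// _MultiplyAndAdd stands in for the assembly routine (hypothetical semantics).
// In a real package this would be a body-less declaration such as
//
//	//go:noescape
//	func _MultiplyAndAdd(vec1, vec2, vec3, result unsafe.Pointer)
//
// Every argument is pointer sized, i.e. 64 bits on amd64.
func _MultiplyAndAdd(vec1, vec2, vec3, result unsafe.Pointer) {
	a := (*[8]float32)(vec1)
	b := (*[8]float32)(vec2)
	c := (*[8]float32)(vec3)
	r := (*[8]float32)(result)
	for i := range r {
		r[i] = a[i]*b[i] + c[i]
	}
}

// MultiplyAndAdd is the high-level Go wrapper that callers would use.
func MultiplyAndAdd(a, b, c *[8]float32) [8]float32 {
	var out [8]float32
	_MultiplyAndAdd(unsafe.Pointer(a), unsafe.Pointer(b), unsafe.Pointer(c), unsafe.Pointer(&out))
	return out
}

func main() {
	a := [8]float32{1, 2, 3, 4, 5, 6, 7, 8}
	b := [8]float32{2, 2, 2, 2, 2, 2, 2, 2}
	c := [8]float32{1, 1, 1, 1, 1, 1, 1, 1}
	fmt.Println(MultiplyAndAdd(&a, &b, &c))
}
```

The argument names in the Go declaration (vec1, vec2, vec3, result) are the ones that references of the form name+offset(FP) in the assembly have to match for go vet to pass.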
Benchmark against cgo
We have run benchmarks of c2goasm versus cgo for both Go version 1.7.5 and 1.8.1. You can find the c2goasm benchmark test in test/ and the cgo test in cgocmp/ respectively. Here are the results for both versions:
$ benchcmp ../cgocmp/cgo-1.7.5.out c2goasm.out
benchmark old ns/op new ns/op delta
BenchmarkMultiplyAndAdd-12 382 10.9 -97.15%
$ benchcmp ../cgocmp/cgo-1.8.1.out c2goasm.out
benchmark old ns/op new ns/op delta
BenchmarkMultiplyAndAdd-12 236 10.9 -95.38%
As you can see, cgo in Go 1.8.1 is significantly faster (38.2%) than in 1.7.5, but it is still about 20x slower than calling directly into assembly code as wrapped by c2goasm.
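The percentages above follow directly from the ns/op columns. The short sketch below recomputes them; the helper name delta is made up for this example, and the formula (new - old) / old is the one that matches the benchcmp output shown.

```go
package main

import "fmt"

// delta returns the relative change from oldNs to newNs in percent,
// matching the "delta" column in the benchcmp output above.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	// ns/op figures copied from the benchmark results above.
	const cgo175, cgo181, c2goasm = 382.0, 236.0, 10.9

	fmt.Printf("cgo 1.7.5 -> c2goasm: %.2f%%\n", delta(cgo175, c2goasm))
	fmt.Printf("cgo 1.8.1 -> c2goasm: %.2f%%\n", delta(cgo181, c2goasm))
	fmt.Printf("cgo 1.7.5 -> cgo 1.8.1: %.1f%%\n", delta(cgo175, cgo181))
	fmt.Printf("cgo 1.8.1 / c2goasm: %.1fx\n", cgo181/c2goasm)
}
```

The last line gives the ratio behind the "about 20x" figure.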
The basic process is to set up the stack and registers (in the prologue) the way the C code expects them, and upon exiting the subroutine (in the epilogue) to revert to the Go world and pass back a return value if required. In more detail:
Define the assembly subroutine with the proper Go decoration in terms of needed stack space and overall size of arguments plus return value.
Function arguments are loaded from the Go stack into registers, and prior to starting the C code any arguments beyond the sixth are stored in C stack space.
Stack space is reserved and set up for the C code. Depending on the C code, the stack pointer may be aligned on a certain boundary (especially needed for code that takes advantage of SIMD instructions such as AVX).
A constants table is generated (if needed) and any rip-based references are replaced with proper offsets to where Go will put the table.
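The bookkeeping behind the first three steps can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the tool's actual implementation: the helper names (argsSize, alignDown, prologue) are hypothetical, every argument is assumed to be 8 bytes wide, and the register order is that of the System V AMD64 calling convention written with Go assembler register names.

```go
package main

import "fmt"

// sysvIntRegs lists the registers that carry the first six integer or
// pointer arguments in the System V AMD64 calling convention.
var sysvIntRegs = []string{"DI", "SI", "DX", "CX", "R8", "R9"}

// argsSize is the size in bytes of the Go argument frame, assuming every
// argument and the optional return value is 8 bytes wide.
func argsSize(nargs int, hasReturn bool) int {
	size := 8 * nargs
	if hasReturn {
		size += 8
	}
	return size
}

// alignDown rounds sp down to a power-of-two boundary, e.g. 32 for AVX.
func alignDown(sp, boundary uint64) uint64 {
	return sp &^ (boundary - 1)
}

// prologue returns illustrative Go assembly lines: a TEXT directive of the
// form $framesize-argsize, followed by one load per register argument.
func prologue(name string, argNames []string, frameSize int, hasReturn bool) []string {
	lines := []string{
		fmt.Sprintf("TEXT ·%s(SB), $%d-%d", name, frameSize, argsSize(len(argNames), hasReturn)),
	}
	for i, arg := range argNames {
		if i < len(sysvIntRegs) {
			lines = append(lines, fmt.Sprintf("\tMOVQ %s+%d(FP), %s", arg, 8*i, sysvIntRegs[i]))
		} else {
			// Arguments beyond the sixth have to be copied to C stack space.
			lines = append(lines, fmt.Sprintf("\t// %s+%d(FP) goes to C stack space", arg, 8*i))
		}
	}
	return lines
}

func main() {
	args := []string{"vec1", "vec2", "vec3", "result"}
	for _, l := range prologue("_MultiplyAndAdd", args, 0, false) {
		fmt.Println(l)
	}
	fmt.Printf("aligned stack pointer: %#x\n", alignDown(0x7ffc1238, 32))
}
```

Rounding down rather than up is deliberate: the stack grows toward lower addresses on amd64, so moving the pointer down stays within newly reserved space.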
Limitations
Arguments need (for now) to be 64 bits in size, meaning either a value or a pointer (this requirement will be lifted)
A maximum of 14 arguments (a hard limit; if you hit it, you should probably rethink your API anyway)
Generally no call statements (thus inline your C code) with a couple of exceptions for functions such as memset and memcpy (see clib_amd64.s)
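The first two limitations are easy to check up front. The sketch below shows one way to do so; the arg type and checkSignature helper are hypothetical and exist only for this example.

```go
package main

import "fmt"

// maxArgs is the hard limit on the number of arguments stated above.
const maxArgs = 14

// arg describes one argument of a candidate function (hypothetical type).
type arg struct {
	Name      string
	SizeBytes int
}

// checkSignature reports whether an argument list stays within the
// limitations listed above: at most 14 arguments, each 64 bits wide.
func checkSignature(args []arg) error {
	if len(args) > maxArgs {
		return fmt.Errorf("%d arguments, maximum is %d", len(args), maxArgs)
	}
	for _, a := range args {
		if a.SizeBytes != 8 {
			return fmt.Errorf("argument %q is %d bytes, want 8", a.Name, a.SizeBytes)
		}
	}
	return nil
}

func main() {
	ok := []arg{{"src", 8}, {"dst", 8}, {"length", 8}}
	bad := []arg{{"src", 8}, {"flag", 4}}
	fmt.Println(checkSignature(ok))
	fmt.Println(checkSignature(bad))
}
```

In practice, a narrower value such as an int32 can be widened to a 64-bit integer in the high-level Go wrapper before calling into the assembly.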
Generate assembly from C/C++
For projects using cmake, for example, you can list the available assembly targets as follows:
$ make help | grep "\.s"
To see the actual command that generates the assembly for a target:
$ make -n SimdAvx2BgraToGray.s
Supported golang architectures
For now only the AMD64 architecture is supported. ARM64 should work in a similar fashion, but support is lacking at the moment.
Do not generate unwind tables (for debug purposes)
-fno-exceptions | Disable exception handling
-fno-rtti | Disable run-time type information
The following flags are only available in clang -cc1 frontend mode (see below):
Flag | Explanation
-fno-jump-tables | Do not use jump tables, as may be generated for switch statements
clang vs clang -cc1
As per the clang FAQ, clang -cc1 is the frontend, and clang is a (mostly GCC compatible) driver for the frontend. To see all options that the driver passes on to the frontend, use -### like this:
$ clang -### -c hello.c
"/usr/lib/llvm/bin/clang" "-cc1" "-triple" "x86_64-pc-linux-gnu" etc. etc. etc.
Command line flags for clang
To see all command line flags use either clang --help or clang --help-hidden for the clang driver or clang -cc1 -help for the frontend.
Further optimization and fine tuning
Using the LLVM optimizer (opt) you can further optimize the code generation. Use opt -help or opt -help-hidden for all available options.
An option can be passed in via clang using the -mllvm <value> option, such as -mllvm -inline-threshold=1000 as discussed above.
Also LLVM allows you to tune specific functions via function attributes like define void @f() alwaysinline norecurse { ... }.
What about GCC support?
For now, GCC-generated code will not work out of the box. However, there is no fundamental reason why GCC should not work (PRs are welcome).