This package provides bindings to the Intel MKL Vector Mathematics Functions.
This is often substantially faster than broadcasting Julia's built-in functions, especially when applying a transcendental function over a large array.
Until Julia 0.6 the package was registered as VML.jl.
Similar packages are Yeppp.jl, which wraps the open source Yeppp library, and AppleAccelerate.jl, which provides access to macOS's Accelerate framework.
Warning for macOS
There is currently a known conflict between the CompilerSupportLibraries_jll artifact, which is used for example by SpecialFunctions.jl, and MKL_jll. Unless MKL_jll is loaded first, a small number of functions may return wrong results for particular input array lengths. If you are unsure whether any of the packages you use load this artifact, loading IntelVectorMath as the very first package should be safe.
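To be safe, the load order described above can be made explicit at the top of a script or module; SpecialFunctions.jl here stands in for any package that pulls in CompilerSupportLibraries_jll:

```julia
# Load IntelVectorMath (and hence MKL_jll) before any package that may
# pull in CompilerSupportLibraries_jll, e.g. SpecialFunctions.jl.
using IntelVectorMath
using SpecialFunctions
```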
Basic install
To install IntelVectorMath.jl run
julia> ] add IntelVectorMath
Since version 0.4 IntelVectorMath uses the MKL_jll artifact, which is shared with other packages that use MKL, removing several other dependencies. A side effect is that from version 0.4 onwards this package requires at least Julia 1.3.
For older versions of Julia, IntelVectorMath v0.3 downloads its own copy of MKL and keeps only the required files in its own directory. As such, installing MKL.jl or MKL via Intel is not required, though it may mean some duplicate files if they are present. The artifact system adopted in version 0.4 resolves this duplication.
In the event that MKL was not installed properly you will get an error when first using it. Please try running
julia> ] build IntelVectorMath
If this does not work, please open an issue and include the output of <packagedir>/deps/build.log.
Renaming from VML
If you used this package prior to its renaming, you may have to run ] rm VML first; otherwise there will be a conflict due to the changed UUID.
Using IntelVectorMath
After loading IntelVectorMath, the functions listed below are available, for example IntelVectorMath.sin(rand(100)). These should provide a significant speed-up over broadcasting the Base functions.
Since the package name is quite long, an alias IVM is also exported to allow IVM.sin(rand(100)) after using the package.
If you import the package, you can add this alias via const IVM = IntelVectorMath. Equally, you can replace IVM with another alias of your choice.
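For instance, a qualified import with a custom alias might look like this (the alias name is arbitrary):

```julia
import IntelVectorMath
const IVM = IntelVectorMath  # or any other alias of your choice

IVM.sin(rand(100))
```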
Example
julia> using IntelVectorMath, BenchmarkTools

julia> a = randn(10^4);

julia> @btime sin.($a);   # apply Base.sin to each element
  102.128 μs (2 allocations: 78.20 KiB)

julia> @btime IVM.sin($a);   # apply IVM.sin to the whole array
  20.900 μs (2 allocations: 78.20 KiB)

julia> b = similar(a);

julia> @btime IVM.sin!(b, a);   # in-place version
  20.008 μs (0 allocations: 0 bytes)
Accuracy
By default, IntelVectorMath uses VML_HA mode, which corresponds to an accuracy of
<1 ulp, matching the accuracy of Julia's built-in openlibm
implementation, although the exact results may be different. To specify
low accuracy, use vml_set_accuracy(VML_LA). To specify enhanced
performance, use vml_set_accuracy(VML_EP). More documentation
regarding these options is available on
Intel's website.
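A minimal sketch of switching between the accuracy modes named above (assuming vml_set_accuracy and the VML_* constants are available after loading the package, as described in this section):

```julia
using IntelVectorMath

vml_set_accuracy(VML_LA)  # low accuracy
IVM.sin(rand(100))
vml_set_accuracy(VML_EP)  # enhanced performance
vml_set_accuracy(VML_HA)  # back to the default high accuracy
```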
Performance
Summary of Results:
Relative speed of IntelVectorMath vs. Base: the bar height shows how much faster IntelVectorMath is than broadcasting the corresponding Base functions.
Full Results:
Real Functions - Full Benchmark Results
Complex Functions - Full Benchmark Results
Real Functions - Performance over dimensions
Tests were performed on an Intel(R) Core(TM) i5-8250U @ 1.6 GHz (1800 MHz). The dashed line indicates equivalent performance for IntelVectorMath versus the implementations in Base.
Supported functions
IntelVectorMath.jl supports the following functions, most for Float32 and
Float64, while some also take complex numbers.
Unary functions
Allocating forms have signature f(A). Mutating forms have signatures
f!(A) (in place) and f!(out, A) (out of place). The last 9 functions have been moved from Base to SpecialFunctions.jl or have no Base equivalent.
Allocating  Mutating
acos        acos!
asin        asin!
atan        atan!
cos         cos!
sin         sin!
tan         tan!
acosh       acosh!
asinh       asinh!
atanh       atanh!
cosh        cosh!
sinh        sinh!
tanh        tanh!
cbrt        cbrt!
sqrt        sqrt!
exp         exp!
expm1       expm1!
log         log!
log10       log10!
log1p       log1p!
abs         abs!
abs2        abs2!
ceil        ceil!
floor       floor!
round       round!
trunc       trunc!
erf         erf!
erfc        erfc!
erfinv      erfinv!
erfcinv     erfcinv!
gamma       gamma!
lgamma      lgamma!
inv_cbrt    inv_cbrt!
inv_sqrt    inv_sqrt!
pow2o3      pow2o3!
pow3o2      pow3o2!
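The three call forms described above, sketched with sqrt as an example:

```julia
using IntelVectorMath

A = rand(100)
y = IVM.sqrt(A)      # allocating: f(A) returns a new array
IVM.sqrt!(A)         # mutating, in place: f!(A) overwrites A
out = similar(A)
IVM.sqrt!(out, A)    # mutating, out of place: f!(out, A) writes into out
```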
Binary functions
Allocating forms have signature f(A, B). Mutating forms have
signature f!(out, A, B).
Allocating  Mutating
atan        atan!
hypot       hypot!
pow         pow!
divide      divide!
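Analogously for the binary functions, using pow as an example:

```julia
using IntelVectorMath

A, B = rand(100), rand(100)
y = IVM.pow(A, B)      # allocating: f(A, B)
out = similar(A)
IVM.pow!(out, A, B)    # mutating: f!(out, A, B)
```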
Next steps
Next steps for this package
Windows support
Basic Testing
Avoiding overloading base and optional overload function
Travis and AppVeyor testing
Adding CIS function
Move Testing to GitHub Actions
Add test for using standalone MKL
Update Benchmarks
Add tests for mutating functions
Add own dependency management via BinaryProvider
Update function list in README
Adopt Julia 1.3 artifact system, breaking backwards compatibility
Advanced
IntelVectorMath.jl uses CpuId.jl to detect whether your processor supports the newer AVX2 instructions, and if not defaults to libmkl_vml_avx. If your system does not have AVX, this package will currently not work for you.
If the CPU feature detection does not work for you, please open an issue.
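To inspect the relevant CPU features yourself, CpuId.jl can be queried directly; the sketch below assumes its cpufeature function and the symbol names :AVX and :AVX2:

```julia
using CpuId

cpufeature(:AVX)   # true if the CPU supports AVX
cpufeature(:AVX2)  # true if the newer AVX2 instructions are available
```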