The SciML Style Guide is a style guide for the Julia programming language. It is used by the
SciML Open Source Scientific Machine Learning Organization. As such, it is
open to discussion with the community. Please file an issue or open a PR to discuss changes to
the style guide.
A style guide is about consistency. Consistency with this style guide is important.
Consistency within a project is more important. Consistency within one module or function is the most important.
But most importantly: know when to be inconsistent -- sometimes the style guide just doesn't apply.
When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!
Some code within the SciML organization is old, on life support, donated by researchers to be maintained.
Consistency is the number one goal, so updating to match the style guide should happen on a repo-by-repo
basis, i.e. do not update one file to match the style guide (leaving all other files behind).
Community Contribution Guidelines
For a comprehensive set of community contribution guidelines, refer to ColPrac.
A relevant point to highlight is that PRs should do one thing. In the context of style, this means that PRs which update
the style of a package's code should not be mixed with fundamental code contributions. This separation makes it
easier to ensure that large style improvements are isolated from substantive (and potentially breaking) code changes.
Open source contributions are allowed to start small and grow over time
If the standard for code contributions is that every PR needs to support every possible input type that anyone can
think of, the barrier would be too high for newcomers. Instead, the principle is to be as correct as possible to
begin with, and grow the generic support over time. All recommended functionality should be tested, and any known
generality issues should be documented in an issue (and with a @test_broken test when possible). However, a
function which is known to not be GPU-compatible is not grounds to block merging; rather, it is an encouragement for a
follow-up PR to improve the general type support!
Generic code is preferred unless code is known to be specific
For example, the code:
function f(A, B)
    for i in 1:length(A)
        A[i] = A[i] + B[i]
    end
end
would not be preferred for two reasons. One is that it assumes A uses one-based indexing, which would fail in cases
like OffsetArrays and FFTViews.
Another issue is that it requires indexing, while not all array types support indexing (for example,
CuArrays). A more generic compatible implementation of this function would be
to use broadcast, for example:
function f(A, B)
    @. A = A + B
end
which would allow support for a wider variety of array types.
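When an explicit loop is still needed, one common way to drop the one-based assumption is to iterate with eachindex. The following is a minimal sketch of both variants; the names add_loop! and add_broadcast! are illustrative and do not come from the guide.

```julia
# Loop version that does not assume one-based indexing:
# `eachindex(A, B)` yields indices valid for both arrays and
# throws if the two arrays do not share the same axes.
function add_loop!(A, B)
    for i in eachindex(A, B)
        A[i] = A[i] + B[i]
    end
    return A
end

# Broadcast version: no scalar indexing in user code, so it also suits
# array types for which element-wise indexing is unsupported or slow.
function add_broadcast!(A, B)
    @. A = A + B
    return A
end

A = [1.0, 2.0, 3.0]
B = [10.0, 20.0, 30.0]
add_loop!(A, B)
add_broadcast!(A, B)
```

The loop version still requires indexing support, so the broadcast form remains the more generic of the two.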
Internal types should match the types used by users when possible
If f(A) takes the input of some collections and computes an output from those collections, then it should be
expected that if the user gives A as an Array, the computation should be done via Arrays. If A was a
CuArray, then it should be expected that the computation should be internally done using a CuArray (or appropriately
error if not supported). For these reasons, constructing arrays via generic methods, like similar(A), is preferred when
writing f instead of using non-generic constructors like Array{eltype(A)}(undef, size(A)) unless the function is documented as
being non-generic.
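As a rough illustration of the difference, the sketch below allocates an output once with similar and once with a hard-coded constructor. The function names are hypothetical, and a plain Float32 vector stands in for a more exotic array type.

```julia
# Allocate the output with `similar` so it follows the input's array type.
function double(A)
    out = similar(A)      # same container family and element type as `A`
    @. out = 2 * A
    return out
end

# A hard-coded constructor silently changes the element type
# (and would also force a CPU `Array` for a GPU input).
function double_nongeneric(A)
    out = Array{Float64}(undef, size(A))
    @. out = 2 * A
    return out
end

x32 = Float32[1, 2, 3]
y32 = double(x32)            # stays a Vector{Float32}
z = double_nongeneric(x32)   # becomes a Vector{Float64}
```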
Trait definition and adherence to generic interface is preferred when possible
Julia provides many different interfaces, for example iteration, indexing, and broadcasting.
Those interfaces should be followed when possible. For example, when defining broadcast overloads,
one should implement a BroadcastStyle as suggested by the documentation instead of simply attempting
to bypass the broadcast system via copyto! overloads.
When interface functions are missing, these should be added to Base Julia or an interface package,
like ArrayInterface.jl. Such traits should be
declared and used when appropriate. For example, if a line of code requires mutation, the trait
ArrayInterface.ismutable(A) should be checked before attempting to mutate, and informative error
messages should be written to capture the immutable case (or, an alternative code which does not
mutate should be given).
One example of this principle is demonstrated in the generation of Jacobian matrices. In many scientific
applications, one may wish to generate a Jacobian cache from the user's input u0. A naive way to generate
this Jacobian is J = similar(u0,length(u0),length(u0)). However, this will generate a Jacobian J such
that J isa Matrix.
Macros should be limited and only be used for syntactic sugar
Macros define new syntax, and for this reason they tend to be less composable than other coding styles
and require prior familiarity to be easily understood. One principle to keep in mind is, "can the person
reading the code easily picture what code is being generated?". For example, a user of Soss.jl may not know
what code is being generated by:
@model (x, α) begin
    σ ~ Exponential()
    β ~ Normal()
    y ~ For(x) do xj
        Normal(α + β * xj, σ)
    end
    return y
end
and thus using such a macro as the interface is not preferred when possible. However, a macro like
@muladd makes it trivial to picture the generated code (it recursively
transforms a*b + c to muladd(a,b,c) for more
accuracy and efficiency), so using
such a macro is considered an appropriate use.
Some performance macros, like @simd, @threads, or
@turbo from LoopVectorization.jl,
make an exception in that their generated code may be foreign to many users. However, they still are
classified as appropriate uses as they are syntactic sugar since they do (or should) not change the behavior
of the program in measurable ways other than performance.
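To picture the @muladd rewrite described above without installing the macro package, the sketch below applies the same transformation by hand using the Base function muladd. The variable names are illustrative only.

```julia
# What `@muladd` conceptually rewrites: `a * b + c` becomes `muladd(a, b, c)`.
# `muladd` is a Base function, so no package is needed for this sketch.
a, b, c = 2.0, 3.0, 1.0

plain = a * b + c          # separate multiply and add
fused = muladd(a, b, c)    # may use a fused multiply-add when that is faster
```

Because the rewritten expression is this easy to read off, a reader can predict the generated code without knowing the macro's internals.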
Errors should be caught as high as possible, and error messages should be contextualized for newcomers
Whenever possible, defensive programming should be used to check for potential errors before they are encountered
deeper within a package. For example, if one knows that f(u0,p) will error unless u0 is the size of p, this
should be caught at the start of the function to throw a domain specific error, for example "parameters and initial
condition should be the same size".
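A minimal sketch of such an early check follows. The function body is hypothetical and only serves to show the check being placed before any real work happens.

```julia
# Hypothetical function used only to illustrate an early, domain-specific check.
function f(u0, p)
    if size(u0) != size(p)
        throw(DimensionMismatch(
            "parameters and initial condition should be the same size, " *
            "got size(u0) = $(size(u0)) and size(p) = $(size(p))"))
    end
    return u0 .* p
end

ok = f([1.0, 2.0], [3.0, 4.0])

# Capture the error message to show what a user would see.
msg = try
    f([1.0, 2.0], [3.0])
    ""
catch err
    sprint(showerror, err)
end
```

Without the check, the same call would fail inside the broadcast with a message about array dimensions that says nothing about parameters or initial conditions.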
Subpackaging and interface packages are preferred over conditional modules via Requires.jl
Requires.jl should be avoided at all costs. If an interface package exists, such as
ChainRulesCore.jl for defining automatic differentiation
rules without requiring a dependency on the whole ChainRules.jl system, or
RecipesBase.jl which allows for defining Plots.jl
plot recipes without a dependency on Plots.jl, a direct dependency on these interface packages is
preferred.
Otherwise, instead of resorting to a conditional dependency using Requires.jl, it is
preferred to create subpackages, i.e. smaller independent packages kept within the same GitHub repository
with independent versioning and package management. An example of this is seen in
Optimization.jl which has subpackages like
OptimizationBBO.jl for
BlackBoxOptim.jl support.
Some important interface packages to know about are the ones named above: ChainRulesCore.jl, RecipesBase.jl, and ArrayInterface.jl.
Functions should either attempt to be non-allocating and reuse caches, or treat inputs as immutable
Mutating codes and non-mutating codes fall into different worlds. When a code is fully immutable,
the compiler can better reason about dependencies, optimize the code, and check for correctness.
However, many times a code making the fullest use of mutation can outperform even what the best compilers
of today can generate. That said, the worst of all worlds is when code mixes mutation with non-mutating
code. Not only is this a mishmash of coding styles, it has the potential non-locality and compiler
proof issues of mutating code while not fully benefiting from the mutation.
Out-Of-Place and Immutability is preferred when sufficiently performant
Mutation is used to get more performance by decreasing the amount of heap allocations. However,
if it's not helpful for heap allocations in a given spot, do not use mutation. Mutation is scary
and should be avoided unless it gives an immediate benefit. For example, if
matrices are sufficiently large, then A*B is as fast as mul!(C,A,B), and thus writing
A*B is preferred (unless the rest of the function is being careful about being fully non-allocating,
in which case this should be mul! for consistency).
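The two forms compared above can be sketched as follows, using only the LinearAlgebra standard library. The matrices are small placeholders, so this shows the calling convention rather than any performance difference.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
B = [5.0 6.0; 7.0 8.0]

# Out-of-place: allocates a new matrix for the result.
C1 = A * B

# In-place: writes the product into a preallocated cache `C2`.
C2 = similar(A)
mul!(C2, A, B)
```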
Similarly, when defining types, using struct is preferred to mutable struct unless mutating
the struct is a common occurrence. Even if mutating the struct is a common occurrence, see whether
using Setfield.jl is sufficient. The compiler will optimize
the construction of immutable structs, and thus this can be more efficient if it's not too much of a
code hassle.
Tests should attempt to cover a wide gamut of input types
Code coverage numbers are meaningless if one does not consider the input types. For example, one can
hit all of the code with Array, but that does not test whether CuArray is compatible! Thus it's
always good to think of coverage not in terms of lines of code but in terms of type coverage. A good
list of number types to think about are:
When in doubt, a submodule should become a subpackage or separate package
Keep packages to one core idea. If there's something separate enough to be a submodule, could it
instead be a separate well-tested and documented package to be used by other packages? Most likely
yes.
Globals should be avoided whenever possible
Global variables should be avoided whenever possible. When required, global variables should be
consts and have an all uppercase name separated with underscores (e.g. MY_CONSTANT). They should be
defined at the top of the file, immediately after imports and exports but before an __init__ function.
If you truly want mutable global style behaviour you may want to look into mutable containers.
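A small sketch of both cases, assuming hypothetical names: a plain constant, and a Ref used as a mutable container whose binding stays constant.

```julia
# Constant global: uppercase name separated with underscores.
const DEFAULT_TOLERANCE = 1e-6

# Mutable "global-style" state kept in a `const` container: the binding
# never changes, only the contents of the `Ref` do.
const CALL_COUNTER = Ref(0)

function within_tolerance(x)
    CALL_COUNTER[] += 1
    return abs(x) < DEFAULT_TOLERANCE
end

small = within_tolerance(1e-9)
large = within_tolerance(1.0)
```

Because CALL_COUNTER is a const binding to a concretely typed Ref, functions that read or update it remain type-stable.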
Type-stable and Type-grounded code is preferred wherever possible
Type-stable and type-grounded code helps the compiler create not only more optimized code, but also
faster to compile code. Always keep containers well-typed, functions specializing on the appropriate
arguments, and types concrete.
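The following sketch contrasts a type-unstable function with a type-stable one, and an untyped container with a typed one. All names are illustrative.

```julia
# Type-unstable: returns an `Int` or a `Float64` depending on the value of `x`.
clamp_unstable(x) = x > 0 ? x : 0

# Type-stable: `zero(x)` has the same type as `x`, so the return type is fixed.
clamp_stable(x) = x > 0 ? x : zero(x)

# Containers: prefer a concrete element type over `Any`.
untyped = Any[1.0, 2.0, 3.0]
typed = Float64[1.0, 2.0, 3.0]
```

Running @code_warntype clamp_unstable(1.0) in the REPL is one way to see the Union return type that the stable version avoids.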
Closures should be avoided whenever possible
Closures can cause accidental type instabilities that are difficult to track down and debug; in the
long run it saves time to always program defensively and avoid writing closures in the first place,
even when a particular closure would not have been problematic. A similar argument applies to reading
code with closures; if someone is looking for type instabilities, this is faster to do when code does
not contain closures.
Furthermore, if you want to update variables in an outer scope, do so explicitly with Refs or self-defined
structs.
For example,
map(Base.Fix2(getindex, i), vector_of_vectors)
is preferred over
map(v -> v[i], vector_of_vectors)
or
[v[i] for v in vector_of_vectors]
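A runnable version of the preferred form, together with one possible way to update outer state through a Ref instead of a capturing closure, might look like this. The helper add_to! is hypothetical.

```julia
vector_of_vectors = [[1, 2, 3], [4, 5, 6]]
i = 2

# `Base.Fix2(getindex, i)` is a callable struct equivalent to `v -> getindex(v, i)`.
second_elements = map(Base.Fix2(getindex, i), vector_of_vectors)

# Update outer state explicitly through a `Ref` passed as an argument,
# rather than capturing and reassigning a local variable in a closure.
function add_to!(total::Ref, x)
    total[] += x
    return total
end

total = Ref(0)
foreach(Base.Fix1(add_to!, total), second_elements)
```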
Numerical functionality should use the appropriate generic numerical interfaces
While you can use A\b to do a linear solve inside of a package, that does not mean that you should.
This interface is only sufficient for performing factorizations, and so that limits the scaling
choices, the types of A that can be supported, etc. Instead, linear solves within packages should
use LinearSolve.jl. Similarly, nonlinear solves should use NonlinearSolve.jl. Optimization should use
Optimization.jl. Etc. This allows the full generic choice to be given to the user without depending
on every solver package (effectively recreating the generic interfaces within each package).
Functions should capture one underlying principle
Functions mean one thing. Every dispatch of + should be "the meaning of addition on these types".
While in theory you could add dispatches to + that mean something different, that will fail in
generic code for which + means addition. Thus for generic code to work, code needs to adhere to
one meaning for each function. Every dispatch should be an instantiation of that meaning.
Internal choices should be exposed as options whenever possible
Whenever possible, numerical values and choices within scripts should be exposed as options
to the user. This promotes code reusability beyond the few cases the author may have expected.
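As one hedged illustration, the hypothetical function below exposes its tolerance and iteration cap as keyword options rather than burying the numbers in the loop.

```julia
# Hypothetical example: the tolerance and iteration cap are keyword
# options with defaults instead of numbers hidden in the function body.
function newton_sqrt(a; abstol = 1e-12, maxiters = 100)
    x = float(a)
    for _ in 1:maxiters
        x_new = (x + a / x) / 2
        abs(x_new - x) < abstol && return x_new
        x = x_new
    end
    return x
end

tight = newton_sqrt(2.0)
loose = newton_sqrt(2.0; abstol = 1e-3, maxiters = 5)
```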
Prefer code reuse over rewrites whenever possible
If a package has a function you need, use the package. Add a dependency if you need to. If the
function is missing a feature, prefer to add that feature to said package and then add it as a
dependency. If the dependency is potentially troublesome, for example because it has a high
load time, prefer to spend time helping said package fix these issues and add the dependency.
Only when it does not seem possible to make the package "good enough" should using the package
be abandoned. If it is abandoned, consider building a new package for this functionality as you
need it, and then make it a dependency.
Prefer to not shadow functions
Two functions can have the same name in Julia by having different namespaces. For example,
X.f and Y.f can be two different functions, with different dispatches, but the same name.
This should be avoided whenever possible. Instead of creating MyPackage.sort, consider
adding dispatches to Base.sort for your types if these new dispatches match the underlying
principle of the function. If it doesn't, prefer to use a different name. While using MyPackage.sort
is not conflicting, it is going to be confusing for most people unfamiliar with your code,
so MyPackage.special_sort would be more helpful to newcomers reading the code.
Specific Rules
High Level Rules
Use 4 spaces per indentation level, no tabs.
Try to adhere to a 92 character line length limit.
General Naming Principles
All type names should be CamelCase.
All struct names should be CamelCase.
All module names should be CamelCase.
All function names should be snake_case (all lowercase).
All variable names should be snake_case (all lowercase).
All constant names should be SNAKE_CASE (all uppercase).
All abstract type names should begin with Abstract.
All type variable names should be a single capital letter, preferably related to the value being typed.
Whole words are usually better than abbreviations or single letters.
Variables meant to be internal or private to a package should be denoted by prepending two underscores, i.e. __.
Single letters can be okay when naming a mathematical entity, i.e. an entity whose purpose or non-mathematical "meaning" is likely only known by downstream callers. For example, a and b would be appropriate names when implementing *(a::AbstractMatrix, b::AbstractMatrix), since the "meaning" of those arguments (beyond their mathematical meaning as matrices, which is already described by the type) is only known by the caller.
Unicode is fine within code where it increases legibility, but in no case should Unicode be used in public APIs.
This is to allow support for terminals which cannot use Unicode: if a keyword argument must be η, then it can be
exclusionary to uses on clusters which do not support Unicode inputs.
Comments
Use TODO to mark todo comments and XXX to mark comments about currently broken code.
Quote code in comments using backticks (e.g. `variable_name`).
When possible, code should be changed to incorporate information that would have been in
a comment. For example, instead of commenting # fx applies the effects to a tree, simply
change the function and variable names apply_effects(tree).
Comments referring to GitHub issues and PRs should add the URL in the comments.
Only use inline comments if they fit within the line length limit. If your comment
cannot be fitted inline then place the comment above the content to which it refers:
# Yes:

# Number of nodes to predict. Again, an issue with the workflow order. Should be updated
# after data is fetched.
p = 1

# No:

p = 1 # Number of nodes to predict. Again, an issue with the workflow order. Should be
# updated after data is fetched.
In general, comments above a line of code or function are preferred to inline comments.
Modules
Module imports should occur at the top of a file or right after a module declaration.
Module imports in packages should either use import or explicitly declare the imported functionality, for example
using Dates: Year, Month, Week, Day, Hour, Minute, Second, Millisecond.
Import and using statements should be separated, and should be divided by a blank line.
# Yes:
import A: a
import C

using B
using D: d

# No:
import A: a
using B
import C
using D: d
Exported variables should be considered as part of the public API, and changing their interface constitutes a
breaking change.
Any exported variables should be sufficiently unique. I.e., do not export f as that is very likely to clash with
something else.
A file that includes the definition of a module should not include any other code that runs outside that module,
i.e. the module should be declared at the top of the file with the module keyword and end at the bottom of the file.
No other code should appear before or after it (except for a module docstring before).
In this case the code within the module block should not be indented.
Sometimes, e.g. for tests, or for namespacing an enumeration, it is desirable to declare a submodule midway through a file.
In this case the code within the submodule should be indented.
Functions
Only use short-form function definitions when they fit on a single line:
# Yes:
foo(x::Int64) = abs(x) + 3

# No:
foobar(array_data::AbstractArray{T}, item::T) where {T <: Int64} = T[
    abs(x) * abs(item) + 3 for x in array_data
]
Inputs should be required unless a default is historically expected or likely to be applicable to >95% of use cases.
For example, the tolerance of a differential equation solver was set to a default of abstol=1e-6,reltol=1e-3 because it gives a
generally correct plot in most cases, and is an expectation from back in the 90's. In that case, using the historically
expected and most often useful default tolerances is justified. However, if one implements GradientDescent, the learning
rate needs to be adjusted for each application (based on the size of the gradient), and thus a default of
GradientDescent(learning_rate = 1) is not recommended.
Arguments which do not have defaults should preferably be made into positional arguments. The newer syntax of required
keyword arguments can be useful but should not be abused. Notable exceptions are cases where "either or" arguments are
accepted. For example, if defining g or dgdu is sufficient, then making them both keyword arguments with = nothing
and checking that at least one is not nothing (and throwing an appropriate error otherwise) is recommended if distinct
dispatches with different types are not possible.
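The "either or" pattern can be sketched as below. The function name sensitivity_setup and its return value are hypothetical; only the g and dgdu keywords come from the text.

```julia
# Hypothetical API: the caller must supply `g` or `dgdu` (at least one).
function sensitivity_setup(; g = nothing, dgdu = nothing)
    if g === nothing && dgdu === nothing
        throw(ArgumentError("either `g` or `dgdu` must be provided"))
    end
    return (; g, dgdu)
end

setup = sensitivity_setup(; g = sum)

# Calling with neither keyword raises the contextualized error.
threw = try
    sensitivity_setup()
    false
catch err
    err isa ArgumentError
end
```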
When calling a function always separate your keyword arguments from your positional arguments with a semicolon.
This avoids mistakes in ambiguous cases (such as splatting a Dict).
When writing a function that sends a lot of keyword arguments to another function, say sending keyword arguments to a
differential equation solver, use a named tuple keyword argument instead of splatting the keyword arguments. For example,
use diffeq_solver_kwargs = (; abstol=1e-6, reltol=1e-6,) as the API and use solve(prob, alg; diffeq_solver_kwargs...)
instead of splatting all keyword arguments.
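A minimal sketch of this API shape follows. Here inner_solve is a hypothetical stand-in for a differential equation solver and fit is an illustrative outer function; only the diffeq_solver_kwargs name comes from the text.

```julia
# Hypothetical inner solver standing in for a differential equation solver.
inner_solve(x; abstol = 1e-8, reltol = 1e-8) = (; x, abstol, reltol)

# The outer function takes the inner solver's options as one named tuple
# instead of collecting arbitrary `kwargs...` itself.
function fit(x, learning_rate; diffeq_solver_kwargs = NamedTuple())
    sol = inner_solve(x; diffeq_solver_kwargs...)
    return (; sol, learning_rate)
end

result = fit(1.0, 0.1;
             diffeq_solver_kwargs = (; abstol = 1e-6, reltol = 1e-6))
default_result = fit(2.0, 0.5)
```

Keeping the solver options in their own named tuple leaves the outer function's keyword namespace free for its own options.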
Functions which mutate arguments should be appended with !.
Avoid type piracy. I.e., do not add methods
to functions you don't own on types you don't own. Either own the types or the function.
Functions should prefer instances instead of types for arguments. For example, for a solver type Tsit5, the interface
should use solve(prob,Tsit5()), not solve(prob,Tsit5). The reason for this is multifold. For one, passing a type
has different specialization rules, so functionality can be slower unless ::Type{Tsit5} is written in the dispatches
which use it. Secondly, this allows for default and keyword arguments to extend the choices, which may become useful
for some types down the line. Using this form allows adding more options in a non-breaking manner.
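The sketch below shows how dispatching on an instance leaves room for later options. MyEuler and describe are hypothetical names chosen for illustration and are not part of any SciML package.

```julia
# Hypothetical solver type; the `order` option could be added later without
# breaking callers that already pass an instance, `MyEuler()`.
struct MyEuler
    order::Int
end
MyEuler(; order = 1) = MyEuler(order)

# Dispatch on the instance, not on `::Type{MyEuler}`.
describe(alg::MyEuler) = "Euler method of order $(alg.order)"

first_order = describe(MyEuler())
second_order = describe(MyEuler(; order = 2))
```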
If the number of arguments is too large to fit into a 92 character line, then use as many arguments as possible within
a line and start each new row with the same indentation, preferably at the same column as the ( but this can be moved
left if the function name is very long. For example:
# Yes
function my_large_function(argument1, argument2,
                           argument3, argument4,
                           argument5, x, y, z)

# No
function my_large_function(argument1,
                           argument2,
                           argument3,
                           argument4,
                           argument5,
                           x,
                           y,
                           z)
Function Argument Precedence
1. Function argument.
   Putting a function argument first permits the use of do blocks for passing
   multiline anonymous functions.
2. I/O stream.
   Specifying the IO object first permits passing the function to functions such as
   sprint, e.g. sprint(show, x).
3. Input being mutated.
   For example, in fill!(x, v), x is the object being mutated and it
   appears before the value to be inserted into x.
4. Type.
   Passing a type typically means that the output will have the given type.
   In parse(Int, "1"), the type comes before the string to parse.
   There are many such examples where the type appears first, but it's useful to note that
   in read(io, String), the IO argument appears before the type, which is
   in keeping with the order outlined here.
5. Input not being mutated.
   In fill!(x, v), v is not being mutated and it comes after x.
6. Key.
   For associative collections, this is the key of the key-value pair(s).
   For other indexed collections, this is the index.
7. Value.
   For associative collections, this is the value of the key-value pair(s).
   In cases like fill!(x, v), this is v.
8. Everything else.
   Any other arguments.
9. Varargs.
   This refers to arguments that can be listed indefinitely at the end of a function call.
   For example, in Matrix{T}(undef, dims), the dimensions can be given as a
   Tuple, e.g. Matrix{T}(undef, (1,2)), or as Varargs,
   e.g. Matrix{T}(undef, 1, 2).
10. Keyword arguments.
    In Julia keyword arguments have to come last anyway in function definitions; they're
    listed here for the sake of completeness.
The vast majority of functions will not take every kind of argument listed above; the
numbers merely denote the precedence that should be used for any applicable arguments
to a function.
Tests and Continuous Integration
The high level runtests.jl file should only be used to shuttle to other test files.
Every set of tests should be included into a @safetestset.
A standard @testset does not fully enclose all defined values, such as functions defined in a @testset, and
thus can "leak".
Test includes should be written in one line, for example:
@time @safetestset "Jacobian Tests" begin include("interface/jacobian_tests.jl") end
Every test script should be fully reproducible in isolation. I.e., one should be able to copy paste that script
and receive the results.
Test scripts should be grouped based on categories, for example tests of the interface vs tests for numerical
convergence. Grouped tests should be kept in the same folder.
A GROUP environment variable should be used to specify test groups for parallel testing in continuous integration.
A fallback group All should be used to specify all of the tests that should be run when a developer runs ]test Package
locally. As an example, see the
OrdinaryDiffEq.jl test structure.
Tests should include downstream tests to major packages which use the functionality, to ensure continued support.
Any update which breaks the downstr