Using @snoop_inference to emit manual precompile directives

In a few cases, it may be inconvenient or impossible to precompile using a workload. Some examples might be:

  • an application that opens graphical windows
  • an application that connects to a database
  • an application that creates, deletes, or rewrites files on disk

In such cases, one alternative is to create a manual list of precompile directives using Julia's precompile(f, argtypes) function.

Warning

Manual precompile directives are much more likely to "go stale" as the package is developed: precompile does not throw an error if a method for the given argtypes cannot be found. They are also more likely to be dependent on the Julia version, operating system, or CPU architecture. Whenever possible, it's safer to use a workload.
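
As a minimal sketch of how a directive can go stale without any warning, the example below calls precompile with one signature that matches a method and one that does not. The function howbig_demo is a hypothetical stand-in defined only for this illustration; it is not part of OptimizeMe or SnoopCompile.

```julia
# Hypothetical stand-in for a package method (an assumption for illustration only).
howbig_demo(x::Float64) = x > 1.0e6 ? "big" : "small"

# A directive whose argument types match an existing method.
ok = precompile(howbig_demo, (Float64,))

# A directive whose argument types match no method, as might happen after
# the package's signatures change. No error is thrown; the call just returns false.
stale = precompile(howbig_demo, (String,))

println("matching signature succeeded: ", ok)
println("stale signature succeeded:    ", stale)
```

Because the only signal is the return value, one way to catch stale directives during development is to check that value (for example in a test) rather than discarding it.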

precompile directives have to be emitted by the module that owns the method and/or types. SnoopCompile comes with a tool, parcel, that splits out the "root-most" precompilable MethodInstances into their constituent modules. These will typically correspond to the bottom row of boxes in the flame graph. When some of the bottom-row MethodInstances are not naively precompilable, the results will instead include MethodInstances from higher up in the call tree.

Let's use SnoopCompile.parcel on our OptimizeMe demo:

julia> using SnoopCompileCore, SnoopCompile # here we need the SnoopCompile path for the next line (normally you should wait until after data collection is complete)
julia> include(joinpath(pkgdir(SnoopCompile), "examples", "OptimizeMe.jl"))
Main.var"Main".OptimizeMe
julia> tinf = @snoop_inference OptimizeMe.main();
lotsa containers:
julia> ttot, pcs = SnoopCompile.parcel(tinf);
julia> ttot
0.06811638899999999
julia> pcs
4-element Vector{Pair{Module, Tuple{Float64, Vector{Tuple{Float64, Core.MethodInstance}}}}}:
 Core => (1.884e-6, [(1.884e-6, MethodInstance for (NamedTuple{(:sizehint,)})(::Tuple{Int64}))])
 Base.Multimedia => (6.632e-6, [(6.632e-6, MethodInstance for MIME(::String))])
 Base => (0.0034157299999999996, [(1.483e-6, MethodInstance for LinearIndices(::Tuple{Base.OneTo{Int64}})), (1.713e-6, MethodInstance for IOContext(::IOBuffer, ::IOContext{Base.PipeEndpoint})), (3.787e-6, MethodInstance for IOContext(::IOContext{Base.PipeEndpoint}, ::Base.ImmutableDict{Symbol, Any})), (5.851e-6, MethodInstance for Base.indexed_iterate(::Tuple{String, Bool}, ::Int64, ::Int64)), (6.231e-6, MethodInstance for Base.indexed_iterate(::Tuple{Int64, Int64}, ::Int64, ::Int64)), (6.452e-6, MethodInstance for Base.indexed_iterate(::Pair{Symbol, Any}, ::Int64, ::Int64)), (6.492e-6, MethodInstance for Base.indexed_iterate(::Tuple{Any, Int64}, ::Int64, ::Int64)), (6.793e-6, MethodInstance for getindex(::Tuple{Int64, Int64}, ::Int64)), (8.336e-6, MethodInstance for getindex(::Tuple{Base.OneTo{Int64}}, ::Int64)), (8.746e-6, MethodInstance for Base.indexed_iterate(::Pair{Symbol, Any}, ::Int64)) … (2.9365e-5, MethodInstance for getproperty(::BitVector, ::Symbol)), (2.9906e-5, MethodInstance for getproperty(::Vector, ::Symbol)), (3.0647e-5, MethodInstance for getproperty(::UnionAll, ::Symbol)), (0.000108133, MethodInstance for LinearIndices(::Vector{Float64})), (0.00025969000000000003, MethodInstance for haskey(::IOContext{Base.PipeEndpoint}, ::Symbol)), (0.000265207, MethodInstance for print(::IOContext{Base.PipeEndpoint}, ::Char)), (0.000291839, MethodInstance for print(::IOContext{Base.PipeEndpoint}, ::String)), (0.00034655099999999996, MethodInstance for get(::IOContext{Base.PipeEndpoint}, ::Symbol, ::Type{Any})), (0.000446318, MethodInstance for get(::IOContext{Base.PipeEndpoint}, ::Symbol, ::Bool)), (0.001391357, MethodInstance for string(::String, ::Int64, ::String))])
 Main.var"Main".OptimizeMe => (0.023525280999999995, [(0.000115968, MethodInstance for Main.var"Main".OptimizeMe.howbig(::Float64)), (0.023409312999999994, MethodInstance for Main.var"Main".OptimizeMe.main())])

ttot shows the total amount of time spent on type inference, in seconds. parcel discovered precompilable MethodInstances for four modules (Core, Base.Multimedia, Base, and OptimizeMe) that might benefit from precompile directives. The modules are listed in increasing order of inference time.

Let's look specifically at OptimizeMe, since that's under our control:

julia> pcmod = pcs[end]
Main.var"Main".OptimizeMe => (0.023525280999999995, Tuple{Float64, Core.MethodInstance}[(0.000115968, MethodInstance for Main.var"Main".OptimizeMe.howbig(::Float64)), (0.023409312999999994, MethodInstance for Main.var"Main".OptimizeMe.main())])
julia> tmod, tpcs = pcmod.second;
julia> tmod
0.023525280999999995
julia> tpcs
2-element Vector{Tuple{Float64, Core.MethodInstance}}:
 (0.000115968, MethodInstance for Main.var"Main".OptimizeMe.howbig(::Float64))
 (0.023409312999999994, MethodInstance for Main.var"Main".OptimizeMe.main())

This indicates the amount of time spent specifically on OptimizeMe, plus the list of calls that could be precompiled in that module.
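
The nesting of pcs (a vector of module => (module time, list of (time, MethodInstance)) pairs) can be easier to follow with a loop. The sketch below walks a small stand-in with the same shape. It is an illustration under stated assumptions: pcs_demo is built by hand, strings replace the MethodInstances, Main stands in for the OptimizeMe module, and the times are copied from the output above rather than measured.

```julia
# Hand-built stand-in with the same shape as the `pcs` shown above.
# Assumptions: strings replace MethodInstances, and `Main` stands in for OptimizeMe.
pcs_demo = [
    Core            => (1.884e-6, [(1.884e-6, "NamedTuple{(:sizehint,)}(::Tuple{Int64})")]),
    Base.Multimedia => (6.632e-6, [(6.632e-6, "MIME(::String)")]),
    Main            => (0.023525280999999995, [(0.000115968, "howbig(::Float64)"),
                                               (0.023409312999999994, "main()")]),
]

# Each element is `module => (module_time, entries)`; each entry is `(time, call)`.
for (mod, (tmod, tpcs)) in pcs_demo
    println(mod, ": ", length(tpcs), " entries, ", tmod, " s of inference")
    for (t, call) in tpcs
        println("    ", t, "  ", call)
    end
end

# Modules are sorted by increasing time, so the last element is the most expensive one.
slowest_mod, (slowest_t, slowest_entries) = pcs_demo[end]
```

With the real pcs, the same loop structure applies, with Core.MethodInstance objects in place of the strings.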

We could look at the other modules (packages) similarly.

SnoopCompile.write

You can generate files that contain ready-to-use precompile directives using SnoopCompile.write:

julia> SnoopCompile.write("/tmp/precompiles_OptimizeMe", pcs)
Core: no precompile statements out of 1.884e-6
Base.Multimedia: no precompile statements out of 6.632e-6
Base: precompiled 0.001391357 out of 0.0034157299999999996
Main.var"Main".OptimizeMe: precompiled 0.023409312999999994 out of 0.023525280999999995

You'll now find a directory /tmp/precompiles_OptimizeMe, and inside you'll find files for modules that could have precompile directives added manually. The contents of the last of these should be recognizable:

function _precompile_()
    ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
    Base.precompile(Tuple{typeof(main)})   # time: 0.4204474
end

The first ccall line ensures we only pay the cost of running these precompile directives if we're building the package; this is relevant mostly if you're running Julia with --compiled-modules=no, which can be a convenient way to disable precompilation and examine packages in their "native state." (It would also matter if you've set __precompile__(false) at the top of your module, but if so why are you reading this?)

This file is ready to be moved into the OptimizeMe repository and included into your module definition.
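
As a rough sketch of what that wiring might look like, the module below defines a _precompile_ function of the same form and calls it at the end of the module body. PrecompileDemo and its main function are hypothetical; in a real package the function would come from the generated file via include, and the function is written inline here only to keep the example self-contained.

```julia
module PrecompileDemo   # hypothetical module, for illustration only

main() = println("hello from main")

# In a real package, this function would live in the generated file and be
# brought in with `include(...)`; it is inlined here so the sketch runs on its own.
function _precompile_()
    # Only do the work while Julia is generating output (e.g., building the package).
    ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
    Base.precompile(Tuple{typeof(main)})
end

# Call it last, after every method it refers to has been defined.
_precompile_()

end # module
```

When this is run in an ordinary session rather than during package precompilation, the guard on the first line makes _precompile_ return nothing without issuing any directives.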

You might also consider submitting some of the other files (or their precompile directives) to the packages you depend on.