Reference

Data collection

SnoopCompileCore.@snoop_invalidations — Macro

invs = @snoop_invalidations expr

Capture method cache invalidations triggered by evaluating expr. invs is a sequence of invalidated Core.MethodInstances together with "explanations," consisting of integers (encoding depth) and strings (documenting the source of an invalidation).

Unless you are working at a low level, you essentially always want to pass invs directly to SnoopCompile.invalidation_trees.

Extended help

invs is in a format where the "reason" comes after the items. Method deletion results in the sequence

[zero or more (mi, "invalidate_mt_cache") pairs..., zero or more (depth1 tree, loctag) pairs..., method, loctag] with loctag = "jl_method_table_disable"

where mi denotes a MethodInstance and a depth1 tree is a sequence starting at depth=1.

Method insertion results in the sequence

[zero or more (depth0 tree, sig) pairs..., same info as with delete_method except loctag = "jl_method_table_insert"]

The authoritative reference is Julia's own src/gf.c file.

SnoopCompileCore.@snoop_inference — Macro

tinf = @snoop_inference commands;

Produce a profile of julia's type inference, recording the amount of time spent inferring every MethodInstance processed while executing commands. Each fresh entrance to type inference (whether executed directly in commands or because a call was made via runtime dispatch) also collects a backtrace so the caller can be identified.

tinf is a tree, each node containing data on a particular inference "frame" (the method, argument-type specializations, parameters, and even any constant-propagated values). Each reports the exclusive/inclusive times, where the exclusive time corresponds to the time spent inferring this frame in and of itself, whereas the inclusive time includes the time needed to infer all the callees of this frame.
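As a rough illustration of how the two times relate, the sketch below builds a profile tree by hand. ToyFrame and inclusive_time are hypothetical names, not SnoopCompile types. The exclusive times are copied from the packintype subtree of the flatten_demo output further down this page, and the computed total reproduces the inclusive time reported there.

```julia
# Hypothetical toy model of an inference profile tree (not SnoopCompile's types)
struct ToyFrame
    name::String
    exclusive::Float64          # time spent inferring this frame by itself
    children::Vector{ToyFrame}  # callees inferred from this frame
end

# Inclusive time = this frame's own time plus the inclusive time of every callee
inclusive_time(f::ToyFrame) = f.exclusive + sum(inclusive_time, f.children; init = 0.0)

# Exclusive times taken from the packintype subtree of the flatten_demo example
tree = ToyFrame("packintype(::Int64)", 0.000150891, [
    ToyFrame("MyType{Int64}(::Int64)", 0.000105318, ToyFrame[]),
    ToyFrame("dostuff(::MyType{Int64})", 9.43e-5, [
        ToyFrame("extract(::MyType{Int64})", 6.6458e-5, [
            ToyFrame("getproperty(::MyType{Int64}, ::Symbol)", 3.401e-5, ToyFrame[]),
            ToyFrame("getproperty(::MyType{Int64}, x::Symbol)", 2.4248e-5, ToyFrame[]),
        ]),
        ToyFrame("domath(::Int64)", 0.000136496, ToyFrame[]),
    ]),
])

inclusive_time(tree)  # ≈ 0.000611721, the inclusive time reported for packintype
```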

The top-level node in this profile tree is ROOT. Uniquely, its exclusive time corresponds to the time spent not in julia's type inference (codegen, llvm_opt, runtime, etc).

Working with tinf effectively requires loading SnoopCompile.

Warning

Note the semicolon ; at the end of the @snoop_inference macro call. Because SnoopCompileCore is not permitted to invalidate any code, it cannot define the Base.show methods that pretty-print tinf. Defer inspection of tinf until SnoopCompile has been loaded.

Example

julia> tinf = @snoop_inference begin
           sort(rand(100))  # Evaluate some code and profile julia's type inference
       end;

SnoopCompileCore.@snoop_llvm — Macro

@snoop_llvm "func_names.csv" "llvm_timings.yaml" begin
    # Commands to execute, in a new process
end

causes the julia compiler to log timing information for LLVM optimization during the provided commands to the files "func_names.csv" and "llvm_timings.yaml". These files can be used as input to SnoopCompile.read_snoop_llvm("func_names.csv", "llvm_timings.yaml").

The logs contain the amount of time spent optimizing each "llvm module", and information about each module, where a module is a collection of functions being optimized together.

GUIs

SnoopCompile.flamegraph — Function

flamegraph(tinf::InferenceTimingNode; tmin=0.0, excluded_modules=Set([Main]), mode=nothing)

Convert the call tree of inference timings returned from @snoop_inference into a FlameGraph. Returns a FlameGraphs.FlameGraph structure that represents the timing trace recorded for type inference.

Frames whose inclusive time (the total time for the frame and all of its children) is less than tmin seconds are omitted from the resulting FlameGraph. This can save processing time when you have a very large profile.
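Because a frame's inclusive time can never be smaller than that of any of its callees, dropping a frame for falling below tmin also drops everything beneath it. A minimal sketch of that rule, using a hypothetical PruneNode tree with made-up timings loosely based on flatten_demo (this is not how FlameGraphs or SnoopCompile implement it):

```julia
# Hypothetical toy tree storing each frame's inclusive time directly
struct PruneNode
    name::String
    inclusive::Float64
    children::Vector{PruneNode}
end

# Keep a child only if its inclusive time reaches tmin; since a callee's inclusive
# time never exceeds its caller's, discarding a node also discards its subtree.
function prune(node::PruneNode, tmin::Real)
    kept = PruneNode[prune(c, tmin) for c in node.children if c.inclusive >= tmin]
    return PruneNode(node.name, node.inclusive, kept)
end

root = PruneNode("ROOT", 3.0e-3, [
    PruneNode("packintype", 6.1e-4, [
        PruneNode("MyType", 1.05e-4, PruneNode[]),
        PruneNode("dostuff", 3.56e-4, [PruneNode("extract", 1.25e-4, PruneNode[])]),
    ]),
])

pruned = prune(root, 2e-4)   # keeps ROOT -> packintype -> dostuff only
```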

Non-precompilable frames are marked in reddish colors. excluded_modules can be used to mark methods defined in modules to which you cannot or do not wish to add precompiles.

mode controls how frames are named in tools like ProfileView. With mode=nothing (the default), frames are labeled with just the qualified function name; supplying mode=Dict(method => count), where count is the number of specializations of each method, causes that count to be included in each frame name.

Example

We'll use SnoopCompile.flatten_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.flatten_demo()
InferenceTimingNode: 0.002148974/0.002767166 on Core.Compiler.Timings.ROOT() with 1 direct children

julia> fg = flamegraph(tinf)
Node(FlameGraphs.NodeData(ROOT() at typeinfer.jl:75, 0x00, 0:3334431))

julia> ProfileView.view(fg);  # Display the FlameGraph in a package that supports it

You should be able to reconcile the resulting flamegraph with print_tree(tinf) (see flatten).

The empty horizontal periods in the flamegraph correspond to times when something other than inference is running. The total width of the flamegraph is set from the ROOT node.

SnoopCompile.pgdsgui — Function

methodref, ax = pgdsgui(tinf::InferenceTimingNode; consts::Bool=true, by=inclusive)
methodref     = pgdsgui(ax, tinf::InferenceTimingNode; kwargs...)

Create a scatter plot comparing:

- (vertical axis) the inference time for all instances of each Method, as captured by tinf;
- (horizontal axis) the run time cost, as estimated by capturing a @profile before calling this function.

Each dot corresponds to a single method. The face color encodes the number of times that method was inferred, and the edge color corresponds to the fraction of the runtime spent on runtime dispatch (black is 0%, bright red is 100%). Clicking on a dot prints the method (or location, if inlined) to the REPL, and sets methodref[] to that method.

ax is the PyPlot axis of the scatter plot.

Compat

pgdsgui depends on PyPlot via Julia extensions. You must load both SnoopCompile and PyPlot for this function to be defined.

Analysis of invalidations

SnoopCompile.uinvalidated — Function

umis = uinvalidated(invlist)

Return the unique invalidated MethodInstances. invlist is obtained from SnoopCompileCore.@snoop_invalidations. This is similar to filtering for MethodInstances in invlist, except that it discards any tagged "invalidate_mt_cache". These can typically be ignored because they are nearly inconsequential: they do not invalidate any compiled code, they only transiently affect an optimization of runtime dispatch.
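The filtering rule can be sketched as follows, assuming the flat layout described under @snoop_invalidations, in which a MethodInstance may be immediately followed by the tag "invalidate_mt_cache". FakeMI is a hypothetical stand-in for Core.MethodInstance and the list contents are invented; the real uinvalidated may handle additional cases.

```julia
# Hypothetical stand-in for Core.MethodInstance
struct FakeMI
    name::String
end

# Collect unique FakeMIs, skipping any that are immediately tagged "invalidate_mt_cache"
function sketch_uinvalidated(invlist::AbstractVector)
    umis = Set{FakeMI}()
    for (i, item) in enumerate(invlist)
        item isa FakeMI || continue
        tagged_mt_cache = i < length(invlist) && invlist[i+1] == "invalidate_mt_cache"
        tagged_mt_cache || push!(umis, item)
    end
    return umis
end

# Invented list: MethodInstances followed by a tag string or a depth integer;
# "some_method" is a placeholder for the Method that caused the invalidation
invlist = Any[FakeMI("g(::Int)"), "invalidate_mt_cache",
              FakeMI("applyf(::Vector{Any})"), 1,
              FakeMI("applyf(::Vector{Any})"), 1,
              "some_method", "jl_method_table_insert"]

sketch_uinvalidated(invlist)   # Set containing only FakeMI("applyf(::Vector{Any})")
```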

SnoopCompile.invalidation_trees — Function

trees = invalidation_trees(list)

Parse list, as captured by SnoopCompileCore.@snoop_invalidations, into a set of invalidation trees, where parent nodes were called by their children.

Example

julia> f(x::Int)  = 1
f (generic function with 1 method)

julia> f(x::Bool) = 2
f (generic function with 2 methods)

julia> applyf(container) = f(container[1])
applyf (generic function with 1 method)

julia> callapplyf(container) = applyf(container)
callapplyf (generic function with 1 method)

julia> c = Any[1]
1-element Array{Any,1}:
 1

julia> callapplyf(c)
1

julia> trees = invalidation_trees(@snoop_invalidations f(::AbstractFloat) = 3)
1-element Array{SnoopCompile.MethodInvalidations,1}:
 inserting f(::AbstractFloat) in Main at REPL[36]:1 invalidated:
   mt_backedges: 1: signature Tuple{typeof(f),Any} triggered MethodInstance for applyf(::Array{Any,1}) (1 children) more specific

See the documentation for further details.

SnoopCompile.precompile_blockers — Function

staletrees = precompile_blockers(invalidations, tinf::InferenceTimingNode)

Select just those invalidations that contribute to "stale nodes" in tinf, and link them together. This can allow one to identify specific blockers of precompilation for particular MethodInstances.

Example

using SnoopCompileCore
invalidations = @snoop_invalidations using PkgA, PkgB;
using SnoopCompile
trees = invalidation_trees(invalidations)
tinf = @snoop_inference begin
    some_workload()
end
staletrees = precompile_blockers(trees, tinf)

In many cases, this reduces the number of invalidations that require analysis by one or more orders of magnitude.

Info

precompile_blockers is experimental and has not yet been thoroughly vetted by real-world use. Users are encouraged to try it and report any "misses" or unnecessary "hits."

SnoopCompile.filtermod — Function

modtrigs = filtermod(mod::Module, mtrigs::AbstractVector{MethodTriggers})

Select just the method-based triggers arising from a particular module.

thinned = filtermod(module, trees::AbstractVector{MethodInvalidations}; recursive=false)

Select just the cases of invalidating a method defined in module.

If recursive is false, only the roots of trees are examined (i.e., the proximal source of the invalidation must be in module). If recursive is true, then thinned contains all routes to a method in module.

SnoopCompile.findcaller — Function

methinvs = findcaller(method::Method, trees)

Find a path through trees that reaches method. Returns a single MethodInvalidations object.

Examples

Suppose you know that loading package SomePkg triggers invalidation of f(data). You can find the specific source of invalidation as follows:

f(data)                             # run once to force compilation
m = @which f(data)
using SnoopCompile
trees = invalidation_trees(@snoop_invalidations using SomePkg)
methinvs = findcaller(m, trees)

If you don't know which method to look for, but know some operation that has had added latency, you can look for methods using @snoopi. For example, suppose that loading SomePkg makes the next using statement slow. You can find the source of trouble with

julia> using SnoopCompile

julia> trees = invalidation_trees(@snoop_invalidations using SomePkg);

julia> tinf = @snoopi using SomePkg            # this second `using` will need to recompile code invalidated above
1-element Array{Tuple{Float64,Core.MethodInstance},1}:
 (0.08518409729003906, MethodInstance for require(::Module, ::Symbol))

julia> m = tinf[1][2].def
require(into::Module, mod::Symbol) in Base at loading.jl:887

julia> findcaller(m, trees)
inserting ==(x, y::SomeType) in SomeOtherPkg at /path/to/code:100 invalidated:
   backedges: 1: superseding ==(x, y) in Base at operators.jl:83 with MethodInstance for ==(::Symbol, ::Any) (16 children) more specific

SnoopCompile.report_invalidations — Function

report_invalidations(
    io::IO = stdout;
    invalidations,
    n_rows::Int = 10,
    process_filename::Function = x -> x,
)

Print a tabular summary of invalidations given:

- invalidations: the output of SnoopCompileCore.@snoop_invalidations

and (optionally)

- io::IO: the IO stream to print to (defaults to stdout)
- n_rows::Int: the number of rows to display in the truncated table. A value of 0 indicates no truncation; a positive value truncates the table to the specified number of rows.
- process_filename(::String)::String: a function to post-process each filename in which invalidations are found

Example usage

import SnoopCompileCore
invalidations = SnoopCompileCore.@snoop_invalidations begin

    # load packages & define any additional methods

end;

using SnoopCompile
using PrettyTables # to load report_invalidations
report_invalidations(;invalidations)

Using report_invalidations requires that you first load the PrettyTables.jl package.

Analysis of @snoop_inference

SnoopCompile.flatten — Function

flatten(tinf; tmin = 0.0, sortby=exclusive)

Flatten the execution graph of InferenceTimingNodes returned from @snoop_inference into a Vector of InferenceTiming frames, each encoding the time needed for inference of a single MethodInstance. By default, results are sorted by exclusive time (the time for inferring the MethodInstance itself, not including any inference of its callees); other options are sortby=inclusive, which includes the time needed for the callees, or sortby=nothing, which returns the frames in the order they were inferred (depth-first order).

Example

We'll use SnoopCompile.flatten_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.flatten_demo()
InferenceTimingNode: 0.002148974/0.002767166 on Core.Compiler.Timings.ROOT() with 1 direct children

julia> using AbstractTrees; print_tree(tinf)
InferenceTimingNode: 0.00242354/0.00303526 on Core.Compiler.Timings.ROOT() with 1 direct children
└─ InferenceTimingNode: 0.000150891/0.000611721 on SnoopCompile.FlattenDemo.packintype(::Int64) with 2 direct children
   ├─ InferenceTimingNode: 0.000105318/0.000105318 on SnoopCompile.FlattenDemo.MyType{Int64}(::Int64) with 0 direct children
   └─ InferenceTimingNode: 9.43e-5/0.000355512 on SnoopCompile.FlattenDemo.dostuff(::SnoopCompile.FlattenDemo.MyType{Int64}) with 2 direct children
      ├─ InferenceTimingNode: 6.6458e-5/0.000124716 on SnoopCompile.FlattenDemo.extract(::SnoopCompile.FlattenDemo.MyType{Int64}) with 2 direct children
      │  ├─ InferenceTimingNode: 3.401e-5/3.401e-5 on getproperty(::SnoopCompile.FlattenDemo.MyType{Int64}, ::Symbol) with 0 direct children
      │  └─ InferenceTimingNode: 2.4248e-5/2.4248e-5 on getproperty(::SnoopCompile.FlattenDemo.MyType{Int64}, x::Symbol) with 0 direct children
      └─ InferenceTimingNode: 0.000136496/0.000136496 on SnoopCompile.FlattenDemo.domath(::Int64) with 0 direct children

Note the printing of getproperty(::SnoopCompile.FlattenDemo.MyType{Int64}, x::Symbol): it shows the specific Symbol, here :x, that getproperty was inferred with. This reflects constant-propagation in inference.

Then:

julia> flatten(tinf; sortby=nothing)
8-element Vector{SnoopCompileCore.InferenceTiming}:
 InferenceTiming: 0.002423543/0.0030352639999999998 on Core.Compiler.Timings.ROOT()
 InferenceTiming: 0.000150891/0.0006117210000000001 on SnoopCompile.FlattenDemo.packintype(::Int64)
 InferenceTiming: 0.000105318/0.000105318 on SnoopCompile.FlattenDemo.MyType{Int64}(::Int64)
 InferenceTiming: 9.43e-5/0.00035551200000000005 on SnoopCompile.FlattenDemo.dostuff(::SnoopCompile.FlattenDemo.MyType{Int64})
 InferenceTiming: 6.6458e-5/0.000124716 on SnoopCompile.FlattenDemo.extract(::SnoopCompile.FlattenDemo.MyType{Int64})
 InferenceTiming: 3.401e-5/3.401e-5 on getproperty(::SnoopCompile.FlattenDemo.MyType{Int64}, ::Symbol)
 InferenceTiming: 2.4248e-5/2.4248e-5 on getproperty(::SnoopCompile.FlattenDemo.MyType{Int64}, x::Symbol)
 InferenceTiming: 0.000136496/0.000136496 on SnoopCompile.FlattenDemo.domath(::Int64)

julia> flatten(tinf; tmin=1e-4)                        # sorts by exclusive time (the time before the '/')
4-element Vector{SnoopCompileCore.InferenceTiming}:
 InferenceTiming: 0.000105318/0.000105318 on SnoopCompile.FlattenDemo.MyType{Int64}(::Int64)
 InferenceTiming: 0.000136496/0.000136496 on SnoopCompile.FlattenDemo.domath(::Int64)
 InferenceTiming: 0.000150891/0.0006117210000000001 on SnoopCompile.FlattenDemo.packintype(::Int64)
 InferenceTiming: 0.002423543/0.0030352639999999998 on Core.Compiler.Timings.ROOT()

julia> flatten(tinf; sortby=inclusive, tmin=1e-4)      # sorts by inclusive time (the time after the '/')
6-element Vector{SnoopCompileCore.InferenceTiming}:
 InferenceTiming: 0.000105318/0.000105318 on SnoopCompile.FlattenDemo.MyType{Int64}(::Int64)
 InferenceTiming: 6.6458e-5/0.000124716 on SnoopCompile.FlattenDemo.extract(::SnoopCompile.FlattenDemo.MyType{Int64})
 InferenceTiming: 0.000136496/0.000136496 on SnoopCompile.FlattenDemo.domath(::Int64)
 InferenceTiming: 9.43e-5/0.00035551200000000005 on SnoopCompile.FlattenDemo.dostuff(::SnoopCompile.FlattenDemo.MyType{Int64})
 InferenceTiming: 0.000150891/0.0006117210000000001 on SnoopCompile.FlattenDemo.packintype(::Int64)
 InferenceTiming: 0.002423543/0.0030352639999999998 on Core.Compiler.Timings.ROOT()

As you can see, sortby affects not just the order but also the selection of frames; with exclusive times, dostuff did not on its own rise above threshold, but it does when using inclusive times.
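The selection rule can be reproduced with a toy model, assuming (as the paragraph above describes) that the same accessor is used both to threshold against tmin and to sort. The tuples copy the timings from the example; toy_flatten, toy_exclusive and toy_inclusive are hypothetical helpers rather than SnoopCompile's implementation, and the sortby=nothing case is not modeled.

```julia
# Hypothetical (name, exclusive, inclusive) records in depth-first order,
# with timings copied from the flatten(tinf; sortby=nothing) output above
frames = [
    ("ROOT()",                                  0.002423543, 0.003035264),
    ("packintype(::Int64)",                     0.000150891, 0.000611721),
    ("MyType{Int64}(::Int64)",                  0.000105318, 0.000105318),
    ("dostuff(::MyType{Int64})",                9.43e-5,     0.000355512),
    ("extract(::MyType{Int64})",                6.6458e-5,   0.000124716),
    ("getproperty(::MyType{Int64}, ::Symbol)",  3.401e-5,    3.401e-5),
    ("getproperty(::MyType{Int64}, x::Symbol)", 2.4248e-5,   2.4248e-5),
    ("domath(::Int64)",                         0.000136496, 0.000136496),
]
toy_exclusive(f) = f[2]
toy_inclusive(f) = f[3]

# The same accessor both selects (time >= tmin) and orders (ascending) the frames
toy_flatten(frames; tmin = 0.0, sortby = toy_exclusive) =
    sort(filter(f -> sortby(f) >= tmin, frames); by = sortby)

first.(toy_flatten(frames; tmin = 1e-4))
# ["MyType{Int64}(::Int64)", "domath(::Int64)", "packintype(::Int64)", "ROOT()"]

first.(toy_flatten(frames; tmin = 1e-4, sortby = toy_inclusive))
# ["MyType{Int64}(::Int64)", "extract(::MyType{Int64})", "domath(::Int64)",
#  "dostuff(::MyType{Int64})", "packintype(::Int64)", "ROOT()"]
```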

See also: accumulate_by_source.

SnoopCompileCore.exclusive — Function

exclusive(frame)

Return the time spent inferring frame, not including the time needed for any of its callees.

SnoopCompileCore.inclusive — Function

inclusive(frame)

Return the time spent inferring frame and its callees.

SnoopCompile.accumulate_by_source — Function

accumulate_by_source(flattened; tmin = 0.0, by=exclusive)

Add the inference timings for all MethodInstances of a single Method together. flattened is the output of flatten. Returns a list of (t, method) tuples.

When the accumulated time for a Method is large, but each instance is small, it indicates that it is being inferred for many specializations (which might include specializations with different constants).
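A minimal sketch of the aggregation, using hypothetical (method, time) pairs in place of flattened InferenceTiming frames; toy_accumulate is an illustrative name, not the library function. The two getproperty times are the exclusive times from the flatten example above.

```julia
# Hypothetical (method, exclusive time) pairs: several MethodInstances can share one Method
flat = [
    ("getproperty(x, f::Symbol)", 3.401e-5),
    ("getproperty(x, f::Symbol)", 2.4248e-5),
    ("domath(x)", 0.000136496),
]

function toy_accumulate(flat)
    totals = Dict{String,Float64}()
    for (method, t) in flat
        totals[method] = get(totals, method, 0.0) + t   # sum all instances of one method
    end
    # (t, method) tuples, smallest total first
    return sort([(t, m) for (m, t) in totals])
end

toy_accumulate(flat)
# [(≈5.8258e-5, "getproperty(x, f::Symbol)"), (0.000136496, "domath(x)")]
```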

Example

We'll use SnoopCompile.flatten_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.flatten_demo()
InferenceTimingNode: 0.004978/0.005447 on Core.Compiler.Timings.ROOT() with 1 direct children

julia> accumulate_by_source(flatten(tinf))
7-element Vector{Tuple{Float64, Union{Method, Core.MethodInstance}}}:
 (4.6294999999999996e-5, getproperty(x, f::Symbol) @ Base Base.jl:37)
 (5.8965e-5, dostuff(y) @ SnoopCompile.FlattenDemo ~/.julia/dev/SnoopCompile/src/inference_demos.jl:45)
 (6.4141e-5, extract(y::SnoopCompile.FlattenDemo.MyType) @ SnoopCompile.FlattenDemo ~/.julia/dev/SnoopCompile/src/inference_demos.jl:36)
 (8.9997e-5, (var"#ctor-self#"::Type{SnoopCompile.FlattenDemo.MyType{T}} where T)(x) @ SnoopCompile.FlattenDemo ~/.julia/dev/SnoopCompile/src/inference_demos.jl:35)
 (9.2256e-5, domath(x) @ SnoopCompile.FlattenDemo ~/.julia/dev/SnoopCompile/src/inference_demos.jl:41)
 (0.000117514, packintype(x) @ SnoopCompile.FlattenDemo ~/.julia/dev/SnoopCompile/src/inference_demos.jl:37)
 (0.004977755, ROOT() @ Core.Compiler.Timings compiler/typeinfer.jl:79)

Compared to the output from flatten, the two inference passes on getproperty have been consolidated into a single aggregate entry.

mtrigs = accumulate_by_source(Method, itrigs::AbstractVector{InferenceTrigger})

Consolidate inference triggers via their caller method. mtrigs is a vector of Method=>list pairs, where list is a list of InferenceTriggers.

loctrigs = accumulate_by_source(itrigs::AbstractVector{InferenceTrigger})

Aggregate inference triggers by location (function, file, and line number) of the caller.

Example

We collect data using the SnoopCompile.itrigs_demo:

julia> itrigs = inference_triggers(SnoopCompile.itrigs_demo())
2-element Vector{InferenceTrigger}:
 Inference triggered to call MethodInstance for double(::UInt8) from calldouble1 (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:762) inlined into MethodInstance for calldouble2(::Vector{Vector{Any}}) (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:763)
 Inference triggered to call MethodInstance for double(::Float64) from calldouble1 (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:762) inlined into MethodInstance for calldouble2(::Vector{Vector{Any}}) (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:763)

julia> accumulate_by_source(itrigs)
1-element Vector{SnoopCompile.LocationTriggers}:
    calldouble1 at /pathto/SnoopCompile/src/parcel_snoop_inference.jl:762 (2 callees from 1 callers)

SnoopCompile.collect_for — Function

list = collect_for(m::Method, tinf::InferenceTimingNode)
list = collect_for(m::MethodInstance, tinf::InferenceTimingNode)

Collect all InferenceTimingNodes (descendants of tinf) that match m.

SnoopCompile.staleinstances — Function

staleinstances(tinf::InferenceTimingNode)

Return a list of InferenceTimingNodes corresponding to MethodInstances that have "stale" code (specifically, CodeInstances with outdated max_world world ages). These may be a hint that invalidation occurred while running the workload provided to @snoop_inference, and consequently an important origin of (re)inference.

Warning

staleinstances only looks retrospectively for stale code; it does not distinguish whether the code became stale while running @snoop_inference from whether it was already stale before execution commenced.

While staleinstances is recommended as a useful "sanity check" to run before performing a detailed analysis of inference, any serious examination of invalidation should use @snoop_invalidations.

For more information about world age, see https://docs.julialang.org/en/v1/manual/methods/#Redefining-Methods.

SnoopCompile.inference_triggers — Function

itrigs = inference_triggers(tinf::InferenceTimingNode; exclude_toplevel=true)

Collect the "triggers" of inference, each a fresh entry into inference via a call dispatched at runtime. All the entries in itrigs are previously uninferred, or are freshly-inferred for specific constant inputs.

exclude_toplevel determines whether calls made from the REPL, include, or test suites are excluded.
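For a sense of what produces such triggers, here is a small workload in the spirit of SnoopCompile.itrigs_demo. The function names mirror the demo's output shown below, but the bodies are a guess, not the demo's source. Because the innermost containers have element type Any, the call to double inside calldouble1 must be dispatched at runtime.

```julia
# Hypothetical workload in the spirit of SnoopCompile.itrigs_demo
double(x::Real) = 2x

# c has eltype Any, so the type of c[1] is unknown when calldouble1 is compiled;
# the call to double is dispatched at runtime, and each new argument type may trigger inference
calldouble1(c) = double(c[1])
calldouble2(cc) = calldouble1(cc[1])
calleach(ccs) = Any[calldouble2(cc) for cc in ccs]

ccs = [[Any[1.0]], [Any[0x02]]]   # a Vector{Vector{Vector{Any}}}
calleach(ccs)                     # Any[2.0, 4]
```

In a fresh session, one would profile the first call with tinf = @snoop_inference calleach(ccs); (after using SnoopCompileCore) and, once SnoopCompile is loaded, pass tinf to inference_triggers. The runtime-dispatched calls to double(::Float64) and double(::UInt8) are the kind of entries that appear in the example below.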

Example

We'll use SnoopCompile.itrigs_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.itrigs_demo()
InferenceTimingNode: 0.004490576/0.004711168 on Core.Compiler.Timings.ROOT() with 2 direct children

julia> itrigs = inference_triggers(tinf)
2-element Vector{InferenceTrigger}:
 Inference triggered to call MethodInstance for double(::UInt8) from calldouble1 (/pathto/SnoopCompile/src/inference_demos.jl:86) inlined into MethodInstance for calldouble2(::Vector{Vector{Any}}) (/pathto/SnoopCompile/src/inference_demos.jl:87)
 Inference triggered to call MethodInstance for double(::Float64) from calldouble1 (/pathto/SnoopCompile/src/inference_demos.jl:86) inlined into MethodInstance for calldouble2(::Vector{Vector{Any}}) (/pathto/SnoopCompile/src/inference_demos.jl:87)

julia> edit(itrigs[1])     # opens an editor at the spot in the caller

julia> using Cthulhu

julia> ascend(itrigs[2])   # use Cthulhu to inspect the stacktrace (caller is the second item in the trace)
Choose a call for analysis (q to quit):
 >   double(::Float64)
       calldouble1 at /pathto/SnoopCompile/src/inference_demos.jl:86 => calldouble2(::Vector{Vector{Any}}) at /pathto/SnoopCompile/src/inference_demos.jl:87
         calleach(::Vector{Vector{Vector{Any}}}) at /pathto/SnoopCompile/src/inference_demos.jl:88
...

SnoopCompile.trigger_tree — Function

root = trigger_tree(itrigs)

Organize inference triggers itrigs in tree format, grouping items via the call tree.

It is a tree rather than a more general graph because inference results are cached, so each node is visited only once.

SnoopCompile.suggest — Function

suggest(itrig::InferenceTrigger)

Analyze itrig and attempt to suggest an interpretation or remedy. This returns a structure of type Suggested; the easiest thing to do with the result is to show it; however, you can also filter a list of suggestions.

Example

julia> itrigs = inference_triggers(tinf);

julia> sugs = suggest.(itrigs);

julia> sugs_important = filter(!isignorable, sugs)    # discard the ones that probably don't need to be addressed

Warning

Suggestions are approximate at best; most often, the proposed fixes should not be taken literally, but instead taken as a hint about the "outcome" of a particular runtime dispatch incident. The suggestions target calls made with non-inferrable arguments, but often the best place to fix the problem is at an earlier stage in the code, where the argument was first computed.

You can get much deeper insight via ascend (and Cthulhu generally), and even stacktrace is often useful. Suggestions are intended to be a quick and easier-to-comprehend first pass at analyzing an inference trigger.

SnoopCompile.isignorable — Function

isignorable(s::Suggested)

Returns true if s is unlikely to be an inference problem in need of fixing.

SnoopCompile.callerinstance — Function

mi = callerinstance(itrig::InferenceTrigger)

Return the MethodInstance mi of the caller in the selected stackframe in itrig.

SnoopCompile.callingframe — Function

itrigcaller = callingframe(itrig::InferenceTrigger)

"Step out" one layer of the stacktrace, referencing the caller of the current frame of itrig.

You can retrieve the proximal trigger of inference with InferenceTrigger(itrigcaller).

Example

We collect data using the SnoopCompile.itrigs_demo:

julia> itrig = inference_triggers(SnoopCompile.itrigs_demo())[1]
Inference triggered to call MethodInstance for double(::UInt8) from calldouble1 (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:762) inlined into MethodInstance for calldouble2(::Vector{Vector{Any}}) (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:763)

julia> itrigcaller = callingframe(itrig)
Inference triggered to call MethodInstance for double(::UInt8) from calleach (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:764) with specialization MethodInstance for calleach(::Vector{Vector{Vector{Any}}})

SnoopCompile.skiphigherorder — Function

itrignew = skiphigherorder(itrig; exact::Bool=false)

Attempt to skip over frames of higher-order functions that take the callee as a function-argument. This can be useful if you're analyzing inference triggers for an entire package and would prefer to assign triggers to package-code rather than Base functions like map!, broadcast, etc.
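To make "function-argument" concrete, here is a hypothetical workload shaped like the frames in the example below (mymap!, mymap, callmymap); the bodies are a guess, not the demo's source. mymap! and mymap both receive double as the argument f, so skiphigherorder would be expected to step past them to callmymap, the first frame that names double itself.

```julia
# Hypothetical workload in the spirit of SnoopCompile.itrigs_higherorder_demo
double(x::Real) = 2x

# Higher-order functions: the callee `f` arrives as a function-argument
function mymap!(f, dst, src)
    for i in eachindex(dst, src)
        dst[i] = f(src[i])   # src is a Vector{Any}, so this call is dispatched at runtime
    end
    return dst
end
mymap(f, src) = mymap!(f, similar(src), src)

# The first frame that does not receive `double` as an argument
callmymap(src) = mymap(double, src)

callmymap(Any[1.0, 0x02])   # Any[2.0, 4]
```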

Example

We collect data using the SnoopCompile.itrigs_higherorder_demo:

julia> itrig = inference_triggers(SnoopCompile.itrigs_higherorder_demo())[1]
Inference triggered to call MethodInstance for double(::Float64) from mymap! (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:706) with specialization MethodInstance for mymap!(::typeof(SnoopCompile.ItrigHigherOrderDemo.double), ::Vector{Any}, ::Vector{Any})

julia> callingframe(itrig)      # step out one (non-inlined) frame
Inference triggered to call MethodInstance for double(::Float64) from mymap (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:710) with specialization MethodInstance for mymap(::typeof(SnoopCompile.ItrigHigherOrderDemo.double), ::Vector{Any})

julia> skiphigherorder(itrig)   # step out to frame that doesn't have `double` as a function-argument
Inference triggered to call MethodInstance for double(::Float64) from callmymap (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:711) with specialization MethodInstance for callmymap(::Vector{Any})

Warning

By default skiphigherorder is conservative, and insists on being sure that it's the callee being passed to the higher-order function. Higher-order functions that do not get specialized (e.g., with ::Function argument types) will not be skipped over. You can pass exact=false to allow ::Function to also be passed over, but keep in mind that this may falsely skip some frames.

SnoopCompile.InferenceTrigger — Type

InferenceTrigger(callee::MethodInstance, callerframes::Vector{StackFrame}, btidx::Int, bt)

Organize information about the "triggers" of inference. callee is the MethodInstance requiring inference; callerframes, btidx and bt contain information about the caller. callerframes are the frame(s) of the call site that triggered inference; it's a Vector{StackFrame}, rather than a single StackFrame, due to the possibility that the caller was inlined into something else, in which case the first entry is the direct caller and the last entry corresponds to the MethodInstance into which it was ultimately inlined. btidx is the index in bt, the backtrace collected upon entry into inference, corresponding to callerframes.

InferenceTriggers are created by calling inference_triggers. See also: callerinstance and callingframe.

SnoopCompile.runtime_inferencetime — Function

ridata = runtime_inferencetime(tinf::InferenceTimingNode; consts=true, by=inclusive)
ridata = runtime_inferencetime(tinf::InferenceTimingNode, profiledata; lidict, consts=true, by=inclusive)

Compare runtime and inference-time on a per-method basis. ridata[m::Method] returns (trun, tinfer, nspecializations), measuring the approximate amount of time spent running m, inferring m, and the number of type-specializations, respectively. trun is estimated from profiling data, which the user is responsible for capturing before the call. Typically tinf is collected via @snoop_inference on the first call (in a fresh session) to a workload, and the profiling data collected on a subsequent call. In some cases you may need to repeat the workload several times to collect enough profiling samples.

profiledata and lidict are obtained from Profile.retrieve().

SnoopCompile.parcel — Function

ttot, pcs = SnoopCompile.parcel(tinf::InferenceTimingNode)

Parcel the "root-most" precompilable MethodInstances into separate modules. These can be used to generate precompile directives to cache the results of type-inference, reducing latency on first use.

Loosely speaking, a MethodInstance is precompilable if the module that owns the method also has access to all the types it needs to precompile the instance. When the root node of an entrance to inference is not itself precompilable, parcel examines the children (and possibly, children's children...) until it finds the first node on each branch that is precompilable. MethodInstances are then assigned to the module that owns the method.

ttot is the total inference time; pcs is a list of module => (tmod, pclist) pairs. For each module, tmod is the amount of inference time affiliated with methods owned by that module; pclist is a list of (t, mi) time/MethodInstance tuples.
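A hypothetical sketch of this bookkeeping, using Symbols in place of modules and strings in place of MethodInstances; toy_parcel is an illustrative name. The two timings are those from the example below, so ttot equals the single module's tmod. Ordering the pairs by tmod is an assumption made here, not a documented guarantee.

```julia
# Hypothetical (time, MethodInstance name, owning module) records for precompilable roots
roots = [
    (9.8986e-5,   "double(::Float64)", :ItrigDemo),
    (0.000121606, "double(::UInt8)",   :ItrigDemo),
]

function toy_parcel(roots)
    bymod = Dict{Symbol,Vector{Tuple{Float64,String}}}()
    for (t, mi, owner) in roots
        push!(get!(() -> Tuple{Float64,String}[], bymod, owner), (t, mi))
    end
    # owner => (tmod, pclist), where tmod sums the times of that module's entries
    pcs = [owner => (sum(first, pcl), sort(pcl)) for (owner, pcl) in bymod]
    ttot = sum(p -> p.second[1], pcs; init = 0.0)
    return ttot, sort(pcs; by = p -> p.second[1])   # ordered by tmod (an assumption)
end

ttot, pcs = toy_parcel(roots)   # ttot ≈ 0.000220592
```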

See also: SnoopCompile.write.

Example

We'll use SnoopCompile.itrigs_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.itrigs_demo()
InferenceTimingNode: 0.004490576/0.004711168 on Core.Compiler.Timings.ROOT() with 2 direct children

julia> ttot, pcs = SnoopCompile.parcel(tinf);

julia> ttot
0.000220592

julia> pcs
1-element Vector{Pair{Module, Tuple{Float64, Vector{Tuple{Float64, Core.MethodInstance}}}}}:
 SnoopCompile.ItrigDemo => (0.000220592, [(9.8986e-5, MethodInstance for double(::Float64)), (0.000121606, MethodInstance for double(::UInt8))])

Since there was only one module, ttot is the same as tmod. The ItrigDemo module had two precompilable MethodInstances, each listed with its corresponding inclusive time.

modtrigs = SnoopCompile.parcel(mtrigs::AbstractVector{MethodTriggers})

Split method-based triggers into collections organized by the module in which the methods were defined. Returns a module => list vector, with the module having the most MethodTriggers last.

SnoopCompile.write — Function

write(prefix::AbstractString, pc; always::Bool=false, suppress_time::Bool=false)

Write each module's precompiles to a separate file. If always is true, the generated function will always run the precompile statements when called; otherwise the statements will only be called during package precompilation.

When writing results from parceling @snoop_inference, by default SnoopCompile appends the time taken to precompile each statement to the generated file. If suppress_time is true, this information will be omitted.

SnoopCompile 3.1

The suppress_time keyword argument was added in SnoopCompile 3.1.

SnoopCompile.isprecompilable — Function

isprecompilable(mod::Module, mi::MethodInstance)
isprecompilable(mi::MethodInstance; excluded_modules=Set([Main::Module]))

Determine whether mi is able to be precompiled within mod. This requires that all the types in mi's specialization signature are "known" to mod. See SnoopCompile.known_type for more information.

isprecompilable(mi) sets mod to the module in which the corresponding method was defined. If mod ∈ excluded_modules, then isprecompilable returns false.

If mi has been compiled by the time its defining module "closes" (the final end of the module definition) and isprecompilable(mi) returns true, then Julia will automatically include this specialization in that module's precompile cache.

Tip

If mi is a MethodInstance corresponding to f(::T), then calling f(x::T) before the end of the module definition suffices to force compilation of mi. Alternatively, use precompile(f, (T,)).
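A minimal sketch of both options from the tip, with hypothetical names (TipDemo, Point, norm1). The specialization only ends up in a precompile cache when a module like this is a package being precompiled; evaluated in a script or at the REPL, the code simply compiles norm1(::Point) in the current session.

```julia
module TipDemo

struct Point            # a type owned by this module
    x::Float64
end

norm1(p::Point) = abs(p.x)

# Option 1: request compilation of norm1(::Point) explicitly;
# precompile returns true when the signature could be compiled
const compiled = precompile(norm1, (Point,))

# Option 2: simply call the method before the module closes
norm1(Point(-2.0))

end # module TipDemo
```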

If you'd like to cache it but isprecompilable(mi) returns false, you need to identify a module mod for which isprecompilable(mod, mi) returns true. However, just ensuring that mi gets compiled within mod may not be sufficient to ensure that it gets retained in the cache: by default, Julia will omit it from the cache if none of the types are "owned" by that module. (For example, if mod didn't define the method, and all the types in mi's signature come from other modules imported by mod, then mod does not "own" any aspect of mi.) To force it to be retained, ensure it gets called (for the first time) within a PrecompileTools.@compile_workload block. (This is the main purpose of PrecompileTools.)

Examples

julia> module A
       a(x) = x
       end
Main.A

julia> module B
       using ..A
       struct BType end    # this type is not known to A
       b(x) = x
       end
Main.B

Now let's run these methods to generate some compiled MethodInstances:

julia> A.a(3.2)          # Float64 is not "owned" by A, but A loads Base so A knows about it
3.2

julia> A.a(B.BType())    # B.BType is not known to A
Main.B.BType()

julia> B.b(B.BType())    # B knows about B.BType
Main.B.BType()

julia> mia1, mia2 = Base.specializations(only(methods(A.a)));

julia> @show mia1 SnoopCompile.isprecompilable(mia1);
mia1 = MethodInstance for Main.A.a(::Float64)
SnoopCompile.isprecompilable(mia1) = true

julia> @show mia2 SnoopCompile.isprecompilable(mia2);
mia2 = MethodInstance for Main.A.a(::Main.B.BType)
SnoopCompile.isprecompilable(mia2) = false

julia> mib = only(Base.specializations(only(methods(B.b))))
MethodInstance for Main.B.b(::Main.B.BType)

julia> SnoopCompile.isprecompilable(mib)
true

julia> SnoopCompile.isprecompilable(A, mib)
false

SnoopCompile.known_type — Function

known_type(mod::Module, T::Union{Type,TypeVar})

Returns true if the type T is "known" to the module mod, meaning that one could have written a function with signature f(x::T) in mod without getting an error.
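To make the notion of "known" more concrete, here is a deliberately simplified approximation; it is not SnoopCompile's implementation. It assumes a type is known to mod if the module that defines it is mod, Core, Base, or is bound by name in mod, and it recurses into type parameters. The function naive_known_type is hypothetical; the modules A and B mirror the isprecompilable example.

```julia
module A
a(x) = x
end

module B
using ..A
struct BType end
b(x) = x
end

# Hypothetical, simplified stand-in for SnoopCompile.known_type.
function naive_known_type(mod::Module, @nospecialize(T))
    T isa TypeVar && return naive_known_type(mod, T.ub)
    T isa Union && return naive_known_type(mod, T.a) && naive_known_type(mod, T.b)
    T isa UnionAll && return naive_known_type(mod, Base.unwrap_unionall(T))
    T isa DataType || return true          # e.g. the value 1 in Vector{T} == Array{T,1}
    owner = parentmodule(T)                # module where the type was defined
    visible = owner === mod || owner === Core || owner === Base ||
              isdefined(mod, nameof(owner))
    visible || return false
    # Every type parameter must also be known to mod
    return all(p -> naive_known_type(mod, p), T.parameters)
end
```

A real check has to cope with more cases (e.g. renamed or nested imports), which is why the library function should be preferred.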

SnoopCompile.report_callee — Function

To use report_callee, first load JET (using JET).

SnoopCompile.report_callees — Function

To use report_callees, first load JET (using JET).

SnoopCompile.report_caller — Function

To use report_caller, first load JET (using JET).

Analysis of LLVM

SnoopCompile.read_snoop_llvm — Function

times, info = SnoopCompile.read_snoop_llvm("func_names.csv", "llvm_timings.yaml"; tmin_secs=0.0)

Read the log files produced by the compiler during @snoop_llvm and return structured representations of their contents.

The results will only contain LLVM modules that took longer than tmin_secs to optimize.

Return value

times contains the time spent optimizing each module, as a Pair from the time to an array of Strings, one for every MethodInstance in that LLVM module.

info is a Dict containing statistics for each MethodInstance encountered, from before and after optimization, including the number of instructions and the number of basic blocks.
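Once you have times and info, ordinary Julia is enough to summarize them. The sketch below builds small stand-in values with the same general shape, taking the times and instruction counts from the example output below; the MethodInstance signatures are abbreviated to function names, and only the instruction counts are kept because the displayed basic-block counts are truncated.

```julia
# Stand-in data shaped like read_snoop_llvm's output (abbreviated keys).
times = Pair{Float64,Vector{String}}[
    0.028170923 => ["copy_transpose!"],
    0.031356962 => ["copyto!"],
    0.149138788 => ["_generic_matmatmul!"],
]
info = Dict(
    "copy_transpose!"     => (before = (instructions = 651,),  after = (instructions = 348,)),
    "copyto!"             => (before = (instructions = 617,),  after = (instructions = 397,)),
    "_generic_matmatmul!" => (before = (instructions = 4796,), after = (instructions = 1421,)),
)

# Total LLVM optimization time across the retained modules
total_time = sum(first, times)

# The slowest module and the MethodInstances it contained
slowest_time, slowest_mis = last(sort(times; by = first))

# Fraction of instructions remaining after optimization, per MethodInstance
shrinkage = Dict(name => stats.after.instructions / stats.before.instructions
                 for (name, stats) in info)
```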

Example

julia> @snoop_llvm "func_names.csv" "llvm_timings.yaml" begin
           using InteractiveUtils
           @eval InteractiveUtils.peakflops()
       end
Launching new julia process to run commands...
done.

julia> times, info = SnoopCompile.read_snoop_llvm("func_names.csv", "llvm_timings.yaml", tmin_secs = 0.025);

julia> times
3-element Vector{Pair{Float64, Vector{String}}}:
 0.028170923 => ["Tuple{typeof(LinearAlgebra.copy_transpose!), Array{Float64, 2}, Base.UnitRange{Int64}, Base.UnitRange{Int64}, Array{Float64, 2}, Base.UnitRange{Int64}, Base.UnitRange{Int64}}"]
 0.031356962 => ["Tuple{typeof(Base.copyto!), Array{Float64, 2}, Base.UnitRange{Int64}, Base.UnitRange{Int64}, Array{Float64, 2}, Base.UnitRange{Int64}, Base.UnitRange{Int64}}"]
 0.149138788 => ["Tuple{typeof(LinearAlgebra._generic_matmatmul!), Array{Float64, 2}, Char, Char, Array{Float64, 2}, Array{Float64, 2}, LinearAlgebra.MulAddMul{true, true, Bool, Bool}}"]

julia> info
Dict{String, NamedTuple{(:before, :after), Tuple{NamedTuple{(:instructions, :basicblocks), Tuple{Int64, Int64}}, NamedTuple{(:instructions, :basicblocks), Tuple{Int64, Int64}}}}} with 3 entries:
  "Tuple{typeof(LinearAlgebra.copy_transpose!), Ar… => (before = (instructions = 651, basicblocks = 83), after = (instructions = 348, basicblocks = 40…
  "Tuple{typeof(Base.copyto!), Array{Float64, 2}, … => (before = (instructions = 617, basicblocks = 77), after = (instructions = 397, basicblocks = 37…
  "Tuple{typeof(LinearAlgebra._generic_matmatmul!)… => (before = (instructions = 4796, basicblocks = 824), after = (instructions = 1421, basicblocks =…

Demos

SnoopCompile.flatten_demo — Function

tinf = SnoopCompile.flatten_demo()

A simple demonstration of @snoop_inference. This demo defines a module

module FlattenDemo
    struct MyType{T} x::T end
    extract(y::MyType) = y.x
    function packintype(x)
        y = MyType{Int}(x)
        return dostuff(y)
    end
    function domath(x)
        y = x + x
        return y*x + 2*x + 5
    end
    dostuff(y) = domath(extract(y))
end

It then "warms up" (forces inference on) all of Julia's Base methods needed for domath, to ensure that these MethodInstances do not need to be inferred when we collect the data. It then returns the results of

@snoop_inference FlattenDemo.packintype(1)

See flatten for an example usage.
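The demo module itself is ordinary Julia and can be explored without SnoopCompile. This sketch reproduces it and checks the call that the demo profiles; Base.return_types confirms that the whole call chain starting at packintype(::Int) is concretely inferable, so inference of that entry point can reach all of its callees.

```julia
module FlattenDemo
    struct MyType{T}
        x::T
    end
    extract(y::MyType) = y.x
    function packintype(x)
        y = MyType{Int}(x)
        return dostuff(y)
    end
    function domath(x)
        y = x + x
        return y*x + 2*x + 5
    end
    dostuff(y) = domath(extract(y))
end

result = FlattenDemo.packintype(1)   # domath(1) = 2*1 + 2*1 + 5
rt = only(Base.return_types(FlattenDemo.packintype, (Int,)))
```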

SnoopCompile.itrigs_demo — Function

tinf = SnoopCompile.itrigs_demo()

A simple demonstration of collecting inference triggers. This demo defines a module

module ItrigDemo
@noinline double(x) = 2x
@inline calldouble1(c) = double(c[1])
calldouble2(cc) = calldouble1(cc[1])
calleach(ccs) = (calldouble2(ccs[1]), calldouble2(ccs[2]))
end

It then "warms up" (forces inference on) calldouble2(::Vector{Vector{Any}}), calldouble1(::Vector{Any}), double(::Int):

cc = [Any[1]]
ItrigDemo.calleach([cc,cc])

Then it collects and returns inference data using

cc1, cc2 = [Any[0x01]], [Any[1.0]]
@snoop_inference ItrigDemo.calleach([cc1, cc2])

This does not require any new inference for calldouble2 or calldouble1, but it does force inference on double for two new argument types (UInt8 and Float64). See inference_triggers for details on what gets collected and returned.
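Why does double need fresh inference? Inside calldouble1, c[1] comes from a Vector{Any}, so inference can only say it is Any, and the call to the @noinline function double must be resolved by runtime dispatch on the actual element type. The sketch below checks this with Base.return_types and runs the same call the demo profiles (without SnoopCompile, so nothing is recorded).

```julia
module ItrigDemo
@noinline double(x) = 2x
@inline calldouble1(c) = double(c[1])
calldouble2(cc) = calldouble1(cc[1])
calleach(ccs) = (calldouble2(ccs[1]), calldouble2(ccs[2]))
end

# The element type is unknown to inference, so the result is inferred as Any
rt = only(Base.return_types(ItrigDemo.calldouble1, (Vector{Any},)))

# The same call the demo profiles: double is dispatched at runtime on UInt8 and Float64
cc1, cc2 = [Any[0x01]], [Any[1.0]]
result = ItrigDemo.calleach([cc1, cc2])
```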

SnoopCompile.itrigs_higherorder_demo — Function

tinf = SnoopCompile.itrigs_higherorder_demo()

A simple demonstration of handling higher-order methods with inference triggers. This demo defines a module

module ItrigHigherOrderDemo
double(x) = 2x
@noinline function mymap!(f, dst, src)
    for i in eachindex(dst, src)
        dst[i] = f(src[i])
    end
    return dst
end
@noinline mymap(f::F, src) where F = mymap!(f, Vector{Any}(undef, length(src)), src)
callmymap(src) = mymap(double, src)
end

The key feature of this set of definitions is that the function double gets passed as an argument through mymap and mymap! (the latter are higher-order functions).

It then "warms up" (forces inference on) callmymap(::Vector{Any}), mymap(::typeof(double), ::Vector{Any}), mymap!(::typeof(double), ::Vector{Any}, ::Vector{Any}) and double(::Int):

ItrigHigherOrderDemo.callmymap(Any[1, 2])

Then it collects and returns inference data using

@snoop_inference ItrigHigherOrderDemo.callmymap(Any[1.0, 2.0])

which forces inference for double(::Float64).

See skiphigherorder for an example using this demo.
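The sketch below reproduces the demo module without SnoopCompile to show where the runtime dispatch happens. Because src is a Vector{Any}, src[i] is inferred as Any, so f(src[i]) inside mymap! is dispatched at runtime; the method receiving the new type is double, the dispatching call site is in mymap!, and the caller that actually supplied double is callmymap. The overall return type is still inferable as Vector{Any}, since mymap allocates that container.

```julia
module ItrigHigherOrderDemo
double(x) = 2x
@noinline function mymap!(f, dst, src)
    for i in eachindex(dst, src)
        dst[i] = f(src[i])   # src[i]::Any, so this call is dispatched at runtime
    end
    return dst
end
@noinline mymap(f::F, src) where F = mymap!(f, Vector{Any}(undef, length(src)), src)
callmymap(src) = mymap(double, src)
end

rt = only(Base.return_types(ItrigHigherOrderDemo.callmymap, (Vector{Any},)))
out = ItrigHigherOrderDemo.callmymap(Any[1.0, 2.0])   # runs double(::Float64) twice
```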