Using @snoop_inference results to improve inferrability
Throughout this page, we'll use the OptimizeMe demo, which ships with SnoopCompile.
To understand what follows, it's essential to refer to the OptimizeMe source code as you follow along.
julia> using SnoopCompileCore, SnoopCompile # here we need the SnoopCompile path for the next line (normally you should wait until after data collection is complete)
julia> include(joinpath(pkgdir(SnoopCompile), "examples", "OptimizeMe.jl"))
Main.var"Main".OptimizeMe
julia> tinf = @snoop_inference OptimizeMe.main();
lotsa containers:
julia> fg = flamegraph(tinf)
Node(FlameGraphs.NodeData(ROOT() at typeinfer.jl:79, 0x00, 0:1388336678))
If you visualize fg with ProfileView, you may see something like this:
From the standpoint of precompilation, this has some obvious problems:
- even though we called a single method, OptimizeMe.main(), there are many distinct flames separated by blank spaces. This indicates that many calls are being made by runtime dispatch: each separate flame is a fresh entrance into inference.
- several of the flames are marked in red, indicating that they are not naively precompilable (see the Tutorial on @snoop_inference). While @compile_workload can handle these flames, an even more robust solution is to eliminate them altogether.
Our goal will be to improve the design of OptimizeMe to make it more readily precompilable.
Analyzing inference triggers
We'll first extract the "triggers" of inference, which are just a repackaging of part of the information contained within tinf. Specifically, an InferenceTrigger captures callee/caller relationships that straddle a fresh entrance to type-inference, allowing you to identify which calls were made by runtime dispatch and what MethodInstance they called.
julia> itrigs = inference_triggers(tinf)
37-element Vector{InferenceTrigger}: Inference triggered to call Main.var"Main".OptimizeMe.main() from eval (./boot.jl:430) inlined into cd(::Documenter.var"#64#66"{Module}, ::String) (./file.jl:112) Inference triggered to call similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.var"Main".OptimizeMe.Container{Int64}}) from copy (./broadcast.jl:907) inlined into Main.var"Main".OptimizeMe.lotsa_containers() (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:15) Inference triggered to call setindex!(::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Main.var"Main".OptimizeMe.Container{Int64}, ::Int64) from copy (./broadcast.jl:908) inlined into Main.var"Main".OptimizeMe.lotsa_containers() (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:15) Inference triggered to call Base.Broadcast.copyto_nonleaf!(::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64) from copy (./broadcast.jl:914) inlined into Main.var"Main".OptimizeMe.lotsa_containers() (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:15) Inference triggered to call similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.var"Main".OptimizeMe.Container}) from copyto_nonleaf! 
(./broadcast.jl:1083) with specialization Base.Broadcast.copyto_nonleaf!(::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64) Inference triggered to call Base.Broadcast.restart_copyto_nonleaf!(::Vector{Main.var"Main".OptimizeMe.Container}, ::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Main.var"Main".OptimizeMe.Container{UInt8}, ::Int64, ::Base.OneTo{Int64}, ::Int64, ::Int64) from copyto_nonleaf! (./broadcast.jl:1084) with specialization Base.Broadcast.copyto_nonleaf!(::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64) Inference triggered to call show(::IOContext{Base.PipeEndpoint}, ::MIME{Symbol("text/plain")}, ::Vector{Main.var"Main".OptimizeMe.Container}) from display (./multimedia.jl:254) with specialization display(::TextDisplay, ::MIME{Symbol("text/plain")}, ::Any) Inference triggered to call isassigned(::Vector{Main.var"Main".OptimizeMe.Container}, ::Int64, ::Int64) from alignment (./arrayshow.jl:68) with specialization Base.alignment(::IOContext{Base.PipeEndpoint}, ::AbstractVecOrMat, ::Vector{Int64}, ::Vector{Int64}, ::Int64, ::Int64, ::Int64, ::Int64) Inference triggered to call getindex(::Vector{Main.var"Main".OptimizeMe.Container}, ::Int64, ::Int64) from alignment (./arrayshow.jl:69) with specialization Base.alignment(::IOContext{Base.PipeEndpoint}, 
::AbstractVecOrMat, ::Vector{Int64}, ::Vector{Int64}, ::Int64, ::Int64, ::Int64, ::Int64) Inference triggered to call Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{Int64}) from alignment (./arrayshow.jl:69) with specialization Base.alignment(::IOContext{Base.PipeEndpoint}, ::AbstractVecOrMat, ::Vector{Int64}, ::Vector{Int64}, ::Int64, ::Int64, ::Int64, ::Int64) ⋮ Inference triggered to call Base.replace_in_print_matrix(::Vector{Main.var"Main".OptimizeMe.Container}, ::Int64, ::Int64, ::String) from print_matrix_row (./arrayshow.jl:120) with specialization Base.print_matrix_row(::IOContext{Base.PipeEndpoint}, ::AbstractVecOrMat, ::Vector{Tuple{Int64, Int64}}, ::Int64, ::Vector{Int64}, ::String, ::Int64) Inference triggered to call show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{UInt8}) from #sprint#592 (./strings/io.jl:112) with specialization Base.var"#sprint#592"(::IOContext{Base.PipeEndpoint}, ::Int64, ::typeof(sprint), ::Function, ::String, ::Vararg{Any}) Inference triggered to call show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{UInt16}) from #sprint#592 (./strings/io.jl:112) with specialization Base.var"#sprint#592"(::IOContext{Base.PipeEndpoint}, ::Int64, ::typeof(sprint), ::Function, ::String, ::Vararg{Any}) Inference triggered to call show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Float32}) from #sprint#592 (./strings/io.jl:112) with specialization Base.var"#sprint#592"(::IOContext{Base.PipeEndpoint}, ::Int64, ::typeof(sprint), ::Function, ::String, ::Vararg{Any}) Inference triggered to call show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Char}) from #sprint#592 (./strings/io.jl:112) with specialization Base.var"#sprint#592"(::IOContext{Base.PipeEndpoint}, ::Int64, ::typeof(sprint), ::Function, ::String, ::Vararg{Any}) Inference triggered to call show(::IOContext{IOBuffer}, ::String, 
::Main.var"Main".OptimizeMe.Container{Vector{Int64}}) from #sprint#592 (./strings/io.jl:112) with specialization Base.var"#sprint#592"(::IOContext{Base.PipeEndpoint}, ::Int64, ::typeof(sprint), ::Function, ::String, ::Vararg{Any}) Inference triggered to call show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Tuple{String, Int64}}) from #sprint#592 (./strings/io.jl:112) with specialization Base.var"#sprint#592"(::IOContext{Base.PipeEndpoint}, ::Int64, ::typeof(sprint), ::Function, ::String, ::Vararg{Any}) Inference triggered to call Main.var"Main".OptimizeMe.howbig(::Float64) from #1 (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29) with specialization (::Main.var"Main".OptimizeMe.var"#1#2")(::Float64) Inference triggered to call Base.collect_to_with_first!(::Vector{Float64}, ::Float64, ::Base.Generator{Vector{Float64}, Main.var"Main".OptimizeMe.var"#1#2"}, ::Int64) from _collect (./array.jl:810) with specialization Base._collect(::Vector{Float64}, ::Base.Generator{Vector{Float64}, Main.var"Main".OptimizeMe.var"#1#2"}, ::Base.EltypeUnknown, ::Base.HasShape{1})
The number of elements in this Vector{InferenceTrigger} tells you how many calls (1) were made by runtime dispatch and (2) had a callee that had not previously been inferred.
In the REPL, SnoopCompile displays InferenceTriggers with yellow coloration for the callee, red for the caller method, and blue for the caller specialization. This makes it easier to quickly identify the most important information.
A long list might suggest that you'll need to fix each case separately; fortunately, in many cases fixing one problem addresses many others.
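To make "runtime dispatch" concrete, here is a minimal sketch that is independent of OptimizeMe. The functions add1 and first_plus_one are hypothetical, and Base.return_types is an unexported reflection helper used here only to peek at what inference concludes.

```julia
# Hypothetical helper functions, used only for illustration.
add1(x) = x + 1
first_plus_one(v) = add1(v[1])

# With a concretely-typed container, inference knows the element type,
# so the call to `add1` can be resolved at compile time.
rt_concrete = only(Base.return_types(first_plus_one, (Vector{Int},)))

# With `Vector{Any}`, the element type is unknown until runtime, so the
# call to `add1` has to be made by runtime dispatch.
rt_abstract = only(Base.return_types(first_plus_one, (Vector{Any},)))

println(rt_concrete)   # Int64 on a 64-bit machine
println(rt_abstract)   # Any
```

Calls of the second kind are the ones that can show up as inference triggers, provided the callee has not already been inferred for the runtime argument types.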
Method triggers
It's usually most convenient to organize the triggers by the method that triggered the need for inference:
julia> mtrigs = accumulate_by_source(Method, itrigs)
11-element Vector{SnoopCompile.TaggedTriggers{Method}}:
 cd(f::Function, dir::AbstractString) @ Base.Filesystem file.jl:107 (1 callees from 1 callers)
 print_matrix_row(io::IO, X::AbstractVecOrMat, A::Vector, i::Integer, cols::AbstractVector, sep::AbstractString, idxlast::Integer) @ Base arrayshow.jl:97 (1 callees from 1 callers)
 display(d::TextDisplay, M::MIME{Symbol("text/plain")}, x) @ Base.Multimedia multimedia.jl:254 (1 callees from 1 callers)
 (::Main.var"Main".OptimizeMe.var"#1#2")(x) @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29 (1 callees from 1 callers)
 typeinfo_prefix(io::IO, X) @ Base arrayshow.jl:562 (1 callees from 1 callers)
 _collect(c, itr, ::Base.EltypeUnknown, isz::Union{Base.HasLength, Base.HasShape}) @ Base array.jl:797 (1 callees from 1 callers)
 copyto_nonleaf!(dest, bc::Base.Broadcast.Broadcasted, iter, state, count) @ Base.Broadcast broadcast.jl:1071 (2 callees from 1 callers)
 lotsa_containers() @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:13 (3 callees from 1 callers)
 var"#sprint#592"(context, sizehint::Integer, ::typeof(sprint), f::Function, args...) @ Base strings/io.jl:107 (8 callees from 2 callers)
 alignment(io::IO, X::AbstractVecOrMat, rows::AbstractVector{T}, cols::AbstractVector{V}, cols_if_complete::Integer, cols_otherwise::Integer, sep::Integer, ncols::Integer) where {T, V} @ Base arrayshow.jl:60 (9 callees from 1 callers)
 _show_default(io::IO, x) @ Base show.jl:481 (9 callees from 1 callers)
The methods triggering the largest number of inference runs are shown at the bottom. You can also select methods from a particular module:
julia> modtrigs = filtermod(OptimizeMe, mtrigs)
2-element Vector{SnoopCompile.TaggedTriggers{Method}}:
 (::Main.var"Main".OptimizeMe.var"#1#2")(x) @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29 (1 callees from 1 callers)
 lotsa_containers() @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:13 (3 callees from 1 callers)
Rather than filter by a single module, you can alternatively call SnoopCompile.parcel(mtrigs) to split them out by module. In this case, most of the triggers came from Base, not OptimizeMe. However, many of the failures in Base were nevertheless indirectly due to OptimizeMe: our methods in OptimizeMe call Base methods with arguments that trigger internal inference failures. Fortunately, we'll see that using more careful design in OptimizeMe can avoid many of those problems.
If you have a longer list of inference triggers than you feel comfortable tackling, filtering by your package's module or using precompile_blockers can be a good way to start. Fixing issues in the package itself can end up resolving many of the "indirect" triggers too. Also be sure to note the ability to filter out likely "noise" from test suites.
You can get an overview of each Method trigger with summary:
julia> mtrig = modtrigs[1]
(::Main.var"Main".OptimizeMe.var"#1#2")(x) @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29 (1 callees from 1 callers)
julia> summary(mtrig)
(::Main.var"Main".OptimizeMe.var"#1#2")(x) @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29 had 1 specializations
MethodInstance for (::Main.var"Main".OptimizeMe.var"#1#2")(::Float64) has Core.Box (fix this before tackling other problems, see https://timholy.github.io/SnoopCompile.jl/stable/snoop_invalidations/#Fixing-Core.Box)
Triggering calls:
Line 29: calling howbig (1 instances)
You can also say edit(mtrig) and be taken directly to the method you're analyzing in your editor. You can still "dig deep" into individual triggers:
julia> itrig = mtrig.itrigs[1]
Inference triggered to call Main.var"Main".OptimizeMe.howbig(::Float64) from #1 (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29) with specialization (::Main.var"Main".OptimizeMe.var"#1#2")(::Float64)
This is useful if you want to analyze with Cthulhu.ascend. Method-based triggers, which may aggregate many different individual triggers, can be useful because tools like Cthulhu.jl show you the inference results for the entire MethodInstance, allowing you to fix many different inference problems at once.
Trigger trees
While method triggers are probably the most useful way of organizing these inference triggers, for learning purposes here we'll use a more detailed scheme, which organizes inference triggers in a tree:
julia> itree = trigger_tree(itrigs)
TriggerNode for root with 1 direct children
julia> using AbstractTrees
julia> print_tree(itree)
root └─ Main.var"Main".OptimizeMe.main() ├─ similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.var"Main".OptimizeMe.Container{Int64}}) ├─ setindex!(::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Main.var"Main".OptimizeMe.Container{Int64}, ::Int64) ├─ Base.Broadcast.copyto_nonleaf!(::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64) │ ├─ similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.var"Main".OptimizeMe.Container}) │ └─ Base.Broadcast.restart_copyto_nonleaf!(::Vector{Main.var"Main".OptimizeMe.Container}, ::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Main.var"Main".OptimizeMe.Container{UInt8}, ::Int64, ::Base.OneTo{Int64}, ::Int64, ::Int64) ├─ show(::IOContext{Base.PipeEndpoint}, ::MIME{Symbol("text/plain")}, ::Vector{Main.var"Main".OptimizeMe.Container}) │ ├─ isassigned(::Vector{Main.var"Main".OptimizeMe.Container}, ::Int64, ::Int64) │ ├─ getindex(::Vector{Main.var"Main".OptimizeMe.Container}, ::Int64, ::Int64) │ ├─ Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{Int64}) │ │ └─ show(::IOContext{IOBuffer}, ::Any) │ │ └─ sizeof(::Main.var"Main".OptimizeMe.Container{Int64}) │ ├─ 
Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{UInt8}) │ │ └─ sizeof(::Main.var"Main".OptimizeMe.Container{UInt8}) │ ├─ Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{UInt16}) │ │ ├─ sizeof(::Main.var"Main".OptimizeMe.Container{UInt16}) │ │ └─ show(::IOContext{IOBuffer}, ::UInt16) │ ├─ Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{Float32}) │ │ └─ sizeof(::Main.var"Main".OptimizeMe.Container{Float32}) │ ├─ Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{Char}) │ │ └─ sizeof(::Main.var"Main".OptimizeMe.Container{Char}) │ ├─ Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{Vector{Int64}}) │ │ ├─ sizeof(::Main.var"Main".OptimizeMe.Container{Vector{Int64}}) │ │ └─ Base.typeinfo_eltype(::Type) │ ├─ Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{Tuple{String, Int64}}) │ │ ├─ sizeof(::Main.var"Main".OptimizeMe.Container{Tuple{String, Int64}}) │ │ └─ show(::IOContext{IOBuffer}, ::Tuple{String, Int64}) │ ├─ show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Int64}) │ ├─ Base.replace_in_print_matrix(::Vector{Main.var"Main".OptimizeMe.Container}, ::Int64, ::Int64, ::String) │ ├─ show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{UInt8}) │ ├─ show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{UInt16}) │ ├─ show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Float32}) │ ├─ show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Char}) │ ├─ show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Vector{Int64}}) │ └─ show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Tuple{String, Int64}}) ├─ Main.var"Main".OptimizeMe.howbig(::Float64) └─ Base.collect_to_with_first!(::Vector{Float64}, ::Float64, 
::Base.Generator{Vector{Float64}, Main.var"Main".OptimizeMe.var"#1#2"}, ::Int64)
This gives you a big-picture overview of how the inference failures arose. The parent-child relationships are based on the backtraces at the entrance to inference, and the nodes are organized in the order in which inference occurred. Inspection of these trees can be informative; for example, here we notice a lot of method specializations for Container{T} for different T.
We're going to march through these systematically.
suggest and fixing Core.Box
You may have noticed above that summary(mtrig) generated a red has Core.Box message. Assuming that itrig is still the first (and it turns out, only) trigger from this method, let's look at this again, explicitly using suggest, the tool that generated this hint:
julia> suggest(itrig)
/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29: has Core.Box (fix this before tackling other problems, see https://timholy.github.io/SnoopCompile.jl/stable/snoop_invalidations/#Fixing-Core.Box)
You can see that SnoopCompile recommends tackling this first; depending on how much additional code is affected, fixing a Core.Box allows inference to work better and may resolve other triggers.
This message also directs readers to a section of this documentation that links to a page of the Julia manual describing the underlying problem. The Julia documentation suggests a couple of fixes, of which the best (in this case) is to use the let statement to rebind the variable and end any "conflict" with the closure:
function abmult(r::Int, ys)
    if r < 0
        r = -r
    end
    let r = r    # Julia #15276
        return map(x -> howbig(r * x), ys)
    end
end
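As a rough way to see the effect of the let rebinding without SnoopCompile, you can inspect the field types of the closure object. This is a minimal sketch with hypothetical function names (scale_boxed, scale_rebound); it assumes current Julia releases, where a captured variable that is reassigned gets stored in a Core.Box.

```julia
# Hypothetical example: the closure captures `r`, which may be reassigned.
function scale_boxed(r::Int)
    if r < 0
        r = -r          # `r` is reassigned and then captured below
    end
    return x -> r * x
end

# Same logic, but `let` creates a fresh binding that is never reassigned.
function scale_rebound(r::Int)
    if r < 0
        r = -r
    end
    let r = r
        return x -> r * x
    end
end

f = scale_boxed(-2)
g = scale_rebound(-2)

# The captured variables are stored as fields of the closure object.
println(fieldtypes(typeof(f)))   # (Core.Box,)
println(fieldtypes(typeof(g)))   # (Int64,) on a 64-bit machine
println(f(3) == g(3) == 6)       # true: both compute the same result
```

Both closures return the same values; the difference is that the second one stores a concretely-typed field, so inference can see the type of r inside the closure body.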
suggest and a fix involving manual eltype specification
Let's look at the other Method-trigger rooted in OptimizeMe:
julia> mtrig = modtrigs[2]
lotsa_containers() @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:13 (3 callees from 1 callers)
julia> summary(mtrig)
lotsa_containers() @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:13 had 1 specializations
Triggering calls:
Inlined copy at ./broadcast.jl:907: calling similar (1 instances)
Inlined copy at ./broadcast.jl:908: calling setindex! (1 instances)
Inlined copy at ./broadcast.jl:914: calling copyto_nonleaf! (1 instances)
julia> itrig = mtrig.itrigs[1]
Inference triggered to call similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.var"Main".OptimizeMe.Container{Int64}}) from copy (./broadcast.jl:907) inlined into Main.var"Main".OptimizeMe.lotsa_containers() (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:15)
If you use Cthulhu's ascend(itrig) you might see something like this:
The first thing to note here is that cs is inferred as an AbstractVector; fixing this to make it a concrete type should be our next goal. There's a second, more subtle hint: in the call menu at the bottom, the selected call is marked < semi-concrete eval >. This is a hint that a method is being called with a non-concrete type.
What might that non-concrete type be?
julia> isconcretetype(OptimizeMe.Container)
false
The statement Container.(list) is thus creating an AbstractVector with a non-concrete element type. You can see in greater detail what happens, inference-wise, in this snippet from print_tree(itree):
├─ similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.OptimizeMe.Container{Int64}})
├─ setindex!(::Vector{Main.OptimizeMe.Container{Int64}}, ::Main.OptimizeMe.Container{Int64}, ::Int64)
├─ Base.Broadcast.copyto_nonleaf!(::Vector{Main.OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64)
│ ├─ similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.OptimizeMe.Container})
│ └─ Base.Broadcast.restart_copyto_nonleaf!(::Vector{Main.OptimizeMe.Container}, ::Vector{Main.OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted
In rough terms, what this means is the following:
- since the first item in list is an Int, the output initially gets created as a Vector{Container{Int}}
- however, copyto_nonleaf! runs into trouble when it goes to copy the second item, which is a Container{UInt8}
- hence, copyto_nonleaf! re-allocates the output array to be a generic Vector{Container} and then calls restart_copyto_nonleaf!.
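The widening described in the list above can be observed from the outside with a small stand-alone sketch. Holder is a hypothetical stand-in for OptimizeMe.Container, and the list contents are made up for illustration.

```julia
# Hypothetical stand-in for OptimizeMe.Container.
struct Holder{T}
    x::T
end

list = Any[1, 0x01, 0xffff]

# Broadcasting over just the first element yields a narrowly-typed array.
println(eltype(Holder.(list[1:1])))             # Holder{Int64}

# No concrete type holds both a Holder{Int} and a Holder{UInt8};
# their narrowest common supertype is the abstract `Holder`.
println(typejoin(Holder{Int}, Holder{UInt8}))   # Holder

# So broadcasting over the whole list ends with a non-concrete eltype.
println(eltype(Holder.(list)))                  # Holder
println(isconcretetype(eltype(Holder.(list))))  # false
```

The element type of the result depends on the runtime values in list, which is why inference cannot predict it ahead of time.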
We can prevent all this hassle with one simple change: rewrite that line as
cs = Container{Any}.(list)
We use Container{Any} here because there is no more specific element type (other than an unreasonably large Union) that can hold all the items in list.
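Here is a minimal sketch of why specifying the type parameter helps, again using a hypothetical struct (Wrapper) rather than the real Container, and a made-up items vector.

```julia
# Hypothetical stand-in for OptimizeMe.Container.
struct Wrapper{T}
    x::T
end

items = Any[1, 0x01, 2.0f0, 'a']

loose = Wrapper.(items)        # element type must be discovered at runtime
fixed = Wrapper{Any}.(items)   # element type is known before the call runs

println(isconcretetype(eltype(loose)))   # false
println(eltype(fixed))                   # Wrapper{Any}
println(isconcretetype(eltype(fixed)))   # true
```

The trade-off is that the field x of a Wrapper{Any} is itself untyped, so code that reads it will still dispatch at runtime; what you gain is a single, predictable container type.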
If you make these edits manually, you'll see that we've gone from dozens of itrigs (38 on Julia 1.10; you may get a different number on other Julia versions) down to about a dozen (13 on Julia 1.10). Real progress!
Replacing hard-to-infer calls with lower-level APIs
We note that many of the remaining triggers are somehow related to show, for example:
Inference triggered to call show(::IOContext{Base.TTY}, ::MIME{Symbol("text/plain")}, ::Vector{Main.OptimizeMe.Container{Any}}) from #55 (/cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:273) with specialization (::REPL.var"#55#56"{REPL.REPLDisplay{REPL.LineEditREPL}, MIME{Symbol("text/plain")}, Base.RefValue{Any}})(::Any)
In this case we see that the calling method is #55. This is a gensym, or generated symbol, indicating that the method was generated during Julia's lowering pass, and might indicate a macro, a do block or other anonymous function, the generator for a @generated function, etc.
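If you haven't encountered gensyms before, this short sketch shows what such names look like. The tag "tmp" and the anonymous function are arbitrary, and the exact numbers in generated names vary between sessions and Julia versions, so the sketch only checks the leading # characters.

```julia
# An explicitly generated symbol: unique, and starting with "##".
sym = gensym("tmp")
println(sym)                                   # something like ##tmp#231
println(startswith(string(sym), "##tmp#"))     # true

# Anonymous functions get compiler-generated type names starting with "#".
square = x -> x^2
println(nameof(typeof(square)))                # something like #1#2
println(startswith(string(nameof(typeof(square))), "#"))   # true
```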
edit(itrig) (or equivalently, edit(node) where node is a child of itree) takes us to this method in the REPL standard library, for which key lines are
function display(d::REPLDisplay, mime::MIME"text/plain", x)
    x = Ref{Any}(x)
    with_repl_linfo(d.repl) do io
        ⋮
        show(io, mime, x[])
        ⋮
end
The generated method corresponds to the do block here. The call to show comes from show(io, mime, x[]). This implementation uses a clever trick, wrapping x in a Ref{Any}, to prevent specialization of the method defined by the do block on the specific type of x. This trick is designed to limit the number of MethodInstances inferred for this display method.
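The effect of the Ref{Any} wrapper can be seen in a small sketch that compares closure types. The function names make_direct and make_hidden are hypothetical and are not part of the REPL code.

```julia
# The closure captures `x` directly, so its type depends on typeof(x).
make_direct(x) = io -> show(io, x)

# The closure captures a Ref{Any}, which has the same type for every x.
function make_hidden(x)
    xref = Ref{Any}(x)
    return io -> show(io, xref[])
end

println(typeof(make_direct(1)) === typeof(make_direct("hi")))   # false
println(typeof(make_hidden(1)) === typeof(make_hidden("hi")))   # true
println(sprint(make_hidden("hi")))                              # "hi"
```

Because the hidden version has a single closure type, the closure body only needs to be inferred once; the price is a runtime dispatch on xref[] when show is called, which is exactly the trigger seen above.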
A great option is to replace the call to display with an explicit
show(stdout, MIME("text/plain"), cs)
There's one extra detail: the type of stdout is not fixed (and therefore not known), because one can use a terminal, a file, devnull, etc., as stdout. If you want to prevent all runtime dispatch from this call, you'd need to supply an io::IO object of known type as the first argument. It could, for example, be passed in to lotsa_containers from main:
function lotsa_containers(io::IO)
    ⋮
    println(io, "lotsa containers:")
    show(io, MIME("text/plain"), cs)
end
However, if you want it to go to stdout (and to allow users to redirect stdout to a target of their choosing), then an io argument may have to be of unknown type when called from main.
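One common way to arrange this, sketched here with a hypothetical report function rather than the actual OptimizeMe code, is to take io as the first argument and add a convenience method that forwards stdout.

```julia
# Hypothetical function that writes to whatever `io` the caller supplies.
function report(io::IO, items)
    println(io, "lotsa items:")
    show(io, MIME("text/plain"), items)
    println(io)
end

# Convenience method: defaults to stdout, whose type is only known at runtime.
report(items) = report(stdout, items)

# Callers that know their IO type (here an IOBuffer) get a concretely-typed call.
buf = IOBuffer()
report(buf, [1, 2, 3])
output = String(take!(buf))
print(output)
```

With this layout, the only unavoidable runtime dispatch is the single hand-off from the stdout method to the io method.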
When you need to rely on @compile_workload
Most of the remaining triggers are difficult to fix because they occur in deliberately-@nospecialized portions of Julia's internal code for displaying arrays. In such cases, adding a PrecompileTools.@compile_workload is your best option. Here we use an interesting trick:
@compile_workload begin
    lotsa_containers(devnull)  # use `devnull` to suppress output
    abmult(rand(-5:5), rand(3))
end
precompile(lotsa_containers, (Base.TTY,))
During the workload, we pass devnull as the io object to lotsa_containers: this suppresses the output so you don't see anything during precompilation. However, devnull is not a Base.TTY, the standard type of stdout. Nevertheless, this is effective because we can see that many of the callees in the remaining inference-triggers do not depend on the io object.
To really ice the cake, we also add a manual precompile directive. (precompile doesn't execute the method; it just compiles it.) This doesn't "step through" runtime dispatch, but at least it precompiles the entry point. Thus, at least lotsa_containers will be precompiled for the most likely IO type encountered in practice.
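To check the claim that precompile compiles without executing, here is a small sketch with a hypothetical greet function and a call counter; neither is part of OptimizeMe.

```julia
# Hypothetical counter that records how many times `greet` actually runs.
const CALL_COUNT = Ref(0)

function greet(io::IO)
    CALL_COUNT[] += 1
    println(io, "hello")
end

# Request compilation for a specific argument-type signature.
ok = precompile(greet, (IOBuffer,))
println(ok isa Bool)     # true: `precompile` reports success as a Bool
println(CALL_COUNT[])    # 0: the method body has not run

# An actual call does execute the body; `devnull` swallows the output.
greet(devnull)
println(CALL_COUNT[])    # 1
```

Because nothing is executed, a precompile directive cannot discover callees reached only through runtime dispatch, which is why it complements a workload rather than replacing it.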
With these changes, we've fixed nearly all the latency problems in OptimizeMe, and made it much less vulnerable to invalidation as well. You can see the final code in the OptimizeMeFixed source code. Note that this would have to be turned into a real package for the @compile_workload to have any effect.
A note on analyzing test suites
If you're doing a package analysis, it's convenient to use the package's runtests.jl script as a way to cover much of the package's functionality. SnoopCompile has a couple of enhancements designed to make it easier to ignore inference triggers that come from the test suite itself. First, suggest.(itrigs) may show something like this:
./broadcast.jl:1315: inlineable (ignore this one)
./broadcast.jl:1315: inlineable (ignore this one)
./broadcast.jl:1315: inlineable (ignore this one)
./broadcast.jl:1315: inlineable (ignore this one)
This indicates a broadcasting operation in the @testset itself. Second, while it's a little dangerous (because suggest cannot entirely be trusted), you can filter these out:
julia> itrigsel = [itrig for itrig in itrigs if !isignorable(suggest(itrig))];
julia> length(itrigs)
222
julia> length(itrigsel)
71
While there is some risk of discarding triggers that provide clues about the origin of other triggers (e.g., they would have shown up in the same branch of the trigger_tree), the shorter list may help direct your attention to the "real" issues.