Using @snoop_inference results to improve inferrability

Throughout this page, we'll use the OptimizeMe demo, which ships with SnoopCompile.

Note

To understand what follows, it's essential to refer to OptimizeMe source code as you follow along.

julia> using SnoopCompileCore, SnoopCompile # here we need the SnoopCompile path for the next line (normally you should wait until after data collection is complete)
julia> include(joinpath(pkgdir(SnoopCompile), "examples", "OptimizeMe.jl"))Main.var"Main".OptimizeMe
julia> tinf = @snoop_inference OptimizeMe.main();lotsa containers:
julia> fg = flamegraph(tinf)Node(FlameGraphs.NodeData(ROOT() at typeinfer.jl:79, 0x00, 0:1163518749))

If you visualize fg with ProfileView, you may see something like this:

flamegraph-OptimizeMe

From the standpoint of precompilation, this has some obvious problems:

  • even though we called a single method, OptimizeMe.main(), there are many distinct flames separated by blank spaces. This indicates that many calls are being made by runtime dispatch: each separate flame is a fresh entrance into inference.
  • several of the flames are marked in red, indicating that they are not naively precompilable (see the Tutorial on @snoop_inference). While @compile_workload can handle these flames, an even more robust solution is to eliminate them altogether.

Our goal will be to improve the design of OptimizeMe to make it more readily precompilable.

Analyzing inference triggers

We'll first extract the "triggers" of inference, which is just a repackaging of part of the information contained within tinf. Specifically an InferenceTrigger captures callee/caller relationships that straddle a fresh entrance to type-inference, allowing you to identify which calls were made by runtime dispatch and what MethodInstance they called.

julia> itrigs = inference_triggers(tinf)38-element Vector{InferenceTrigger}:
 Inference triggered to call Main.var"Main".OptimizeMe.main() from eval (./boot.jl:385) inlined into cd(::Documenter.var"#62#64"{Module}, ::String) (./file.jl:112)
 Inference triggered to call similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.var"Main".OptimizeMe.Container{Int64}}) from copy (./broadcast.jl:943) inlined into Main.var"Main".OptimizeMe.lotsa_containers() (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:15)
 Inference triggered to call setindex!(::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Main.var"Main".OptimizeMe.Container{Int64}, ::Int64) from copy (./broadcast.jl:944) inlined into Main.var"Main".OptimizeMe.lotsa_containers() (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:15)
 Inference triggered to call Base.Broadcast.copyto_nonleaf!(::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64) from copy (./broadcast.jl:950) inlined into Main.var"Main".OptimizeMe.lotsa_containers() (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:15)
 Inference triggered to call similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.var"Main".OptimizeMe.Container}) from copyto_nonleaf! (./broadcast.jl:1104) with specialization Base.Broadcast.copyto_nonleaf!(::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64)
 Inference triggered to call Base.Broadcast.restart_copyto_nonleaf!(::Vector{Main.var"Main".OptimizeMe.Container}, ::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Main.var"Main".OptimizeMe.Container{UInt8}, ::Int64, ::Base.OneTo{Int64}, ::Int64, ::Int64) from copyto_nonleaf! (./broadcast.jl:1105) with specialization Base.Broadcast.copyto_nonleaf!(::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64)
 Inference triggered to call println(::IOContext{Base.PipeEndpoint}, ::String, ::Vararg{String}) from println (./coreio.jl:4) with specialization println(::String)
 Inference triggered to call show(::IOContext{Base.PipeEndpoint}, ::MIME{Symbol("text/plain")}, ::Vector{Main.var"Main".OptimizeMe.Container}) from display (./multimedia.jl:254) with specialization display(::TextDisplay, ::MIME{Symbol("text/plain")}, ::Any)
 Inference triggered to call isassigned(::Vector{Main.var"Main".OptimizeMe.Container}, ::Int64, ::Int64) from alignment (./arrayshow.jl:68) with specialization Base.alignment(::IOContext{Base.PipeEndpoint}, ::AbstractVecOrMat, ::Vector{Int64}, ::Vector{Int64}, ::Int64, ::Int64, ::Int64, ::Int64)
 Inference triggered to call getindex(::Vector{Main.var"Main".OptimizeMe.Container}, ::Int64, ::Int64) from alignment (./arrayshow.jl:69) with specialization Base.alignment(::IOContext{Base.PipeEndpoint}, ::AbstractVecOrMat, ::Vector{Int64}, ::Vector{Int64}, ::Int64, ::Int64, ::Int64, ::Int64)
 ⋮
 Inference triggered to call Base.replace_in_print_matrix(::Vector{Main.var"Main".OptimizeMe.Container}, ::Int64, ::Int64, ::String) from print_matrix_row (./arrayshow.jl:120) with specialization Base.print_matrix_row(::IOContext{Base.PipeEndpoint}, ::AbstractVecOrMat, ::Vector{Tuple{Int64, Int64}}, ::Int64, ::Vector{Int64}, ::String, ::Int64)
 Inference triggered to call show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{UInt8}) from #sprint#501 (./strings/io.jl:112) with specialization Base.var"#sprint#501"(::IOContext{Base.PipeEndpoint}, ::Int64, ::typeof(sprint), ::Function, ::String, ::Vararg{Any})
 Inference triggered to call show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{UInt16}) from #sprint#501 (./strings/io.jl:112) with specialization Base.var"#sprint#501"(::IOContext{Base.PipeEndpoint}, ::Int64, ::typeof(sprint), ::Function, ::String, ::Vararg{Any})
 Inference triggered to call show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Float32}) from #sprint#501 (./strings/io.jl:112) with specialization Base.var"#sprint#501"(::IOContext{Base.PipeEndpoint}, ::Int64, ::typeof(sprint), ::Function, ::String, ::Vararg{Any})
 Inference triggered to call show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Char}) from #sprint#501 (./strings/io.jl:112) with specialization Base.var"#sprint#501"(::IOContext{Base.PipeEndpoint}, ::Int64, ::typeof(sprint), ::Function, ::String, ::Vararg{Any})
 Inference triggered to call show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Vector{Int64}}) from #sprint#501 (./strings/io.jl:112) with specialization Base.var"#sprint#501"(::IOContext{Base.PipeEndpoint}, ::Int64, ::typeof(sprint), ::Function, ::String, ::Vararg{Any})
 Inference triggered to call show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Tuple{String, Int64}}) from #sprint#501 (./strings/io.jl:112) with specialization Base.var"#sprint#501"(::IOContext{Base.PipeEndpoint}, ::Int64, ::typeof(sprint), ::Function, ::String, ::Vararg{Any})
 Inference triggered to call Main.var"Main".OptimizeMe.howbig(::Float64) from #1 (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29) with specialization (::Main.var"Main".OptimizeMe.var"#1#2")(::Float64)
 Inference triggered to call Base.collect_to_with_first!(::Vector{Float64}, ::Float64, ::Base.Generator{Vector{Float64}, Main.var"Main".OptimizeMe.var"#1#2"}, ::Int64) from _collect (./array.jl:864) with specialization Base._collect(::Vector{Float64}, ::Base.Generator{Vector{Float64}, Main.var"Main".OptimizeMe.var"#1#2"}, ::Base.EltypeUnknown, ::Base.HasShape{1})

The number of elements in this Vector{InferenceTrigger} tells you how many calls were (1) made by runtime dispatch and (2) the callee had not previously been inferred.

Tip

In the REPL, SnoopCompile displays InferenceTriggers with yellow coloration for the callee, red for the caller method, and blue for the caller specialization. This makes it easier to quickly identify the most important information.

In some cases, this might indicate that you'll need to fix each case separately; fortunately, in many cases fixing one problem addresses many other.

Method triggers

Most often, it's most convenient to organize them by the method triggering the need for inference:

julia> mtrigs = accumulate_by_source(Method, itrigs)12-element Vector{SnoopCompile.TaggedTriggers{Method}}:
 display(d::TextDisplay, M::MIME{Symbol("text/plain")}, x) @ Base.Multimedia multimedia.jl:254 (1 callees from 1 callers)
 print_matrix_row(io::IO, X::AbstractVecOrMat, A::Vector, i::Integer, cols::AbstractVector, sep::AbstractString, idxlast::Integer) @ Base arrayshow.jl:97 (1 callees from 1 callers)
 typeinfo_prefix(io::IO, X) @ Base arrayshow.jl:562 (1 callees from 1 callers)
 (::Main.var"Main".OptimizeMe.var"#1#2")(x) @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29 (1 callees from 1 callers)
 println(xs...) @ Base coreio.jl:4 (1 callees from 1 callers)
 _collect(c, itr, ::Base.EltypeUnknown, isz::Union{Base.HasLength, Base.HasShape}) @ Base array.jl:851 (1 callees from 1 callers)
 cd(f::Function, dir::AbstractString) @ Base.Filesystem file.jl:107 (1 callees from 1 callers)
 copyto_nonleaf!(dest, bc::Base.Broadcast.Broadcasted, iter, state, count) @ Base.Broadcast broadcast.jl:1092 (2 callees from 1 callers)
 lotsa_containers() @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:13 (3 callees from 1 callers)
 var"#sprint#501"(context, sizehint::Integer, ::typeof(sprint), f::Function, args...) @ Base strings/io.jl:107 (8 callees from 2 callers)
 alignment(io::IO, X::AbstractVecOrMat, rows::AbstractVector{T}, cols::AbstractVector{V}, cols_if_complete::Integer, cols_otherwise::Integer, sep::Integer, ncols::Integer) where {T, V} @ Base arrayshow.jl:60 (9 callees from 1 callers)
 _show_default(io::IO, x) @ Base show.jl:464 (9 callees from 1 callers)

The methods triggering the largest number of inference runs are shown at the bottom. You can also select methods from a particular module:

julia> modtrigs = filtermod(OptimizeMe, mtrigs)2-element Vector{SnoopCompile.TaggedTriggers{Method}}:
 (::Main.var"Main".OptimizeMe.var"#1#2")(x) @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29 (1 callees from 1 callers)
 lotsa_containers() @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:13 (3 callees from 1 callers)

Rather than filter by a single module, you can alternatively call SnoopCompile.parcel(mtrigs) to split them out by module. In this case, most of the triggers came from Base, not OptimizeMe. However, many of the failures in Base were nevertheless indirectly due to OptimizeMe: our methods in OptimizeMe call Base methods with arguments that trigger internal inference failures. Fortunately, we'll see that using more careful design in OptimizeMe can avoid many of those problems.

Tip

If you have a longer list of inference triggers than you feel comfortable tackling, filtering by your package's module or using precompile_blockers can be a good way to start. Fixing issues in the package itself can end up resolving many of the "indirect" triggers too. Also be sure to note the ability to filter out likely "noise" from test suites.

You can get an overview of each Method trigger with summary:

julia> mtrig = modtrigs[1](::Main.var"Main".OptimizeMe.var"#1#2")(x) @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29 (1 callees from 1 callers)
julia> summary(mtrig)(::Main.var"Main".OptimizeMe.var"#1#2")(x) @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29 had 1 specializations MethodInstance for (::Main.var"Main".OptimizeMe.var"#1#2")(::Float64) has Core.Box (fix this before tackling other problems, see https://timholy.github.io/SnoopCompile.jl/stable/snoop_invalidations/#Fixing-Core.Box) Triggering calls: Line 29: calling howbig (1 instances)

You can also say edit(mtrig) and be taken directly to the method you're analyzing in your editor. You can still "dig deep" into individual triggers:

julia> itrig = mtrig.itrigs[1]Inference triggered to call Main.var"Main".OptimizeMe.howbig(::Float64) from #1 (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29) with specialization (::Main.var"Main".OptimizeMe.var"#1#2")(::Float64)

This is useful if you want to analyze with Cthulhu.ascend. Method-based triggers, which may aggregate many different individual triggers, can be useful because tools like Cthulhu.jl show you the inference results for the entire MethodInstance, allowing you to fix many different inference problems at once.

Trigger trees

While method triggers are probably the most useful way of organizing these inference triggers, for learning purposes here we'll use a more detailed scheme, which organizes inference triggers in a tree:

julia> itree = trigger_tree(itrigs)TriggerNode for root with 1 direct children
julia> using AbstractTrees
julia> print_tree(itree)root └─ Main.var"Main".OptimizeMe.main() ├─ similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.var"Main".OptimizeMe.Container{Int64}}) ├─ setindex!(::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Main.var"Main".OptimizeMe.Container{Int64}, ::Int64) ├─ Base.Broadcast.copyto_nonleaf!(::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64) │ ├─ similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.var"Main".OptimizeMe.Container}) │ └─ Base.Broadcast.restart_copyto_nonleaf!(::Vector{Main.var"Main".OptimizeMe.Container}, ::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Main.var"Main".OptimizeMe.Container{UInt8}, ::Int64, ::Base.OneTo{Int64}, ::Int64, ::Int64) ├─ println(::IOContext{Base.PipeEndpoint}, ::String, ::Vararg{String}) ├─ show(::IOContext{Base.PipeEndpoint}, ::MIME{Symbol("text/plain")}, ::Vector{Main.var"Main".OptimizeMe.Container}) │ ├─ isassigned(::Vector{Main.var"Main".OptimizeMe.Container}, ::Int64, ::Int64) │ ├─ getindex(::Vector{Main.var"Main".OptimizeMe.Container}, ::Int64, ::Int64) │ ├─ Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{Int64}) │ │ └─ show(::IOContext{IOBuffer}, ::Any) │ │ └─ sizeof(::Main.var"Main".OptimizeMe.Container{Int64}) │ ├─ Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{UInt8}) │ │ └─ sizeof(::Main.var"Main".OptimizeMe.Container{UInt8}) │ ├─ Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{UInt16}) │ │ ├─ sizeof(::Main.var"Main".OptimizeMe.Container{UInt16}) │ │ └─ show(::IOContext{IOBuffer}, ::UInt16) │ ├─ Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{Float32}) │ │ └─ sizeof(::Main.var"Main".OptimizeMe.Container{Float32}) │ ├─ Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{Char}) │ │ └─ sizeof(::Main.var"Main".OptimizeMe.Container{Char}) │ ├─ Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{Vector{Int64}}) │ │ ├─ sizeof(::Main.var"Main".OptimizeMe.Container{Vector{Int64}}) │ │ └─ Base.typeinfo_eltype(::Type) │ ├─ Base.alignment(::IOContext{Base.PipeEndpoint}, ::Main.var"Main".OptimizeMe.Container{Tuple{String, Int64}}) │ │ ├─ sizeof(::Main.var"Main".OptimizeMe.Container{Tuple{String, Int64}}) │ │ └─ show(::IOContext{IOBuffer}, ::Tuple{String, Int64}) │ ├─ show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Int64}) │ ├─ Base.replace_in_print_matrix(::Vector{Main.var"Main".OptimizeMe.Container}, ::Int64, ::Int64, ::String) │ ├─ show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{UInt8}) │ ├─ show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{UInt16}) │ ├─ show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Float32}) │ ├─ show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Char}) │ ├─ show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Vector{Int64}}) │ └─ show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Tuple{String, Int64}}) ├─ Main.var"Main".OptimizeMe.howbig(::Float64) └─ Base.collect_to_with_first!(::Vector{Float64}, ::Float64, ::Base.Generator{Vector{Float64}, Main.var"Main".OptimizeMe.var"#1#2"}, ::Int64)

This gives you a big-picture overview of how the inference failures arose. The parent-child relationships are based on the backtraces at the entrance to inference, and the nodes are organized in the order in which inference occurred. Inspection of these trees can be informative; for example, here we notice a lot of method specializations for Container{T} for different T.

We're going to march through these systematically.

suggest and fixing Core.Box

You may have noticed above that summary(mtrig) generated a red has Core.Box message. Assuming that itrig is still the first (and it turns out, only) trigger from this method, let's look at this again, explicitly using suggest, the tool that generated this hint:

julia> suggest(itrig)/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29: has Core.Box (fix this before tackling other problems, see https://timholy.github.io/SnoopCompile.jl/stable/snoop_invalidations/#Fixing-Core.Box)

You can see that SnoopCompile recommends tackling this first; depending on how much additional code is affected, fixing a Core.Box allows inference to work better and may resolve other triggers.

This message also directs readers to a section of this documentation that links to a page of the Julia manual describing the underlying problem. The Julia documentation suggests a couple of fixes, of which the best (in this case) is to use the let statement to rebind the variable and end any "conflict" with the closure:

function abmult(r::Int, ys)
    if r < 0
        r = -r
    end
    let r = r    # Julia #15276
        return map(x -> howbig(r * x), ys)
    end
end

suggest and a fix involving manual eltype specification

Let's look at the other Method-trigger rooted in OptimizeMe:

julia> mtrig = modtrigs[2]lotsa_containers() @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:13 (3 callees from 1 callers)
julia> summary(mtrig)lotsa_containers() @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:13 had 1 specializations Triggering calls: Inlined copy at ./broadcast.jl:943: calling similar (1 instances) Inlined copy at ./broadcast.jl:944: calling setindex! (1 instances) Inlined copy at ./broadcast.jl:950: calling copyto_nonleaf! (1 instances)
julia> itrig = mtrig.itrigs[1]Inference triggered to call similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.var"Main".OptimizeMe.Container{Int64}}) from copy (./broadcast.jl:943) inlined into Main.var"Main".OptimizeMe.lotsa_containers() (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:15)

If you use Cthulhu's ascend(itrig) you might see something like this:

ascend-lotsa

The first thing to note here is that cs is inferred as an AbstractVector–fixing this to make it a concrete type should be our next goal. There's a second, more subtle hint: in the call menu at the bottom, the selected call is marked < semi-concrete eval >. This is a hint that a method is being called with a non-concrete type.

What might that non-concrete type be?

julia> isconcretetype(OptimizeMe.Container)false

The statement Container.(list) is thus creating an AbstractVector with a non-concrete element type. You can seem in greater detail what happens, inference-wise, in this snippet from print_tree(itree):

   ├─ similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.OptimizeMe.Container{Int64}})
   ├─ setindex!(::Vector{Main.OptimizeMe.Container{Int64}}, ::Main.OptimizeMe.Container{Int64}, ::Int64)
   ├─ Base.Broadcast.copyto_nonleaf!(::Vector{Main.OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64)
   │  ├─ similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.OptimizeMe.Container})
   │  └─ Base.Broadcast.restart_copyto_nonleaf!(::Vector{Main.OptimizeMe.Container}, ::Vector{Main.OptimizeMe.Container{Int64}}, ::Base.Broadcast.Broadcasted

In rough terms, what this means is the following:

  • since the first item in list is an Int, the output initially gets created as a Vector{Container{Int}}
  • however, copyto_nonleaf! runs into trouble when it goes to copy the second item, which is a Container{UInt8}
  • hence, copyto_nonleaf! re-allocates the output array to be a generic Vector{Container} and then calls restart_copyto_nonleaf!.

We can prevent all this hassle with one simple change: rewrite that line as

cs = Container{Any}.(list)

We use Container{Any} here because there is no more specific element type–other than an unreasonably-large Union–that can hold all the items in list.

If you make these edits manually, you'll see that we've gone from dozens of itrigs (38 on Julia 1.10, you may get a different number on other Julia versions) down to about a dozen (13 on Julia 1.10). Real progress!

Replacing hard-to-infer calls with lower-level APIs

We note that many of the remaining triggers are somehow related to show, for example:

Inference triggered to call show(::IOContext{Base.TTY}, ::MIME{Symbol("text/plain")}, ::Vector{Main.OptimizeMe.Container{Any}}) from #55 (/cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:273) with specialization (::REPL.var"#55#56"{REPL.REPLDisplay{REPL.LineEditREPL}, MIME{Symbol("text/plain")}, Base.RefValue{Any}})(::Any)

In this case we see that the calling method is #55. This is a gensym, or generated symbol, indicating that the method was generated during Julia's lowering pass, and might indicate a macro, a do block or other anonymous function, the generator for a @generated function, etc.

edit(itrig) (or equivalently, edit(node) where node is a child of itree) takes us to this method in Base, for which key lines are

function display(d::REPLDisplay, mime::MIME"text/plain", x)
    x = Ref{Any}(x)
    with_repl_linfo(d.repl) do io
        ⋮
        show(io, mime, x[])
        ⋮
end

The generated method corresponds to the do block here. The call to show comes from show(io, mime, x[]). This implementation uses a clever trick, wrapping x in a Ref{Any}(x), to prevent specialization of the method defined by the do block on the specific type of x. This trick is designed to limit the number of MethodInstances inferred for this display method.

A great option is to replace the call to display with an explicit

show(stdout, MIME("text/plain"), cs)

There's one extra detail: the type of stdout is not fixed (and therefore not known), because one can use a terminal, a file, devnull, etc., as stdout. If you want to prevent all runtime dispatch from this call, you'd need to supply an io::IO object of known type as the first argument. It could, for example, be passed in to lotsa_containers from main:

function lotsa_containers(io::IO)
    ⋮
    println(io, "lotsa containers:")
    show(io, MIME("text/plain"), cs)
end

However, if you want it to go to stdout–and to allow users to redirect stdout to a target of their choosing–then an io argument may have to be of unknown type when called from main.

When you need to rely on @compile_workload

Most of the remaining triggers are difficult to fix because they occur in deliberately-@nospecialized portions of Julia's internal code for displaying arrays. In such cases, adding a PrecompileTools.@compile_workload is your best option. Here we use an interesting trick:

@compile_workload begin
    lotsa_containers(devnull)  # use `devnull` to suppress output
    abmult(rand(-5:5), rand(3))
end
precompile(lotsa_containers, (Base.TTY,))

During the workload, we pass devnull as the io object to lotsa_containers: this suppresses the output so you don't see anything during precompilation. However, devnull is not a Base.TTY, the standard type of stdout. Nevertheless, this is effective because we can see that many of the callees in the remaining inference-triggers do not depend on the io object.

To really ice the cake, we also add a manual precompile directive. (precompile doesn't execute the method, it just compiles it.) This doesn't "step through" runtime dispatch, but at least it precompiles the entry point. Thus, at least lotsa_containers will be precompiled for the most likely IO type encountered in practice.

With these changes, we've fixed nearly all the latency problems in OptimizeMe, and made it much less vulnerable to invalidation as well. You can see the final code in the OptimizeMeFixed source code. Note that this would have to be turned into a real package for the @compile_workload to have any effect.

A note on analyzing test suites

If you're doing a package analysis, it's convenient to use the package's runtests.jl script as a way to cover much of the package's functionality. SnoopCompile has a couple of enhancements designed to make it easier to ignore inference triggers that come from the test suite itself. First, suggest.(itrigs) may show something like this:

 ./broadcast.jl:1315: inlineable (ignore this one)
 ./broadcast.jl:1315: inlineable (ignore this one)
 ./broadcast.jl:1315: inlineable (ignore this one)
 ./broadcast.jl:1315: inlineable (ignore this one)

This indicates a broadcasting operation in the @testset itself. Second, while it's a little dangerous (because suggest cannot entirely be trusted), you can filter these out:

julia> itrigsel = [itrig for itrig in itrigs if !isignorable(suggest(itrig))];

julia> length(itrigs)
222

julia> length(itrigsel)
71

While there is some risk of discarding triggers that provide clues about the origin of other triggers (e.g., they would have shown up in the same branch of the trigger_tree), the shorter list may help direct your attention to the "real" issues.