Metadata

Manually reading JLSO files can help when troubleshooting objects that fail to deserialize, or simply to aid reproducibility.

using JLSO

jlso = read("breakfast.jlso", JLSOFile)
JLSOFile([cost, food, time]; version="4.0.0", julia="1.5.4", format=:julia_serialize, compression=:gzip, image="")

Now we can manually access the serialized objects:

jlso.objects
Dict{Symbol,Array{UInt8,1}} with 3 entries:
  :cost => UInt8[0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03  … …
  :food => UInt8[0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03  … …
  :time => UInt8[0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03  … …
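Each entry above begins with the bytes 0x1f 0x8b, which is the gzip magic number and agrees with the `compression=:gzip` reported for the file. A minimal sketch of checking this by hand follows; `is_gzip` is a hypothetical helper for illustration, not part of JLSO.

```julia
# Hypothetical helper (not part of JLSO): true when a byte vector
# starts with the two-byte gzip magic number 0x1f 0x8b.
is_gzip(bytes::AbstractVector{UInt8}) =
    length(bytes) >= 2 && bytes[1] == 0x1f && bytes[2] == 0x8b

# The first ten bytes shown for each object above.
header = UInt8[0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03]

is_gzip(header)               # true
is_gzip(UInt8[0x4b, 0x05])    # false: not a gzip stream
```

With a file loaded as above, `all(is_gzip, values(jlso.objects))` would apply the same check to every stored object.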

Or deserialize individual objects:

jlso[:food]
"☕️🥓🍳"

Maybe you need to find out which packages were installed in the environment that saved the file:

jlso.project
Dict{String,Any} with 2 entries:
  "deps"   => Dict{String,Any}("Documenter"=>"e30172f5-a6a5-5a46-863b-614d45cd2…
  "compat" => Dict{String,Any}("Documenter"=>"0.26")

In extreme cases, you may need to inspect the full environment stack, for example when a struct definition has changed in a dependency.

jlso.manifest
Dict{String,Any} with 41 entries:
  "Mocking"       => Dict{String,Any}[Dict("deps"=>["ExprTools"],"git-tree-sha1…
  "Pkg"           => Dict{String,Any}[Dict("deps"=>["Dates", "LibGit2", "Libdl"…
  "TimeZones"     => Dict{String,Any}[Dict("deps"=>["Dates", "EzXML", "Mocking"…
  "Documenter"    => Dict{String,Any}[Dict("deps"=>["Base64", "Dates", "DocStri…
  "BSON"          => Dict{String,Any}[Dict("git-tree-sha1"=>"db18b5ea04686f73d2…
  "Test"          => Dict{String,Any}[Dict("deps"=>["Distributed", "Interactive…
  "Zlib_jll"      => Dict{String,Any}[Dict("deps"=>["Artifacts", "JLLWrappers",…
  "IOCapture"     => Dict{String,Any}[Dict("deps"=>["Logging"],"git-tree-sha1"=…
  "Random"        => Dict{String,Any}[Dict("deps"=>["Serialization"],"uuid"=>"9…
  "Libdl"         => Dict{String,Any}[Dict("uuid"=>"8f399da3-3557-5675-b5ff-fb8…
  "JLSO"          => Dict{String,Any}[Dict("deps"=>["BSON", "CodecZlib", "FileP…
  "UUIDs"         => Dict{String,Any}[Dict("deps"=>["Random", "SHA"],"uuid"=>"c…
  "Distributed"   => Dict{String,Any}[Dict("deps"=>["Random", "Serialization", …
  "Serialization" => Dict{String,Any}[Dict("uuid"=>"9e88b42a-f829-5b0c-bbe9-9e9…
  "SHA"           => Dict{String,Any}[Dict("uuid"=>"ea8e919c-243c-51af-8825-aaa…
  "REPL"          => Dict{String,Any}[Dict("deps"=>["InteractiveUtils", "Markdo…
  "Memento"       => Dict{String,Any}[Dict("deps"=>["Dates", "Distributed", "JS…
  "Syslogs"       => Dict{String,Any}[Dict("deps"=>["Printf", "Sockets"],"git-t…
  "CodecZlib"     => Dict{String,Any}[Dict("deps"=>["TranscodingStreams", "Zlib…
  ⋮               => ⋮

These project and manifest fields are just the dictionary representations of the Project.toml and Manifest.toml files found in a Julia Pkg environment. As such, we can also use Pkg.activate to construct an environment matching the one used to write the file.
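Because these fields are plain dictionaries, they can in principle be written back out as TOML by hand. The sketch below assumes Julia 1.6 or later, where `TOML` is a standard library, and uses a hypothetical stand-in for `jlso.project`; the package name and UUID are placeholders, not values from the file.

```julia
using TOML  # standard library in Julia 1.6 and later

# Hypothetical stand-in for `jlso.project`; name and UUID are placeholders.
project = Dict{String,Any}(
    "deps" => Dict{String,Any}("SomePackage" => "00000000-0000-0000-0000-000000000000"),
    "compat" => Dict{String,Any}("SomePackage" => "0.26"),
)

# Write the dictionary as Project.toml in a temporary directory, then read it back.
roundtrip = mktempdir() do dir
    path = joinpath(dir, "Project.toml")
    open(path, "w") do io
        TOML.print(io, project)
    end
    TOML.parsefile(path)
end

roundtrip == project  # true
```

The walkthrough that follows calls Pkg.activate on the JLSOFile directly rather than writing the files by hand.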

dir = joinpath(dirname(dirname(pathof(JLSO))), "test", "specimens")
jlso = read(joinpath(dir, "v4_bson_none.jlso"), JLSOFile)
jlso[:DataFrame]
1355-element Array{UInt8,1}:
 0x4b
 0x05
 0x00
 0x00
 0x02
 0x74
 0x61
 0x67
 0x00
 0x07
    ⋮
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00

Unfortunately, some objects can't be deserialized in the current environment: above, jlso[:DataFrame] returned raw bytes rather than a DataFrame. We might try loading the offending package, only to find that it isn't part of our current environment.

try using DataFrames catch e @warn e end
┌ Warning: ArgumentError("Package DataFrames not found in current path:\n- Run `import Pkg; Pkg.add(\"DataFrames\")` to install the DataFrames package.\n")
└ @ Main.ex-metadata-example none:1

Okay, so we don't have DataFrames loaded and it isn't part of our current environment. Rather than adding every possible package needed to deserialize the objects in the file, we can use the Pkg.activate do-block syntax to:

  1. Initialize the exact environment needed to deserialize our objects
  2. Load our desired dependencies
  3. Migrate our data to a more appropriate long-term format

using Pkg

# Now we can run our conversion logic in an isolated environment
mktempdir(pwd()) do d
    cd(d) do
        # Modify our Manifest to just use the latest release of JLSO
        delete!(jlso.manifest, "JLSO")

        Pkg.activate(jlso, d) do
            @eval Main begin
                using Pkg; Pkg.resolve(); Pkg.instantiate(; verbose=true)
                using DataFrames, JLSO
                describe($(jlso)[:DataFrame])
            end
        end
    end
end

4 rows × 8 columns

     variable  mean      min       median    max       nunique  nmissing  eltype
     Symbol    Union…    Any       Union…    Any       Union…   Union…    DataType
1    a         3.0       1         3.0       5                            Int64
2    b         0.772432  0.512452  0.863122  0.907903                     Float64
3    c                   a                   e         5        0         Any
4    d         0.6       0         1.0       1                            Bool

NOTE:

  • Comparing project and manifest dictionaries by hand isn't ideal, but it's currently unclear whether tooling for that belongs here or in Pkg.jl.
  • The Pkg.activate workflow could probably be replaced with a macro.
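
As an illustration of the first point, here is a minimal sketch of comparing two manifest-style dictionaries. `changed_packages` is a hypothetical helper, not part of JLSO or Pkg, and the package names and versions are placeholder data rather than entries from the file above.

```julia
# Hypothetical helper (not part of JLSO or Pkg): names of packages that were
# added, removed, or whose entries differ between two manifest dictionaries.
function changed_packages(old::AbstractDict, new::AbstractDict)
    names = union(keys(old), keys(new))
    return sort!([n for n in names if get(old, n, nothing) != get(new, n, nothing)])
end

# Placeholder manifests shaped like `jlso.manifest` (name => vector of entry dicts).
old = Dict{String,Any}(
    "PkgA" => [Dict{String,Any}("version" => "1.0.0")],
    "PkgB" => [Dict{String,Any}("version" => "0.3.1")],
)
new = Dict{String,Any}(
    "PkgA" => [Dict{String,Any}("version" => "1.0.0")],
    "PkgB" => [Dict{String,Any}("version" => "0.4.0")],
    "PkgC" => [Dict{String,Any}("version" => "2.1.0")],
)

changed_packages(old, new)  # ["PkgB", "PkgC"]
```

In practice one might pass `jlso.manifest` as `old` and a manifest parsed from the current environment as `new`; how the latter is obtained is left open here.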