Metadata
Manually reading JLSO files can be helpful when debugging problems deserializing objects, or simply to help with reproducibility.
using JLSO
jlso = read("breakfast.jlso", JLSOFile)
JLSOFile([cost, food, time]; version="4.0.0", julia="1.5.4", format=:julia_serialize, compression=:gzip, image="")
Now we can manually access the serialized objects:
jlso.objects
Dict{Symbol,Array{UInt8,1}} with 3 entries:
  :cost => UInt8[0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03 … …
  :food => UInt8[0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03 … …
  :time => UInt8[0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03 … …
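The leading bytes `0x1f 0x8b` in each entry are the gzip magic number, which is consistent with `compression=:gzip` in the file header. A minimal sketch of checking this by hand, assuming a byte vector shaped like the values of `jlso.objects` (the `has_gzip_magic` helper and the sample vectors are illustrative and not part of JLSO):

```julia
# Hypothetical helper: true when a byte vector starts with the gzip magic number.
has_gzip_magic(bytes::AbstractVector{UInt8}) =
    length(bytes) >= 2 && bytes[1] == 0x1f && bytes[2] == 0x8b

# Sample bytes copied from the start of the entries shown above.
sample = UInt8[0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03]

gzip_like = has_gzip_magic(sample)               # gzip-compressed payload
plain_like = has_gzip_magic(UInt8[0x4b, 0x05])   # some other payload
```

A check like this only inspects the first two bytes; it says nothing about whether the rest of the payload is intact.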
Or deserialize individual objects:
jlso[:food]
"☕️🥓🍳"
Maybe you need to figure out which packages were installed in the environment where the file was saved?
jlso.project
Dict{String,Any} with 2 entries:
  "deps" => Dict{String,Any}("Documenter"=>"e30172f5-a6a5-5a46-863b-614d45cd2…
  "compat" => Dict{String,Any}("Documenter"=>"0.26")
In extreme cases, you may need to inspect the full environment stack, for example when a struct definition has changed in a dependency.
jlso.manifest
Dict{String,Any} with 41 entries:
  "Mocking" => Dict{String,Any}[Dict("deps"=>["ExprTools"],"git-tree-sha1…
  "Pkg" => Dict{String,Any}[Dict("deps"=>["Dates", "LibGit2", "Libdl"…
  "TimeZones" => Dict{String,Any}[Dict("deps"=>["Dates", "EzXML", "Mocking"…
  "Documenter" => Dict{String,Any}[Dict("deps"=>["Base64", "Dates", "DocStri…
  "BSON" => Dict{String,Any}[Dict("git-tree-sha1"=>"db18b5ea04686f73d2…
  "Test" => Dict{String,Any}[Dict("deps"=>["Distributed", "Interactive…
  "Zlib_jll" => Dict{String,Any}[Dict("deps"=>["Artifacts", "JLLWrappers",…
  "IOCapture" => Dict{String,Any}[Dict("deps"=>["Logging"],"git-tree-sha1"=…
  "Random" => Dict{String,Any}[Dict("deps"=>["Serialization"],"uuid"=>"9…
  "Libdl" => Dict{String,Any}[Dict("uuid"=>"8f399da3-3557-5675-b5ff-fb8…
  "JLSO" => Dict{String,Any}[Dict("deps"=>["BSON", "CodecZlib", "FileP…
  "UUIDs" => Dict{String,Any}[Dict("deps"=>["Random", "SHA"],"uuid"=>"c…
  "Distributed" => Dict{String,Any}[Dict("deps"=>["Random", "Serialization", …
  "Serialization" => Dict{String,Any}[Dict("uuid"=>"9e88b42a-f829-5b0c-bbe9-9e9…
  "SHA" => Dict{String,Any}[Dict("uuid"=>"ea8e919c-243c-51af-8825-aaa…
  "REPL" => Dict{String,Any}[Dict("deps"=>["InteractiveUtils", "Markdo…
  "Memento" => Dict{String,Any}[Dict("deps"=>["Dates", "Distributed", "JS…
  "Syslogs" => Dict{String,Any}[Dict("deps"=>["Printf", "Sockets"],"git-t…
  "CodecZlib" => Dict{String,Any}[Dict("deps"=>["TranscodingStreams", "Zlib…
  ⋮ => ⋮
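When tracking down a changed struct, the recorded version of each package is usually the first thing to look at. The sketch below assumes a dictionary with the same shape as the output above (package name mapped to a vector of entry dictionaries) and that versioned entries carry a `"version"` key. The `manifest` value, its package names, UUIDs, and versions are placeholders, and `package_versions` is a hypothetical helper, not a JLSO function:

```julia
# Hypothetical stand-in for `jlso.manifest`; all names and values are placeholders.
manifest = Dict{String,Any}(
    "PkgA" => Dict{String,Any}[Dict{String,Any}(
        "uuid" => "00000000-0000-0000-0000-000000000001",
        "version" => "1.2.3",
    )],
    "StdlibLike" => Dict{String,Any}[Dict{String,Any}(
        "uuid" => "00000000-0000-0000-0000-000000000002",
    )],
)

# Collect name => version, using `nothing` when an entry records no version.
function package_versions(manifest)
    versions = Dict{String,Union{String,Nothing}}()
    for (name, entries) in manifest
        versions[name] = get(first(entries), "version", nothing)
    end
    return versions
end

versions = package_versions(manifest)
for name in sort!(collect(keys(versions)))
    println(rpad(name, 12), something(versions[name], "(no version recorded)"))
end
```

With a real file, `package_versions(jlso.manifest)` would be the corresponding call, provided the manifest follows the layout assumed here.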
The project and manifest fields are just the dictionary representations of the Project.toml and Manifest.toml files found in a Julia Pkg environment. As such, we can also use Pkg.activate to construct an environment matching the one used to write the file.
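Because the fields are plain dictionaries, they can also be written back out as TOML files for inspection or archiving. A minimal sketch, assuming Julia 1.6 or later (where `TOML` is a standard library) and using a placeholder `project` dictionary in place of `jlso.project`:

```julia
using TOML  # standard library on Julia 1.6 and later

# Hypothetical stand-in for `jlso.project`; the name and UUID are placeholders.
project = Dict{String,Any}(
    "deps" => Dict{String,Any}("PkgA" => "00000000-0000-0000-0000-000000000001"),
    "compat" => Dict{String,Any}("PkgA" => "1.2"),
)

# Write the dictionary out as a Project.toml in a scratch directory.
envdir = mktempdir()
path = joinpath(envdir, "Project.toml")
open(path, "w") do io
    TOML.print(io, project)
end

# Reading the file back should give an equal dictionary.
roundtrip = TOML.parsefile(path)
```

The walkthrough continues with a specimen file from the JLSO test suite: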
dir = joinpath(dirname(dirname(pathof(JLSO))), "test", "specimens")
jlso = read(joinpath(dir, "v4_bson_none.jlso"), JLSOFile)
jlso[:DataFrame]
1355-element Array{UInt8,1}: 0x4b 0x05 0x00 0x00 0x02 0x74 0x61 0x67 0x00 0x07 ⋮ 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Unfortunately, we can't load some objects in the current environment, so we might try to load the offending package only to find out it isn't part of our current environment.
try
    using DataFrames
catch e
    @warn e
end
┌ Warning: ArgumentError("Package DataFrames not found in current path:\n- Run `import Pkg; Pkg.add(\"DataFrames\")` to install the DataFrames package.\n") └ @ Main.ex-metadata-example none:1
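Instead of trying packages one at a time, the saved dependency list can be checked against the current load path. This is a sketch under assumptions: `deps` is a placeholder for `jlso.project["deps"]`, and `Base.find_package` is an unexported Base function that returns `nothing` when a name cannot be resolved, so its behavior may differ between Julia versions:

```julia
# Hypothetical stand-in for `jlso.project["deps"]` (package name => UUID string).
# The UUID values are placeholders and are not consulted here.
deps = Dict{String,Any}(
    "Pkg" => "00000000-0000-0000-0000-000000000001",
    "HypotheticalMissingPkg" => "00000000-0000-0000-0000-000000000002",
)

# Keep the names that cannot be resolved from the current load path.
missing_pkgs = sort!([name for name in keys(deps) if Base.find_package(name) === nothing])

println("Not available in the current environment: ", join(missing_pkgs, ", "))
```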
Okay, so DataFrames isn't loaded and isn't part of our current environment. Rather than adding every possible package needed to deserialize the objects in the file, we can use the Pkg.activate do-block syntax to:
- Initialize the exact environment needed to deserialize our objects
- Load our desired dependencies
- Migrate our data to a more appropriate long term format
using Pkg
# Now we can run our conversion logic in an isolated environment
mktempdir(pwd()) do d
    cd(d) do
        # Modify our Manifest to just use the latest release of JLSO
        delete!(jlso.manifest, "JLSO")
        Pkg.activate(jlso, d) do
            @eval Main begin
                using Pkg; Pkg.resolve(); Pkg.instantiate(; verbose=true)
                using DataFrames, JLSO
                describe($(jlso)[:DataFrame])
            end
        end
    end
end
| | variable | mean | min | median | max | nunique | nmissing | eltype |
|---|---|---|---|---|---|---|---|---|
| | Symbol | Union… | Any | Union… | Any | Union… | Union… | DataType |
| 1 | a | 3.0 | 1 | 3.0 | 5 | | | Int64 |
| 2 | b | 0.772432 | 0.512452 | 0.863122 | 0.907903 | | | Float64 |
| 3 | c | | a | | e | 5 | 0 | Any |
| 4 | d | 0.6 | 0 | 1.0 | 1 | | | Bool |
NOTE:
- Comparing project and manifest dictionaries isn't ideal, but it's currently unclear if that should live here or in Pkg.jl.
- The Pkg.activate workflow could probably be replaced with a macro.
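Until such a comparison has a proper home, a hand-rolled diff is one option. The sketch below assumes both dictionaries follow the manifest layout shown earlier (name mapped to a vector of entries with an optional `"version"` key). The `entry`, `entry_version`, and `version_changes` helpers and all package data are hypothetical:

```julia
# Build a one-entry manifest value for the placeholder data below.
entry(v) = Dict{String,Any}[Dict{String,Any}("version" => v)]

# Placeholder manifests standing in for a saved and a current environment.
saved = Dict{String,Any}("PkgA" => entry("1.2.3"), "PkgB" => entry("0.4.0"))
current = Dict{String,Any}(
    "PkgA" => entry("1.3.0"),
    "PkgB" => entry("0.4.0"),
    "PkgC" => entry("2.0.0"),
)

# Version recorded for `name`, `nothing` if unrecorded, `missing` if the package is absent.
entry_version(manifest, name) =
    haskey(manifest, name) ? get(first(manifest[name]), "version", nothing) : missing

# Map each package whose version differs to a (saved, current) pair.
function version_changes(saved, current)
    changes = Dict{String,Tuple{Any,Any}}()
    for name in union(collect(keys(saved)), collect(keys(current)))
        a, b = entry_version(saved, name), entry_version(current, name)
        isequal(a, b) || (changes[name] = (a, b))
    end
    return changes
end

changes = version_changes(saved, current)
```

This compares version strings only; it ignores UUIDs, tree hashes, and dependency lists, which a fuller comparison would likely need to cover.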