Working with Symbols
Tables use Symbols to represent column names, so one often wants to filter on Symbols.
Most filtering functions only accept strings, so code ends up containing a lot of lines like:
filter(col_name->startswith(col_name, "temperature"), String.(names(df)))
We solve that by making functions that accept a Symbol for every argument that would otherwise be a string. Similarly, while some functions like isequal and in have curried variants, such as in(collection) being the same as item->in(item, collection), not all do. We add the missing ones.
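As a minimal sketch of what a curried variant is, the snippet below fixes the second argument of a two-argument predicate and returns a one-argument function suitable for filter. The helper name curry_second is hypothetical and is not part of this package; it only illustrates the pattern.

```julia
# Hypothetical helper (not from this package): fix the second argument
# of a two-argument function, returning a one-argument function.
curry_second(f, y) = x -> f(x, y)

# Equivalent to writing name -> endswith(name, "_price") by hand.
ends_with_price = curry_second(endswith, "_price")

price_names = filter(ends_with_price, ["NY_car_price", "NY_rainfall", "LON_house_price"])
# price_names == ["NY_car_price", "LON_house_price"]
```

The curried form reads better inside filter, map, and similar higher-order functions because no anonymous function has to be spelled out at the call site.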
We type-pirate startswith and endswith to provide Symbol-accepting versions. This is misdemeanor type-piracy: it only turns code that currently errors into code that does not. It is also the only reasonable definition for these methods.
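To make the idea concrete, here is a sketch of what such Symbol-accepting methods could look like. These definitions are assumptions for illustration; the package's actual methods may cover more argument combinations or be written differently.

```julia
# Assumed definitions, for illustration only.
# Adding methods to Base functions for Base types is what makes this type piracy.
Base.startswith(s::Symbol, prefix::Symbol) = startswith(String(s), String(prefix))
Base.endswith(s::Symbol, suffix::Symbol) = endswith(String(s), String(suffix))

starts = startswith(:LON_rainfall, :LON)      # true
ends = endswith(:LON_rainfall, :rainfall)     # true
```

Each method simply converts its arguments to strings and defers to the existing string method, which is why it is hard to imagine a different sensible meaning for these calls.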
Functions
These are all variants of existing functions.
contains, startswith, endswith, and all their variants mentioned here, accept Symbols everywhere they might accept Strings.
The any variants are startswith_any, endswith_any, and contains_any. They are of the form:
check_any(needles, haystack) == any(check(haystack, needle) for needle in needles)
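The form above can be sketched with plain Base functions. The names my_contains and my_contains_any are hypothetical stand-ins so the snippet runs without this package; they are not its actual definitions.

```julia
# Hypothetical stand-in for contains: Symbols are converted to strings first.
my_contains(haystack, needle) = occursin(string(needle), string(haystack))

# Follows the form given above: true if any needle is found in the haystack.
my_contains_any(needles, haystack) =
    any(my_contains(haystack, needle) for needle in needles)

is_weather = my_contains_any((:temperature, :rainfall), :NY_rainfall)    # true
is_price = my_contains_any((:temperature, :rainfall), :NY_car_price)     # false
```

Because any short-circuits, the search stops at the first needle that matches.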
Examples
Consider a list of column names for price and weather data in various cities, where each column name includes the city and the kind of data it holds.
I might want just the column names for a particular city:
julia> using Wrangling
julia> const column_names = [:NY_temperature, :NY_house_price, :NY_car_price, :NY_rainfall, :LON_temperature, :LON_house_price, :LON_car_price, :LON_rainfall];
julia> const LON_cols = filter(startswith(:LON), column_names)
4-element Array{Symbol,1}:
:LON_temperature
:LON_house_price
:LON_car_price
:LON_rainfall
Another thing I might want to do is get just the ones that are to do with weather:
julia> const weather_cols = filter(contains_any((:temperature, :rainfall)), column_names)
4-element Array{Symbol,1}:
:NY_temperature
:NY_rainfall
:LON_temperature
:LON_rainfall
Or we might want to do the opposite and exclude any to do with weather:
julia> const nonweather_cols = filter(!contains_any((:temperature, :rainfall)), column_names)
4-element Array{Symbol,1}:
:NY_house_price
:NY_car_price
:LON_house_price
:LON_car_price
Or I might want only the price data for a single city:
julia> const LON_prices = filter(contains(r"^LON_.*_price$"), column_names)
2-element Array{Symbol,1}:
:LON_house_price
:LON_car_price
Or we might want all columns that are either for a particular city or for rainfall anywhere:
julia> const LON_or_rainfall_cols = filter(column_names) do col
           startswith(col, :LON) || endswith(col, :rainfall)
       end
5-element Array{Symbol,1}:
:NY_rainfall
:LON_temperature
:LON_house_price
:LON_car_price
:LON_rainfall