docs: improved ImpostorTemplate docs
lfenzo committed Oct 16, 2023
1 parent a972307 commit 07ee048
Showing 4 changed files with 81 additions and 22 deletions.
14 changes: 8 additions & 6 deletions docs/src/developer_guide.md
@@ -82,8 +82,8 @@ Impostor._load!("localization", "state", "en_US")
the structured archive is checked for the existence of a `"state"` *content* in the `"localization"`
*provider* associated to the United States English (`"en_US"`) *locale*. If such file is available,
a `DataFrame` object is returned with the associated data for further manipulation. Since the column
names for a given (`"localization"`, `"state"`) tuple are restricted by design, multiple locales
may be loaded at once:
names for a given (`"localization"`, `"state"`) tuple are guaranteed to be equal by design, multiple
locales may be loaded at once:

```julia
Impostor._load!("localization", "state", ["en_US", "pt_BR"])
@@ -100,16 +100,18 @@ Impostor._load!("localization", "state", ["en_US", "pt_BR"])
In order to add new data files, contents or providers, carefully follow the same directory structure
described in the previous section paying attention to the format of the `HEADER.txt` file. Some of
the scenarios you will find while adding new data to the archive are shown below:
- **Incrementing existing locale files**: corresponds to the simplest case, just add new rows to the respective `.csv` file. Typically, to ease navigation for users adding new data, the `.csv` files are usually sorted by some of their columns; make your changes so that this property is kept in the modified file.
- **Adding new contents or providers**: in both cases the creation of a new directory/set of directories is needed. Although this may be slightly subjective, try to do it so that the new set of directories resembles the current organization structure in `data/`.
- **Incrementing existing locale files**: corresponds to the simplest case, just add new rows to the respective `.csv` file. Typically, to ease navigation for users adding new data, the `.csv` files are usually sorted by some of their columns; make your changes so that this property remains valid in the modified file.
- **Adding new contents or providers**: in both cases the creation of a new directory/set of directories is needed. Although this may be slightly subjective, try to do it so that the new set of directories resembles the current organization structure in `data/` (when in doubt reach out via GitHub so we can discuss the best organization for the files).
- Make sure that a set of unit tests exist for the new content in order to ensure its consistency, place the implementation under the `tests/data_integrity/` directory in a file called `test_<your provider>.jl`
- If your data requires any kind of restriction (*e.g.* a certain column may only contain a restricted set of values), register such restrictions in the `src/relation_restrictions.jl` file.

## Adding New Functions

Some guidelines on adding new generator-functions are:
1. Make sure that the contents required for the new generator-function are available under the data archive in `src/data/`. If not, then proceed to the previous section on adding new data.
1. Use exclusively the [`Impostor._load!`](@ref) to interact with the data archive. In order to manipulate the dataframe(s) according to your needs, the functions exported by [DataFrames.jl](https://dataframes.juliadata.org/stable/lib/functions/) should suffice for most use cases.
1. Use *exclusively* the [`Impostor._load!`](@ref) function to interact with the data archive. In order to manipulate the dataframe(s) according to your needs, the functions exported by [DataFrames.jl](https://dataframes.juliadata.org/stable/lib/functions/) should suffice for most use cases. If you need other package(s) to manipulate the dataframes in order to get the desired output, file an issue explaining the situation and we will discuss the addition of a new dependency.
1. Add the new generator-function to the `export` list in the `src/Impostor.jl` in the appropriate Provider grouping. Make sure to add it in alphabetical order in each group.
1. Add docstrings with examples, when possible.
1. Add docstrings with examples, when possible/applicable.
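
The guidelines above can be sketched with a self-contained toy example. Everything in it is hypothetical: `DEMO_ARCHIVE`, `demo_load`, `unpack` and `demo_bank_name` are stand-ins invented for illustration, and a plain `Dict` replaces the real archive read by `Impostor._load!`. Only the overall shape (load through a single entry point, sample `n` values, unpack a single result, document with a docstring) follows the guide.

```julia
# Hypothetical in-memory stand-in for the data archive (not Impostor's real storage).
const DEMO_ARCHIVE = Dict(
    ("finance", "bank", "en_US") => ["First Demo Bank", "Second Demo Bank"],
    ("finance", "bank", "pt_BR") => ["Banco Demo Um", "Banco Demo Dois"],
)

# Stand-in for the single entry point to the archive; accepts one or many locales.
function demo_load(provider::String, content::String, locales::Vector{String})
    return reduce(vcat, [DEMO_ARCHIVE[(provider, content, l)] for l in locales])
end
demo_load(provider::String, content::String, locale::String) =
    demo_load(provider, content, [locale])

# Return a bare String when a single value was sampled, a Vector otherwise.
unpack(values::Vector{String}) = length(values) == 1 ? first(values) : values

"""
    demo_bank_name(n::Integer = 1; locale = "en_US")

Sample `n` bank names from the demo archive for the given `locale`(s).
"""
function demo_bank_name(n::Integer = 1; locale = "en_US")
    return unpack(rand(demo_load("finance", "bank", locale), n))
end
```

Calling `demo_bank_name()` yields a single `String`, while `demo_bank_name(3; locale = ["en_US", "pt_BR"])` yields a three-element vector sampled across both locales.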

## Testing Philosophy

11 changes: 7 additions & 4 deletions docs/src/index.md
@@ -30,10 +30,13 @@ is to generate single and multiple values specifying the number of expected valu

```@repl
using Impostor # hide
firstname() # equivalent to firstname(1)
firstname(5)
firstname() # equivalent to firstname(1)
```

Generator functions may be found in each of the Providers individual pages or via the
[API Reference](./api_reference.md) page.

!!! note
When a single value is produced by the generator function, as in `firstname(1)` from the
example above, the returned value is automatically unpacked into a `String`
@@ -62,8 +65,8 @@ resetlocale!(); # hide

### Impostor Templates

Besides providing several *generator-functions* which may be used as standalone data series
generators, Impostor also exports the `ImpostorTemplate` which is a utility struct to encapsulate
Besides providing several *generator functions* which may be used as standalone data series
generators, Impostor also exports the [`ImpostorTemplate`](@ref) which is a utility struct to encapsulate
formats and generate a fully fledged table.

```@repl
@@ -73,7 +76,7 @@ template = ImpostorTemplate([:firstname, :surname, :country_code, :state, :city]
template(3)
template(5, DataFrame; locale = ["pt_BR", "en_US"]) # optionally, provide a `sink` type
template(5, DataFrame; locale = ["pt_BR", "en_US"]) # optionally provide a `sink` type
```
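
The template object in the example above is called like a function. A minimal sketch of that callable-struct (functor) pattern in Julia is shown here; `DemoTemplate` and its placeholder values are hypothetical and only illustrate the mechanism, not how `ImpostorTemplate` generates its data.

```julia
# Hypothetical struct used only to illustrate the callable-object (functor) pattern.
struct DemoTemplate
    columns::Vector{Symbol}
end

# Defining a method on the instance type makes every DemoTemplate object callable.
function (t::DemoTemplate)(n::Integer = 1)
    # One placeholder string per row for every column; a real template
    # would call the matching generator function here instead.
    return Dict(col => ["$(col)_$(i)" for i in 1:n] for col in t.columns)
end

demo = DemoTemplate([:firstname, :city])
rows = demo(2)   # a Dict with one two-element vector per column
```

Keeping the column list in the struct and the row count in the call lets a single template be reused to produce tables of different sizes.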

## Concepts
74 changes: 66 additions & 8 deletions src/impostor_template.jl
@@ -1,10 +1,46 @@
"""
ImpostorTemplate(format::Vector{Symbol})
ImpostorTemplate(formats::Union{T, S, Vector{Union{T, S}}}) where {T<:AbstractString, S<:Symbol}
Struct storing the formats used to
Struct storing the `formats` used to generate new tables. Each of the elements in `formats` maps to
a generator function exported by Impostor. This struct is later used as a *functor* in order to
generate data, that is, after instantiating a new `ImpostorTemplate` object, this object will be
called providing arguments in order to generate the data entries.
# Parameters
- `formats` (`String`, `Symbol` or `Vector{Union{String, Symbol}}`): table output format specified in terms of generator functions to be used in each column (see examples below).
# Examples
```@repl
julia> imp = ImpostorTemplate("firstname")
ImpostorTemplate([:firstname])
julia> imp = ImpostorTemplate(:firstname)
ImpostorTemplate([:firstname])
julia> imp = ImpostorTemplate(["firstname"])
ImpostorTemplate([:firstname])
julia> imp = ImpostorTemplate([:firstname])
ImpostorTemplate([:firstname])
```
"""
Base.@kwdef mutable struct ImpostorTemplate
format::Vector{Symbol}

function ImpostorTemplate(formats::Union{T, S, Vector{Union{T, S}}}) where {T<:AbstractString, S<:Symbol}
if formats isa Vector
symbol_formats = eltype(formats) <: String ? Symbol.(formats) : formats
else
symbol_formats = if formats isa String
[Symbol.(formats)]
elseif formats isa Symbol
[formats]
end
end
if _all_formats_availabe(symbol_formats)
return new(symbol_formats)
end
end
end
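
The constructor above accepts a `String`, a `Symbol`, or a vector of either and normalizes them to a `Vector{Symbol}`. As an alternative to the `isa` checks, the same normalization could be sketched with multiple dispatch; `normalize_formats` is a hypothetical helper and is not part of Impostor.jl.

```julia
# Hypothetical helper (not part of Impostor.jl): one method per accepted input type.
normalize_formats(format::Symbol) = [format]
normalize_formats(format::AbstractString) = [Symbol(format)]
# Symbol(x) is a no-op for Symbols, so mixed vectors are handled as well.
normalize_formats(formats::AbstractVector) = Symbol[Symbol(f) for f in formats]
```

Each of `normalize_formats("firstname")`, `normalize_formats(:firstname)`, `normalize_formats(["firstname"])` and `normalize_formats([:firstname])` returns the same one-element `Vector{Symbol}`.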


@@ -14,11 +50,11 @@ setformat!(i::ImpostorTemplate, format::Vector{Symbol}) = setfield!(i, :format,


"""
_sanitize_formats(formats::Vector)
_all_formats_available(formats::Vector)
Verify if all `formats` are available and exported by Impostor.jl, otherwise throw
Verify if all `formats` are available and exported by Impostor.jl, otherwise throw `ArgumentError`
"""
function _verify_formats(formats::Vector{Symbol})
function _all_formats_availabe(formats::Vector{Symbol})
invalid_formats = Symbol[]
valid_formats = names(Impostor; all = false)

@@ -30,6 +66,8 @@ function _verify_formats(formats::Vector{Symbol})

if !isempty(invalid_formats)
ArgumentError("Invalid formats provided: $(invalid_formats)") |> throw
else
return true
end
end
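
The check above compares the requested formats against the names exported by the package. A self-contained sketch of that idea follows, assuming a hypothetical `DemoProvider` module in place of Impostor and a hypothetical `check_formats` function; the collapsed part of the original function may differ in detail.

```julia
# Hypothetical module standing in for a package that exports generator functions.
module DemoProvider
export firstname, surname
firstname() = "Ada"
surname() = "Lovelace"
end

# Return true if every format is exported by `mod`, otherwise throw ArgumentError.
function check_formats(formats::Vector{Symbol}, mod::Module)
    valid = names(mod; all = false)                  # exported names only
    invalid = filter(f -> !(f in valid), formats)    # collect every offender
    isempty(invalid) || throw(ArgumentError("Invalid formats provided: $(invalid)"))
    return true
end
```

Collecting all invalid names before throwing reports every problem in a single error instead of one per call.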

@@ -38,7 +76,7 @@ end
"""
(impostor::ImpostorTemplate)(n::Integer = 1, sink = Dict; kwargs...)
Generates `n` entries with information specified in the formats.
Generate `n` entries according to the `format` provided when `impostor` was instantiated.
# Parameters
- `n`: number of entries/rows to generate in each format
@@ -49,14 +87,34 @@ Generates `n` entries with information specified in the formats.
# Examples
```@repl
julia> formats = ["complete_name", "credit_card_number", "credit_card_expiry"];
julia> template = ImpostorTemplate(formats)
ImpostorTemplate([:complete_name, :credit_card_number, :credit_card_expiry])
julia> template(3, DataFrame)
3×3 DataFrame
Row │ complete_name credit_card_number credit_card_expiry
│ String String String
─────┼───────────────────────────────────────────────────────────────────────
1 │ Sophie Cornell Collins 52583708162384822 6/2008
2 │ Mary Collins Cornell 3442876938992966 10/2022
3 │ John Sheffard Cornell Collins 4678055537702596 10/2021
julia> template(3, DataFrame; locale = ["pt_BR"])
3×3 DataFrame
Row │ complete_name credit_card_number credit_card_expiry
│ String String String
─────┼───────────────────────────────────────────────────────────────────────────
1 │ João Camargo da Silva Pereira 3418796429393351 4/2018
2 │ João Pereira da Silva 4305288858368967 6/2018
3 │ Bernardo Pereira Camargo da Silva 3751513143972989 3/2024
```
"""
function (impostor::ImpostorTemplate)(n::Integer = 1, sink = Dict;
locale = session_locale(),
kwargs...
)
_verify_formats(impostor.format)
generated_values = OrderedDict()

if !isempty(intersect(SEXES[:provider_functions], impostor.format))
4 changes: 0 additions & 4 deletions src/providers/finance.jl
@@ -75,10 +75,6 @@ end
# Kwargs
- `level::Symbol = :bank_code`: Level of values in `options` or `mask` when using option-based or mask-based generation.
- `locale::Vector{String}`: locale(s) from which entries are sampled. If no `locale` is provided, the current session locale is used.
# Example
```jldoctest
"""
function bank_official_name(n::Integer = 1; locale = session_locale())
return rand(_load!("finance", "bank", locale)[:, :bank_official_name], n) |> coerse_string_type
