Commit a02e398

committed
Starting a full read-through
1 parent d4a957c commit a02e398

8 files changed

+128
-122
lines changed

content/05.introduction.md

+2-2
Original file line number | Diff line number | Diff line change
@@ -15,10 +15,10 @@ When viewing this from the perspective of the landscape of inquiry, the most sta
1515

1616
In [@doi:10.1088/0067-0049/192/1/9], the analysis platform `yt` was described.
1717
At the time, `yt` was focused on analyzing and visualizing the output of grid-based adaptive mesh refinement hydrodynamic simulations; while these were used to study many different physical phenomena, they all were laid out in roughly the same way, in rectilinear meshes of data.
18-
In this paper, we present the current version of `yt`, which enables identical scripts to analyze and visualize data stored as rectilinear grids as before, but additionally particle or discrete data, octree-based data, and data stored as unstructured meshes.
18+
In this paper, we present the current version of `yt`, which enables identical scripts to analyze and visualize data stored as [rectilinear grids](#sec:grid_analysis) as before, but additionally [particle or discrete data](#sec:sph-analysis), [octree-based data](#sec:octree_analysis), and data stored as [unstructured meshes](#sec:unstructured_mesh).
1919
This has been the result of a large-scale effort to rewrite the underlying machinery within `yt` for accessing data, indexing that data, and providing it in efficient ways to higher-level routines, as discussed in Section Something.
2020
While this was underway, `yt` has also been considerably reinstrumented with [metadata-aware array infrastructure](#sec:units), the [volume rendering infrastructure](#sec:vr) has been rewritten to be more user-friendly and capable, and support for [non-Cartesian geometries](#sec:noncartesian) has been added.
2121

22-
The single biggest update/addition to `yt` since that paper was published has not been technical in nature.
22+
The single biggest update or addition to `yt` since that paper was published has not been technical in nature.
2323
In the intervening years, a directed and intense community-building effort has resulted in contributions from over a hundred different individuals, many of them early-stage researchers, and a [thriving community of both users and developers](#sec:community).
2424
This is the crowning achievement of development, as we have attempted to build `yt` into a tool that enables inquiry from a technical level as well as fosters a supportive, friendly community of individuals engaged in self-directed inquiry.

content/10.community_building.md

+5-3
@@ -47,10 +47,12 @@ Participation in code review, providing comments, feedback and suggestions to ot
4747
But, it does arise from a pragmatic (ensuring code reliability) or altruistic (the public good of the software) motivation, and is thus a deeper level of engagement.
4848

4949
The final two activities, drafting enhancement proposals and closing bug reports, are the most engaged, and often the most removed from the academic motivation structure.
50-
Developing an [enhancement proposal](#sec:ytep) for `yt` means iterating with other developers on the motivation behind and implementation of a large piece of functionality; it requires both motivation to engage with the community and the patience to build consensus among stakeholders.
50+
Developing an [enhancement proposal](#sec:ytep) for `yt` means iterating with other developers on the motivation behind and implementation of a large piece of functionality; it requires both motivation to engage with the community and the patience to build consensus among stakeholders.
5151
Closing bug reports -- and the development work associated with identifying, tracking and fixing bugs -- requires patience and often repeated engagement with stakeholders.
5252

5353
### Engagement Metrics
5454

55-
We include here plots of the level of engagement on mailing list discussions and the citation count of the original method paper.
56-
55+
Typically, measuring the degree of engagement in a project is done by examining the amount of activity that surrounds it; this can be through development, mailing list or other discussion forum engagement, or through citations of a paper.
56+
These metrics are valuable, but incomplete.
57+
Furthermore, their quantification presents challenges: how does migration of a project (and a community) from one form of interaction (such as a mailing list) to another (such as Slack or GitHub Issues) impact the perceived growth or health of that project?
58+
As such, we have attempted to build a proxy for the development metrics by examining activity around pull requests (as below in Figure @fig:pr-closing-time) and have opted to elide discussion of the activity of the project through the currently dominant medium of Slack.

content/15.development_procedure.md

+68-67
Large diffs are not rendered by default.

content/20.data_objects.md

+33-34
@@ -5,29 +5,31 @@ The basic principles by which `yt` operates are built on the notion of selecting
55
Selections in `yt` are usually spatial in nature, although several non-spatial mechanisms focused on queries can be utilized as well.
66
The objects that conduct selection are called selectors, and they are designed to expose as small an API as possible, to ease the development and deployment of new selectors.
77

8-
Selectors require defining several functions, with the option of defining additional functions for optimization, that return true or false whether a given point is or is not included in the selected region.
8+
Implementing a new "selector" in `yt` requires defining several functions, with the option of defining additional functions for optimization, each of which returns true or false depending on whether a given point is included in the selected region.
99
These functions include selection of a rectilinear grid (or any point within that grid), selection of a point with zero extent and selection of a point with a non-zero spherical radius.
10+
Implementing new selectors is uncommon, as many basic selectors have been defined, along with the ability to combine these through boolean operations.
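
As a rough illustration of the three functions described above and of boolean combination, the following pure-Python sketch defines a spherical selector and an `A AND NOT B` wrapper. The class and method names (`SphereSelector`, `AndNot`, `select_point`, `select_sphere`, `select_bbox`) are hypothetical and chosen for this example; they are not presented as `yt`'s internal API.

```python
import math


class SphereSelector:
    """Hypothetical selector: everything within `radius` of `center`."""

    def __init__(self, center, radius):
        self.center = center
        self.radius = radius

    def select_point(self, pos):
        # A point with zero extent.
        return math.dist(pos, self.center) <= self.radius

    def select_sphere(self, pos, radius):
        # A point with a non-zero spherical radius.
        return math.dist(pos, self.center) <= self.radius + radius

    def select_bbox(self, left, right):
        # A rectilinear grid: find the point of the box closest to the
        # sphere center and test whether it lies inside the sphere.
        closest = [min(max(c, lo), hi)
                   for c, lo, hi in zip(self.center, left, right)]
        return math.dist(closest, self.center) <= self.radius


class AndNot:
    """Hypothetical boolean combination: inside `a` but not inside `b`."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def select_point(self, pos):
        return self.a.select_point(pos) and not self.b.select_point(pos)

    def select_bbox(self, left, right):
        # Conservative: a box can only contain selected points if it
        # intersects `a`; the per-point test removes anything inside `b`.
        return self.a.select_bbox(left, right)


outer = SphereSelector((0.5, 0.5, 0.5), 0.25)
shell = AndNot(outer, SphereSelector((0.5, 0.5, 0.5), 0.1))
```

The bounding-box test is deliberately conservative for the combined selector: it may report an intersection for a box that ends up contributing no points, but it never rejects a box that does.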
1011

1112
The base selector object utilizes these routines during a selection operation to maximize the amount of code reused between particle, patch, and octree selection of data.
1213
These three types of data are selected through specific routines designed to minimize the number of times that the selection function must be called, as they can be quite expensive.
1314

14-
Selecting data from a grid is a two-step process.
15+
Selecting data from a dataset composed of grids is a two-step process.
1516
The first step is identifying which grids intersect a given data selector; this is done through a sequence of bounding box intersection checks.
1617
Within a given grid, the cells which are intersected are identified.
17-
This results in the selection routine being called once for each grid object in the simulation and once for each cell located within an intersecting grid.
18+
This results in the selection routine being called once for each grid object in the simulation and once for each cell located within an intersecting grid (unless additional short-circuit paths, specific to the selector, are available).
1819
This can be conducted hierarchically, but due to implementation details around how the grid index is stored this is not yet cost effective.
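
The two-step process can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not `yt`'s implementation: grids are plain `(left_edge, right_edge, dims)` tuples, cells are tested at their centers, and `BoxSelector` and `select_cells` are names invented for the example.

```python
import itertools


class BoxSelector:
    """Hypothetical axis-aligned selector used only for this sketch."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def select_bbox(self, left, right):
        # Step 1 test: does the selector overlap this grid's bounding box?
        return all(sl < r and sr > l
                   for sl, sr, l, r in zip(self.left, self.right, left, right))

    def select_point(self, pos):
        # Step 2 test: is this cell center inside the selector?
        return all(l <= x < r for l, x, r in zip(self.left, pos, self.right))


def select_cells(grids, selector):
    """Return (grid_id, cell_index) pairs for every selected cell."""
    selected = []
    for gid, (left, right, dims) in enumerate(grids):
        # One call per grid: skip grids the selector does not touch.
        if not selector.select_bbox(left, right):
            continue
        widths = [(r - l) / n for l, r, n in zip(left, right, dims)]
        # One call per cell within each intersecting grid.
        for index in itertools.product(*(range(n) for n in dims)):
            center = tuple(l + (i + 0.5) * w
                           for l, i, w in zip(left, index, widths))
            if selector.select_point(center):
                selected.append((gid, index))
    return selected


grids = [((0.0, 0.0), (1.0, 1.0), (2, 2)),
         ((1.0, 0.0), (2.0, 1.0), (2, 2))]
cells = select_cells(grids, BoxSelector((0.0, 0.0), (0.6, 1.0)))
```

Here the second grid is rejected by its bounding box alone, so none of its cells are ever tested individually.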
1920

2021
Selecting data from an octree-organized dataset utilizes a recursive scheme that selects individual oct nodes and then, for each cell within that oct, determines whether the cell must be selected or its child node recursed into.
2122
This system is designed to allow for having leaf nodes of varying cells-per-side, for instance 1, 2, 4, 8, etc.
2223
However, the number of nodes is fixed at 8, with subdivision always occurring at the midplane.
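
A simplified version of this recursion is sketched below. It assumes one cell per octant (that is, two cells per side) and tests cells at their centers; the `Oct`, `BoxSelector`, and `select_octree` names are hypothetical and exist only for this example.

```python
import itertools


class BoxSelector:
    """Hypothetical axis-aligned selector used only for this sketch."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def select_bbox(self, left, right):
        return all(sl < r and sr > l
                   for sl, sr, l, r in zip(self.left, self.right, left, right))

    def select_point(self, pos):
        return all(l <= x < r for l, x, r in zip(self.left, pos, self.right))


class Oct:
    """An oct node whose eight octants are split at the midplane."""

    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.children = {}  # maps an octant index (i, j, k) to a child Oct

    def cell_bounds(self, idx):
        mid = [(l + r) / 2 for l, r in zip(self.left, self.right)]
        left = tuple(l if i == 0 else m
                     for i, l, m in zip(idx, self.left, mid))
        right = tuple(m if i == 0 else r
                      for i, m, r in zip(idx, mid, self.right))
        return left, right

    def refine(self, idx):
        child = Oct(*self.cell_bounds(idx))
        self.children[idx] = child
        return child


def select_octree(oct_node, selector, selected=None):
    """Collect the centers of all selected leaf cells, recursing as needed."""
    if selected is None:
        selected = []
    # Early termination: skip the whole node if the selector misses it.
    if not selector.select_bbox(oct_node.left, oct_node.right):
        return selected
    for idx in itertools.product((0, 1), repeat=3):
        child = oct_node.children.get(idx)
        if child is not None:
            select_octree(child, selector, selected)  # refined: recurse
        else:
            left, right = oct_node.cell_bounds(idx)
            center = tuple((l + r) / 2 for l, r in zip(left, right))
            if selector.select_point(center):
                selected.append(center)
    return selected


root = Oct((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
root.refine((0, 0, 0))
found = select_octree(root, BoxSelector((0.0, 0.0, 0.0), (0.5, 0.5, 0.5)))
```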
2324

2425
The final mechanism by which data is selected is for discrete data points, typically particles in astrophysical simulations.
25-
At present, this is done by first identifying which data files intersect with a given selector, then selecting individual points.
26-
There is no hierarchical data selection conducted in this system, as we do not yet allow for re-ordering of data on disk or in-memory which would facilitate hierarchical selection through the use of operations such as Morton indices.
26+
Often these particles are stored in multiple files, or multiple _virtual_ files can be identified by `yt` through applying range or subsetting to the full dataset.
27+
Selection is conducted by first identifying which data files (or data file subsets) intersect with a given selector, then selecting individual points in those data files.
28+
There is only a single level of hierarchical data selection in this system, as we do not yet allow for re-ordering of data on disk or in-memory which would facilitate multi-level hierarchical selection through the use of operations such as Morton indices.
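
The single level of hierarchy can be illustrated as a coarse per-file test followed by per-point tests. In the sketch below, each "file" is simply a list of positions and the coarse test uses the file's bounding box; that bounding-box test is an assumption made for illustration, and the text above does not specify how `yt` determines which files intersect a selector. All names are hypothetical.

```python
class BoxSelector:
    """Hypothetical axis-aligned selector used only for this sketch."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def select_bbox(self, left, right):
        # Conservative (closed) overlap test for the coarse, per-file pass.
        return all(sl <= r and sr >= l
                   for sl, sr, l, r in zip(self.left, self.right, left, right))

    def select_point(self, pos):
        return all(l <= x < r for l, x, r in zip(self.left, pos, self.right))


def select_particles(files, selector):
    """Return (selected_points, number_of_files_examined_point_by_point)."""
    selected = []
    files_read = 0
    for positions in files:
        ndim = len(positions[0])
        left = tuple(min(p[d] for p in positions) for d in range(ndim))
        right = tuple(max(p[d] for p in positions) for d in range(ndim))
        # Level 1: discard whole files (or file subsets) that cannot match.
        if not selector.select_bbox(left, right):
            continue
        files_read += 1
        # Level 2: test individual points in the surviving files.
        selected.extend(p for p in positions if selector.select_point(p))
    return selected, files_read


files = [[(0.1, 0.1, 0.1), (0.2, 0.3, 0.4)],
         [(0.8, 0.8, 0.8), (0.9, 0.9, 0.9)]]
points, files_read = select_particles(
    files, BoxSelector((0.0, 0.0, 0.0), (0.25, 0.5, 0.5)))
```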
2729

2830
### Selection Routines
2931

30-
Given these set of hierarchical selection methods, all of which are designed to provide opportunities for early-termination, each *geometric* selector object is required to implement a small set of methods to expose its functionality to the hierarchical selection process.
32+
Given this set of hierarchical selection methods, all of which are designed to provide opportunities for early termination, each _geometric_ selector object is required to implement a small set of methods to expose its functionality to the hierarchical selection process.
3133
Some of these functions overlap in purpose; the duplication typically exists so that expensive calculations accounting for boundary conditions, such as periodicity and reflectivity, are performed only when necessary.
3234
Additionally, by providing some routines as options, we can in some instances specialize them for the specific geometric operation.
3335

@@ -56,13 +58,12 @@ A selection of data in a low-resolution simulation from a sphere.
5658
The logical `A AND NOT B` for regions `A` and `B` from Figures @fig:reg2 and @fig:sp2 respectively.
5759
](images/selectors/reg2_not_sp2.svg){#fig:reg2_not_sp2}
5860

59-
6061
### Fast and Slow Paths
6162

6263
Given an ensemble of objects, the simplest way of testing for inclusion in a selector is to call the operation `select_cell` on each individual object.
6364
Where the objects are organized in a regular fashion, for instance a "grid" that contains many "cells," we can apply both "first pass" and "second pass" fast-path operations.
6465
The "first pass" checks whether or not the given ensemble of objects is included, and only iterates inward if there is partial or total inclusion.
65-
The "second pass" fast pass is specialized to both the organization of the objects *and* the selector itself, and is used to determine whether either only a specific (and well-defined) subset of the objects is included or the entirety of them.
66+
The "second pass" fast path is specialized to both the organization of the objects _and_ the selector itself, and is used to determine whether the entirety of the objects is included or only a specific (and well-defined) subset of them.
6667

6768
For instance, we can examine the specific case of selecting grid cells within a rectangular prism.
6869
When we select a "grid" of cells within a rectangular prism, we can have either total inclusion, partial inclusion, or full exclusion.
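
To make the distinction concrete, the sketch below compares a slow path, which tests every cell, with a fast path for a rectangular prism. It is an illustration under assumptions rather than `yt`'s code: cells are tested at their centers, and the second pass exploits the fact that a prism is separable along each axis, so each axis can be tested once and the results combined. All function names are invented for this example.

```python
import itertools


def cell_centers(left, right, n):
    width = (right - left) / n
    return [left + (i + 0.5) * width for i in range(n)]


def slow_path(grid_left, grid_right, dims, sel_left, sel_right):
    """Call the point test once for every cell in the grid."""
    axes = [cell_centers(l, r, n)
            for l, r, n in zip(grid_left, grid_right, dims)]
    selected = []
    for idx in itertools.product(*(range(n) for n in dims)):
        center = [axis[i] for axis, i in zip(axes, idx)]
        if all(sl <= c < sr
               for sl, c, sr in zip(sel_left, center, sel_right)):
            selected.append(idx)
    return selected


def fast_path(grid_left, grid_right, dims, sel_left, sel_right):
    bounds = list(zip(grid_left, grid_right, sel_left, sel_right))
    # First pass: full exclusion means no cell needs to be visited.
    if any(sr <= gl or sl >= gr for gl, gr, sl, sr in bounds):
        return []
    # First pass: total inclusion means every cell is selected outright.
    if all(sl <= gl and gr <= sr for gl, gr, sl, sr in bounds):
        return list(itertools.product(*(range(n) for n in dims)))
    # Second pass (partial inclusion, prism-specific): test each axis
    # independently, then take the product of the selected index ranges.
    per_axis = [
        [i for i, c in enumerate(cell_centers(gl, gr, n)) if sl <= c < sr]
        for (gl, gr, sl, sr), n in zip(bounds, dims)
    ]
    return list(itertools.product(*per_axis))


grid = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (4, 4, 4))
partial = fast_path(*grid, (0.3, 0.0, 0.0), (0.8, 1.0, 1.0))
```

For an `n`-cells-per-side grid, the partial-inclusion branch performs on the order of `3n` comparisons instead of `n**3` point tests, while returning the same cells as the slow path.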
@@ -79,31 +80,29 @@ We do make a distinction between "selection" operations and "reduction" or "cons
7980
Additionally, some have been marked as not "user-facing," in the sense that they are not expected to be constructed directly by users, but instead are utilized internally for indexing purposes.
8081
In columns to the right, we provide information as to whether there is an available "fast" path for grid objects.
8182

82-
| Object Name | Object Type |
83-
| ------------------------ | ------------------------ |
84-
| Arbitrary grid | Resampling |
85-
| Boolean object | Selection (Base Class) |
86-
| Covering grid | Resampling |
87-
| Cut region | Selection |
88-
| Cutting plane | Selection |
89-
| Data collection | Selection |
90-
| Disk | Selection |
91-
| Ellipsoid | Selection |
92-
| Intersection | Selection (Bool) |
93-
| Octree | Internal index |
94-
| Orthogonal ray | Selection |
95-
| Particle projection | Reduction |
96-
| Point | Selection |
97-
| Quadtree projection | Reduction |
98-
| Ray | Selection |
99-
| Rectangular Prism | Selection |
100-
| Slice | Selection |
101-
| Smoothed covering grid | Resampling |
102-
| Sphere | Selection |
103-
| Streamline | Selection |
104-
| Surface | Selection |
105-
| Union | Selection (Bool) |
83+
| Object Name | Object Type |
84+
| ---------------------- | ---------------------- |
85+
| Arbitrary grid | Resampling |
86+
| Boolean object | Selection (Base Class) |
87+
| Covering grid | Resampling |
88+
| Cut region | Selection |
89+
| Cutting plane | Selection |
90+
| Data collection | Selection |
91+
| Disk | Selection |
92+
| Ellipsoid | Selection |
93+
| Intersection | Selection (Bool) |
94+
| Octree | Internal index |
95+
| Orthogonal ray | Selection |
96+
| Particle projection | Reduction |
97+
| Point | Selection |
98+
| Quadtree projection | Reduction |
99+
| Ray | Selection |
100+
| Rectangular Prism | Selection |
101+
| Slice | Selection |
102+
| Smoothed covering grid | Resampling |
103+
| Sphere | Selection |
104+
| Streamline | Selection |
105+
| Surface | Selection |
106+
| Union | Selection (Bool) |
106107

107108
Table: Selection objects and their types. {#tbl:selection-objects}
108-
109-

content/25.processing_and_analysis.md

+13-9
@@ -50,6 +50,10 @@ Derived fields are an extremely integral component of `yt` and are the gateway t
5050
In addition, `yt` includes a large number of fields available, many of which are dynamically constructed according to metadata available in the dataset, to jump-start analysis.
5151
Researchers using `yt` can load a dataset and immediately compute, for instance, the velocity divergence and `yt` will construct the appropriate finite difference stencil, fill in any missing zones at the edge of individual boundaries, and return an array that can be accessed, visualized or processed.
5252

53+
`yt` also provides, and utilizes internally, methods for constructing derived fields from "templates."
54+
For instance, generation of mass fraction fields (as demonstrated above) is conducted internally by `yt` through iterating over all known fields of type density and applying the same function template to them.
55+
This is applied for quantities such as atomic and molecular species as well as for vector fields, where operators such as divergence and gradient are available through templated field operations.
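
The template idea can be sketched as a function factory applied to every density-like field. The example below is a simplified illustration that uses plain dictionaries and lists in place of `yt` datasets and arrays; the field names and the `make_mass_fraction` and `register_mass_fractions` helpers are assumptions made for this example.

```python
def make_mass_fraction(species_field):
    """Template: build a mass-fraction function for one species density."""
    def _mass_fraction(data):
        total = data[("gas", "density")]
        return [s / t for s, t in zip(data[species_field], total)]
    return _mass_fraction


def register_mass_fractions(data, registry):
    """Apply the same template to every field named '<species>_density'."""
    for ftype, fname in list(data):
        if fname.endswith("_density"):
            species = fname[: -len("_density")]
            registry[(ftype, f"{species}_fraction")] = make_mass_fraction(
                (ftype, fname))


data = {
    ("gas", "density"): [2.0, 4.0],
    ("gas", "H_density"): [1.5, 3.0],
    ("gas", "He_density"): [0.5, 1.0],
}
registry = {}
register_mass_fractions(data, registry)
h_fraction = registry[("gas", "H_fraction")](data)
```

Because each derived field is produced by the same factory, adding a new species density to the dataset yields a matching fraction field without any additional field definitions.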
56+
5357
#### Particle Filters {#sec:particle_filters}
5458

5559
Many of the data formats that `yt` accepts define particles as mixtures of a single set of attributes (such as position, velocity, etc) and then a "type" -- for instance, intermingling dark matter particles with "star" particles.
@@ -141,13 +145,13 @@ The array-like operations utilized in `yt` attempt to map to conceptually simila
141145
Unlike numpy, however, these utilize `yt`'s dataset-aware "chunking" operations, in a manner philosophically similar to the chunking operations used in the parallel computation library dask.
142146
Below, we outline the three classes of operations that are available, based on the type of their return value.
143147

144-
#### Reduction to Scalars {#sec:arrayops-scalar}
148+
#### Reduction to Scalars {#sec:arrayops-scalar}
145149

146-
Traditional array operations that map from an array to a scalar are accessible utilizing familiar syntax. These include:
150+
Traditional array operations that map from an array to a scalar are accessible utilizing familiar syntax. These include:
147151

148-
* `min(field_specification)`, `max(field_specification)`, and `ptp(field_specification)`
149-
* `argmin(field_specification, axis)`, and `argmax(field_specification, axis)`
150-
* `mean(field_specification, weight)`, `std(field_specification, weight)`, and `sum(field_specification)`
152+
- `min(field_specification)`, `max(field_specification)`, and `ptp(field_specification)`
153+
- `argmin(field_specification, axis)`, and `argmax(field_specification, axis)`
154+
- `mean(field_specification, weight)`, `std(field_specification, weight)`, and `sum(field_specification)`
151155

152156
In addition to the advantages of allowing the parallelism and memory management to be handled by `yt`, these operations are also able to accept multiple fields.
153157
This allows multiple fields to be queried in a single pass over the data, rather than multiple passes.
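
The benefit of a single pass can be seen in a small sketch in which each chunk is visited once and running results are kept for every requested field. Chunks here are plain dictionaries of lists, and `chunked_extrema` and `chunked_weighted_mean` are hypothetical helpers for illustration, not functions provided by `yt`.

```python
def chunked_extrema(chunks, fields):
    """Compute (min, max) for several fields while reading each chunk once."""
    result = {f: (float("inf"), float("-inf")) for f in fields}
    for chunk in chunks:
        for f in fields:
            lo, hi = result[f]
            result[f] = (min(lo, min(chunk[f])), max(hi, max(chunk[f])))
    return result


def chunked_weighted_mean(chunks, field, weight):
    """Accumulate a weighted mean without holding all data in memory."""
    numerator = 0.0
    denominator = 0.0
    for chunk in chunks:
        for value, w in zip(chunk[field], chunk[weight]):
            numerator += value * w
            denominator += w
    return numerator / denominator


chunks = [
    {"density": [1.0, 3.0], "temperature": [10.0, 20.0]},
    {"density": [0.5, 2.0], "temperature": [30.0, 5.0]},
]
extrema = chunked_extrema(chunks, ["density", "temperature"])
mean_t = chunked_weighted_mean(chunks, "temperature", "density")
```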
@@ -160,17 +164,17 @@ The operations `mean` and `sum` are available here in a non-spatial form, where
160164

161165
#### Reduction to Vectors {#sec:arrayops-vector}
162166

163-
* `profile(axes, fields, profile_specification)`
167+
- `profile(axes, fields, profile_specification)`
164168

165169
The `profile` operation provides weighted or unweighted histogramming in one or two dimensions.
166170
This function accepts the axes along which to compute the histogram as well as the fields to compute, and information about whether the binning should be an accumulation, an average, or a weighted average.
167171
These operations are described in more detail in **reference profile section**.
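
A one-dimensional profile amounts to a binned sum or a binned weighted average. The sketch below illustrates those two modes in pure Python; the `profile_1d` function and its arguments are hypothetical and do not reproduce the signature of `yt`'s `profile` operation.

```python
import bisect


def profile_1d(x, field, bin_edges, weight=None):
    """Bin `field` along `x`.

    With weight=None, return the sum of `field` in each bin; otherwise
    return the weighted average of `field` in each bin (0.0 if empty).
    """
    nbins = len(bin_edges) - 1
    numerator = [0.0] * nbins
    denominator = [0.0] * nbins
    for i, xv in enumerate(x):
        b = bisect.bisect_right(bin_edges, xv) - 1
        if b < 0 or b >= nbins:
            continue  # value falls outside the binned range
        w = 1.0 if weight is None else weight[i]
        numerator[b] += field[i] * w
        denominator[b] += w
    if weight is None:
        return numerator
    return [n / d if d > 0 else 0.0 for n, d in zip(numerator, denominator)]


x = [0.5, 1.5, 1.7, 2.5]
field = [10.0, 20.0, 40.0, 5.0]
edges = [0.0, 1.0, 2.0, 3.0]
sums = profile_1d(x, field, edges)
averages = profile_1d(x, field, edges, weight=[1.0, 1.0, 3.0, 2.0])
```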
168172

169173
#### Remapping Operations {#sec:arrayops-remap}
170174

171-
* `mean(field_specification, weight, axis)`
172-
* `sum(field_specification, axis)`
173-
* `integrate(field_specification, weight, axis)`
175+
- `mean(field_specification, weight, axis)`
176+
- `sum(field_specification, axis)`
177+
- `integrate(field_specification, weight, axis)`
174178

175179
These functions map directly to different methods used by the projection data object.
176180
Both `mean` and `sum`, when supplied a spatial axis, will compute a dimensionally-reduced projection, remapped into a pixel coordinate plane.
