Workflow of the de Boer lab (cBITE)

## cBITE

Research in cBITE is dedicated to understanding and applying basic cell biological principles in the field of biomedical engineering. The research program takes a holistic approach to both discovery and application, combining high-throughput technologies, computational modeling and experimental cell biology to translate the wealth of biological knowledge into real clinical applications.

Biological motivations

Types of perturbations: The TopoChip, a micro-topography screening platform, lets us assess cell responses to 2176 unique topographies in a single high-throughput screen. The topographical features were randomly selected from an in silico library of more than 150 million topographies. These were designed by an algorithm that synthesizes patterns from simple geometric elements: circles, triangles and rectangles (Unadkat et al., 2011).

![Electron microscope image of the TopoChip](<https://github.com/aliaksey/cytomining-hackathon-wiki/blob/master/figure 1_3.jpg>)

Electron microscope image of the TopoChip, showing the different surface topographies in each unit.

Biological problems: Topographical cues have repeatedly been shown to strongly influence cell behavior and phenotype. We perform high-throughput screening to evaluate cellular responses, cell morphology and the expression of phenotypic markers in response to topography. We systematically characterize the diversity of cell phenotypes that topography can induce, in particular cell shape phenotypes.

Actin and nuclei staining of stromal cells on the TopoChip

Image analysis and feature extraction

We use MATLAB for image correction and CellProfiler for cell segmentation and feature extraction. Each image corresponds to a single TopoUnit (well) containing one topography type. Each TopoUnit measures 300 × 300 µm, and a chip contains 4356 TopoUnits in total. After acquisition on an automated microscope, all images are merged. Flat-field correction and image normalization (contrast stretching) follow. We then estimate the background and subtract it.
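
The text does not specify how flat-field correction and contrast stretching are done in MATLAB. A common formulation can be sketched in NumPy as follows. The dark frame `dark`, the flat reference image `flat` (for example, an image of a blank field) and the 1st/99th percentile limits are assumptions for illustration, not the lab's settings.

```python
import numpy as np

def flat_field_correct(raw, flat, dark):
    """Flat-field correction: (raw - dark) / (flat - dark), rescaled by the mean
    gain so the corrected image stays in roughly the original intensity range."""
    raw = np.asarray(raw, dtype=float)
    dark = np.asarray(dark, dtype=float)
    gain = np.asarray(flat, dtype=float) - dark
    gain = np.where(gain <= 0, np.nan, gain)  # mark unusable reference pixels
    corrected = (raw - dark) * np.nanmean(gain) / gain
    return np.nan_to_num(corrected, nan=0.0)  # unusable pixels become 0

def contrast_stretch(img, low_pct=1.0, high_pct=99.0):
    """Linearly map the [low_pct, high_pct] percentile range onto [0, 1] and clip."""
    img = np.asarray(img, dtype=float)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:  # flat image: nothing to stretch
        return np.zeros_like(img)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)
```

Percentile limits, rather than the raw minimum and maximum, keep a few saturated pixels from compressing the rest of the intensity range.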

Background removal. The background is estimated and subtracted using a procedure based on the work of Levesque & Lelievre (2008). The algorithm starts with a background estimate b equal to the measured signal s. A circular disk filter is then applied to the background estimate to produce a blurred version, blur(b). Wherever the measured signal is significantly higher than the blurred background (i.e. s > blur(b) + 2φ), the background estimate b is replaced by blur(b). This is repeated several times until the estimate converges. The value of φ was tuned by hand. An advantage of this procedure is that it removes not only the average background but also background variations. The figure below shows the estimated background. Performing background correction in the first phase of the pipeline also lets us account for varying backgrounds between individual microscope images (Hulsman et al., 2015).
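
The iterative rule above can be sketched in NumPy as follows, with `phi` playing the role of φ. The disk radius and the fixed iteration count are hypothetical defaults, since the text gives neither. Edge replication at the image borders is also an assumption.

```python
import numpy as np

def disk_blur(img, radius):
    """Mean filter with a circular (disk) footprint; borders are edge-replicated."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    acc = np.zeros((h, w), dtype=float)
    count = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy * dy + dx * dx <= radius * radius:  # offset lies inside the disk
                acc += padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
                count += 1
    return acc / count

def estimate_background(signal, phi, radius=5, n_iter=20):
    """Start with b = s. Wherever s > blur(b) + 2*phi, replace b by blur(b);
    elsewhere keep b. Repeat n_iter times (hypothetical fixed count)."""
    s = np.asarray(signal, dtype=float)
    b = s.copy()
    for _ in range(n_iter):
        blurred = disk_blur(b, radius)
        b = np.where(s > blurred + 2.0 * phi, blurred, b)
    return b

# Usage: background-subtracted image, clipped at zero
# corrected = np.clip(signal - estimate_background(signal, phi=..., radius=...), 0, None)
```

Bright objects (cells) are repeatedly replaced by the blurred surroundings. Pixels that are already background are left alone, so slow variations in the background are preserved in b.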

Figure: (a) estimated background; (b) image with the background removed.

Image quality control

Focus quality (a feature used for quality control) is calculated from the strength of the discontinuities in each image: larger discontinuities indicate sharper focus. We quantify this with a Laplace filter, which uses the second-order partial derivatives of the image. Large absolute values indicate better focus. These values are summarized per TopoUnit by their standard deviation (Hulsman et al., 2015).
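
A minimal NumPy sketch of this focus metric follows. The 5-point discrete Laplacian stencil and edge padding are assumptions, and the MATLAB filter used in the pipeline may differ in kernel and border handling.

```python
import numpy as np

def laplacian(img):
    """Discrete 5-point Laplacian (sum of second-order differences in x and y), edge-padded."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def focus_score(img):
    """Focus quality of one TopoUnit image: standard deviation of the Laplacian response."""
    return float(np.std(laplacian(img)))
```

A sharp step edge produces large second derivatives. The same edge smeared into a ramp produces small ones, so it gets a lower score.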

Data cleaning

Detecting whole TopoUnits as outliers. Quality metrics are calculated for each TopoUnit. These include focus quality and the (background) signal intensity of the different channels. In addition, CellProfiler determines the number of cells in each unit and the extent to which objects are clumped together (CellProfiler sometimes misrecognizes artifacts as clumps of nuclei). Technical artifacts often affect several neighboring TopoUnits together. For each quality metric, we therefore also determine the mean, minimum and maximum value observed in the directly neighboring TopoUnits. In total, this gives 64 quality metrics for each TopoUnit. Unfortunately, the valid value ranges of these metrics depend on experimental settings such as seeding density and the staining used. As a result, outliers cannot be detected reliably by applying default thresholds to these metrics.

Instead, we use a machine learning approach that detects outliers automatically. First, an initial guess of which units could be outliers is obtained by applying M-estimation to several morphological features (cell and nucleus extents and form factors). Next, a linear logistic classifier is trained on this set of outliers to separate them from non-outliers using only the quality metrics. The classifier integrates the 64 quality metrics into a posterior probability score. The threshold on this score, above which TopoUnits are considered outliers, is then chosen by optimizing the Kruskal–Wallis statistic (all steps are done in cross-validation). This last step helps prevent the removal of real surface effects that the regression model did not capture, because the Kruskal–Wallis test is not constrained to such a model. If a majority of the TopoUnits on a chip are considered outliers, we presume the whole chip is affected and remove it automatically, after which the outlier model is retrained (Hulsman et al., 2015).
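
The neighbor summaries can be sketched in NumPy as follows. This assumes each quality metric is stored as a 2-D grid matching the physical layout of the TopoUnits on the chip. It also assumes "directly neighboring" means the 8 surrounding units. Both the function names and the column layout of the feature matrix are hypothetical.

```python
import numpy as np

def neighbor_stats(metric):
    """Mean, min and max of a per-TopoUnit metric over the 8 surrounding units.
    Units on the chip border use only the neighbors that exist."""
    grid = np.asarray(metric, dtype=float)
    h, w = grid.shape
    padded = np.pad(grid, 1, mode="constant", constant_values=np.nan)
    shifted = [padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    stack = np.stack(shifted)  # shape (8, h, w); NaN marks missing neighbors
    return np.nanmean(stack, axis=0), np.nanmin(stack, axis=0), np.nanmax(stack, axis=0)

def unit_feature_matrix(metric_grids):
    """One row per TopoUnit; for each metric: own value, neighbor mean, min, max."""
    cols = []
    for grid in metric_grids:
        mean_n, min_n, max_n = neighbor_stats(grid)
        cols.extend([np.asarray(grid, dtype=float).ravel(),
                     mean_n.ravel(), min_n.ravel(), max_n.ravel()])
    return np.column_stack(cols)
```

A feature matrix like this is the kind of input a logistic outlier classifier would be trained on. A local artifact then shows up both in a unit's own value and in its neighbors' summaries.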

Outlier detection based on single-cell data. To find the most reproducible cell shape patterns across all replicates, we applied the following outlier detection steps:

1. For each topography, we examine the distribution of cell counts across the replicates. Replicates with too few or too many cells are removed using the 1.5 interquartile range (IQR) rule.
2. Next, we remove bad segmentations by finding cells whose area or perimeter is too small or too large. This QC considers the distribution of all cells on a given surface: we pool the cells from all replicates and then filter per cell, again using the 1.5 IQR rule.
3. We then remove objects that are likely artifacts. We look at the distribution of cell shapes in a multidimensional shape space and remove outliers with robust Mahalanobis-based outlier detection. The features used were chosen by correlation-based filtering. The idea is to compute the robust Mahalanobis distance of each cell from the center of the distribution and remove outliers based on that distance.
4. Finally, we consider all replicates of a surface together and filter out replicates based on how well their features correlate with each other. Outlier detection was applied iteratively per surface (up to 20 replicates and roughly 400 cells per surface).
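
The 1.5 IQR rule from the first two steps and a Mahalanobis-based filter for the third can be sketched in NumPy as follows. The text does not name its robust estimator. Here, robustness comes from simple iterative trimming: the center and covariance are re-estimated on the cells kept so far. This is a plainly swapped-in stand-in, not the Minimum Covariance Determinant (MCD) estimator often used for robust Mahalanobis distances. The `cutoff` on the squared distance is left to the caller.

```python
import numpy as np

def iqr_keep(values, k=1.5):
    """True for values inside [Q1 - k*IQR, Q3 + k*IQR] (the 1.5 IQR rule when k = 1.5)."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    return (v >= q1 - k * iqr) & (v <= q3 + k * iqr)

def mahalanobis_keep(X, cutoff, n_iter=10):
    """Keep rows whose squared Mahalanobis distance is <= cutoff.
    Center and covariance are re-estimated on the rows kept so far
    (iterative trimming, a simple stand-in for a robust estimator such as MCD)."""
    X = np.asarray(X, dtype=float)
    keep = np.ones(len(X), dtype=bool)
    for _ in range(n_iter):
        center = X[keep].mean(axis=0)
        inv_cov = np.linalg.pinv(np.atleast_2d(np.cov(X[keep], rowvar=False)))
        diff = X - center
        d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)  # squared distances
        new_keep = d2 <= cutoff
        if new_keep.sum() <= X.shape[1]:  # too few rows left to estimate a covariance
            break
        if np.array_equal(new_keep, keep):  # converged
            break
        keep = new_keep
    return keep
```

A common choice for `cutoff` is a high chi-square quantile with as many degrees of freedom as features (e.g. `scipy.stats.chi2.ppf(0.999, df)`). For two features, this quantile equals `-2 * np.log(0.001)`, about 13.8.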

Normalize features

Usually we do not normalize features.

Transform features

If the analysis requires it, we scale the data. In some cases, we use the median and median absolute deviation (MAD) for scaling instead of the mean and standard deviation.
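
The two scaling options can be contrasted with a minimal NumPy sketch of column-wise scaling. The function names are hypothetical. The MAD is used unscaled here; multiplying it by 1.4826 would make it comparable to the standard deviation for normally distributed data.

```python
import numpy as np

def standard_scale(X):
    """Column-wise (x - mean) / sd; constant columns are only centered."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    return (X - X.mean(axis=0)) / np.where(sd == 0, 1.0, sd)

def robust_scale(X):
    """Column-wise (x - median) / MAD; columns with MAD = 0 are only centered."""
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    return (X - med) / np.where(mad == 0, 1.0, mad)
```

With a single extreme value in a column, the median/MAD version leaves the typical values on a sensible scale, whereas the outlier inflates the mean and standard deviation.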

Correct for systematic effects

This is part of the outlier detection steps described above.

Select features / reduce dimensionality

For unsupervised analysis, we used PCA to reduce dimensionality. We also filtered out features that are highly correlated with each other and features that can be predicted from a linear combination of other features. For supervised analysis, we additionally used a recursive feature elimination procedure.
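
The unsupervised steps can be sketched in NumPy as follows. The greedy keep-first correlation rule, the 0.9 threshold and the rank-based test for linear combinations are assumptions chosen for illustration; the tools actually used in the pipeline may apply different heuristics. The sketch assumes no constant columns, for which correlations are undefined.

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Greedy filter: keep a column only if its |Pearson r| with every
    previously kept column is <= threshold. Returns kept column indices."""
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in kept):
            kept.append(j)
    return kept

def drop_linear_combinations(X):
    """Keep a column only if it raises the rank of the kept (centered) columns,
    i.e. it is not a linear combination of them plus a constant."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    kept = []
    for j in range(Xc.shape[1]):
        cand = kept + [j]
        if np.linalg.matrix_rank(Xc[:, cand]) == len(cand):
            kept.append(j)
    return kept

def pca(X, n_components):
    """PCA via SVD of the centered data: returns scores and explained-variance ratios."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2
    return U[:, :n_components] * S[:n_components], var[:n_components] / var.sum()
```

Filtering before PCA keeps redundant features from dominating the leading components.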

Create per-well profiles

We normally took the mean of the per-replicate medians after all outlier detection steps. However, we found that a trimmed mean can be a better option than the median when the number of cells per replicate is small.
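
A minimal sketch of this aggregation in NumPy follows. It assumes "mean of medians" means: take the median of each feature within every replicate, then average those medians across the replicates of a surface. The 10% trimming proportion and the function names are hypothetical.

```python
import numpy as np

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping int(proportion * n) of the lowest and highest values (per column)."""
    v = np.sort(np.asarray(values, dtype=float), axis=0)
    k = int(proportion * v.shape[0])
    return v[k:v.shape[0] - k].mean(axis=0)

def surface_profile(replicates, summary="median", proportion=0.1):
    """replicates: list of (cells x features) arrays for one surface.
    Summarize each replicate (median or trimmed mean), then average across replicates."""
    per_replicate = []
    for cells in replicates:
        cells = np.asarray(cells, dtype=float)
        if summary == "median":
            per_replicate.append(np.median(cells, axis=0))
        else:
            per_replicate.append(trimmed_mean(cells, proportion))
    return np.mean(per_replicate, axis=0)
```

With few cells, the median of a replicate depends on only one or two cells. A trimmed mean still discards extremes but averages over more of the data.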

Measure similarity between profiles

All topographies in the screen are ranked by a given feature, so topographies can be compared by their position in the ranking. The ranking itself can be validated by checking its consistency when replicates are resampled (Hulsman et al., 2015). We also use unsupervised clustering to find similar patterns in the data.
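
One way to check rank consistency under replicate resampling is sketched below in NumPy. It is a hypothetical split-half scheme, not necessarily the procedure of Hulsman et al. (2015): the replicates of every surface are split at random into two halves, each half ranks the surfaces by its mean, and the Spearman correlation between the two rankings is averaged over many splits. Ties are broken by order rather than averaged.

```python
import numpy as np

def ranks(x):
    """Ranks 0..n-1 (ties broken by position; no tie averaging)."""
    return np.argsort(np.argsort(x))

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

def split_half_rank_consistency(replicates_by_surface, n_splits=100, seed=0):
    """replicates_by_surface: one 1-D array of replicate values per surface
    (at least 2 replicates each). Returns the mean split-half Spearman correlation."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        half_a, half_b = [], []
        for reps in replicates_by_surface:
            shuffled = rng.permutation(np.asarray(reps, dtype=float))
            mid = len(shuffled) // 2
            half_a.append(shuffled[:mid].mean())
            half_b.append(shuffled[mid:].mean())
        scores.append(spearman(half_a, half_b))
    return float(np.mean(scores))
```

A value close to 1 indicates that the ranking is driven by the surfaces rather than by which replicates happened to be sampled.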

Downstream analysis / visualization

We use machine learning techniques to build computational models that identify the surface design parameters affecting cell fate.

References

Unadkat, H. V., et al. (2011). An algorithm-based topographical biomaterials library to instruct cell fate. Proceedings of the National Academy of Sciences of the United States of America, 108(40).
Hulsman, M., et al. (2015). Analysis of high-throughput screening reveals the effect of surface topographies on cellular morphology. Acta Biomaterialia, 15.
Levesque, M. P., & Lelievre, M. (2008). Evaluation of the iterative method for image background removal in astronomical images.
Reimer, A., Vasilevich, A., Hulshof, F., Viswanathan, P., van Blitterswijk, C. A., de Boer, J., & Watt, F. M. (2016). Scalable topographies to support proliferation and Oct4 expression by human induced pluripotent stem cells. Scientific Reports, 6, 18948.