Title: | Geometric Morphometric Tools to Align, Scale, and Compare "Shape" of Menstrual Cycle Hormones |
---|---|
Description: | Mitteroecker & Gunz (2009) <doi:10.1007/s11692-009-9055-x> describe how geometric morphometric methods allow researchers to quantify the size and shape of physical biological structures. We provide tools to extend geometric morphometric principles to the study of non-physical structures, hormone profiles, as outlined in Ehrlich et al. (2021) <doi:10.1002/ajpa.24514>. Easily transform daily measures into multivariate landmark-based data. Includes custom functions to apply multivariate methods for data exploration as well as hypothesis testing. Also includes a 'shiny' web app to streamline data exploration. Developed to study menstrual cycle hormones, but functions have been generalized and should be applicable to any biomarker over any time period. |
Authors: | Daniel Ehrlich [aut, cre] |
Maintainer: | Daniel Ehrlich <[email protected]> |
License: | GPL (>= 3.0) |
Version: | 1.0.2 |
Built: | 2024-12-13 07:52:19 UTC |
Source: | https://github.com/clancylabuiuc/morphomenses |
Construct a ragged array (containing missing data) of a specified length (up/down sampling individuals to fit).
mm_ArrayData( IDs, DAYS, VALUE, MID = NULL, targetLENGTH, targetMID = NULL, transformation = c("minmax", "geom", "zscore", "log", "log10"), impute_missing = 3 )
IDs |
A vector that contains individual IDs repeated for multiple days of collection. |
DAYS |
A vector that contains information on time, i.e., Day 1, Day 2, Day 3. Note: this vector should contain integers; continuous data might produce unintended results. |
VALUE |
A vector containing the variable sampled. |
MID |
An optional vector of midpoints to center each individual's profile. These should be unique to each individual and repeated for each observation of DAYS, VALUE, and IDs. If NULL (default), data will not be centered on any day. |
targetLENGTH |
Integer. Number of days to up/down sample observations to using |
targetMID |
If NULL (default) data will not be centered and will range from 0 to 1. If specified, data will be centered on 0 ranging from -1 to 1. |
transformation |
Which (if any) data transformation to apply. Our recommendation is minmax, but geometric mean, z-score, natural log, and log10 transformations are available, if desired. |
impute_missing |
Integer. If not null, number of nearest-neighbors to use to impute missing data (Default = 3). |
Returns a 3D array of data to be analyzed with individuals in the 3rd dimension.
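As a rough illustration of the up/down sampling idea, the base R sketch below resamples one individual's profile to a fixed number of points by linear interpolation. The helper resample_profile() and the use of stats::approx() are assumptions made for illustration only; they are not taken from the package source, and mm_ArrayData may align and resample data differently.

```r
# Hypothetical helper (not part of the package): resample one profile
# to `target_length` evenly spaced points by linear interpolation.
resample_profile <- function(days, value, target_length) {
  out <- stats::approx(x = days, y = value, n = target_length)
  out$y
}

days  <- 1:28
value <- sin(seq(0, pi, length.out = 28))   # toy hormone profile

short <- resample_profile(days, value, target_length = 14)  # down-sample
long  <- resample_profile(days, value, target_length = 40)  # up-sample
```

Profiles of different lengths resampled this way share the same number of points, which is what allows them to be stacked along the third dimension of an array.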
Easily evaluate simple model sets (one covariate with up to 2 additional classifiers/covariates). Helpful for exploratory analysis. For detailed models or specific combinations of variables, see geomorph::procD.lm for full use of this function.
mm_BuildModel(shape_data, ..., subgrps = NULL, ff1 = NULL, univ_series = FALSE)
shape_data |
This will be the (multivariate) response variable |
... |
Covariate(s)/classifier(s) to build a model set. Individual models are run with interaction effects. |
subgrps |
Optional. Vector of group membership. Model sets will be run across the whole sample and subgroups. If k is specified, only the full model will be run. |
ff1 |
An explicit model to test in the format: " coords ~ ...". Names
must match those specified in |
univ_series |
Default (FALSE) will evaluate multiple covariates and their
interaction in a single model. However, it can be helpful to understand the
univariate effects in isolation of interaction/confounding factors. Set
|
A list containing output of one or more multivariate linear models that can be inspected on their own or interacted with using mm_VizModel or mm_CompModel.
Conduct PCA of shape data and visualize major shape trends.
mm_CalcShapespace(dat, max_Shapes = 10)
dat |
A 3D array of shape data to be analyzed. |
max_Shapes |
The maximum number of PCs to visualize. Default 10. |
A list containing the results of shape-PCA, including visualizations of shape extrema for each Principal Component.
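The general idea of a PCA on landmark-style data can be sketched in base R as follows. This is a minimal sketch on simulated data, assuming an ordinary stats::prcomp() on the flattened array; it is not the package's implementation, and the object names are hypothetical.

```r
set.seed(1)
# Toy array: 10 landmarks (days) x 2 coordinates x 6 individuals
A <- array(rnorm(10 * 2 * 6), dim = c(10, 2, 6))

# Flatten so that each individual becomes one row
flat <- t(apply(A, 3, as.vector))

# Ordinary PCA on the flattened data
pca <- stats::prcomp(flat)

# Proportion of variance explained by each PC
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```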
Plot raw (aligned) data side by side with imputed data.
mm_CheckImputation(A1, A2, ObO = interactive())
A1 |
An aligned array, containing missing data (presumably made with
|
A2 |
An aligned and imputed array (presumably made with
|
ObO |
One-by-One. If TRUE (default, in interactive sessions), individuals
will be plotted one at a time, requiring the user to advance/exit the
operation. If FALSE, all plots will be generated at once to be browsed
or exported from the |
A series of plots for each individual in the array. If ObO=TRUE, user input is required to advance or exit the plotting.
Specify color order appropriately for a dendrogram
mm_ColorLeaves(dendro, cols)
dendro |
A dendrogram or hclust class object |
cols |
a vector of colors |
Leaves of a dendrogram will be re-ordered compared to most input classifiers. This function takes the study-ordered colors and correctly applies them to the dendrogram using dendextend.
A dendrogram class object with leaves colored as specified.
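The re-ordering problem can be seen with base R alone. The sketch below is an illustration under the assumption that colors are supplied one per individual in study order; it uses stats::hclust() rather than dendextend, and the object names are hypothetical.

```r
set.seed(1)
x <- matrix(rnorm(20), nrow = 5)
rownames(x) <- paste0("id", 1:5)

# One color per individual, in study (input) order
study_cols <- c("red", "red", "blue", "blue", "green")

hc <- stats::hclust(stats::dist(x), method = "ward.D2")

# hc$order holds the study-order index of each leaf, left to right,
# so indexing by it puts the colors into dendrogram order
leaf_cols <- study_cols[hc$order]
```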
Compare key figures (R-squared, p-value, etc.) across multiple models.
mm_CompModel(mv_results, row_labels = NULL, digits = 4)
mv_results |
Input mvlm, created by mm_BuildModel (or by using geomorph::procD.lm) |
row_labels |
A character vector to use in output. If NULL (default) labels from the input data will be used. |
digits |
Number of decimal places to round to. Default includes 4 decimal places. |
A list containing the results of the mvlm, visualizations of shape trends along the regression line, and the model itself.
Compare key figures (R-squared, p-value, etc.) across multiple complex models.
mm_CompModel_Full(mv_results, row_labels = NULL, var_labels = NULL, digits = 4)
mv_results |
Input mvlm, created by mm_BuildModel (or by using geomorph::procD.lm) |
row_labels |
A character vector to use in output. If NULL (default) labels from the input data will be used. |
var_labels |
A character vector to use in output. If NULL (default) labels from the input data will be used. |
digits |
Number of decimal places to round to. Default includes 4 decimal places. |
description
Visualize shape of target coordinates
mm_coords_to_shape(A, PCA, target_coords, target_PCs = c(1, 2))
A |
A landmark array used for the pca |
PCA |
output of prcomp. Should contain $transormation |
target_coords |
A single set of X,Y coordinates. |
target_PCs |
Integers identifying which PCs to use on the X and Y axes. Default is c(1,2) for PC1 on x and PC2 on y. |
A landmark array representing the hypothetical shape of a given set of coordinates.
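Conceptually, a position in PC space can be turned back into a shape by adding the weighted PC axes to the mean shape. The sketch below shows this back-projection with stats::prcomp() on simulated data; it is an assumption about the general technique rather than the package's code, and target_coords here is a made-up position.

```r
set.seed(1)
# Toy array: 8 landmarks x 2 coordinates x 10 individuals
A <- array(rnorm(8 * 2 * 10), dim = c(8, 2, 10))
flat <- t(apply(A, 3, as.vector))
pca <- stats::prcomp(flat)

target_coords <- c(0.5, -0.25)   # hypothetical position on PC1 and PC2
target_PCs <- c(1, 2)

# Mean shape plus the weighted PC axes, reshaped to landmarks x coordinates
shape_vec <- pca$center +
  as.vector(pca$rotation[, target_PCs] %*% target_coords)
shape <- matrix(shape_vec, nrow = dim(A)[1], ncol = dim(A)[2])
```

A target of c(0, 0) returns the sample mean shape, which is a convenient sanity check.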
Sample dataset classifiers to be paired with sample array. This table contains 60 rows to match the 60 individuals across the third dimension of the array.
mm_data
A matrix with 2015 obs (rows) and 4 variables (columns).
Individual id, each integer represents a different individual.
Integer day of cycle. Generally runs from 1 ... (28 on average).
Single value for each individual, repeated along each CYCLEDAY. In this sample, day of ovulation.
Daily measure of hormone, in nanograms per milliliter.
Conduct a set of analyses to make shape-PCA results easier to interpret. Specifically, this will provide a table of eigenvalues (optional barplot), provide a 5-number summary across each PC, and conduct a naive Ward's clustering of PC scores (optional dendrogram), along with a silhouette plot and a scree plot of individual distance to the sample mean.
mm_Diagnostics(dat, max_PC_viz = 10, max_PC_calc = NULL, hide_plots = FALSE)
dat |
A 3D array or a mmPCA object (output of mm_CalcShapespace). |
max_PC_viz |
Maximum number of PCs to include in visualizations (e.g., eigenplots or shape trends). |
max_PC_calc |
By default (NULL), all PCs will be included in calculations. However, if fewer PCs are required, users may specify an integer, n, to get the first n PCs. |
hide_plots |
By default (FALSE), helpful visuals are plotted. |
Returns a list containing the results of:
eigs - A table containing individual and cumulative loadings for each PC
PC_5_num - A data.frame containing the fivenum summary for each PC
TREE - A dendrogram representing the results of a naive-Ward's clustering
Add confidence ellipses to an active scatterplot.
mm_ellipse( dat, ci = c(67.5, 90, 95, 99), linesCol = "black", fillCol = "grey", smoothness = 20 )
dat |
A matrix of data to draw an ellipse around. |
ci |
Percentage of data to capture. Must be one of c(67.5, 90, 95, 99). |
linesCol |
Border color of the shape. |
fillCol |
Fill color of the shape. |
smoothness |
Lower values will look jagged, higher value will make smoother lines, but may take a long time to plot. Default value is 20. |
No value. Will add an ellipse of a given size to the current plot.
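One common way to build such an outline is to scale a circle by a chi-square quantile and map it through the covariance of the data. The sketch below follows that approach on simulated data; it is an assumption about how a confidence ellipse can be constructed, not a description of how mm_ellipse computes its shapes.

```r
set.seed(1)
dat <- cbind(rnorm(100), rnorm(100))

ci <- 95            # percentage of data to capture
smoothness <- 20    # number of points used to trace the outline

center <- colMeans(dat)
radius <- sqrt(stats::qchisq(ci / 100, df = 2))
angles <- seq(0, 2 * pi, length.out = smoothness)

# Map the unit circle through the Cholesky factor of the covariance matrix,
# then shift it to the centre of the data
circle <- cbind(cos(angles), sin(angles))
outline <- sweep(radius * circle %*% chol(stats::cov(dat)), 2, center, "+")
```

The resulting outline is a two-column matrix that could be added to an existing scatterplot with polygon(outline); fewer points give a more jagged shape.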
Launch mm_Explorer
mm_Explorer()
No value. Will launch shiny app in default web browser.
Fill in a ragged array by nearest neighbor imputation
mm_FillMissing(A, knn = 3)
A |
A ragged array (IE, contains missing cells), presumably constructed with |
knn |
Number of nearest neighbors to draw on for imputation (default = 3). |
Returns an array of the same dimensions with all missing data filled.
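To make the idea of nearest-neighbor imputation concrete, the sketch below fills missing cells in a small individuals-by-days matrix using the mean of the most similar individuals. The helper impute_knn(), the distance measure, and the use of a 2D matrix instead of a 3D array are all assumptions for illustration; mm_FillMissing may choose neighbors and combine their values differently.

```r
# Hypothetical helper (not the package's code): rows are individuals,
# columns are days, NA marks a missing measurement.
impute_knn <- function(m, knn = 3) {
  filled <- m
  for (i in which(rowSums(is.na(m)) > 0)) {
    # Root mean squared difference over the days both individuals share
    d <- apply(m, 1, function(r) sqrt(mean((r - m[i, ])^2, na.rm = TRUE)))
    d[i] <- Inf                              # never use an individual as its own neighbor
    neighbours <- order(d)[seq_len(knn)]
    for (j in which(is.na(m[i, ]))) {
      filled[i, j] <- mean(m[neighbours, j], na.rm = TRUE)
    }
  }
  filled
}

m <- rbind(c(1,   2,   3,   4),
           c(1,   2,   NA,  4),
           c(1.1, 2.1, 3.1, 4.1),
           c(9,   9,   9,   9))
filled <- impute_knn(m, knn = 2)
```

In this toy example the missing cell in row 2 is filled from rows 1 and 3, the two closest profiles, and the distant row 4 is ignored.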
Convert a 3D array to 2D matrix suitable for PCA, etc. Note, this function is identical to geomorph::two.d.array, reproduced here for convenience.
mm_FlattenArray(A, sep = ".")
A |
an array to be flattened |
sep |
Separator to be used for column names |
Returns a flattened array
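The reshaping itself can be sketched in base R. The example below is an illustration only: the column order (coordinates within landmarks) and the "X"/"Y" labels are assumptions, and the exact naming produced by mm_FlattenArray or geomorph::two.d.array may differ.

```r
# Toy array: 3 landmarks x 2 coordinates x 4 individuals
A <- array(1:24, dim = c(3, 2, 4))

# One row per individual; columns run through coordinates within landmarks
flat <- matrix(aperm(A, c(3, 2, 1)), nrow = dim(A)[3])
colnames(flat) <- paste(rep(seq_len(dim(A)[1]), each = dim(A)[2]),
                        rep(c("X", "Y"), times = dim(A)[1]),
                        sep = ".")
```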
Create a sequence from -1:1 of specified length. MIDpoint (day0) can be
mm_get_interval(days, day0 = NULL)
days |
The length of the sequence to return, inclusive of the endpoints (-1,1) |
day0 |
If NULL (default), the median integer will be calculated, centering the range on 0. Specifying a value will set 0 to that value, creating asymmetric ranges. |
Returns a numeric vector of specified length, ranging from -1 to 1
mm_get_interval(15) ## Symmetrical sequence from -1 to 1 with 0 in the middle.
mm_get_interval(15, day0 = 8) ## The same sequence, explicitly specifying the midpoint
mm_get_interval(15, day0 = 3) ## 15 divisions with an asymmetric distribution.
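For readers who want to see how such a sequence can be built, the base R sketch below reproduces the behaviour described above. The function interval_sketch() is a hypothetical re-implementation written for illustration; it is not the package's source, and edge cases (for example day0 = 1) may be handled differently by mm_get_interval.

```r
# Hypothetical re-implementation for illustration only
interval_sketch <- function(days, day0 = NULL) {
  if (is.null(day0)) {
    return(seq(-1, 1, length.out = days))
  }
  left  <- seq(-1, 0, length.out = day0)             # -1 up to the midpoint
  right <- seq(0, 1, length.out = days - day0 + 1)   # midpoint up to 1
  c(left, right[-1])                                 # drop the repeated 0
}

sym  <- interval_sketch(15)             # 0 falls on the 8th value
asym <- interval_sketch(15, day0 = 3)   # 0 falls on the 3rd value
```

With an early midpoint the steps before 0 are wider than the steps after it, which is what produces the asymmetric range.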
Calculate and plot group distance from centroid (grand mean)
mm_grp_dists(dat, grps, plots = TRUE)
dat |
a 2d matrix of data. Presumably PC scores |
grps |
a vector defining group IDs |
plots |
Logical. Should distances be plotted as boxplots? If FALSE, distance calculations are still performed |
A list containing individual distances from the sample mean shape. If plots=TRUE, will also visualize results.
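The distance calculation can be illustrated in a few lines of base R. This is a minimal sketch on simulated scores, assuming plain Euclidean distance from the grand mean; the object names are hypothetical and the package may compute or summarize distances differently.

```r
set.seed(1)
scores <- matrix(rnorm(40), ncol = 2)   # toy PC scores, 20 individuals
grps <- rep(c("a", "b"), each = 10)

# Euclidean distance of each individual from the grand mean (centroid)
centroid <- colMeans(scores)
dists <- sqrt(rowSums(sweep(scores, 2, centroid)^2))

# Distances split by group
by_group <- split(dists, grps)
```

Calling boxplot(by_group) would then give a per-group view of the distances.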
Attempts to optimally format a grid of arrays by group
mm_grps_PlotArray(A, grps)
A |
an array to be plotted |
grps |
a vector defining group IDs to subset along the 3rd dimension of the array |
4 groups will plot as a 2x2 grid, while 9 groups plot in a 3x3. Function is experimental.
Returns no values, produces a series of plots.
Modify color/transparency using hsv syntax
mm_mute_cols(cols, s = NULL, v = NULL, alpha = 0.4)
cols |
a vector of colors, eg: "#0066FF" |
s |
Either a single value or a vector of same length as cols specifying a new saturation (range 0-1). colors darken to black (0). |
v |
Either a single value or a vector of same length as cols specifying a new value (range 0-1). colors lighten to white (0) |
alpha |
Either a single value or a vector of same length as cols specifying a transparency value (range 0-1). colors translucent at 0. |
A vector of colors that have been modified in saturation, value, or alpha
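A color can be adjusted in HSV space with functions from grDevices. The sketch below shows one way to do this; mute_sketch() is a hypothetical helper written for illustration and is not the package's implementation.

```r
# Hypothetical helper for illustration; not the package's code
mute_sketch <- function(cols, s = NULL, v = NULL, alpha = 0.4) {
  # Convert to hue/saturation/value; rows are named "h", "s" and "v"
  hsv_vals <- grDevices::rgb2hsv(grDevices::col2rgb(cols))
  new_s <- if (is.null(s)) hsv_vals["s", ] else s
  new_v <- if (is.null(v)) hsv_vals["v", ] else v
  grDevices::hsv(h = hsv_vals["h", ], s = new_s, v = new_v, alpha = alpha)
}

muted <- mute_sketch(c("#0066FF", "red"), s = 0.5)
```

The result is a character vector of hex colors with an alpha channel appended, one per input color.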
Partition sample into clusters, based on information from
mm_Phenotype(dat, kgrps, cuttree_h = NULL, cuttree_k = NULL, plot_figs = TRUE)
dat |
Either an Array of shape data, an mmPCA object, or an mmDiag object. |
kgrps |
A non-negative integer of sub-groups to draw. kgrps=1 will provide results for the whole input dat. |
cuttree_h |
Optional. Draw clusters by splitting the tree at a given height, h. |
cuttree_k |
Optional. Draw clusters by splitting the tree into number of branches, k |
plot_figs |
Optional. Default = TRUE, plot phenotypes for each set(s) of subgroups. |
If plot_figs=TRUE (Default), plot associated graphs and return a list containing:
ALN - an array containing aligned and scaled landmark data, the output of mm_ArrayData
PCA - PC scores, eigenvalues, and shape visualizations, the output of mm_CalcShapespace
TREE - Dendrogram of PC scores, the output of mm_Diagnostics
k_grps - If kgrps is specified, a vector defining group membership (as integer); the results of k-means clustering based on PC scores.
cth_grps - If cth_grps is specified, a vector defining group membership (as integer); the results of clustering using dendextend::cutree for a given height.
ctk_grps - If ctk_grps is specified, a vector defining group membership (as integer); the results of clustering using dendextend::cutree for a given number of clusters.
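The three kinds of grouping described above can be sketched with base R clustering functions. The example uses simulated scores and stats::cutree() in place of dendextend::cutree; the cut height is arbitrary and the whole block is an illustration under those assumptions, not the package's workflow.

```r
set.seed(1)
# Toy PC scores: two well separated sets of 10 individuals
scores <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
                matrix(rnorm(20, mean = 5), ncol = 2))

# k-means partition into a chosen number of groups
k_grps <- stats::kmeans(scores, centers = 2, nstart = 10)$cluster

# Ward's hierarchical clustering, then cut by number of branches or by height
tree <- stats::hclust(stats::dist(scores), method = "ward.D2")
ctk_grps <- stats::cutree(tree, k = 2)
cth_grps <- stats::cutree(tree, h = max(tree$height) / 2)
```

Each result is an integer vector with one group label per individual, so the three partitions can be compared directly with table().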
Plot Array: plot individuals and, optionally, the mean form.
mm_PlotArray( A, MeanShape = TRUE, AllCols = NULL, MeanCol = NULL, plot_type = c("lines", "points"), lbl = NULL, yr = NULL, axis_labels = FALSE )
A |
An array to be plotted |
MeanShape |
Logical. Should the mean shape be calculated and plotted? |
AllCols |
Either a single color for all individuals, or a vector specifying colors for each individual. If NULL (default) individuals will be plotted in grey |
MeanCol |
A single color for the mean shape. If NULL (default) mean shape will be plotted in black |
plot_type |
Should the data be plotted as points or lines. |
lbl |
A title (main =) for the plot. If NULL (default) the name of the array will be used. |
yr |
Y-range, in the form c(0,100) |
axis_labels |
Should units be printed along the axis. Defaults to FALSE to maximize the profile shape. |
Plot individual(s) profile(s) in the default graphics device.
Pretty PCA
mm_pretty_pca(PCA, xPC = 1, yPC = 2, clas_col = NULL, legend_cex = 0.8)
PCA |
Input data either prcomp or mmPCA. |
xPC |
The PC to plot on the x axis |
yPC |
The PC to plot on the y axis |
clas_col |
A character vector of groupings. Each level will be plotted as a different color. |
legend_cex |
A scaling factor to be applied specifically to the legend. Set to NULL for scatterplot only. |
A better PCA plot
Returns no object, plots results of PCA
Plot total within group sum of squares to evaluate clusters
mm_ScreePlot(x, maxC = 15, ...)
x |
Input data for cluster analysis (IE, PCA) |
maxC |
Maximum clusters to evaluate |
... |
Additional arguments to be passed to plot |
No value, produces diagnostic plot.
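The quantity behind this kind of plot can be computed with stats::kmeans(). The sketch below is an illustration on simulated data, assuming k-means is the clustering method being evaluated; the package may use a different clustering routine.

```r
set.seed(1)
x <- matrix(rnorm(60), ncol = 2)   # toy PC scores, 30 individuals
maxC <- 8

# Total within-cluster sum of squares for 1..maxC clusters
wss <- vapply(seq_len(maxC), function(k) {
  stats::kmeans(x, centers = k, nstart = 10)$tot.withinss
}, numeric(1))
```

Plotting wss against seq_len(maxC), for example with plot(seq_len(maxC), wss, type = "b"), shows where adding clusters stops reducing the within-group sum of squares by much.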
Plot average silhouette widths to evaluate clusters
mm_SilPlot(x, maxC = 15, ...)
x |
Input data for cluster analysis (IE PCA) |
maxC |
Maximum clusters to evaluate |
... |
additional arguments passed to plot |
No value, produces diagnostic plot.
Calculate the geometric mean of a vector and scale all values by it.
mm_transf_geom(x)
x |
A numeric vector to be scaled. Missing values will produce NA; conduct knn imputation using mm_FillMissing first. |
Returns a scaled vector
mm_transf_geom(1:10)
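For reference, geometric-mean scaling can be written out in base R as below. This is a sketch of the standard formula, assuming strictly positive input; it is not copied from the package source.

```r
x <- 1:10

# Geometric mean via logs (requires strictly positive values)
geo_mean <- exp(mean(log(x)))

# Scale every value by the geometric mean
scaled <- x / geo_mean
```

After scaling, the geometric mean of the vector is 1, which removes overall magnitude while keeping relative differences.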
Transform a vector by the natural log.
mm_transf_log(x)
x |
A numeric vector to be scaled. Missing values will produce NA; conduct knn imputation using mm_FillMissing first. |
Returns a scaled vector
mm_transf_log(1:10)
Transform a vector by the common log (base 10).
mm_transf_log10(x)
x |
A numeric vector to be scaled. Missing values will produce NA; conduct knn imputation using mm_FillMissing first. |
Returns a scaled vector
mm_transf_log10(1:10)
Scale a vector to the range 0 to 1 based on its minimum and maximum values.
mm_transf_minmax(x)
x |
A Numeric vector to be scaled. Missing values are allowed and ignored. |
Returns a scaled vector
mm_transf_minmax(1:10)
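Min-max scaling with missing values ignored can be written in base R as follows. This is a sketch of the usual formula, given here as an assumption about what the transformation does rather than as the package's code.

```r
x <- c(3, NA, 7, 11)

# Subtract the minimum and divide by the range, skipping NA values
lo <- min(x, na.rm = TRUE)
hi <- max(x, na.rm = TRUE)
scaled <- (x - lo) / (hi - lo)
```

The smallest observed value maps to 0, the largest to 1, and missing entries stay missing.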
Calculate and return z-scores given a numeric vector.
mm_transf_zscore(x)
x |
A numeric vector to be scaled. Missing values will produce NA; conduct knn imputation using mm_FillMissing first. |
Returns a scaled vector
mm_transf_zscore(1:10)
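A z-score transformation can be sketched in base R as below. The use of the sample standard deviation from stats::sd() is an assumption; the package may standardize in a slightly different way.

```r
x <- 1:10

# Centre on the mean and divide by the sample standard deviation
z <- (x - mean(x)) / stats::sd(x)
```

The transformed vector has mean 0 and standard deviation 1.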
Visualize 2D scatterplot of mvlm including predicted shapes.
mm_VizModel(dat, clas_col = NULL)
dat |
Input mvlm, created by mm_BuildModel (or by using geomorph::procD.lm) |
clas_col |
A classifier to color the data by. If null (default) all points will be grey. Otherwise, data will be plotted as rainbow(n) colors. |
A list containing the results of the mvlm, visualizations of shape trends along the regression line, and the model itself.
Plot a scatterplot and visualize shape change across the X axis.
mm_VizShapespace( mmPCA, xPC = 1, yPC = 2, yr = c(0, 1.1), cols = NULL, title = "", png_dir = NULL )
mmPCA |
Output of |
xPC |
The PC to be plotted on the x axis. If yPC is left null, a univariate density distribution will be plotted with min/max shapes. |
yPC |
The PC to be plotted on the y axis. |
yr |
The y-axis range, in the format c(0,1) |
cols |
A vector of colors of length n, for use in scatterplot. |
title |
To be used for the plot |
png_dir |
A file path to a directory in which to save out PNG figures. Names will be automatically assigned based on input PC(s). |
Meant to be a quick diagnostic plot with minimal customization.
Produces a series of plots to visualize PCA analysis. If png_dir is specified, function will save out .png files. Otherwise plots will be displayed in the default plot window.
Analyze shapes/phenotypes of hormone data using Geometric Morphometric inspired methods.
Daniel E. Ehrlich
Print basic summary
print_summary(aln, grps = NULL)
aln |
An object created with mm_ArrayData |
grps |
(Optional) A numeric vector that defines groupings |
A character vector with basic descriptive information, to be used with print(). If grps=TRUE, will return a list of character vectors.