Package: rSHAPE 0.3.2

rSHAPE: Simulated Haploid Asexual Population Evolution

In silico experimental evolution offers a cost- and time-effective means to test evolutionary hypotheses. Existing evolutionary simulation tools focus on simulations in a limited experimental framework, and tend to report only the results presumed of interest by the tool's designer. The R package for Simulated Haploid Asexual Population Evolution ('rSHAPE') addresses these concerns by implementing a robust simulation framework that outputs complete population demographic and genomic information for in silico evolving communities. Allowing more than 60 parameters to be specified, 'rSHAPE' simulates evolution across discrete time-steps for an evolving community of haploid asexual populations with binary state genomes. These settings are for the current state of 'rSHAPE' and future steps will be to increase the breadth of evolutionary conditions permitted. At present, most effort has been placed into permitting varied growth models to be simulated (such as constant size, exponential growth, and logistic growth) as well as various fitness landscape models to reflect the evolutionary landscape (e.g. Additive; House of Cards - Stuart Kauffman and Simon Levin (1987) <doi:10.1016/S0022-5193(87)80029-2>; NK - Stuart A. Kauffman and Edward D. Weinberger (1989) <doi:10.1016/S0022-5193(89)80019-0>; Rough Mount Fuji - Neidhart, Johannes and Szendro, Ivan G and Krug, Joachim (2014) <doi:10.1534/genetics.114.167668>). This package includes numerous functions, though users will only need defineSHAPE(), runSHAPE(), shapeExperiment() and summariseExperiment(). All other functions are called by these main functions and are likely to be of interest only to someone wishing to develop 'rSHAPE'.
Simulation results are stored in files exported to the directory referenced by the shape_workDir option. This defaults to tempdir(), so do change it by passing a folder path argument for workDir when calling defineSHAPE() if you plan to make use of your results beyond your current session. 'rSHAPE' will generate numerous replicate simulations for your defined range of experimental parameters. The experiment will be built under the experimental working directory (i.e. the one referenced by the option shape_workDir, set using defineSHAPE()), where individual replicate simulation results will be stored alongside processed results, which I have made in an effort to facilitate analyses by automating collection and processing of the potentially thousands of files that will be created. On that note, 'rSHAPE' implements a robust and flexible framework with highly detailed output at the cost of computational efficiency and potentially significant disk space (generally gigabytes, but up to terabytes for very large simulation efforts). So, while 'rSHAPE' offers a single framework in which we can simulate evolution and directly compare the impacts of a wide range of parameters, it is not as quick to run as other in silico simulation tools which focus on a single scenario with limited output. There you have it: 'rSHAPE' offers you a less restrictive in silico evolutionary playground than other tools and I hope you enjoy testing your hypotheses.

Authors: Jonathan Dench

rSHAPE_0.3.2.tar.gz
rSHAPE_0.3.2.zip (r-4.5), rSHAPE_0.3.2.zip (r-4.4), rSHAPE_0.3.2.zip (r-4.3)
rSHAPE_0.3.2.tgz (r-4.4-any), rSHAPE_0.3.2.tgz (r-4.3-any)
rSHAPE_0.3.2.tar.gz (r-4.5-noble), rSHAPE_0.3.2.tar.gz (r-4.4-noble)
rSHAPE_0.3.2.tgz (r-4.4-emscripten), rSHAPE_0.3.2.tgz (r-4.3-emscripten)
rSHAPE.pdf | rSHAPE.html
rSHAPE/json (API)

# Install 'rSHAPE' in R:
install.packages('rSHAPE', repos = c('https://jdench.r-universe.dev', 'https://cloud.r-project.org'))

This package does not link to any Github/Gitlab/R-forge repository. No issue tracker or development information is available.

1.00 score, 126 downloads, 58 exports, 33 dependencies

Last updated 5 years ago from: 3c14194f56. Checks: OK: 7. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Oct 30 2024
R-4.5-win | OK | Oct 30 2024
R-4.5-linux | OK | Oct 30 2024
R-4.4-win | OK | Oct 30 2024
R-4.4-mac | OK | Oct 30 2024
R-4.3-win | OK | Oct 30 2024
R-4.3-mac | OK | Oct 30 2024

Exports: addDrift, addQuotes, adjustBirths, birthFunction, buildPedigree, calc_relativeFitness, compute_distGrowth, create_genotypeFrame, createGenotypes, deathFunction, defineNeighbours, defineSHAPE, expGrowth, extract_popDemographics, extractInfo_focalID, find_neededNeighbours, findParent, fitnessDist, fitnessLandscape, growthFunction, logisticGrowth, logisticMap, lossSampling, mutationFunction, name_batchString, name_batchSubmit, name_bodyScript, name_parameterScript, name_subScript, nameEnviron, nameObject, nameTable, nameTable_neighbourhood, nameTable_step, querryEstablished, reportPopulations, reset_shapeDB, retrieve_binaryString, runProcessing, runReplicate, runSHAPE, set_const_NK_interactionsMat, set_const_RMF_globalOptima, set_DepbySite_ancestFitness, set_RMF_indWeight, set_siteByState_fitnessMat, shapeCombinations, shapeExperiment, stopError, summarise_evolRepeatability, summarise_experimentFiles, summarise_experimentParameters, summarise_popDemographics, summariseExperiment, trimQuotes, updateLines, write_subScript, writeParameters

Dependencies: abind, bit, bit64, blob, cachem, cli, codetools, cpp11, DBI, doParallel, evd, fastmap, foreach, glue, iterators, lattice, lifecycle, MASS, Matrix, MatrixModels, memoise, mnormt, numDeriv, pkgconfig, plogr, quantreg, rlang, RSQLite, sn, SparseM, survival, vctrs, VGAM

Readme and manuals

Help Manual

Help page | Topics
This is a simple little function used to represent drift by introducing stochasticity to the vector passed by making Poisson distribution calls. At present it forces values to integers because I've not been able to implement an appropriate continuous distribution for such calls that works with tested models and expected outcome. | addDrift
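The idea of drift through Poisson draws can be sketched in base R. This is only an illustration under assumptions: `add_drift_sketch` is a hypothetical helper, not the package's addDrift(), and the expected counts are made-up values.

```r
# Hypothetical sketch (not rSHAPE's addDrift): add stochasticity to a vector
# of expected counts by drawing each element from a Poisson distribution.
add_drift_sketch <- function(expected) {
  # rpois() returns integer counts whose mean is the matching element of `expected`
  stats::rpois(length(expected), lambda = expected)
}

set.seed(1)                          # make the illustration reproducible
expected_births <- c(10.4, 250.7, 0) # made-up expected values
drifted <- add_drift_sketch(expected_births)
```

Because the draws are Poisson counts, the result is always a vector of non-negative integers, which matches the integer restriction described above.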
This is a function to add quotation marks around each element of a character string vector. | addQuotes
This function ensures that a vector of values will sum to a given number. It's implemented in certain growth forms (currently: *constant* and *logistic*). | adjustBirths
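One way to force a vector to sum to a given whole number is proportional rescaling followed by largest-remainder rounding. The sketch below assumes that approach for illustration; `adjust_to_total_sketch` is a hypothetical name and the package's adjustBirths() may work differently.

```r
# Hypothetical sketch (not rSHAPE's adjustBirths): rescale `values` so that
# they are whole numbers summing exactly to `total`.
adjust_to_total_sketch <- function(values, total) {
  stopifnot(sum(values) > 0, total == round(total))
  scaled  <- values / sum(values) * total   # keep the original proportions
  floored <- floor(scaled)                  # whole individuals only
  shortfall <- total - sum(floored)         # units lost to rounding down
  if (shortfall > 0) {
    # hand the missing units to the elements with the largest remainders
    top <- order(scaled - floored, decreasing = TRUE)[seq_len(shortfall)]
    floored[top] <- floored[top] + 1
  }
  floored
}

adjusted <- adjust_to_total_sketch(c(12, 30, 7), total = 100)
```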
This function calculates the number of births for the vector of populations which are expected to be passed. The number of parameters which can be passed may be more than the number required to use one of the growth forms. | birthFunction
This is a convenience script to build a named list of empty lists, where the names are based on the genotype IDs being passed. | buildPedigree
This is a function to calculate the relative fitness for a vector of fitnesses. As a frame of reference it can use either an ancestral fitness value or the mean fitness of the passed vector. If the frame of reference is a value of zero - OR - the func_absDistance is set to TRUE then instead the vector is centered around a value of 1 where negative values will be set to zero. | calc_relativeFitness
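The two branches described for relative fitness can be sketched as follows. This is one reading of the description, not the package's code: `relative_fitness_sketch` and its argument names are hypothetical, and "centered around a value of 1" is assumed here to mean shifting values so that the reference maps to 1.

```r
# Hypothetical sketch (not rSHAPE's calc_relativeFitness).
relative_fitness_sketch <- function(fitness, ancestral = NULL, abs_distance = FALSE) {
  # frame of reference: the ancestral value if given, else the vector mean
  reference <- if (is.null(ancestral)) mean(fitness) else ancestral
  if (reference == 0 || abs_distance) {
    # assumed meaning of "centered around 1": shift so the reference maps to 1,
    # then set any negative results to zero
    pmax(fitness - reference + 1, 0)
  } else {
    fitness / reference   # ordinary ratio to the reference
  }
}

ratio_based <- relative_fitness_sketch(c(1, 2, 4), ancestral = 2)
shift_based <- relative_fitness_sketch(c(-3, 0, 0.5), ancestral = 0)
```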
This function is used to calculate the effect size and timing of the next stochastic population disturbance in a SHAPE run. | compute_distGrowth
This is a convenience function to ensure that we have a standard shaped data.frame. It is used to initiate a new table for the fitness landscape. | create_genotypeFrame
This function searches the nearby mutational space of a focal genotype, identifies which genotypes in that space have not yet been identified, and creates new database entries for any new genotypes. | createGenotypes
This allows SHAPE to simulate the death process as a deterministic value, and may be density dependent. | deathFunction
The function will identify the binary string of all possible neighbours to a focal genotype. It is important when querying the fitness landscape. | defineNeighbours
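For binary state genomes, the neighbours of a genotype can be enumerated by flipping one site at a time. The sketch below assumes genotypes are written as strings of "0" and "1" and that a neighbour is one mutation away; both are assumptions made for illustration, and `one_step_neighbours_sketch` is a hypothetical helper rather than the package's defineNeighbours().

```r
# Hypothetical sketch (not rSHAPE's defineNeighbours): list every genotype that
# differs from the focal binary string at exactly one site.
one_step_neighbours_sketch <- function(genotype) {
  sites <- strsplit(genotype, "")[[1]]   # one character per genome site
  vapply(seq_along(sites), function(i) {
    flipped <- sites
    flipped[i] <- if (sites[i] == "0") "1" else "0"   # flip the state of site i
    paste(flipped, collapse = "")
  }, character(1))
}

neighbours <- one_step_neighbours_sketch("0101")
# neighbours is c("1101", "0001", "0111", "0100")
```

A genome of length L has L such neighbours, which is why the searched space grows quickly when large genotypes are simulated.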
These are some global reference options that SHAPE will use and I consider the defaults. SHAPE parameters can be changed by calling this function and changing values OR by using the accessory SHAPE_parameters script, called in the SHAPE_runBody script. This second approach is considered more practical for building and running experiments. | defineSHAPE
This function uses the exponential growth model and can either calculate the expected growth for a single time step OR it can work backwards to calculate what was the expected starting population size prior to a step of exponential growth. | expGrowth
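The forward and backward uses of an exponential growth step can be sketched like this. The discrete form N_{t+1} = N_t * (1 + r) is an assumption chosen for illustration, as are the function and argument names; the package's expGrowth() may parameterise growth differently.

```r
# Hypothetical sketch (not rSHAPE's expGrowth): one discrete step of
# exponential growth, run either forwards or backwards in time.
exp_growth_sketch <- function(pop_size, r, direction = c("forward", "backward")) {
  direction <- match.arg(direction)
  if (direction == "forward") {
    pop_size * (1 + r)   # expected size after one step of growth
  } else {
    pop_size / (1 + r)   # expected size before that step of growth
  }
}

after_step  <- exp_growth_sketch(100, r = 0.5)                         # 150
before_step <- exp_growth_sketch(after_step, r = 0.5, "backward")      # 100
```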
This is a function that steps forward through time steps of a SHAPE run and extracts population demographic information. This includes Fitness, Number of Lineages, and Transitions between dominant genotypes. Most importantly, it will also return the information related to which lineages will eventually establish in the population, a piece of information that will be critical for downstream lineage specific information extraction. | extract_popDemographics
This is a function to extract genotype/lineage specific information. This info will be mostly through time style of information but will also include information about its line of descent, growth pressures pre-establishment, and population size. | extractInfo_focalID
This function queries if a suite of genotypes exist within the fitness landscape database. | find_neededNeighbours
This function will look through a pedigree data.frame and recursively continue building that back through the history of the SHAPE run being processed. | findParent
This is the function that will call for draws from distributions. | fitnessDist
This function will calculate the fitness values for genotypes being newly recorded to the fitness landscape. | fitnessLandscape
This is a wrapper function where the birth and death related parameters of a SHAPE run are passed before the appropriate functions (and their associated methods) are called. This function will be called once per time step of a SHAPE run. | growthFunction
This function is simply an implementation of the logistic growth equation f(x) = K / (1 + ((K - N_0) / N_0) * exp(-k (x - x_0))), where x_0 is an adjustment to the position of the curve's midpoint, K is the curve's maximum value, k is the steepness of the curve (growth rate), and N_0 is the starting population. It includes parameters to change the midpoint as well as to change the natural exponent (i.e. exp) to some other value. NOTE: This is for continuous growth, and since SHAPE is discrete at present this is an unused function. | logisticGrowth
This is the discrete time logistic growth function known as the logistic map. It calculates the amount of growth expected in a step of time given by N_{t+1} = N_t + r * N_t * (K - N_t) / K, where N_t is community size at a time point, r is the per step growth rate, and K is the environmental carrying capacity. | logisticMap
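The logistic map formula given for logisticMap translates directly into base R. This is a minimal sketch of the equation itself, with hypothetical names and made-up starting values, not the package's logisticMap() function.

```r
# One step of the logistic map from the help text:
#   N_{t+1} = N_t + r * N_t * (K - N_t) / K
logistic_map_step <- function(n_t, r, K) {
  n_t + r * n_t * (K - n_t) / K
}

# Iterate a made-up community from 100 individuals towards K = 1000.
sizes <- numeric(20)
sizes[1] <- 100
for (t in 2:20) {
  sizes[t] <- logistic_map_step(sizes[t - 1], r = 0.5, K = 1000)
}
```

With these values the first step adds 0.5 * 100 * 900 / 1000 = 45 individuals, and growth slows as the community approaches the carrying capacity.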
This function actually calculates the stochastic loss to populations. | lossSampling
This allows SHAPE to simulate the mutation process as a deterministic value. At present, values must be tracked as integer results for reasons of how I am passing to functions which identify what mutant genotype(s) are created. | mutationFunction
This function is used to build or split character strings to be used for naming batches of SHAPE runs. | name_batchString
This is a function to programmatically create R batch submission script names. | name_batchSubmit
This is a function to programmatically create R script names. | name_bodyScript
This is a function to programmatically create R script names. | name_parameterScript
This is a function to programmatically create R batch submission script names. | name_subScript
This quick little function is a means for me to create the strings of environments and subsequently extract information back out. | nameEnviron
This quick little function is a means for me to create the strings of environments and subsequently extract information back out. | nameObject
This is a standardising function which allows SHAPE to programmatically name tables for the fitness landscape OR split a named table and extract the embedded information from its naming. | nameTable
This is a standardising function which allows SHAPE to programmatically name tables for the neighbourhood record OR split a named table and extract the embedded information from its naming. | nameTable_neighbourhood
This is a standardising function which allows SHAPE to programmatically name tables for the step-wise record OR split a named table and extract the embedded information from its naming. | nameTable_step
This function is used to find which elements of a population matrix are deemed as established. Established is determined by having a number of individuals greater than or equal to a definable proportion of the summed community size. | querryEstablished
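The establishment rule described for querryEstablished amounts to a threshold comparison. The sketch below uses a plain numeric vector rather than a population matrix, and the function name and the 1% threshold are hypothetical choices for illustration.

```r
# Hypothetical sketch (not rSHAPE's querryEstablished): a population counts as
# established when its size is at least `threshold` times the community size.
is_established_sketch <- function(counts, threshold = 0.01) {
  counts >= threshold * sum(counts)
}

community <- c(990, 9, 1)   # made-up population sizes, summing to 1000
established <- is_established_sketch(community, threshold = 0.01)
# established is c(TRUE, FALSE, FALSE), since the cut-off is 10 individuals
```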
This is a convenience function to ensure that our population demographics are stored in a data frame and exists because R's standard functions can collapse single row frames to named vectors. It requires that all passed vectors be of the same length. | reportPopulations
This is a convenience function to refresh connections to database files. | reset_shapeDB
This is a function to search our mutational database and then find the binary string of the genotypeID passed. This function is more efficient when the number of mutations for each genotypeID is passed as this helps reduce the tables of the mutational space that are searched. This matters when large genotypes are simulated. | retrieve_binaryString
This is a wrapper function to process a SHAPE run and extract meaningful summary information. | runProcessing
This is the function that runs the main body, or meaningful execution, of SHAPE experiments. In other words this is the main work-horse function that calls all the other parts and will execute your simulation run. It has the main parts of: 1. Stochastic Events; 2. Deaths; 3. Births; 4. Mutations; and during mutations this is where the mutational landscape is queried and updated as required. NOTE: Many of its internal operations are controlled by options with the prefix "shape_" and are not explicitly passed as arguments at call to this function. | runReplicate
This is the actual running of SHAPE. It will initialise objects and values which are calculated from the parameters that have been set - see the options with the prefix 'shape_'. It will establish the database output files and other initial conditions and then perform replicate simulations as appropriately defined. In essence this is the master wrapper function for all other functions. If you want to test/see SHAPE's default run then simply call this function after loading the library and you'll see an experiment built under your root directory. It at least requires that defineSHAPE has been run, else this is going to fail. | runSHAPE
This is a function to just return a matrix that defines the sitewise dependencies for an NK fitness landscape. If K == 0, or this is not an NK simulation, it returns NULL. | set_const_NK_interactionsMat
This function samples the space of all possible genotypes and then defines one that will be considered as the independent fitness contribution global optimum. | set_const_RMF_globalOptima
This is a convenience function for setting the dependent fitness values of sites in an NK fitness landscape model. This allows the dependent fitness of sites to be calculated once and then referenced as mutations occur. It makes exploring this style of fitness landscape a bit more computationally friendly - as it generally isn't. | set_DepbySite_ancestFitness
In an RMF fitness landscape model, there is a weighting value applied to the independent fitness contribution term. This function calculates that value for the run. | set_RMF_indWeight
This function is designed to establish an initial object which maps the fitness values of genome positions based on the state of that site. At present, this has no meaning if the model of simulation is not NK, Additive, or Fixed. The first is Kauffman's NK model and form of calculations, Additive is what that word would make you think for fitness effects of mutations at sites, and Fixed is when the user supplies a defined fitness matrix that describes the entire fitness landscape. NOTE: This function should likely be called without supplying any non-default arguments as it will use the shape_ options defined. | set_siteByState_fitnessMat
This is a function to take the input parameters and build the parameter combinations. | shapeCombinations
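Building every combination from ranges of parameter values is a full factorial design, which base R provides through expand.grid(). The sketch below shows that general technique only; the parameter names and values are hypothetical and this is not a claim about how shapeCombinations() is implemented.

```r
# Hypothetical parameter ranges for an experiment (names and values are made up).
parameter_ranges <- list(
  mutation_rate = c(1e-6, 1e-4),
  genome_length = c(50, 100),
  growth_model  = c("constant", "logistic")
)

# expand.grid() returns one row per combination of the supplied values.
combos <- expand.grid(parameter_ranges, stringsAsFactors = FALSE)
```

Here 2 x 2 x 2 values give 8 rows, and each row would correspond to one parameter set to be replicated.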
This is a function used to read the SHAPE_experimentalDesign type input file and then build a SHAPE experiment by creating all the folder structure, .R and .sh scripts required to programmatically run your experiment - excluding post-analysis, that's a you problem. | shapeExperiment
This is a convenience wrapper for sending an error and ending the SHAPE run as well as the R environment. It will print a message and then a traceback() report before pausing and quitting the R session. This exists to help debugging when SHAPE is run in batch-mode. | stopError
This function will use output from summarise_experimentFiles and summarise_experimentParameters to help with expectations concerning run output and handling. This will save an RData file which will contain one object: all_popSets, which is a list of relevant control information about I/O and then a series of other RData files which contain the demographics information as a matrix with the mean and standard deviation of demographics for all replicates. | summarise_evolRepeatability
This function will find all initially processed output files from individual replicates and return summary information. That information is saved to an RData file which will contain 3 objects: all_proccessedFiles, all_jobInfo, all_dividedFiles. | summarise_experimentFiles
This function will use output from summarise_experimentFiles to locate all parameter files and then report on all those parameters for the jobs that were run. This will save an RData file which will contain one object: all_parmInfo. | summarise_experimentParameters
This function will use output from summarise_experimentFiles and summarise_experimentParameters to help with expectations concerning run output and handling. This will save an RData file which will contain one object: all_popSets, which is a list of relevant control information about I/O and then a series of other RData files which contain the demographics information as a matrix with the mean and standard deviation of demographics for all replicates. | summarise_popDemographics
This function is a wrapper for getting a summary of the results of an rSHAPE run and/or experiment as a whole. The former is presumed to be of greater use but either is fine as per your needs. This wrapper will cause RData files to be created which contain the summarised experimental details that you can then use more easily for analysis. | summariseExperiment
This is a function to trim a string by removing the first and last character. It's used to trim quotation marks used in the parameter input. | trimQuotes
This is a function which is used to update lines that are searched and replaced in a manner conditional to this script's circumstances. The input lines can be a vector of any length, and the search patterns can be a list of any length where each list vector is used together. The values should be a list of information used as replacement info. | updateLines
This function is used to programmatically take vectors of parameters and write suites of R parameter scripts that will form part of a SHAPE experiment that is being built for running. This is a wrapper for writing out the suite of necessary scripts to form a run. | write_subScript
This is a file for updating the post analysis plotting script and creating an updated copy in the experiment's folder. | writeParameters