Anduril 1 (legacy)

Anduril 1 is a legacy version that will continue to be available for download, but is no longer actively maintained. Use Anduril 2 for new installations.

License

Anduril 1 is licensed under the GNU General Public License. Note that Anduril 2 uses a different license.

Documentation

Anduril 1 API


Download

Anduril is an integrator of multiple analysis tools, and thus it depends on a large set of libraries and software. If you only plan to test-drive Anduril, it may be wise to start with the preinstalled VirtualBox image or with Docker.

VirtualBox

Download the image from here.

Docker

Docker is a container platform, similar to a virtual machine, but it runs directly on the host operating system's kernel. Anduril on Docker can use all of the computer's resources, but it requires a Linux operating system.

We publish many flavors of Anduril on Docker Hub.

Installation on Ubuntu/Debian

Because Anduril and its bundles depend on many system-wide packages, a full installation of Anduril and its bundles always requires sudo or root access to the system.

Binary Installation

Example for Ubuntu Trusty (14.04 LTS): Add the following repositories to your 3rd party sources:

deb http://anduril.org/linux/ binary/
deb http://cran.at.r-project.org/bin/linux/ubuntu trusty/

You can add them by copying and pasting the above lines into the file /etc/apt/sources.list.d/anduril.list
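If you prefer not to edit the file by hand, the same two lines can be written from the shell. This is a minimal sketch; the function name anduril_sources is hypothetical (not part of Anduril), and it only covers the Ubuntu 14.04 (trusty) lines shown above.

```shell
#!/bin/bash
# Hypothetical helper: print the apt source lines for Anduril and CRAN R
# on Ubuntu 14.04 (trusty), exactly as listed above.
anduril_sources() {
  printf '%s\n' \
    'deb http://anduril.org/linux/ binary/' \
    'deb http://cran.at.r-project.org/bin/linux/ubuntu trusty/'
}
```

To write the file, pipe the output through sudo tee, for example: anduril_sources | sudo tee /etc/apt/sources.list.d/anduril.list. The tee step is what runs as root; a plain "sudo ... > file" redirection would be performed by your unprivileged shell and fail on /etc.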

Next, add the signing keys for our repository and for CRAN R:

wget http://anduril.org/linux/anduril_pub.gpg -O - | sudo apt-key add -
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9

Update your package lists and install Anduril:

sudo apt-get update
sudo apt-get install anduril

Note that R packages must be installed separately; many of the components use Bioconductor packages. To install the requirements of components, such as R packages, automatically, use the InstallRequirements component:

cd /tmp
sudo ANDURIL_HOME=/usr/share/anduril anduril run-component InstallRequirements

Refer to the documentation of InstallRequirements prior to running it.

Source Installation

Installation of Anduril 1.x

To install the dependencies, install Anduril as in the binary example above, but install the package anduril-meta instead of the package anduril.

# Clone the repository:
hg clone https://bitbucket.org/anduril-dev/anduril -r default anduril
# set up environment
cd anduril
export ANDURIL_HOME=$( pwd )
export PATH=$ANDURIL_HOME/bin:$ANDURIL_HOME/utils:$PATH
# compile
ant anduril.jar

For each bundle you want to install, find its source code URL and run:

# Clone the repository to ANDURIL_HOME
cd $ANDURIL_HOME
hg clone https://bitbucket.org/anduril-dev/[bundleRepo] -r default [bundle_name]
cd [bundle_name]
ant setup
sudo $ANDURIL_HOME/utils/anduril-install-requirements -b . '*'
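When installing several bundles, the same steps can be wrapped in a small shell function and called once per bundle. This is a sketch: install_bundle is a hypothetical name, and it assumes that ANDURIL_HOME is set as in the Anduril 1.x steps above and that hg, ant and sudo are available.

```shell
#!/bin/bash
# Hypothetical helper: clone one bundle into $ANDURIL_HOME, set it up and
# install its requirements, following the manual steps above.
# Usage: install_bundle [bundleRepo] [bundle_name]
install_bundle() {
  local repo=$1 name=$2
  # The subshell keeps the cd commands from changing the caller's directory;
  # && stops at the first failing step and returns its non-zero status.
  (
    cd "${ANDURIL_HOME:?ANDURIL_HOME is not set}" &&
      hg clone "https://bitbucket.org/anduril-dev/$repo" -r default "$name" &&
      cd "$name" &&
      ant setup &&
      sudo "$ANDURIL_HOME/utils/anduril-install-requirements" -b . '*'
  )
}
```

Call it once per bundle, with the repository and bundle names filled in as in the manual steps, for example install_bundle [bundleRepo] [bundle_name]. Stopping at the first failure avoids running ant setup or the requirements installer in a directory that was never cloned.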

Eclipse integration

The integration of Anduril into the Eclipse IDE makes it possible to edit AndurilScript source and to invoke the workflow engine from Eclipse. The plugin implements syntax and error highlighting. See the User Guide for instructions on installation and use. The AndurilEclipse plugin is installed using the Software Updates feature in Eclipse, with the following URL: http://anduril.org/pub/anduril_eclipse.


Bundles

List of bundles hosted by Anduril development team:

Name        Description
Builtin     Builtin bundle is shipped with Anduril and includes input and output components.
Anima       ANduril IMage Analysis bundle. APIs for popular scientific image analysis platforms, and convenience components.
FlowAnd     Flow cytometry analysis tools.
Microarray  Microarray analysis.
Moksiskaan  Generic database and a toolkit for integrating information on connections between genes, proteins, pathways, drugs, and other biological entities.
Sequencing  Deep sequencing data analysis.
TCGA        Routines for TCGA microarray, clinical and sequencing data as well as TCGA data importing.
Tools       All those generic little tools to help you: CSV handling, plotting, etc.

Builtin

Includes the very basic components.

Anima

ANduril IMage Analysis bundle

FlowAnd

Flow cytometry analysis for Anduril

Microarray

Provides components for several types of analysis, such as

Links:

Moksiskaan

Moksiskaan is a generic database and a toolkit that can be used to integrate information about the connections between genes, proteins, pathways, drugs, and other biological entities. The database is used to combine various existing databases to find biological relationships between the genes of interest and to predict their interactions.

Sequencing

Bundle intended for sequencing analysis.

TCGA

The bundle encompasses routines to handle TCGA microarray, clinical and sequencing data as well as importing TCGA data into pipelines. The bundle's components automate downloading data from the TCGA data portal and support TCGA-specific features such as data levels and batches. The download components automatically annotate array files with their TCGA sample codes.

Tools


Frequently Asked Questions

Discussion forum

If your question is not answered here, come and ask us on the discussion forum! Q&A Forum

Distributed Execution

Anduril supports Slurm out of the box, and other schedulers via a custom prefix mode.

Slurm

To use Slurm with Anduril, specify --exec-mode slurm on the Anduril command line.

Example: Submit each component in awesome_workflow to Slurm as a separate job

anduril run awesome_workflow.and --exec-mode slurm -b awesome_bundle

Anduril uses the Slurm srun command to launch components. To pass arguments to srun, use the --slurm-args [arguments] switch. Dashes in the arguments must be replaced with %-signs.

Example: Allocate 5 CPUs and 10000 MB (about 10 GB) of memory for each component

anduril run awesome_workflow.and --exec-mode slurm --slurm-args "%c 5 %%mem=10000" -b awesome_bundle
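The %-for-dash rule can be applied mechanically. The sketch below assumes that Anduril turns every % back into a dash before calling srun, which is how the rule above reads; encode_slurm_args and decode_slurm_args are hypothetical helper names. Note that a dash inside a value (for example in a partition name) is encoded as well.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the --slurm-args escaping rule:
# every dash in the srun arguments is written as a % sign.

# Turn ordinary srun options into the form expected by --slurm-args.
encode_slurm_args() {
  printf '%s\n' "$1" | sed 's/-/%/g'
}

# Reverse mapping, e.g. to check what srun should receive
# (assumes every % stands for a dash).
decode_slurm_args() {
  printf '%s\n' "$1" | sed 's/%/-/g'
}
```

For example, encode_slurm_args '-c 5 --mem=10000' prints %c 5 %%mem=10000, the value used in the command above, so it can be passed as --slurm-args "$(encode_slurm_args '-c 5 --mem=10000')".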

If you want to pass custom resource requirements to Slurm at the component level, you can use the @cpu and @memory annotations in your workflow. The specified values will be passed to the srun command.

Example: Allocate 5 CPUs and one gigabyte of memory for the component

cB = CSVCleaner(original = in, rename = "number=value", @cpu=5, @memory=1024)

Similarly, you can tell Slurm which node should run a specific component:

cB = CSVCleaner(original = in, rename = "number=value", @host="node3")

Prefix scripts

By using prefix mode, it is possible to run Anduril components with another scheduler or any other program, or even to specify custom logic for each component. Prefix mode simply prepends a custom prefix to the component execution string, so that the component launch string is passed to the prefix as parameters. Prefix mode is enabled with the --exec-mode prefix switch, and the prefix is specified with --prefix [script-name].

Example: Execute custom_prefix as part of each component’s execution.

anduril run awesome_workflow.and --exec-mode prefix --prefix custom_prefix -b awesome_bundle

One way to introduce custom logic for executing components is to use a Bash script as a prefix. Refer to doc/templates/prefix_template.sh for an example of such a script. Prefix mode also supports the @cpu, @memory and @host annotations, but to use them you must implement the corresponding execution logic in the prefix script. The prefix template script contains an example of how these annotations are applied in a prefix.
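A minimal prefix script might look like the following. It is only a sketch, not the shipped template: it relies solely on the behavior described above (the component launch command arrives as the prefix's arguments), logs the start and exit status of each component to standard error, and returns the component's exit status unchanged. It does not handle the @cpu, @memory or @host annotations; see the template for how those reach the prefix. The name custom_prefix is the hypothetical one from the example command.

```shell
#!/bin/bash
# custom_prefix: minimal prefix-script sketch (hypothetical, not the template
# in doc/templates/prefix_template.sh).
# In prefix mode the component launch command is passed to this script as its
# arguments, so "$@" is the command to run.

run_component() {
  local start status
  start=$(date +%s)
  echo "[custom_prefix] starting: $*" >&2
  "$@"                                   # run the component command unchanged
  status=$?
  echo "[custom_prefix] exit status $status after $(( $(date +%s) - start )) s" >&2
  return "$status"
}

# Wrap the command only when one was given, and exit with its status.
if [ "$#" -gt 0 ]; then
  run_component "$@"
  exit $?
fi
```

Make the script executable (chmod +x custom_prefix) before using it with --prefix; depending on how the name is resolved, you may need to give a path to the script rather than just its name. Passing the exit status through keeps a failing component from looking successful.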

How do I simulate components and unavailable resources?

Records can be used to alter the output interface of the components. You may use this to:

The real advantage of using records is the ability to switch between the actual implementations at runtime:

if (useOldData) {
   dirOldData = "execBak/"
   mdIn1      = INPUT(path=dirOldData+"dbRead/idlist.lst")
   mdIn2      = INPUT(path=dirOldData+"dbRead/annotations.csv")
   myData     = record(ids  = mdIn1, // rename idlist to ids
                annotations = mdIn2,
                report      = null) // LatexCombiner will skip this automatically
} else {
   myData     = MyDatabaseReader() // outputs are: ids, annotations, and report
}

How do I generate unique row identifiers for a CSV file?

You can use the TableQuery component and SQL sequences for this. The following SQL adds an "id" column to the given file; its values are of the form id#, where # runs from one to the number of rows in the file.

CREATE SEQUENCE seqMY_id AS INTEGER START WITH 1 INCREMENT BY 1;

SELECT 'id'||NEXT VALUE FOR seqMY_id AS "id", table1.* FROM table1
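To preview the expected result outside Anduril, the same numbering can be reproduced with awk. This is only a cross-check under stated assumptions, not a replacement for TableQuery: it assumes a tab-separated file with a single header row, numbers rows in file order, and does not parse quoted fields. The helper name add_row_ids is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: prepend an "id" column with values id1, id2, ...
# Assumes tab-separated input with one header row; quoted fields are not parsed.
add_row_ids() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == 1 { print "id", $0; next }   # header row: add the new column name
       { print "id" (NR - 1), $0 }        # data rows: id1, id2, ...
  ' "$@"
}
```

For example, add_row_ids table1.csv > table1_with_ids.csv writes a copy of a single file with the new first column, matching the column order produced by the query above.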

How do I define inputs that accept files and folders?

You can use generic data types to define component inputs that accept both files and folders. The StandardProcess component, found in the Microarray bundle, is an example of this.

How do I define public constants?

You may have a set of universal constants that you would like to use in various pipelines. You can wrap these constants in a public function that makes them visible when called. The same function can be used to include these values in your bundle API.

First, you will need a function that declares the constants. This initialization function may also carry out other preparations for the end user. Here is an example body that could be used:

include "doc-files/myConstants.and"
function MyInit {
  // You may add some logic in here!
}

The constants are defined in doc-files/myConstants.and, which is now an independent file. This file can be generated and maintained without a need to worry about the function itself.

The doc element of your component.xml may contain something like:

This function declares a set of useful
<a href="myConstants.and">constants for me</a>.

Now the actual values of your constants are included in the API.

How do I execute quick-and-dirty or standalone workflows?

Sometimes you want to use Anduril for a quick task and do not want to create the various files (workflow configuration, CSV files, etc.) that more complex problems require. Or, you may want to avoid polluting the file system with too many files and prefer a standalone workflow file.

The following example shows how Bash and inline file generation can be used to create standalone workflows. This simple example creates a 5×5 random matrix and converts it into an Excel spreadsheet with style information. The style sheet is stored in a temporary file; this has the disadvantage that the name and timestamp of the file change on each execution, so CSV2Excel is re-executed each time. An alternative is to place the style sheet into a separate file.

#!/bin/bash

# Create a temporary file for the style sheet
STYLE_FILE=$(mktemp)
cat >"$STYLE_FILE" <<EOF
Row Column  Bold
1   *   true
EOF

EXECUTE_DIR=execute
# When - is given as workflow file, Anduril reads standard input.
anduril run - -d $EXECUTE_DIR <<EOF
style = INPUT(path="$STYLE_FILE")
matrix = Randomizer(rows=5, columns=5)
excel = CSV2Excel(csv=matrix, style=style)
OUTPUT(excel.excelFile)
EOF

rm -f "$STYLE_FILE"

Why do my INPUT component instances get re-executed even if I haven’t changed them?

The likely reason your INPUTs get re-executed even though you haven't changed them is that you are using them directly as inputs to downstream components without first creating a named instance, like this:

myComponent = SomeComponent(INPUT(path="myFile.csv"), ...)

If you then add a new INPUT component instance upstream of these, their automatically generated names change and the INPUTs are re-executed. To avoid this problem, simply define a named instance for each input:

inputInstance = INPUT(path="myFile.csv")
otherComp = OtherComponent(inputInstance)