Quick start tutorial¶
Adenine can be installed using standard Python tools (with administrative or sudo permissions on GNU/Linux platforms):
$ pip install adenine
Installation from sources¶
If you prefer to install Adenine manually, download the .zip or .tar.gz archive from http://slipguru.github.io/adenine/. Then extract it and move into the root directory:
$ unzip slipguru-adenine-|release|.zip
$ cd adenine-|release|/
or:
$ tar xvf slipguru-adenine-|release|.tar.gz
$ cd adenine-|release|/
Otherwise you can clone our GitHub repository:
$ git clone https://github.com/slipguru/adenine.git
From here, you can follow the standard Python installation step:
$ python setup.py install
After installing Adenine, you should have access to two scripts, named with a common ade_ prefix:
$ ade_<TAB>
ade_analysis.py ade_run.py
This tutorial assumes that you downloaded and extracted the Adenine source package, which contains an examples\data directory with some data files (.npy or .csv) that will be used to show Adenine functionalities.
Adenine needs only 3 ingredients:
- n_samples x n_variables input matrix
- n_samples x 1 output vector (optional)
- configuration file
Input data format¶
Input data are assumed to be:
- numpy arrays stored in .npy files organized with a row for each sample and a column for each feature,
- tabular data stored in comma separated .csv files presenting the variables header on the first row and the sample indexes on the first column,
- toy examples available from the adenine.utils.data_source module.
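The two file layouts described above can be sketched with NumPy and the standard csv module. This is only an illustration of the expected shapes: the file names, the sample and variable names, and the toy values are hypothetical and are not taken from the Adenine examples.

```python
import csv
import os
import tempfile

import numpy as np

# Hypothetical toy data: 4 samples (rows) x 3 variables (columns).
rng = np.random.RandomState(0)
X = rng.rand(4, 3)
samples = ['sample_{}'.format(i) for i in range(X.shape[0])]
variables = ['var_{}'.format(j) for j in range(X.shape[1])]

out_dir = tempfile.mkdtemp()

# .npy layout: one row per sample, one column per feature.
npy_path = os.path.join(out_dir, 'data.npy')
np.save(npy_path, X)

# .csv layout: variable names on the first row,
# sample indexes on the first column.
csv_path = os.path.join(out_dir, 'data.csv')
with open(csv_path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([''] + variables)
    for name, row in zip(samples, X):
        writer.writerow([name] + row.tolist())

# Read both files back to check the layout.
X_npy = np.load(npy_path)
with open(csv_path, newline='') as f:
    rows = list(csv.reader(f))
header, body = rows[0], rows[1:]
X_csv = np.array([[float(v) for v in r[1:]] for r in body])

print(X_npy.shape, X_csv.shape)
```

Both arrays read back with shape (n_samples, n_variables), which is the orientation the list above describes.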
Configuration File¶
The Adenine configuration file is a standard Python script. It is imported as a module, so all the code it contains is executed. In this file the user can define all the options needed to read the data and to create the pipelines.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Configuration file for adenine."""
from adenine.utils import data_source
# -------------------------- EXPERIMENT INFO ------------------------- #
exp_tag = '_experiment'
output_root_folder = 'results'
plotting_context = 'notebook'  # one of {paper, notebook, talk, poster}
file_format = 'pdf'  # or 'png'
# ---------------------------- INPUT DATA ---------------------------- #
# Load an example dataset or specify your input data in tabular format
data_file = 'data.csv'
labels_file = 'labels.csv'  # OPTIONAL
samples_on = 'rows'  # if samples lie on columns use 'cols' or 'col'
data_sep = ','  # the data separator. e.g., ',', '\t', ' ', ...
X, y, feat_names, index = data_source.load('custom',
                                           data_file, labels_file,
                                           samples_on=samples_on,
                                           sep=data_sep)
# ----------------------- PIPELINES DEFINITION ------------------------ #
# --- Missing values imputing --- #
step0 = {'Impute': [False, {'missing_values': 'NaN',
                            'strategy': ['median',
                                         'mean',
                                         'nearest_neighbors']}]}
# --- Data preprocessing --- #
step1 = {'None': [False], 'Recenter': [False], 'Standardize': [False],
         'Normalize': [False, {'norm': ['l1', 'l2']}],
         'MinMax': [False, {'feature_range': [(0, 1), (-1, 1)]}]}
# --- Unsupervised features learning --- #
# affinity can be precomputed for SE
step2 = {'PCA': [False, {'n_components': 3}],
         'IncrementalPCA': [False],
         'RandomizedPCA': [False],
         'KernelPCA': [False, {'kernel': ['linear', 'rbf', 'poly']}],
         'Isomap': [False, {'n_neighbors': 5}],
         'LLE': [False, {'n_neighbors': 5,
                         'method': ['standard', 'modified',
                                    'hessian', 'ltsa']}],
         'SE': [False, {'affinity': ['nearest_neighbors', 'rbf']}],
         'MDS': [False, {'metric': True}],
         'tSNE': [False],
         'RBM': [False, {'n_components': 256}],
         'None': [False]
         }
# --- Clustering --- #
# affinity can be precomputed for AP, Spectral and Hierarchical
step3 = {'KMeans': [False, {'n_clusters': [3, 'auto']}],
         'AP': [False, {'preference': ['auto']}],
         'MS': [False],
         'Spectral': [False, {'n_clusters': [3, 8]}],
         'Hierarchical': [False, {'n_clusters': [3, 8],
                                  'affinity': ['manhattan', 'euclidean'],
                                  'linkage': ['ward', 'complete', 'average']}]
         }
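To make the structure of the step dictionaries more concrete, here is a minimal sketch that lists the candidate pipelines implied by a configuration. It rests on two assumptions that the text above does not state explicitly: that the first item of each list is an on/off flag, and that a pipeline chains one enabled method per step. The helper names enabled and candidate_pipelines are hypothetical, the dictionaries are trimmed-down examples, and this is not Adenine's own implementation.

```python
from itertools import product

# Hypothetical, trimmed-down step dictionaries with the same shape as the
# configuration above; some flags are switched to True for illustration.
step0 = {'Impute': [False, {'missing_values': 'NaN',
                            'strategy': ['median']}]}
step1 = {'None': [True], 'Recenter': [False], 'Standardize': [True]}
step2 = {'PCA': [True, {'n_components': 3}],
         'KernelPCA': [False, {'kernel': ['linear', 'rbf']}]}
step3 = {'KMeans': [True, {'n_clusters': [3, 'auto']}],
         'MS': [False]}


def enabled(step):
    """Return the sorted names of the methods whose flag is True.

    Assumption: the first item of each list is an on/off switch.
    """
    return sorted(name for name, spec in step.items() if spec[0])


def candidate_pipelines(*steps):
    """Chain one enabled method per step (illustrative assumption)."""
    active = [enabled(step) for step in steps]
    # Steps with no enabled method are skipped in this sketch.
    active = [names for names in active if names]
    return list(product(*active))


pipelines = candidate_pipelines(step0, step1, step2, step3)
for pipeline in pipelines:
    print(' -> '.join(pipeline))
```

With these example flags the sketch yields two chains, one per enabled preprocessing method, each followed by PCA and KMeans.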
Experiment runner¶
The ade_run.py script executes the full Adenine framework. Its usage is the following:
$ ade_run.py ade_config.py
When launched, the script reads the data, then creates and runs each pipeline, saving the results in a tree-like structure which has the current folder as root.
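Since the configuration file is a Python script that gets imported as a module, its variables become plain module attributes. The following is a minimal sketch of that mechanism using only the standard library. The load_config helper and the tiny configuration written to a temporary file are hypothetical, and this is not necessarily how ade_run.py loads its argument.

```python
import importlib.util
import os
import tempfile

# Hypothetical, minimal configuration written to a temporary file.
config_source = """
exp_tag = '_experiment'
output_root_folder = 'results'
step1 = {'Standardize': [True], 'Recenter': [False]}
"""

config_path = os.path.join(tempfile.mkdtemp(), 'ade_config.py')
with open(config_path, 'w') as f:
    f.write(config_source)


def load_config(path):
    """Import a Python file as a module, executing all of its code."""
    spec = importlib.util.spec_from_file_location('ade_config', path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


config = load_config(config_path)
print(config.exp_tag, config.output_root_folder)
print(config.step1)
```

Because the whole file is executed on import, any data loading code placed in the configuration runs at this point too.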
Results analysis¶
The ade_analysis.py script provides useful summaries and graphs from the results of the experiment. Its only parameter is an existing results directory:
$ ade_analysis.py result-dir
The script produces a set of textual and graphical results. An output example obtained by one of the implemented pipelines is represented below.


You can reproduce the example above by specifying data_source.load('circles') in the configuration file.
Example dataset¶
An example dataset can be downloaded here. The dataset is a random extraction of 801 samples (with dimension 20531) measuring RNA-Seq gene expression of patients affected by 5 different types of tumor: breast invasive carcinoma (BRCA), kidney renal clear cell carcinoma (KIRC), colon (COAD), lung (LUAD) and prostate adenocarcinoma (PRAD). The full dataset is maintained by The Cancer Genome Atlas Pan-Cancer Project [1] and we refer to the original repository for further details.
Reference¶
[1] Weinstein, John N., et al. “The cancer genome atlas pan-cancer analysis project.” Nature genetics 45.10 (2013): 1113-1120.