Unsupervised or supervised clustering analysis has been an integral part of transcriptomics, playing an important role in the identification of sub-groups.
Breaking: Some well-used and comprehensive clustering tools are provided in this section.
Introduction: The R package provides 30 indices which determine the number of clusters in a data set and it offers also the best clustering scheme from different results to the user. In addition, it provides a function to perform k-means and hierarchical clustering with different distance measures and aggregation methods. Any combination of validation indices and clustering methods can be requested in a single function call. This enables the user to simultaneously evaluate several clustering schemes while varying the number of clusters, to help determining the most appropriate number of clusters for the data set of interest.
Installation: To install this package, start R (version “4.2”) and enter:
install.packages(Nbclust)
Application: The Nbclust vignette can be found at https://cran.r-project.org/web/packages/NbClust/NbClust.pdf.
Introduction: This web tool allows users to upload their own data and easily create Principal Component Analysis (PCA) plots and heatmaps. Data can be uploaded as a file or by copy-pasteing it to the text box. Data format is shown under “Help” tab.
Type | Number |
---|---|
species | 9 |
Data sources | 22 |
Network | 7 |
Installation:To use the Docker image, you need to have Docker installed. Then use the following code:
sudo docker pull taunometsalu/clustvis
mkdir ~/customClustvis/
cd ~/customClustvis/
wget https://github.com/taunometsalu/ClustVis/archive/master.zip
unzip master.zip
chmod -R go+rx ~/customClustvis/
sudo docker run -d \
--name customClustvis \
-p <myPort>:3838 \
-v ~/customClustvis/ClustVis-master/:/srv/shiny-server/:ro \
taunometsalu/clustvis
To start using ClustVis R package, you can look at the examples in the vignette that comes with the package from github:
install.packages("BiocManager")
BiocManager::install("pcaMethods")
library(devtools)
install_github("taunometsalu/pheatmap")
install_github("taunometsalu/clustvis/Rpackage", build_vignettes = TRUE)
vignette("vignette", "clustvis")
Application: The ClustVis tutorial can be found at https://biit.cs.ut.ee/clustvis/
Web server: ClustVis.
Introduction: Medusa3 is a java standalone application for visualization and clustering analysis of biological networks in 2D. It is highly interactive and it supports weighted and multi-edged graphs where each edge between two bioentities can represent a different biological concept. Comparing to previous versions, it is currently enriched with a variety of layout and clustering methods for more intuitive visualizations. It is easy to integrate with web application since it is offered as an applet.
Type | Number |
---|---|
species | 9 |
Data sources | 22 |
Network | 7 |
Installation:The latest version 3.0 can be downloaded directly from here. https://docs.google.com/uc?id=0B-FSkXfyY2AFZTUxYWIxMTgtMjJiYy00NTMyLTlmMzUtZDdiZDFlZGRhYjUw&export=download&hl=en
Application: The Medusa tutorial can be found at https://sites.google.com/site/medusa3visualization/tutorial
Web server: Medusa.
Introduction: wcd is an open source tool for clustering expressed sequence tags. It implements matching using either d2 or edit distance, which provides sensitive matching. Heuristics speed up the clustering process. wcd parallelises well under MPI and also provides Pthreads clustering. wcd has a very compact memory representation which means that large files can be clustered on a single processor (and it can run quite happily without MPI or Pthreads headers or libraries being available).
Application: The wcd tutorial can be found at https://code.google.com/archive/about.
Web server: wcd.
Introduction: TimeClust is an user-friendly software package to cluster genes according to their temporal expression profiles. It can be conveniently used to analyze data obtained from DNA microarray time-course experiments. It implements two original algorithms expressed designed for clustering short time series together with hierarchical clustering and self-organizing maps. TimeClust executable files for Windows and LINUX platforms can be downloaded free of charge for non-profit institutions from the following web site http://aimed11.unipv.it/TimeClust.
Installation: The TimeClust software can be downloaded freely for non-profit institutions from http://aimed11.unipv.it/TimeClust. Executable files for Windows and LINUX platforms are available. The program requires the user to install the MATLAB runtime libraries, that are distributed together with the TimeClust files. The installation steps are here detailed.
How to install TimeClust:
run the MCRInstaller.exe. The setup wizard starts.
add the following directory to your system path:
copy the MCRInstaller.zip in the installation directory and unzip the MCRInstaller.zip file.
update your dynamic library path as follows:
Linux x86 32bit
export LD_LIBRARY_PATH="<mcr_root>/<ver>/bin/glnx86:<mcr_root>/<ver>/runtime/glnx86:<mcr_root>/<ver>/sys/os/glnx86:<mcr_root>/<ver>/sys/java/jre/glnx86/jre1.5.0/lib/i386/native_threads:<mcr_root>/<ver>/sys/java/jre/glnx86/jre1.5.0/lib/i386/client:<mcr_root>/<ver>/sys/java/jre/glnx86/jre1.5.0/lib/i386:"
export XAPPLRESDIR="<mcr_root>/<ver>/X11/app-defaults"
Linux x86-64 64bit
export LD_LIBRARY_PATH="<mcr_root>/<ver>/runtime/glnxa64:<mcr_root>/<ver>/sys/os/glnxa64:<mcr_root>/<ver>/sys/java/jre/glnxa64/jre1.4.2/lib/amd64/native_threads:<mcr_root>/<ver>/sys/java/jre/glnxa64/jre1.4.2/lib/amd64/client:<mcr_root>/<ver>/sys/java/jre/glnxa64/jre1.4.2/lib/amd64:"
export XAPPLRESDIR="<mcr_root>/<ver>/X11/app-defaults"
Note. If you have already installed MATLAB on your machine there is a limitation regarding directories on your path: this tool requires that mcr_root directory is first on the path, MATLAB requires that the matlabroot directory is first on the path.
Copy the TimeClust executable file and the TimeClust.CTF file in your application root directory.
Run for the first time the TimeClust executable file and close the application. In this step the TimeClust_mcr directory and its subdirectories are created.
Open the config.txt file in the TimeClust_mcr/TimeClust1.3 directory to configure your application. In particular, set:
at the line 2 of this file, the desired web site to open from the TimeClust tool (default site is www.geneontology.org)
at the line 4, the command line to start the preferred editor to show text files (this step is not necessary for Windows platforms, the default text editor will be used).
at line 6, the command line to start the web browser to use (this step is not necessary for Windows platforms, the default web browser will be used).
Application: The TimeClust vignette can be found at http://aimed11.unipv.it/TimeClust/example.pdf.
Introduction: we newly developed a pool of biomedical visualization and modeling functions (240+), and proposed a scalable and emerging one-stop web service, Hiplot, with versatile task plugins, such as basic statistics, regression, clustering, genomics, transcriptomics, time series, meta-analysis, and risk models. It provides more simple web and command-line interfaces compared with other similar platforms, and is supported and maintained by the openbiox open source community.
Type | Number |
---|---|
species | 9 |
Data sources | 22 |
Network | 7 |
Application: The hiplot tutorial can be found at https://hiplot-academic.com/docs/
Web server: hiplot.
Introduction: The cola package provides a general framework for subgroup classification by consensus partitioning. It has the following features: 1. It modularizes the consensus partitioning processes that various methods can be easily integrated. 2. It provides rich visualizations for interpreting the results. 3. It allows running multiple methods at the same time and provides functionalities to straightforward compare results. 4. It provides a new method to extract features which are more efficient to separate subgroups. 5. It automatically generates detailed reports for the complete analysis. 6. It allows applying consensus partitioning in a hierarchical manner.
Installation:To install this package, start R (version “4.2”) and enter:
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("cola")
Application: The cola vignette can be found at https://bioconductor.org/packages/release/bioc/vignettes/cola/inst/doc/cola.html