Sunday, March 2, 2014

scikit-learn

scikit-learn - Machine Learning in Python

  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license

Prerequisites

Installation

For Cray XC30 only.

% module unload PrgEnv-cray
% module load PrgEnv-gnu

Start.

% cd $WORK/setup
% wget --no-check-certificate https://pypi.python.org/packages/source/s/scikit-learn/scikit-learn-0.14.1.tar.gz
% tar zxf scikit-learn-0.14.1.tar.gz
% cd scikit-learn-0.14.1
% python setup.py build
% python setup.py install
% cd ..
% nosetests --exe sklearn

I got 1 error on sklearn.cluster.bicluster.tests.test_utils.test_get_submatrix. Don't worry! Just go ahead.
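
If the test run stops on import problems rather than on test failures, it helps to check which packages the interpreter can actually see. The following is only a minimal sketch and not part of scikit-learn: the helper name report_versions is my own, and it assumes each package exposes a __version__ attribute, as NumPy, SciPy, matplotlib, and scikit-learn do.

```python
import importlib


def report_versions(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed for this interpreter
            continue
        # Fall back to "unknown" for modules without a __version__ attribute.
        versions[name] = str(getattr(module, "__version__", "unknown"))
    return versions


if __name__ == "__main__":
    stack = ["numpy", "scipy", "matplotlib", "sklearn"]
    for name, version in sorted(report_versions(stack).items()):
        print("{0}: {1}".format(name, version or "NOT FOUND"))
```

Save it under any name you like and run it with $WORK/bin/python, so that the check uses the same interpreter as the installation.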

matplotlib

"matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in python scripts, the python and ipython shell (ala MATLAB®* or Mathematica®†), web application servers, and six graphical user interface toolkits."

For Cray XC30 only.

% module unload PrgEnv-cray
% module load PrgEnv-gnu

Start.

% cd $WORK/setup
% wget https://downloads.sourceforge.net/project/matplotlib/matplotlib/matplotlib-1.3.1/matplotlib-1.3.1.tar.gz
% tar zxf matplotlib-1.3.1.tar.gz
% cd matplotlib-1.3.1
% python setup.py build
% python setup.py install
% cd ..
% python -c "import matplotlib; matplotlib.test()"

NumPy and SciPy with ATLAS

We will compile NumPy and SciPy with the GNU compilers, using the auto-optimized BLAS and LAPACK from ATLAS. If you are on the Cray, run the following commands first.

% module unload PrgEnv-cray
% module load PrgEnv-gnu

ATLAS

% cd $WORK/setup
% wget http://sourceforge.net/projects/math-atlas/files/Stable/3.10.1/atlas3.10.1.tar.bz2
% tar jxf atlas3.10.1.tar.bz2
% cd ATLAS
% wget http://www.netlib.org/lapack/lapack-3.4.2.tgz
% mkdir build-gcc
% cd build-gcc
% ../configure --prefix=$WORK -Fa alg -fPIC --with-netlib-lapack-tarfile=../lapack-3.4.2.tgz --cc=gcc
% make
% make check
% make install

Because ATLAS tunes itself during the build, it might take up to an hour to complete.

NumPy

% cd $WORK/setup
% wget http://sourceforge.net/projects/numpy/files/NumPy/1.8.0/numpy-1.8.0.tar.gz
% tar zxf numpy-1.8.0.tar.gz
% cd numpy-1.8.0/
% env ATLAS=$WORK $WORK/bin/python setup.py build --fcompiler=gnu95 |& tee build.log
% env ATLAS=$WORK $WORK/bin/python setup.py install
% cd ..
% python -c "import numpy; numpy.test()"

SciPy

% cd $WORK/setup
% wget http://sourceforge.net/projects/scipy/files/scipy/0.13.3/scipy-0.13.3.tar.gz
% tar zxf scipy-0.13.3.tar.gz
% cd scipy-0.13.3/
% $WORK/bin/python setup.py build --fcompiler=gnu95 |& tee build.log
% $WORK/bin/python setup.py install
% cd ..
% python -c "import scipy; scipy.test()"
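
To see whether the ATLAS build was picked up, numpy.show_config() prints the BLAS and LAPACK libraries NumPy was built against. A rough timing of a matrix product is a complementary check. The sketch below is only an illustration: the helper name time_matmul and the matrix size are my own choices, and it returns None when NumPy cannot be imported.

```python
import time

try:
    import numpy as np
except ImportError:  # NumPy is not installed for this interpreter
    np = None


def time_matmul(n=1000, repeats=3):
    """Return the best wall-clock time (seconds) of an n x n matrix product."""
    if np is None:
        return None
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    best = None
    for _ in range(repeats):
        start = time.time()
        np.dot(a, b)  # handed to the BLAS that NumPy was linked against
        elapsed = time.time() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


if __name__ == "__main__":
    if np is None:
        print("NumPy is not importable")
    else:
        np.show_config()  # lists the BLAS/LAPACK libraries found at build time
        print("best time, 1000 x 1000 product: {0:.3f} s".format(time_matmul()))
```

The absolute number means little on its own. It is more useful to compare it with the same script run under the system Python.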

Monday, February 24, 2014

Amber 12 with AmberTools 13 on Cray XC30

In this post, I will show you how to compile and run Amber 12 with AmberTools 13 on Cray XC30. Although they can be compiled with the Intel, PGI, and other compilers, I have decided to use the GNU compilers to reduce the number of failed tests.

First of all, you should register (and buy?) a license for Amber 12 to get its source code. AmberTools 13 can be downloaded directly from the website after registration.

For compilation, you need the following files in your setup directory ($WORK/setup).
  • Amber12.tar.bz2
  • AmberTools13.tar.bz2
And it is time to begin.

First, we have to load some modules and set environment variables.

% module unload PrgEnv-cray
% module load PrgEnv-gnu
% module load netcdf

Next, extract AmberTools, then Amber.

% test -d $WORK/apps && cd $WORK/apps || mkdir -p $WORK/apps && cd $WORK/apps
% tar jxf $WORK/setup/AmberTools13.tar.bz2
% tar jxf $WORK/setup/Amber12.tar.bz2
% setenv AMBERHOME $WORK/apps/amber12
% cd $AMBERHOME

Before compiling Amber 12, we have to install all bug fixes released so far. Fortunately, Amber 12 comes with an automatic tool that downloads bug fixes and applies the patches (21 bug fixes for Amber 12 and 23 bug fixes for AmberTools 13 as of Feb. 22, 2014).

% ./update_amber --update

You might have to run the above command again if there is some update for update_amber itself.

Serial Version

% ./configure -noX11 --with-netcdf $NETCDF_DIR --with-python $WORK/bin/python gnu
% make install
% make test.serial

The Python used here was installed in the previous post.

13 file comparisons failed for Amber 12 and 1 file comparison failed for AmberTools 13 (a very small delta, about 10^-7). I will come back to this problem later.
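
A delta of that size is what one would expect from floating-point rounding under a different compiler or math library, although I have not verified that this is the cause here. As a sketch of the idea only (this is not Amber's own comparison tool, and the function name and tolerances are assumptions), two values can be compared with a relative tolerance instead of digit by digit:

```python
def within_tolerance(expected, actual, rel_tol=1e-6, abs_tol=1e-12):
    """Return True if two numbers agree up to a relative (or tiny absolute) tolerance."""
    difference = abs(expected - actual)
    scale = max(abs(expected), abs(actual))
    return difference <= max(rel_tol * scale, abs_tol)


if __name__ == "__main__":
    # The first pair differs by about 1e-7, the second by 1e-1.
    print(within_tolerance(-1234.5678901, -1234.5678902))
    print(within_tolerance(1.0, 1.1))
```

The absolute tolerance only matters for values very close to zero, where a purely relative test would reject any difference at all.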

Parallel Version

% ./configure -mpi -noX11 -crayxt5 --with-netcdf $NETCDF_DIR --with-python $WORK/bin/python gnu
% make install

To test the parallel version, we have to submit a job (to the TINY queue, which is a dedicated queue for testing purposes). Create a script file named ptest.pbs with the following content.

#!/bin/csh
#PBS -q TINY
#PBS -N Amber12_PTest
#PBS -l mppwidth=16
#PBS -l mppnppn=16
#PBS -j oe

module unload PrgEnv-cray
module load PrgEnv-gnu
module load netcdf

setenv AMBERHOME "/work/${USER}/apps/amber12"
set path = ( ${AMBERHOME}/bin ${path} )

cd ${AMBERHOME}

setenv DO_PARALLEL "aprun -n 16 -N 16"
make test.parallel
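
The numbers in this script have to agree with each other: mppwidth is the total number of processes, mppnppn is the number of processes per node, and aprun takes the same two values as -n and -N. The sketch below only illustrates that arithmetic. The function names are hypothetical, and it does not talk to PBS or aprun at all.

```python
def nodes_needed(mppwidth, mppnppn):
    """Number of compute nodes implied by the total and per-node process counts."""
    if mppwidth < 1 or mppnppn < 1:
        raise ValueError("both counts must be positive")
    return (mppwidth + mppnppn - 1) // mppnppn  # ceiling division


def aprun_prefix(mppwidth, mppnppn):
    """Build a DO_PARALLEL value that matches the PBS resource request."""
    # Never ask for more processes per node than there are processes in total.
    per_node = min(mppwidth, mppnppn)
    return "aprun -n {0} -N {1}".format(mppwidth, per_node)


if __name__ == "__main__":
    print(nodes_needed(16, 16))
    print(aprun_prefix(16, 16))
```

With the values used in ptest.pbs, the request fits on a single node and the prefix is the same string that the script assigns to DO_PARALLEL.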

Then, submit the job and wait. You can check the status of your job by listing the queues.

% qsub ptest.pbs
% qstat -a

After the job has finished, check the output in the file Amber12_PTest.o<ID>, where ID is your job's ID.

On Feb. 22, 2014, I got the following results.
  • Amber 12: 28 file comparisons failed.
  • AmberTools 13: 21 file comparisons failed, 14 tests experienced errors.
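
Counting these lines by hand in a long job output is tedious. Here is a small sketch that totals them, assuming the log reports results with the same wording as the phrases quoted above ("file comparisons failed", "tests experienced errors"). The function name and the patterns are my own, so adjust them if your output looks different.

```python
import re

# Assumed wording of the summary lines, taken from the phrases quoted above.
FAILED_RE = re.compile(r"(\d+) file comparisons? failed")
ERRORS_RE = re.compile(r"(\d+) tests? experienced errors?")


def summarize(log_text):
    """Sum the failed comparisons and errored tests reported in a test log."""
    failed = sum(int(n) for n in FAILED_RE.findall(log_text))
    errors = sum(int(n) for n in ERRORS_RE.findall(log_text))
    return {"failed_comparisons": failed, "test_errors": errors}


if __name__ == "__main__":
    sample = (
        "28 file comparisons failed\n"
        "21 file comparisons failed\n"
        "14 tests experienced errors\n"
    )
    print(summarize(sample))
```

To use it on a real run, read the contents of Amber12_PTest.o<ID> into a string and pass it to summarize.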

Saturday, February 22, 2014

Amber12 with AmberTools13 on CentOS 6

This is a very quick note on installing Amber on your own small cluster or single server. If you need more information, please leave a comment.

The destination is /opt/local/amber12 and the shell is /bin/bash.

# yum install libXdmcp-devel python-devel

# mkdir -p /opt/local
# tar jxf AmberTools13.tar.bz2 -C /opt/local
# tar jxf Amber12.tar.bz2 -C /opt/local

# export AMBERHOME=/opt/local/amber12

# cd $AMBERHOME
# ./update_amber --update

Run the update once more, because the first run updates update.py itself.

# ./update_amber --update

Serial version

Configure, compile, install, and test.

# ./configure gnu
# make install
# make test.serial

Five file comparisons failed. However, they seem to come from bugs fixed by update.21 of AmberTools13 (published on Nov. 4, 2013), for which the saved test output files (generated on Mar. 19, 2013) were not updated.

Parallel version

Install and activate the required and optional packages.

# yum install openmpi-devel blacs-openmpi-devel blacs-openmpi-static ga-openmpi-devel ga-openmpi-static hdf5-openmpi-devel hdf5-openmpi-static scalapack-openmpi-devel scalapack-openmpi-static mpi4py-openmpi
# module load openmpi-x86_64

Configure and install.

# cd $AMBERHOME
# make clean
# ./configure -mpi gnu
# make install

Test.

# export DO_PARALLEL="mpirun -np 16"
# make test.parallel

Friday, February 21, 2014

Python 2.7.x on Cray XC30

On Cray XC30, the default Python is version 2.6. The latest release in the 2.x series is 2.7, which has many improvements and includes many features that ease the move to the 3.x series. I will show you how to set up your own Python 2.7 in your $WORK directory.

Required Packages

Compilation and Installation

Firstly, load the GNU environment.

% module unload PrgEnv-cray
% module load PrgEnv-gnu

Next, download and extract the latest version of Python 2.7.x.

% cd $WORK/setup
% wget http://www.python.org/ftp/python/2.7.6/Python-2.7.6.tgz
% tar zxf Python-2.7.6.tgz
% cd Python-2.7.6

It is time for compilation and installation.

% ./configure --prefix=$WORK \
              --enable-ipv6 --enable-unicode=ucs4 \
              --enable-shared --enable-profiling \
              CC=gcc CPP=cpp CFLAGS=-I/usr/include/ncurses LDFLAGS=-L.
% make |& tee make.log

You may see something like,

Python build finished, but the necessary bits to build these modules were not found:
bsddb185           dl                 imageop
sunaudiodev

... but do not worry. It is expected and those modules are deprecated.

It is time to do some tests.

% make test

If there are no unexpected failures, go ahead with the installation.

% make install

Now you can use your own build of Python, located at $WORK/bin/python, by default, because $WORK/bin is already in your path. If it is not picked up, please log out and log in again.

For personal reasons, I need a Python build that does not depend on libpython.so, so I build once again without the --enable-shared option.

% ./configure --prefix=$WORK \
              --enable-ipv6 --enable-unicode=ucs4 \
              --enable-profiling \
              CC=gcc CPP=cpp CFLAGS=-I/usr/include/ncurses
% make |& tee make.log
% make install
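
After either build, it is worth confirming that the interpreter you are running is the one you just installed and that the configure options took effect. The following is a minimal sketch using only the standard library. The function name build_summary is my own, and the keys it reports are just the ones relevant to the options used above.

```python
import sys
import sysconfig


def build_summary():
    """Collect a few facts about how the running interpreter was built."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": "{0}.{1}.{2}".format(*sys.version_info[:3]),
        # True for --enable-unicode=ucs4 builds of Python 2
        # (always True on Python 3.3 and later).
        "wide_unicode": sys.maxunicode > 0xFFFF,
        # 1 when configured with --enable-shared, 0 otherwise
        # (may be None on platforms that do not define it).
        "enable_shared": sysconfig.get_config_var("Py_ENABLE_SHARED"),
    }


if __name__ == "__main__":
    for key, value in sorted(build_summary().items()):
        print("{0}: {1}".format(key, value))
```

Run it with $WORK/bin/python. If the executable or prefix does not point into $WORK, the shell is still picking up the system Python.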

Tools for Making Package Installation Easier

To make installing and upgrading packages easier, we will use setuptools (easy_install) and pip.

% cd $WORK/setup
% wget --no-check-certificate https://pypi.python.org/packages/source/s/setuptools/setuptools-2.2.tar.gz
% tar zxf setuptools-2.2.tar.gz
% cd setuptools-2.2/
% $WORK/bin/python setup.py install
% $WORK/bin/easy_install pip

If you use tcsh as your default shell, please log out and log in again to activate the newly installed tools.

Testing Tools

We will install Nose for testing some packages such as NumPy and SciPy.

% easy_install nose

Interactive Shell

% $WORK/bin/easy_install ipython

BerkeleyDB 4.8.x on Cray XC30

Our applications use Berkeley DB for storing data, and we do not have time to migrate them to a more modern key-value database.

Since we will build the Java API, we should load the java module. In addition, we will use the latest version of the GNU compilers, which is 4.8.1.

% module unload PrgEnv-cray
% module load PrgEnv-gnu
% module load java

Now, it is time to begin.

% cd $WORK/setup
% wget http://download.oracle.com/berkeley-db/db-4.8.30.tar.gz
% tar zxf db-4.8.30.tar.gz
% cd db-4.8.30/build_unix/
% ../dist/configure --prefix=$WORK --enable-cxx --enable-java --enable-o_direct --enable-stl CC=gcc CXX=g++ CPP=cpp
% make
% make install

Setting up the Working Environment

First Knowledge

  • The list of high-performance computing systems at JAIST can be found here.
  • Almost all settings for your account can be changed by logging into the server sparc1 from the internal network.
  • The default shell for users is tcsh.
  • The password used for logging into these servers is the same as your email password. Be careful!

Shell Interpreter

If you are not familiar with tcsh, you can easily change your shell to bash. Please note that I use tcsh in all of my posts.
  1. Log in to sparc1.
  2. Check out your current information.
    % finger $USER
  3. Change the default shell.
    % passwd -r ldap -e
    Enter /bin/bash as the new shell.
  4. Confirm the change.
    % finger $USER
It may take a while for the change to take effect.

Make Your Own Work Space

Almost all systems have a very large directory, /work, for storing temporary data used during computation. This directory is local to each cluster and shared among the nodes inside the cluster. You should make your own directory, such as /work/$USER, and copy your applications and required data there for computation. You should remove them after the computation to free the space for other users.

% mkdir /work/$USER

For convenience, I usually set an environment variable WORK that points to /work/$USER whenever possible.

% echo 'test -d /work/$USER && setenv WORK /work/$USER && cd $WORK' >> /home/$USER/.cshrc

Log out and log in again to check the effect.

If you want more elaborate settings, you can check out my .cshrc.

For the purpose of custom installation, we should create a directory for storing source packages.

% mkdir /work/$USER/setup

Very First Notes

Hi there,

Welcome to my notes on using high-performance computing (HPC) systems at JAIST.

I have noticed that many of you do not know how to use the HPC systems at JAIST, or use them inefficiently, due to the lack of guidance. I have used them for a long time, and I feel it is my responsibility to share my experience with you. I will write all posts in English because almost all of the resources are still in Japanese.

If you find this blog useful, please share it with your friends or join me in writing other useful notes.