Sunday, March 2, 2014

scikit-learn

scikit-learn - Machine Learning in Python

  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license

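Prerequisites

scikit-learn needs NumPy and SciPy to build and run, and matplotlib for plotting. How I built each of them is described in the sections below.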
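scikit-learn is built on NumPy, SciPy, and matplotlib, so it helps to check which of them your Python can already import before you start. The script below is a minimal sketch. The helper name installed_versions is hypothetical, and the script assumes each package exposes a __version__ attribute. If a package does not, it reports "unknown".

```python
import importlib

# Packages scikit-learn builds on, per the feature list above.
REQUIRED = ["numpy", "scipy", "matplotlib"]


def installed_versions(names=REQUIRED):
    """Hypothetical helper: map each package name to its version, or None if not importable."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Most scientific Python packages define __version__.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(installed_versions().items()):
        print("%-12s %s" % (name, version or "NOT INSTALLED"))
```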

Installation

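For Cray XC30 only: switch from the Cray to the GNU programming environment.

% module unload PrgEnv-cray
% module load PrgEnv-gnu

Then download, build, install, and test scikit-learn: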

% cd $WORK/setup
% wget --no-check-certificate https://pypi.python.org/packages/source/s/scikit-learn/scikit-learn-0.14.1.tar.gz
% tar zxf scikit-learn-0.14.1.tar.gz
% cd scikit-learn-0.14.1
% python setup.py build
% python setup.py install
% cd ..
% nosetests --exe sklearn

I got one error, in sklearn.cluster.bicluster.tests.test_utils.test_get_submatrix. Don't worry! Just go ahead.
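The nose test suite is not the only check available. A quick end-to-end test is to fit a small model. The sketch below uses the bundled iris dataset and a k-nearest-neighbours classifier. I chose these only for illustration, and any estimator would do. smoke_test is a hypothetical helper name. On iris, the held-out accuracy should be well above chance, which is one third.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier


def smoke_test(seed=0):
    """Hypothetical helper: fit k-NN on a random half of iris, return test accuracy."""
    iris = load_iris()
    X, y = iris.data, iris.target

    # Shuffle indices reproducibly and split 50/50 into train and test sets.
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    train, test = idx[:half], idx[half:]

    clf = KNeighborsClassifier(n_neighbors=5)
    clf.fit(X[train], y[train])
    return clf.score(X[test], y[test])


if __name__ == "__main__":
    print("test accuracy: %.3f" % smoke_test())
```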

matplotlib

"matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in python scripts, the python and ipython shell (ala MATLAB® or Mathematica®), web application servers, and six graphical user interface toolkits."

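For Cray XC30 only: switch from the Cray to the GNU programming environment.

% module unload PrgEnv-cray
% module load PrgEnv-gnu

Then download, build, install, and test matplotlib: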

% cd $WORK/setup
% wget https://downloads.sourceforge.net/project/matplotlib/matplotlib/matplotlib-1.3.1/matplotlib-1.3.1.tar.gz
% tar zxf matplotlib-1.3.1.tar.gz
% cd matplotlib-1.3.1
% python setup.py build
% python setup.py install
% cd ..
% python -c "import matplotlib; matplotlib.test()"
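Cray compute nodes usually have no X display, so interactive backends cannot open windows there. If that is your situation, a sketch like the following renders a figure straight to a PNG file with the non-interactive Agg backend. render_test_png and the output path are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # file-only backend: no X display needed
import matplotlib.pyplot as plt


def render_test_png(path):
    """Hypothetical helper: draw a simple line plot and save it as a PNG at `path`."""
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3], [0, 1, 4, 9], marker="o")
    ax.set_xlabel("x")
    ax.set_ylabel("x squared")
    fig.savefig(path)
    plt.close(fig)  # release the figure's memory
    return path


if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "mpl_check.png")
    render_test_png(out)
    print("wrote %s (%d bytes)" % (out, os.path.getsize(out)))
```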

NumPy and SciPy with ATLAS

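We will compile NumPy and SciPy with the GNU compilers and link them against the BLAS and LAPACK libraries from ATLAS (Automatically Tuned Linear Algebra Software), which tunes itself to the machine at build time. If you are on a Cray, first run the following commands.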

% module unload PrgEnv-cray
% module load PrgEnv-gnu

ATLAS

% cd $WORK/setup
% wget http://sourceforge.net/projects/math-atlas/files/Stable/3.10.1/atlas3.10.1.tar.bz2
% tar jxf atlas3.10.1.tar.bz2
% cd ATLAS
% wget http://www.netlib.org/lapack/lapack-3.4.2.tgz
% mkdir build-gcc
% cd build-gcc
% ../configure --prefix=$WORK -Fa alg -fPIC --with-netlib-lapack-tarfile=../lapack-3.4.2.tgz --cc=gcc
% make
% make check
% make install

During make, ATLAS tunes itself by timing many kernel variants on your machine, so the build can take up to an hour to complete.
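Before moving on to NumPy, you may want to confirm that make install put the libraries where NumPy will look for them. This sketch assumes the default static build (no --shared) and assumes the four archives listed are the ones installed under $WORK/lib. Adjust the list if your ATLAS version names them differently. missing_atlas_libs is a hypothetical helper.

```python
import os

# Assumption: static archives a default ATLAS 3.10 "make install" places in <prefix>/lib.
ATLAS_LIBS = ["libatlas.a", "libcblas.a", "libf77blas.a", "liblapack.a"]


def missing_atlas_libs(prefix):
    """Hypothetical helper: return the ATLAS library files not found under <prefix>/lib."""
    libdir = os.path.join(prefix, "lib")
    return [name for name in ATLAS_LIBS
            if not os.path.isfile(os.path.join(libdir, name))]


if __name__ == "__main__":
    prefix = os.environ["WORK"]  # the same $WORK passed to configure as --prefix
    missing = missing_atlas_libs(prefix)
    if missing:
        print("missing in %s/lib: %s" % (prefix, ", ".join(missing)))
    else:
        print("ATLAS libraries found in %s/lib" % prefix)
```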

NumPy

% cd $WORK/setup
% wget http://sourceforge.net/projects/numpy/files/NumPy/1.8.0/numpy-1.8.0.tar.gz
% tar zxf numpy-1.8.0.tar.gz
% cd numpy-1.8.0/
% env ATLAS=$WORK $WORK/bin/python setup.py build --fcompiler=gnu95 |& tee build.log
% env ATLAS=$WORK $WORK/bin/python setup.py install
% cd ..
% python -c "import numpy; numpy.test()"
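numpy.test() checks correctness, not speed. To see whether the build actually picked up ATLAS, numpy.show_config() prints the BLAS/LAPACK libraries NumPy was compiled against. Timing a large matrix product then gives a rough throughput figure. The sketch below does both. matmul_gflops is a hypothetical helper, and 2 n^3 is the usual operation count for a dense n x n product. Compare the number against a build that uses the reference BLAS, rather than treating any particular value as a target.

```python
import time

import numpy as np


def matmul_gflops(n=1000, repeats=3, seed=0):
    """Hypothetical helper: best observed rate (GFLOP/s) for an n x n matrix product.

    A dense n x n product costs about 2 * n**3 floating-point operations.
    """
    rng = np.random.RandomState(seed)
    a = rng.rand(n, n)
    b = rng.rand(n, n)
    best = float("inf")
    for _ in range(repeats):
        start = time.time()
        np.dot(a, b)
        best = min(best, time.time() - start)
    best = max(best, 1e-9)  # guard against a zero reading from a coarse clock
    return 2.0 * n ** 3 / best / 1e9


if __name__ == "__main__":
    np.show_config()  # lists the BLAS/LAPACK libraries NumPy was built against
    print("%.1f GFLOP/s for a 1000 x 1000 matmul" % matmul_gflops())
```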

SciPy

% cd $WORK/setup
% wget http://sourceforge.net/projects/scipy/files/scipy/0.13.3/scipy-0.13.3.tar.gz
% tar zxf scipy-0.13.3.tar.gz
% cd scipy-0.13.3/
% $WORK/bin/python setup.py build --fcompiler=gnu95 |& tee build.log
% $WORK/bin/python setup.py install
% cd ..
% python -c "import scipy; scipy.test()"
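SciPy's dense linear algebra routines call LAPACK, so solving a random linear system exercises the ATLAS-backed LAPACK directly. Here is a minimal sketch with a hypothetical helper, solve_residual. The matrix is made diagonally dominant so that the system is well conditioned, and the residual should therefore be tiny.

```python
import numpy as np
import scipy.linalg


def solve_residual(n=500, seed=0):
    """Hypothetical helper: solve a random A x = b via LAPACK, return max |A x - b|."""
    rng = np.random.RandomState(seed)
    # Off-diagonal row sums stay below n, so adding n * I makes A strictly
    # diagonally dominant, hence nonsingular and well conditioned.
    a = rng.rand(n, n) + n * np.eye(n)
    b = rng.rand(n)
    x = scipy.linalg.solve(a, b)
    return np.abs(np.dot(a, x) - b).max()


if __name__ == "__main__":
    print("max residual: %.2e" % solve_residual())
```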