A Technical Introduction to Machine Learning

Friday, Oct. 7, 2016

3 to 5pm

SEB 3265, UNLV

Abstract

In recent years, intelligent algorithms that learn from data have had an enormous impact on empirical research in areas as diverse as medicine, physics, and finance. Furthermore, these algorithms have been implemented in common programming languages, making them widely accessible to both researchers and developers.

In this workshop, we'll discuss the concepts and mathematics that underlie these machine learning techniques, as well as the Python libraries that enable us to efficiently apply them in practice. Requiring only a basic familiarity with calculus and programming, the workshop will introduce common challenges in applying and evaluating machine learning methods with the goal of extracting insights from large, complex datasets.

Schedule

Introduction

Sample complexity and computational limits
Scalability, robustness, and interpretability
A roadmap for applying machine learning

Building flexible models and managing their complexity

Classification: logistic regression, support vector machines, and decision trees
Overfitting and generalizable models
Ensembles: random forests and gradient-boosted machines

Performance metrics and model evaluation

Precision, recall, and the ROC curve
Cross-validation and generalization error
Pitfalls: avoiding snooping and sampling bias

Features and representation

Visually exploring data with an eye toward patterns and anomalies
Constructing useful features and estimating their relevance
How features and dimensionality impact performance
Introduction to deep learning: frameworks and applications

Applications to real-world data

Predicting customer satisfaction: Santander Bank
Mining text data: the 2016 U.S. Presidential Debate
Detecting cancer cells

Additional topics (as time permits)

Preparing data: standardization, normalization, and imputation
An introduction to optimization: gradient methods and convergence analysis

Prerequisites

Although not required, we strongly recommend reviewing the following topics before attending so that you can benefit fully from the workshop:

Mathematics: vector/matrix calculations, probability distributions, sums/integrals, gradients
Programming: basic Linux commands, basic Python commands involving lists and functions

Notes

If you wish to follow along with our programming exercises during the workshop, please bring your own laptop running Ubuntu 14.04 or later with the following packages installed:


    sudo apt-get install build-essential python-dev python-numpy \
    python-numpy-dev python-scipy libatlas-dev g++ python-matplotlib \
    ipython ipython-notebook

    pip install --upgrade pip
    pip install jupyter
    pip install -U scikit-learn

The pip commands above require pip itself to be installed first; on Ubuntu it can typically be installed via the python-pip package.

Please install xgboost for Python if you wish to run the examples for gradient boosting, as well as seaborn for visualization.

If you use a different distribution, we cannot offer support for any issues you encounter while running our code.
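To confirm before the workshop that the packages above are importable, a short check along the following lines may help. This is only a sketch and not part of the workshop materials: the helper names `check_modules` and `report` are hypothetical, and the grouping into required and optional modules is an assumption based on the notes above (scikit-learn is imported as `sklearn`).

```python
import importlib

# Assumed grouping, based on the setup notes above.
REQUIRED = ["numpy", "scipy", "matplotlib", "sklearn", "IPython"]
OPTIONAL = ["xgboost", "seaborn"]


def check_modules(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A missing or broken install can raise ImportError or other errors.
            status[name] = None
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            status[name] = getattr(module, "__version__", "unknown")
    return status


def report(names, label):
    """Print one line per module, showing its version or that it is missing."""
    status = check_modules(names)
    for name in names:
        version = status[name]
        state = "missing" if version is None else "version {}".format(version)
        print("[{}] {}: {}".format(label, name, state))


report(REQUIRED, "required")
report(OPTIONAL, "optional")
```

Any module reported as missing can be installed with the commands above before attending.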

Find the repo here.