Justin Le
Email: justinle [at] ucsb.edu
Office: Harold Frank Hall (HFH), 3112
Links: Bio, Publications, Software, Teaching


A Technical Introduction to Machine Learning

Speaker: Justin Le.

Host: Prof. Yingtao Jiang.


Friday, Oct. 7, 2016

3 to 5pm

SEB 3265, UNLV


Abstract

In recent years, algorithms that learn from data have had an enormous impact on empirical research in areas as diverse as medicine, physics, and finance. These algorithms have also been implemented in common programming languages, making them widely accessible to both researchers and developers.

In this workshop, we'll discuss the concepts and mathematics that underlie these machine learning techniques, as well as the Python libraries that enable us to efficiently apply them in practice. Requiring only a basic familiarity with calculus and programming, the workshop will introduce common challenges in applying and evaluating machine learning methods with the goal of extracting insights from large, complex datasets.


Schedule

Introduction

  • Sample complexity and computational limits
  • Scalability, robustness, and interpretability
  • A roadmap for applying machine learning

Building flexible models and managing their complexity

  • Classification: logistic regression, support vector machines, and decision trees
  • Overfitting and generalizable models
  • Ensembles: random forests and gradient-boosted machines

Performance metrics and model evaluation

  • Precision, recall, and the ROC curve
  • Cross-validation and generalization error
  • Pitfalls: avoiding snooping and sampling bias

Features and representation

  • Visually exploring data with an eye toward patterns and anomalies
  • Constructing useful features and estimating their relevance
  • How features and dimensionality impact performance
  • Introduction to deep learning: frameworks and applications

Applications to real-world data

  • Predicting customer satisfaction: Santander Bank
  • Mining text data: the 2016 U.S. Presidential Debate
  • Detecting cancer cells

Additional topics (as time permits)

  • Preparing data: standardizing, normalizing, and imputation
  • An introduction to optimization: gradient methods and convergence analysis

Prerequisites

Although not necessary, it is strongly recommended that you review the following topics before attending in order to fully benefit from the workshop:

  • Mathematics: vector/matrix calculations, probability distributions, sums/integrals, gradients
  • Programming: basic Linux commands, basic Python commands involving lists and functions

Notes

If you wish to follow along with our programming exercises during the workshop, please bring your own laptop running Ubuntu 14.04 or later, with the following packages installed:


    sudo apt-get install build-essential python-dev python-numpy \
      python-numpy-dev python-scipy libatlas-dev g++ python-matplotlib \
      ipython ipython-notebook
    
    pip install --upgrade pip
    pip install jupyter
    pip install -U scikit-learn
    

The pip commands above assume that pip is already installed.

Please install xgboost for Python if you wish to run the examples for gradient boosting, as well as seaborn for visualization.
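To confirm that the setup worked before the workshop, a short check along the following lines may help. This is only a sketch and not part of the workshop materials: the function name `check_packages` and the split into required and optional lists are assumptions made for illustration. It tries to import each package named in these notes and reports its version, or marks it as missing.

```python
import importlib

# Import names of the packages mentioned in the notes above.
# Treating xgboost and seaborn as optional is an assumption of this sketch.
REQUIRED = ["numpy", "scipy", "matplotlib", "sklearn"]
OPTIONAL = ["xgboost", "seaborn"]


def check_packages(names):
    """Return {name: version string, or None if the import fails}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Any failure to import (missing package, broken native
            # library, ...) is reported as "not available".
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for label, names in [("required", REQUIRED), ("optional", OPTIONAL)]:
        for name, version in sorted(check_packages(names).items()):
            status = version if version is not None else "MISSING"
            print("%-10s %-12s %s" % (label, name, status))
```

Saving this as a file (for example `check_env.py`, a hypothetical name) and running it with the same Python interpreter you plan to use for the exercises should list each package with either a version or `MISSING`.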

If you use a different distribution, we cannot offer support for any issues you encounter while running our code.

Find the repo here.