P. Anandan (Microsoft Research, USA)

Overview of Machine Learning and Intelligence at Microsoft Research

Christopher Bishop (Microsoft Research, UK)

Model-Based Machine Learning

Today machine learning is centre stage in the world of technology, and thousands of scientists and engineers are applying machine learning to an extraordinarily broad range of domains. However, making effective use of machine learning in practice can be daunting, especially for newcomers to the field. Over the last five decades, researchers have created literally thousands of machine learning algorithms. Traditionally an engineer wanting to solve a problem using machine learning must choose one or more of these algorithms to try, often constrained to those algorithms they happen to be familiar with, or by the availability of software implementations. In this talk we view machine learning from a fresh perspective which we call ‘model-based machine learning’, in which a bespoke solution is formulated for each new application. We show how probabilistic graphical models, coupled with efficient inference algorithms, provide a flexible foundation for model-based machine learning, and we describe several large-scale commercial applications of this framework. We also introduce the concept of ‘probabilistic programming’ as a powerful approach to model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications.
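To make the model-based idea concrete, here is a toy sketch (not from the talk, and deliberately not Infer.NET): the model is stated first — a uniform prior over a coin's bias and Bernoulli flips — and a generic inference routine (grid approximation of the posterior) is then applied to it. All names and numbers are illustrative.

```python
def posterior_grid(heads, tails, grid_size=101):
    # Model: theta ~ Uniform(0, 1); each flip ~ Bernoulli(theta).
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalised posterior = likelihood * flat prior, on a grid.
    weights = [t ** heads * (1 - t) ** tails for t in thetas]
    z = sum(weights)
    return thetas, [w / z for w in weights]

# Observe 7 heads and 3 tails, then query the posterior mean of the bias.
thetas, post = posterior_grid(heads=7, tails=3)
mean = sum(t * p for t, p in zip(thetas, post))
```

The point of the pattern is that the model (two lines of assumptions) and the inference engine (the grid sum) are separate, so changing the application means changing the model, not hand-picking a new algorithm.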

Fabrizio Gagliardi (ACM Europe)

ACM Europe mission and activities

ACM is the oldest and largest computing professional society in the world, having been established in 1947. It represents the interests of, and serves, a community of more than 100,000 computing professionals from all over the world. To increase its presence in strategic regions, ACM established regional councils in India, China, and Europe in 2009. ACM Europe has been growing since then and now counts more than 16,000 members. Since 2013 it has been a legal not-for-profit entity registered in Brussels, Belgium. The talk will review the progress so far and explain current activities and plans, with a particular focus on Russia.

Isabelle Guyon (ChaLearn, USA)

Advances in Network Reconstruction

Networks of influence are found at all levels of physical, biological, and societal systems: climate networks, gene networks, neural networks, and social networks are a few examples. These networks are not just descriptive of the “State of Nature”; they allow us to make predictions such as forecasting disruptive weather patterns, evaluating the possible effect of a drug, locating the focus of a neural seizure, and predicting the propagation of epidemics. This, in turn, allows us to devise adequate interventions or changes in policy to obtain desired outcomes: evacuate people before a region is hit by a hurricane, administer treatment, vaccinate, etc. But knowing the network structure is a prerequisite, and this structure may be very hard and costly to obtain with traditional means. For example, the medical community relies on clinical trials, which cost millions of dollars, and the neuroscience community engages in connection tracing with electron microscopy, which takes years to establish the connectivity of just 100 neurons (the brain contains billions). These classes will review recent progress in network reconstruction methods based solely on observational data. Great advances have recently been made using machine learning. We will analyze the results of several challenges we organized, which point us to new, simple, and practical methodologies to uncover potential cause-effect relationships and to prioritize experiments.
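The simplest observational baseline the challenge methods improve upon is association thresholding: score every pair of variables by correlation and keep the strong pairs as candidate edges. The sketch below (variables `a`, `b`, `c` and the threshold are hypothetical) recovers the one true dependency in simulated data; note that correlation alone establishes association, not causal direction — that gap is exactly what the cause-effect methodologies in the classes address.

```python
import random

def pearson(xs, ys):
    # Sample Pearson correlation between two equal-length series.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(0)
n = 2000
a = [random.gauss(0, 1) for _ in range(n)]      # a "driver" variable
b = [x + 0.3 * random.gauss(0, 1) for x in a]   # b is influenced by a
c = [random.gauss(0, 1) for _ in range(n)]      # c is unrelated to both

# Keep an (undirected) edge wherever the observed dependence is strong.
threshold = 0.5
series = {"a": a, "b": b, "c": c}
edges = {(u, v) for u in series for v in series
         if u < v and abs(pearson(series[u], series[v])) > threshold}
```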

Mona Soliman Habib (Microsoft Corporation, USA)

Big Data and Data Science Challenges

We live in a sea of data! Large, complex, structured, unstructured, and heterogeneous data is being generated and stored in various shapes, forms, and locations. The volume, variety, and velocity with which the data expands give rise to numerous challenges in how the data is captured, stored, visualized, analyzed, and processed to extract valuable, actionable insights that influence the decision-making process. In this talk, we will explore the opportunities and challenges of big data and data science, their impact across many sectors, and how they may shape new, interdisciplinary research agendas to address them.

Tutorial #1 – Azure Machine Learning

Microsoft Azure Machine Learning is a service on Windows Azure which a developer, data scientist, or BI analyst can use to easily build a predictive model using machine learning over data, and then deploy and manage that model as a cloud service. In this hands-on tutorial, you will practice using Azure Machine Learning in an end-to-end workflow for constructing and deploying a predictive model: from data ingestion, data exploration, feature selection and creation, and building and evaluating machine learning models, to final model selection and deployment.

Tutorial #1 Pre-requisites

  1. Any web browser + a Microsoft Live ID account.
  2. Sign in to Azure Machine Learning using a Microsoft Live ID to get a free workspace.

Tutorial #2 – Advanced Analytics Process

In this hands-on tutorial, you will practice the Advanced Analytics Process using Azure technologies to go from GBs of data to operationalized machine learning models in a couple of hours, using only a browser and elastic resources in the cloud. You will work with a public dataset – NYC Taxi Trips – to train and deploy one or more models to predict tips paid for a taxi trip. You will conduct exploratory data analysis, create features, sample the data using an SQL Server and IPython Notebook Server virtual machine, and use the data sample in Azure Machine Learning to build and operationalize the model as a web service, ready for consumption in any application via a web-service API. More details are available in the Advanced Analytics walkthrough.

Tutorial #2 Pre-requisites

  1. Any web browser + a Microsoft Live ID account.
  2. Sign in to Azure Machine Learning using a Microsoft Live ID to get a free workspace.
  3. Some familiarity with SQL and Python.
  4. Access to pre-configured and pre-loaded virtual machines will be provided to use in this session.
  5. Optional: Attendees may use an IPython Notebook installed on a personal machine. In this case, download the sample IPython notebooks on GitHub.

Christoph Lampert (Institute of Science and Technology, Austria)

Learning with Structured Inputs and Outputs

Structured prediction methods have revolutionized the way in which researchers in computer vision and other application areas can tackle the task of predicting complex objects with many interconnected parts. The lecture will give an in-depth introduction to the theory and applications of one of the currently most popular frameworks: discrete probabilistic graphical models. Using examples from computer vision, we will discuss prediction algorithms, such as belief propagation and graph cuts, as well as methods for parameter learning based on classic maximum likelihood and on the maximum-margin principle.
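On chain-structured models, belief propagation is exact and reduces to a forward and a backward sweep of messages. The following toy sketch (potentials chosen arbitrarily for illustration) computes exact node marginals for three binary variables on a chain:

```python
def chain_marginals(unary, pairwise):
    # Exact sum-product (belief propagation) on a chain of binary variables.
    # unary[i][x]       : potential of node i taking value x
    # pairwise[i][x][y] : potential of edge (i, i+1) for values (x, y)
    n = len(unary)
    fwd = [[1.0, 1.0] for _ in range(n)]   # messages passed left-to-right
    bwd = [[1.0, 1.0] for _ in range(n)]   # messages passed right-to-left
    for i in range(1, n):
        for x in range(2):
            fwd[i][x] = sum(fwd[i - 1][y] * unary[i - 1][y] * pairwise[i - 1][y][x]
                            for y in range(2))
    for i in range(n - 2, -1, -1):
        for x in range(2):
            bwd[i][x] = sum(bwd[i + 1][y] * unary[i + 1][y] * pairwise[i][x][y]
                            for y in range(2))
    marginals = []
    for i in range(n):
        belief = [fwd[i][x] * unary[i][x] * bwd[i][x] for x in range(2)]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals

# Toy 3-node chain with attractive pairwise potentials.
unary = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
pairwise = [[[2.0, 1.0], [1.0, 2.0]], [[2.0, 1.0], [1.0, 2.0]]]
marginals = chain_marginals(unary, pairwise)
```

The same message-passing scheme runs on trees; on loopy graphs, such as the grid models common in vision, it becomes the approximate "loopy" belief propagation discussed in the lecture.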

Michael Levin (Yandex Data Factory, Russia)

MatrixNet and Applications

MatrixNet is a machine learning tool developed and widely used at Yandex. The talk will cover its key advantages, some of the tricks used in its implementation, and examples of applications in different services at Yandex. We will discuss how models are deployed into production and how they are tested. We will finish with some limitations of MatrixNet and show how to overcome them.

Maxim Milakov (NVIDIA, Russia)

Deep Learning with GPUs

High-performance GPUs are the foundational technology powering deep learning research and product development at universities and in industry. This talk will begin with a brief history of Deep Learning, explaining why NVIDIA GPUs together with large datasets are the key enablers for recent advances in the field. We will then look more deeply at how NVIDIA is investing in software and hardware advances to benefit Deep Learning, including cuDNN and other libraries, the DIGITS platform and technologies such as NVLink and FP16. Finally we will see how some of the major frameworks enable researchers to train DL models using GPUs to accelerate time to discovery.

Kate Saenko (University of Massachusetts, Lowell, USA)

Modeling Images, Videos and Text Using the Caffe Deep Learning Library

In the first part of the course, I will describe several recent advances in automatic generation of natural language descriptions for images and video. Image and video description has important applications in human-robot interaction, indexing and retrieval, and audio descriptive language generation for the blind. I will start with a deep learning model that combines a convolutional network structure with a recurrent structure to generate sentences from images or fixed-length videos. I will then describe a sequence-to-sequence neural network that learns to generate captions for brief videos of variable length. The model is trained on video-sentence pairs and is naturally able to learn the temporal structure of the sequence of frames as well as the sequence model of the generated sentences, i.e., a language model. To further handle the ambiguity over multiple objects and locations, the model incorporates convolutional networks with Multiple Instance Learning (MIL) to consider objects in different positions and at different scales simultaneously. I will show how the multi-scale multi-instance convolutional network is integrated with a sequence-to-sequence recurrent neural network to generate sentence descriptions based on the visual representation. This architecture is the first end-to-end trainable deep neural network that is capable of multi-scale region processing for variable-length video description. I will show results of captioning YouTube videos and Hollywood movies.

In the second part of the course I will talk about how these deep language and vision models can be implemented using the Caffe library. Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Caffe’s expressive architecture encourages application and innovation, as models and optimization are defined by configuration without hard-coding. Caffe allows one to switch between CPU and GPU by setting a single flag to train on a GPU machine, then deploy to commodity clusters or mobile devices. Caffe’s extensible code fosters active development and has seen many contributors provide state-of-the-art models for computer vision. Speed makes Caffe perfect for research experiments and industry deployment (up to 1 ms/image for inference and 4 ms/image for learning). Caffe already powers academic research projects, startup prototypes, and even large-scale industrial applications in vision, speech, and multimedia. This tutorial will equip researchers and developers with the tools and know-how needed to incorporate deep learning into their work. I will show basic Caffe usage and step by step notebook examples including the language and vision models discussed in the first part of the course.
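To give a flavour of the configuration-driven style the abstract describes, here is a minimal, illustrative layer definition in Caffe's prototxt format (the layer name and parameter values are hypothetical, not taken from the course material):

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"      # input blob
  top: "conv1"        # output blob
  convolution_param {
    num_output: 20    # number of learned filters
    kernel_size: 5
    stride: 1
  }
}
```

The whole network is a stack of such declarations, with no training code written by hand; switching the same model between CPU and GPU execution is then a single mode flag (e.g. `caffe.set_mode_gpu()` in the Python interface) rather than a change to the model definition.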

Ruslan Salakhutdinov (University of Toronto, Canada)

Deep Learning Tutorial

Building intelligent systems that are capable of extracting high-level representations from high-dimensional sensory data lies at the core of solving many AI-related tasks, including visual object or pattern recognition, speech perception, and language understanding. Theoretical and biological arguments strongly suggest that building such systems requires deep architectures that involve many layers of nonlinear processing.

Many existing learning algorithms use shallow architectures, including neural networks with only one hidden layer, support vector machines, kernel logistic regression, and many others. The internal representations learned by such systems are necessarily simple and are incapable of extracting some types of complex structure from high-dimensional input.

In the past few years, researchers across many different communities, from applied statistics to engineering, computer science and neuroscience, have proposed several deep (hierarchical) models that are capable of extracting useful, high level structured representations. An important property of these models is that they can extract complex statistical dependencies from high-dimensional sensory input and efficiently learn high-level representations by re-using and combining intermediate concepts, allowing these models to generalize well across a wide variety of tasks. The learned high-level representations have been shown to give state-of-the-art results in many challenging learning problems, where data patterns often exhibit a high degree of variations, and have been successfully applied in a wide variety of application domains, including visual object recognition, information retrieval, natural language processing, and speech perception. A few notable examples of such models include Deep Belief Networks, Deep Boltzmann Machines, Deep Autoencoders, and sparse coding-based methods.

Olga Senyukova (Lomonosov Moscow State University, Russia)

Machine Learning Applications in Medicine

How can beautiful algorithmic findings be helpful in our everyday life? One of the answers to this question lies in the area of healthcare applications. Nowadays machine learning methods are becoming more and more useful in medicine. They are able not only to assist medical specialists in processing large amounts of data, but also to help in diagnostics and patient follow-up.

This course is devoted to the discussion of some interesting applications of machine learning methods to automatically analyse medical images and physiologic signals. Medical images acquired by means of special equipment represent internal structures of the human body and/or processes in it. The most modern technologies for acquisition of such images are magnetic resonance imaging (MRI) and computed tomography (CT). Physiologic signals usually refer to cardiologic time series such as electrocardiograms (ECG), but can also represent other physiological data, for example, stride intervals of human gait.

Several important problems will be highlighted along with successful solutions involving machine learning methods, including examples both from worldwide practice and from the author’s own research. A description of the basic principles of the algorithms used will provide a good opportunity to strengthen the knowledge acquired from the other courses of the school.

Dmitry Vetrov (Higher School of Economics, Russia)

Bayesian inference and latent variable models in machine learning

One possible approach to machine learning problems is to build a joint probability distribution over the whole set of variables. By doing this, it becomes possible to make predictions about hidden variables by conditioning the joint distribution on the observed variables and marginalizing it over unknown variables whose particular values are not of interest to us. But this framework is much more powerful and provides us with greater opportunities. In the talk I will show how to apply it in cases when even the training set contains partly-labeled or unlabeled data. We will review the EM algorithm and its extensions, which allow a computer to recover surprising dependencies from incomplete data.
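The EM idea can be shown in a few lines on the classic latent-variable example: fitting a two-component Gaussian mixture to unlabeled one-dimensional data, alternating between inferring soft component assignments (E-step) and re-estimating parameters from them (M-step). This is a toy sketch, not code from the talk; the data and initialisation are illustrative.

```python
import math
import random

def em_gmm(xs, iters=40):
    # Minimal EM for a two-component 1-D Gaussian mixture.
    mu = [min(xs), max(xs)]          # crude but effective initialisation
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            w = [pi[k] / math.sqrt(2 * math.pi * var[k]) *
                 math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: re-estimate parameters from responsibility-weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            pi[k] = nk / len(xs)
    return mu, var, pi

# Unlabeled data drawn from two well-separated Gaussians.
random.seed(1)
xs = ([random.gauss(-2, 0.5) for _ in range(300)] +
      [random.gauss(3, 0.5) for _ in range(300)])
mu, var, pi = em_gmm(xs)
```

Although no point carries a label, the alternation recovers both component means — the same mechanism, generalised, underlies learning from partly-labeled data.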

Evelyne Viegas (Microsoft Research, USA)

Artificial Intelligence Academic Programs at Microsoft Research

We are in a new era of Artificial Intelligence. With advances in machine learning and computational speed, AI is at our doorstep. Intelligent agents like language translators and autonomous drones are just the beginning. I will present some of the research in artificial intelligence, applications (e.g., Hyperlapse), datasets (e.g., MS COCO), platforms (e.g. CodaLab) and services (e.g. Oxford) that you can engage with to explore how AI might change our lives for the better.