
Web Intelligence and Big Data

This course is about building 'web-intelligence' applications that exploit big data sources arising from social media, mobile devices and sensors, using new big-data platforms based on the 'map-reduce' parallel programming paradigm. In the past, this course has been offered at the Indian Institute of Technology Delhi as well as the Indraprastha Institute of Information Technology Delhi.



About the Course

The past decade has witnessed the successful application of many AI techniques at 'web-scale', on what are popularly referred to as big data platforms, based on the map-reduce parallel computing paradigm and associated technologies such as distributed file systems, NoSQL databases and stream computing engines. Online advertising, machine translation, natural language understanding, sentiment mining, personalized medicine and national security are some examples of such AI-based web-intelligence applications that are already in the public eye. Others, though less apparent, affect the operations of large enterprises, from sales and marketing to manufacturing and supply chains. In this course we explore some of these applications and the AI/statistical techniques that make them possible, along with their parallel implementations using map-reduce and related platforms.
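To give a flavour of the map-reduce paradigm, here is a minimal single-machine sketch of the classic word-count example in Python, the language the course relies on. It only simulates the map, shuffle and reduce phases in memory; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce_word_count`) are hypothetical, and this is neither the course's own code nor the API of any real platform such as Hadoop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Mapper (hypothetical): emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group intermediate values by key, as a map-reduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer (hypothetical): sum all partial counts emitted for one word."""
    return (word, sum(counts))


def map_reduce_word_count(documents):
    """Run map, shuffle and reduce sequentially over a {doc_id: text} dictionary."""
    intermediate = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in documents.items())
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


docs = {"d1": "big data big ideas", "d2": "data streams"}
print(sorted(map_reduce_word_count(docs).items()))
```

On a real platform, many mappers and reducers would run in parallel on different machines, and the shuffle would move data across the network; keeping the phases as separate functions mirrors that division of labour.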

This course was offered three times, in Fall 2012, Spring 2012 and Fall 2013; in the Fall offerings of both years it was also taken for credit at IIT Delhi and IIIT Delhi. During this period, I also wrote a book to explain the ideas discussed in the course at a 'popular' level:

The Intelligent Web: Search, Smart Algorithms and Big Data published by Oxford University Press, UK, in November 2013.

In the current edition, the course is being offered in 'self-study' mode.

Course Syllabus

Introduction and Overview 
Look: Search, Indexing and Memory
Listen: Streams, Information and Language, Analyzing Sentiment and Intent
Load: Databases and their Evolution, Big Data Technology and Trends
Programming: Map-Reduce
Learn: Classification, Clustering, and Mining, Information Extraction
Connect: Reasoning: Logic and its Limits, Dealing with Uncertainty
Programming: Bayesian Inference for Medical Diagnostics
Predict: Forecasting, Neural Models, Deep Learning, and Research Topics
Data Analysis: Regression and Feature Selection
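As an illustration of the kind of reasoning behind the Bayesian inference topic, the sketch below computes a posterior over possible diagnoses from observed findings using Bayes' rule with a naive (conditional-independence) assumption. The function `posterior` and all probabilities are hypothetical and chosen only for illustration; this is not the course's programming assignment.

```python
def posterior(priors, likelihoods, findings):
    """Naive Bayes posterior P(diagnosis | findings) (hypothetical helper).

    priors:      {diagnosis: P(diagnosis)}
    likelihoods: {diagnosis: {finding: P(finding present | diagnosis)}}
    findings:    {finding: True if observed present, False if observed absent}
    Assumes findings are conditionally independent given the diagnosis.
    """
    unnormalized = {}
    for diagnosis, prior in priors.items():
        p = prior
        for finding, present in findings.items():
            p_present = likelihoods[diagnosis][finding]
            # Multiply by P(finding | diagnosis) or its complement if absent.
            p *= p_present if present else (1.0 - p_present)
        unnormalized[diagnosis] = p
    # Normalize so the posteriors sum to 1 (divide by P(findings)).
    total = sum(unnormalized.values())
    return dict((d, p / total) for d, p in unnormalized.items())


# Illustrative numbers only: a rare condition (1% prior) and a test with
# 90% sensitivity and a 5% false-positive rate.
priors = {"disease": 0.01, "no_disease": 0.99}
likelihoods = {
    "disease": {"test_positive": 0.9},
    "no_disease": {"test_positive": 0.05},
}
post = posterior(priors, likelihoods, {"test_positive": True})
print(round(post["disease"], 3))
```

With these made-up numbers, a positive result raises the probability of disease only to about 0.15, because the condition is rare: this is the base-rate effect that makes Bayesian reasoning important in diagnostics.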

Recommended Background

Basic programming, SQL and data structures
Exposure to probability, statistics and matrices

Course Format

The course consists of lecture videos, each between 5 and 15 minutes long, adding up to at most 1 to 1.5 hours per week. There are 1-2 integrated quiz questions per lecture video, and additional short quizzes test basic understanding. However, the current edition of the course is being offered in 'self-study' mode, so there is no homework, no assignments and no exams. There is also no active support from the instructor or TAs, but discussion forums are available for peer learning.


  • Will I get a certificate after completing this class?

    No. In the past, statements of accomplishment were given. However, the current edition of the course is being offered for 'self-study', without any graded homework or exams, so no certificates are issued.

  • Do I need any additional materials?

    Access to a computer on which Python 2.7 is either already installed or can be downloaded and installed. See