Dimensionality Reduction and Visualization (MTTTS17)

Basic Course Information

The basic course description and teaching schedule are available in the Curriculum Guide.

Lectures are given weekly on Tuesdays at 14-16 in Pinni B0016, starting Jan 7; see the preliminary lecture schedule below. Due to the coronavirus situation, lectures from March 17 onwards will (for the time being) not have contact teaching and will instead be arranged using the Zoom software; more information below. Lecturer: Professor Jaakko Peltonen.

Course Contents

Preliminary contents: Properties of high-dimensional data; Feature Selection; Linear feature extraction methods such as principal component analysis and linear discriminant analysis; Graphical excellence; Human perception; Nonlinear dimensionality reduction methods such as the self-organizing map and Laplacian embedding; Neighbor embedding methods such as stochastic neighbor embedding and the neighbor retrieval visualizer; Graph visualization; Graph layout methods such as LinLog.
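As a flavor of the linear feature extraction methods listed above, here is a minimal from-scratch sketch of principal component analysis for 2-D data. The function name and the restriction to two dimensions are mine, for illustration only; the course exercises use Octave, Matlab, or R, where built-in routines would normally be used instead.

```python
import math

def pca_2d(points):
    """Return the unit-length first principal component (direction of
    maximum variance) of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 covariance matrix (population covariance).
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Largest eigenvalue of the symmetric 2x2 covariance matrix.
    lam = (cxx + cyy) / 2 + math.sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
    # A corresponding eigenvector is (cxy, lam - cxx); normalize it.
    vx, vy = cxy, lam - cxx
    norm = math.hypot(vx, vy)
    if norm == 0:  # isotropic or axis-aligned degenerate case
        return (1.0, 0.0)
    return (vx / norm, vy / norm)

# Points on the line y = x: the first component is (1, 1) normalized.
print(pca_2d([(0, 0), (1, 1), (2, 2), (3, 3)]))
```

Projecting each centered point onto this direction gives the one-dimensional PCA representation of the data.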

Course Material

The course is based on the lecture slides. However, a related book is Nonlinear Dimensionality Reduction (John Lee, Michel Verleysen). For lecture 4 (graphical excellence) a related book is The Visual Display of Quantitative Information (Edward R. Tufte). For lecture 5 (human perception) a related book is Information Visualization: Perception for Design (Colin Ware).

Learning Outcomes

After the course, the student will be aware of main approaches and issues in dimensionality reduction and visualization, will be aware of a variety of methods applicable to the tasks, and will be able to apply some of the basic techniques.

Passing the Course

To pass the course, you must pass the exam and complete a sufficient number of exercises from the exercise packs. Exercise packs will be released during the course.

Preliminary grading scheme (note: preliminary information only, may change!): the exercise packs are graded in total either as 0 (fail) or as a fractional number between 1 and 5 (such as 1.34). The exam is similarly graded either as 0 (fail) or as a fractional number between 1 and 5. The total grade of the course is computed as round(0.8*ExamGrade + 0.2*ExercisesGrade), so that e.g. 4.51 rounds up to 5 and 4.49 rounds down to 4.
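The grading formula above can be sketched as follows. The function name is mine, and the half-up rounding convention is an assumption matching the examples in the text (the page does not say how an exact tie like 4.50 would be handled):

```python
import math

def course_grade(exam_grade, exercises_grade):
    """Combine exam and exercise grades with the course's 0.8/0.2 weighting.

    A failed exam (grade 0) fails the course regardless of the exercises.
    """
    if exam_grade == 0:
        return 0
    total = 0.8 * exam_grade + 0.2 * exercises_grade
    # Round half up to the nearest integer (assumed convention;
    # e.g. 4.51 rounds up to 5 and 4.49 rounds down to 4).
    return math.floor(total + 0.5)

print(course_grade(4.6, 4.15))  # total 4.51 -> 5
print(course_grade(4.6, 4.05))  # total 4.49 -> 4
```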

Information about Remote Lectures (Updated March 16, 2020)

Due to the coronavirus situation, lectures from March 17 onwards (for the time being) will not have contact teaching. Instead, the lectures will be arranged using the Zoom software. For each lecture, a link to the lecture will be sent before the lecture – if you have not received a Zoom link to the lecture, contact the lecturer.

To participate in a Zoom lecture, please make sure that:

  • You have a laptop/desktop with a working microphone (you need a mic, even if you just want to listen in)
  • You have installed either the Zoom software or the Chrome browser (Firefox and Microsoft Edge do not work).

To install the Zoom software, and to learn how to use it, please see the handbook on the university intranet (you need to login with your university account to see the page).

To join the meeting, you will receive a link by email. Simply open the link; your browser should ask to open it in the Zoom application. Do that, or if you do not have the Zoom application installed, choose “join with browser” instead (in that case you must use Chrome).

During the lecture, I recommend keeping your microphone muted (you can do this from the Zoom interface) when you are not talking; otherwise background noise will be heard by everyone.

The Zoom lectures will be recorded, and the lecture slides and the lecture recording will be available from Panopto as usual after the lecture.

Preliminary Schedule

The preliminary schedule below may change as the course progresses. Lecture slides for each lecture will be added to the schedule as the course progresses.

Jan 7 Lecture 1: Introduction, properties of high-dimensional data. Material: Lecture slides, Lecture video in Panopto
Jan 14 Lecture 2: Feature selection. Material: Lecture slides, Lecture video in Panopto – part 1, Lecture video in Panopto – part 2.
Jan 21 Lecture 3: Feature selection continued, and Linear dimensionality reduction. Material: Lecture slides, Lecture video in Panopto
Jan 28 Lecture 3 continued: linear dimensionality reduction. Material: no new slides; Lecture video in Panopto: part 1, part 2, part 3, part 4.
Feb 4 Lecture 4: Graphical excellence. Lecture material: Lecture slides, Lecture video in Panopto.
Feb 11 Lecture 5: Human perception. Lecture material: Lecture slides, Lecture video in Panopto
Feb 18 Lecture 5 continued: human perception. Lecture material: Lecture slides, Lecture video in Panopto.
Feb 25 Lecture 6: Nonlinear dimensionality reduction, part 1. Lecture material: Lecture slides, Lecture video in Panopto
Mar 3 Lecture 6 continued: conclusion of nonlinear dimensionality reduction part 1, and beginning of part 2. Lecture material: Lecture slides, Lecture video in Panopto
Mar 10 Lecture 7: Nonlinear dimensionality reduction, continuation of part 2. Lecture material: same slides as March 3, Lecture video in Panopto
Mar 17 Lecture 8: Nonlinear dimensionality reduction, part 3. Lecture material: Lecture slides, Lecture video in Panopto
Mar 24 Lecture 9: Metric learning. Lecture material: Lecture slides, Lecture video in Panopto
Mar 31 Lecture 10: Neighbor embedding, part 1. Lecture material: Lecture slides, Lecture video in Panopto
Apr 7 Lecture 11: Neighbor embedding, part 2. Material: Lecture slides (updated Apr 14, 2020), Lecture video in Panopto
Apr 14 at 12:15-14 Lecture 12: Graph visualization. Note new lecture time. Lecture material: Lecture slides, Lecture video in Panopto
Apr 21 Lectures 11-12 continued. Lecture material: Lecture video in Panopto
Apr 28 Lecture 13: Dimensionality reduction for graph layout. Lecture material: Lecture slides, Lecture video in Panopto
May 5 Recap of course material, discussion of exercise packs. Lecture material: Lecture video in Panopto
May 19 Tentative date for first exam.

Exercise Packs

Exercise packs will be released during the course. They can be completed using e.g. Octave, Matlab, or R.

About Octave, Matlab, and R

Octave (GNU Octave) is free software that is very similar in operation to Matlab, and is available for several systems including Windows, Linux, and Mac OS X. For Linux it is likely available in your distribution's software repository, such as the Ubuntu Software Center; for Windows, download it through the download page; for Mac OS X there are various alternatives, the easiest being a slightly older version at SourceForge.

Several tutorials on programming in Matlab and programming in Octave are available online. If you are familiar with R, Prof. David Hiebeler (University of Maine) has written a useful Matlab/R reference that shows how the same operations are done in both languages.

R is software for statistical computing, also available for Windows, Linux, and Mac OS X. For Linux it may already be installed (check with “which R”), or it is likely available in your distribution's software repository, such as the Ubuntu Software Center. For Windows and Mac OS X, download it through one of the many CRAN mirror sites.

Many R tutorials are available online (e.g. this one). If you are familiar with Matlab or Octave, the Matlab/R reference by Prof. David Hiebeler mentioned above shows how the same operations are done in both languages.