Big Data Analytics on Cloud Computing Infrastructures

Syllabus

Syllabus for students, autumn 2021

Course code: DA636E, revision 1
Swedish name: Storskalig dataanalys med molnbaserad datorinfrastruktur
Level of specialisation: A1F
Main field of study: Computer Science
Language: English
Date of ratification: 02 September 2019
Decision-making body: Faculty of Technology and Society
Enforcement date: 30 August 2021

Entry requirements

  1. Bachelor of Science in computer science or related subjects.
  2. At least 15 credits in programming.
  3. At least 7.5 credits in mathematics.
  4. Knowledge equivalent to English 6 at Swedish upper secondary level.
  5. Passing grade in the course Artificial Intelligence for data science.
  6. Passing grade in the course Exploratory Data Analysis, Visualisation and Storytelling.

Specialisation and progression relative to the degree regulations

The course is part of the master’s programme Computer Science: Applied Data Science and can be included in the master’s degree in computer science (120 credits).

Purpose

The purpose of the course is for the student to develop a deeper understanding of big data analytics on cloud computing infrastructures and of how software is made available as cloud services. The student also develops the ability to handle big data processing using Apache Spark.

Contents

The course contains the following elements:

  • Ecosystem for big data processing
  • Large-scale data storage (including cloud file systems, cloud object stores, archival storage)
  • Data analytics with Apache Spark
  • Spark’s programming model based on resilient distributed datasets (RDDs)
  • Spark applications with Hadoop/AWS
  • Spark SQL
  • Alternatives to SQL-based databases for big data
  • Streaming with Spark
  • Machine learning with Spark MLlib
  • Advanced real-world applications with Spark

Learning outcomes

Knowledge and understanding
For a passing grade the student shall be able to:

  • Demonstrate an in-depth understanding of the dataflow programming model for distributed computation in big data applications
  • Distinguish between traditional and large-scale database management systems
  • Describe the components and programming models used to build big data analysis systems

Competence and abilities
For a passing grade the student shall be able to:

  • Use cloud-based platforms and implement techniques for large-scale data management
  • Analyse large-scale data management problems and construct data-driven models based on open-source frameworks
  • Productionise trained models by deploying them to the cloud
  • Present work in big data analytics on cloud computing infrastructures, both orally and in writing

Evaluation abilities and approach
For a passing grade the student shall be able to:

  • Assess the characteristics of large-scale data frameworks and determine when such frameworks are and are not applicable

Learning activities

Lectures, computer laboratories, seminars, project work.

Assessments

The course is assessed through:

  • Report and oral presentation in group projects (7 credits, UG)
  • Laboratory assignments (3 credits, UG)
  • Written examination (5 credits, UA)

A passing grade (A-E) on the course requires that all parts have been completed and passed. The final grade is based on the written examination.

Grading system

Excellent (A), Very Good (B), Good (C), Satisfactory (D), Pass (E) or Fail (U).

Course literature and other teaching materials

  • A. Tellez, M. Pumperla, M. Malohlava (2015). Advanced Analytics with Spark: Patterns for Learning from Data at Scale. O'Reilly.
  • S. Amirghodsi, M. Rajendran, B. Hall, S. Mei (2017). Mastering Machine Learning with Apache Spark 2.x. Packt Publishing.
  • A collection of scientific articles will supplement the literature listed above.

Course evaluation

The University provides students who are taking or have completed a course with the opportunity to share their experiences of and opinions about the course in a course evaluation arranged by the University. The University compiles the course evaluations and communicates the results, along with any decisions on actions prompted by the evaluations. The results shall be kept available to the students (HF 1:14).

Interim rules

When a course is no longer given, or its contents have been radically changed, the student has the right to retake the examination according to the syllabus that was valid at the time of registration. Such examinations will be offered twice during a one-year period.

Other Information

The syllabus is a translation of a Swedish source text.

Contact

The course is provided by the Department of Computer Science and Media Technology, Faculty of Technology and Society.

Further information

Application

30 August 2021 - 16 January 2022, daytime, 50% study pace, Malmö. This course is offered as part of a programme.