Master of Science Honours in Big Data Science
- $1085.00
Purpose of the programme
Advances in technology have led to data being generated in large quantities in government, the business community, social networks, and various other sectors. There is therefore an increasing need to gain insights from these large volumes of data in order to create value and aid decision making. The cornerstone of this degree programme is to develop individuals who, after successful completion of the programme, are able to design solutions and approaches that convert raw data into actionable and valuable knowledge. Big Data Science graduates should be able to uncover the complexities associated with ‘big data’ and provide solutions to real-world problems.
Learning outcomes
Graduates should be able to:
1. Use big data analytic tools to analyse big data.
2. Communicate big data science concepts and use them to design and provide solutions effectively.
3. Configure, manage and troubleshoot computer systems for big data analytics.
4. Identify and analyse scholarly literature relating to big data science.
5. Manage big data.
6. Manipulate and process large volumes of data.
Career opportunities and further education
Employability: Data Engineer, Data Analyst, Data Architect, Systems Analyst, Business Intelligence Analyst, Machine Learning Engineer.
Further studies: Successful candidates may pursue Ph.D. studies in the following: Computer Science, Data Science, Research Engineering or Artificial Intelligence.
Entry requirements
Applicants must be holders of a BSc Honours Degree with a degree class of at least an Upper Second (2.1) in Computing (Computer Engineering, Informatics, Information Systems, Computer Science, Software Engineering), Electronic Engineering, Applied Mathematics, Statistics, Business Analytics or Operations Research.
Applicants with a Lower Second Class (2.2) Honours Degree in the above fields will be required to have at least two years’ post-qualification experience.
Module synopses
1. Big Data Analytics
The three V's of Big Data (Volume, Velocity, and Variety); building models for data; understanding the occurrence of rare events in random data. Sources of big data such as the web and social networks; modelling social networks; applying algorithms for community detection in networks. Clustering big data: clustering social networks; applying hierarchical clustering. Mining rapidly arriving data streams: types of queries for data streams; sampling methods for data streams; counting distinct elements in data streams; filtering data streams. The Big Data landscape, including examples of real-world big data problems and the three key sources of Big Data: people, organisations, and sensors. Identifying what are, and what are not, big data problems, and recasting big data problems as data science questions.
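As an illustration of the stream-sampling ideas listed above, the sketch below shows reservoir sampling, which keeps a fixed-size uniform sample from a stream of unknown length. This is a minimal, illustrative Python example and not part of the module material; the function name and the simulated stream are invented here.

    import random

    def reservoir_sample(stream, k):
        """Keep a uniform random sample of k items from a stream of unknown length."""
        reservoir = []
        for i, item in enumerate(stream):
            if i < k:
                reservoir.append(item)
            else:
                j = random.randint(0, i)   # uniform index in [0, i]
                if j < k:
                    reservoir[j] = item    # replace with probability k / (i + 1)
        return reservoir

    # Example: draw 5 readings from a simulated stream of one million values
    print(reservoir_sample(range(1_000_000), 5))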
2. Programming for Data Science
High-performance computing using high-level languages (Python, Java), distributed computing, cores, threads, and nodes. Operating systems, multicore architectures, file systems, point-to-point communication. Single-core optimisation, parallel algorithms: collecting, storing and organising data using big data solutions. Techniques using real-time and semi-structured data examples. Systems and tools including AsterixDB, HP Vertica, Impala, Neo4j, Redis, and SparkSQL. Extracting value from existing untapped data sources and discovering new data sources. Recognising different data elements in everyday-life problems. Designing a big data infrastructure plan and information system design. Identifying frequent data operations required for various types of data. Selecting a data model to suit the characteristics of big data. Applying techniques to handle streaming data. Differentiating between a traditional Database Management System and a Big Data Management System. Applying MapReduce using Hadoop; computing PageRank using MapReduce. Spark, Hadoop, R and SAS, streaming, data fusion, distributed file systems; and data sources such as social media and sensor data.
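The MapReduce pattern mentioned above can be sketched in a few lines of plain Python. This illustrates only the programming model (map, shuffle, reduce), not the Hadoop API; the function names and sample documents are invented for the example.

    from collections import defaultdict

    def map_phase(document):
        """Map step: emit (word, 1) pairs for every word in a document."""
        for word in document.split():
            yield word.lower(), 1

    def shuffle(pairs):
        """Shuffle step: group intermediate values by key."""
        grouped = defaultdict(list)
        for key, value in pairs:
            grouped[key].append(value)
        return grouped

    def reduce_phase(grouped):
        """Reduce step: sum the counts emitted for each word."""
        return {word: sum(counts) for word, counts in grouped.items()}

    documents = ["big data needs big tools", "data streams and data lakes"]
    pairs = (pair for doc in documents for pair in map_phase(doc))
    print(reduce_phase(shuffle(pairs)))   # e.g. {'big': 2, 'data': 3, ...}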
3. Big Data Research Methods
Fundamentals of the research process, from developing a good research question, to designing good data collection strategies, to putting results into context. Topics include, but are not limited to: the research process, research ethics, planning for analysis, research claims, measurement, and correlational and experimental design. Phases and life cycles of research in data science.
4. Computational Statistics
Random number generation. Monte Carlo integration: simulation and Monte Carlo integration, variance reduction, stratified sampling. Resampling methods: bootstrapping, jackknife resampling, percentile confidence intervals. Markov chain Monte Carlo methods: Markov chains, Metropolis-Hastings algorithms, Gibbs sampling, convergence. Density estimation: univariate estimation, kernel smoothing, multivariate density estimation. Numerical methods: root finding, constrained and unconstrained optimisation, the EM algorithm.
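As a small illustration of the resampling methods listed here, the sketch below computes a percentile bootstrap confidence interval for a sample mean. It is an illustrative Python/NumPy example; the sample data and function name are made up.

    import numpy as np

    def bootstrap_percentile_ci(data, stat=np.mean, n_boot=10_000, alpha=0.05, seed=0):
        """Percentile bootstrap confidence interval for a statistic of one sample."""
        rng = np.random.default_rng(seed)
        n = len(data)
        # Resample with replacement and recompute the statistic on each resample
        boot_stats = np.array([stat(rng.choice(data, size=n, replace=True))
                               for _ in range(n_boot)])
        lower = np.percentile(boot_stats, 100 * alpha / 2)
        upper = np.percentile(boot_stats, 100 * (1 - alpha / 2))
        return lower, upper

    sample = np.array([4.3, 5.1, 3.8, 6.0, 4.9, 5.5, 4.1, 5.8])
    print(bootstrap_percentile_ci(sample))   # approximate 95% CI for the mean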
5. Big Data Project Management
The big data ecosystem; technology-agnostic market, pillars of big data. Adopting big data analytics; vendor selection, opportunities and implications, research and development pipelines, intellectual property protection, clients and projects, mission-critical and availability. Project management; project failures and successes, PMBOK and data science. Project lifecycles; estimation, scope, schedule, quality, staffing shortages, communication, risk management, and mitigation. Methodologies; scrum and scrum again, big data hub, big data factory, big data lake, big data foundry, big data as a service, big data analytics as a service. Platforms and governance; security and services, process monitoring, compliance reporting, ethical issues, system metrics and KPIs. Programme portfolio and programme management office.
6. Machine Learning
Machine and statistical learning algorithms for big data; identifying trends from the data, modelling trends for prediction purposes as well as modelling for the detection of hidden knowledge. Supervised learning algorithms and unsupervised learning algorithms. Stochastic gradient descent. Building a machine learning algorithm; deep learning. Bayesian networks, support vector machines. Programming for machine learning (e.g. Python, C, Java). New developments in regression and classification, probabilistic graphical models, numerical Bayesian and Monte Carlo methods, neural networks, decision trees, deep learning, and other computational methods. R for data mining, cluster analysis, dimensionality reduction, calculating statistical significance.
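A minimal sketch of the stochastic gradient descent idea mentioned above, fitting a linear regression one example at a time. This is illustrative Python/NumPy; the synthetic data and hyperparameters are invented for the example.

    import numpy as np

    def sgd_linear_regression(X, y, lr=0.01, epochs=50, seed=0):
        """Fit y ~ X @ w + b by stochastic gradient descent on squared error."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            for i in rng.permutation(n):        # visit examples in random order
                error = X[i] @ w + b - y[i]
                w -= lr * error * X[i]          # gradient of 0.5 * error**2 w.r.t. w
                b -= lr * error                 # gradient w.r.t. the bias
        return w, b

    # Synthetic data: y = 3*x1 - 2*x2 + 1 plus a little noise
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 2))
    y = X @ np.array([3.0, -2.0]) + 1.0 + 0.1 * rng.normal(size=200)
    print(sgd_linear_regression(X, y))          # weights near [3, -2], bias near 1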
7. Big Data Visualisation
Visualisation component focusing on the encoding of information, such as patterns, into visual objects. Visualisation using Python and R. The Python pandas data science library, Python lambdas, and the NumPy library; data cleaning and manipulation techniques. Data collection structures: lists, creating lists. Data frames. File I/O processing and regular expressions. Data gathering and cleaning. Data exploration and analysis.
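A small sketch of the kind of pandas cleaning and exploration step described here; the column names and values are invented for the example.

    import pandas as pd

    # A small, deliberately messy table standing in for collected sensor readings
    raw = pd.DataFrame({
        "station": ["A", "A", "B", "B", None],
        "temp_c":  ["21.5", "22.1", "n/a", "19.8", "20.3"],
    })

    clean = (
        raw.dropna(subset=["station"])    # drop rows with no station label
           .assign(temp_c=lambda d: pd.to_numeric(d["temp_c"], errors="coerce"))
           .dropna(subset=["temp_c"])     # drop readings that could not be parsed
    )

    # Explore: mean temperature per station
    print(clean.groupby("station")["temp_c"].mean())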
8. Big Data Science Project
In this module, a student is expected to demonstrate the application of theoretical Big Data Science knowledge to solve real-world problems. A student is expected to identify a real-world problem from industry and commerce. Projects shall be based on the entire big data lifecycle, including the gathering of data of significant size as well as a final technical report describing the process followed and the deliverables. Students may be allowed to work in pairs. The proposed project shall be subject to approval by the Department. It is expected that a submission to a relevant journal is made at the end of this module. This module is assessed entirely through coursework.
9. Mathematical Modelling
The module looks at the general principles of mathematical modelling and the modelling skills needed for abstraction, idealisation, and identification of important factors such as variables and parameters. Eigenvalues and eigenvectors, principal component analysis (PCA), the graph Laplacian, and singular value decomposition (SVD); application of eigenvalues and eigenvectors to investigate prototypical problems of ranking big data; application of the graph Laplacian to investigate prototypical problems of clustering big data; application of PCA and SVD to investigate prototypical problems of big data compression. Case studies may be chosen from the following list: simulation modelling; discrete event simulation; systems dynamics; simulation software; sampling methods; model testing and validation; materials science modelling (understanding micro-level molecular and sub-atomic effects, subtle engineering of special compounds, etc.); traffic and transportation modelling (roads, railway networks and air traffic contain many challenges for modelling); modelling in the food and brewing industry; chemical reactions and processes modelling.
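As a sketch of the application of PCA and SVD to big data compression mentioned above, the following illustrative Python/NumPy reduces synthetic data to its leading principal components; the function name and data are invented for the example.

    import numpy as np

    def pca_via_svd(X, k):
        """Project the rows of X onto their first k principal components via the SVD."""
        Xc = X - X.mean(axis=0)                      # centre each feature
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        components = Vt[:k]                          # principal axes (right singular vectors)
        explained = (S[:k] ** 2) / (S ** 2).sum()    # fraction of variance captured
        return Xc @ components.T, explained

    # Compress 5-dimensional synthetic data that really lives in 2 dimensions
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 5))
    scores, explained = pca_via_svd(X, 2)
    print(scores.shape, explained.sum().round(3))    # (300, 2) and ~1.0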
10. Dissertation
A student shall be allowed to commence a dissertation provided they have not failed more than 25% of the taught modules. He/she is expected to identify real-world problems and provide well-researched solutions. It is recommended that the approach to the research dissertation adheres to the phases of solving a big data science project, i.e. (i) discovery, (ii) data preparation, (iii) model planning, (iv) model building, and (v) operationalising and communicating results. The completion of the dissertation shall culminate in the production of a dissertation report.