Data Science and Machine Learning are the leading buzzwords of today.

This book covers all aspects of these subjects, from data definition and categorization, classification techniques, clustering and ML algorithms to data stream and association rule mining, language data processing and neural networks. It explains descriptive and inferential statistical analysis, probability distribution and density functions as well as time series. It also describes the fundamentals of Python programming, the Python environment and libraries such as scikit-learn, NumPy and pandas, and takes a deep dive into data visualization modules and tools.

Mastery of these areas will enable students to become proficient and effective data scientists.

*Salient features*

- Ideal for undergraduate courses on Data Science and Analytics
- Provides step-by-step instructions for setting up the Python environment and executing various libraries and packages
- All chapters include relevant case studies, their Python code and output; the last chapter is dedicated to case studies
- Over 300 exercise questions comprising MCQs, programming exercises and concept-based questions, with answers provided for quick reference
- Bibliography at the end of every chapter for further reading
- Android app with chapter-wise PowerPoint slides and job interview questions

Chapter-wise PowerPoint slides are available at: www.universitiespress.com/DataScienceandAnalyticswithPython

**Sandhya Arora** is Professor, Department of Computer Engineering, MKSSS’s Cummins College of Engineering, Pune, Maharashtra.

**Latesh Malik** is Associate Professor, Department of Computer Science and Engineering, Government College of Engineering, Nagpur, Maharashtra.

*Preface*

*Acknowledgements*

**Chapter 1: Introduction to Data Science**

Introduction | Data Science | Data Science Stages | Data Science Ecosystem | Tools Used in Data Science | Data Science Workflow | Automated Methods for Data Collection | Overview of Data | Sources of Data | Big Data | Data Categorization

Chapter 2: Environment Set-up and Basics of Python

Introduction to Python | Features of Python | Installation of Python | Python Identifiers | Python Indentation | Comments in Python | Basic Data | Operators and Expressions | Data Types | Sets and Frozen Sets | Loops and Conditions | Classes and Functions | Working with Files

Chapter 3: NumPy and pandas

Arrays | NumPy | The pandas Package | Panels

Chapter 4: Data Visualization

Introduction | Visualization Software and Tools | Interactive Visual Analysis | Text Visualization | Creating Graphs with Matplotlib | Creating Graphs with the plotly Package | Data Visualization with Matplotlib, Seaborn and pandas | Exploratory Data Analysis | Mapping and Cartopy

Chapter 5: Python scikit-learn

Introduction | Features of scikit-learn | Installation | Regression and Classifiers in scikit-learn | Support Vector Machine (SVM) | K-Nearest Neighbor (K-NN) | Case Studies

Chapter 6: Environment Set-up: TensorFlow and Keras

Introduction to TensorFlow | TensorFlow Features | Benefits of TensorFlow | Installation of TensorFlow | TensorFlow Architecture | Introduction to Keras | Installation of Keras | Features of Keras | Programming in Keras

Chapter 7: Probability

Introduction to Probability | Probability and Statistics | Random Variables | Central Limit Theorem | Density Functions | Probability Distribution

Chapter 8: Machine Learning and Data Pre-processing

Introduction to Machine Learning | Need for Machine Learning | Types of Machine Learning | Understanding Data | Data Set and Data Types | Data Pre-processing | Data Pre-processing in Python

Chapter 9: Statistical Analysis: Descriptive Statistics

Introduction | One-dimensional Statistics | Multi-dimensional Statistics | Simpson’s Paradox

Chapter 10: Statistical Analysis: Inferential Statistics

Introduction | Hypothesis Testing | Using the t-test | The t-test in Python | Chi-square Test | Wilcoxon Rank-Sum Test | Introduction to Analysis of Variance

Chapter 11: Classification

Introduction | K-NN Classification | Decision Trees | Support Vector Machine (SVM) | Naive Bayes’ Classification | Metrics for Evaluating Classifier Performance | Cross-validation | Ensemble Methods: Techniques to Improve Classification Accuracy

Chapter 12: Prescriptive Analytics: Data Stream Mining

Introduction to Stream Concepts | Mining Data Streams | Data Stream Management System (DSMS) | Data Stream Models | Data Stream Filtering | Sampling Data in a Stream | Concept Drift | Data Stream Classification | Rare Class Problem | Issues, Controversies and Problems | Applications of Data Mining | Implementation of Data Streams in Python

Chapter 13: Language Data Processing in Python

Natural Language Processing | Text Processing in Python | CGI/Web Programming Using Python | Twitter Sentiment Analysis in Python | Twitter Sentiment Analysis for Film Reviews | Case Study: A Recommendation System for a Film Data Set | Case Study: Text Mining and Visualization in Word Clouds

Chapter 14: Clustering

Introduction | Distance Measures | K-means Clustering | Hierarchical Clustering | DBSCAN Clustering

Chapter 15: Association Rule Mining

Introduction | The Apriori Algorithm | An Example of an Apriori Algorithm | An Example Using Python: Transactions in a Grocery Store

Chapter 16: Time Series Analysis Using Python

Introduction | Components of a Time Series | Additive and Multiplicative Time Series | Time Series Analysis | Case Study on Time Series Analysis

Chapter 17: Deep Neural Network and Convolutional Neural Network

Overview of Feed Forward Neural Network | Overview of Deep Neural Network | Activation Function | Loss Functions | Regularization | Convolutional Neural Network | Implementation of CNN | Case Studies

Chapter 18: Case Studies

Digit Recognition | Face and Eye Detection in Images | Correlation and Feature Selection | Fake News Detection | Detecting Duplicate Questions | Weather Prediction and Song Recommendation System | Spam Detection

*Index*