Data Science and Machine Learning are the leading buzzwords of today. This book covers all aspects of these subjects, from data definition and categorization, classification techniques, clustering and ML algorithms to data stream and association rule mining, language data processing and neural networks. It explains descriptive and inferential statistical analysis, probability distributions and density functions, as well as time series. It also describes the fundamentals of Python programming, the Python environment and libraries such as scikit-learn, NumPy and pandas, and takes a deep dive into data visualization modules and tools. Mastery of these areas will enable students to become proficient and effective data scientists.
Salient features
Chapter-wise PowerPoint slides are available at: www.universitiespress.com/DataScienceandAnalyticswithPython
Sandhya Arora is Professor, Department of Computer Engineering, MKSSS’s Cummins College of Engineering, Pune, Maharashtra.
Latesh Malik is Associate Professor, Department of Computer Science and Engineering, Government College of Engineering, Nagpur, Maharashtra.
Preface
Acknowledgements
Chapter 1: Introduction to Data Science
Introduction | Data Science | Data Science Stages | Data Science Ecosystem | Tools Used in Data Science | Data Science Workflow | Automated Methods for Data Collection | Overview of Data | Sources of Data | Big Data | Data Categorization
Chapter 2: Environment Set-up and Basics of Python
Introduction to Python | Features of Python | Installation of Python | Python Identifiers | Python Indentation | Comments in Python | Basic Data | Operators and Expressions | Data Types | Sets and Frozen Sets | Loops and Conditions | Classes and Functions | Working with Files
Chapter 3: NumPy and pandas
Arrays | NumPy | The pandas Package | Panels
Chapter 4: Data Visualization
Introduction | Visualization Software and Tools | Interactive Visual Analysis | Text Visualization | Creating Graphs with Matplotlib | Creating Graphs with the plotly Package | Data Visualization with Matplotlib, Seaborn and pandas | Exploratory Data Analysis | Mapping and Cartopy
Chapter 5: Python scikit-learn
Introduction | Features of scikit-learn | Installation | Regression and Classifiers in scikit-learn | Support Vector Machine (SVM) | K-Nearest Neighbor (K-NN) | Case Studies
Chapter 6: Environment Set-up: TensorFlow and Keras
Introduction to TensorFlow | TensorFlow Features | Benefits of TensorFlow | Installation of TensorFlow | TensorFlow Architecture | Introduction to Keras | Installation of Keras | Features of Keras | Programming in Keras
Chapter 7: Probability
Introduction to Probability | Probability and Statistics | Random Variables | Central Limit Theorem | Density Functions | Probability Distribution
Chapter 8: Machine Learning and Data Pre-processing
Introduction to Machine Learning | Need for Machine Learning | Types of Machine Learning | Understanding Data | Data Set and Data Types | Data Pre-processing | Data Pre-processing in Python
Chapter 9: Statistical Analysis: Descriptive Statistics
Introduction | One-dimensional Statistics | Multi-dimensional Statistics | Simpson’s Paradox
Chapter 10: Statistical Analysis: Inferential Statistics
Introduction | Hypothesis Testing | Using the t-test | The t-test in Python | Chi-square Test | Wilcoxon Rank-Sum Test | Introduction to Analysis of Variance
Chapter 11: Classification
Introduction | K-NN Classification | Decision Trees | Support Vector Machine (SVM) | Naive Bayes’ Classification | Metrics for Evaluating Classifier Performance | Cross-validation | Ensemble Methods: Techniques to Improve Classification Accuracy
Chapter 12: Prescriptive Analytics: Data Stream Mining
Introduction to Stream Concepts | Mining Data Streams | Data Stream Management System (DSMS) | Data Stream Models | Data Stream Filtering | Sampling Data in a Stream | Concept Drift | Data Stream Classification | Rare Class Problem | Issues, Controversies and Problems | Applications of Data Mining | Implementation of Data Streams in Python
Chapter 13: Language Data Processing in Python
Natural Language Processing | Text Processing in Python | CGI/Web Programming Using Python | Twitter Sentiment Analysis in Python | Twitter Sentiment Analysis for Film Reviews | Case Study: A Recommendation System for a Film Data Set | Case Study: Text Mining and Visualization in Word Clouds
Chapter 14: Clustering
Introduction | Distance Measures | K-means Clustering | Hierarchical Clustering | DBSCAN Clustering
Chapter 15: Association Rule Mining
Introduction | The Apriori Algorithm | An Example of an Apriori Algorithm | An Example Using Python: Transactions in a Grocery Store
Chapter 16: Time Series Analysis Using Python
Introduction | Components of a Time Series | Additive and Multiplicative Time Series | Time Series Analysis | Case Study on Time Series Analysis
Chapter 17: Deep Neural Network and Convolutional Neural Network
Overview of Feed Forward Neural Network | Overview of Deep Neural Network | Activation Function | Loss Functions | Regularization | Convolutional Neural Network | Implementation of CNN | Case Studies
Chapter 18: Case Studies
Digit Recognition | Face and Eye Detection in Images | Correlation and Feature Selection | Fake News Detection | Detecting Duplicate Questions | Weather Prediction and Song Recommendation System | Spam Detection
Index