# Making Sense of Data II : A Practical Guide to Data Visualization, Advanced Data Mining Methods, and Applications

by Myatt, Glenn J.; Johnson, Wayne P.

**Note:** Supplemental materials are not guaranteed with Rental or Used book purchases.

- ISBN: 9780470222805 | 0470222808
- Cover: Paperback
- Copyright: 2/3/2009

This second installment in the Making Sense of Data series continues to explore a diverse range of commonly used approaches to making and communicating decisions from data. Delving into more technical topics, this book equips readers with advanced data mining methods that are needed in fields such as engineering, finance, and the social sciences.

Glenn J. Myatt, PhD, is cofounder of Leadscope, Inc. and a Partner of Myatt & Johnson, Inc., a consulting company that focuses on business intelligence application development delivered through the Internet. Dr. Myatt is the author of Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining, also published by Wiley. Wayne P. Johnson, MSc, is cofounder of Leadscope, Inc. and a Partner of Myatt & Johnson, Inc. Mr. Johnson has over two decades of experience in the design and development of large software systems, and his current professional interests include human–computer interaction, information visualization, and methodologies for contextual inquiry.

Preface | p. xi |

Introduction | p. 1 |

Overview | p. 1 |

Definition | p. 1 |

Preparation | p. 2 |

Overview | p. 2 |

Accessing Tabular Data | p. 3 |

Accessing Unstructured Data | p. 3 |

Understanding the Variables and Observations | p. 3 |

Data Cleaning | p. 6 |

Transformation | p. 7 |

Variable Reduction | p. 9 |

Segmentation | p. 10 |

Preparing Data to Apply | p. 10 |

Analysis | p. 11 |

Data Mining Tasks | p. 11 |

Optimization | p. 12 |

Evaluation | p. 12 |

Model Forensics | p. 13 |

Deployment | p. 13 |

Outline of Book | p. 14 |

Overview | p. 14 |

Data Visualization | p. 14 |

Clustering | p. 15 |

Predictive Analytics | p. 15 |

Applications | p. 16 |

Software | p. 16 |

Summary | p. 16 |

Further Reading | p. 17 |

Data Visualization | p. 19 |

Overview | p. 19 |

Visualization Design Principles | p. 20 |

General Principles | p. 20 |

Graphics Design | p. 20 |

Anatomy of a Graph | p. 28 |

Tables | p. 32 |

Simple Tables | p. 32 |

Summary Tables | p. 32 |

Two-Way Contingency Tables | p. 34 |

Supertables | p. 34 |

Univariate Data Visualization | p. 36 |

Bar Chart | p. 36 |

Histograms | p. 37 |

Frequency Polygram | p. 41 |

Box Plots | p. 41 |

Dot Plot | p. 43 |

Stem-and-Leaf Plot | p. 44 |

Quantile Plot | p. 46 |

Quantile-Quantile Plot | p. 48 |

Bivariate Data Visualization | p. 49 |

Scatterplot | p. 49 |

Multivariate Data Visualization | p. 50 |

Histogram Matrix | p. 52 |

Scatterplot Matrix | p. 54 |

Multiple Box Plot | p. 56 |

Trellis Plot | p. 56 |

Visualizing Groups | p. 59 |

Dendrograms | p. 59 |

Decision Trees | p. 60 |

Cluster Image Maps | p. 60 |

Dynamic Techniques | p. 63 |

Overview | p. 63 |

Data Brushing | p. 64 |

Nearness Selection | p. 65 |

Sorting and Rearranging | p. 65 |

Searching and Filtering | p. 65 |

Summary | p. 65 |

Further Reading | p. 66 |

Clustering | p. 67 |

Overview | p. 67 |

Distance Measures | p. 75 |

Overview | p. 75 |

Numeric Distance Measures | p. 77 |

Binary Distance Measures | p. 79 |

Mixed Variables | p. 84 |

Other Measures | p. 86 |

Agglomerative Hierarchical Clustering | p. 87 |

Overview | p. 87 |

Simple Linkage | p. 87 |

Complete Linkage | p. 92 |

Average Linkage | p. 93 |

Other Methods | p. 96 |

Selecting Groups | p. 96 |

Partitioned-Based Clustering | p. 98 |

Overview | p. 98 |

k-Means | p. 98 |

Worked Example | p. 100 |

Miscellaneous Partitioned-Based Clustering | p. 101 |

Fuzzy Clustering | p. 103 |

Overview | p. 103 |

Fuzzy k-Means | p. 103 |

Worked Examples | p. 104 |

Summary | p. 109 |

Further Reading | p. 110 |

Predictive Analytics | p. 111 |

Overview | p. 111 |

Predictive Modeling | p. 111 |

Testing Model Accuracy | p. 116 |

Evaluating Regression Models' Predictive Accuracy | p. 117 |

Evaluating Classification Models' Predictive Accuracy | p. 119 |

Evaluating Binary Models' Predictive Accuracy | p. 120 |

ROC Charts | p. 122 |

Lift Chart | p. 124 |

Principal Component Analysis | p. 126 |

Overview | p. 126 |

Principal Components | p. 126 |

Generating Principal Components | p. 127 |

Interpretation of Principal Components | p. 128 |

Multiple Linear Regression | p. 130 |

Overview | p. 130 |

Generating Models | p. 130 |

Prediction | p. 136 |

Analysis of Residuals | p. 136 |

Standard Error | p. 139 |

Coefficient of Multiple Determination | p. 140 |

Testing the Model Significance | p. 142 |

Selecting and Transforming Variables | p. 143 |

Discriminant Analysis | p. 145 |

Overview | p. 145 |

Discriminant Function | p. 146 |

Discriminant Analysis Example | p. 146 |

Logistic Regression | p. 151 |

Overview | p. 151 |

Logistic Regression Formula | p. 151 |

Estimating Coefficients | p. 153 |

Assessing and Optimizing Results | p. 156 |

Naïve Bayes Classifiers | p. 157 |

Overview | p. 157 |

Bayes Theorem and the Independence Assumption | p. 158 |

Independence Assumption | p. 158 |

Classification Process | p. 159 |

Summary | p. 161 |

Further Reading | p. 163 |

Applications | p. 165 |

Overview | |

Sales and Marketing | p. 166 |

Industry-Specific Data Mining | p. 169 |

Finance | p. 169 |

Insurance | p. 171 |

Retail | p. 172 |

Telecommunications | p. 173 |

Manufacturing | p. 174 |

Entertainment | p. 175 |

Pharmaceuticals | p. 177 |

Healthcare | p. 178 |

microRNA Data Analysis Case Study | p. 181 |

Defining the Problem | p. 181 |

Preparing the Data | p. 181 |

Analysis | p. 183 |

Credit Scoring Case Study | p. 192 |

Defining the Problem | p. 192 |

Preparing the Data | p. 192 |

Analysis | p. 199 |

Deployment | p. 203 |

Data Mining Nontabular Data | p. 203 |

Overview | p. 203 |

Data Mining Chemical Data | p. 203 |

Data Mining Text | p. 210 |

Further Reading | p. 213 |

Matrices | p. 215 |

Overview of Matrices | p. 215 |

Matrix Addition | p. 215 |

Matrix Multiplication | p. 216 |

Transpose of a Matrix | p. 217 |

Inverse of a Matrix | p. 217 |

Software | p. 219 |

Software Overview | p. 219 |

Software Objectives | p. 219 |

Access and Installation | p. 221 |

User Interface Overview | p. 221 |

Data Preparation | p. 223 |

Overview | p. 223 |

Reading in Data | p. 224 |

Searching the Data | p. 225 |

Variable Characterization | p. 227 |

Removing Observations and Variables | p. 228 |

Cleaning the Data | p. 228 |

Transforming the Data | p. 230 |

Segmentation | p. 235 |

Principal Component Analysis | p. 236 |

Tables and Graphs | p. 238 |

Overview | p. 238 |

Contingency Tables | p. 239 |

Summary Tables | p. 240 |

Graphs | p. 242 |

Graph Matrices | p. 246 |

Statistics | p. 246 |

Overview | p. 246 |

Descriptive Statistics | p. 248 |

Confidence Intervals | p. 248 |

Hypothesis Tests | p. 249 |

Chi-Square Test | p. 250 |

ANOVA | p. 251 |

Comparative Statistics | p. 251 |

Grouping | p. 253 |

Overview | p. 253 |

Clustering | p. 254 |

Associative Rules | p. 257 |

Decision Trees | p. 258 |

Prediction | p. 261 |

Overview | p. 261 |

Linear Regression | p. 263 |

Discriminant Analysis | p. 265 |

Logistic Regression | p. 266 |

Naïve Bayes | p. 267 |

kNN | p. 269 |

CART | p. 269 |

Neural Networks | p. 270 |

Apply Model | p. 271 |

Bibliography | p. 273 |

Index | p. 279 |

Table of Contents provided by Ingram. All Rights Reserved.