PREFACE

We wrote this book because there is a large gap between the elementary statistics course that most people take and the more advanced research methods courses taken by graduate and upper-division students so they can carry out research projects. These advanced courses include difficult topics such as regression, forecasting, structural equations, survival analysis, and categorical data, often analyzed using sophisticated likelihood-based and even Bayesian methods. However, these advanced courses typically devote little time to helping students understand the fundamental assumptions and machinery behind these methods. Instead, they teach the material like witchcraft: Do this, do that, and voilà—Statistics! Students thus have little idea what they are doing and why they are doing it. Like trained parrots, they learn how to recite statistical jargon mindlessly. The goal of this book is to make statistics less like witchcraft, to treat students like intelligent humans and not like trained parrots—thus the title, Understanding Advanced Statistical Methods.

This book will surprise your students. It will cause them to think differently about things, not only about math and statistics, but also about research, the scientific method, and life in general. It will teach them how to do good modeling—and hence good statistics—from a standpoint of deep knowledge rather than rote knowledge. It will also provide them with tools to think critically about the claims they see in the popular press, and to design their own studies to avoid common errors.

There are plenty of formulas in this book, because understanding advanced statistical methods requires understanding probabilistic models, and probabilistic models are necessarily mathematical. But if your students ever find themselves plugging numbers into formulas mindlessly, make them stop and ask, “Why?” Getting students to ask and answer that question is a main objective of this book. Having them perform mindless calculations is a waste of your time and theirs, unless they understand the why. Every formula tells an interesting story, and the story explains the why.

While all statistics books purport to have the goal of making statistics understandable, many try to do so by avoiding math. This book will not shy away from math; rather, it will teach the needed math and probability along with the statistics. Even if your students are math “phobes,” they will learn the math and probability theory and hopefully enjoy it, or at least appreciate it.

In particular, statistics is all about unknown, algebraic quantities. What is the probability of a coin landing heads up when flipped? It is not 50%. Instead, it is an unknown algebraic quantity that depends on the construction of the coin and on the methods of the coin-flipper. Any book that teaches statistics while avoiding algebra is therefore a book of fiction!

This book uses calculus at times, where needed to understand continuous distributions and optimization. Students should learn enough calculus to understand the logical arguments concerning these core concepts. But calculus is not a prerequisite. We only assume that students have a comfortable familiarity with algebra, functions and graphs, and spreadsheet software such as Microsoft Excel. The book employs a “just-in-time” approach, introducing mathematical topics, including calculus, where needed. We present mathematical concepts in a concrete way, with the aim of showing students how even the seemingly hard math is really not so hard, and also showing them how to use math to answer important questions about our world.

As far as probability theory goes, we employ a laser-beam focus on those aspects of probabilistic models that are most useful for statistics. Our discussion therefore focuses more on distributions than on counting formulas or individual probability calculations. For example, we present Bayes’ Theorem in terms of distributions rather than using the classical two-event form presented in other sources. For another example, we do not emphasize the binomial distribution; instead we focus on the Bernoulli distribution with independent and identically distributed observations.
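
To make the distribution form of Bayes’ Theorem concrete, here is a minimal sketch in Python (the book itself uses Excel, so the language is our choice for illustration). It assumes independent and identically distributed Bernoulli observations, a coarse grid of candidate values for the unknown probability, and a uniform prior on that grid. The function name `posterior_on_grid` and the data values are hypothetical.

```python
def posterior_on_grid(data, grid, prior):
    """Posterior distribution of an unknown Bernoulli probability on a grid.

    Bayes' Theorem in distribution form: posterior is proportional to
    likelihood times prior, then normalized so it sums to 1.
    """
    heads = sum(data)
    tails = len(data) - heads
    # Likelihood of iid Bernoulli data at each candidate value, times the prior
    unnormalized = [
        weight * (pi ** heads) * ((1 - pi) ** tails)
        for pi, weight in zip(grid, prior)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical example: 8 coin flips (1 = heads, 0 = tails)
data = [1, 1, 0, 1, 1, 0, 1, 1]
grid = [i / 10 for i in range(11)]       # candidate values 0.0, 0.1, ..., 1.0
prior = [1 / len(grid)] * len(grid)      # assumed uniform prior over the grid
post = posterior_on_grid(data, grid, prior)

for pi, prob in zip(grid, post):
    print(f"pi = {pi:.1f}  posterior = {prob:.4f}")
```

The output is an entire distribution over the candidate values rather than a single probability, which is the point of the distribution form. A finer grid or a different prior changes the numbers but not the logic.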

This book emphasizes applications; it is not “math for math’s sake.” We take real data analysis very seriously. We explain the theory and logic behind real data analysis intuitively, and gear our presentation toward students who have an interest in science but may have forgotten some math.

Statistics is not a collection of silly rules that students should recite like trained parrots—rules such as p < 0.05, n > 30, r > 0.3, etc. We call these Ugly Rules of Thumb throughout the book to emphasize that they are mere suggestions, and that there is nothing hard-and-fast about any of them. On the other hand, the logic of the mathematics underlying statistics is not ugly at all. Given the assumptions, the mathematical conclusions are 100% true. But the assumptions themselves are never quite true. This is the heart and soul of the subject of statistics—how to draw conclusions successfully when the premises are flawed—and this is what your students will learn from this book.

This book is not a “cookbook.” Cookbooks tell you all about the what but nothing about the why. With computers, software and the Internet readily available, it is easier than ever for students to lose track of the why and focus on the what instead. This book takes exactly the opposite approach. By enabling your students to answer the why, it will help them to figure out the what on their own—that is, they will be able to develop their own statistical recipes. This will empower your students to use advanced statistical methods with confidence.

The main challenge for your students is not to understand the math. Rather, it is to understand the statistical point of view, which we present consistently throughout this book as a mantra:

MODEL PRODUCES DATA

More specifically, the statistical model is a recipe for producing random data. This one concept will turn your students’ minds around 180 degrees, because most think a statistical model is something produced by data, rather than a producer of data. In our experience, the difficulty in understanding the statistical model as a data-generator is the single most significant barrier to students’ learning of statistics. Understanding this point can be a startling epiphany, and your students might find statistics to be fun, and surprisingly easy, once they “get it.” So let them have fun!
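
The mantra can be sketched in a few lines of Python (our choice of language for illustration; the book itself uses Excel). The model below is an assumed one: independent Bernoulli observations with a probability `pi` that we fix ourselves. The function name `produce_data` and the value 0.6 are hypothetical. The point is only the direction of the arrow: the model comes first and the data come out of it.

```python
import random


def produce_data(pi, n, seed=None):
    """Model: n independent Bernoulli(pi) observations (1 = heads, 0 = tails)."""
    rng = random.Random(seed)
    return [1 if rng.random() < pi else 0 for _ in range(n)]


# The same model produces a different data set each time it is run.
sample_a = produce_data(pi=0.6, n=10, seed=1)
sample_b = produce_data(pi=0.6, n=10, seed=2)

print(sample_a, sum(sample_a) / len(sample_a))
print(sample_b, sum(sample_b) / len(sample_b))
```

In a real study the value of `pi` is unknown, and only a data set like `sample_a` is observed. The proportion of heads in each sample generally differs from the `pi` that produced it, which is why the data are best viewed as one possible output of the model rather than as the model itself.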

Along with the presentation of models as producers of data, another unique characteristic of this book is that it avoids the overused (and usually misused) “population” terminology. Instead, we define and use the “process” terminology, which is always more correct, generally more applicable, and nearly always more scientific. We discuss populations, of course, but correctly and appropriately. Our point of view is consistent with one presented in Statistical Science (Vol. 26, No. 1, 1–9, 2011) by Robert E. Kass and several discussants, in an article entitled “Statistical Inference: The Big Picture.”

Another unique characteristic of this book is that it teaches Bayesian methods before classical (frequentist) methods. This sequencing is quite natural given our emphasis on probability models: The flow from probability, to likelihood, to Bayes, is seamless. Placing Bayesian methods before classical methods also allows for more rounded and thoughtful discussion of the convoluted frequentist-based confidence interval and hypothesis testing concepts.

This book has no particular preference for the social and economic sciences, for the biological and medical sciences, or for the physical and engineering sciences. All are useful, and the book provides examples from all these disciplines. The emphasis is on the overarching statistical science. When the book gives an example that doesn’t particularly apply to you or your students’ fields of study, just change the example! The concepts and methods of statistics apply universally.

The target audience for this book is mainly upper-division undergraduates and graduate students. It can also serve lower-division students to satisfy a mathematics general education requirement. A previous course in statistics is not necessary.

This book is particularly useful as a prerequisite for more advanced study of regression, experimental design, survival analysis, time series analysis, structural equations modeling, categorical data analysis, nonparametric statistics, and multivariate analysis. We introduce regression analysis (ordinary and logistic) in the book, and for this reason, we refer to the data as Y, rather than X as in many other books. We use the variable designation X as well, but mainly as a predictor variable.

The spreadsheet software Microsoft Excel is used to illustrate many of the methods in this book. It is a good idea, but not strictly necessary, to use a dedicated mathematical or statistical software package in addition to spreadsheet software. However, we hope to convince your students that advanced statistical methods are really not that hard, since one can understand them to a great extent simply by using such commonplace software as Excel.

About Using This Book

• Always get students to ask “Why?” The point of the book is not the what; it is the why. Always question assumptions, and aim to understand how the logical conclusions follow from the assumptions.

• Students should read the book with a pencil and paper nearby, as well as spreadsheet or other software, for checking calculations and satisfying themselves that things make sense.

• Definitions are important and should be memorized. Vocabulary terms are given in boldface in the book, and their definitions are summarized at the ends of the chapters. Strive to teach the definitions in the context of your own field of interest, or in the context of your students’ fields of interest.

• Some formulas should be memorized, along with the stories they tell. Important formulas are given at the ends of the chapters.

• We often give derivations of important formulas, and we give the reasons for each step in parentheses to the right of the equations. These reasons are often simple, involving basic algebra. The reasons are more important than the formulas themselves. Learn the reasons first!

• The exercises all contain valuable lessons and are essential to understanding. Have your students do as many as possible.

• A companion website for the book (this one) contains data sets, computer code, sample quizzes and exams, and other supplemental materials.

About the Authors

Peter H. Westfall has a Ph.D. in Statistics and many years of teaching, research, and consulting experience in biostatistics and a variety of other disciplines. He has published over 100 papers in statistical theory and methods, won several teaching awards, and has written several books, one of which won two awards from the Society for Technical Communication. He is former editor of The American Statistician and is a Fellow of both the American Statistical Association and of the American Association for the Advancement of Science.

Kevin S. S. Henning has a Ph.D. in business statistics from Texas Tech University and currently teaches business statistics and forecasting in the Department of Economics and International Business in the College of Business at Sam Houston State University.

Acknowledgments

The authors would like to thank Josh Fredman for his excellent editing and occasional text contributions; students in Dr. Westfall’s ISQS 5347 class, including Natascha Israel, Ajay Swain, Jianjun Luo, Chris Starkey, Robert Jordan, and Artem Meshcheryakov for careful reading and feedback; Drs. Jason Rinaldo and D. S. Calkins for careful reading, needling, and occasional text passages; and the production staff at Taylor & Francis/Academic Press, including Remya Divakaran, Rachel Holt and Rob Calver for helpful direction and editing. Most graphics in the book were produced using the SGPLOT and SGPANEL procedures in SAS software.