Certain of Uncertainty

DAVID DRAPER'S WORK includes developing statistical methods for dealing with some of the thorniest problems facing modern society, such as how to evaluate the quality of hospitals and schools, and how to assess the risks of nuclear waste disposal. His fellow statisticians in UCSC's newly forming Department of Applied Mathematics and Statistics study problems ranging from rainfall prediction to the interpretation of electrocardiogram readings from heart patients.
But if Draper sometimes sounds more like a history professor than a statistician when he talks about his work, it may be because the history of his field is so intriguing.

You might say it all started in 1654 in the court of King Louis XIV, when gambling was all the rage. An inquiry from a French nobleman, the Chevalier de Méré, about the chances of winning a certain game of dice seems to have prompted an exchange of letters between the two leading French mathematicians of the day, Pascal and Fermat. Their correspondence laid the foundation for the mathematics of probability.

"The interesting thing is that right from the beginning, in the original exchange of letters between Pascal and Fermat, two completely different notions of probability were developed side by side," says Draper, professor and chair of the department in UCSC's Baskin School of Engineering.

Probability theory was essential to the development of statistics, which took shape mostly in the 20th century as a set of mathematical tools for analyzing data from experiments and observations.

The two views of probability first put forth by Pascal and Fermat eventually gave rise to two very different approaches to statistics, now known as the frequentist (or relative frequency) and Bayesian approaches. Until recently, the frequentist approach has dominated the field.

The frequentist theory of probability is based on the idea of repetition: How frequently would a particular outcome occur in repeated trials under the same conditions? It's a natural approach for gambling, but it only works for things that are repeatable, like rolling a pair of dice. Suppose you're interested in the probability that Al Gore will run for president in 2004. From the frequentist perspective, the problem doesn't even make sense, because the event in question is a one-time occurrence that can't be repeated.
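
The frequentist definition is easy to demonstrate by brute force. The short simulation below (a sketch of the idea, not code from the article) estimates the probability that two dice sum to 8 by repeating the experiment many times and counting:

```python
import random

# Frequentist probability as long-run relative frequency: repeat the
# "experiment" many times and count how often the outcome occurs.
trials = 100_000
hits = sum(
    1 for _ in range(trials)
    if random.randint(1, 6) + random.randint(1, 6) == 8
)
print(f"estimated probability: {hits / trials:.4f}")
print(f"exact probability:     {5 / 36:.4f}")  # 5 of the 36 equally likely outcomes
```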

But Pascal and Fermat also described a "subjective" approach to probability. Imagine placing a bet on some proposition - say, that Gore will run in 2004 - and asking yourself what odds you would have to give or receive to judge the bet fair. The judgment is subjective, but it lets you quantify your uncertainty about the proposition.
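
The arithmetic that turns a fair bet into a probability is simple. If you judge odds of a to b against a proposition to be fair, your personal probability for it is

$$p = \frac{b}{a+b},$$

so accepting 3-to-1 odds against a Gore run as fair, for example, amounts to assigning the proposition a probability of 1/(3+1) = 0.25. (The numbers here are purely illustrative.)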

This approach was further developed in the mid-18th century by the Reverend Thomas Bayes, an English Presbyterian minister and amateur mathematician. The theorem that would eventually make him famous was discovered among his papers and published posthumously in 1764. But it took a couple of centuries and a series of important contributions from other mathematicians before Bayes's ideas blossomed into a whole new approach to statistics.

What Bayes did was to develop a mathematical formula for revising subjective probability in the light of new evidence. Bayes showed how to use this approach to draw inferences about future events based on the results of previous trials. The example he used involved rolling billiard balls on a table, but his method turns out to be so general that it is applicable to a wide range of statistical problems, from analyzing the results of clinical trials to economic decision making.
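
In modern notation, Bayes's theorem says that the updated ("posterior") probability of a hypothesis H after seeing evidence E is

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)},$$

where P(H) is the probability assigned before seeing the evidence and P(E | H) measures how strongly the hypothesis predicts that evidence.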

Within the past decade or so, the Bayesian approach has gone from being a controversial theory on the margins of mainstream statistics to being widely accepted as a valuable alternative to the frequentist approach.

"People now realize that there is merit in both ways of looking at the world, and to insist on using only one would be like fighting with one hand tied behind your back," Draper says.

As Draper builds the Department of Applied Mathematics and Statistics, the emphasis in the statistics group is clearly Bayesian. (See story below for a description of the applied math group.)

"The two main areas in which the department aims to achieve excellence are Bayesian statistical methods and mathematical modeling of complex natural phenomena," Draper says. "The focus in both cases is on solving real-world problems in engineering and the sciences."

One of the desirable features of the Bayesian approach to statistical inference and decision making is that it provides a straightforward way to combine new information with existing knowledge.

"You can use all of the existing information you have on the problem and build that into your statistical model in a very natural way," says Bruno Sansó, a visiting associate professor in UCSC's statistics group.


David Draper

Photo credit: r. r. jones

Much of the controversy around Bayesian ideas has stemmed from philosophical resistance to letting subjectivity play a role in the scientific process. In order to evaluate new evidence relating to some hypothesis, a Bayesian has to establish a "prior" probability of the hypothesis being correct. Different people, even experts in the field, may well come up with different values for the prior probability.

But Bayesians point out that there is no truly objective method of quantifying uncertainty. The Bayesian approach confronts this problem directly, Draper says, because it explicitly allows for the fact that different people will make different assessments.

"There are simple things that we can all agree on the probability of, but when you get into more complicated situations you discover there are always elements of judgment involved," he says. "People have gradually and grudgingly come to understand that the objectivity we hoped to get from the frequentist approach is a myth, and what we should be doing instead is to be as clear as possible about what we assume and to show whether different assumptions all lead to the same outcome or not."

The biggest stumbling block for Bayesian statistics, however, was not the issue of subjectivity but the complexity of the math. At the heart of the Bayesian approach is a tremendously difficult mathematical task involving a type of calculus called multiple integration. Most calculus students struggle with problems involving integrals in two dimensions, and would find an integral in, say, 300 dimensions impossible to compute. For centuries, such problems were effectively unsolvable, and the application of Bayes's theorem was restricted to fairly simple situations.
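
The difficulty is that every quantity a Bayesian wants to report is an average over the posterior distribution. For a model with, say, 300 unknown parameters, even one such average is an integral over all 300 dimensions at once:

$$E[g(\theta) \mid \text{data}] = \int \cdots \int g(\theta)\, p(\theta \mid \text{data})\; d\theta_1 \cdots d\theta_{300}.$$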

"There was a long period in which Bayesians couldn't really make computations in complicated statistical situations," Draper says.

Two developments eventually came together to make Bayesian statistics more than just an interesting theory. One was the discovery of mathematical techniques for handling high-dimensional integration problems, and the second was the advent of computers fast enough to actually do the calculations in a reasonable time.
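
The key mathematical techniques were Monte Carlo methods, above all Markov chain Monte Carlo (MCMC), which sidestep the integration entirely: a simple random walk is steered so that, in the long run, it visits parameter values in proportion to their posterior probability. Here is a minimal sketch of the Metropolis algorithm, the ancestor of these methods, applied to a toy one-dimensional posterior (the target density is chosen only for illustration):

```python
import math
import random

def log_post(theta):
    # Unnormalized log posterior density; a standard normal stands in
    # here for whatever posterior a real problem produces.
    return -0.5 * theta * theta

# Metropolis algorithm: propose a random step, accept it with a
# probability that depends only on the ratio of posterior densities,
# so the normalizing integral is never needed.
theta, samples = 0.0, []
for _ in range(50_000):
    proposal = theta + random.gauss(0.0, 1.0)
    delta = log_post(proposal) - log_post(theta)
    if delta >= 0 or random.random() < math.exp(delta):
        theta = proposal  # accept the move; otherwise stay put
    samples.append(theta)

# Posterior summaries become simple averages over the samples.
print(f"posterior mean estimate: {sum(samples) / len(samples):.3f}")  # near 0
```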

"In the early 1990s, at the moment when computers became fast enough to really make it possible, Bayesian ideas burst forward again and created a revolution in statistics," Draper says.

Although Bayesian reasoning is a key part of his approach to statistical problems, Draper is not about to jettison frequentist methods.

"If the frequentist and Bayesian approaches are like boxers who have been punching each other for the past 350 years, both boxers are still standing, which to me means that there must be elements of merit in both," he says. "Rather than choosing one paradigm or the other, I think our job is to create a fusion of the two that emphasizes the strengths of each and de-emphasizes their weaknesses."

Draper's way of doing this is to use a Bayesian approach to formulate inferences and predictions, and then evaluate how good they are, using frequentist methods. It's a way to "keep us honest," he says.

"The main potential weakness of the Bayesian approach is that nothing guarantees that my uncertainty assessment is any good for you - I'm just expressing an opinion," Draper says. "To convince you that it's a good uncertainty assessment, I need to show that the statistical model I created makes good predictions in situations where we know what the truth is, and the process of calibrating predictions against reality is inherently frequentist."

UCSC's applied mathematics and statistics group: (l-r, standing) Bruno Mendes, Roberto Sassi, Hong Zhou, Raquel Prado, Hongyun Wang, Peter Grünwald, Shufeng Liu; (l-r, seated) Neil Balmforth, Bruno Sansó, David Draper

Photo credit: r. r. jones

Draper has been working on health policy issues since the mid-1980s, when he was part of a large project at the RAND Corporation, a southern California think tank, studying the cost-effectiveness of the Medicare system on behalf of the federal government. A central problem he worked on then - how to measure the quality of care that hospitals offer their patients - is still a major focus of his research.

In a nutshell, there are good ways of assessing quality that are too expensive to be practical on a large scale, and there are cheaper ways that yield less reliable information. Draper's work is aimed at finding a combination of assessment strategies that can yield good information at a reasonable cost.

"That's a problem we are able to come much closer to solving now using Bayesian techniques," he says.

Quality assessment in health care has a parallel in education, where the quality of schools is always an issue. To a statistician, the data look much the same whether they describe patients in hospitals or students in schools. Both situations require what statisticians call hierarchical modeling, and the Bayesian approach turns out to be naturally suited to this task.
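
Schematically, a two-level hierarchical model for hospital quality might look like this (the notation is illustrative; the models used in practice are far richer): the outcome of patient i in hospital j varies around that hospital's underlying quality, which in turn varies around an overall level across hospitals,

$$y_{ij} \sim N(\theta_j, \sigma^2), \qquad \theta_j \sim N(\mu, \tau^2).$$

Replace patients with students and hospitals with schools and the structure is unchanged.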

"Any techniques you create for measuring quality in hospitals would also be useful for measuring quality in schools," Draper says.

Bayesian methods have proven useful in a wide range of disciplines, in part because they are more flexible and general than other approaches. Bayesians are not stymied by incomplete data sets or multiple sources of uncertainty.

"Bayesian methods handle the messiness of real-world scientific data better than other approaches," says David Haussler, University Professor of computer science and director of UCSC's Center for Biomolecular Science and Engineering. Haussler has used Bayesian methods to analyze human genome data, and one of his graduate students, Chuck Sugnet, is now working with Draper to devise statistical methods for handling certain types of genetic data.

Raquel Prado, an assistant professor of applied mathematics and statistics, uses Bayesian methods to analyze the signals from biomedical devices, such as electroencephalograms and electrocardiograms. Her work may enable physicians to extract more information about a patient's health or prospects for recovery from these kinds of tests.

Lecturer Marshall Sylvan uses frequentist and Bayesian methods to solve a variety of problems, including predicting the scores wines receive in expert tastings from their chemical composition. Sylvan has taught statistics at UCSC since 1965, when he was one of the campus's founding instructors.

Sansó uses Bayesian methods to predict rainfall and gain insights into climate patterns. He and his colleagues in the department also hope to develop collaborations in new areas of research.

"Statistics is a tool, and you can apply the same methods to many different issues - in engineering, psychology, economics, the environment - that's one of the beauties of this job," Sansó says.

- Tim Stephens


Mathematical solutions for complex problems

UCSC's Department of Applied Mathematics and Statistics encompasses two related but distinct disciplines. Both statistics and applied mathematics involve using mathematics to solve real-world problems, but they address different kinds of problems. The applied mathematics group at UCSC focuses on developing mathematical models of complex natural phenomena, ranging from ocean circulation to molecular motions in biological systems.

"Mathematics is the tool we use to understand the phenomena we're interested in," says Neil Balmforth, associate professor and head of the department's applied math group.

What distinguishes Balmforth, a mathematician who studies astrophysics (among other things), from an astrophysicist is the different expertise each brings to the investigation of, say, star formation. There is also a difference in the philosophy underlying their approaches to problems, Balmforth says.

TURBULENCERGB"It's the underlying mathematical language that's important to the applied mathematician," he says. "Similar mathematical techniques can help you understand many different problems."

The "cat's-eye" in this image (inset) is the result of waves propagating through a shear flow, a central problem in modeling the fluid flows encountered in oceanography and atmospheric sciences. Balmforth's dynamic model shows how the waves "break," overturning the flow into a series of vortices - a chain of cat's-eyes - that themselves break up into smaller vortices. This pattern has implications for the mixing of the oceans and atmosphere.

Photo credit: r. r. jones (right); Cat's eye image: Neil Balmforth

Balmforth's work on fluid dynamics and turbulence, for example, has applications in astrophysics, geophysics, and oceanography. Hongyun Wang, an assistant professor in the applied math group, studies problems in biophysics and molecular biology. He is particularly interested in the workings of the tiny molecular motors and pumps that operate within living cells. Lecturer Hong Zhou studies fluid dynamics problems in industrial processes.

In every case, their work involves creating sophisticated mathematical models of complex systems. Focusing on the mathematics does have dangers, however, according to Balmforth.

"You can get intoxicated by the mathematical elegance of the problem and do something that may be very elegant but is not really applicable to the problem that motivated it," he says. "Collaboration is one of the best ways to make sure you have a good handle on the application, because you're working with someone who really cares about using the results."

Balmforth says he looks forward to developing collaborations with researchers in other departments on campus. "That's part of what motivated the creation of this department, to lend a mathematical hand to other researchers."

- Tim Stephens
