Friday, March 02, 2007

The correct interpretation of Dr. Andrey Feuerverger's 1:600 odds calculation

I am happy to provide a guest post from Dr Joe D'Mello on the question of the interpretation of the statistical claims made by the makers of the forthcoming Discovery Channel programme, The Lost Tomb of Jesus. I have reproduced the post in full below, but have also made it available in PDF:

Available here as PDF

I would particularly like to draw your attention to the request for additional data below. Does anyone know a good place for the collection of the relevant pieces of data? Thanks.
------------------------
The correct interpretation of Dr. Andrey Feuerverger's 1:600 odds calculation

Joe D'Mello

There has been plenty of discussion focused on the validity of the numbers and the assumptions used in Dr Andrey Feuerverger's calculation that results in a 1:600 odds claim. While that discussion is certainly interesting, there is a more fundamental issue associated with the very interpretation of this 1:600 odds calculation.

I am a mathematician or, strictly speaking, a former mathematician. After earning my Ph.D. in mathematics from Ohio State University, I worked at Bell Labs and then in the corporate business world for about 15 years before starting my own management training and consulting company (Exequity Inc.). However, I have not severed my ties with the academic and scholarly community, and I still teach operations management, project management, and quantitative business courses in some MBA programs. (I earned my MBA from the Kellogg School, Northwestern University, in 2001.)

First, I would like to point out that Dr. Andrey Feuerverger's calculation is nothing very fancy, involving only very basic mathematical probability that is taught in many undergraduate programs and business schools. It is conceptually no deeper than a problem I could include on a take-home final exam for my MBA students. Actually, several decades ago, when I was a teaching assistant at the State University of New York, I remember giving my undergraduate freshman class problems that required this level of understanding of mathematical probability.

Second, I am willing to accept the 1:600 result that Dr. Andrey Feuerverger has computed. However, it is the INTERPRETATION of this 1:600 result that is of crucial significance here. The media are touting this 1:600 result as:

Interpretation A: “There is only a 1 in 600 chance that this is NOT the Jesus family tomb.” OR, equivalently, “There is a 599 in 600 chance that this IS the Jesus family tomb.”

This interpretation is mathematically, statistically, and semantically flawed, and I am sure that Dr. Andrey Feuerverger is well aware of that. I am really shocked that an individual of his stature would not set the record straight on this and try to make sure that the public knows the correct interpretation. Then again, the truth does not always make for good business or popular TV. Using numbers and language precisely often runs contrary to the goals of advertising! It is generally more advantageous to advertisers to word numerical results and statistical findings in a manner that appears precise and impressive without necessarily being so.

If you read through Dr. Andrey Feuerverger’s calculation at the end of the PDF file on the Discovery Channel website, it is clear that he is restricting his “population” (in a statistical sense) to the roughly 1,000 tombs found in the geographic area in question. He is not basing his calculation on the overall Jewish populace in the area and the time period in question. So, the correct interpretation of his 1:600 odds calculation is:

Interpretation B: "There is a 1 in 600 chance that this particular cluster of names would occur in one of the roughly 1,000 tombs discovered so far"

An alternative but equivalent (to B) interpretation of the 1:600 odds result is:

Interpretation C: "If the Jesus family did indeed have a family tomb (that was among the 1,000 found), then there is a 599 in 600 chance that this particular tomb found is indeed that of the Jesus family"

Clearly, these latter and correct interpretations (B and C) would not sell the TV program very well! What Dr. Andrey Feuerverger has calculated here is known in probability theory as a “conditional probability” (more about that later!). This means that you are calculating the probability of one event on the condition that another has occurred.

If Cameron wants to invoke probability to make his point - and I commend him for trying to do that – then the more relevant probability that he should have gone after is:

“Suppose that (for argument's sake) the cluster of names in question did in fact occur in Jesus' family (assuming that Mariamne was part of that family). Then, what is the probability that there would be at least one other Jewish family in the geographic area in question that had the same name cluster?”

I suspect that if this probability is calculated it would burst Cameron’s bubble and sink his story faster than the Titanic! I would be happy to calculate this probability but would need (ideally) the following data:

1. The name cluster that would make sense to work with (based on the facts known to leading New Testament scholars)

2. The frequencies of these names from a gender perspective. For example: 1 out of every 6 women was named Mary, 1 out of every 12 males was named Jesus, etc.

3. The appropriate geographic area and time period (example: 10 to 110 AD) to consider for this calculation and the population of males and females in that area during that entire time period

4. The percentage of families at that time that would have had family tombs

If I can get help assembling this data, I will be able to compute the probability quickly.
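Once the four inputs listed above are in hand, the arithmetic itself is short. What follows is only a minimal sketch in Python, under a simplifying assumption that is not part of the post: every family is treated as independent and equally likely to carry the full name cluster. The function name and the numbers in the last line are hypothetical placeholders, not estimates.

```python
def prob_at_least_one_other(p_cluster: float, n_families: int) -> float:
    """Chance that at least one of n_families independent families
    carries the name cluster, if each does so with probability p_cluster."""
    # Complement rule: 1 minus the chance that no family has the cluster.
    return 1.0 - (1.0 - p_cluster) ** n_families


# Placeholder inputs only; real values would have to come from items 1-4 above.
print(prob_at_least_one_other(p_cluster=0.001, n_families=1000))
```

The two arguments correspond to item 2 (name frequencies, combined into a per-family probability) and items 3 and 4 (the number of families in the area and period that had tombs).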

There is another avenue one can take here that uses Dr Andrey Feuerverger's own calculation to calculate a probability far more relevant to Cameron’s claim! To explore that avenue, let’s get back to the notion of conditional probability! Recall that the correct 1:600 odds interpretation is:

Interpretation C: "If the Jesus family did indeed have a family tomb (that was among the 1,000 found), then there is a 599 in 600 chance that this particular tomb found is indeed that of the Jesus family"

Those of you mildly comfortable with quantitative concepts and probability terms should be able to follow the next few computations. The others can just skip the computations and read the text conclusions.

Let B be the event that the 1,000 (approximately) family tombs found to date in the area in question included the Jesus family tomb among them; and let A be the event that the particular tomb found is that of the Jesus family. In the language of probability, Dr. Andrey Feuerverger has calculated:

P(A|B) (read as “the probability of A given that B has occurred”) and he estimates it to be about 599/600

We know from classical probability theory that

P(B) * P(A|B) = P(A and B) (* stands for multiplication)

Now, P(A and B) is the probability that the Jesus family had a family tomb AND that the tomb discovered is that of the Jesus family. Note that the media are taking P(A|B) (Feuerverger’s 599/600 number) and wording it in a manner that makes it appear to the general public that it is in fact P(A and B). This is a fallacy and an outright deception! It behooves Dr. Andrey Feuerverger as a respected member of the academic community to set the record straight here.

To calculate P(A and B) we would need to estimate P(B) and THEN use Dr. Andrey Feuerverger’s 599/600 number. Note that several experts, including Professor Amos Kloner (of Bar-Ilan University in Israel) have strongly asserted that there is a very small, if any, likelihood that the Jesus family had a tomb to begin with. So, for illustration only, suppose we assumed that there was a 1 in 10 chance that the Jesus family had their own tomb to begin with. This means that P(B) would be roughly 1/10. Using the formula above:

P(B) * P(A|B) = P(A and B)

We see that P(A and B) = (1/10) * (599/600) = 0.1 (approximately). This immediately slashes the probability of the discovered tomb being that of the Jesus family down to 0.1 or 10%. In other words, there is then only a 10% chance that the discovered tomb belongs to the Jesus family – a number not likely to draw a runaway TV audience for Cameron!
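For readers who want to check the arithmetic, here is a short sketch in Python using exact fractions. The 1/10 value for P(B) is the illustrative assumption made above, not an estimate, and the variable names are chosen here for readability.

```python
from fractions import Fraction

p_b = Fraction(1, 10)             # illustrative assumption from the text: P(B)
p_a_given_b = Fraction(599, 600)  # Feuerverger's figure, read as P(A|B)

# Multiplication rule: P(A and B) = P(B) * P(A|B)
p_a_and_b = p_b * p_a_given_b

print(p_a_and_b, float(p_a_and_b))
```

The exact product is 599/6000, which is just under 0.1. Any other assumed value of P(B) can be substituted on the first line to see how strongly the final figure depends on it.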

Finally, I would like to note that, in the spirit of intellectual honesty and fairness, I sent two e-mails to Dr. Andrey Feuerverger at his University of Toronto email address (copies below). The e-mails requested a detailed write-up of his assumptions and calculations (and an interpretation of the results, of course). The second email was copied to the University of Toronto President, Dr. David Naylor. Neither of them has responded to date.

I would really love to have an open and honest discussion (maybe on this blog!) with Dr. Andrey Feuerverger and find out if he agrees or disagrees with what I have claimed above. If anyone can induce him to enter into a discussion that would be great!

Date: Wed, 28 Feb 2007 08:59:07 -0800 (PST)
From: Joe D'Mello
Subject: Fwd: Request for assumptions & calculations
To: andrey@utstat.toronto.edu
CC: president@utoronto.ca

Dear Professor Feuerverger,

Since I did not hear back from you on the email I sent yesterday (copy below), I sent a formal request today to Discovery Channel requesting the assumptions and details underlying your calculations, and am also copying your president, Dr. David Naylor, on this email. I'm sure that as a respected faculty member of a university of worldwide repute, any professional assertions you make, especially in matters that have profound historical significance, will have sound documentation and analysis, and will pass the highest levels of academic scrutiny and peer review.

The brief numerical calculation in the pdf document on The Discovery Channel website raises more questions than it answers, and it appears to me that the logic is flawed. However, I cannot be sure unless I can inspect the detail and assumptions underlying your calculations. Will it be possible for you to send me these? Better still, could you kindly post that detail (at a level comparable to that of a scholarly research publication) on the Discovery Channel website, so that fellow academics can have the opportunity to understand and appreciate your work?

Best regards,

Dr. Joe D'Mello


Joe D'Mello wrote:

Date: Tue, 27 Feb 2007 08:49:19 -0800 (PST)
From: Joe D'Mello
Subject: Request for assumptions & calculations
To: andrey@utstat.toronto.edu

Dear Professor Feuerverger,

As a fellow mathematician I am sending you this email to request your calculations (and associated assumptions) for the probability numbers being circulated in the media about the 600:1 odds (attributed to your calculations) that the tomb belonged to Jesus's family. I am generally suspect of media coverage, and want to get the real scoop directly from you, so I can get a better understanding of the assumptions and the true interpretation of these odds. I would appreciate any information you can provide in this regard.

Best regards,

Joe D'Mello

Chicago, USA

---------------------------------------------

14 comments:

Curious Presbyterian said...

Dr. Richard Bauckham would have all the data that you require to make the calculation. Some of it is found in chapter 4 of his recent book 'Jesus and the Eyewitnesses'. He can be contacted at the University of St. Andrews, Scotland, or via:
rjb@st-andrews.ac.uk

Chris Rosebrough said...

I've written a comprehensive rebuttal of the film's claims. Please read it and decide for yourself whether the film's claims are solid or a hoax.

You will find it at extremetheology.com

Jim Deardorff said...

Your earlier arguments are most germane, Mark, and so also those of several of your commenters. I.e., that additional facts that do not support the Jesus-tomb claim also need to be taken into account in any proper statistical analysis. And there is a sound statistical way to take account of all pieces of information that tend not to support a case as well as all that support it. It's called Bayesian statistics.

There, one must be able to assign a probability, p, to each mutually exclusive piece of information that tends to go against the claim (p less than 0.5), as well as to each one that supports the claim (p greater than 0.5). Then all these individual probabilities can be accumulated, via a relatively simple Bayesian statistics formula, into a single overall probability.

Each item of input needs to be as correct as possible. E.g., instead of Dr. Feuerverger's probability (599/600) as one item, one would use the (much smaller) result from Dr. D'Mello's analysis that takes into account the conditional probability.

One proviso is that ALL information bearing on the question needs to be included in the Bayesian analysis. This would include, e.g., the probability assigned to the lack of mention in the Gospels or elsewhere of Jesus having a wife -- there could be a reason for that, but also reasons against it that would lead to a best estimate of the probability stemming from that one datum point alone.

Like Mark, I'm also no statistician, but have studied this type of problem within the framework of Bayesian statistics.

Jim Deardorff

Eliezer Yudkowsky said...

Readers having trouble with the Bayesian statistics should see An Intuitive Explanation of Bayesian Reasoning.

glee20@uic.edu said...

The 1/600 statistic is incorrect on a number of levels.

First, from an interpretive standpoint you have to understand that the final number they are attempting to calculate is an expected value, not a probability. The number represents the average number of tombs you would expect to find with those names if you could investigate every tomb if there were 1000 total tombs and each tomb had a certain number of names. (This is known as an "expected value" or "average value.") They attempt to calculate the probability of a given tomb having a particular set of names, then multiply that probability by the total number of tombs. This tells you how many tombs you expect to find on average.

Therefore, the number they come up with is highly dependent on the number of tombs during that time period. The higher or lower the total number of tombs, the higher or lower number of tombs with those four names you would expect to find.

Thus the number they are trying to calculate (even if they got it right, which they don't) does not tell you the likelihood of finding a family during that time with those four or five names in them. In other words, there might be many, many families with people of those names in them, but because only a small percentage of families had tombs, the number of tombs you expect to find is small.
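The distinction drawn in this comment can be sketched in a few lines of Python. The per-tomb probability below is a hypothetical value, chosen only so that 1,000 tombs gives an expected count of 1/600; the function name is likewise just for illustration.

```python
from fractions import Fraction


def expected_matching_tombs(p_per_tomb: Fraction, n_tombs: int) -> Fraction:
    """Expected number of tombs showing the name set: n * p."""
    return n_tombs * p_per_tomb


p = Fraction(1, 600_000)  # hypothetical per-tomb probability, for illustration

# The expected count scales in direct proportion to the number of tombs.
for n in (500, 1000, 4000):
    print(n, expected_matching_tombs(p, n))
```

Doubling the assumed number of tombs doubles the result, which is why the figure says more about how many tombs there were than about how common such families were.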

Anonymous said...

I'm afraid that Dr. D'Mello's Interpretation B and Interpretation C do NOT accurately represent the correct interpretation of Dr. Feuerverger's 1/600 calculation.

The methodology employed by Feuerverger leads to an expected value, not probability odds.

What 1/600 represents is the (incorrect) expected value of how many tombs one would find with the four names in that exact order, if 1000 tombs all had exactly four names. We ignore the two other names and the unnamed ossuaries for this calculation.

If you look at what Feuerverger did, he multiplied the probabilities of the four names. This gives you the probability in one tomb of finding those four names in that order, if you only have four total names. If you multiply this probability by 1,000, this gives you the expected value.

However, when you factor in the fifth name, sixth name, and the four unnamed ossuaries, the expected value changes dramatically. Furthermore this is a drastic simplification as tombs would be expected to have varying numbers of names.

Anonymous said...

I might be missing something here, but if the statistician on the programme is simply talking about expected values, then it seems that the 1/600 figure is pretty meaningless. If all we are saying is that the EV of finding a specific combination of names in 1000 tombs in first century Jerusalem is 1/600, then how does that add to the debate?

Within any sample of the population there are bound to be lots of combinations of names that are intrinsically unlikely, but in order to identify a specific combination with an actual family in the wider population, you would need to take into account the whole of the population, not simply the sample. Therefore, even if we are talking about EV's rather than probabilities, I think that Dr. D'Mello's points still hold.

Does anyone have any relevant population data for the period?

Stuart

Anonymous said...

"So, for illustration only, suppose we assumed that there was a 1 in 10 chance that the Jesus family had their own tomb to begin with.
...
We see that P(A and B) = (1/10) * (599/600) = 0.1 (approximately)."

-- This is already accounted for. This is the bias error calculation. They've divided by 4 (not 10).

It basically means... assuming there are another 3000 tombs undiscovered... the likelihood of finding these THREE names together is 1 in 600.

You should find 5 more tombs like this, theoretically, if there are a ton of undiscovered tombs.
*******
Then again, if you were looking at alternative calculations...
they say no other tombs say "Mariamne". So that makes her 1 in 1000. Finding this name in connection with others has got to increase the odds.

Anonymous said...

Yes, anonymous, you are correct. He already tried to account for the fact that there would be more tombs than we know about. It is disturbing that someone trying to set the record straight would miss this key fact.
As far as the comment about the specific order being important, there is nothing wrong about that since the order in this case would be the familial relationships, which is important!

Dianelos said...

Well, Prof. Feuerverger, the professional statistician who in the film says that there is only a 1/600 probability that the Talpiot tomb does not belong to Jesus's family, has already backtracked from his claim. See it in his own words here: http://fisher.utstat.toronto.edu/andrey/OfficeHrs.txt (the relevant bit is "I now believe that I should not assert any conclusions connecting this tomb with any hypothetical one of the NT family.") So there is really no question that the statistics in the film are bogus.
Actually, the discovery site explains the statistics of the movie makers (go to http://dsc.discovery.com/convergence/tomb/explore/explore.html, click on "Enter the Tomb", then on "Supporting Evidence", and finally "Statistical Evidence"). Clearly the question they asked was: How probable is it to find this combination of suggestive names in a tomb? The answer given to them by Prof. Feuerverger basically was: The frequencies of "Jesus son of Joseph", "Maria", "Joseph", and "Mariamene" (the relevant names found in the Talpiot tomb) are 1/190, 1/160, 1/20, 1/4 respectively; therefore the probability of finding them all in one tomb is their product, i.e. about 1/2,400,000. Let's conservatively cut the denominator to a quarter in order to account for biases in the historical sources; we then get that the probability of finding this particular mix of names in one tomb is 1/600,000. There are a thousand tombs around Jerusalem from this era; therefore the probability of any one tomb having these names is 1/600.
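The chain of arithmetic summarized in this comment can be reproduced with exact fractions. This is only a sketch of the steps as the comment describes them, with the frequencies assigned to names in the order the comment lists them; it is not a reconstruction of Prof. Feuerverger's own working.

```python
from fractions import Fraction

# Name frequencies exactly as listed in the comment.
freqs = {
    "Jesus son of Joseph": Fraction(1, 190),
    "Maria": Fraction(1, 160),
    "Joseph": Fraction(1, 20),
    "Mariamene": Fraction(1, 4),
}

joint = Fraction(1)
for f in freqs.values():
    joint *= f              # exact product, quoted above as about 1/2,400,000

adjusted = joint * 4        # bias adjustment: denominator cut to a quarter
per_1000 = adjusted * 1000  # scaled by the roughly 1,000 known tombs

print(joint, adjusted, per_1000)
```

The exact product is 1/2,432,000, so the final figure comes out at 1/608 before it is rounded to the 1/600 quoted in the film.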

So? How do they get from this answer to the conclusion that there is "a high statistical probability that the Talpiot tomb is the Jesus Family tomb" as the discovery site says? This would *only* follow if we knew 1) that Jesus family was entombed in Jerusalem and 2) that their tomb would have the names "Jesus son of Joseph", "Maria", "Joseph" and "Mariamene" inscribed in them. But neither 1) nor 2) are likely. Indeed most people would think that 2) is highly unlikely, especially taking into account that also an inscription "Judas son of Jesus" was found in the Talpiot tomb.

What's worse, you don't just simply multiply probabilities. Suppose you put two coins in a match box, toss it, open the box and find one coin showing heads and the other tails. How probable is that result? The probability of heads is 1/2 and the probability of tails is also 1/2 so, according to the film's logic, the probability of getting one heads and one tails is their product, i.e. 1/4. But that's grossly wrong. The correct probability of getting one heads and one tails is 1/2 (if you don't believe it try the experiment and count the results :-) The correct probability of finding a tomb around Jerusalem with ossuaries with inscriptions "Jesus son of Joseph", a "Mary", a "Joseph" and a "Mariamene" is in fact approx. 1/170. But again, that probability has little to do with the question at hand.
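The two-coin example is small enough to check by listing every outcome. A minimal sketch in Python, with variable names chosen here for illustration:

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes for two coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Outcomes with one heads and one tails, in either order.
mixed = [o for o in outcomes if set(o) == {"H", "T"}]

p_specific_order = Fraction(1, 2) * Fraction(1, 2)   # heads on coin 1, tails on coin 2
p_either_order = Fraction(len(mixed), len(outcomes))  # one of each, order ignored

print(p_specific_order, p_either_order)
```

Multiplying the two probabilities gives 1/4, the chance of one particular arrangement; counting both arrangements gives 1/2.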

The question at hand of course is: Given that we found a tomb with this combination of suggestive names, what is the probability that this tomb belongs to Jesus's family? This is the question I tried to answer using Monte Carlo simulation.

My first analysis shows that there were about 12 families in ancient Jerusalem who might have produced a tomb with just as unlikely a combination of names (that according to the gospels belong to Jesus's family) as the Talpiot tomb. So, *at best*, there is a 1/12 probability of the Talpiot tomb being Jesus's.

My second analysis shows that even assuming that the family of Jesus has the members the film makers hypothesize (Jesus son of Joseph, Mary, Joseph, Mary), and even assuming that all these members would be buried in a potential Jesus family tomb, the film makers' method of identifying a tomb as being Jesus's based on the very low probability of finding a particular combination of names in it would be correct in only about 8% of the positive identifications (so again we get at best a probability of 1/12 for the Talpiot tomb being Jesus's). (I have posted details of both these analyses in Usenet.)

Some bring up the issue of the inscription "Mariamene". My understanding after reading http://benwitherington.blogspot.com/2007/03/smoking-gun-tenth-talpiot-ossuary_9874.html is that "Mariamene" is just an alternative form of the name Mary (or Miriam, or Maria, or Mariamme, etc). There is no good reason to believe that Mary Magdalene was called Mariamene and not Mary, as she is called "Mary" in the gospels and the earliest mention of "Mariamene" was written much later than the gospels (some 100 years later) in a rather primitive Gnostic text (the "Acts of Phillip", see: http://en.wikipedia.org/wiki/Acts_of_Phillip ). Neither do we have reason to believe that Mary Magdalene was married to Jesus and would possibly get a place in Jesus's family tomb, assuming that such a tomb exists. In fact the film makers make an assumption that depends on four (count them) "ifs": If Mary Magdalene was known as "Mariamene", and if she was married to Jesus, and if Jesus's family was entombed in Jerusalem, and if Mary Magdalene was entombed in Jesus's family tomb, then it's probable that the Talpiot tomb is Jesus's tomb. I analyzed this case too. It turns out that if we accept all these "ifs" then indeed there is a probability of 90% that the Talpiot tomb belongs to the family of Jesus - but still far less than the 99.8% (or 599/600) probability the film makers suggested as the most conservative one. In short the film's thesis is based on very shaky assumptions, and its math is wrong to boot.

glee said...

I would like to clarify my earlier remarks regarding "expected values."

I think all the folks, including Dr. D'Mello, who mention that you need Bayesian statistics to do any real sort of calculation are directly on point. I didn't mean to contradict that in any way.

My point was that the documentary statistics are even worse than people are saying, and that by talking about Bayesian statistics you are giving them too much credit, at least in so far as what was available at the doc website.

When I look at the pdf of the statistics provided on the documentary website, what I see is that the methodology they use gives you an expected value. They calculate the probability of finding four particular names at random in one tomb, then multiply by a thousand tombs and by a factor of four to correct for biases. I wish I could enter this formula in LaTeX on this blog, because you will see that this corresponds to the formula for expected value.

As far as whether the order matters or not, the family relationship has nothing to do with the order, in a statistical sense. Think of a poker hand -- if you draw three queens and two jacks, it doesn't matter what order you draw them in, so long as you end up with those five cards. Similarly, when you are calculating the probability of finding four names at random in one tomb, if you simply multiply the four individual probabilities, you are calculating the odds of finding them in a particular order. This is basic first semester combinatorics.
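The poker illustration can be checked directly. The sketch below, in Python, compares the chance of drawing three queens and then two jacks in exactly that order with the chance of ending up with those five cards in any order; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

# Three queens then two jacks, drawn in exactly the order Q, Q, Q, J, J.
p_one_order = (Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)
               * Fraction(4, 49) * Fraction(3, 48))

# Number of positions the three queens can occupy among the five draws.
n_orders = comb(5, 3)

# Three queens and two jacks as a final hand, order ignored.
p_any_order = Fraction(comb(4, 3) * comb(4, 2), comb(52, 5))

print(p_one_order * n_orders == p_any_order)
```

Multiplying the individual draw probabilities gives only one of the ten possible orderings, so the chance of holding the hand is ten times larger than the straight product.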

glee said...

P.S. I also agree with much of the sentiments expressed by dianelos above.

glee20 said...

P.P.S. As a further poker illustration, the odds of a full house in five card draw are different than the odds of a full house in Texas Hold'Em, because in the latter you can choose the best five cards out of seven.

The calculation shown on the doc website simply ignores the presence of the other ossuaries. This is analogous to calculating seven card odds while only using five cards to calculate with. It skews your results.

donbock said...

I don't know if this helps, but I noticed this posting at Dr. Feuerverger's web site:
http://fisher.utstat.toronto.edu/andrey/OfficeHrs.txt
This is an open letter to statistical colleagues.