June 16, 2009
Because You Really Wished I'd Write More About Books
... I present for your amusement my reviews of Flynn on the Flynn Effect, and Tett on synthetic collateralized debt obligations and their kin. The former is the original version of something which had to be drastically cut down to fit into American Scientist; the latter appears nowhere else. (But, y'know, make me an offer.) Anyone who tries to take this as an opportunity to drag me back into arguing about IQ will be ignored.
Posted by crshalizi at June 16, 2009 15:37 | permanent link
On the Certainty of the Bayesian Fortune-Teller
Attention conservation notice: 2300 words of technical, yet pretentious and arrogant, dialogue on a point which came up in a manuscript-in-progress, as well as in my long-procrastinated review of Plight of the Fortune Tellers. Why don't you read that book instead?
Q: You really shouldn't write in library books, you know; and if you do, your marginalia should be more helpful, or less distracting, than just "wrong wrong wrong!"
A: No harm done; my pen and I are both transparent rhetorical devices. And besides, Rebonato is wrong in those passages.
Q: Really? Isn't his point that it's absurd to pretend you could actually estimate something like a probability of an interest rate jump so precisely that there's a real difference between calling it 0.500 000 and calling it 0.499 967? Isn't it yet more absurd to think that you could get the 99.5 percent annual value-at-risk --- the amount of money you'd expect to lose once in two hundred years --- down to four significant figures, from any data set, let alone one that covers just five years and so omits "not only the Black Death, the Thirty Years' War, the Barbarian invasions, and the fall of the Roman Empire, but even the economic recession of 1991 --- the only meaningful recession in the last twenty years" (as of 2006), to say nothing of the "famous corporate loan book crises of the Paleochristian era" (p. 218)?
A: Of course all that's absurd, and Rebonato is right to call people on it. By the time his book came out it was too late to do much good, but if people had paid attention to such warnings I dare say we wouldn't be quite so badly off now, and they had better listen in the future.
Q: So what's your problem? Oh, wait, let me guess: you're upset because Rebonato's a Bayesian, aren't you? Don't bother, I can tell that that's it. Look, we all know that you've got objections to that approach, but at this point I'm starting to think that maybe you have issues. Isn't this sort of reflexive hostility towards a whole methodology --- something you must run into every day of work --- awkward and uncomfortable? Embarrassing, even? Have you thought about seeking help?
A: Actually, I have a serious point to make here. What Rebonato wants is entirely right-headed, but it fits very badly with his Bayesianism, because Bayesian agents are never uncertain about probabilities; at least, not about the probability of any observable event.
Q: But isn't Bayesianism about representing uncertainty, and making decisions under uncertainty?
A: Yes, but Bayesian agents never have the kind of uncertainty that Rebonato (sensibly) thinks people in finance should have.
Q: Let me try to pin you down in black and white. [Opens notebook] I have here on one side of the page our old friend, the well-known probability space (Omega, F, Prob). Coming out of it, in the middle, is a sequence of random variables X_1, X_2, ..., X_n, ..., which have some joint distribution or other. (And nothing really depends on its being a sequence, I could use a random field on a network or whatever you like, add in covariates, etc.) On the other side of the random variables, looking at them, I have a standard-issue Bayesian agent. The agent has a hypothesis space, each point m of which is a probability distribution for the random sequence. This hypothesis space is measurable, and the agent also has a probability measure, a.k.a. prior distribution, on this space. The agent uses Bayes's rule to update the distribution by conditioning, so it has a sequence of measures D_0, D_1, etc.
A: I think you are missing an "As you know, Bob", but yes, this is the set-up I have in mind.
Q: Now I pick my favorite observable event f, a set in the joint sigma-field of the X_i. For each hypothesis m, the probability m(f) is well-defined. The Bayesian thinks this is a random variable M(f), since it has a distribution D on the hypothesis space. How is that not being uncertain about the probability of f?
A: Well, in the first place ---
Q: I am not interested in quibbles about D being a Dirac delta function.
A: Fine, assume that D doesn't put unit mass on any single hypothesis, and that it gives non-zero weight to hypotheses with different values of m(f). But remember how Bayesian updating works: The Bayesian, by definition, believes in a joint distribution of the random sequence X and of the hypothesis M. (Otherwise, Bayes's rule makes no sense.) This means that by integrating over M, we get an unconditional, marginal probability for f:
P_n(f) = E_{D_n}[M(f | X_1 = x_1, X_2 = x_2, ..., X_n = x_n)]
Q: Wait, isn't that the denominator in Bayes's rule?
A: Not quite, that equation defines a measure --- the predictive distribution --- and the denominator in Bayes's rule is the density of that measure (with n=0) at the observed sequence.
Q: Oh, right, go on.
A: As an expectation value, P_n(f) is a completely precise number. The Bayesian has no uncertainty whatsoever in the probabilities it gives to anything observable.
Q: But won't those probabilities change over time, as it gets new data?
A: Yes, but this just means that the random variables aren't independent (under the Bayesian's distribution over observables). Integrating m with respect to the prior D_0 gives us the infinite-dimensional distribution of a stochastic process, one which is not (in general) equal to any particular hypothesis, though of course it lies in their convex hull; the simple hypotheses are extremal points. If the individual hypotheses are (laws of) independent, identically-distributed random sequences, their mixture will be exchangeable. If the individual hypotheses are ergodic, their mixture will be asymptotically mean-stationary.
Q: Don't you mean "stationary" rather than "asymptotically mean-stationary"?
A: No; see chapter 25 here, or better yet that trifler's authority.
Q: You were saying.
A: Right. The Bayesian integrates out m and gets a stochastic process where the X_i are dependent. As far as anything observable goes, the Bayesian's predictions, and therefore its actions, are those of an agent which treats this stochastic process as certainly correct.
Q: What happens if the Bayesian agent uses some kind of hierarchical model, or the individual hypotheses are themselves exchangeable/stationary?
A: The only thing that would change, for these purposes, is the exact process the Bayesian is committed to. Convex mixtures of convex mixtures of points in C are convex mixtures of points in C.
Q: So to sum up, you're saying that the Bayesian agent is uncertain about the truth of the unobservable hypotheses (that's their posterior distribution), and uncertain about exactly which observable events will happen (that's their predictive distribution), but not uncertain about the probabilities of observables.
A: Right. (Some other time I'll explain how that helps make Bayesian models testable.) And --- here's where we get back to Rebonato --- all the things he is worried about, like values-at-risk and so forth, are probabilities of observable events. Put a Bayesian agent in the risk-modeling situation he talks about, and it won't just say that the 99.5% VaR is 109.7 million euros rather than 110 million, it will give you as many significant digits as you have time for.
Q: So let me read you something from pp. 194--195:
"Once frequentists accept (at a given statistical level of confidence) the point estimate of a quantity (say, a percentile), they tend to act as if the estimated number were the true value of the parameter. Remember that, for a frequentist, a coin cannot have a 40% chance of being biased. Either the coin is fair or it is biased. Either we are in a recession or we are not. We simply accept or reject these black-or-white statements at a certain confidence level...
A Bayesian approach automatically tells us that a parameter (say, a percentile) has a whole distribution of possible values attached to it, and that extracting a single number out of this distribution (as I suggested above, the average, the median, the mode, or whatever) is a possibly sensible, but always arbitrary, procedure. No single number distilled from the posterior distribution is a primus inter pares: only the full posterior distribution enjoys this privileged status, and it is our choice what use to make of it."
This seems entirely reasonable; where do you think it goes wrong?
A: You mean, other than the fact that point estimates do not have "statistical levels of confidence", and that Rebonato has apparently forgotten about actual confidence intervals?
Q: Let's come back to that.
A: He is running together parameters of the unobserved hypotheses, and the properties of the predictive distribution on which the Bayesian acts. I can take any function I like of the hypothesis, g(m) say, and use it as a parameter of the distribution. If I have enough parameters g_i and they're (algebraically) independent of each other, there's a 1-1 map between hypotheses and parameter vectors --- parameter vectors are unique names for hypotheses. I could make parts of those names be readily-interpretable aspects of the hypothetical distributions, like various percentiles or biases. The distribution over hypotheses then gives me a distribution over percentiles conditional on the hypothesis M. But we don't know the true hypothesis, and on the next page Rebonato goes on to cast "ontological" doubt about whether it even exists. (How he can be uncertain about the state of something he thinks doesn't exist is a nice question.) We only have the earlier observations, so we need to integrate or marginalize out M, and this collapses the distribution of percentiles down to a single exact value for that percentile.
Q: Couldn't we avoid that integration somehow?
A: Integrating over the posterior distribution is the whole point of Bayesian decision theory.
Q: Let's go back to the VaR example. If you try estimating the size of once-in-two-hundred-year losses from five years of data, your posterior distribution has got to be pretty diffuse.
A: Actually, it can be arbitrarily concentrated by picking the right prior.
Q: Fine, for any reasonable prior it needs to be pretty diffuse. Shouldn't the Bayesian agent be able to use this information to avoid recklessness?
A: That depends on the loss function. If the loss involves which hypothesis happens to be true, sure, it'll make a difference. (That's how we get the classic proof that if the loss is the squared difference between the true parameter and the point estimate, the best decision is the posterior mean.) But if the loss function just involves what observable events actually take place, then no. Or, more exactly, it might make sense to show more caution if your posterior distribution is very diffuse, but that's not actually licensed by Bayesian decision theory; it is "irrational" and sets you up for a Dutch Book.
Q: Should I be worried about having a Dutch Book made against me?
A: I can't see why, but some people seem to find the prospect worrying.
Q: So what should people do?
A: I wish I had a good answer. Many of Rebonato's actual suggestions --- things like looking at a range of scenarios, robust strategies, not treating VaR as the only thing you need, etc. --- make a lot of sense. (When he is making these practical recommendations, he does not counsel people to engage in a careful quantitative elicitation of their subjective prior probabilities, and then calculate posterior distributions via Bayes's rule; I wonder why.) I would also add that there are such things as confidence intervals, which do let you make probabilistic guarantees about parameters.
Q: What on earth do you mean by a "probabilistic guarantee"?
A: That either the right value of the parameter is in the confidence set, or you get very unlucky with the data (how unlucky depends on the confidence level), or the model is wrong. Unlike coherence, coverage connects you to reality. This is basically why Haavelmo told the econometricians, back in the day, that they needed confidence intervals, not point estimates.
Q: So how did the econometricians come to make fetishes of unbiased point-estimators and significance tests of equality constraints?
A: No doubt for the same reason they became convinced that linear and logistic regression were all you'd ever need to deal with any empirical data ever.
Q: Anyway, that "get the model right" part seems pretty tricky.
A: Everyone is going to have to deal with that. (You certainly still have to worry about mis-specification with Bayesian updating.) You can test your modeling assumptions, and you can weaken them so you are less susceptible to mis-specification.
Q: Don't you get weaker conclusions --- in this case, bigger confidence intervals --- from weaker modeling assumptions?
A: That's an unavoidable trade-off, and it's certainly not evaded by going Bayesian (as Rebonato knows full well). With very weak, and therefore very defensible, modeling assumptions, the confidence interval on, say, the 99.5% VaR may be so broad that you can't devise any sensible strategy which copes with that whole range of uncertainty, but that's the math's way of telling you that you don't have enough data, and enough understanding of the data, to talk about once-in-two-hundred-year events. I suppose that, if they have financial engineers in the stationary state, they might eventually be able to look back on enough sufficiently-converged data to do something at the 99% or even 99.5% level.
Q: Wait, doesn't that suggest that there is a much bigger problem with all of this? The economy is non-stationary, right?
A: Sure looks like it.
Q: So how can we use statistical models to forecast it?
A: If you want someone to solve the problem of induction, the philosophy department is down the stairs and to the left.
Posted by crshalizi at June 16, 2009 09:50 | permanent link
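The dialogue's central claim, that integrating over the posterior yields one exact predictive probability for any observable event, can be illustrated with a toy calculation. The following is a minimal Python sketch, not anything taken from Rebonato or from the manuscript mentioned above. The three coin biases, the uniform prior and the observed sequence are all hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Hypothetical finite hypothesis space: each hypothesis m is a coin bias,
# i.e. the law of an IID Bernoulli sequence. These values are assumptions.
biases = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = [Fraction(1, 3)] * 3  # D_0: uniform over the three hypotheses


def update(weights, x):
    """One application of Bayes's rule after observing x in {0, 1}."""
    likelihoods = [b if x == 1 else 1 - b for b in biases]
    joint = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(joint)  # predictive probability of the x actually observed
    return [j / total for j in joint]


def predictive_heads(weights):
    """P_n(next X = 1): the expectation of m(f) under the current weights."""
    return sum(w * b for w, b in zip(weights, biases))


weights = prior
print(0, predictive_heads(weights))
for n, x in enumerate([1, 1, 0, 1], start=1):
    weights = update(weights, x)
    # The weights stay spread over several hypotheses, yet the predictive
    # probability printed here is a single exact rational number.
    print(n, predictive_heads(weights))
```

Exact fractions are used so that the point is visible in the output: the posterior remains diffuse over the hypotheses, but the number the agent would bet on is computed to unlimited precision at every step.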
Chaos, Complexity and Inference: 2009 Syllabus
See this earlier post or the course homepage for more. This should be an RSS feed for this page, so you can follow updates, which will generally be posted after lectures.
Posted by crshalizi at June 16, 2009 09:24 | permanent link
May 31, 2009
Books to Read While the Algae Grow in Your Fur, May 2009
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Enigmas of Chance; The Dismal Science Posted by crshalizi at May 31, 2009 23:59 | permanent link
April 30, 2009
Books to Read While the Algae Grow in Your Fur, April 2009
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Writing for Antiquity; Enigmas of Chance; Mathematics; The Progressive Forces; The Commonwealth of Letters Posted by crshalizi at April 30, 2009 23:59 | permanent link
April 28, 2009
That Word Does Not Exist In Any Language
Attention conservation notice: 1500 words on a dispute at the intersection of paleography and computational linguistics, two areas in which I have no qualifications; also a sermon on a text from Lichtenberg: "We must not seek to abstract from the busts of the great Greeks and Romans rules for the visible form of genius as long as we cannot contrast them with Greek blockheads."
Over the weekend, I read Mark Liberman's post at Language Log about the new Rao et al. paper in Science, claiming to show information-theoretically that the symbols recovered on artifacts from the Harappan civilization in the Indus Valley are in fact a form of writing, as had long been supposed but was denied a few years ago by Farmer, Sproat and Witzel. What Rao et al. claimed to show is that the sequences of Indus symbols possess information-theoretic properties which are distinctive of written language, as opposed to other symbol sequences, say ones which are completely random (IID, their "type 1 nonlinguistic") or completely deterministic (their "type 2 nonlinguistic"). Specifically, they examined the conditional entropy of sequence pairs (i.e., the entropy of the next symbol given the previous one). The claim is that the Indus symbols have the same pattern for their conditional entropy as writing systems, which is clearly distinguishable from non-linguistic symbol sequences by these means.
As someone who is very, very into information theory (especially conditional entropy), I was intrigued, but also very puzzled by Mark's account, from which it seemed that Rao et al. had a huge gap where the core of their paper should be. Actually reading the paper convinced me that Mark's account was correct, and there was a huge logical fallacy. I'll reproduce Figure 1A from the paper and explain what I mean.
Rao et al. worked with a corpus of Indus Valley inscriptions, which recognizes 417 distinct symbol types. This is their "Indus" line. The other language lines come from different corpora in the indicated language. In each corpus, they filtered out the less common symbols, and then fit a first-order Markov chain. (Transition probabilities were estimated with a smoothing estimator rather than straight maximum likelihood.) Then they calculated the conditional entropy of the chain, using the estimated transition probabilities and the observed symbol frequencies (rather than say the invariant distribution of the chain); that's the vertical axis. The horizontal axis shows how many symbol types were retained --- i.e., "100 tokens" means that only the 100 most common symbols in the corpus were kept, and the chain was fit to those sequences. (This is not explained in the paper but was made clear in later correspondence between Sproat and the authors.) There are two lines for English, depending on whether "token" was taken to mean "character" (differentiating upper and lower case) or to mean "word". The bottom curve shows the estimated conditional entropy from a purely deterministic sequence; the actual conditional entropy is in fact zero, so I presume that the upward trend is an artifact of the smoothed transition probabilities. The top curve, on the other hand, is from a uniform IID sequence --- here the real conditional entropy is the same as the marginal entropy, but both grow as N increases because the size of the symbol set grows. (I.e., this is an artifact of keeping only the most common symbols.) Here is the flaw: there is no demonstration that only linguistic sequences have this pattern in their conditional entropies. Rao et al. have shown that two really extreme non-linguistic processes don't, but that's not a proof or even a plausibility argument. 
I would settle for an argument that non-linguistic processes have to be really weird to show this pattern, but even that is lacking. In Mayo's terms, they have not shown that this test has any severity. Of course the fact that they haven't shown their test is severe doesn't mean that it isn't, in fact, severe. So, by way of procrastinating, I spent some time yesterday constructing a counter-example. My starting point was what Mark had done, generating a sequence of IID draws from a geometric distribution (rather than a uniform one) and subjecting it to the same analysis as Rao et al. As it happens, I had already written a function in R to fit Markov chains and calculate their log-likelihood, and here the conditional entropy is the negative log likelihood over the sequence length. (Admittedly this is only true using the maximum likelihood estimates for transition probabilities, rather than smoothed estimates as Rao et al. do, but my simulations had so much data this shouldn't matter.) Setting the rate of the geometric distribution to 0.075, here were my first results.
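For readers who want to try the calculation, here is a minimal Python sketch of the analysis described in this paragraph. It is not the R code mentioned in the post. It uses plug-in (maximum likelihood) estimates rather than Rao et al.'s smoothed ones, and the way rare symbols are filtered (dropping them and concatenating what remains) is an assumption, as are the sample size, the seed and all function names.

```python
import math
import random
from collections import Counter


def geometric_draws(n, rate, rng):
    """n IID draws from a geometric distribution on 0, 1, 2, ... (inverse CDF)."""
    return [int(math.log(1.0 - rng.random()) / math.log(1.0 - rate))
            for _ in range(n)]


def keep_most_common(seq, n_types):
    """Drop every symbol outside the n_types most frequent ones (an assumption
    about how the filtering was done)."""
    keep = {s for s, _ in Counter(seq).most_common(n_types)}
    return [s for s in seq if s in keep]


def marginal_entropy(seq):
    """Plug-in entropy, in bits, of the symbol frequencies."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())


def conditional_entropy(seq):
    """Plug-in entropy, in bits, of the next symbol given the current one.
    This equals minus the maximized log-likelihood of a first-order Markov
    chain, divided by the number of transitions."""
    pairs = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])
    n = len(seq) - 1
    return -sum(c / n * math.log2(c / firsts[a]) for (a, _), c in pairs.items())


rng = random.Random(42)
seq = geometric_draws(100_000, 0.075, rng)
for n_types in (10, 20, 40, 80):
    kept = keep_most_common(seq, n_types)
    print(n_types, round(marginal_entropy(kept), 3),
          round(conditional_entropy(kept), 3))
```

Because the draws are IID, the two entropies should come out nearly equal at each number of retained symbol types, with both rising as more types are kept.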
Mark Liberman and Richard Sproat did almost the same thing pretty much simultaneously, as you can see from the updates to Mark's post. This was not entirely satisfactory, since (as Rao et al. point out in the online supplementary materials), there is a big gap between the marginal and conditional entropies for writing and for the Indus symbols. This was also, however, not too hard to finesse. In addition to the geometric sequence, I generated a Markov chain which alternated between the values +1 and -1, but where each positive or negative sign was 99% likely to be followed by the same sign. (That is, the signs were highly persistent.) I then multiplied the IID geometric variables (plus 1) by the Markov signs. This gave a larger set of symbols, but where knowing the sign of the current symbol (which "register" or "sub-vocabulary" it came from) was quite informative about the sign of the next symbol. (I added 1 to the geometric variables to exclude 0=-0, keeping the sub-vocabularies distinct.)
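The sign-persistence construction described in this paragraph can be sketched in Python as follows. Again this is an illustrative sketch rather than the post's R code: the sample size, seeds and function names are assumptions, the estimates are plug-in ones, and no filtering to the most common symbols is applied.

```python
import math
import random
from collections import Counter


def marginal_entropy(seq):
    """Plug-in entropy, in bits, of the symbol frequencies."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())


def conditional_entropy(seq):
    """Plug-in entropy, in bits, of the next symbol given the current one."""
    pairs = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])
    n = len(seq) - 1
    return -sum(c / n * math.log2(c / firsts[a]) for (a, _), c in pairs.items())


def signed_geometric(n, rate, stay, rng):
    """(Geometric + 1) magnitudes multiplied by a +1/-1 Markov sign which
    repeats itself with probability `stay`."""
    sign = rng.choice((-1, 1))
    out = []
    for _ in range(n):
        if rng.random() > stay:
            sign = -sign
        # Adding 1 excludes zero, keeping the two sub-vocabularies distinct.
        magnitude = 1 + int(math.log(1.0 - rng.random()) / math.log(1.0 - rate))
        out.append(sign * magnitude)
    return out


persistent = signed_geometric(100_000, 0.075, 0.99, random.Random(7))
memoryless = signed_geometric(100_000, 0.075, 0.50, random.Random(8))
for name, seq in (("persistent sign", persistent), ("coin-toss sign", memoryless)):
    gap = marginal_entropy(seq) - conditional_entropy(seq)
    print(name, round(marginal_entropy(seq), 3),
          round(conditional_entropy(seq), 3), round(gap, 3))
```

The coin-toss case is included only as a control: with a sign that carries no memory, the previous symbol says little about the next one, so any sizeable gap in the persistent case comes from the hidden "register" and not from anything linguistic.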
Pluses: marginal entropy; circles: conditional entropy
A third experiment starts from the fact that the Indus symbol sequences are all extremely short, at most a dozen characters or so. Instead of having a Markov chain for the sign, I used another, independent set of random draws, uniform on the integers from 2 to 6, to divide the sequence into blocks, and gave all the symbols in each block the same (coin-toss) sign.
Pluses: marginal entropy; circles: conditional entropy
(Because I'm doing everything with a single long sequence, I artificially introduce transitions from positive to negative signs, which lowers the gap between the conditional and unconditional entropies. If I wanted to do this properly, I'd re-write my Markov estimator so it used many short sequences; but that would be too much like real work.)
The mechanism producing the gap between conditional and unconditional entropies is that the marginal distribution of symbols is a mixture of several pure distributions, and which mixture component we draw from now influences which component we'll draw from next (so the sequence can be Markov, exchangeable, etc.). Given the mixture components, the symbols are independent and the conditional and unconditional entropies are equal. Without that knowledge, the first symbol in effect is a cue for figuring out the mixture component, reducing the entropy of the second. There is nothing specifically linguistic about this; any hidden Markov model does as much. It would, for instance, work if the "symbols" were characters in randomly-selected comic books, who cluster (though slightly imperfectly); if that's too low-brow, think about Renaissance paintings, and the odds of seeing St. John the Baptist as opposed to a swan in one which contains Leda.
I have made no attempt to match the quantitative details Rao et al. report for the Indus symbols, just the qualitative patterns. Were I to set out seriously to do so, I'd get rid of the geometric distribution, and instead use a hidden Markov model with more than two states, each state having a distinct output alphabet, the distribution of which would be a Zipf (as used by Liberman or Sproat) or a Yule-Simon. (I might also try block-exchangeable sequences, as in my third example.) But this would approach real work, rather than a few hours of procrastination, and I think the point is made. Perhaps the specific results Rao et al. 
report can only be replicated by making distributional assumptions which are very implausible for anything except language, but I'd say that the burden of proof is on them. If, for instance, they analyzed lots of real-world non-linguistic symbol systems (like my comic books) and showed that all of them had very different conditional entropy curves than did actual writing, that would be a start. I should in the interest of full disclosure say that a number of years ago Farmer and I corresponded about his work on the development of pre-modern cosmologies, which I find interesting and plausible (though very conjectural). But if anything I hope Farmer et al. are wrong and the Indus Valley civilization was literate, and I'd be extra pleased if the language were related to Tamil. Unfortunately, if that's the case it will need to be shown some other way, because these conditional entropy calculations have, so far as I can see, no relevance to the question at all. My code (in R) is here if you want to play with this, or check if I'm doing something stupid. In the unlikely event you want more, I suggest reading the reply of Farmer et al., Rahul Siddharthan (especially the comments), or Fernando Pereira; the last is probably the wisest. Manual trackback: Metadatta; Language Hat Posted by crshalizi at April 28, 2009 22:30 | permanent link
April 10, 2009
Next Week at the Statistics Seminar: Bayes, Bayes, Baked Beans, Sausage and Bayes
Next week at the CMU statistics seminar, we give you all the Bayes you could want and more:
Posted by crshalizi at April 10, 2009 13:08 | permanent link
In Another Green World
I will be completely offline from the 11th to 20th, while I contemplate whether certain referees deserve to be fed to jaguars, or whether it would not be more humane to sacrifice them to the great feathered serpent. (I mean humane to the cats, of course.) Posted by crshalizi at April 10, 2009 13:00 | permanent link
April 03, 2009
Next Week at the Statistics Seminar: "Methods and Models for Time-Dependent Relational Data"
Next week at the CMU statistics seminar:
Posted by crshalizi at April 03, 2009 13:37 | permanent link
March 31, 2009
Books to Read While the Algae Grow in Your Fur, March 2009
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; The Dismal Science; The Progressive Forces; The Continuing Crises; Writing for Antiquity; Cthulhiana Posted by crshalizi at March 31, 2009 23:59 | permanent link
March 27, 2009
Another Idle Question
How many of the people currently pushing or exploiting conspiracy theories about the introduction of a global currency also claim to support returning to the gold standard? (And where's Dan Sperber when we need him?) The Dismal Science; The Running Dogs of Reaction; Psychoceramics Posted by crshalizi at March 27, 2009 21:10 | permanent link
March 26, 2009
Some Bayesian Finger-Puzzle Exercises, or: Often Wrong, Never In Doubt
Attention conservation notice: Clearing out my drafts folder. 600+ words on some examples that I cut from a recent manuscript. Only of interest to (bored) statisticians.
The theme here is to construct some simple yet pointed examples where Bayesian inference goes wrong, though the data-generating processes are well-behaved, and the priors look harmless enough. In reality, however, there is no such thing as a prior without bias, and in these examples the bias is so strong that Bayesian learning reaches absurd conclusions.
Example 1: The data X_i, i=1,2,3,..., come from a 50/50 mixture of two Gaussians, with means at -1 and +1, both with standard deviation 1. (They are independent and identically distributed.) The prior, by coincidence, is a 50/50 mix of two Gaussians, located at -1 and +1, both with standard deviation 1. So initially the posterior predictive distribution coincides exactly with the actual data-generating distribution. After n observations x_1, ..., x_n, whose sum is z, the likelihood ratio L(+1)/L(-1) is e^{2z}. Hence the posterior probability that the expectation is +1 is 1/(1+e^{-2z}), and the posterior probability that the expectation is -1 is 1/(1+e^{2z}). The sufficient statistic z itself follows an unbiased random walk, meaning that as n grows it tends to get further and further away from the origin, with a typical size growing roughly like n^{1/2}. It does keep returning to the origin, at intervals dictated by the arc sine law, but it spends more and more of its time very far away from it. The posterior estimate of the mean thus wanders from being close to +1 to being close to -1 and back erratically, hardly ever spending time near zero, even though (from the law of large numbers) the sample mean converges to zero.
This figure shows typical sample paths for z, for the posterior probability of the +1 mode, and for the relative entropy of the predictive distribution from the data-generating distribution. 
(The latter is calculated by Monte Carlo since I've forgotten how to integrate, so some of the fuzziness is MC noise.) Here is the R code.
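The linked R code is not reproduced on this page. As a separate illustration, here is a minimal Python sketch of Example 1 that tracks only the running sum z and the posterior probability of the +1 hypothesis (it does not compute the relative entropy). The seed, the run length and the function names are assumptions.

```python
import math
import random


def posterior_plus(z):
    """Posterior probability that the mean is +1, given the running sum z.
    This is 1 / (1 + exp(-2z)), arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-2.0 * z))
    e = math.exp(2.0 * z)
    return e / (1.0 + e)


def simulate(n, rng):
    """Yield (z, posterior) after each of n draws from the 50/50 mixture."""
    z = 0.0
    for _ in range(n):
        centre = rng.choice((-1.0, 1.0))  # pick a mixture component
        z += rng.gauss(centre, 1.0)       # unit-variance Gaussian around it
        yield z, posterior_plus(z)


rng = random.Random(1)
path = list(simulate(10_000, rng))
undecided = sum(1 for _, p in path if 0.1 < p < 0.9) / len(path)
sample_mean = path[-1][0] / len(path)
print(f"fraction of steps with posterior in (0.1, 0.9): {undecided:.3f}")
print(f"final sample mean: {sample_mean:.3f}")
```

On a typical run the sample mean should sit close to zero while the posterior spends almost all of its time near 0 or 1, which is the behaviour the example describes.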
Exercise 1: Confirm those calculations for the likelihood ratio and so for the posterior.
Exercise 2: Find the expected log-likelihood of an arbitrary-mean unit-variance Gaussian under this data-generating distribution.
Example 2: Keep the same data-generating distribution, but now let the prior be the conjugate prior for a Gaussian, namely another Gaussian, centered at zero. The posterior is then another Gaussian, which is a function of the sample mean, since the latter is a sufficient statistic for the problem.
Exercise 3: Find the mean and variance of the posterior distribution as functions of the sample mean. (You could look them up, but that would be cheating.)
As we get more and more data, the sample mean converges almost surely to zero (by the law of large numbers), which here drives the mean and variance of the posterior to zero almost surely as well. In other words, the Bayesian becomes dogmatically certain that the data are distributed according to a standard Gaussian with mean 0 and variance 1. This is so even though the sample variance almost surely converges to the true variance, which is 2. This Bayesian, then, is certain that the data are really not that variable, and any time now will start settling down.
Exercise 4: Suppose that we take the prior from the previous example, set it to 0 on the interval [-1,+1], and increase the prior everywhere else by a constant factor to keep it normalized. Show that the posterior density at every point except -1 and +1 will go to zero. (Hint: use exercise 2 and see here.)
Update in response to e-mails, 27 March: No, I'm not saying that actual Bayesian statisticians are this dumb. A sensible practitioner would, as Andy Gelman always recommends, run a posterior predictive check, and discover that his estimated model looks nothing at all like the data. But that sort of thing is completely outside the formal apparatus of Bayesian inference. 
What amuses me in these examples is that the formal machinery becomes so certain while being so wrong, while starting from the right answer (and this while Theorem 5 from my paper still applies!). See the second post by Brad DeLong, linked to below. Manual trackback: Brad DeLong; and again Brad DeLong (with a simpler version of example 1!); The Statistical Mechanic Posted by crshalizi at March 26, 2009 10:45 | permanent link
March 24, 2009
Where Did the Steelworkers Go?
Attention conservation notice: back-of-the-envelope calculations about why the US has only about a fifth as many steelworkers now as it did in 1960. Not backed by any actual knowledge of the steel industry. Utterly untimely, it was, I think, prompted by a comment thread on Unfogged, but so long ago I can't remember which. In 1960, US primary steel production was 91 million tons, of which 2.95 million tons were exported; it also imported 3.24 million tons. This part of the industry employed 530,000 people in all capacities, for an annual output of 170 tons/employee. In 2007, US primary steel production was 98.1 million tons, with exports of 10.1 million tons and imports of 30.2 million tons. Employment was only 97,540 people, coming to 1005 tons/employee. Exports and imports in 1960 were a wash, nearly enough, so let's suppose trade patterns had remained comparable and say that all of the net imports were to be made up by higher domestic production: (20.1 million tons)/(1005 tons/worker) = 20,000 extra workers. This would be a substantial increase, but it would still leave employment in steel at only 22% of its 1960 level. Where did the other four-fifths of the industry go? The most obvious explanation is productivity. The industry in 2007 produced more than it did in 1960, with many fewer employees. In fact, output per employee grew 5.9 times over that period. A six-fold increase in productivity divided by a slight rise in total demand equals a roughly five-fold fall in employment. Now, this calculation understates the effect of trade because it only considers net imports of steel. But steel is used as an input to producing many other things, and a washing machine made of steel shows up in this sort of official statistic as an import of a manufactured good, not an import of steel. So to really see what US steel production would be if we retained 1960 trade patterns, we'd need to see what the change in the (foreign*) steel content of US net imports has been. 
Since I don't have Leontief input-output matrices for the US and its trading partners in the two years, I can't do this. Failing actual knowledge, I'll turn to guesswork. Suppose the steel content of imports was equal to net direct imports; this seems high, but what do I know? This would just add another 20,000 jobs, and bring us up to 26% of the size of the industry in 1960. To get the same level of employment in steel production now as in 1960, the net increase in the foreign steel content of our imports would have to satisfy

(530,000 workers) - (117,540 workers for domestic production and direct imports) = (increase in net indirect imports)/(1005 tons/worker)

or 414,522,300 tons, i.e., about 3.5 times total production plus net direct imports. This is highly implausible.

I conclude that domestic employment in steel production has collapsed largely because increases in productivity have not been matched by increases in demand. If someone can point out where this reasoning goes wrong, I'd appreciate it.

*: Foreign steel content, because if the washing machine is made abroad of steel exported by the US, replacing that washing machine with a US-made one will not increase the demand for American steel.

Sources: 2007 employment figure from BLS (NAICS code 3311). 1960 employment figure from Table 1 on p. 2 of Lebergott. (It does not, however, appear to be affected by some of the well-known problems with Lebergott's series for the 1930s.) Annual production, import and export figures from USGS.

Manual trackback: The Inverse Square Blog; Nothing Funny About Feldspar (with more facts; go read)

Posted by crshalizi at March 24, 2009 10:49 | permanent link
March 22, 2009
Special Function Invocation
O Hive Mind, o Lazy Web, Urania's child, I invoke thee! Is there a name
for the function
Posted by crshalizi at March 22, 2009 21:52 | permanent link
March 17, 2009
Idle Question of the Day
Exactly what bad consequences would follow if laws were passed by the relevant countries rendering credit default swap contracts void henceforth? (That is, canceling all the outstanding wagers because the bookies went bust.)

Update, 22 March: Well, one bad consequence would evidently be agreeing with Ben Stein. A bit from that link (by Felix Salmon, not Stein) is worth quoting:

"There's a good chance, just for starters, that every major bank in America would go bust overnight: after all, they've been packaging up and selling off the credit risk on their multi-trillion-dollar loan portfolios [for years]. If Stein got his way, all that credit risk would suddenly reappear on the banks' balance sheets, and there's nothing they could do about it. Genius. Remember that those super-senior CDOs were the safest bits of the credit that they sold off. Just imagine what their balance sheets would look like if all the risky bits reappeared."

The issue he's raising is that if the banks can't say that they're covered for the risk of their loans defaulting (via the credit default swaps), they need to hold more capital as a protection against default. So as a legal or regulatory issue, ending the swaps would make the banks worse off. Substantively, however, this only makes sense if the swaps would, in fact, protect banks in the event of defaults, that is, if they actually shifted the risk to the swap-sellers. Since we have just had pretty dramatic demonstrations that this is not something to be counted on, it's not at all clear to me that the banks ought to be able to keep that risk off their balance sheets. (In other words, the real value of the swaps to the banks is zero, or next to zero.) In any case, this objection could be countered by combining ending credit default swaps with public guarantees of the banks' existing positions, which is effectively what's happening anyway, only without making it harder to repeat the mistake in the future.
More broadly, ending credit default swaps would mean that those who sold such swaps would lose their stream of payments (a flow) but gain back their collateral and reserves (a stock); conversely, buyers of default protection would gain a cash flow but take a hit to their capital stocks. Right now one imagines that even those selling the "end of the world trade" might prefer to get out of the game; I'd be interested to see an estimate of the effects of this on the stability of the financial world right now.

There is also the possibility that eliminating the swaps would deprive us of information about how risky different debts are. The value we should place on this, however, depends on how well these markets actually succeed in aggregating information about risk. I'd say there is abundant cause for skepticism about this, especially when things are, in fact, dangerous. Economic theory does not, in fact, provide any reason to think that such markets will be dominated by those with the most accurate beliefs (or even that the market as a whole will be more accurate than the best-informed trader), unless you assume a complete set of markets, which is a reductio ad absurdum if ever there were one. (When markets are incomplete, more markets are not necessarily better.)

To be clear, I am not asserting that credit default swaps should be ended. I honestly don't think I know enough to have an opinion about that, and while I'm obviously skeptical about their value, some serious and credible people (i.e., ones who do not have a vested interest in the matter) who've studied them in more depth see merit to them. If this is what a world with efficiently allocated risk looks like, though, I'd hate to see a messed-up one.

(Thanks to readers D.H. and son1 for comments and pointers.)

Posted by crshalizi at March 17, 2009 22:22 | permanent link