Update: Johan wrote back to me this afternoon and confirmed that the z-scores are not used in assessing predictive power. I expect future drafts of the paper will be much clearer on this point, since similar concerns have apparently been raised by others. Based on Johan's statements, I'd like to emphasize that this paper does rigorously support the claim that Twitter can be used to help predict the direction of the Dow for their sample. Though directional prediction does not necessarily translate into a profitable strategy, this is an exciting conclusion. I think the paper would benefit most from a portfolio backtest rather than directional prediction alone, and perhaps from an extension to interest rates or to something like the VIX. All in all, I'm excited to see future research from their group.
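To make the gap between directional accuracy and profitability concrete, here is a minimal sketch with made-up numbers (not data from the paper). The function names `hit_rate` and `long_short_pnl` are hypothetical, and the P&L is a simple sum of daily returns with no compounding, transaction costs, or position sizing.

```python
def hit_rate(predicted_up, returns):
    """Fraction of days on which the predicted direction matches the realized sign."""
    hits = sum((r > 0) == up for up, r in zip(predicted_up, returns))
    return hits / len(returns)


def long_short_pnl(predicted_up, returns):
    """Sum of daily returns from going long on 'up' calls and short on 'down' calls.

    Simplified on purpose: no compounding, costs, or position sizing.
    """
    return sum(r if up else -r for up, r in zip(predicted_up, returns))


# Hypothetical daily index returns: three small up days, then one large down day.
returns = [0.01, 0.01, 0.01, -0.05]
predicted_up = [True, True, True, True]

print(hit_rate(predicted_up, returns))                  # 0.75
print(round(long_short_pnl(predicted_up, returns), 4))  # -0.02
```

A predictor that is right 75% of the time can still lose money if its misses land on the large moves, which is why a portfolio backtest says more than directional accuracy alone.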
As I noted when I first linked to this paper on arXiv, I think there may be an issue with the claim of prediction. Here is the portion of text that raises some serious questions in my mind. Emphasis is mine.
Note, then, that the later assessment of predictive power uses these z-scores, which are clearly not out-of-sample, since each one incorporates $k$ periods of future information. Figure 3 and its caption below drive this point home, as they clearly indicate that $Z_t$ is used here.
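To illustrate why a z-score built with $k$ future periods leaks information, here is a minimal sketch. It assumes the normalization at day $t$ uses the mean and standard deviation of a symmetric window from $t-k$ to $t+k$; the function names, the synthetic "mood" series, and the trailing-window alternative are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def centered_zscore(x, k):
    """z-score of x[t] against x[t-k .. t+k]; uses k periods AFTER t (look-ahead)."""
    x = np.asarray(x, dtype=float)
    z = np.full(len(x), np.nan)
    for t in range(k, len(x) - k):
        window = x[t - k : t + k + 1]
        z[t] = (x[t] - window.mean()) / window.std()
    return z


def trailing_zscore(x, k):
    """z-score of x[t] against x[t-2k .. t]; same window length, past data only."""
    x = np.asarray(x, dtype=float)
    z = np.full(len(x), np.nan)
    for t in range(2 * k, len(x)):
        window = x[t - 2 * k : t + 1]
        z[t] = (x[t] - window.mean()) / window.std()
    return z


# Synthetic daily "mood" series (hypothetical data).
rng = np.random.default_rng(0)
mood = rng.normal(size=50)
k, t = 5, 20

# Change a value 3 periods AFTER day t and see whether Z_t moves.
shocked = mood.copy()
shocked[t + 3] += 10.0

print(centered_zscore(mood, k)[t] == centered_zscore(shocked, k)[t])  # False
print(trailing_zscore(mood, k)[t] == trailing_zscore(shocked, k)[t])  # True
```

If $Z_t$ changes when only future data changes, any test that uses $Z_t$ to forecast later prices is not out-of-sample; a trailing window avoids this at the cost of a slightly staler normalization.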
The remainder of the text is somewhat ambiguous.
I've emailed the authors twice over the last week, and although they visited my personal homepage via the link in my email, I've received no response. In the meantime, I think the jury is still out on whether Twitter can actually be used to predict the stock market rigorously and out-of-sample.