April 10, 2003 - 'Word Bursts'

Broadcast on "Coast to Coast": April 10, 2003

AA: I'm Avi Arditti with Rosanne Skirble and this week on WORDMASTER -- a way to find out what people are talking about. First, though, a story.


RS: Imagine a waterway. Fish rush by. Lots and lots of different fish. A computer program counts how many of each kind of fish there are. Now imagine that each kind of fish is really a word. What the computer program counts, then, is how often each different word appears.
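The counting step in that story can be sketched in a few lines of Python. This is only an illustration of word counting in general, not the program discussed in the broadcast; the function name `count_words` and the sample sentence are assumptions made for the example.

```python
import re
from collections import Counter

def count_words(text):
    """Return a Counter mapping each lowercase word to how often it appears."""
    # Treat runs of letters (and apostrophes) as words; ignore punctuation.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Hypothetical sample text, borrowed from the fish story above.
sample = "Fish rush by. Lots and lots of different fish."
counts = count_words(sample)

print(counts["fish"])  # 2
print(counts["lots"])  # 2
```

Lowercasing first means "Fish" and "fish" are counted as the same kind of fish.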

AA: A computer scientist named Jon Kleinberg has developed one such program. He's an associate professor at Cornell University in Ithaca, New York. The software does more than just count words. With enough computer power, it could analyze huge amounts of electronic content. For instance, what's on all the front pages of all the English-language newspapers on the Internet.

RS: Day after day, it could track the frequency of use for each word. When certain words start to get used a lot more often, say, this week than last week -- "word bursts," they're called -- that's a signal. It suggests that these words are suddenly topical. Jon Kleinberg says these "word bursts" reveal what is on people's minds.
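As a rough sketch of the "this week versus last week" comparison, the Python below scores each word by how much its share of all words grew between two periods. It is a simple ratio heuristic chosen for illustration, not the method used in Jon Kleinberg's software, which the broadcast does not describe in detail. The name `burst_scores`, the `smoothing` parameter, and the word counts are all hypothetical.

```python
from collections import Counter

def burst_scores(previous, current, smoothing=1.0):
    """Score each word in `current` by the growth in its relative frequency.

    A score near 1 means the word's share of the text is unchanged;
    a score well above 1 suggests a possible "word burst".
    """
    vocab_size = len(set(previous) | set(current))
    prev_total = sum(previous.values()) + smoothing * vocab_size
    cur_total = sum(current.values()) + smoothing * vocab_size
    scores = {}
    for word in current:
        # Smoothing keeps brand-new words from causing division by zero.
        prev_rate = (previous.get(word, 0) + smoothing) / prev_total
        cur_rate = (current.get(word, 0) + smoothing) / cur_total
        scores[word] = cur_rate / prev_rate
    return scores

# Made-up counts for two consecutive weeks.
last_week = Counter({"the": 100, "budget": 10, "storm": 1})
this_week = Counter({"the": 100, "budget": 10, "storm": 30})

scores = burst_scores(last_week, this_week)
top_word = max(scores, key=scores.get)
print(top_word)  # storm
```

In this made-up data "the" is by far the most frequent word, yet "storm" gets the highest score because its use changed the most.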

KLEINBERG: "One of the things that's going on the Web is that there's not just mainstream media -- so things like the New York Times' homepage, CNN's homepage -- but there are also tens of thousands of people who maintain these online journals."

RS: These journals are called "Web logs" -- or simply "blogs" -- and they have become, Jon Kleinberg says, a new type of medium.

KLEINBERG: "They're the same kind of populist sort of commentary and discussion that we got with personal homepages early on in the Web, and they're now doing that for current events and for news. And by watching what these people talk about, that's a very good leading indicator of trends that people, for example, on the Web are aware of."

AA: Jon Kleinberg has also looked backward for trends -- for example, in the online archives of State of the Union speeches by U.S. presidents. Words that appeared with particular increases in frequency tended to correspond to historical trends -- not a surprise, he says.

KLEINBERG: "So in the 1930s we have words like 'banks,' 'depression,' 'recovery,' in the 1940s we have words like 'war' and 'atomic,' and then in the '50s words like 'Korea,' 'communist.' The point is, that's an example where we believe we know what we're going to be seeing. It's a way of sanity-checking what's happening, so that we can then try it on things where we don't necessarily know what to expect."

RS: "For example?"

KLEINBERG: "One thing which surprised me -- and this was still in the context even of State of the Union addresses -- is that once we get to the 1980s, words that have to do with historical events in the '80s get mixed in a lot more with particular rhetorical devices. So, sudden increases in words like 'communities' and 'American' and 'patriotism.' So we find that with the increasing dominance of professional speechwriting, we have certain words that simply were appearing every single year. And that's something which one sort of may have thought about at an intuitive level, but it shows up extremely strongly when one does this frequency analysis. So it's a way of quantitatively verifying a shift in the language used in speechwriting, for example."

RS: "Do you see anything in this work that tells us a little bit about who we are as Americans? Because you see the frequency of words, does it tell us -- "

AA: "Where we're heading?"

RS: "Where we're heading, or where we've been?"

KLEINBERG: "I'm certainly heartened by all of the activity and things like the Web log community, which is really, I think, supplementing the mainstream news media with this very large additional set of outlets for opinions and commentary and expression. It's creating an extremely vibrant community, and I think that's an exciting development, certainly -- and something that one can, again, hopefully track by being aware of the current topics of interest as manifested through choices of words."

AA: "Doesn't that just sort of feed on itself or create kind of a loop, where you know what words are on the rise so you start using them more?"

KLEINBERG: "There is this interesting feedback going on, that as you become explicitly aware of this notion of popularity, you -- right, it feeds back on itself. One thing that helps alleviate that is this notion of 'burstiness' as being about change, not just about frequency. So we aren't just finding the most frequent words, but the words that are changing most sharply. So once something becomes popular, the fact that people continue using it no longer contributes to its change. It already is popular."
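The distinction between change and frequency can be illustrated with a small Python sketch. It assumes a simple week-over-week ratio as the measure of change, which is an illustrative stand-in and not the measure used in the software discussed here. The function name `weekly_change` and the weekly counts are hypothetical.

```python
def weekly_change(counts, smoothing=1.0):
    """Return the ratio of each week's count to the previous week's count."""
    # Smoothing avoids division by zero when a word was absent the week before.
    return [
        (cur + smoothing) / (prev + smoothing)
        for prev, cur in zip(counts, counts[1:])
    ]

# Made-up weekly counts for one word: rare, then suddenly popular, then steady.
weekly_counts = [2, 3, 40, 42, 41]
changes = weekly_change(weekly_counts)

for week, ratio in enumerate(changes, start=2):
    print(f"week {week}: change ratio {ratio:.2f}")
```

The ratio is largest in the week the word takes off. In the following weeks the word is still used about 40 times, but the ratio falls back to roughly 1, so continued popularity no longer registers as a burst.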

RS: "What about new words in the language?"

KLEINBERG: "At the moment methods like this are very good for catching the sudden appearance of coinages of new words in online media, simply because we have access to all that data. But I think this is something that could be used retrospectively to go back through books or newspapers over hundreds of years, trying to find the rise of words that are now quite common."

RS: In the long run, Jon Kleinberg of Cornell University says, the goal is to develop computer search engines that can catch ideas that are on the rise, and not just words.

AA: You'll find all of our words on the Web at voanews.com/wordmaster. And our e-mail address is word@voanews.com. With Rosanne Skirble, I'm Avi Arditti.