Giovanni Stilo presents "Time Makes Sense"

Stilo Giovanni
Title: "Time Makes Sense"

ABSTRACT

Temporal text mining (TTM) has recently attracted the attention of scientists as a means to discover and track real-time discussions in micro-blogs. However, current approaches to temporal mining suffer from efficiency problems when applied to large micro-blog streams such as Twitter, which now reaches an average of 500 million tweets per day. We propose a technique named SAX* (based on the Symbolic Aggregate ApproXimation algorithm) that discretizes the temporal series of each term into a small set of levels, yielding one string per series. We then define a subset of "interesting" strings, i.e. those representing patterns of collective attention. Sliding temporal windows are used to detect clusters of terms that share the same string. We show that SAX* is more efficient (by orders of magnitude) than other approaches to temporal mining in the literature. In this paper, we evaluate SAX* on the task of event discovery over one year of Twitter's stream (a 1% worldwide sample).

BIOGRAPHICAL NOTE

Giovanni is a research fellow in the Department of Computer Science at the Sapienza University of Rome. His research interests lie in the areas of Web Information Retrieval, Data Mining, Emergent Semantics, Clustering, Time Series Similarity, ILI Nowcasting and Forecasting.
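
To make the discretization idea from the abstract concrete, here is a minimal Python sketch of the classic Symbolic Aggregate ApproXimation steps (z-normalization, piecewise averaging, mapping to symbols), followed by grouping the terms of one window by their string. It is not the SAX* implementation from the talk: the function names, the two-letter alphabet, the handling of flat series, and the regular expression used to pick "interesting" strings are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from statistics import NormalDist, mean, pstdev


def sax_string(series, n_segments, alphabet="ab"):
    """Discretize one term's time series into a string of len n_segments.

    Assumes len(series) is a multiple of n_segments.
    """
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        # Assumption: a flat series carries no burst, so use the lowest level.
        return alphabet[0] * n_segments
    z = [(x - mu) / sigma for x in series]  # z-normalization
    seg_len = len(z) // n_segments
    # Piecewise aggregate approximation: one mean value per segment.
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]
    # Breakpoints split the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # The symbol index is the number of breakpoints the value exceeds.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


def cluster_window(term_series, n_segments, alphabet="ab", interesting=None):
    """Group the terms of one temporal window by their SAX string.

    `interesting` is a hypothetical regular expression selecting the
    strings that count as patterns of collective attention.
    """
    clusters = defaultdict(list)
    for term, series in term_series.items():
        s = sax_string(series, n_segments, alphabet)
        if interesting is None or re.fullmatch(interesting, s):
            clusters[s].append(term)
    return dict(clusters)


# Toy window: per-slot counts for three made-up terms.
window = {
    "goal": [1, 1, 9, 9, 1, 1],
    "match": [2, 2, 8, 8, 2, 2],
    "weather": [5, 5, 5, 5, 5, 5],
}
clusters = cluster_window(window, n_segments=3, interesting=r"a*b+a*")
```

In this toy window, "goal" and "match" both map to the string "aba" and land in the same cluster, while the flat "weather" series maps to "aaa" and is filtered out by the assumed pattern. Because the grouping is a dictionary lookup on short strings rather than a pairwise comparison of series, a sketch like this also hints at why a string-based approach can be cheap on large streams.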