Publication
Uplink Macro Diversity of Limited Backhaul Cellular Network
Source: IEEE Transactions on Information Theory, Volume 55, Issue 8, p.3457-3478 (2009)
Abstract: In this work, new achievable rates are derived for the uplink channel of a cellular network with joint multicell processing (MCP), where unlike previous results, the ideal backhaul network has finite capacity per cell. Namely, the cell sites are linked to the central joint processor via lossless links with finite capacity. The new rates are based on compress-and-forward schemes combined with local decoding. Further, the cellular network is abstracted by symmetric models, which render analytical treatment plausible. For this family of idealistic models, achievable rates are presented for both Gaussian and fading channels. The rates are given in closed form for the classical Wyner model and the soft-handover model. These rates are then demonstrated to be rather close to the optimal unlimited backhaul joint processing rates, even for modest backhaul capacities, supporting the potential gain offered by the joint MCP approach. Particular attention is also given to the low-signal-to-noise ratio (SNR) characterization of these rates through which the effect of the limited backhaul network is explicitly revealed. In addition, the rate at which the backhaul capacity should scale in order to maintain the original high-SNR characterization of an unlimited backhaul capacity system is found.
Publication
Opportunistic Relaying in Wireless Networks
Source: IEEE Transactions on Information Theory, Volume 55, Issue 11, p.5121-5137 (2009)
Abstract: Relay networks having n source-to-destination pairs and m half-duplex relays, all operating in the same frequency band and in the presence of block fading, are analyzed. This setup has attracted significant attention, and several relaying protocols have been reported in the literature. However, most of the proposed solutions require either centrally coordinated scheduling or detailed channel state information (CSI) at the transmitter side. Here, an opportunistic relaying scheme is proposed that alleviates these limitations, without sacrificing the system throughput scaling in the regime of large n. The scheme entails a two-hop communication protocol, in which sources communicate with destinations only through half-duplex relays. All nodes operate in a completely distributed fashion, with no cooperation. The key idea is to schedule at each hop only a subset of nodes that can benefit from multiuser diversity. To select the source and destination nodes for each hop, CSI is required at receivers (relays for the first hop, and destination nodes for the second hop), and an index-valued CSI feedback at the transmitters. For the case when n is large and m is fixed, it is shown that the proposed scheme achieves a system throughput of m/2 bits/s/Hz. In contrast, the information-theoretic upper bound of (m/2) log log n bits/s/Hz is achievable only with more demanding CSI assumptions and cooperation between the relays. Furthermore, it is shown that, under the condition that the product of block duration and system bandwidth scales faster than log n log log n, the achievable throughput of the proposed scheme scales as Theta (log n). Notably, this is proven to be the optimal throughput scaling even if centralized scheduling is allowed, thus proving the optimality of the proposed scheme in the scaling law sense. 
Simulation results indicate a rather fast convergence to the asymptotic limits with the system's size, demonstrating the practical importance of the scaling results.
News
Splitting Up Search
Searching the Web could become faster for users and much more efficient for search companies if search engines were split up and distributed around the world, according to researchers at Yahoo.
News
Yahoo! Wins Best Paper at CIKM 2009
CIKM 2009 took place in vibrant Hong Kong from November 2 to November 6. CIKM is a leading ACM conference on computer science. Its goal is to bring together researchers from the fields of information retrieval, databases, and knowledge management.
Publication
Explore/Exploit Schemes for Web Content Optimization (best paper award)
Source: IEEE International Conference on Data Mining (2009)
Abstract: We propose novel multi-armed bandit (explore/exploit) schemes to maximize total clicks on a content module published regularly on Yahoo! Intuitively, one can "explore" each candidate item by displaying it to a small fraction of user visits to estimate the item's click-through rate (CTR), and then "exploit" high-CTR items in order to maximize clicks. While bandit methods that seek to find the optimal trade-off between explore and exploit have been studied for decades, existing solutions are not satisfactory for web content publishing applications where a dynamic set of items with short lifetimes, delayed feedback and non-stationary reward (CTR) distributions are typical. In this paper, we develop a Bayesian solution and extend several existing schemes to our setting. Through extensive evaluation with nine bandit schemes, we show that our Bayesian solution is uniformly better in several scenarios. We also study the empirical characteristics of our schemes and provide useful insights on the strengths and weaknesses of each. Finally, we validate our results with a "side-by-side" comparison of schemes through "live experiments" conducted on a random sample of real user visits to Yahoo!
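The explore/exploit intuition in this abstract can be sketched with an epsilon-greedy rule. This is a standard textbook baseline, not the Bayesian scheme developed in the paper, and it ignores item lifetimes, delayed feedback and non-stationarity. The function name, the `epsilon` parameter and the CTR values are hypothetical, chosen only for illustration.

```python
import random


def epsilon_greedy(true_ctrs, n_visits, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy item selection over n_visits user visits.

    true_ctrs: hypothetical per-item click probabilities (unknown to the policy).
    Returns (shows, clicks): per-item display and click counts.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    n_items = len(true_ctrs)
    shows = [0] * n_items
    clicks = [0] * n_items
    for _ in range(n_visits):
        if rng.random() < epsilon:
            # Explore: show a uniformly random item to a small fraction of visits.
            item = rng.randrange(n_items)
        else:
            # Exploit: show the item with the highest estimated CTR so far
            # (items never shown are treated as having an estimate of 0).
            estimates = [c / s if s else 0.0 for c, s in zip(clicks, shows)]
            item = max(range(n_items), key=estimates.__getitem__)
        shows[item] += 1
        if rng.random() < true_ctrs[item]:  # simulated click feedback
            clicks[item] += 1
    return shows, clicks


shows, clicks = epsilon_greedy([0.02, 0.05, 0.20], n_visits=20000)
for i, (s, c) in enumerate(zip(shows, clicks)):
    print(f"item {i}: shown {s} times, estimated CTR {c / s if s else 0.0:.3f}")
```

A fixed `epsilon` keeps exploring forever at the same rate, which is one reason such simple rules can be wasteful when items are short-lived.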
Publication
Regression based Latent Factor Models
Source: ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2009)
URL: http://portal.acm.org/citation.cfm?doid=1557019.1557029
Abstract: We propose a novel latent factor model to accurately predict response for large scale dyadic data in the presence of features. Our approach is based on a model that predicts response as a multiplicative function of row and column latent factors that are estimated through separate regressions on known row and column features. In fact, our model provides a single unified framework to address both cold and warm start scenarios that are commonplace in practical applications like recommender systems, online advertising, web search, etc. We provide scalable and accurate model fitting methods based on Iterated Conditional Mode and Monte Carlo EM algorithms. We show our model induces a stochastic process on the dyadic space with kernel (covariance) given by a polynomial function of features. Methods that generalize our procedure to estimate factors in an online fashion for dynamic applications are also considered. Our method is illustrated on benchmark datasets and a novel content recommendation application that arises in the context of Yahoo! Front Page. We report significant improvements over several commonly used methods on all datasets.
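As a rough illustration of the multiplicative structure described in this abstract, the sketch below predicts a response as the inner product of a row factor and a column factor, each obtained from a regression on known features. It assumes linear regressions with coefficient matrices `G` and `D` and optional per-row and per-column residuals. These names and the linear form are assumptions made for this example, not details taken from the abstract, and model fitting is not shown.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def predict_response(G, D, row_features, col_features,
                     row_residual=None, col_residual=None):
    """Predict a dyadic response as the inner product u . v.

    u = G @ row_features (+ row_residual), v = D @ col_features (+ col_residual).
    Cold start: no residual is known, so the factor comes from features alone.
    Warm start: a residual learned from past responses refines the factor.
    """
    u = matvec(G, row_features)
    v = matvec(D, col_features)
    if row_residual is not None:
        u = [a + b for a, b in zip(u, row_residual)]
    if col_residual is not None:
        v = [a + b for a, b in zip(v, col_residual)]
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 2-dimensional factors and features.
G = [[1.0, 0.0], [0.0, 1.0]]
D = [[2.0, 0.0], [0.0, 1.0]]
cold = predict_response(G, D, [1.0, 2.0], [3.0, 4.0])
warm = predict_response(G, D, [1.0, 2.0], [3.0, 4.0], row_residual=[1.0, 0.0])
print(cold, warm)
```

In this form the cold-start prediction is a bilinear function of the row and column features, which is one simple instance of a polynomial feature kernel.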
Publication
Spatio-Temporal Models for Estimating Click-through Rate
Source: The 18th International World Wide Web Conference (2009)
URL: http://www2009.eprints.org/3/
Abstract: We propose novel spatio-temporal models to estimate click-through rates in the context of content recommendation. We track article CTR at a fixed location over time through a dynamic Gamma-Poisson model and combine information from correlated locations through dynamic linear regressions, significantly improving on the per-location model. Our models adjust for user fatigue through an exponential tilt to the first-view CTR (probability of click on first article exposure) that is based only on user-specific repeat-exposure features. We illustrate our approach on data obtained from a module (Today Module) published regularly on Yahoo! Front Page and demonstrate significant improvement over commonly used baseline methods. Large scale simulation experiments to study the performance of our models under different scenarios provide encouraging results. Throughout, all modeling assumptions are validated via rigorous exploratory data analysis.
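A minimal sketch of tracking CTR with a Gamma-Poisson model follows. It assumes clicks in each interval are Poisson with mean `views * ctr` and that the CTR has a Gamma prior, so the posterior update is conjugate. The discount factor that down-weights older data is one common way to make such a model dynamic and is an assumption here, as are the function name and the default prior values. The paper's exact formulation may differ.

```python
def gamma_poisson_track(intervals, alpha=1.0, gamma=20.0, discount=0.95):
    """Track the CTR of one article at one location over time.

    intervals: sequence of (clicks, views) pairs, one per time interval.
    alpha, gamma: Gamma prior parameters (prior mean CTR = alpha / gamma).
    discount: factor in (0, 1] applied to past evidence before each update;
              1.0 recovers the static conjugate update.
    Returns the posterior mean CTR after each interval.
    """
    estimates = []
    for clicks, views in intervals:
        # Discounting keeps the mean but inflates the variance, so new
        # observations move the estimate more than they would otherwise.
        alpha = discount * alpha + clicks
        gamma = discount * gamma + views
        estimates.append(alpha / gamma)
    return estimates


# Hypothetical click and view counts for four consecutive intervals.
history = [(5, 100), (3, 100), (9, 100), (10, 100)]
print(gamma_poisson_track(history))
```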
Publication
Bellwether Analysis: Searching for Cost-Effective Query-defined Predictors in Large Databases
Source: ACM Transactions on Knowledge Discovery from Data, Volume 3, Issue 1 (2009)
URL: http://portal.acm.org/citation.cfm?doid=1497577.1497582
Abstract: How to mine massive datasets is a challenging problem with great potential value. Motivated by this challenge, much effort has concentrated on developing scalable versions of machine learning algorithms. However, the cost of mining large datasets is not just computational; preparing the datasets into the “right form” so that learning algorithms can be applied is usually costly, due to the human labor that is typically required and the large number of choices in data preparation, which include selecting different subsets of data and aggregating data at different granularities. We make the key observation that, for a number of practically motivated problems, these choices can be defined using database queries and analyzed in an automatic and systematic manner. Specifically, we propose a new class of data-mining problems, called bellwether analysis, in which the goal is to find a few query-defined predictors (e.g., first week sales of an item in Peoria, IL) that can be used to accurately predict the result of a target query (e.g., first year worldwide sales of the item) from a large number of queries that define candidate predictors. To make a prediction for a new item, the data needed to generate such predictors has to be collected (e.g., selling the new item in Peoria, IL for a week and collecting the sales data). A useful predictor is one that has high prediction accuracy and a low data-collection cost. We call such a cost-effective predictor a bellwether.
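The trade-off between prediction accuracy and data-collection cost can be sketched as a constrained selection: among candidate predictors whose cost fits a budget, pick the one with the lowest prediction error. The candidate names, costs and error values below are hypothetical, and the brute-force scan is only an illustration. The search algorithms in the paper are not reproduced here.

```python
def pick_bellwether(candidates, budget):
    """Pick the most accurate candidate predictor that fits the budget.

    candidates: list of (name, cost, error) tuples, where cost is the
                data-collection cost and error is the measured prediction
                error of a model built on that query's result.
    Returns the chosen name, or None if no candidate is affordable.
    """
    affordable = [c for c in candidates if c[1] <= budget]
    if not affordable:
        return None
    # Lowest error wins; ties are broken in favour of the cheaper candidate.
    name, _cost, _error = min(affordable, key=lambda c: (c[2], c[1]))
    return name


# Hypothetical candidate queries with illustrative costs and errors.
candidates = [
    ("one city, first week", 1.0, 0.12),
    ("one state, first month", 8.0, 0.07),
    ("nationwide, first quarter", 50.0, 0.03),
]
print(pick_bellwether(candidates, budget=10.0))
```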
Publication
Adversarial-Knowledge Dimensions in Data Privacy
Source: International Journal on Very Large Data Bases, Volume 18, Issue 2 (2009)
URL: http://www.springerlink.com/content/j5974525568074xm/
Abstract: Privacy is an important issue in data publishing. Many organizations distribute non-aggregate personal data for research, and they must take steps to ensure that an adversary cannot predict sensitive information pertaining to individuals with high confidence. This problem is further complicated by the fact that, in addition to the published data, the adversary may also have access to other resources (e.g., public records and social networks relating individuals), which we call adversarial knowledge. A robust privacy framework should allow publishing organizations to analyze data privacy by means of not only data dimensions (data that a publishing organization has), but also adversarial-knowledge dimensions (information not in the data). In this paper, we first describe a general framework for reasoning about privacy in the presence of adversarial knowledge. Within this framework, we propose a novel multidimensional approach to quantifying adversarial knowledge. This approach allows the publishing organization to investigate privacy threats and enforce privacy requirements in the presence of various types and amounts of adversarial knowledge. Our main technical contributions include a multidimensional privacy criterion that is more intuitive and flexible than previous approaches to modeling background knowledge. In addition, we identify an important congregation property of the adversarial-knowledge dimensions. Based on this property, we provide algorithms for measuring disclosure and sanitizing data that improve computational efficiency several orders of magnitude over the best known techniques.