Publications > On Building Decision Trees from Large-scale Data in Applications of On-line Advertising

On Building Decision Trees from Large-scale Data in Applications of On-line Advertising

Publication

Nov 3, 2014

Abstract

Decision trees have been used for several decades as simple and effective solutions to supervised learning problems. Their success extends to tasks across a variety of areas. Yet, data collected today through web-domains such as online advertising presents many new challenges: sheer size, the prevalence of high-arity categorical features, unknown feature-values, “cold starts”, sparse training instances, and imbalance in the class labels. We argue that decision trees remain an ideal choice for applications of on-line advertising as they naturally construct higher-order conjunctive features; we then contribute two ideas to improve tree-building accordingly. First, to handle high-arity categorical features, we introduce a method to cluster feature-values based on their output responses. The result is more“data-dense”trees with relatively small branching factors. Second, we employ cross-validation as a principled approach to derive splitting and stopping criteria: thereby we identify splits that generalize well, and also curb overfitting. Evaluated on three distinct probability-estimation tasks in on-line advertising, our method,“CCDT”, shows significant improvements in the accuracy of prediction.

Venue:

Shanghai, China

Type:

Conference/Workshop Paper

Authors:

Durga Deepthi Singh
Ravi Kant

BibTeX

@inproceedings{ author = {Durga Deepthi Singh and Ravi Kant}, title = {On Building Decision Trees from Large-scale Data in Applications of On-line Advertising}, booktitle = {Proceedings of Shanghai, China}, year = {2014} }

- Help
- About our ads

On Building Decision Trees from Large-scale Data in Applications of On-line Advertising

Publication

Abstract

Shanghai, China

Conference/Workshop Paper

Durga Deepthi Singh

Ravi Kant

BibTeX