Decision trees have served for decades as simple and effective solutions to supervised learning problems, and their success extends to tasks across a variety of areas. Yet data collected today through web domains such as online advertising presents many new challenges: sheer size, the prevalence of high-arity categorical features, unknown feature values, “cold starts,” sparse training instances, and imbalanced class labels. We argue that decision trees remain an ideal choice for online-advertising applications because they naturally construct higher-order conjunctive features, and we contribute two ideas to improve tree building accordingly. First, to handle high-arity categorical features, we introduce a method that clusters feature values by their output responses, yielding more “data-dense” trees with relatively small branching factors. Second, we employ cross-validation as a principled approach to derive splitting and stopping criteria, thereby identifying splits that generalize well and curbing overfitting. Evaluated on three distinct probability-estimation tasks in online advertising, our method, “CCDT,” shows significant improvements in prediction accuracy.
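To make the first idea concrete, here is a minimal sketch of response-based clustering of a high-arity categorical feature: each raw value is mapped to a bin according to its empirical positive-response rate, so a tree split branches on a handful of bins rather than on every raw value. The function name, the equal-size binning scheme, and the example data are illustrative assumptions, not the paper's exact algorithm.

```python
from collections import defaultdict

def cluster_by_response(values, labels, n_bins=4):
    """Group categorical feature values into n_bins bins by their
    empirical positive-response rate (a hypothetical sketch, not
    the paper's exact clustering procedure)."""
    # Tally positives and totals per feature value.
    counts = defaultdict(lambda: [0, 0])  # value -> [positives, total]
    for v, y in zip(values, labels):
        counts[v][0] += y
        counts[v][1] += 1
    rates = {v: p / t for v, (p, t) in counts.items()}
    # Order values by response rate and cut into contiguous,
    # roughly equal-sized bins.
    ordered = sorted(rates, key=rates.get)
    size = max(1, -(-len(ordered) // n_bins))  # ceil division
    return {v: i // size for i, v in enumerate(ordered)}

# Example: six raw values collapsed into two branches.
values = ["a", "a", "b", "b", "c", "c", "d", "d", "e", "e", "f", "f"]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
bins = cluster_by_response(values, labels, n_bins=2)
print(bins)  # low-rate values share bin 0, high-rate values bin 1
```

A real implementation would also need to reserve a bin for feature values unseen at training time (the “cold start” case noted above).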