Abstract: Aggregating search results from a variety of heterogeneous sources or verticals, such as news, image and video, into a single interface is a popular paradigm in web search. Although various approaches exist for selecting relevant verticals or optimising the aggregated search result page, evaluating the quality of an aggregated page is an open question. This paper proposes a general framework for evaluating the quality of aggregated search pages. We evaluate our approach by collecting annotated user preferences over a set of aggregated search pages for 56 topics and 12 verticals. We empirically demonstrate the fidelity of metrics instantiated from our proposed framework by showing that they strongly agree with the annotated user preferences over pairs of simulated aggregated pages. Furthermore, we show that our metrics agree with the majority user preference more often than current diversity-based information retrieval metrics. Finally, we demonstrate the flexibility of our framework by showing that personalised historical preference data can improve the performance of our proposed metrics.