Contextual multi-armed bandit algorithms have received significant attention for modeling users’ preferences in a timely manner in online personalized recommender systems. While considerable progress has been made in this direction, a few major challenges have not yet been well addressed: (i) the vast majority of the literature is based on linear models that cannot capture complex non-linear inter-dependencies of user-item interactions; (ii) the existing literature mainly ignores the latent relations among users and non-recommended items, and hence may not properly reflect users’ preferences in the real world; (iii) current solutions are mainly based on historical data and are prone to cold-start problems for new users who have no interaction history.
To address the above challenges, we develop a Graph Regularized Cross-modal (GRC) learning model, a general framework that exploits transferable knowledge learned from user-item interactions as well as the external features of users and items in online personalized recommendations. In particular, the GRC framework leverages the nonlinearity of neural networks to model the complex inherent structure of user-item interactions. We further augment GRC with a metric learning technique and a graph-constrained embedding module that together map units from different dimensions (temporal, social and semantic) into the same latent space. An extensive set of experiments conducted on two benchmark datasets, as well as on a large-scale proprietary dataset from a major search engine, demonstrates the power of the proposed GRC model in effectively capturing users’ dynamic preferences under different settings, outperforming all baselines by a large margin.