In the face of large-scale automated social engineering attacks on large online services, fast detection and remediation of compromised accounts are crucial to limit the spread of the attack and to mitigate the overall damage to users, companies, and the public at large. We advocate a fully automated approach based on machine learning: we develop an early warning system that harnesses account activity traces
to predict which accounts are likely to be compromised in the future. We
demonstrate the feasibility and applicability of the system through an
experiment at a large-scale online service provider using four months of
real-world production data encompassing hundreds of millions of users.
We show that our classifier can be tuned to achieve good precision when used for forecasting. This holds even when we restrict ourselves to login data alone, so that features can be derived at low computational cost, and use only a basic model selection approach. Our system correctly identifies, up
to one month in advance, the accounts later flagged as suspicious, with
precision, recall, and false positive rates indicating that the mechanism
is likely to prove valuable in operational settings as support for additional
layers of defense.
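The abstract says the classifier can be tuned for precision and evaluates it by precision, recall, and false positive rate. One common way to do this kind of tuning is to choose a decision threshold on held-out data. The sketch below rests on some assumptions: the classifier outputs a per-account risk score, and an account's label is 1 if it was later flagged as suspicious. The functions `rates_at_threshold` and `pick_threshold` are hypothetical illustrations and are not taken from the system described above.

```python
from typing import Optional, Sequence, Tuple


def rates_at_threshold(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float, float]:
    """Return (precision, recall, false positive rate) when accounts whose
    score is >= threshold are flagged as likely to be compromised.
    Assumption: label 1 means the account was later flagged as suspicious."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label:
            tp += 1
        elif flagged:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, fpr


def pick_threshold(
    scores: Sequence[float], labels: Sequence[int], target_precision: float
) -> Optional[Tuple[float, float, float, float]]:
    """Hypothetical tuning step: scan candidate thresholds from lowest to
    highest and return (threshold, precision, recall, fpr) for the first one
    whose precision reaches the target, or None if no threshold does."""
    for threshold in sorted(set(scores)):
        precision, recall, fpr = rates_at_threshold(scores, labels, threshold)
        if precision >= target_precision:
            return threshold, precision, recall, fpr
    return None
```

Recall can only fall as the threshold rises. Scanning upward from the lowest candidate therefore returns the highest-recall threshold that still meets the precision target.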