Feedback control and decentralized optimization have become increasingly important in online programmatic advertising systems, e.g., for campaign budget pacing and performance optimization. The solutions typically involve an auction-based allocation of ad inventory, which historically was implemented using a second-price cost model. In recent years the industry has rapidly transitioned to a predominantly first-price cost model. This has important implications for the optimal bidding strategy of advertisers. In particular, bids have to be shaded (discounted) to avoid overpaying. This paper proposes an adaptive scheme for online learning of optimal bid shading. The scheme involves segmentation, a two-parameter nonlinear shading mechanism, and an online learning algorithm for parameter optimization. The learning algorithm employs recursive least squares estimation of a log-quadratic model of the relationship between the surplus and the shading parameters, and a Newton-like gradient descent update scheme to find the surplus-maximizing shading parameters. The effectiveness of the proposed approach is demonstrated with experimental results from the Verizon Media Demand Side Platform (DSP).
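To make the learning loop concrete, the following is a minimal Python sketch of the general idea: recursive least squares (RLS) fits a quadratic model of the log-surplus in two parameters, and a damped Newton-like step moves the parameters toward the model's maximizer. It is an illustration under stated assumptions, not the implementation evaluated in the paper. The feature layout, forgetting factor, step size, warm-up period, exploration noise, and the synthetic surplus function (with its hypothetical optimum at a = 0.6, b = 0.3) are all assumptions introduced here, and segmentation and the shading function itself are omitted.

```python
import math
import random


def features(a, b):
    # Quadratic basis in the two parameters (assumed layout, not from the paper).
    return [1.0, a, b, a * a, a * b, b * b]


class RLS:
    """Recursive least squares with exponential forgetting."""

    def __init__(self, dim, forgetting=0.99, init_cov=1e6):
        self.lam = forgetting
        self.w = [0.0] * dim
        self.P = [[init_cov if i == j else 0.0 for j in range(dim)]
                  for i in range(dim)]

    def update(self, x, y):
        n = len(x)
        Px = [sum(p * xj for p, xj in zip(row, x)) for row in self.P]
        denom = self.lam + sum(xi * v for xi, v in zip(x, Px))
        gain = [v / denom for v in Px]
        err = y - sum(wi * xi for wi, xi in zip(self.w, x))
        self.w = [wi + g * err for wi, g in zip(self.w, gain)]
        # P is symmetric, so P x x^T P equals the outer product of Px with itself.
        self.P = [[(self.P[i][j] - gain[i] * Px[j]) / self.lam
                   for j in range(n)] for i in range(n)]


def newton_like_step(w, a, b, step=0.5):
    # Gradient and Hessian of the fitted quadratic w . features(a, b).
    g0 = w[1] + 2.0 * w[3] * a + w[4] * b
    g1 = w[2] + w[4] * a + 2.0 * w[5] * b
    h11, h12, h22 = 2.0 * w[3], w[4], 2.0 * w[5]
    det = h11 * h22 - h12 * h12
    if h11 < 0.0 and det > 0.0:
        # Hessian is negative definite: take a damped Newton step.
        da = (h22 * g0 - h12 * g1) / det
        db = (-h12 * g0 + h11 * g1) / det
        return a - step * da, b - step * db
    # Otherwise fall back to a plain gradient ascent step (assumed fallback).
    return a + step * g0, b + step * g1


def synthetic_surplus(a, b, rng, noise):
    # Hypothetical surplus surface, strictly positive so its log is defined.
    log_s = 1.0 - (a - 0.6) ** 2 - 2.0 * (b - 0.3) ** 2
    return math.exp(log_s + rng.gauss(0.0, noise))


def run_demo(steps=300, noise=0.01, seed=0, warmup=20):
    rng = random.Random(seed)
    rls = RLS(dim=6)
    a, b = 0.5, 0.5
    for t in range(steps):
        # Perturb the current parameters so the model stays identifiable.
        a_try = a + rng.uniform(-0.2, 0.2)
        b_try = b + rng.uniform(-0.2, 0.2)
        s = synthetic_surplus(a_try, b_try, rng, noise)
        rls.update(features(a_try, b_try), math.log(s))
        if t >= warmup:
            # Update only after enough samples to fit all six coefficients.
            a, b = newton_like_step(rls.w, a, b)
    return a, b


if __name__ == "__main__":
    print(run_demo())
```

Fitting the logarithm of the surplus keeps the regression linear in the coefficients, which is what makes a recursive least squares update applicable. The forgetting factor lets the fit track a surface that drifts over time, at the cost of noisier estimates.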