Isotonic Regressions in scikit-learn

Isotonic regression is a great tool to keep in your repertoire; it’s like weighted least-squares with a monotonicity constraint.  Why is this so useful, you ask?  Take a look at the example relationship below.

(You can follow along with the Python code here.)


  Let’s imagine that the true relationship between x and y is characterized piece-wise by a sharp decrease in y at low values of x, followed by a gradual decrease in y for larger x.  Let’s also imagine there is heteroskedasticity, with greater errors at low values of x than at high values of x.  However, the key is that we believe y should be non-increasing in x, i.e., monotonic: y may stay flat as x grows, but it should never rise.
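As a minimal sketch, data with these properties could be simulated as follows.  The breakpoint at x = 2, the slopes, the noise scale, and the variable names are illustrative assumptions, not the values behind the figures in this post.

```python
import numpy as np

rng = np.random.RandomState(0)
n = 200

# Sorted x values on [0, 10] (range chosen for illustration only).
x = np.sort(rng.uniform(0.0, 10.0, size=n))

# Piece-wise truth: steep decline for x < 2, gentle decline afterwards.
# The two pieces meet at (2, 4), so the curve is continuous.
y_true = np.where(x < 2.0, 10.0 - 3.0 * x, 4.0 - 0.25 * (x - 2.0))

# Heteroskedastic noise: the standard deviation shrinks as x grows.
sigma = 2.0 / (1.0 + x)
y = y_true + rng.normal(scale=sigma)
```

The true curve never rises, but the noisy observations y will, especially at low x where the noise is largest.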

  Now, let’s imagine we want to produce a “smoothed” or “denoised” version of this relationship.  We might want to do this for visualization, or to regularize or smooth data for use in some further modeling.  There are many common ways to smooth data, including polynomial or Chebyshev splines, LO(W)ESS, or non-linear least squares (NLS).  Without additional work, however, none of these methods is guaranteed to obey our monotonicity assumption above.

  Isotonic regression, on the other hand, is explicitly designed for this purpose.  Let’s look at what happens when we fit our observed y on x and plot the resulting isotonic fit.  I’ve included the default FITPACK univariate spline for comparison.
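The comparison described above can be sketched roughly like this.  The simulated data and all constants are assumptions for illustration; only IsotonicRegression and SciPy's UnivariateSpline (a FITPACK wrapper) come from the libraries themselves.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from sklearn.isotonic import IsotonicRegression

# Illustrative data: piece-wise decreasing truth with heteroskedastic noise.
rng = np.random.RandomState(0)
x = np.sort(rng.uniform(0.0, 10.0, size=200))
y_true = np.where(x < 2.0, 10.0 - 3.0 * x, 4.0 - 0.25 * (x - 2.0))
y = y_true + rng.normal(scale=2.0 / (1.0 + x))

# Isotonic fit, constrained to be non-increasing in x.
iso = IsotonicRegression(increasing=False)
y_iso = iso.fit(x, y).predict(x)

# Univariate spline with default settings, for comparison.
spline = UnivariateSpline(x, y)
y_spline = spline(x)

# Check monotonicity of each fit (small tolerance for floating-point round-off).
iso_is_monotone = bool(np.all(np.diff(y_iso) <= 1e-9))
spline_is_monotone = bool(np.all(np.diff(y_spline) <= 1e-9))
print("isotonic monotone:", iso_is_monotone)
print("spline monotone:", spline_is_monotone)
```

The isotonic fit is monotone by construction, while nothing in the spline fit enforces that, so whether it happens to be monotone depends on the data.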



  In practical applications, we are probably trying to use observed values of y to predict some further z.  We want to use past experience with x and y to help us better predict z.  So, let’s imagine we draw a random sample of x values, and want to produce smoothed values of y and a final function z.  Again, I’ve included a default FITPACK univariate spline for comparison.


  Under the hood, this is entirely handled in Python by scikit-learn’s IsotonicRegression class.  In prior versions (0.14 and before) of scikit-learn, however, IsotonicRegression required that you explicitly state whether y was increasing or decreasing in x.  With help from the wonderful sklearn team, I recently pushed a few enhancements to the IsotonicRegression class, making it a bit more powerful and friendly:

  • PR 3157: Automatically determine whether y is increasing or decreasing in x.  IsotonicRegression will now automatically determine this based on the sign of the Spearman correlation coefficient estimate, warning when the confidence interval implies uncertainty.
  • PR 3199: x values that were outside the training domain used to throw ValueError exceptions in calls to predict or transform.  IsotonicRegression now supports a few friendly options to handle this more gracefully.
  • PR 3250: Efficiency improvements and refactoring.  IsotonicRegression is now slightly smarter and faster in repeated predict use cases, as the interpolating function is stored at fit time.
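
To make the first two enhancements concrete, here is a small sketch using the increasing="auto" setting and the out_of_bounds parameter of IsotonicRegression.  The toy data and variable names are made up for illustration.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Toy data: mostly decreasing, with a few small upward wiggles.
x = np.arange(10, dtype=float)
y = np.array([9.0, 8.5, 8.7, 6.0, 6.2, 4.0, 3.5, 3.6, 1.0, 0.5])

# increasing="auto" lets the estimator infer the direction from the data.
# out_of_bounds="clip" maps x outside the training range to the nearest
# training endpoint instead of raising an error.
iso_clip = IsotonicRegression(increasing="auto", out_of_bounds="clip")
iso_clip.fit(x, y)
print("inferred increasing:", iso_clip.increasing_)

# Two of these points fall outside the training domain [0, 9].
x_new = np.array([-5.0, 4.5, 20.0])
y_clip = iso_clip.predict(x_new)

# out_of_bounds="nan" returns NaN for out-of-range x instead.
iso_nan = IsotonicRegression(increasing="auto", out_of_bounds="nan")
iso_nan.fit(x, y)
y_nan = iso_nan.predict(x_new)

print("clip:", y_clip)
print("nan: ", y_nan)
```

A third setting, out_of_bounds="raise", raises an error for out-of-range x instead of clipping or returning NaN.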

If you want to follow along with the figures and code above to see how it works, please refer to the rendered IPython notebook here.