On January 14, 2020, Sanjeeb Dash from IBM T. J. Watson Research presented a talk on the integer programming formulation for the problem arising from the interpretable machine learning. The title of his talk was “Boolean decision rules via column generation“. He is visiting IBS Discrete Mathematics Group until the end of this week.
In many applications of machine learning, interpretable or explainable models for binary classification, such as decision trees or decision lists, are preferred over potentially more accurate but less interpretable models such as random forests or support vector machines. In this talk, we consider boolean decision rule sets (equivalent to boolean functions in disjunctive normal form) as interpretable models for binary classification. We define the complexity of a rule set to be the number of rules (clauses) plus the number of conditions (literals) across all clauses, and assume that simpler or less complex models are more interpretable. We discuss an integer programming formulation for such models that trades off classification accuracy against rule simplicity, and obtain high-quality classifiers of this type using column generation techniques. Compared to some recent alternatives, our algorithm dominates the accuracy-simplicity trade-off in 8 out of 16 datasets, and also produced the winning entry in the 2018 FICO explainable machine learning challenge. When compared to rule learning methods designed for accuracy, our algorithm sometimes finds significantly simpler solutions that are no less accurate.