Volume 25 Number 3 2008
pp. 315-336

ABSTRACT: Real-world predictive data mining (classification or regression) problems are often cost-sensitive, meaning that different types of prediction errors are not equally costly. While cost-sensitive learning methods for classification problems have been studied extensively in recent years, cost-sensitive regression has not yet been adequately addressed in the data mining literature. In this paper, we first advocate the use of average misprediction cost as a measure for assessing the performance of a cost-sensitive regression model. We then propose an efficient algorithm for tuning a regression model to further reduce its average misprediction cost. In contrast to previous statistical methods, which are tailored to particular cost functions, this algorithm can handle any convex cost function without modifying the underlying regression method. We have evaluated the algorithm in bank loan charge-off forecasting, where underforecasting is considered much more costly than overforecasting. Our results show that the proposed algorithm significantly reduces the average misprediction costs of models learned with various base regression methods, such as linear regression, model tree, and neural network. The amount of cost reduction increases as the difference between the unit costs of the two types of errors (overprediction and underprediction) increases.
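To make the evaluation measure concrete, the following sketch illustrates one way to compute an average misprediction cost under an asymmetric cost function, and a simple cost-tuning idea in the same spirit as the abstract: shifting a model's predictions to reduce average cost. This is an illustrative assumption, not the paper's algorithm; the function names, the linear asymmetric cost, and the unit costs (`c_under`, `c_over`) are all hypothetical.

```python
# Illustrative sketch (not the paper's algorithm): an asymmetric linear
# cost where each unit of underprediction costs c_under and each unit of
# overprediction costs c_over. All names and values are hypothetical.

def misprediction_cost(actual, predicted, c_under=5.0, c_over=1.0):
    error = actual - predicted
    # Positive error = underprediction (assumed more costly here).
    return c_under * error if error > 0 else c_over * (-error)

def average_misprediction_cost(actuals, predictions, c_under=5.0, c_over=1.0):
    # Average cost over a set of (actual, predicted) pairs.
    costs = [misprediction_cost(a, p, c_under, c_over)
             for a, p in zip(actuals, predictions)]
    return sum(costs) / len(costs)

def tune_offset(actuals, predictions, deltas, c_under=5.0, c_over=1.0):
    # Toy tuning step: shift all predictions by a constant delta chosen
    # to minimize average cost on a held-out set. Because the cost is
    # convex in delta, a coarse grid search suffices for illustration.
    return min(deltas,
               key=lambda d: average_misprediction_cost(
                   actuals, [p + d for p in predictions],
                   c_under, c_over))
```

With asymmetric unit costs, the tuned offset is typically positive (predictions are nudged upward) because underprediction is penalized more heavily, which mirrors the abstract's observation that cost reduction grows with the gap between the two unit costs.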