“Garbage in, garbage out.” While the phrase is not a new one, its implications for banking credit risk models cannot be overstated.

Financial institutions, corporate credit offices and other lenders that use credit risk models, such as probability of default or loss given default models, often rely on the model’s output to make significant lending and capital requirement decisions. But for that output to be meaningful and sound, the model must be built on, and then run with, accurate, high-quality data.

Data for the development of a credit risk model

The accuracy of a model depends on both the recency and the representativeness of the data used to determine its factors and coefficients.

The Altman Z score, for example, is often used to determine a company’s likelihood of default; however, that model was first constructed in 1968, using financial data from that period.
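
For reference, the commonly published form of the original 1968 Z score for publicly traded manufacturers is a weighted sum of five financial ratios, with coefficients fitted to data from that era. The sketch below is illustrative only: the function and argument names are hypothetical, and the sample figures in the usage note are made up rather than taken from a real company.

```python
def altman_z_score(working_capital, retained_earnings, ebit,
                   market_value_equity, sales,
                   total_assets, total_liabilities):
    """Original (1968) Altman Z score as commonly published."""
    if total_assets <= 0 or total_liabilities <= 0:
        raise ValueError("total assets and total liabilities must be positive")
    x1 = working_capital / total_assets          # liquidity
    x2 = retained_earnings / total_assets        # cumulative profitability
    x3 = ebit / total_assets                     # operating profitability
    x4 = market_value_equity / total_liabilities # leverage
    x5 = sales / total_assets                    # asset turnover
    # Coefficients were estimated on 1960s-era data, which is the
    # recency concern raised in the text.
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def z_score_zone(z):
    """Map a Z score to the commonly cited cutoff zones."""
    if z > 2.99:
        return "safe"
    if z >= 1.81:
        return "grey"
    return "distress"
```

As a usage example, `altman_z_score(200, 300, 150, 800, 1200, 1000, 400)` combines ratios of 0.2, 0.3, 0.15, 2.0 and 1.2, and `z_score_zone` places the result in the "safe" zone. The fixed coefficients make the point of this section concrete: the weights do not change unless the model is re-estimated on newer data.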

Given the significant economic changes that have taken place since then, there is no doubt that the expected financial performance of a company today differs from what it was in the 1960s. In fact, economic changes since the recession alone have changed how companies are run. For example, compared with 2010, private companies are carrying less debt and have stronger debt-service ratios.

A credit risk model that either incorporates current data into model development or uses economic adjustment factors to account for current economic conditions is better positioned to rate a company’s credit risk today.

In addition to being recent, the data used to generate a credit risk model should reflect the types of businesses or loans that the model will analyze. For example, a lender evaluating a U.S. company would not want to rely on financial benchmarks from a BRIC economy (Brazil, Russia, India or China). The macroeconomic influences would be different and could make the model’s output for a U.S. company inaccurate.

Given the breadth of recent data required to be representative, it can be very difficult for community banks and other creditors to access sufficient data to develop statistically valid, data-driven models.

Data for a business analysis

Once a model is built using recent and representative data, the data challenge still isn’t over. The financial information that is input into a model to score a specific company or borrower, likewise, must be recent and accurate to avoid “garbage in, garbage out.”

For example, consider how quickly the business environment changed between mid-2008 and late 2009; a company’s 2007 payment history would have been of limited use in assessing that company’s credit risk in 2009.
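
One way to guard against stale inputs is to check the age of the financial data before scoring. The following is a minimal sketch, assuming a hypothetical 365-day cutoff; the threshold and the function name are illustrative and would be set by each lender’s own policy.

```python
from datetime import date

# Assumed maximum age for input data; not a figure from this article.
MAX_STATEMENT_AGE_DAYS = 365


def is_stale(statement_date, as_of, max_age_days=MAX_STATEMENT_AGE_DAYS):
    """Return True when the data is older than the allowed age."""
    if statement_date > as_of:
        raise ValueError("statement date is after the analysis date")
    return (as_of - statement_date).days > max_age_days
```

Under that assumed cutoff, year-end 2007 data would be flagged as stale for an analysis dated mid-2009, while data from the prior quarter would pass.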

For credit risk models, the amount of data required for each analysis could also pose a problem. For example, a model that requires six variables but is run with only five cannot give a meaningful answer. And for analysts looking at a private company that is not required to disclose financial data publicly, incomplete data could be a hurdle.
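
A completeness check along these lines can refuse to score a borrower when a required input is absent, rather than return a misleading number. This is a sketch under stated assumptions: the six input names are hypothetical placeholders, and `model` stands in for whatever scoring function a lender actually uses.

```python
import math

# Hypothetical names for the six inputs a model might require.
REQUIRED_INPUTS = (
    "total_assets",
    "total_liabilities",
    "revenue",
    "ebitda",
    "interest_expense",
    "current_ratio",
)


def missing_inputs(financials, required=REQUIRED_INPUTS):
    """List required inputs that are absent, None, or NaN."""
    missing = []
    for name in required:
        value = financials.get(name)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            missing.append(name)
    return missing


def score_or_reject(financials, model):
    """Run the model only when every required input is present."""
    missing = missing_inputs(financials)
    if missing:
        raise ValueError(f"cannot score: missing inputs {missing}")
    return model(financials)
```

Rejecting the analysis outright is a deliberate choice in this sketch; silently substituting a default value for the missing variable would reintroduce the “garbage in” problem.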

To learn about the data collection methods Sageworks employed for its probability of default model, see its methodology whitepaper.