AI-based economic monitoring using alternative data sources: Using news articles as a data source for nowcasting

Start of implementation: 2019

Technology type: Artificial intelligence

Technology service provider: Team from the Asian Institute of Management

ADB Department: Economic Research and Development Impact Department

In line with ADB’s Operational Approach: Promoting digital development and innovative technologies

A country’s Gross Domestic Product (GDP) is an important indicator of the size and health of its economy. It is considered by policymakers, investors, and other stakeholders in policy formulation and decision-making. However, there is usually a delay in the release of quarterly GDP, particularly for many developing countries that have sparse data.

A growing number of emerging technologies and alternative data sources can be used to generate insights on a country’s economic status for a given period more quickly than official GDP releases. However, the sheer volume and complexity of this information make it difficult for human users to sift through and interpret. Big data may also be incompatible with traditional econometric models, so new models need to be developed to make sense of it for nowcasting.

The initiative explored the possibility of using news articles to augment macroeconomic data for nowcasting. To do this, three models were developed.

Quantitative data were used to build the economic indicator model (EIM), drawing on the monthly records for 44 indicators provided by ADB, along with seasonally adjusted data. Meanwhile, textual data from news articles were used to develop the thematic model (TM) to compensate for the economic data availability issues in the EIM.

A web scraping tool for a news aggregator was used to collect a total of 2.2 million articles released by various news outlets from October 1989 to December 2019. Various publishers were included to minimize bias and to normalize differences in writing style among authors. Data cleaning, followed by lemmatization and token generation, was then performed. The most frequent unique tokens in each article were used as that article's keywords, which reduced noise and lessened the computational load of network analysis and modeling. Keyword co-occurrence networks were also built to show the relationship strengths between keyword pairs. Various algorithms were then developed and trained to model the GDP growth rate using the keywords.
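The keyword selection and co-occurrence steps described above can be illustrated with a minimal Python sketch. It is not the team's implementation: the stop-word list, the function names, the sample headlines, and the choice of two keywords per article are all assumptions made for illustration, and lemmatization is left out to keep the example dependency-free.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real pipeline would use a much fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "as", "on", "for", "is"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words.
    Lemmatization is omitted here; it would be applied at this step."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def top_keywords(text, k=2):
    """Return the k most frequent unique tokens of one article."""
    counts = Counter(tokenize(text))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [token for token, _ in ranked[:k]]

def cooccurrence(articles, k=2):
    """Count in how many articles each unordered keyword pair appears together.
    The counts act as edge weights of a keyword co-occurrence network."""
    edges = Counter()
    for text in articles:
        keywords = sorted(top_keywords(text, k))
        edges.update(combinations(keywords, 2))
    return edges

# Made-up headlines, for illustration only.
articles = [
    "Exports rise as peso weakens; exports lift growth and growth outlook",
    "Inflation slows growth; inflation hits exports and exports fall",
    "Remittances rise; exports and growth hold, exports support growth",
]
edges = cooccurrence(articles, k=2)
for pair, weight in edges.most_common():
    print(pair, weight)
```

Keeping only a few keywords per article bounds the number of pairs each article contributes, which is one way to read the source's point about reducing noise and computational load.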

The EIM and the TM were developed independently, which allowed the team to build them in parallel. Network analysis was also conducted to capture dynamic relationships and uncover themes and topics.

A third model, called the theme augmented model (TAM), combined the data sources of the two other models to offer a more holistic picture of GDP, compensating for the respective weaknesses of the EIM and the TM and producing more accurate predictions.

Three approaches were used to develop the TAM. The first approach, dubbed unified TAM or U TAM, was an XGBoost model with both economic and textual features. The U TAM did not offer much of an improvement over the TM regarding the timely generation of accurate nowcasts. The second approach, dubbed E1 TAM, was a Random Forest regressor that used the predictions of the EIM and TM as its inputs. The third approach, an ensemble model called E2 TAM, incorporated predictions from U TAM with the EIM and TM results.
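The stacking idea behind E1 TAM, where a second-stage model learns how to combine the EIM and TM predictions, can be sketched as follows. This is only an illustration under stated assumptions: a simple two-weight least-squares combiner is swapped in for the Random Forest regressor so that the example needs no external libraries, and the function name and all numbers are made up.

```python
def fit_stack_weights(eim_pred, tm_pred, actual):
    """Least-squares weights (w_eim, w_tm) so that
    actual ~= w_eim * eim_pred + w_tm * tm_pred (no intercept).
    Solves the 2x2 normal equations with Cramer's rule."""
    s_ee = sum(e * e for e in eim_pred)
    s_tt = sum(t * t for t in tm_pred)
    s_et = sum(e * t for e, t in zip(eim_pred, tm_pred))
    s_ey = sum(e * y for e, y in zip(eim_pred, actual))
    s_ty = sum(t * y for t, y in zip(tm_pred, actual))
    det = s_ee * s_tt - s_et * s_et
    if abs(det) < 1e-12:
        raise ValueError("base predictions are collinear; weights are not identifiable")
    w_eim = (s_ey * s_tt - s_ty * s_et) / det
    w_tm = (s_ty * s_ee - s_ey * s_et) / det
    return w_eim, w_tm

# Made-up past predictions of the two base models and the realized growth rates.
# In practice these should be out-of-sample predictions, so the second stage
# does not learn from base-model overfitting.
eim_pred = [6.0, 6.4, 5.8, 6.2]
tm_pred = [6.4, 6.0, 6.6, 5.8]
actual = [6.2, 6.2, 6.2, 6.0]

w_eim, w_tm = fit_stack_weights(eim_pred, tm_pred, actual)

# Combine fresh base-model predictions for the quarter being nowcast.
stacked_nowcast = w_eim * 6.1 + w_tm * 6.3
print(round(w_eim, 3), round(w_tm, 3), round(stacked_nowcast, 3))
```

A tree-based second stage such as the Random Forest named in the source can capture nonlinear interactions between the base predictions that a linear weighting like this one cannot.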

All the models were trained using a common target and evaluated using the same metrics and methodology. The training set covered data from 2000 to 2019 and was used for hyperparameter tuning to minimize root mean squared error. Meanwhile, the results were compared to common baselines, which included the Bloomberg consensus forecasts, an autoregressive random walk, and a naïve assumption, which mimicked how GDP growth rates were reported in press releases relative to previous year values. Multiple consecutive test samples were identified, and each model was trained using only data preceding each quarterly test sample, in a process the team called “extending window training and evaluation” to reduce the mean absolute error. For instance, GDP data from the first quarter of 2000 to the fourth quarter of 2015 were used to train the model, which was then checked on whether it correctly predicted the GDP of the first quarter of 2016.
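The extending window procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the team's code: the growth figures are made up, and the two predictors (a last-value forecast and a historical mean) are hypothetical stand-ins for the actual models and baselines.

```python
import math

def rmse(errors):
    """Root mean squared error of a list of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae(errors):
    """Mean absolute error of a list of forecast errors."""
    return sum(abs(e) for e in errors) / len(errors)

def extending_window_eval(series, first_test, fit_predict):
    """For each test index t >= first_test, give the model only series[:t]
    and compare its prediction with series[t]. The training window grows
    by one observation per step and never includes the test quarter."""
    errors = []
    for t in range(first_test, len(series)):
        history = series[:t]
        errors.append(fit_predict(history) - series[t])
    return errors

def last_value(history):
    """Stand-in baseline: next quarter equals the latest observed quarter."""
    return history[-1]

def historical_mean(history):
    """Stand-in model: next quarter equals the mean of all past quarters."""
    return sum(history) / len(history)

# Made-up quarterly GDP growth rates, in percent.
growth = [6.0, 6.2, 6.1, 6.4, 6.3, 6.5, 6.2, 6.0]

baseline_errors = extending_window_eval(growth, 4, last_value)
model_errors = extending_window_eval(growth, 4, historical_mean)

print("last value      MAE", round(mae(baseline_errors), 3), "RMSE", round(rmse(baseline_errors), 3))
print("historical mean MAE", round(mae(model_errors), 3), "RMSE", round(rmse(model_errors), 3))
```

Because every model and baseline passes through the same loop, their errors are computed on identical test quarters, which is what makes the comparison fair.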

The exploratory initiative showed that news articles are useful in generating accurate and timely nowcasts: nowcasting with news alone can reduce the margin of error to 0.36 percentage points. Combining methods and tools such as word co-occurrence networks, correlation networks, and community detection with machine learning for GDP nowcasting can aid key decision-makers in government, non-government, and finance organizations in making decisions related to areas such as poverty alleviation, labor, trade, and investments.