Data mining and operations research techniques in Supply Chain Risk Management: A bibliometric study

Goal: This paper carries out a bibliometric study to map how data mining and operations research techniques are being applied to Supply Chain Risk Management (SCRM). Design/Methodology/Approach: We conducted a bibliometric analysis in the R language (bibliometrix package), using a Systematic Literature Review approach to structure the search. Results: As the main results, we highlight the gap found in the literature regarding Data Mining techniques in SCRM, and we provide a full panorama of this stream of research. Limitations of the Investigation: We used the Scopus database, which retrieves peer-reviewed texts from dozens of strong databases; nevertheless, we cannot guarantee that all relevant documents were retrieved. In addition, we considered only fully published papers written in English. Practical Implications: Managers and companies that are linked in a supply chain (SC) must gradually redesign processes to include Data Mining techniques that support SCRM processes and activities along the SC. Originality/Value: The paper presents an updated panorama of Data Mining implementation in SCRM. We did not find any similar studies, which indicates the uniqueness of our contribution.


INTRODUCTION
Quantitative methodologies such as data mining and operations research techniques have the potential to solve problems related to Supply Chain Risk Management (SCRM). Operations Research techniques are more popular among scholars, while Data Mining (DM) techniques are only starting to be applied. The current state of application of these techniques has therefore not yet been mapped. This paper carries out a bibliometric study to map how these techniques are being applied in SCRM, answering five research questions: RQ1 Which are the most relevant sources? RQ2 Who are the main authors? RQ3 How is the paper network structured? RQ4 Which countries stand out in this stream of research? RQ5 Which are the main keywords related to SCRM? As the main result, we present a full bibliometric analysis carried out with bibliometrix (Aria and Cuccurullo, 2017), a package implemented in the R programming language. We highlight the scarcity of articles applying DM to SCRM. We could not find any other study performing a similar bibliometric analysis of these research streams, which indicates the uniqueness of this paper.

Bibliometry
Bibliometry is the collection, handling, and analysis of quantitative bibliographic data from scientific publications (Verbeek et al., 2002). Bibliometric analysis is fundamental for analyzing the intellectual connections revealed by the citations of articles in a research area (Ardito et al., 2019). Bibliometry consists of identifying authors, publications, and journals (Wu and Wu, 2017), as well as the co-citation of documents (Fahimnia et al., 2015; Appio et al., 2016). Bibliometric analysis allows the connections and relevance of a research stream to be assessed by applying network theory (Liu et al., 2015) and by identifying citations within similar research areas (Hjørland, 2013).
Thus, bibliometric analysis has as its main purpose the identification of a systematic connection between publications that attribute development to the research field under analysis (Di Stefano et al., 2010).
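To make the notion of co-citation concrete, the following minimal Python sketch counts how often two references appear together in the reference list of the same citing paper. It is an illustration only, not the procedure used by bibliometrix; the function name `cocitation_counts` and the toy reference labels are assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each unordered pair of references is cited together."""
    counts = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so each pair has a single canonical key.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citing papers; each inner list is one paper's reference list.
citing_papers = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C"],
]
counts = cocitation_counts(citing_papers)
```

Pairs with high counts are the natural candidates for edges in a co-citation network, with the count serving as the edge weight.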

Supply Chain Risk Management
A supply chain is a network composed of suppliers, manufacturers, distributors, and retailers that work together to meet customer demand (Rajan et al., 2019). Supply chain management (SCM) is the ability to strategically manage supply, demand, movement, and storage through information, inventory, and marketing flows, with the objective of serving the end customer. Globalization in the 21st century has provided companies with competitive advantages, but it has also created vulnerability in the face of an increasingly competitive market (Rajan et al., 2019). As a result, companies face both foreseeable and unforeseen events, such as demand uncertainty, production uncertainty, and supply interruptions, as well as unexpected events such as work accidents, cyber-attacks, natural disasters, and terrorism (Kara and Fırat, 2017). Thus, industries must reinvent themselves and adopt SCRM practices to survive.
Since 2004, SCRM has been gaining importance. SCRM is the study of the application of tools in SCM that monitor the capacity of a supply chain, balancing supply and demand by managing the risks and uncertainties present in the supply chain management process (Kara et al., 2020). In SCM, risks are defined as impacts on and interruptions of logistics activities, resources, material flow, and information in the supply chain (Brindley, 2004) that cause vulnerability. SCRM is therefore described as a set of processes to identify and mitigate potential risks (Manuj & Mentzer, 2008), following five steps (Ho et al., 2015): i) Risk identification: process that identifies risk types and risk factors; ii) Risk assessment: process that assesses the probability and impact of an event occurring; iii) Risk mitigation: process to reduce the probability of an event occurring; iv) Risk monitoring: process that detects the occurrence of an interruption; v) Risk recovery: process for rapid recovery of the supply chain during an occurrence.
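The risk assessment step above combines probability and impact. One common convention, assumed here and not prescribed by the cited authors, is to rank risks by expected exposure (probability multiplied by impact). The sketch below illustrates that convention in Python; the `Risk` class, the `prioritize` function, and every number in the register are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # assumed chance of occurrence in the period, 0..1
    impact: float       # assumed loss if it occurs, in arbitrary cost units

    @property
    def exposure(self):
        # Expected loss: probability times impact.
        return self.probability * self.impact


def prioritize(risks):
    """Return risks ordered from highest to lowest expected exposure."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


# Hypothetical risk register with made-up values.
register = [
    Risk("supplier bankruptcy", 0.05, 900.0),
    Risk("port congestion", 0.30, 200.0),
    Risk("demand spike", 0.50, 80.0),
]
ranked = prioritize(register)
```

A ranking of this kind only supports the assessment step; identification, mitigation, monitoring, and recovery require separate processes.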
The increase in globalization accelerates risks in the supply chain (Kara et al., 2020), due to the proportional increase in customer expectations and the shortening of product life cycles in supply chain networks (World Economic Forum, 2017), threatening the sustainability and competitiveness of supply chains (Aqlan and Lam, 2016; Brusset and Teller, 2017).
In the current scenario, companies coordinate large amounts of information and therefore increasingly use techniques such as Business Intelligence (BI) and DM for more strategic decision making (Heaney, 2015). DM techniques allow supply chain risks to be identified through data analysis (Ranjan and Bhatnagar, 2011), supporting the development of proactive and reactive systems for SCRM (Lee et al., 2017; Wu et al., 2014). Although SCRM has experienced considerable growth in publications in recent years, some gaps remain. In this sense, this paper carries out a bibliometric study to map how these tools are being applied jointly with SCRM.

Data Mining and Operations Research techniques
The lack of data and information about uncertainties and risks leads to a lack of readiness, extra costs, and disruptions for a company. DM uses techniques and tools to convert data into metrics and information for SCRM decision making (Kara et al., 2020). DM is thus used to detect and assess risks, discover the sources of risks, determine patterns, predict events, and classify risks through the analysis of historical data and real-time data processing (Lee et al., 2017).
There are several descriptive and predictive DM techniques for risk interpretation: classification, prediction, regression, association analysis, clustering, anomaly detection (Jukic et al., 2017; Witten et al., 2017), and risk trend analysis (Kara et al., 2020), chosen according to the needs and solutions required. Techniques such as decision trees can also be used, depending on the functionality sought (Lee et al., 2017). It is therefore important to assess the patterns and relationships between risks in order to develop risk mitigation strategies (Liu et al., 2014), as well as to determine categories, such as the risk classification of suppliers, customers, and zones (Dutta et al., 2017; Geng et al., 2015; Tobback et al., 2017).
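As a small illustration of one of the techniques listed above, anomaly detection, the following Python sketch flags observations whose z-score exceeds a threshold. It is a deliberately simple stand-in, not a method taken from the cited works; the function name, the threshold of 2.0, and the lead-time values are all assumptions made for the example.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=2.0):
    """Return indices of values more than `threshold` sample std devs from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values identical: nothing can be flagged.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical supplier delivery lead times in days.
lead_times = [5, 6, 5, 7, 6, 5, 6, 21, 6, 5]
flagged = zscore_anomalies(lead_times)
```

In this toy series only the 21-day delivery is flagged. A z-score rule assumes roughly symmetric data and is sensitive to the outliers it is trying to detect, so real risk data would usually call for a more robust technique.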

RESEARCH METHODOLOGY
To fulfill the objective of this paper, we propose five research questions: RQ1 Which are the most relevant sources? RQ2 Who are the main authors? RQ3 How is the paper network structured? RQ4 Which countries stand out in this stream of research? RQ5 Which are the main keywords related to SCRM? To answer these questions, we structured a systematic search using the Scopus database, which is the largest searchable citation and abstract source (Chadegani et al., 2013). Figure 1 shows our methodology workflow. The first step was to search Scopus for "Supply Chain Risk Management" and its variations. The search was conducted in March 2020. We considered only documents in English, classified as articles or reviews, and in their final publication stage. We exported the selected records from Scopus to a file with the ".bib" extension. A script written in R removed duplicates and generated a dataset containing 576 papers. We used the bibliometrix package (Aria and Cuccurullo, 2017) and its companion R application, Biblioshiny (Aria and Cuccurullo, 2017). We then read all titles and abstracts and selected the papers that addressed one or more DM techniques, which produced a final list of 48 papers. The complete set of results and statistics is shown in Section 4.
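The deduplication and screening steps described above can be sketched as follows. This is a hedged Python illustration, not the authors' R script: the record layout, the function names, and the search terms are assumptions, and the keyword flagging is only a possible aid to the manual reading of titles and abstracts that the study actually performed.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def remove_duplicates(records):
    """Keep the first record seen for each normalized title."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_title(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def flag_for_screening(records, terms):
    """Return records whose title or abstract mentions any of the terms."""
    flagged = []
    for record in records:
        text = (record["title"] + " " + record.get("abstract", "")).lower()
        if any(term in text for term in terms):
            flagged.append(record)
    return flagged


# Hypothetical records; titles and abstracts are invented for the example.
records = [
    {"title": "Risk Analysis in Supply Chains",
     "abstract": "We apply data mining to supplier data."},
    {"title": "Risk analysis in supply chains.",
     "abstract": "We apply data mining to supplier data."},
    {"title": "A Stochastic Sourcing Model",
     "abstract": "A mixed integer program for sourcing."},
]
unique = remove_duplicates(records)
candidates = flag_for_screening(unique, ["data mining", "clustering"])
```

Matching on normalized titles is a simplification; matching on a DOI, where available, would be less prone to false duplicates.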

RESULTS OF BIBLIOMETRIC STUDY
In this section we present the full bibliometric analysis. Table 1 shows a statistical summary of the bibliographic research. Figure 2 shows the distribution of articles over time. After 2016 there is a marked growth in publications. Although the year 2020 is incomplete, it already has more publications than any year before 2017.
The H-index of the journals is shown in Figure 6. Even though "Computers and Industrial Engineering" occupies the second position in number of publications, "International Journal of Production Economics" occupies second place in the H-index ranking.
The papers are:
1 - Supply chain risk simulation and vendor selection;
2 - A stochastic model for risk management in global supply chain networks;
3 - A quantitative analysis of disruption risk in a multi echelon supply chain;
4 - Modeling supply chain planning under demand uncertainty using stochastic programming a survey motivated by asset liability management;
5 - The impact of digital technology and industry 4.0 on the ripple effect and supply chain risk analytics;
6 - Revealing interfaces of supply chain resilience and sustainability a simulation study;
7 - A portfolio approach to supply chain disruption management;
8 - Dynamic pricing in the newsvendor problem with yield risks;
9 - Modeling supplier risks using Bayesian networks;
10 - Exploring dependency based probabilistic supply chain risk measures for prioritizing interdependent risks and strategies.
Among these papers, the approach consists mostly of mathematical programming methods, with few exceptions.
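The H-index used above to rank journals is the largest number h such that the source has h papers with at least h citations each. The short Python sketch below computes it and shows why a source with more publications does not necessarily have a higher H-index. The citation counts are invented for illustration and are not taken from the dataset.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical sources: A publishes more papers, B is cited more per paper.
source_a = [3, 2, 2, 1, 1, 0]
source_b = [9, 7, 5, 4]
h_a = h_index(source_a)
h_b = h_index(source_b)
```

Here source A has six papers but an H-index of 2, while source B has four papers and an H-index of 4, which mirrors how publication count and H-index rankings can diverge.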
Among the whole list of 48 papers, two papers use DM techniques and are analyzed as follows. Kara et al. (2020) state that the growing information overload in supply chains requires companies to develop fast, real-time data mining techniques to extract useful information from all the data. The authors construct a DM-based framework for risk identification, risk assessment, and risk mitigation. The main stages of the proposed model are: (i) identification of risk indicators, (ii) development of a risk data warehouse to gather and store risk data, and (iii) incorporation of a DM module. The authors identify the following risk identification sub-tasks: (i) Collect information about the company: the firm's size and the structure of the SC network have a significant impact on the type and distribution of risks; (ii) Map the SC network: SC maps provide fundamental visibility of the SC network, and the authors suggest heatmaps, for example, for visualizing the chain; (iii) Determine the risk attitude of the firm: the risk attitude and tolerance level of companies affect risk identification, assessment, and perception, as well as the choice of risk mitigation solutions; (iv) Evaluate the company's resilience level to risks: companies can prioritize their critical risk areas and focus their risk management practices on these areas.
Thoni et al. (2018) propose using a Bayesian network (BN) to compute the likelihood of child labor at a supplier location based on evidence from geography and sector, audits, and news reports. The key advantage of BNs is their explicit way of dealing with uncertain information, which supports decision making. The authors feed the BN with data from Twitter and news websites to create indicators.
The boundary between data mining and classical mathematical programming is also unclear. For this reason, we only considered that a paper used a "data mining" technique if the authors explicitly described it as such.
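To illustrate how a Bayesian approach combines uncertain evidence such as the audit and news indicators mentioned above, the following Python sketch updates a prior risk probability using Bayes' rule under a naive conditional-independence assumption. This is a much simpler structure than a full Bayesian network and is not the model of Thoni et al. (2018); the function name and all probabilities are hypothetical.

```python
def posterior(prior, likelihoods):
    """P(risk | evidence), assuming evidence items are conditionally independent.

    `likelihoods` is a list of (P(e | risk), P(e | no risk)) pairs,
    one pair per observed piece of evidence.
    """
    p_risk = prior
    p_no_risk = 1.0 - prior
    for given_risk, given_no_risk in likelihoods:
        p_risk *= given_risk
        p_no_risk *= given_no_risk
    # Normalize over the two hypotheses.
    return p_risk / (p_risk + p_no_risk)


# Hypothetical numbers: a 10% prior, then a failed audit and a negative news report.
prior = 0.10
failed_audit = (0.8, 0.2)    # assumed P(failed audit | risk), P(failed audit | no risk)
negative_news = (0.6, 0.3)   # assumed P(negative news | risk), P(negative news | no risk)

after_audit = posterior(prior, [failed_audit])
after_both = posterior(prior, [failed_audit, negative_news])
```

Each additional piece of evidence that is more likely under the risk hypothesis raises the posterior, which is the explicit handling of uncertainty that makes BNs attractive for decision support. The function does not guard against evidence that has zero probability under both hypotheses.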
We consider it important to disambiguate mathematical programming, mathematical optimization, and DM. The confusion is compounded by the fact that many data mining algorithms use optimization techniques in their computations.
Figure 13 shows the most used keywords. DM-related keywords are not among the most relevant ones. Figure 14 shows the word cloud based on the keywords of our survey. Even with a wider range of words than in Figure 13, keywords such as Data Mining are still absent.
Figure 15 shows the co-word network. The network has three main clusters: a small cluster that associates sales and supplier selection with integer linear programming; a cluster in which risk assessment, decision making, and supply chain management are associated with artificial intelligence and Bayesian networks; and, finally, a cluster in which supply chain risk management is associated with disruptions, ripple effects, uncertainty, and stochastic methods.
The co-citation network in Figure 16 shows, as its core, the fundamental papers that first established Supply Chain Risk Management as a concept. For example, Norman and Jansson (2004) and Juttner (2003) are among the first papers that introduced SCRM as a concept.
The green cluster consists mostly of conceptual and definitional papers, while the blue cluster includes newer, applied papers such as Ivanov (2017).
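A co-word network such as the one in Figure 15 links keywords that appear together in the same paper. The Python sketch below builds weighted keyword pairs and groups keywords using connected components over sufficiently strong links. Connected components are used here as a crude stand-in for a proper community detection algorithm and are not claimed to be the clustering method behind the figure; the keyword lists and the `min_weight` threshold are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def coword_edges(keyword_lists):
    """Count how often each pair of keywords appears in the same paper."""
    edges = Counter()
    for keywords in keyword_lists:
        terms = sorted({k.strip().lower() for k in keywords})
        for pair in combinations(terms, 2):
            edges[pair] += 1
    return edges


def clusters(edges, min_weight=2):
    """Group keywords into connected components over edges >= min_weight."""
    neighbors = defaultdict(set)
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen = set()
    groups = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical author keywords for five papers.
papers = [
    ["supplier selection", "integer programming"],
    ["Supplier Selection", "integer programming", "sales"],
    ["risk assessment", "bayesian networks"],
    ["risk assessment", "bayesian networks", "decision making"],
    ["decision making", "risk assessment"],
]
edges = coword_edges(papers)
groups = clusters(edges, min_weight=2)
```

With these toy inputs, "sales" drops out because its links occur only once, and two groups remain: one around supplier selection and one around risk assessment.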

DISCUSSION
This research presents a first "scan" of the literature to serve as a guide for researchers and scholars who are new to the subject. To that end, the study identified the main authors, the most cited papers, and the co-citation links between papers, along with other statistics that help build a panorama of the literature.
Nevertheless, even though the search was conducted in a broad database such as Scopus, it is not possible to guarantee that all relevant literature was covered, which can be considered a limitation. The second limitation is that only papers in English were considered. This decision was based on the following: i) most review articles use this criterion, so it has methodological support in the literature; ii) English is the main language used to communicate scientific research; iii) it ensures that any researcher can understand any paper selected for this bibliometric study. Nevertheless, this procedure may disregard relevant studies published in other languages.
This study may lead to some managerial implications. Managers and companies that are linked in a supply chain must gradually redesign processes to include SCRM processes and activities along the SC. In the long run, this may reveal substantial opportunities to save resources that are increasingly scarce and expensive. The COVID-19 pandemic has shown the severe lack of resilience of supply chains throughout the world. Local producers are responding more efficiently, while globalized supply chains are struggling to maintain service levels as the pandemic spreads differently across the globe. In the future, global supply chains should develop triggers and fast reconfiguration methods that allow them to reduce ripple effects and other disruptions. Data mining should be applied far more widely to detect early signals that the next pandemic is on its way, so that the necessary precautions can be taken.
In terms of SCRM, managers should implement control towers to respond quickly to global risks. Data science and operations research techniques should be implemented in programming languages fast enough to support real-time decisions that identify, assess, and mitigate risks in the supply chain.
In terms of capabilities, today's professionals were not trained in a 4.0 world. In this sense, companies must heavily invest in training, including quantitative methods, as well as programming languages such as Python, Julia, R, among others.
As social implications, when the pandemic ends, global economies will struggle, and people will lose their jobs as a direct consequence of the lack of resilience of global supply chains. Important questions arise, such as: What can be done to respond more efficiently to the next pandemic? How can SCRM support the rapid discovery and production of vaccines?

CONCLUSION
This paper accomplished a first literature scan of quantitative techniques under the umbrella of Supply Chain Risk Management. The SLR conducted in this paper found only two articles that propose a thorough framework for DM application; one of them is conceptual and the other is applied. As a future trend, we perceive that DM still has untapped potential to identify, assess, and mitigate risks in a supply chain. We conclude that big data is a reality for most companies; nevertheless, data-driven decision-making processes are not improving at the same pace as data is collected. This becomes a problem, because the effort being put into data collection is not matched by effort in data analysis that would generate efficient risk mitigation. Although both papers provide remarkable contributions, the related literature still lacks a complete framework that allows managers to gather data, feed it into a system, and rely on an automated process that performs risk identification, assessment, and mitigation. Journals could open calls for papers directed at DM articles that make a significant contribution to SCRM, motivating researchers to develop useful knowledge within this stream of research. As a future research opportunity, now that the main papers have been mapped, we recommend a deeper SLR with rigorous content analysis that thoroughly examines each of the papers found in order to identify the gaps.