Advancing Sustainability in Global Supply Chains through Agent-based Simulation

Vol.18 No.1 May 2025 Special Issue on Green Transformation — The NEC Group’s Environmental Initiatives

In today’s world, with its complex global supply chains, the difficulties and uncertainties we face offer both challenges and opportunities for making things better, especially in terms of efficiency and sustainability.

These challenges are exacerbated by unpredictable events, including natural disasters, unexpected incidents, and unusual business practices. The increasing frequency and severity of climate-related events highlight the need for more advanced modeling methods. By focusing on reducing risks and enhancing sustainability, these models can play a crucial role in mitigating the impacts of climate change and supporting carbon emission reduction efforts.

In this paper, we present a new agent-based simulation approach that goes beyond the usual limits of supply chain simulations by incorporating sustainability directly into supply chain operations using Reinforcement Learning (RL) algorithms. The approach is realized as a sustainable supply chain simulation system that takes carbon emissions into account in its main operations. This simulation technology is the core of the analytical simulation in a comprehensive suite offering support for GX (Green Transformation), which is currently in the planning stage. Additionally, we examine how effective a multi-agent RL strategy is in dealing with the complex and uncertain nature of supply chains that span multiple levels. By comparing this strategy with traditional heuristic methods, our study looks at how well single versus multiple RL agents can manage risks and improve sustainability in both the upstream and downstream parts of the supply chain. The results of our experiments show that strategies based on RL outperform traditional heuristic methods at managing risks, making profits, and achieving sustainability goals.

1. Introduction

In the evolving landscape of global Supply Chain Management (SCM), mitigating carbon emissions has emerged as a critical concern. This imperative addresses not only environmental sustainability but also operational efficiency and regulatory compliance. The complexity of modern supply chains, characterized by intricate networks of suppliers, manufacturers, and retailers across diverse regions, poses significant challenges in accurately quantifying and managing carbon footprints. With ambitions toward achieving a net-zero economy, numerous countries are adopting varied sustainability policies1) 2). To meet internationally agreed-upon climate goals, optimizing supply chain management by integrating carbon emissions considerations is essential.

Efforts to reduce carbon footprints in supply chain management necessitate a comprehensive approach that incorporates robust strategies to address traditional uncertainties while actively striving for sustainability and carbon neutrality. This approach ensures that supply chains not only achieve their economic objectives but also contribute positively to environmental stewardship. Machine Learning (ML) has become increasingly prevalent in enhancing SCM, particularly in improving demand forecasting and sales predictions3)-7), estimating commercial partnerships8) 9), and optimizing inventory management10) 11). However, reliance solely on ML techniques presents certain limitations, including the lack of transparency in the decision-making process and the intensive requirements for training data and computational resources. These challenges necessitate either advanced domain expertise for developing sophisticated and experiential strategies or substantial datasets for model training, which can be particularly challenging to acquire, especially in the context of proprietary commercial data.

Considering the identified limitations of ML methods in addressing supply chain challenges, an increasing number of researchers are integrating simulation-based methods with ML to tackle these issues12) 13). To this end, we are introducing our new agent-based simulation technology, which is a simulation tool tailored for general complex problems. Our simulation technology encompasses three critical components: a comprehensive agent-based simulation engine, a resource management system, and an interactive platform for the implementation and testing of policies. Designed for efficiency and scalability, our simulation technology is adept at simulating complex scenarios involving numerous agents and complex resource flows. Owing to its capability to monitor every detail of each component within the simulation framework, this simulation tool facilitates the calculation of product-level carbon emissions with a precision that surpasses previous methods.

Reinforcement Learning (RL) has emerged as a critical technique for optimizing agent-based simulations in supply chain management, attributed to its unparalleled capability to navigate complex, uncertain environments. Firstly, supply chain management necessitates sequential decision-making amidst uncertainty, a domain where RL excels by optimizing decisions across time to favor long-term rewards over immediate gains. This approach is vital for supply chain decisions, considering that short-term actions may lead to enduring consequences. Secondly, RL can model and learn complex behaviors directly from agent-interaction data, obviating the need to explicitly enumerate all conceivable states and actions—a task impractical for complex systems. In this study, we apply RL to each simulation agent and assess various RL algorithms to facilitate optimization of supply chain management towards achieving carbon neutrality.

The contributions of this paper are summarized as follows:

  • We have developed an agent-based supply chain simulator capable of simulating detailed interactions among supply chain components, with an emphasis on sustainability attributes.
  • We investigate the potential and limitations of multi-agent reinforcement learning algorithms in reducing supply chain uncertainty by extending the supply chain to include participants across various tiers.

This paper is organized as follows. Section 2 reviews previous work on supply chain simulation and reinforcement learning. Section 3 introduces the new agent-based simulator in detail, including all its key components. Section 4 defines the supply chain optimization problem and the reinforcement learning approach applied to it. Section 5 presents the experiments. Section 6 outlines future work, and Section 7 concludes the paper.

2. Related Work

Agent-based simulation tools are increasingly employed in supply chain management to facilitate the exploration of complex interactions among individual agents, which may represent companies, consumers, or products, within the supply chain network. Tools such as AnyLogic12), Simio13), and MATSim14) exemplify agent-based supply chain simulation platforms. Nevertheless, these tools do not explicitly focus on sustainability within the supply chain, nor do they extend to the precise calculation of product-level carbon emissions.

Because research on sustainability-focused supply chain management is scarce, the closest previous work we found is the application of RL to inventory management. In this type of problem, the RL-agent first observes the current state of the system, including current inventory levels, demand patterns, and lead times. The RL-agent is then required to determine order quantities or reorder points, and the environment responds by generating new states and providing rewards or penalties to guide the learning process. As a typical case of downstream uncertainty, varying demand is the motivation for RL solutions in much of this research, where an adaptive balance between customer satisfaction and storage costs needs to be found. For example, Zwaida15) proposes an online solution with a deep Q-network (DQN) algorithm to prevent drug shortages.

3. Supply Chain Simulation

The goal of our work is to develop a simulation tool that models supply chain system behaviors with a focus on sustainability, aiming to optimize supply chain decision-making for lower carbon emissions and reduced uncertainty. To assess our simulation tool and the RL optimization methods for system sustainability, we transformed a real-world supply chain system focusing on sustainability into an optimization problem aimed at lowering carbon emissions. Our new simulation technology is an agent-based simulation tool designed for general complex system modeling. This paper demonstrates its application in supply chain management with a focus on sustainability. We use the supply chain system as a case study to explain our simulation's rationale and the methodology for mapping real-world systems into the simulation environment.

3.1 Overview of Agent-based Simulation

A general complex system comprises interacting, autonomous components. Unlike simple systems, complex adaptive systems allow agents to adapt at the individual or population level. This exploration into complex systems forms the basis for understanding self-organization, emergent phenomena, and the origins of adaptation in nature. Conceptually, the decomposition of a general complex system into three primary components (Agents, Resources, and Topology) is derived from a holistic approach to modeling and comprehending the intricate interactions and dynamics within such systems. Agents within the system have the capacity to act, interact, and make decisions based on predefined rules or through adaptive learning mechanisms. Resources include the various elements and assets that can be consumed, transformed, or produced by agents within the system. Topology refers to the arrangement and connectivity of elements within the system, highlighting the structural aspect of complex systems. It delineates how agents are linked and the way they can interact with one another. This framework not only facilitates the conceptual understanding of complex systems but also enables structured simulations to explore system dynamics, predict behavior under diverse scenarios, and devise interventions to achieve specific objectives. Fig. 1 illustrates how our simulation tool orchestrates resource flow dynamically through the supply chain, which makes it possible to simulate diverse supply chain strategies and their impacts on efficiency and sustainability.

Fig. 1 Simulation tool in supply chain simulation.
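
The three-part decomposition described above can be made concrete with a minimal sketch. It is not the simulator's actual implementation: the class names `Resource`, `Agent`, and `Topology`, the `transfer` helper, and the "widget" product are all hypothetical, chosen only to show how agents hold resources and how a topology constrains which flows are allowed.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A quantity of some product (hypothetical structure)."""
    name: str
    quantity: float


@dataclass
class Agent:
    """An autonomous entity that holds resources in an inventory."""
    agent_id: str
    inventory: dict = field(default_factory=dict)  # resource name -> quantity

    def receive(self, resource: Resource) -> None:
        held = self.inventory.get(resource.name, 0.0)
        self.inventory[resource.name] = held + resource.quantity

    def release(self, name: str, quantity: float) -> Resource:
        available = self.inventory.get(name, 0.0)
        shipped = min(available, quantity)  # cannot ship more than is held
        self.inventory[name] = available - shipped
        return Resource(name, shipped)


@dataclass
class Topology:
    """Directed links that say which agent may send resources to which."""
    links: set = field(default_factory=set)  # {(sender_id, receiver_id)}

    def connect(self, sender_id: str, receiver_id: str) -> None:
        self.links.add((sender_id, receiver_id))

    def can_send(self, sender_id: str, receiver_id: str) -> bool:
        return (sender_id, receiver_id) in self.links


def transfer(topology, sender, receiver, name, quantity):
    """Move a resource along a link; return the amount actually moved."""
    if not topology.can_send(sender.agent_id, receiver.agent_id):
        return 0.0  # no link, so no flow
    shipment = sender.release(name, quantity)
    receiver.receive(shipment)
    return shipment.quantity


supplier = Agent("supplier", {"widget": 10.0})
retailer = Agent("retailer")
topology = Topology()
topology.connect("supplier", "retailer")
moved = transfer(topology, supplier, retailer, "widget", 4.0)
```

Keeping the link check inside `transfer` means a change to the topology immediately changes which resource flows are possible, without touching agent logic.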

3.2 Agent

Agents are entities within the system capable of acting, interacting, and making decisions based on predefined rules or adaptive learning mechanisms. Each agent is endowed with the ability to process information, utilize resources, and potentially alter the topology through their actions. The complexity of real-world systems emerges from the collective behaviors of agents, leading to phenomena such as self-organization, adaptation, and evolution.

The design and description of agents within a simulation are predicated on several essential characteristics. First, an agent is a self-contained and uniquely identifiable entity with attributes that enable it to be distinguished from and recognized by other agents, facilitating interaction. Second, an agent is autonomous and self-directed, capable of operating independently within its environment and in interactions with other agents. An agent’s behavior, which bridges sensed information to decisions and actions, can range from simple rules to complex models, including RL mechanisms that adapt inputs to outputs. Third, an agent possesses a state that evolves over time or in response to external changes. In our methods, we employ a state machine mechanism within each agent to represent its state. This mechanism is chosen for its inherent ability to model the discrete states and transitions that define the operational and decision-making processes of agents. In the context of supply chain systems, this approach is particularly apt, as it mirrors the operational stages and decision-making sequences in procurement and manufacturing processes, among others.

Fig. 2 shows a state machine comprising two states: State 0 and State 1. Each state triggers a specific action, denoted as Action 0 and Action 1, respectively. The diagram also illustrates a unidirectional transition from State 0 to State 1, initiated by a designated event. This transition symbolizes a shift in behavior, as indicated by the distinct actions associated with each state.

Fig. 2 State machine in simulation agent.
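
The two-state machine of Fig. 2 can be sketched as follows. This is an illustrative reading of the figure rather than the simulator's code: the `StateMachine` class and the string labels `state_0`, `state_1`, `action_0`, `action_1`, and `event` are assumed names.

```python
class StateMachine:
    """Minimal finite-state machine: each state maps to one action."""

    def __init__(self, initial, actions, transitions):
        self.state = initial
        self.actions = actions          # state -> callable returning an action
        self.transitions = transitions  # (state, event) -> next state

    def act(self):
        """Run the action attached to the current state."""
        return self.actions[self.state]()

    def handle(self, event):
        """Follow a transition if one is defined, otherwise stay put."""
        self.state = self.transitions.get((self.state, event), self.state)
        return self.state


machine = StateMachine(
    initial="state_0",
    actions={"state_0": lambda: "action_0", "state_1": lambda: "action_1"},
    # Only one transition exists, so the move from state 0 to state 1
    # is unidirectional, as in Fig. 2.
    transitions={("state_0", "event"): "state_1"},
)

before = machine.act()   # action tied to State 0
machine.handle("event")  # the designated event triggers the transition
after = machine.act()    # action tied to State 1
```

Because no transition leads out of `state_1`, further events leave the agent in that state, which matches the one-way arrow in the figure.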

Within our simulation, agents function according to a behavioral model that promotes autonomy and responsiveness to other agents. This model incorporates decision-making algorithms that enable agents to adapt to the evolving conditions of the simulation environment, thus mirroring the uncertainties and dynamics typical of real-world supply chain operations. Agents evaluate their performance metrics, such as delivery times and production rates, and adjust their strategies accordingly to optimize these variables. The model guarantees that agents’ actions are responsive to changes in resource availability and demand, creating a self-regulating system that adapts based on simulation inputs and inter-agent interactions.

3.3 Resource

The resource component manages both tangible and intangible resources within the connections among all the agents or produced by agents, employing algorithms that adapt to simulation conditions. Resources are allocated based on supply and demand, with the simulation tracking their utilization and wastage. It also simulates the exchange of resources among agents, incorporating factors like market trends and demand forecasts. It ensures a balance between resource consumption and replenishment, aligning with the sustainability metrics modeled within the simulation. Resource dynamics, such as scarcity, competition, and allocation, play a critical role in the agent’s behavior and interactions and consequently, in the emergent properties of the system.

The nature and dynamics of a resource can greatly affect agent behavior, especially under different kinds of interaction such as cooperation and competition. Under cooperation, agents work together to share, allocate, or optimize the resource toward a shared benefit or similar goal. The resource in such settings should be designed to encourage collaborative strategies, such as pooling resources to complete a task that no single agent could accomplish alone. The simulation can then explore how cooperation leads to efficient resource use, as well as mechanisms for fair distribution and sustainability. Under competition, resource settings can simulate real-world phenomena such as market dynamics, ecological survival strategies, or social competition. The focus can be on how agents adapt their strategies in response to resource scarcity and on the impact of competition on resource distribution.
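
As one illustration of allocation under scarcity, the sketch below rations a limited supply in proportion to each agent's demand. Proportional rationing is an assumption made here for the example; the text does not state which allocation rule the simulator uses, and the function name `allocate` is hypothetical.

```python
def allocate(supply, demands):
    """Split a limited supply across agents in proportion to demand.

    `demands` maps an agent id to the quantity it requests. When supply
    covers total demand, every agent receives what it asked for.
    """
    total_demand = sum(demands.values())
    if total_demand <= supply:
        return dict(demands)
    share = supply / total_demand  # fraction of each request that can be met
    return {agent: quantity * share for agent, quantity in demands.items()}


plentiful = allocate(100.0, {"agent_a": 30.0, "agent_b": 50.0})
scarce = allocate(60.0, {"agent_a": 30.0, "agent_b": 90.0})
```

A cooperative setting could replace this rule with pooled inventories, while a competitive one could allocate by bid price. The proportional rule is simply a neutral baseline.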

3.4 Topology

The topology component in our simulation simulates the dynamic connections and interactions between agents and resources. It concerns the arrangement and connectivity of elements within the system, highlighting the structural dimension of complex systems. Topology determines how agents are interconnected, thereby influencing their potential interactions. The configuration of a system’s topology plays a crucial role in its dynamics by dictating the channels for information or resource flow and impacting overall system performance. As interactions between agents and resources unfold, the topological structure adapts, shedding light on optimal system configurations.

Topology within simulations can be categorized into static/dynamic and physical/virtual, accommodating various real-world system types. Static topology features a spatial structure that remains constant throughout the simulation period, streamlining the analysis of agent interactions and the influence of spatial arrangements on system dynamics. It suits the study of systems with stable spatial relationships over time, such as those in organization-based simulations, allowing a concentrated examination of other dynamics.

Conversely, dynamic topology supports modifications to spatial structures during the simulation, including changes in agent positions, modifications in agent connections, or variations in spatial configurations. This type of topology is crucial for simulating systems where adaptability, movement, or structural changes are integral to behavior, exemplified by social network evolution simulations.

Physical topology deals with the spatial arrangement of agents and resources, considering distances, barriers, or spatial distributions that influence interaction probabilities and dynamics. It is applied to simulate real-world spatial dynamics, such as urban traffic patterns. Virtual topology, on the other hand, defines connections among agents based on relationships, communication paths, or other non-physical links. It is vital for studying systems where physical locations are secondary to the connections between entities, as seen in simulations of idea development or virtual networks.
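
The four topology categories can be illustrated with a small sketch. The names `LinkTopology` and `physical_distance`, the agent identifiers, and the coordinates are hypothetical. A static topology corresponds to never calling `add_link` or `remove_link` after construction, and a dynamic one to calling them during the run.

```python
import math


class LinkTopology:
    """Virtual topology: directed links between agent ids."""

    def __init__(self, links=()):
        self._outgoing = {}
        for sender, receiver in links:
            self.add_link(sender, receiver)

    def add_link(self, sender, receiver):
        self._outgoing.setdefault(sender, set()).add(receiver)

    def remove_link(self, sender, receiver):
        # discard() is a no-op when the link does not exist
        self._outgoing.get(sender, set()).discard(receiver)

    def neighbors(self, sender):
        return sorted(self._outgoing.get(sender, set()))


def physical_distance(positions, a, b):
    """Physical topology: Euclidean distance between two placed agents."""
    return math.dist(positions[a], positions[b])


# Dynamic behavior: one contract ends and a new one is formed mid-run.
network = LinkTopology([("supplier", "retailer"), ("retailer", "customer")])
network.remove_link("supplier", "retailer")
network.add_link("backup_supplier", "retailer")

positions = {"supplier": (0.0, 0.0), "retailer": (3.0, 4.0)}
gap = physical_distance(positions, "supplier", "retailer")
```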

4. Optimization Method

In this study, our objective is to investigate the impact of supply chain depth on uncertainty management, focusing on a scenario that incorporates multiple suppliers and customers. At the heart of this scenario is an intermediary entity, such as a retailer, which is represented by a decision-making agent (as depicted in Fig. 3). This agent aims to maximize profits through strategic purchasing and selling activities, with a keen consideration for sustainability, herein represented by carbon emissions.

Fig. 3 (a) The basic topology of the supply chain, with a single agent balancing uncertainty from both upstream and downstream entities. (b) A supply chain involving multi-layer agents for uncertainty sharing.

The initial simulated scenario involves three suppliers connected to a central agent, which in turn is connected to three customers (as illustrated in Fig. 3 (a)). These connections symbolize contracts established for the trading of products. To inject an element of uncertainty into the simulation, the connection between a given supplier j and the central agent i may become disabled with a certain probability p_ij at any given time step t. Customer demand directed towards agent i is modeled as a random variable that follows a Poisson distribution, represented by d_i. This setup allows the agent to purchase products from suppliers at a quoted price and then sell them to distributors at a price determined by the agent, effectively simulating the dynamic and uncertain nature of customer demand.
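
The two sources of uncertainty in this scenario, link disruption and Poisson demand, can be sketched with the standard library alone. The function names, the probability 0.1, and the demand rate 5.0 are illustrative assumptions, not values from the experiments. Poisson samples are drawn with Knuth's multiplication method because Python's `random` module has no built-in Poisson sampler.

```python
import math
import random


def sample_poisson(rng, rate):
    """Draw one Poisson sample using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def simulate_step(rng, disruption_prob, demand_rate):
    """One time step t for the central agent i.

    disruption_prob: supplier j -> probability that link (j, i) is disabled
    demand_rate:     customer k -> Poisson rate of that customer's demand
    """
    active_suppliers = [
        supplier
        for supplier, prob in sorted(disruption_prob.items())
        if rng.random() >= prob  # link survives with probability 1 - prob
    ]
    demand = {
        customer: sample_poisson(rng, rate)
        for customer, rate in sorted(demand_rate.items())
    }
    return active_suppliers, demand


rng = random.Random(0)  # fixed seed so runs are repeatable
disruption_prob = {"supplier_1": 0.1, "supplier_2": 0.1, "supplier_3": 0.1}
demand_rate = {"customer_1": 5.0, "customer_2": 5.0, "customer_3": 5.0}
active, demand = simulate_step(rng, disruption_prob, demand_rate)
```

Calling `simulate_step` once per time step yields the set of suppliers the agent can buy from and the demand it must serve, which is the information a decision policy would act on.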

To further examine the influence of having multiple multi-level agents in the supply chain on sustainability, the scenario is expanded as shown in Fig. 3 (b). In this more complex setup, multiple agents share the uncertainties, each facing a disruption probability p_ii when interacting with another agent. The demand requested by a downstream agent is denoted as d_ik, illustrating the extended network and layered interactions designed to explore deeper aspects of supply chain sustainability and uncertainty management. Because traditional optimization methods struggle to cope with the stochastic nature and high dimensionality of the decision space, we use an RL-agent to make the agent's decisions, owing to its capability to learn optimal strategies through interaction with a real-world system. Moreover, RL can learn from simulation without real-world risks, deal with uncertainty and partial observability, and facilitate continuous improvement.

5. Experiments

5.1 Implementation Details

In the experiment, as shown in Fig. 3, the supply chain system includes three suppliers and three customers. An agent can purchase products from the suppliers at quoted prices and sell them to distributors at self-determined prices. This model effectively simulates the dynamic and uncertain nature of customer demands. We implement RL (DQN) in the supply chain with a single RL-agent in the topology of Fig. 3 (a) and with three RL-agents in the topology of Fig. 3 (b). In the RL method, the states include the product price from each supplier, the selling price to each customer, and the current inventory amount in each agent. The actions include setting the buying and selling prices of the product and the amount of product purchased by the agent for the next time period. The reward is computed from the carbon emissions of the product and the agent's earned profit. As a baseline for the RL method, we designed a naive threshold-based heuristic strategy that makes its decision according to a fixed threshold.
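
The ingredients of this setup can be sketched in a few lines. Everything specific here is an assumption for illustration: the text does not give the reward formula, so a linear trade-off with a hypothetical `carbon_weight` is used. The heuristic is shown as a reorder-point rule with hypothetical `reorder_point` and `order_up_to` parameters. `td_target` is the standard Q-learning bootstrapped target that DQN regresses toward, not code from the experiments.

```python
def reward(revenue, purchase_cost, emissions_kg, carbon_weight=0.1):
    """Profit penalized by carbon emissions (assumed linear trade-off)."""
    profit = revenue - purchase_cost
    return profit - carbon_weight * emissions_kg


def threshold_policy(inventory, reorder_point=20, order_up_to=50):
    """Naive heuristic: refill to a target level once stock runs low."""
    if inventory < reorder_point:
        return order_up_to - inventory
    return 0


def td_target(reward_value, next_q_values, gamma=0.99, done=False):
    """Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward_value  # no future value after a terminal step
    return reward_value + gamma * max(next_q_values)


step_reward = reward(revenue=100.0, purchase_cost=60.0, emissions_kg=50.0)
order_quantity = threshold_policy(inventory=10)
target = td_target(step_reward, next_q_values=[0.0, 2.0], gamma=0.5)
```

The heuristic looks only at current inventory, whereas the RL-agent's target includes the discounted value of the next state. That difference is what lets RL trade immediate profit against later consequences.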

5.2 Results

To compare an RL-agent with an agent driven by a heuristic strategy, we report the average profits over 200 simulations (Table). For the single-agent method, the heuristic strategy yielded an average profit of $183.05 with a standard deviation of $12.86, indicating a relatively stable performance. In contrast, the single agent employing RL outperformed the heuristic approach with a significantly higher average profit of $267.87, albeit with a larger standard deviation of $63.32, suggesting higher variability in the outcomes.

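Table Average Profits Obtained by Agent with Different Strategies.

Setting        Strategy    Average profit    Standard deviation
Single-agent   Heuristic   $183.05           $12.86
Single-agent   RL          $267.87           $63.32
Multi-agent    Heuristic   $215.43           $23.68
Multi-agent    RL          $307.19           $79.31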

The multi-agent method followed a similar pattern, with the heuristic strategy achieving an average profit of $215.43 and a standard deviation of $23.68. The multi-agent strategy utilizing RL demonstrated superior performance with the highest average profit of $307.19 among all strategies tested, but also exhibited the highest standard deviation of $79.31, indicating the greatest variability in profit outcomes.

These results underscore the enhanced performance potential of RL strategies over heuristic strategies in both single-agent and multi-agent settings, as evidenced by the higher average profits. However, the increased standard deviations associated with RL strategies also highlight the greater risk in profits, which may be attributed to the dynamic and possibly complex decision-making processes intrinsic to RL algorithms.
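
The profit and risk trade-off can be quantified directly from the reported means and standard deviations. The short calculation below uses only those eight figures. The helper names are hypothetical, and the coefficient of variation (standard deviation divided by mean) is introduced here as one convenient way to compare relative variability; it is not a metric used in the experiments.

```python
# (setting, strategy) -> (average profit, standard deviation), in dollars
results = {
    ("single", "heuristic"): (183.05, 12.86),
    ("single", "rl"): (267.87, 63.32),
    ("multi", "heuristic"): (215.43, 23.68),
    ("multi", "rl"): (307.19, 79.31),
}


def relative_gain(setting):
    """Fractional increase in average profit of RL over the heuristic."""
    heuristic_mean, _ = results[(setting, "heuristic")]
    rl_mean, _ = results[(setting, "rl")]
    return (rl_mean - heuristic_mean) / heuristic_mean


def coefficient_of_variation(setting, strategy):
    """Standard deviation as a fraction of the mean profit."""
    mean, std = results[(setting, strategy)]
    return std / mean


for setting in ("single", "multi"):
    print(
        f"{setting}: gain {relative_gain(setting):.1%}, "
        f"CV heuristic {coefficient_of_variation(setting, 'heuristic'):.1%}, "
        f"CV RL {coefficient_of_variation(setting, 'rl'):.1%}"
    )
```

On these figures, RL raises average profit by roughly 46% in the single-agent setting and roughly 43% in the multi-agent setting. The coefficient of variation rises from about 7% to about 24% for a single agent and from about 11% to about 26% for multiple agents.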

6. Future Work

This research lays a foundational framework for integrating sustainability considerations with reinforcement learning to enhance supply chain resilience and sustainability. However, the dynamic and multifaceted nature of global supply chains presents numerous avenues for further investigation. Future work will focus on expanding our simulation tool’s model complexity to enhance supply chain resilience, particularly in the context of climate change with detailed sustainability metrics.

Enhanced Model Complexity: Expanding the complexity of our simulation system to incorporate more granular sustainability metrics, such as water usage, land use, and biodiversity impact. This would allow for a more comprehensive assessment of environmental stewardship across the supply chain.

Advanced Reinforcement Learning Algorithms: Investigating the application of more advanced reinforcement learning algorithms, including deep reinforcement learning and multi-agent reinforcement learning strategies, to better capture the complexities and dynamics of global supply chains.

Supply Chain Collaboration Mechanisms: Developing mechanisms for enhanced collaboration and information sharing among supply chain participants. This includes exploring the role of blockchain and other decentralized technologies in fostering transparency and trust in sustainable supply chain practices.

Policy and Regulatory Impact Analysis: Analyzing the impact of policies and regulations on supply chain sustainability and resilience. Future research could model the effects of different regulatory frameworks on supply chain decisions and outcomes, providing insights for policymakers.

LLM-Enabled Agent-Based Simulation: Building upon the integration of advanced AI techniques, future research will explore the application of Large Language Models (LLMs) within the agent-based simulation framework to facilitate more sophisticated communication and decision-making processes among agents. LLMs can be utilized to enable agents to process and interpret natural language data, allowing them to extract actionable insights from unstructured data sources such as news articles, social media feeds, and industry reports. This capability will significantly enhance the agents’ ability to anticipate and react to real-world supply chain disruptions and trends by understanding the context and sentiments expressed in global news and market analyses.

7. Conclusion

This paper studied the complexities and challenges inherent in today’s global supply chains, underscoring the need for innovative approaches to manage uncertainties and enhance sustainability. By introducing our new sustainable supply chain simulation system and employing a multi-agent reinforcement learning strategy, we have made a significant step forward in addressing these challenges. Our findings reveal that reinforcement learning, when applied across a multi-level supply chain topology, not only improves risk management and profit margins but also significantly advances environmental, social, and economic sustainability objectives. The comparative analysis with heuristic strategies further emphasizes the superiority of reinforcement learning in navigating the uncertainties that plague global supply chains. This research contributes to the broader discourse on sustainable supply chain management, showcasing the potential of advanced simulation techniques to fortify supply chain resilience and sustainability amidst a volatile global landscape.

References

Authors’ Profiles

WANG Haoyu
Researcher
NEC Laboratories America
CHEN Haifeng
Department Head of DSSS
NEC Laboratories America
SATO Moto
Director of Business Development
NEC Laboratories America
WHITE Chris
President
NEC Laboratories America