1 Introduction

In the last few years, we have witnessed the rapid proliferation of Internet of Things (IoT) deployments and applications. A number of them, including public transportation, traffic monitoring, parking and surveillance systems in smart cities [25, 34, 37], collect and transmit video over a multi-hop wireless network.

Like other services that transmit video streams, IoT video applications have stringent Quality of Service (QoS) requirements in terms of throughput, delay and packet loss [2]. Multipath routing, a routing strategy that tries to find multiple paths between the source and destination, has been proposed as an effective strategy to help meet QoS requirements of video transmission by finding routes that provide adequate bandwidth and delay as well as reliability and resilience [10].

To fully leverage its ability to find and transmit flows through multiple routes, multipath routing must be able to select, in a timely manner, routes that meet the requirements of the driving video application. For applications with multiple video sources, for example, the probability of paths “interfering” with one another, i.e., sharing nodes and links, can be significantly higher. Additionally, sources that transmit flows at different bitrates require more complex path selection approaches to accommodate such flow heterogeneity and satisfy user Quality of Experience (QoE). Wireless link performance can also vary due to factors such as link-layer bitrate control, changes in Signal-to-Noise Ratio (SNR), and complex propagation phenomena. Furthermore, uncontrolled transmissions of multiple flows can cause heavy congestion, as well as flow interference, medium access contention, and collisions [23]. Depending on the number of simultaneous flows, the likelihood of losing or delaying video frames that are particularly loss- and delay-intolerant may increase as a consequence of flow interference, mainly due to collision and queuing.

To address these challenges, in our prior work we introduced FITPATH [8], an efficient and adaptive path selection scheme for multipath routing, that accommodates environments where multiple video sources can transmit simultaneous heterogeneous video flows, i.e., flows with different bitrates. FITPATH implements a novel heuristic-based iterative optimization approach, which leverages the Multimedia-Aware Performance Estimator (MAPE) [9] to estimate the conditions of the underlying network in real time while accounting for the different bitrate requirements of the application video flows.

Building on our prior work, which focused on the initial design and basic performance evaluation of FITPATH [8], this paper makes the following new contributions:

  • Taxonomy of related work: We introduce a taxonomy of the state-of-the-art in multipath selection mechanisms for wireless video transmission which we use to provide an overview of existing approaches.

  • Broader experimental scenarios: We evaluate FITPATH across a wide range of network densities, video motion levels, and offered loads, reflecting a variety of realistic IoT deployment conditions.

  • Thorough comparative performance evaluation: We compare FITPATH with six existing path selection mechanisms under diverse IoT scenarios, demonstrating that FITPATH is able to deliver superior end-user QoE while also improving overall network performance.

  • Extended performance metrics: In addition to throughput, delay, and structural similarity index measure (SSIM), we also evaluate FITPATH considering frame loss, network utilization, peak signal-to-noise ratio (PSNR) and mean opinion score (MOS).

  • Heuristic analysis: We analyze FITPATH’s behavior in terms of convergence time, number of MAPE evaluations, video coding schemes, and the impact of pruning strategies on computational efficiency.

  • Control message overhead: We investigate the influence of control message overhead and network dynamics on video quality in decentralized network control plane architectures.

  • Deployment strategies: We propose and discuss practical deployment alternatives for FITPATH, considering both centralized (e.g., SDN-based) and decentralized (e.g., proactive routing) network control mechanisms.

  • Realistic scenario evaluation: We assess FITPATH’s effectiveness under realistic conditions and demonstrate that it produces feasible intermediate solutions that incrementally improve QoE over time, regardless of the video coding scheme.

  • Reproducibility resources: We provide access to our source code, datasets, and detailed simulation setup instructions to support reproducibility and further experimentation.

The remainder of this paper is organized as follows. Section 2 introduces application scenarios for IoT video transmission that motivate this proposal. Section 3 reviews related work and classifies existing multipath selection schemes. In Section 4, we provide a comprehensive overview of FITPATH, offering a detailed review of its components and functionalities. We propose possible ways to deploy FITPATH in practice in Section 5. Our experimental methodology and results are reported in Sections 6 and 7, respectively. Finally, Section 8 concludes the paper and presents directions for future work.

2 IoT application scenarios

FITPATH caters to a wide variety of IoT systems where video streams are transmitted by multiple sources. Some example IoT applications that use multiple video sources include Connected Health [14, 26], Industrial IoT [27], Smart City [4, 15] and Environmental Monitoring [17, 20]. In these cases, the number of active video sources depends on the requirements of the driving application. Usually, sources are enabled by video scenes that require attention, and video resolution is set according to the viewer’s needs.

FITPATH targets IEEE 802.11-based multihop wireless networks as they are self-sufficient and can operate independently from the power and communication grids. Additionally, recent versions of the IEEE 802.11 protocol family have been designed to support high data rate applications such as video transmission. Our network model assumes that all nodes: (1) can work as video sources as well as relays, (2) are powered by continuous energy sources, and (3) have sufficient storage and processing capacity to serve as relays and video sources.

Figure 1 illustrates typical deployment scenarios and applications for IoT video transmission. One such scenario represents a smart city with cameras deployed along neighborhood streets in an urban area, where video sources send live video streams to a monitoring center over a multihop wireless network. Those multiple sources can transfer flows simultaneously according to the demands of the monitoring center. For example, when the monitoring center needs to track an event or a target in a street or a block of the city, it sends a request to the set of IoT nodes in that area to start transmitting video at a given level of video quality. In this case, each source may generate a known number of video flows with specific bitrates, according to the video resolution and encoder used. These flows are transmitted from each source node through selected paths. At the destination, i.e., the monitoring center, the video decoder is responsible for synchronizing and merging received flows for rendering. Depending on the application scenario, the system can also handle multiple sinks by specifying the sink for each set of sources. Note that this is not restricted to outdoor environments: IoT video systems are also deployed indoors.

Fig. 1

Example of IoT video application scenarios

3 Related work

A number of multipath selection mechanisms for wireless video transmission have been proposed [2, 10, 19]. Most of them have been proposed for on-demand video services or handle only a single video source, ignoring flow interference [19].

Some existing approaches that consider IoT application scenarios where live video streams are transmitted from multiple sources have used routing metrics, such as delay and hop count, which are not necessarily the best options to select optimal paths for all flows [10]. They analyze links and paths individually, increasing the probability that two or more paths share a link, which results in flow interference.

FITPATH addresses the above issues by using an efficient heuristic-based algorithm to: (1) find a set of candidate paths and (2) given the requirements of the different video sources, e.g., number of flows, flow bitrates, rank candidate paths based on their throughput and end-to-end delay by employing MAPE [9] to estimate real-time network conditions, while accounting for the diverse bitrate requirements of application video flows.

In this work, we conduct a thorough experimental study comparing FITPATH against various path selection mechanisms. The different multipath selection strategies in our comparative study were chosen based on our taxonomy (illustrated in Table 1) to represent different classes of multipath selection strategies, namely: disjoint-path, position-based and heuristic-based. In general, as the name implies, disjoint-path mechanisms focus on selecting multiple paths that are disjoint from each other in order to enhance fault tolerance and reliability. Position-based strategies consider geographical positions or physical locations of nodes to optimize resource utilization. Heuristic-based mechanisms rely on predefined rules to prioritize certain criteria, such as minimizing latency or maximizing throughput. These multipath selection approaches are described in more detail below.

Table 1 Multipath selection approaches considered in our performance evaluation study

Disjoint-Path Selection

The Error Resilient Video Transmission (ERVT) [18] implements an extended version of the AOMDV [29] routing protocol where the two best disjoint paths according to the minimum hop-count metric are selected for each source, while the others are considered backup paths. This approach assumes the existence of paths with disjoint nodes for each source. However, this assumption is not always true, particularly for scenarios with multiple video sources. The Cross-Layer Multipath Routing (CLMR) [1] is also based on a reactive protocol that starts with the flooding of a route request to obtain the total cost for each path. It aims to improve QoS and minimize energy consumption by measuring the path cost with a parameterized weight to control the importance of delay, packet delivery ratio and energy metrics. However, these parameters are defined offline and configured statically which makes this method not suitable for dynamic video services.

Position-Based Path Selection

The Real-Time Video Streaming Routing Protocol (RTVP) [3] uses a path selection approach based on a geographic structure in the form of coronas, where the network is divided into regions, the coronas, delimited by a set of circumferences of certain radii centered at the destination node. Each node is assigned a corona level according to its position in these regions. Thus, paths are selected by forwarding packets hop-by-hop from a source that has a high corona level to the destination node, which has corona level zero, seeking to avoid nodes that share the same paths. This approach accounts for node mobility as corona levels adjust according to the nodes’ position, but, depending on the corona structure, it might select paths with a high number of hops. QoS-oriented Multipath Multimedia Transmission Planning (Q-MMTP) [21] proposes a multipath selection mechanism using the Spline mathematical model to estimate a set of ideal positions for the nodes. As a result, it generates a series of interpolations between the source and the destination. Thus, the paths are selected according to the distance between the nodes and the ideal positions. Although Q-MMTP provides adequate load balance for multiple flows, it does not support multiple sources and its strategy depends on the nodes’ geographical positions.

Heuristic-Based Path Selection

QoE-aware sub-optimal routing algorithm (QSOpt) [33] formulates the multipath selection problem using mixed integer linear programming and implements a heuristic to find a solution that attempts to maximize QoE. To evaluate QoE, a discrete function is proposed to determine a mapping between packet loss and QoE metrics. While the centralized algorithm provides a viable solution to improve resource utilization as well as QoE, the QSOpt model is not scalable and requires significant computational resources as the number of flows increases. It also assumes identical transmission rates for all flows, regardless of video coding characteristics, which impacts the level of interference and consequently its ability to find good solutions. In prior work, we proposed a mechanism called ILS-MDC [6] based on the Iterated Local Search (ILS) metaheuristic for transmitting flows generated by the Multiple Description Coding (MDC) technique. The heuristic performs iterative searches for a set of paths that maximize the aggregate network throughput. The objective function was implemented using the Algorithmic Framework for Throughput EstimatoRs, or AFTER [32], a real-time throughput estimator that estimates the throughput of each flow for a given set of paths. However, flow bitrates are not specified in ILS-MDC because AFTER does not support this. Instead, AFTER assumes a TCP-like behavior in which flows transmit at the maximum capacity achieved by their respective paths. Because multimedia flows have well-defined bitrates, this may lead to significant inaccuracies.

4 FITPATH

FITPATH’s main goal is to perform efficient path selection to support IoT video transmission. It can accommodate multiple video sources as well as different requirements from the video transmission application, in particular generating solutions suitable for flows with different bitrates. It introduces a novel route selection approach that evaluates end-to-end delay and throughput for each flow, considering inter-flow interference from multiple sources transmitting video flows at different bitrates.

While based on established metaheuristics, FITPATH’s main contribution is the practical adaptation of these techniques to IoT video transmission. Its implementation integrates path selection, flow requirements, and inter-flow interference into a modular framework. In this work, we demonstrate how FITPATH performs per-flow path selection in a decentralized and adaptive way, making it well-suited for dynamic and resource-constrained wireless environments where video quality and network performance must be balanced. FITPATH improves on current metaheuristic-based approaches such as ILS-MDC and QSOpt, by considering inter-flow interference and multi-flow sources with specific bitrates. In the remainder of this section, we describe FITPATH in detail. We start by formulating the path selection problem FITPATH addresses.

4.1 Problem formulation

As illustrated in Fig. 2, FITPATH considers a known set S of network nodes that can act as video sources. According to the video encoder used, each source \(s\in S\) generates a set \(f_s = \{f_{s_i}\}\) of video flows, where \(0 < i \le |f_s|\). Each flow \(f_{s_i}\) is transmitted from each source node using a selected path \(p_{s_i}\) to reach the flow’s destination. At the destination, the sink node, the video decoder is responsible for synchronizing and merging the received flows to render the video stream. Depending on the application scenario, FITPATH supports multiple sinks by specifying the sink for each flow \(f_{s_i}\).

Fig. 2

Multi-source, multi-sink, multi-path video transmission scenario

In some applications, video sources generate multiple flows with different video compression techniques. For instance, traditional video compression techniques adopted in multipath video transmission, such as Layered Coding (LC) and Multiple Description Coding (MDC), generate flows with different bitrates [7]. As such, the multipath selection mechanism must account for the flow’s bitrate and be aware of the application’s requirements to improve the user’s experience, balance offered load and reduce packet loss and delays. Therefore, the problem FITPATH addresses consists of selecting paths that support the requirements of concurrent and heterogeneous video flows in order to maximize the user’s perceived QoE, while accounting for the current conditions of the underlying network.

4.2 Approach and architecture

FITPATH is based on the Iterated Local Search (ILS) metaheuristic [28] for selecting candidate paths for each video flow generated from multiple sources. Figure 3 shows FITPATH’s main components. As input, FITPATH uses information about the network topology and about the video flows (e.g., sources, number of flows and flow bitrate).

Fig. 3

FITPATH design

FITPATH starts by performing an all-pairs least-cost (or shortest) path computation to generate a set of candidate paths P, which is used by ILS as input. The set P of candidate paths is generated based on Yen’s algorithm [40] by choosing, for each flow, a list of K least cost paths between the source and the sink using the ETX (Expected Transmission Count) metric [16]. The ETX of a link is an estimate of the expected number of link-layer transmissions and retransmissions needed until a packet is correctly received. In order to rank these K candidate paths, ILS applies its iterated search algorithm and evaluates each solution using the Multimedia-Aware Performance Estimator (MAPE) [9]. MAPE, which is described in more detail in Section 4.4, considers the entire state of the network and estimates the network performance of candidate paths, i.e., throughput, delay and loss, in the presence of inter-flow interference, which allows FITPATH’s ILS to select the best candidate path for each flow.
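The candidate generation step can be illustrated with a minimal Python sketch. It is not FITPATH's implementation: instead of Yen's algorithm it uses a simpler best-first enumeration of loop-free paths, which also returns paths in non-decreasing cost order but may scale poorly on large graphs. The function names, the graph layout (a dictionary of dictionaries), and the example topology are assumptions made for illustration. The link ETX follows the usual definition based on forward and reverse delivery ratios, and the path cost is taken to be the sum of the link ETX values.

```python
import heapq


def link_etx(forward_ratio, reverse_ratio):
    """ETX of one link from its forward and reverse delivery ratios."""
    return 1.0 / (forward_ratio * reverse_ratio)


def k_least_cost_paths(graph, src, dst, k):
    """Return up to k loop-free paths from src to dst, cheapest first.

    graph maps node -> {neighbor: link ETX}. This is a best-first
    enumeration of simple paths, used here in place of Yen's algorithm.
    """
    heap = [(0.0, [src])]          # (accumulated ETX, partial path)
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            found.append((cost, path))
            continue
        for nxt, etx in graph.get(node, {}).items():
            if nxt not in path:    # keep paths loop-free
                heapq.heappush(heap, (cost + etx, path + [nxt]))
    return found


# Hypothetical four-node topology with per-link ETX values.
graph = {
    "s": {"a": 1.0, "b": 1.5},
    "a": {"d": 1.2, "b": 1.0},
    "b": {"d": 1.0},
    "d": {},
}
candidates = k_least_cost_paths(graph, "s", "d", k=2)
```

In this example the two cheapest candidates are s-a-d and s-b-d; the third path, s-a-b-d, is left out because K is set to 2.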

4.3 Iterated local search

Algorithm 1 shows FITPATH’s ILS pseudo-code and Table 2 summarizes the algorithm’s notations. FITPATH’s ILS computation consists of the following steps: initSolution(P) initializes p with the first candidate solution in the list of candidate solutions P generated by an all-pairs least-cost path algorithm (line 2). P’s calculation is based on the graph \(G(N, L, F, R)\) representing the underlying network topology, where N is the set of nodes \(\{ 0, 1, \dots , n \}\), L is the set of links \(\{0, 1, \dots , l \}\), F is a list of flows associated with their sources S, and R is a list of bitrates corresponding to each flow. Notice that \(F = \bigcup \limits _{s\in S} f_s\) and \(|p|=|F|\). As will be discussed in Section 5, information about the underlying network topology, i.e., G, can be obtained in practice using different approaches, such as proactive link-state routing protocols (e.g., OLSR [13]), or from a centralized controller in SDN-based networks. Note that the initial solution p contains a single path for each source/sink pair, through which all flows are transmitted; then MAPE is invoked and calculates the cost of the candidate solution p (line 3). Solution p and its cost are then used to try to find better solutions; LocalSearch performs local searches to refine p seeking a better solution (line 6); Perturbation generates intermediate solutions \(p\prime\) and \(p\prime \prime\) by applying perturbations to p (line 8); and AcceptanceCriterion evaluates and determines if a new solution can be accepted according to the objective function (line 10).

Algorithm 1

Iterated Local Search (ILS).

Table 2 ILS algorithm notations

The output of FITPATH’s ILS computation is p, the list of best paths found by the end of the process. A possible solution obtained for the topology in Fig. 2 is represented in the same colors in Fig. 4, where each list represents a path \(p_{s_i}\) for each flow \(f_{s_i}\). In this example, the selected paths share some links, which may cause inter-flow interference and, consequently, may result in lower network performance. Thus, the next step attempts to improve the solution.

Fig. 4

Example of a candidate solution p

Functions LocalSearch, Perturbation and AcceptanceCriterion iterate to traverse potentially good alternate solutions with the goal of selecting a solution that improves network performance while meeting the requirements of the video flows. Based on MAPE’s performance estimates, the ILS algorithm calculates the objective function given by (1) and (2) which evaluate the candidate solutions using two network performance metrics. The first is the throughput gap \(\varGamma (c)\) given by (1), which measures the normalized network throughput gap of a candidate solution c, defined as the sum of the percentage differences between the estimated throughput \(\overline{Th_{f}}\) and bitrate \(\lambda _{f}\) for all flows f.

$$\begin{aligned} \varGamma (c) = \sum _{f \in F} \frac{\lambda _{f}-\overline{ Th_{f}}}{\overline{Th_{f}}} \end{aligned}$$
(1)

In a nutshell, \(\varGamma (c)\) measures how far solution c is from perfectly supporting the traffic demands of all flows. We use the normalized throughput gap because flows may have different bitrates.

The second metric is the end-to-end delay \(\varDelta (c)\), given by (2), which is the estimated average delay \(\overline{\varDelta _{f}}\) averaged over all flows f of a candidate solution c, where |F| is the number of flows.

$$\begin{aligned} \varDelta (c) = \frac{\sum _{f \in F} \overline{\varDelta _{f}}}{|F|}, \end{aligned}$$
(2)

We evaluated the performance of the objective function by correlating these two metrics. As will be demonstrated in Section 7.1.1, both metrics have a significant effect on perceived video quality, and FITPATH’s ILS obtains better candidate solutions when the end-to-end delay is used only as a tie-breaking criterion between solutions that are estimated to result in the same throughput gap.
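As an illustration of how (1) and (2) can be combined with delay as a tie-breaker, the following Python sketch represents the cost of a solution as a (gap, delay) pair and relies on lexicographic tuple comparison. The function names and the numeric values are hypothetical; in FITPATH the per-flow throughput and delay estimates would come from MAPE. Note that exact ties between floating-point gaps are rare in practice, so a real implementation would likely compare gaps within a tolerance; that detail is an assumption and is not taken from the paper.

```python
def throughput_gap(bitrates, est_throughputs):
    """Eq. (1): sum over flows of (lambda_f - Th_f) / Th_f."""
    return sum((lam - th) / th for lam, th in zip(bitrates, est_throughputs))


def mean_delay(est_delays):
    """Eq. (2): per-flow delay estimates averaged over all flows."""
    return sum(est_delays) / len(est_delays)


def solution_cost(bitrates, est_throughputs, est_delays):
    """Cost as a (gap, delay) pair. Tuples compare lexicographically,
    so delay only matters when two gaps are equal."""
    return (throughput_gap(bitrates, est_throughputs), mean_delay(est_delays))


# Two flows with bitrates of 2.0 and 1.0 Mb/s (illustrative values).
bitrates = [2.0, 1.0]
cost_a = solution_cost(bitrates, [2.0, 0.5], [0.03, 0.05])  # second flow starved
cost_b = solution_cost(bitrates, [2.0, 1.0], [0.03, 0.05])  # demands met
cost_c = solution_cost(bitrates, [2.0, 1.0], [0.02, 0.04])  # same gap, lower delay
ranked = sorted([cost_a, cost_b, cost_c])                   # best solution first
```

Here solution C ranks first: it has the same throughput gap as B (zero) and wins on delay, while A ranks last because its second flow only reaches half of its bitrate.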

Finally, the ILS algorithm stops when the termination condition is met, for example by establishing a limit on the number of iterations without significant improvement to the solution as well as considering the algorithm’s total execution time (often based on the application response time requirements). The termination condition can be used to control the balance between intensification, i.e., performing local search, and diversification, i.e., exploring perturbations in the solution space of that search.
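The overall loop can be summarized by the generic ILS skeleton below. It is a sketch under stated assumptions rather than a transcription of Algorithm 1: the function signature, the way both the refined and the perturbed solutions are offered to the acceptance test, and the toy problem used to exercise the loop are all illustrative choices. The two stopping rules mirror the termination condition described above (a time budget and a cap on consecutive iterations without improvement).

```python
import time


def iterated_local_search(init, cost, local_search, perturb,
                          time_limit_s=60.0, max_stall=50):
    """Generic ILS loop; a lower cost is better."""
    best, best_cost = init, cost(init)
    history = []                      # previously accepted solutions
    stall = 0                         # iterations without improvement
    deadline = time.monotonic() + time_limit_s
    while time.monotonic() < deadline and stall < max_stall:
        refined = local_search(best)              # intensification
        shaken = perturb(refined, history)        # diversification
        improved = False
        for cand in (refined, shaken):
            cand_cost = cost(cand)
            if cand_cost < best_cost:             # acceptance criterion
                history.append(best)
                best, best_cost, improved = cand, cand_cost, True
        stall = 0 if improved else stall + 1
    return best, best_cost


# Toy problem (hypothetical): minimise x^2 + y^2 over integer pairs.
def toy_cost(s):
    return s[0] ** 2 + s[1] ** 2


def toy_local_search(s):
    moves = [(s[0] + dx, s[1] + dy)
             for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
    return min(moves + [s], key=toy_cost)


def toy_perturb(s, history):
    return (s[0] + 2, s[1] - 2)       # fixed jump keeps the example deterministic


best, best_cost = iterated_local_search((5, -4), toy_cost, toy_local_search,
                                        toy_perturb, time_limit_s=2.0, max_stall=10)
```

Because every accepted solution is feasible, the loop can be stopped at any point and still return a usable result; in the toy run it reaches (0, 0) and then exits once ten iterations pass without improvement.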

Local Search

At each iteration, LocalSearch evaluates solutions that are “neighbors” of the current best solution p. Here, two feasible solutions are considered neighbors if they differ by exactly one path. We use a simple local search technique to minimize computation and storage overhead. As shown in Algorithm 2, LocalSearch uses paths of the next solution ranked in the list of candidate solutions P to generate and evaluate all possible neighbor candidates of the current solution.
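The neighborhood definition can be made concrete with a short Python sketch. It assumes a solution is a list with one path per flow and that the next-ranked candidate solution has the same layout; the function name and the example paths are hypothetical.

```python
def neighbors(current, next_ranked):
    """Yield every solution that differs from `current` in exactly one path,
    taking the replacement path from the next-ranked candidate solution."""
    for i, alternative in enumerate(next_ranked):
        if alternative != current[i]:
            yield current[:i] + [alternative] + current[i + 1:]


# One path per flow; each path is a tuple of node identifiers.
current = [("s1", "a", "d"), ("s2", "a", "d"), ("s3", "c", "d")]
next_ranked = [("s1", "b", "d"), ("s2", "a", "d"), ("s3", "b", "d")]
neighbor_list = list(neighbors(current, next_ranked))
```

With these inputs two neighbors are produced, one replacing the first flow's path and one replacing the third; the second flow yields no neighbor because its next-ranked path is identical to the current one.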

Algorithm 2

Local Search.

Before estimating the performance of each neighbor solution using MAPE, paths that cannot meet the current solution c’s throughput estimate are pruned to reduce computation resource usage and execution time. This pruning assumes that paths with higher ETX (and thus lower reliability, since they may require more packet retransmissions) are not capable of meeting the flows’ throughput and delay requirements. To that end, the LocalSearch procedure computes a pruning threshold using the minimum throughput gap \(\varGamma _{min}(c)\) given by (3), which is calculated based on the link ETX (\(l_{ETX}\)), where \(L_{f}\) is the set of links of flow f’s path and LC is the link capacity.

$$\begin{aligned} \varGamma _{min}(c) = \sum _{f \in F} \frac{\lambda _{f}-\left[ {max}{\left( \sum _{l \in L_{f}}\frac{1}{l_{ETX}}\right) }\times LC\right] }{{max}{\left( \sum _{l \in L_{f}}\frac{1}{l_{ETX}}\right) }\times LC} \end{aligned}$$
(3)

As will be demonstrated in Section 7.1.2, this pruning strategy significantly reduces the number of MAPE runs, which allows the heuristic to explore a larger portion of the search space within the same time budget. Solutions that are not pruned are evaluated by MAPE, which returns estimates for the throughput and delay of each flow when transmitted using the candidate solution’s path. A new candidate solution is considered better than the current best solution if it either has a lower throughput gap or if it results in the same throughput gap, but with lower delay.
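The pruning test can be sketched as a simple bound check. The code below is one possible reading, not FITPATH's implementation: it assumes that an optimistic per-flow throughput bound has already been derived from the link ETX values and the link capacity as in (3), and it prunes a neighbor when even this optimistic gap is worse than the gap of the current best solution. Function names, the data layout, and the numbers are hypothetical.

```python
def gamma_min(bitrates, throughput_bounds):
    """Optimistic gap with the structure of Eq. (3): the estimated throughput
    of Eq. (1) is replaced by an upper bound on each flow's throughput."""
    return sum((lam - ub) / ub for lam, ub in zip(bitrates, throughput_bounds))


def prune(neighbor_bounds, bitrates, best_gap):
    """Keep only neighbors whose optimistic gap could match or beat best_gap.

    neighbor_bounds maps a neighbor label to its per-flow throughput bounds.
    Ties are kept so that delay can still act as a tie-breaker later.
    """
    return [name for name, bounds in neighbor_bounds.items()
            if gamma_min(bitrates, bounds) <= best_gap]


bitrates = [2.0, 1.0]                      # Mb/s, illustrative
neighbor_bounds = {
    "n1": [2.0, 0.5],                      # second flow cannot reach 1.0 Mb/s
    "n2": [4.0, 2.0],                      # ample headroom on both paths
}
to_evaluate = prune(neighbor_bounds, bitrates, best_gap=0.25)
```

Only the neighbors returned by `prune` would then be passed to the estimator, so the saving grows with the share of neighbors whose optimistic gap is already too large.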

Perturbation

To avoid getting “stuck” in local optimal solutions, FITPATH applies perturbations to the current best solution. Our current strategy is to generate a new candidate solution by randomly replacing one path for each source in the best currently known solution p. This uses paths in P and the history solution list H which may lead to visiting new portions of the solution space not previously visited by the initial all-pairs least cost path (initSolution) or nearest neighbor (LocalSearch).
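A minimal Python sketch of this perturbation follows. It assumes that each flow has a pool of alternative paths (drawn, for instance, from P and from the history list H) and that one flow per source is chosen uniformly at random; these details, like the function and parameter names, are assumptions made for illustration.

```python
import random


def perturb(solution, flow_source, candidates, rng):
    """Replace the path of one randomly chosen flow for each source.

    solution[i]    : current path of flow i
    flow_source[i] : source that generates flow i
    candidates[i]  : alternative paths available to flow i
    """
    new_solution = list(solution)
    flows_by_source = {}
    for flow, source in enumerate(flow_source):
        flows_by_source.setdefault(source, []).append(flow)
    for flows in flows_by_source.values():
        flow = rng.choice(flows)
        options = [p for p in candidates[flow] if p != solution[flow]]
        if options:                       # leave the flow untouched otherwise
            new_solution[flow] = rng.choice(options)
    return new_solution


solution = [("s1", "a", "d"), ("s1", "b", "d"), ("s2", "a", "d")]
flow_source = ["s1", "s1", "s2"]
candidates = [
    [("s1", "a", "d"), ("s1", "c", "d")],
    [("s1", "b", "d"), ("s1", "c", "d")],
    [("s2", "a", "d"), ("s2", "b", "d")],
]
shaken = perturb(solution, flow_source, candidates, random.Random(7))
```

Passing an explicit random generator keeps runs repeatable, which is convenient when comparing heuristic variants.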

Acceptance Criterion

In each iteration, FITPATH compares the costs of solutions p and \(p''\) considering the same criterion used in LocalSearch and accepts the best one (see Algorithm 1). Whenever a new better solution is accepted, \(\varGamma _{best}\) is updated and the previous solution is added to the solution history list H that is consulted upon future perturbations. FITPATH’s quest for the set of best paths terminates when the specified termination condition is met. In our experiments, the termination condition uses the maximum execution time, which is set at 60 seconds. As will be demonstrated in Section 7.1.4, FITPATH did not achieve significant improvement after 40 seconds in the evaluated scenarios.

The computational complexity of FITPATH can be represented as \(\mathcal {O}(I \times C \times F)\), where I is the number of ILS iterations, C is the number of candidate neighbors explored in each local search, and F is the number of flows. For each ILS iteration, the algorithm performs a local search that evaluates C neighbors as possible solutions. This is done using MAPE, which itself computes metrics for F video flows. The number of neighbors C and iterations I are both tunable parameters, allowing the cost to be bounded according to the target system’s constraints. Since each iteration produces feasible paths, FITPATH can be interrupted early and still provide adequate results, making it practical even in resource-constrained or dynamic network environments.

FITPATH’s convergence time depends on the network topology and application scenarios, as the solution space increases with the number of sources and the number of flows. For more complex scenarios that require longer convergence times, video transmission could start using FITPATH’s output after the first few iterations; FITPATH can continue to execute in the background, looking for better solutions. For scenarios where network conditions may change frequently, FITPATH can run periodically, allowing it to continuously adjust paths in response to network dynamics, such as node failures or changes in network conditions [30].

4.4 Network performance estimation

MAPE [9] is used to estimate the performance of the candidate paths selected by FITPATH’s ILS computation. It is a deterministic simulator that estimates the long-term average performance for all flows considering their bitrate as well as flow interference.

Unlike stochastic simulators that study network behavior over a predefined period of time, MAPE estimates the performance of the network, e.g., throughput, packet loss, and end-to-end delay, at steady-state. When compared to other deterministic estimators [9], MAPE attains more accurate performance estimates due to its ability to account for flows with specific bitrates which is consistent with real-world multimedia application scenarios.

Whenever FITPATH invokes MAPE, it provides as input the set of paths to be evaluated and the bitrate of each flow. MAPE then uses this information to deterministically simulate the network dynamics by generating simulated packets for each flow and triggering relevant network events, such as wireless medium access, packet transmission, and queue management (packets being added to, removed from, and discarded from buffers). As a result, MAPE is able to take into account flow interference that happens due to buffer overflow, link-layer transmission losses, and medium access contention. MAPE continues to simulate the network until it detects that steady state has been reached, where steady state is defined as a state where the same sequence of events repeats itself. At that point, MAPE computes each flow’s throughput and delay estimates, which are used to calculate the ILS objective function according to (1) and (2).
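The idea of stopping a deterministic simulation once its state repeats can be illustrated with a toy model. The sketch below is not MAPE: it models a single shared FIFO queue served at a fixed rate, with each flow emitting one packet every fixed number of ticks, and all names and parameters are hypothetical. Since the model is deterministic, seeing the same state twice means the event sequence between the two occurrences will repeat forever, so per-flow throughput can be measured over that cycle.

```python
def steady_state_throughput(periods, service_per_tick, buffer_size,
                            max_ticks=10_000):
    """Return per-flow delivered packets per tick over one repeating cycle.

    periods[i] is the number of ticks between packets of flow i.
    """
    queue = []                          # FIFO of flow ids waiting for service
    delivered = [0] * len(periods)
    seen = {}                           # state -> tick at which it was first seen
    snapshots = {}                      # tick -> delivered counters at that tick
    for tick in range(max_ticks):
        state = (tuple(queue), tuple(tick % p for p in periods))
        if state in seen:               # same state again: steady-state cycle
            start = seen[state]
            cycle = tick - start
            base = snapshots[start]
            return [(d - b) / cycle for d, b in zip(delivered, base)]
        seen[state] = tick
        snapshots[tick] = tuple(delivered)
        for flow, period in enumerate(periods):      # packet arrivals
            if tick % period == 0 and len(queue) < buffer_size:
                queue.append(flow)                   # else: dropped (overflow)
        for _ in range(min(service_per_tick, len(queue))):
            delivered[queue.pop(0)] += 1             # packet departures
    raise RuntimeError("no steady state detected within max_ticks")


light_load = steady_state_throughput([2, 4], service_per_tick=1, buffer_size=4)
overload = steady_state_throughput([1, 2], service_per_tick=1, buffer_size=2)
```

Under light load both flows receive their offered rates (0.5 and 0.25 packets per tick). In the overloaded case the first flow takes the whole service rate and the second is starved by buffer overflow, a simple instance of the kind of inter-flow interference such an estimator is meant to expose.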

5 FITPATH deployment

In this section, we discuss how FITPATH can be deployed in practice. Recall that FITPATH’s path selection technique targets IEEE 802.11-based wireless multihop networks. In particular, FITPATH focuses on IoT applications such as Smart Cities and surveillance systems which require the transmission of video streams. In these scenarios, nodes are typically stationary, powered by continuous energy sources and equipped with sufficient storage and computing power. As such, frequent topology changes caused by node mobility, energy depletion, etc. are not expected to play a significant role.

FITPATH was designed such that it can be deployed in real-world IoT scenarios. As such, the simulation scenarios we use to evaluate FITPATH were configured to reflect realistic wireless conditions, including congestion, traffic burstiness, and protocol constraints commonly observed in real networks. The implementation was validated using ns-3, a widely adopted network simulator in the scientific community. Additionally, FITPATH’s modular design facilitates integration with existing network stacks, and its operation is compatible with commercially available hardware platforms. Moreover, deploying FITPATH in a real camera network testbed is a part of our ongoing work.

In scenarios where topology changes need to be considered, information about network conditions can be obtained using different mechanisms depending on the type of network control plane architecture as discussed below.

5.1 Decentralized network control

Figure 5 illustrates an example FITPATH deployment under decentralized network control, e.g., multi-hop wireless ad-hoc networks using distributed routing protocols. In these scenarios, each node maintains a complete view of the network topology utilizing topology state monitoring and dissemination mechanisms similar to those implemented by proactive link-state routing, e.g., OLSR [13]. Those mechanisms update topology information periodically and whenever nodes detect “significant” changes, e.g., link failures, new nodes/links, changes in link quality, etc. Current topology state information is fed as input to FITPATH.

Fig. 5

FITPATH and decentralized network control

At each video source, FITPATH makes routing decisions based on the current network state obtained from the topology monitoring module. It finds a set of paths over which video flows can be transmitted. The video source is then responsible for forwarding the flows through the selected paths. In order to avoid routing loops, a simple solution is to implement a source routing technique in which the source includes complete path information in the packet header.
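A minimal sketch of source routing is given below, assuming a packet is represented as a dictionary that carries the complete route and an index of the current hop. The representation and function names are hypothetical and do not correspond to a specific protocol header format.

```python
def make_packet(path, payload):
    """Build a packet whose header carries the complete route."""
    return {"route": list(path), "hop": 0, "payload": payload}


def forward(packet, node):
    """Return the next hop for `packet` at `node`, or None at the destination."""
    route = packet["route"]
    if route[packet["hop"]] != node:
        raise ValueError("packet is not expected at node %r" % (node,))
    if packet["hop"] == len(route) - 1:
        return None                     # reached the sink
    packet["hop"] += 1
    return route[packet["hop"]]


# Follow a packet along a path selected for one flow.
packet = make_packet(["cam1", "r3", "r7", "sink"], payload=b"frame-0")
node, visited = "cam1", ["cam1"]
while True:
    node = forward(packet, node)
    if node is None:
        break
    visited.append(node)
```

Because relays only read the route carried in the header, they never consult their own routing tables, so the packet follows exactly the loop-free path chosen at the source.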

5.2 Centralized network control

FITPATH can also be deployed under a centralized network control plane a la SDN (Software-Defined Networking) [11] as shown in Fig. 6. The SDN controller makes routing decisions based on its global view of the network and informed by FITPATH, while network nodes perform data forwarding based on the routing rules they receive from the controller.

Fig. 6

FITPATH and centralized network control

Note that in centralized network control environments, FITPATH will be used to assist the controller in its routing decisions based on current network topology knowledge and video traffic requirement information. The controller then updates the forwarding nodes’ routing tables to forward flows according to the selected paths.
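The translation from selected paths to per-node forwarding state can be sketched as follows. The code does not use any real SDN controller API; the function name, the flow identifiers, and the table layout (node to flow to next hop) are assumptions chosen to keep the example self-contained.

```python
def forwarding_rules(selected_paths):
    """Map each node to {flow id: next hop} from the per-flow selected paths."""
    tables = {}
    for flow_id, path in selected_paths.items():
        for node, next_hop in zip(path, path[1:]):
            tables.setdefault(node, {})[flow_id] = next_hop
    return tables


# Two flows from the same camera routed over different relays.
selected_paths = {
    "cam1/f1": ["cam1", "r1", "sink"],
    "cam1/f2": ["cam1", "r2", "sink"],
}
tables = forwarding_rules(selected_paths)
```

Keying the rules by flow rather than by destination is what lets two flows from the same source toward the same sink take different paths.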

6 Evaluation methodology

FITPATH’s performance is assessed hereinafter using the ns-3 [35] network simulator, version 3.7. Experiments were performed on a dedicated Ubuntu 18.04 server with an Intel i7-860 processor running at 2.8 GHz and with 32 GB of RAM. We considered application scenarios that model typical surveillance deployments in urban regions where video sources transmit flows simultaneously to a single monitoring center [4]. A variety of video sequences that contain scenes of different motion levels was used to reproduce video traffic diversity in complex environments, such as smart cities [36].

In each simulation run, nodes were randomly placed in a 300 m \(\times\) 300 m region separated from each other by a minimum distance. In our current experiments, we used a minimum separation distance of 5 meters.
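The text does not state how the minimum separation was enforced. One simple possibility, shown here only as an assumption, is rejection sampling: draw uniform positions and keep a position only if it is far enough from every node already placed. The function name and its parameters are illustrative.

```python
import math
import random


def place_nodes(n, side=300.0, min_dist=5.0, seed=None, max_tries=100_000):
    """Place n nodes uniformly in a side x side square with a minimum spacing."""
    rng = random.Random(seed)
    nodes = []
    tries = 0
    while len(nodes) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all nodes")
        candidate = (rng.uniform(0, side), rng.uniform(0, side))
        # Keep the candidate only if it respects the minimum distance to all others.
        if all(math.dist(candidate, p) >= min_dist for p in nodes):
            nodes.append(candidate)
    return nodes
```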

Although FITPATH is able to handle higher bitrates and is compatible with more recent IEEE 802.11 versions, we used IEEE 802.11g with fixed link bitrates of 18 Mb/s in our experiments because it remains a widely used standard in real-world IoT deployments [31], particularly in 2.4 GHz networks. Additionally, the lower data rates of 802.11g raise the probability of link saturation and congestion, which helps reproduce the challenging conditions typically observed in dense IoT environments. In low-saturation scenarios, different mechanisms tend to present similar performance.

To more realistically simulate an urban environment, the Cost231 channel propagation model [38] was used.

In our experiments, we used a maximum execution time of 60 seconds as the ILS termination condition, but this can be adjusted depending on the driving application. For P, the list of candidate solutions used by ILS as input, we set its size to \(100 \times |S|\), where |S| is the number of sources. P’s size was determined empirically: given the computational environment used in our experiments, \(100 \times |S|\) was the maximum number of candidate solutions that could be evaluated within the 60-second time limit. This timeout was chosen based on preliminary observations, which showed consistent convergence within this duration under typical simulation scenarios.

Table 3 lists the parameters and their values used to simulate a variety of network and application scenarios. A total of 2,700 simulation runs, resulting from the combination of 90 topologies (30, 45 and 60 nodes), 5 offered loads, 2 video coding techniques and 3 video scene motion levels, were generated to evaluate the mechanisms under diverse conditions. Detailed instructions for reproducing the experiments, including the simulation setup, source code, and datasets, are available in our public repository (see Footnote 2).
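The run count follows directly from the Cartesian product of the four factors. The short sketch below enumerates the combinations; the offered loads are represented by indices only, since their actual values are those listed in Table 3.

```python
from itertools import product

topologies = range(90)        # 90 topologies covering 30, 45 and 60 nodes
offered_loads = range(5)      # 5 offered loads (indices only; values are in Table 3)
codings = ["LC", "MDC"]       # 2 video coding techniques
motions = ["low", "medium", "high"]  # 3 video scene motion levels

runs = list(product(topologies, offered_loads, codings, motions))
print(len(runs))  # 90 * 5 * 2 * 3 = 2700
```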

Table 3 Simulation Parameters

We evaluated FITPATH by comparing its performance against the mechanisms described in Section 3, namely RTVP, CLMR, ERVT, Q-MMTP, QSOpt and ILS-MDC. To that end, we implemented all these mechanisms on top of ns-3. After validating each implementation using the scenarios of the corresponding original paper, we evaluated them under the same scenarios, as specified in Table 3.

6.1 Traffic models

In our experiments, unless otherwise stated, only video traffic is transmitted, i.e., there is no competing background traffic. The Evalvid [24] framework was used to generate realistic video traffic for the simulations. It generates traffic corresponding to a given video clip and evaluates the quality of the video delivered at the receiver.

Our experiments employed publicly available and commonly used video clips, namely “Hall Monitor”, “Paris” and “CoastGuard” [36], which represent different motion complexity levels, i.e., low-, medium- and high-motion, respectively. The video clips were converted to H.264 format with a rate of 30 frames per second. Considering real-time transmission delay and human tolerance, the play-out buffer was set to 300 ms to mitigate potential out-of-order packets; packets delayed longer than 300 ms were discarded at the decoder.
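The play-out rule can be sketched as a simple filter. The tuple layout `(sequence_number, send_time_s, receive_time_s)` and the function name are assumptions made for this example; only the 300 ms deadline comes from the experimental setup.

```python
PLAYOUT_DEADLINE_S = 0.300  # 300 ms play-out buffer, as in the experiments


def usable_packets(packets, deadline=PLAYOUT_DEADLINE_S):
    """Keep packets whose end-to-end delay fits in the play-out buffer.

    `packets` is a list of (sequence_number, send_time_s, receive_time_s)
    tuples; this layout is an assumption made for the sketch.
    """
    kept = [p for p in packets if (p[2] - p[1]) <= deadline]
    # The buffer also absorbs reordering: return packets in sequence order.
    return sorted(kept, key=lambda p: p[0])
```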

Video traffic was generated using a mix of five different target video encoder rates, namely 256 kb/s, 512 kb/s, 1 Mb/s, 1.5 Mb/s and 2 Mb/s, to represent different levels of video quality. At the video source, flows were forwarded according to the strategies specified by each of the evaluated proposals using either the LC or the MDC video compression technique.

6.2 Evaluation metrics

Our evaluation considered both network performance and quality of experience. For that, we experimented with different network densities and offered loads (see Table 3). Network density refers to the number of nodes in an area, i.e., topologies generated with 30, 45 and 60 nodes. Offered load is defined as the sum of the bitrates of all flows transmitted simultaneously, considering the average rate used as target for generating video traffic. We varied the flow bitrate of each source to generate offered loads between 1 and 8 Mb/s. Furthermore, we evaluated FITPATH with different parameters in order to assess how they affect video quality.

Network performance

We evaluated network performance using throughput efficiency, end-to-end delay, video frame loss, and network utilization. Throughput efficiency is calculated as the ratio between the throughput (number of bits delivered at the destination per simulation time) and the video encoder rate, averaged over all video sources. End-to-end delay is the time interval between when a packet is transmitted by the source node and when that packet is delivered at the destination, averaged over all received packets. Video frame loss is calculated as the percentage of frames transmitted that were not decoded at the destination due to delay or packet loss. Network utilization is based on the nodes’ queue occupancy averaged over the simulation time and is measured as the highest average queue occupancy among all nodes.
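The four definitions above translate into short formulas. The sketch below restates them in code under assumed input layouts (tuples and dictionaries chosen for illustration); it is not the measurement code used in the simulations.

```python
def throughput_efficiency(bits_delivered, sim_time_s, encoder_rate_bps):
    """Per-source throughput divided by the source's video encoder rate."""
    return (bits_delivered / sim_time_s) / encoder_rate_bps


def mean_throughput_efficiency(sources):
    """Average over sources; each item is (bits_delivered, sim_time_s, encoder_rate_bps)."""
    return sum(throughput_efficiency(*s) for s in sources) / len(sources)


def mean_end_to_end_delay(packets):
    """Average of (receive - send) over received packets given as (send_s, recv_s)."""
    return sum(recv - send for send, recv in packets) / len(packets)


def frame_loss_percent(frames_sent, frames_decoded):
    """Percentage of transmitted frames that were not decoded at the destination."""
    return 100.0 * (frames_sent - frames_decoded) / frames_sent


def network_utilization(avg_queue_occupancy_by_node):
    """Highest time-averaged queue occupancy among all nodes."""
    return max(avg_queue_occupancy_by_node.values())
```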

Quality of experience

We used the Structural Similarity Index Measure (SSIM), the Peak Signal-to-Noise Ratio (PSNR), and the Mean Opinion Score (MOS) to evaluate the quality of received video. These metrics have been widely used to measure user Quality-of-Experience, or QoE [12]. SSIM measures the structural distortion of the video. It combines luminance, contrast and structural similarity of the frames to compare the delivered (possibly distorted) frame with the original one. SSIM values range from 0 to 1, where 1 means maximum quality. PSNR, measured in dB, is calculated as the frame-by-frame error between the original and the delivered video. The PSNR value tends to increase as the video quality improves. MOS represents the perceived quality of the video as rated by human observers and is usually expressed on a scale from 1 (worst) to 5 (best), where higher values reflect greater user satisfaction. In this work, MOS values were estimated by mapping PSNR results to the corresponding MOS scale using the established model from Evalvid [24]. While we provide PSNR results in Table 4 and demonstrate MOS representation of video quality in Fig. 13, our evaluation focuses on SSIM, as it has been reported to correlate well with QoE [39].
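For illustration, the sketch below computes the PSNR of a single frame and maps it to a MOS value. The thresholds used in `psnr_to_mos` are the ones commonly associated with Evalvid (above 37 dB Excellent, 31 to 37 dB Good, 25 to 31 dB Fair, 20 to 25 dB Poor, below 20 dB Bad); they are stated here as an assumption, since this section does not list them. Frames are treated as flat sequences of pixel values to keep the example self-contained.

```python
import math


def psnr_db(original, received, max_value=255):
    """PSNR of one frame; frames are equal-length sequences of pixel values."""
    mse = sum((o - r) ** 2 for o, r in zip(original, received)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def psnr_to_mos(psnr):
    """PSNR-to-MOS mapping with the thresholds assumed in the lead-in."""
    if psnr > 37:
        return 5  # Excellent
    if psnr > 31:
        return 4  # Good
    if psnr > 25:
        return 3  # Fair
    if psnr > 20:
        return 2  # Poor
    return 1      # Bad
```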

7 Simulation results

In our previous work [8], we performed a preliminary evaluation of FITPATH to validate its design and implementation. In this paper, we take a significant step forward and considerably expand our evaluation of FITPATH by exploring a wider range of new scenarios and performance metrics for a more comprehensive analysis of FITPATH. We organize our new experimental results into five categories, namely: (1) analyze FITPATH’s ILS performance in terms of its objective function, including evaluating the effects of pruning, video coding and convergence time, (2) study FITPATH’s average performance compared against other multipath selection mechanisms considered in Section 3, (3) evaluate the impact of network density on FITPATH’s performance, (4) study how offered load affects FITPATH’s performance, and (5) evaluate FITPATH’s performance under more realistic deployment conditions by analyzing the impact of control message overhead on video quality.

7.1 ILS performance

In order to set FITPATH’s ILS parameters, we evaluate different objective functions, pruning strategies, video coding techniques and convergence time.

7.1.1 Objective function

We explored how the metrics estimated by MAPE, i.e., the throughput gap and the end-to-end delay (given by (1) and (2)), affect video quality. For that, we evaluated both metrics independently and in combination. Figure 7 demonstrates that FITPATH’s strategy of using throughput as the primary criterion and end-to-end delay as a tie-breaking criterion yields superior video quality, as measured by SSIM, in all of the evaluated scenarios, which consider different network densities and video motion levels. This confirms that video quality is more sensitive to packet loss, which manifests itself in the throughput metric, but that delay must also be considered when paths have similar throughput.

Although MAPE estimates steady-state behavior and does not explicitly capture bursty traffic characteristics, our experimental results — based on real video traces with variable bitrate and traffic bursts — show that MAPE is effective as an objective function. Nonetheless, incorporating burst-aware metrics could further improve accuracy.
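The combined criterion (throughput first, delay only as a tie-breaker) amounts to a lexicographic comparison. The sketch below assumes that a lower throughput gap and a lower delay are both better, and introduces a hypothetical `gap_tolerance` parameter to decide when two gaps count as tied; neither the function name nor the tolerance comes from the FITPATH definition.

```python
def better(candidate, incumbent, gap_tolerance=0.0):
    """Return True if `candidate` beats `incumbent`.

    Each argument is (throughput_gap, end_to_end_delay), lower being better
    for both (an assumption of this sketch). `gap_tolerance` is a
    hypothetical parameter defining when two gaps are considered tied.
    """
    cand_gap, cand_delay = candidate
    inc_gap, inc_delay = incumbent
    if abs(cand_gap - inc_gap) > gap_tolerance:
        return cand_gap < inc_gap      # throughput decides first
    return cand_delay < inc_delay      # delay only breaks ties
```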

Fig. 7

FITPATH objective function evaluation considering SSIM (mean and 95% confidence intervals) for different network densities and different video scene motion levels under 4 Mb/s offered load

7.1.2 Pruning

Since MAPE’s computational cost represents most of the overhead incurred by FITPATH’s objective function calculation, FITPATH tries to minimize this cost using pruning. As discussed in Section 4.3, FITPATH prunes solutions whose estimated ETX-based throughput is lower than the throughput of the current best known solution estimated by MAPE. Figure 8(a) shows the number of MAPE runs while Fig. 8(b) shows the total number of candidate solutions considered by FITPATH, with and without pruning. With pruning, FITPATH is able to evaluate significantly more solutions (in these experiments, thousands more solutions) than without pruning. This allows a larger portion of the search space to be evaluated.
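The pruning rule can be sketched as a filter placed in front of the costly estimator. In the example below, `cheap_estimate` stands in for the ETX-based throughput estimate and `mape_estimate` for MAPE; both are hypothetical callables, and the comparison is reduced to a single throughput value where higher is better. The counter `mape_runs` makes the saving visible.

```python
def search_with_pruning(candidates, cheap_estimate, mape_estimate):
    """Evaluate candidates, calling the costly estimator only when needed.

    `cheap_estimate` and `mape_estimate` are hypothetical callables that
    return a throughput value (higher is better).
    """
    best, best_throughput, mape_runs = None, float("-inf"), 0
    for candidate in candidates:
        # Prune: the cheap estimate is already below the best costly result so far.
        if cheap_estimate(candidate) < best_throughput:
            continue
        mape_runs += 1
        throughput = mape_estimate(candidate)
        if throughput > best_throughput:
            best, best_throughput = candidate, throughput
    return best, best_throughput, mape_runs
```

Every pruned candidate skips one call to the costly estimator, which is why more candidates can be examined within the same time budget.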

Fig. 8

FITPATH’s computation overhead (mean and 95% confidence intervals) with and without pruning in topologies with 60 nodes considering different video scene motion levels under 4 Mb/s offered load

7.1.3 Video coding

Since FITPATH considers each flow’s bitrate, it is able to cope with different video compression techniques. To demonstrate this, we evaluated FITPATH’s performance when driven by video traffic using LC and MDC video coding techniques. Figure 9 shows that FITPATH’s performance as measured by QoE is not affected by the video coding techniques used.

Fig. 9

FITPATH’s QoE performance measured by the SSIM (mean and 95% confidence intervals) for different video coding, offered loads and different video motion levels in a 60-node IoT deployment

We also ran the same experiments for other path selection techniques and observed that proposals that were originally designed for LC video coding, such as ERVT, CLMR and RTVP, present worse performance when MDC video coding is used. This is because these mechanisms prioritize I-frame flows when selecting primary and secondary paths. This performance loss is more pronounced for high-motion video as the secondary paths tend not to withstand traffic bursts caused by scene changes.

7.1.4 Convergence time

Since FITPATH uses a heuristic-based iterative algorithm, it is important to study how long it takes to converge and how convergence impacts video quality. To that end, we analyzed how the SSIM of the solutions found by FITPATH evolves over its execution period, which is limited to 60 seconds. This kind of analysis can also help define a suitable stop criterion for the heuristic. Figure 10 shows the mean SSIM and \(95\%\) confidence intervals of the current solution found by FITPATH at different execution moments for all scenarios with 4 Mb/s offered load and 60 nodes.

As expected, FITPATH’s solutions generally improve with time, especially for videos with higher scene motion levels. However, in this particular scenario with 60 nodes and 4 Mb/s offered load, we observed that relatively little improvement is achieved after 40 seconds of execution regardless of the level of motion. Furthermore, after 5 seconds, FITPATH had already found solutions whose average SSIM surpasses the one obtained by QSOpt, as shown by the reference blue lines in the graph. As will be shown, QSOpt was the second-best performer, behind only FITPATH, in all experiments. Moreover, QSOpt itself is a heuristic-based approach and takes between 5 and 10 seconds to obtain a solution for this scenario with 60 nodes; its execution time increases with the number of nodes and flows. However, unlike FITPATH, QSOpt does not provide intermediate solutions, only a final solution when its execution terminates, which is why QSOpt’s SSIM value remains constant throughout the experiment as shown in Fig. 10.

Fig. 10

FITPATH convergence: SSIM (mean and 95% confidence intervals) evolution over time for scenarios with 4 Mb/s offered load and 60 nodes

Multipath selection mechanisms must run whenever the network topology or video requirements change, according to the application and network dynamics. However, as discussed in Section 4.3, video sources can start transmitting using the solutions from FITPATH’s initial iterations and update their paths as new solutions are found during the convergence process. As our results show, FITPATH is able to quickly provide intermediate solutions that yield superior performance.
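The ability to hand out intermediate solutions can be pictured as a generator that yields every improvement as soon as it is found. The sketch below is a generic improvement loop, not FITPATH’s ILS: `neighbor` and `score` are hypothetical callables, and an iteration budget replaces the wall-clock limit so that the example stays deterministic.

```python
import random


def anytime_search(initial, neighbor, score, iterations, seed=0):
    """Yield (iteration, solution, score) each time a better solution is found.

    Generic sketch with hypothetical `neighbor` and `score` callables;
    higher scores are better.
    """
    rng = random.Random(seed)
    best, best_score = initial, score(initial)
    yield 0, best, best_score          # a usable solution is available immediately
    for i in range(1, iterations + 1):
        candidate = neighbor(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
            yield i, best, best_score  # a consumer may switch to this solution
```

A consumer can iterate over the generator and adopt each yielded solution while the search continues, which mirrors how a video source could update its paths during convergence.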

7.2 FITPATH’s average performance

Table 4 summarizes FITPATH’s average performance as well as the performance of the multipath selection mechanisms considered in our study. These experiments were executed for all scenarios (different number of nodes and coding rates) using 4 Mb/s of offered load. Value ranges shown for the different metrics denote 95% confidence intervals around the mean of all simulated runs. As expected, the network performance of each mechanism — throughput, delay and frame loss — correlates well with their video quality performance. Overall, FITPATH performs better in terms of throughput efficiency and video frame losses, resulting in superior video quality as reflected in the PSNR and SSIM metrics.
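The section does not specify how the confidence intervals were computed. As one common choice, assumed here for illustration, the sketch below uses the normal approximation with z = 1.96 over the per-run values of a metric.

```python
import math
import statistics


def mean_ci95(samples):
    """Mean and 95% confidence interval using the normal approximation (z = 1.96)."""
    mean = statistics.mean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, (mean - half_width, mean + half_width)
```

For small numbers of runs, a Student’s t quantile would give a wider and more conservative interval than the fixed 1.96 factor.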

Table 4 Network and video quality performance for the different path selection mechanisms considering different number of nodes and coding rates with 4 Mb/s offered load

To compare FITPATH against the second and third top performers in terms of SSIM, i.e., QSOpt and ERVT, we plotted the SSIM results of these mechanisms for all simulation runs. Each point in the graphs of Fig. 11 represents a simulation run, or instance, and its coordinates correspond to the SSIM obtained by FITPATH (horizontal axis) and the SSIM for ERVT or QSOpt (vertical axis). In particular, blue points denote runs where FITPATH reached higher or equal SSIM, while red points denote runs for which it yielded worse performance. In both graphs, blue points clearly dominate (over 85% of the runs), indicating that FITPATH yields video quality at least as good as that of the other two mechanisms in most runs. We noticed that FITPATH underperforms in runs with paths containing lower-quality links according to their estimated ETX. As future work, we intend to explore different mechanisms to estimate link performance as well as investigate alternate link quality metrics, e.g., the RSSI. Moreover, we observed that in scenarios with higher offered load and higher motion levels, and thus lower SSIM, FITPATH is less likely to be outperformed by ERVT or QSOpt.
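The share of blue points is a per-run comparison of paired SSIM values. A minimal sketch, assuming two equal-length lists with one SSIM value per simulation run (the function name is illustrative):

```python
def share_not_worse(fitpath_ssim, other_ssim):
    """Fraction of runs where FITPATH's SSIM is higher than or equal to the other's."""
    pairs = list(zip(fitpath_ssim, other_ssim))
    blue = sum(1 for f, o in pairs if f >= o)  # runs plotted as blue points
    return blue / len(pairs)
```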

Fig. 11

SSIM performance

Our simulations also confirm that performance is strongly influenced by the characteristics of the video. This is shown in Fig. 12, which plots mean network utilization (95% confidence intervals are relatively small and thus not visible in the graph) for videos with different motion levels. As expected, network utilization increases for all path selection algorithms as video motion levels increase, because of the transmission bursts resulting from changes in scenes. Thus, videos with more motion induce more bursts which put more pressure on the network. Even though all path selection approaches are affected by motion levels, FITPATH is the least affected one and results in lower network utilization for all three types of video, indicating it chooses more efficient paths overall. High network utilization might lead to or correlate with congestion episodes. Thus, proposals that better distribute the load between paths are often able to deliver superior video quality. Although ERVT and QSOpt present similar results in terms of video quality as shown in Table 4, ERVT and QSOpt tend to incur higher network overhead by, respectively, transmitting duplicate packets to improve error resilience and by selecting paths that are not able to meet flow bitrates.

Fig. 12

Average network utilization (with 95% confidence intervals) for videos with different scene motion levels

Table 5 SSIM for different video scenes motion levels with 4 Mb/s offered load and 60 nodes

Table 5 summarizes SSIM results for all studied path selection mechanisms considering different scene motion levels in scenarios with 60 nodes and 4 Mb/s of offered load; results with different numbers of nodes and offered loads exhibit similar trends. We again see that FITPATH obtains higher video quality, regardless of the level of motion in the videos. We also highlight that the variance of FITPATH’s SSIM is always low, indicating that FITPATH consistently finds good solutions across a wide range of scenarios. While QSOpt and ERVT presented results that were relatively close on average, their variances are much larger, suggesting they more often result in inferior solutions. FITPATH’s efficiency is even more evident when we look at lower SSIM values: for low-motion videos, for example, FITPATH’s lowest SSIM is 0.82, while QSOpt and ERVT both have at least one instance with an SSIM as low as 0.52.

Since SSIM and PSNR are purely objective metrics, we provide an additional analysis by mapping the PSNR of each frame to the MOS scale, as described in Section 6. Figure 13 shows the distribution of frames across the MOS levels for each mechanism under different video scene motion levels. FITPATH achieves the highest proportion of Good (perceptible, but not annoying) frames, especially in low- and medium-motion scenarios. In contrast, other mechanisms show a larger fraction of Fair (slightly annoying) and Poor (annoying) frames, particularly as motion increases. This demonstrates the superior perceptual performance of FITPATH across video scenes with different motion levels.

Fig. 13

MOS for different number of nodes and video scene motion levels under 4 Mb/s offered load

Fig. 14

SSIM (mean and 95% confidence intervals) for different network densities and different video scene motion levels under 4 Mb/s offered load

Fig. 15

SSIM (mean and 95% confidence intervals) for different offered loads in topologies with 60 nodes according to the level of video motion

7.3 Network density

Since network density has a direct impact on path diversity and thus can influence multipath selection mechanisms, we ran experiments varying the number of deployed nodes using the same 300 m \(\times\) 300 m area. Figure 14 shows the average SSIM (mean and 95% confidence intervals) obtained by each mechanism as a function of the number of nodes in runs under 4 Mb/s of offered load and different motion levels. In general, the SSIM of all mechanisms tends to improve as the number of nodes increases because of the higher path diversity. Position-based approaches, in particular, leverage high network density but might not accommodate multiple video sources, which explains the difficulty of RTVP and Q-MMTP in finding paths for all flows, resulting in lower performance. FITPATH and QSOpt, on the other hand, are relatively immune to the effects of network density since their approaches are based on the current conditions of the underlying network, regardless of the position of the nodes.

7.4 Offered load

Results (mean and 95% confidence intervals) showing the impact of the total offered load on video quality are presented in Fig. 15. In these experiments, we varied the offered load from 1 Mb/s to 8 Mb/s for all scenarios with 60 nodes. As expected, video quality decreases with increased offered load, especially for videos with higher scene motion levels, since it results in higher contention and consequently more collisions and longer queuing delays. Once again, the results show that FITPATH outperforms the other path selection mechanisms in all evaluated scenarios. Moreover, the performance gap between FITPATH and the other mechanisms generally grows for scenarios with higher offered load and higher motion levels. This can be explained by the fact that FITPATH uses MAPE’s network performance estimation strategy which accounts for flow interference. As such, it is able to select paths that are better suited to handle the offered load and, therefore, maximize video quality. This result is also consistent with our previous observation that FITPATH is able to deliver adequate performance under challenging network conditions and more stringent application requirements.

7.5 FITPATH’s deployment performance

We implemented a practical FITPATH deployment under decentralized network control. The topology monitor was implemented based on the OLSR protocol, as presented in Section 5.1. Control message mechanisms were implemented so that nodes can obtain a global view of the network topology. Neighboring nodes are discovered by periodically broadcasting HELLO packets. Local information is continuously collected and shared between nodes through periodic transmission of Topology Control (TC) packets. This iterative process continues until all nodes have acquired the network topology view.

To evaluate the impact of control message overhead on video quality, for each offered load we simulated two conditions: (i) realistic, with control messages transmitted simultaneously with the video streams, and (ii) non-realistic, with control message transmission interrupted once the topology has been computed and video transmission starts. In this experiment, we set the transmission intervals for HELLO and TC packets to 2 s and 5 s, respectively, a standard OLSR parameter configuration. However, these parameters can be dynamically adjusted according to the network characteristics and application scenarios.

Figure 16 shows the SSIM results (mean and 95% confidence intervals) under the two control message transmission conditions, during the transmission of 300 frames of the video clip “Hall Monitor” (low-motion scenes) from each source using LC encoding. When video transmissions competed with control messages, the overhead reduced the quality of the video by 2.90%.

Fig. 16

Influence of control messages on SSIM (mean and 95% confidence intervals)

In summary, the quality of the received video decreases not only as the offered load and the motion level of the video scenes increase, but also when control messages compete with the video streams. The effect is stronger for videos with higher motion levels, since the bursts generated by scene changes result in greater contention and, consequently, more collisions and longer queuing delays.

8 Conclusion

Wireless multipath video transmission has been fundamental for improving the user’s QoE. However, multipath selection mechanisms need to find solutions that maximize the users’ QoE while considering flow interference from multiple video sources. This paper evaluated FITPATH, an efficient and adaptive multipath routing path selection scheme that can accommodate environments where multiple video sources may transmit simultaneous flows of different bitrates. FITPATH uses a novel heuristic-based iterative optimization approach that estimates in real time the conditions of the underlying network while accounting for the different bitrate requirements of the application video flows. Additionally, we introduced two practical FITPATH deployment architectures, which rely on centralized and decentralized control planes, respectively, for topology monitoring.

Our experimental results, obtained in application scenarios that resemble realistic IoT deployments, show that FITPATH outperforms various existing path selection mechanisms both in terms of user QoE and network performance. We also evaluated FITPATH’s convergence behavior to demonstrate that it can be used in practice by video transmission sources to quickly generate feasible solutions that result in adequate video quality while incrementally improving QoE as the algorithm iterates. Finally, we evaluated FITPATH under a decentralized network control plane and showed how the overhead caused by the control packets used to monitor the topology influences QoE.

As future work, we intend to explore adaptive or optimized termination criteria for the heuristic, based on convergence trends or solution quality thresholds. Additionally, we intend to evaluate different objective functions, e.g., accounting for other network performance metrics such as loss, as well as other heuristics, e.g., more elaborate local search approaches. We also plan to incorporate support for adaptive video coding and propose objective functions to assess solutions that adjust to dynamic coding requirements. Additionally, we plan to implement a proactive link-state routing protocol to test the behavior of FITPATH under dynamic topology conditions and evaluate the dynamics of the network, concurrent flows, alternate link quality metrics, and overhead introduced by the solution. Finally, we will deploy FITPATH in a real camera network testbed.