Part III: Travel Demand Modeling

9 Introduction to Transportation Modeling: Travel Demand Modeling and Data Collection

Chapter Overview

Chapter 9 serves as an introduction to travel demand modeling, a crucial aspect of transportation planning and policy analysis. As explained in previous chapters, the spatial distribution of activities such as employment centers, residential areas, and transportation systems mutually influence each other. The utilization of travel demand forecasting techniques leads to dynamic processes in urban areas. A comprehensive grasp of travel demand modeling is imperative for individuals involved in transportation planning and implementation.

This chapter covers the fundamentals of the traditional four-step travel demand modeling approach. It delves into the necessary procedures for applying the model, including establishing goals and criteria, defining scenarios, developing alternatives, collecting data, and conducting forecasting and evaluation.

Following this chapter, each of the four steps will be discussed in detail in Chapters 10 through 13.

Learning Objectives

Student Learning Outcomes

  • Describe the need for travel demand modeling in urban transportation and relate it to the structure of the four-step model (FSM).
  • Summarize each step of FSM and the prerequisites for each in terms of data requirement and model calibration.
  • Summarize the available methods for each of the first three steps of FSM and compare their reliability.
  • Identify assumptions and limitations of each of the four steps and ways to improve the model.

Introduction

Transportation planning and policy analysis heavily rely on travel demand modeling to assess different policy scenarios and inform decision-making processes. Throughout our discussion, we have primarily explored the connection between urban activities, represented as land uses, and travel demands, represented by improvements and interventions in transportation infrastructure. Figure 9.1 provides a humorous yet insightful depiction of the transportation modeling process. In preceding chapters, we have delved into the relationship between land use and transportation systems, with the houses and factories in the figure symbolizing two crucial inputs into the transportation model: households and jobs. The output of this model comprises transportation plans, encompassing infrastructure enhancements and programs. Chapter 9 delves into a specific model—travel demand modeling. For further insights into transportation planning and programming, readers are encouraged to consult the UTA OERtransport book, “Transportation Planning, Policies, and History.”

A graphical representation of the input and output data of the FSM process.

Travel demand models forecast how people will travel by processing thousands of individual travel decisions. These decisions are influenced by various factors, including living arrangements, the characteristics of the individual making the trip, available destination options, and choices regarding route and mode of transportation. Mathematical relationships are used to represent human behavior in these decisions based on existing data.

Through a sequential process, transportation modeling provides forecasts to address questions such as:

  • What will the future of the area look like?
  • What is the estimated population for the forecasting year?
  • How are job opportunities distributed by type and category?
  • What are the anticipated travel patterns in the future?
  • How many trips will people make? ( Trip Generation )
  • Where will these trips end? ( Trip Distribution )
  • Which transportation mode will be utilized? ( Mode Split )
  • What will be the demand for different corridors, highways, and streets? ( Traffic Assignment )
  • Lastly, what impact will this modeled travel demand have on our area? (Rahman, 2008).

9.2 Four-step Model

Based on the questions above, transportation modeling consists of two main stages. The first stage addresses the initial four questions through demographic and land-use analysis, which incorporates the community vision collected through citizen engagement and input. The second stage is the four-step travel demand model (FSM), which addresses questions 5 through 8. While FSM is generally accurate for aggregate calculations, it can be unreliable as a test of policy scenarios. The limitations of this model are explored later in this chapter.

In the first stage, we develop an understanding of the study area from demographic information and urban form (land-use distribution pattern). These are important for all the reasons we discussed in this book. For instance, we must obtain the current age structure of the study area, based on which we can forecast future births, deaths, and migration (Beimborn & Kennedy, 1996).

Regarding economic forecasts, we must identify existing and future employment centers since they are the basis of work travel, shopping travel, or other travel purposes. Empirically speaking, employment often grows as the population grows, and the migration rate also depends on a region’s economic growth. A region should be able to generate new employment while sustaining existing jobs, and past trends form the basis for judging future trends (Mladenovic & Trifunovic, 2014).

After forecasting future population and employment, we must predict where people go (work, shop, school, or other locations). Land-use maps and plans are used in this stage to identify the activity concentrations in the study area. Future urban growth and land use can follow the same trend or change due to several factors, such as the availability of open land for development and local plans and zoning ordinances (Beimborn & Kennedy, 1996). Figure 9.3 shows different possible land-use patterns frequently seen in American cities.

This picture shows six different land-use patterns: (a) traditional grid, (b) post-war suburb, (c) traditional neighborhood design, (d) fused grid, (e) post-war suburb II, and (f) traditional neighborhood design II.

Land-use patterns can also be forecast through the integration of land use and transportation, as explored in previous chapters.

Figure 9.3 above shows a simple structure of the second stage of FSM.

This picture shows the sequence of the four steps of FSM.

Once the number and types of trips are predicted, they are assigned to various destinations and modes. In the final step, these trips are allocated to the transportation network to compute the total demand for each road segment. During this second stage, additional choices such as the time of travel and whether to travel at all can be modeled using choice models (McNally, 2007). Travel forecasting involves simulating human behavior through mathematical series and calculations, capturing the sequence of decisions individuals make within an urban environment.

The first attempt at this type of analysis in the U.S. occurred during the post-war development period, driven by rapid economic growth. The influential study by Mitchell and Rapkin (1954) emphasized the need to establish a connection between travel and activities, highlighting the necessity for a comprehensive framework. Initial development models for trip generation, distribution, and diversion emerged in the 1950s, leading to the application of the four-step travel demand modeling (FSM) approach in a transportation study in the Chicago area. This model was primarily highway-oriented, aiming to compare new facility development and improved traffic engineering. In the 1960s, federal legislation mandated comprehensive and continuous transportation planning, formalizing the use of FSM. During the 1970s, scholars recognized the need to revise the model to address emerging concerns such as environmental issues and the rise of multimodal transportation systems. Consequently, enhancements were made, leading to the development of disaggregate travel demand forecasting and equilibrium assignment methods that complemented FSM. Today, FSM has been instrumental in forecasting travel demand for over 50 years (McNally, 2007; Weiner, 1997).

Initially outlined by Mannheim (1979), the basic structure of FSM was later expanded by Florian, Gaudry, and Lardinois (1988). Figure 9.3 illustrates various influential components of travel demand modeling. In this representation, “T” represents transportation, encompassing all elements related to the transportation system and its services. “A” denotes the activity system, defined according to land-use patterns and socio-demographic conditions. “P” refers to transportation network performance. “D,” which stands for demand, is generated based on the land-use pattern. According to Florian, Gaudry, and Lardinois (1988), “L” and “S” (location and supply procedures) are optional parts of FSM and are rarely integrated into the model.

This flowchart shows the relationship between various components of transportation network and their joint impact on traffic volume (flow) on the network.

A crucial aspect of the process involves understanding the input units, which are defined both spatially and temporally. Demand generates person trips, which encompass both time and space (e.g., person trips per household or peak-hour person trips per zone). Performance typically yields a level of service, defined as a link volume capacity ratio (e.g., freeway vehicle trips per hour or boardings per hour for a specific transit route segment). Demand is primarily defined at the zonal level, whereas performance is evaluated at the link level.

It is essential to recognize that travel forecasting models like FSM are continuous processes. Model generation takes time, and changes may occur in the study area during the analysis period.

Before proceeding with the four steps of FSM, defining the study area is crucial. Like most models discussed, FSM uses traffic analysis zones (TAZs) as the geographic unit of analysis. A higher number of TAZs generally yields more accurate results. The number of TAZs in the model can vary based on its purpose, data availability, and vintage. These zones are characterized or categorized by factors such as population and employment. For modeling simplicity, FSM assumes that trip-making begins at the center of a zone (zone centroid) and excludes very short trips that start and end within a TAZ, such as those made by bike or on foot.

Furthermore, highway systems and transit systems are considered as networks in the model. Highway or transit line segments are coded as links, while intersections are represented as nodes. Data regarding network conditions, including travel times, speeds, capacity, and directions, are utilized in the travel simulation process. Trips originate from trip generation zones, traverse a network of links and nodes, and conclude at trip attraction zones.
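The link-and-node representation described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular modeling package: the network, the node names ("Z1", "Z2", "A", "B"), and the travel times are hypothetical, and the minimum-time path is found with Dijkstra's algorithm as one common choice.

```python
import heapq


def shortest_path(links, origin, destination):
    """Return (travel_time, node_path) for the minimum-time path.

    links maps each node to a list of (neighbor, link_travel_time) pairs.
    """
    best = {origin: 0.0}      # best known travel time to each node
    previous = {}             # predecessor of each node on its best path
    queue = [(0.0, origin)]
    while queue:
        time_so_far, node = heapq.heappop(queue)
        if node == destination:
            break
        if time_so_far > best.get(node, float("inf")):
            continue  # stale queue entry, a better time was already found
        for neighbor, link_time in links.get(node, []):
            candidate = time_so_far + link_time
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if destination not in best:
        return float("inf"), []
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return best[destination], path[::-1]


# Hypothetical network: zone centroids "Z1" and "Z2", intersections "A" and "B".
# Travel times are in minutes and are made up for illustration.
network = {
    "Z1": [("A", 2.0), ("B", 5.0)],
    "A": [("B", 1.0), ("Z2", 6.0)],
    "B": [("Z2", 2.0)],
}
minutes, route = shortest_path(network, "Z1", "Z2")
print(minutes, route)
```

In this toy network the trip from centroid Z1 to centroid Z2 passes through both intersections, because that sequence of links has the lowest total travel time.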

Trip Generation

Trip generation is the first step in the FSM model. This step defines the magnitude of daily travel in the study area for different trip purposes. It will also provide us with an estimate of the total trips to and from each zone, creating a trip production and attraction matrix for each trip’s purpose. Trip purposes are typically categorized as follows:

  • Home-based work trips (work trips that begin or end at home),
  • Home-based shopping trips,
  • Home-based other trips,
  • School trips,
  • Non-home-based trips (trips that neither begin nor end at home),
  • Truck trips, and
  • Taxi trips (Ahmed, 2012).

Trip attractions are based on the level of employment in a zone. In the trip generation step, the assumptions and limitations are listed below:

  • Independent decisions: Travel behavior is affected by many factors generated within a household; the model ignores most of these factors. For example, childcare may force people to change their travel plans.
  • Limited trip purposes: This model consists of a limited number of trip purposes for simplicity, giving rise to some model limitations. For example, all shopping trips are treated alike regardless of conditions such as the weather. Similarly, we generate home-based trips for various purposes (banking, visiting friends, medical reasons, or other purposes), all of which are affected by factors ignored by the model.
  • Trip combinations: Travelers are often willing to combine various trips into a chain of short trips. While this behavior creates a complex process, the FSM model treats this complexity in a limited way.
  • Feedback, cause, and effect problems: Trip generation often uses factors that are themselves a function of the number of trips. For instance, FSM assumes that shopping trip attractions are a function of retail employment. However, it is equally logical to assume that retail employment depends on how many customers these retail centers attract. Similarly, we can assume that the number of trips a household makes is affected by the number of private cars it owns, yet the activity levels of families also determine how many cars they own.

As mentioned, trip generation estimates are made separately for each trip purpose. Equations 1 and 2 show the functions for trip generation and attraction:

O_i = f(x_{i1}, x_{i2}, x_{i3}, \ldots)

D_j = f(y_{j1}, y_{j2}, y_{j3}, \ldots)

where O_i and D_j are the trips generated by zone i and attracted to zone j, respectively, x refers to socio-economic characteristics, and y refers to land-use properties.

Generally, FSM aggregates different trip purposes previously listed into three categories: home-based work trips (HBW), home-based other (or non-work) trips (HBO), and non-home-based trips (NHB). Trip ends are either the origin (generation) or destination (attraction), and home-end trips comprise most trips in a study area. We can also model trips at different levels, such as zones, households, or person levels (activity-based models). Household-level models are the most common scale for trip productions, and zonal-level models are appropriate for trip attractions (McNally, 2007).

There are three main methods for estimating trip generation or attraction.

  • The first method is multiple regression based on population, jobs, and income variables.
  • The second method is experience-based analysis, which uses typical trip rates drawn from past experience.
  • The third method is cross-classification. Cross-classification is like the experience-based analysis in that it uses trip rates but in an extended format for different categories of trips (home-based trips or non-home-based trips) and different attributes of households, such as car ownership or income.
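To make the cross-classification idea concrete, the following Python sketch multiplies the number of households in each category by a trip rate for that category and sums the results for one zone. The categories (cars owned and income group), the trip rates, and the household counts are all hypothetical values chosen for illustration, not survey results.

```python
# Hypothetical daily trip rates per household, indexed by
# (cars owned, income group). These numbers are illustrative only.
TRIP_RATES = {
    (0, "low"): 2.0,
    (0, "high"): 4.0,
    (1, "low"): 5.0,
    (1, "high"): 8.0,
    (2, "low"): 6.0,
    (2, "high"): 10.0,
}


def zone_productions(households_by_category, trip_rates=TRIP_RATES):
    """Estimate a zone's trip productions as sum(households * rate)."""
    return sum(
        count * trip_rates[category]
        for category, count in households_by_category.items()
    )


# Hypothetical zone: 100 zero-car low-income households, 50 one-car
# high-income households, and 20 two-car high-income households.
zone = {(0, "low"): 100, (1, "high"): 50, (2, "high"): 20}
total_trips = zone_productions(zone)  # 100*2 + 50*8 + 20*10
print(total_trips)
```

In practice the rates would be estimated from a household travel survey, separately for each trip purpose.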

Comparing these methods, category analysis models are more common for trip productions, while regression models perform better for trip attractions (Meyer, 2016). Production models are recognized to be influenced by a range of explanatory and policy-sensitive variables (e.g., car ownership, household income, household size, and the number of workers). Estimation is more problematic for attraction models because regional travel surveys are conducted at the household level (thus providing more accurate data for production models) and not for nonresidential land uses (which are important for trip attraction). Estimation can also be problematic because explanatory trip attraction variables often underperform (McNally, 2007). For these reasons, survey data must be factored before sample trips are related to population-level attraction variables, typically via regression analysis. Table 9.1 shows the advantages and disadvantages of each of these two models.

Trip Distribution

Up to this point, we have calculated the number of trips originating from or terminating in a specific zone. The next step examines how these trips are distributed across different zones and how many trips are exchanged between pairs of zones. To illustrate, consider a shopping trip: several shopping malls may be accessible, but ultimately only one is chosen as the destination. This process is modeled in the second step as the distribution of trips. The outcome of this step is typically a large Origin-Destination (O-D) matrix for each trip purpose. An O-D matrix might resemble Table 9.2 below, where summing T_{ij} over all origins i gives the total number of trips attracted to zone j, and summing T_{ij} over all destinations j gives the total number of trips produced in zone i.

T_{ij} = P_i \frac{A_j F_{ij} K_{ij}}{\sum_{x=1}^{n} A_x F_{ix} K_{ix}}

T_{ij} = trips produced at i and attracted at j

P_i = total trip production at i

A_j = total trip attraction at j

F_{ij} = a calibration term for interchange ij (friction factor), or travel time factor (F_{ij} = C / t_{ij}^n)

C = calibration factor for the friction factor

K_{ij} = a socioeconomic adjustment factor for interchange ij

i = origin zone

n = number of zones
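The gravity model above can be sketched in Python as follows. This is an illustrative implementation under stated assumptions: the two-zone productions, attractions, and travel times are hypothetical, the friction factor is taken as C divided by travel time raised to an exponent (called `exponent` here to keep it distinct from the number of zones), and all K factors default to 1.

```python
def gravity_model(productions, attractions, travel_times,
                  c=1.0, exponent=2.0, k=None):
    """Distribute each zone's productions across destination zones.

    friction F_ij = c / t_ij ** exponent; k holds optional K_ij factors.
    Returns an O-D matrix as a list of rows (origins) by columns (destinations).
    """
    n = len(productions)
    trips = [[0.0] * n for _ in range(n)]
    for i in range(n):
        weights = []
        for j in range(n):
            friction = c / travel_times[i][j] ** exponent
            k_ij = 1.0 if k is None else k[i][j]
            weights.append(attractions[j] * friction * k_ij)
        total_weight = sum(weights)
        for j in range(n):
            trips[i][j] = productions[i] * weights[j] / total_weight
    return trips


# Hypothetical two-zone example; travel times in minutes.
productions = [100.0, 200.0]
attractions = [150.0, 150.0]
travel_times = [[5.0, 10.0],
                [10.0, 5.0]]
od_matrix = gravity_model(productions, attractions, travel_times)
for row in od_matrix:
    print([round(t, 1) for t in row])
```

Each row of the result sums to the zone's production total. In this simple production-constrained form the column totals are not guaranteed to match the attraction totals, which is one reason practical applications rebalance the matrix iteratively.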

Distance in the gravity model can be measured in different units. For instance, distance can be represented by time, network distance, or travel costs. For travel costs, auto travel cost is the most common and straightforward way of monetizing distance. A combination of different costs, such as travel time, toll payments, and parking payments, can also be used. Alternatively, a composite cost of both car and transit costs can be used (McNally, 2007).

Generalized travel costs can be a function of time divided into different segments. For instance, public transit time can be divided into the following segments: in-vehicle time, walking time, waiting time, interchange time, fare, etc. Since travelers perceive time value differently for each segment (like in-vehicle time vs. waiting time), weights are assigned based on the perceived value of time (VOT). Similarly, car travel costs can be categorized into in-vehicle travel time or distance, parking charge, tolls, etc.
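As a rough illustration of how such weights enter a generalized cost, the Python sketch below combines transit time segments and the fare into a single monetary value. The weights (walking and waiting counted twice as heavily as in-vehicle time), the value of time, and the trip itself are assumptions made for this example, not calibrated values.

```python
def generalized_transit_cost(in_vehicle_min, walk_min, wait_min, fare,
                             value_of_time, walk_weight=2.0, wait_weight=2.0):
    """Generalized cost of a transit trip in money units.

    value_of_time is in money units per minute of in-vehicle time.
    The default weights are hypothetical perception weights.
    """
    weighted_minutes = (in_vehicle_min
                        + walk_weight * walk_min
                        + wait_weight * wait_min)
    return weighted_minutes * value_of_time + fare


# Hypothetical trip: 20 min in vehicle, 5 min walking, 5 min waiting,
# a fare of 2.00, and a value of time of 0.25 per minute.
cost = generalized_transit_cost(20, 5, 5, fare=2.0, value_of_time=0.25)
print(cost)
```

A car trip could be handled the same way by replacing the segments with in-vehicle time, parking charges, and tolls.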

As with the first step in the FSM model, the second step has assumptions and limitations that are briefly explained below.

  • Constant trip times: In order to utilize the model for prediction, it assumes that the duration of trips remains constant. This means that travel distances are measured by travel time, and the assumption is that enhancements in the transportation system, which reduce travel times, are counterbalanced by the separation of origins and destinations.
  • Automobile travel times to represent distance: We utilize travel time as a proxy for travel distance. In the gravity model, this primarily relies on private car travel time and excludes travel times via other modes like public transit. This leads to a broader distribution of trips.
  • Limited consideration of socio-economic and cultural factors: Another drawback of the gravity model is its neglect of certain socio-economic or cultural factors. Essentially, this model relies on trip production and attraction rates along with travel times between them for predictions. Consequently, it may overestimate trip rates between high-income groups and nearby low-income Traffic Analysis Zones (TAZs). Therefore, incorporating more socio-economic factors into the model would enhance accuracy.
  • Feedback issues: The gravity model’s reliance on travel times is heavily influenced by congestion levels on roads. However, measuring congestion proves challenging, as discussed in subsequent sections. Typically, travel times are initially assumed and later verified. If the assumed values deviate from actual values, they require adjustment, and the calculations need to be rerun.

Mode choice

The third step of FSM is mode choice estimation, which identifies the transportation modes travelers use for different trip purposes and so provides information about users’ travel behavior. This step usually produces the share of each transportation mode (in percentages) of the total number of trips in a study area using a utility function (Ahmed, 2012). Mode choice estimation is crucial because it determines the relative attractiveness and usage of various transportation modes, such as public transit, carpooling, or private cars. Modal split analysis helps evaluate improvement programs or proposals (e.g., congestion pricing or parking charges) aimed at enhancing accessibility or service levels. It is essential to identify the factors contributing to the utility and disutility of different modes for different travel demands (Beimborn & Kennedy, 1996). Comparing the disutility of different modes between two points aids in determining mode share. Disutility typically refers to the burdens of making a trip, such as time and costs (fuel, parking, tolls, etc.). Once disutility is modeled for different trip purposes between two points, trips can be assigned to various modes based on their utility. As discussed in Chapter 12, a mode’s utility advantage over another can result in a higher share of trips using that mode.

The assumptions and limitations for this step are outlined as follows:

  • Choices are only affected by travel time and cost: This model assumes that changes in mode choices occur solely if transportation cost or travel time in the transportation network or transit system is altered. For instance, a more convenient transit mode with the same travel time and cost does not affect the model’s results.
  • Omitted factors: Certain factors like crime, safety, and security, which are not included in the model, are assumed to have no effect, despite being considered in the calibration process. However, modes with different attributes regarding these omitted factors yield no difference in the results.
  • Simplified access times: The model typically overlooks factors related to the quality of access, such as neighborhood safety, walkability, and weather conditions. Consequently, considerations like walkability and the impact of a bike-sharing program on the attractiveness of different modes are not factored into the model.
  • Constant weights: The model assumes that the significance of travel time and cost remains constant for all trip purposes. However, given the diverse nature of trip purposes, travelers may prioritize travel time and cost differently depending on the purpose of their trip.

The most common framework for mode choice models is the nested logit model, which can accommodate various explanatory variables. However, before the final step, results need to be aggregated for each zone (Koppelman & Bhat, 2006).

A generalized modal split chart is depicted in Figure 9.5.

A simple decision tree for transportation mode choice between car, train, and walking.

In our analysis, we can use a binary logit model (with a dummy dependent variable) if there are only two modes of transportation (such as private cars and public transit). A binary logit model shows what portion of trips would shift to a specific mode of transport if travel costs changed. The mathematical form of this model is:

P_{ij}^1 = \frac{T_{ij}^1}{T_{ij}} = \frac{e^{-b c_{ij}^1}}{e^{-b c_{ij}^1} + e^{-b c_{ij}^2}}

where:

P_{ij}^1 = the proportion of trips between i and j by mode 1

T_{ij}^1 = trips between i and j by mode 1

c_{ij}^1 = generalized cost of travel between i and j by mode 1

c_{ij}^2 = generalized cost of travel between i and j by mode 2

b = dispersion parameter measuring sensitivity to cost

It is also possible to have a hierarchy of transportation modes for using a binary logit model. For instance, we can first conduct the analysis for the private car and public transit and then use the result of public transit to conduct a binary analysis between rail and bus.
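The binary logit formula and the two-level hierarchy described above can be sketched in Python. The generalized costs and the dispersion parameter are hypothetical, and the transit cost used at the upper level is simply assumed as given. A full nested logit model would derive it from the lower-level alternatives, which this sketch does not attempt.

```python
import math


def binary_logit_share(cost_1, cost_2, b):
    """Share of trips using mode 1, given generalized costs and dispersion b."""
    weight_1 = math.exp(-b * cost_1)
    weight_2 = math.exp(-b * cost_2)
    return weight_1 / (weight_1 + weight_2)


# Upper level: private car versus public transit (hypothetical costs).
car_share = binary_logit_share(cost_1=10.0, cost_2=12.0, b=0.1)
transit_share = 1.0 - car_share

# Lower level: split the transit share between rail and bus.
rail_within_transit = binary_logit_share(cost_1=11.0, cost_2=13.0, b=0.1)
rail_share = transit_share * rail_within_transit
bus_share = transit_share * (1.0 - rail_within_transit)

print(round(car_share, 3), round(rail_share, 3), round(bus_share, 3))
```

With these assumed costs the car, being cheaper, takes a bit more than half of the trips (about 0.55), and the three shares sum to one. A larger value of b would make travelers more sensitive to the cost difference.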

Trip assignment

After breaking down trip counts by mode of transportation, we analyze the routes commuters take from their origin to their destination, especially for private car trips. This process is known as trip assignment and is the most intricate stage of FSM. Initially, the trips for each origin-destination pair are assigned to the minimum path, based on either travel cost or time. The assigned volume of trips is then compared to the capacity of the route to determine whether congestion would occur. If it does (meaning that traffic volume exceeds capacity), the speed on the route must be decreased, which increases travel cost or time. When the volume/capacity ratio (v/c ratio) changes due to congestion, both the speed and the shortest path can change. The model must therefore iterate until equilibrium is achieved.

The process for public transit is similar, with one distinction: headways are adjusted instead of travel times. Headway refers to the time between successive arrivals of a vehicle at a stop, and it directly affects the capacity and volume for each transit vehicle. Understanding the concept of equilibrium is crucial in the trip assignment step because it guides the iterative process of the model. The process ends at equilibrium, a concept known as Wardrop equilibrium. In Wardrop equilibrium, traffic organizes itself in congested networks so that no individual commuter can reduce travel time or cost by switching routes. Another crucial factor in this step is the time of day.

Like previous steps, the following assumptions and limitations are pertinent to the trip assignment step:

1.    Delays on links: Most traffic assignment models assume that delays occur on the links, not the intersections. For highways with extensive intersections, this can be problematic because intersections involve highly complex movements. Intersections are excessively simplified if the assignment process does not modify control systems to reach an equilibrium.

2.    Points and links are only for trips: This model assumes that all trips begin and finish at a single point in a zone (centroids), and commuters only use the links considered in the model network. However, these points and links can vary in the real world, and other arterials or streets might be used for commutes.

3.    Roadway capacities: In this model, a simple assumption helps determine roadways’ capacity. Capacity is found based on the number of lanes a roadway provides and the type of road (highway or arterial).

4.    Time of the day variations: Traffic volume varies greatly throughout the day and week. In this model, a typical workday of the week is considered and converted to peak hour conditions. The factor used for this conversion is called the hour adjustment factor. This value is critical because a small change in it can result in a massive difference in the congestion level forecasted by the model.

5.    Emphasis on peak hour travel: The model forecasts for the peak hour but does not forecast for the rest of the day. The models make forecasts for a typical weekday but neglect specific conditions of that time of the year.

After completing the fourth step, approximations of travel demand or traffic counts on each road are obtained. Further models can be used to simulate transportation’s negative or positive externalities, such as air pollution, updated travel times, delays, congestion, car accidents, and toll revenues. These need independent models such as emission rate models (Beimborn & Kennedy, 1996).
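The sensitivity to the hour adjustment factor noted in item 4 can be illustrated with simple arithmetic. In the Python sketch below, the daily volume, the two candidate factors, and the hourly capacity are hypothetical numbers, and the factor is assumed to be the share of daily traffic that occurs in the peak hour.

```python
def peak_hour_vc_ratio(daily_volume, hour_factor, capacity_per_hour):
    """Convert a daily volume to a peak-hour volume/capacity ratio."""
    peak_hour_volume = daily_volume * hour_factor
    return peak_hour_volume / capacity_per_hour


# Hypothetical link: 20,000 vehicles per day, capacity 1,800 vehicles per hour.
vc_low = peak_hour_vc_ratio(20000, 0.08, 1800)   # 8 percent of daily traffic
vc_high = peak_hour_vc_ratio(20000, 0.10, 1800)  # 10 percent of daily traffic
print(round(vc_low, 2), round(vc_high, 2))
```

Moving the factor from 0.08 to 0.10 takes this link from under capacity to over capacity, which changes whether the model forecasts congestion at all.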

The basic algorithm for finding the equilibrium point computes minimum paths and loads trips onto them using an all-or-nothing (AON) assignment. Reaching equilibrium requires multiple iterations. In AON, it is assumed that the network is empty and free flow is possible. The first iteration of the AON assignment loads the traffic onto the shortest path. Due to congestion and delayed travel times, the previous shortest paths may no longer be the minimum paths for an O-D pair. If we observe a notable decrease in travel time or cost in subsequent iterations, the equilibrium point has not been reached, and we must continue the estimation. Typically, the following factors affect private car travel times: distance, free-flow speed on links, link capacity, link speed capacity, and the speed-flow relationship.

The relationship between the traffic flow and travel time equation used in the fourth step is:

t = t_0 + a v^n, \quad v < c

t = link travel time per length unit

t_0 = free-flow travel time

v = link flow

c = link capacity

a and n are model (calibrated) parameters
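The iterative idea behind equilibrium assignment can be sketched for the simplest possible case: one O-D pair served by two parallel routes. The Python example below repeats an all-or-nothing loading and averages the results with the method of successive averages, which is one common way to approach equilibrium and is assumed here rather than taken from this chapter. The route parameters and demand are hypothetical, and the capacity condition v < c is not enforced.

```python
def link_time(t0, a, n, flow):
    """Link travel time t = t0 + a * v**n (capacity check omitted)."""
    return t0 + a * flow ** n


def assign_parallel_routes(demand, routes, iterations=200):
    """Approximate equilibrium flows on parallel routes for one O-D pair.

    routes is a list of (t0, a, n) tuples, one per route.
    Each iteration loads all demand on the currently fastest route (AON)
    and averages that loading into the running flows with step 1/k.
    """
    flows = [0.0] * len(routes)  # start from an empty network
    for k in range(1, iterations + 1):
        times = [link_time(t0, a, n, v)
                 for (t0, a, n), v in zip(routes, flows)]
        fastest = times.index(min(times))
        aon = [demand if r == fastest else 0.0 for r in range(len(routes))]
        step = 1.0 / k
        flows = [(1.0 - step) * v + step * x for v, x in zip(flows, aon)]
    return flows


# Hypothetical routes: (free-flow time in minutes, a, n).
routes = [(10.0, 0.01, 1), (15.0, 0.005, 1)]
flows = assign_parallel_routes(demand=1000.0, routes=routes)
times = [link_time(t0, a, n, v) for (t0, a, n), v in zip(routes, flows)]
print([round(v) for v in flows], [round(t, 2) for t in times])
```

After enough iterations the two routes have nearly equal travel times, which is the Wardrop condition: no traveler could save time by switching. A real network needs a shortest-path search for every O-D pair in each iteration instead of the simple comparison used here.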

Model improvement

Improvements to FSM continue to generate more accurate results. Since transportation dynamics in urban and regional areas are under the complex influence of various factors, the existing models may not be able to incorporate all of them. These can be employer-based trip reduction programs, walking and biking improvement schemes, a shift in departure (time of the day), or more detailed information on socio-demographic and land-use-related factors. However, incorporating some of these variables is difficult and can require minor or even significant modifications to the model and/or computational capacities or software improvements. The following section identifies some areas believed to improve the FSM model performance and accuracy.

  • Better data: An effective way of improving the model accuracy is to gather a complete dataset that represents the general characteristics of the population and travel pattern. If the data is out-of-date or incomplete, we will get poor results.
  • Better modal split: As you saw in previous sections, the only modes incorporated into the model are private car and public transit trips, while in some cities, a considerable fraction of trips are made by bicycle or by walking. We can improve our models by developing methods to consider these trips in the first and third steps.
  • Auto occupancy: In contemporary transportation planning practices, especially in the US, some new policies are emerging for carpooling. Auto occupancy rates can be modeled by treating carpooling as a separate mode that is sensitive to the disutility of private car trips, parking costs, or the introduction of a new HOV lane.
  • Time of the day: In this chapter, the FSM framework discussed is oriented toward peak hour (single time of the day) travel patterns. Nonetheless, understanding the nature of congestion in other hours of the day is also helpful for understanding how travelers choose their travel time.
  • A broader trip purpose: Additional trip purposes may provide a better understanding of the factors affecting different trip purposes and trip-chaining behaviors. We can improve accuracy by having more trip purposes (more disaggregate input and output for the model).

•      The concept of access: As discussed, land-use policies that encourage public transit use or create amenities for more convenient walking are not present in the model. Developing factors or indices that reflect such improvements in areas with high demand for non-private vehicles, and incorporating them in choice models, can be a good improvement.

•      Land use feedback: To better understand interactions between land use and travel demand, a land-use simulation model can be added to these steps to determine how a proposed transportation change will lead to a change in land use.

•      Intersection delays: As mentioned in the fourth step, intersections on major highways create significant delays. Incorporating models that calculate delays at these intersections, such as those controlled by stop signs, could be another improvement to the model.

A Simple Example of the FSM model

This section works through an example of the FSM to illustrate a typical application of the model in the U.S. The first phase requires the specifications of the transportation network and household data. In this hypothetical example, a home interview survey sampled 5 percent of the households in each TAZ, yielding 1,852 trips from 200 households. This sample falls below the minimum required for statistical significance, but it is sufficient to demonstrate the FSM for learning purposes.

Table 9.3 provides network information such as speed limits, number of lanes, and capacity. Table 9.4 displays the total number of households and jobs in three industry sectors for each zone. Additionally, Table 9.5 breaks down the household data into three car ownership groups, which is one of the most significant factors influencing trip making.

In the first step (trip generation), a category model (i.e., cross-classification) was used to estimate trip production, based on the sampled population’s sociodemographic and trip data for different purposes. Since research has shown the significant effect of auto ownership on private car trip-making (Ben-Akiva & Lerman, 1974), disaggregating the population by the number of private cars yields more accurate results. Table 9.7 shows the trip-making rate for different income and auto ownership groups.
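The cross-classification idea can be illustrated with a minimal Python sketch. The trip rates, zone names, and household counts below are hypothetical placeholders rather than values from the chapter's tables, and the sketch classifies households by car ownership only, whereas the chapter's table also uses income.

```python
# Hypothetical trip rates (trips per household per day) by car-ownership group.
# These numbers are invented for illustration only.
TRIP_RATES = {0: 2.0, 1: 5.0, 2: 8.0}  # key: cars owned (2 means "2 or more")


def zone_productions(households_by_cars, rates=TRIP_RATES):
    """Sum (number of households x trip rate) over all car-ownership categories."""
    return sum(n * rates[cars] for cars, n in households_by_cars.items())


# Hypothetical household counts per zone, split by cars owned.
zones = {
    "TAZ 1": {0: 100, 1: 300, 2: 200},
    "TAZ 2": {0: 50, 1: 150, 2: 400},
}

productions = {zone: zone_productions(h) for zone, h in zones.items()}
print(productions)
```

Adding a second classification variable such as income would only change the dictionary keys, for example to `(income_group, cars)` tuples.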

Also, as mentioned in previous sections, multiple regression estimation analysis can be used to generate the results for the attraction model. Table 9.7 shows the equations for each of the trip purposes.

After estimating production and attraction, the models are applied to population data to generate results for the first step. Comparing the results of trip production and attraction, we can observe that the total number of trips for each purpose is different. This can be due to using different methods for production and attraction. Since the production method is more reliable, attraction is typically normalized to production. Also, some zones external to our study area either attract trips from our zones or generate trips to them. In this case, another alternative is to extend the boundary of the study area and include more zones.

As mentioned, the total number of trips produced and attracted are different in these results. To address this mismatch, we can use a balance factor to come up with the same trip generation and attraction numbers if we want to keep the number of zones within our study area. Alternatively, we can consider some external stations in addition to designated zones. In this example, using the latter seems more rational because, as we saw in Table 9.4, there are more jobs than the number of households aggregately, and our zone may attract trips from external locations.
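As a rough illustration of the balance-factor option, the sketch below scales zonal attractions so that their total matches total productions. The function name and the zone totals are hypothetical, and the sketch assumes productions are the more reliable figure, as stated above.

```python
def balance_attractions(productions, attractions):
    """Scale attractions so that their total equals total productions."""
    factor = sum(productions) / sum(attractions)  # single balance factor
    return [a * factor for a in attractions]


# Hypothetical zone totals for one trip purpose.
P = [400.0, 600.0, 500.0]   # productions per zone (total 1500)
A = [700.0, 500.0, 800.0]   # attractions per zone (total 2000)

A_balanced = balance_attractions(P, A)
print(A_balanced, sum(A_balanced))
```

The relative attractiveness of each zone is preserved; only the overall total changes.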

For the trip distribution step, we use the gravity model. For internal trips, the gravity model is:

T_{ij} = a_i b_j P_i A_j f(t_{ij})

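where T_{ij} is the number of trips from zone i to zone j, P_i is the number of trips produced in zone i, A_j is the number of trips attracted to zone j, a_i and b_j are balancing factors, and f(t_{ij}) is some function of the network level of service (LOS).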
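The gravity model above can be sketched in Python as follows. This is an illustrative sketch that assumes the common doubly constrained form, in which the balancing factors are found iteratively from a_i = 1 / Σ_j b_j A_j f(t_ij) and b_j = 1 / Σ_i a_i P_i f(t_ij); the chapter does not spell out how a_i and b_j are computed, and the input numbers are hypothetical.

```python
def gravity_model(P, A, F, iterations=100):
    """Doubly constrained gravity model: T[i][j] = a_i * b_j * P_i * A_j * F[i][j].

    P: productions per zone, A: attractions per zone (same total as P),
    F: matrix of impedance (friction) values f(t_ij).
    """
    n, m = len(P), len(A)
    a = [1.0] * n
    b = [1.0] * m
    for _ in range(iterations):
        # Update row factors, then column factors, until both constraints hold.
        a = [1.0 / sum(b[j] * A[j] * F[i][j] for j in range(m)) for i in range(n)]
        b = [1.0 / sum(a[i] * P[i] * F[i][j] for i in range(n)) for j in range(m)]
    return [[a[i] * b[j] * P[i] * A[j] * F[i][j] for j in range(m)] for i in range(n)]


# Hypothetical three-zone example with balanced totals (600 trips).
P = [100.0, 200.0, 300.0]
A = [250.0, 150.0, 200.0]
F = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.4],
     [0.2, 0.4, 1.0]]

T = gravity_model(P, A, F)
for row in T:
    print([round(x, 1) for x in row])
```

After convergence, each row of `T` sums to the zone's productions and each column to its attractions.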

To apply the gravity model, we first need to calculate the impedance function, which is represented here by travel cost. Table 9.9 shows the minimum travel path between each pair of zones (skim tree) in matrix format, in which each cell represents the travel time required to travel between the zones in the corresponding row and column.

Table 9.9-Travel cost table (skim tree)

Note. Table adapted from “The Four-Step Model” by M. McNally, in D. A. Hensher & K. J. Button (Eds.), Handbook of transport modelling, Volume 1, p. 5, Bingley, UK: Emerald Publishing. Copyright 2007 by Emerald Publishing.

Given the minimum travel cost between each pair of zones, we can calculate the impedance function for each trip purpose using the formula

f(t_{ij}) = a \cdot t_{ij}^{b} \cdot e^{c t_{ij}}

Table 9.10 shows the model parameters for calculating the impedance function for different trip purposes:
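To make the impedance calculation concrete, the sketch below evaluates a gamma-type friction function on a small skim matrix. It assumes the common form in which b is an exponent on travel time, f(t) = a · t^b · e^(c·t); the parameter values and travel times are hypothetical and are not taken from Table 9.9 or Table 9.10.

```python
import math


def impedance(t, a, b, c):
    """Gamma-type friction factor f(t) = a * t**b * exp(c * t)."""
    return a * t ** b * math.exp(c * t)


# Hypothetical skim matrix of minimum travel times (minutes) between two zones.
skim = [[2.0, 5.0],
        [5.0, 3.0]]

# Hypothetical parameters for one trip purpose; negative b and c make
# longer trips less attractive.
a, b, c = 1.0, -0.5, -0.1

F = [[impedance(t, a, b, c) for t in row] for row in skim]
for row in F:
    print([round(x, 4) for x in row])
```

The resulting matrix `F` is what a gravity model would use as f(t_ij).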

After calculating the impedance function, we can calculate the result of the trip distribution. This stage generates trip matrices, since we calculate trips between each zone pair. These matrices are usually in “Origin-Destination” (OD) format and can be disaggregated by time of day. Field surveys help us develop a base-year trip distribution for different periods and trip purposes, and these empirical results later help forecast trip distribution. When processing the surveys, the proportion of trips from the production zone to the attraction zone (P-A) is also generated, as shown in Table 9.11. For example, the first row in the table covers the 2-hour morning peak commute period. The production-to-attraction factor for home-based work trips in this period is 0.3. Unsurprisingly, the factor for the opposite direction, attraction to production, is 0.0 for this time of day. The table also shows that the factors for HBO and NHB trips are low, but such trips do occur during this period; they could represent shopping trips or trips to school. Table 9.11 also contains the average vehicle occupancy levels from surveys, which can be used to convert person trips to vehicle trips or vice versa.

Table 9.11 Trip distribution rates for different time of the day and trip purposes

The O-D trip table is calculated by multiplying each cell of the P-A trip table by the P-to-A factor and adding the corresponding cell of the transposed P-A trip table multiplied by the A-to-P factor. These results, which are the final output of the second step, are shown in Table 9.12.
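The P-A to O-D conversion described above can be sketched in a few lines of Python. The home-based work factors of 0.3 (P-to-A) and 0.0 (A-to-P) are the morning-peak values quoted in the text; the P-A matrix, the occupancy value of 1.1, and the function names are hypothetical.

```python
def pa_to_od(pa, f_pa, f_ap):
    """O-D cell = f_pa * PA[i][j] + f_ap * PA[j][i] (transposed P-A table)."""
    n = len(pa)
    return [[f_pa * pa[i][j] + f_ap * pa[j][i] for j in range(n)] for i in range(n)]


def person_to_vehicle_trips(od, occupancy):
    """Convert person trips to vehicle trips with an average occupancy."""
    return [[trips / occupancy for trips in row] for row in od]


# Hypothetical home-based work P-A table for two zones (person trips per day).
pa_hbw = [[10.0, 40.0],
          [20.0, 30.0]]

# Morning peak: 30% of HBW trips travel from production to attraction zone,
# none in the reverse direction.
od_am = pa_to_od(pa_hbw, f_pa=0.3, f_ap=0.0)
vehicles_am = person_to_vehicle_trips(od_am, occupancy=1.1)
print(od_am)
print(vehicles_am)
```

In an evening period the A-to-P factor would dominate instead, so the transposed term carries most of the trips.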

Once the Production-Attraction (P-A) table is transformed into Origin-Destination (O-D) format and the complete O-D matrix is computed, the outcomes will be aggregated for mode choice and traffic assignment modeling. Further elaboration on these two steps will be provided in Chapters 11 and 12.

In this chapter, we provided a comprehensive yet concise overview of four-step travel demand modeling, including the process, the interrelationships and input data, the modeling itself, and the extraction of outputs. The complex nature of cities and regions in terms of travel behavior, the connection to the built environment, and the constantly growing urban landscape necessitate models that can forecast travel patterns, so that planners can better anticipate and prepare for future conditions from multiple perspectives such as environmental preservation, equitable distribution of benefits, safety, or efficiency. As we explored in this book, nearly all land-use/transportation models embed a transportation demand module or sub-model that translates the magnitude of activities and their interconnections into travel demand measures such as VMT, ridership, congestion, and toll usage. Among traditional approaches, four-step models can be categorized as gravity-based, equilibrium-based models. To improve these models, several extensions have been developed, such as simultaneous mode and destination choice, multimodality (more options for mode choice with utility), and microsimulation models that improve granularity by representing individuals or agents rather than zones or neighborhoods.

Travel demand modeling refers to models that predict the flow of traffic or travel demand between zones in a city using a sequence of steps.

  • Intermodality refers to the use of two or more travel modes for a single trip, such as biking to a transit station and riding the light rail.
  • Multimodality refers to a transportation network in which a variety of modes, such as public transit, rail, and biking networks, are offered.

Zoning ordinances are the legal categorization of land-use policies that permit or prohibit certain built environment factors such as density.

Volume-capacity ratio is a ratio that divides the demand on a link by its capacity to determine the level of service.

  • Zone centroid is usually the geometric center of a zone, where all of the zone's trips are assumed to originate and end in the modeling process.

Home-based work trips (HBW) are trips that originate at the home location and end at the work location, usually in the AM peak.

  • Home-based other (or non-work) trips (HBO) are trips that originate at home and end at destinations other than work, such as shopping or leisure.

Non-home-based trips (NHB) are trips for which neither the origin nor the destination is home, or that are part of a linked trip.

Cross-classification is a method for trip production estimation that disaggregates trip rates in an extended format for different categories of trips, such as home-based or non-home-based trips, and different attributes of households, such as car ownership or income.

  • Generalized travel cost is a function of travel time divided into components, such as in-vehicle time versus waiting time or transfer time in a transit trip.

Binary logit model is a type of logit model in which the dependent variable can take only a value of 0 or 1.

  • Wardrop equilibrium is a state in a traffic assignment model in which drivers are reluctant to change their paths because the average travel time is at a minimum.

All-or-nothing (AON) assignment model is a model that assumes all trips between two zones use the shortest path regardless of volume.

Speed-flow relationship is a function that determines the speed based on the volume (flow).

Skim tree is a structure of travel times that defines the minimum-cost path for each section of a trip.

Key Takeaways

In this chapter, we covered:

  • What travel demand modeling is for and what the common methods are to do that.
  • How FSM is structured sequentially, what the relationships between different steps are, and what the outputs are.
  • What the advantages and disadvantages of different methods and assumptions in each step are.
  • What data collection and preparation are needed for trip generation and distribution, illustrated through a hypothetical example.

Prep/quiz/assessments

  • What is the need for regular travel demand forecasting, and what are its two major components?
  • Describe what data we require for each of the four steps.
  • What are the advantages and disadvantages of regression and cross-classification methods for trip generation?
  • What is the most common modeling framework for mode choice, and what result will it provide us?
  • What are the main limitations of FSM, and how can they be addressed?
  • Describe the need for travel demand modeling in urban transportation and relate it to the structure of the four-step model (FSM).

Ahmed, B. (2012). The traditional four steps transportation modeling using a simplified transport network: A case study of Dhaka City, Bangladesh. International Journal of Advanced Scientific Engineering and Technological Research, 1(1), 19–40. https://discovery.ucl.ac.uk/id/eprint/1418961/

ALMEC, C. (2015). The project for capacity development on transportation planning and database management in the Republic of the Philippines: MMUTIS update and enhancement project (MUCEP): Project completion report. Japan International Cooperation Agency (JICA), Department of Transportation and Communications (DOTC). https://books.google.com/books?id=VajqswEACAAJ

Beimborn, E., & Kennedy, R. (1996). Inside the black box: Making transportation models work for livable communities. Washington, DC: Citizens for a Better Environment and the Environmental Defense Fund. https://www.piercecountywa.gov/DocumentCenter/View/755/A-GuideToModeling?bidId

Ben-Akiva, M., & Lerman, S. R. (1974). Some estimation results of a simultaneous model of auto ownership and mode choice to work. Transportation, 3(4), 357–376. https://doi.org/10.1007/bf00167966

Ewing, R., & Cervero, R. (2010). Travel and the built environment: A meta-analysis. Journal of the American Planning Association, 76(3), 265–294. https://doi.org/10.1080/01944361003766766

Florian, M., Gaudry, M., & Lardinois, C. (1988). A two-dimensional framework for the understanding of transportation planning models. Transportation Research Part B: Methodological, 22(6), 411–419. https://doi.org/10.1016/0191-2615(88)90022-7

Hadi, M., Ozen, H., & Shabanian, S. (2012). Use of dynamic traffic assignment in FSUTMS in support of transportation planning in Florida. Florida International University Lehman Center for Transportation Research. https://rosap.ntl.bts.gov/view/dot/24925

Hansen, W. (1959). How accessibility shapes land use. Journal of the American Institute of Planners, 25(2), 73–76. https://doi.org/10.1080/01944365908978307

Gavu, E. K. (2010). Network based indicators for prioritising the location of a new urban transport connection: Case study Istanbul, Turkey (Master’s thesis, University of Twente). International Institute for Geo-Information Science and Earth Observation Enschede. http://essay.utwente.nl/90752/1/Emmanuel%20Kofi%20Gavu-22239.pdf

Karner, A., London, J., Rowangould, D., & Manaugh, K. (2020). From transportation equity to transportation justice: Within, through, and beyond the state. Journal of Planning Literature, 35(4), 440–459. https://doi.org/10.1177/0885412220927691

Kneebone, E., & Berube, A. (2013). Confronting suburban poverty in America. Brookings Institution Press.

Koppelman, F. S., & Bhat, C. (2006). A self instructing course in mode choice modeling: Multinomial and nested logit models. U.S. Department of Transportation, Federal Transit Administration. https://www.caee.utexas.edu/prof/bhat/COURSES/LM_Draft_060131Final-060630.pdf

Manheim, M. L. (1979). Fundamentals of transportation systems analysis. Volume 1: Basic concepts. The MIT Press. https://mitpress.mit.edu/9780262632898/fundamentals-of-transportation-systems-analysis/

McNally, M. G. (2007). The four step model. In D. A. Hensher & K. J. Button (Eds.), Handbook of transport modelling, Volume 1 (pp. 35–53). Bingley, UK: Emerald Publishing.

Meyer, M. D., & Institute of Transportation Engineers. (2016). Transportation planning handbook. Wiley.

Mladenovic, M., & Trifunovic, A. (2014). The shortcomings of the conventional four step travel demand forecasting process. Journal of Road and Traffic Engineering, 60(1), 5–12.

Mitchell, R. B., & Rapkin, C. (1954). Urban traffic: A function of land use. Columbia University Press. https://doi.org/10.7312/mitc94522

Rahman, M. S. (2008). Understanding the linkages of travel behavior with socioeconomic characteristics and spatial environments in Dhaka City and urban transport policy applications (Master’s thesis, Hiroshima University). Graduate School for International Development and Cooperation. http://sr-milan.tripod.com/Master_Thesis.pdf

Rodrigue, J., Comtois, C., & Slack, B. (2020). The geography of transport systems. London; New York: Routledge.

Shen, Q. (1998). Location characteristics of inner-city neighborhoods and employment accessibility of low-wage workers. Environment and Planning B: Planning and Design, 25(3), 345–365.

Sharifiasl, S., Kharel, S., & Pan, Q. (2023). Incorporating job competition and matching to an indicator-based transportation equity analysis for auto and transit in Dallas-Fort Worth Area. Transportation Research Record, 03611981231167424. https://doi.org/10.1177/03611981231167424

Weiner, E. (1997). Urban transportation planning in the United States: An historical overview. US Department of Transportation. https://rosap.ntl.bts.gov/view/dot/13691

Xiongbing, J., & Grammenos, F. (2013, May 21). Taking the guesswork out of designing for walkability. Planetizen. https://www.planetizen.com/node/63248


Gravity model is a type of accessibility measurement in which the employment at the destination and the population at the origin define the degree of accessibility between the two zones.

Impedance function is a function that converts travel costs (usually time or distance) into the level of difficulty of getting from one location to another.

Transportation Land-Use Modeling & Policy Copyright © by Mavs Open Press. All Rights Reserved.


Table removed due to copyright restrictions.


An Open Access Journal

  • Original Paper
  • Open access
  • Published: 24 November 2021

Analysis and comparison of traffic flow models: a new hybrid traffic flow model vs benchmark models

  • Facundo Storani,
  • Roberta Di Pace (ORCID: orcid.org/0000-0001-7589-8570),
  • Francesca Bruno &
  • Chiara Fiori

European Transport Research Review, volume 13, Article number: 58 (2021)


This paper compares a hybrid traffic flow model with benchmark macroscopic and microscopic models. The proposed hybrid traffic flow model may be applied considering a mixed traffic flow and is based on the combination of the macroscopic cell transmission model and the microscopic cellular automata.

Modelled variables

The hybrid model is compared against three microscopic models, namely the Krauß model, the intelligent driver model and the cellular automata, and against two macroscopic models, the Cell Transmission Model and the Cell Transmission Model with dispersion, respectively. To this end, three main applications were considered: (i) a link with a signalised junction at the end, (ii) a signalised artery, and (iii) a grid network with signalised junctions.

The numerical simulations show that the model provides acceptable results. Especially in terms of travel times, it has similar behaviour to the microscopic model. By contrast, it produces lower values of queue propagation than microscopic models (intrinsically dominated by stochastic phenomena), which are closer to the values shown by the enhanced macroscopic cell transmission model and the cell transmission model with dispersion. The validation of the model regards the analysis of the wave propagation at the boundary region.

1 Background and motivation

Although three main groups of traffic flow models have been identified in the literature, namely macroscopic, mesoscopic and microscopic models [ 1 ], hybrid traffic flow models obtained by combining models from two of the above groups have more recently been explored. This paper aims to compare a proposed hybrid traffic flow model (H-CA&CTM [ 2 ]) with some benchmark macroscopic and microscopic models.

More generally, macroscopic models are based on aggregate variables representing user behaviour, such as flow and density, and aggregate variables describing supply, such as speed. They can be classified, in accordance with the literature, by the continuous or discrete representation of space and time. The basic model, formulated for continuous space and time, was the first-order model developed by Lighthill and Whitham [ 3 ] and Richards [ 4 ] (the Lighthill–Whitham–Richards, or LWR, model). Subsequently, Payne [ 5 ], Ross [ 6 ], and Kerner and Konhäuser [ 7 ] proposed second-order models to overcome limitations such as the instantaneous driver’s reaction and the impact of the inertial effect, as well as drivers’ reactions to the conditions of the traffic context. Finally, within the same group of models, a third-order model was proposed by Helbing [ 8 ], based on three states: vehicle density, mean speed and mean speed dispersion.

To solve the first-order model, the cell transmission model (CTM; [ 9 ]), a discrete space and time model, was introduced. The class of space-discrete and time-continuous models includes the model introduced by Newell [ 10 ], based on a simplified theory of kinematic waves focusing on the representation of inflow/outflow curves and the state of flow at an extreme. Consistent with simplified first-order kinematic wave theory after Newell, Yperman [ 11 ] proposed the link transmission model (LTM), in which link volumes and link travel times are obtained starting from cumulative vehicle numbers.

Concerning the microscopic approach, this class of traffic flow models aims to reproduce single vehicle behaviour by considering the disaggregate representation of positions as well as of speeds. This class of models has been widely studied by researchers and four main groups may be identified: stimulus–response models, safety distance models, optimal velocity models and physiology-psychology models.

In the case of stimulus–response models, the leading vehicle and follower are analysed as a pair and it is supposed that each vehicle reacts to the stimulus of the leading vehicle. Preliminary studies may be found in Chandler et al. [ 12 ] and Gazis et al. [ 13 ]. Although the latter model was better able to model the case of high density by considering the stimulus not only a function of the leader vehicle as in the former model but also as a function of the speed difference between the leader and the follower, this class of model is not reliable in the case of free flow conditions. This shortcoming has given rise to other approaches in the literature [ 14 , 15 , 16 ].

The second group of models was primarily introduced by Gipps [ 17 ] and focused on the safe distance to ensure collision avoidance. Further refinements of the model were subsequently proposed especially by Leutzbach [ 18 ] who took account of different steps in driver behaviour (i.e. perception, decision and braking). Other enhancements of the Gipps model are proposed by the Krauß model [ 19 ] through the introduction of stochasticity.

The optimal velocity model [ 20 ] is based on the discrepancy between desired speed and actual speed. The model has been further developed by several authors [ 21 , 22 , 23 , 24 , 25 ]. In particular, Treiber et al. [ 26 ] proposed the intelligent driver model (IDM) which takes into account the desired space headway and desired speed.

Moreover, there are the action point models first introduced by Michaels [ 27 ], generally referred to as physiology-psychology models. Further developed by Wiedemann [ 28 ], the models are based on different regimes (i.e., free driving, closing in and emergency) depending on different thresholds that govern the behaviour of the follower when approaching the leading vehicle.

Finally, mention must be made of hybrid traffic flow models which are based on a combination of two traffic flow models [ 29 , 30 , 31 , 32 ]. Hybrid traffic flow models were introduced to obtain properties of different models at different levels of network layouts.

For instance, macroscopic modelling is more suitable than microscopic modelling for simple node representation, whereas the latter may be better applied along links to appropriately reproduce vehicle interactions and drivers’ mutual influences.

In accordance with the literature, the approach was also introduced to deal with the lane-changing problem, in which microscopic modelling is suitable for realistic acceleration reproduction but cannot be applied to lane changes [ 33 ]; further investigations may be found in Daganzo [ 34 ] and Laval and Daganzo [ 35 ]. They proposed a model based on a Kinematic Wave (KW; [ 36 ]) model for traffic stream simulation and a micro model for the representation of slower vehicles. Further studies obtained the same results by replacing the KW model with the CA and considering the same macroscopic parameters [ 37 ]. In particular, the CA model provides the same trajectories as the KW model with a triangular fundamental diagram (FD), confirming that the theory is insensitive to the level of approximation (i.e., discrete vs. continuum).

Leclercq [ 38 ] presented a hybrid “Lighthill–Whitham–Richards” (LWR) model combining both macroscopic and microscopic traffic descriptions. In particular, the main focus of the proposed model was to overcome the limitations of the models previously proposed [ 30 , 39 , 40 , 41 , 42 ], mainly related to the physical extension of the interfaces between the microscopic and the macroscopic models.

The proposed hybrid model is based on the combination of the macroscopic cell transmission model (CTM; [ 36 ]) and the microscopic cellular automata model (CA; [ 43 ]). The hybrid model (H-CA&CTM, [ 2 ]) appropriately reproduces the queue propagation phenomena and drivers’ behaviour, so that it can be applied in the presence not only of human-driven vehicles but also of connected and autonomous vehicles supporting vehicle-to-infrastructure communication at network nodes, particularly in the case of traffic control. In particular, the model specified, calibrated and validated in [ 44 ] has also been applied to the case of a signalised arterial in order to develop an iterative bilevel optimisation framework combining traffic light optimisation at the first level with speed optimisation at the second (lower) level (i.e., GLOSA, Green Light Optimized Speed Advisory).

It should be pointed out that the CA is a disaggregate model for basic microscopic traffic flow analysis, significantly reducing computational effort. As for the CTM, although the model considered is the basic application, several enhancements may be found in the literature. For instance, the CTM with dispersion (PD&CTM; [ 45 , 46 ]) could be an affordable extension to be considered.

The rest of the paper is organised as follows: in Sect.  2 the models in question are outlined; in Sect.  3 the numerical results with reference to three applications are discussed, and in Sect.  4 conclusions and future perspectives are summarised.

2 Description of models

In this section the mathematical details of each macroscopic and microscopic model are discussed. In the former class, the cell transmission model [ 36 ] and the cell transmission model with dispersion (PD&CTM [ 45 ]) are discussed, whilst in terms of microscopic models our analysis covers the Krauß model [ 19 ], the IDM [ 47 ] and the cellular automata [ 43 ].

2.1 Macroscopic models

2.1.1 Cell transmission model (CTM)

The cell transmission model was introduced to support the solution of the continuous time—continuous space LWR model and is based on a finite difference method: the time is divided into constant time intervals, while the road segment is divided into cells of constant length, with an index i increasing in the downstream direction. At each time step, every cell has single values of density and speed (as a function of the speed-density relationship) while the flow between neighbouring cells is constant during the time interval. The most common integration method for LWR models is the Godunov scheme [ 1 ]. This method is based on an exact solution of the continuity equation for one time step, assuming stepwise initial conditions given by the actual densities of the cells. The road is divided into cells of length Δx equal to the distance that a vehicle would travel in a free flow condition during one time step. Hence it is equal to the free flow speed multiplied by the duration of the time step (also called clock tick), \(V_{f} \Delta t = \Delta x\) . The relation between the cell length and the time step complies with the Courant-Friedrichs-Lewy condition ( \(V_{f} \Delta t \le \Delta x\) ) for the stability of explicit solution methods.

Following the Godunov scheme, the densities are initially averaged for each cell (each cell has a constant density), and from one time step t to the next, \(t + \Delta t\), the evolved solution is averaged again in order to obtain a piecewise constant solution. The main variables of the method are:

\(k_{i}\), density in cell i;

\(k_{j}\), jam density;

\(Q_{i}\), maximum flow rate in cell i;

\(V_{f}\), free flow speed;

\(\omega\), shock wave speed in congested traffic;

\(\Delta x\), cell length;

\(\Delta t\), time step;

\(Y_{i}\), flow exiting the boundary of cell i.

The density is then obtained as a function of flows at the cell boundaries as in the following:
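For reference, the conservation update that this sentence introduces can be sketched in its standard CTM form. This assumes, as in the variable list above, that \(Y_{i}\) is the flow leaving cell i, so that \(Y_{i-1}\) is the flow entering it; the authors' exact notation may differ.

```latex
k_{i}(t + \Delta t) = k_{i}(t) + \frac{\Delta t}{\Delta x}\left[ Y_{i-1}(t) - Y_{i}(t) \right]
```

In words, the density of a cell rises when more vehicles enter than leave during the time step, and falls otherwise.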

Finally, the key quantities of the method can be introduced based on the (trapezoidal) fundamental diagram (Fig.  1 ).

Fig. 1 Trapezoidal fundamental diagram: link representation

The flow of vehicles moving through the boundary between upstream cell i and downstream cell \(i + 1\) (see Fig.  1 ) is given by the result of a comparison between the maximum flow that can be sent (that is the demand ) by cell i (upstream of the boundary):

and the maximum flow that can be received (that is the supply ) by the downstream cell \(i + 1\) :

Since every cell has a maximum density ( \(k_{j}\) ), the incoming flow is not only constrained by the maximum value \(Q_{i + 1}\) , but also by the difference between the maximum density and the current density \(\left( {k_{j} - k_{i + 1} } \right)\) , which captures the spillback phenomena and is able to model the effects of horizontal queuing.

Therefore, in accordance with the Godunov scheme , the flow \(Y_{i} \left( t \right)\) can be rewritten in accordance with the demand ( sending )-supply ( receiving ) rule of the cell transmission model as:
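The demand-supply rule can be illustrated with a minimal Python sketch of one CTM update. It assumes the standard sending and receiving functions for a trapezoidal fundamental diagram, demand = min(V_f k_i, Q) and supply = min(Q, ω (k_j − k_i)), with the boundary flow taken as the smaller of the two. The function name `ctm_step`, the use of a single Q for all cells, and the boundary handling (a fixed upstream inflow and unrestricted outflow from the last cell) are illustrative assumptions, not the authors' implementation.

```python
def ctm_step(k, inflow, Q, k_jam, v_free, w, dx, dt):
    """Advance cell densities k (veh/m) by one time step.

    Returns (new_k, y_out), where y_out[i] is the flow (veh/s) leaving cell i.
    """
    n = len(k)
    demand = [min(v_free * k[i], Q) for i in range(n)]        # sending flow
    supply = [min(Q, w * (k_jam - k[i])) for i in range(n)]   # receiving flow
    # Flow across each internal boundary: min(upstream demand, downstream supply).
    y_out = [min(demand[i], supply[i + 1]) for i in range(n - 1)]
    y_out.append(demand[-1])          # assumed free outflow at the downstream end
    y_in = [min(inflow, supply[0])] + y_out[:-1]
    # Conservation of vehicles in each cell.
    new_k = [k[i] + dt / dx * (y_in[i] - y_out[i]) for i in range(n)]
    return new_k, y_out


# Illustrative parameters in SI units (dx = v_free * dt).
params = dict(Q=0.5, k_jam=0.2, v_free=15.0, w=5.0, dx=15.0, dt=1.0)

# Free-flowing link: densities stay constant when inflow matches demand.
k_free, _ = ctm_step([0.02, 0.02, 0.02], inflow=0.3, **params)

# Second cell at jam density: nothing can enter it, so the first cell fills up.
k_blocked, y_blocked = ctm_step([0.02, 0.2], inflow=0.3, **params)
print(k_free, k_blocked, y_blocked)
```

The second call shows the spillback mechanism: the supply of a jammed cell is zero, so the upstream cell cannot discharge.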

2.1.2 CTM with dispersion (PD&CTM)

In the case of a signalised network, two main issues are to be addressed: (i) the modelling of the dispersion between interacting junctions, which is strictly related to the distance travelled on the connecting links, and (ii) the spillback (i.e., the link blockage) and the merging and diverging modelling (i.e., the lane blockage). In general, the Platoon Dispersion Model (PDM; Robertson [ 48 , 49 ]), which is adopted in several applications and in the benchmark tools TRANSYT [ 50 ] and SCOOT [ 51 ], is the most straightforward for modelling the dispersion of platoons. However, this model has a main weakness: it cannot describe the spillback phenomena and it does not model the effects of blocking back (i.e., horizontal queuing). The CTM may be adopted as an alternative to the PDM for short distances, whereas for long distances the PDM is still preferred to the CTM. To overcome the limitations of both the PDM and the CTM, the model proposed by Cantarella et al. [ 45 ], which employs the well-known Drake speed-density relationship for each cell, can be considered.

First, some details about the platoon dispersion phenomenon are supplied; then the PD&CTM is specified. Let:

T be the mean link travel time;

t be equal to 0.8 T;

\(q_{d} \left( j \right)\) , the flow rate over a time step Δt arriving at the downstream signal at time interval j;

\(q_{0} \left( i \right)\) , the discharging flow over time step Δt observed at the upstream signal at time interval i;

Δt, the time step duration, usually assumed as one second;

F, the smoothing factor;

and \(\alpha\) and \(\beta\), the dimensionless model parameters.

Robertson’s model takes the following mathematical form:

where F, the smoothing factor, is given by:

Two main conditions may arise depending on the value of F: (i) if the distance between two successive junctions is large, the travel time is long and F tends to zero; in this case uniform flow profiles are observed and the two successive junctions do not interact; (ii) otherwise, when the distance between them is short, the junctions interact; if the travel time tends to zero, the smoothing factor tends to 1 and then \(q_{d} \left( j \right) = q_{0} \left( i \right)\), i.e., the upstream discharge profile reaches the downstream signal without dispersion.
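The behaviour of the smoothing factor can be explored with a short Python sketch. It assumes the commonly cited form of Robertson's recursion, q_d(j) = F · q_0(i) + (1 − F) · q_d(j − 1) with j = i + t and F = 1 / (1 + α β T); the function name, the default α = 0.35, and the flow profile are illustrative assumptions, while β = 0.8 follows the definition t = 0.8 T given above.

```python
def robertson_dispersion(q0, T, alpha=0.35, beta=0.8):
    """Downstream arrival profile q_d from the upstream discharge profile q0.

    q0: flow per time step at the upstream signal
    T:  mean link travel time, expressed in time steps
    """
    F = 1.0 / (1.0 + alpha * beta * T)   # smoothing factor
    t = max(1, round(beta * T))          # shift in time steps (0.8 T)
    qd = [0.0] * (len(q0) + t)
    for i in range(len(q0)):
        j = i + t
        qd[j] = F * q0[i] + (1.0 - F) * qd[j - 1]
    return qd, F


# Hypothetical platoon released upstream over two time steps.
arrivals, F = robertson_dispersion([10.0, 5.0, 0.0, 0.0], T=10)
print(round(F, 3), [round(q, 2) for q in arrivals])
```

With a long travel time F is small and the platoon is spread over many time steps; as T shrinks towards zero, F approaches 1 and the arrival profile reproduces the discharge profile.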

The cell transmission model with dispersion was modified to include the Drake speed-density relationship, modelling the dispersion of the platoon formed upstream of a traffic light.

Let \(t\) be the time step

\(\Delta t\) , the duration of the time step

\(\Delta x\) , the length of the cells

\(Q_{i}\) , the maximum flow rate in cell i

\(k_{i} \left( t \right)\) , the density in cell i at time step t

\(k_{j}\) , the jam density

\(k_{m}\) , the traffic density at maximum flow

\(V_{f}\) , the free flow speed and

\(\omega\) , the shock wave speed in congested traffic.

For each cell, at each time step, the demand flow is given by:

The supply flow from the immediate downstream cell is given by:

The speed of the outgoing flow at each cell is given by the Drake speed-density relationship as:

The flow from each cell derived from the Drake speed-density relationship is given by:

The flow to the downstream cell is then calculated as:

To update the density at the next time step, for each cell i:

Since the flow to the downstream cell is limited by the supply and demand of each cell from the basic CTM, the resulting outcoming flow can be either equal to or lower than them, depending on the parameters of the Drake speed-density relationship.
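One possible reading of the update described above can be sketched as follows. This is a minimal sketch under stated assumptions: the demand and supply take the standard CTM forms \(\min (V_{f} k_{i}, Q_{i})\) and \(\min (Q_{i}, \omega (k_{j} - k_{i}))\), the Drake speed is \(V_{f} \exp ( - \tfrac{1}{2}(k_{i} /k_{m} )^{2} )\), and the transferred flow is the minimum of demand, supply and Drake flow. The function name `ctm_drake_step` and the boundary arguments `inflow` and `outflow_cap` are hypothetical; the defaults mirror the example parameter set of this section, converted to veh/m and veh/s.

```python
import math


def ctm_drake_step(k, inflow, outflow_cap, dx=15.0, dt=1.0,
                   Q=0.5, kj=0.2, km=0.055, Vf=15.0, w=5.0):
    """Advance the cell densities k (veh/m) of one link by one time step.

    inflow      : flow offered to the first cell (veh/s), an assumed boundary
    outflow_cap : capacity at the link exit (veh/s), e.g. 0 during red
    Returns the new densities and the boundary flows q (length len(k) + 1).
    """
    n = len(k)
    demand = [min(Vf * ki, Q) for ki in k]                    # sending flow
    supply = [max(0.0, min(Q, w * (kj - ki))) for ki in k]    # receiving flow
    drake = [ki * Vf * math.exp(-0.5 * (ki / km) ** 2) for ki in k]

    # q[i] crosses the upstream boundary of cell i; q[n] leaves the link
    q = [0.0] * (n + 1)
    q[0] = min(inflow, supply[0])
    for i in range(1, n):
        q[i] = min(demand[i - 1], supply[i], drake[i - 1])
    q[n] = min(demand[n - 1], outflow_cap, drake[n - 1])

    # conservation of vehicles in each cell
    k_next = [k[i] + dt / dx * (q[i] - q[i + 1]) for i in range(n)]
    return k_next, q


if __name__ == "__main__":
    k = [0.02] * 20                       # 300 m link, 20 cells of 15 m
    for step in range(60):                # 60 s of red at the downstream end
        k, q = ctm_drake_step(k, inflow=0.3, outflow_cap=0.0)
    print([round(x, 3) for x in k[-5:]])  # densities near the stop line
```

Because the Drake flow enters the minimum, the transferred flow can only be equal to or lower than the basic CTM value, in line with the remark above.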

As an example, given the following set of parameters, the fundamental diagrams in Figs.  2 , 3 and 4 are obtained:

\(\Delta x\)  = 15 m

\(Q_{i}\)  = 1800 veh/h

\(k_{j}\)  = 200 veh/km

\(k_{m}\)  = 55 veh/km

\(V_{f}\)  = 15 m/s

\(\omega\)  = 5 m/s

Fig. 2 Fundamental diagram: flow–density relationship

Fig. 3 Fundamental diagram: speed–density relationship

Fig. 4 Fundamental diagram: speed–flow relationship
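The parameter set above can be checked with a few lines of arithmetic. The sketch below assumes the Drake relationship in its usual exponential form, \(v = V_{f} \exp ( - \tfrac{1}{2}(k/k_{m} )^{2} )\), and the trapezoidal CTM flow \(\min (V_{f} k, Q_{i}, \omega (k_{j} - k))\); the function names are illustrative. With these assumptions the Drake flow peaks at \(k_{m}\) = 55 veh/km at about 1801 veh/h, i.e., practically at the stated maximum flow of 1800 veh/h.

```python
import math

# Example parameters from the text, converted to SI units (veh/m, veh/s, m/s)
Q = 1800 / 3600      # maximum flow
KJ = 200 / 1000      # jam density
KM = 55 / 1000       # density at maximum flow
VF = 15.0            # free flow speed
W = 5.0              # shock wave speed in congested traffic


def drake_speed(k):
    """Drake speed-density relationship (assumed exponential form)."""
    return VF * math.exp(-0.5 * (k / KM) ** 2)


def drake_flow(k):
    """Flow implied by the Drake relationship, q = k * v(k)."""
    return k * drake_speed(k)


def ctm_flow(k):
    """Trapezoidal flow-density relationship of the basic CTM."""
    return min(VF * k, Q, W * (KJ - k))


if __name__ == "__main__":
    for k_veh_km in (10, 33, 55, 100, 150, 200):
        k = k_veh_km / 1000
        print(f"k={k_veh_km:>3} veh/km  "
              f"CTM={ctm_flow(k) * 3600:7.1f} veh/h  "
              f"Drake={drake_flow(k) * 3600:7.1f} veh/h  "
              f"v={drake_speed(k):5.2f} m/s")
```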

2.2 Microscopic models

2.2.1 Krauss’s model

Several classes of models may be identified at the microscopic level, amongst which are safety distance models. The definition of this distance is crucial in order to avoid collisions between vehicles, and these models are based on the idea that following vehicles try to respect the safety distance from the leading vehicles. The main contributions in the literature concern works by Kometani and Sasaki [ 52 ], Gipps [ 17 ] and Krauß [ 19 ]. The model proposed by Gipps is a multiregime model able to reproduce the free flow driving condition and the car following regime. The two main limitations of the Gipps model concern its unsuitability in the case of unstable traffic flow conditions and the possibility that the model has no solutions due to its analytical formulation. Therefore, the Krauß model, which can overcome such limitations, may be considered an alternative approach to that of Gipps. In accordance with the Krauß model, the safe speed is given by the following expression:

where \(v_{l} \left( t \right)\) is the speed of the leading vehicle at time t, \(g\left( t \right)\) is the gap between leader and follower at time t, \(t_{r}\) is the drivers’ reaction time and b is the maximum deceleration.

Finally, the desired speed is given as the minimum between the maximum speed, the speed that can be achieved by the vehicle according to its acceleration, and the safe speed as defined above. That is:
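A minimal sketch of both expressions, assuming the safe speed takes the form commonly attributed to Krauß, \(v_{safe} = v_{l} + (g - v_{l} t_{r} )/(\bar{v}/b + t_{r} )\) with \(\bar{v}\) the mean of the leader and follower speeds. The function names, the follower speed argument and the default values of the acceleration `a`, deceleration `b`, reaction time and maximum speed are assumptions made for illustration.

```python
def krauss_safe_speed(v_leader, v_follower, gap, t_r=1.0, b=4.5):
    """Safe speed of the follower (m/s) for a given gap (m).

    t_r : drivers' reaction time (s); b : maximum deceleration (m/s^2).
    """
    v_mean = 0.5 * (v_leader + v_follower)
    return v_leader + (gap - v_leader * t_r) / (v_mean / b + t_r)


def krauss_desired_speed(v_leader, v_follower, gap,
                         v_max=15.0, a=2.6, dt=1.0, t_r=1.0, b=4.5):
    """Minimum of maximum speed, reachable speed and safe speed."""
    v_safe = krauss_safe_speed(v_leader, v_follower, gap, t_r, b)
    v_reachable = v_follower + a * dt
    return max(0.0, min(v_max, v_reachable, v_safe))


if __name__ == "__main__":
    # follower at 10 m/s approaching a stopped leader 10 m ahead
    print(round(krauss_desired_speed(0.0, 10.0, 10.0), 2))
    # follower starting from rest on an almost empty road
    print(round(krauss_desired_speed(15.0, 0.0, 1000.0), 2))
```

When the gap equals \(v_{l} t_{r}\) and both vehicles travel at the same speed, the safe speed reduces to the leader's speed, which is the steady car-following state.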

2.2.2 Intelligent driver model

Next are the continuous time models, based on first-order differential equations. The two main contributions in the literature concern the optimal velocity model (OVM) [ 20 ] and the intelligent driver model (IDM) [ 26 ].

In the above class of models, it is supposed that each vehicle has a desired speed depending on the distance between vehicles or the difference between the speed of a pair of vehicles, namely the leader and follower. The OVM refers to the former case, whereas the IDM to the latter.

With regard to the OVM, it must be highlighted that the acceleration of the vehicle depends on the desired speed and can be formulated as

where \(n\) is the following vehicle;

\(\Delta x_{n} \left( t \right)\) is the spacing between the leading and the following vehicle;

\(v_{n} \left( t \right)\) is the speed of the vehicle;

\(\tau\) is driver sensitivity.
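As an illustration, the OVM acceleration is often written as \(\dot{v}_{n} (t) = [V(\Delta x_{n} (t)) - v_{n} (t)]/\tau\) , where V is an optimal velocity function of the spacing. The sketch below assumes this form, treating \(\tau\) as a relaxation time; the tanh-shaped function `optimal_velocity` and its parameters `d_c` and `width` are hypothetical choices, scaled so that V(0) = 0 and V tends to `v_max` for large spacings.

```python
import math


def optimal_velocity(spacing, v_max=15.0, d_c=25.0, width=10.0):
    """Hypothetical tanh-shaped optimal velocity function (m/s).

    d_c   : spacing (m) at which the curve is steepest
    width : spacing scale (m) controlling how quickly the speed saturates
    """
    shift = math.tanh(d_c / width)
    return v_max * (math.tanh((spacing - d_c) / width) + shift) / (1.0 + shift)


def ovm_acceleration(spacing, speed, tau=1.0, **ov_params):
    """Acceleration (m/s^2): relaxation of the speed towards V(spacing)."""
    return (optimal_velocity(spacing, **ov_params) - speed) / tau


if __name__ == "__main__":
    for tau in (2.0, 1.0, 0.5):
        # vehicle at rest with a free road ahead
        print(f"tau={tau:.1f} s  acceleration={ovm_acceleration(1e6, 0.0, tau=tau):.1f} m/s^2")
```

In this sketch the acceleration from rest on a free road equals `v_max / tau`, so small values of \(\tau\) produce very large accelerations.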

However, one of the main limitations of the model is that it yields unrealistically high values of maximum acceleration when the drivers’ sensitivity is of the same order as the drivers’ reaction time, because the difference between the vehicles’ speeds is not considered [ 26 ]. By contrast, in the IDM formulation, acceleration is a continuous function of speed, distance and speed difference.

In particular, let:

\(a_{0}\) be the maximum acceleration;

\(v_{0}\) , the drivers’ desired speed;

\(\delta\) , a parameter to be calibrated and

\(\Delta x_{0}\) , the desired distance, a function of the follower’s speed and the speed difference.

The final formulation of acceleration is composed of two terms, the free flow term and the interaction term, as detailed in the following:
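A minimal sketch of the two terms, assuming the standard IDM of Treiber et al. [ 26 ]: the free flow term is \(a_{0} [1 - (v/v_{0} )^{\delta } ]\) and the interaction term is \(- a_{0} (\Delta x_{0} /\Delta x)^{2}\) , with the desired distance \(\Delta x_{0} = s_{0} + vT + v\Delta v/(2\sqrt {a_{0} b} )\) . The minimum gap `s0`, time headway `T`, comfortable deceleration `b` and all default values are assumptions that do not appear in the text.

```python
import math


def idm_acceleration(v, dv, gap, a0=1.0, v0=15.0, delta=4.0,
                     s0=2.0, T=1.5, b=1.5):
    """IDM acceleration (m/s^2) of a follower.

    v   : follower speed (m/s)
    dv  : approach rate, follower speed minus leader speed (m/s)
    gap : spacing to the leader (m)
    """
    # desired distance as a function of the follower's speed and the speed difference
    desired_gap = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a0 * b)))
    free_term = a0 * (1.0 - (v / v0) ** delta)
    interaction_term = -a0 * (desired_gap / gap) ** 2
    return free_term + interaction_term


if __name__ == "__main__":
    print(round(idm_acceleration(0.0, 0.0, 1e9), 3))    # free road, at rest
    print(round(idm_acceleration(15.0, 5.0, 20.0), 3))  # closing in on a slower leader
```

On a free road the interaction term vanishes and the vehicle accelerates towards \(v_{0}\) ; at standstill with a gap equal to `s0` the two terms cancel.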

2.2.3 CA: Nagel–Schreckenberg model

The approach adopted was proposed by Nagel and Schreckenberg [ 43 ], who developed a model that is discrete in time and space: a single-lane road is divided into cells that can have two states (occupied or empty) and a length equal to the length of a vehicle. Every vehicle occupies a cell, which then has the “occupied” state. The speed of a vehicle has an integer value (ranging from zero to a maximum value), which represents the number of cells that the vehicle moves downstream in one time step, from position \(x_{i} \left( t \right)\) to \(x_{i} \left( {t + 1} \right)\) . Because of this, the behaviour of an upstream vehicle i is influenced by a downstream one \(i + 1\) if the gap \(g_{i}\) between them is smaller than the speed \(v_{i}\) of the upstream vehicle. The speed can be converted to a dimensional value by multiplying it by the ratio of the cell length to the time step. The acceleration is equal to 1 or 0, so the integer value of the speed either increases by one or stays unchanged at each time step.

The model also contains a stochastic component called the dawdling probability in which, with probability p , a vehicle can remain at the same speed (if it was accelerating) or decelerate. This allows us to model stop-and-go waves in congested traffic, varying the flow-density relation as well.

The model is applied by following four rules. At each time step, and for each vehicle i on the road, the speed \(v_{i} \left( t \right)\) and position \(x_{i} \left( t \right)\) are updated as:

Slowing down. Obtain the gap at time t. If speed  >  gap , then slow down.

Acceleration. If speed  <  gap and speed  <  max speed , then accelerate by one.

Randomization (Dawdling rule). If speed  > 0, then with probability p (dawdling probability, that is the random term) reduce it by one.

Car motion. Update the position
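The four rules above can be sketched as a single parallel update. This is a minimal sketch in which the ring-road (periodic) boundary, the function name `nasch_step` and the default values are assumptions; speeds are in cells per time step and can be converted to m/s by multiplying by the cell length divided by the time step.

```python
import random


def nasch_step(positions, speeds, v_max=5, p=0.25, n_cells=100, rng=random):
    """One parallel update of all vehicles on a single-lane ring of n_cells.

    positions, speeds : integer cell index and integer speed of each vehicle
    p                 : dawdling probability
    rng               : any object with a random() method returning [0, 1)
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    new_pos, new_spd = list(positions), list(speeds)
    for rank, i in enumerate(order):
        leader = order[(rank + 1) % len(order)]
        # empty cells between vehicle i and its leader (whole ring if alone)
        gap = (positions[leader] - positions[i] - 1) % n_cells
        v = speeds[i]
        if v > gap:                       # 1. slowing down
            v = gap
        elif v < gap and v < v_max:       # 2. acceleration
            v += 1
        if v > 0 and rng.random() < p:    # 3. randomization (dawdling rule)
            v -= 1
        new_spd[i] = v
        new_pos[i] = (positions[i] + v) % n_cells   # 4. car motion
    return new_pos, new_spd


if __name__ == "__main__":
    rng = random.Random(0)                # fixed seed for repeatable runs
    pos, spd = [0, 10, 20, 30], [0, 0, 0, 0]
    for _ in range(20):
        pos, spd = nasch_step(pos, spd, rng=rng)
    print(pos, spd)
```

All gaps are measured on the positions at time t before any vehicle moves, so the update is parallel and no two vehicles can end up in the same cell.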

The Nagel–Schreckenberg model is not the only type of cellular automaton. Other types include the Barlovic model [ 53 ], which adds a “slow to start” rule; the Kerner, Klenov and Wolf model [ 54 ], which considers a cell length equal to 0.50 m (thus an acceleration of 0.5 m/s²) and adds other parameters to model synchronized traffic in accordance with the three-phase traffic theory proposed by Kerner; and a variant of the latter in which the safe speed rule is replaced by a discretized version of the safe speed of the Gipps model (considering a braking deceleration parameter). In this study, the basic model remains that of Nagel–Schreckenberg but, given that each cell has a length of 2.50 m, the randomization rule is applied only if the speed exceeds a minimum value greater than 0.

2.3 Hybrid traffic flow models

2.3.1 CA-CTM hybrid

The general architecture of the proposed hybrid model consists of the combination of a macroscopic CTM with a microscopic CA for each link (see Fig.  5 ). The CA is used to model the traffic flow at disaggregate level at the junction, whereas the CTM models the traffic flow at aggregate level along the link. The transitions from CA to CTM and vice versa are based on the introduction of a transition zone. Both models have the same simulation time step of 1 s to obtain a consistent queuing formation and backend propagation of the congestion.

Fig. 5 Example of the hybrid link representation

3 Numerical results

In this section the proposed hybrid traffic flow model and the benchmark macroscopic and microscopic models are compared. To this end, three main layouts were considered with reference to an urban context:

a link with a traffic signal (see Sect.  3.1 );

an arterial consisting of three signalised junctions (see Sect.  3.2 );

a nine-node grid layout with signalised junctions (see Sect.  3.4 ).

Before these analyses, the evaluation of the wave propagation is provided.

The details of the considered traffic flow models are displayed in Table 1 . All results were analysed in terms of travel time spent and of max and mean queues. In general, a simulation horizon of one hour was considered, and the first 900 s were treated as a “warm-up period” to stabilize the flow and speed of all vehicles. The proposed traffic flow model was implemented in a code provided by the authors and developed in MATLAB (Release 2020), whereas the microscopic and macroscopic traffic flow analyses were run in SUMO [ 55 ] and TRANSYT16®TRL, respectively; all simulations were run on a machine with an Intel(R) Core(TM) i7-4510U CPU with a base speed of 2.6 GHz and 8 GB of RAM.
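The way the indicators can be extracted from a simulation run is sketched below. This is a hedged illustration only: it assumes that "travel time spent" is the number of vehicles in the network integrated over time (in veh·h) and that queues are sampled once per time step; the function name `summarise_run` and its arguments are hypothetical and do not describe the authors' MATLAB code.

```python
def summarise_run(vehicles_in_network, queue_lengths, dt=1.0, warm_up=900.0):
    """Compute the three indicators after discarding the warm-up period.

    vehicles_in_network : number of vehicles in the network at each step
    queue_lengths       : queued vehicles at each step
    dt, warm_up         : step duration and warm-up duration in seconds
    """
    first = int(round(warm_up / dt))      # index of the first retained sample
    n_veh = vehicles_in_network[first:]
    queues = queue_lengths[first:]
    if not n_veh or not queues:
        raise ValueError("no samples left after the warm-up period")
    return {
        "time_spent_veh_h": sum(n_veh) * dt / 3600.0,
        "max_queue": max(queues),
        "mean_queue": sum(queues) / len(queues),
    }


if __name__ == "__main__":
    # one-hour horizon with a constant 10 vehicles in the network and 4 in queue
    print(summarise_run([10] * 3600, [4] * 3600))
```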

3.1 Wave propagation analysis

In this section, the propagation of shockwaves along a signalised link is validated in depth and the consistency of the proposed model with wave theory is checked. Four applications are considered [ 38 ]:

At the transition from the microscopic [m] to the macroscopic model [M]:

(a) a capacity reduction is applied [from 2000 veh/h to 600 veh/h] at the macroscopic transition cell;

(b) a shockwave is induced downstream, in the macroscopic model.

At the transition from the macroscopic [M] to the microscopic model [m]:

(c) a demand variation is applied upstream;

(d) a shockwave is induced downstream, in the microscopic model.

Results are shown in Fig. 6 . In case a, the exit flow fits the supply variations; likewise, in case c, the entry flow fits the demand variations. Furthermore, the shock waves propagate uniformly through the interface, which does not affect the waves in terms of interruptions, delays or any other kind of modification.

Fig. 6 Wave propagation analysis

This numerical application was run considering a link 300 m long with a signalised junction at the end. In terms of demand, to test the impact of the undersaturation and oversaturation conditions, three different entry flows were tested: the first was 400 veh/h, the second, 800 veh/h, and the third, 1200 veh/h. In Fig.  7 the layout in terms of the hybrid model is displayed, and the details of the cellular automata model and cell transmission model are shown.

Fig. 7 Urban link layout with signalised junction

A comparison among the models highlights that in the case of low demand (undersaturation) all models provide very similar results. In particular, the proposed hybrid CA&CTM model is very similar to the benchmark models with respect to both travel times and queues. However, in the case of higher demand (oversaturation), microscopic models show lower travel times, whilst their estimated queues are higher than those obtained with the other models. In this case, the proposed hybrid model behaves very similarly to the CTM and CA models. Further details regarding the numerical results are shown in Table 2 .

In this section the results concerning the artery with three successive signalised junctions (see Fig.  8 ) are considered. The entry flows at each node are displayed in the figure below. The distance between successive junctions is equal to 810 m, while the source and sink arcs are 90 m long. The parameters of the models are: free flow speed = 15 m/s, wave speed = 5 m/s, outflow capacity = 2000 veh/h, jam density = 200 veh/km, CTM cell length = 15 m, CA cell length = 2.50 m, dawdling probability = 0.266, min CA speed to apply dawdling = 2 cells/s = 5 m/s.

Fig. 8 Artery with three successive signalised junctions
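The discretisation implied by the artery parameters can be verified with simple arithmetic. The sketch below assumes the 1 s time step used for both models; the helper `cells_in_link` is an illustrative name. It also checks that a vehicle travelling at free flow speed crosses at most one CTM cell per step, the usual stability condition for the CTM.

```python
DT = 1.0            # time step (s)
V_F = 15.0          # free flow speed (m/s)
CTM_CELL = 15.0     # CTM cell length (m)
CA_CELL = 2.5       # CA cell length (m)


def cells_in_link(length_m, cell_m):
    """Number of cells in a link; the length must be a multiple of the cell."""
    cells, remainder = divmod(length_m, cell_m)
    if remainder:
        raise ValueError(f"{length_m} m is not a multiple of {cell_m} m")
    return int(cells)


if __name__ == "__main__":
    print("CTM cells on an 810 m link:", cells_in_link(810, CTM_CELL))
    print("CA cells on an 810 m link: ", cells_in_link(810, CA_CELL))
    print("CA cells on a 90 m arc:    ", cells_in_link(90, CA_CELL))
    # a vehicle at free flow speed must not skip a CTM cell within one step
    print("V_F * DT <= CTM cell:      ", V_F * DT <= CTM_CELL)
    # CA speeds are integers in cells/s; convert with cell length / time step
    print("max CA speed (cells/s):    ", V_F * DT / CA_CELL)
    print("min dawdling speed (m/s):  ", 2 * CA_CELL / DT)
```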

The results are summarised in Table 3 . It may be observed that, unlike the previous case, the travel times of microscopic models are higher than those of the other models, and the hybrid model provides results very similar to those of the microscopic models, especially the CA model, whose travel times fall between those of the Krauß model (213,984.21 veh h) and the IDM (154,512.18 veh h).

However, as in the previous case, the estimated queues of the microscopic models, especially the Krauß model and the IDM, are higher than those estimated with the other models, whereas very similar values are shown by the other models (including the microscopic CA model).

3.4 Network

The third application concerns a network layout (see Fig. 9 ) in which all links have one lane in each direction and the saturation flow of each lane is assumed equal to 2000 PCU/h. Regarding link length, links connecting node 5 with other nodes (2–5, 4–5, 5–6, 5–8) are 405 m long (equal to 27 cells, each 15 m long in the CTM), the other links on the network are 810 m long (equal to 54 cells, each 15 m long in the CTM), and finally the links connecting the entry/exit nodes with the network (the connectors) are 90 m long. A scheme of the network layout is shown in Fig.  9 and the details of the entry-exit matrix are displayed in Table 4 .

Fig. 9 Representation of the nine-node grid network layout modelled with the CA & CTM model

The network has signalised junctions at each node, and the optimisation control problem is solved by minimising total delay, with green times and offsets as network decision variables. For this application, path choice modelling follows an explicit (enumeration) approach (see [ 56 ]).

Results are displayed in Table 5 . It may be observed that, as in the previous case of the artery, the travel times of the hybrid model are very similar to those of the microscopic models and, in general, to those of the CA model.

With regard to the number of vehicles in the queue, this value is still higher in the case of microscopic models with respect to the other models. However, the hybrid model provides very similar results to the case of the CA model (slightly lower due to the smoothing effect of the CTM).

3.5 Refinements’ overview

A final comparison of the models has been carried out in terms of computational effort with respect to the artery and the more complex layout of the network; results (see Table 6 ) show that, even though the CA&CTM model requires higher elapsed times than the CTM and the PD&CTM, its values are very similar to those of the CA.

A further microscopic analysis has been carried out with reference to gaps evaluated for the artery and the network.

First of all, it may be observed that when all vehicles are analysed (see Figs.  10a and 11a) the frequency distributions of gaps are concentrated on the lower values, whereas when only moving vehicles are analysed (see Figs.  10b and 11b) the frequency distributions of gaps are more dispersed over the higher values.

Fig. 10 Gap frequency distribution in the artery: (a) all vehicles; (b) only moving vehicles

Fig. 11 Gap frequency distribution in the network: (a) all vehicles; (b) only moving vehicles

Furthermore, in the case of the artery the gap is distributed with a mean equal to 17 m and a standard deviation equal to 77.5 m, whereas in the case of the network layout the gap mean is around 7.5 m and the standard deviation is around 65 m; this result may be explained by the impact of the interaction between successive junctions within the network. Finally, these analyses highlight the different behaviour of the two considered microscopic models, in particular the deterministic IDM and the stochastic Krauß model; the proposed hybrid model shows an intermediate behaviour.

4 Conclusions and future perspectives

This paper compared a proposed hybrid traffic flow model with the three main approaches generally used to describe traffic flow, namely macroscopic, microscopic and mesoscopic models. Macroscopic models are usually adopted for wide-area analysis whereas microscopic models are adopted for sub-area analysis, especially in the case of critical junctions; mesoscopic models may be indifferently adopted for both wide-area and sub-network analysis.

However, hybrid models based on combining two models at different scales are being increasingly used. For instance, wide areas may be directly analysed by combining macroscopic models with microscopic models or mesoscopic models with microscopic models. Furthermore, hybrid traffic flow modelling may also be suitable when the researcher is interested in representing links and nodes at different scales; in other words, microscopic link representation allows for consideration of driver behaviour, and macroscopic node representation avoids single manoeuvre analysis at junctions. Alternatively, specific analyses may require macroscopic link representation, in the case in which driver behaviour may be neglected, but microscopically represented nodes, when information about vehicles approaching signalised junctions is required. This may well be the case of mixed traffic flow analysis in which flow composition is based on human-driven vehicles and connected and autonomous vehicles, and the latter need to be sketched Footnote 1 in order to collect all information required for traffic signal decision variables optimisation.

The main focus of the paper was on comparing the proposed hybrid traffic flow model (CA&CTM), which combines a macroscopic cell transmission model (CTM; [ 9 ]) for link representation and a microscopic cellular automaton (CA; [ 43 ]) for node representation, with some benchmark macroscopic and microscopic models. While the reliability of the CTM has been amply studied in the literature, mainly with reference to queue propagation, the reliability of the CA requires further investigation.

In terms of the macroscopic approach, the model was compared with both the CTM and the CTM with dispersion (PD&CTM; [ 45 ]). Indeed, dispersion may not be directly observed in macroscopic modelling and a specific analytical representation must be included in the CTM. However, as dispersion is endogenously present in microscopic models, consistent traffic flow representation is expected with respect to the CA and especially the CA&CTM. With regard to the microscopic approach, the proposed model was compared with both the Krauß model [ 19 ], which is considered the stochastic enhancement of the Gipps model [ 17 ], the reference model in the context of the collision avoidance class of approaches, and the intelligent driver model (IDM, Treiber and Helbing [ 47 ]), which is based on the idea of combining the ability to reach the desired speed limit in a traffic-free situation with the ability to identify how much braking is necessary to steer clear of any collision situations.

To this end three main applications were considered: (i) a link with a signalised junction required to introduce capacity constraints to traffic signal stages and suitable for the preliminary interpretation of queuing phenomena; (ii) an artery comprising three successive signalised junctions suitable for queuing and dispersion analysis; (iii) a more complex grid network with signalised junctions required to capture the impact of the interacting junctions. All signalised junctions were optimised with a pre-timed approach and according to the total delay minimisation criterion. All results were analysed in terms of travel time, max and mean queue.

In the first application, numerical results were analysed with reference to three different entry flow values to observe the different model behaviour in undersaturation and oversaturation conditions. Comparison shows that in the case of low demand all models provide very similar results. However, in the case of higher demand, microscopic models provide lower values of the travel times whereas the values of the queues are higher with respect to the other models. In general, the hybrid CA&CTM behaves very similarly to the CTM and CA models, with higher values of travel times and lower values of queues.

In the signalised artery, our results show that, unlike the link layout case, the travel times of the CA model lie between the values provided by the Krauß model and the IDM, while the Krauß model clearly shows higher travel times. Queues of the microscopic models, as in the previous case, are higher, whereas very similar values are shown by the other models (including the microscopic CA model).

Finally, the results of the network layout were analysed. In terms of travel times, the proposed CA&CTM model provided very similar results to those of microscopic models, especially to the CA which has an intermediate value between the Krauß model and the IDM. By contrast, in terms of queue modelling, it was again observed that the queues in microscopic models are clearly higher, while the proposed model behaved very similarly to the CA model.

More generally, it is well known that in car-following approaches inter-driver heterogeneity may directly affect the models’ reliability in reproducing macroscopic characteristics [ 58 ].

Three main research fields are considered worthy of further exploration: first, the application of the proposed model to the context of connected and autonomous vehicles, also in the presence of human-driven vehicles, should be investigated; secondly, the proposed model could be profitably applied to a real case study; finally, the model will be further developed in terms of multi-lane simulation in order to properly apply some of the traffic management strategies. Footnote 2

Availability of data and materials

No field data has been used in the paper. Findings are grounded on simulation experiments.

Connected and automated vehicles must be further analysed. Concerning connectivity, two main types of communication have been incorporated into vehicle technology: vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication; the former collects information about single vehicles in terms of position, speed, etc., whilst the latter is able to provide information about traffic conditions (see [ 57 ]). Concerning automation, this may be defined in accordance with the U.S. Department of Transportation Releases; the last level of automation, full self-driving automation, refers to fully autonomous vehicles. A vehicle with this level of automation controls all driving functions in any weather, road, and traffic condition, and V2V communication may be adopted to improve the autonomous vehicle (AV).

There are some specific fields of application, for instance combined strategies (traffic signals and speed optimization), in which multi-lane traffic flow modelling may be particularly relevant.

Treiber, M., & Kesting, A. (2013). Traffic flow dynamics: Data, models and simulation. Springer, Berlin. https://doi.org/10.1007/978-3-642-32460-4

Storani, F., Di Pace, R., & de Luca, S (2021) A hybrid traffic flow model for traffic management with human-driven and connected vehicles. Transportmetrica B: Transport Dynamics

Lighthill, M. J., & Whitham, G. B. (1955). On kinematic waves. II. A theory of traffic flow on long crowded roads. Paper presented at the Royal Society London. https://doi.org/10.1098/rspa.1955.0089

Richards, P. I. (1956). Shockwaves on the highway. Operations Research, 4 , 42–51. https://doi.org/10.1287/opre.4.1.42

Payne, H.J. (1971) Mathematical models of public systems, edited by G.A. Bekey. Simulation Council, La Jolla, CA, Vol. 1, pp. 51–61.

Ross, P. (1988) Traffic dynamics. Transportation Research-B 22 (6), 421–435. https://doi.org/10.1016/0191-2615(88)90023-9

Kerner, B. S., & Konhäuser, P. (1994). Structure and parameters of clusters in traffic flow. Physical Review E , 50 (1), 54.

Helbing, D. (1996). Gas-kinetic derivation of Navier–Stokes-like traffic equations. Physical Review E, 53 (3), 2366. https://doi.org/10.1103/PhysRevE.53.2366

Daganzo, C. F. (1994). The cell transmission model: A dynamic representation of highway traffic consistent with the hydrodynamic theory. Transportation Research Part B: Methodological, 28 (4), 269–287. https://doi.org/10.1016/0191-2615(94)90002-7

Newell, G. F. (1993). A simplified theory of kinematic waves in highway traffic, part I: General theory. Transportation Research Part B: Methodological, 27 (4), 281–287. https://doi.org/10.1016/0191-2615(93)90038-C

Yperman, I. (2007). The Link Transmission Model (Doctoral dissertation, PhD Thesis. Department of Transport and Infrastructure, Katholieke Universiteit Leuven). https://www.researchgate.net/publication/28360292_The_Link_Transmission_Model_for_dynamic_network_loading

Chandler, R. E., Herman, R., & Montroll, E. W. (1958). Traffic dynamics: Studies in car following. Operations Research, 6 , 165–184. https://doi.org/10.1287/opre.6.2.165

Gazis, D. C., Herman, R., & Rothery, R. W. (1961). Nonlinear follow the leader models of traffic flow. Operations Research, 9 , 545–567. https://doi.org/10.1287/opre.9.4.545

Ahmed, K. I. (1999). Modeling drivers’ acceleration and lane changing behavior. Ph.D. thesis. Massachusetts Institute of Technology. http://hdl.handle.net/1721.1/9662

Koutsopoulos, H. N., & Farah, H. (2012). Latent class model for car following behavior. Transportation Research Part B: Methodological, 46 , 563–578. https://doi.org/10.1016/j.trb.2012.01.001

Lee, G. (1966). A generalization of linear car-following theory. Operations Research, 14 , 595–606. https://doi.org/10.1287/opre.14.4.595

Gipps, P. G. (1981). A behavioural car-following model for computer simulation. Transportation Research-B, 15 , 105–111. https://doi.org/10.1016/0191-2615(81)90037-0

Leutzbach, W. (1988). An introduction to the theory of traffic flow . Springer, Berlin. https://doi.org/10.1007/978-3-642-61353-1

Krauß, S. (1998). Microscopic modeling of traffic flow: Investigation of collision free vehicle dynamics (Doctoral dissertation).

Bando, M., Hasebe, K., Nakayama, A., Shibata, A., & Sugiyama, Y. (1995). Dynamical model of traffic congestion and numerical simulation. Physical Review E, 51 , 1035. https://doi.org/10.1103/PhysRevE.51.1035

Davis, L. (2003). Modifications of the optimal velocity traffic model to include delay due to driver reaction time. Physica A: Statistical Mechanics and its Applications, 319 , 557–567. https://doi.org/10.1016/S0378-4371(02)01457-7

Gong, H., Liu, H., & Wang, B. H. (2008). An asymmetric full velocity difference car-following model. Physica A: Statistical Mechanics and its Applications, 387 , 2595–2602. https://doi.org/10.1016/j.physa.2008.01.038

Helbing, D., & Tilch, B. (1998). Generalized force model of traffic dynamics. Physical Review E, 58 , 133. https://doi.org/10.1103/PhysRevE.58.133

Jiang, R., Wu, Q., & Zhu, Z. (2001). Full velocity difference model for a car-following theory. Physical Review E, 64 , 017101. https://doi.org/10.1103/PhysRevE.64.017101

Peng, G., & Sun, D. (2010). A dynamical model of car-following with the consideration of the multiple information of preceding cars. Physics Letters A, 374 , 1694–1698. https://doi.org/10.1016/j.physleta.2010.02.020

Treiber, M., Hennecke, A., & Helbing, D. (2000). Congested traffic states in empirical observations and microscopic simulations. Physical Review E, 62 , 1805. https://doi.org/10.1103/PhysRevE.62.1805

Michaels, R. M. (1963). Perceptual factors in car following In Proceedings of the second international symposium on the theory of road traffic flow . Paris: OECD, pp. 44–59.

Wiedemann, R. (1974) “Simulation des Strassenverkehrsflusses” Schriftenreihe des Institutes für Verkehrswesen der Universität Karlsruhe.

Burghout, W. (2004). Hybrid Microscopic-Mesoscopic Traffic Simulation. PhD diss., Royal Institute of Technology. Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-72

Burghout, W., Koutsopoulos, H. N., & Andreasson, I. (2005). Hybrid mesoscopic–microscopic traffic simulation. Transportation Research Record, 1934 (1), 218–225. https://doi.org/10.1177/0361198105193400123

Bourrel, E. (2003). Modélisation dynamique de l’écoulement du trafic routier: du macroscopique au microscopique. These de Doctorat, l’Institut National des Sciences Appliquées de Lyon, France. Retrieved from http://theses.insa-lyon.fr/publication/2003ISAL0073/these.pdf

Bourrel, E., & Henn, V. (2002). Mixing micro and macro representations of traffic flow: a first theoretical step. In Proceedings of the 9th meeting of the Euro Working Group on Transportation (pp. 610–616). Retrieved from http://www.iasi.cnr.it/ewgt/13conference/109_bourrel.pdf

Buisson, C., & Wagner, P. (2004). Calibration and validation of simulation models . Washington: 2004 TRB annual meeting Workshop.

Daganzo, C. F. (2006). In traffic flow, cellular automata= kinematic waves. Transportation Research Part B: Methodological , 40 (5), 396–403.

Laval, J. A., & Daganzo, C. F. (2003). A hybrid model of traffic flow: Impacts of roadway geometry on capacity. TRB 2003 Annual Meeting CD-ROM. Retrieved from http://trafficlab.ce.gatech.edu/sites/default/files/docs/MLHM%20paper.html

Daganzo, C. F. (1995). A finite difference approximation of the kinematic wave model of traffic flow. Transportation Research Part B: Methodological, 29 (4), 261–276. https://doi.org/10.1016/0191-2615(95)00004-W

Daganzo, C. F. (2006). In traffic flow, cellular automata = kinematic waves. Transportation Research Part B: Methodological, 40 (5), 396–403. https://doi.org/10.1016/j.trb.2005.05.004

Leclercq, L. (2007). Hybrid approaches to the solutions of the “Lighthill–Whitham–Richards” model. Transportation Research Part B: Methodological, 41 (7), 701–709. https://doi.org/10.1016/j.trb.2006.11.004

Bourrel, E., & Lesort, J. B. (2003). Mixing micro and macro representation of traffic flow: A hybrid model based on the lwr theory. Transportation Research Record, 1852 , 193–200. https://doi.org/10.3141/1852-24

Hennecke, A., Treiber, M., Helbing, D. (2000). Macroscopic simulation of open systems and micro–macro link. In Helbing, D., Herrmann, H. J., Schreckenberg, M., Wolf, D. E. (Eds.), Proceedings of the traffic and granular flow conference , pp. 383–388. https://doi.org/10.1007/978-3-642-59751-0_38

Magne, L., Rabut, S., Gabard, J. F. (2002). Towards a hybrid macro micro traffic flow simulation model. In INFORMS Spring 2000 Meeting

Poshinger, A., Kates, R., Keller, H. (2002). Coupling of concurrent macroscopic and microscopic traffic flow models using hybrid stochastic and deterministic disaggregation. In Taylor, M. A. P. (Ed.), Proceedings of the 15th ISTTT, pp. 583–605. https://doi.org/10.1108/9780585474601-029

Nagel, K., & Schreckenberg, M. (1992). A cellular automaton model for freeway traffic. Journal de Physique I, 2 (12), 2221–2229. https://doi.org/10.1051/jp1:1992277

Storani, F, Di Pace, R., & De Schutter, B (2021) A traffic responsive control framework for signalised junctions based on hybrid traffic flow representation. Journal of Intelligent Transportation Systems

Cantarella, G. E., de Luca, S., Di Pace, R., & Memoli, S. (2015). Network signal setting design: Meta-heuristic optimisation methods. Transportation Research Part C, 55 , 24–45. https://doi.org/10.1016/j.trc.2015.03.032

Di Pace, R. (2020). A traffic control framework for urban networks based on within-day dynamic traffic flow models. Transportmetrica A: Transport Science , 16 (2), 234–269. ISSN: 2324–9943.

MathSciNet   Google Scholar  

Treiber, M., Helbing, D., (2002). Realistische mikrosimulation von straenverkehr mit einem einfachen modell. In Symposium ”Simulationstechnik ASIM . Retrieved from https://mtreiber.de/publications/MOBIL_ASIM.pdf

Robertson, D. (1986a). Research on the transyt and scoot methods of signal coordination. ITE Journal-institute of Transportation Engineers, 56 (1), 36–40. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.417.7154&rep=rep1&type=pdf

Robertson, D. I. (1969). TRANSYT: a traffic network study tool.

Robertson, D. I. (1969b). TRANSYT: a traffic network study tool. Tech. rep. TRRL-LR-253, Transport and Road Research Laboratory.

Hunt, P. B., Robertson, D. I., Bretherton, R. D., Winton, R. I. (1981). SCOOT-a traffic responsive method of coordinating signals (No. LR 1014 Monograph).

Kometani, E. I. J. I., & Sasaki, T. S. U. N. A. (1958). On the stability of traffic flow (report-I).  Journal of the Operations Research Society of Japan ,  2 (1), 11–26. Retrieved from https://www.orsj.or.jp/~archive/pdf/e_mag/Vol.02_01_011.pdf

Barlovic, R., Santen, L., Schadschneider, A., & Schreckenberg, M. (1998). Metastable states in cellular automata for traffic flow. The European Physical Journal B-Condensed Matter and Complex Systems, 5 (3), 793–800. https://doi.org/10.1007/s100510050504

Kerner, B. S., Klenov, S. L., & Wolf, D. E. (2002). Cellular automata approach to three-phase traffic theory. Journal of Physics A: Mathematical and General, 35 (47), 9971. https://doi.org/10.1088/0305-4470/35/47/303

Alvarez Lopez, P. Behrisch, M, Bieker-Walz, L, Erdmann, J, Flötteröd,Y-P, Hilbrich, R. Lücken, L. Rummel, J. Wagner, P., Wießner, E. (2018). Microscopic Traffic Simulation using SUMO. In IEEE Intelligent Transportation Systems Conference (ITSC) , 2018. https://doi.org/10.1109/ITSC.2018.8569938

Cascetta E. (2009). Transportation systems analysis: Models and applications . Springer, pp. 448–477

Talebpour, A., & Mahmassani, H. S. (2016). Influence of connected and autonomous vehicles on traffic flow stability and throughput. Transportation Research Part C: Emerging Technologies, 71 , 143–163. https://doi.org/10.1016/j.trc.2016.07.007

Punzo, V., & Montanino, M. (2020). A two-level probabilistic approach for validation of stochastic traffic simulations: Impact of drivers’ heterogeneity models. Transportation Research Part C: Emerging Technologies, 121 , 102843. https://doi.org/10.1016/j.trc.2020.102843

Download references

Acknowledgements

The authors wish to thank the anonymous reviewers for their helpful comments.

This research has been partially supported by the University of Salerno under the PhD program on transportation (Ph.D. School in Environmental Engineering), local grant n. ORSA180377-2018, and local grant n. ORSA191831-2019; by CA18232 - Mathematical models for interacting dynamics on networks; and by the Italian program PON AIM - Attraction and International Mobility, Linea 1 (AIM1877579-3-CUP-D44I18000220006).

Author information

Authors and affiliations.

Dipartimento Di Ingegneria Civile Edile (DICIV), Università Degli Studi Di Salerno, Via Giovanni Paolo II, 132, 84084, Fisciano, SA, Italy

Facundo Storani, Roberta Di Pace, Francesca Bruno & Chiara Fiori


Contributions

Facundo Storani: model formulation, model implementation and writing the paper. Roberta Di Pace: model formulation, model implementation and writing the paper. Francesca Bruno: model formulation and writing the paper. Chiara Fiori: model formulation and writing the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Roberta Di Pace .

Ethics declarations

Ethics approval and consent to participate.

Not required; considered data are related not to a collected survey but to simulated data.

Consent for publication

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Storani, F., Di Pace, R., Bruno, F. et al. Analysis and comparison of traffic flow models: a new hybrid traffic flow model vs benchmark models. Eur. Transp. Res. Rev. 13 , 58 (2021). https://doi.org/10.1186/s12544-021-00515-0


Received : 02 January 2021

Accepted : 20 October 2021

Published : 24 November 2021

DOI : https://doi.org/10.1186/s12544-021-00515-0


  • Macroscopic traffic flow models
  • Microscopic traffic flow models
  • Hybrid traffic flow models


TRID the TRIS and ITRD database

A Spatial Econometric Model for Travel Flow Analysis and Real-World Applications with Massive Mobile Phone Data

Cellular signaling data provide a massive and emerging source for acquiring urban origin-destination (OD) travel flows for transportation planners, support decision-making on large-scale mobility enhancement, and make it possible to explore the underlying factors that influence travel demand while accounting for spatial autocorrelation. The effects of population, facilities, and transit accessibility on the travel flow between traffic analysis zones are revealed with empirical evidence. This paper employs a spatial econometric model for OD travel flow analysis by coupling massive mobile phone data with other related explanatory features of different urban regions. The results of real-world applications in Hangzhou, China show that: (I) origin dependence, destination dependence, and OD dependence are all statistically significant, which supports the consideration of spatial interdependence; (II) permanent population, facility number, and transit accessibility are all positively correlated with travel flows; (III) distance, as expected, is negatively correlated with travel flow volume. Finally, policy implications are discussed based on the estimated coefficients, the marginal effects of the explanatory variables, and future urban development plans by 2020. These findings contribute to the design of urban land use and transportation policies.

  • This paper was sponsored by TRB committee ADD30 Standing Committee on Transportation and Land Development. Alternate title: Spatial Econometric Model for Travel Flow Analysis and Real-World Applications with Massive Mobile Phone Data.

Transportation Research Board

  • Ni, Linglin
  • Wang, Xiaokun Cara
  • Chen, Xiqun (Michael)
  • Transportation Research Board 96th Annual Meeting
  • Location: Washington DC, United States
  • Date: 2017-1-8 to 2017-1-12
  • Media Type: Digital/other
  • Features: Figures; References; Tables;
  • Pagination: 19p
  • Monograph Title: TRB 96th Annual Meeting Compendium of Papers

Subject/Index Terms

  • TRT Terms: Autocorrelation ; Cellular telephones ; Data analysis ; Econometric models ; Flow ; Mobile telephones ; Origin and destination ; Spatial analysis ; Travel behavior
  • Geographic Terms: Hangzhou (China)
  • Subject Areas: Data and Information Technology; Economics; Highways; Planning and Forecasting;

Filing Info

  • Accession Number: 01623343
  • Record Type: Publication
  • Report/Paper Numbers: 17-04042
  • Files: TRIS, TRB, ATRI
  • Created Date: Jan 24 2017 3:31PM

Guo, Zhang, et al.

Explainable Traffic Flow Prediction with Large Language Models

Traffic flow prediction is crucial for urban planning, transportation management, and infrastructure development. However, achieving both accuracy and interpretability in prediction models remains challenging due to the complexity of traffic data and the inherent opacity of deep learning methodologies. In this paper, we propose a novel approach, Traffic Flow Prediction LLM (TF-LLM), which leverages large language models (LLMs) to generate interpretable traffic flow predictions. By transferring multi-modal traffic data into natural language descriptions, TF-LLM captures complex spatial-temporal patterns and external factors such as weather conditions, Points of Interest (PoIs), date, and holidays. We fine-tune the LLM framework using language-based instructions to align with spatial-temporal traffic flow data. Our comprehensive multi-modal traffic flow dataset (CATraffic) in California enables the evaluation of TF-LLM against state-of-the-art deep learning baselines. Results demonstrate TF-LLM’s competitive accuracy while providing intuitive and interpretable predictions. We discuss the spatial-temporal and input dependencies for explainable future flow forecasting, showcasing TF-LLM’s potential for diverse city prediction tasks. This paper contributes to advancing explainable traffic prediction models and lays a foundation for future exploration of LLM applications in transportation.

Keywords : Traffic flow prediction, Large language models, Explainability.

2 Introduction

Traffic flow prediction is a typical spatial-temporal problem and plays a critical role in urban planning, transportation management, and infrastructure development. With the increasing availability and importance of large spatio-temporal traffic datasets, such as traffic sensing data, human flows, and GPS trajectories, deep learning-based traffic analysis [1] has become popular, covering human mobility research [2] [3], traffic management [4] [5], and accident analysis [6]. Typically, these issues are treated as spatio-temporal deep learning problems. Deep learning methods learn hierarchical feature representations from spatial-temporal data, capture historical trends [7] [8], and employ graphs to represent the spatial relationships between locations [9] [10]. The spatio-temporal graph learning paradigm is the primary methodology for learning representations and capturing potential trends and relationships from traffic data.

Well-established deep learning models usually perform well in prediction accuracy, leveraging advanced architectures, which can empower decision-makers in optimizing traffic flow, mitigating congestion, and improving overall transportation efficiency. However, developing prediction models that are both accurate and explainable is still confronted with challenges. Traffic data inherently embodies complex spatio-temporal patterns influenced by diverse factors such as weather conditions, road networks, events, and human behavior. Accurately capturing these dynamics necessitates models with high capacity and sophisticated learning mechanisms [ 11 ] [ 12 ] . However, such models often lack interpretability, as they are artificially crafted and tend to prioritize prediction accuracy over explainability. Moreover, the abstract representations inherent in deep-learning methodologies further obscure generalization ability, despite their efficacy in achieving accurate predictions. This emphasis on accuracy often results in opaque systems that offer limited insight into the underlying mechanisms governing predictions and how inputs contribute to them. As a result, striking a balance between prediction accuracy and interpretability remains a critical challenge in the development of traffic prediction models.


Recently, with the popularity of foundation models [13] [14], spatial-temporal learning tasks have gradually been recast in language format to explore the potential of large language models (LLMs) in various applications, including forecasting, classification, missing data imputation, and anomaly detection. To handle the multiple modalities in urban big data, the LLM framework converts the original data into natural language descriptions and can capture latent relationships between inputs in complicated contexts. LLMs can also generate explanations of their reasoning process, which usefully supplement prediction and decision-making. However, spatial-temporal alignment is required because foundation models are pre-trained on vast text datasets, which leads to poor performance on domain tasks.

To address the challenges mentioned above, we present Traffic Flow Prediction LLM (TF-LLM), an explainable traffic prediction model. Taking multi-modal data as input, our framework converts the spatial-temporal traffic flow and external factors (PoIs, weather, date, holiday) into natural language. We use language-based instruction fine-tuning as the alignment method. After direct fine-tuning, TF-LLM performs well compared with state-of-the-art (SOTA) deep learning baselines when evaluated on different traffic flow datasets. Spatial-temporal and input dependencies are also discussed for explainable future flow forecasting. This study's contributions are as follows:

  • We present a comprehensive multi-modal traffic flow dataset (CATraffic) in California, covering traffic sensors from various areas together with weather, nearby PoIs, and holiday information. It can be used for future explainable learning-based prediction work.
  • We propose TF-LLM, a traffic flow prediction method based on large language models that generates interpretable prediction results while maintaining accuracy competitive with SOTA methods.
  • TF-LLM demonstrates a language-based format that accounts for multiple modalities and uses instruction-based spatial-temporal alignment, which provides an intuitive view and can be readily generalized to prediction tasks in different cities.

The rest of the paper is organized as follows. Related work is reviewed in Section II, and the methodology is detailed in Section III. Section IV presents the experimental results and analysis, including comparison results, ablation, and explainable case studies. Section V summarizes the paper and offers insights for future exploration of LLM applications in transportation.

3 Related Work

In this section, we will first explore advancements in traffic flow prediction, emphasizing the integration of deep learning methodologies. Subsequently, we’ll delve into the importance of explainable prediction, discussing methods for enhancing interpretability in spatial-temporal learning. Finally, we’ll highlight the transformative role of LLMs across diverse domains, elucidating their pre-training and fine-tuning practices for domain-specific tasks.

3.1 Traffic flow prediction

Recently, the field of spatial-temporal learning has witnessed significant advancements, particularly in traffic flow prediction, due to the emergence of deep learning methodologies. These approaches have enabled the modeling of latent relationships among various features of traffic flow in diverse formats, with architectures designed to represent the interplay between spatial and temporal dimensions within datasets. Convolutional Neural Networks (CNNs) [2] [15], renowned for their efficacy in computer vision, are employed to discern the spatial relations among grid regions by filtering the input data. Recurrent Neural Networks (RNNs) [16] are usually used to capture temporal dependencies by maintaining a memory state, which allows information to be reused over time. More recent spatial-temporal learning frameworks introduce Graph Neural Networks (GNNs) [17] [18] [19], which excel at representing the complex spatial relationships in graph-structured data, where nodes correspond to spatial locations and edges encode the connections between them. Additionally, the adaptation of Transformers [20], originally proposed for natural language processing, has proven effective in long-sequence modeling to capture comprehensive information. Various Transformer blocks [21] [22] have been tailored to different dependencies among spatial-temporal features. A notable trend in this domain is the combination of different model architectures, using separate modules for spatial and temporal features [23] [24]. This combination is gradually becoming the prevailing paradigm and shows promising performance in prediction tasks. However, while these methods achieve excellent prediction accuracy, they often fall short in explainability and generalization.

3.2 Explainable prediction

The interpretability of spatial-temporal learning is also important for reliable prediction, as it provides perspectives beyond prediction accuracy. Most recent works study which features most affect the decisions generated by models. Reference [25] focuses on the dependency on latent variables in road forecasting with black-box machine learning methods, including RNNs and Random Forests. The spatial-temporal causal graph inference presented in [26] offers an approximation of the Granger causality test, thereby making forecasts more accessible. Counterfactual explanations for time series [27] [28] are also highly regarded; they generate alternative prediction outcomes by selecting time series data points from the training set and substituting them into the sample under analysis. This method allows results to be illustrated by examining a limited number of variables. Large Language Models offer an alternative, more intuitive approach to interpreting prediction results [29] [30] [31]. Because they map natural language inputs to outputs, the input-output relationship can be studied simply by altering the input, without complex feature engineering or model architecture adjustments.

3.3 Large language models

Large Language Models (LLMs) have achieved remarkable success across a wide range of tasks and fields, including natural language processing [32], vision-language tasks [33], and various other interdisciplinary domains [34] [35]. Originally designed as pre-trained language foundation models for addressing various natural language tasks, LLMs have exhibited the capacity to acquire intricate semantic and knowledge representations from extensive text corpora over time. This newfound ability has been a profound source of inspiration within the community for addressing a variety of tasks. The success of models like GPT-4 [13] in natural language understanding and generation tasks has spurred interest in exploring their potential for handling complex, multi-modal datasets beyond traditional linguistic domains. They can extract valuable information and relationships from complex textual contexts, thereby enhancing the learning of city data. With the popularity of decoder-architecture LLMs, domain tasks are normally formulated as next-token generation, which provides a unified formulation for learning the mapping from input to output. To acquire large models for specific fields, the practices of pre-training and fine-tuning have become widely accepted in the model training process. Pre-training a foundation model from scratch necessitates substantial computing resources and domain-specific datasets, resulting in its superior performance within professional domains compared to baseline models. On the other hand, fine-tuning based on foundation models offers a more accessible approach, involving adjustments to only a few parameters [36]. This method preserves most general knowledge while targeting expertise in domain-specific tasks. In some cases, researchers freeze all parameters of large language models and focus solely on training the extended encoders and decoders [37] [38]. This strategy aims to extend the learning capabilities of LLMs to domain-specific tasks while leveraging their existing knowledge base.

4 Methodology

In the following sections, we will provide a comprehensive overview of our approach to traffic flow prediction. Firstly, we’ll describe the problem formulation and the predictive framework. Next, we’ll discuss the construction of prompts, crucial for fine-tuning Large Language Models (LLMs). Finally, we’ll delve into the supervised fine-tuning process.

4.1 Problem Description

The traffic flow prediction problem, as one kind of spatial-temporal prediction problem, can be formulated as forecasting future values from historical data. In our framework, the goal is to predict future-step values and generate explanations based on text prompts built from historical values and external factors.

The historical series and $X_{T:T+H}$ represent the continuous historical values and the future predicted values over $H$ steps, respectively. $S_i$ describes the spatial attributes of region $i$. $E_i$ includes external factors such as the date, holiday, and weather information within the region. $T$ is the explanation generated by the LLM, illustrating how it captures the input dependency and predicts the trend in this case.

4.2 Prompt Construction


The instruction part of Figure 2 shows the prompt template, which is designed to capture essential details consistently and comprehensively. This structured format is tailored to convey diverse data modalities, improve the model's comprehension, and refine predictive accuracy. Given the innate capacity of Large Language Models (LLMs) for textual reasoning, spatial-temporal data is converted into language-based inputs. These inputs are enriched with task-specific descriptions and Chain of Thought (CoT) prompts, as shown in Figure 3, which are intended to enhance the LLMs' reasoning capabilities. System prompts incorporate task illustrations and relevant knowledge, thereby priming LLMs for prediction tasks and adapting them to the specific requirements of those tasks. Furthermore, to help LLMs extract informative factors from the inputs, CoT prompts guide the model to first consider spatial attributes and temporal periods. The model is then prompted to reason through potential traffic patterns within the specified area during the predicted time frame.


Multi-modal information prompts serve as the cornerstone of our approach. In this context, spatial attributes are derived from nearby Points of Interest (PoIs). We preprocess PoI category data within different proximity ranges (3km, 5km, 10km), aligning with the locations of traffic volume sensors in the designated area. This ensures comprehensive coverage of diverse spatial attributes across the dataset, acknowledging the varied traffic patterns influenced by different locations.

To achieve this, we employ clustering techniques to group PoIs within each designated radius, ultimately organizing them into 1000 clusters. These clusters are then summarized into distinct categories such as transportation hubs, commercial zones, and residential areas, effectively representing the key characteristics of each geographical area. This approach facilitates a nuanced representation of spatial attributes, enabling the model to grasp the intricate interplay between various factors impacting traffic flow dynamics. Also, regional data comprises details such as the city, and road location. The TF-LLM framework leverages this multifaceted information to recognize and integrate spatio-temporal patterns across diverse regions and time periods.

Historical series and external factors are directly translated into natural language descriptions. Temporal information encompasses aspects such as the day of the week and the specific hour and holiday. This integration ensures a holistic understanding of the surrounding environment, historical trends, and external influences, enriching the model’s contextual reasoning and enhancing its predictive capabilities. The structure of the instructional material for spatio-temporal data is depicted in Figure 4, and a detailed prompt design can be found in Appendix A.1 .
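As a rough illustration of how a historical series and external factors might be rendered as text, the sketch below builds one prompt string from a single sample. The field names, the wording of the template, and the `build_prompt` helper are all hypothetical and are not taken from the paper's actual prompt in Appendix A.1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One prediction sample; every field name here is a hypothetical choice."""
    city: str
    road: str
    poi_summary: str      # e.g. "commercial zones, transportation hubs"
    date: str             # calendar date of the history window
    weekday: str
    start_hour: int       # hour of day (0-23) of the first history value
    is_holiday: bool
    weather: str
    history: List[int]    # hourly traffic volumes, oldest first


def build_prompt(s: Sample, horizon: int = 12) -> str:
    """Render spatial, temporal, and external factors as plain text."""
    series = ", ".join(str(v) for v in s.history)
    holiday = "a holiday" if s.is_holiday else "not a holiday"
    return (
        f"Region: {s.road}, {s.city}. Nearby PoIs: {s.poi_summary}.\n"
        f"Date: {s.date} ({s.weekday}), {holiday}. Weather: {s.weather}.\n"
        f"Traffic volumes for the past {len(s.history)} hours, "
        f"starting at {s.start_hour:02d}:00: [{series}].\n"
        f"Think about the spatial attributes and the time period first, "
        f"then predict the traffic volumes for the next {horizon} hours."
    )
```

The last sentence of the template loosely mirrors the Chain of Thought ordering described earlier (spatial attributes and time period first, then the forecast); a real template would likely be longer and paired with a system prompt.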


4.3 Supervised Fine-tune

The prediction task for the LLM is framed as a next-token generation task. Following the construction of prompts in a specific format, the subsequent step involves fine-tuning the LLM. To facilitate this process, we annotate the instruction prompt section and label it with specific tokens, thereby establishing formatted instructions. Leveraging the wealth of prior knowledge and the inference capabilities inherent in foundation models, we utilize formatted-prompt datasets to fine-tune the pre-trained large language model. Our TF-LLM model is built upon the open-source large language model Llama2 [14], integrating LoRA [36]. LoRA is a parameter-efficient fine-tuning method for large language models. It modifies the self-attention layers of Transformers by introducing low-rank matrices $A$ and $B$ to represent changes to the attention weights $W$, with $\Delta W = A \times B$. This approach reduces the number of trainable parameters, as $A$ and $B$ are much smaller than $W$, facilitating quicker and less resource-intensive adaptation while retaining model performance. Fine-tuning involves learning the compact matrices $A$ and $B$ while keeping other parameters frozen.
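To make the parameter saving concrete, the following minimal sketch counts trainable parameters for a low-rank update and applies it to an input vector. It follows the convention in the text, $\Delta W = A \times B$, and assumes $A$ has shape (d_out, r) and $B$ has shape (r, d_in); the function names and the `scale` argument are illustrative, not part of any library.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Return (parameters in W, trainable parameters in A and B)."""
    return d_out * d_in, d_out * rank + rank * d_in


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a matrix by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: List[float],
                 scale: float = 1.0) -> List[float]:
    """Compute (W + scale * A B) x without forming the product A B."""
    base = matvec(W, x)                 # frozen pre-trained path
    update = matvec(A, matvec(B, x))    # low-rank path: A (B x)
    return [b + scale * u for b, u in zip(base, update)]
```

For an illustrative 4096 × 4096 weight matrix with rank 64, `lora_param_counts(4096, 4096, 64)` gives 16,777,216 frozen parameters against 524,288 trainable ones, about 3% of the full matrix.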

Throughout the fine-tuning phase, we employ the Cross-Entropy loss function to compute the loss between the output tokens and labels. This loss function quantifies the disparity between the model's predicted probability distribution over tokens $\hat{y}$ and the actual distribution represented by the labels $y$. Mathematically, the Cross-Entropy loss can be expressed as:

$$\mathcal{L}(y, \hat{y}) = -\sum_{i} y_i \log \hat{y}_i$$

where $y_i$ and $\hat{y}_i$ denote the $i$-th elements of $y$ and $\hat{y}$, respectively. By optimizing this loss function, the model learns to make more accurate predictions tailored to the specific task or domain it is being fine-tuned for. This approach maximizes the learning potential of the LLM, enabling it to develop specialized proficiency in the target domain.
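As a small worked example of the Cross-Entropy loss for a single token position, the sketch below computes $-\sum_i y_i \log \hat{y}_i$ from a one-hot label and a predicted distribution. The `softmax` helper and the `eps` clamp are assumptions added for numerical safety; they are not described in the paper.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into a probability distribution."""
    m = max(logits)                       # subtract the max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y: List[float], y_hat: List[float],
                  eps: float = 1e-12) -> float:
    """Cross-entropy -sum_i y_i * log(y_hat_i) for one token position."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y, y_hat))
```

With a one-hot label, the sum reduces to the negative log-probability assigned to the correct token, so the loss falls as the model grows more confident in the right answer.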

5 Experiments Settings and Results

In this section, we will start by explaining our experimental setups. This includes detailing the dataset we used, the evaluation methods we employed, the baseline models we compared against, and the parameters we used in the fine-tuning process. Subsequently, we will present the results of our experiments. This will involve comparing TF-LLM with the baseline models, examining how prediction errors are distributed in spatio-temporal contexts, conducting ablation studies, and exploring zero-shot experiments. Finally, we will discuss the interpretability of TF-LLM in the context of traffic flow prediction. This discussion will elucidate how the robust reasoning capability inherent in large language models enhances traffic flow forecasting tasks.

5.1 Dataset Description

Our experiments are conducted using our proposed multi-modal traffic flow prediction dataset, named CATraffic. This dataset encompasses traffic volume data from various regions in California, alongside meteorological information, nearby points of interest (PoIs) data, and holiday information. The traffic volume data is sourced from the LargeST dataset [39], which comprises five years (2017-2021) of traffic flow data in California, encompassing 8600 traffic sensors sampled at 15-minute intervals. We construct our dataset by selecting a subset of this data, focusing on 1000 sensors from the Greater Los Angeles (GLA) and Greater Bay Area (GBA) regions. This subset spans a two-year period from January 1, 2018, to December 30, 2019, with data sampled hourly. During the sensor selection process, to ensure a diverse representation of traffic patterns, we cluster all sensors into 1000 categories based on their nearby PoIs features. From each category, only one sensor is retained to build our CATraffic dataset. The PoIs data is obtained through OpenStreetMap (https://openmaptiles.org) using the Overpass Turbo API (https://overpass-turbo.eu). We record the number of different PoIs within a 10km radius in each direction to compose the final PoIs features. For meteorological data, we collect information from the National Oceanic and Atmospheric Administration (NOAA, https://www.ncei.noaa.gov/), aligning it with the locations of traffic sensors. This includes factors such as reported weather events, temperature, and visibility, which are considered as they have direct impacts on traffic patterns.

We divided the data into two sets: data from 2018 was used as the training set, while data from 2019 was reserved for model validation. All experiments were configured to predict traffic flows for the subsequent 12 hours based on historical traffic flows data spanning the preceding 12 hours. During the data preprocessing stage, we filtered out samples with zero traffic volume for 24 consecutive hours, which likely resulted from malfunctioning sensors. Additionally, to assess the model’s zero-shot capability, we created a zero-shot dataset derived from LargeST [ 39 ] . This dataset comprises data from 100 sensors in San Diego (SD), covering the period from November 1, 2019, to December 31, 2019. These data were not utilized in the model fine-tuning process and serve as a means to evaluate the model’s generalization performance.
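The 12-hour-in, 12-hour-out sample construction can be sketched as a sliding window over one sensor's hourly series. This is only an illustration: the function name is hypothetical, and it assumes that a sample with "zero traffic volume for 24 consecutive hours" is one whose full 12 + 12 hour window is all zeros.

```python
from typing import List, Tuple

Window = Tuple[List[int], List[int]]


def make_windows(series: List[int], history: int = 12,
                 horizon: int = 12) -> List[Window]:
    """Slide over an hourly series and build (past, future) pairs.

    Windows whose history and target are entirely zero are dropped,
    on the assumption that they come from a malfunctioning sensor.
    """
    pairs: List[Window] = []
    span = history + horizon
    for start in range(len(series) - span + 1):
        past = series[start:start + history]
        future = series[start + history:start + span]
        if any(v != 0 for v in past + future):   # keep only live windows
            pairs.append((past, future))
    return pairs
```

A series of 30 hourly readings yields 7 windows with the default settings, since each window spans 24 hours and the start index advances one hour at a time.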

5.2 Evaluation Metrics

In time series forecasting tasks, researchers commonly employ Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) to evaluate the accuracy of forecasting results. These metrics are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

Here, $y_i$ represents the ground truth value of the $i$-th data point, $\hat{y}_i$ denotes the corresponding predicted value, and $n$ stands for the total number of samples. RMSE is advantageous for emphasizing larger errors due to its square term, while MAE provides a straightforward interpretation by averaging absolute errors, treating all errors equally. On the other hand, MAPE measures the average percentage difference between predicted and actual values, offering interpretability in terms of relative accuracy but being sensitive to zero values in the denominator. Researchers typically employ a combination of these metrics to comprehensively assess model performance.
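The three metrics can be computed directly from their standard definitions, as in the minimal sketch below. Skipping zero ground-truth values in MAPE is one common workaround for the zero-denominator issue and is an assumption here; the paper does not say how zeros were handled.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean square error: penalises large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y, y_hat)) / len(y))


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error: every error counts in proportion to its size."""
    return sum(abs(t - p) for t, p in zip(y, y_hat)) / len(y)


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute percentage error (in percent), skipping zero targets."""
    pairs = [(t, p) for t, p in zip(y, y_hat) if t != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all ground-truth values are zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)
```

For ground truth `[100, 200]` and predictions `[110, 180]`, MAE is 15 and MAPE is 10%, while RMSE is slightly higher than MAE because the larger error of 20 is weighted more heavily.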

5.3 Baseline Models

We compared our proposed TF-LLM with nine advanced baseline models. Among these, LSTM [16] is a temporal-only deep model based on Recurrent Neural Networks (RNNs) that disregards spatial correlations. We select DCRNN [40] and AGCRN [41] as representatives of RNN-based methods. We also choose the TCN-based methods STGCN [18] and GWNET [17], along with the attention-based methods ASTGCN [19] and STTN [23]. These models were proposed between 2018 and 2020 and reflect the prevalent research direction in time series forecasting during those years. Furthermore, we include two more recent representative methods, STGODE [42] and DSTAGNN [43]. STGODE uses neural ordinary differential equations to capture the continuous changes of traffic signals, while DSTAGNN is specifically designed to capture the dynamic correlations among traffic sensors.

5.4 Experiment Settings

Our TF-LLM model is fine-tuned from the open-source large language model Llama2 [14], specifically the 7B chat version. We load the base model in 8 bits for fine-tuning, with a batch size of 8, a learning rate of 5e-4, 400 warm-up steps, 8 gradient accumulation steps, and two training epochs. The LoRA [36] parameters are configured with a rank of 64 and an alpha value of 16. During inference, we apply a temperature of 0.95.
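For reference, the reported settings can be collected in one place, together with two quantities they imply. This is only a sketch: the class and field names are hypothetical, the effective batch size assumes a single device, and the scaling factor assumes the standard LoRA convention of alpha divided by rank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Settings as reported in the text; field names are hypothetical."""
    base_model: str = "Llama2-7B-chat"
    load_in_8bit: bool = True
    batch_size: int = 8
    learning_rate: float = 5e-4
    warmup_steps: int = 400
    gradient_accumulation_steps: int = 8
    num_epochs: int = 2
    lora_rank: int = 64
    lora_alpha: int = 16
    inference_temperature: float = 0.95

    @property
    def effective_batch_size(self) -> int:
        # Samples per optimizer step, assuming a single device.
        return self.batch_size * self.gradient_accumulation_steps

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank


config = FineTuneConfig()
```

Under these assumptions each optimizer step sees 64 samples and the low-rank update is scaled by 0.25.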

5.5 Overall Performance

We report the performance comparison between our proposed TF-LLM and the baseline models in Table 1. All models are trained and evaluated on our CATraffic dataset with the same dataset settings. The task is to use traffic flow data from the preceding 12 hours to forecast traffic flows for the next 12 hours. The table presents the results at horizons 3, 6, 9, and 12, as well as the average performance over all 12 steps. The results show that TF-LLM exceeds the baseline models by a large margin, especially in MAE and MAPE. For example, in terms of average performance over the 12 horizons, our model outperforms the best two baseline models (GWNET [17] and STGCN [18]) by 18.37% in MAE and 34.00% in MAPE, which demonstrates its strong capability in traffic flow forecasting.
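Improvement percentages of this kind are usually relative error reductions. The sketch below assumes that convention, which the text does not state explicitly; the function name and the numbers are hypothetical and are not values from Table 1.

```python
def relative_improvement(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical errors, for illustration only.
gain = relative_improvement(baseline_error=20.0, model_error=16.0)
```

Here an error that falls from 20.0 to 16.0 corresponds to a 20% improvement.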


We further plot the results of the baseline models and TF-LLM at different prediction horizons in Figure 5. From left to right, the panels show the RMSE, MAE, and MAPE of the compared models at each prediction horizon. The results yield the following observations:

As the prediction horizon increases, performance generally declines across all models, as longer-term forecasts inherently entail greater uncertainty and complexity. However, several models exhibit improved performance in longer-term forecasting, such as LSTM, ASTGCN, and DCRNN. This phenomenon may be attributed to these models’ ability to capture and leverage the periodicity within the data, allowing them to make more accurate predictions over extended time horizons.

Our proposed model consistently outperforms the comparison methods at each time step, showing significant advantages in both short-term and long-term traffic flow forecasting. This indicates the robustness of our model in various prediction horizons.

Our model's advantage is larger in MAE and MAPE than in RMSE, which we attribute to its robustness to outliers. MAE and MAPE are less sensitive to extreme errors than RMSE because they average absolute and percentage errors, respectively, rather than squared errors. The strong performance of our model on MAE and MAPE suggests that it mitigates the effects of outliers and provides more accurate and stable predictions, especially when extreme values occasionally occur.

These findings underscore the effectiveness of our method in capturing complex temporal patterns in traffic flow data, leading to more accurate and reliable predictions.

5.6 Spatial and Temporal Homogeneity

Evaluating spatial and temporal homogeneity in traffic flow prediction is vital for ensuring model performance, generalizability, and robustness. It ensures that traffic flow prediction models can effectively adapt to diverse real-world conditions. Therefore, in this section, we will thoroughly analyze the performance of our proposed model in terms of spatio-temporal consistency.


To analyze the spatial homogeneity of our model, we conducted evaluations at various locations across the Greater Los Angeles (GLA) region with diverse urban distributions, in order to assess how well language models generalize when learning traffic patterns from different spatial features. We compared our proposed TF-LLM model against LSTM [16], AGCRN [41], STGCN [18], and GWNET [17], maintaining a fixed evaluation period for both settings. The results, depicted in Figure 6(a), show MAPE values for prediction horizons of 3h, 6h, 9h, and 12h, with darker colors indicating poorer performance. Overall, our TF-LLM model exhibits relatively consistent prediction performance across different locations, capturing traffic flow trends irrespective of spatial characteristics. In contrast, the other models are less homogeneous, particularly in areas with complex road network intersections and intricate facility distributions. These findings underscore the robustness and adaptability of TF-LLM in learning and predicting traffic patterns across diverse spatial contexts, and highlight its potential for real-world application in urban traffic management and planning.

In our analysis of temporal homogeneity, we evaluated traffic test data spanning November 2019, using daily averaged MAPE values. Illustrated in Figure  6 (b) as calendar heat maps, the results showcase the comparative performance across the month. Our model consistently exhibited lower daily average MAPE values, demonstrating robustness in capturing temporal nuances of traffic flow patterns. Notably, it consistently outperformed others throughout this period, emphasizing its superior capability in handling diverse temporal dynamics inherent in real-world traffic scenarios.
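The daily averaged MAPE behind a calendar heat map can be sketched as a group-by over prediction records. The record layout, the function name `daily_mean_mape`, and the example values are assumptions for illustration; zero ground-truth values are skipped to avoid division by zero.

```python
from collections import defaultdict
from datetime import datetime


def daily_mean_mape(records):
    """Average the absolute percentage error per calendar day.

    records: iterable of (timestamp, y_true, y_pred) tuples.
    Returns a dict mapping each date to its mean MAPE in percent.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, y_true, y_pred in records:
        if y_true == 0:
            continue  # assumption: skip undefined percentage errors
        day = ts.date()
        sums[day] += abs((y_true - y_pred) / y_true)
        counts[day] += 1
    return {day: 100.0 * sums[day] / counts[day] for day in sorted(sums)}


# Hypothetical predictions for two days in November 2019.
records = [
    (datetime(2019, 11, 1, 8), 100, 90),
    (datetime(2019, 11, 1, 9), 200, 220),
    (datetime(2019, 11, 2, 8), 50, 40),
]
daily = daily_mean_mape(records)
```

Each dictionary entry corresponds to one cell of the calendar heat map.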

5.7 Ablation Studies


To investigate how the different modalities in the prompt input and the different prompt strategies affect model performance, we conducted comprehensive ablation experiments. The results are illustrated in Figure 7. From left to right, the panels show the average RMSE, MAE, and MAPE values of TF-LLM under different prompt configurations. We remove one informational element at a time, namely the date, weather conditions, Points of Interest (PoI), contextual knowledge, and the Chain-of-Thought (CoT) [44] prompt, and clear trends emerge in the impact of each component on predictive performance.

Excluding date information yields the worst results, with a substantial increase in all error metrics: an RMSE of 69.97, an MAE of 40.44, and a MAPE of 24.90%. This indicates the crucial role of temporal context in traffic flow forecasting, which enables the model to capture recurring patterns and trends in traffic behavior. Similarly, weather conditions strongly influence traffic dynamics by affecting road surface conditions, visibility, and driver behavior. Removing weather information leads to a notable degradation in performance, highlighting its importance for accurately modeling traffic flow. Omitting PoI information also leads to moderate increases in prediction error. This shows that TF-LLM can exploit the spatial context provided by PoI data, which is useful for understanding traffic patterns and congestion dynamics.

In our studies, we also validated the effect of prompting strategies on model performance. Initially, we observed that the absence of context information led to model performance degradation, indicating the importance of domain knowledge in enhancing the predictive capabilities of TF-LLM. By leveraging insights into traffic volume, pattern characteristics, and spatial-temporal correlations, the model can anticipate variations in traffic flow dynamics with greater accuracy, resulting in more reliable forecasts. Additionally, the CoT strategy contributed to performance improvement. By incorporating insights into area attributes, predicted time zones, traffic patterns, and historical temporal trends, the model gained a more nuanced understanding of the factors influencing traffic flow dynamics, thereby improving prediction performance.

These findings collectively emphasize the significance of incorporating comprehensive multimodal data, particularly in a temporal and environmental context, to improve the accuracy of traffic flow predictions. While certain input components exhibit more substantial impacts than others, their synergistic integration contributes to the overall predictive capability of the model.

5.8 Zero-shot Capabilities


Large language models are well known for their zero-shot capabilities. In this section, we examine the zero-shot capability of our proposed TF-LLM and compare it with other large language models. We conduct zero-shot experiments using two datasets: the first is our proposed CATraffic-based zero-shot dataset, and the other is the TaxiBJ dataset [2]. The TaxiBJ dataset comprises taxicab GPS data and meteorology data in Beijing from four time intervals within 2013-2016 and focuses on the inflow/outflow prediction task. To ensure a fair comparison, we reorganize the TaxiBJ dataset into the same format as the CATraffic dataset. The overall results are presented in Table 2 and indicate that our proposed model performs better on all three tasks than the original Llama2 series models, GPT-3.5-turbo, and GPT-4. On the CATraffic dataset, TF-LLM improves over the best-performing comparative model (GPT-4) by 61.53%, 71.22%, and 85.11% in RMSE, MAE, and MAPE, respectively. On the TaxiBJ dataset, our model achieves substantial improvements in both inflow and outflow tasks. Compared to GPT-4, TF-LLM shows improvements of 40.32% in RMSE, 44.01% in MAE, and 80.75% in MAPE on the inflow prediction task, and improvements of 44.89% in RMSE, 48.13% in MAE, and 77.26% in MAPE on the outflow prediction task. These findings show that our fine-tuned model surpasses even the state-of-the-art large language model (GPT-4) when tested on a dataset distinct from the one used for fine-tuning. This suggests that our model acquired the domain knowledge of traffic flow prediction and generalized it to different scenarios.

We also provide visualization results in Figure  8 , showing 12 test samples from three different sensors in four different time periods. The results depicted in the figure highlight that our model not only captures the traffic trends of new scenarios effectively but also delivers accurate prediction values. In contrast, the Llama2 series of LLMs struggles to capture the dynamic pattern of traffic flow, regardless of model size. While GPT-3.5-turbo and GPT-4 demonstrate the ability to describe some overarching trends in traffic, they perform inadequately in capturing nuances of variation, hindering their ability to provide accurate predictions of traffic flow values. We attribute these discrepancies to our effective prompt design and model fine-tuning. Prompts effectively describe the task context and align information from different modalities into a unified feature space, while fine-tuning infuses domain knowledge into LLMs, thereby enhancing their performance in specific fields.

5.9 Interpretive Studies

Our proposed TF-LLM is based on Llama2-7B-chat, which is tailored to excel in understanding and generating text in conversational contexts. Therefore, by incorporating interpretability requirements into the prompts, our model can not only generate prediction results but also provide explanations simultaneously. Initially, we attempted to have the fine-tuned model directly output explanations, but the outcomes proved to be unsatisfactory. Although the generated explanations were coherent, they failed to align with the traffic flow prediction sequence. This discrepancy arose because the model fine-tuning process solely relied on traffic flow sequences for supervision, without incorporating any mechanisms to link the explanatory text with the output sequences. To address this issue, we employed the few-shot learning technique by providing several explanation examples before the original prompt text. We generate the ground truth explanation texts through ChatGPT by asking it to explain the traffic flow sequences for the next 12 hours. This approach enabled the model to learn from a limited number of examples and improve its ability to generate coherent explanations alongside predictions.
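The few-shot step described above amounts to placing a handful of worked examples in front of the original prompt. The following is a minimal sketch of that assembly; the function name, the "Example i" labels, and the placeholder strings are hypothetical and do not reproduce the prompt actually used.

```python
def build_few_shot_prompt(examples, query_prompt):
    """Prepend (prompt, answer) demonstration pairs to the original prompt.

    examples: list of (example_prompt, example_answer) tuples, where each
    answer contains a prediction together with its explanation.
    """
    parts = []
    for i, (example_prompt, example_answer) in enumerate(examples, start=1):
        parts.append(f"Example {i}:\n{example_prompt}\n{example_answer}")
    parts.append(query_prompt)  # the original prompt always comes last
    return "\n\n".join(parts)


# Placeholder demonstrations; real ones would pair a traffic prompt with
# an explanation of the next 12 hours of traffic flow.
examples = [
    ("<prompt A>", "<prediction and explanation A>"),
    ("<prompt B>", "<prediction and explanation B>"),
]
prompt = build_few_shot_prompt(examples, "<original prompt>")
```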

We report four explanatory cases generated by our TF-LLM in Table 3, each from a different time period. In each example, the ground truth and our predicted result are displayed in the figure on the left (light red areas indicate the 95% confidence interval), with the corresponding interpretative output on the right. The original interpretive output was too long, so we used ChatGPT to summarise it into a short paragraph for presentation purposes. A complete explanatory result can be found in Appendix A.2 (corresponding to the last example in Table 3). These four examples cover traffic flow predictions across different time periods, including weekdays, weekends, and holidays (such as Christmas). Our proposed TF-LLM consistently provides accurate prediction results and offers reasonable explanations. Notably, our model integrates factors such as time, weather, holiday information, and PoIs in the vicinity of the prediction point, ultimately yielding convincing outcomes. Previous time series prediction models could only output prediction results and lacked intuitive interpretability. In contrast, TF-LLM can interact directly with humans in textual form, showcasing the strength of the LLM-based method.

6 Conclusion and Future Work

In conclusion, our research introduces TF-LLM, a novel traffic prediction model designed for both accuracy and interpretability. By incorporating multi-modal inputs and employing language-based representations, TF-LLM achieves competitive performance compared to state-of-the-art models while offering insights into its predictions. TF-LLM’s language-based framework, coupled with spatial-temporal alignment instructions, provides a transparent and adaptable approach suitable for various urban prediction tasks. In general, our work contributes to the advancement of interpretable and effective traffic prediction methods, essential for informed decision-making in urban transportation planning and management.

In the future, we aim to explore methods that enable LLMs to use spatial information more effectively and to grasp how different sensors are related spatially. This will help them make better predictions by considering data from nearby sensors. Achieving this might involve helping LLMs understand graph structures, which could be a promising research direction. Furthermore, developing LLM systems tailored for urban brains is an interesting but challenging topic. It entails integrating city-level data into LLMs to tackle downstream tasks such as urban planning, traffic management, and pollution control. The main challenges are enabling LLMs to use city-level multimodal data efficiently and meeting the need for substantial computational resources and engineering capabilities.

7 Acknowledgements

This research is funded by multiple sources, including the National Natural Science Foundation of China under Grant 52302379, the Guangzhou Basic and Applied Basic Research Projects under Grants 2023A03J0106 and 2024A04J4290, the Guangdong Province General Universities Youth Innovative Talents Project under Grant 2023KQNCX100, and the Guangzhou Municipal Science and Technology Project 2023A03J0011.

  • Wang et al. [2020] Wang, S., J. Cao, and P. S. Yu, Deep learning for spatio-temporal data mining: A survey. IEEE transactions on knowledge and data engineering, Vol. 34, No. 8, 2020, pp. 3681–3700.
  • Zhang et al. [2017] Zhang, J., Y. Zheng, and D. Qi, Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the AAAI conference on artificial intelligence , 2017, Vol. 31.
  • Jin et al. [2018] Jin, W., Y. Lin, Z. Wu, and H. Wan, Spatio-Temporal Recurrent Convolutional Networks for Citywide Short-term Crowd Flows Prediction. In Proceedings of the 2nd International Conference on Compute and Data Analysis , 2018.
  • Du et al. [2018] Du, S., T. Li, X. Gong, and S.-J. Horng, A hybrid method for traffic flow forecasting using multimodal deep learning. arXiv preprint arXiv:1803.02099 , 2018.
  • Ranjan et al. [2020] Ranjan, N., S. Bhandari, H. P. Zhao, H. Kim, and P. Khan, City-wide traffic congestion prediction based on CNN, LSTM and transpose CNN. IEEE Access , Vol. 8, 2020, pp. 81606–81620.
  • Yannis et al. [2017] Yannis, G., A. Dragomanovits, A. Laiou, F. La Torre, L. Domenichini, T. Richter, S. Ruhl, D. Graham, and N. Karathodorou, Road traffic accident prediction modelling: a literature review. In Proceedings of the institution of civil engineers-transport , Thomas Telford Ltd, 2017, Vol. 170, pp. 245–254.
  • Yang et al. [2017] Yang, H.-F., T. S. Dillon, and Y.-P. P. Chen, Optimized Structure of the Traffic Flow Forecasting Model With a Deep Learning Approach. IEEE Transactions on Neural Networks and Learning Systems , 2017, p. 2371–2381.
  • Fan et al. [2018] Fan, Z., X. Song, T. Xia, R. Jiang, R. Shibasaki, and R. Sakuramachi, Online deep ensemble learning for predicting citywide human mobility. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies , Vol. 2, No. 3, 2018, pp. 1–21.
  • Marblestone et al. [2016] Marblestone, A. H., G. Wayne, and K. P. Kording, Toward an integration of deep learning and neuroscience. Frontiers in computational neuroscience , Vol. 10, 2016, p. 94.
  • Gao et al. [2018] Gao, Q., G. Trajcevski, F. Zhou, K. Zhang, T. Zhong, and F. Zhang, Trajectory-based social circle inference. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems , 2018, pp. 369–378.
  • Li et al. [2017a] Li, Y., R. Yu, C. Shahabi, and Y. Liu, Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. arXiv: Learning, 2017a.
  • Akbari Asanjan et al. [2018] Akbari Asanjan, A., T. Yang, K. Hsu, S. Sorooshian, J. Lin, and Q. Peng, Short-term precipitation forecast based on the PERSIANN system and LSTM recurrent neural networks. Journal of Geophysical Research: Atmospheres , Vol. 123, No. 22, 2018, pp. 12–543.
  • Chang [2023] Chang, E. Y., Examining GPT-4: Capabilities, Implications and Future Directions. In The 10th International Conference on Computational Science and Computational Intelligence , 2023.
  • Touvron et al. [2023] Touvron, H., L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, et al., Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 , 2023.
  • Liang et al. [2018] Liang, Y., S. Ke, J. Zhang, X. Yi, and Y. Zheng, Geoman: Multi-level attention networks for geo-sensory time series prediction. In IJCAI , 2018, Vol. 2018, pp. 3428–3434.
  • Graves [2012] Graves, A., Long short-term memory. Supervised sequence labelling with recurrent neural networks, 2012, pp. 37–45.
  • Wu et al. [2019] Wu, Z., S. Pan, G. Long, J. Jiang, and C. Zhang, Graph WaveNet for Deep Spatial-Temporal Graph Modeling. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence , 2019.
  • Yu et al. [2018] Yu, B., H. Yin, and Z. Zhu, Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence , 2018.
  • Guo et al. [2019] Guo, S., Y. Lin, N. Feng, C. Song, and H. Wan, Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI conference on artificial intelligence , 2019, Vol. 33, pp. 922–929.
  • Vaswani et al. [2017] Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Attention is all you need. Advances in neural information processing systems , Vol. 30, 2017.
  • Yu et al. [2020] Yu, C., X. Ma, J. Ren, H. Zhao, and S. Yi, Spatio-temporal graph transformer networks for pedestrian trajectory prediction. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XII 16 , Springer, 2020, pp. 507–523.
  • Cai et al. [2020] Cai, L., K. Janowicz, G. Mai, B. Yan, and R. Zhu, Traffic transformer: Capturing the continuity and periodicity of time series for traffic forecasting. Transactions in GIS , Vol. 24, No. 3, 2020, pp. 736–755.
  • Xu et al. [2020] Xu, M., W. Dai, C. Liu, X. Gao, W. Lin, G.-J. Qi, and H. Xiong, Spatial-Temporal Transformer Networks for Traffic Flow Forecasting. arXiv: Signal Processing, 2020.
  • Jiang et al. [2023] Jiang, J., C. Han, W. X. Zhao, and J. Wang, Pdformer: Propagation delay-aware dynamic long-range transformer for traffic flow prediction. In Proceedings of the AAAI conference on artificial intelligence , 2023, Vol. 37, pp. 4365–4373.
  • Barredo-Arrieta et al. [2019] Barredo-Arrieta, A., I. Laña, and J. Del Ser, What lies beneath: A note on the explainability of black-box machine learning models for road traffic forecasting. In 2019 IEEE Intelligent Transportation Systems Conference (ITSC) , IEEE, 2019, pp. 2232–2237.
  • Zhang et al. [2022] Zhang, L., K. Fu, T. Ji, and C.-T. Lu, Granger causal inference for interpretable traffic prediction. In 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC) , IEEE, 2022, pp. 1645–1651.
  • Yan and Wang [2023] Yan, J. and H. Wang, Self-interpretable time series prediction with counterfactual explanations. In International Conference on Machine Learning , PMLR, 2023, pp. 39110–39125.
  • Ates et al. [2021] Ates, E., B. Aksar, V. J. Leung, and A. K. Coskun, Counterfactual explanations for multivariate time series. In 2021 international conference on applied artificial intelligence (ICAPAI) , IEEE, 2021, pp. 1–8.
  • Huang et al. [2023] Huang, S., S. Mamidanna, S. Jangam, Y. Zhou, and L. H. Gilpin, Can large language models explain themselves? a study of llm-generated self-explanations. arXiv preprint arXiv:2310.11207 , 2023.
  • Peng et al. [2024] Peng, M., X. Guo, X. Chen, M. Zhu, K. Chen, X. Wang, Y. Wang, et al., LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models. arXiv preprint arXiv:2403.18344 , 2024.
  • Gruver et al. [2024] Gruver, N., M. Finzi, S. Qiu, and A. G. Wilson, Large language models are zero-shot time series forecasters. Advances in Neural Information Processing Systems , Vol. 36, 2024.
  • Ray [????] Ray, P., ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope, ????
  • Liu et al. [2023a] Liu, S., H. Cheng, H. Liu, H. Zhang, F. Li, T. Ren, X. Zou, J. Yang, H. Su, J. Zhu, et al., Llava-plus: Learning to use tools for creating multimodal agents. arXiv preprint arXiv:2311.05437 , 2023a.
  • Thirunavukarasu et al. [2023] Thirunavukarasu, A. J., D. S. J. Ting, K. Elangovan, L. Gutierrez, T. F. Tan, and D. S. W. Ting, Large language models in medicine. Nature medicine , Vol. 29, No. 8, 2023, pp. 1930–1940.
  • Wu et al. [2023] Wu, S., O. Irsoy, S. Lu, V. Dabravolski, M. Dredze, S. Gehrmann, P. Kambadur, D. Rosenberg, and G. Mann, Bloomberggpt: A large language model for finance. arXiv preprint arXiv:2303.17564 , 2023.
  • Hu et al. [2022] Hu, E. J., Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations , 2022.
  • Jin et al. [2023] Jin, M., S. Wang, L. Ma, Z. Chu, J. Y. Zhang, X. Shi, P.-Y. Chen, Y. Liang, Y.-F. Li, S. Pan, et al., Time-llm: Time series forecasting by reprogramming large language models. arXiv preprint arXiv:2310.01728 , 2023.
  • Zhu et al. [2023] Zhu, D., J. Chen, X. Shen, X. Li, and M. Elhoseiny, Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592 , 2023.
  • Liu et al. [2023b] Liu, X., Y. Xia, Y. Liang, J. Hu, Y. Wang, L. Bai, C. Huang, Z. Liu, B. Hooi, and R. Zimmermann, LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting. In Advances in Neural Information Processing Systems , 2023b.
  • Li et al. [2017b] Li, Y., R. Yu, C. Shahabi, and Y. Liu, Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. arXiv: Learning, 2017b.
  • Bai et al. [2020] Bai, L., L. Yao, C. Li, X. Wang, and C. Wang, Adaptive graph convolutional recurrent network for traffic forecasting. Advances in neural information processing systems , Vol. 33, 2020, pp. 17804–17815.
  • Fang et al. [2021] Fang, Z., Q. Long, G. Song, and K. Xie, Spatial-Temporal Graph ODE Networks for Traffic Flow Forecasting. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021.
  • Lan et al. [2022] Lan, S., Y. Ma, W. Huang, W. Wang, H. Yang, and P. Li, Dstagnn: Dynamic spatial-temporal aware graph neural network for traffic flow forecasting. In International conference on machine learning , PMLR, 2022, pp. 11906–11917.
  • Wei et al. [2022] Wei, J., X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou, et al., Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems , Vol. 35, 2022, pp. 24824–24837.

Appendix A Appendix

A.1 Prompt Design

The prompt for TF-LLM in traffic flow prediction is carefully designed, as shown in Table 4. The complete prompt contains both a system prompt and a user input prompt. The system prompt sets the role of the LLM. It also contains a context knowledge part that provides additional background information, as well as a chain-of-thought part that guides the LLM through a human reasoning process, which enables it to understand the main concepts in traffic flow prediction. The system prompt remains the same throughout the dataset; only the user input prompt and the corresponding ground truth change from sample to sample.
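The two-part structure can be sketched as a pair of builder functions. All prompt wording, constant names, and field labels below are hypothetical placeholders rather than the text of Table 4; the optional arguments only illustrate how individual components could be switched off, as in an ablation.

```python
# Hypothetical wording; the actual prompt text is given in Table 4.
ROLE = "You are an expert traffic flow forecaster."
CONTEXT = "Background: traffic volume follows daily and weekly cycles."
COT = "Think step by step: consider the area, the time, and recent trends."


def build_system_prompt(use_context=True, use_cot=True):
    """Fixed part of the prompt: role, context knowledge, chain of thought."""
    parts = [ROLE]
    if use_context:
        parts.append(CONTEXT)
    if use_cot:
        parts.append(COT)
    return "\n".join(parts)


def build_user_prompt(history, date=None, weather=None, pois=None):
    """Per-sample part of the prompt; omitted fields are simply left out."""
    parts = []
    if date is not None:
        parts.append(f"Date: {date}")
    if weather is not None:
        parts.append(f"Weather: {weather}")
    if pois is not None:
        parts.append(f"Nearby PoIs: {pois}")
    parts.append("Traffic volume over the past 12 hours: " + ", ".join(str(v) for v in history))
    parts.append("Predict the traffic volume for the next 12 hours.")
    return "\n".join(parts)


history = [120, 95, 80, 70, 90, 150, 310, 420, 390, 350, 330, 340]
full = build_user_prompt(history, date="2019-12-25", weather="clear", pois="3 schools")
no_date = build_user_prompt(history, weather="clear", pois="3 schools")
system = build_system_prompt()
```

The system prompt is built once and reused for every sample, while the user prompt is rebuilt from each sample's history and side information.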

A.2 Explanatory Example

A complete example of interpretable output is given in Table 5. This example shows the traffic flow prediction in the next 12 hours at 5PM on Christmas Day 2019. By adding explanatory demand instructions to the original prompt (bold text in Input Prompt of Table 5), TF-LLM is able to output both the prediction results as well as the explanatory text. It can be found that our model is able to adjust the traffic flow prediction strategy by taking into account holidays and the impact of weather on human activities.

Transportation Modelling: Challenges & Solutions

What is transportation modelling? How do you create a transportation model and how are they used? What are the so-called “four steps” of transportation modelling? What is activity-or agent-based modelling? We answer the most important questions concerning the topic of transportation modelling.

Today, city and traffic planners must balance between various competing demands. The transition towards a more sustainable mobility ecosystem is more urgent than ever in order to meet challenges like climate change and growing urbanization and shape livable environments. At the same time, the demand for mobility and easily accessible means of transport is growing steadily. Everyone expects safe, accessible, fast, and comfortable transportation. Planners are therefore tasked with coming up with reliable transport solutions that are affordable, efficient, and equitable.

In transport planning and the development of advanced mobility systems, forecasting travel behavior and travel demand plays a crucial role. Only if you can estimate how and where people will be traveling in the coming years can you make the right decisions for a future mobility system. Traffic flow modelling and simulation enable planners to understand the current issues in their transportation system, identify opportunities, and forecast and measure the effects of development planning. They serve as the basis for making sound decisions and setting the right framework for the future of transportation.

What is transportation modelling?

A transport model is a detailed digital replica of the complex real-world transport and land use system. It represents the numerous complex travel choices people make, their movement patterns and thus level of demand for travel, as well as the transport system network capacities.

Transportation modelling is not limited to car traffic; it is multimodal. All modes of transport and their interactions can be modelled. This includes bicycles, pedestrians, public transport, new micro-mobility modes, and even air traffic. Transportation models are a kind of digital playground for assessing the impact of different transportation and land use options and identifying how the transport system is likely to perform in the future. Transportation modelling is thus a powerful tool for reliable urban and transport what-if analysis and scenario planning.

What is the purpose of a transport model?

Transport models are the foundation of transportation and traffic planning. Transportation systems involve many components and stakeholders, each with their own perspective and interests. Further, transportation is closely linked to many other aspects of society. Therefore, transportation planning is usually not about finding the ‘one optimal solution’ but about considering a range of possible measures, policies, and external conditions and then suggesting suitable actions for political or commercial decision making. This is called “what-if” analysis, or scenario analysis.

Transportation modelling tools enable the modelling experts to quickly develop different scenarios for a transport network and test them under a range of assumed future demographic or economic conditions.

The question of where people will live and work in the future and how and where they will travel is crucial for planning infrastructure and transport services and for creating a future-proof mobility system. Travel demand models represent all transport-relevant decision processes that make people move. Within a model, future scenarios for population growth, land use, transport networks, and mobility behavior can be built to assess the impact of these changes. This enables planners to determine whether a new highway lane is needed, how the public transportation network should be expanded to meet demand, where new bus terminals or logistics hubs should be sited, or how people's mobility behavior will change with autonomous vehicles.

Transportation modelling enables planners to

  • Develop advanced and future-proof transport strategies and solutions
  • Conduct traffic analyses and forecasts
  • Plan public transport services
  • Set the framework to adapt to new mobility services such as autonomous driving

What are use cases of transport modelling?

Traffic models are used for a wide variety of applications. Here are some examples:

Transportation masterplans & Infrastructure planning

Cities and transportation agencies today face the challenge of creating a mobility infrastructure that satisfies all needs, not only in terms of efficient movement of people and goods but also with regard to planning goals such as safety and sustainability. Transport modelling helps to plan and design new infrastructure while taking future developments into account, so that it remains adaptable to changing demographic, economic or spatial conditions.

Transportation modelling supports:

  • Planning & design of new infrastructures
  • Long term development of transportation infrastructure according to demographic projections and land use development
  • Accessibility to different services and by various modes

Public Transport & Rail planning

How can the public transport network be expanded? Where does a new bus line make sense, and where are new stops needed? Which frequency serves the demand and creates an attractive offer? Transport modelling provides a detailed representation of all modes of public transport such as bus, tram, underground, taxi, rail, and train. It allows planners to design reliable transit services which optimally serve passengers’ needs and allow efficient operations.

Transport models support:

  • Development of lines & timetables for future years (1, 5, 10 years)
  • Fleet planning (long term vehicle procurement), Vehicle allocation
  • Planning for operation of electric buses
  • Planning of services
  • Subnetwork tendering in Public Transit Agencies
  • Allocation of revenues & subsidies to operators by agency
  • Evaluation of fare structures
  • Analysis of passenger counts
  • Rider equity analysis  

Development of transportation policies and regulations

Transportation models provide an important basis for defining framework conditions and regulations in transportation policy, for example when introducing low emission zones or other traffic regulations, or as a basis for efficient traffic management.

New mobility planning

The future of mobility is gearing towards electric and autonomous vehicles.  In addition, ride and vehicle sharing are increasingly important. Transport planners must adapt to these new mobility services and make the necessary changes to serve their community’s needs. How can the charging infrastructure for e-mobility be strategically planned? What impact will autonomous vehicles have on traffic flow and road capacity? How can on-demand and sharing services be planned in such a way that they enrich existing public transport services? Transportation modelling is indispensable for setting the right course for a future-oriented mobility ecosystem.

What is the workflow to develop, maintain and apply a transport model?

For strategic transportation planning there is a relatively clear distinction between model development and model application. Model development is the process of setting up a base model which reproduces the mobility in the planning region at a given time (the base year). This model is built from various data sources, which should all relate to the base year. By adjusting various parameters and inputs, the model is calibrated to match traffic counts and various survey data (such as vehicle counts, public transit passenger boardings, trip distance distributions), which are also collected for the base year. Due to these data and calibration requirements, the base year of the model will often be earlier than the year in which the model is developed.

Once calibrated and approved, this base (year) model can then be used in many different applications to develop projects and test scenarios. This model may then be handed to different agencies or consultants for their project studies.

As the transport system evolves over time, the base model also needs to be maintained and updated in order to remain representative of the model region. Bigger updates to the model will usually also require a recalibration. The frequency of such updates depends on the scope of mobility changes in the region and the project timeline and budget.

What are macroscopic, mesoscopic and microscopic models?

The three terms refer to the level of detail at which traffic, road traffic in particular, is modelled. In macroscopic models, traffic is modelled as a flow, similar to a fluid, generating outputs such as fractional volumes on links and turns. Macroscopic models can be used to assess traffic in large scale networks, at the expense of simulation detail. In contrast, microscopic simulation models provide a detailed simulation of individual vehicles, with their acceleration, deceleration, and precise movements along links and through intersections. The output of microscopic models is therefore a set of detailed trajectories of individual vehicles. The higher computational requirements render them less applicable to large scale networks. Mesoscopic models (or simulation-based assignment models) combine aspects of both, simulating traffic in large scale networks through simplified vehicle movement models that omit aspects like acceleration or deceleration. Mesoscopic models provide enough detail for the assessment of traffic management measures and similar applications, while still being applicable to large scale networks.

What methods are used for transportation modelling?

Depending on the required level of detail and accuracy, the forecast period, available input data, resources and know how, different mathematical approaches can be used. Historically, an aggregate methodology referred to as the 4-step-model, or trip-based model, has been most used. Recently, more detailed disaggregate approaches referred to as activity-based models or agent-based models (ABM) have been implemented in many locations. Both approaches, and some other model types are explained in the following:

The classical 4-step travel demand modelling process

The 4-step-process is an established methodology for urban, regional and national travel demand modelling. The aggregate transportation planning model comprises four steps related to travel choices.

1. Trip generation –  how many trips are generated?

The first step in the four-step transportation planning process deals with the question of how many trips originate in or are destined for a particular travel analysis zone (TAZ).  TAZs are neighborhoods in the model area and serve as the source or destination for trips. TAZs are also coded with land use data like the number of households and employment for understanding travel demand.  The trips generated are related to different trip purposes, for example, work, shopping, or leisure. The production and attraction of trips are driven by so-called trip rates, averages based on the number of people in households or the number of vehicles available.

Sometimes, dedicated TAZs are introduced to the model to represent facilities such as airports or large factories which feature special trip production and attraction characteristics. 

The output of the trip generation step is a set of production and attraction values associated with each zone.
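
The trip-rate logic described above can be sketched in a few lines. This is a minimal illustration, not a calibrated model: the zone names, the trip rates per household and per job, and the function name `trip_generation` are all assumptions made for the example.

```python
# Hypothetical daily trip rates (illustrative values, not survey results).
PRODUCTION_RATES = {"work": 1.6, "shopping": 0.9, "leisure": 1.1}  # trips per household
ATTRACTION_RATES = {"work": 1.4, "shopping": 0.5, "leisure": 0.3}  # trips per job

# Land use data coded on each TAZ (hypothetical zones).
zones = {
    "TAZ1": {"households": 1000, "jobs": 200},
    "TAZ2": {"households": 400, "jobs": 1500},
}


def trip_generation(zones):
    """Return productions and attractions per zone and trip purpose."""
    result = {}
    for name, data in zones.items():
        result[name] = {
            "productions": {
                purpose: rate * data["households"]
                for purpose, rate in PRODUCTION_RATES.items()
            },
            "attractions": {
                purpose: rate * data["jobs"]
                for purpose, rate in ATTRACTION_RATES.items()
            },
        }
    return result


generated = trip_generation(zones)
for zone, values in generated.items():
    print(zone, values)
```

In practice, rates would be segmented further (for example by household size or vehicle availability), and productions and attractions are usually balanced so that their totals match.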

2. Trip distribution – where do trips go?

Destination choice is the second component of four-step transportation planning. The trip distribution step matches trip origins with destinations. This is done by weighing the attractiveness of each potential destination against the effort required to get there, such as road distance, travel time, and toll/cost.

The result is that the original demand of a TAZ is split across several destination zones. Depending on the segmentation of the model, multiple distribution matrices may be generated, for example by trip purpose or household income.
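
One common way to implement this weighing is a gravity model. The sketch below assumes a production-constrained form with an exponential deterrence function; the text does not prescribe a specific formulation, and the zone data, the `beta` parameter, and the function name are hypothetical.

```python
import math


def gravity_distribution(productions, attractions, cost, beta=0.1):
    """Split each zone's productions across destinations.

    Each destination is weighted by its attraction times exp(-beta * cost),
    so attractive and easily reachable zones receive more trips.
    """
    zones = list(productions)
    trips = {}
    for i in zones:
        weights = {j: attractions[j] * math.exp(-beta * cost[i][j]) for j in zones}
        total = sum(weights.values())
        trips[i] = {j: productions[i] * weights[j] / total for j in zones}
    return trips


# Hypothetical inputs: output of trip generation and travel times in minutes.
productions = {"A": 1000.0, "B": 500.0, "C": 200.0}
attractions = {"A": 300.0, "B": 900.0, "C": 500.0}
cost = {
    "A": {"A": 5.0, "B": 15.0, "C": 30.0},
    "B": {"A": 15.0, "B": 5.0, "C": 12.0},
    "C": {"A": 30.0, "B": 12.0, "C": 5.0},
}

trips = gravity_distribution(productions, attractions, cost)
for origin, row in trips.items():
    print(origin, {dest: round(value, 1) for dest, value in row.items()})
```

Each row of the result sums to the productions of its origin zone. A doubly constrained variant would additionally iterate until column sums match the attractions.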

3. Mode Choice – what travel mode is used for each trip?

In the third step, trips between the TAZs are allocated to different transportation modes. Which mode of transport people use depends on their preferences and on aspects of their household or person, such as car ownership. Other factors such as travel time, cost, parking availability, and number of transfers for transit have an additional influence on the modal split. These variables and parameters are typically incorporated into a logit model to calculate the split of demand across the modes.

As an output of the mode choice procedure, the trip matrices from the distribution step are further refined into trip matrices per mode.
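
A multinomial logit split for a single origin-destination pair might look like the sketch below. The mode constants, the time and cost coefficients, and the level-of-service values are invented for illustration; real models estimate them from survey data.

```python
import math

# Hypothetical coefficients (assumptions, not estimated values).
B_TIME = -0.05  # utility per minute of travel time
B_COST = -0.2   # utility per currency unit

# Hypothetical level of service for one origin-destination pair.
MODES = {
    "car": {"constant": 0.0, "time": 20.0, "cost": 4.0},
    "transit": {"constant": -0.5, "time": 30.0, "cost": 2.0},
    "bike": {"constant": -1.0, "time": 35.0, "cost": 0.0},
}


def utility(attrs):
    """Linear utility of one mode."""
    return attrs["constant"] + B_TIME * attrs["time"] + B_COST * attrs["cost"]


def logit_shares(utilities):
    """Multinomial logit probabilities, shifted by the maximum for stability."""
    top = max(utilities.values())
    exp_u = {mode: math.exp(u - top) for mode, u in utilities.items()}
    total = sum(exp_u.values())
    return {mode: value / total for mode, value in exp_u.items()}


shares = logit_shares({mode: utility(attrs) for mode, attrs in MODES.items()})

# Split one cell of the distribution matrix into trips per mode.
od_trips = 1000.0
mode_trips = {mode: od_trips * share for mode, share in shares.items()}
print({mode: round(value, 1) for mode, value in mode_trips.items()})
```

Applying the same split to every cell of the distribution matrix yields one trip matrix per mode.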

4. Trip Assignment – what is the route of each trip?

In the assignment step, the trips between an origin and destination by a particular mode are ‘assigned’ to a specific path. This means that the trip matrices from the prior steps are used as an input to assign route flows to the actual transportation network. As a result, traffic volumes for road segments (or links) and transit vehicle loads are generated, and these are often analyzed further.

There are different network assignment procedures for different types of transport modes:

Road traffic assignment

For road-based traffic by cars, heavy goods vehicles, etc., which are constrained by road capacity, iterative equilibrium network assignment procedures are applied. The distribution of traffic to different routes in these procedures is driven by the observation that the actual travel speeds on roads decrease with the amount of traffic on the road in relation to the capacity (the saturation level). This is expressed by volume-delay-functions (or capacity-restraint-functions). With increasing traffic load and decreasing travel speeds on primary roads, road users shift to secondary, faster routes.

The assignment procedures iteratively shift fractions of travel demand between different routes, until all routes allocated to each pair of origin-destination zones experience the same (or very similar) travel time (or generalized cost). This balancing is done for all pairs at once, converging to a network-wide equilibrium state, called the Wardrop equilibrium. Due to the vast number of routes considered, the equilibrium is never met exactly. A gap measure is used to indicate the level of convergence reached in the assignment process. Good convergence of the base model is essential for transportation planning because, with bad convergence, it is impossible to distinguish scenario effects from random assignment artifacts/noise in later model applications.
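
The iterative balancing can be illustrated on a toy network with two parallel routes between one origin-destination pair. The sketch assumes a BPR-type volume-delay function and the method of successive averages (MSA) as the shifting rule; both are common choices rather than anything prescribed by the text, and the route data and gap definition are illustrative.

```python
def bpr_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
    """BPR-type volume-delay function: travel time grows with saturation."""
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)


def msa_assignment(demand, routes, iterations=2000):
    """Shift demand between routes until their travel times are nearly equal.

    routes is a list of (free_flow_time, capacity) tuples.
    """
    volumes = [0.0] * len(routes)
    for n in range(iterations):
        times = [bpr_time(t0, v, cap) for (t0, cap), v in zip(routes, volumes)]
        best = times.index(min(times))
        step = 1.0 / (n + 1)  # MSA step size shrinks with every iteration
        volumes = [
            v + step * ((demand if r == best else 0.0) - v)
            for r, v in enumerate(volumes)
        ]
    times = [bpr_time(t0, v, cap) for (t0, cap), v in zip(routes, volumes)]
    total_cost = sum(v * t for v, t in zip(volumes, times))
    # One common gap definition: excess cost relative to everyone using the fastest route.
    gap = (total_cost - demand * min(times)) / total_cost
    return volumes, times, gap


# Hypothetical routes: a primary road (10 min, 2000 veh/h) and a secondary road (15 min, 1500 veh/h).
routes = [(10.0, 2000.0), (15.0, 1500.0)]
volumes, times, gap = msa_assignment(3000.0, routes)
print([round(v) for v in volumes], [round(t, 2) for t in times], gap)
```

With these inputs the primary road carries most of the demand, and the remainder shifts to the secondary road once congestion has raised the primary road's travel time to about the same level.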

Public Transport assignment

The process of assigning public transport trips works differently from road assignments. Public transit networks consist of distinct transit lines with specific service frequencies and possible waiting times at stops. The transit network can only be entered at specific stops; therefore access and egress, usually by walking, are required. For a given origin-destination zone pair, there may not be a direct connection, so one or multiple transfers may be required. Furthermore, transit fares are considered in the route choice as well.

There are different factors that influence the journey experience, such as travel time, number of transfers, waiting times, or access and egress time. Within a public transit assignment, these factors are considered in a choice model.

Various trip connection alternatives are derived from the public transit network and the timetable. Trips are then distributed to these alternatives based on traveler preferences, and the resulting boardings and volumes for the public transit network, lines, stops, and vehicles are available for analysis.
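
As a rough sketch of such a choice model, the example below converts each connection into a single perceived impedance and distributes trips with a logit rule. The perception weights, the transfer penalty, the fare conversion, and the two connections are hypothetical; actual software packages use their own impedance definitions and distribution rules.

```python
import math

# Hypothetical perception weights (assumptions for illustration).
WEIGHTS = {"in_vehicle": 1.0, "wait": 2.0, "access_egress": 1.5}
TRANSFER_PENALTY_MIN = 5.0   # perceived minutes per transfer
MINUTES_PER_FARE_UNIT = 2.0  # converts fare into equivalent minutes


def impedance(conn):
    """Perceived journey time of one connection, in equivalent minutes."""
    return (
        WEIGHTS["in_vehicle"] * conn["in_vehicle"]
        + WEIGHTS["wait"] * conn["wait"]
        + WEIGHTS["access_egress"] * conn["access_egress"]
        + TRANSFER_PENALTY_MIN * conn["transfers"]
        + MINUTES_PER_FARE_UNIT * conn["fare"]
    )


def distribute(trips, connections, scale=0.1):
    """Logit distribution of trips across connections by impedance."""
    imp = {name: impedance(conn) for name, conn in connections.items()}
    lowest = min(imp.values())
    weights = {name: math.exp(-scale * (value - lowest)) for name, value in imp.items()}
    total = sum(weights.values())
    return {name: trips * weight / total for name, weight in weights.items()}


connections = {
    "direct_bus": {"in_vehicle": 35, "wait": 5, "access_egress": 8, "transfers": 0, "fare": 2.5},
    "bus_plus_rail": {"in_vehicle": 22, "wait": 8, "access_egress": 10, "transfers": 1, "fare": 3.0},
}
loads = distribute(400.0, connections)
print({name: round(value, 1) for name, value in loads.items()})
```

Although the bus-plus-rail connection is faster in the vehicle, the transfer, the longer wait, and the higher fare give it a slightly higher impedance, so it attracts somewhat fewer trips.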

Active modes / Bike assignment

Although active transport modes such as bikes often use the same road infrastructure as cars, the route choice of cyclists is influenced by other factors than that of drivers. Travel speeds of bicycles are not subject to the capacity effect in the way that travel speeds of cars are. Instead, cyclists are more sensitive to features such as slopes, paving, traffic lights, striping, etc. Therefore, choice models reflecting these aspects are applied for assigning active mode travel to different route alternatives.

Disaggregate Activity-based / Agent-based models (ABM)

In contrast to aggregate trip-based models, disaggregate models model people and/or households individually and often with more precise home and activity locations. Since individual data is not available for the entire population, a ‘synthetic population’ of households and persons is generated from statistical data and distributions of key variables such as household income and person age.
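
The idea of a synthetic population can be sketched by drawing household attributes from marginal distributions. The category shares below are invented, and sampling each attribute independently is a deliberate simplification: production tools typically fit joint distributions to census margins (for example with iterative proportional fitting) so that correlations between attributes are preserved.

```python
import random

# Hypothetical marginal distributions for one zone (assumed values).
INCOME_SHARES = {"low": 0.3, "medium": 0.5, "high": 0.2}
SIZE_SHARES = {1: 0.28, 2: 0.35, 3: 0.15, 4: 0.22}


def synthesize_households(n, seed=42):
    """Draw n synthetic households; attributes are sampled independently."""
    rng = random.Random(seed)
    incomes = rng.choices(list(INCOME_SHARES), weights=list(INCOME_SHARES.values()), k=n)
    sizes = rng.choices(list(SIZE_SHARES), weights=list(SIZE_SHARES.values()), k=n)
    return [
        {"id": i, "income": income, "size": size}
        for i, (income, size) in enumerate(zip(incomes, sizes))
    ]


households = synthesize_households(1000)
medium_share = sum(h["income"] == "medium" for h in households) / len(households)
print(households[:3], round(medium_share, 3))
```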

The general choice model structure applied in ABM is very flexible. The person and household attributes attached to each individual, as well as their previous decisions, can be considered in each subsequent choice step. This allows a more realistic representation of their mobility during the simulation day, as household context variables (e.g. family car usage), long-term decisions (e.g. car ownership), and tour variables (e.g. drove to work) can be used to estimate travel decisions more precisely. The representation of time and space is typically more precise as well, which makes estimates of tolled/priced travel and active mode travel more accurate.

ABM models generate daily activity plans covering each person’s relevant activities along with their location, timing, mode, and route (in some cases).  This synthetic travel diary provides much higher spatial and temporal resolution of model outputs for analysis. As the results are generated as individual tours, trips, and activities, they are easier for non-experts to understand than the traditional fractional traffic flows generated by aggregate models. On the other hand, setting up and calibrating ABM models is more complex than aggregate models.

Other modelling methods

Aggregate tour-based / activity-based models

While maintaining the spatial aggregation and segmentation of the classical 4-step models, aggregate tour-based models consider tours of individuals spanning multiple activities at different locations. They incorporate some aspects of ABM models and some aspects of the traditional aggregate models.

Incremental / pivot-point models

Pivot-point models resemble simple growth models, which relate growth to a relevant variable (such as a change in ticket price), in that they estimate changes in travel demand from changes in travel cost. But instead of simply applying a fixed elasticity related to one or a few selected variables, they typically reflect the complex choices travelers make in an incremental logit model. Because this approach can take into account the various variables influencing travel choices, it has been widely adopted in transportation planning, e.g. for project appraisal. Some national guidelines for project appraisal (e.g. the UK TAG guideline (Transport, 2022)) provide detailed instructions and model parameters for the workflow.
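
The standard incremental logit formula pivots from observed base shares and needs only the change in utility of each mode: the new share of a mode is its base share times exp(change in utility), divided by the sum of that product over all modes. The sketch below applies it to a hypothetical fare increase; the base shares and the cost coefficient are assumed values.

```python
import math


def incremental_logit(base_shares, delta_utility):
    """Pivot from base shares using only changes in utility per mode."""
    weighted = {
        mode: share * math.exp(delta_utility.get(mode, 0.0))
        for mode, share in base_shares.items()
    }
    total = sum(weighted.values())
    return {mode: value / total for mode, value in weighted.items()}


# Hypothetical observed mode shares in the base situation.
base = {"car": 0.6, "transit": 0.3, "bike": 0.1}

# Assumed cost coefficient of -0.2 per currency unit and a fare increase of 0.5.
new_shares = incremental_logit(base, {"transit": -0.2 * 0.5})
print({mode: round(share, 4) for mode, share in new_shares.items()})
```

Because only utility differences enter the formula, modes whose conditions do not change keep their relative proportions to one another.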

Multimodal modelling

For many travelers, mobility is not limited to using a single mode of transportation for a trip.  Instead, they may take their private car to reach a park & ride facility, use public transit to get to the city center and then continue their journey by e-scooter to reach their destination. Other examples are car sharing or ride sharing systems, which are operated by cars, but to the user they appear much like a public transit mode. For multimodal modelling these special requirements and framework conditions need to be taken into account.

How is freight transport and commercial traffic considered?

Freight transport, as well as commercial and service activities, generates a large share of the overall traffic volume. Due to the large variety of operations involved, and the heterogeneity and complexity of logistics chains, modelling commercial and freight transportation is less standardized. Limited availability of input data and the difficulty of calibrating such models restrict their wider application. Many smaller scale models (for urban areas) only roughly assess commercial and freight traffic. More complex, bespoke models are often built on the national scale for assessing freight transport based on internal and external supply chains.

How are aggregated transportation models structured?

Spatial model structure: what are travel analysis zones (TAZ)?

The core principle of aggregate models is the spatial dissection of the study area into travel analysis zones (TAZ). These zones seamlessly cover the study area and are often associated with statistical units (communities, census tracts, …) where statistical data are collected. Most inputs and outputs are aggregated to these TAZ and can be prepared and analyzed with GIS tools. Since TAZs are the core units of the model computations, many outputs such as network level-of-service indicators, called skims (travel time, cost, number of transfers in public transport, etc.), and travel demand flows (person trips, etc.) between the TAZ are generated in the form of square matrices. The rows are the origin zones and the columns are the destination zones.
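
The row and column convention can be shown with a small example. The three zones, the demand matrix, and the travel time skim below are hypothetical.

```python
zones = ["A", "B", "C"]

# Demand matrix in person trips: rows are origins, columns are destinations.
demand = [
    [0, 120, 80],
    [100, 0, 50],
    [60, 40, 0],
]

# Travel time skim in minutes, same row and column order.
time_skim = [
    [0, 15, 25],
    [15, 0, 12],
    [25, 12, 0],
]

# Row sums give trips leaving each zone, column sums give trips arriving.
origins = {z: sum(demand[i]) for i, z in enumerate(zones)}
destinations = {z: sum(row[j] for row in demand) for j, z in enumerate(zones)}

# Combining both matrices yields a demand-weighted average travel time.
total_trips = sum(origins.values())
weighted_time = sum(
    demand[i][j] * time_skim[i][j]
    for i in range(len(zones))
    for j in range(len(zones))
) / total_trips

print(origins, destinations, round(weighted_time, 2))
```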

Population segmentation

People have different travel patterns depending on their life situation and other aspects. An employee, for example, covers different road routes than a student or retiree. People in urban areas have different travel options than people in rural areas. In 4-step-models, this is reflected by segmenting the total population of the TAZ into different groups or ‘demand segments’. The number of segments and the characteristics considered for the segmentation (e.g. age, employment status, car availability) depend on model scope, budget, and data availability. For each segment, individual parameters for trip characteristics by purpose, mode preference, value-of-time etc. can then be applied in the modelling steps.

Consideration of time – temporal scope

Most travel demand models are used for strategic considerations. The planning horizon is in the range of years. In this context, it is often sufficient to consider average daily traffic volumes. Thus, many strategic models are designed to model traffic volumes and trips per average day. As travel patterns tend to have a high degree of temporal clustering, and because transportation infrastructure needs to be designed to meet peak demands, separate models for peak hour traffic estimates are often created. If models are used for more operational studies, e.g. of traffic management strategies, then a higher temporal resolution of demand, e.g. to hourly values, may be required.

What datasets are used for building transportation models?

For setting up transportation models, existing datasets are processed and combined wherever possible. In general, several different types of input data are required to build a model.

Transportation networks and supply data

Transportation networks form the base for any transportation model. Road networks can be extracted from navigation databases or from GIS datasets. Some processing may be required to provide essential attributes specific to transportation models, such as capacity. Public transit networks and timetables are often available in common formats such as GTFS from scheduling systems. Other networks, e.g. for cycling, sea transport or air traffic, may be available from GIS datasets or other online sources. Although not required for most calculations, it is best if the modal networks can be merged into a single multi-modal network, e.g. by mapping bus lines to the road network. This enables multimodal analysis and consideration of interactions between modes.

Land use, demographic, and economic data

Land use data, demographic variables (population, age, employment, income, …) and other data (workplaces, school places, …) are needed to assess the origins and destination of travel demand. Much of this data is available from census data and land use monitoring in databases or GIS formats. For aggregated models, this information is usually condensed to the TAZ level, while ABM models preserve the individual activity locations, household locations and synthetic population.

Travel behaviour parameters

Most calculations in the transportation modelling process – regardless of the model type – are based on parameters describing human travel behavior. These parameters can be estimated from household travel diary surveys or estimated by statistical methods from other datasets such as mobile phone data.

Observed control data

Although they are not needed for the model calculation itself, control datasets are important for the model calibration process. These can be observed traffic volumes from manual or automated vehicle counts, transit passenger counts from automated count systems or ticketing systems, or similar. Observed distributions of actual trip distances, travel times etc. are also useful for calibrating demand models and often come from household travel surveys or Big Data.

Defining the model sequence

The model sequence describes the sequence of calculations, data processing steps and output generation which is executed to ‘run’ the model or to assess a scenario. Depending on the software used for modelling, the sequence, along with the calculation parameters, may be defined through a graphical user interface or through a scripting / programming language.  In many cases, travel demand models implement feedback loops, so sub-parts of the model sequence will be executed several times during a single model run. Complex models or specific calculations may even launch external software tools for additional components.

Model calibration

To be used as a planning tool, transportation models must achieve an accurate replication of travel patterns in the base year. Only thereby can future scenarios be assessed in a meaningful way. To achieve this, model results are compared to observed data, usually at least vehicle counts, public transport passenger counts and/or trip distance and travel time distributions. Based on these comparisons, model parameters and other aspects are adjusted until the calibration requirements are met (e.g., max +/- 5% deviation at 85% of the count locations). Calibration is often a time-consuming process, requiring expert knowledge and experience. A number of automated procedures can be applied to automatically adjust model components to match observed counts, e.g. by adding constants to utility functions or by automated a posteriori matrix adjustment. While speeding up the calibration process, these methods can have issues regarding model expressiveness and can lead to model overfitting.
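
A calibration requirement of the kind quoted above (a maximum deviation of 5% at 85% of the count locations) can be checked as in the following sketch. The count locations and volumes are invented, and real calibration guidelines often use additional or different statistics.

```python
def calibration_check(counts, tolerance=0.05, required_share=0.85):
    """Return the share of count locations within tolerance and a pass flag.

    counts is a list of (location, observed, modelled) tuples.
    """
    within = [
        abs(modelled - observed) / observed <= tolerance
        for _, observed, modelled in counts
    ]
    share = sum(within) / len(within)
    return share, share >= required_share


# Hypothetical count locations: (name, observed volume, modelled volume).
counts = [
    ("L1", 1000, 1030),
    ("L2", 2500, 2400),
    ("L3", 800, 900),
    ("L4", 4000, 4150),
]

share, passed = calibration_check(counts)
print(share, passed)
```

Here three of the four locations are within 5%, so the share of 75% falls short of the 85% requirement and further parameter adjustment would be needed.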

What outputs and results are generated by transportation models?

Transport models provide various results on different levels of detail and segmentation. The most frequently used outputs are traffic volumes for different modes of transport on the links. These provide direct insights into local traffic impacts, e.g. on noise and emission levels. For public transit planning, comparable outputs are volumes on transit lines or individual services, or boarding, alighting and transferring at individual stops. The assignments generate full paths from trip origin to trip destination through the network, along with the respective volumes, allowing detailed analysis of which travelers use each link. This is very helpful, e.g. when planning roadworks and bypasses.

For more comprehensive scenario assessments, a multitude of global KPIs can be provided for the full network, or for subsections like certain relations or destinations. For studies regarding decarbonization, these can be metrics such as total mileage or congestion length. For transit studies, KPIs like total ridership, average travel time or number of transfers, estimated fare revenues, and operating cost are provided. The various network skim matrices generated during the model computation provide valuable insights on issues related to accessibility and connectivity.

What tools can be used for analyzing transport model outputs?

Many of the outputs produced by transportation models can easily be analyzed in tables, charts, or GIS maps. For visualizing link volumes, 2D or 3D bar maps are a popular rendering.

The richness and structure of outputs from transportation models allows for much more sophisticated analysis. Only a few of the many available tools can be described in the following:

  • One of the most important tools is the ‘flow bundle’ or ‘select link’ analysis. This extracts all the paths traversing selected network elements. Analysts can assess which travelers will be affected by changes to these elements.
  • Isochrone calculations provide insights to accessibility based on detailed information on travel times and network connectivity contained in a transport model.
  • The information captured in network skim matrices and trip demand matrices can be visualized and aggregated to identify demand flows and overall travel patterns.
  • The time profiles of activities at selected locations can be investigated, e.g. charging of electric vehicles at charging facilities or the presence of visitors at a shopping center.
  • Crowding profiles and transfer locations of transit lines or signalization coordination in green bands can be analyzed in specialized time-space diagrams.
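
The ‘select link’ idea from the first item in the list can be sketched as a filter over assigned paths. The path records and link names below are hypothetical; modelling software stores paths in its own internal structures.

```python
# Hypothetical assignment result: one record per used path.
paths = [
    {"origin": "A", "destination": "C", "links": ["A-B", "B-C"], "volume": 120.0},
    {"origin": "A", "destination": "C", "links": ["A-D", "D-C"], "volume": 60.0},
    {"origin": "B", "destination": "D", "links": ["B-C", "C-D"], "volume": 45.0},
]


def select_link(paths, link):
    """Return the paths traversing a link and their volumes per OD pair."""
    selected = [p for p in paths if link in p["links"]]
    od_volumes = {}
    for p in selected:
        key = (p["origin"], p["destination"])
        od_volumes[key] = od_volumes.get(key, 0.0) + p["volume"]
    return selected, od_volumes


selected, od_volumes = select_link(paths, "B-C")
print(od_volumes)
```

The result shows which origin-destination relations would be affected if link B-C were closed or changed.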

How are transportation models connected to other planning activities and applications?

Mobility is at the heart of human activity. That’s why transportation planning never stands alone.   It is linked to many other urban planning activities and provides invaluable inputs to the respective processes and tools. Some examples are:

Air quality assessment and climate change

Road traffic is responsible for a large share of carbon emissions and other pollutants affecting air quality in cities. Many transportation planning projects therefore aim to reduce those emissions, e.g. by introducing Low Emission Zones in city centers. Transportation models compute the traffic volumes resulting from such measures, while also indicating unwanted side effects like overall increases in trip lengths. The combination with emission models enables planners, for example, to assess different fleet compositions of electric and combustion engine vehicles.

Noise evaluations

Similar to air quality, traffic volumes computed by transportation models can also feed into noise emission models. Based on detailed models of the built-up environment etc., planners can assess the noise exposure and possible reductions within dedicated tools.

Accessibility and equity

Access to different types of services like health care, education, or groceries is strongly dependent on transportation – while not all mobility alternatives are equally suitable for all people. The detailed representation of all modes and the segmentation of transportation models for different population types allow concise planning for equity in service provisioning.

Public Transit, Rail and Ride Sharing Fleet planning

The procurement of vehicles for public transit, rail services or ride sharing offers requires huge financial investments and has long lead times. Therefore, implementing or adjusting such services is usually not a short-term matter but requires proper advance planning. In particular, the conversion of gasoline bus fleets to e-bus fleets requires careful assessment of the different operational concepts (overnight charging, opportunity charging etc.) and the needed charging infrastructure. Transportation models can be used to evaluate the fleet of vehicles required to operate a planned service or to assess how different types of vehicles will perform in the planned service.

Roadworks planning

When infrastructure needs to be maintained or replaced, or other roadworks affect the available road capacity, transportation models help to assess the relocation of traffic flows and to ensure sufficient capacity and smooth operation on the alternate routes.

Land use & energy grids

When zoning systems and land use of a city or region are designed, transportation systems need to be adjusted. Transportation models play a major role to support this process. With the shift towards electric mobility, other parts of the urban infrastructure such as the electric grid also may need adjustment. Transportation models provide key insights for helping to dimension these assets.

Road safety

Road safety plays a major role in providing livable cities. Transportation models can help in identifying and analyzing accident hot spots and designing network alternatives for mitigation.


National Academies Press: OpenBook

Travel Demand Forecasting: Parameters and Techniques (2012)

Chapter 3: Data Needed for Modeling


3.1 Introduction

Many data are required for model development, validation, and application. This chapter briefly describes the data used for these functions. Model application data primarily include socioeconomic data and transportation networks. These data form the foundation of the model for an area, and if they do not meet a basic level of accuracy, the model may never adequately forecast travel. When preparing a model, it is wise to devote as much attention as necessary to developing and assuring the quality of input data for both the base year and for the forecast years. This chapter provides an overview of primary and secondary data sources and limitations of typical data.

3.2 Socioeconomic Data and Transportation Analysis Zones

Socioeconomic data include household and employment data for the modeled area and are usually organized into geographic units called transportation analysis zones (TAZs, sometimes called traffic analysis zones or simply zones). Note that some activity-based travel forecasting models operate at a more disaggregate level than the TAZ (for example, the parcel level); however, the vast majority of models still use TAZs. The following discussion of data sources is applicable to any level of model geography.

TAZ boundaries are usually major roadways, jurisdictional borders, and geographic boundaries and are defined by homogeneous land uses to the extent possible. The number and size of TAZs can vary but should generally obey the following rules of thumb when possible:

  • The number of residents per TAZ should be greater than 1,200, but less than 3,000;
  • Each TAZ should yield less than 15,000 person trips per day; and
  • The size of each TAZ should be from one-quarter to one square mile in area.

The TAZ structure in a subarea of particular interest may be denser than in other areas further away. It is important that TAZs are sized and bounded properly (Cambridge Systematics, Inc. and AECOM Consult, 2007).

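
The three rules of thumb for TAZ size can be turned into a simple screening check. The thresholds come from the text above; the function name and the example zone values are hypothetical.

```python
def check_taz(residents, person_trips_per_day, area_sq_mi):
    """Return a list of rule-of-thumb violations for one zone (empty if none)."""
    issues = []
    if not 1200 < residents < 3000:
        issues.append("residents outside the 1,200 to 3,000 range")
    if not person_trips_per_day < 15000:
        issues.append("15,000 or more person trips per day")
    if not 0.25 <= area_sq_mi <= 1.0:
        issues.append("area outside one-quarter to one square mile")
    return issues


# Hypothetical zones: (residents, person trips per day, area in square miles).
print(check_taz(2100, 9500, 0.6))   # satisfies all three rules
print(check_taz(4500, 16200, 2.3))  # violates all three rules
```

Since these are rules of thumb to be followed "when possible", a violation would flag a zone for review rather than rule it out.
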
In general, there is a direct relationship between the size and number of zones and the level of detail of the analysis being performed using the model; greater detail requires a larger number of zones, where each zone covers a relatively small land area. TAZs are typically aggregations of U.S. Census geographic units (blocks, block groups, or tracts with smaller units preferred), which allows the use of census data in model development. To facilitate the use of U.S. Census data at the zonal level, an equivalency table showing which zones correspond with which census units should be constructed. Table 3.1 provides a brief example of such a table.

Once the zone system is developed and mapped and a census equivalency table is constructed, zonal socioeconomic data can be assembled for the transportation planning process. Estimates of socioeconomic data by TAZ are developed for a base year, usually a recent past year for which necessary model input data are available and are used in model validation. Forecasts of socioeconomic data for future years must be developed by TAZ and are estimated based on future land use forecasts prepared either using a manual process or with the aid of a land use model. As a key input to the travel demand model, the accuracy of socioeconomic forecasts greatly affects the accuracy of a travel demand forecast.

3.2.1 Sources for Socioeconomic Data

Data availability and accuracy, the ability to make periodic updates, and whether the data can be reasonably forecast into the future are the primary criteria in determining what data will be used in a model.¹ With that consideration and the understanding that in some cases it may be an objective to gather base year data for other planning purposes, the following sources should be evaluated. In general, population and household data come from the U.S. Census Bureau and employment data from the Bureau of Labor Statistics (BLS, part of the United States Department of Labor), as well as their equivalent state and local agencies. Many of the programs are collaborations between the two federal agencies. Socioeconomic input data are also available from a number of private vendors.

Population and Households

Four major data sources for population and household information are described in this subsection: decennial U.S. Census, American Community Survey (ACS), ACS Public Use Microdata Samples (PUMS), and local area population data.

Decennial U.S. Census. The decennial census offers the best source for basic population and household data, including age, sex, race, and relationship to head of household for each individual. The census also provides data for housing units (owned or rented). These data are available at the census block level and can be aggregated to traffic zones. The decennial census survey is the only questionnaire sent to every American household with an identifiable address. The 2010 Census is the first since 1940 to exclude the “long form.” Previously, approximately one in every six households received the long form, which included additional questions on individual and household demographic characteristics, employment, and journey-to-work. The absence of the long form means that modelers must obtain these data (if available) from other sources, such as the American Community Survey (see below).

American Community Survey. The ACS has replaced the decennial census long form.
Information such as income, education, ethnic origin, vehicle availability, employment status, marital status, disability status, housing value, housing costs, and number of bedrooms may be obtained from the ACS. The ACS content is similar to the Census 2000 long form, and questions related to commuting are about the same as for the long form, but the design and methodology differ. Rather than surveying about 1 in every 6 households once every 10 years, as had been done with the long form, the ACS samples about 1 in every 40 addresses every year, or 250,000 addresses every month. The ACS uses household addresses from the Census Master Address File that covers the entire country each year. The ACS thus samples about 3 million households per year, translating into a less than 2.5 percent sample per year. As a result of the smaller sample size, multiple years are required to accumulate sufficient data to permit small area tabulation by the Census Bureau in accordance with its disclosure rules. Table 3.2 highlights the ACS products, including the population and geography thresholds associated with each period of data collection.

The sample size for the ACS, even after 5 years of data collection, is smaller than the old census long form. Thus, ACS’s 5-year estimates have margins of error about 1.75 times as large as those associated with the 2000 Census long form estimates, and this must be kept in mind when making use of the data. AASHTO and the FHWA offer Internet resources providing additional detail on ACS data and usage considerations.

ACS Public Use Microdata Samples. The Census Bureau produces the ACS PUMS files so that data users can create custom tables that are not available through pretabulated data products (U.S. Census Bureau, 2011a). The ACS PUMS files are a set of untabulated records about individual people or housing units. PUMS files show the full range of population and housing unit responses collected on individual ACS questionnaires.
For example, they show how respondents answered questions on occupation, place of work, etc. The PUMS files contain records for a subsample of ACS housing units and group quarters persons, with information on the characteristics of these housing units and group quarters persons plus the persons in the selected housing units. The Census Bureau produces 1-year, 3-year, and 5-year ACS PUMS files. The number of housing unit records contained in a 1-year PUMS file is about 1 percent of the total in the nation, or approximately 1.3 million housing unit records and about 3 million person records. The 3-year and 5-year ACS PUMS files are multiyear combinations of the 1-year PUMS files with appropriate adjustments to the weights and inflation adjustment factors. They typically cover large geographic areas with a population greater than 100,000 [Public Use Microdata Areas (PUMAs)] and, therefore, have some limits in application for building a socioeconomic database for travel forecasting, but can be helpful because of the detail included in each record. PUMS data are often used as seed matrices in population synthesis to support more disaggregate levels of modeling (such as activity-based modeling). PUMS users may also benefit from looking at Integrated PUMS (IPUMS), which makes PUMS data available for time series going back over decades with sophisticated extract tools.

Table 3.1. Example TAZ to Census geography equivalency table.

TAZ | Census Block
101 | 54039329104320
101 | 54039329104321
101 | 54039329104322
102 | 54039329104323
102 | 54039329104324

Source: Martin and McGuckin (1998).

¹ The explanatory power of a given variable as it relates to travel behavior must also be considered; however, such consideration is subordinate to the listed criteria. A model estimated using best-fit data that cannot be forecast beyond the base year, for example, provides little long-term value in forecasting.

Local area population data. Some local jurisdictions collect and record some type of population data. In many metropolitan areas, the information is used as base data for developing cooperative population forecasts for use by the MPO as travel model input.

Employment

Obtaining accurate employment data at the TAZ level is highly desirable but more challenging than obtaining household data for a number of reasons, including the dynamic nature of employment and retail markets; the difficulty of obtaining accurate employee data at the site level; and lack of an equivalent control data source, such as the U.S. Census, at a small geographic level. Six potential sources of data are discussed in this subsection.

Quarterly Census of Employment and Wages. Previously called ES-202 data, a designation still often used, the Quarterly Census of Employment and Wages (QCEW) provides a quarterly count of employment and wages at the establishment level (company names are withheld due to confidentiality provisions), aggregated to the county level and higher (state, metropolitan statistical area). Data are classified using the North American Industry Classification System (NAICS). The QCEW is one of the best federal sources for at-work employment information.
State employment commissions. State employment commissions generally document all employees for tax purposes. Each employer is identified by a federal identification number, number of employees, and a geocodable address usually keyed to where the payroll is prepared for the specified number of employees.

Current Population Survey. The Current Population Survey (CPS) is a national monthly survey of about 50,000 households to collect information about the labor force. It is a joint project of the Census Bureau and the BLS. The CPS may be useful as a comparison between a local area’s labor force characteristics and national figures.

Market research listings. Many business research firms (e.g., Infogroup, Dun and Bradstreet, etc.) sell listings of all (or major) employers and number of employees by county and city. These listings show business locations by street addresses, as well as post office boxes.

Longitudinal Employer–Household Dynamics. Longitudinal Employer–Household Dynamics (LEHD) (U.S. Census Bureau, 2011b) is a program within the U.S. Census Bureau that uses statistical and computational techniques to combine federal and state administrative data on employers and employees with core Census Bureau censuses and surveys. LEHD excludes some employment categories, including self-employed and federal workers, and data are not generated for all states (i.e., Connecticut, Massachusetts, and New Hampshire as well as the District of Columbia, Puerto Rico, and the U.S. Virgin Islands as of September 2011). Users of LEHD should also be mindful of limitations with the methodology used to assemble the data, including the use of Minnesota data as the basis for matching workers to workplace establishments and the match (or lack of match) with Census Transportation Planning Products (discussed below). Murakami (2007) provides an examination and discussion of LEHD issues for transportation planners. The LEHD Quarterly Workforce Indicators (QWI) report is a useful source for modelers, particularly as a complement to the QCEW.

Table 3.2. ACS data releases. The last four columns give the years covered, by planned year of release.

Data Product | Population Threshold | Geographic Threshold | 2010 Release | 2011 Release | 2012 Release | 2013 Release
1-year estimates | 65,000+ | PUMAs, counties, large cities | 2009 | 2010 | 2011 | 2012
3-year estimates | 20,000+ | Counties, large cities | 2007–2009 | 2008–2010 | 2009–2011 | 2010–2012
5-year estimates | All areas* | Census tracts, block groups in summary file format | 2005–2009 | 2006–2010 | 2007–2011 | 2008–2012

*5-year estimates will be available for areas as small as census tracts and block groups.
Source: U.S. Census Bureau.

Local area employment data. Few areas record employment data other than a broad listing of major employers with the highest number of employees locally, typically reported by a local chamber of commerce or similar organization.

Special Sources

Census Transportation Planning Products. Previously called the Census Transportation Planning Package, the Census Transportation Planning Products (CTPP) Program (AASHTO, 2011) is an AASHTO-sponsored data program funded by member state transportation agencies and operated with support from the FHWA, Research and Innovative Technology Administration, FTA, U.S. Census Bureau, MPOs, state departments of transportation (DOTs), and the TRB. CTPP includes tabulations of interest to the transportation community for workers by place of residence, place of work, and for flows between place of residence and place of work. CTPP are the only ACS tabulations that include flow information. Examples of special dimensions of tabulation include travel mode, travel time, and time of departure. CTPP are most frequently used as an observed data source for comparison during model validation, but are sometimes used as a primary input in model development, particularly in small areas where local survey data are unavailable.

The previous CTPP tabulations were based on the decennial census long form. The CTPP 2006 to 2008 is based on the ACS and is available at the county or place level for geography meeting a population threshold of 20,000. The CTPP 2006 to 2010, anticipated to be available in 2013, will provide data at the census tract, CTPP TAZ, and CTPP Transportation Analysis District (TAD) levels. ACS margin of error considerations apply to the CTPP.

Aerial photography.
Often aerial or satellite photographs available at several locations on the Internet can be used to update existing land use, which can then be used as a cross-check in small areas to ensure that population and employment data are taking into account changes in land use. It is crucial to know the date of the imagery (when the pictures were taken) prior to using it for land use updates. Aerial photography is also useful in network checking, as discussed later.

Other commercial directories. Some commercial directories provide comprehensive lists of household and employment data sorted by name and address. For households, such information as occupation and employer can be ascertained from these sources. For business establishments, type of business (including associations, libraries, and organizations that may not be on the tax file) can be determined. Other commercial databases provide existing and forecasted households and employment by political jurisdictions.

Other sources. Data on school types, locations, and enrollment are typically obtained directly from school districts and state departments of education (DOE). Large private schools might have to be contacted directly to obtain this information if the state DOE does not maintain records for such schools.

3.2.2 Data Source Limitations

Population

The main data source to establish a residential database is the decennial census. Other sources do not provide comparable population statistics by specific area (i.e., block level). Often, the base year for modeling does not conform to a decennial census. In that case, data from the decennial census should be used as the starting point and updated with available data from the census and other sources to reflect the difference between the decennial census year and the base year.
Employment

Each of the previously identified data sources has some deficiency in accurately specifying employment for small geographic areas:

• The census provides total labor force by TAZ; however, this represents only employment location of residents and not total employment.
• The census also shows labor force statistics by industry group but does not compile this by employer and specific geographic area (i.e., block).
• The CTPP counts employed persons, not jobs. For persons with more than one job, characteristics on only the principal job are collected.
• Considerations regarding margin of error apply to use of CTPP or ACS data (or any data for that matter).
• The employment commission data may provide accurate employment for each business but only partially list street addresses.
• Market research listings have all employers by street address. Although these listings are extensive, the accuracy is controlled internally and often cannot be considered comprehensive (because of the lack of information regarding collection methodology), but it offers a check for other data sources.
• The land use data obtained from aerial photography provide a geographic location of businesses but do not provide numbers of employees.
• Employment commission data (as well as other data on employers) often record a single address or post office box of record; employee data from multiple physical locations may be aggregated when reported (i.e., the headquarters of a firm may be listed with the total employment combined for all establishments).
• Government employment is not included in some data sources (including market research listings) or is included incompletely. Government employment sites are often either double-counted in commercially available data sources or “lumped” (i.e., multiple sites reported at one address). For example, public school employees are not always assigned to the correct schools.

Employment data are the most difficult data component to collect. None of the data sources alone offers a complete inventory of employment by geographic location. Therefore, the methodology for developing the employment database should be based on the most efficient and accurate method by which employment can be collected and organized into the database file. All data must be related to specific physical locations by geocoding.

Planning for supplementary local data collection remains the best option for addressing deficiencies in source data on employment; however, this effort must be planned several years in advance to ensure that resources can be made available for survey development, administration, and data analysis.

For all sources of socioeconomic data, users must be aware of disclosure-avoidance techniques applied by the issuing agency and their potential impact on their use in model development.

3.2.3 Base and Forecast Year Control Totals for the Database

The control totals for the database should be determined before compilation of the data.
The source of the control totals for population should be the decennial census. Control totals for employment at the workplace location are more difficult to establish; however, the best source is usually the QCEW or state employment commission data.

When the most recent census data are several years old, it may be desirable to have a more recent base year for the model, especially in faster growing areas. This means that some data may not be available at the desired level of detail or segmentation; for example, the number of households for a more recent year may be available, but not the segmentation by income level. Analysts often use detailed information from the most recent year for which it is available to update segmentations, such as applying percentages of households by segment from the census year to the total number of households for a more recent year. In some cases, estimates of totals (for example, employment by type) may not be available at all for the base year. Other data sources, such as building permits, may be used to produce estimates for more recent years, building upon the known information for previous years.

Census data are, of course, unavailable for forecast years. Some of the agencies discussed above (as well as state agencies, counties, and MPOs) produce population, housing, and employment forecasts. Such forecasts are often for geographic subdivisions larger than TAZs, and other types of segmentation may also be more aggregate than in data for past years. This often means that analysts must disaggregate data for use as model inputs. Data are typically disaggregated using segmentation from the base year data, often updated with information about land use plans and planned and proposed future developments.

3.3 Network Data

The estimation of travel demand requires an accurate representation of the transportation system serving the region. The most direct method is to develop networks of the system elements.
All models include a highway network; models that include transit elements and mode choice must also include a transit network. Sometimes, a model includes a bicycling or a walk network. Accurate transportation model calibration and validation require that the transportation networks represent the same year as the land use data used to estimate travel demand.

3.3.1 Highway Networks

The highway network defines the road system in a manner that can be read, stored, and manipulated by travel demand forecasting software. Highway networks are developed to be consistent with the TAZ system. Therefore, network coding is finer for developed areas containing small zones and coarser for less-developed areas containing larger zones. The types of analyses for which the model will be used determine the level of detail required. A rule of thumb is to code in roads one level below the level of interest for the study. One highway network may be used to represent the entire day, but it may be desirable to have networks for different periods of the day that include operational changes, such as reversible lanes or peak-period HOV lanes. Multiple-period networks can be stored in a single master network file that includes period or alternative-specific configurations for activation and deactivation.

Each TAZ has a centroid, which is a point on the model network that represents all travel origins and destinations in a zone. Zone centroids should be located in the center of activity (not necessarily coincident with the geographic center) of the zone, using land use maps, aerial photographs, and local knowledge. Each centroid serves as a loading point to the highway and transit systems and, therefore, must be connected to the model network.

Sources for Network Data

Digital street files are available from the Census Bureau (TIGER/Line files), other public sources, or several commercial vendors and local GIS departments. Selecting the links for the coded highway network requires the official functional classification of the roadways within the region, the average traffic volumes, street capacities, TAZ boundaries, and a general knowledge of the area. Other sources for network development include the FHWA National Highway Planning Network, Highway Performance Monitoring System (HPMS), Freight Analysis Framework Version 3 (FAF3) Highway Network, National Transportation Atlas Database, and various state transportation networks. All of these resources may be useful as starting points for development or update of a model network. However, each has limitations that should be considered when selecting a data source: cartographic quality; available network attributes; source year; and, especially with commercial sources, copyrights.

In states where the state DOT has a database with the roadway systems already coded, the use of the DOT’s coded network can speed up the network coding process. Questions can be directed to the DOT, and such a working relationship between DOT and MPO helps the modeling process because both parties understand the network data source.
Highway Network Attributes

Highway links are assigned attributes representing level of service afforded by the segment and associated intersections. Link distance based on the true shape of the roadway (including curvature and terrain), travel time, speed, link capacity, and any delays that will impact travel time must be assigned to the link. Characteristics, such as the effect of traffic signals on free-flow travel time, should be considered (see Parsons Brinckerhoff Quade & Douglas, 1992). Three basic items needed by a transportation model to determine impedance for the appropriate assignment of trips to the network are distance, speed, and capacity. Additional desirable items may include facility type and area type.

Facility Type and Area Type

The link attributes facility type and area type are used by many agencies to determine the free-flow speed and per-lane hourly capacity of each link, often via a two-dimensional look-up table.

Area type refers to a method of classifying zones by a rough measure of land use intensity, primarily based on population and employment density. A higher intensity of land use generally means more intersections, driveways, traffic signals, turning movements, and pedestrians, and, therefore, slower speeds. Sometimes, roadway link speeds and capacities are adjusted slightly based on the area type where they are located. Common area type codes include central business district (CBD), CBD fringe, outlying business district, urban, suburban, exurban, and rural. The definition of what is included in each area type is somewhat arbitrary since each study area is structured differently. In some models, area type values are assigned during the network building process on the basis of employment and population density of the TAZ centroid that is nearest to the link (Milone et al., 2008). Note that, since area type definitions are aggregate and “lumpy,” their use in models may result in undesirable boundary effects.
In many cases, use of continuous variables will be superior to use of aggregate groupings of zone types.

Facility type is a designation of the function of each link and is a surrogate for some of the characteristics that determine the free-flow capacity and speed of a link. Facility type may be different from functional classification, which relates more to ownership and maintenance responsibility of different roadways. Table 3.3 provides common facility types used by some modeling agencies. Features, such as HOV lanes, tolled lanes, and reversible lanes, are usually noted in network coding to permit proper handling but may not be facility types per se for the purposes of typical speed/capacity look-up tables.

Link Speeds

Link speeds are a major input to various model components. The highway assignment process relates travel times and speeds on links to their volume and capacity. This process requires what are commonly referred to as “free-flow” speeds. Free-flow speed is the mean speed of passenger cars measured during low to moderate flows (up to 1,300 passenger cars per hour per lane). Free-flow link speeds vary because of numerous factors, including:

• Posted speed limits;
• Adjacent land use activity and its access control;
• Lane and shoulder widths;
• Number of lanes;
• Median type;
• Provision of on-street parking;
• Frequency of driveway access; and
• Type, spacing, and coordination of intersection controls.

Transportation models can use any of several approaches to simulate appropriate speeds for the links included in the network. Speeds should take into account side friction along the road, such as driveways, and the effect of delays at traffic signals. One way to determine the free-flow speed is to conduct travel time studies along roadways included in the network during a period when traffic volumes are low and little if any delay exists. This allows the coding of the initial speeds based on observed running speeds on each facility. Speed data are also available from various commercial providers (e.g., Inrix); and in some jurisdictions, speed information on certain facilities is collected at a subsecond level. An alternative approach is to use a free-flow speed look-up table. Such a table lists default speeds by area and facility type, which are discussed later.

Although regional travel demand forecasting validation generally focuses on volume and trip length-related measures, there is often a desire to look at loaded link speeds and travel times. The analyst should be cognizant that “model time” may differ from real-world time due to the many network simplifications present in the modeled world, among other reasons. Looking at changes in time and speed can be informative (e.g., by what percentage are speeds reduced/travel times increased). When looking at such information for the validation year, a variety of sources may be available for comparative purposes, including probe vehicle travel time studies, GPS data collection, and commercial data.

Link Capacity

In its most general sense, capacity is used here as a measure of vehicles moving past a fixed point on a roadway in a defined period of time; for example, 1,800 vehicles per lane per hour. In practice, models do not uniformly define capacity.
Some models consider capacity to be applied during free-flow, uncongested travel conditions, while others use mathematical formulas and look-up tables based on historical research on speed-flow relationships [e.g., Bureau of Public Roads (BPR) curves and other sources] in varying levels of congestion on different types of physical facilities. Throughout this report, the authors have tried to specify what is meant for each use of “capacity.”

The definitive reference for defining highway capacity is the Highway Capacity Manual (Transportation Research Board, 2010), most recently updated in 2010. “Capacity” in a traffic engineering sense is not necessarily the same as the capacity variable used in travel demand model networks. In early travel models, the capacity variable used in such volume-delay functions as the BPR formula represented the volume at Level of Service (LOS) C; whereas, in traffic engineering, the term “capacity” traditionally referred to the volume at LOS E. The Highway Capacity Manual does contain useful information for the computation of roadway capacity, although many of the factors that affect capacity, as discussed in the manual, are not available in most model highway networks.

Table 3.3. Typical facility type definitions.

Facility Type | Definition | Link Characteristics
Centroid Connectors | Links that connect zones to a network that represent local streets or groups of streets. | High capacity and low speed
Freeways | Grade-separated, high-speed, high-capacity links. Freeways have limited access with entrance and exit ramps. | Top speed and capacity
Expressways | Links representing roadways with very few stop signals serving major traffic movements (high speed, high volume) for travel between major points. | Higher speed and capacity than arterials, but lower than freeways
Major Arterials | Links representing roadways with traffic signals serving major traffic movements (high speed, high volume) for travel between major points. | Lower speed and capacity than freeways and expressways, but more than other facility types
Minor Arterials | Links representing roadways with traffic signals serving local traffic movements for travel between major arterials or nearby points. | Moderate speed and capacity
Collectors | Links representing roadways that provide direct access to neighborhoods and arterials. | Low speed and capacity
Ramps | Links representing connections to freeways and expressways from other roads. | Speeds and capacity between a freeway and a major arterial
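The capacity discussion above refers to BPR curves as one family of volume-delay functions. A minimal sketch of the commonly cited BPR form follows, in which congested time equals free-flow time multiplied by 1 + alpha × (volume/capacity)^beta. The default coefficients alpha = 0.15 and beta = 4 are the widely quoted textbook values and are an assumption here, not values given in this chapter; agencies often calibrate their own by facility type.

```python
def bpr_travel_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
    """Congested link travel time from the standard BPR volume-delay form.

    free_flow_time: uncongested travel time on the link (any time unit)
    volume, capacity: flow and capacity in the same units (e.g., vehicles/hour)
    alpha, beta: assumed default coefficients; models often recalibrate them
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)


# Illustrative link: 10-minute free-flow time, 1,800 vehicles/hour capacity.
for v in (0, 900, 1800, 2700):
    print(v, round(bpr_travel_time(10.0, v, 1800.0), 2))
```

The result depends heavily on which definition of capacity is coded on the link. As noted above, early models paired the BPR formula with the volume at LOS C rather than LOS E, so coefficients and capacity definitions need to be chosen together.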

Link capacities are a function of the number of lanes on a link; however, lane capacities can also be specified by facility and area type combinations. Several factors are typically used to account for the variation in per-lane capacity in a highway network, including:

• Lane and shoulder widths;
• Peak-hour factors;
• Transit stops;
• Percentage of trucks²;
• Median treatments (raised, two-way left turn, absent, etc.);
• Access control;
• Type of intersection control;
• Provision of turning lanes at intersections and the amount of turning traffic; and
• Signal timing and phasing at signalized intersections.

Some models use area type and facility type to define per lane default capacities and default speed. The number of lanes should also be checked using field verification or aerial or satellite imagery to ensure accuracy.

Some networks combine link capacity and node capacity to better define the characteristics of a link (Kurth et al., 1996). This approach allows for a more refined definition of capacity and speed by direction on each link based on the characteristics of the intersection being approached. Such a methodology allows better definition of traffic control and grade separation at an intersection.

Typical Highway Network Database Attributes

The following highway network attributes are typically included in modeling databases:

• Node identifiers, usually numeric, and their associated x-y coordinates;
• Link identifiers, either numeric, defined by “A” and “B” nodes, or both;
• Locational information (e.g., zone, cutline, or screenline location);
• Link length/distance;
• Functional classification/facility type, including the divided or undivided status of the link’s cross section;
• Number of lanes;
• Uncongested (free-flow) speed;
• Capacity;
• Controlled or uncontrolled access indicator;
• One-way versus two-way status;
• Area type; and
• Traffic count volume (where available).
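The attribute list above, together with the look-up of default speed and per-lane capacity by facility type and area type, can be sketched as a small link record. The class name, field names, and every number in the `DEFAULTS` table below are hypothetical placeholders chosen for illustration; they are not values from the report.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical two-dimensional look-up table keyed by (facility type, area type).
# Each entry is (free-flow speed in mph, capacity in vehicles/hour/lane).
# All values are placeholders for illustration only.
DEFAULTS = {
    ("freeway", "urban"): (60.0, 1900),
    ("freeway", "rural"): (70.0, 2000),
    ("major_arterial", "urban"): (35.0, 800),
    ("major_arterial", "rural"): (50.0, 900),
}


@dataclass
class Link:
    """A one-directional highway link identified by its A and B nodes."""
    a_node: int
    b_node: int
    length_mi: float
    facility_type: str
    area_type: str
    lanes: int
    one_way: bool = True
    count: Optional[int] = None  # traffic count volume, where available

    def free_flow_speed(self) -> float:
        return DEFAULTS[(self.facility_type, self.area_type)][0]

    def capacity(self) -> int:
        # Link capacity = per-lane default capacity x number of lanes.
        return DEFAULTS[(self.facility_type, self.area_type)][1] * self.lanes

    def free_flow_minutes(self) -> float:
        return 60.0 * self.length_mi / self.free_flow_speed()


link = Link(1001, 1002, length_mi=1.5, facility_type="major_arterial",
            area_type="urban", lanes=2)
print(link.capacity(), round(link.free_flow_minutes(), 2))
```

A discrete table like this is simple to maintain, but it inherits the boundary effects of "lumpy" area types; a model could instead derive speed and capacity from continuous density measures.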
3.3.2 Transit Networks

Most of the transit network represents transit routes using the highways, so the highway network should be complete before coding transit. Transit network coding can be complex. Several different modes (e.g., express bus, local bus, light rail, heavy rail, commuter rail, bus rapid transit) may exist in an area; and each should have its own attribute code. Peak and off-peak transit service likely have different service characteristics, including headways, speeds, and possibly fares; therefore, separate peak and off-peak networks are usually developed. The transit networks are developed to be consistent with the appropriate highway networks and may share node and link definitions.

Table 3.4 is a compilation of transit network characteristics that may be coded into a model’s transit network. Characteristics in italics, such as headway, must be included in all networks, while the remaining characteristics, such as transfer penalty, may be needed to better represent the system in some situations. Transit networks representing weekday operations in the peak and off-peak periods are usually required for transit modeling; sometimes, separate networks may be required for the morning and afternoon peak periods, as well as the mid-day and night off-peak periods.

The development of bus and rail networks begins with the compilation of transit service data from all service providers in the modeled area. Transit networks should be coded for a typical weekday situation, usually represented by service provided in the fall or spring of the year. Two types of data are needed to model transit service: schedule and spatial (the path each route takes). Although the data provided by transit operators will likely contain more detail than needed for coding a transit network, software can be used to calculate, for each route, the average headway and average run time during the periods for which networks are created.
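As an illustration of the last step, the sketch below derives an average headway and average run time per period from a route's scheduled trips. The trip list, period boundaries, and function name are hypothetical, and headway is taken here as the mean gap between consecutive departures in the period, which is only one of several conventions in use (another divides the period length by the number of trips).

```python
from statistics import mean

# Hypothetical trips for one route:
# (departure time in minutes after midnight, end-to-end run time in minutes).
trips = [(360, 42), (375, 44), (390, 47), (405, 45),
         (600, 38), (630, 37), (660, 39)]

# Hypothetical period definitions: name -> (start minute, end minute).
PERIODS = {"am_peak": (360, 540), "midday": (540, 900)}


def summarize(trips, start, end):
    """Average headway and run time for trips departing in [start, end)."""
    in_period = sorted(t for t in trips if start <= t[0] < end)
    departures = [dep for dep, _ in in_period]
    gaps = [b - a for a, b in zip(departures, departures[1:])]
    return {
        # Headway needs at least two departures; otherwise it is undefined.
        "avg_headway": mean(gaps) if gaps else None,
        "avg_run_time": mean(rt for _, rt in in_period) if in_period else None,
    }


for name, (start, end) in PERIODS.items():
    print(name, summarize(trips, start, end))
```

With real operator data (for example, a published timetable feed), the same grouping would be repeated for every route and direction before the values are coded into the period-specific line files.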
Transit Line Files

Local bus line files are established “over” the highway network. Sometimes nodes and links, which are coded below the grain of the TAZ system, must be added to the highway network so that the proximity of transit service to zonal activity centers can be more accurately represented. These subzonal highway links, which are used to more accurately reflect transit route alignments, should be disallowed from use during normal highway path-building and highway assignments. Local bus stops are traditionally coded at highway node locations.

Transit line files can be designated for different types of service or different operators using mode codes, which designate a specific provider (or provider group) or type of service. Premium transit line files that operate in their own right-of-way are coded with their own link and node systems rather than on top of the highway network. Some modeling software requires highway links for all transit links, thus necessitating the coding of “transit only” links in the highway network. The modeler may not be provided with detailed characteristics for transit services that do not already exist in the modeled area and may need guidance with regard to what attribute values should be coded for these new services (FTA, 1992). Each transit line can be coded uniquely and independently so that different operating characteristics by transit line can be designated.

Transit line files contain information about transit lines, such as the headway, run time, and itinerary (i.e., the sequence of nodes taken by the transit vehicle as it travels its route). Some models compute the transit speed as a function of underlying highway speed instead of using a coded run time. Line files are time-of-day specific, so there is a set of line files for each time period for which a network is coded. One can usually designate stops as board-only or alight-only (useful for accurately coding express bus service). Similarly, one can code run times for subsections of a route, not just for the entire route; a feature useful for the accurate depiction of transit lines that undergo extensions or cutbacks, or which travel through areas with different levels of congestion.

Footnote 2: Facilities experiencing greater-than-typical truck traffic (say, greater than 5 percent for urban facilities; greater than 10 percent for nonurban facilities) have an effective reduction in capacity available for passenger cars (i.e., trucks reduce capacity available by their passenger car equivalent value; often a simplified value of 2 is used). Trucks in this context are vehicles F5 or above on the FHWA classification scheme, the standard Highway Capacity Manual definition.
One can also store route-specific comments (such as route origin, route destination, and notes) in line files.

Table 3.4. Transit network characteristics and definitions.

  • Drive access link: A link that connects TAZs to a transit network via auto access to a park-and-ride or kiss-and-ride location.
  • Effective headway*: The time between successive transit vehicles on multiple routes with some or all stops in common.
  • Headway: The time between successive arrivals (or departures) of transit vehicles on a given route.
  • Local transit service: Transit service with frequent stops within a shared right-of-way with other motorized vehicles.
  • Mode number: Code to distinguish local bus routes from express bus, rail, etc.
  • Park-and-ride-to-stop link: A walk link between a park-and-ride lot and a bus stop, which is used to capture out-of-vehicle time associated with auto access trips, and also for application of penalties associated with transfers.
  • Premium transit service: Transit service (e.g., bus rapid transit, light rail transit, heavy rail, commuter rail) with long distances between infrequent stops that may use exclusive right-of-way and travel at speeds much higher than local service.
  • Route description: Route name and number/letter.
  • Run time: The time in minutes that the transit vehicle takes to go from the start to the finish of its route and a measure of the average speed of the vehicle on that route.
  • Transfer link: A link used to represent the connection between stops on two transit lines that estimates the out-of-vehicle time associated with transfers, and also for application of penalties associated with transfers.
  • Transfer penalty: Transit riders generally would rather have a longer total trip without transfers than a shorter trip that includes transferring from one vehicle to another; therefore, a penalty is often imposed on transfers to discourage excess transfers during the path-building process.
  • Walk access link: A link that connects TAZs to a transit network by walking from a zone to bus, ferry, or rail service; usually no longer than one-third mile for local service and one-half mile for premium service (some modeling software distinguishes access separately from egress).
  • Walking link: A link used exclusively for walking from one location to another. These links are used in dense areas with small TAZs to allow trips to walk between locations rather than take short transit trips.

*Italics indicate characteristics that must be included in all networks.

Access Links

It is assumed that travelers access the transit system by either walking or driving. Zone centroids are connected to the transit system via a series of walk access and auto access paths. In the past, modeling software required that walk access and auto access links be coded connecting each zone centroid to the transit stops within walking or driving distance. These separate access links are still seen, particularly in models that have been converted from older modeling software packages. Current modeling software generally allows walk or auto access paths to be built using the highway network links, including, where appropriate, auxiliary links that are not available to vehicular traffic (such as walking or bicycle paths).

Walk paths are coded to transit service that is within walking distance of a zone to allow access to and egress from transit service. The maximum walking distance may vary depending on urban area, with larger urban areas usually having longer maximum walk distances, although generalizations about typical values could be misleading. The best source for determining maximum walk distances is an on-board survey of transit riders. Some models may classify “short” and “long” walk distances.

Auto access paths are used to connect zones with park-and-ride facilities or train stations. Auto access paths are coded for zones that are not within walking distance (as classified by that model) of transit service but are deemed to be used by transit riders from a zone. A rule-based approach (for example, maximum distance between the zone centroid and the stop) is often used to determine which zones will have auto access to which stops. Again, the best source for determining which zones should have auto access is an on-board survey of transit riders.

Travel Times and Fares

The time spent on transit trips—including time spent riding on transit vehicles, walking or driving to and from transit stops, transferring between transit lines, and waiting for vehicles—must be computed. This computation is done by skimming the transit networks for each required variable (for example, in-vehicle time, wait time, etc.).
In-vehicle times are generally computed from the network links representing transit line segments, with speeds on links shared with highway traffic sometimes computed as a function of the underlying (congested) highway speed. Wait times are usually computed from headways with one-half of the headway representing the average wait time for frequent service and maximum wait times often used to represent infrequent service where the travelers will know the schedules and arrange their arrival times at stops accordingly. Auto access/egress times are often computed from highway networks. Walk access/egress times are sometimes computed assuming average speeds applied to distances from the highway networks.

Transit fares used in the mode choice process must be computed. The process may need to produce multiple fare matrices representing the fare for different peak and off-peak conditions. This can be done in multiple ways. If the fare system is distance based, then transit fares can be calculated by the modeling software by skimming the fare over the shortest path just as the time was skimmed. Systems that use one fare for all trips in the study area can assign a fare to every trip using transit. More complex systems with multiple fare tariffs will require unique approaches that may be a combination of the previous two or require the use of special algorithms. Some transit systems require transfer fares that are applied whenever a rider switches lines or from one type of service to another.

3.3.3 Updating Highway and Transit Networks

Transportation networks change over time and must be coded to represent not only current conditions for the base year, but also forecasting scenarios so that models can be used to forecast the impact of proposed changes to the highway network. Socioeconomic data and forecasts must also be updated, and these can affect network attributes (for example, area type definitions that depend on population and employment density).
It is good transportation planning practice to have a relatively up-to-date base year for modeling, particularly when there are major changes to the supply of transportation facilities and/or newer socioeconomic data available. Many of the same data sources, such as digitized street files, aerial photographs, and state and local road inventories, can be used to update the network to a new base year. A region’s Transportation Improvement Program (TIP) and state and local capital improvement programs (CIPs) are also very useful for updating a network representing an earlier year to a more recent year. Traffic volumes and transit ridership coded in the network should also be updated for the new base year.

Most MPOs and many local governments use models to evaluate short- and long-range transportation plans to determine the effect of changes to transportation facilities in concert with changes in population and employment and urban structure on mobility and environmental conditions in an area. Updating the transportation network to a future year requires some of the same data sources, as well as additional ones. In addition to TIPs and CIPs, master plans, long-range transportation plans, comprehensive plans, and other planning documents may serve as the source of network updates.

3.3.4 Network Data Quality Assurance

Regardless of the sources, network data should be checked using field verification or an overlay of high-resolution aerials or satellite imagery.

Visual inspection cannot be used to verify certain link characteristics, such as speed and traffic volume, which may often be verified using databases and GIS files available from state DOTs or other agencies.

One approach used to verify coded distances is to use the modeling software to build two zone-to-zone distance matrices: the first using airline distance calculated using the x-y coordinates for each centroid, and the second using the over-the-road distance calculated from paths derived using the coded distance on each link. If one matrix is divided by the other, the analyst can look at the results and identify situations where the airline distance is greater than the over-the-road distance, or where the airline distance is much lower. These situations should be investigated to determine if they are the result of a coding error.

Coded speeds can be checked in a similar fashion by creating skim trees (time between zone matrices) for each mode and dividing them by the distance matrix. Resulting high or low speeds should be investigated to determine if they are the result of coding errors.

There are other data sources that may be used for reasonableness checking of roadway networks. For example, the HPMS has network data that may be used to check model networks.

Quality assurance applies to transit networks, as well as highway networks. Local data sources may be available to check the networks against. For example, transit operators can often provide line-level data on run times, service hours, and service miles, which can be compared to model estimates of the same. The Travel Model Validation and Reasonableness Checking Manual, Second Edition (Cambridge Systematics, Inc., 2010b) includes detailed discussions of other transit network checking methods, including comparing modeled paths to observed paths from surveys and assigning a trip table developed from an expanded transit survey to the transit network.
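The coded-distance check described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular modeling package: the function names, the toy three-zone network, and the `max_ratio` threshold of 2.0 are all hypothetical, and the over-the-road distances come from a plain Dijkstra search over the coded link lengths.

```python
import heapq
import math


def airline_distance(a, b):
    """Straight-line distance between two (x, y) centroid coordinates."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def network_distances(links, source):
    """Shortest over-the-road distance from source to every reachable node.

    links maps node -> list of (neighbor, coded_length) pairs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, length in links.get(node, []):
            candidate = d + length
            if candidate < dist.get(neighbor, math.inf):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def flag_suspect_pairs(coords, links, centroids, max_ratio=2.0):
    """Flag zone pairs whose road/airline distance ratio looks implausible.

    A ratio below 1 means the coded path is shorter than a straight line;
    a ratio above max_ratio (a hypothetical threshold) means it is much longer.
    """
    flagged = []
    for origin in centroids:
        road = network_distances(links, origin)
        for dest in centroids:
            air = airline_distance(coords[origin], coords[dest])
            if origin == dest or air == 0.0:
                continue
            ratio = road.get(dest, math.inf) / air
            if ratio < 1.0 or ratio > max_ratio:
                flagged.append((origin, dest, round(ratio, 2)))
    return flagged


# Toy network: link 2-3 is deliberately miscoded as 2.0 (true length is 4.0).
coords = {1: (0.0, 0.0), 2: (3.0, 0.0), 3: (3.0, 4.0)}
links = {
    1: [(2, 3.0)],
    2: [(1, 3.0), (3, 2.0)],
    3: [(2, 2.0)],
}
flagged = flag_suspect_pairs(coords, links, centroids=[1, 2, 3])
print(flagged)
```

In this toy case only the interchange between zones 2 and 3 is flagged, with a ratio of 0.5, because its coded length is shorter than the straight-line distance between the two centroids. The same pattern applies to the speed check: divide a skimmed time matrix by the distance matrix and flag the extremes.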
3.4 Validation Data

Model validation is an important component of any model development process. As documented in the Travel Model Validation and Reasonableness Checking Manual, Second Edition (Cambridge Systematics, Inc., 2010b), planning for validation and ensuring that good validation data are available are tasks that should be performed as an integral part of the model development process.

Model validation should cover the entire modeling process, including checks of model input data and all model components. While reproduction of observed traffic counts and transit boardings may be important validation criteria, they are not sufficient measures of model validity. Adjustments can be made to any model to reproduce base conditions. Pendyala and Bhat (2008) provide the following comments regarding travel model validation:

“There is no doubt that any model, whether an existing four-step travel demand model or a newer tour- or activity-based model, can be adjusted, refined, tweaked, and—if all else fails—hammered to replicate base year conditions. Thus, simply performing comparisons of base year outputs from four-step travel models and activity-based travel models alone (relative to base year travel patterns) is not adequate . . . the emphasis needs to be on capturing travel behavior patterns adequately from base year data, so that these behavioral patterns may be reasonably transferable in space and time.”

3.4.1 Model Validation Plan

The development of a model validation plan at the outset of model development or refinement is good model development practice. The validation plan should establish model validation tests necessary to demonstrate that the model will produce credible results. Such tests depend, in part, on the intended uses of the model. Validation of models intended for support of long-range planning may have increased focus on model sensitivity to key input variables and less focus on the reproduction of traffic counts or transit boardings.
Conversely, models intended for support of facility design decisions or project feasibility probably require a strong focus on the reproduction of traffic counts or transit boardings.

The validation plan should identify tests and validation data for all model components. A good approach for the development of a validation plan is to identify the types of validation tests and the standards desired (or required) prior to identifying whether the required validation data are available. Then, once the tests and required data have been identified, the available validation data can be identified and reviewed. Data deficiencies can then be pinpointed and evaluated against their importance to the overall model validation, as well as the cost, time, and effort required to collect the data.

3.4.2 Example Model Validation Tests

Ideally, model validation tests should address all model components. The list of tests shown in Table 3.5 was developed by a panel of travel modeling experts who participated in the May 2008 Travel Model Improvement Program Peer Exchange on Travel Model Validation Practices (Cambridge Systematics, Inc., 2008b). The table is intended to provide examples of tests and sources of data that may be used to validate travel models.

(continued on next page) Table 3.5. Example primary and secondary model validation tests. Model Component Primary Tests Secondary Tests Potential Validation Data Sources Networks/Zones Correct distances on links Network topology, including balance between roadway network detail and zone detail Appropriateness of zone size given spatial distribution of population and employment Network attributes (managed lanes, area types, speeds, capacities) Network connectivity Transit run times Intrazonal travel distances (model design issue) Zone structure compatibility with transit analysis needs (model design issue) Final quality control checks based on review by end users Transit paths by mode on selected interchanges GIS center line files Transit on-board or household survey data Socioeconomic Data/Models Households by income or auto ownership Jobs by employment sector by geographic location Locations of special generators Qualitative logic test on growth Population by geographic area Types and locations of group quarters Frequency distribution of households and jobs (or household and job densities) by TAZ Dwelling units by geographic location or jurisdiction Households and population by land use type and land use density categories Historical zonal data trends and projections to identify “large” changes (e.g., in autos/ household from 1995 to 2005) Census SF-3 data QCEW Private sources, such as Dun & Bradstreet Trip Generation Reasonableness check of trip rates versus other areas Logic check of trip rate relationships Checks on proportions or rates of nonmotorized trips Reasonableness check of tour rates Cordon lines by homogeneous land use type Chapter 4 of this report Traffic counts (or intercept survey data) for cordon lines Historic household survey data for region NHTS (2001 or 2009) Trip Distribution Trip length frequency distributions (time and distance) by market segments Worker flows by district District-to-district flows/desire lines Intrazonal trips External 
station volumes by vehicle class Area biases (psychological barrier— e.g., river) Use of k-factors (Design Issue) Comparison to roadside intercept origin- destination surveys Small market movements Special groups/markets Balancing methods ACS/CTPP data Chapter 4 of this report Traffic counts (or intercept survey data) for screenlines Historic household survey data for region NHTS (2001 or 2009)

Table 3.5. (Continued). Model Component Primary Tests Secondary Tests Potential Validation Data Sources Mode Choice Mode shares (geographic level/market segments) Check magnitude of constants and reasonableness of parameters District-level flows Sensitivity of parameters to LOS variables/elasticities Input variables Mode split by screenlines Frequency distributions of key variables Reasonableness of structure Market segments by transit service Existence of “cliffs” (cutoffs on continuous variables) Disaggregate validation comparing modeled choice to observed choice for individual observations Traffic counts and transit (or intercept survey data) for screenlines CTPP data Chapter 4 of this report Transit on-board survey data NHTS (2001 or 2009) Household survey data (separate from data used for model estimation) Transit Assignment Major station boardings Bus line, transit corridor, screenline volumes Park-and-ride lot vehicle demand Transfer rates Kiss-and-ride demand Transfer volumes at specific points Load factors (peak points) Transit boarding counts Transit on-board survey data Special surveys (such as parking lot counts) Traffic Assignment Assigned versus observed vehicles by screenline or cutline Assigned versus observed vehicles speeds/times (or vehicle hours traveled) Assigned versus observed vehicles (or vehicle miles traveled) by direction by time of day Assigned versus observed vehicles (or vehicle miles traveled) by functional class Assigned versus observed vehicles by vehicle class (e.g., passenger cars, single-unit trucks, combination trucks) Subhour volumes Cordon lines volumes Reasonable bounds on assignment parameters Available assignment parameters versus required assignment parameters for policy analysis Modeled versus observed route choice (based on data collected using GPS- equipped vehicles) Permanent traffic recorders Traffic count files HPMS data Special speed surveys (possibly collected using GPS-equipped vehicles) Source: Cambridge 
Systematics, Inc. (2008b). Time of Day of Travel Time of day versus volume peaking Speeds by time of day Cordon counts Market segments by time of day Permanent traffic recorder data NHTS (2001 or 2009) Historic household survey data for region Transit boarding count data

TRB’s National Cooperative Highway Research Program (NCHRP) Report 716: Travel Demand Forecasting: Parameters and Techniques provides guidelines on travel demand forecasting procedures and their application for helping to solve common transportation problems.

The report presents a range of approaches that are designed to allow users to determine the level of detail and sophistication in selecting modeling and analysis techniques based on their situations. The report addresses techniques, optional use of default parameters, and includes references to other more sophisticated techniques.

Errata: Table C.4, Coefficients for Four U.S. Logit Vehicle Availability Models, in the print and electronic versions of NCHRP Report 716 should be replaced with the revised Table C.4.

NCHRP Report 716 is an update to NCHRP Report 365: Travel Estimation Techniques for Urban Planning.

In January 2014 TRB released NCHRP Report 735: Long-Distance and Rural Travel Transferable Parameters for Statewide Travel Forecasting Models, which supplements NCHRP Report 716.



Engineering LibreTexts

5.2: Traffic Flow


  • David Levinson et al.
  • Associate Professor (Engineering) via Wikipedia

Traffic Flow is the study of the movement of individual drivers and vehicles between two points and the interactions they make with one another. Unfortunately, studying traffic flow is difficult because driver behavior cannot be predicted with one-hundred percent certainty. Fortunately, drivers tend to behave within a reasonably consistent range; thus, traffic streams tend to have some reasonable consistency and can be roughly represented mathematically. To better represent traffic flow, relationships have been established between the three main characteristics: (1) flow, (2) density, and (3) velocity. These relationships help in planning, design, and operations of roadway facilities.

Traffic flow theory

Time-space diagram.

Traffic engineers represent the location of a specific vehicle at a certain time with a time-space diagram. This two-dimensional diagram shows the trajectory of a vehicle through time as it moves from a specific origin to a specific destination. Multiple vehicles can be represented on a diagram and, thus, certain characteristics, such as flow at a certain site for a certain time, can be determined.


Flow and density

Flow (\(q\)) = the rate at which vehicles pass a fixed point (vehicles per hour). If \(N\) vehicles are counted at the point over a measurement period of \(t_{measured}\) seconds, the equivalent hourly flow is

\[q=\frac{3600 N}{t_{measured}}\]

Density (Concentration) (k) = number of vehicles (N) over a stretch of roadway (L) (in units of vehicles per kilometer)

\[k=\frac{N}{L}\]

  • \(N\) = number of vehicles observed

  • \(q\) = equivalent hourly flow
  • \(L\) = length of roadway
  • \(k\) = density

Measuring speed of traffic is not as obvious as it may seem; we can average the measurement of the speeds of individual vehicles over time or over space, and each produces slightly different results.

Time mean speed

Time mean speed (\(\bar v_t\)) = arithmetic mean of speeds of vehicles passing a point

\[\bar v_t=\frac{1}{N} \sum_{n=1}^Nv_n\]

Space mean speed

Space mean speed (\(\bar v_s\)) is defined as the harmonic mean of the speeds of vehicles passing a point during a period of time. It also equals the average speed over a length of roadway.

\[\bar v_s=\dfrac{N}{\sum_{n=1}^N \frac{1}{v_n}}\]

Relating time and space mean speed

Note that the time mean speed is average speed past a point as distinct from space mean speed which is average speed along a length.

The two speeds are related as

\[\bar v_t=\bar v_s + \frac{\sigma_s^2}{\bar v_s}\]

Here \(\sigma_s^2\) is the variance of the speeds about the space mean speed. The time mean speed is never lower than the space mean speed, and the difference grows with the variability of vehicle speeds. At high speeds (free flow), differences are minor, whereas in congested conditions the two might differ by a factor of 2.

The following definitions give what is referred to as the brutto gap (Asela) (Italian for gross ), in contrast to netto gaps (Italian for net ). Netto gaps give the distance or time between the rear bumper of a vehicle and the front bumper of the next.

Time headway

Time headway (\(h_t\)) = difference between the time when the front of a vehicle arrives at a point on the highway and the time the front of the next vehicle arrives at the same point (in seconds)

Average Time Headway (\(\bar h_t\)) = Average Travel Time per Unit Distance * Average Space Headway

\[\bar h_t=\bar t *\bar h_s\]

Space headway

Space headway (\(h_s\)) = difference in position between the front of a vehicle and the front of the next vehicle (in meters)

Average Space Headway (\(\bar h_s\))= Space Mean Speed * Average Time Headway

\[\bar h_s = \bar v_s * \bar h_t\]

Note that density and space headway are related:

\[k=\frac{1}{\bar h_s}\]

Fundamental Diagram of Traffic Flow

The variables of flow, density, and space mean speed are related definitionally as:

\[q=k\bar v_s\]

Traditional Model (Parabolic)

Properties of the traditional fundamental diagram.

  • When density on the highway is zero, the flow is also zero because there are no vehicles on the highway
  • As density increases, flow increases
  • When the density reaches a maximum jam density (\(k_j\)), flow must be zero because vehicles will line up end to end
  • Flow increases with density up to a maximum value (\(q_m\)); increases in density beyond that point reduce flow.
  • Speed is space mean speed.
  • At density = 0, speed is the free-flow speed (\(v_f\)). The upper half of the speed-flow curve is uncongested; the lower half is congested.
  • The slope of the line from the origin to a point on the flow-density curve gives the speed at that point. Rise/Run = Flow/Density = (vehicles per hour)/(vehicles per km) = km/hour


Observation (Triangular or Truncated Triangular)

Actual traffic data is often much noisier than idealized models suggest. However, what we tend to see is that as density rises, speed is unchanged to a point (capacity) and then begins to drop if it is affected by downstream traffic (queue spillbacks). For a single link, the relationship between flow and density is thus more triangular than parabolic. When we aggregate multiple links together (e.g. a network), we see a more parabolic shape.
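A triangular flow-density relationship of this kind can be written as the lower of two straight lines: an uncongested branch with slope equal to the free-flow speed, and a congested branch that falls to zero at jam density. The sketch below is illustrative only. The congested-branch slope `w` (often called the backward wave speed) is a parameter that the text above does not introduce, and all numeric values are hypothetical.

```python
def triangular_flow(k, v_f=100.0, w=20.0, k_j=150.0):
    """Flow (veh/hr) at density k (veh/km) on a triangular fundamental diagram.

    v_f : free-flow speed in km/hr (slope of the uncongested branch)
    w   : assumed slope magnitude of the congested branch in km/hr
    k_j : jam density in veh/km
    """
    if not 0.0 <= k <= k_j:
        raise ValueError("density must lie between 0 and jam density")
    return min(v_f * k, w * (k_j - k))


def critical_density(v_f=100.0, w=20.0, k_j=150.0):
    """Density at which the two branches meet, i.e. where flow is at capacity."""
    return w * k_j / (v_f + w)


k_c = critical_density()
capacity = triangular_flow(k_c)
print(k_c, capacity)

# Speed follows from q = k * v: constant below k_c, dropping above it.
for k in (10.0, 25.0, 50.0, 100.0):
    q = triangular_flow(k)
    print(k, q, q / k)
```

With these assumed parameters the branches meet at 25 veh/km, giving a capacity of 2500 veh/hr. Below that density the speed stays at the free-flow value, which is the "speed is unchanged to a point" behavior described above.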


Microscopic and Macroscopic Models

Models describing traffic flow can be classed into two categories: microscopic and macroscopic. Ideally, macroscopic models are aggregates of the behavior seen in microscopic models.

Microscopic Models

Microscopic models predict the car-following behavior of vehicles (their changes in speed and position) as a function of the behavior of the leading vehicle.


Macroscopic Models

Macroscopic traffic flow theory relates traffic flow, running speed, and density. Analogizing traffic to a stream, it has principally been developed for limited access roadways (Leutzbach 1988). The fundamental relationship “q=kv” (flow (q) equals density (k) multiplied by speed (v)) is illustrated by the fundamental diagram. Many empirical studies have quantified the component bivariate relationships (q vs. v, q vs. k, k vs. v), refining parameter estimates and functional forms (Gerlough and Huber 1975; Persaud and Hurdle 1991; Ross 1991; Hall, Hurdle and Banks 1992; Banks 1992; Gilchrist and Hall 1992; Disbro and Frame 1992).

The most widely used model is the Greenshields model, which posits that the relationship between speed and density is linear. Macroscopic models of this kind were most appropriate before the advent of high-powered computers enabled the use of microscopic models. Macroscopic properties like flow and density are the product of individual (microscopic) decisions. Yet those microscopic decision-makers are affected by the environment around them, i.e. the macroscopic properties of traffic.
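The Greenshields model can be made concrete with a short sketch. A linear speed-density relationship, \(v = v_f(1 - k/k_j)\), combined with \(q = kv\), gives a parabolic flow-density curve whose maximum \(q_m = v_f k_j / 4\) occurs at half the jam density. The free-flow speed and jam density used below are hypothetical values chosen only for illustration.

```python
def greenshields_speed(k, v_f=100.0, k_j=120.0):
    """Space mean speed (km/hr) at density k (veh/km), linear in density."""
    return v_f * (1.0 - k / k_j)


def greenshields_flow(k, v_f=100.0, k_j=120.0):
    """Flow (veh/hr) from q = k * v with the linear speed-density relation."""
    return k * greenshields_speed(k, v_f, k_j)


# Scan integer densities from empty road to jam density and find the peak.
densities = range(0, 121)
k_at_capacity = max(densities, key=greenshields_flow)
q_m = greenshields_flow(k_at_capacity)

print(k_at_capacity, greenshields_speed(k_at_capacity), q_m)
```

With \(v_f\) = 100 km/hr and \(k_j\) = 120 veh/km, flow peaks at 60 veh/km, where the speed is 50 km/hr and the flow is 3000 veh/hr. Flow is zero at both ends of the density range, as the properties of the traditional diagram require.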

While traffic flow theorists represent traffic as if it were a fluid, queueing analysis essentially treats traffic as a set of discrete particles. These two representations are not necessarily inconsistent. The figures to the right show the same 4 phases in the fundamental diagram and the queueing input-output diagram. This is discussed in more detail in the next section.


Example 1: Time-Mean and Space-Mean Speeds

Given five observed velocities (60 km/hr, 35 km/hr, 45 km/hr, 20 km/hr, and 50 km/hr), what are the time-mean speed and the space-mean speed?

Time-Mean Speed:

\(\bar v_t=\dfrac{1}{5}(60+35+45+20+50)=42\)

Space-Mean Speed:

\(\bar v_s=\dfrac{N}{\sum_{n=1}^N\frac{1}{v_n}}=\dfrac{5}{\frac{1}{60}+\frac{1}{35}+\frac{1}{45}+\frac{1}{20}+\frac{1}{50}}=36.37\)

The time-mean speed is 42 km/hr and the space-mean speed is 36.37 km/hr.
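The two means in Example 1, and the relation \(\bar v_t=\bar v_s + \sigma_s^2/\bar v_s\) given earlier, can be checked numerically. The sketch below assumes that \(\sigma_s^2\) is the variance of speeds about the space mean, with each spot speed weighted by \(1/v_n\), since slower vehicles spend proportionally longer on any stretch of road. The function names are illustrative.

```python
def time_mean_speed(speeds):
    """Arithmetic mean of spot speeds observed at a point."""
    return sum(speeds) / len(speeds)


def space_mean_speed(speeds):
    """Harmonic mean of spot speeds observed at a point."""
    return len(speeds) / sum(1.0 / v for v in speeds)


def space_speed_variance(speeds):
    """Variance about the space mean, weighting each spot speed by 1/v."""
    weights = [1.0 / v for v in speeds]
    v_s = space_mean_speed(speeds)
    weighted_sq_dev = sum(w * (v - v_s) ** 2 for w, v in zip(weights, speeds))
    return weighted_sq_dev / sum(weights)


speeds = [60.0, 35.0, 45.0, 20.0, 50.0]  # km/hr, from Example 1

v_t = time_mean_speed(speeds)
v_s = space_mean_speed(speeds)
var_s = space_speed_variance(speeds)

print(round(v_t, 2), round(v_s, 2))
print(round(v_s + var_s / v_s, 2))  # should reproduce the time mean speed
```

The first line of output reproduces the 42 km/hr and 36.37 km/hr found in Example 1. The second shows that adding \(\sigma_s^2/\bar v_s\) to the space mean speed recovers the time mean speed.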

Example 2: Computing Traffic Flow Characteristics

Given that 40 vehicles pass a given point in 1 minute and traverse a length of 1 kilometer, what are the flow, density, and time headway?

Compute flow and density:

\(q=\frac{3600(40)}{60s}=2400 \text{ } veh/hr\)

\(k=\frac{40}{1}=40 \text{ } veh/km\)

Find space-mean speed:

\(q=k \bar v_s=2400 =40 \bar v_s\)

\(\bar v_s=60 km/hr\)

Compute space headway:

\(k=40=\frac{1}{\bar h_s}\)

\(\bar h_s=0.025 km =25m\)

Compute time headway:

\(\bar h_s = \bar v_s* \bar h_t=25=(60*1000/3600)\bar h_t\)

\(\bar h_t=1.5s\)

The time headway is 1.5 seconds.
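The chain of calculations in Example 2 can be collected into one small function. This is a sketch that follows the definitions given earlier in this section; the function and key names are illustrative.

```python
def traffic_characteristics(n_vehicles, t_measured_s, length_km):
    """Flow, density, space mean speed, and headways from a simple count.

    n_vehicles   : vehicles counted past the point, also taken to occupy length_km
    t_measured_s : duration of the count in seconds
    length_km    : length of roadway in kilometers
    """
    flow = 3600.0 * n_vehicles / t_measured_s        # q, veh/hr
    density = n_vehicles / length_km                 # k, veh/km
    speed = flow / density                           # v_s from q = k * v_s, km/hr
    space_headway_m = 1000.0 / density               # h_s = 1/k, in meters
    speed_m_per_s = speed * 1000.0 / 3600.0
    time_headway_s = space_headway_m / speed_m_per_s  # h_t = h_s / v_s
    return {
        "flow_veh_per_hr": flow,
        "density_veh_per_km": density,
        "space_mean_speed_km_per_hr": speed,
        "space_headway_m": space_headway_m,
        "time_headway_s": time_headway_s,
    }


result = traffic_characteristics(n_vehicles=40, t_measured_s=60.0, length_km=1.0)
for name, value in result.items():
    print(name, round(value, 3))
```

For the inputs of Example 2 this gives a flow of 2400 veh/hr, a density of 40 veh/km, a speed of 60 km/hr, a space headway of 25 m, and a time headway of 1.5 s.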

EXAMPLE 3: The spot speeds (expressed in km/hr) observed at a road section are 66, 62, 45, 79, 32, 51,56,60,53 and 49. The median speed (expressed in km/hr) is .

Solution: Median speed is the speed at the middle value in series of spot speeds that are arranged in ascending order. 50% of speed values will be greater than the median 50% will be less than the median. Ascending order of spot speed studies are 32,39,45,51,53,56,60,62,66,79

Median speed = (53 +56 )/2=54.5 km/hr.
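As a quick check, the same result can be obtained with Python's standard library. This is a minimal sketch; the list below is taken from the problem statement.

```python
from statistics import median

# Spot speeds from Example 3, in km/hr.
spot_speeds_kmh = [66, 62, 45, 79, 32, 51, 56, 60, 53, 49]

ordered_speeds = sorted(spot_speeds_kmh)
median_speed = median(spot_speeds_kmh)  # averages the two middle values for even N

print(ordered_speeds)
print(median_speed)
```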

Thought Question

Microscopic traffic flow simulates the behaviors of individual vehicles while macroscopic traffic flow simulates the behaviors of the traffic stream overall. Conceptually, it would seem that microscopic traffic flow would be more accurate, as it would be based on driver behavior rather than simply flow characteristics. Assuming microscopic simulation could be calibrated to truly account for driver behaviors, what is the primary drawback to simulating a large network?

Computer power. To simulate a very large network with microscopic simulation, the number of vehicles that need to be assessed is very large, requiring a lot of computer memory. Current computers have difficulty simulating very large microscopic networks in a timely fashion, but perhaps future advances will do away with this issue.

Sample Problem

Four vehicles are traveling at constant speeds between sections X and Y (280 meters apart) with their positions and speeds observed at an instant in time. An observer at point X observes the four vehicles passing point X during a period of 15 seconds. The speeds of the vehicles are measured as 88, 80, 90, and 72 km/hr respectively. Calculate the flow, density, time mean speed, and space mean speed of the vehicles.

\(q=N(\dfrac{3600}{t_{measured}})=4(\dfrac{3600}{15})=960 \text{ } veh/hr\)

\(k=\frac{N}{L}=\frac{4*1000}{280}=14.3 \text{ } veh/km\)

Time Mean Speed

\(\bar v_t=\frac{1}{N} \sum_{n=1}^N v_n=\frac{1}{4}(72+90+80+88)=82.5 \text{ } km/hr\)

Space Mean Speed

\(\bar v_s=\frac{N}{\sum_{n=1}^N \frac{1}{v_n}}=\frac{4}{\frac{1}{72}+\frac{1}{90}+\frac{1}{80}+\frac{1}{88}}=81.86 \text{ } km/hr\)

\(t_i=L/v_i\)

\(t_A=L/v_A=0.28/88=0.00318hr\)

\(t_B=L/v_B=0.28/80=0.00350hr\)

\(t_C=L/v_C=0.28/90=0.00311hr\)

\(t_D=L/v_D=0.28/72=0.00389hr\)

\(\bar v_s=\frac{NL}{\sum_{n=1}^N t_n}=\frac{4*0.28}{(0.00318+0.00350+0.00311+0.00389)}=81.87 \text{ } km/hr\)
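The sample problem can be reproduced numerically as a check. The sketch below uses illustrative variable names and computes the space-mean speed both ways, from the harmonic mean of the speeds and from the travel times over the section.

```python
# Inputs from the sample problem.
speeds_kmh = [88, 80, 90, 72]   # speeds of the four vehicles
section_km = 0.280              # distance between X and Y
count_period_s = 15.0           # time for all four vehicles to pass X

n = len(speeds_kmh)
flow_veh_hr = n * 3600.0 / count_period_s     # q
density_veh_km = n / section_km               # k
time_mean_speed = sum(speeds_kmh) / n         # arithmetic mean

# Space-mean speed, route 1: harmonic mean of the speeds.
space_mean_speed_harmonic = n / sum(1.0 / v for v in speeds_kmh)

# Space-mean speed, route 2: total distance over total travel time.
travel_times_hr = [section_km / v for v in speeds_kmh]
space_mean_speed_travel_time = n * section_km / sum(travel_times_hr)

print(round(flow_veh_hr), round(density_veh_km, 1), time_mean_speed)
print(round(space_mean_speed_harmonic, 2), round(space_mean_speed_travel_time, 2))
```

Without intermediate rounding the two routes agree; the small difference between 81.86 and 81.87 in the hand calculation comes from rounding the individual travel times.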

Variables

  • \(d_n\) = distance of \(n\)th vehicle
  • \(t_n\) = travel time of \(n\)th vehicle
  • \(v_n\) = speed (velocity) of \(n\)th vehicle
  • \(h_{t,nm}\) = time headway between vehicles \(n\) and \(m\)
  • \(h_{s,nm}\) = space (distance) headway between vehicles \(n\) and \(m\)
  • \(q\) = flow past a fixed point (vehicles per hour)
  • \(N\) = number of vehicles
  • \(t_{measured}\) = time over which measurement takes place (number of seconds)
  • \(t\) = travel time
  • \(k\) = density (vehicles per km)
  • \(L\) = length of roadway section (km)
  • \(v_t\) = time mean speed
  • \(v_s\) = space mean speed
  • \(v_f\) = free-flow (uncongested) speed
  • \(k_j\) = jam density
  • \(q_m\) = maximum flow

Key Terms

  • Time-space diagram
  • Flow, speed, density
  • Headway (space and time)
  • Space mean speed, time mean speed
  • Microscopic, Macroscopic

Supplementary Reading

  • Revised Monograph on Traffic Flow Theory

Open Access

Peer-reviewed

Research Article

An Open-Access Modeled Passenger Flow Matrix for the Global Air Network in 2010

Affiliations Department of Geography, University of Florida, Gainesville, Florida, United States of America, Emerging Pathogens Institute, University of Florida, Gainesville, Florida, United States of America

Affiliation Department of Statistics, University of Florida, Gainesville, Florida, United States of America

Affiliation Department of Geography, University of Florida, Gainesville, Florida, United States of America

Affiliations Department of Geography, University of Florida, Gainesville, Florida, United States of America, Emerging Pathogens Institute, University of Florida, Gainesville, Florida, United States of America, Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America, Department of Geography and Environment, University of Southampton, Highfield, Southampton, United Kingdom

  • Zhuojie Huang, 
  • Xiao Wu, 
  • Andres J. Garcia, 
  • Timothy J. Fik, 
  • Andrew J. Tatem

  • Published: May 15, 2013
  • https://doi.org/10.1371/journal.pone.0064317

The expanding global air network provides rapid and wide-reaching connections accelerating both domestic and international travel. To understand human movement patterns on the network and their socioeconomic, environmental and epidemiological implications, information on passenger flow is required. However, comprehensive data on global passenger flow remain difficult and expensive to obtain, prompting researchers to rely on scheduled flight seat capacity data or simple models of flow. This study describes the construction of an open-access modeled passenger flow matrix for all airports with a host city-population of more than 100,000 and within two transfers of air travel from various publicly available air travel datasets. Data on network characteristics, city population, and local area GDP amongst others are utilized as covariates in a spatial interaction framework to predict the air transportation flows between airports. Training datasets based on information from various transportation organizations in the United States, Canada and the European Union were assembled. A log-linear model controlling the random effects on origin, destination and the airport hierarchy was then built to predict passenger flows on the network, and compared to the results produced using previously published models. Validation analyses showed that the model presented here produced improved predictive power and accuracy compared to previously published models, yielding the highest successful prediction rate at the global scale. Based on this model, passenger flows between 1,491 airports on 644,406 unique routes were estimated in the prediction dataset. The airport node characteristics and estimated passenger flows are freely available as part of the Vector-Borne Disease Airline Importation Risk (VBD-Air) project at: www.vbd-air.com/data .

Citation: Huang Z, Wu X, Garcia AJ, Fik TJ, Tatem AJ (2013) An Open-Access Modeled Passenger Flow Matrix for the Global Air Network in 2010. PLoS ONE 8(5): e64317. https://doi.org/10.1371/journal.pone.0064317

Editor: Tobias Preis, University of Warwick, United Kingdom

Received: December 18, 2012; Accepted: April 10, 2013; Published: May 15, 2013

Copyright: © 2013 Huang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: AJT and ZH acknowledge support from the Transportation Research Board of the National Academy of Sciences, through contract #ACRP02-20. AJT also acknowledges funding support from the RAPIDD program of the Science & Technology Directorate, Department of Homeland Security, and the Fogarty International Center, National Institutes of Health, and is also supported by grants from the Bill and Melinda Gates Foundation (#49446 and #1032350). AJG was partially supported by the National Science Foundation under Grant No. 0801544 in the Quantitative Spatial Ecology, Evolution, and Environment Program at the University of Florida. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Demand for travel has boosted the growth of the global air travel network at an unprecedented rate. In the past 20–30 years, the network has expanded dramatically with a steady growth rate of 4–5% per year [1], accompanied by a nearly 9% annual growth rate of passenger and freight traffic [2]. In 2011, the worldwide international and domestic passenger kilometers transported reached a record high of 5.2 trillion kilometers [3]. These large volumes of air traffic have profound impacts on commodity trade [4], regional development [5], cultural communication [6], disease importation [7], [8] and species invasion [9]–[11]. As humans and commodities are transported at exceptional rates through aviation compared to other modes of transportation, how these patterns impact the socioeconomic, environmental and epidemiological landscape is of significant interest [7], [9], [11], [12].

Quantifying the volume of passengers on the air travel network is critical to understanding the complicated spatial interaction between origin and destination cities [7], [8]. Previously, studies from a range of fields [9]–[11], [13]–[16] have made use of data from the International Air Transport Association (IATA) or the International Civil Aviation Organization (ICAO). These data are often restricted to scheduled flights plus seat capacity information on routes. However, not all commercial flights operate at full capacity, and such data often overestimate the passenger numbers on affected routes [7]. Moreover, capacity data provide information on point-to-point connections only; thus, travel patterns that require a stopover and transfer of planes are not captured [17]. Although origin-destination data derived from air ticket sales are available (e.g. http://www.iata.org/ps/intelligence_statistics/paxis/pages/index.aspx ), such data are expensive for research purposes, running to many tens of thousands of dollars, and can require significant legal and confidentiality agreements for data usage. Other databases of international flow by pair-wise airports are held by private companies (e.g. Marketing Information Data Transfer, http://ma.aspirion.aero/midt ). These proprietary databases are costly and difficult to obtain, with payment required repeatedly to maintain the latest data. Here we aim to outline a modeling framework to produce open-access estimates of global air traffic flows for research purposes that can be regularly updated.

Spatial interaction models have been utilized to estimate the volume of passengers between an origin and a destination city where data are lacking [4], [14], [15], [18]–[25]. The most common is the gravity-type model, which incorporates drivers such as the site characteristics of origins and destinations, and measures of “locational separation”, to depict the interaction between origins and destinations for purposes of estimating flows. As Grosche et al. [25] summarized, commonly used drivers in spatial interaction models of air traffic include 1) socio-economic characteristics of origins and destinations, such as population, income, GDP, urban infrastructure and education level, and 2) service-related factors such as the quality (e.g., flight frequency, plane size and air fares) and the market demand of airline service. The locational separation is usually measured by the distance or travel time separating origins and destinations. The gravity model provides a solid theoretical and practical background for understanding the movement of populations since it explicitly captures the absolute and relative spatial relationship of origins and destinations [19].
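As a point of reference, a generic textbook form of the gravity-type model can be written as follows. This is only an illustrative sketch, not the specification fitted in this study. The symbols are assumed notation: \(T_{ij}\) is the flow from origin \(i\) to destination \(j\), \(A_i\) and \(A_j\) are site characteristics such as population or GDP, \(d_{ij}\) is the locational separation, and \(K\), \(\alpha\), \(\beta\), \(\gamma\) are parameters to be estimated.

\(T_{ij}=K\dfrac{A_i^{\alpha}A_j^{\beta}}{d_{ij}^{\gamma}}\)

Taking logarithms gives a form that is linear in the parameters, which is why such models are commonly estimated by regression on log-transformed variables:

\(\ln T_{ij}=\ln K+\alpha\ln A_i+\beta\ln A_j-\gamma\ln d_{ij}\)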

The utilization of network characteristics sheds light on the identification of air service factors in the gravity model for flow estimation, since 1) the layout of the global air travel network follows the “hub-and-spoke” network model and 2) heterogeneities in the network topologies are indicated by the demand for air travel in the geographic areas that the airports serve. Firstly, large air travel companies in mature air travel markets adopt a hub-and-spoke model to achieve a balance of travel time for customers and increase efficiencies in the use of transportation infrastructure. In this model, a single airport is assigned to a single hub or multiple hubs to form a regional inter-connected community [26]–[28], where “stop over” and “feeder” routes exist, connecting the small airports with low degree connections to a larger degree hub [29]. The locations of airport hubs are selected as the optimum locations that satisfy the inter-regional travel demands and minimize the total transportation cost [26], [27]. Moreover, the hub-and-spoke layout is reflected in the “small-world” and “scale-free” characteristics of the network. Guimera et al. [30] studied the “small-world” feature and showed that most airports can be reached from every other with only a small number of connections. They also identified how central nodes with low degree connectivity play an important role for inter-regional and intra-regional communication. The “scale-free” feature means that the degrees of the air travel network follow a power-law distribution, as suggested by the nodal structure of flow clusters [31] and described by the hierarchical span of the major airports in the United States [16].

Secondly, the connectivity and centrality of airports in the air travel network can act as indicators for air travel demand, since the local measurement of air passenger volume, population, and the level of economic activities at the periphery of the hub are highly correlated [32]–[34]. Empirical research [35]–[37] suggests a link between observed incremental growth of air passengers, increased passenger flows, and economic growth. Liu et al. [38] quantified the marginal effects of population growth in metropolitan areas on the air travel market, indicating that the odds of having a ‘major’ air traffic market increase by 41% per 100,000 increase in population. Wang et al. [39] studied the air travel network in China and found that cities in the more urbanized area of East China had higher centrality scores and higher air passenger volumes compared to the more rural West China. These studies indicate the mutual correlation of network centralities and urban development, and reflect the spatial agglomeration of economic activities and unequal air travel service demands.

To study the movement of vector-borne disease on the air travel network, Johansson et al. [17], [40] modeled the actual passenger counts between 141 airports worldwide, for origins and destinations that had epidemic significance. Utilizing the air travel itineraries of the United States as a training set, they constructed a Poisson generalized linear model to estimate worldwide passenger flows using node and route characteristics as model covariates. Their models provided reasonable flow predictions of origin-destination travel. Our research follows the general modeling framework used in Johansson et al. [17], [40], but extends the specification to a global model which includes: 1) all nodes with a host-city population of more than 100,000; 2) routes between all airports that are within 0, 1 or 2 stops on the air travel network.

Materials and Methods

Airport Locations and Scheduled Routes

Information on a total of 3,416 airports across the world, together with their coordinate locations, was obtained using Flightstats ( www.flightstats.com ) for 2010. The connectivity and scheduled air travel network routes were defined by a 2010 scheduled flight capacity dataset purchased from OAG ( www.oag.com ). These included information on direct links (if a commercial flight is scheduled) between origin and destination airports, flight distances, and passenger capacity by month for 2010. Directly connected airport pairs were utilized to construct a graph for the air travel network in 2010 with 3,416 nodes and 37,674 edges. The average degree of the network was 22.06, with the maximal degree recorded as 476 for Frankfurt Airport (IATA code: FRA). The topology of the graph exhibited both small-world and scale-free properties, as already observed in similar global or regional air travel dataset analyses [30], [41], [42]. The coefficient of the power law function fitting the scaled-degree distribution was 1.01±0.1, which is in concordance with a previous study [30]. The average path length, measured as the average number of steps travelling from any one node to any other node, was 4.11, while the diameter of this network (the shortest path between the two most remote airports) was 14. Based on the network created from the assembled flight statistics, we calculated the degree, centrality and strength for each node and used these measurements as covariates at the modeling stage.
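The network measures named above (degree, average degree, average path length and diameter) can be illustrated on a toy graph. The sketch below is not the authors' code: the five-airport network is hypothetical, links are treated as undirected, and plain breadth-first search stands in for a graph library.

```python
from collections import deque

# Hypothetical toy network: undirected direct links between airports.
edges = [("ATL", "JFK"), ("ATL", "FRA"), ("JFK", "FRA"),
         ("FRA", "DEL"), ("ATL", "GNV")]

adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

# Degree: number of direct links at each airport.
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
average_degree = 2 * len(edges) / len(adjacency)

def hops_from(source):
    """Breadth-first search: fewest links from source to every reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

# Shortest-path lengths over all ordered pairs of distinct airports.
all_hops = [h for s in adjacency for t, h in hops_from(s).items() if t != s]
average_path_length = sum(all_hops) / len(all_hops)
diameter = max(all_hops)  # longest of the shortest paths

print(degree, average_degree, average_path_length, diameter)
```

Under the same undirected convention, 2 × 37,674 / 3,416 ≈ 22.06, which is consistent with the average degree reported for the 2010 network.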

GDP and Population Information

Generally, socio-economic variables at a global scale are difficult to obtain. The G-Econ data ( http://gecon.yale.edu/ ) provide indices representing both market exchange rates (MER) and purchasing power parity (PPP) at a 1-degree longitude by 1-degree latitude resolution at a global scale. Due to the large geographical coverage of the grid cells, we extracted the closest PPP value for an airport and calculated the PPP value per capita in 2005 by dividing the purchasing power parity by the population value in each grid cell. These data were utilized as local economic measurements for each airport.

Given computing power limitations on the modeling and matrix sizes, we selected the airports serving a city with a population of more than 100,000. To select these airports, a web crawler built on the WolframAlpha API ( http://products.wolframalpha.com/api/ ) was used to extract the city populations for each airport. WolframAlpha is a knowledge engine which is capable of computing population information from various sources including U.S. Census data, United Nations urban agglomeration data and City Population ( http://www.citypopulation.de/ ) data. These data capture the most recent city population estimates from these data sources (for cities in the United States, the US Census 2010 data were utilized). In our database, there were 1,491 airports satisfying these criteria.

Actual Travel Passenger Flow

Data on passenger origins and destinations on the air travel network were obtained from a variety of sources to construct a training dataset:

  • The DB1B market data from the Airline Origin and Destination Survey (DB1B) provide a 10% sample of U.S. domestic passenger tickets from reporting carriers, including information such as the reporting carrier, origin and destination airports, prorated market fares, number of market coupons, market miles flown, and carrier change indicators. To create a training dataset, these data were aggregated annually by origin and destination airport code, summing the counts of itineraries. This sum was then multiplied by 10 to reflect the 10% sampling scheme. To protect the US air travel industry, access to the international Origin-Destination data reported by U.S. carriers is strictly restricted to U.S. citizens, and requires detailed statements on the use of the data. Hence, the research presented here did not take into account the international portion of the Origin-Destination data from DB1B.
  • The Canadian transport department provides statistics relating to the movement of aircraft, passengers and cargo by air for both Canadian and foreign air carriers operating in Canada ( http://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&SDDS=2703&lang=en&db=imdb&adm=8&dis=2 ). This survey provides estimates of the number of passengers traveling on scheduled domestic commercial flights by directional origin and destination city pairs. In this survey, significant numbers of Canada-U.S. trips were reported. The city pairs were matched to the airport pair that had the shortest route defined by the OAG database, with the passenger number obtained from the above data source. For example, passenger numbers between Toronto and New York City were matched to the direct route of YYZ to JFK, since it is the shortest route between these two cities.
  • Detailed route data for passenger numbers were obtained from EuroStat ( http://epp.eurostat.ec.europa.eu/portal/page/portal/transport/data/database ). This database presents passenger numbers between the main airports of reporting countries and their main partner airports in the European Union.

All of these flow statistics were utilized to create a training dataset O-D matrix. In this training dataset, there were 95,709 aggregated itineraries between 712 airports. The covariates used for modeling are described below.
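The aggregation and expansion step described for the DB1B sample can be sketched as follows. The records and field layout are hypothetical, and the only rule taken from the text is that itinerary counts are summed by origin and destination airport code and multiplied by 10.

```python
from collections import Counter

EXPANSION_FACTOR = 10  # DB1B is described as a 10% sample of tickets

# Hypothetical sampled itineraries: (origin, destination, itinerary count).
sampled_itineraries = [
    ("ATL", "JFK", 1),
    ("ATL", "JFK", 2),
    ("JFK", "ATL", 1),
    ("ATL", "GNV", 1),
]

# Sum sampled counts by directional origin-destination pair.
sample_counts = Counter()
for origin, destination, count in sampled_itineraries:
    sample_counts[(origin, destination)] += count

# Scale the sample up to an estimate of total annual passengers.
estimated_flows = {od: count * EXPANSION_FACTOR for od, count in sample_counts.items()}

print(estimated_flows)
```

Keeping the pairs directional means ATL to JFK and JFK to ATL are stored as separate entries of the O-D matrix.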

Network Covariate Processing

Cities are situated in a complex hierarchical network and the flows between cities are either constrained or facilitated by this hierarchical structure [4], [43]. We defined three levels of per-capita economic activity for each city based on the tertiles (33% quantiles) of the distribution of PPP per capita. Thus, nine types of economic links were identified (low-low, low-medium, low-high, etc.) to reflect the type of flow within/across the economic hierarchies. Similarly, we defined four levels of hierarchy based on the degree distribution of the airports, and sixteen types of flows were identified to reflect the type of flow within/across the air service hierarchies.
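The construction of the nine economic link types can be sketched as below. The PPP-per-capita values and airport codes are hypothetical, and the cut points are taken as tertiles computed with `statistics.quantiles`, which is an assumption about how the three levels were delimited.

```python
from statistics import quantiles

# Hypothetical PPP-per-capita values (arbitrary units) keyed by airport code.
ppp_per_capita = {"AAA": 2.0, "BBB": 5.0, "CCC": 9.0,
                  "DDD": 14.0, "EEE": 22.0, "FFF": 40.0}

# Two cut points split the distribution into three equally sized groups.
lower_cut, upper_cut = quantiles(ppp_per_capita.values(), n=3)

def economic_level_of(value):
    """Map a PPP-per-capita value to 'low', 'medium' or 'high'."""
    if value <= lower_cut:
        return "low"
    if value <= upper_cut:
        return "medium"
    return "high"

economic_level = {code: economic_level_of(v) for code, v in ppp_per_capita.items()}

def link_type(origin, destination):
    """Directional link type such as 'low-high'."""
    return f"{economic_level[origin]}-{economic_level[destination]}"

link_types = {link_type(o, d)
              for o in ppp_per_capita for d in ppp_per_capita if o != d}

print(economic_level)
print(sorted(link_types))
```

Three levels at each end of a directional link give 3 × 3 = 9 types; the same construction with four degree-based levels gives the sixteen air service link types.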

A prediction dataset framework for routes was constructed based on the adjacency matrix defined by the OAG dataset. For each airport, destination airports via first-order connection, second-order connection and third-order connections on the air travel network were identified. Along these routes, information on the minimum number of stopovers and the maximum seat capacity were calculated. Moreover, following approaches outlined in Bhadra’s research [32] we defined a categorical variable for distance classes to separate the markets by stage lengths, with 1 for short-haul (2,000 kilometers or less), 2 for medium-haul (between 2,000 and 3,500 kilometers) and 3 for longer hauls (3,500 or more kilometers). We excluded routes less than 200 km since passengers are believed to have more efficient and effective land-based methods to travel such small distances. Note that only 3,842 possible routes (<0.001%) are less than 200 km. Finally, an origin-destination (OD) pair list with 1,295,752 rows was created.
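The distance classes and the 200 km exclusion can be expressed as a small helper. This is a sketch: the function name is illustrative, and because the text lists 3,500 km under both the medium-haul and long-haul descriptions, the choice to place exactly 3,500 km in class 3 is an assumption.

```python
def haul_class(distance_km):
    """Return the stage-length class of a route, or None if the route is excluded.

    1 = short-haul (2,000 km or less), 2 = medium-haul, 3 = long-haul (3,500 km or more).
    Routes shorter than 200 km are excluded from the OD pair list.
    """
    if distance_km < 200:
        return None
    if distance_km <= 2000:
        return 1
    if distance_km < 3500:  # assumption: exactly 3,500 km counts as long-haul
        return 2
    return 3

for distance in (150, 800, 2000, 2500, 3500, 9000):
    print(distance, haul_class(distance))
```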

For analytical purposes, the global OD pair list was constructed following these assumptions:

  • Passengers always take the shortest path to their destination city, and they do not stop at the connecting city. The data used for modeling are itinerary data which represent the minimum number of stops from one airport to another. Hence, passenger numbers in our database represented the flows for the first order, the second order and the third order of network connections. We assumed that passengers choose the first shortest path found by a breadth-first search algorithm, in which the route is found by iterating over all the neighboring nodes until a path from the origin to the destination is identified. If both the origin and the destination cities have multiple airports, the passengers were assumed to take the shortest path from all possible routes between these airport pairs, which usually resulted in the path between the two largest airports in terms of capacity. This assumption is supported by the finding of Button et al. [44] that passengers tend to choose a larger hub for their travel.
  • Passengers do not choose routes with more than two stops. We used the number of stops as a categorical variable rather than a numeric variable since it is considered to be a measure of hierarchical accessibility. In fact, for the air travel network in 2010, all of the possible calculated routes within two stops covered 83% of all the possible connections. Also, itineraries with more than two stops were comparatively rare as a share of total passengers in our actual travel flow datasets. In the DB1B domestic datasets, there are no itineraries with more than two stops for travel between cities with a population of more than 100,000.
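The two assumptions above can be illustrated with a breadth-first search that keeps the first shortest path found and ignores anything beyond two stops. The network below is hypothetical and the function name is illustrative; it is a sketch of the idea, not the igraph-based procedure used in the study.

```python
from collections import deque

MAX_STOPS = 2  # routes with more than two stops are not considered

# Hypothetical direct links (adjacency list) between airports.
direct_links = {
    "GNV": ["ATL"],
    "ATL": ["GNV", "JFK", "FRA"],
    "JFK": ["ATL", "FRA"],
    "FRA": ["ATL", "JFK", "DEL"],
    "DEL": ["FRA", "CCU"],
    "CCU": ["DEL"],
}

def routes_within_stop_limit(origin):
    """Return {destination: path}, using the first shortest path found by BFS."""
    paths = {origin: [origin]}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        legs_so_far = len(paths[node]) - 1
        if legs_so_far == MAX_STOPS + 1:
            continue  # one more leg would mean a third stop
        for neighbor in direct_links.get(node, []):
            if neighbor not in paths:
                paths[neighbor] = paths[node] + [neighbor]
                queue.append(neighbor)
    del paths[origin]
    return paths

routes = routes_within_stop_limit("GNV")
# Number of stops = number of intermediate airports on the path.
stops = {destination: len(path) - 2 for destination, path in routes.items()}

print(routes)
print(stops)
```

In this toy network CCU is four legs away from GNV, so it does not appear among the reachable destinations.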

All the network characteristics were calculated using the igraph ( http://igraph.sourceforge.net/ ) library in R ( http://www.r-project.org/ ). The snowfall library ( http://cran.r-project.org/web/packages/snowfall/index.html ) was utilized for parallel processing to accelerate calculations. A summary of variables included in the model is presented in Table 1.

Table 1. Summary of variables included in the model. https://doi.org/10.1371/journal.pone.0064317.t001

travel flow models

To enhance estimation and thus prediction, we tested four model specifications:

  • Model 1: a lognormal model for main effects only. This model adopts the general gravity model framework described in Balcan et al. [45]. To utilize this model, a logarithm transformation is performed for each quantitative variable. The main effects included both node and route characteristics.
  • Model 2: a generalized linear model for main effects and interactions with a Poisson distribution and a log link. This model adapted the model utilized by Johansson et al. [17], [40] for predictions of the traffic flows between epidemiologically significant cities.
  • Model 3: a generalized linear model for main effects and interactions with a negative binomial distribution and a log link. This model is similar to model 2 except that it utilized a negative binomial distribution to account for the possible over-dispersion in the data.
  • Model 4: a lognormal mixed model with main effects, interactions, and random effects on origin and destination city (a logarithm transformation is again performed for each quantitative variable). This model assumed that the passenger flows were independent between different degree link types but correlated within the same degree link type, whereas models 1–3 assumed that all passenger flows were independent of each other, which is a very strong assumption and unrealistic in practice. Random effects were thus included to account for the dependence among passenger flows and the possible heterogeneity between levels of air travel services.

More detailed model descriptions can be found in Text S1 .
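To make the simplest of these specifications concrete, the sketch below fits a main-effects model on log-transformed variables by ordinary least squares. It is a stand-in for the idea behind model 1 only: the covariates (origin population, destination population, distance), the coefficients and the data are all hypothetical, the data are generated without noise, and interactions and random effects are omitted.

```python
import math

def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit_log_linear(rows):
    """Least squares on logs; rows are (origin_pop, dest_pop, distance_km, passengers).

    Returns [intercept, b_origin, b_destination, b_distance].
    """
    design = [[1.0, math.log(r[0]), math.log(r[1]), math.log(r[2])] for r in rows]
    target = [math.log(r[3]) for r in rows]
    k = 4
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(design, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical coefficients: intercept, origin, destination, distance.
TRUE_COEFFS = (-2.0, 0.8, 0.7, -1.2)
city_pairs = [(1e5, 2e6, 500), (5e5, 3e5, 1200), (2e6, 1e6, 3000),
              (8e6, 4e5, 800), (3e5, 6e6, 4500), (1.2e6, 9e5, 2500)]

rows = []
for origin_pop, dest_pop, distance_km in city_pairs:
    log_flow = (TRUE_COEFFS[0] + TRUE_COEFFS[1] * math.log(origin_pop)
                + TRUE_COEFFS[2] * math.log(dest_pop)
                + TRUE_COEFFS[3] * math.log(distance_km))
    rows.append((origin_pop, dest_pop, distance_km, math.exp(log_flow)))

estimates = fit_log_linear(rows)
print([round(b, 3) for b in estimates])
```

Because the synthetic flows are generated exactly from the assumed coefficients, the fit recovers them; with real data the residual variation is what models 2 to 4 treat differently.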

Apart from model fitting on the entire training dataset, cross-validation was performed to evaluate how accurately each model would predict in practice. Firstly, the training dataset was randomly partitioned into 10 subsets, each consisting of 10% of the observations. Then each subset in turn was used as the cross-validation testing set to validate the model fitted on the remaining data. Lastly, the validation results were averaged over the rounds, with ranges of percentages reported. Three criteria were chosen for model evaluation: 1) the coverage rate of the 95% prediction intervals, which measured the percentage of the observations that fall into the corresponding 95% prediction intervals; 2) the coverage rate of the ±30% observation intervals, which measured the percentage of the predictions that fall into the ±30% intervals of the corresponding observations; 3) the successful prediction rate, which measured the percentage of predictions that fall into the same magnitude category as the corresponding observations. These magnitude categories were defined by dividing the passenger flow numbers into five groups: \(10^2\) and under, \(10^2\)–\(10^3\), \(10^3\)–\(10^4\), \(10^4\)–\(10^5\), and \(10^5\)+, with each group representing one category.
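The fold partition and two of the three evaluation criteria can be sketched in a few lines. The observed and predicted values below are hypothetical, the function names are illustrative, and treating each upper bound as inclusive is an assumption, since the text does not say which category a value of exactly \(10^3\) belongs to. The 95% prediction interval criterion is left out because it depends on the fitted model.

```python
import random

UPPER_BOUNDS = [1e2, 1e3, 1e4, 1e5]  # upper edges of the first four categories

def magnitude_category(passengers):
    """Return 0-4 for the five magnitude groups (upper bounds assumed inclusive)."""
    for index, bound in enumerate(UPPER_BOUNDS):
        if passengers <= bound:
            return index
    return len(UPPER_BOUNDS)

def successful_prediction_rate(observed, predicted):
    """Share of predictions in the same magnitude category as the observation."""
    hits = sum(magnitude_category(o) == magnitude_category(p)
               for o, p in zip(observed, predicted))
    return hits / len(observed)

def within_30_percent_rate(observed, predicted):
    """Share of predictions inside the +/-30% interval around the observation."""
    hits = sum(0.7 * o <= p <= 1.3 * o for o, p in zip(observed, predicted))
    return hits / len(observed)

def ten_fold_indices(n_observations, seed=0):
    """Randomly partition observation indices into 10 roughly equal folds."""
    indices = list(range(n_observations))
    random.Random(seed).shuffle(indices)
    return [indices[fold::10] for fold in range(10)]

# Hypothetical observed and predicted passenger counts.
observed = [50, 800, 1200, 15000, 250000]
predicted = [90, 1100, 1500, 14000, 90000]

print(successful_prediction_rate(observed, predicted))
print(within_30_percent_rate(observed, predicted))
print([len(fold) for fold in ten_fold_indices(95)])
```

In a full cross-validation run, each fold would be held out once, the model refitted on the other nine, and the rates averaged over the ten rounds.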

Model Comparison on the Training Dataset

For each model, most coefficients were significant at the 0.05 significance level as the percentages of the significant coefficients are about 90%, 100%, 96%, 95% respectively for model 1 to 4. For the purpose of prediction, we kept all the covariates in the model instead of removing the non-significant ones. Not surprisingly, most of the interactions between node and route characteristics played an important role in model estimation as we treated the number of stops as a categorical variable. The interaction between haul types and inverse distance was also significant, which agreed with previous work [32] .

Both model 1 and model 2 provided narrow confidence intervals for predictions, while model 3 and model 4 provided wider intervals to accommodate variation in the data. All of these models had at least 68% successful prediction rates for predicting the magnitude of passenger flow. According to the results presented in Table 2 , model 4 provided the most accurate prediction.

Table 2. https://doi.org/10.1371/journal.pone.0064317.t002

For each of the models, we calculated the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). RMSE is a frequently used measure of the differences between estimated values and the values actually observed. A smaller RMSE suggests a better model fit. MAE is the average of the absolute value of the prediction errors, which serves the same purpose as RMSE and is believed to be more robust in many situations. As shown in Table 3, model 4 yielded the lowest RMSE and MAE for the majority of the data points except for extremely large observations. For the largest observed passenger value category, model 2 gave the lowest RMSE and MAE, while model 4 gave the second lowest RMSE and MAE.
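The two error measures can be written out directly. The sketch below uses hypothetical observed and predicted values and does not assume any particular scale (raw or logarithmic) for the errors.

```python
import math

def rmse(observed, predicted):
    """Root mean square error: square root of the mean squared prediction error."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(observed, predicted):
    """Mean absolute error: mean of the absolute prediction errors."""
    absolute = [abs(p - o) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)

# Hypothetical values; the last prediction is deliberately far off.
observed = [100, 200, 300, 400]
predicted = [110, 190, 300, 500]

print(round(rmse(observed, predicted), 2))
print(mae(observed, predicted))
```

The single large error raises the RMSE well above the MAE, which illustrates why MAE is regarded as the more robust of the two.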

Table 3. https://doi.org/10.1371/journal.pone.0064317.t003

Figure 1 presented the prediction and diagnostic plots for Model 4. In figure 1 , panel a) showed that most of the prediction values are close to the y = x (prediction = observation) line; panel b) showed that most of the residuals scatter along the y = 0 (residual = 0) line, yielding no obvious pattern. Both plots indicated that Model 4 was a plausible model for the passenger flows. However, the prediction seemed poor at the lower tail. This was expected, given likely randomness in the smaller amount of passenger exchanges between airports [17] . Diagnostic plots for other models are presented in figure S1 and S2 .

Figure 1. a) Predicted vs. observed value of model 4. b) Residual vs. observed value of model 4. c) Distribution of ratio of predicted value vs. observed value in log scale with 95% confidence interval for geometric mean. d) Distribution of ratio of capacity vs. observed value in log scale with 95% confidence interval for geometric mean.

https://doi.org/10.1371/journal.pone.0064317.g001

Alternative diagnostics for testing the model fit were performed for model 4 as well. Firstly, a multilevel model described in Snijders et al. [46] and implemented in the SAS code written by Recchia et al. [47] was utilized to calculate r-squared measures for the fourth model. The first level of the model, which considered only the individual connectivity, was found to explain 84.0% of the variance in the data, and the second level, which incorporated the independence between different degree link type groups and the within-group correlation, explained 98.7% of the variance, indicating a good model fit and improved explanatory power in terms of variance. Secondly, for the directly connected flights, we compared both the predicted value from model 4 and the capacity data from OAG to the observed passenger flows on a log scale using paired t-tests. The results showed evidence of a difference between the mean predicted passenger number and mean observed passenger number, and between the mean capacity number and mean observed passenger number, both at the 0.05 significance level. However, the geometric mean ratio of log (predicted value) to log (passenger number) was 1.01 (panel c in Figure 1), while the geometric mean ratio of log (capacity) to log (passenger number) was 1.08 (panel d in Figure 1). The predicted values showed more agreement with the observed values, while the capacity data represented a significant overestimation of flows between two directly connected airports. Hence, our predicted values provided a closer approximation of the traffic flows on the air travel network compared to the maximum seat capacity metric for the directly connected cities, as used in previous studies [9]–[11], [13]–[16].

In summary, our model (Model 4) outperformed the lognormal spatial interaction model (Model 1) used in Balcan et al [45] and the Poisson model (Model 2) used in Johansson et al [17] , [40] for the training dataset. Moreover, for direct flights, our estimates showed more homogenous agreements with observed passenger numbers compared to simple seat capacity data.

Prediction and Interpretation of the O-D Passenger Flow Matrix

Model 4 was applied on the estimation dataset to predict passenger flows with coefficients extracted from the training datasets. We have identified the over-dispersed predictions that exceeded the maximum capacity on the routes (3% of the data) and replaced them with the product of the maximum capacity on the routes. According to the training dataset, the maximum numbers of itineraries for one-stop and two-stop connections were 140,086 and 8,060, respectively. Since these data were generated from the mature air travel market and constrained by the network structure, we considered them as the upper limits of the data distribution. As such, we adjusted the prediction of the first-order connection and the second-order connection flights scaled by these two maximum numbers. We then removed all predictions that were less than 1 person. Finally, 644,406 routes with origin/destination airport codes, number of stops, and predicted passenger numbers were produced.

As described before, the passenger counts were grouped into five categories as a test of successful prediction rate in magnitude: 1–10^2, 10^2–10^3, 10^3–10^4, 10^4–10^5, and 10^5 and more. The first two categories presented small numbers of passenger exchanges, implying random flows between two airports, and the fourth and fifth categories indicated a higher probability representing steady flows between airports. Figure 2 a) showed all the flows with more than 10^5 predicted passengers.
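The magnitude grouping described above can be illustrated with a short sketch. The function name, the labels, and the choice to treat each lower bound as inclusive are assumptions made for illustration; the source does not state how boundary values were assigned.

```python
from bisect import bisect_right

# Lower bounds of categories two to five (assumed inclusive).
THRESHOLDS = [100, 1_000, 10_000, 100_000]
LABELS = ["1-10^2", "10^2-10^3", "10^3-10^4", "10^4-10^5", "10^5 and more"]

def magnitude_category(passengers):
    """Return the order-of-magnitude category for a predicted passenger count."""
    if passengers < 1:
        # Predictions below one passenger were removed before grouping.
        raise ValueError("prediction below 1 passenger")
    # bisect_right counts how many thresholds are <= passengers,
    # which is exactly the index of the category.
    return LABELS[bisect_right(THRESHOLDS, passengers)]
```

Comparing the category of a predicted value with the category of the observed value for the same route then gives a simple hit-or-miss measure of prediction success by magnitude.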


a) Predicted flights with passenger flows of more than 100,000. b) All possible passenger flows through direct flights originating from Atlanta. c) All possible passenger flows through one-stop flights originating from Atlanta. d) All possible passenger flows through two-stop flights originating from Atlanta. e) All airports with incoming passenger numbers of more than 5,000,000.

https://doi.org/10.1371/journal.pone.0064317.g002

Secondly, given an origin/destination, the dataset produced through the research outlined here can estimate the endpoints and starting points with passenger flows on the air travel network. Figures 2 b)-d) illustrated the passenger flows and numbers originating from Atlanta, categorized by number of transfers. Figure 2 e) showed the distribution of airports with incoming passenger numbers over 5,000,000. This reflected the mature air markets of the United States and Europe, though noticeable concentrations of airports could be observed in emerging markets such as India and China as well.

With continuing growth of the global air travel network, we must expect continued socioeconomic, environmental, cultural and epidemiological impacts. This research shows how network characteristics, combined with multiple datasets on various aspects of passenger movement on the global air network, can be compiled to provide estimates that are more accurate than previous modeling efforts. Such a dataset provides a valuable resource for scientists and decision makers to measure the global flow of air traffic and its potential influences.

In the database outlined here, 644,406 unique routes spanning 1,491 airports serving city populations of more than 100,000 were modeled based primarily on publicly available datasets. On the training dataset, our model has outperformed similar research at the global scale and can explain 98% of the variance in the data. Within the database, 23,785 routes follow a direct connection, 291,745 routes are one-stop connections and 328,876 routes are two-stop connections. Using this route and airport information, anyone can construct flow matrices to describe the global air traffic flow and assess its multiple impacts.

Due to data constraints, a range of uncertainties and limitations exist in the output modeled datasets. The first inconsistency arises from internal uncertainties within the DB1B dataset. To construct the DB1B dataset, Transtat only requires US carriers to report O-D pair data; hence the O-D data are likely to be inaccurate in markets served by a significant number of foreign carriers (e.g., New York, Washington D.C., Chicago, and Los Angeles). Meanwhile, flights operated by foreign carriers usually have a share code with U.S. carriers and these flights are included in our database. If there is more than one airport in a city, each of the airports is treated as a separate node. This may well result in overestimates of the flow to secondary airports in a city.

The second source of inconsistency is the population data. Due to data availability, only city population data were utilized, although people in neighboring metropolitan areas can sometimes access the airport in question through ground-based transportation (for example, people in Gainesville FL are often likely to drive two hours to Jacksonville or Orlando to take a plane, rather than utilize the Gainesville regional airport, which is 10 miles away from the city center). As a result, our predictions may overstate the markets for small airports.

The third source of uncertainty stems from the fact that the data we utilized for the training datasets were only from the United States, Canada and the European Union. Thus, international flights are less well represented in our dataset and most of the flight data describes the flows between airports in high income countries. Additionally, long haul international flights with more than three stops are absent.

The topology of the air travel network is likely to vary at the regional level. Wang et al [39] found that in terms of topological measurements, the Chinese air travel network is similar to the Indian one, but different from that of the US. Whereas current air travel networks in low income countries usually feature point-to-point connections between city pairs [48], high income countries are increasingly prompted to utilize a hub-and-spoke system due to their mature air travel markets. On the other hand, it is observed that some companies in high income countries (such as Southwest Airlines and JetBlue in the United States) also adopt spoke-to-spoke models to connect hot spots of air travel demand [32]. This heterogeneity may affect the flow estimation country-wise and overestimate the driving factor of hubs in both high and low income countries.

The demand for air travel is heterogeneous and “largely determined by the spending capacity of customers” [49]. Hence, it could be anticipated that the demand for air travel in each country varies and is correlated with GDP. Also, the demographic profile of passengers on the air travel network is likely to differ between countries. In a regional context, this may affect the prediction of domestic passenger numbers, while international heterogeneities in traffic flows may be attributed to differing visa policies between countries [50]. Visa restrictions may reduce traffic flows substantially between countries [51]. Moreover, cultural differences at a country level could represent indicators of attraction and drivers of population movements [6], [52].

The potential limitations discussed above arise through the constraints of the data sources used. These may be alleviated through incorporation of more publicly accessible data in future work, including: 1) more detailed economic indicators (such as GDP, income etc.) at the city level: such measures could further describe drivers in the spatial interaction model; 2) itineraries from low income regions of the world: such data would enlarge our training and testing databases to avoid sampling errors; 3) hub characteristics (such as the number of enplanements, transfers and deplanements): these measures could help explain the function of the hubs in controlling network flows. Alternatively, transportation forecasting models [53], [54] and mobility and migration models [55] could be utilized to estimate the global O-D matrix based on the traffic counts on nodes and edges.

The research presented here has documented the generation of a world-wide Origin-Destination matrix of passenger flows in 2010 for airports with host city populations of more than 100,000. Results show that the modeled dataset improves substantially on the accuracy of datasets used in previous studies. The datasets are freely accessible for academic use and are published as part of the Vector-Borne Disease Airline Importation Risk (VBD-Air) project at www.vbd-air.com/data/ .

Supporting Information

Plots for predicted value vs. the predicted value at a log scale.

https://doi.org/10.1371/journal.pone.0064317.s001

Plots for residuals vs. the predicted values at a log scale.

https://doi.org/10.1371/journal.pone.0064317.s002

Model description.

https://doi.org/10.1371/journal.pone.0064317.s003

Acknowledgments

We thank Dr. Michael Daniels at the University of Texas at Austin for suggestions on the model building process. We also thank two anonymous reviewers for their valuable comments on this research. This work forms part of the Human Mobility Mapping Project ( www.thummp.org ).

Author Contributions

Conceived and designed the experiments: ZH XW AJG TJF AJT. Analyzed the data: ZH XW AJT. Contributed reagents/materials/analysis tools: XW TJF. Wrote the paper: ZH XW AJT.

  • 1. IATA (2012) IATA 2012 Annual Review. Beijing: IATA.
  • 3. IATA (2011) Annual Report 2011 International Air Transport Association.
  • 7. Tatem AJ, Huang Z, Das A, Qi Q, Roth J, et al. (2012) Air travel and vector-borne disease movement. Parasitology: 1–15. doi: https://doi.org/10.1017/S0031182012000352.
  • 18. Long W (1970) Air travel, spatial structure, and gravity models. Ann Reg Sci: 97–107.
  • 19. Haynes K, Fotheringham A (1984) Gravity and spatial interaction models.
  • 32. Bhadra D (2003) Demand for air travel in the United States: bottom-up econometric estimation and implications for forecasts by origin and destination pairs. J of Air Transp 8.
  • 35. Jin F, Wang F, Liu Y (2004) Geographic Patterns of Air Passenger Transport in China 1980–1998: Imprints of Economic Growth, Regional Inequality and Network Development: 37–41.
  • 47. Recchia A (2010) R-squared measures for two-level hierarchical linear models using SAS. J Stat Softw 32: Code Snippet 2.
  • 51. Neumayer E (2010) Visa restrictions and bilateral travel. The Professional Geographer: 37–41.

TF Resource

Integrated Travel Demand and Network Models

An integrated travel demand model that can serve various planning needs should include at least two primary components:

  • A travel demand model that generates trips by origin, destination, mode, and time of day, and
  • A network model that assigns these trips onto the corresponding networks, identifies routes chosen for each trip, and generates Level-of-Service skims that are fed back to the travel demand model.

Both parts are equally important and should be properly integrated.

# Introduction

# Purpose / Need / Importance

The purpose of an integrated model is to ensure that different network improvement scenarios and socio-economic / land-use scenarios can be simulated in a consistent way with respect to transportation system performance. Integrated travel demand and network models are needed for practically any study in which the compared alternatives are different enough to generate corresponding impacts on either travel demand or network performance. In particular, an integrated model is essential for analysis of large-scale transportation projects, regional policies like pricing, or environmental studies.

# Historical Overview

The theory and practice of integrated demand-network models have a long history dating back to the fundamental works of Evans, 1976, Florian, 1977, and others. The basic idea of these integrated formulations was that the demand part of the model can be expressed as a set of entropy-maximization terms (Wilson, 1970) while the network part of the model can be expressed as a set of link-based congestion terms (Beckman, 1956). These fundamental works related to the integration of 4-step demand models and Static User Equilibrium network assignments. In these works, existence and uniqueness of the equilibrium solution were established. The practical way to achieve the equilibrium solution is to apply the travel demand and network models iteratively, feeding back the Level-of-Service variables from the network model to the demand model. With the advent of advanced Activity-Based Models and Dynamic Traffic/Transit Simulations in practice since 2000, the basic idea of an iterative procedure has been retained, although the exact mathematical formulation of the integrated model, as applied to 4-step models with Static User Equilibrium, cannot be easily extended (Ben-Akiva, et al, 2002, Waller & Castiglione, 2012, Vovsha & Mahmassani, 2012). This webpage is intended to provide an overview and resources for theoretical and practical aspects of constructing integrated models of travel demand and network. It covers different types of demand models and different types of network models with the corresponding analysis of compatibility and integration schemas.

# Key Concepts

# Components

# Travel Demand Models

# Aggregate 4-step / Trip-based

The trip-based travel model approach evolved over many decades. As their name suggests, trip-based models use individual trips as the fundamental units of analysis. Trip-based models are widely used in practice to support regional, subregional, and project-level transportation analysis and decision-making. Trip-based models are often referred to as “4-step” models because they commonly include four primary components. The first trip generation component estimates the numbers of trips produced by and attracted to each zone (these zones collectively represent the geography of the modeled area). The second trip distribution step connects where trips are produced and where they are attracted to. The third mode choice step determines the travel mode, such as auto or transit, used for each trip, while the fourth assignment step predicts the specific network facilities or routes used for each trip. Additional detail on trip-based models can be found here .
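As an illustration of the second step, the sketch below distributes each zone's produced trips across destination zones with a production-constrained gravity model. The gravity form, the exponential impedance function, the parameter `beta`, and all names are assumptions chosen for illustration; the text above does not prescribe a particular distribution method.

```python
import math

def gravity_distribution(productions, attractions, cost, beta=0.1):
    """Split each zone's productions across destinations.

    T[i][j] = P[i] * A[j] * exp(-beta * c[i][j]) / sum_k A[k] * exp(-beta * c[i][k])
    """
    trips = []
    for i, produced in enumerate(productions):
        # Attractiveness of each destination, discounted by travel cost.
        weights = [a * math.exp(-beta * cost[i][j])
                   for j, a in enumerate(attractions)]
        total = sum(weights)
        trips.append([produced * w / total for w in weights])
    return trips
```

Each row of the resulting trip table sums to the productions of its origin zone, so the output of trip generation is preserved while destinations are chosen according to attractiveness and travel cost.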

# Microsimulation / Activity-based / Tour-based

Recently, activity-based models have become more widely used in practice. Activity-based models share some similarities to traditional 4-step models – activities are generated, destinations for the activities are identified, travel modes are determined and finally, the specific network facilities or routes used for each trip are predicted. However, activity-based models incorporate some significant advancements over 4-step trip-based models, such as the explicit representation of realistic constraints of time and space and the linkages amongst activities and travel both for an individual person as well as across multiple persons in a household. Activity-based models also have the ability to incorporate the influence of very detailed person-level and household-level attributes, and the ability to produce detailed information across a broader set of performance metrics. These capabilities are possible because activity-based models work at a disaggregate person-level rather than a more aggregate zone-level like most trip-based models.

# Auxiliary Models

Trip-based models and activity-based models represent the trips made by residents of the modeled area when they are travelling entirely within the modeled area. Typically these trips comprise about 80%-90% of the total demand. Auxiliary demand refers to trips that are not represented in the activity-based model system, such as truck trips, visitor trips, internal-external and external-external trips, and special generator trips. In most cases, the auxiliary trip models used in conjunction with activity-based models are similar to those used in conjunction with trip-based models, although additional temporal or spatial detail may be included. In some cases, more sophisticated auxiliary demand models have been implemented in conjunction with activity-based models, but such models are not required.

# Network Supply Models

# Static / Aggregate

Static network assignment models are the most widely used roadway network models. They are typically used when the input travel demand is generated for longer time periods, such as multi-hour peak periods or entire days. As a result, they can only generate estimates of average network travel times and link volumes representing longer time periods for input into the activity-based travel demand model. While the behaviors of static network assignment models are well-known and they have relatively fast runtimes, they are limited by their insensitivity to many operational attributes and by their inconsistency with traffic flow theory (Chiu et al, 2011). Most transit network models are also static, although they vary in their level of complexity. Most AB models are linked with simple shortest generalized path transit assignment models, although some transit assignment models include the ability to distribute flows across multiple competing routes, and even to reflect the impacts of transit crowding.

# Dynamic / Disaggregate

In contrast to static network assignment models, dynamic network assignment models (also referred to as dynamic traffic assignment models or DTA) capture the changes in network performance by detailed time-of-day, and can be used to generate time varying measures of this performance for input to an activity based model. This temporal resolution can be flexibly defined, and recent implementations have tested resolutions as fine as 10 minutes. The network performance indicators derived from dynamic network models such as congested travel times arise from the dynamic interaction of individual vehicles or packets of vehicles being simulated or calculated using extremely fine grained temporal resolution, such as seconds or fractions of seconds. Dynamic traffic assignment models are sensitive to operational attributes and are founded upon traffic flow theory, but their wide adoption has been hindered by long runtimes and by their inherent stochasticity.

# Choice Contexts by Travel Dimensions

There are certain gray areas between the travel demand side and the network side, as well as within the network simulation process itself, that have been treated differently by different researchers and can be either included in or excluded from the network model. In this section we provide the relevant examples, references, and recommendations in this regard.

What decisions are made in demand model vs. network supply model?

Mode / sub-mode / route type / route itinerary

The technical implementation of such a model depends on the treatment of route choice and its placement between the demand model and the network simulation tool, which can be done in several different ways, as shown in Figure 1 (SHRP L04 Project Report).


Trip departure time / scheduling

Another important choice dimension that falls between the demand model and network simulation model is trip departure time choice and related activity scheduling. Traditionally, time-of-day choice and activity scheduling decisions have been a part of the demand model whether 4-step or Activity Based. Recently, several research works discussed advantages of closer integration of trip departure time and route choice decisions in the extended DTA framework (Mahmassani et al, 2001-2006).

# Convergence and Equilibrium

# Theory of Integrated Models

The early theory and practice of integrated demand-network models have a long history dating back to the fundamental works of Evans, 1976, Florian, 1977, and others. The basic idea of these integrated formulations was that the demand part of the model can be expressed as a set of entropy-maximization terms (Wilson, 1970) while the network part of the model can be expressed as a set of link-based congestion terms (Beckman, 1956). This theory was largely created for integration of a 4-step demand model with a Static User Equilibrium Assignment Model. Extension of this theory to Activity-Based Models of travel demand and Dynamic Traffic/Transit Assignment models that are implemented in a microsimulation fashion is not straightforward, but substantial progress has been made on this front recently in both theoretical and practical terms (Ben-Akiva, 2001, Mahmassani, 2005, Waller & Castiglione, 2012, Vovsha et al, 2009, SHRP2 C10 and L04 Projects).

# 2-way linkage (I/O) and feedback

Since the technologies of microsimulation have been brought to a certain level of maturity on both the demand side (ABM) and the supply (network) side (DTA), ABM-DTA integration has become one of the most promising avenues in transportation modeling. It might seem that the integration between the two models should be as natural and straightforward as the integration concept between a 4-step model and static traffic assignment (STA) shown in Figure 2. That relatively simple integration was based on the fact that both I/O entities involved in the process have the same matrix structure. The 4-step demand model produces the trip tables needed for assignment, and the assignment procedures produce full Level-of-Service (LOS) skims in the matrix format needed by the 4-step model. Note that the LOS variables are provided for all possible trips (not only for the trips generated by the demand model at the current iteration). In this case we can say that the network model provides full feedback to the demand model. The theory of global demand-network equilibrium is well developed for this case, and guarantees a unique solution for the problem, as well as a basis for effective practical algorithms.


Both ABM and DTA operate with individual particles as modeled units (individual tours and trips) and have compatible levels of spatial and temporal resolution. It might seem that exactly the same integration concept as applied for 4-step models could just be adjusted to account for a list of individual trips instead of fractional-number trip tables. Moreover, the advanced individual ABM-DTA framework would provide an additional beneficial dimension for the integration, in the form of consistent individual schedules (which can never be incorporated in an aggregate framework). Individual schedule consistency means that for each person, the daily schedule (i.e. a sequence of trips and activities) is formed without gaps or overlaps. However, a closer look at the ABM-DTA framework and consideration of the actual technical aspects of implementation reveal some non-trivial issues that need to be resolved before the advantages offered by the overall microsimulation framework can be realized. Specifically, the problem is that the feedback provided by the DTA procedure does not cover all the needs of the ABM, as shown in Figure 3.


The crux of the problem is that, contrary to the 4-Step-STA integration, the microsimulation DTA can only produce an individual trajectory (path in time and space) for the list of actually simulated trips. It does not automatically produce trajectories for all (potential) trips to other destinations and at other departure times. Thus, it does not provide the necessary level-of-service feedback to the ABM at the disaggregate level for all modeled choices. Any attempt to resolve this issue by “brute force” would result in an infeasible number of calculations, since all possible trips cannot be processed by DTA at the disaggregate level. In fact, the list of trips for which individual trajectories can be produced is a very small share of all the possible trips to consider.

As shown in Figure 4, one of the possible solutions is to employ DTA to produce aggregate LOS matrices (the way they are produced by STA), and use these LOS variables to feed the demand model. However, by aggregating individual trajectories into LOS skims, this approach would lose most of the details associated with DTA and the advantages of individual microsimulation (for example, individual variation in Values of Time or other person characteristics). Essentially, with this approach the individual schedule consistency concept would be of very limited value because travel times would be very crude for each particular individual. Nevertheless, this approach has been adopted in many studies due to its inherent simplicity [Bekhor et al, 2011; Castiglione, 2012]. The emphasis in these studies was to use more disaggregation in the LOS skims (many more time periods, smaller zones, several VOT classes, etc.), but at a certain point that also becomes unmanageable because of the sheer amount of data.


Several new ideas are currently being considered and tested in several research projects (Chicago ABM-DTA integration).

# Practical Methods to Achieve Convergence in Demand-Network Equilibration

# Averaging

These methods have been borrowed from conventional 4-step modeling techniques, but can also be used with microsimulation as long as they are applied to continuous outputs/inputs like LOS variables and/or synthetic trip tables generated by the Method of Successive Averages (MSA). Averaging can be applied to different components of the travel model including trip tables, Level-of-Service variables, link traffic volumes, etc. (Vovsha et al, 2008).
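A minimal sketch of averaging within a demand-network feedback loop is given below, assuming the averaged quantity is a vector of LOS travel times. The function names, the 1/k step size, and the stopping rule are illustrative assumptions; `demand_model` and `network_model` are hypothetical callables standing in for the two model components.

```python
def msa_update(previous_average, new_values, iteration):
    """Blend the latest model output into the running average with step 1/iteration."""
    step = 1.0 / iteration
    return [(1.0 - step) * old + step * new
            for old, new in zip(previous_average, new_values)]

def equilibrate(demand_model, network_model, initial_times, max_iter=50, tol=1e-4):
    """Iterate demand and network models, averaging LOS times between iterations."""
    times = list(initial_times)
    for k in range(1, max_iter + 1):
        volumes = demand_model(times)        # demand responds to current LOS
        new_times = network_model(volumes)   # network responds to demand
        averaged = msa_update(times, new_times, k)
        # Largest change in any averaged travel time since the last iteration.
        gap = max(abs(a - t) for a, t in zip(averaged, times))
        times = averaged
        if gap < tol:
            break
    return times, k
```

Because the step size shrinks with every iteration, the change between successive averages also shrinks, so in practice a small change in the averaged values should be checked against a separate convergence measure before the loop is considered equilibrated.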

# Enforcement

These methods are specific to microsimulation and designed to ensure convergence of “crisp” individual choices by suppressing or avoiding Monte-Carlo variability. These methods are currently at an early stage of theoretical development, with some empirical strategies showing very good results (Vovsha et al, 2008). Enforcement methods include re-using the same random numbers or starting random seeds for certain choices, which ensures that a choice is replicated if no change occurs to the inputs; gradual freezing of portions of households or travel dimensions from iteration to iteration; and analytical discretizing of probability matrices instead of Monte-Carlo simulation.
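The first enforcement method, re-using random seeds, can be sketched as follows. The seed construction from a household identifier and a choice name is an assumption made for illustration, as are all function and parameter names.

```python
import random

def simulate_choice(household_id, choice_name, probabilities, base_seed=12345):
    """Draw one alternative using a random number tied to the household and choice."""
    # A dedicated generator seeded from (base_seed, household, choice) yields the
    # same draw in every feedback iteration, so the simulated choice changes only
    # if the probabilities (the inputs) change.
    rng = random.Random(f"{base_seed}-{household_id}-{choice_name}")
    draw = rng.random()
    cumulative = 0.0
    for alternative, probability in probabilities.items():
        cumulative += probability
        if draw < cumulative:
            return alternative
    # Guard against rounding when the probabilities sum to slightly less than 1.
    return alternative
```

With this arrangement, two iterations that present identical mode probabilities to the same household return the same mode, which removes Monte-Carlo noise from the comparison between iterations.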

# Practical Integration Schemas

Four principal practical integration schemas arise as a consequence of two principal demand model structures (4-step and ABM) and two principal network model structures (Static Assignment and Dynamic Assignment), which can be combined in all possible ways.

For more information, see Travel Demand and Network Model Integration Schemas

# Integration Design Considerations

# Information Exchange

There is essentially a two-way exchange of information between the demand model and the network supply model. In this exchange, the demand model provides estimates of travel demand that are used as input to the network supply model. In turn, the network supply model uses this travel demand information to generate estimates of network performance that are then used as input to the demand model. The type of information exchanged between these model system components reflects the inherent design and structure of these components, and significantly defines the sensitivities of the model system.

Demand → Network Supply

The travel demand model provides estimates of travel demand that are used as input to the network supply model. This travel demand must include the level of detail required by the network supply model, typically information about travel origins, destinations, travel mode, and time-of-day. Some advanced models may include additional information such as values-of-time, toll/no-toll status, or driver/passenger status. The demand information can be transmitted to the network supply model in a number of formats, including trip tables, trip lists, and trip chains.

# Trip Tables

Trip tables represent the most common format for transmitting travel information from the demand model to the network supply model. Most traditional “trip-based” or “4-step” travel demand models generate trip tables. Trip tables are typically square matrices, indexed by origin and destination locations. These origin and destination locations generally represent aggregate spatial geographies such as travel analysis zones (TAZs) because the size of the matrices grows with the square of the number of zones, so the use of smaller geographies requires more time to read and write and more space to store. Note that there is no intrinsic size to a TAZ – it is a generic term that can be applied to both large and small geographies. In addition to representing aggregate spatial geographies, trip tables also represent aggregate temporal units, such as peak hours, peak time periods, or even entire days. Although trip tables have been used widely in travel forecasting models for decades and there are many tools for creating and manipulating trip tables, they have some notable drawbacks. They are inefficient because the 2-dimensional geographical indexing for all TAZs results in many trip table cells that are empty. In addition, each additional market segment (such as travel mode) or time of day multiplies the number of trip tables that are required to represent the complete demand.

# Individual Trip Lists

Trip lists are an increasingly common means of transmitting information from the travel demand model to the network supply model, especially where demand models are linked with dynamic traffic assignment (DTA) models. Trip lists are exactly what their name implies, lists of individual trip records. Each record in a trip list may contain the same type of information as found in traditional trip tables, including origin, destination, travel mode and time-of-day. However, unlike trip tables, trip lists include records only for those origin, destination, mode and time-of-day combinations for which the demand model forecast demand, which is more efficient. Because of this efficiency, trip lists facilitate the incorporation of more detail such as smaller zones, shorter time periods, and more typological detail such as trip-specific values of time.
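The difference between the two formats can be seen in a small sketch that aggregates a trip list into a dense trip table. The record fields, zone numbering, and function name are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# One record per individual trip; zones are numbered from 0 (an assumption).
Trip = namedtuple("Trip", ["origin", "destination", "mode", "depart_minute"])

def trip_list_to_table(trips, n_zones, mode):
    """Aggregate a trip list into a dense n_zones x n_zones table for one mode."""
    table = [[0] * n_zones for _ in range(n_zones)]
    for trip in trips:
        if trip.mode == mode:
            table[trip.origin][trip.destination] += 1
    return table

trips = [
    Trip(0, 2, "auto", 455),
    Trip(0, 2, "auto", 470),
    Trip(3, 1, "transit", 510),
]
auto_table = trip_list_to_table(trips, n_zones=4, mode="auto")
# Count cells that carry no demand in the dense representation.
empty_cells = sum(cell == 0 for row in auto_table for cell in row)
```

Here three trip records are enough to describe the demand, while the auto trip table for the same four zones stores 16 cells, 15 of them empty, and loses the individual departure times in the aggregation.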

# Trip Chains

Trip chains are an extension of trip lists, and are essential when linking advanced travel demand models such as activity-based models with advanced network supply models such as DTA models. Trip chains incorporate all the information included in individual trip lists, but also include additional information about how the individual trips are linked together. Use of trip chains ensures greater integrity between the demand and supply model components because it enforces spatial and temporal consistency – each successive trip begins where the preceding trip ended – and also provides the opportunity to realistically represent aspects such as how an individual vehicle is used throughout the day.
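The spatial and temporal consistency that trip chains enforce can be expressed as a simple check, sketched below. The record layout and the rule that a trip may not depart before the previous trip arrives are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class ChainedTrip:
    origin: int
    destination: int
    depart_minute: int
    arrive_minute: int

def is_consistent(chain):
    """Check spatial and temporal consistency of one person's ordered trips."""
    for previous, current in zip(chain, chain[1:]):
        # Spatial consistency: each trip begins where the preceding trip ended.
        if current.origin != previous.destination:
            return False
        # Temporal consistency: a trip cannot start before the previous one ends.
        if current.depart_minute < previous.arrive_minute:
            return False
    # Every individual trip must arrive no earlier than it departs.
    return all(t.arrive_minute >= t.depart_minute for t in chain)
```

A chain that passes this check can be handed to a network model as a linked sequence, so a delay on one trip can be carried forward to the trips that follow it.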

Network Supply → Demand

Origin-destination skims are the primary means through which information about travel times, costs, and other travel-related impedances is transmitted from the network supply model back to the travel demand model. Like trip tables, skims are usually square matrices indexed by origin and destination locations. Multiple skims may be generated to represent variations in travel impedances by travel mode, time-of-day or other typological or market segments. Separate skims may be created to represent unique attributes. For example, a set of transit skims may include individual skims for in-vehicle travel time, out-of-vehicle travel time, fare, and transfers. As with trip tables, the origin and destination locations used to develop skims generally represent aggregate spatial geographies such as travel analysis zones (TAZs) because the size of the skim matrices grows with the square of the number of zones, so the use of smaller geographies requires more time to read and write and more space to store. The use of more aggregate spatial geographies impacts shorter trips more than longer trips, and methods for developing more spatially detailed network impedances for short distances have been developed to complement more spatially aggregate skims. The temporal and typological detail of skims generated by the network model is often relatively coarse. This reflects the challenges associated with assigning temporally detailed demand using static assignment models, the fact that static user equilibrium network assignment model software limits the number of market segment classes that can be handled simultaneously, and the longer runtimes associated with increased temporal and typological detail.
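As a minimal sketch of how a travel time skim might be built, the code below computes zone-to-zone shortest travel times on a toy network with the Floyd-Warshall algorithm. Treating each zone as a single network node, the link dictionary format, and the function name are simplifying assumptions; production software typically uses zone centroids, connectors, and congested link times from the assignment.

```python
INF = float("inf")

def travel_time_skim(n_zones, links):
    """Build a square skim of shortest travel times between all zone pairs.

    links maps (from_zone, to_zone) to a travel time in minutes.
    """
    skim = [[0.0 if i == j else INF for j in range(n_zones)]
            for i in range(n_zones)]
    for (a, b), minutes in links.items():
        skim[a][b] = min(skim[a][b], minutes)
    # Floyd-Warshall: allow each zone in turn as an intermediate stop.
    for k in range(n_zones):
        for i in range(n_zones):
            for j in range(n_zones):
                if skim[i][k] + skim[k][j] < skim[i][j]:
                    skim[i][j] = skim[i][k] + skim[k][j]
    return skim
```

The result has one value for every origin-destination pair, including pairs with no trips in the current iteration, which is the property that lets a skim provide full feedback to a matrix-based demand model.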

# Accessibility Measures

Accessibility indicators are also frequently included in model component specifications, and must also be endogenously generated by the overall model system. These accessibility indicators represent the combined impacts of land use and transportation system performance, and are critical to ensuring reasonable policy sensitivity of the model system to changes in infrastructure and/or land use. In general, four types of accessibility variables are included in the models:

  • Direct measures of travel times, distances, and costs from modeled network paths
  • Detailed logsums calculated across alternatives in models that include direct measures
  • Aggregate (approximate) logsums calculated across alternatives in models that include direct measures
  • Buffer measures representing the activity opportunities and urban design surrounding each parcel or microzone (e.g., Census block)

The direct measures are used most often in mode choice models. Detailed logsums are calculated from a lower-level choice model, using the full detail from that model. Typical examples in practical AB models are tour mode choice model logsums used in higher level models such as tour time of day choice, tour destination choice, and workplace location choice.
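As a rough illustration of a detailed logsum, the sketch below computes the log of the sum of exponentiated utilities across the alternatives of a lower-level multinomial logit model. The utility values are made up, and scale parameters and nesting coefficients are omitted for brevity.

```python
import math


def logsum(utilities):
    """Log of the sum of exp(utility) over alternatives, computed stably."""
    m = max(utilities)
    return m + math.log(sum(math.exp(v - m) for v in utilities))


# Hypothetical tour mode utilities for one origin-destination pair:
# drive alone, transit, walk.
tour_mode_utilities = [-1.0, -2.0, -4.0]
mode_choice_logsum = logsum(tour_mode_utilities)
```

The logsum is never lower than the best single utility, and it increases whenever an alternative is added or improved, which is what lets a higher-level model such as destination choice respond to changes in any mode.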

In some cases it is not practical to use the most fully detailed versions of the logsums, which are calculated on-the-fly during the simulation every time one is needed. To address this issue, a common approach is to pre-calculate more aggregate accessibility logsums to be used in models where using the detailed ones would not be computationally or conceptually feasible. For example, some model systems use “aggregate accessibility logsums” calculated from each origin TAZ or microzone, to all possible destinations, via all possible modes. Aggregate logsums are typically calculated for each combination of up to four or five critical dimensions, including:

  • Origin TAZ or microzone
  • Tour purpose
  • Household income group, or value of time group
  • Household auto sufficiency (autos owned compared to driving age adults)
  • Household residence distance from transit service

Aggregate measures are used most often in the day-level models and some of the longer term models, where the model is not yet considering a tour to a specific destination, but is considering, for example, how many tours to make for a given purpose from the home location during the day.
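One possible form of such a pre-calculated measure is sketched below: for each origin, a logsum is taken over all destinations, where each destination contributes the log of a size term plus its mode choice logsum. The two-zone system, the utilities, and the use of jobs as the size term are assumptions for illustration. In practice the table would be built once per combination of the segmentation dimensions listed above.

```python
import math


def logsum(values):
    """Log of the sum of exp(value), computed stably."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


# Hypothetical mode utilities for each (origin, destination) pair.
mode_utils = {
    ("A", "A"): [-0.5, -1.0],
    ("A", "B"): [-1.5, -2.0],
    ("B", "A"): [-1.5, -2.0],
    ("B", "B"): [-0.8, -1.2],
}
# Hypothetical size term: jobs in each destination zone.
jobs = {"A": 1000.0, "B": 200.0}


def aggregate_logsum(origin):
    """Accessibility from one origin to all destinations via all modes."""
    terms = [math.log(jobs[d]) + logsum(mode_utils[(origin, d)]) for d in jobs]
    return logsum(terms)


# Pre-calculated once, then looked up by the day-level models.
accessibility = {origin: aggregate_logsum(origin) for origin in jobs}
```

Zone A scores higher here because it is close to the larger employment concentration.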

Finally, buffered measures represent the accessibility to very nearby destinations, as could be made by walk, bike or very short car trips. The typical measures that are buffered include:

  • The number of nearby households
  • The number of nearby jobs of various types (as proxies for activity locations)
  • The number of nearby school enrollment places of various school types
  • The number of nearby transit stops

Clearly, these measures are most relevant when the spatial units themselves are much smaller than the radius of the buffer area. Thus, using buffer-based measures is really only useful when the spatial unit of the model is parcels or (at the largest) Census blocks. One way to make the buffer measures more accurate and relevant is to use on-street shortest path distance to measure the distance to the edge of the buffer, rather than using straight line (crow-fly) or Euclidean distances.
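A buffer measure can be sketched as a sum of an attribute over all parcels within a given radius. The parcel coordinates and attributes below are invented, and straight-line distance is used only to keep the example short; as noted above, on-street shortest path distance would be the more accurate choice.

```python
import math

# Hypothetical parcels with planar coordinates in meters.
parcels = {
    "p1": {"xy": (0.0, 0.0), "jobs": 10, "households": 4},
    "p2": {"xy": (300.0, 0.0), "jobs": 50, "households": 0},
    "p3": {"xy": (0.0, 500.0), "jobs": 5, "households": 20},
    "p4": {"xy": (2000.0, 2000.0), "jobs": 400, "households": 1},
}


def buffer_total(center_id, attribute, radius_m):
    """Sum an attribute over parcels within radius_m of the center parcel.

    The center parcel itself is included in the total.
    """
    cx, cy = parcels[center_id]["xy"]
    total = 0
    for parcel in parcels.values():
        x, y = parcel["xy"]
        if math.hypot(x - cx, y - cy) <= radius_m:
            total += parcel[attribute]
    return total


jobs_within_400m = buffer_total("p1", "jobs", 400.0)
jobs_within_800m = buffer_total("p1", "jobs", 800.0)
```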

Individual Simulated Trajectories

# Resolution

# consistency between models.

One of the distinguishing aspects of activity-based models is that they include an explicit representation of time-of-day. Many trip-based models generate estimates of daily trips and incorporate peak and off-peak assignment models by using fixed time-of-day factors. In contrast, activity-based models explicitly predict tour and trip arrival times, departure times, and activity durations. The temporal resolution of these more detailed times of day can vary from as broad as three hours or more, to as detailed as 15 minutes or less. The network assignment model design should be consistent with the temporal resolution of the activity-based model. If the activity-based model produces more temporally aggregate demand, such as multi-hour periods, then the network resolution must reflect the activity-based model resolution. However, if the activity-based demand model produces more temporally detailed demand, then users may have tremendous flexibility in the network assignment model design to incorporate this detail. This detail can provide better estimates of network performance by time-of-day and potentially provide more sensitivity to phenomena such as peak spreading. Ideally, the temporal resolution of the activity-based demand and network assignment model would be exactly aligned. Practically, this alignment is usually not possible because most activity-based demand models are linked with static network assignment models, which are incapable of generating reasonable measures of link volumes and network performance indicators for small time periods less than one hour in duration.
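When a temporally detailed demand model feeds a coarser static assignment, simulated departure times have to be aggregated to assignment periods. The sketch below shows one way to do that; the period names and boundaries are hypothetical.

```python
from collections import Counter

# Hypothetical assignment periods as (name, start, end) in minutes after midnight.
# Anything outside these windows falls in the night ("NT") period.
PERIODS = [("AM", 360, 540), ("MD", 540, 900), ("PM", 900, 1140)]


def period_of(depart_min):
    """Map a simulated departure time to the assignment period containing it."""
    for name, start, end in PERIODS:
        if start <= depart_min < end:
            return name
    return "NT"


# Departure times as produced by a demand model with 15-minute resolution.
departures = [435, 450, 465, 720, 1020, 1380]
demand_by_period = Counter(period_of(t) for t in departures)
```

The three distinct morning departure times collapse into a single AM period, which is the loss of detail the paragraph above describes.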

# Continuous vs. Time-Sliced

# activity-scheduling approaches & individual schedule consistency, # consistency between models.

Travel analysis zones are used in most travel demand model systems. The term “TAZ” is generic and does not imply or refer to any specific scale. However, TAZs are often defined so that they are similar to or consistent with an existing geographic system such as a region’s Census tracts or Census block groups. The number of TAZs in a region typically ranges from 500-5,000. However, within a region, there may be a fair amount of variation in size amongst the TAZs.

Like “TAZ,” the term “microzone” is a generic term that does not refer to a specific scale. Instead, this term is intended to describe a geographic system that incorporates more spatial detail than a typical TAZ system. In a number of regions, microzones have been defined at a resolution that is similar to that of Census blocks, although the block geography is usually modified to ensure that the individual microzones will be meaningful within the model system. For example, blocks that represent water features such as rivers or lakes may be combined with adjacent microzones. Developing microzone-level spatial information, especially for future year scenarios, can be more involved than developing TAZ-level spatial information. However, there are a number of spatially detailed, publicly available datasets that can be used to create these microzone-level assumptions. A typical model might include 30,000 to 150,000 “microzones”, an order of magnitude more than the typical number of TAZs but also an order of magnitude less than a typical number of parcels in a region.

# Parcel / Link Face

Parcels have a more specific definition than TAZs or microzones. Parcel geographies are most often defined by local-level municipal and county tax assessors’ offices. Parcels are usually extremely fine-grained, with each spatial unit often corresponding to the geography associated with a single building. However, as with TAZs and microzones, there is significant variation in parcel sizes. For example, large institutions that contain a diversity of buildings, employment, and uses may be represented by a single parcel. Using a parcel-level spatial scale can provide the greatest ability to incorporate local-level, smaller-scale land use and transportation system attributes, such as the mix of employment within a short walking distance, or the distance to the nearest actual transit stop. Developing, maintaining, and forecasting parcel-level attributes requires more effort than developing similar TAZ-level or microzone-level attributes, especially on the employment side. There are often inconsistencies and errors in the base year or “observed” data sources, and developing future year parcels requires careful consideration of the sources for detailed future population and employment assumptions and potentially methods and practices for “splitting” parcels as development occurs.

# Typological / Market Segmentation / "User Classes"

# travel markets, # core resident, # auxiliary (trucks, visitors, etc), # modal / submodal**, # auto occupancy / toll / vehicle type.

Recent practice with both 4-step models and ABMs has been to include route type choice, most frequently as a binary or trinary choice between toll, managed-lane, and non-toll general-purpose paths, as the lower-level choice in the mode choice structure, as shown in the figure below (NCHRP 08-57, SHRP 2 C04). However, alternative schemas that rely more on the network model have been proposed, in which highway users are segmented by VOT instead of modeling route type choice explicitly. The two methods (route type choice and user segmentation by VOT) can be applied concurrently in the same framework.
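The effect of segmenting highway users by VOT can be sketched with a generalized cost that converts tolls into equivalent minutes. The two paths, their times and tolls, and the three VOT classes are hypothetical, and a real assignment would also account for congestion feedback.

```python
def generalized_minutes(time_min, toll_dollars, vot_dollars_per_hour):
    """Travel time plus the toll expressed in minutes at the given value of time."""
    return time_min + 60.0 * toll_dollars / vot_dollars_per_hour


# Hypothetical paths: (travel time in minutes, toll in dollars).
paths = {"toll": (20.0, 3.0), "free": (30.0, 0.0)}
# Hypothetical user classes and their values of time in dollars per hour.
vot_classes = {"low": 10.0, "medium": 20.0, "high": 40.0}


def best_path(vot):
    """Path with the lowest generalized cost for a user class."""
    return min(paths, key=lambda name: generalized_minutes(*paths[name], vot))


chosen = {name: best_path(vot) for name, vot in vot_classes.items()}
```

With these numbers the low-VOT class stays on the free path while the medium and high classes pay the toll, so the network model reproduces part of what a route type choice model would otherwise have to capture.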

[Figure: route type choice as the lower-level choice in the mode choice structure]

A discussion similar to that on route type choice in the context of highway pricing is under way for transit route type choice. Prevailing practice is to have multiple “labeled transit modes” with a subsequent restricted transit assignment for each mode, as shown in the figure below. However, this practice is problematic because of the large number of possible transit mode combinations that can be used for the same trip. An alternative approach operates with a few “non-labeled” generic transit modes and relies on the transit path builder to address user preferences for multiple user classes (TCRP H-37, Chicago Transit ABM).

[Figure: labeled transit modes with a restricted transit assignment for each mode]

# Active Transportation

First attempts to address walk and bike in mode choice and transit assignment and identify their specific Level-of-Service variables and user classes (Portland, SF, Ottawa, San Diego models).

# Travel Preference & Modality Parameters

# trip purpose / vot / vor / schedule flexibility.

Methods to better achieve consistency between the travel demand and network simulation models with respect to VOT and other parameters (Chicago ABM, Baltimore ABM design).

# Propensity to walk

Methods to address user segmentation w.r.t. transit services. Example of the age-parameterized propensity to walk applied in the Chicago ABM.

# Transit awareness

Methods to evaluate user awareness of transit services and corresponding modality classes (TCRP H-37, Walker et al, 2010-2012).

# Integration Examples

# active research projects.

These projects extend the first phase of C10 integrated model development. Integrated model development projects have now been implemented in:

  • Jacksonville

MATSim stands for Multi-Agent Transport Simulation. It is a project that came in part out of TRANSIMS; it decomposes traffic assignment into iterating between route choice and network loading, and then integrates additional choice dimensions, such as mode or departure time choice, into that loop. The iterations follow a co-evolutionary approach. That is, the synthetic travellers (or "agents") from time to time come up with new alternatives (= new plans), try them out on the network loading (called "mobility simulation"), and obtain a score/utility for these new alternatives. If they do not come up with new alternatives, each synthetic person selects between its existing alternatives according to the score/utility. For more information see the separate MATSim page.
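The iteration described above can be sketched in a few lines. This is a toy illustration of the loop only and does not use MATSim's actual API: the two-route network, the congestion function, the replanning share, and the logit-style selection among remembered plans are all assumptions.

```python
import math
import random

ROUTES = {"A": 10.0, "B": 15.0}  # hypothetical free-flow times in minutes
CAPACITY = 50.0                  # hypothetical capacity used by the congestion function


def load_network(choices):
    """Stand-in for the mobility simulation: congested time per route."""
    counts = {route: 0 for route in ROUTES}
    for route in choices:
        counts[route] += 1
    return {r: ROUTES[r] * (1.0 + (counts[r] / CAPACITY) ** 2) for r in ROUTES}


def run(n_agents=100, iterations=50, replan_share=0.1, seed=0):
    rng = random.Random(seed)
    # Each agent remembers its plans (here: routes) and the latest score of each.
    scores = [{"A": 0.0} for _ in range(n_agents)]
    selected = ["A"] * n_agents
    for _ in range(iterations):
        # Execute the selected plans and score them (score = negative travel time).
        times = load_network(selected)
        for i in range(n_agents):
            scores[i][selected[i]] = -times[selected[i]]
        for i in range(n_agents):
            if rng.random() < replan_share:
                # Replanning: propose a plan, to be tried and scored next iteration.
                selected[i] = rng.choice(list(ROUTES))
                scores[i].setdefault(selected[i], 0.0)
            else:
                # Otherwise select among remembered plans according to their scores.
                plans = list(scores[i])
                weights = [math.exp(scores[i][p]) for p in plans]
                selected[i] = rng.choices(plans, weights=weights)[0]
    return selected


final_choices = run()
```

Because a newly proposed plan is always executed and scored before it competes with the remembered ones, its placeholder score never enters the selection step.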

# Emerging Practice and Research Avenues

  • Fully disaggregate demand and supply simulation
  • Learning and adaptation / Agent-Based Modeling
  • Continuous data exchange vs. intermittent linking
  • Impact of travel information
  • Integrated travel demand/supply and land use models

# Related Content

Activity Based Modeling Primer

ITM 2016 Presentation: Moving Towards Agent Based Models as the Next Step in Evolution of Integrated ABM-DTA Models

National Weather Service launches new website for water prediction and products

[Figure: National Water Prediction Service (NWPS) map showing river gauge observations and forecasts and associated precipitation estimates for March 26, 2024. This map should not be used for decision making.]

The National Weather Service (NWS) is introducing the new National Water Prediction Service (NWPS), transforming how water resources information and services are delivered, and providing a greatly improved user experience through enhanced displays. (Image credit: NOAA)

NOAA’s National Weather Service launched a new website today: the National Water Prediction Service. This new hub for water data, products and services combines local and regional forecasts with water data and new national level capabilities, such as flood inundation maps and the National Water Model.

"This online water hub is modern and flexible — providing information to help our partners and the public make sound decisions for water safety and management," said Ed Clark, director of NOAA’s National Water Center. “The new site leverages modern software, geospatial technology and cloud infrastructure, vastly improving the customer experience before, during and after extreme water events such as floods and droughts.”

Key features integrated into the National Water Prediction Service website

  • A new, dynamic and seamless national map with flexible options and expansive layers available to help analyze water conditions anywhere in the country.
  • Improved hydrographs that are frequently updated, depict water level observations over the past 30 days and provide river flood forecasts up to 10 days in advance. 
  • The National Water Model, which provides hydrologic forecast guidance 24 hours a day, seven days a week, along 3.4 million river miles across the U.S., including river segments, streams and creeks that have no river gauges.
  • Real-time, comprehensive Flood Inundation Maps, which are being implemented in phases, will cover nearly 100% of the U.S. by October 2026.
  • An Application Programming Interface (API) has been added to the traditional geographic information system (GIS) data, which will allow customers to flow water information into their own applications and services.

NWS continues to improve the essential support services it provides to communities locally and nationwide as part of its transformation into a more nimble, flexible and mobile agency that works hand-in-hand with decision makers. The National Water Prediction Service website provides tools that deliver actionable hydrologic information across all time scales to address the growing risk of flooding, drought and water availability, and enables partners and the American public to make smart water decisions.

Weather.gov Webstory - January 12, 2024

NWPS Product and Users Guide

Flood Inundation Mapping  news release

Biden-Harris Administration announces $80 million to improve flood prediction capabilities

National Weather Service - Flood Safety Tips and Resources

Media Contact

Michael Musher, michael.musher@noaa.gov, (771) 233-1304

April 10, 2024

New model better predicts our daily travel choices

by Rebecca Mosimann, Ecole Polytechnique Federale de Lausanne

An EPFL engineer has developed a forecasting model that factors in not just our commuting habits, but also our activities during the day. Her flexible approach incorporates the idea of trade-offs in order to deliver more realistic predictions.

Transportation engineers often use computer models to estimate demand on a given itinerary, answering questions such as how many cars drive along the stretch of highway between Lausanne and Geneva each year and which train lines carry the most passengers. It's a broad and fascinating field, and one that Janody Pougala, a civil engineering student at EPFL's Transport and Mobility Laboratory, decided to study for her Ph.D. thesis.

Pougala developed a new model for predicting individuals' travel choices that factors in a wider range of variables, and therefore maps actual behavior more closely. Her program, available in open source, looks at not just the way people typically get around but also their everyday activities. It represents a particularly sophisticated approach because it accounts for how people respond to the unpredictable events that inevitably form part of our daily lives.

In conventional models, transportation engineers start by examining each trip an individual makes along with the reasons for that trip, the transportation method the person uses, and the chosen itinerary. The engineers then develop programs that describe this behavior in a sequential, chronological way. But these programs often aren't well-suited to complex realities.

Modeling trade-offs

To design more accurate models, engineers need to gain a better understanding of how people behave. That's especially true in light of today's increasingly diverse lifestyles. With more people working from home, the roll-out of car-sharing systems, and infrastructure improvements that enable employees to live further away from their employer, commuting patterns have changed considerably. These are some of the structural shifts that Pougala wanted to address with her new model, which is based on individuals' activities and preferences, and therefore stands to be more accurate.

How does the model work? "It starts by scheduling an individual's activities over the course of a day, and then links the corresponding variables together with mathematical equations," says Pougala. "I pulled data for the variables from a number of sources, including the results of commuting surveys and statistics." The key to her model lies in its extremely flexible design. "It doesn't go through the factors sequentially but rather analyzes all of them at the same time," she says.

And because her model isn't bound by a predefined order of events during the course of a day, it can account for decisions based on personal satisfaction and constraints. In short, it's a new way of modeling trade-offs. Pougala took behavioral hypotheses described in the literature and studies of sociology and urban environments, and translated them into mathematical equations.

Then she combined the equations with statistical data so that the model would make as realistic forecasts as possible. To give an example, suppose a woman named Emma decides to work late and not go to the gym. On her way home, her train encounters a technical difficulty at the Lausanne train station. Instead of waiting for a replacement train, Emma decides to take the bus.

Pougala explains, "My model can predict how different individuals would respond under these types of circumstances and how long they'll tolerate situations they don't really like. It can also describe how people adapt and use alternative transportation methods."

City officials can use Pougala's model in their long-term planning to determine which type of transportation infrastructure to develop. It's already been tested against the model used by the Swiss railway company as well as in an urban planning project in Zurich designed to show what the city could look like if half of the transport that takes place there were non-motorized.

Link Travel Times II: Properties Derived from Traffic-Flow Models

  • Published: December 2004
  • Volume 4, pages 379–402 (2004)

  • Malachy Carey

We investigate the properties of travel times when the latter are derived from traffic-flow models. In particular we consider exit-flow models, which have been used to model time-varying flows on road networks, in dynamic traffic assignment (DTA). But we here define the class more widely to include, for example, models based on finite difference approximations to the LWR (Lighthill, Whitham and Richards) model of traffic flow, and ‘large step’ versions of these. For the derived travel times we investigate the properties of existence, uniqueness, continuity, first-in-first-out (FIFO), causality and time-flow consistency (or intertemporal consistency). We assume a single traffic type and assume that time may be treated as continuous or as discrete, and for each case we obtain conditions under which the above properties are satisfied, and interrelations among the properties. For example, we find that FIFO is easily satisfied, but not strict causality, and find that if we redefine travel time to ensure strict causality then we lose time-flow consistency, and that neither of these conditions is strictly necessary or sufficient for FIFO. All of the models can be viewed as an approximation to a model that is continuous in time and space (the LWR model), and it seems that any loss of desirable properties is the price we pay for using such approximations. We also extend the exit-flow models and results to allow ‘inhomogeneity’ over time (link capacity or other parameters changing over time), and show that FIFO is still ensured if the exit-flow function is defined appropriately.
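As a rough illustration of what a discrete-time exit-flow model looks like, the sketch below updates link occupancy each step as previous occupancy plus inflow minus an exit flow that depends only on the occupancy. Travel times are then read off the cumulative inflow and outflow counts. The bottleneck exit-flow function, the inflow profile, and this particular way of deriving travel times are assumptions chosen for simplicity and are not necessarily the definitions analyzed in the article.

```python
def simulate_link(inflows, exit_flow, extra_steps=50):
    """Run x(t+1) = x(t) + u(t) - g(x(t)) and return cumulative in/out counts."""
    occupancy = 0.0
    cum_in, cum_out = [0.0], [0.0]
    for t in range(len(inflows) + extra_steps):
        u = inflows[t] if t < len(inflows) else 0.0
        v = exit_flow(occupancy)  # outflow depends only on the current occupancy
        occupancy += u - v
        cum_in.append(cum_in[-1] + u)
        cum_out.append(cum_out[-1] + v)
    return cum_in, cum_out


def exit_steps(cum_in, cum_out, n_inflow_steps):
    """First step at which cumulative outflow reaches each entering cohort."""
    steps = []
    for t in range(n_inflow_steps):
        target = cum_in[t + 1]
        steps.append(next(s for s, out in enumerate(cum_out) if out >= target - 1e-9))
    return steps


def bottleneck(occupancy, capacity=2.0):
    """Hypothetical exit-flow function: discharge at most `capacity` per step."""
    return min(occupancy, capacity)


inflows = [3.0, 3.0, 3.0]
cum_in, cum_out = simulate_link(inflows, bottleneck)
exits = exit_steps(cum_in, cum_out, len(inflows))
travel_times = [s - (t + 1) for t, s in enumerate(exits)]
fifo_holds = all(a <= b for a, b in zip(exits, exits[1:]))
```

In this toy case later cohorts never exit before earlier ones, so the first-in-first-out property holds, while the travel time grows for the last cohort as the queue builds up.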

Adamo, V., V. Astarita, M. Florian, M. Mahut, and J.H. Wu. (1998). "An Analytical and Applicative Framework for Spillback Congestion Modelling in the Continuous Time Link Based Dynamic Network Loading Models." Tristan 98 , Puerto Rico.

Adamo, V., V. Astarita, M. Florian, M. Mahut, and J.H. Wu. (1999a). "Analytical Modelling of Intersections in Traffic Flow Models with Queue Spill-Back." In IFORS' 99, 15th Triennial Conference, Hosted by the Operations Research Society of China (ORSC), Beijing, P.R. China, August 16-20, 1999.

Adamo, V., V. Astarita, M. Florian, M. Mahut, and J. H. Wu. (1999b). "Modelling the Spill-back of Congestion in Link Based Dynamic Network Loading Models: A Simulation Model with Application." In 14th International Symposium on Theory of Traffic Flow, Jerusalem, July 1999. Published by Elsevier, pp. 555-573.

Astarita, V. (1995). "Flow Propagation Description in Dynamic Network Loading Models." In Y.J. Stephanedes and F. Filippi (Eds.), Proceedings of IV International Conference on Application of Advanced Technologies in Transportation Engineering (AATT) , pp. 599-603, ASCE.

Astarita, V. (1996). "A Continuous Time Link Model for Dynamic Network Loading Based on Travel Time Function." In J.-B. Lesort (Ed.), Proceedings of the 13th International Symposium on Theory of Traffic Flow , Elsevier, pp. 79-102.

Beckmann, M., C.B. McGuire, and C.B. Winsten. (1956). Studies in the Economics of Transportation . NewHaven, CT: Yale University Press.

Google Scholar  

Boyce, D., D.-H. Lee, and B. Ran. (2001). "Analytical Models of the Dynamic Traffic Assignment Problem." Networks and Spatial Economics 1, 377-390.

Carey, M. (1986). "A Constraint Qualification for a Dynamic Traffic Assignment Model." Transportation Science 20(1), 55-58.

Carey, M. (1987). "Optimal Time-Varying Flows on Congested Networks." Operations Research 35(1), 56-69.

Carey, M. (1990). "Extending and Solving a Multi-Period Congested Network Flow Model." Computers and Operations Research 17(5), 495-507.

Carey, M. (1999). "A Framework for User Equilibrium Dynamic Traffic Assignment." Research Report. Faculty of Business and Management , University of Ulster, BT37 0QB. Being revised for publication.

Carey, M. (2001). "Dynamic Traffic Assignment with more Flexible Modelling within Links." Networks and Spatial Economics 1(4), 349-375.

Carey, M. (2004). "Link Travel Times I: Desirable Properties." Networks and Spatial Economics 4(3), 257-268.

Carey, M. and Y. Ge. (2004). "Comparing Whole-Link Travel Time Models Used in DTA." Transportation Research 37B(10), 905-926.

Carey, M. and M. McCartney. (2000). "A Class of Traffic Flow Models Used in Dynamic Assignment." Computers & Operations Research 31(10), 1583-1602.

Carey, M. and M. McCartney. (2002). "Behaviour of a Whole-Link Travel Time Model Used in Dynamic Traffic Assignment." Transportation Research 36(1), 83-95.

Carey, M. and A. Srinivasan. (1982). Modelling Network Flows with Time-Varying Demands. Working Paper . School of Urban and Public Affairs, Carnegie-Mellon University, Pittsburgh, PA. Report to the U.S. Department of Transportation, Urban Mass Transportation Authority, 73 pages.

Carey, M. and A. Srinivasan. (1993). "Externalities, Average and Marginal Costs, and Tolls on Congested Networks with Time-Varying Flows." Operations Research 41(1), 217-231.

Daganzo, C.F. (1995). "A Finite Difference Approximation of the Kinematic Wave Model of Traffic Flow." Transp Res 29B(4), 261-276.

Friesz T.L., J. Luque, R.L. Tobin, and B-Y. Wie. (1989). "Dynamic Network Traffic Assignment Considered as a Continuous Time Optimal Control Problem." Operations Research 37(6), 893-901.

Friesz, T.L., D. Bernstein, T.E. Smith, R.L. Tobin, and B.W. Wie. (1993). "A Variational Inequality Formulation of the Dynamic Network user Equilibrium Problem." Operations Research 41, 179-191.

Friesz, T.L., D. Bernstein, Z. Suo, and R.L. Tobin. (2001). "Dynamic Network user Equilibrium with State-Dependent Time Lags." Networks and Spatial Economics 1, 319-347.

Jayakrishnan, R., W.K. Tsai, and A. Chen. (1995). "A Dynamic Traffic Assignment Model with Traffic-Flow Relationships." Transportation Research 3C, 51-72.

Lam, W.H.K. and H.-J. Huang. (1995). Dynamic user Optimal Traffic Assignment Model for many to one Travel Demand." Transportation Research 29B(4), 243-259.

Lighthill, M.J. and G.B. Whitham. (1955). "On Kinematic Waves. I: Flow Movement in Long Rivers II: A Theory of Traffic Flow on Long Crowded Roads." Proceedings of the Royal Society A 229, 281-345.

Lo, H.K. (1999). "A Dynamic Traffic Assignment Model that Encapsulates the Cell Transmission Model." In A. Ceder (Ed.), Traffic and Transportation Theory , pp. 327-350.

Lo, H.K. and W.Y. Szeto. (2002). "A Cell-based Variational Inequality Formulation of the Dynamic User Optimal Assignment Problem." Transportation Research 36B, 421-443.

Merchant, D.K. and G.L. Nemhauser. (1978a). "A Model and an Algorithm for the Dynamic Traffic Assignment Problem." Transportation Science 12(3), 183-199.

Merchant, D.K. and G.L. Nemhauser. (1978b). "Optimality Conditions for a Dynamic Traffic Assignment Model." Transportation Science 12(3), 200-207.

Nie, X. and H.M. Zhang. (2002). A Comparative Study of Some Macroscopic Link Models Used in Dynamic Traffic Assignment. Forthcoming in Networks and Spatial Economics .

Ran, B., D.E. Boyce, and L.J. LeBlanc. (1993). "A New Class of Instantaneous Dynamic User-Optimal Traffic Assignment Models." Operations Research 41, 192-202.

Ran, B. and D. Boyce. (1996). Modelling Dynamic Transportation Networks . Heidelberg: Springer-Verlag.

Ran, B., D.-H. Lee, and M.S.-I. Shin. (2002). "New Algorithm for a Multiclass Dynamic Traffic Assignment Model." Journal of Transportation Engineering 128, 323-335.

Richards, P.I. (1956). "Shock Waves on the Highway." Operations Research 4, 42-51.

Wie, B.W. and R.L. Tobin, (1998). "Dynamic Congestion Pricing for General Traffic Networks." Transportation Research B 32(5), 313-327.

Wie, B.W., R.L. Tobin, and T.L. Friesz. (1994). "The Augmented Lagrangian Method for Solving Dynamic Network Traffic Assignment Models in Discrete Time." Transportation Science 28, 204-220.

Wie, B.W., R.L. Tobin, D. Bernstein, and T.L. Friesz. (1995). "A Comparison of System Optimum and User Equilibrium Traffic Assignments with Schedule Delay." Transportation Research 3C, 389-411.

Wu, J.H., Y. Chen, and M. Florian. (1995). "The Continuous Dynamic Network Loading Problem: A Mathematical Formulation and Solution Method." Presented at the 3rd EURO Working Group Meeting on Urban Traffic and Transportation, Barcelona, 27-29 September.

Wu, J.H., Y. Chen, and M. Florian. (1998). "The Continuous Dynamic Network Loading Problem: A Mathematical Formulation and Solution Method." Transportation Research 32B, 173-187.

Xu, Y.W., J.H. Wu, M. Florian, P. Marcotte, and D.L. Zhu. (1999). "Advances in the Continuous Dynamic Network Loading Problem." Transportation Science 33(4), 341-353.

Yang, H. and H.-J. Huang. (1997). "Analysis of the Time-Varying Pricing of a Bottleneck with Elastic Demand using Optimal Control Theory." Transportation Research B 31(6), 425-440.

Zhu, D. and P. Marcotte. (2000). "On the Existence of Solutions to the Dynamic User Equilibrium Problem." Transportation Science 34(4), 402-414.

Author information

Authors and affiliations.

School of Management and Economics, Queen's University, Belfast, Northern Ireland, BT7 1NN

Malachy Carey

About this article

Carey, M. Link Travel Times II: Properties Derived from Traffic-Flow Models. Networks and Spatial Economics 4 , 379–402 (2004). https://doi.org/10.1023/B:NETS.0000047114.31259.3d

Issue Date : December 2004

DOI : https://doi.org/10.1023/B:NETS.0000047114.31259.3d

  • road traffic networks
  • dynamic traffic assignment
  • properties of travel times