While traditional methods have been used to collect traffic data for generations, intelligent transportation systems (ITS) provide new sources and new challenges for traffic data collection. The ITS data includes large amounts of traffic data for immediate use in operations as well as data for analytical applications through archived data management systems (ADMS). The increasing amounts and types of traffic data available from ITS enable new applications but raise concerns about data quality. The potential for ITS data to fulfill data requirements for transportation planning, engineering, and operations applications has only begun to be realized. Institutional, technical and possibly financial issues remain to be resolved before these data are adopted into widespread use for mainstream applications. This section of the report addresses technical issues related to the data quality standards users require, discusses and describes the salient features of existing and future data sharing agreements, estimates the level of effort required for reporting data quality and specifies procedures for using metadata. Each topic is discussed in the following sections.
While the planning, engineering and operations disciplines all require transportation data for their analytical procedures and applications, their spatial and temporal requirements differ considerably, with planning applications generally associated with the least stringent requirements and operations applications associated with the most stringent. Traffic data are also variously important as inputs to analyses and applications, as some applications are more sensitive to variations in input traffic values than others. Traffic data providers can benefit from understanding the data requirements of their customers, either in setting their pricing policies, developing truth-in-data statements or in responding to data requests that do not include clear direction concerning the quality needs of the application. By understanding and being responsive to the data quality needs of secondary users, the traffic data collection community can develop a demand for its services and integrate its business operations with those of the rest of the transportation community. In this way, revenue streams or other types of non-monetary support for ITS related and other traffic operations data can be developed and grown.
The following sections discuss the data quality requirements for several planning, operations, and engineering applications. A description of the application and its data requirements, and the significance of traffic data as a source of error for each of the applications are discussed. For purposes of these discussions, the accuracy measure is used to illustrate the importance of data quality in the various applications.
Municipal governments, metropolitan planning organizations (MPOs) and state DOTs develop and apply travel demand models to determine infrastructure needs and to set land use and transportation policies. Model analyses are integral to the development of air quality conformity analyses and long-range transportation plans by MPOs. State-of-the-practice transportation models provide estimates of average annual daily traffic (AADT) by direction. State-of-the-art models may provide a finer grain of temporal and spatial coverage, may account for a larger number of travel markets and, correspondingly, require more and better data. The models often cover large geographic areas, including entire states or metropolitan statistical areas. A typical regional model includes all freeways, expressways and major arterials and most minor arterials in its description of the highway network; relatively few collectors and local roads are included. For sub area and corridor studies requiring more precise results, additional network and zonal detail is added, and additional traffic counts are used in the calibration. The Environmental Protection Agency and the Federal Highway Administration have formulated guidelines for acceptable practice in model formulation and have provided guidance on measures of performance.4
In order to provide reliable forecasts, models are developed to be robust, sensitive and accurate. There are no definitive standards for these qualities. A robust model is capable of providing useful guidance on issues of interest to local policy makers, while sensitivity refers to the model's ability to predict changes in travel behavior resulting from changes in demand (e.g., demographic variables) and supply (e.g., level of infrastructure) characteristics. Accuracy is measured as the level of agreement with observed data in a base-year model whose demand and supply attributes will be modified to reflect alternative future conditions. These observed data range from household trip generation rates and distribution patterns obtained from travel surveys to vehicle and passenger counts.
Traffic counts are the single most important source of observed data used in the calibration of the traffic assignment. Traffic count screen lines demarcate major areas of the model region, and provide one measure of how well the model replicates travel between adjoining regions. Percentage deviations at each crossing location, across each entire screen line and across all screen lines are major outputs of the typical screen line report. Matches within 5 to 10 percent of observed daily volumes across all screen lines are generally considered adequate. Traffic counts on individual links are a second source of assignment calibration data. The quality of the traffic assignment calibration is often measured by the average variation between observed and modeled data, using percentage deviation, root mean square error (RMSE) and percent RMSE. Percent RMSE is reported by facility type or by volume grouping; in general, error tolerances are lower for high-volume facilities than for lower-volume facilities. FHWA-recommended targets for traffic count matches range from seven percent RMSE for freeways to 25 percent for collectors.
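The two calibration measures described above can be illustrated with a minimal sketch. The function names and the link volumes are hypothetical, and the percent RMSE is assumed here to be the RMSE divided by the mean observed volume, using n in the denominator (some guidance uses n - 1 instead).

```python
import math

def percent_deviation(observed, modeled):
    """Percent difference between modeled and observed totals (e.g., one screen line)."""
    observed_total = sum(observed)
    return 100.0 * (sum(modeled) - observed_total) / observed_total

def percent_rmse(observed, modeled):
    """RMSE of modeled vs. observed link volumes, as a percent of the mean observed volume."""
    if len(observed) != len(modeled) or not observed:
        raise ValueError("observed and modeled must be non-empty and the same length")
    n = len(observed)
    squared_error = sum((m - o) ** 2 for o, m in zip(observed, modeled))
    rmse = math.sqrt(squared_error / n)  # assumption: divide by n; some guidance uses n - 1
    return 100.0 * rmse / (sum(observed) / n)

# Hypothetical daily volumes on three freeway links crossing one screen line.
observed = [52000, 61000, 48000]
modeled = [50500, 63500, 47000]

print(f"screen line deviation: {percent_deviation(observed, modeled):.1f}%")
print(f"percent RMSE: {percent_rmse(observed, modeled):.1f}%")
```

In this made-up case the screen line total matches exactly while the link-level percent RMSE is about 3.3 percent, which shows why both measures are normally reported together.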
Models with transit assignment capabilities utilize station boarding and screen line ridership data for calibration. Time-of-day data are often more critical for transit assignment calibrations, since many assignments cover the morning or afternoon peak period only. More advanced modeling practices perform multiple assignments by time of day. This is a considerable effort, because the service characteristics (routes, headways and fares) differ between the peak and off-peak periods.
Traffic count data are only one of several sources of error in a traffic model. Travel behavior is inherently complex and beyond the ability of the relatively simple formulations used in current state-of-the-practice models to predict with a high degree of accuracy. Understanding these limitations, many transportation agencies use the models to predict daily travel patterns, use summary statistics cast over broad areas and round results to order-of-magnitude estimates, rather than reporting roadway section-specific volumes. Model results are often used in a relative sense to evaluate the differences between two alternative scenarios.
Errors in calibration traffic count datasets may occur and cause temporal and spatial inconsistencies with the underlying network. Neighborhoods and other activity centers are represented as one or more points of access to the street system, making for very "lumpy" traffic distributions, in which modeled traffic volumes change sharply on either side of the traffic loading/unloading points. Traffic counts cannot be reconciled with these loadings very easily. In some cases the count must be moved to one side or another of the actual count location to avoid errors caused by the spatial aggregation of the activity centers. Temporal inconsistencies may arise as well. The model is supposed to represent a snapshot of travel behavior on an average day, when in fact the traffic counts are taken during different years or at different points in time during the year. The application of seasonal, growth and day-of-week factors does not guarantee a consistent distribution of the average day's travel. Counts are sometimes manually smoothed to reduce such inconsistencies.
Overall, the error tolerances of state-of-the-practice travel demand models are relatively high. The traditional threshold for error is one lane of hourly capacity, which can range from 700 vehicles per hour for a local road to 2,200 vehicles per hour for a freeway or expressway. As more sophisticated techniques are adopted to address issues beyond roadway capacity needs, error tolerances will lessen correspondingly.
The Clean Air Act Amendments of 1990 stipulate that designated planning organizations ensure that the transportation projects identified in long-range plans contribute to air quality improvement goals for the region. The Act created air quality planning procedures that require the use of mobile source emissions estimates using vehicle miles of travel (VMT) derived from travel demand forecasting methods and other sources.
Emissions modeling uses VMT and emissions rates, which are developed from an emissions factor model such as MOBILE 6.0, to estimate total emissions. Emissions of carbon monoxide, volatile organic compounds, sulfur dioxide and oxides of nitrogen are modeled using these inputs. The emissions conformity analysis requires the development of VMT distributions by 15 speed categories by vehicle class, hour and four facility types. In most cases, travel demand models are used for the VMT estimates, while traffic count data, existing vehicle classification data and vehicle registration data are used to complete these distributions as inputs to the emissions factor model. Current-year VMT is adjusted to match Highway Performance Monitoring System (HPMS) database totals by functional classification. HPMS data are also used for calibration and validation of the model in areas that perform air quality conformity analysis. Observed speeds and VMT are two critical data elements for model validation and calibration. Post-processing programs are calibrated to match existing speed data from travel time surveys or dual loop count locations. Modeled VMT is adjusted to match total base-year VMT from the HPMS.
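The adjustment of modeled VMT to HPMS totals by functional classification can be sketched as a simple ratio factor per class. This is only an illustration under assumed conditions: the link records, class names and HPMS totals below are invented, and actual post-processors may apply the adjustment differently.

```python
from collections import defaultdict

def hpms_adjustment_factors(links, hpms_vmt):
    """Return one factor per functional class: HPMS VMT / modeled VMT."""
    modeled = defaultdict(float)
    for _link_id, functional_class, vmt in links:
        modeled[functional_class] += vmt
    return {fc: hpms_vmt[fc] / total for fc, total in modeled.items()}

def adjust_link_vmt(links, factors):
    """Scale each link's modeled VMT by the factor for its functional class."""
    return [(link_id, fc, vmt * factors[fc]) for link_id, fc, vmt in links]

# Hypothetical modeled daily VMT by link: (link id, functional class, VMT).
links = [
    ("a", "freeway", 1200.0),
    ("b", "freeway", 800.0),
    ("c", "arterial", 500.0),
]
# Hypothetical HPMS control totals for the same area.
hpms_vmt = {"freeway": 2100.0, "arterial": 475.0}

factors = hpms_adjustment_factors(links, hpms_vmt)
adjusted = adjust_link_vmt(links, factors)
for link_id, fc, vmt in adjusted:
    print(link_id, fc, round(vmt, 1))
```

After the adjustment, the class totals of the scaled link VMT equal the HPMS control totals by construction.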
Some transportation professionals believe that current state-of-the-art methods can forecast emissions with an accuracy of plus or minus 15 percent to 30 percent.5 Total regional VMT for the base year, which is dependent on accurate HPMS data, is an essential and critical input to the model calibration and thus to emissions estimates. EPA and FHWA have sought to improve modeling practices for air quality conformity analyses less through insisting on improved input data than in providing guidance on improved modeling procedures, such as the introduction of travel time feedback into trip distribution and the development of modeling estimates by time period.6
Air quality conformity analysis requires more detailed models and data than traditional transportation demand modeling analyses. We therefore conclude that the coverage and accuracy needs for such applications would be slightly more stringent than those for state-of-the-practice modeling.
Federal rules require transportation management areas with populations over 200,000 to develop and implement Congestion Management Systems (CMS). The CMS is intended to be a systematic approach for monitoring and measuring transportation system performance and for diagnosing safety, mobility or congestion issues. The CMS is also used as the basis for evaluating and recommending alternative strategies to manage or mitigate regional congestion and to improve regional air quality. CMS findings may be used to inform project selections in the formulation of transportation improvement programs (TIPs) or constrained long-range transportation plans (LRTPs).
System performance measures based on travel time are generally preferred for CMS reports. Many areas routinely conduct floating car travel time studies to identify and monitor congestion in key metropolitan corridors. Real-time traffic data from ITS systems are increasingly used to provide the data. For example, a contractor in Virginia (AirSage) recently began collecting cellular phone positional data in the Hampton Roads area from Sprint for the Virginia Department of Transportation (VDOT) and the regional MPO. Typically, the travel time data represent peak travel conditions. In some areas, travel demand models are used to meet CMS reporting requirements. Highway Capacity Manual techniques may be used to translate travel times or volumes to level of service estimates.
The CMS measures mobility trends at identical or similar locations over time. Consistency of data collection procedures and data analysis techniques is one of the major requirements for the CMS.
The Highway Performance Monitoring System (HPMS) is a federally sponsored highway database containing data on the extent, condition, and use of the nation's highway system. The HPMS is used for estimating highway needs, apportioning Federal highway funds to states, and reporting on highway condition and performance at the national level. Urban areas designated as National Ambient Air Quality Standard (NAAQS) non-attainment areas use the HPMS to report total vehicle miles of travel and other statistics for air quality conformity analysis. The HPMS is the data source for the Highway Economics Requirements System (HERS), which is an analytical tool used to estimate long-range national highway infrastructure needs and to set funding levels for Federal transportation appropriations bills. At the most detailed level of application, states use the HPMS to evaluate long-range funding needs in their own statewide needs analyses.
States provide data for the HPMS annually on a valid sample of roadways, excluding local roads and minor collectors (for urban sections). Among the critical data items provided are average annual daily traffic (AADT) and the percentages of single-unit and combination-unit trucks on these sample sections. AADTs are reported for the current reporting year and for a forecast year, which usually corresponds to a 20-year forecast. Various geometric and operational characteristics of the sample roadway segments are reported as well. The HPMS is not used for analyzing individual corridors, roadway segments or sub areas. FHWA advises that HPMS traffic data be updated on a three-year basis, and that all counts be factored to represent current-year AADTs, i.e., that the appropriate growth, seasonal and axle correction factors be applied.
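The factoring of a short count to a current-year AADT can be sketched as a product of adjustment factors. The multiplicative form is a common convention and is assumed here rather than taken from this report; the function name and all factor values are hypothetical.

```python
def short_count_to_aadt(daily_count, seasonal=1.0, day_of_week=1.0,
                        axle_correction=1.0, growth=1.0):
    """Estimate AADT from an average daily short count.

    Assumed multiplicative form: count x seasonal x day-of-week
    x axle correction x growth. Factors default to 1.0 (no adjustment).
    """
    if daily_count < 0:
        raise ValueError("daily_count must be non-negative")
    return daily_count * seasonal * day_of_week * axle_correction * growth

# Hypothetical 48-hour road tube count averaging 10,000 axle pairs per day,
# taken in a high-volume month on weekdays, one year before the reporting year.
aadt = short_count_to_aadt(
    10000,
    seasonal=0.92,         # count month is busier than the annual average
    day_of_week=1.05,      # hypothetical day-of-week adjustment
    axle_correction=0.95,  # converts axle pairs to vehicles
    growth=1.02,           # one year of traffic growth
)
print(round(aadt))
```

Each factor carries its own error, so the accuracy of the resulting AADT depends on how well the factor groups represent the counted location.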
For the most part, AADT estimates on sample segments are derived from permanent count stations and short counts. Forecast AADT may be generated from travel demand models, or linear regression models which relate traffic growth to growth in population and jobs, or an extrapolation of growth trends exhibited in past traffic count data.
The sample sections are randomly selected from a list of highway sections belonging to one of a number of volume groups. Sample sections are fixed, that is to say the same sections are inventoried and updated on a regular, cyclical basis. Volume groups are established for each functional classification, and are defined by urban area size, air quality conformity status, and AADT volume ranges. The number of traffic count samples needed for each volume group is determined by the level of precision needed for the volume group, the variability of AADT in the group and the size of the universe of available sample sections. In general, the sampling target for most volume groups is associated with an error tolerance of 10 percent and a confidence level of 90 percent. This means that 90 percent of the time, the data collected for any sample section in a volume group will be within 10 percent of its "true" AADT. Sample sections may be assigned to a different volume group if traffic growth warrants such a change.
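How precision, AADT variability and universe size interact can be illustrated with a standard finite-population sample size calculation. This is a sketch under assumptions: the formula is the textbook one for estimating a group mean, not necessarily the exact HPMS procedure, and the coefficient of variation and universe size are invented.

```python
import math
from statistics import NormalDist

def sample_size(cv, precision, universe, confidence=0.90):
    """Sections needed so the group mean is within `precision` at `confidence`.

    cv        : coefficient of variation of AADT in the volume group
    precision : allowable relative error (0.10 for 10 percent)
    universe  : number of sections available for sampling in the group
    """
    # Two-sided z value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    n0 = (z * cv / precision) ** 2           # infinite-population sample size
    n = n0 / (1.0 + (n0 - 1.0) / universe)   # finite-population correction
    return min(universe, math.ceil(n))

# Hypothetical volume group: 200 sections, AADT coefficient of variation 0.4.
print(sample_size(cv=0.4, precision=0.10, universe=200))
```

Under these assumed inputs the 90-10 target calls for 36 of the 200 sections; a more variable group or a tighter tolerance raises the requirement quickly because both terms enter as squares.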
FHWA provides HPMS submittal software with internal auditing and validation procedures to state DOTs. FHWA performs its own audit on the HPMS data as well. Audit procedures include screening AADT entries across multiple years to isolate and identify large deviations and abnormally high volume to service flow ratios (V/SF). FHWA field offices also perform HPMS process reviews with DOTs. One of the data items with the largest uncertainty is the truck percentages. Many HPMS segments use truck percentages from permanent count stations or similar functional classification locations.
Given the multitude of uses for the HPMS, accuracy, completeness and timeliness are essential. The data are only as accurate as the sampling methods, traffic data and the factoring procedures that underlie them.
The FHWA asks state DOTs to provide copies of continuous traffic volume data collected monthly by permanent count stations within 20 days after the close of the month for which data are collected. While providing volume data only is acceptable, FHWA encourages the provision of vehicle classification data whenever possible. Hourly traffic volumes are reported for each day that data are available. An acceptable submittal contains a minimum of seven days of data covering all days of the week, not necessarily from consecutive days.
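The acceptance rule in the last sentence (at least seven days of data covering every day of the week, not necessarily consecutive) can be expressed as a short check. The function name and the dates are hypothetical; this is not FHWA's submittal software.

```python
from datetime import date

def is_acceptable_submittal(days_with_data):
    """True if the month has at least seven days of data covering all seven weekdays."""
    unique_days = set(days_with_data)
    weekdays_covered = {d.weekday() for d in unique_days}
    return len(unique_days) >= 7 and len(weekdays_covered) == 7

# Hypothetical month: seven non-consecutive days, one for each day of the week.
scattered = [date(2004, 3, d) for d in (1, 9, 17, 25, 5, 13, 21)]
# Hypothetical month: ten days of data, but weekdays only.
weekdays_only = [date(2004, 3, d) for d in (1, 2, 3, 4, 5, 8, 9, 10, 11, 12)]

print(is_acceptable_submittal(scattered))
print(is_acceptable_submittal(weekdays_only))
```

The second case fails even though it contains more days, because no Saturday or Sunday is represented.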
Permanent count station data are the bedrock of a transportation agency's traffic count program. These data are used to develop the various factors used in a traffic count program, including seasonal, day-of-week, axle correction and growth factors. Data from count station sites are used as default values for time-of-day factors and for vehicle class distributions. Some agencies use these sites to identify speed enforcement needs.
Transportation agencies conduct safety studies to identify high-probability accident locations, and to identify and treat the cause of the accidents. Traffic data provide information on the relative exposure of travelers to accidents. Accident rates are typically expressed in terms of accidents per million vehicle miles of travel (MVMT). Desktop safety studies may lead to field reconnaissance to gather additional information on traffic control measures or geometric characteristics, or to perform speed studies.
Safety studies report to and use several databases. The Fatality Analysis Reporting System (FARS) provides information on traffic fatalities nationwide, with state DOTs contributing most of the data. Additionally, many states maintain a safety management system, which is used to identify safety issues, document the testing and evaluation of potential safety enhancements, and finally, to implement solutions.
Safety studies are hampered by a lack of vehicle classification data, and particularly data on single unit and combination trucks, SUVs and other vehicles. In keeping with the recommendations of the 2001 Traffic Monitoring Guide, state DOTs are beginning to create factor groups for trucks.
The VMT estimates used in safety analyses are subject to the same factoring errors as daily counts used for other analyses. Safety studies would appear to have a relatively high tolerance for systematic bias, since the candidate sites are evaluated in comparison to one another. Likewise, because accidents per million vehicle miles is used as the metric, the statistic will not be as adversely affected by errors in the daily VMT (DVMT) estimate as other types of analysis.
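The tolerance for systematic bias can be illustrated with a small sketch: if every site's AADT is overstated by the same proportion, each accident rate shifts by the same proportion and the ranking of candidate sites is unchanged. The site data and function names below are hypothetical.

```python
def accident_rate_per_mvmt(accidents, aadt, length_miles, years=1.0):
    """Accidents per million vehicle miles of travel on a roadway section."""
    vmt = aadt * length_miles * 365.0 * years
    return accidents * 1_000_000 / vmt

def rank_sites(sites, aadt_bias=1.0):
    """Rank site names from highest to lowest accident rate.

    aadt_bias scales every AADT by the same factor to mimic a systematic error.
    """
    rates = {
        name: accident_rate_per_mvmt(accidents, aadt * aadt_bias, length)
        for name, (accidents, aadt, length) in sites.items()
    }
    return sorted(rates, key=rates.get, reverse=True)

# Hypothetical sites: name -> (annual accidents, AADT, section length in miles).
sites = {
    "A": (12, 20000, 1.5),
    "B": (8, 9000, 2.0),
    "C": (15, 40000, 1.0),
}

print(rank_sites(sites))                 # unbiased AADT
print(rank_sites(sites, aadt_bias=1.1))  # every AADT overstated by 10 percent
```

The same reasoning does not hold for errors that differ from site to site, which can change the ranking.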
Traffic simulations mimic the real-time movement of vehicles through intersections, roadway corridors or small areas. Unlike most regional travel demand assignment software, simulation packages take into account most or all of the geometric and operational characteristics of the facility being simulated. These packages can produce second-by-second turning movement data by signal phase, weaving movements across lanes and the delay caused by the buildup and dissipation of queues in the traffic system. Traffic simulations are used for operations and design studies, and are essential in assessing whether a particular geometric configuration will accommodate the anticipated traffic demand. A freeway to arterial interchange design is a typical application of a simulation program. Examples of software packages in use today include Synchro and CORSIM.
Several of the packages produce striking visualizations of the projected motion of vehicles in the traffic stream, as well as detailed statistics such as stopped delay, speed by small increments, gap and headway statistics. Studies using these packages analyze relatively small increments of time such as peak-hour conditions. Relatively small areas such as intersections, portions of roadway corridors or small sub areas are analyzed.
Simulation packages are data intensive, often requiring detailed information about the operational and geometric characteristics of the roadway being simulated. This limits their application for planning purposes. Traffic data are a critical input to the simulation packages since the facility will be engineered to accommodate the traffic demand, recognizing right-of-way and other constraints. Most frequently, the most recent traffic counts available are used for the simulations, although forecast model data are sometimes used as well. For signal timing applications, turning movement data for morning, evening and off-peak are generally required.
There is a high level of confidence in the algorithms that are used to simulate traffic at the microscopic and mesoscopic levels. The largest source of error comes not from the algorithms themselves but from the traffic data inputs. There is a considerable though unquantified uncertainty over whether the input data are representative of the likely variability in the magnitude, temporal and spatial distribution of traffic. Another uncertainty is the degree to which the traffic count input is representative of peak demand, for which a facility is typically designed.
FHWA and many state DOTs perform field evaluations of new technologies in advance of large scale procurements of third-party products. These evaluations are often large, expensive and multidisciplinary, and consider the broader economic and institutional implications of the technology, as well as the narrow questions of effectiveness and efficiency of the technology itself. These technology evaluations assess the potential for success of the technology in large scale deployment, help determine their most appropriate applications and identify the critical external factors which are likely to contribute to the technology's success or failure. These evaluations vary widely in geographic scope, but corridor-level studies are not uncommon. In 2000, for example, the FHWA initiated a multi-year study on the use of wireless technologies for monitoring travel speeds on the Capitol Beltway around Washington, D.C.
The technology evaluations often develop detailed data collection plans as part of the overall evaluation plan. Data needs are specific to the evaluation and can vary from one application to another, but in general these evaluations require site-specific data that are finer grained, temporally and spatially, than the data required for other types of planning applications. An ideal data collection plan for such a study might include speed, volume and vehicle classification data at less than five-minute increments and between or at the approach to all roadway junctions covered by the study. Most studies fall short of this ideal due to resource constraints. The quality of the traffic data being collected must be monitored almost in real time, since the reliability of the results and findings depends so heavily on accurate, valid and reliable data.
The reliability of these program evaluations depends greatly on the amount and the quality of the data collected. Relative to other types of applications, the need for valid, reliable and accurate traffic data is high.
Ramp signals at inbound freeway interchanges meter inbound traffic, allowing vehicles to enter the mainline traffic stream as acceptable gaps appear. Ramp signals have been installed in radial freeway corridors in many North American cities. The signals are designed to minimize disruptions to mainline freeway traffic flow and to maintain steady speeds on the freeway, as even minor, sudden reductions in speed can have major upstream ripple effects. The more advanced systems include algorithms that balance the objective of smoothing freeway flow with those of minimizing signal delay and the potential for spillover traffic into adjoining neighborhoods. Most systems are set not to exceed a maximum delay at the ramps regardless of mainline conditions.
More advanced ramp signal systems are coordinated over an entire corridor and utilize real-time traffic information from the mainline and at the ramp approaches. These systems are able to adjust their signal timings automatically as conditions change, or be overridden by an operator. Older systems which are not demand responsive, however, rely on fixed timing schemes based on available traffic counts. Optimally, traffic volume data at two- to five-minute increments would be a minimum data requirement for adequate operation of the ramp signals.
Whether governed by fixed or demand-responsive timing schemes, the effectiveness of ramp signals is directly related to the timeliness and accuracy of the traffic volumes data received. There is a low tolerance for delay among travelers at the ramp signals, and the need for reliable and accurate data is very high.
Advanced traveler information systems (ATIS) alert travelers to unusual traffic conditions, allowing travelers to adjust their departure time, route or mode of travel so as to reduce or avoid travel delay. Sources of traveler information include radio and television-based traffic reports derived from monitored police, fire and rescue transmissions, information provided by transportation management centers (TMCs) or helicopter and video surveillance, 511 phone systems, web sites and freeway variable message signs. Many metropolitan travelers can access web sites that provide region-wide color-coded maps of current traffic conditions, along with information about incident and accident locations. As of 2003, there were at least 11 metropolitan areas that offered travel time estimates on major freeways.7 A recent study8 estimated the minimum ATIS accuracy requirements for freeway travelers in Los Angeles to be in the 13 to 15 percent error range. En-route information accessible from in-vehicle systems still lacks an attractive business model to entice widespread private sector participation and a demonstrated willingness to pay by the traveling public.
The most commonly available sources of traveler information are ubiquitous and free, but have not advanced in quality significantly over the past 20 years. The available data are neither timely nor of sufficient spatial coverage to provide reliable route-choice options for individual travelers. According to some studies, widespread availability of accurate, detailed and timely traveler information could improve the efficiency of highway operations by five to 10 percent, albeit at a significant cost.9
Pavement management systems use pavement condition data and sophisticated deterioration models to estimate future reconstruction, rehabilitation and overlay needs and costs. Pavement maintenance needs are a function of several factors, including the composition and condition of the surface and base, the geometric design of the roadway and the composition and magnitude of existing and anticipated traffic.
Pavement design requires information about vehicles and the loads they exert on the pavement beneath them. The 1986 AASHTO roadway design equations used 18,000-pound equivalent single-axle loads as the measure of load. The 2002 AASHTO pavement design equations use load spectra, which characterize traffic loads in terms of the distribution of single-, tandem-, tridem- and quad-axle configurations within each of a number of weight classifications. Volume, vehicle classification and weight data are required to develop load spectra estimates. Typically, weights by vehicle type are developed using data at static weigh stations or weigh-in-motion stations, and these data are applied to vehicle classification data derived from permanent count station and other count locations where classification count data are collected. Vehicle distribution factors, growth factors and seasonal factors are also used to develop volume estimates. Techniques for converting traffic counts to load spectra are under development through work sponsored by the Transportation Research Board (TRB, 2004 [NCHRP 1-37-A]).
Variability in traffic data and especially truck weight data is a significant issue in pavement design. To account for variability, the 1986 AASHTO Design equation included terms for standard deviation and the standard error for truck weight. The 1992 AASHTO Guidelines for Traffic Data Programs10 cites studies suggesting that the standard deviations for WIM data range from 0.55 to 0.80.
The 1992 AASHTO Guidelines demonstrated the relationship between traffic volume errors and overlay thickness. Because the error in overlay thickness increases non-linearly as traffic volumes increase, errors in vehicle classification can have a substantial impact on pavement design estimates. The Guidelines note that, for roadway sections experiencing 2.5 million design-equivalent axle loads over the life of the section, traffic monitoring systems that achieve traffic data accuracies representative of a 50 percent confidence interval result in pavement overlays that deviate by one-quarter inch to one-half inch from the true pavement thickness needed, compared to counts representative of the 80 percent confidence interval.10 Errors of such magnitude can arise, for example, when system-level defaults for vehicle distributions are used for entire functional classifications of roadways, rather than using factors that reflect the prevailing traffic patterns for the roadway sections being analyzed.
The previous section described several typical planning, operations, and engineering applications, discussed various sources of error common to the application and assessed the application's tolerance for error in the types of traffic data ITS systems can provide. Table 4.1 presents a summary of estimated data quality targets for the different applications discussed above. These targets are defined for the six data quality measures:
| Transportation Planning Applications | Data Element | Data Quality Attribute:¹ Accuracy² | Data Quality Attribute: Completeness | Data Quality Attribute: Validity | Data Quality Attribute: Timeliness | Data Quality Attribute: Typical Coverage |
|---|---|---|---|---|---|---|
| Air Quality Conformity Analysis | VMT by vehicle class, hour and functional classification | 10% | At a given location 50% - Two weeks per month, 24 hours | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Within three years of model validation year | 75% Freeways/Expressways; 25% principal and minor arterials; 10% collectors |
| | VMT by hour and vehicle classification (Distribution of VMT by speed) | ±2.5 mph | At a given location 25% - one week per month, 24 hours | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent counts | Within three years of model validation year | 75% Freeways/Expressways; 25% principal and minor arterials; 10% collectors |
| Standard demand forecasting for Long Range Planning | Daily traffic volumes | Freeways: 7%; Principal Arterials: 15%; Minor Arterials: 20%; Collectors: 25% | At a given location 25% - 12 consecutive hours out of 48-hour count | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Within three years of model validation year | 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors |
| | Hourly traffic volumes | Freeways: 7%; Principal Arterials: 15%; Minor Arterials: 20%; Collectors: 25% | At a given location 25% - 12 consecutive hours out of 48-hour count | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent counts | Within three years of model validation year | 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors |
| | Vehicle occupancy | 10-15% | At a given location 25% - 12 consecutive hours out of 48-hour count | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent counts | Within three years of model validation year | 1-5% of total population (from surveys) |
| | Percentage single unit trucks; percentage combination trucks | 7-10%; 3-5% | Minimum 25% - 12 consecutive hours out of 48-hour count; minimum 50% - 12 consecutive hours out of 24-hour count | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent counts | Within three years of model validation year | 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors |
| | Transit boardings and alightings by station and/or stop | 15-20%; 7-10% (Transit Planning) | 75% of annual data collection | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent counts | Within three years of model validation year | 100% of rail boardings; 10% of bus route ridership from screen line data |
| | Transit vehicle speeds by analysis time period | 15-20% | <5% - one peak and one off-peak route | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent counts | Within three years of model validation year | 100% |
| | Free Flow link speeds | 15-20% | 90-100% validity for instrumented floating car data collection | 90-100% validity for instrumented floating car data collection | Within three years of model validation year | 100% Freeway mileage; 100% Major arterial mileage; 80-100% Collectors mileage; 10% Local road mileage |
| | Congested link speeds | At V/C < 1.0, 10 mph; at V/C > 1.0, 2.5 mph | 90-100% validity for instrumented floating car data collection | 90-100% validity for instrumented floating car data collection | Within three years of model validation year | 100% Freeway mileage; 100% Major arterial mileage; 80-100% Collectors mileage; 10% Local road mileage |
| Traffic simulation | Traffic volumes by minute or sub-minute | 2.5% | 90% validity | Up to 15% failure rate - portable traffic counts | Within one year of study | 100% of study area |
| | Turning movements by 15 minutes | 5-10% error rate | 95% validity - manual traffic counts | 0% failure - manual traffic counts | Within one year of study | 100% of study area |
| | Free Flow link speeds | 5% | 90-100% validity for instrumented floating car data collection | 90-100% validity for instrumented floating car data collection | Within one year of study | 100% of study area |
| | Congested link speeds and delay statistics | 2.5% | 90-100% validity for instrumented floating car data collection | 90-100% validity for instrumented floating car data collection | Within one year of study | 100% of study area |
| | Queue length | | 95% validity - manual count | 100% validity - manual count | Within one year of study | 100% of study area |
| Congestion management | Corridor-level vehicle speeds and/or travel times by hour | 5% | 90-100% validity for instrumented floating car data collection | 90-100% validity for instrumented floating car data collection | Within six months of study | 100% of study area |
| | Origin-Destination travel times by hour | 5% | 90-100% validity for instrumented floating car data collection | 90-100% validity for instrumented floating car data collection | Within six months of study | 1-5% of study area (from surveys) |
| Highway Performance Monitoring System | AADT | 5-10% Urban Interstate; 10% Other urban; 8% Rural Interstate; 10% Other Rural; Mean Absolute Error | 80% continuous count data; 70-80% for portable machine counts (24-/48-hour counts) | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Data three years old or less | 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors |
| | K factor; D factor | 5-10% RMSE (relative); 1% RMSE (relative) | 80% continuous count data; 50% for portable machine counts (24-/48-hour counts) | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Data three years old or less | 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors |
| | Percent combination and single-unit trucks - Daily | 20% RMSE; 15% RMSE | 80% continuous count data; 50% for portable machine counts (24-/48-hour counts) | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Data three years old or less | 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors |
| | VMT | 5-10% RMSE; Downward bias | 80% continuous count data; 50% for portable machine counts (24-/48-hour counts) | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Data one year old or less | 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors |
| | Percent combination and single-unit trucks - Peak | 25% RMSE; 20% RMSE | 80% continuous count data; 50% for portable machine counts | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Data three years old or less | 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors |
| Monthly count station volume reports | Hourly volumes for seven consecutive days each month | 2% RMSE | 100% valid data | 100% valid data required | Data one month old or less | <1% of total roadway mileage |
| | AVC stations: Hourly volumes by vehicle class category | 15% Single-Unit Truck Classification Error | 100% valid data | 100% valid data required | Data one month old or less | <1% of total roadway mileage |
| Transportation Operations Applications | Data Element | Data Quality Attribute:¹ Accuracy² | Data Quality Attribute: Completeness | Data Quality Attribute: Validity | Data Quality Attribute: Timeliness | Data Quality Attribute: Typical Coverage |
|---|---|---|---|---|---|---|
| Program and Technology Evaluations | Link and corridor volumes | 2% RMSE | 90% valid data | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Less than six months old | 75-80% coverage of corridor needed |
| | Link and corridor delay statistics | 2% RMSE | 90% valid data | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Less than six months old | 75-80% coverage of corridor needed |
| Pre-Determined Ramp and Signal Coordination | Link and corridor volumes | 2% RMSE | 90% valid data | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Less than three months old | 75-80% coverage of corridor needed |
| | Link and corridor delay statistics | 2% RMSE | 90% valid data | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Less than three months old | 75-80% coverage of corridor needed |
| Traveler Information | Travel times for entire trips or portions of trips over multiple links (e.g., travel time to popular destinations from a point) | 10-15% RMSE | 95-100% valid data | Less than 10% failure rate | Data required close to real-time | 100% area coverage |
| Predictive traffic flow methods (still under research) | Link volumes | 2% RMSE | 90% valid data | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Data three years old or less | 100% area coverage |
| | Link delay statistics | 2% RMSE | 90% valid data | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Data three years old or less | 100% area coverage |
| Highway Safety Applications | Data Element | Data Quality Attribute:¹ Accuracy² | Data Quality Attribute: Completeness | Data Quality Attribute: Validity | Data Quality Attribute: Timeliness | Data Quality Attribute: Typical Coverage |
|---|---|---|---|---|---|---|
| Exposure for safety analysis | AADT and VMT by segment | 5-10% Urban Interstate; 10% Other urban; 8% Rural Interstate; 10% Other Rural; Mean Absolute Error | 80% continuous count data; 50% for portable machine counts (24-/48-hour counts) | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Data one year old or less | 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors |
| | Traffic volumes and flow characteristics at times of specific crashes | 25% | 80% continuous count data; 50% for portable machine counts (24-/48-hour counts) | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Data one year old or less | 2-5% of total roadway segments |
| Pavement Management Applications | Data Element | Data Quality Attribute:¹ Accuracy² | Data Quality Attribute: Completeness | Data Quality Attribute: Validity | Data Quality Attribute: Timeliness | Data Quality Attribute: Typical Coverage |
|---|---|---|---|---|---|---|
| Historical and forecasted loadings | Link volumes | 5-10% Urban Interstate; 10% Other urban; 8% Rural Interstate; 10% Other Rural; Mean Absolute Error | 80% continuous count data; 70-80% for portable machine counts (24-/48-hour counts) | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Data three years old or less | 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors |
| | Link vehicle class | 20% Combination unit; 12% Single unit | 80% continuous count data; 50% for portable machine counts (24-/48-hour counts) | Up to 15% failure rate - 48-hour counts; up to 10% failure rate - permanent count stations | Data three years old or less | 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors |
Notes:
¹ "Accessibility" for all applications is discussed in the text.
² Percentage figures correspond to estimates of Mean Absolute Percent Error (MAPE).
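As an illustration of how an accuracy target from Table 4.1 might be checked, the following minimal sketch computes MAPE using its conventional definition (the mean of the absolute errors, each divided by the reference value, expressed as a percentage). The function names and the sample counts are hypothetical and are not taken from this report.

```python
def mape(observed, reference):
    """Mean Absolute Percent Error of observed values against reference values."""
    if len(observed) != len(reference) or not observed:
        raise ValueError("observed and reference must be non-empty and equal in length")
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be non-zero")
    total = sum(abs(o - r) / r for o, r in zip(observed, reference))
    return 100.0 * total / len(observed)


def meets_accuracy_target(observed, reference, target_pct):
    """True when the MAPE does not exceed the accuracy target (in percent)."""
    return mape(observed, reference) <= target_pct


# Hypothetical daily volumes: ITS detector counts vs. ground-truth counts.
its_counts = [1050, 1900, 4200]
ground_truth = [1000, 2000, 4000]

print(round(mape(its_counts, ground_truth), 1))
# 7% is the Table 4.1 target for daily freeway volumes in demand forecasting.
print(meets_accuracy_target(its_counts, ground_truth, 7.0))
```

Each sample count above differs from its reference by 5 percent, so the data would satisfy the 7 percent freeway target but not the 2.5 percent target listed for traffic simulation volumes.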
Note that assessments of accessibility by application are not included in Table 4.1. With one exception, the applications are not highly sensitive to accessibility; that is, they do not typically require short access times. The exception is predictive traffic flow methods, which would require archive access times of less than 30 seconds. The remaining applications can be adequately served with access times in the 5-10 minute range.
Sufficient temporal coverage and minimal data quality standards should be in place in advance of the transfer of data to the traffic monitoring system managers. System managers would initiate application-specific QA/QC procedures for integrating other data sources into their systems. The data would then be transferred on request to users for applications.
It is clear that maintaining data quality levels requires additional effort on the part of transportation agencies to:
The extra costs associated with assessing and reporting data quality were considered an important issue at the regional TDQ workshops.
Table 4.2 presents estimates of the level of effort, expressed in hours of labor, required to implement a data quality assessment program. These estimates include the time required to calculate and report each of the measures. These are crude estimates that have not been validated in a real situation.
| Task | Action item | Assumed Units | Level of effort | Frequency |
|---|---|---|---|---|
| General | | | | |
| Develop mechanism/system for data quality assessment | Develop data reduction software or procedures | Per program | 40 hours | One time |
| | Design and implement input data procedures | Per program | 40 hours | One time |
| | Test, refine, and update systems and software | Per program | 40 hours | Periodic |
| Develop data quality reporting system | Design/develop reporting procedures and metadata templates | Per program | 40 hours | One time |
| Accuracy | | | | |
| Develop reference or ground truth data | Design and collect sample baseline data | Per site or data source | 8 hours | As required |
| Assess accuracy of original source field data (using independent equipment) and archived data | Download, process, and review data. Implement framework/software to calculate accuracy measures | Per site or data source | 1 hour | As required |
| | Review results compared to targets | Per site or data source | 15 mins | As required |
| Completeness, validity, timeliness | | | | |
| Assess quality of original source and archived data | Download, process, and review data. Implement framework to calculate quality measures | Per site or data source | 1 hour | As required |
| | Review results compared to targets | Per site or data source | 15 mins | As required |
| Coverage and accessibility | | | | |
| Assess coverage and accessibility qualities of data for the program | Review coverage, accessibility requirements for the program | Per program | 1 hour | As required |
| | Download and review data. Implement framework to evaluate data | Per program | 1 hour | As required |
| Data Quality Reporting and Improvements | | | | |
| Summarize and report data qualities to potential users. | Compile and report data quality to users (Metadata) | Per program | 8 hours | Periodic/as required |
| Identify improvement and communicate quality problems. | Communicate quality problems to field personnel; schedule maintenance | Per site or data source | 4 hours | Periodic/as required |
Note: As required – based on need and time scales e.g., annual, monthly, weekly, daily, or per request.
These level-of-effort estimates assume experienced data archive administrators who are familiar with the data collection and archiving protocols. Level of effort estimates could be significantly higher in other scenarios.
It is important to note that the estimates presented in Table 4.2 do not account for the level of effort required to maintain or improve data quality; they represent only the level of effort required to assess the quality of existing data. Because the labor rates for the individuals responsible for this function may vary by agency and type of application, it is more appropriate to give guidance on the approximate duration required to perform these data quality calculations. Experience in performing these tasks will also be reflected in the time required, and therefore in the costs. The time (cost) is further assumed to be a function of the type or source of data and the application. These variables are taken into account in developing the guidelines for costs associated with assessing and reporting data quality measures.
In estimating the level of effort, it is recognized that there are two components of time (cost) involved. First, an initial one-time cost will be incurred in establishing the mechanism for assessing the quality of data. While the framework for assessing data quality developed in this project establishes that mechanism to some extent, some extra effort will be required to become familiar with the application of the framework and to develop software programs or procedures based on it. Second, a recurrent cost is associated with applying the framework to assess the quality of any new data. The information presented in Table 4.2 distinguishes between these two cost components.
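The two cost components can be illustrated with a rough calculation built from the Table 4.2 figures. This is only a sketch: it assumes, for simplicity, that every periodic or as-required task is performed once per assessment cycle, which the table does not state, and the function and constant names are hypothetical.

```python
# One-time, per-program hours from Table 4.2:
# data reduction software (40) + input data procedures (40) + reporting system (40).
ONE_TIME_PER_PROGRAM = 40 + 40 + 40

# Recurring, per-program hours (assumed once per cycle):
# test/refine systems (40) + coverage/accessibility review (1 + 1) + report to users (8).
RECURRING_PER_PROGRAM = 40 + 1 + 1 + 8

# Recurring hours per site or data source (assumed once per cycle):
# ground truth (8) + accuracy assessment (1 + 0.25)
# + completeness/validity/timeliness assessment (1 + 0.25) + communicate problems (4).
RECURRING_PER_SITE = 8 + 1 + 0.25 + 1 + 0.25 + 4


def estimate_effort(n_sites, n_cycles):
    """Return (one_time_hours, recurring_hours) for a data quality assessment program."""
    if n_sites < 0 or n_cycles < 0:
        raise ValueError("n_sites and n_cycles must be non-negative")
    recurring = n_cycles * (RECURRING_PER_PROGRAM + n_sites * RECURRING_PER_SITE)
    return ONE_TIME_PER_PROGRAM, recurring


one_time, recurring = estimate_effort(n_sites=10, n_cycles=1)
print(one_time, recurring)
```

Under these assumptions a 10-site program would need 120 hours of one-time setup and 195 hours per assessment cycle. As the text cautions, actual effort could be significantly higher for less experienced staff.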
Metadata is an extremely important consideration for data sharing in general, and especially for communicating data quality. While data users may be several degrees of separation away from data collection, knowledge about what the data represent and their collection conditions is key to their use.
Commonly referred to as "data about data," metadata is typically thought of as dataset descriptions. Metadata are analogous to a library card catalog that contains information about books: accession number, place of printing, author, etc. In this analogy, the books themselves are the "data". The descriptions typically found in a data dictionary (e.g., definition, size, source) are also metadata. Metadata has several purposes:¹¹
Several existing standards provide a framework for using metadata to document data quality. For example, FGDC-STD-001-1998¹² is an existing American standard for digital geospatial data. The FGDC standard is used by numerous public agencies and private software companies in the United States and supports the reporting of data quality measures; however, the metadata standards community in the U.S. is beginning to move toward eventual adoption of ISO 19115,¹³ an international metadata standard maintained by the International Organization for Standardization (ISO).
ASTM Committee E17.54 is currently developing metadata standards for archiving ITS-generated data. ASTM distinguishes several types of metadata that must be considered:
It is recommended that the ASTM standard, once approved, be used for documenting traffic data quality. This standard borrows heavily from the FGDC standard for general types of metadata (archive structure metadata) and is developing detailed data elements and record structures for processing documentation and data collection system metadata. An example of how the ISO 19115 standard can be used to document archive structure metadata is shown below.
Example Data Quality Documentation Using ISO 19115
This example is provided in a tabbed-outline format (Figure 4.1). Element values are underlined and role names are denoted with a "+". Underlines indicate entered data. Not all potential forms of metadata are entered since the focus here is on data quality.
This data archive contains traffic data summaries for several different granularity levels in time and space. For example, the available data granularity levels include both 15 and 60 minutes, as well as by lane or all directional lanes combined. The data in this archive have been organized in comma-separated value (csv) ASCII-text files in a way that supports easy import and use in desktop computer spreadsheet or database programs such as Microsoft Excel or Access. Alternatively, the data can also be batch-imported into a relational database management system (RDBMS) such as Oracle or Sybase.
MD_Metadata
Figure 4.1. Example of Data Quality Documentation Using ISO 19115
The data archive also includes a sensor inventory spreadsheet that describes approximate sensor locations, sensor location groupings, and other descriptive information. The sensor inventory spreadsheet was developed by TTI with basic sensor information provided by TxDOT.
A shortcoming of the TxDOT ATMS filename convention is that it indicates only the day of the week, not the date. The date stamp on the file itself typically reveals the actual date since it is not contained in the filename. To add date stamps to the filename, we unzip these files into 52 separate folders that correspond to the week of the year. The file "aus_unzip.xls" was used to create a *.bat file for batch processing. We then use a batch renaming program (CKRename) to substitute a date stamp (YYYYMMDD) for the weekday name, treating separately the files in each individual weekly folder. The renamed files have the filename convention "RR #### SCU YYYYMMDD HHMM.det" where RR=the route designation (e.g., IH, US, etc.) and ####=the route number (e.g., 0035, 0290, etc.). These "SCU date stamp added" text files are then compressed for long-term storage. Note that there are probably more efficient solutions for getting the date stamps from these files into SAS (instead of including them in the filename).
spatialRepresentationType: 001
Figure 4.1 (contd.). Example of Data Quality Documentation Using ISO 19115
Once date stamps have been added to the filename, we can then use SAS to import the CSV text files. We have developed "aus_reformat.sas" for this purpose. The SAS program "aus_reformat.sas" uses a csv template (e.g., "aus_2001_US0183.csv") for each corridor that contains the hourly files to be processed and the corresponding dates. This program combines all original source data (1-minute) for each corridor for the entire year into a single SAS dataset. Thus for 2001 we have 4 SAS datasets, with the filename convention "aus_2001_RR####". These 4 datasets are then compressed for long-term storage. The data are then ready for the next process step. In summary, the pre-processing is as follows:
+ unzip original files to folder corresponding to week number of the year using "aus_unzip.xls"
+ use batch processing and CKRename to change the weekday name to a date stamp, then compress and store these "date stamp added" text files
evaluationMethodType:
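The date-stamping step described in the example metadata above relied on a spreadsheet-generated batch file and CKRename. As a hedged alternative, the same substitution could be sketched in a short script. The original weekday-based filename layout is not given in this report, so the sample name used here ("IH 0035 SCU Monday 0800.det") is an assumption, as are the function names; the sketch also assumes the weekday is spelled out in full and that each file's modification time reflects the actual collection date.

```python
import re
from datetime import date
from pathlib import Path

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")
_WEEKDAY_RE = re.compile("|".join(WEEKDAYS), re.IGNORECASE)


def add_date_stamp(filename: str, file_date: date) -> str:
    """Replace the weekday name in filename with file_date as YYYYMMDD."""
    match = _WEEKDAY_RE.search(filename)
    if match is None:
        raise ValueError(f"no weekday name found in {filename!r}")
    # Guard against stamping a file with a date that falls on a different weekday.
    if match.group(0).lower() != WEEKDAYS[file_date.weekday()]:
        raise ValueError(f"{filename!r} does not fall on the weekday of {file_date}")
    stamp = file_date.strftime("%Y%m%d")
    return filename[:match.start()] + stamp + filename[match.end():]


def rename_folder(folder: Path) -> None:
    """Rename every .det file in folder using its modification date."""
    for path in sorted(folder.glob("*.det")):
        file_date = date.fromtimestamp(path.stat().st_mtime)
        path.rename(path.with_name(add_date_stamp(path.name, file_date)))


# 5 March 2001 was a Monday, so the weekday check passes.
print(add_date_stamp("IH 0035 SCU Monday 0800.det", date(2001, 3, 5)))
```

Reading the date from the file itself removes the need for 52 weekly folders, provided the modification times survive the unzip step; that should be verified before relying on this approach.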
Data sharing agreements codify the roles, expectations and responsibilities among the parties providing and using traffic data. Such agreements can conceivably occur between public entities, entirely between private entities or between private and public entities. In developing the guidelines for data sharing, three existing agreements were reviewed. A summary of these three data sharing agreements is presented below.
SMART Roads
The Virginia Department of Transportation (VDOT) has developed a set of "guidelines for access" to data from the five electronic traffic monitoring sites VDOT operates under its SMART Roads system. The guidelines apply to new public/private partnerships between video distribution providers (VDPs) and VDOT. The VDPs gain access to the traffic management centers and can resell the images collected to third parties, such as television stations. They can also install new equipment within the highway right-of-way. In return, the VDP must advance and support VDOT's goals for improved mobility and, more specifically, must provide free access to the video images through a web site. The only requirement relating to data quality is that the video images be refreshed at a rate of more than one frame per second. The guidelines state that separate contracts will be entered into with individual firms who succeed in their bids to become partners with VDOT.
TravInfo
The San Francisco-based TravInfo provides basic ATIS services through a telephone traveler advisory system, which alerts users to incidents, accidents and congestion on the freeway system. Callers are also able to receive up-to-the-minute route-specific information, and are able to connect to all Bay Area transit and ride-share providers. Registered private sector entities are allowed to access TravInfo's open architecture database to provide value-added information on web pages, in-vehicle map displays, or personal digital assistants.
The engineering firm PB/Farradyne (PBF) is under contract to manage the current ATIS system. The TravInfo contract with PB/Farradyne details "basic" and "enhanced" functional requirements for all aspects of the ATIS operation. Basic data requirements describe the types of data collected and the level of detail and accuracy required. Link speeds, for example, are required to be accurate to within 25 percent of actual speeds. Incident data must be posted within one minute of accident verification. Basic data fusion requirements include quality controls for accuracy, timeliness, reliability and usefulness. Enhanced data requirements specify the extent of the data collection effort. Interestingly, these data quality requirements are not extended to third party data consumers.
PBF is responsible for entering into and managing data sharing agreements with third party users, known as registered data disseminators (RDDs). The RDDs are entitled to redistribute, enhance, repackage, or otherwise add value to the data they receive. The data sharing agreement goes to great lengths to indemnify the public sector data providers and PBF from responsibility for the quality of the data delivered and in fact warns the RDD that "information availability and data accuracy are all subject to change."
Las Vegas
The Las Vegas Area Computer Traffic System (LVACTS) developed a closed circuit video surveillance system for congestion management and accident and signal failure identification on the arterial roadway system in 1993. LVACTS's data sharing agreement sets the broad terms under which third parties can access the live video images from the system. The video images are made available for the cost of the access connection; the agreement also states that a monthly subscription fee to defray the operating cost of the traffic management center may be applied. In the subscription agreement, LVACTS agrees to provide the same video feed to all subscribers and retains control over the operation of the cameras, the traffic management center and the transmission equipment. The agreement also sets the specific terms of the permitted data uses and the actual charge. The subscribers are responsible for installing and operating any equipment needed for accessing the video feed, which cannot be resold to anyone who is not a party to the subscriber agreement. Finally, the agreement makes no mention of who is responsible for the quality of the data being transmitted, nor are data quality standards specified. However, the agreement does contain a broad disclaimer indemnifying LVACTS from misuse or negligent use of the data.
Before any agency or company initiates a data sharing program, an agreement between the parties must be negotiated and signed. This agreement defines the expectations of both parties, the information to be shared, the responsibilities of each party in the transaction, the limits on use or reuse of the data, any required procedures for sending or receiving the data, and liability responsibilities.
Summary
Three themes emerge from a review of three data sharing agreements:
A review of data sharing agreements conducted for this project found that most existing agreements concerned the sharing of video images. Two agreements were reviewed that specifically address data other than video images. These are agreements developed by Virginia DOT and the Metropolitan Transportation Commission (MTC) in the San Francisco Bay Area.
An excerpt from the MTC agreement makes the following statement concerning data quality:
"PBF, MTC, Caltrans, and CHP and their suppliers make, and Registered Data Disseminator receives, no warranty regarding Provided Data, whether express or implied, and all warranties of merchantability and fitness of provided data for any particular purpose are expressly disclaimed. PBF, MTC, Caltrans, and CHP and their suppliers make no warranty that the information will be provided in an uninterrupted manner or that the Provided Data will be free of errors. Provided Data is provided on an "as is" and "with all faults" basis, with the entire risk as to quality and performance with Registered Data Disseminator."
The VDOT agreement does not address data quality. The agreement does make the following statement about video image quality:
"VDOT makes no warranty that the imagery will be provided in an uninterrupted manner. Imagery will be provided on an "as is" and "with all faults" basis."
Data quality can be addressed in data sharing agreements by including clauses that provide one of several levels of guarantee, including the following:
As noted above, data quality specifications rarely appear in data sharing agreements between the end user and the data provider. Data sharing agreements typically discuss such items as security and confidentiality, liability, frequency of data transmittals, to whom the data may be disseminated, and fees. However, public sector end users are unlikely to adopt ITS data for their applications on a widespread basis without some assurances that the data meet some minimum standards consistent with current expectations. This section offers guidance on how data quality provisions can be added to data sharing agreements; the entirety of data sharing agreements is not discussed here.
Data providers in data sharing agreements can be either public or private agencies. The same goes for data recipients. Thus, four types of agreements are possible: public-to-public, public-to-private, private-to-public, and private-to-private. Ignoring other terms of data sharing agreements (such as liability and restrictions on use) and focusing strictly on data quality, there is not much difference in how data quality would be incorporated into any of these arrangements. The key decision in structuring data quality clauses is to what extent minimum acceptable data quality criteria are established and enforced. Conceptually, three levels exist for this type of specification:
| Type of Location | Scope | Proposed Minimum Quantity Standard | Proposed Quality Standard |
|---|---|---|---|
| Roadway sections | Single location | Seven consecutive days per month | |
| | Single corridor | 100 percent coverage one day per month | Daily count within 10 percent of machine or manual count within 15 percent of hourly count as measured once per year. Twenty percent sample of locations. |
| | Areawide | 75 percent coverage one day per month | Daily count within 10 percent of machine or manual count within 15 percent of hourly count as measured once per year. Five percent sample of locations. |
| Intersections | Single location | Seven consecutive days per month | N/A |
| | Single corridor | 100 percent coverage one day per month | Five and 10 percent standard applied every five miles in corridor one time per year. Five percent sample of intersection locations. |
| | Areawide | 75 percent coverage one day per month | Five and 10 percent standard applied to one location per corridor per year. One percent sample of locations. |
3. DATA QUALITY FOR ITS-GENERATED VOLUMES AND SPEEDS
(Note: text in italics indicates options)
3.1 Reporting Data Quality. The data to be supplied under this agreement shall be reported using the latest metadata standard developed for archived ITS data by the American Society for Testing and Materials.
3.2 Minimum Data Quality Criteria. All tolerances refer to the testing methods in Section 3.3. The definitions of these attributes appear in "Traffic Data Quality Measurement, Final Report, 2004".
3.3 Tests to Determine Data Quality and Frequency of Reporting
Figure 4.2. Example Language for Specifying Minimum Data Quality Criteria in a Data Sharing Agreement
Payment of the contract amount shall be determined based on the percentage of volume data that annually pass a composite accuracy, completeness, and validity score as follows. The combined score is calculated as the product of the accuracy, completeness, and validity tests:
| Composite Score | % of Contract Amount |
|---|---|
| 75-100% | 100% |
| 50-74% | 75% |
| 30-49% | 50% |
| 15-29% | 25% |
| < 15% | 0% |
Note that other quality measures can be used in computing the composite score. The choice of measures could be driven by the application or the source of the data. Also note that the graduated scale presented above is for illustration purposes only. This concept has not been tested.
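The illustrative payment scale above can be expressed as a short calculation. The sketch below takes the composite score as the product of the accuracy, completeness, and validity pass rates, as described in the text, and maps it to a payment fraction. The function names are hypothetical, and treating each tier's lower bound as the cutoff for fractional scores (e.g., 74.5 percent falls in the 50-74% tier) is an assumption, since the table lists whole percentages only.

```python
def composite_score(accuracy, completeness, validity):
    """Composite score in percent; each input is a pass rate between 0 and 1."""
    for rate in (accuracy, completeness, validity):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("pass rates must be between 0 and 1")
    return 100.0 * accuracy * completeness * validity


def payment_fraction(score_pct):
    """Fraction of the contract amount payable for a composite score (percent)."""
    # (lower bound of tier in percent, fraction of contract amount paid)
    tiers = ((75, 1.00), (50, 0.75), (30, 0.50), (15, 0.25))
    for lower_bound, fraction in tiers:
        if score_pct >= lower_bound:
            return fraction
    return 0.0


# Hypothetical annual results: 90% of volume data pass each of the three tests.
score = composite_score(0.90, 0.90, 0.90)
print(round(score, 1), payment_fraction(score))
```

Because the score is a product, moderate shortfalls compound: three pass rates of 90 percent give a composite of about 73 percent, which falls in the 75 percent payment tier. Parties adapting this concept may prefer a different combination rule or different tiers.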