
WORK ORDER NUMBER BAT-02-006
Final Report
to
Federal Highway Administration
Washington, D.C.
505 King Avenue
Columbus, Ohio 43201
In Association with
Cambridge Systematics, Inc.
Texas Transportation Institute
September 25, 2003
Principal Authors
Dr. Edward Fekpe, PEng.
Mr. Deepak Gopalakrishna
The authors gratefully acknowledge the support and guidance of Mr. Ralph Gillmann of the Federal Highway Administration Office of Policy and Mr. James Pol of the Intelligent Transportation Systems Joint Program Office throughout this project.
The authors also acknowledge the support of the Departments of Transportation of the states of Ohio and Utah for hosting the regional workshops. Mr. David Gardner of Ohio DOT and Ms. Dian Williams of Utah DOT deserve recognition for their roles in organizing these regional workshops. Ms. Tami Hannahs and Ms. Lynn Price of Battelle also provided valuable logistical assistance in organizing the workshop in Columbus, Ohio. The authors acknowledge the valuable inputs provided by state and local agency officials during the interview process and by all the workshop participants.
The authors also acknowledge the valuable inputs provided by the project team, particularly in developing the white papers and conducting the regional workshops. The project team members are:
Dr. Edward Fekpe, Principal Investigator, (Battelle)
Mr. Deepak Gopalakrishna (Battelle)
Ms. Mala Raman (Battelle)
Dr. Rich Margiotta (Cambridge Systematics Inc.)
Dr. Dan Middleton (Texas Transportation Institute)
Mr. Shawn Turner (Texas Transportation Institute).
Table of Contents
Action Plan Implementation and Work Items
Case Studies and Clearinghouse
1.2 Project Objectives and Scope
2.1 Traffic Data Quality Issues
2.2 Data Collection – Interviews
2.3 Development of White Papers
2.6 Additional Traffic Data Quality Literature
3.2 Session 1 – Defining and Measuring Traffic Data Quality
3.2.3.1 Discussions – Ohio Workshop
3.2.3.2 Discussions – Utah Workshop
3.3 Session 2 – State of the Practice in Traffic Data Quality
3.3.1 Types and Applications for Traffic Data
3.3.2 Traffic Data Quality: Characteristics
3.3.3 Quality Issues for Using ITS-Generated Data for Traditional Uses
3.3.4 Recommendations: Possible Solutions
3.3.5.1 Discussions – Ohio Workshop
3.3.5.2 Discussions – Utah Workshop
3.4 Session 3 – Advances in Traffic Data Collection and Management
3.4.2 Innovative Contracting Methods
3.4.4 Training for Data Collection
3.4.5 Data Sharing Between Agencies and States
3.4.6 Advanced Traffic Detection Techniques
3.4.7.1 Discussions – Ohio Workshop
3.4.7.2 Discussions – Utah Workshop
3.5.1 Defining and Measuring Traffic Data Quality
3.5.4 Responsibilities and Timeline
4.0 ACTION PLAN FOR IMPROVING TRAFFIC DATA QUALITY
4.2 Partnerships and Coordination
4.3.1 Guidelines and Standards for Calculating Data Quality Measures
4.3.2 Compilation of Business Rules/Data Validity Checks and Quality Control Procedures
4.3.3 Best Practices for Equipment Installation and Maintenance
4.3.4 Clearinghouse for Vehicle Detector Information
4.3.5 Sensitivity Studies to Demonstrate “Value of Data”
4.3.6 Guidelines for Sharing Resources
4.3.7 Life-cycle Costs of Detection Equipment
4.3.8 Improved Contracting Approaches
4.3.9 Case Study or Pilot Tests
4.3.10 Guidance on Technologies and Applications
4.4 Implementation and Work Items
4.4.3 Case Studies and Clearinghouse
List of Appendices
APPENDIX B: INTERVIEWEE CONTACT LIST AND INTERVIEW GUIDE
APPENDIX C: REGIONAL WORKSHOP ATTENDEES
APPENDIX D: RELEVANT TRAFFIC DATA QUALITY LITERATURE
List of Figures
Figure 1. Traffic Data Quality Research Approach
AADT Average Annual Daily Traffic
AASHTO American Association of State Highway and Transportation Officials
ADUS Archived Data User Service
AMATS Akron Metropolitan Area Transportation Study
ARTIMIS Advanced Regional Traffic Interactive Management and Information System
ASTM American Society for Testing and Materials
ATIS Advanced Traveler Information Systems
ATMS Advanced Traffic Management Systems
ATR Automatic Traffic Recorder
COTR Contracting Officer’s Technical Representative
DOT Department(s) of Transportation
EDL Electronic Document Library
ESAL Equivalent Single Axle Loads
FHWA Federal Highway Administration
FOT Field Operational Test
GIS Geographic Information System
ITS JPO Intelligent Transportation Systems – Joint Program Office
ITS Intelligent Transportation Systems
MAG Maricopa Association of Governments
NOACA Northeast Ohio Areawide Coordinating Agency
ODOT Ohio Department of Transportation
OKI Ohio-Kentucky-Indiana Regional Council of Governments
ROW Right-of-Way
RTMS Remote Traffic Microwave Sensor
TMCs Traffic Management Centers
TMG Traffic Monitoring Guide
TRB Transportation Research Board
TTI Texas Transportation Institute
UDOT Utah Department of Transportation
VDC Vehicle Detector Clearinghouse
VDOT Virginia Department of Transportation
WIM Weigh-in-Motion
WSDOT Washington State DOT
Recent research and analysis have identified several issues regarding the quality of traffic data available from Intelligent Transportation Systems (ITS) for transportation operations, planning, or other functions. Since Federal agencies use and disseminate traffic data from state and local agencies, the quality of the data becomes even more critical. The quality of the traffic data and the information produced from the data are critical factors that affect the abilities of transportation agencies to ensure the security of transportation and the management of the nation’s transportation resources. The focus of data quality is on establishing a consistent methodology for ensuring that data are managed so that a measure of reliability is sustained. The primary objective of this project is to define an action plan to address traffic data quality issues. Such an action plan should include work items that can be executed through the U.S. Department of Transportation (DOT), stakeholder organizations (e.g., American Association of State Highway and Transportation Officials [AASHTO], ITS America), and state DOTs.
The development of the action plan involved several steps. First, the issues associated with traffic data quality were reviewed. Second, three white papers were developed whose themes were based on the issues identified. The white papers were developed from information gathered from published literature and through interviews with state and local agencies involved with traffic data collection, use, and management. The white papers are designed to explore the issues and current practices for ensuring data quality. The scopes of the three white papers and the issues addressed are outlined below.
Theme #1: Defining and Measuring Traffic Data Quality (EDL # 13767).
This white paper defines the measures and methods for quantifying traffic data quality. Issues considered include definition of traffic data quality for different users and for different applications; data quality metrics or measures; methodology for assessing traffic data quality; and acceptable levels of quality.
Theme #2: State-of-the-Practice in Traffic Data Quality (EDL # 13768).
This white paper documents issues, measures, and approaches for assessing, using, and accommodating traffic data quality in various applications. Issues considered include types and applications of traffic data being used by the states; how data quality problems are handled in various applications; methods used or studies conducted by states to ensure data quality; and institutional issues, data sharing issues and funding constraints.
Following the development of the white papers, two regional workshops on traffic data quality were conducted. The three white papers were used to stimulate discussions and obtain inputs from the workshop participants to develop an action plan that addresses traffic data quality issues. The workshops, sponsored by the FHWA Office of Policy, the ITS Joint Program Office (JPO), the Ohio Department of Transportation (ODOT), and the Utah Department of Transportation (UDOT), were held on March 11, 2003, in Columbus, Ohio, and on March 13, 2003, in Salt Lake City, Utah.
The workshop attendees included data providers and users as well as those who influence data collection activities in one way or another. In attendance were private sector travel information providers and representatives from 10 state DOTs (Ohio, Delaware, Indiana, Kentucky, Pennsylvania, Utah, Idaho, Texas, Washington, and California). Also in attendance were representatives from the Advanced Regional Traffic Interactive Management and Information System (ARTIMIS) in Cincinnati, Ohio; the Maricopa Association of Governments (MAG) in Arizona; the Northeast Ohio Areawide Coordinating Agency (NOACA); the Ohio-Kentucky-Indiana (OKI) Regional Council of Governments; and the Akron Metropolitan Area Transportation Study (AMATS).
The action plan builds upon the findings in the white papers and inputs obtained from the regional workshops. The action plan provides a blueprint for specific actions to address traffic data quality issues. Implementation of the plan will require collaboration among both public and private partners with the FHWA and state DOTs playing leading roles. The plan identifies the following 10 priority action items based on those identified at the regional workshops.
1. Develop guidelines and standards for calculating traffic data quality measures. The guidelines and standards are expected to contain methods to calculate and report the data quality measures for various applications and levels of aggregation.
Coordinators: FHWA or AASHTO
2. Synthesize validation procedures and rules used by various states and other agencies for traffic monitoring devices. The synthesis document should include quality control procedures for all types of applications and data management methods for maintaining high quality data.
Coordinators: FHWA, states
3. Develop a synthesis of best practices for installation and maintenance of traffic monitoring devices. This document should include guidance for establishing quality; standard test methods for determining accuracy and other data quality measures; “triggers” for conducting maintenance; and guidance for selecting strategic traffic monitoring device locations.
Coordinators: FHWA, states
4. Establish a clearinghouse for vehicle detector information. Establish an independent testing entity to conduct periodic tests and verify claims of the new and emerging traffic detection devices on the market. Store results of tests in a clearinghouse that can be accessed by all potential users.
Coordinators: FHWA, Vehicle Detector Clearinghouse (VDC), states
5. Conduct sensitivity analyses and document the results to illustrate the implications of data quality on user applications. Based on the results of the sensitivity analysis, develop data quality “targets” or “benchmarks” for each application. The results of the sensitivity analysis would be used to provide guidance or procedures for imputing missing data points.
Coordinators: FHWA, states
6. Develop guidelines for sharing resources for traffic monitoring activities. The guidelines should contain information on shared equipment, personnel, funding, and cooperation among different agencies and departments. The guidelines should also include public-private collaboration approaches and practices that establish trust in private sources of data.
Coordinators: FHWA, states
7. Develop a methodology for calculating life-cycle costs. The methodology would enable states and other agencies to investigate alternative data collection technologies; develop quality levels as a function of investment in installation and maintenance; and coordinate or leverage operations and other activities in more than one location or jurisdiction.
Coordinators: FHWA, states
8. Develop guidelines for innovative contracting approaches for traffic data collection. The guidelines should include information on performance-based contracting and management, task-order-type contracts and cooperative agreements for equipment installation and maintenance, and life-cycle-cost based bidding.
Coordinators: FHWA, states
9. Conduct a case study or a pilot test. The goal is to observe state DOTs and TMCs working to improve data quality and to evaluate the return on investment from the improved data quality.
Coordinators: FHWA, states
10. Provide guidance on technologies and applications. This action item is in two parts: (i) provide guidance on the data elements to measure and report since this dictates the type of device procured by the agency, and (ii) provide guidance on the innovative and emerging uses of loops and existing technologies.
Coordinators: FHWA, states
Action Plan Implementation and Work Items
FHWA would play a leading role in the overall implementation of the action plan. Following are the three potential groups of activities or work items to implement the action plan.
The majority of the action items relate to the development of guidelines, which are best implemented through research studies. Action items in this category include the following:
Some of the action items could be implemented through regional workshops. Action items in this category are those that require sharing of experiences and success stories. The following are action items in this category:
Case Studies and Clearinghouse
Action items in this category require establishing or identifying an independent entity and conducting case studies. The following are the action items in this category:
Recent research and analysis have identified several issues regarding the quality of traffic data available from Intelligent Transportation Systems (ITS) for transportation operations, planning, or other functions. For example, the Advanced Traveler Information Systems (ATIS) Data Gaps Workshop in 2000 identified information accuracy, reliability, and timeliness as critical to ATIS. The key findings of the workshop, which are included in a document titled “Closing the Data Gap: Guidelines for Quality Advanced Traveler Information System (ATIS) Data” (U.S. DOT, 2000), are the following:
· Guidelines for quality data go beyond ATIS.
A recent report, “Sharing Data for Traveler Information: Practices and Policies of Public Agencies” (Battelle, 2001), issued in January 2002, examines policies aimed at facilitating data sharing and ultimately improving the quality and quantity of information that reaches travelers.
The ITS Archived Data User Service (ADUS) promotes reuse of traffic data collected for real-time operations. The ATIS and Advanced Traffic Management Systems (ATMS) are generating large amounts of traffic data that could be used in other applications, such as performance monitoring. However, initial experience with ITS traffic data has identified serious data gaps and data quality deficiencies. Data can be edited after the fact to remove errors but the problem still remains at the source. The need for guidelines for sharing traffic data among various agencies and users has been recognized.
Section 515 of the Treasury and General Government Appropriations Act for Fiscal Year 2001 (Public Law 106-554; H.R. 5658) directs the Office of Management and Budget to issue government-wide guidelines that provide policy and procedural guidance to Federal agencies for ensuring and maximizing the quality, objectivity, utility, and integrity of information (including statistical information) disseminated by Federal agencies. Since Federal agencies use and disseminate traffic data from State and local agencies, the quality of the data will become even more critical.
It is also recognized that the quality of the traffic data and the information produced from the data are critical factors that affect the abilities of transportation agencies to ensure the security of transportation and the management of the nation’s transportation resources. Data reliability requires that the INFOstructure consistently produce output that the public sector and the private sector can accept without skepticism or distrust. Effective data quality methods and tools are critical for ensuring the success of INFOstructure applications.
The focus of data quality is on establishing a consistent methodology for ensuring that data are managed so that a measure of reliability is sustained. Several factors affect data quality, including addressing “data gaps” to rectify coverage deficiencies as well as data compatibility across different software/hardware platforms; ensuring that data elements are efficiently matched with coordinated location and time elements; and resolving conflicts among data formats so that data are manipulated to satisfy information and presentation needs.
1.2 Project Objectives and Scope
The primary objective of this project is to define an action plan with work items that can be executed through the U.S. Department of Transportation (DOT), stakeholder organizations (e.g., American Association of State Highway and Transportation Officials [AASHTO], ITS America), State agencies, and private industry. It is anticipated that this effort will establish a multi-year program that will reinforce and sustain the value of INFOstructure applications. Specifically, this project will:
(1) Develop white papers that explore the issues and current practices for ensuring quality, focusing on transportation but also considering how data quality is addressed in other industries
(2) Develop a draft action plan and timeline for U.S. DOT and others to pursue that will develop metrics, tools, and recommended practices to ensure that data quality is effectively attained
(3) Assemble a workshop that includes the co-sponsorship of relevant stakeholder organizations to address the issues and to validate and revise the action plan and timeline
(4) Prepare proceedings and a compendium of the workshop along with an analysis of the validated action plan.
The remainder of this report is divided into several chapters:
Chapter 2 presents an overview of the research approach. It also describes the major issues associated with traffic data quality.
Chapter 3 presents the proceedings of the two regional workshops. This chapter includes summaries of the white papers, workshop discussions, and action items identified at the workshops.
Chapter 4 presents the action plan for addressing the traffic data quality issues. The action plan describes the action items and identifies the responsible agencies for implementing the action items.
Chapter 5 presents the concluding remarks and recommendations.
The detailed white papers and list of workshop participants are included as appendices to the report. Other relevant literature on traffic data quality is also included in the appendices.
The research approach adopted for the project comprises a number of steps as summarized in Figure 1. These steps are discussed below.

Figure 1. Traffic Data Quality Research Approach
2.1 Traffic Data Quality Issues
As a first step, a kick-off meeting was held at the start of the project. Its primary objectives were to (i) review the traffic data quality issues, (ii) discuss the themes for the white papers, and (iii) review the strategy for conducting the research. Several issues associated with traffic data were identified that are common to various applications. These issues must be addressed to ensure better quality traffic data for ATIS, ATMS, and ITS data archiving and re-use. These issues can be grouped in different categories, as shown below:
Definition and Measurement Issues
· Defining data quality attributes, including accuracy, consistency, reliability
· Identifying differences in quality perceived by public and private sector data collectors and users
· Quality of data as a function of its intended use
· Measuring and ensuring quality data
· Quantitative and qualitative metrics/levels
· Identifying minimum acceptable levels of data quality for different applications
· Quality control (fixing the problem at the source)
· Lack of understanding of the full scope of the issue
· Lack of a consistent approach for ensuring consistent quality
Equipment Installation and Maintenance Issues
· Subcontractors install loops carelessly
· Power and communications disruptions
· Mix of technology introduces inherent data discrepancies
· Innovative approaches to data collection
· Loop detectors versus non-intrusive data collection devices
· Those who maintain detectors may be different from those who install them
· Effects of contracting approach on data quality
· Relationship between data collection device and quality
· Loops get torn out by third parties
Coverage Issues
· Share traffic data or collect it yourself
· Better quality with less coverage or lower quality with more coverage
· Better definition of depth of coverage
· Coverage of detectors seems to focus on traffic monitoring, but what about forecasting
Resource Issues
· Budget limitations for traffic data collection
· Lack of field staff for proper maintenance of monitoring devices
· Lack of expertise in data management issues
· The implications of funding levels on quality of data collected
Institutional Issues
· Institutional issues relating to data collection and sharing
· Regional or state versus national level interests and perspectives of data quality
These issues were used to scope three white paper themes. Each white paper addresses a set of issues and includes a summary of previous literature, innovative practices, and barriers in transportation operations that prevent data quality metrics, tools, and methodologies from being established. In order to obtain more current information regarding practices, tools, and methodologies, a few states and other users of traffic data were interviewed.
It was also decided at the kick-off meeting that two or more regional workshops be conducted rather than the originally planned single national workshop. The regional workshops were expected to provide the opportunity to share experiences and gather inputs from a wider range of traffic data users.
2.2 Data Collection – Interviews
In developing the white papers, officials from state DOTs and ITS groups were contacted and interviewed. Representatives from seven states were interviewed: Arizona, Minnesota, Ohio, Kentucky, Pennsylvania, Utah, and Virginia. A structured interview guide was developed and used in conducting the interviews. The contact list and interview guide are included as Appendix B of this report. Information gathered from the interviews was incorporated into the white papers.
2.3 Development of White Papers
As noted above, the white papers were developed from a literature review and information gathered through the interviews. The draft white papers were revised based on review comments from the FHWA. Full versions of the revised white papers are provided in Appendix A to this report. Chapter 3 of this report presents summaries of each white paper and discussions on the findings of the regional workshops. The following are the three white papers that were developed by the project team.
White Paper #1: Defining and Measuring Traffic Data Quality (EDL # 13767)
Scope: This white paper defines measures and methods for quantifying traffic data quality. Issues considered include:
White Paper #2: State of the Practice for Traffic Data Quality (EDL # 13768)
Scope: This white paper documents the issues, measures, and approaches for assessing, using, and accommodating traffic data quality in various applications. Issues considered include:

Two regional workshops were conducted with the primary objective of obtaining inputs from participants in developing an action plan to address traffic data quality issues. The goal was to define an action plan with work items that can be executed by the U.S. Department of Transportation (DOT), stakeholder organizations (e.g., American Association of State Highway and Transportation Officials [AASHTO], ITS America), state agencies, and private industry.
The regional workshops were sponsored by the FHWA Office of Policy, the ITS Joint Program Office (JPO), the Ohio Department of Transportation (ODOT), and the Utah Department of Transportation (UDOT). The workshops were held on March 11, 2003, in Columbus, Ohio, and on March 13, 2003, in Salt Lake City, Utah. The revised white papers were distributed to the attendees about two weeks in advance of the workshops, giving them the opportunity to read and become familiar with the concepts and material to be discussed. The white papers served as inputs to stimulate discussions at the regional workshops.
The workshops were intended for state DOT professionals responsible for collecting and using traffic detector data for any application, including representatives from traffic management centers (TMCs), traffic operations, traffic monitoring, and planning divisions. The workshop attendees included data providers and users as well as those who influence data collection activities. This group includes officials, administrators, or managers involved in budgeting and funding as well as contractors who provide and install data collection devices. In attendance were private sector travel information providers and representatives from 10 state DOTs (Ohio, Delaware, Indiana, Kentucky, Pennsylvania, Utah, Idaho, Texas, Washington, and California). Also in attendance were representatives from the Advanced Regional Traffic Interactive Management and Information System (ARTIMIS) in Cincinnati, Ohio; the Maricopa Association of Governments (MAG) in Arizona; the Northeast Ohio Areawide Coordinating Agency (NOACA); the Ohio-Kentucky-Indiana (OKI) Regional Council of Governments; and the Akron Metropolitan Area Transportation Study (AMATS). The list of workshop attendees is provided in Appendix C of this report.
The draft proceedings of the two regional workshops were prepared and circulated among the workshop attendees for review and comments. The workshop proceedings included summaries of the white papers, the discussions, and action items. The combined proceedings from the two workshops are presented in Chapter 3 of this report.
Several action items were identified and prioritized at the two regional workshops. The action plan described in Chapter 4 of this report builds upon the findings in the white papers and inputs obtained from the regional workshops and reflects a broadly based consensus of the workshop participants.
2.6 Additional Traffic Data Quality Literature
Additional relevant literature on traffic data quality issues is compiled and presented in Appendix D of this report. Specifically, the literature pertains to data sharing, institutional issues, vehicle classification, and loop detector failures. These documents are intended to provide more detail on some of the major issues discussed at the regional workshops and in the white papers.
This chapter presents the combined proceedings of the two regional traffic data quality workshops. Dr. Edward Fekpe, the principal investigator of the project, opened each workshop by welcoming all participants and providing a concise overview of the traffic data quality project. He also provided a description of the approach used in developing an action plan to address the various issues relating to traffic data quality.
At the regional workshop in Columbus, Ohio (March 11, 2003), Dr. Fekpe reviewed the agenda for the workshop and then introduced the Contracting Officer’s Technical Representative (COTR) for the project, Mr. Ralph Gillmann, to discuss the objectives of the workshop. Mr. Gillmann outlined the objectives of the project and the expectations for the one-day workshop. He gave a background of recent efforts including workshops and studies that addressed issues of ITS-generated data. The most recent activities that were highlighted include:
Mr. Gillmann also distinguished between real-time and archived data with respect to their uses and the quality requirements for each type. Finally, Mr. Gillmann outlined the objectives of the workshop, which included agreeing upon the institutional and technical traffic data quality issues. The primary goal of the workshop was to define an action plan that includes successful practices, new solutions, and priorities. Mr. Gillmann also emphasized that data from traffic detectors were the main focus, although other traffic data would not be excluded.
At the regional workshop in Salt Lake City, Utah (March 13, 2003), Mr. James Pol presented the objectives of the meeting and the expectations for the one-day workshop. Mr. Pol gave a background of recent efforts including workshops and studies to address issues of ITS-generated data. He outlined the objectives of the workshop, which included agreeing on technical and institutional traffic data quality issues. He also mentioned the added importance of traffic data quality with new INFOstructure and integration strategies being proposed for ITS. As at the Ohio workshop, the primary goal of the Utah workshop was to define an action plan that includes successful practices, new solutions, and priorities.
The three white papers were presented at each workshop, followed by a detailed discussion of the issues raised. The remainder of each workshop was devoted to discussions to obtain inputs and ideas for the development of the action plan. Various traffic data quality action items were identified and discussed. The following sub-sections present summaries of the white papers, detailed discussions, and action items.
3.2 Session 1 – Defining and Measuring Traffic Data Quality
The white paper titled “Defining and Measuring Traffic Data Quality” was written by Mr. Shawn Turner (TTI) for this project. The complete version of the white paper is provided in Appendix A. In developing this white paper, current and advanced practices for addressing data quality were reviewed for three types of user communities: 1) real-time traffic data collection and dissemination; 2) historical traffic data collection and monitoring; and 3) other industries such as data warehousing, management information systems, and geospatial data sharing. The recommendations in this paper follow from this review.
The literature contains two similar definitions for data quality. Strong, Lee, and Wang (1997) define information quality as “fit for use by an information consumer” and indicate that this is a widely adopted criterion for data quality. English (1999A) further clarifies this widely adopted definition by suggesting that information quality is “fitness for all purposes in the enterprise processes that require it.” English emphasizes that it is the “phenomenon of fitness for ‘my’ purpose that is the curse of every enterprise-wide data warehouse project and every data conversion project.” English (1999B) defines information quality as “consistently meeting knowledge worker and end-customer expectations.” It is clear from these definitions that data quality is a relative concept that could have different meanings to different consumers. For example, data considered to have acceptable quality by one consumer may be of unacceptable quality to another consumer with more stringent use requirements. Thus it is important to consider and understand all intended uses of data before attempting to measure or prescribe data quality levels.
The recommended definition for traffic data quality is as follows:
“Data quality is the fitness of data for all purposes that require it. Measuring data quality requires an understanding of all intended purposes for that data.”
Based upon the review, the following data quality measures are recommended:
There are several other data quality measures that could be appropriate for specific traffic data applications. The six measures presented above, however, are fundamental measures that should be universally considered for measuring data quality in traffic data applications.
At this time, it is
recommended that goals or target values for these traffic data quality measures
be established at the jurisdictional or program level based on a better and
clearer understanding of all intended uses of traffic data. It is evident that data consumers’ needs and
expectations, as well as available resources, vary significantly by implementation
program, urban area, and state and preclude the recommendation of a universal
goal or standard for these traffic data quality measures.
It is also
recommended that if data quality is measured, a data quality report be included
in metadata that is made available with the actual dataset. The practice of requiring a data quality
report using standardized reporting is common in the GIS and other data
communities. In fact, several metadata
standards already exist (FGDC-STD-001-1998 and ISO DIS 19115) for standardized
reporting of data quality in datasets.
Until a formal traffic data archive metadata standard is approved, the
traffic data community should create metadata based upon the core elements
(i.e., mandatory metadata items) required in these two other geospatial
metadata standards.
The following points
were suggested as discussion items at the end of the presentation:
3.2.3.1 Discussions – Ohio Workshop
Shawn Turner (Texas Transportation Institute) initiated the discussions by asking the workshop participants about their reactions to the data quality measures. While there was overall agreement that the data quality measures are adequate, there was discussion about some of the measures.
The completeness measure was acknowledged as a good measure. There was some concern that reporting this measure could be embarrassing for state agencies. None of the state agencies currently report it. Rob Bostrom from the Division of Planning, Kentucky Transportation Cabinet, stated that their Automatic Traffic Recorder (ATR) data do not cover all 365 days of the year. He noted that data completeness is important for applications like k-factor calculations (30th highest hour) that are used in highway design and capacity analysis. He added that, with the existing errors in data collection, the use of the 50th highest hour might not be very different from the 30th highest hour and that this might be a future research need. Also, some applications, such as calculating Equivalent Single Axle Loads (ESALs) from WIM data, require that all days be represented.
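The link between completeness and the k-factor can be illustrated with a minimal sketch. It assumes the k-factor is taken as the ratio of the 30th highest hourly volume to AADT, and that missing hours are stored as `None`; the function names and the synthetic data are hypothetical and are not taken from the white paper.

```python
from typing import Optional, Sequence


def completeness_pct(hourly: Sequence[Optional[float]]) -> float:
    """Percent of expected hourly records that are present (not None)."""
    if not hourly:
        raise ValueError("no expected records")
    present = sum(1 for v in hourly if v is not None)
    return 100.0 * present / len(hourly)


def k_factor(hourly: Sequence[Optional[float]], aadt: float, rank: int = 30) -> float:
    """Ratio of the rank-th highest available hourly volume to AADT."""
    if aadt <= 0:
        raise ValueError("AADT must be positive")
    # Missing hours are dropped, so the ranking only reflects the hours on hand.
    volumes = sorted((v for v in hourly if v is not None), reverse=True)
    if len(volumes) < rank:
        raise ValueError("fewer valid hours than the requested rank")
    return volumes[rank - 1] / aadt


# Synthetic example: 200 expected hours, half of them missing.
hourly = [float(v) for v in range(1, 101)] + [None] * 100
print(completeness_pct(hourly))               # 50.0
print(k_factor(hourly, aadt=1000.0, rank=30))
print(k_factor(hourly, aadt=1000.0, rank=50))
```

Because missing hours are simply dropped before ranking, a station with low completeness may never have recorded its true peak hours. This is one reason the two quantities are best reported together.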
It was also suggested that the data quality measures in the white paper need to be customized by application and region. Greg Oliver from Delaware DOT mentioned that summer periods are critical for traffic data collection in the state because of the increased flow of traffic during these months. It is important that the data quality measure reflect this temporal component.
David Gardner, ODOT, questioned the usefulness of the data quality measures especially to the final user. Most users of ODOT data expect a certain quality level to be met and do not necessarily need all the details regarding quality. A suggestion was to have tiers of users and applications with different data quality documentation needs.
Andrew Pierson, URS, mentioned that it is often difficult to go back and verify data collection efforts especially since a consultant is unable to obtain the ground truth. Data from the states typically lack metadata or the discussion of the context in which the data are produced.
Steve Jessberger from ODOT raised a question about the validity measure of data. Specifically, what should be done with data collected during snow or construction? Should agencies use the “real” but atypical data or try to collect only typical data? Ralph Gillmann, FHWA, replied that FHWA would like to know why the data are abnormal and that while atypical conditions are not good for some applications like average annual daily traffic (AADT), metadata (data about data) for such cases would be helpful. Metadata are not required by FHWA at this time. None of the workshop participants indicated that the state agencies were collecting and reporting metadata.
On the question of metadata and its value, it was noted that agencies are unable to communicate effectively about data quality because there is usually no historical information or metadata that can be used for comparison; that is, there is no quality information associated with existing data. Some participants noted that their existing traffic analysis software or databases did not support the storage of metadata associated with traffic data.
On the issue of minimum acceptable data quality standards, the workshop participants suggested that the minimum acceptable standards vary by state, type of application, and data collection device. Some minimum requirements are already in use by states for ATR data. Ohio, Kentucky, and Indiana, for example, require two weeks of data per month from the ATRs. Indiana also requires at least two days of data for each day of the week, per month. There was no consensus as to whether it is necessary or feasible to set minimum acceptable data quality standards.
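A rule such as the Indiana requirement can be checked mechanically. The sketch below is one possible reading of that rule: every day of the week must have at least two valid days of data within the month. The function name and parameters are hypothetical, and what counts as a "valid" day is left to the agency.

```python
from collections import Counter
from datetime import date
from typing import Iterable


def meets_weekday_rule(valid_days: Iterable[date], year: int, month: int,
                       min_per_weekday: int = 2) -> bool:
    """True if each day of the week has at least min_per_weekday valid days in the month."""
    counts = Counter(
        d.weekday()
        for d in set(valid_days)          # ignore duplicate dates
        if d.year == year and d.month == month
    )
    # Counter returns 0 for weekdays that never appear.
    return all(counts[wd] >= min_per_weekday for wd in range(7))


# Fourteen consecutive days give exactly two of each weekday.
two_weeks = [date(2003, 6, day) for day in range(1, 15)]
print(meets_weekday_rule(two_weeks, 2003, 6))        # True
print(meets_weekday_rule(two_weeks[:-1], 2003, 6))   # False
```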
It was noted that the purposes of the traditional traffic monitoring groups and the ITS groups are different and that this affects their data collection and management philosophy. Scott Evans from ARTIMIS stated that the cameras and the changeable message signs were their priority for their Traffic Management Center (TMC), and they were interested only in the change in traffic volumes.
Several participants expressed concerns about ITS data, including the following:
The planning division in Pennsylvania DOT has been trying to use TMC data and has encountered some challenges in educating the TMC of their data requirements. It was also suggested that additional research be conducted to understand the value of ITS data.
Several traffic monitoring personnel stated that there was significant overhead involved in using ITS data including the pre-processing of data. Ohio and Kentucky have a good relationship with ARTIMIS (the TMC in Cincinnati), and data sharing does exist between the TMC and the traffic monitoring groups. The TMC is able to provide data to the traffic monitoring group at ODOT in a compatible Traffic Monitoring Guide (TMG) format. While ITS groups require dense coverage, the traffic monitoring groups require coverage for a much larger area. Dave Gardner, ODOT, cautioned that the availability of ITS data can sometimes overwhelm the resources of the traffic monitoring group in terms of the post-processing requirements.
All the participants agreed that guidelines are needed to explain the calculation of the suggested data quality measures. The following observations were made regarding the need and usefulness of guidelines:
It was suggested that these guidelines should be similar to what is being done by ASTM (formerly American Society for Testing and Materials) for archived data. It was also noted that standards about data quality might be useful and could be included in the AASHTO guidelines for data monitoring programs.
National benchmarks for data quality were also strongly encouraged. It was noted that the concept of INFOstructure should be used in integrating all transportation-related data. There should be greater emphasis on sharing and integrating data systems at state, local, and regional levels. At minimum, these benchmarks should be set for loop-based detection systems. These benchmarks also should be set based on the type of application.
3.2.3.2 Discussions – Utah Workshop
There was general agreement that the six fundamental measures of traffic data quality adequately describe all aspects of data quality. Dr. Mark Hallenbeck of the University of Washington added that the measures presented are the right set of quality measures.
The workshop participants noted that the completeness measure was difficult to define as it may differ based on the application. The assumptions and definitions for this measure also need to be explicit. For example, 100 percent complete data for freeways is only a partial representation if the arterial system is also considered. It was felt that the data quality measures need to be specified differently for different applications and the uses of data should decide the nature and necessity of quality measures. It was suggested that data quality measures need to be fluid and flexible. One of the participants requested additional clarification on the differences between completeness and coverage. Shawn Turner explained that “completeness” refers to the temporal aspect and “coverage” refers to the spatial aspect of traffic monitoring. As far as data quality is concerned, it was noted that there is a lack of guidance for deploying sensors, and they are deployed ad hoc based on operational needs.
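The freeway and arterial example can be made concrete with a small sketch of the spatial "coverage" idea, computed as the share of network segments that are monitored. The segment identifiers and the function name are hypothetical, and the white paper may define coverage differently.

```python
from typing import Set


def coverage_pct(monitored: Set[str], network: Set[str]) -> float:
    """Percent of the segments in the reference network that are monitored."""
    if not network:
        raise ValueError("network must contain at least one segment")
    return 100.0 * len(monitored & network) / len(network)


freeways = {"F1", "F2"}
arterials = {"A1", "A2", "A3"}
monitored = {"F1", "F2"}

print(coverage_pct(monitored, freeways))               # 100.0
print(coverage_pct(monitored, freeways | arterials))   # 40.0
```

The same set of detectors gives a very different figure depending on the reference network, which is why the assumptions behind each measure need to be stated explicitly.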

Qing Xia of Maricopa
Association of Governments in Arizona raised a question about the weighting or
ranking of the data quality measures.
Shawn Turner noted that there are no rankings or weights associated with
these measures, although that is an idea for future research.
Peter Martin from the University of Utah suggested adding two sub-measures for the accessibility measure of data quality. The first sub-measure suggested was “portability” to indicate the number of different formats in which the data were available to the user. The second sub-measure would provide information on the level of manipulation and the type of manipulation used on the data. Researchers from the University would like information on whether the data are raw or processed and how to access and reformat the data. Mark Hallenbeck indicated that the TMC in Seattle has status flags for its detector data that indicate problems and applied solutions at different levels of data aggregation.
Martin Knopp, Utah DOT, agreed with the data quality
measures and noted that the accessibility measure could place unusual demands
on the states to provide data in formats to satisfy all users. It was suggested that this measure be stated
as a philosophy instead. If all users can be defined then their
accessibility also can be defined. The
problem is that some uses for data may not be immediately known—future
potential uses of data may have different requirements.
Meeting the quality
goals of non-paying users is difficult for two reasons: (i) the provider may have different perspectives
on data quality and (ii) the requirements of the non-paying user may not be
clearly defined in the budget. It was
felt that if all parties (potential users or beneficiaries) pool resources to
secure sufficient funds, it may be possible to meet the data quality
requirements of all users.
In response to a question about the institutional and technical barriers involved in calculating and reporting these measures, it was noted that cost and time are the two most important issues. There could be a significant cost to modify software to report the quality measures. Some participants would like information on the return on investment obtained by reporting these quality measures. Raelene Viste (Idaho Transportation Department) commented that these measures could be very useful within the transportation group itself to monitor their performance even if the external users do not need these measures. Texas DOT feels that there is a good return on investment if these measures are followed.
Institutional issues
arise because different departments have different data needs, operating rules,
and budgets. There is no existing
mechanism for effective communication and exchange of views relating to traffic
data and its quality.
It was suggested that guidelines and baseline instructions could be helpful in allowing the agencies to calculate and report data quality measures. It was also suggested that these guidelines be provisional, which will give the impetus for the agencies to start collecting quality data, allow them to start reporting data in a certain way, and provide them time to overcome the institutional barriers. Creating a traffic monitoring master plan was suggested to describe how different components work and how they coordinate within agencies. Caltrans indicated that they have already started work in this area. These guidelines should take into consideration that most agencies have legacy systems, which often can be problematic. Another idea to formalize the data quality process was to include data quality requirements in the regional ITS architectures along with data flows. The visibility and the relevance of data collection programs can benefit greatly from data quality reporting.
For a particular
goal or program, there is the need for a minimum set of measures to assess the
quality of the data. However, while
there was no consensus on the minimum set of standards among the participants
for all the applications of traffic data, it was suggested that state DOTs need
to start with provisional standards that include performance statistics that
have visibility within the department.
There was no general
agreement on the need to establish national data quality benchmarks. Some participants felt that there is no need
for a national benchmark; others thought that perhaps “national benchmark” is
too strong, suggesting the use of “national goal” instead. National goals could be set for different
uses of data. It was agreed that
normalizing or leveling the playing field may be difficult given the diverse
application types and needs. However,
it was also noted that such goals could lead to uniformity in data quality
reporting. Caltrans indicated that it
operates according to a performance level but sees some value in having a
national goal. Such national goals also
would be helpful for vendors. Another
view indicated that each state could define its own use and its own goal and
standard instead of adhering to an established national goal, which may be more
difficult to set and achieve. In this
way goals would be defined and met at the state level. States that do a good job in maintaining
data quality should be recognized and rewarded.
3.3 Session 2 – State of the Practice in Traffic Data Quality
The white paper
titled “State of the Practice in Traffic Data Quality” was written by Dr. Rich Margiotta
(Cambridge Systematics) for
this project. The complete version of
the white paper is provided in Appendix A.
3.3.1 Types and Applications for Traffic Data
Several types of
traffic data are collected by both “traditional” and ITS means. Where there is overlap between the two
realms, the basic nature and definitions of the data collected are the
same. However, there are subtle
differences in data collection methodologies that may lead to problems with
data sharing and quality. Among these
are the polling rate and vehicle classification “bins”.
3.3.2 Traffic Data Quality: Characteristics
What Causes “Bad”
Traffic Data: Several
sources contribute to inaccuracies in traffic data. These relate to the nuances of specific equipment and how data
are collected and transmitted from the field:
Detection of
“Bad” Data: The white paper, “Defining and Measuring Traffic Data Quality”, presents a full discussion of how
questionable/inaccurate data are identified after they are collected from the
field. A variety of methods are used,
including internal range checks, cross-checks, time series patterns, comparison
to theory, and historical patterns.
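Two of the methods named above, internal range checks and cross-checks, can be sketched as follows. The record layout, the thresholds, and the check names are hypothetical and chosen only for illustration; they are not taken from the white paper or from any agency's validation rules.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds for a single lane and a 5-minute interval.
MAX_VOLUME_PER_LANE_5MIN = 250
MAX_SPEED_MPH = 100.0
MAX_OCCUPANCY_PCT = 100.0


@dataclass
class DetectorRecord:
    volume: int            # vehicles counted in the interval
    speed_mph: float       # average speed over the interval
    occupancy_pct: float   # percent of time the detector was occupied


def failed_checks(rec: DetectorRecord) -> List[str]:
    """Return the names of the checks a record fails (empty list means it passes)."""
    failures = []
    # Internal range checks: each value must fall inside a plausible range.
    if not 0 <= rec.volume <= MAX_VOLUME_PER_LANE_5MIN:
        failures.append("volume_range")
    if not 0.0 <= rec.speed_mph <= MAX_SPEED_MPH:
        failures.append("speed_range")
    if not 0.0 <= rec.occupancy_pct <= MAX_OCCUPANCY_PCT:
        failures.append("occupancy_range")
    # Cross-check: if vehicles were counted, the average speed should not be zero.
    if rec.volume > 0 and rec.speed_mph == 0.0:
        failures.append("volume_without_speed")
    return failures


print(failed_checks(DetectorRecord(40, 62.0, 8.5)))    # []
print(failed_checks(DetectorRecord(40, 0.0, 8.5)))     # ['volume_without_speed']
print(failed_checks(DetectorRecord(400, 120.0, 8.5)))  # ['volume_range', 'speed_range']
```

Returning the names of the failed checks, rather than a single pass/fail value, lets the result be stored as a flag alongside the record.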
Correction of “Bad” Data: Once suspect data are identified, the
question then is what to do about them.
Most applications flag the records failing quality control or set the
measurement values to missing or other special codes. Editing the measurement values is far less common, although some
experimentation with “imputing” values has taken place. Imputation appears to be most applicable
where small intermittent gaps appear in the data rather than large portions of
time with missing or suspect data. A
variety of techniques have been explored including time series smoothing and
historical growth rates by location and day and week. However, there is little consensus in the profession on which techniques
should be used, or whether imputation should be done at all.
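The distinction between small intermittent gaps and long outages can be illustrated with a minimal sketch. It uses plain linear interpolation, which is a stand-in chosen for brevity and not one of the techniques named above. The function name and the `max_gap` parameter are hypothetical.

```python
from typing import List, Optional


def fill_short_gaps(series: List[Optional[float]], max_gap: int = 2) -> List[Optional[float]]:
    """Linearly interpolate interior gaps of at most max_gap points; leave others as None."""
    filled = list(series)
    n = len(series)
    i = 0
    while i < n:
        if series[i] is not None:
            i += 1
            continue
        start = i
        while i < n and series[i] is None:
            i += 1
        end = i                      # first index after the gap
        gap = end - start
        # Fill only short gaps that have a valid value on both sides.
        if gap <= max_gap and start > 0 and end < n:
            left, right = series[start - 1], series[end]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                filled[start + k] = left + step * (k + 1)
    return filled


volumes = [10.0, None, 30.0, None, None, None, 70.0]
print(fill_short_gaps(volumes, max_gap=2))
# [10.0, 20.0, 30.0, None, None, None, 70.0]
```

Long gaps stay missing, which matches the more common practice of setting suspect values to missing. In practice, imputed values would also need a flag so that users can tell them from measured values.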
3.3.3 Quality Issues for Using ITS-Generated Data for Traditional Uses
The applications
that traffic data support in operational and traditional uses of ITS-generated
traffic data – as well as the nuances of data collection in both cases – can
have an impact on data quality. Several
differences exist based on these points:
3.3.4 Recommendations: Possible Solutions
Sampling of ITS
Locations and Data Streams: The selection of certain strategic locations
where both ITS and traffic monitoring groups can concentrate their efforts to
correctly install, inspect and maintain these locations.
Shared Resources: The
sharing of expertise and resources among the various agencies within the state
DOTs to ensure that they benefit from their strengths and help overcome
weaknesses.
Maintenance,
Calibration, and Performance Standards: Undertaking formal studies of
data quality by setting maintenance and calibration standards and goals for
traffic monitoring devices.
Contractual
Arrangements: New
and emerging business models such as outsourcing and use of private contractors
for collecting and archiving data.
More Sophisticated Operations Applications as a Data Quality Leader: The current generation of operational
strategies does not require extremely accurate data – operators typically need
to know where the big problems are and their responses are geared to this. New and emerging operations applications may
drive the need for high quality data.
New Technologies: The
use of new technologies including non-intrusive devices and probe vehicles
combined with innovative uses of existing inductive loop technologies.
The possible
solutions and recommendations (section 3.3.4) served as the main points for the
session’s discussions.
3.3.5.1 Discussions – Ohio Workshop
Rich Margiotta initiated the discussion by asking the participants what they thought of the potential solutions listed in the white paper. The participants agreed that sharing resources between the ITS and traffic monitoring groups is a good idea. The Division of Planning in Kentucky described an example of shared resources. The Division of Planning invested in equipment they like and trust and ARTIMIS identified modifications to those devices so that they also can be used for ITS applications by the TMC. James Pol, ITS/JPO, mentioned that there will be a greater need for sharing data in the future due to scarce resources.
On the question of whether there have been any observed cost savings due
to data sharing, David Gardner, ODOT, responded that the data sharing with
ARTIMIS was very recent and no cost information was available. Indiana DOT commented that there should be
some expected savings from a safety standpoint as they no longer have to place
road tubes on the roadway. It was
suggested that TMCs start using ITS data only from select locations. It was noted that the TMC in Cleveland is
beginning to consider the use of ATR data for their operations.
One of the major themes of the discussion was the problems encountered during installation of traffic monitoring devices. Installation of equipment is the most critical aspect to ensure that high quality data are obtained from the device. It was noted that the use of pre-qualification of contractors for installing loops and piezo-based detectors was not the usual practice. Ohio does not have any pre-qualification standards for installation and contractors install devices based on manufacturer’s instructions. Indiana DOT calibrates their devices annually but does not have any standards for installation. Pennsylvania DOT uses manual counts as the standard to assess the accuracy of ATR counts. It is recognized, however, that manual counts also can be in error depending on the volume of traffic and thus may not be the most effective measure of ATR count accuracy.
David Gardner, ODOT, mentioned that Ohio DOT is working on a contract to
maintain ATRs. The contract would be a
task order in which the successful contractor would be given maintenance tasks
as needed. ODOT hopes that such a
contract would save time in fixing maintenance problems by having a contractor
in place.
The overall consensus was that there is some existing information about installation and maintenance of equipment but more guidelines and standards are needed.
Quicker notification of sensor problems was discussed. Today, in some cases, a problem might not be known for a period of four to six weeks (during data processing). While in some instances it is possible to poll the devices daily (Kentucky polls its 77 sites daily), states with more sites usually poll less frequently.
On the question of whether the quality assurance software used by the
traffic monitoring groups can be shared with the ITS groups, various states
expressed an interest in the data validation rules used to check traffic
data. It was noted that state agencies
had developed in-house software to validate traffic data using specific
validation checks. A synthesis of the
data validation checks was suggested as a very important and desired research
need.
It was also noted that some equipment does not have a sufficient level of
accuracy and it was recognized that vendors need to test the equipment better
and make it more robust. State DOTs
also do not have information on the lifecycle cost of the equipment. The participants also noted that the value
of data to the customers was not clear.
In other words, what benefit would an increase in data quality provide
to the customers?
3.3.5.2 Discussions – Utah Workshop
The participants felt that designating strategic ITS detector locations, at which the
traffic monitoring groups and the ITS groups share resources and devices, was a
good idea. Washington DOT already has
started using a similar concept in which certain detectors are more important
than others. However, it was felt that these priority locations are politically
driven and land-use factors can change the priority very quickly. It is essential to include the planning
groups in selecting the locations and to reevaluate priorities
periodically.
The participants also agreed that sharing resources is a good idea. However,
doing it well requires understanding what is possible and what is
practical. It is necessary to define
the types of data needed and collected by all the agencies sharing the data and
equipment. Vehicle classification was
discussed as an example. The 13 vehicle
classes used by FHWA are required by very few analysis procedures but are
required to be collected and reported by the traffic monitoring agencies. However, ITS groups do not have the
equipment to collect such detailed classification. Some other groups within the DOT require information on body
types and commodity hauled. These
discrepancies and specific needs should be understood and resolved to ensure
synergies from the shared resources and equipment.
There were some concerns about sharing equipment, as different protocols and storage requirements used by different groups in the same agency make the use of the same devices difficult.
States have
experienced problems in data collection equipment maintenance, primarily in
inspections of installation after construction begins. Coordinating with construction, planning,
and operations groups to ensure proper installation and inspection is often a
problem. Joe Avis from Caltrans
commented that devices that have had electrical inspections last longer than
those which have not been inspected.
The biggest impediment in performing such inspections is the time and
cost. Sharing resources to achieve this
goal is very beneficial to everyone.
Various participants
noted their frustrations with equipment installation. Texas DOT is developing procedures for design, installation, and
maintenance, and will make these available on the Internet so that contractors
can access them. They are also planning
to train all their regional offices on the procedures related to installation
and maintenance of traffic data collection devices.
The participants
expressed interest in quality control and assurance software used by
traditional traffic monitoring groups.
The software used by states varies greatly and is typically developed
using their respective in-house business rules. Mark Hallenbeck proposed creating an open-source software model
or at least having the documentation of such software available on the web so
that a DOT investing in such software knows what other agencies have used. Martin Knopp (Utah DOT) mentioned a
voluntary group of state agencies that encourages informal exchange of
information. Currently, the scope of
this group is very limited. There also
has been a pooled fund study to look into the elements of quality assurance
software. There was a consensus that
this is an area of great interest to participants.
3.4 Session 3 – Advances in Traffic Data Collection and Management
The third white
paper titled “Advances in Traffic Data Collection and
Management” was written by Dr. Dan Middleton (TTI) for
this project. The complete version of
the white paper is provided in Appendix A.
Without accurate and
reliable detectors, traffic management decisions based upon real-time or historical
data are compromised. Many agencies use
post processing for quality assurance as opposed to quality control. Quality assurance attempts to “fix the data”
or identify defective data rather than ensuring the accuracy and reliability of
the equipment. Quality control
emphasizes good data by ensuring selection of the most accurate detector and then
optimizing detector system performance.
This white paper identifies innovative approaches for improving data
quality through innovative contracting methods, standards, training for data
collection, data sharing between agencies and states, and advanced traffic
detection techniques.
3.4.2 Innovative Contracting Methods
A few agencies have
already invested resources in developing new contracting methods as a means of
ensuring data quality at its source.
Performance criteria in contracts, while not common, are being
considered by DOTs as a method to transfer some of the risk and maintenance
requirements to contractors.
The Virginia
Department of Transportation (VDOT) at the Hampton Roads Traffic Management
Center uses contractors for support of its day-to-day operations. The TMC accomplishes the necessary
maintenance on its detection system through hiring contractor personnel who are
supervised by VDOT personnel. VDOT
treats contractor personnel as an extension of its own staff, apparently giving
the TMC director even more latitude to add or remove contractor personnel
compared to VDOT staff. The second
example in Virginia is the VDOT Mobility Management Section (traditional data
collection), which leases its traffic counters and modems from Digital Traffic
Systems (DTS). A state inspector checks
the equipment once a year, but if there are substantial errors in the data, the
contractor has to re-collect the data.
VDOT has established performance-based lease criteria for payment of
data collection services. Contractor compensation
is based on the amount of acceptable data being submitted by the contractor.
Another example of
an innovative contracting method is with the Ohio Department of
Transportation’s Office of Technical Services, Traffic Monitoring Section. ODOT is in the process of executing a
task-order-type contract for maintenance to have contractors on board for
anticipated and unanticipated maintenance requirements of the traditional data
collection equipment statewide. The
contract is expected to begin in the summer of 2003.
Standards
development is another aspect of traffic data quality. The U.S. DOT ITS Standards Program is
working toward the widespread use of standards to encourage the
interoperability of ITS systems, including traffic data collection
systems. There is also a draft standard
being developed by the ASTM, entitled “Standard Specification and Test Methods
for Highway Traffic Monitoring Devices (ASTM, 2002),” which will be available
soon. Standardization has occurred in
Germany, the Netherlands, and France, where national standards for data
collection equipment have been developed (U.S. DOT, 1997). The process has increased the quality and
accuracy of the data collected, decreased the effort needed to transfer data
between agencies or offices, and increased the reliability of field
equipment. However, there is increased
initial cost of the equipment when compared to non-standard equipment.
3.4.4 Training for Data Collection
Training of
personnel on the intricacies of the equipment is an essential part of ensuring
data quality. With improvements in
non-intrusive detector hardware and software occurring at a rapid pace,
maintenance personnel must be computer literate and must maintain an awareness
of the latest changes for a variety of detection systems. Initial training on new systems is often
available through the vendor, but turnover in state DOT maintenance staff and
new models require an ongoing training program.
3.4.5 Data Sharing Between Agencies and States
Budget cuts are
causing agencies to seek alternate means of meeting data supply needs, with one
solution being to share data between agencies.
The Hampton Roads TMC currently shares video with the city of Norfolk
and plans to share video, voice, and data with six other cities in the
immediate area, including Norfolk. Norfolk also has a TMC, so there is mutual
benefit to sharing each other’s data.
The New England states of Connecticut, Maine, Massachusetts, New
Hampshire, Rhode Island, and Vermont have cooperated to help each other and
share transportation data. ARTIMIS
supplies data to the following agencies:
planning agencies within the Ohio DOT, the Kentucky Transportation
Cabinet, the local MPO (Ohio-Kentucky-Indiana Regional Council of Governments),
the City of Cincinnati Traffic Engineering office, local FHWA contacts, and the
FHWA Mobility Monitoring project. The
agencies that receive data from ARTIMIS perform their own analysis of data
quality.
3.4.6 Advanced Traffic Detection Techniques
Quality control
emphasizes data quality by ensuring selection of the most accurate detector
and then optimizing detector system performance.
Two of the most recent research efforts focusing on the performance
attributes of advanced detection techniques occurred at the Texas
Transportation Institute (Middleton et al., 1999, 2000, 2002) and in Phase II
of the Minnesota DOT Non-Intrusive Tests
(MinnDOT & SRF Consulting, 2002). Of the detectors recently tested by TTI and MinnDOT, the
multi-lane detectors that are most competitive from a cost and accuracy
standpoint are Autoscope Solo Pro, Iteris Vantage, RTMS by EIS, SAS-1 by
SmarTek, Traficon NV, and 3M Microloops.
The following points
were suggested as discussion items at the end of the presentation:
· What are the equipment-related impediments to data sharing?
· What are the data accuracy concerns for ITS data?
· How many detectors can be “out” at any given time?
· Standards development takes time. What do we do in the meantime? Current standard output is “contact closure.”
· How should/will equipment vendors help (training, product consistency, information dissemination, diagnostics)?
3.4.7.1 Discussions – Ohio Workshop
Dan Middleton (Texas Transportation Institute) presented the paper on innovative approaches to traffic data collection and management. European agencies have extensive experience with loop detectors and are satisfied with their performance. These agencies are careful with installations and have national standards for loop installations. Dan Middleton remarked that the specifications for the loop detectors themselves are not very different from those currently being followed by Texas DOT (TxDOT), but that there are stricter installation and maintenance standards in Europe.
The participants described the perfect detector as one that is easily installed off the road; weather proof; self-diagnostic; and capable of collecting multi-lane volume, speed, and classification data.
There was also discussion of the appropriate spacing of detectors. Participants felt that the current 0.5-mile spacing was driven primarily by ramp-metering applications and the one-mile spacing of urban interchanges. For current applications at TMCs, 0.5-mile spacing is not required. However, advanced traffic management applications might need such dense coverage. Traditional traffic monitoring groups need data from only one location in each segment. Thus, the spacing is determined by the potential application of the data.
In terms of contracting, it was noted that most manufacturers provide a one-year warranty on their equipment and it might be useful if they provided longer warranties (e.g., five years). Performance-based contracts were viewed as an interesting approach but the participants needed more information on how to set up and manage these contracts. There were concerns expressed about situations where the contractor and the state do not agree on the quality of the data and the increased costs of these contracts. Currently, the primary mode of contracting is low-bid. Another idea was to develop an asset management approach for certain devices. It was noted during the discussions about contracting and business models that universities are now becoming archivists of traffic data. The field operational test (FOT) being planned in Virginia would provide more information on such a framework and its advantages and disadvantages.
The participants also indicated the need for a clearinghouse of traffic detectors. Ralph Gillmann mentioned the Vehicle Detector Clearinghouse (VDC), a pooled-fund project operated by New Mexico State University. The clearinghouse has information on traffic detector tests conducted, and offers limited technical assistance. It was noted that the clearinghouse is not a testing facility. The need for such a testing facility was also expressed.
It was noted that vehicle classification was a problem for most of the detectors. The 13 vehicle classes required by FHWA restrict the type of traffic detection device that can be used. Also, length-based detectors have different classification schemes based on the manufacturer. Ralph Gillmann mentioned that FHWA has worked with Illinois DOT to allow it to report length-based classification data.
3.4.7.2 Discussions – Utah Workshop
The participants were receptive to newer detection technologies as long as they are cost effective and approach the accuracy of inductive loops. Participants from traffic monitoring groups indicated that they had tried non-intrusive technologies including remote traffic microwave sensor (RTMS) and video-based detection with varying degrees of success. In terms of the cost-benefit of using newer detection technologies, it was felt that life-cycle costs for traffic detectors would be very valuable in decision-making; however, cost information is often not available. It was also noted that while the costs of traffic control and maintenance are reduced in the case of non-intrusive detectors, there are still some costs that need to be considered in the cost-benefit analysis.
It is not uncommon for vendors to release new or modified equipment before it has been fully tested and before proper training is provided to the vendor’s own personnel. A testing institute was suggested as a solution. The Vehicle Detector Clearinghouse was suggested as a potential candidate to perform such a service. Currently the clearinghouse provides information about detectors and tests conducted by the states, but it does not conduct independent testing.
Installation of devices was discussed again in this session as being critical. Dan Middleton remarked that the Netherlands scanning tour indicated that the success of the inductive loops greatly depended on their installation. There needs to be coordination during installation and even afterwards between different divisions of the same agency. For example, milling operations to smooth the pavement can completely destroy loops, and lane-striping resulting in lane shifts can render the loops ineffective because they are no longer centered in the lanes.
Each detector has its issues and problems related to installation and calibration. Location and set-up of these devices sometimes is more art than science. While there are manufacturer’s instructions for set-up and installation, the installer must still use trial-and-error in some installations to achieve optimum performance. Experience gained over time is helpful in correctly and efficiently setting up these devices. Also, a compilation of the installation, maintenance procedures, and best practices would be very useful.
This section
summarizes the action items from brainstorming sessions conducted to identify
and prioritize the action items to address the data quality issues discussed in
the previous sessions. The actions are organized by white paper topic.
3.5.1 Defining and Measuring Traffic Data Quality
Following are the
action items identified to address issues relating to defining and measuring
traffic data quality:
Following are the
action items identified to address issues relating to the state of the
practice:
Following are the action items identified to address issues relating to innovative approaches to data quality:
3.5.4 Responsibilities and Timeline
Responsibilities and timelines for implementing the action items were not discussed at the regional workshops. Although responsibilities as to which agency should perform the action items were not explicitly identified, it was implicit that FHWA and state agencies will be playing leading roles.
4.0 Action Plan for Improving Traffic Data Quality
As noted earlier, the primary objective of this project is to define an action plan with work items that can be executed through the U.S. Department of Transportation (DOT), stakeholder organizations (e.g., American Association of State Highway and Transportation Officials [AASHTO], ITS America), state agencies, and private industry. Several action items were identified and prioritized at the workshops. The action plan builds upon the findings in the white papers and inputs obtained from the regional workshops. The action plan provides a blueprint for specific actions to address traffic data quality issues.
4.2 Partnerships and Coordination
Even though the regional workshops were not attended by representatives from every state, the plan is considered to reflect a broadly based consensus of the state DOTs and others involved in traffic monitoring activities on actions to address data quality issues. Implementation of the plan will require collaboration among public and private partners, with the FHWA and state DOTs playing leading roles.
Coordinators were identified for each action item. The coordinators are expected to take primary responsibility for implementing the specified action items. Although specific agency responsibilities for action items were not explicitly identified, it was implicit that FHWA and state agencies will play leading roles. For example, FHWA would lead development of data quality assessment guidelines and the states would lead the use of task order contracting approaches. In other areas, some FHWA assistance may be required in developing general guidance for the states. States can then customize the approach to suit their individual circumstances.
There are three primary organizational units involved in the traffic monitoring activity: Planning, Design, and Intelligent Transportation Systems (ITS) or Traffic Management Centers (TMC). The degree of involvement in traffic monitoring activity can vary from conducting simple road tube counts to operating elaborate ITS installations. Since methods, techniques, and equipment for conducting traffic monitoring activities are similar across the three organizational units, there is significant opportunity for partnering between the units. These partnerships are critical in implementing some of the action items.
The plan identifies 10 priority action items distilled from the comments received at the two regional workshops.
This section describes the ten action items identified for improving traffic data quality from ITS and non-ITS sources. These action items are presented in descending order of priority. The plan includes descriptions of the action items and the issues they address. For each action item, coordinating and collaborating agencies are specified.
4.3.1 Guidelines and Standards for Calculating Data Quality Measures
Description: Develop guidelines and standards for calculating traffic data quality measures. The guidelines and standards are expected to contain methods to calculate and report the data quality measures for various applications and levels of aggregation. In addition, the guidelines should also include:
· Examples or case studies of application of data quality methods
· National goals (by application) – these data quality goals represent what state agencies can strive to achieve in their operations
· Guidance on how to construct and store quality measures
· Specifications and procedures for reporting data quality metadata
· Costs to calculate and report quality measures.
Issues: This action item was identified as top priority at the two regional workshops. The action item addresses the following key issues:
· Defining and measuring traffic data quality
· Quantitative and qualitative metrics/levels of data quality
· Acceptable levels of quality
· Methodology for assessing traffic data quality.
Coordinators: It was suggested that FHWA or AASHTO would be the appropriate agency to develop these guidelines. A suggestion was to include guidelines for calculating data quality measures in the “AASHTO Guidelines for Traffic Data Programs” publication or in the Traffic Monitoring Guide.
4.3.2 Compilation of Business Rules/Data Validity Checks and Quality Control Procedures
Description: Synthesize validation procedures and rules used by various states and other agencies for traffic monitoring devices. This synthesis report will also serve as a guide to DOTs and other agencies investing in new software for traffic data collection. The synthesis document should also include quality control procedures for all types of applications and data management methods for maintaining high quality data.
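As an illustration of the kind of business rule such a synthesis might catalog, the following is a minimal sketch of per-record validity checks. The record layout, the rule names, and every threshold are assumptions made for this example; they are not drawn from any state's actual procedures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectorRecord:
    """One aggregated detector observation (hypothetical layout)."""
    volume: Optional[int]            # vehicles counted in the interval
    speed_mph: Optional[float]       # average speed in the interval
    occupancy_pct: Optional[float]   # percent of time the detector was occupied


# Illustrative thresholds only; each agency would set its own values.
MAX_VOLUME_PER_INTERVAL = 250   # assumed upper bound per lane per 5 minutes
MAX_SPEED_MPH = 100.0           # assumed upper bound


def validity_flags(rec: DetectorRecord) -> List[str]:
    """Return the names of the validity rules that the record fails."""
    if rec.volume is None or rec.speed_mph is None or rec.occupancy_pct is None:
        return ["missing_value"]
    flags = []
    if rec.volume < 0 or rec.volume > MAX_VOLUME_PER_INTERVAL:
        flags.append("volume_out_of_range")
    if rec.speed_mph < 0 or rec.speed_mph > MAX_SPEED_MPH:
        flags.append("speed_out_of_range")
    if not 0.0 <= rec.occupancy_pct <= 100.0:
        flags.append("occupancy_out_of_range")
    # Consistency rule: vehicles were counted but the reported speed is zero.
    if rec.volume > 0 and rec.speed_mph == 0:
        flags.append("vehicles_counted_with_zero_speed")
    return flags
```

A record that returns an empty list passes all of the sketched rules; any other result names the rules that failed, which is the sort of information a shared rule catalog could standardize.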
The development and adoption of common software was identified as a possible approach to ensure uniformity among state agencies. Recognizing that software development and testing is expensive and time-intensive, it was suggested that an immediate action would be to share documentation and knowledge of existing software among state agencies.
Issues:
This action item addresses the
following key issues:
Coordinators: FHWA, state DOTs
4.3.3 Best Practices for Equipment Installation and Maintenance
Description: Develop a synthesis of best practices of
installation and maintenance of traffic monitoring devices. This document should, among other things,
include:
Issues: This
action item addresses the following key issues:
Coordinators: FHWA, state DOTs
4.3.4 Clearinghouse for Vehicle Detector Information
Description: Establish an independent testing entity to test and verify claims of the new and emerging traffic detection devices on the market. Such an ongoing program would conduct periodic independent accuracy tests of new equipment. Results from the independent tests should be stored in a clearinghouse that can be accessed by all potential users.
The clearinghouse would also provide technical guidelines on the capabilities of detectors by application and conditions. The guidelines would enable agencies to select the appropriate devices for their applications, budgets, and environmental conditions.
It was noted that the capabilities of the existing Vehicle Detector Clearinghouse (VDC), operated by New Mexico State University, could potentially be expanded to serve the needs expressed above. In the short term, a web-log or a moderated discussion forum needs to be added to the existing Vehicle Detector Clearinghouse to help users share experiences.
Issues: This
action item addresses the following key issues:
Coordinators: FHWA, state DOTs, and VDC
4.3.5 Sensitivity Studies to Demonstrate “Value of Data”
Description: Conduct extensive sensitivity analyses and document the results to illustrate the implications of data quality on user applications. This action item is considered important because it would help document and demonstrate the “value of data” and highlight the effects of poor quality data on various applications. Such a document would serve as a reference for potential users in deploying data of different levels of quality. Some applications are extremely sensitive to data quality, whereas others are not. The documentation should include sensitivity of results for selected applications to variations in data quality measures such as accuracy, coverage (density of detectors), and completeness (missing values).
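One simple form such a sensitivity analysis could take is sketched below for a single application, travel time estimation. It assumes travel time is estimated as segment length divided by measured speed; the function names and the error levels tested are illustrative choices, not part of the action plan.

```python
def travel_time_minutes(length_miles: float, speed_mph: float) -> float:
    """Travel time over a segment, assuming a constant speed."""
    return 60.0 * length_miles / speed_mph


def travel_time_error_pct(speed_error_pct: float) -> float:
    """Percent error in estimated travel time caused by a percent error in speed.

    If the measured speed is the true speed times (1 + e), the estimated
    travel time is the true travel time divided by (1 + e).
    """
    return (1.0 / (1.0 + speed_error_pct / 100.0) - 1.0) * 100.0


if __name__ == "__main__":
    true_speed = 60.0  # mph, assumed for illustration
    for err in (-20, -10, -5, 5, 10, 20):
        measured = true_speed * (1.0 + err / 100.0)
        print(f"speed error {err:+d}%: "
              f"estimated time {travel_time_minutes(1.0, measured):.3f} min, "
              f"time error {travel_time_error_pct(err):+.1f}%")
```

Under this assumption the effect is not symmetric: underestimating speed inflates the travel time estimate by more than an equal overestimate deflates it.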
Based on the results of the sensitivity analysis, develop data quality “targets” or “benchmarks” for each application. Also, the results of the sensitivity analysis would be used to provide guidance or procedures for imputing missing data points.
Issues: This action item addresses
the following key issues:
Coordinators: FHWA, state DOTs
4.3.6 Guidelines for Sharing Resources
Description: Develop guidelines for sharing resources for traffic monitoring activities including shared equipment, personnel, funding, and cooperation among different agencies and departments. These should also include guidelines for establishing public-private partnerships for sharing resources as well as guidelines for assessing and validating traffic data collected by the private sector and vice versa.
Information gathered from the regional workshops clearly indicated that budget cuts and financial considerations have forced different groups (within an agency or organization) to look into synergies that would lead to the use of other groups’ resources to meet their data needs. Identifying opportunities for different groups within and outside state DOTs to work together to meet their data needs was mentioned as critical. Furthermore, these guidelines will establish trust and confidence in private sources of data for use by the public sector and vice versa.
Issues: This
action item addresses the following key issues:
Coordinators: State DOTs, FHWA
4.3.7 Life-cycle Costs of Detection Equipment
Description: Develop a methodology for calculating lifecycle costs to enable states and other agencies to:
Life-cycle costs include the costs of equipment, installation, training, and maintenance. The costs of equipment and maintenance impact coverage and other measures of quality. A better understanding of the life-cycle costs, and guidance on how to estimate them, is expected to help in planning and investing in traffic monitoring activities.
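A minimal sketch of how the four cost components named above might be combined is given below. It uses a simple undiscounted sum over an assumed service life, which is only one possible methodology, and all dollar values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectorCosts:
    """Cost components named in the text; values used below are hypothetical."""
    equipment: float
    installation: float
    training: float
    annual_maintenance: float


def life_cycle_cost(costs: DetectorCosts, service_years: int) -> float:
    """Undiscounted total cost of owning one detector over its service life."""
    return (costs.equipment + costs.installation + costs.training
            + costs.annual_maintenance * service_years)


def annualized_cost(costs: DetectorCosts, service_years: int) -> float:
    """Life-cycle cost spread evenly over the service life."""
    if service_years <= 0:
        raise ValueError("service_years must be positive")
    return life_cycle_cost(costs, service_years) / service_years


# Hypothetical comparison: option A is cheap to buy but costly to maintain,
# option B is the reverse.
option_a = DetectorCosts(equipment=1000.0, installation=3000.0,
                         training=200.0, annual_maintenance=600.0)
option_b = DetectorCosts(equipment=5000.0, installation=1000.0,
                         training=500.0, annual_maintenance=150.0)

if __name__ == "__main__":
    for name, option in (("A", option_a), ("B", option_b)):
        print(name, life_cycle_cost(option, service_years=10))
```

With these made-up figures the option with the higher purchase price has the lower ten-year cost, which is the kind of comparison that low-bid procurement alone would not reveal.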
Issues: This
action item addresses the following key issues:
Coordinators:
State DOTs, FHWA
4.3.8 Improved Contracting Approaches
Description: Develop guidelines for innovative contracting approaches for traffic data collection. This should include:
· Information regarding performance-based contracting approach and management, and the associated costs and benefits
· Guidance on task-order-type contracts and cooperative agreements for equipment installation and maintenance
· Guidance on life-cycle-cost-based bidding approach.
The question of the contracting approach for data collection device procurement, installation, and maintenance was identified as one of the key issues impacting traffic data quality. This action item is intended to address the issue by providing guidelines that would ensure that vendors are held accountable for the performance of their devices.
Issues: The
action item addresses the following key issues:
Coordinators: State DOTs, FHWA
4.3.9 Case Study or Pilot Tests
Description: Conduct a case study or a pilot test to observe a state DOT and TMCs working to improve data quality and evaluate the return on investment from the improved data quality. Information gathered from such a case study is expected to help implement some of the action items outlined above.
Issues: The action item addresses the following key issues:
Coordinators: FHWA, state DOTs
4.3.10 Guidance on Technologies and Applications
Description: Provide guidance on the data elements to measure and report since this dictates the type of device procured by the agency. For example, the FHWA’s 13 vehicle categories should be revisited and length-based classifications explored. Similarly, new and emerging applications might have additional data needs, which again influence the type of device.
Provide guidance on the innovative uses of loops and existing technologies. Improvements in inductive loop technologies can expand their capabilities beyond volume and speeds (e.g., approaches to derive vehicle classifications from loop signatures).
Issues: The action item addresses the following key issues:
Coordinators:
FHWA, state DOTs
4.4 Implementation and Work Items
As noted in Section 4.2, the coordinators would
assume primary responsibility for implementing the specified action items. FHWA would play a leading role in the
overall implementation of the action plan.
State DOT involvement, coordination, and participation are more critical for
some action items than for others. Following are the
three potential groups of activities or work items to implement the action
plan.
The majority of the action items relate to the development of
guidelines, which are best implemented through research studies. The findings of the research effort would
then be disseminated to all potential users.
This would be followed by an evaluation to assess the success of
implementation and identify limitations and shortcomings. FHWA would then conduct these research
activities with support from state DOTs and other agencies and organizations.
For action items falling into this category, the first activity would
be to develop research topics and statements of work for each action item or combination of
action items. Action items in this
category include the following (with report section identified):
· Compilation of business rules/data validity checks and quality control procedures (4.3.2)
· Best practices for equipment installation and maintenance (4.3.3)
· Sensitivity studies to demonstrate “value of data” (4.3.5)
· Guidance on technologies and applications (4.3.10)
Some of the action items could be implemented through regional
workshops. It is believed that action
items in this category are those that require sharing of experiences and
success stories where a workshop or similar forum provides the best
environment. FHWA would coordinate with
the state DOTs to sponsor and organize such workshops. The following are action items in this
category:
· Guidelines for sharing resources (4.3.6)
· Life-cycle costs of detection equipment (4.3.7)
· Improved contracting approaches (4.3.8)
4.4.3 Case Studies and Clearinghouse
Action items in this category require establishing or identifying an
independent entity and conducting case studies. These action items can be implemented only after some of those in
the other categories have been completed.
It is expected that participation in the case studies would be
voluntary. It is envisaged that FHWA,
state DOTs, and other agencies or organizations would work jointly to
successfully complete these action items.
The following are the action items in this category:
· Case study or pilot tests (4.3.9)
· Clearinghouse for vehicle detector information (4.3.4)
The action plan was developed based on information from published
literature and discussions at two regional workshops. Ten action items directed at addressing traffic
data quality issues were identified. Coordinators and
work items have been suggested for the various action items. The action items represent the general
consensus of the workshop participants regarding the major traffic data quality
issues. Implementation of the action
plan is seen as a major step towards enhancing the quality of traffic data and
encouraging its use by federal, state, and local agencies and other
organizations.
The action plan in its current form would serve as input for a national
workshop on data quality for review and adoption.
Battelle Memorial Institute, Sharing Data for Traveler Information: Practices and Policies of Public Agencies, prepared for U.S. Department of Transportation, July 2001.
Closing the Data Gap: Guidelines for Quality ATIS Data, prepared for ITS America and the U.S. Department of Transportation, April 2000.
D. Middleton and R. Parker. Initial Evaluation of Selected Detectors to Replace Inductive Loops on Freeways, Research Report FHWA/TX1439-7, Texas Transportation Institute, College Station, Texas, April 2000.
D. Middleton, D. Jasek, and R. Parker, Evaluation of Some Existing Technologies for Vehicle Detection, Research Report FHWA/TX-00/1715-S, Texas Transportation Institute, College Station, Texas, September 1999.
D. Middleton and R. Parker. Evaluation of Promising Vehicle Detection Systems, Research Report FHWA/TX-03/2119-1, Draft, Texas Transportation Institute, College Station, Texas, October 2002.
English, L.P. 7 Deadly Misconceptions about Information Quality. INFORMATION IMPACT International, Inc., Brentwood, Tennessee, 1999.
English, L.P. Improving Data Warehouse and Business Information Quality. John Wiley & Sons, Inc., New York, New York, 1999.
FHWA Study Tour for European Traffic Monitoring Programs and Technologies, FHWA’s Scanning Program, U.S. Department of Transportation, Federal Highway Administration, Washington D.C., August 1997.
MNDOT and SRF Consulting Group, NIT Phase II: Evaluation of Non-Intrusive Technologies for Traffic Detection, Final Report, September 2002.
Strong, D.M., Y.W. Lee and R.Y. Wang. 10 Potholes in the Road to Information Quality. Institute of Electrical and Electronic Engineers, August 1997(A), pp. 38-46.
Standard Specification and Test Methods for Highway Traffic Monitoring Devices, The American Society for Testing and Materials, Review Copy: Version C for E17.52, Draft December 2002.
“Defining and Measuring
Traffic Data Quality”
By Shawn Turner
Introduction
Although not specifically referring to intelligent transportation systems (ITS), a Wall Street Journal article speaks to the related subject of data quality: “Thanks to computers, huge databases brimming with information are at our fingertips, just waiting to be tapped. . . . Just one problem: Those huge databases may be full of junk.” (Wand and Wang 1996) As Alan Pisarski noted in his Transportation Research Board (TRB) Distinguished Lecture in 1999, “we are more and more capable of rapidly transferring and effectively manipulating less and less accurate information” (Pisarski 1999).
Recent research and analyses have identified several issues regarding the quality of traffic data available from intelligent transportation systems for transportation operations, planning, or other functions. The Federal Highway Administration (FHWA) is developing an action plan to assist stakeholders in addressing traffic data quality issues. Regional stakeholder workshops and white papers will serve as the basis for this action plan.
As one of those white papers, this document presents recommendations for defining and measuring traffic data quality. This white paper:
Several terms should be defined at the outset. Data and information are sometimes used interchangeably. Data typically refers to information in its earliest stages of collection and processing, and information refers to a product likely to be used by a consumer or stakeholder in making a decision. For example, traffic volume and speed data may be collected from roadway-based sensors every 20 seconds. This traffic data is then processed into information for the end consumer, such as travel time reports provided via the Internet or radio. But the terms are also relative, as one person’s data could be another person’s information. Throughout this paper the term data quality will be used to refer to both data and information quality. No attempt is made to delineate the point at which data becomes information (or knowledge or wisdom, for that matter).
The literature contains two similar definitions for data quality. Strong, Lee and Wang (1997A) define information quality as “fit for use by an information consumer” and indicate that this is a widely adopted criterion for data quality. English (1999A) further clarifies this widely adopted definition by suggesting that information quality is “fitness for all purposes in the enterprise processes that require it.” English emphasizes that it is the “phenomenon of fitness for ‘my’ purpose that is the curse of every enterprise-wide data warehouse project and every data conversion project.” In his book, English (1999B) defines information quality as “consistently meeting knowledge worker and end-customer expectations.” It is clear from these definitions that data quality is a relative concept that could have different meaning(s) to different consumers. For example, data considered to have acceptable quality by one consumer may be of unacceptable quality to another consumer with more stringent use requirements. Thus it is important to consider and understand all intended uses of data before attempting to measure or prescribe data quality levels.
The recommended definition for traffic data quality is as follows:
Data quality is the fitness of data for all purposes that require it. Measuring data quality requires an understanding of all intended purposes for that data.
Several data quality measures were consistently found in both current practice and data quality literature. Based on the findings discussed later in this paper, the following data quality measures are recommended:
There are several other valid data quality measures presented that could be used for specific traffic data applications in some regions. The five measures presented above, though, are fundamental measures that should be considered universally for measuring data quality in all traffic data applications.
At this time, we recommend that goals or target values for these traffic data quality measures be established at the regional level based on a better understanding of all intended uses of traffic data. It is clear that data consumers’ needs and expectations, as well as available resources, vary significantly by region and preclude the recommendation for a national goal or standard for these traffic data quality measures.
The research team also recommends that if data quality is measured, the information should be made available and accessible with the data as metadata. This practice of requiring a data quality report using standardized data quality measures is common in the GIS and other data communities. The American Society for Testing and Materials (ASTM) is developing a data archive metadata standard that could be used to document and describe these data quality measures in sufficient detail for data consumers. The ASTM metadata standard under development has been adapted from the GIS community’s metadata standards (FGDC-STD-001-1998 and ISO DIS 19115) with their data quality reporting sections intact.
Current practices in measuring traffic data quality are summarized below for three common consumer groups involved in highway transportation:
Our review of current practice found that, in general, consistent and widespread reporting of traffic data quality measures was not evident in any of these three consumer groups. Efforts to address data quality were more evident in the latter two groups than with real-time monitoring and control. A few data quality measures have been suggested or are used in each of these groups. These data quality measures are discussed in the following paragraphs:
Data consumers in this group are typically engaged in traffic management and control or the provision of traveler information. Data uses are considered real-time and are generally concerned only with the most recent data available (e.g., typically five to fifteen minutes old). Some agencies are beginning to use historical data to provide additional value to traveler information. In some cases field data collection hardware and software provide rudimentary data quality checks; in other cases, no data quality checks are made from the field to the application database. Field hardware and software failures are common. In some cases, equipment redundancy provides sufficient information to cover gaps in missing data. In other cases, missing data is simply reported “as is” and decisions are made without this data.
Many agencies provide time-stamped traveler information via websites, thus providing an indication of the data timeliness. Selected examples can be found at Houston TranStar (http://traffic.tamu.edu), WSDOT (http://www.wsdot.wa.gov/PugetSoundTraffic/), and Wisconsin DOT (http://www.dot.wisconsin.gov/travel/milwaukee/index.htm), just to name a few.
Several traffic management centers track failed field equipment through maintenance databases and report such things as the average percent of failed sensors. The Michigan Intelligent Transportation Systems (MITS) Center has defined lane operability as the sensor-minutes of failure, which is a product of the number of failed sensors and the duration of the failure in minutes (Turner et al. 1999). These measures can be classified as measures of coverage or completeness.
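The sensor-minutes measure described above can be sketched as follows. The input layout (pairs of failed-sensor count and failure duration) and the normalization to a percentage are assumptions added for illustration; the source defines only the product itself.

```python
from typing import Iterable, Tuple

# Each failure event: (number of failed sensors, duration of failure in minutes)
FailureEvent = Tuple[int, float]


def sensor_minutes_of_failure(events: Iterable[FailureEvent]) -> float:
    """Sum over events of failed sensors multiplied by failure duration."""
    return sum(count * minutes for count, minutes in events)


def percent_sensor_minutes_failed(events: Iterable[FailureEvent],
                                  total_sensors: int,
                                  period_minutes: float) -> float:
    """Failed sensor-minutes as a percentage of all possible sensor-minutes.

    This normalization is an assumption for the example, not part of the
    definition reported in the text.
    """
    possible = total_sensors * period_minutes
    if possible <= 0:
        raise ValueError("total_sensors and period_minutes must be positive")
    return 100.0 * sensor_minutes_of_failure(events) / possible


if __name__ == "__main__":
    # Hypothetical day: 3 sensors down for 120 minutes, 1 sensor down for 45.
    events = [(3, 120.0), (1, 45.0)]
    print(sensor_minutes_of_failure(events))
    print(percent_sensor_minutes_failed(events, total_sensors=100,
                                        period_minutes=1440.0))
```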
Some traffic management centers evaluate the accuracy of new types of sensors before widespread deployment. For example, the Arizona DOT traffic operations center in Phoenix used accuracy to measure the data quality from non-intrusive sensors for which they were considering installation (Jonas 2001). In their evaluation, ADOT compared traffic count and speed data from non-intrusive, passive acoustic detectors to calibrated inductance loop detectors under the assumption that the loop detector data represented the most error-free data obtainable. The measures used in the evaluation were the absolute and percentage differences between traffic counts and speeds measured with the two sensor types.
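A comparison of this kind can be sketched as below, treating the loop detector value as the baseline. The function names are illustrative, and the choice to keep the sign of the difference is an assumption; the source does not say whether signed or unsigned differences were reported.

```python
def difference(test_value: float, baseline_value: float) -> float:
    """Difference in measured units (signed: positive means the test sensor reads high)."""
    return test_value - baseline_value


def percent_difference(test_value: float, baseline_value: float) -> float:
    """Difference expressed as a percentage of the baseline value."""
    if baseline_value == 0:
        raise ValueError("baseline_value must be non-zero")
    return 100.0 * (test_value - baseline_value) / baseline_value


if __name__ == "__main__":
    # Hypothetical hourly counts: test sensor versus calibrated loop detector.
    test_count, loop_count = 950, 1000
    print(difference(test_count, loop_count))
    print(percent_difference(test_count, loop_count))
```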
ITS America and the U.S. DOT convened numerous stakeholders in 1999 and developed guidelines for quality advanced traveler information system (ATIS) data (ITS America 2000). The guidelines were developed in an effort to support the expansion of traveler information products and services. One of the explicit purposes of the guidelines was to increase the quality of traffic data being collected. The ITS America guidelines recommended seven data attributes, six of which can be considered data quality measures:
The ITS America guidelines further defined quality levels of “good”, “better”, and “best” and provided specific quality level criteria for each attribute. For example, five to ten percent error in travel times and speeds was classified as a “better” quality level under the Accuracy attribute.
In another white paper about data quality requirements for the INFOstructure (i.e., a national network of traffic information and other sensors), Tarnoff (2002) suggests the following data quality measures and possible requirements (Table 1):
Table 1. Possible INFOstructure Performance Requirements
| Measure | Application | Requirement: Local Implementation | Requirement: National Implementation |
|---|---|---|---|
| Speed Accuracy | Traffic Management | 5-10% | 5-10% |
| | Traveler Information | 20% | 20% |
| Volume Accuracy | Traffic Management | 10% | N/a |
| | Traveler Information | N/a | N/a |
| Timeliness | All | Delay < 1 minute | Delay < 5 minutes |
| Availability | All | 99.9% (approx. 10 hours per year) | 99% (approx. 100 hours per year) |
Source: Tarnoff 2002
Tarnoff presented these data quality requirements as a “starting point for the discussion of these issues” and suggested that there is a tendency in the ITS community to specify performance without a complete understanding of the actual application requirements or cost implications. Thus Tarnoff suggests that any decisions about data quality requirements be grounded in actual application requirements and cost implications.
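The hour figures attached to the availability requirements in Table 1 can be checked with simple arithmetic. The sketch below assumes that those figures refer to the allowable downtime per year and that a year has 8,760 hours; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def annual_downtime_hours(availability_percent):
    """Hours per year a system may be unavailable at a given availability."""
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


for availability in (99.9, 99.0):
    hours = annual_downtime_hours(availability)
    print(f"{availability}% availability allows about {hours:.1f} hours of downtime per year")
```

Under these assumptions the two requirements correspond to roughly 8.8 and 87.6 hours per year, which is consistent with the rounded values of 10 and 100 hours in the table.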
Data consumers in this group are typically engaged in off-line analytical processing of data generated by traffic operations. Archived data uses vary widely, from academic research (e.g., traffic flow theory) to traveler information (e.g., “normal” traffic conditions), operations evaluation (e.g., ramp meter algorithms), performance monitoring, and basic planning-level statistics. Although the operations data in archives are generated in real time, most of the applications to date have been historical in nature and outside of the traffic operations area. Data archive applications are still in relative infancy, and thus quality assurance procedures are still being established in most areas. Several data archive managers have voiced concerns about the quality of the data generated by operations groups, presumably because their applications have more stringent data quality requirements than the operations applications do. In fact, this concern about archived data quality is part of the genesis for this FHWA-sponsored project. Most current archived data users recognize these data quality issues but maintain an optimistic attitude of “this is the best data I can get for free” and attempt to use the data for various applications. However, interviews conducted in this project revealed several potential data archive consumers who were reluctant to use the data because of real or perceived data quality issues.
As noted previously, data archive applications are still in relative infancy and thus data quality measures are not extensively or consistently used. Data completeness, expressed as the number of data samples or the percent of available samples in a summary statistic, is the measure most often used in data archives. The data completeness measure is used frequently because operations data is often aggregated or summarized when loaded into a data archive. For example, the ARTIMIS center in Cincinnati, Ohio/Kentucky reports the number of 30-second data samples (shown in bold in Table 2) that have been used to compute each 15-minute summary statistic.
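Data for segment SEGK715001 for 07/15/2001

Number of Lanes: 4

| Time | Samp | Speed | Vol | Occ |
|---|---|---|---|---|
| 00:01:51 | **30** | 47 | 575 | 6 |
| 00:16:51 | **30** | 48 | 503 | 5 |
| 00:31:51 | **30** | 48 | 503 | 5 |
| 00:46:51 | **30** | 49 | 421 | 4 |
| 01:01:52 | **30** | 48 | 274 | 5 |
| 01:16:52 | **30** | 42 | 275 | 14 |
| ... | | | | |
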
Source: ARTIMIS Data Archives
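The completeness measure described above, expressed as a percent of available samples, can be sketched as follows. The function name is hypothetical; the only values taken from the text are the 30-second sample interval and the 15-minute summary period, which together imply a maximum of 30 samples per summary statistic.

```python
def completeness_percent(samples_reported, summary_minutes, sample_seconds):
    """Percent of the expected raw samples present in one summary statistic."""
    expected = (summary_minutes * 60) // sample_seconds
    if expected <= 0:
        raise ValueError("summary period must contain at least one sample interval")
    return 100.0 * samples_reported / expected


# ARTIMIS-style record: all 30 of the 30-second samples in a 15-minute period
print(completeness_percent(30, summary_minutes=15, sample_seconds=30))

# Hypothetical record in which 6 of the 30 samples were missing
print(completeness_percent(24, summary_minutes=15, sample_seconds=30))
```

Reporting the raw sample count, as ARTIMIS does, and reporting a percentage carry the same information as long as the sample interval and summary period are known to the data consumer.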
The Washington State DOT reports data completeness as well as data validity measures for the Seattle data archives that are distributed on CD-ROM (Ishimaru 1998). In their data archive, they report the number of 20-second data samples in a 5-minute summary statistic (e.g., maximum of 15 data samples possible). A data validity flag (with values of good, bad, suspect, and disabled loop) is also included in data reports to indicate the validity of 5-minute statistics (Table 3). Peak hour, peak period, and daily statistics generated by WSDOT’s CDR data extraction program also report data validity and completeness summary measures (Table 4). The CDR software also has a data quality mapping utility that allows data users to create location-based summaries of data completeness and validity (Ishimaru and Hallenbeck 1999). This utility is designed for data consumers who would like to analyze the underlying data quality for various purposes.
In the FHWA-sponsored Mobility Monitoring Program (http://mobility.tamu.edu/mmp), the Texas Transportation Institute and Cambridge Systematics, Inc. gather archived operations data from numerous traffic management centers nationwide and analyze the archived data to report mobility and reliability trends in the urban areas (Lomax, Turner and Margiotta 2001). As such, the program is an archived data consumer with the primary application of performance monitoring.
The program team performs various data quality checks in the course of processing and analyzing the archived data. In addition to summary statistics on mobility and reliability, performance reports also include information on the following data quality measures:
***********************************

Filename: 5TO15.DAT

Creation Date: 02/2/98 (Wed)

Creation Time: 03:16:59

File Type: SPREADSHEET

***********************************

ES-145D:_MS___1 I-5 Lake City Way 170.80

09/01/97 (Mon)

---Raw Loop Data Listing---

| Time | Vol | Occ | Flg | nPds |
|---|---|---|---|---|
| 0:00 | 49 | 3.80% | 1 | 15 |
| 0:05 | 37 | 2.90% | 1 | 15 |
| 0:10 | 38 | 3.50% | 1 | 15 |
| 0:15 | 34 | 2.60% | 1 | 15 |
| 0:20 | 48 | 4.40% | 1 | 15 |
| 0:25 | 44 | 3.60% | 1 | 15 |
| 0:30 | 35 | 2.80% | 1 | 15 |
| 0:35 | 33 | 3.30% | 1 | 15 |
| 0:40 | 28 | 2.50% | 1 | 15 |
| 0:45 | 30 | 2.30% | 1 | 15 |

Source: Ishimaru and Hallenbeck 1999
in Summary Statistics

***********************************

Filename: AADT.MDS

Creation Date: 02/2/98 (Thu)

Creation Time: 10:54:09

File Type: SPREADSHEET

***********************************

ES-145D:_MS___1 I-5 Lake City Way 170.80

Monthly Avg for 1996 Jan (Sun)

---Multi-Day Loop Summary Report---

| Summary | Valid | Vol | Occ | G | S | B | D | Val | Inv | Mis | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Daily | VAL | 19392 | 7.50% | 1133 | 18 | 1 | 0 | 4 | 0 | 0 | | |
| AM Peak | VAL | 1493 | 3.50% | 142 | 2 | 0 | 0 | 4 | 0 | 0 | | |
| PM Peak | VAL | 5069 | 15.60% | 190 | 2 | 0 | 0 | 4 | 0 | 0 | | |
| AM Pk Hour | VAL | 1381 | 10.00% | 47 | 1 | 0 | 0 | 4 | 0 | 0 | 10:45 | 11:45 |
| PM Pk Hour | VAL | 1576 | 11.90% | 48 | 0 | 0 | 0 | 4 | 0 | 0 | 13:45 | 14:45 |

Source: Ishimaru and Hallenbeck 1999
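A data consumer could turn the validity counts in a summary report such as the one above into percentage shares. The sketch below assumes that the G, S, B, and D columns count the 5-minute intervals flagged good, suspect, bad, and disabled loop, matching the four validity flag values described earlier; the report excerpt itself does not define these column headings, and the function name is hypothetical.

```python
def validity_shares(good, suspect, bad, disabled):
    """Return the percent of intervals carrying each validity flag."""
    counts = {"good": good, "suspect": suspect, "bad": bad, "disabled": disabled}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no intervals to summarize")
    return {name: 100.0 * count / total for name, count in counts.items()}


# Counts from the Daily row of the summary report: G=1133, S=18, B=1, D=0
shares = validity_shares(good=1133, suspect=18, bad=1, disabled=0)
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

Under this reading, the Daily row covers 1,152 intervals, which equals 288 five-minute intervals per day over the 4 days listed in the Val column.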
For example, Figure 1 shows summary information for data validity and data completeness. Significant detail for these data quality measures is also stored in databases. For example, one could do time-based and location-based analyses of data quality using the full database.
Historical/Planning-Level Traffic Monitoring
Data consumers in this group are typically engaged in mid- to long-range (5 to 20-plus years) traffic planning and analysis. Data uses are mostly of an historical nature, so in some cases annual average statistics may not be available (or needed) until six or more months after the past year ends. Thus, this consumer group’s frame of reference for data timeliness differs from that of the other two groups by several orders of magnitude. Whereas operations data consumers may consider data older than 5 minutes unacceptable, planning data consumers may consider waiting up to 9 months for annual statistics to be acceptable. The use of data quality checks or “business rules” for determining the validity of traffic data appears to be fairly common among this group. In many cases, these planning groups serve as the “official source” of traffic data for a particular jurisdiction.
Numerous state departments of transportation (DOTs) use data validation checks or “business rules” when they load traffic data into their information systems. These data quality checks are typically based upon traffic capacity principles, typical traffic trends or patterns, or simply local traffic experience and insight. Thus data validity is a common data quality measure used by many historical traffic monitoring groups. For example, the Texas DOT (TxDOT) plans to use 23 business rules for continuous vehicle counts in its Statewide Traffic Analysis and Reporting System (STARS) (TxDOT 2001). Once a data record has failed a business rule, that record is flagged as “suspect” and must be reviewed by a traffic data analyst prior to the beginning of the traffic monitoring program’s year-end process. Additionally, STARS uses data integrity as a data quality measure, running checks on data file and station integrity.
The traffic monitoring group in the Virginia DOT (VDOT) also uses established business rules to perform traffic data validity checks before loading the data into its information system. As with TxDOT’s process, data that fail the business rules are flagged as suspect and must be reviewed by a traffic data analyst. Data deemed erroneous are not loaded into the traffic information system. VDOT has a unique contracting arrangement in that it leases the traffic data collection equipment from sub-contractors and bases the lease payments on the quality and completeness of the data collected by the sub-contractors’ equipment. For example, a full monthly payment is made for locations “where 25 or more days of useable (for factor creation) classification and volume traffic information are available during a calendar month”. A partial lease payment of 50 percent is made “where 15 or more days of useable (for factor creation) volume traffic information, but less than 15 days (useable for factor creation) classification data are available.” Thus VDOT’s payment for traffic data collection is based on the quality measures of data validity and data completeness.
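The two quoted payment provisions can be expressed as a small decision function. This is a sketch of those two provisions only, under the assumption that the full-payment provision requires 25 or more usable days for both classification and volume data. The function name is hypothetical, and cases not covered by the quoted text return `None` rather than a guessed value.

```python
def lease_payment_fraction(usable_class_days, usable_volume_days):
    """Fraction of the monthly lease payment under the two quoted provisions.

    Returns None when neither quoted provision applies, because the terms
    for those cases are not given in the text.
    """
    if usable_class_days >= 25 and usable_volume_days >= 25:
        return 1.0  # full monthly payment
    if usable_volume_days >= 15 and usable_class_days < 15:
        return 0.5  # partial payment of 50 percent
    return None


print(lease_payment_fraction(usable_class_days=27, usable_volume_days=28))
print(lease_payment_fraction(usable_class_days=10, usable_volume_days=20))
print(lease_payment_fraction(usable_class_days=20, usable_volume_days=20))
```

The third call illustrates a combination (20 usable days of each data type) that the quoted provisions do not address.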
VDOT also designates quality levels for the traffic data it distributes. The quality level codes and descriptions are as follows:
· Code 0 - Not Reviewed
· Code 1 - Acceptable for Nothing
· Code 2 - Acceptable for Qualified Raw Data Distribution
· Code 3 - Acceptable for Raw Data Distribution
· Code 4 - Acceptable for use in AADT Calculation
· Code 5 - Acceptable for all TMS uses
These quality codes are designed to indicate to data consumers what the data producers believe to be the fitness of the data for various purposes.
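A data consumer might use these codes to filter records by intended use. The sketch below lists the six codes as an enumeration and assumes that they are ordinal, so that a higher code implies fitness for every use covered by a lower code. That ordering is suggested by the descriptions but is not stated explicitly, and the member and function names are hypothetical.

```python
from enum import IntEnum


class QualityCode(IntEnum):
    """VDOT quality level codes as listed above."""
    NOT_REVIEWED = 0
    ACCEPTABLE_FOR_NOTHING = 1
    QUALIFIED_RAW_DATA_DISTRIBUTION = 2
    RAW_DATA_DISTRIBUTION = 3
    AADT_CALCULATION = 4
    ALL_TMS_USES = 5


def usable_for_aadt(code):
    """True if a record's code permits use in AADT calculation (assumes ordinal codes)."""
    return QualityCode(code) >= QualityCode.AADT_CALCULATION


# Hypothetical records, each tagged with a quality code
records = [("station A", 5), ("station B", 3), ("station C", 4), ("station D", 0)]
print([name for name, code in records if usable_for_aadt(code)])
```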
Similar software-based data validity checks are used in several other states. The Pennsylvania and Ohio DOTs both use data validity checks in their traffic information systems. These validity checks are performed on a daily basis for all traffic data. The Michigan DOT uses Traffic Data Quality (TDQ), a software tool developed as a result of a pooled-fund study (Flinner and Horsey, no date).
The international experience with traffic data validity checks is comparable to the U.S. experience. A European scanning tour found that several countries perform an automated validation of traffic data (FHWA 1997). All ITS systems observed in the tour countries (the Netherlands, Switzerland, Germany, France, and the United Kingdom) perform some type of automated data validation, usually by comparing current data from a particular site with historical data from that same site during a similar time interval. If an operator identifies questionable data, they use graphic displays to review the data and determine acceptability.
Several of the countries have fairly extensive data validation systems, and all of them require manual input. In most cases, the validation methods rely on site-specific “rules” developed from historical patterns by time of day, day of week, and lane for that site. Data that fail the validation routines are brought to the attention of system operators, who then decide whether the data are correct. Operators replace invalid data with data from previous time periods at that site, factoring the data with growth estimates (based on nearby counters that worked properly) when appropriate. The discussion that follows covers processes used in individual countries.
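The replacement step described above can be sketched as a two-part calculation. This is an illustration under stated assumptions, not the procedure of any particular country: the growth estimate is taken as the ratio of current to previous volume at a nearby counter that worked properly, and all names and numbers are hypothetical.

```python
def growth_factor(nearby_current, nearby_previous):
    """Growth estimate from a nearby counter that worked in both periods."""
    if nearby_previous <= 0:
        raise ValueError("previous-period volume at the nearby counter must be positive")
    return nearby_current / nearby_previous


def replacement_volume(site_previous, factor=1.0):
    """Replace an invalid value with the site's previous-period value, scaled by growth."""
    return site_previous * factor


# Hypothetical volumes: the nearby counter grew from 1,000 to 1,050 vehicles
factor = growth_factor(nearby_current=1050, nearby_previous=1000)
print(replacement_volume(site_previous=2000, factor=factor))
```

Leaving `factor` at its default of 1.0 corresponds to substituting the previous period's data without any growth adjustment.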
The Netherlands uses a software system called INTENS. This system collects traffic data from the various traffic-monitoring sites, conducts automated validation checks, facilitates manual review of flagged data, and produces a variety of summary graphics and statistics. The data validation process consists of a series of parameter checks comparing the data submitted for each site with confidence limits set specifically for that site. Initial data checks ensure that data are labeled correctly (i.e., belong to a site for which data are expected), have the proper number of lanes, and pass other site identification checks. The next set of checks, called “primary control,” is a series of maximum and minimum allowable data ranges for specific variables that are based on historical data.
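The two stages described above, site identification checks followed by range checks, can be sketched generically as follows. This is not the INTENS implementation; the record layout, the configuration structure, and the limit values are hypothetical and serve only to illustrate site-specific minimum and maximum limits.

```python
def check_record(record, site_config):
    """Return a list of problems found in one record; an empty list means it passed."""
    problems = []
    # Stage 1: site identification checks
    if record["site_id"] != site_config["site_id"]:
        problems.append("unexpected site identifier")
    if record["lanes"] != site_config["lanes"]:
        problems.append("wrong number of lanes")
    # Stage 2: allowable range for each variable, set from historical data
    for variable, (low, high) in site_config["limits"].items():
        value = record["values"].get(variable)
        if value is None or not (low <= value <= high):
            problems.append(f"{variable} outside allowable range")
    return problems


# Hypothetical site configuration and hourly record
config = {"site_id": "A12-03", "lanes": 2,
          "limits": {"volume": (50, 4000), "speed": (20, 140)}}
record = {"site_id": "A12-03", "lanes": 2,
          "values": {"volume": 4500, "speed": 95}}
print(check_record(record, config))
```

A record with a non-empty problem list would be passed to manual review rather than rejected automatically.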
At the national level, Switzerland has two sets of data validation checks. The first determines if the telemetry system functioned properly. The second set of validation checks examines the submitted records and identifies those that are questionable based on several criteria. These include: zero volumes or other errors in the hourly records; hourly volumes that exceed a maximum percentile; variation in the ratio of 14-hour volumes to 24-hour volumes (14 hours from 6:00 a.m. to 8:00 p.m.) for weekdays; variation in the ratio of 5-hour volumes to 14-hour volumes (5 hours from 3:00 p.m. to 8:00 p.m.) per weekday; and variations in directional distribution.
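The two volume ratios named above can be computed from a day of hourly counts. The sketch assumes a list of 24 hourly volumes in which index 0 covers midnight to 1:00 a.m. The function names and the sample volumes are hypothetical, and the acceptable variation in each ratio is not specified in the text, so no thresholds are applied.

```python
def volume_ratios(hourly_volumes):
    """Return (14-hour / 24-hour ratio, 5-hour / 14-hour ratio) for one day."""
    if len(hourly_volumes) != 24:
        raise ValueError("expected 24 hourly volumes")
    total_24 = sum(hourly_volumes)
    total_14 = sum(hourly_volumes[6:20])   # 6:00 a.m. to 8:00 p.m.
    total_5 = sum(hourly_volumes[15:20])   # 3:00 p.m. to 8:00 p.m.
    if total_24 == 0 or total_14 == 0:
        raise ValueError("ratios are undefined when the daily or 14-hour volume is zero")
    return total_14 / total_24, total_5 / total_14


def zero_volume_hours(hourly_volumes):
    """Return the hours (0-23) that reported a zero volume."""
    return [hour for hour, volume in enumerate(hourly_volumes) if volume == 0]


# Hypothetical weekday: 100 vehicles in each overnight hour, 400 in each daytime hour
volumes = [100] * 6 + [400] * 14 + [100] * 4
ratio_14_24, ratio_5_14 = volume_ratios(volumes)
print(round(ratio_14_24, 3), round(ratio_5_14, 3))
print(zero_volume_hours(volumes))
```

A validation routine would compare each ratio with the values typically observed at the same site and flag days that deviate by more than the locally chosen tolerance.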
Like other countries included in the scan tour, Germany utilizes multiple validation procedures. The one included here is being developed for an ITS application in Hesse. The system uses a combined fuzzy logic/expert system approach for data validation. It is trained on data that are considered “valid” and then reports invalid data for subsequent manual review. Data determined to be valid are then included in the training of the system, so that other data with those characteristics will be considered valid.
France uses a software system called MELODIE, which creates many of the basic reporting statistics needed for later analysis. There are no specific validation algorithms within the system itself, but MELODIE generates graphical output that is viewed by an operator, who decides whether the data are valid. If the operator determines that some data are not valid, the program will use the previous month’s data for replacement. The MELODIE system keeps track of the fact that invalid data have been replaced.
In the United Kingdom, the scan team found multiple validation techniques. The one covered in this document is the Motorway Incident Detection and Analysis System (MIDAS). It performs two levels of validation. In the first level, the system itself has an internal validation method that indicates when the loop system needs recalibration or has failed (other details unavailable). In the second level of validation, the system plots the volume, speed, or loop occupancy by geographic location and time of day. The graphic provides an easy to use visual reference for detecting specific types of equipment errors.
Current Practices in Measuring Data Quality in Other Disciplines
Data quality literature is readily available in several other disciplines, especially the business management and data warehousing industries. The research team conducted a literature review and identified at least two dozen resources that related directly to data quality measures. Selected resources are summarized below with an emphasis on their relevancy to traffic data quality measures.
The geographic information systems (GIS) community has developed standards for documenting data quality in their Spatial Data Transfer Standard (SDTS) (O’Looney 2000; ANSI 1998). The SDTS data quality categories are shown in Table 5. The purpose of the data quality standard within SDTS is not to require acceptable levels of data quality, but to require a data quality report in all GIS data transfers. Following are the SDTS standardized definitions and measures that are to be used in describing and documenting GIS data quality.
| Category | Definition | Example |
|---|---|---|
| Positional Accuracy | The degree of horizontal and vertical control in the coordinate system. | The available precision or detail of longitude and latitude coordinates. |
| Attribute Accuracy | The degree of error associated with the way thematic data is categorized. | The degree to which a soil description is likely to vary from a soil measurement taken from the corresponding location. |
|
Completeness |
The degree to which
data is missing and the method of handling missing data. |