As discussed at the 2017 TRB Annual Meeting

Improved Data Standards and Data Management


Examples of recent headaches with Data Management:

  • Using observed traffic signal data to predict what an operator would do. No consistently formatted data available.
  • Merging data from across states, not just from Connecticut.
  • Confidentiality when collecting data. Hard to guarantee confidentiality with the Feds reading emails.
  • Getting a data server set up is challenging.
  • Quality of household travel survey data was questionable for a research project.
  • Anonymization schemes for fare card data met lots of legal challenges.
  • Connecting call record data with other data sources. Privacy and anonymization are challenging.
  • Survey response rates.

What can Zephyr do?

Fellowship program

Addresses workforce issues. A Data Officer could go into a public agency, similar to how Code for America works.

Standardize vocabulary and data dictionary

A lot of time is wasted dealing with different variable names and terminology. Zephyr could set standard variable names and units [e.g. commuter_rail vs. CommuterRail].
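
As a rough illustration of what a naming standard could look like in practice, the sketch below converts variant spellings to a single form. It assumes the standard would be lowercase with underscores (the commuter_rail style); that choice and the function name to_snake_case are assumptions for illustration, not something the group decided.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert names like 'CommuterRail' or 'Commuter Rail' to 'commuter_rail'.

    Assumes (hypothetically) that the standard is lowercase_with_underscores.
    """
    # Insert an underscore at each lowercase/digit -> uppercase boundary,
    # e.g. 'CommuterRail' -> 'Commuter_Rail'.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse runs of spaces, hyphens and underscores into one underscore.
    s = re.sub(r"[\s\-_]+", "_", s)
    return s.lower()


if __name__ == "__main__":
    for raw in ["CommuterRail", "commuter_rail", "Commuter Rail", "commuter-rail"]:
        print(raw, "->", to_snake_case(raw))
```

A script like this only handles spelling variants. Agreeing on which terms mean the same thing (and on units) would still need a shared vocabulary.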

Standardize Schema

There should be a periodic update of standards so that they do not become stale or forked. The Zephyr board could decide each year whether or not to revisit each standard. We also need a standing technical body to monitor data standards and inform the board.

Standardize Meta Data and Data Projects

A standard protocol for managing data projects or developing data resources.

Could include:

  • Who is responsible
  • Data dictionary [specific field names & types] for various types of data. The set of data dictionaries would grow and evolve over time.
  • Metadata standards
  • Updating, sunsetting
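
To make the data dictionary idea concrete, here is a minimal sketch of a dictionary expressed in code, plus a check of one record against it. The field names, types and units in TRIP_DICTIONARY are made up for illustration, as are the Field and check_record names. An actual Zephyr dictionary would be defined by whatever group maintains it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str        # standard field name
    dtype: type      # expected Python type
    units: str = ""  # optional units, e.g. "miles"


# Hypothetical data dictionary for trip records (illustrative only).
TRIP_DICTIONARY = [
    Field("trip_id", str),
    Field("mode", str),
    Field("distance", float, "miles"),
]


def check_record(record: dict, dictionary: list) -> list:
    """Return a list of problems found when comparing a record to a dictionary."""
    problems = []
    for field in dictionary:
        if field.name not in record:
            problems.append(f"missing field: {field.name}")
        elif not isinstance(record[field.name], field.dtype):
            problems.append(
                f"wrong type for {field.name}: expected {field.dtype.__name__}"
            )
    # Flag fields that the dictionary does not define.
    known = {f.name for f in dictionary}
    for key in record:
        if key not in known:
            problems.append(f"unknown field: {key}")
    return problems


if __name__ == "__main__":
    print(check_record({"trip_id": "a1", "Mode": "bus", "distance": 3}, TRIP_DICTIONARY))
```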

Develop example contract language

Zephyr can help come up with example contract and licensing language to facilitate data sharing with researchers and other stakeholders while protecting privacy.

Create a How-to manual for collecting travel analysis data.

Similar to a survey methods manual, but for other data sources.

Data Validation and Grading System

How to grade a dataset:

  • Completion
  • Sample size
  • Representation

A dataset grade would have to depend on the specific application.

Potential actions:

  • develop scripts to analyze suitability and assign a grade to datasets
  • award datasets badges by use type
  • could let private sector grade its own data by using the scripts OR let Zephyr act as a “ratings agency” and give it access to evaluate.
  • organize dataset ratings and previous uses
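
A grading script along these lines might look like the sketch below. It scores the three criteria named above (completion, sample size, representation) for one specific application and takes the weakest score as the overall result. The scoring formulas, the letter-grade thresholds and the function name grade_dataset are all assumptions made for illustration, and a real grading system would need agreed definitions.

```python
from collections import Counter


def grade_dataset(records, required_fields, min_sample_size, target_shares, group_field):
    """Grade a list of dict records for one application (illustrative only).

    completion:     share of records with every required field present
    sample_size:    records available relative to the application's minimum
    representation: 1 minus the total variation distance between the observed
                    group shares and the target (e.g. population) shares
    """
    n = len(records)
    if n == 0:
        return {"completion": 0.0, "sample_size": 0.0, "representation": 0.0, "grade": "F"}

    complete = sum(all(r.get(f) is not None for f in required_fields) for r in records)
    completion = complete / n

    sample_size = min(1.0, n / min_sample_size)

    counts = Counter(r.get(group_field) for r in records)
    groups = set(target_shares) | set(counts)
    tvd = 0.5 * sum(abs(counts.get(g, 0) / n - target_shares.get(g, 0.0)) for g in groups)
    representation = 1.0 - tvd

    # The weakest criterion drives the grade; thresholds are hypothetical.
    score = min(completion, sample_size, representation)
    grade = "A" if score >= 0.9 else "B" if score >= 0.75 else "C" if score >= 0.5 else "F"
    return {
        "completion": completion,
        "sample_size": sample_size,
        "representation": representation,
        "grade": grade,
    }


if __name__ == "__main__":
    sample = [
        {"trip_id": "1", "mode": "bus"},
        {"trip_id": "2", "mode": "bus"},
        {"trip_id": "3", "mode": "rail"},
        {"trip_id": None, "mode": "rail"},
    ]
    print(grade_dataset(sample, ["trip_id", "mode"], 4, {"bus": 0.5, "rail": 0.5}, "mode"))
```

Because the minimum sample size and target shares are passed in, the same dataset can receive different grades for different uses, which fits the idea of awarding badges by use type.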

Who else should be involved?

For data sets that are unique to transportation, Zephyr can set standards but needs to talk to public agencies and groups that collect data.

Private data providers.

Once a relationship is established, both groups can work together to get the right data. This would prevent each public agency from having to independently examine each private company’s data.

What do you think?