Data Accessibility and Usability Index
While anyone with a registered user account can put data up on FracTracker’s DataTool, finding and collecting relevant data in a usable form is sometimes more difficult than it should be. I have examined datasets from a wide variety of places (1) and agencies, and after encountering numerous issues, I have devised a grading scheme to reflect the quality of the data being distributed, to be known as the Data Accessibility and Usability Index (DAUI).
The DAUI considers variables in the following seven categories:
- Accessibility (20 points): How easy is the data to obtain?
- Usability (20 points): How much preparation is required to be able to analyze the data?
- Completeness (15 points): Is there anything missing from the data that would interfere with analysis or mapping?
- Metadata (15 points): Are the data column descriptions and data source information readily available?
- Responsiveness (10 points): Has the agency been helpful with information requests? (2)
- Accuracy (10 points): Are there errors in the data? (3)
- Cost (10 points): Is the data free? (4)
Data Accessibility and Usability Index grading scheme, 100 total points.
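As a rough illustration of how the seven categories combine, here is a minimal Python sketch. The category names and maximum points come from the list above; the names `DAUI_MAX_POINTS` and `daui_total` are hypothetical, chosen for this example, and the range check is an assumption about how a score sheet might be validated rather than part of the scheme itself.

```python
# Maximum points per DAUI category, as listed in the post.
DAUI_MAX_POINTS = {
    "Accessibility": 20,
    "Usability": 20,
    "Completeness": 15,
    "Metadata": 15,
    "Responsiveness": 10,
    "Accuracy": 10,
    "Cost": 10,
}


def daui_total(scores):
    """Return the total DAUI score for one dataset.

    `scores` maps every category name to the points awarded.
    Each value must fall between 0 and that category's maximum.
    """
    total = 0
    for category, maximum in DAUI_MAX_POINTS.items():
        points = scores[category]  # raises KeyError if a category is missing
        if not 0 <= points <= maximum:
            raise ValueError(f"{category} must be between 0 and {maximum}")
        total += points
    return total
```

A dataset that earns the maximum in every category totals 100, and dropping a single category such as Accuracy from 10 to 0 lowers the total by exactly 10.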
It is important to note that each grade given represents only one specific dataset at one point in time. On occasion, certain aspects of any given dataset are updated by the agency controlling the data, hopefully for the better.
One recent example is the Pennsylvania drilled wells (spuds) database. Until recently, this was published as HTML tables on a monthly basis, but 2011 data is now available in a single Excel file. In addition, this year’s wells have location information, which was missing from previous years’ data. Although PASDA maintains a list of about 125,000 oil and gas locations in the Commonwealth directly from the DEP, there were still thousands of wells drilled between 1998 and 2010 that could not be matched to that list.
Since the new dataset in Pennsylvania only covers 2011 wells so far, it is appropriate to grade both datasets separately. This will also serve as a functional example of how the DAUI works.
Grades for PA DEP’s Drilled Wells Dataset.
As you can see, the two changes that they have made have bumped the PA DEP’s grade up from a D- to a solid A. And in fact, the D- might have been generous. Several of our DataTool users have suggested that there might be significant omissions in the older report, but I have never been able to conclusively establish that as a fact. If it is true, the Accuracy rating would fall from 10 to 0, leaving a total score of 50 for that database.
Let’s look at another example, Wells in Quebec near the St. Lawrence, published by Quebec’s bureau d’audiences publiques sur l’environnement. To get the data up on FracTracker, the data had to be translated to English (not a demerit, just a step in the process), copied from the PDF file to Excel, and pasted so that each column of data fit in one cell. Then the data could be split apart using the space (“ ”) as a delimiter, at which point the cells needed to be manually aligned to allow for proper concatenation. Once all of that was done, it was necessary to change the location information from Degree Minute Second format to Decimal Degree to be able to map the data. Finally, the units of measure for depth were mixed, including both meters and feet, when they should be consistent. In short, not a very satisfactory experience with the data. Here’s how it grades, based on that experience:
Grade for Quebec’s bureau d’audiences publiques sur l’environnement Wells in Quebec near the St. Lawrence dataset.
Despite my frustrations with this data, the information is published on the agency’s website, appears to be complete, and is well explained. Publishing this dataset as a PDF (which cannot be analyzed directly) was the main reason for the agency’s C grade.
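The two conversions the Quebec data needed, Degree Minute Second to Decimal Degree and mixed depth units to a single unit, can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the process actually used for the FracTracker upload (which was done in Excel); the function names, the hemisphere letters, and the unit labels are assumptions made for the example.

```python
FEET_TO_METERS = 0.3048  # exact, by definition of the international foot


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds plus N/S/E/W to signed decimal degrees.

    South and west are returned as negative values, the usual
    convention for mapping software.
    """
    decimal = abs(degrees) + minutes / 60 + seconds / 3600
    if hemisphere.upper() in ("S", "W"):
        decimal = -decimal
    return decimal


def depth_in_meters(value, unit):
    """Normalize a depth reported in meters or feet to meters."""
    unit = unit.lower()
    if unit in ("m", "meters"):
        return float(value)
    if unit in ("ft", "feet"):
        return value * FEET_TO_METERS
    raise ValueError(f"unknown depth unit: {unit!r}")
```

For example, `dms_to_decimal(46, 30, 0, "N")` gives 46.5 and `dms_to_decimal(72, 15, 0, "W")` gives -72.25. Had the agency published decimal coordinates and a single depth unit, neither step would have been necessary.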
Here’s the grade for a dataset that I can’t post: The Railroad Commission (RRC) of Texas’ Newark East (Barnett Shale) gas wells.
Clearly, the RRC is in possession of a tremendous amount of data. You can click on the “Well log” link and see dozens of pages of scanned original documents. However, there are a couple of problems with this data that make it unusable for FracTracker. First of all, there are over 8,000 records, but it is impossible to view more than 100 at a time. Those would have to be copied and pasted manually from the HTML tables. While that is possible to do, it isn’t worth the effort, because there is no location information. Knowing that they must be able to produce an Excel sheet with some basic data about their drilled wells, I contacted the RRC, and was told that what I wanted could be obtained…for a cost. In my opinion, the RRC is being stubborn on this. They have terrific data, and yet they do everything they can to be (politely) difficult. As I did not elect to purchase data at this time, I will only grade what is available online.
Grade for the Railroad Commission of Texas’ Newark East (Barnett Shale) Drilled Wells dataset.
Because they elected not to release the data upon request, the RRC earned a failing grade. Had the RRC simply created and sent the proper Excel file from their database, they might have earned 90 points on the DAUI. If they had decided that well location information was a basic thing that citizens might want to know, and posted a downloadable link on their website, they could have earned full marks. If the for-cost version of the data has everything that is desired, it would have a maximum score of 80, because it is not free and has to be requested.
These three examples show how the DAUI system works. In the near future, I will grade all relevant oil and gas datasets against the same metric. Hopefully, a comprehensive picture of the various agencies that control oil and gas data will emerge.
Scoring 100 points on the DAUI should be attainable almost 100 percent of the time. If governmental agencies really do not have data on wells, permits, violations, and production, then they are failing their respective citizens, whose lives are affected by the oil and gas industry, often quite profoundly. If the agencies that control the data are simply in the habit of making it difficult to access, then I must remain hopeful that they will be pressured to realize that this is an unacceptable strategy for the 21st century.
(1) This list includes Pennsylvania, West Virginia, Ohio, Arkansas, Texas, Utah, North Dakota, New Mexico, Colorado, and Quebec. Not all of the datasets have been complete enough to post on FracTracker, a frustration which contributed significantly to the creation of this grading scheme.
(2) If no requests have been made regarding a given dataset, or if the data simply does not exist in a desired format, full credit should be given in this category.
(3) Accuracy issues can be very difficult to verify. Also, if certain data doesn’t exist, that is accounted for elsewhere. As with Responsiveness, the agency is afforded the benefit of the doubt here.
(4) I have seen numerous datasets available from state agencies that cost money, with costs ranging from about $10 to well over $1,000. This is often explained as “recovering costs” of data distribution. In my opinion, this is unacceptable. While maintaining accurate data is undoubtedly expensive, it is an obligation of the overseeing agency to do so, and making the data available to the public is really a minimal component of that process. If it is a genuine budgetary constraint, then the agency should merely charge more for permit fees, etc., so that they are adequately able to perform their job.
>Wow this is incredible work. Thanks for sharing! I had no idea how much information and lack thereof was out there.