2 Implications for Reproducible Reporting
In the world of scientific research, the purpose of data analysis is to make the data analysis as open and transparent as possible to be able to exactly reproduce any of the results that were produced by it (Andrews, 2020).
The data analysis pipeline used by scientific researchers is analogous to the process of analyzing land title ownership. Title analysis begins with acquiring title information from public records, documenting the title chain, reading the title chain and interpreting the chain of ownership and reducing the ownership into numerical quantities. Mineral titles often have many owners with multiple classes of ownership, from royalty interests, production interests, mineral interests, term interests, life estate and remainder interests, severed or interests in both surface and mineral estates. These interests are commonly referred to the “bundle of sticks,” meaning, there are many classes of ownership in the mineral and surface estates. A title report analyzes these interests in the form of fractional ownership. Periods of ownership are important as well. In Nebraska, for example, an owner who has not documented a claim or had a transfer of interest in his severed mineral estate for a period of 23 years, is subject to claims of abandonment of the mineral interest by the surface owner. The individual owners interest must be compiled into an ownership report together with the contact information for each owner. Claims, such as Lis Pendens, delinquent taxes, pending County and District Court suits must be documented when reporting the status of the title for each owner.
2.1 The Hypothetical Title Report
Ownership reports are commonly lengthy and contain a lot of data depending on the complexity of the title. Geographical areas where there are mineral production, the title can expected to be more complex and more time consuming to complete.
Creating title reports, like scientific research, is a multi-step process. The Gantt chart below outlines a hypothetical number of days that it could take for each step in the process of compiling ownership reports.
It is impossible, without reams of data, to provide cost estimates in dollar amounts that could be saved by using reproducible reporting. The cost of examining and reporting title ownership is just too varied to make any sense. However, time savings can be estimated by analyzing each step in the analysis as a percentage of time it takes to complete the process. In the example below, I estimate the time in days to create a hypothetical title report covering complex mineral title. The proportion of time spent on each step of the process will remain fairly constant and therefore, can be a reliable predictor in percentage of time/cost that can be saved. The more complex your title issues, the greater the savings.
The Gantt chart lists six steps for the preparation of an ownership report and preparation of lease packages. I have had some mineral titles that had as few as one owner and as many as 350 owners. Some titles may take a week or less to analyze, others may take a month or more. One never knows what title issues one will encounter, or how complex a title can be until one actually examines the title. The number of days chosen in the example below is simply a hypothetical case for illustration. What remain constant is, the time it takes to complete the first four steps is relative to the time it takes to complete steps 5 and six. That is, the number of days it takes to build the runsheet, analyze the title, and calculate the owners’ interests, is equally proportional to the number of days it takes to build the report and prepare lease packages.
If, for example, one needs 15 days to get through calculating the owners’ interests, one will need nearly that many days to build the report and prepare the lease packages. This is where reproducible reporting becomes an advantage.
Using the techniques of reproducible reporting, the time required for building the reports and preparing the lease packages is reduced from days to usually seconds. It can actually cut the overall time involved in producing the reports in half. That time saved represents a 50% savings in labor cost.
This time saving advantage keeps on giving if changes to the report must be made. For example, if the report needs to be modified later, such as interest changing through sales or otherwise, only the data needs to be updated. The fractions are recalculated when the reports are generated. One does not have to re-write the ownership tables. If these are done in MS Word, for example, creating tables are a nightmare. Using R, the tables and fractional interests are automatically recalculated. If the report is published at a later date, those interests becoming subject to the abandonment statutes are also, automatically updated. Instead of re-writing a report, a simple change in the data for the report is made and the report is re-run automatically updating the calculations in seconds; or literally, fractions of a second.
2.1.1 Additional benefits
Statistics professor Mark Andrews (Andrews, 2020) emphasizes that reproducible data analysis is often motivated by a means of doing more high quality and robust data analysis and as a way of quality control that is essential to measure, identify errors, increase rigor, and verify results and conclusions.1. Mark Andrews offers as an example, a $9bn loss by the investment bank JP Morgan in 2012 (Is Excel the Most Dangerous Piece of Software in the World?, 2021). Keeping data in Excel also has created vulnerabilities. In 2020 for example, the Public Health England lost Covid data as a result of using Excel to collect data. One expert commented “even a high-school computing student would know that better alternatives [to Excel] exist. … you wouldn’t use XLS, nobody would do that” (“Excel,” 2020).
2.1.2 Summary of Savings with Reproducible Reporting
Each title examination is different from any other, no two titles are the same. No title examiner ever knows what awaits. Predicting the length of time to complete the title examination and compile the reporting is not feasible. But, as a rule of thumb, the time required to create title reports is directly proportional to the complexity of title. The more complex the title, the longer the process of preparing reports and lease packages. With reproducible reporting, the time it takes to create the title report itself, like the lease packages, is reduced to almost zero. Thus, by employing reproducible reporting, the savings in terms of time spent, is roughly one-half. Reproducible reporting saves about 50% of the time it normally takes to produce the same result.
2.1.3 Laying the groundwork for project management
The software we at GTLS use for reproducible reporting is R and its companion software, RStudio. These open sourced software are ideal for creating parameterized reports discussed in this .
Data literacy is becoming an important part of higher education. Landmen in today’s world are taking on and accessing large amounts of data and some are even doing data mining. R does all this and, it is open source software which means the software is free. The templates we at GTLS have created typically accept the data in CSV or XLSX format. We can also connect directly to any relational database as a source of data. CSV files are sufficient to acquire the data for R to do its magic. The reporting output can be in MS Word, PDF or HTML files.
Arguably, data are a company’s most valuable asset. Without data, an organization becomes a foundering ship. Reproducible reporting is but one aspect of data literacy, but one in which title processes can be streamlined, time can be saved and costs can be reduced.