Three decades of financial, economic and legal studies of corporate governance have relied heavily on data sets whose provenance is almost unknown. A new article is about to correct this fatal flaw in contemporary corporate governance research by launching a brand new resource: the Cleaning Corporate Governance database.

Data-driven research has become the foundation for many private sector policies, regulations, precedents and practices. From qualitative interviews to big data, now more than ever, we rely on data to guide us in our research, policy development and everyday life. Yet data-driven research is only as good as its components.

The field of empirical corporate governance illustrates the growing use of data as a key pillar to inform and shape research. Over the past three decades, empirical corporate governance has grown in importance by quantifying what has traditionally been difficult to quantify – the text of state laws, federal regulations, and corporate-level governance documents. This data is then used to measure the quality of governance. Some of the most influential studies have shown that countries with strong investor protections are more likely to have higher company valuations, that more shareholder-friendly companies outperform more management-friendly ones, and that governance has many other significant predicted effects on company performance in the real world.

Despite their obvious appeal, these groundbreaking studies have also long had an unrecognized Achilles heel: three decades of financial, economic, and legal studies of corporate governance have been largely built on data sets of almost unknown origin.

In our recent article, “Cleaning Corporate Governance”, to appear in the University of Pennsylvania Law Review, we set out to correct this fatal flaw in contemporary corporate governance research by launching a brand new resource, the Cleaning Corporate Governance (CCG) database, which allows researchers to study, for the first time, the fidelity of fundamental results in corporate governance. The database is anchored in a one-of-a-kind open-source body of text representing nearly thirty years of historical charters for companies listed in the S&P 1500, for a total of approximately 3,000 companies over time. Charters are the fundamental organizational document of a company, defining its basic equity structure, its purpose, and the fundamental rights and responsibilities of shareholders, directors and managers. We manually tag a significant subset of this company-level data for these core attributes and augment it further with tagged state-level panel data that tracks sixteen statutory governance rules in all 50 states (and the District of Columbia).

The CCG database disrupts some of the most venerated results of empirical corporate governance. A basic example is the much-cited “G-index” first explored in the classic article “Corporate Governance and Equity Prices” by Paul A. Gompers, Joy L. Ishii and Andrew Metrick (GIM). Using data from a third-party vendor, the G-index aggregated 24 binary corporate governance variables into a single additive index to rank companies on a spectrum ranging from more “dictatorial” (or management-centric) to more “democratic” (or shareholder-centered).
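To make the idea of an additive index concrete, here is a minimal sketch in Python. It assumes one point is added for each provision coded as present. The provision names and the sample firm are hypothetical placeholders, not the actual variables or data used by GIM or in the CCG database.

```python
from typing import Mapping


def additive_index(provisions: Mapping[str, bool]) -> int:
    """Return the number of provisions coded as present (one point each)."""
    return sum(1 for present in provisions.values() if present)


# Hypothetical firm with three illustrative provisions (not the real 24 variables).
firm = {
    "classified_board": True,
    "poison_pill": True,
    "supermajority_vote": False,
}

# Same firm with one provision mis-coded, to show how a tagging error moves the score.
firm_miscoded = {**firm, "supermajority_vote": True}

score = additive_index(firm)
score_miscoded = additive_index(firm_miscoded)
```

Because the index is a simple sum, a single mis-coded provision changes a company's score by one point, which is why the accuracy of the underlying tagging matters so much for any ranking built on it.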

By applying this index to years in the late 1990s, GIM demonstrated that a strategy of systematically investing in democratic issuers (while avoiding dictatorial issuers) would have generated an astonishing 9% excess return on a risk-adjusted basis. Our new database, however, reveals that the data underlying this finding contains significant inaccuracies. For example, we have found that the G-index is inaccurate more than 82 percent of the time, and that the rate of inaccuracy worsened in the 2000s, even as this data (and the results built on it) gained the attention of academics, regulators and practitioners. We use the CCG database to implement a conservative correction of the underlying G-index, and we show that the relationship between democratic governance and arbitrage returns decreases significantly with the corrected data.

The CCG database also presents interesting opportunities for further constructive research. Its underlying body of text, in particular, is fertile ground for machine learning and computational text analysis methodologies. We offer a taste of such approaches by deploying some of these nascent methods in our article to show, among other results, that non-Delaware charters have become longer and less readable over time and that the similarity of charters for businesses in some sectors has grown over time.
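As a rough illustration of what computational text analysis on a charter corpus can look like, the sketch below computes two simple measures: Jaccard similarity between the vocabularies of two texts, and mean sentence length as a crude readability proxy. These are stand-in techniques chosen for brevity, not the methods used in the article, and the sample strings are invented.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def jaccard_similarity(a: str, b: str) -> float:
    """Share of distinct words that two texts have in common (0.0 to 1.0)."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_sentence_length(text: str) -> float:
    """Average number of words per sentence; longer sentences tend to read harder."""
    sentences = [s for s in re.split(r"[.!?]+", text) if tokens(s)]
    if not sentences:
        return 0.0
    return sum(len(tokens(s)) for s in sentences) / len(sentences)


# Invented snippets standing in for charter text.
charter_a = "The board shall meet. Directors vote."
charter_b = "The board may meet."

similarity = jaccard_similarity(charter_a, charter_b)
avg_length = mean_sentence_length(charter_a)
```

Tracking measures like these for each company and year would give a first, coarse picture of how charter length, readability and similarity move over time.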

“Lawyers have taken a step back from the assembly and use of quantitative data, fearing that we are not qualified for empirical work.”

The availability of accurate, open-source data will be an invaluable resource for researchers studying deeper governance questions, such as the importance of state law, the evolution of governance during times of upheaval (such as the financial crisis) and whether common ownership of companies by large passive investors leads to anti-competitive behavior. Our database is also unique in that it allows researchers to use the underlying data to construct new measures of stakeholder governance, which distinguishes it from pre-existing shareholder-focused databases.

Perhaps the most important contribution of the CCG database is that its underlying corpus and all of our tagged data will become free and open access. By sharing our data, we hope to right two major wrongs.

First, we help to solve the problem of access. While the data we have collected is theoretically available from the state Secretaries of State and the Securities and Exchange Commission, collecting data from either source is no walk in the park. We estimate that harvesting the charters of the Delaware companies in our sample – constituting about 58 percent of the total data set – from the Delaware Secretary of State’s office would cost half a million dollars in fees alone. Searching the SEC’s EDGAR online database is theoretically free, but frustrating – it’s impossible to search only for charters and regulations, so the process of finding these documents is a digging exercise. Commercial databases like Westlaw and Bloomberg are slightly better (although they also reflect EDGAR’s disorganization), but collecting data from these sources comes with its own hurdles.

Second, we suspect that one of the main reasons errors have spread through existing data for two decades is that lawyers have stepped back from the assembly and use of quantitative data, fearing that we may not be qualified for empirical work. In our absence, laypeople did their best to make judgment calls that had to be distilled into binary variables and that straddled state law, stock exchange listing rules, federal securities laws, debt documents, and corporate governance. These complex legal issues require legal training. Through a careful process (and with many research assistants trained in law), we were able to create a database that we believe accurately reflects the governance provisions contained in corporate charters, as well as their interaction with the regulatory apparatus that governs them. One of the first reviewers of this article described existing governance data as “mystery meat”, contrasting it with our CCG data as “organic product meets farmer’s market producer”. We believe that’s true, and in the years to come, we invite others to help us cultivate this wealth of governance data into the open resource it was intended to be.

