General Overview

Quantifying and scoring – a comparison

We were curious: are the environmental scores calculated by various providers aligned with the detailed facility-level environmental data that we collect? The answer is complicated, but it does help reframe the debate on sustainability in investing.

Others have researched the divergence in ESG scores between different providers. Searching for “divergence esg ratings” returns more than 200,000 results with the first page showing studies by MIT, JP Morgan and BNP Paribas. We are not examining or evaluating the methodology of ESG ratings. Our aim is to compare what we believe to be part of the foundation for evaluation – a company’s facility-level, detailed, monthly impact data – to that company’s environmental rating.

An example of detailed data: Eastman Chemical (NYSE:EMN), average of EPA limits for water quality

Our first finding is that, within specific industries, there is a rough alignment between the foundational data and the rating. To illustrate that point, we can look at petroleum refining. The sector is economically large and has more than a dozen active companies that use broadly similar technologies to produce standardized products.

Petroleum refiners’ environmental ratings vs. 5-yr average of chemical oxygen demand

For this sector, we chose just one monthly performance parameter, chemical oxygen demand, and calculated its value as a percentage of the limit set by the EPA for every month from January 2016 to March 2021. This normalizes for the size of the facility. We then averaged those results and plotted them against the environmental score published by one of the providers.
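
The calculation described above can be sketched roughly as follows. This is a minimal illustration, assuming each monthly record pairs a measured value with its permitted limit in the same units; the function names and the sample numbers are hypothetical and are not drawn from the actual dataset.

```python
from statistics import mean


def percent_of_limit(measured, limit):
    """Express a measured discharge value as a percentage of its permitted limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * measured / limit


def average_percent_of_limit(records):
    """Average the monthly percent-of-limit values.

    `records` is an iterable of (measured, limit) pairs, one per month.
    """
    return mean(percent_of_limit(m, lim) for m, lim in records)


# Hypothetical monthly chemical oxygen demand readings and limits (same units).
monthly = [(60.0, 120.0), (90.0, 120.0), (30.0, 120.0)]
avg = average_percent_of_limit(monthly)
```

Dividing by the limit before averaging is what makes a large and a small facility comparable: each month contributes a unitless share of what the facility is allowed to discharge.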

A higher risk rating (x-axis) correlates roughly with higher oxygen demand (y-axis), measured as a percentage of the EPA limit. The foundation and the rating are in line but not fully congruent. By this measure, Valero should have a worse rating and Marathon a better one. Marathon has no operations outside the US, while Valero has only one refinery in the UK, so the majority of both companies’ environmental footprint should be measurable using US data.

We next turned to an industry that is less homogeneous: chemicals. The sector is economically large, has more than two dozen active large companies, and uses widely divergent technologies to produce specialized products.

Here the relationship between the overall company rating and the detailed environmental data is tenuous at best.

Chemical manufacturer ratings vs. 5-yr average of 10 water quality performance parameters

The ratings do not move in line with the detailed measurements, and the data points are more widely scattered outside the least-squares fit area. In this case we averaged over the same 63-month period, but instead of using just one indicator, we used the 10 parameters reported most often and most regularly by the chemical companies.
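
One way to put a number on how tightly the points follow a least-squares fit is the coefficient of determination, R². The sketch below is an illustration only: it assumes one (rating, average percent of limit) pair per company, and both sample series are invented to contrast a tight fit with a scattered one. It does not reproduce the charts or the underlying data.

```python
from statistics import mean


def least_squares(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Returns (slope, intercept, r_squared). Assumes xs and ys each vary.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical ratings (x) and average percent of limit (y) per company.
ratings = [10.0, 20.0, 30.0, 40.0]
tight = [20.0, 40.0, 60.0, 80.0]      # points lie on a line
scattered = [50.0, 20.0, 60.0, 30.0]  # little relation to the rating

_, _, r2_tight = least_squares(ratings, tight)
_, _, r2_scattered = least_squares(ratings, scattered)
```

An R² near 1 means the rating and the measurements move together, while a value near 0 means the rating explains little of the variation in the measured data.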

There are a number of reasons for this divergence:

  • All the companies are in the business of making chemicals, but some (Lyondell, Exxon) make bulk commodities, while others are narrowly focused on one class of products (Air Liquide, Air Products & Chemicals), and yet others create very specialized chemicals in small quantities (Merck).
  • Each company’s exposure to the chemical industry also differs. Celanese makes only chemicals, while Berkshire Hathaway has a large chemical business that is dwarfed by its financial interests. Those financial interests have no environmental footprint, which substantially lowers the company’s overall risk.
  • While some of the companies (Lyondell, Air Liquide) have a significant non-US presence, this should be a minor factor. Large companies tend to adopt best practices globally.
  • The integrated companies (Chevron, Exxon) have activities other than chemical manufacturing (exploration, production, refining) that will impact their rating. For this analysis, we selected the monthly data specifically related to their chemical production.

What’s a poor ESG portfolio manager to do?

With divergent ESG scores and scattered foundational data, analysts and portfolio managers are scrambling to extract signal from the noise. Based on our experience working with the data, here are some important things to watch.

  • Choose your peer group carefully. We use a combination of GICS industry sectors and subsectors, as well as NAICS codes, to construct a like-for-like peer group that allows for meaningful contextualization of the data.
  • Choose your parameters carefully. Different industries use different materials, which means what they discharge into the environment differs. We are developing a matrix of industrial activity and parameters to make this task easier.
  • Do not be misled by an overall low risk rating for a large holding company. An accident at an industrial facility may tarnish the reputation of the entire conglomerate, even though most of the conglomerate’s activity is virtual and does not impact the environment.
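
To make the first point above concrete, here is a simplified sketch of peer grouping by NAICS code prefix. It is an assumption-laden illustration and not our production method: it uses NAICS codes only, the company names are placeholders, and the `digits` parameter is a hypothetical knob for how narrowly the peer group is drawn.

```python
from collections import defaultdict


def peer_groups(companies, digits=3):
    """Group company names by the leading `digits` of their NAICS code.

    `companies` is an iterable of (name, naics_code) pairs, with codes as strings.
    """
    groups = defaultdict(list)
    for name, naics in companies:
        groups[naics[:digits]].append(name)
    return dict(groups)


# Placeholder companies with real NAICS codes:
# 324110 petroleum refineries, 325110 petrochemical manufacturing,
# 325120 industrial gas manufacturing.
companies = [
    ("Refiner A", "324110"),
    ("Chemical B", "325110"),
    ("Gas C", "325120"),
]

broad = peer_groups(companies, digits=3)   # chemical makers share prefix 325
narrow = peer_groups(companies, digits=6)  # every company stands alone
```

A shorter prefix yields a larger but more mixed peer group, while a longer prefix yields closer comparables at the cost of fewer data points.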

Sounds like a lot of work. So is it worth it? Does it matter? Only if you want to accurately assess operational and reputational risk.