From 2013 - 2021, California census tracts with higher poverty rates demonstrate worse air quality

R Statistics

EDS 222: Statistics for Environmental Data Science - Final Project

Mia Forsline
2021-12-02

Research Question

From 2013 to 2021, does air quality (as measured by annual mean PM2.5 concentrations per census tract) vary with poverty rates (as measured by the percent of the population living below two times the federal poverty level per census tract) in California?

Introduction

In California, events like wildfires can greatly reduce air quality by releasing fine particles called particulate matter, or PM2.5 (Shi et al., 2019). PM2.5 refers to particles with diameters ≤ 2.5 µm, which are known to be hazardous for human health. PM2.5 is especially detrimental for human respiratory and cardiovascular health (Cleland et al., 2021). As California’s wildfires continue to worsen over time, it is becoming increasingly important to monitor air quality, PM2.5 concentrations, and their impacts on populations (Gupta et al., 2018).

However, the environmental burden of poor air quality is not shared equally. For example, the San Joaquin Valley’s economically disadvantaged and ethnically diverse communities breathe some of the most polluted air in the nation (Cisneros et al., 2017). As a result, vulnerable communities such as Mexican American immigrant farm workers and their families experience disproportionately high rates of asthma attacks, hospital admissions, and other medical issues (Schwartz & Pepper, 2009). This inequitable pattern repeats itself in other states (Qian & Wu, 2019), the United States overall (Tessum et al., 2021), and even other countries (Li et al., 2018).

While there are many possible ways to explore the inequity of air pollution in California, I specifically use annual mean PM2.5 as a measure of air quality and poverty rate to quantify socioeconomic disparities. I expect to find a significant relationship between these two variables.

Statistical Hypotheses

My null hypothesis (\(H_0\)) is that, in California, there is no relationship between annual mean PM2.5 concentrations per census tract and percent of the population living below twice the federal poverty line per census tract.

My alternative hypothesis (\(H_A\)) is that, in California, there is a relationship between annual mean PM2.5 concentrations per census tract and percent of the population living below twice the federal poverty line per census tract.
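
In terms of the simple linear regression used later in this analysis, these hypotheses can be written as a statement about the slope. The symbols \(\beta_0\), \(\beta_1\), \(\varepsilon_i\), and the census tract index \(i\) are notation I am assuming here for illustration:

\[
\text{PM2.5}_i = \beta_0 + \beta_1 \, \text{Poverty}_i + \varepsilon_i, \qquad H_0: \beta_1 = 0, \qquad H_A: \beta_1 \neq 0
\]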

Data Description and Collection

knitr::opts_chunk$set(echo = FALSE,
                      message = FALSE, 
                      warning = FALSE, 
                      include = TRUE)
#turn off scientific notation and select how many digits to round outputs to 
options("scipen" = 999, "digits" = 4)

#import necessary libraries 
library(tidyverse)
library(here)
library(gt)
library(xtable)
library(kableExtra)
#read in data
c1 <- read_csv(file = here("_posts", 
                           "2021-11-18-calenviroscreen", 
                           "CES", 
                           "CES_data", 
                           "ces1_2013.csv"))
c2 <- read_csv(file = here("_posts", 
                           "2021-11-18-calenviroscreen", 
                           "CES", 
                           "CES_data", 
                           "ces2_2014.csv"))
c3 <- read_csv(file = here("_posts", 
                           "2021-11-18-calenviroscreen", 
                           "CES", 
                           "CES_data", 
                           "ces3_2018.csv"))
c4 <- read_csv(file = here("_posts", 
                           "2021-11-18-calenviroscreen", 
                           "CES", 
                           "CES_data", 
                           "ces4_2021.csv"))
#clean data
##select and rename necessary columns 
##add Year column 
c1_clean <- c1 %>% 
  select(c("ZIP Code","Poverty", "PM2.5")) %>% 
  mutate(Year = "2013") %>% 
  dplyr::rename(ZIP = "ZIP Code") %>% 
  mutate(ZIP = as.numeric(ZIP))
c2_clean <- c2 %>% 
  select(c("Census Tract", "California County", "ZIP", "Longitude", "Latitude", "Poverty", "PM2.5")) %>% 
  mutate(Year = "2014",
         ZIP = as.numeric(ZIP))
c3_clean <- c3 %>% 
  select(c("Census Tract", "California County", "ZIP", "Longitude", "Latitude", "Poverty", "PM2.5")) %>% 
  mutate(Year = "2018",
         ZIP = as.numeric(ZIP))
c4_clean <- c4 %>% 
  select(c("Census Tract", "California County", "ZIP", "Longitude", "Latitude", "Poverty", "PM2.5")) %>% 
  mutate(Year = "2021",
         ZIP = as.numeric(ZIP))
#fill in missing column data for c1 dataset so all datasets have the same columns 
c1_fill <- full_join(x = c2_clean, y = c1_clean, 
                     by = "ZIP", 
                     suffix = c(".c2", ".c1")) %>%
  select(c("Census Tract", 
           "California County", 
           "ZIP", 
           "Longitude", 
           "Latitude", 
           "Poverty.c1", 
           "PM2.5.c1", 
           "Year.c1")) %>%
  dplyr::rename(Poverty = "Poverty.c1",
                PM2.5 = "PM2.5.c1",
                Year = "Year.c1")
#rbind datasets to create a single combined dataset with all 4 years of CalEnviroScreen data
joined <- rbind(c4_clean, c3_clean, c2_clean, c1_fill) %>% 
  mutate(Year = as.factor(Year),
         Year = factor(x = Year, levels = c("2013", "2014", "2018", "2021"))) %>% 
  drop_na(Year) 
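
One consequence of filling in the 2013 data this way is worth illustrating. In the code above, the 2013 file is keyed by ZIP code while the later files are keyed by census tract, so joining on ZIP copies each ZIP-level value to every tract that shares that ZIP. The sketch below uses made-up tract IDs and values, and base R's `merge()` in place of `full_join()`, to show the effect on a toy example:

```r
# Toy data (hypothetical values): two census tracts share ZIP 93701
tracts <- data.frame(
  tract = c(6019000100, 6019000200, 6019000300),
  ZIP   = c(93701, 93701, 93702)
)

# One ZIP-level record per ZIP, mimicking the structure of the 2013 file
zips <- data.frame(
  ZIP   = c(93701, 93702),
  PM2.5 = c(14.2, 13.8)
)

# Base R equivalent of a full join on ZIP
filled <- merge(tracts, zips, by = "ZIP", all = TRUE)
filled
```

Both tracts in ZIP 93701 receive the same PM2.5 value, so the 2013 rows are not independent tract-level measurements.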

I downloaded 2013 - 2021 CalEnviroScreen (CES) data from the California Office of Environmental Health Hazard Assessment (OEHHA) and the California Open Data Portal:

Each data set contains columns of environmental pollution burden indicators, including PM2.5, and population characteristics, including rates of poverty. Each census tract in California is represented as a row and assigned a value per environmental indicator and population characteristic. Note that because measurement technology developed over time, earlier data have different sample sizes and sampling techniques than newer data, which complicates comparisons across time.

Out of the myriad components of the CES data, I am interested in:

1. PM2.5

The annual mean concentration of PM2.5 (µg/m3) is a weighted average of measured monitor concentrations and satellite observations over 3 years to account for uneven sampling frequency. For example, the CES 1.1 report used data from 2007 - 2009 while the CES 4.0 report used 2015 - 2017. All reports used data from the California Air Resources Board’s Air Monitoring Network (AMN), while CES 3.0 and CES 4.0 also incorporated Satellite Remote Sensing Data.

Data were more likely to be high resolution around certain cities or localized areas, and not all cities had air monitoring stations. Locales with little to no data were either omitted or estimated using nearby locations’ data. For example, in CES 1.1, census tracts with centers > 50km away from the nearest air monitor were omitted. In CES 4.0, missing data was estimated using regression relationships with nearby sites.

For CES 1.1 - 3.0, the quarterly mean PM2.5 concentrations were estimated using ordinary kriging. For CES 4.0, overall PM2.5 annual mean concentrations were estimated for the center of each 1km x 1km grid cell using both the monitoring and satellite data in a weighted average. An inverse-distance weighting method was used, so grid cells close to monitors relied more heavily on monitor estimates while grid cells further from monitors relied more heavily on satellite data. Grid cells with monitors > 50km away relied solely on satellite data.
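
To make the idea of distance-dependent weighting concrete, here is a minimal sketch. The function name `blend_pm25()` and the linear distance-decay weight are my own illustrative assumptions, standing in for the inverse-distance weighting described above. This is not OEHHA's published formula. The sketch only reproduces the qualitative behavior: monitor data dominate near a monitor, and satellite data take over completely at 50 km and beyond.

```r
# Hypothetical blend of a monitor-based and a satellite-based PM2.5 estimate.
# The weight formula is an illustrative assumption, not OEHHA's method.
blend_pm25 <- function(monitor_est, satellite_est, dist_km, max_km = 50) {
  # Monitor weight decays linearly with distance and reaches 0 at max_km
  w_monitor <- pmax(0, 1 - dist_km / max_km)
  w_monitor * monitor_est + (1 - w_monitor) * satellite_est
}

# Grid cells 0, 25, 50, and 80 km from the nearest monitor
blend_pm25(monitor_est = 12, satellite_est = 8, dist_km = c(0, 25, 50, 80))
```

With these made-up inputs, the blended estimate moves from the monitor value (12) at 0 km to the satellite value (8) at 50 km and beyond.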

The quarterly estimates were then averaged to calculate annual means (Figure 1).

ggplot(data = joined, aes(x = PM2.5)) + 
  geom_histogram(aes(fill = Year), binwidth = 1) + 
  theme_classic() + 
  facet_wrap(~Year, ncol = 2) + 
  labs(x = expression(paste("Mean PM2.5 per census tract (µg/m"^3~")")),
       y = "Frequency") + 
  theme(legend.position = "none")

Figure 1: 2013 - 2021 mean PM2.5 in California was not normally distributed. In 2013 (n = 8,151), 2014 (n = 7,847), 2018 (n = 7,938), and 2021 (n = 7,960), the annual mean concentrations of PM2.5 (µg/m3) per census tract were 11.52, 10.01, 10.38, and 10.15 respectively. Data was sourced from CalEnviroScreen 1.1 - 4.0 (https://oehha.ca.gov/).

2. Poverty

The percent of the population living below two times the federal poverty level was calculated using a 5-year estimate to produce more reliable results for geographic areas with small populations. For example, the CES 1.1 report used a 5-year estimate from 2007 - 2011 data while the CES 4.0 report used a 5-year estimate from 2015 - 2019 data. Poverty data came from the American Community Survey.

CES defined poverty as living below twice the federal poverty level to account for California’s high cost of living relative to other states and because the method for determining the federal poverty threshold has not changed since the 1980s despite the cost of living increasing over time. The percent per census tract was calculated as the number of individuals living below 200% of the poverty level in the tract divided by the total number of individuals in the tract for whom poverty status was determined (Figure 2). Standard error was calculated to determine the reliability of the calculated poverty rate. Census tracts with unreliable estimates were assigned no value for poverty rate.
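
As a small worked example of this calculation, the sketch below uses made-up counts for three tracts. The variable names, the `reliable` flag, and the choice of denominator (people for whom poverty status is known) are assumptions for illustration only. The actual reliability rule based on standard error is not reproduced here.

```r
# Hypothetical counts for three census tracts
below_200 <- c(1200, 450, 3000)  # individuals below 200% of the poverty level
pop_total <- c(4000, 3000, 6000) # assumed denominator: people with known poverty status

# Percent of each tract's population living below twice the poverty level
poverty_rate <- 100 * below_200 / pop_total

# Hypothetical reliability flag; unreliable estimates get no value
reliable <- c(TRUE, FALSE, TRUE)
poverty_rate[!reliable] <- NA
poverty_rate
```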

ggplot(data = joined, aes(x = Poverty)) + 
  geom_histogram(aes(fill = Year), binwidth = 5) + 
  theme_classic() + 
  facet_wrap(~Year, ncol = 2) + 
  labs(x = "Poverty rate per census tract (%)", 
       y = "Frequency")

Figure 2: 2013 - 2021 poverty rates in California were not normally distributed. In 2013 (n = 8,151), 2014 (n = 7,847), 2018 (n = 7,938), and 2021 (n = 7,960), the mean percentages of the population per census tract living below two times the federal poverty level were 34.24%, 35.28%, 36.39%, and 31.34% respectively. Data was sourced from CalEnviroScreen 1.1 - 4.0 (https://oehha.ca.gov/).

Methods - Statistical Analysis Plan

To assess whether air quality varied with poverty rates in California from 2013 - 2021, I ran a separate linear regression of PM2.5 ~ Poverty for each year (i.e., 2013, 2014, 2018, and 2021). This analysis is appropriate for describing how air quality varies with poverty rates. Running separate regressions for different years can help determine how this relationship could be changing over time.
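
The per-year approach can be sketched compactly on simulated data. Everything in the block below is a stand-in: the data frame `toy`, the seed, and the simulated slope of 0.03 are assumptions for illustration, not the CalEnviroScreen data or my results. The sketch fits one regression per year and collects the slope, its standard error, and its p-value.

```r
set.seed(222)  # arbitrary seed for reproducibility

# Simulated stand-in for the combined data (values are made up)
n_per_year <- 200
toy <- data.frame(
  Year    = rep(c("2013", "2014", "2018", "2021"), each = n_per_year),
  Poverty = runif(4 * n_per_year, min = 0, max = 100)
)
toy$PM2.5 <- 9 + 0.03 * toy$Poverty + rnorm(nrow(toy), sd = 2)

# Fit PM2.5 ~ Poverty separately within each year
fits <- lapply(split(toy, toy$Year), function(d) lm(PM2.5 ~ Poverty, data = d))

# Collect the slope, its standard error, and its p-value for each year
slopes <- t(sapply(fits, function(m) {
  coef(summary(m))["Poverty", c("Estimate", "Std. Error", "Pr(>|t|)")]
}))
slopes
```

Each row of `slopes` corresponds to one year, which makes it easy to compare the estimated relationship across years.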

This method is limited by the fact that I am only including one independent variable (Poverty) in the model. In other words, this analysis is vulnerable to omitted variable bias because it is likely that there are many different factors in addition to poverty that influence air quality. Nevertheless, this is a solid starting point for unraveling those complex relationships.
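
A short simulation can illustrate how an omitted variable distorts a slope estimate. The confounder `z` and all coefficients below are hypothetical and chosen only for illustration. They do not describe the CalEnviroScreen data.

```r
set.seed(222)  # arbitrary seed for reproducibility
n <- 5000

# Hypothetical confounder (for example, proximity to a major roadway)
z <- rnorm(n)
# Poverty is correlated with the confounder
poverty <- 0.8 * z + rnorm(n)
# Simulated truth: both poverty and the confounder raise PM2.5
pm25 <- 1 + 0.5 * poverty + 1.0 * z + rnorm(n)

slope_short <- coef(lm(pm25 ~ poverty))["poverty"]      # omits z
slope_long  <- coef(lm(pm25 ~ poverty + z))["poverty"]  # includes z
c(short = unname(slope_short), long = unname(slope_long))
```

In this setup the true poverty slope is 0.5. The model that omits `z` is expected to return a slope near 0.5 + 1.0 × 0.8 / 1.64 ≈ 0.99, because part of the confounder's effect is attributed to poverty, while the model that includes `z` should recover a slope near 0.5.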

Results

#fit a simple linear regression of PM2.5 on poverty rate for each CES release
mod1 <- lm(PM2.5 ~ Poverty, data = c1)
sum1 <- summary(mod1)

mod2 <- lm(PM2.5 ~ Poverty, data = c2)
sum2 <- summary(mod2)

mod3 <- lm(PM2.5 ~ Poverty, data = c3)
sum3 <- summary(mod3)

mod4 <- lm(PM2.5 ~ Poverty, data = c4)
sum4 <- summary(mod4)

For all time periods, annual mean PM2.5 concentrations were significantly associated with the poverty rate (Figure 3). In 2013, PM2.5 increased by 0.0352 µg/m3 for each 1 percentage point increase in the poverty rate (p-value < 0.001, standard error = 0.0044). In 2014, PM2.5 increased by 0.0279 µg/m3 for each 1 percentage point increase in the poverty rate (p-value < 0.001, standard error = 0.0014). In 2018, PM2.5 increased by 0.0299 µg/m3 for each 1 percentage point increase in the poverty rate (p-value < 0.001, standard error = 0.0014). In 2021, PM2.5 increased by 0.0286 µg/m3 for each 1 percentage point increase in the poverty rate (p-value < 0.001, standard error = 0.0013). These results support my alternative hypothesis that mean PM2.5 and poverty in California are related, and the relationship is positive in every year.

Over time, the relationship between mean PM2.5 and poverty rate has remained fairly stable with the slope only varying from 0.0279 to 0.0352.
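
As a rough check on that stability, the largest and smallest reported slopes (2013 and 2014) can be compared with a z statistic. This is a back-of-the-envelope sketch under two assumptions that I have not verified: the uncertainties reported with each slope are standard errors, and the two yearly estimates are independent.

```r
# Reported 2013 and 2014 slopes and their reported uncertainties
b_2013 <- 0.0352; se_2013 <- 0.0044
b_2014 <- 0.0279; se_2014 <- 0.0014

# z statistic for the difference, assuming independent estimates
z <- (b_2013 - b_2014) / sqrt(se_2013^2 + se_2014^2)
# Two-sided p-value from the standard normal distribution
p <- 2 * pnorm(-abs(z))
c(z = z, p = p)
```

Under these assumptions, z is about 1.58, which falls short of the conventional 1.96 cutoff. The difference between the 2013 and 2014 slopes would therefore not be statistically significant at the 0.05 level.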

joined <- joined %>% 
  drop_na(Poverty, PM2.5)

ggplot(data = joined, aes(x = Poverty, y = PM2.5)) +
  geom_point(aes(color = Year), alpha = 0.05) + 
  geom_smooth(method='lm', 
              formula= y~x,
              size=1, 
              color = "black") + 
  theme_classic()+ 
  labs(x = "Poverty rate (%)",
       y = expression(paste
                       ("Mean PM2.5 (µg/m"^3~")"))) +
  facet_wrap(.~Year, ncol = 2) + 
  theme(legend.position = "none")

Figure 3: Air quality is significantly associated with poverty in California. In 2013 (n = 8,151), 2014 (n = 7,847), 2018 (n = 7,938), and 2021 (n = 7,960), as poverty rates increase in California, mean PM2.5 increases and air quality deteriorates (p-value < 0.001).

Conclusion

As expected, I found a statistically significant relationship between air quality and poverty rates in California during 2013, 2014, 2018, and 2021. For all four years, annual mean concentrations of PM2.5 (µg/m3) increased as the percent of people living below twice the federal poverty level increased (Figure 3). In other words, air quality was on average lower in census tracts with higher poverty rates. These findings supported my hypothesis and corroborated prior research that has identified PM2.5 disparities based on socioeconomic factors in California (Mousavi et al., 2021). This analysis also emphasizes the importance of an environmental justice lens when investigating issues such as air quality.

Future Directions

While my analysis focused on four specific years of comprehensive CalEnviroScreen data, it would be interesting to extend the time frame to before 2013, the year California’s cap-and-trade program was initiated. There is evidence that, in the program’s first years, overall greenhouse gas emissions fell in California while socioeconomically disadvantaged communities actually experienced emission increases (Cushing et al., 2018).

GitHub

The full code can be accessed here.

References

Cisneros, R., Brown, P., Cameron, L., Gaab, E., Gonzalez, M., Ramondt, S., Veloz, D., Song, A., & Schweizer, D. (2017). Understanding Public Views about Air Quality and Air Pollution Sources in the San Joaquin Valley, California. Journal of Environmental and Public Health, 2017, e4535142. https://doi.org/10.1155/2017/4535142
Cleland, S. E., Serre, M. L., Rappold, A. G., & West, J. J. (2021). Estimating the Acute Health Impacts of Fire-Originated PM2.5 Exposure During the 2017 California Wildfires: Sensitivity to Choices of Inputs. GeoHealth, 5(7), e2021GH000414. https://doi.org/10.1029/2021GH000414
Cushing, L., Blaustein-Rejto, D., Wander, M., Pastor, M., Sadd, J., Zhu, A., & Morello-Frosch, R. (2018). Carbon trading, co-pollutants, and environmental equity: Evidence from California’s cap-and-trade program (2011–2015). PLOS Medicine, 15(7), e1002604. https://doi.org/10.1371/journal.pmed.1002604
Gupta, P., Doraiswamy, P., Levy, R., Pikelnaya, O., Maibach, J., Feenstra, B., Polidori, A., Kiros, F., & Mills, K. C. (2018). Impact of California Fires on Local and Regional Air Quality: The Role of a Low-Cost Sensor Network and Satellite Observations. GeoHealth, 2(6), 172–181. https://doi.org/10.1029/2018GH000136
Li, V. O., Han, Y., Lam, J. C., Zhu, Y., & Bacon-Shone, J. (2018). Air pollution and environmental injustice: Are the socially deprived exposed to more PM2.5 pollution in Hong Kong? Environmental Science & Policy, 80, 53–61. https://doi.org/10.1016/j.envsci.2017.10.014
Mousavi, A., Yuan, Y., Masri, S., Barta, G., & Wu, J. (2021). Impact of 4th of July Fireworks on Spatiotemporal PM2.5 Concentrations in California Based on the PurpleAir Sensor Network: Implications for Policy and Environmental Justice. International Journal of Environmental Research and Public Health, 18(11), 5735. https://doi.org/10.3390/ijerph18115735
Qian, X., & Wu, Y. (2019). Assessment for health equity of PM2.5 exposure in bikeshare systems: The case of Divvy in Chicago. Journal of Transport & Health, 14, 100596. https://doi.org/10.1016/j.jth.2019.100596
Schwartz, N. A., & Pepper, D. (2009). Childhood Asthma, Air Quality, and Social Suffering Among Mexican Americans in California’s San Joaquin Valley: Nobody Talks to Us Here. Medical Anthropology, 28(4), 336–367. https://doi.org/10.1080/01459740903303944
Shi, H., Jiang, Z., Zhao, B., Li, Z., Chen, Y., Gu, Y., Jiang, J. H., Lee, M., Liou, K.-N., Neu, J. L., Payne, V. H., Su, H., Wang, Y., Witek, M., & Worden, J. (2019). Modeling Study of the Air Quality Impact of Record-Breaking Southern California Wildfires in December 2017. Journal of Geophysical Research: Atmospheres, 124(12), 6554–6570. https://doi.org/10.1029/2019JD030472
Tessum, C. W., Paolella, D. A., Chambliss, S. E., Apte, J. S., Hill, J. D., & Marshall, J. D. (2021). PM 2.5 polluters disproportionately and systemically affect people of color in the United States. Science Advances, 7(18), eabf4491. https://doi.org/10.1126/sciadv.abf4491

Citation

For attribution, please cite this work as

Forsline (2021, Dec. 2). Mia Forsline: From 2013 - 2021, California census tracts with higher poverty rates demonstrate worse air quality. Retrieved from miaforsline.github.io/

BibTeX citation

@misc{forsline_ces,
  author = {Forsline, Mia},
  title = {Mia Forsline: From 2013 - 2021, California census tracts with higher poverty rates demonstrate worse air quality},
  url = {miaforsline.github.io/},
  year = {2021}
}