EDA on the distribution of Trees in select cities of the Bay Area
BY: Soorya Narayan Satheesh
INITIAL HYPOTHESES OR QUESTIONS
MOTIVATION
In a rapidly warming climate, people are affected daily by the negative impacts of climate change. This is particularly true for people living in urban areas. In such areas, trees play a crucial role in aesthetics, pollution reduction and other ecological functions that are crucial for human survival. Understanding the distribution of trees by species and type in the cities of the Bay Area is therefore important for city governments planning urban greening efforts, so that climate change mitigation goals and biodiversity requirements can both be met.
HYPOTHESES
The following hypotheses were developed before the analysis as well as during the EDA:
Do the different cities of the Bay Area have a uniform distribution of tree species?
Is there diversity in the tree types grown in the Bay Area?
Is there variation in the average age of the trees in the Bay Area?
Is there any relation between the number of trees on a street and the diversity of species there?
ANALYSIS PLAN
I plan to use the following steps to carry out my analysis:
Compile the datasets of the seven Bay Area cities using the “union” feature in Tableau.
Compute the species-wise total number of trees in each city.
Regroup the tree types: trees are currently categorized into around twelve types based on various characteristics, and these will be grouped into four major categories plus an “Others” category.
Analyze the variation in DBH of the species across the cities.
Analyze the street-wise variation in the number of trees.
DATA SOURCE
DESCRIPTION
The data comes from the “Raw urban street tree inventory data for 49 California cities” project by McPherson et al., which was carried out with funding from the US Government from 2006 to 2013 (https://www.fs.usda.gov/rds/archive/catalog/RDS-2017-0010). This project has enumerated over 929,823 street trees in 49 Californian cities. Each city has its own dataset in the form of a CSV file. For the purpose of this EDA I focused on seven cities in the Bay Area: Berkeley, Burlingame, Hayward, Palo Alto, Redwood City, San Mateo and Walnut Creek. In addition, I prepared an Excel file (citydata.xlsx) with the area, population (in 2020) and population density of each city, using data from Wikipedia, to gain more insight.
TABLE 1: VARIABLES FROM ORIGINAL DATASET USED IN EDA
Code | Description
Idx | Code of an individual tree for the city
SpCode | Short form of the tree species
DBH | Breast height diameter
Street Name | Name of the street in the City
StreetNumber | Number of the city street
City | Name of the City
Botanical Name | Botanical Name of that tree species
Common Name | Common Name of the tree
Tree Type | Special code that categorizes different tree species into one of twelve types

Code | Description
City | Name of the city from dataset
Area_mi2 | Area of the city in square miles
Population_2020 | Population of the city in 2020
Population_Density | Population density of the city
SOURCE(S)
The main dataset is publicly available for download from the website of the US Forest Service at https://www.fs.usda.gov/rds/archive/catalog/RDS-2017-0010. The supplementary information on the cities was collected from Wikipedia and compiled into an .xlsx file.
FORMAT
The dataset was downloaded from the website in .csv format, and the city information is stored in an .xlsx file.
TRANSFORMATIONS
The main transformations I performed were in the creation of groups:
Combined the datasets of the seven Bay Area cities (Berkeley, Burlingame, Hayward, Palo Alto, Redwood City, San Mateo and Walnut Creek) using the union feature in Tableau.
Linked the tree inventory datasets with the Excel file on city information.
Grouped the original twelve tree types into four major categories (Broadleaved Deciduous, Broadleaved Evergreen, Conifer Evergreen and Palm Evergreen) plus an Others category.
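The three transformations above were done in Tableau. As a rough illustration only, the same steps can be sketched in plain Python. Everything in the sample below is an assumption made for the sketch: the two miniature CSV strings, the species and tree type codes, the placeholder city figures, and the rule that the first two letters of a tree type code identify its group. The real files should be checked before relying on any of it.

```python
import csv
import io

# Hypothetical miniature stand-ins for two of the per-city CSV files.
# Column names follow Table 1 loosely; the codes and values are made up.
BERKELEY_CSV = """Idx,SpCode,DBH,City,TreeType
1,MAGR,14.0,Berkeley,BEM
2,PLAC,20.5,Berkeley,BDL
"""
PALO_ALTO_CSV = """Idx,SpCode,DBH,City,TreeType
1,LIST,18.0,Palo Alto,BDL
2,PHCA,22.0,Palo Alto,PEM
"""

# Placeholder values standing in for citydata.xlsx (not real figures).
CITY_INFO = {
    "Berkeley": {"Area_mi2": 1.0, "Population_2020": 1000},
    "Palo Alto": {"Area_mi2": 2.0, "Population_2020": 2000},
}

# Assumed rule: the first two letters of the type code give the group.
GROUPS = {
    "BD": "Broadleaved Deciduous",
    "BE": "Broadleaved Evergreen",
    "CE": "Conifer Evergreen",
    "PE": "Palm Evergreen",
}


def tree_group(type_code):
    """Map a detailed tree type code to one of the four groups, else Others."""
    return GROUPS.get(type_code[:2].upper(), "Others")


def union(*csv_texts):
    """Stack the rows of several CSV texts that share the same columns."""
    rows = []
    for text in csv_texts:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


def link_city_info(rows, city_info):
    """Attach city-level columns and the grouped tree type to every tree row."""
    linked = []
    for row in rows:
        merged = dict(row)
        merged.update(city_info.get(row["City"], {}))
        merged["TypeGroup"] = tree_group(row["TreeType"])
        linked.append(merged)
    return linked


trees = link_city_info(union(BERKELEY_CSV, PALO_ALTO_CSV), CITY_INFO)
for tree in trees:
    print(tree["City"], tree["SpCode"], tree["TypeGroup"], tree["Population_2020"])
```

Keeping the union, the link and the regrouping as separate functions mirrors the order of the steps in the list, so each one can be checked on its own against the Tableau result.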
EXPLORATION
EXPLORING HYPOTHESIS 1: Do the different cities of the Bay Area have a uniform distribution of tree species?
Urban trees play a key role in the health and aesthetics of a city. The space available for trees to grow is also linked to the area of the city, its population and its population density. Having a wide diversity of tree species is crucial as well. So I first tried to understand how many trees there are in the different cities, along with the key species.
Figure 1-Tree numbers analysis
From the study above I realized that there is a big disparity in the number of trees and the species of trees present in the different cities. We can see that Palo Alto has the largest number of trees while Walnut Creek has the smallest. We can also see that while the most common tree is Magnolia grandiflora with 10,265 trees, at the bottom of the ranking there are several species with just one representative.
Subsequently I decided to study whether the population or population density of a city has any impact on the number of tree species (Figure below). First, at the bottom right corner, I tabulated the city-wise number of species. We can see that Palo Alto has the highest number of species while Walnut Creek has the lowest. I then plotted the number of tree species against the population (in 2020) of the seven cities. This graph is visible on the top right. Here we can observe that the number of species does not change much as population increases, with Palo Alto and Walnut Creek being the outliers. I then repeated the exercise using population density, and the result was similar.
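The counts behind these two figures (trees per city, the most common species, and the number of distinct species per city set against population) can be reproduced outside Tableau. The sketch below is only an illustration: the records, the species codes and the population figures are made-up placeholders, not values from the inventory.

```python
from collections import Counter, defaultdict

# Hypothetical (city, species code) records, one per tree.
RECORDS = [
    ("Palo Alto", "MAGR"), ("Palo Alto", "LIST"), ("Palo Alto", "PLAC"),
    ("Berkeley", "MAGR"), ("Berkeley", "MAGR"),
    ("Walnut Creek", "PLAC"),
]

# Placeholder populations (not the real 2020 figures).
POPULATION = {"Palo Alto": 3000, "Berkeley": 5000, "Walnut Creek": 2000}

# Number of trees per city and per species.
trees_per_city = Counter(city for city, _ in RECORDS)
species_overall = Counter(species for _, species in RECORDS)

# Number of distinct species per city.
species_sets = defaultdict(set)
for city, species in RECORDS:
    species_sets[city].add(species)
species_per_city = {city: len(found) for city, found in species_sets.items()}

# Pair each city's species count with its population, ordered by population,
# which is the same pairing the scatter plot shows.
richness_vs_population = sorted(
    (POPULATION[city], count, city) for city, count in species_per_city.items()
)

print(trees_per_city.most_common())
print(species_overall.most_common(1))
print(richness_vs_population)
```

Swapping POPULATION for a dictionary of population densities gives the second comparison described above.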
Figure 2-Tree species distribution analysis
EXPLORING HYPOTHESIS 2: Is there diversity in the tree types grown in the Bay area?
Another key parameter is that trees can be categorized by type. Here we use four main types (Broadleaved Deciduous, Broadleaved Evergreen, Conifer Evergreen and Palm Evergreen) plus an Others category. Different types of trees naturally grow in areas suited to their particular climatic requirements. Since we are evaluating urban trees, we also have to account for human intervention in their introduction and propagation. The distribution of tree types we see today is therefore also the product of decisions taken by city planners and homeowners decades ago.
Figure 3-Tree type distribution
In this analysis (Figure above), I wanted to quantify the city-wise distribution of the different types of trees. It gave some interesting insights. By sheer numbers, the Broadleaved Deciduous type dwarfs every other category with 24,887 trees, while Palm Evergreen trees form the smallest group with just 724 trees. Among cities, Berkeley has the largest number of Broadleaved Deciduous trees, while Palo Alto, Hayward and San Mateo lead for Broadleaved Evergreen, Conifer Evergreen and Palm Evergreen respectively.
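A city-by-type table like the one in this figure is a simple cross-tabulation. The following sketch shows one way to build it and to look up the leading city for a type. The records are hypothetical and chosen only to keep the example small; they do not reproduce the counts reported above.

```python
from collections import Counter

# Hypothetical (city, tree type group) records, one per tree.
RECORDS = [
    ("Berkeley", "Broadleaved Deciduous"),
    ("Berkeley", "Broadleaved Deciduous"),
    ("Berkeley", "Broadleaved Deciduous"),
    ("Palo Alto", "Broadleaved Deciduous"),
    ("Palo Alto", "Broadleaved Evergreen"),
    ("Palo Alto", "Broadleaved Evergreen"),
    ("San Mateo", "Palm Evergreen"),
]

# Cross-tabulation: number of trees for every (city, type) pair.
crosstab = Counter(RECORDS)

# Totals per type across all cities.
totals_by_type = Counter(tree_type for _, tree_type in RECORDS)


def top_city(tree_type):
    """Return the city with the most trees of the given type, or None."""
    candidates = {
        city: count for (city, kind), count in crosstab.items() if kind == tree_type
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


largest_type = totals_by_type.most_common(1)[0][0]
smallest_type = min(totals_by_type, key=totals_by_type.get)
print(largest_type, smallest_type, top_city("Broadleaved Deciduous"))
```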
EXPLORING HYPOTHESIS 3: Is there variation in the average age of the trees in the Bay area?
In forestry, a key parameter of a tree is its diameter measured at breast height (DBH). For every species one can estimate the height of a tree from its DBH by referring to the local tree tables prepared through years of observation and research by the US Forest Service. So, in the absence of a height parameter for each tree in this dataset, we will use the DBH to assess the current state of growth of the different trees.
In the dashboard (Figure below) I used two bar graphs faceted by tree type to understand the distribution of DBH. In the first one, we can observe that the average DBH differs across cities and types of trees. For Broadleaved Deciduous trees the average DBH increases from 1.895 ft for Berkeley to 2.574 ft for Palo Alto. This suggests that the current trees of this type in Berkeley are, on average, younger than similar trees in Palo Alto, especially since both cities fall in the same bio-climatic region and so should have similar growth characteristics. The same reasoning can be extended to the other types of trees across the cities.
Similarly, I computed the median DBH for each tree type in each city and found that it is mostly the same across cities for a given type of tree.
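The contrast between differing means and similar medians is what one would expect if a few very large trees pull a city's average up, since the mean is sensitive to extreme values and the median is not. The sketch below computes both statistics per city and tree type. The DBH values are made up to show that effect and are in whatever unit the inventory uses.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (city, tree type group, DBH) records, one per tree.
RECORDS = [
    ("Berkeley", "Broadleaved Deciduous", 6.0),
    ("Berkeley", "Broadleaved Deciduous", 8.0),
    ("Berkeley", "Broadleaved Deciduous", 10.0),
    ("Berkeley", "Broadleaved Deciduous", 40.0),  # one very large tree
    ("Palo Alto", "Broadleaved Deciduous", 8.0),
    ("Palo Alto", "Broadleaved Deciduous", 9.0),
    ("Palo Alto", "Broadleaved Deciduous", 10.0),
    ("Palo Alto", "Broadleaved Deciduous", 13.0),
]

# Collect the DBH values for every (city, type) combination.
dbh_by_group = defaultdict(list)
for city, tree_type, dbh in RECORDS:
    dbh_by_group[(city, tree_type)].append(dbh)

# Summarize each combination with its count, mean and median.
summary = {
    key: {"n": len(values), "mean": mean(values), "median": median(values)}
    for key, values in dbh_by_group.items()
}

for (city, tree_type), stats in summary.items():
    print(city, tree_type, stats)
```

In this toy sample the two medians are close (9.0 and 9.5) while the means differ (16.0 and 10.0), which is the pattern to look for before reading a difference in averages as a difference in age.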
Figure 4-Tree DBH distribution
EXPLORING HYPOTHESIS 4: Is there any relation between the number of trees in a street and the diversity of species there?
When I first considered this hypothesis, I thought it intuitive that an increase in the number of trees would come with an increase in species diversity. I wanted to study this at the street level across cities.
So first I checked the street-wise number of trees across the seven cities. This can be seen in the chart on the top right corner. Here we can see that the average number of trees per street is 55, while the maximum is 3,431 for Farm Hill Boulevard in Redwood City.
Next, I analyzed the number of tree species found on each street across the seven cities. This can be seen in the chart on the top left corner of the dashboard below. Here we can see that Burlingame Avenue has the highest number, with 134 species.
Figure 5-Street wise tree distribution
After this analysis I wanted to find out which city has the highest street-wise diversity of trees relative to the number of trees planted. I plotted the number of tree species on each street against the number of trees on that street, faceted by the city variable. The trend line for each city is an indicator of how diverse its streets are in terms of tree species: the steeper the slope, the higher the diversity, and vice versa.
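The trend-line comparison can be sketched as an ordinary least-squares slope of species count against tree count, fitted separately for each city. The example below assumes a linear trend line (the type used in Tableau is not stated here) and uses hypothetical cities, streets and species codes. The slope is written out by hand so the calculation is visible.

```python
from collections import defaultdict


def make_street(city, street, species_counts):
    """Expand {species: number of trees} into one record per tree."""
    return [
        (city, street, species)
        for species, count in species_counts.items()
        for _ in range(count)
    ]


# Hypothetical (city, street, species code) records, one per tree.
RECORDS = (
    make_street("City A", "First St", {"MAGR": 2})
    + make_street("City A", "Second St", {"MAGR": 2, "PLAC": 2})
    + make_street("City A", "Third St", {"MAGR": 2, "PLAC": 2, "LIST": 2})
    + make_street("City B", "Oak Ave", {"MAGR": 1, "PLAC": 1})
    + make_street("City B", "Elm Ave", {"MAGR": 1, "PLAC": 1, "LIST": 1, "QUAG": 1})
)

# Street-level counts: number of trees and set of species per street.
tree_count = defaultdict(int)
species_seen = defaultdict(set)
for city, street, species in RECORDS:
    tree_count[(city, street)] += 1
    species_seen[(city, street)].add(species)


def ls_slope(xs, ys):
    """Least-squares slope of ys on xs; None if the slope is undefined."""
    n = len(xs)
    if n == 0:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    denominator = sum((x - mean_x) ** 2 for x in xs)
    if denominator == 0:
        return None
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return numerator / denominator


# One (trees, species) point per street, grouped by city.
points = defaultdict(list)
for (city, street), n_trees in tree_count.items():
    points[city].append((n_trees, len(species_seen[(city, street)])))

# Slope per city: additional species per additional tree on a street.
slopes = {
    city: ls_slope([x for x, _ in pts], [y for _, y in pts])
    for city, pts in points.items()
}
print(slopes)
```

Here City B gains one species per additional tree and City A one species per two additional trees, so City B would be read as having the more diverse streets. A slope fitted this way can be dominated by a few very long streets, so it is worth checking the scatter plot alongside it.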
Figure 6-Finding cities with most tree diverse streets
CONCLUSION
This EDA project helped me understand the finer points of selecting a dataset and working with a wide variety of visualizations. It also helped change several presumptions I had about the nature of trees and the environment in the Bay Area. When we talk of trees, we think of the forests of Yosemite, but we forget that urban trees also play a crucial role in helping the environment as well as the people living near them. Using the insights gleaned from this analysis, city planners could plan ahead to increase the diversity of tree species in their cities and thereby increase urban biodiversity.
REFERENCES
McPherson, E. Gregory; van Doorn, Natalie S.; de Goede, John. 2017. Raw urban street tree inventory data for 49 California cities. Fort Collins, CO: Forest Service Research Data Archive. https://doi.org/10.2737/RDS-2017-0010
Wikimedia Foundation. (2023, February 23). Berkeley, California. Wikipedia. Retrieved March 8, 2023, from https://en.wikipedia.org/wiki/Berkeley,_California
Wikimedia Foundation. (2023, February 23). Redwood City, California. Wikipedia. Retrieved March 8, 2023, from https://en.wikipedia.org/wiki/Redwood_City,_California
Wikimedia Foundation. (2023, February 23). Burlingame, California. Wikipedia. Retrieved March 8, 2023, from https://en.wikipedia.org/wiki/Burlingame,_California
Wikimedia Foundation. (2023, February 23). Walnut Creek, California. Wikipedia. Retrieved March 8, 2023, from https://en.wikipedia.org/wiki/Walnut_Creek,_California
Wikimedia Foundation. (2023, February 23). Palo Alto, California. Wikipedia. Retrieved March 8, 2023, from https://en.wikipedia.org/wiki/Palo_Alto,_California
Wikimedia Foundation. (2023, February 23). Hayward, California. Wikipedia. Retrieved March 8, 2023, from https://en.wikipedia.org/wiki/Hayward,_California
Wikimedia Foundation. (2023, February 23). San Mateo, California. Wikipedia. Retrieved March 8, 2023, from https://en.wikipedia.org/wiki/San_Mateo,_California