Effects of greenery at different heights in neighbourhood streetscapes on leisure walking: a cross-sectional study using machine learning of streetscape images in Sendai City, Japan | International Journal of Health Geographics

Study area

To examine the urban environment, this study focused on participants residing in urban areas, referred to as Densely Inhabited Districts (DIDs), in Sendai City, Miyagi Prefecture, Japan (Fig. 1). Sendai City is the regional capital in the Northeast region of Japan and consists of five administrative districts: Aoba-ku, Izumi-ku, Miyagino-ku, Taihaku-ku, and Wakabayashi-ku. According to the 2020 Population Census, it has a total population of 1,096,704. Since the period of high economic growth in the 1960s, the residential areas initially developed on plateau areas have spread to hilly suburbs [27]. The city centre of Sendai is surrounded by nature, such as the Hirose River and the Aobayama hills, and the area is adorned with abundant greenery, including tree-lined streets [28].

Fig. 1 Study area location: (a) the location of Sendai City in Miyagi Prefecture, Japan; (b) the extent of the study area in Sendai City


This study used self-reported questionnaires from the baseline survey of the Tohoku Medical Megabank Community-Based Cohort Study (TMM CommCohort Study). The TMM CommCohort Study recruited people aged 20 years and over who were registered in the basic resident register of all municipalities in Miyagi and Iwate Prefectures at the time of enrolment. The survey was performed at specific municipal health check sites, the Community Support Centre in Tohoku Medical Megabank Organization (ToMMo), and the satellite in Iwate Medical University Iwate Tohoku Medical Megabank Organization (IMM) [29]. The baseline survey was conducted between 2013 and 2015. Of the 17,688 participants in the TMM CommCohort baseline survey who took the type 2 survey (including those who participated in the type 2 survey after the type 1 survey), 16,928 consented and participated as of 19 August 2021. Of those, the 4450 participants who resided in the DIDs in Sendai City were included in the analysis. The human subjects’ committee of the Tohoku Medical Megabank Organization Institution, Tohoku University approved the survey protocol (approval no. 2019-4-065 and 2019-4-032). Informed consent was obtained from all participants, with assurance of voluntary participation and the right to withdraw at any time. The analysis was designed and conducted in accordance with the applicable guidelines and regulations for the use and analysis of the TMM CommCohort Study.


Leisure walking time per week (min), based on the validated self-reported questionnaires of the TMM CommCohort Study, was used as an indicator of walking behaviour [30]. In the questionnaires, the participants reported the frequency and average duration of leisure walking, from which the leisure walking time per week was calculated for each participant according to the following formula [31]: leisure walking time (min/week) = duration (h) × 60 (min/h) × frequency (times/day) × 7 (days/week). Participants reported their average frequency over the course of one year to eliminate seasonal differences. The average duration categories (assigned average hours per activity) were: < 30 min (0.25), 30 min to 1 h (0.75), 1 to < 2 h (1.5), 2 to < 3 h (2.5), 3 to < 4 h (3.5), and ≥ 4 h (4.0). The frequency categories (assigned average frequency per day) for leisure walking in the questionnaire were: almost none (0), less than once per month (0.5/30), one to three times per month (2/30), one to two times per week (1.5/7), three to four times per week (3.5/7 = 0.5), and almost every day (7/7 = 1.0).
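As an illustration, the formula and the category values listed above can be written as a short Python sketch. The dictionary keys and the function name `leisure_walking_min_per_week` are labels chosen here for illustration and are not the questionnaire's own coding.

```python
# Assigned average hours per activity for each duration category
# (values taken from the text; the keys are illustrative labels).
DURATION_HOURS = {
    "<30 min": 0.25,
    "30 min to 1 h": 0.75,
    "1 to <2 h": 1.5,
    "2 to <3 h": 2.5,
    "3 to <4 h": 3.5,
    ">=4 h": 4.0,
}

# Assigned average frequency per day for each frequency category
# (values taken from the text; the keys are illustrative labels).
FREQUENCY_PER_DAY = {
    "almost none": 0.0,
    "less than once per month": 0.5 / 30,
    "one to three times per month": 2 / 30,
    "one to two times per week": 1.5 / 7,
    "three to four times per week": 3.5 / 7,
    "almost every day": 7 / 7,
}


def leisure_walking_min_per_week(duration_category, frequency_category):
    """Return leisure walking time in minutes per week.

    Implements: duration (h) x 60 (min/h) x frequency (times/day) x 7 (days/week).
    """
    hours = DURATION_HOURS[duration_category]
    per_day = FREQUENCY_PER_DAY[frequency_category]
    return hours * 60 * per_day * 7
```

For example, a participant reporting walks of 30 min to 1 h, three to four times per week, would be assigned 0.75 × 60 × 0.5 × 7 = 157.5 min/week.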

Streetscape indices of greenery

A greenery index was developed using GSV images. We used green visibility, which considers height from the viewer’s perspective. This facilitated a visual understanding of how overhead greenery is perceived by a pedestrian on the street.

(1) Obtaining GSV images

ArcMap 10.8.1 (ESRI Inc.) was used to set up sample sites for streetscape assessment at 50-m intervals on roads in DIDs and within 1 km of the edge of DIDs in Sendai City. The road data were obtained from ArcGIS GeoSuite Road Network (Esri Japan Inc.) as of 2020. Points on motorways (Sendai Nishi Road, Sendai-Tobu Road, Sendai-Nanbu Road, Tohoku Expressway, and Sanriku Expressway) were excluded. The final set contained 65,762 sample sites for landscape assessment. We obtained GSV images for each sample site using the following procedure. First, metadata were obtained for the closest GSV image taken within a 10-m radius of each sample site in each year from 2013 to 2015. The metadata include the PanoID, latitude, longitude, year and month of image acquisition, and direction of the image. The PanoID is a unique 22-character ID, consisting of numbers, letters, and symbols, assigned to each GSV image. Metadata were obtained for GSV images at 59,429 (90.64%) of the sites; for the remaining 6333 (9.36%) sites, metadata could not be acquired and the sites were treated as missing values. Second, GSV images (1664 px × 832 px) were downloaded based on PanoID. To account for seasonality, images taken during the greener months of April to September were preferentially targeted; for sample sites where only images taken in January to March or October to December were available, those images were used. Of the 59,429 images, 1121 (1.89%) were taken in spring (March to May), 39,895 (67.13%) in summer (June to August), 18,412 (30.98%) in autumn (September to November), and 1 (0.00%) in winter (January, February and December).
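The seasonal preference described above can be sketched as a simple selection rule over the candidate images of one sample site. This is a minimal sketch under stated assumptions: the field names (`pano_id`, `year`, `month`) and the function `choose_image` are hypothetical, and the text does not say how ties among several eligible images were broken, so the sketch simply takes the first candidate in input order.

```python
GREENER_MONTHS = range(4, 10)  # April to September, as stated in the text


def choose_image(candidates):
    """Pick one candidate image for a sample site, or None if there is none.

    `candidates` is a list of dicts with hypothetical keys
    'pano_id', 'year' and 'month' (one per available GSV image).
    Images from April to September are preferred; otherwise any
    available image is used. Tie-breaking by input order is an
    assumption of this sketch, not a rule from the study.
    """
    if not candidates:
        return None  # site treated as a missing value
    greener = [c for c in candidates if c["month"] in GREENER_MONTHS]
    pool = greener or candidates
    return pool[0]
```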

(2) Green extraction

A machine learning approach was implemented to identify green areas in the downloaded GSV images. We used DeepLab v3+ [32], a deep learning model for semantic segmentation, that is, for classifying each pixel of an image into a semantic class of objects.

Following Nagata et al. [15], this study used a model pre-trained on the Cityscapes Dataset [33] and classified each pixel of the GSV image into one of 19 classes. Each classified pixel was assigned the corresponding label. The Cityscapes Dataset defines 30 classes for annotation, organized into eight groups: flat, construction, nature, vehicle, sky, object, human, and void. Classes that are too rare are excluded from the dataset, leaving 19 classes for evaluation [33]. In this study, as per Xia et al. [34], pixels classified as vegetation were treated as green.

(3) Calculation of green visibility

GSV images acquired using equi-cylindrical projection were transformed into a sky map (orthographic projection method) following the method presented by Nishio and Ito [35] (Fig. 2). Green visibility was calculated as follows.

Fig. 2 Transformation of the cylindrical GSV panorama to fisheye image

Let \(W\) be the width of the GSV image, \((X, Y)\) the orthogonal coordinates of a point in the GSV image, \((r, \theta)\) the polar coordinates of the corresponding point in the sky map, and \(R\) the radius of the sky map. The following relationship then holds:

$$R = \frac{W}{2\pi}, \quad X = R \cdot \theta, \quad Y = \frac{R}{r} \cdot \sqrt{R^{2} - r^{2}}$$
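The relationship can be evaluated directly, as in the following minimal Python sketch. The function name `sky_map_to_panorama` is illustrative. The text does not state the origin or direction of the \(Y\) axis in the image; since the formula gives \(Y = 0\) at \(r = R\) (the rim of the sky map), the sketch treats \(Y\) as an offset from that line, which is an assumption.

```python
import math


def sky_map_to_panorama(r, theta, width):
    """Map a sky-map point (r, theta) to panorama coordinates (X, Y).

    Uses R = W / (2*pi), X = R * theta, Y = (R / r) * sqrt(R^2 - r^2).
    `theta` is in radians. Y is returned as given by the formula;
    converting it to a pixel row depends on the image convention,
    which is not specified here.
    """
    R = width / (2 * math.pi)
    if not 0 < r <= R:
        raise ValueError("r must satisfy 0 < r <= R")
    X = R * theta
    # max(..., 0.0) guards against tiny negative values from rounding at r == R.
    Y = (R / r) * math.sqrt(max(R * R - r * r, 0.0))
    return X, Y
```

In practice such a mapping would be applied to every pixel of the sky map to look up the corresponding panorama pixel; the point \(r = 0\) (the zenith) has no finite \(Y\) under this formula and is rejected by the sketch.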

We defined overall green visibility as the ratio of the number of pixels classified as vegetation class to the number of pixels in the whole sky part of the sky map obtained using the above equation.

The sky map was divided into four ranges (0°–22.5°; 22.5°–45.0°; 45.0°–67.5°; 67.5°–90.0°) according to the latitude of the virtual hemisphere, and the ratio of the number of pixels classified as vegetation class to the number of all pixels in each range was calculated (green visibility separated by latitude) (Fig. 3). Indices for higher latitude ranges represent greenery visible higher above the viewpoint location (90° corresponds to the zenith).
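The overall and latitude-specific ratios can be illustrated with a small pure-Python sketch. It assumes that the sky map is given as a square grid of booleans (`True` for vegetation pixels) centred on the zenith, and that, for an orthographic sky map of radius \(R\), the latitude of a pixel at distance \(r\) from the centre is \(\arccos(r/R)\). The function name, the grid representation, and the assignment of boundary values to the upper range are choices made for this sketch.

```python
import math

BAND_WIDTH_DEG = 22.5  # four latitude ranges between 0 and 90 degrees


def green_visibility_by_latitude(mask, radius):
    """Return (overall, [band0, band1, band2, band3]) green visibility.

    `mask` is a square list of lists; mask[i][j] is True where the pixel
    is vegetation. Pixels farther than `radius` from the grid centre lie
    outside the sky map and are ignored. Band 0 is 0-22.5 degrees
    (near the horizon) and band 3 is 67.5-90 degrees (near the zenith).
    """
    n = len(mask)
    centre = (n - 1) / 2
    green = [0] * 4
    total = [0] * 4
    for i, row in enumerate(mask):
        for j, is_green in enumerate(row):
            r = math.hypot(i - centre, j - centre)
            if r > radius:
                continue
            latitude = math.degrees(math.acos(r / radius))
            band = min(int(latitude // BAND_WIDTH_DEG), 3)
            total[band] += 1
            green[band] += bool(is_green)
    by_band = [g / t if t else float("nan") for g, t in zip(green, total)]
    overall = sum(green) / sum(total) if sum(total) else float("nan")
    return overall, by_band
```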

Fig. 3 Green visibility separated by latitude

(4) Neighbourhood indicators of street greenery

Following previous studies [36, 37], we defined the neighbourhood of a participant as the area within a 1000-m road network buffer around their residence. The mean green visibility on the streets within this area was computed for each participant. This study included only participants whose buffer contained at least 20 sample sites for which a GSV image could be obtained; thus, 4431 participants were included and 19 were excluded.
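The aggregation step can be sketched as follows, assuming that the sample sites falling inside each participant's network buffer have already been identified in a GIS. The function name and the use of `None` for sites without a GSV image are illustrative choices.

```python
from statistics import mean

MIN_SITES = 20  # minimum number of sites with a GSV image, as stated in the text


def neighbourhood_green_visibility(site_values):
    """Mean green visibility over the sample sites in one participant's buffer.

    `site_values` holds one green visibility value per sample site,
    with None where no GSV image could be obtained. Returns None when
    fewer than MIN_SITES values are available, in which case the
    participant would be excluded.
    """
    available = [v for v in site_values if v is not None]
    if len(available) < MIN_SITES:
        return None
    return mean(available)
```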

Statistical analysis

We conducted multilevel regression analysis (mixed-effect models) in R 4.1.0 using the ‘lme4’ package [38] to investigate the relationships between leisure walking behaviour and the streetscape indicators of greenery in the neighbourhood of a participant. We set the leisure walking hours per week as the dependent variable, and each of the five neighbourhood indicators of greenery (overall green visibility or one of the four latitude-specific green visibility indices) as the independent variable in the regression analysis. To ensure the robustness of the results, the models were fitted with a random intercept at the elementary school district level to account for possible clustering of walking hours due to unknown neighbourhood factors.

Model A included only age group (20s; 30s; 40s; 50s; 60s; \(\ge\) 70) and gender (male; female) as control variables. Model B added the following control variables to Model A: marital status (married; separated; widowed; single), education (primary or junior high school; high school; junior college, technical college or vocational school; university or graduate school; other), family size (number of members, including self), employment status (employed with income; unemployed), alcohol consumption (never; unable; current; former), degree of urbanization (calculated based on the population density of the neighbourhood), and the number of urban parks in the neighbourhood (as an indicator of recreational neighbourhood resources). Alcohol consumption was included as a control variable reflecting health awareness. Population data were obtained from the 2015 Population Census using the quarter grid square data (Statistics Bureau, 1996), which report the population for each grid cell of approximately 250 m side length. We calculated population density by allocating the population of each grid cell in proportion to the area of the cell that intersects the 1000-m network buffer around the residence, summing these allocations, and dividing by the area of the buffer. The data on urban parks were obtained from the digital national land information provided by the Ministry of Land, Infrastructure, Transport, and Tourism of Japan.

In a series of regression analyses, we conducted a complete-case analysis.
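The area-weighted population density used for the degree of urbanization in Model B can be illustrated with a minimal Python sketch. It assumes that the intersection areas between grid cells and the network buffer have already been computed in a GIS; the function name and the tuple layout are hypothetical.

```python
def buffer_population_density(cells, buffer_area_km2):
    """Area-weighted population density of one network buffer.

    `cells` is an iterable of (population, cell_area_km2, intersect_area_km2)
    tuples, one per grid cell touching the buffer. Each cell's population
    is allocated in proportion to the share of its area inside the buffer,
    and the allocated total is divided by the buffer area.
    """
    population_in_buffer = sum(
        population * (intersect_area / cell_area)
        for population, cell_area, intersect_area in cells
    )
    return population_in_buffer / buffer_area_km2
```

For example, a 0.5 km² buffer that fully contains one 0.0625 km² cell with 100 residents and half of another cell with 200 residents would be assigned (100 + 100) / 0.5 = 400 persons/km².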
