packages <- c('sf', 'tidyverse', 'tmap', 'httr', 'jsonlite', 'rvest',
'sp', 'ggpubr', 'corrplot', 'broom', 'olsrr', 'spdep',
'GWmodel', 'devtools', 'rgeos', 'lwgeom', 'maptools', 'rsample', 'Metrics', 'SpatialML')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p, repos = "http://cran.us.r-project.org")
}
library(p, character.only = T)
}Take Home Exercise 3: Predicting HDB Public Housing Resale Pricies using Geographically Weighted Methods
Background Information
In every country, housing is a crucial part of household wealth. Whether for a couple or an individual, owning a home has always been a dream or aspiration. Several things influence housing costs. Some of them have a worldwide scope, such the overall health of a nation’s economy or the level of inflation. Some may focus more on the properties themselves. You can further divide these characteristics into structural and geographic ones. Structural considerations are aspects of the property itself, such as its size, configuration, and tenure. Locational factors are elements that relate to the area around the properties, such as closeness to a daycare center, a public transportation stop, and a shopping center.
Standard predictive models for house resale prices were created utilizing the Ordinary Least Square (OLS) approach. The spatial autocorrelation and spatial variability present in geographic data sets like housing transactions, however, were not taken into account by this method. The OLS estimate of predictive housing resale pricing models may produce inaccurate, inconsistent, or ineffective findings due to the presence of spatial autocorrelation (Anselin 1998). With this restriction, Geographic Weighted Models were developed to calibrate house resale price prediction models.
The Objective
FOr this exercise, I am required to forecast HDB resale prices in Singapore for the months of January and February 2023 at the sub-market level (i.e. HDB 3-room, HDB 4-room, and HDB 5-room). The traditional OLS method and GWR approaches must be used to build the predictive models. I must also contrast the performance of the geographically weighted approaches and the traditional OLS method.
1. Datasets
Aspatial dataset:
- HDB Resale data: a list of HDB resale transacted prices in Singapore from Jan 2017 onwards. It is in csv format which can be downloaded from Data.gov.sg.
Geospatial dataset:
- MP14_SUBZONE_WEB_PL: a polygon feature data providing information of URA 2014 Master Plan Planning Subzone boundary data. It is in ESRI shapefile format. This data set was also downloaded from Data.gov.sg
Location factors with regards to the geographic coordinates:
Downloaded from Data.gov.sg.
Eldercare data is a list of eldercare in Singapore. It is in shapefile format.
Hawker Centre data is a list of hawker centres in Singapore. It is in geojson format.
Parks data is a list of parks in Singapore. It is in geojson format.
Supermarket data is a list of supermarkets in Singapore. It is in geojson format.
CHAS clinics data is a list of CHAS clinics in Singapore. It is in geojson format.
Childcare service data is a list of childcare services in Singapore. It is in geojson format.
Kindergartens data is a list of kindergartens in Singapore. It is in geojson format.
Downloaded from Datamall.lta.gov.sg.
MRT data is a list of MRT/LRT stations in Singapore with the station names and codes. It is in shapefile format.
Bus stops data is a list of bus stops in Singapore. It is in shapefile format.
Location factors without geographic coordinates:
Downloaded from Data.gov.sg.
- Primary school data is extracted from the list on General information of schools from data.gov portal. It is in csv format.
Retrieved/Scraped from other sources
CBD coordinates obtained from Google.
Shopping malls data is a list of Shopping malls in Singapore obtained from Wikipedia.
Good primary schools is a list of primary schools that are ordered in ranking in terms of popularity and this can be found at Local Salary Forum.
2. Loading the R packages
The following code chunk will perform the following task. A list called packages will be created and will consists of all the R packages required to accomplish this exercise. There will be a check to see if R packages on package have been installed in R and if not, they will be installed. After which when all the R packages have been installed, the packages will then be loaded
library(xaringanExtra)More on the packages used:
sf: used for importing, managing, and processing geospatial data
tidyverse: used for importing, wrangling and visualising data. It consists of a family of R packages, such as:
readr for importing csv data,
readxl for importing Excel worksheet,
tidyr for manipulating data,
dplyr for transforming data, and
ggplot2 for visualising data
tmap: provides functions for plotting cartographic quality static inpoint patterns maps or interactive maps by using leaflet API.
httr: Useful tools for working with HTTP organised by HTTP verbs (GET(), POST(), etc). Configuration functions make it easy to control additional request components (authenticate(), add_headers() and so on).
- In this analysis, it will be used to send GET requests to OneMapAPI SG to retrieve the coordinates of addresses.
jsonlite: A simple and robust JSON parser and generator for R. It offers simple, flexible tools for working with JSON in R, and is particularly powerful for building pipelines and interacting with a web API.
rvest: A new package that makes it easy to scrape (or harvest) data from html web pages, inspired by libraries like beautiful soup.
- In this analysis, it will be used to scrape data for shopping malls and good primary schools
sp: provides classes and methods for dealing with spatial data in R.
ggpubr: provides some easy-to-use functions for creating and customizing ggplot2 based publication ready plots
- In this analysis, it will be used to arrange multiple ggplots.
corrplot: For Multivariate data visualisation and analysis
broom: Takes the messy output of built-in functions in R, such as lm, nls, or t.test, and turns them into tidy tibble.
- In this analysis, functions like tidy and glance will be used to construct a tibble / summmary of the model which is easier to look at.
oslrr: Used to build OLD and performing diagnostic tests.
spdep: For spatial dependence statistics.
GWmodel: Calibrate geographical weighted family of modes.
devtools: used for installing any R packages which is not available in RCRAN. In this exercise, I will be installing using devtools to install the package xaringanExtra which is still under development stage.
xaringanExtra: is an enhancement of xaringan package. As it is still under development stage, we can still install the current version using install_github function of devtools. This package will be used to add Panelsets to contain both the r code chunk and results whereever applicable.
3. Importing of Aspatial Data & Wrangling
read_csv() function of readr package will be used to import resale-flat-prices into R as a tibble data frame called resale. glimpse() function of dplyr package is used to display the data structure
resale <- read_csv("data/aspatial/resale-flat-prices.csv")glimpse(resale)Rows: 148,373
Columns: 11
$ month <chr> "2017-01", "2017-01", "2017-01", "2017-01", "2017-…
$ town <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG MO …
$ flat_type <chr> "2 ROOM", "3 ROOM", "3 ROOM", "3 ROOM", "3 ROOM", …
$ block <chr> "406", "108", "602", "465", "601", "150", "447", "…
$ street_name <chr> "ANG MO KIO AVE 10", "ANG MO KIO AVE 4", "ANG MO K…
$ storey_range <chr> "10 TO 12", "01 TO 03", "01 TO 03", "04 TO 06", "0…
$ floor_area_sqm <dbl> 44, 67, 67, 68, 67, 68, 68, 67, 68, 67, 68, 67, 67…
$ flat_model <chr> "Improved", "New Generation", "New Generation", "N…
$ lease_commence_date <dbl> 1979, 1978, 1980, 1980, 1980, 1981, 1979, 1976, 19…
$ remaining_lease <chr> "61 years 04 months", "60 years 07 months", "62 ye…
$ resale_price <dbl> 232000, 250000, 262000, 265000, 265000, 275000, 28…
When we load in the dataset for the first time, we can see that:
The dataset contains 11 columns with 148,373 rows. The columns that are present in the data are: month, town, flat_type, block, street_name, storey_range, floor_area_sqm, flat_model, lease_commence_date, remaining_lease, resale_price.
For this take home exercise 3, we are allowed the option to choose to perform our analysis between either 3, 4 or 5 room flat transactions. Therefore, I will be selecting the 3 room flat transactions during the transaction period from 1st January 2021 to 31st December 2022. Test data should be included for January and February 2023 resale prices.
3.1 Filtering HDB Resale Data
filter() function of dplyr package will be used to select the desired flat_type and dates which will be stored in rs_subset.
rs_subset <- filter(resale,flat_type == "3 ROOM") %>%
filter(month >= "2021-01" & month <= "2023-02")unique() function of R package will be used to check whether flat_type and month have been extracted successfully.
unique(rs_subset$month) [1] "2021-01" "2021-02" "2021-03" "2021-04" "2021-05" "2021-06" "2021-07"
[8] "2021-08" "2021-09" "2021-10" "2021-11" "2021-12" "2022-01" "2022-02"
[15] "2022-03" "2022-04" "2022-05" "2022-06" "2022-07" "2022-08" "2022-09"
[22] "2022-10" "2022-11" "2022-12" "2023-01" "2023-02"
unique(rs_subset$flat_type)[1] "3 ROOM"
glimpse() function will be used to view the overall resale transactions available for 3 room flat in Singapore.
glimpse(rs_subset)Rows: 13,780
Columns: 11
$ month <chr> "2021-01", "2021-01", "2021-01", "2021-01", "2021-…
$ town <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG MO …
$ flat_type <chr> "3 ROOM", "3 ROOM", "3 ROOM", "3 ROOM", "3 ROOM", …
$ block <chr> "331", "534", "561", "170", "463", "542", "170", "…
$ street_name <chr> "ANG MO KIO AVE 1", "ANG MO KIO AVE 10", "ANG MO K…
$ storey_range <chr> "04 TO 06", "04 TO 06", "01 TO 03", "07 TO 09", "0…
$ floor_area_sqm <dbl> 68, 68, 68, 60, 68, 68, 60, 73, 67, 67, 68, 68, 73…
$ flat_model <chr> "New Generation", "New Generation", "New Generatio…
$ lease_commence_date <dbl> 1981, 1980, 1980, 1986, 1980, 1981, 1986, 1976, 19…
$ remaining_lease <chr> "59 years", "58 years 02 months", "58 years 01 mon…
$ resale_price <dbl> 260000, 265000, 265000, 268000, 268000, 270000, 27…
From the glimpse() function result, we can see that from Jan 2021 to December 2022, there are 23,656 transactions for 3 room flat in Singapore.
3.2 Transforming HDB Resale Data Columns
Here, mutate function of dplyr package will be used to create columns such as:
address: concatenation of theblockandstreet_namecolumns using paste() function of base R package.remaining_lease_yr&remaining_lease_mth: split the year and months part of theremaining_leaserespectively using str_sub() function of stringr package then converting the character to integer using as.integer() function of base R package. After performing mutate function, we will store the new data inrs_transform
rs_transform <- rs_subset %>%
mutate(rs_subset, address = paste(block,street_name)) %>%
mutate(rs_subset, remaining_lease_yr = as.integer(str_sub(remaining_lease, 0, 2)))%>%
mutate(rs_subset, remaining_lease_mth = as.integer(str_sub(remaining_lease, 9, 11)))head(rs_transform)# A tibble: 6 × 14
month town flat_type block street_name storey_range floor_area_sqm flat_model
<chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
1 2021… ANG … 3 ROOM 331 ANG MO KIO… 04 TO 06 68 New Gener…
2 2021… ANG … 3 ROOM 534 ANG MO KIO… 04 TO 06 68 New Gener…
3 2021… ANG … 3 ROOM 561 ANG MO KIO… 01 TO 03 68 New Gener…
4 2021… ANG … 3 ROOM 170 ANG MO KIO… 07 TO 09 60 Improved
5 2021… ANG … 3 ROOM 463 ANG MO KIO… 04 TO 06 68 New Gener…
6 2021… ANG … 3 ROOM 542 ANG MO KIO… 04 TO 06 68 New Gener…
# ℹ 6 more variables: lease_commence_date <dbl>, remaining_lease <chr>,
# resale_price <dbl>, address <chr>, remaining_lease_yr <int>,
# remaining_lease_mth <int>
3.2.1 Sum remaining lease in months
There are some NA values in remaining lease months with value of 0. We need to multiply the remaining_lease_yr by 12 to convert into months. The remaining_lease_mths column will be created using mutate function of dplyr package which contains the summation of the remaining_lease_yr and remaining_lease_mths using rowSums() of R package.
rs_transform$remaining_lease_mth[is.na(rs_transform$remaining_lease_mth)] <- 0
rs_transform$remaining_lease_yr <- rs_transform$remaining_lease_yr * 12
rs_transform <- rs_transform %>%
mutate(rs_transform, remaining_lease_mths = rowSums(rs_transform[, c("remaining_lease_yr", "remaining_lease_mth")])) %>%
select(month, town, address, block, street_name, flat_type, storey_range, floor_area_sqm, flat_model,
lease_commence_date, remaining_lease_mths, resale_price)head(rs_transform)# A tibble: 6 × 12
month town address block street_name flat_type storey_range floor_area_sqm
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 2021-01 ANG M… 331 AN… 331 ANG MO KIO… 3 ROOM 04 TO 06 68
2 2021-01 ANG M… 534 AN… 534 ANG MO KIO… 3 ROOM 04 TO 06 68
3 2021-01 ANG M… 561 AN… 561 ANG MO KIO… 3 ROOM 01 TO 03 68
4 2021-01 ANG M… 170 AN… 170 ANG MO KIO… 3 ROOM 07 TO 09 60
5 2021-01 ANG M… 463 AN… 463 ANG MO KIO… 3 ROOM 04 TO 06 68
6 2021-01 ANG M… 542 AN… 542 ANG MO KIO… 3 ROOM 04 TO 06 68
# ℹ 4 more variables: flat_model <chr>, lease_commence_date <dbl>,
# remaining_lease_mths <dbl>, resale_price <dbl>
3.3 Retrieval of Addresses & its Coordinates
Data such as postal codes and coordinates of the addresses is required to get the proximity to various location factors later on.
3.3.1 Creaion of unique list of addresses
Unique addresses will be created and stored in add_list. unique() function of base R package will be used to extract the unique addresses and sort() function of R package to sort the unique vector.
add_list <- sort(unique(rs_transform$address))3.3.2 Creation of function to retrieve Coordinates from OneMapSG API
A dataframe postal_coords will be created to store all final retrieved coordinates. The GET() function of httr package will be used to perform a GET request to https://developers.onemap.sg/commonapi/search. There are a few search arguments variables and information we have to take note of
searchVal: Unique keywords that user will enter to filter results
returnGeom {Y/N}: Yes or No to check if user want to return the geometry
getAddrDetails {Y/N}: Yes or No to check if user want to return address details for a point
Return JSON response will contain many fields but we are only interested in postal code and coordinates like Longitude and Latitude. A new dataframe new_row will be created and is used to store each final set of coordinates retrieved. There is also the need to check the number of responses because some searched location have 0, some only have 1 result and others have many. Finally, the JSON result will be appended to the dataframe postal_coords using rbind() function of R.
get_coords <- function(add_list){
# Create a data frame to store all retrieved coordinates
postal_coords <- data.frame()
for (i in add_list){
#print(i)
r <- GET('https://developers.onemap.sg/commonapi/search?',
query=list(searchVal=i,
returnGeom='Y',
getAddrDetails='Y'))
data <- fromJSON(rawToChar(r$content))
found <- data$found
res <- data$results
# Create a new data frame for each address
new_row <- data.frame()
# If single result, append
if (found == 1){
postal <- res$POSTAL
lat <- res$LATITUDE
lng <- res$LONGITUDE
new_row <- data.frame(address= i, postal = postal, latitude = lat, longitude = lng)
}
# If multiple results, drop NIL and append top 1
else if (found > 1){
# Remove those with NIL as postal
res_sub <- res[res$POSTAL != "NIL", ]
# Set as NA first if no Postal
if (nrow(res_sub) == 0) {
new_row <- data.frame(address= i, postal = NA, latitude = NA, longitude = NA)
}
else{
top1 <- head(res_sub, n = 1)
postal <- top1$POSTAL
lat <- top1$LATITUDE
lng <- top1$LONGITUDE
new_row <- data.frame(address= i, postal = postal, latitude = lat, longitude = lng)
}
}
else {
new_row <- data.frame(address= i, postal = NA, latitude = NA, longitude = NA)
}
# Add the row
postal_coords <- rbind(postal_coords, new_row)
}
return(postal_coords)
}3.3.3 Retrieve resale coordinates using the get_coords function
coords <- get_coords(add_list)3.3.4 Check results
Checking if columns contain any empty values
coords[(is.na(coords$postal) | is.na(coords$latitude) | is.na(coords$longitude) | coords$postal=="NIL"), ][1] address postal latitude longitude
<0 rows> (or 0-length row.names)
The outcome reveals that 215 CHOA CHU KANG CTRL has no postal code but does have available coordinates. The postal code to enter when searching on gothere.sg is 680215.
3.3.5 Combination of HDB resale and coordinate data
After getting the coordinates, we must merge them with the HDB resale dataset using the dplyr package’s left join() method. The information will be kept in rs coords.
rs_coords <- left_join(rs_transform, coords, by = c('address' = 'address'))head(rs_coords)# A tibble: 6 × 15
month town address block street_name flat_type storey_range floor_area_sqm
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 2021-01 ANG M… 331 AN… 331 ANG MO KIO… 3 ROOM 04 TO 06 68
2 2021-01 ANG M… 534 AN… 534 ANG MO KIO… 3 ROOM 04 TO 06 68
3 2021-01 ANG M… 561 AN… 561 ANG MO KIO… 3 ROOM 01 TO 03 68
4 2021-01 ANG M… 170 AN… 170 ANG MO KIO… 3 ROOM 07 TO 09 60
5 2021-01 ANG M… 463 AN… 463 ANG MO KIO… 3 ROOM 04 TO 06 68
6 2021-01 ANG M… 542 AN… 542 ANG MO KIO… 3 ROOM 04 TO 06 68
# ℹ 7 more variables: flat_model <chr>, lease_commence_date <dbl>,
# remaining_lease_mths <dbl>, resale_price <dbl>, postal <chr>,
# latitude <chr>, longitude <chr>
3.4 Writing file to rds
rs_coords_rds <- write_rds(rs_coords, "Data/rds/rs_coords.rds")3.5 Reading the rs_coords_rds file
rs_coords <- read_rds("Data/rds/rs_coords.rds")
glimpse(rs_coords)Rows: 13,780
Columns: 15
$ month <chr> "2021-01", "2021-01", "2021-01", "2021-01", "2021…
$ town <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG MO…
$ address <chr> "331 ANG MO KIO AVE 1", "534 ANG MO KIO AVE 10", …
$ block <chr> "331", "534", "561", "170", "463", "542", "170", …
$ street_name <chr> "ANG MO KIO AVE 1", "ANG MO KIO AVE 10", "ANG MO …
$ flat_type <chr> "3 ROOM", "3 ROOM", "3 ROOM", "3 ROOM", "3 ROOM",…
$ storey_range <chr> "04 TO 06", "04 TO 06", "01 TO 03", "07 TO 09", "…
$ floor_area_sqm <dbl> 68, 68, 68, 60, 68, 68, 60, 73, 67, 67, 68, 68, 7…
$ flat_model <chr> "New Generation", "New Generation", "New Generati…
$ lease_commence_date <dbl> 1981, 1980, 1980, 1986, 1980, 1981, 1986, 1976, 1…
$ remaining_lease_mths <dbl> 708, 698, 697, 770, 698, 709, 768, 652, 682, 664,…
$ resale_price <dbl> 260000, 265000, 265000, 268000, 268000, 270000, 2…
$ postal <chr> "560331", "560534", "560561", "560170", "560463",…
$ latitude <chr> "1.36211140145298", "1.37405846295585", "1.370577…
$ longitude <chr> "103.85076647513", "103.854168170426", "103.85785…
3.5.1 Transform and Assign CRS
The Latitude & Longitude are in decimal, therefore the projected CRS will be WGS84. We will need to assign the CRS of 4326 first before transforming it to 3414 which is the EPSG code for Singapore SVY21
rs_coords_sf <- st_as_sf(rs_coords,
coords = c("longitude",
"latitude"),
crs=4326) %>%
st_transform(crs = 3414)st_crs(rs_coords_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
3.5.2 Plotting HDB resale points
tmap_mode("view")
tm_shape(rs_coords_sf)+
tm_dots(col="blue", size = 0.02)tmap_mode("plot")4. Importing Geospatial Locational Factors
4.1 Location factors with coordinates
4.1.1 Reading of location factors
elder_sf <- st_read(dsn = "Data/Geospatial", layer="ELDERCARE")Reading layer `ELDERCARE' from data source
`C:\Harith-oh\IS415-Harith-2\IS415-Harith-2\Take-home_Ex3\Data\Geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 133 features and 18 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 14481.92 ymin: 28218.43 xmax: 41665.14 ymax: 46804.9
Projected CRS: SVY21
mrtlrt_sf <- st_read(dsn = "Data/Geospatial", layer="MRTLRTStnPtt")Reading layer `MRTLRTStnPtt' from data source
`C:\Harith-oh\IS415-Harith-2\IS415-Harith-2\Take-home_Ex3\Data\Geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 185 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21
bus_sf <- st_read(dsn = "Data/Geospatial", layer="BusStop")Reading layer `BusStop' from data source
`C:\Harith-oh\IS415-Harith-2\IS415-Harith-2\Take-home_Ex3\Data\Geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5159 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
hawker_sf <- st_read("Data/Geospatial/hawker-centres-geojson.geojson") Reading layer `hawker-centres-geojson' from data source
`C:\Harith-oh\IS415-Harith-2\IS415-Harith-2\Take-home_Ex3\Data\Geospatial\hawker-centres-geojson.geojson'
using driver `GeoJSON'
Simple feature collection with 125 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6974 ymin: 1.272716 xmax: 103.9882 ymax: 1.449217
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
supermkt_sf <- st_read("Data/Geospatial/supermarkets-geojson.geojson") Reading layer `supermarkets-geojson' from data source
`C:\Harith-oh\IS415-Harith-2\IS415-Harith-2\Take-home_Ex3\Data\Geospatial\supermarkets-geojson.geojson'
using driver `GeoJSON'
Simple feature collection with 526 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6258 ymin: 1.24715 xmax: 104.0036 ymax: 1.461526
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
chas_sf <- st_read("Data/Geospatial/moh-chas-clinics.geojson")Reading layer `moh-chas-clinics' from data source
`C:\Harith-oh\IS415-Harith-2\IS415-Harith-2\Take-home_Ex3\Data\Geospatial\moh-chas-clinics.geojson'
using driver `GeoJSON'
Simple feature collection with 1064 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.5818 ymin: 1.016264 xmax: 103.9903 ymax: 1.456037
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
childcare_sf <- st_read("Data/Geospatial/childcare.geojson") Reading layer `childcare' from data source
`C:\Harith-oh\IS415-Harith-2\IS415-Harith-2\Take-home_Ex3\Data\Geospatial\childcare.geojson'
using driver `GeoJSON'
Simple feature collection with 1545 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6824 ymin: 1.248403 xmax: 103.9897 ymax: 1.462134
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
kind_sf <- st_read("Data/Geospatial/preschools-location.geojson") Reading layer `preschools-location' from data source
`C:\Harith-oh\IS415-Harith-2\IS415-Harith-2\Take-home_Ex3\Data\Geospatial\preschools-location.geojson'
using driver `GeoJSON'
Simple feature collection with 1925 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6824 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
parks_sf <- st_read("Data/Geospatial/parks-geojson.geojson")Reading layer `parks-geojson' from data source
`C:\Harith-oh\IS415-Harith-2\IS415-Harith-2\Take-home_Ex3\Data\Geospatial\parks-geojson.geojson'
using driver `GeoJSON'
Simple feature collection with 350 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6929 ymin: 1.214058 xmax: 104.0017 ymax: 1.461503
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
st_crs(elder_sf)Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(mrtlrt_sf)Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(bus_sf)Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(hawker_sf)Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
st_crs(supermkt_sf)Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
st_crs(chas_sf)Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
st_crs(childcare_sf)Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
st_crs(kind_sf)Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
st_crs(parks_sf)Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
4.1.2 Assign EPSG code to sf dataframes
elder_sf <- st_set_crs(elder_sf, 3414)
mrtlrt_sf <- st_set_crs(mrtlrt_sf, 3414)
bus_sf <- st_set_crs(bus_sf, 3414)
hawker_sf <- hawker_sf %>%
st_transform(crs = 3414)
supermkt_sf <- supermkt_sf %>%
st_transform(crs = 3414)
chas_sf <- chas_sf %>%
st_transform(crs = 3414)
childcare_sf <- childcare_sf %>%
st_transform(crs = 3414)
kind_sf <- kind_sf %>%
st_transform(crs = 3414)
parks_sf <- parks_sf %>%
st_transform(crs = 3414)st_crs(elder_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(mrtlrt_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(bus_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(hawker_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(supermkt_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(chas_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(childcare_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(kind_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
#st_crs(parks_sf)length(which(st_is_valid(elder_sf) == FALSE))[1] 0
length(which(st_is_valid(mrtlrt_sf) == FALSE))[1] 0
length(which(st_is_valid(hawker_sf) == FALSE))[1] 0
length(which(st_is_valid(parks_sf) == FALSE))[1] 0
length(which(st_is_valid(supermkt_sf) == FALSE))[1] 0
length(which(st_is_valid(chas_sf) == FALSE))[1] 0
length(which(st_is_valid(childcare_sf) == FALSE))[1] 0
length(which(st_is_valid(kind_sf) == FALSE))[1] 0
length(which(st_is_valid(bus_sf) == FALSE))[1] 0
4.1.3 Calculating proximity of the locations
The get prox function will be developed to calculate the distances between the site and HDB resale factors.
get_prox <- function(origin_df, dest_df, col_name){
# creates a matrix of distances
dist_matrix <- st_distance(origin_df, dest_df)
# find the nearest location_factor and create new data frame
near <- origin_df %>%
mutate(PROX = apply(dist_matrix, 1, function(x) min(x)) / 1000)
# rename column name according to input parameter
names(near)[names(near) == 'PROX'] <- col_name
return(near)
}get_prox function will be called to get the proximity of the resale HDB and location factors such as:
Eldercare
MRT
Hawker
Parks
Supermarkets
CHAS clinics
rs_coords_sf <- get_prox(rs_coords_sf, elder_sf, "PROX_ELDERLYCARE")
rs_coords_sf <- get_prox(rs_coords_sf, mrtlrt_sf, "PROX_MRT")
rs_coords_sf <- get_prox(rs_coords_sf, hawker_sf, "PROX_HAWKER")
rs_coords_sf <- get_prox(rs_coords_sf, parks_sf, "PROX_PARK")
rs_coords_sf <- get_prox(rs_coords_sf, supermkt_sf, "PROX_SUPERMARKET")
rs_coords_sf <- get_prox(rs_coords_sf, chas_sf, "PROX_CHAS")4.1.4 Calculation of location factors within distance
Using the st distance feature of the sf package, the get within function will generate a matrix of distances between the HDB and the location factor. Additionally, it will use the sum() function of the basic R package to acquire the sum of the location factor points that are under the threshold distance before adding it to the HDB resale data under a new column using the modify() function of the dpylr package. Last but not least, it will change the column name in accordance with the input provided by the user so that the columns have appropriate names that are unique from one another.
get_within <- function(origin_df, dest_df, threshold_dist, col_name){
# creates a matrix of distances
dist_matrix <- st_distance(origin_df, dest_df)
# count the number of location_factors within threshold_dist and create new data frame
wdist <- origin_df %>%
mutate(WITHIN_DT = apply(dist_matrix, 1, function(x) sum(x <= threshold_dist)))
# rename column name according to input parameter
names(wdist)[names(wdist) == 'WITHIN_DT'] <- col_name
# Return df
return(wdist)
}4.1.4.1 Calling the function for all location factors
In this instance, we will set the criterion to Within 350m for location parameters such kindergartens, daycare centers, and bus stops.
rs_coords_sf <- get_within(rs_coords_sf, kind_sf, 350, "WITHIN_350M_KINDERGARTEN")rs_coords_sf <- get_within(rs_coords_sf, childcare_sf, 350, "WITHIN_350M_CHILDCARE")rs_coords_sf <- get_within(rs_coords_sf, bus_sf, 350, "WITHIN_350M_BUS")4.2 Location factors without coordinates
4.2.1 CBD Town Area
Most people commute to the Central Business District on a daily basis for work. Hence, it is a crucial step. The Downtown Core, commonly known as CBD, may be located with a quick google search at latitude and longitude of 1.287953 and 103.851784, respectively.
# Store CBD Coordinates in Dataframe
name <- c('CBD Area')
latitude= c(1.287953)
longitude= c(103.851784)
cbd_coords <- data.frame(name, latitude, longitude)Convert data frame into sf object and transform crs
cbd_coords_sf <- st_as_sf(cbd_coords,
coords = c("longitude",
"latitude"),
crs=4326) %>%
st_transform(crs = 3414)
st_crs(cbd_coords_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
The get prox function will be used to determine how close the HDB resale units are to the CBD region.
rs_coords_sf <- get_prox(rs_coords_sf, cbd_coords_sf, "PROX_CBD")4.2.2 Shopping Malls Places
We may check the XPaths of the various lists on the wikipedia page. There are numerous lists of retail malls grouped by regions in Singapore.
url <- "https://en.wikipedia.org/wiki/List_of_shopping_malls_in_Singapore"
malls_list <- list()
for (i in 1:7){
malls <- read_html(url) %>%
html_nodes(xpath = paste('//*[@id="mw-content-text"]/div[1]/div[',as.character(i),']/ul/li',sep="") ) %>%
html_text()
malls_list <- append(malls_list, malls)
}
#malls_listThere are 164 shopping malls to choose from in the malls list result. We must utilize the previously constructed get coords method to search the names of these shopping malls because they don’t include their coordinates.
malls_list_coords <- get_coords(malls_list) %>%
rename("mall_name" = "address")malls_list_coords <- subset(malls_list_coords, mall_name!= "Yew Tee Shopping Centre")Due to Wikipedia page numbering, we need to change the names of malls that are no longer relevant and malls that have an index beside them, as indicated in the result above.
invalid_malls<- subset(malls_list_coords, is.na(malls_list_coords$postal))
invalid_malls_list <- unique(invalid_malls$mall_name)
corrected_malls <- c("Clarke Quay", "City Gate", "Raffles Holland V", "Knightsbridge", "Mustafa Centre", "GR.ID", "Shaw House",
"The Poiz Centre", "Velocity @ Novena Square", "Singapore Post Centre", "PLQ Mall", "KINEX", "The Grandstand")
for (i in 1:length(invalid_malls_list)) {
malls_list_coords <- malls_list_coords %>%
mutate(mall_name = ifelse(as.character(mall_name) == invalid_malls_list[i], corrected_malls[i], as.character(mall_name)))
}malls_list <- sort(unique(malls_list_coords$mall_name))malls_coords <- get_coords(malls_list)To check if the malls still have any NA values
malls_coords[(is.na(malls_coords$postal) | is.na(malls_coords$latitude) | is.na(malls_coords$longitude)), ][1] address postal latitude longitude
<0 rows> (or 0-length row.names)
To convert the mall dataframe into SF object and perform transformation of CRS
malls_sf <- st_as_sf(malls_coords,
coords = c("longitude",
"latitude"),
crs=4326) %>%
st_transform(crs = 3414)To retrieve proximity between the HDB resale flats and Shopping Malls
rs_coords_sf <- get_prox(rs_coords_sf, malls_sf, "PROX_MALL") 4.2.3 Primary Schools
To extract and retrieve only primary schools data
pri_sch <- read_csv("Data/Aspatial/general-information-of-schools.csv")
pri_sch <- pri_sch %>%
filter(mainlevel_code == "PRIMARY") %>%
select(school_name, address, postal_code, mainlevel_code)
glimpse(pri_sch)Rows: 183
Columns: 4
$ school_name <chr> "ADMIRALTY PRIMARY SCHOOL", "AHMAD IBRAHIM PRIMARY SCHO…
$ address <chr> "11 WOODLANDS CIRCLE", "10 YISHUN STREET 11", "100 …
$ postal_code <chr> "738907", "768643", "579646", "159016", "544969", "5697…
$ mainlevel_code <chr> "PRIMARY", "PRIMARY", "PRIMARY", "PRIMARY", "PRIMARY", …
There are 183 Primary Schools in Singapore.
To create a list to store postal code of the schools and retrieve the coordinates
prisch_list <- sort(unique(pri_sch$postal_code))
prisch_coords <- get_coords(prisch_list)To check if the primary schools still have any NA values
prisch_coords[(is.na(prisch_coords$postal) | is.na(prisch_coords$latitude) | is.na(prisch_coords$longitude)), ][1] address postal latitude longitude
<0 rows> (or 0-length row.names)
To combine coordinates with the schools
prisch_coords = prisch_coords[c("postal","latitude", "longitude")]
pri_sch <- left_join(pri_sch, prisch_coords, by = c('postal_code' = 'postal'))To convert primary schools dataframe into SF object and perform transformation of CRS
prisch_sf <- st_as_sf(pri_sch,
coords = c("longitude",
"latitude"),
crs=4326) %>%
st_transform(crs = 3414)To obtain schools within a proximity of 1km
rs_coords_sf <- get_within(rs_coords_sf, prisch_sf, 1000, "WITHIN_1KM_PRISCH")4.2.4 Proximity to Good Primary Schools
One website in particular that we may use to find reputable primary schools in Singapore is www.salary.sg, which ranks schools based on how well-liked they are. The code that we used to extract the rankings of Singapore’s shopping centers may also be used to retrieve the rankings of elementary schools.
url <- "https://www.salary.sg/2021/best-primary-schools-2021-by-popularity/"
good_pri <- data.frame()
schools <- read_html(url) %>%
html_nodes(xpath = paste('//*[@id="post-3068"]/div[3]/div/div/ol/li') ) %>%
html_text()
for (i in (schools)){
sch_name <- toupper(gsub(" – .*","",i))
sch_name <- gsub("\\(PRIMARY SECTION)","",sch_name)
sch_name <- trimws(sch_name)
new_row <- data.frame(pri_sch_name=sch_name)
# Add the row
good_pri <- rbind(good_pri, new_row)
}
top_good_pri <- head(good_pri, 10)
head(top_good_pri) pri_sch_name
1 CHIJ ST. NICHOLAS GIRLS’ SCHOOL
2 AI TONG SCHOOL
3 CATHOLIC HIGH SCHOOL
4 ROSYTH SCHOOL
5 PEI HWA PRESBYTERIAN PRIMARY SCHOOL
6 NANYANG PRIMARY SCHOOL
To determine whether the names of the retrieved top primary schools correspond to those already present in the primary school dataframe
top_good_pri$pri_sch_name[!top_good_pri$pri_sch_name %in% prisch_sf$school_name][1] "CHIJ ST. NICHOLAS GIRLS’ SCHOOL" "CATHOLIC HIGH SCHOOL"
[3] "ST. HILDA’S PRIMARY SCHOOL"
#Unique list to store good school names
good_pri_list <- unique(top_good_pri$pri_sch_name)To obtain the coordinates of good primary schools
goodprisch_coords <- get_coords(good_pri_list)To check for NA values
goodprisch_coords[(is.na(goodprisch_coords$postal) | is.na(goodprisch_coords$latitude) | is.na(goodprisch_coords$longitude)), ] address postal latitude longitude
10 ST. HILDA’S PRIMARY SCHOOL <NA> <NA> <NA>
Renaming to match the primary school dataframe names
top_good_pri$pri_sch_name[top_good_pri$pri_sch_name == "CHIJ ST. NICHOLAS GIRLS’ SCHOOL"] <- "CHIJ SAINT NICHOLAS GIRLS' SCHOOL"
top_good_pri$pri_sch_name[top_good_pri$pri_sch_name == "ST. HILDA’S PRIMARY SCHOOL"] <- "SAINT HILDA'S PRIMARY SCHOOL"
#Unique list to store good school names
good_pri_list <- unique(top_good_pri$pri_sch_name)
#To obtain the coordinates of good primary schools
goodprisch_coords <- get_coords(good_pri_list)Final inspection to check for NA values
goodprisch_coords[(is.na(goodprisch_coords$postal) | is.na(goodprisch_coords$latitude) | is.na(goodprisch_coords$longitude)), ][1] address postal latitude longitude
<0 rows> (or 0-length row.names)
To do CRS transformation and turn the good primary schools dataframe into SF object
goodpri_sf <- st_as_sf(goodprisch_coords,
coords = c("longitude",
"latitude"),
crs=4326) %>%
st_transform(crs = 3414)
st_crs(goodpri_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
The get prox function will be used to determine how close good primary schools are to HDB resale units.
rs_coords_sf <- get_prox(rs_coords_sf, goodpri_sf, "PROX_GOOD_PRISCH")4.3 Filtering dataset
rs_coords_sf_training <- filter(rs_coords_sf ,flat_type == "3 ROOM") %>%
filter(month >= "2021-01" & month <= "2023-02")4.3.1 Writing to RDS file
rs_factors_rds_training <- write_rds(rs_coords_sf_training, "Data/rds/rs_factors_training.rds")5. Import Data for Analysis
5.1 Geospatial Data
Here, st_read of sf package to read simple features or layers from file.
mpsz_sf <- st_read(dsn = "Data/Geospatial", layer="MPSZ-2019")Reading layer `MPSZ-2019' from data source
`C:\Harith-oh\IS415-Harith-2\IS415-Harith-2\Take-home_Ex3\Data\Geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
mpsz_sf <- st_transform(mpsz_sf, 3414)R object used to contain the imported MPSZ-2019 shapefile is called mpsz_sf and it is a simple feature object. The correct EPSG code for SVY21 should be 3414 therefore a transformation is done above.
5.1.1 Check for invalid geometry and handle it
length(which(st_is_valid(mpsz_sf) == FALSE))[1] 6
Making the invalid geometry valid by using st_make_valid function
mpsz_sf <- st_make_valid(mpsz_sf)
length(which(st_is_valid(mpsz_sf) == FALSE))[1] 0
Now the geometry is valid.
5.2 Aspatial Data for HDB resale location factors
rs_factors_training <- read_rds("Data/rds/rs_factors_training.rds")5.3 Extract storey_order due to character type
storeys <- sort(unique(rs_factors_training$storey_range))
# Create dataframe storey_range_order to store order of storey_range
storey_order <- 1:length(storeys)
storey_range_order <- data.frame(storeys, storey_order)
# Combine storey_order with resale dataframe for training and testing data
rs_factors_training <- left_join(rs_factors_training, storey_range_order, by= c("storey_range" = "storeys"))5.4 Selecting required columns for analysis
rs_factors_training <- rs_factors_training %>%
select(resale_price, floor_area_sqm, storey_order, remaining_lease_mths,
PROX_CBD, PROX_ELDERLYCARE, PROX_HAWKER, PROX_MRT, PROX_PARK, PROX_GOOD_PRISCH, PROX_MALL, PROX_CHAS,
PROX_SUPERMARKET, WITHIN_350M_KINDERGARTEN, WITHIN_350M_CHILDCARE, WITHIN_350M_BUS, WITHIN_1KM_PRISCH)6. Exploratory Data Analysis
6.1 HDB 3 room resale prices in Histogram
ggplot(data=rs_factors_training, aes(x=`resale_price`)) +
geom_histogram(bins=20, color="black", fill="light blue")
The findings demonstrate a right-skewed distribution. This indicates that more resale HDB units were sold for an average price of around $500,000. Log transformation can statistically normalize the skewed distribution, which is what we’ll be performing in the following part.
6.2 Normalize HDB 3 room resale prices using Log Transformation in Histogram
By applying a log transformation on the resale price variable, a new variable called LOG RESALE PRICE is created.
rs_factors_training <- rs_factors_training %>%
mutate(`LOG_SELLING_PRICE` = log(resale_price))Replot Histogram of LOG_RESALE_PRICE
ggplot(data=rs_factors_training, aes(x=`LOG_SELLING_PRICE`)) +
geom_histogram(bins=20, color="black", fill="green")
The change has reduced the skewedness of the distribution.
6.3 Various histogram distribution of structural variables
Next, we’ll look at a few structural variables, like floor area, storey order, and remaining lease months.
s_factor <- c("floor_area_sqm","remaining_lease_mths", "storey_order")The structural variables of the histogram will be saved in a list.
s_factor_hist_list <- vector(mode = "list", length = length(s_factor))
for (i in 1:length(s_factor)) {
hist_plot <- ggplot(rs_factors_training, aes_string(x = s_factor[[i]])) +
geom_histogram(color="firebrick", fill = "light coral") +
labs(title = s_factor[[i]]) +
theme(plot.title = element_text(size = 10),
axis.title = element_blank())
s_factor_hist_list[[i]] <- hist_plot
}Visualizing the distribution of structural variables using a histogram
ggarrange(plotlist = s_factor_hist_list,
ncol = 2,
nrow = 2)
Only the floor area sqm findings from the histogram appear to have a normal distribution. The resale HDBs and flat type are typically on the lower floors since the histogram of storey order is skewed to the right. There are a few patterns of common transactions involving 600 to 800 months and 1100 months for the remaining lease mths. This demonstrates that there are more sales of resale apartments with at least 66 years left on the lease.
6.4 Various histogram of location variables
Extraction of location factors column names to plot
l_factor <- c("PROX_CBD", "PROX_ELDERLYCARE", "PROX_HAWKER", "PROX_MRT", "PROX_PARK", "PROX_GOOD_PRISCH", "PROX_MALL", "PROX_CHAS",
"PROX_SUPERMARKET", "WITHIN_350M_KINDERGARTEN", "WITHIN_350M_CHILDCARE", "WITHIN_350M_BUS", "WITHIN_1KM_PRISCH")Similarly, a list will be created to store the histogram location variables
l_factor_hist_list <- vector(mode = "list", length = length(l_factor))
for (i in 1:length(l_factor)) {
hist_plot <- ggplot(rs_factors_training, aes_string(x = l_factor[[i]])) +
geom_histogram(color="midnight blue", fill = "light sky blue") +
labs(title = l_factor[[i]]) +
theme(plot.title = element_text(size = 10),
axis.title = element_blank())
l_factor_hist_list[[i]] <- hist_plot
}Plotting histogram to visualise distribution of location variables
ggarrange(plotlist = l_factor_hist_list,
ncol = 3,
nrow = 5)
There are several trends in PROX_GOOD_PRISCH that indicate that more transactions occur within a 3 kilometer radius. PROX_CBD has characteristics of a normal distribution.
Additional variables like WITHIN 350M KINDERGARTEN, WITHIN 350M CHILDCARE, PROX_SUPERMARKET, PROX HAWKER, PROX_MRT, PROX_MALL, PROX_CHAS, and WITHIN _350M_KINDERGARTEN are skewed to the right. This demonstrates that there are more resale transactions taking place and that locals favor staying close to resale apartments with these advantages.
6.5 Statistical Point Map
tmap_mode("view")
tm_shape(rs_factors_training) +
tm_dots(col = "resale_price",
alpha = 0.6,
style="quantile") +
tm_view(set.zoom.limits = c(11,14)) +
tm_basemap("OpenStreetMap")tmap_mode("plot")The darker orange points on the interactive map show that the resale prices of 3 room HDBs in the North-East, South, and Central tend to be higher. In contrast, notice how the lighter yellow dots are grouped together around the North and East.
7. Data Sampling of Training and Testing Data
The full collection of data is divided into two sets: a training set from 2021-01 to 2022-12 and a test set from 2023-01 to 2023-02. Initial split() from the rsample package can be used to accomplish this. One of the tidymodels packages is rsample.
Due to time restrictions, training will be done using data from 6 months (2022-07 to 2022-12).
set.seed(1234)
rs_coords_sf <- filter(rs_factors_rds_training,flat_type == "3 ROOM") %>%
filter(month >= "2022-07" & month <= "2023-02")
resale_split <- initial_split(rs_coords_sf[,8:26],
prop = 7.5/10,)
train_data <- training(resale_split)
test_data <- testing(resale_split)7.1 Computing Correlation Matrix
rs_coords_sf_nogeo <- rs_coords_sf %>%
st_drop_geometry()
corrplot::corrplot(cor(rs_coords_sf_nogeo[, 14:26]),
diag = FALSE,
order = "AOE",
tl.pos = "td",
tl.cex = 0.5,
method = "number",
type = "upper")
With the exception of WITHIN 350M CHILDCARE, which has a value of 0.92, the correlation matrix above demonstrates that all correlation values are below 0.8. This demonstrates a multicolinearity between WITHIN 350M CHILDCARE and WTIHIN 350M KINDERGARTEN.
7.2 Building a non-spatial multiple linear regression
price_mlr <- lm(resale_price ~ floor_area_sqm +
remaining_lease_mths +
PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER +
PROX_MRT + PROX_PARK + PROX_MALL +
PROX_SUPERMARKET + WITHIN_350M_KINDERGARTEN +
WITHIN_350M_CHILDCARE + WITHIN_350M_BUS +
WITHIN_1KM_PRISCH,
data=train_data)
summary(price_mlr)
Call:
lm(formula = resale_price ~ floor_area_sqm + remaining_lease_mths +
PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER + PROX_MRT + PROX_PARK +
PROX_MALL + PROX_SUPERMARKET + WITHIN_350M_KINDERGARTEN +
WITHIN_350M_CHILDCARE + WITHIN_350M_BUS + WITHIN_1KM_PRISCH,
data = train_data)
Residuals:
Min 1Q Median 3Q Max
-215209 -30299 -3360 24682 451378
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.207e+05 1.104e+04 -10.935 < 2e-16 ***
floor_area_sqm 5.172e+03 1.458e+02 35.475 < 2e-16 ***
remaining_lease_mths 3.744e+02 5.077e+00 73.742 < 2e-16 ***
PROX_CBD -9.255e+03 2.511e+02 -36.864 < 2e-16 ***
PROX_ELDERLYCARE -9.511e+03 1.745e+03 -5.451 5.38e-08 ***
PROX_HAWKER -1.096e+04 2.316e+03 -4.732 2.32e-06 ***
PROX_MRT -1.562e+04 2.477e+03 -6.305 3.26e-10 ***
PROX_PARK -1.168e+04 2.395e+03 -4.879 1.12e-06 ***
PROX_MALL -1.217e+04 2.590e+03 -4.697 2.74e-06 ***
PROX_SUPERMARKET 2.729e+04 4.882e+03 5.589 2.46e-08 ***
WITHIN_350M_KINDERGARTEN 7.231e+03 1.108e+03 6.527 7.72e-11 ***
WITHIN_350M_CHILDCARE -1.025e+04 1.357e+03 -7.554 5.40e-14 ***
WITHIN_350M_BUS 1.428e+03 3.396e+02 4.206 2.66e-05 ***
WITHIN_1KM_PRISCH -1.960e+03 6.721e+02 -2.917 0.00356 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 48430 on 3330 degrees of freedom
Multiple R-squared: 0.6886, Adjusted R-squared: 0.6874
F-statistic: 566.4 on 13 and 3330 DF, p-value: < 2.2e-16
7.3 Geographically Weighted Regression Predictive Method
Using the GWmodel package’s geographically weighted regression approach to calibrate a model to predict HDB resale price.
Converting the sf data.frame to SpatialPointDataFrame
train_data_sp <- as_Spatial(train_data)
train_data_spclass : SpatialPointsDataFrame
features : 3344
extent : 11807.06, 45154.27, 28097.64, 48622.47 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 18
names : floor_area_sqm, flat_model, lease_commence_date, remaining_lease_mths, resale_price, postal, PROX_ELDERLYCARE, PROX_MRT, PROX_HAWKER, PROX_PARK, PROX_SUPERMARKET, PROX_CHAS, WITHIN_350M_KINDERGARTEN, WITHIN_350M_CHILDCARE, WITHIN_350M_BUS, ...
min values : 51, DBSS, 1967, 516, 245000, 050004, 2.90010872692511e-07, 0.0391289507571673, 0.0069808810867684, 0.0834312052773543, 1.46885170959002e-09, 1.16317240314437e-09, 0, 0, 1, ...
max values : 155, Terrace, 2019, 1150, 1068000, 824601, 4.78082950913573, 3.51571661813739, 2.46908408139604, 2.37596211741372, 3.31803184483494, 2.71421661000055, 12, 12, 16, ...
Computing adaptive bandwidth
The best bandwidth to utilize will then be determined using bw.gwr() of the GWmodel package.
bw_adaptive <- bw.gwr(resale_price ~ floor_area_sqm +
remaining_lease_mths +
PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER +
PROX_MRT + PROX_PARK + PROX_MALL +
PROX_SUPERMARKET + WITHIN_350M_KINDERGARTEN +
WITHIN_350M_CHILDCARE + WITHIN_350M_BUS +
WITHIN_1KM_PRISCH,
data=train_data_sp,
approach="CV",
kernel="gaussian",
adaptive=TRUE,
longlat=FALSE)Take a cup of tea and have a break, it will take a few minutes.
-----A kind suggestion from GWmodel development group
Adaptive bandwidth: 2074 CV score: 7.276392e+12
Adaptive bandwidth: 1290 CV score: 6.829296e+12
Adaptive bandwidth: 804 CV score: 6.285662e+12
Adaptive bandwidth: 505 CV score: 5.61714e+12
Adaptive bandwidth: 319 CV score: 4.827511e+12
Adaptive bandwidth: 205 CV score: 4.479947e+12
Adaptive bandwidth: 133 CV score: 4.299238e+12
Adaptive bandwidth: 90 CV score: 4.224232e+12
Adaptive bandwidth: 62 CV score: 4.120263e+12
Adaptive bandwidth: 46 CV score: 4.029567e+12
Adaptive bandwidth: 34 CV score: 3.897764e+12
Adaptive bandwidth: 29 CV score: 3.757576e+12
Adaptive bandwidth: 23 CV score: 3.828702e+12
Adaptive bandwidth: 29 CV score: 3.757576e+12
The outcome indicates that, if adaptive bandwidth is employed for this data collection, 29 neighbor points will represent the ideal bandwidth to be used.
Constructing the adaptive bandwidth GWR model
gwr_adaptive <- gwr.basic(formula = resale_price ~
floor_area_sqm +
remaining_lease_mths + PROX_CBD +
PROX_ELDERLYCARE + PROX_HAWKER +
PROX_MRT + PROX_PARK + PROX_MALL +
PROX_SUPERMARKET + WITHIN_350M_KINDERGARTEN +
WITHIN_350M_CHILDCARE + WITHIN_350M_BUS +
WITHIN_1KM_PRISCH,
data=train_data_sp,
bw=bw_adaptive,
kernel = 'gaussian',
adaptive=TRUE,
longlat = FALSE)
gwr_adaptive ***********************************************************************
* Package GWmodel *
***********************************************************************
Program starts at: 2023-03-25 16:53:37
Call:
gwr.basic(formula = resale_price ~ floor_area_sqm + remaining_lease_mths +
PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER + PROX_MRT + PROX_PARK +
PROX_MALL + PROX_SUPERMARKET + WITHIN_350M_KINDERGARTEN +
WITHIN_350M_CHILDCARE + WITHIN_350M_BUS + WITHIN_1KM_PRISCH,
data = train_data_sp, bw = bw_adaptive, kernel = "gaussian",
adaptive = TRUE, longlat = FALSE)
Dependent (y) variable: resale_price
Independent variables: floor_area_sqm remaining_lease_mths PROX_CBD PROX_ELDERLYCARE PROX_HAWKER PROX_MRT PROX_PARK PROX_MALL PROX_SUPERMARKET WITHIN_350M_KINDERGARTEN WITHIN_350M_CHILDCARE WITHIN_350M_BUS WITHIN_1KM_PRISCH
Number of data points: 3344
***********************************************************************
* Results of Global Regression *
***********************************************************************
Call:
lm(formula = formula, data = data)
Residuals:
Min 1Q Median 3Q Max
-215209 -30299 -3360 24682 451378
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.207e+05 1.104e+04 -10.935 < 2e-16 ***
floor_area_sqm 5.172e+03 1.458e+02 35.475 < 2e-16 ***
remaining_lease_mths 3.744e+02 5.077e+00 73.742 < 2e-16 ***
PROX_CBD -9.255e+03 2.511e+02 -36.864 < 2e-16 ***
PROX_ELDERLYCARE -9.511e+03 1.745e+03 -5.451 5.38e-08 ***
PROX_HAWKER -1.096e+04 2.316e+03 -4.732 2.32e-06 ***
PROX_MRT -1.562e+04 2.477e+03 -6.305 3.26e-10 ***
PROX_PARK -1.168e+04 2.395e+03 -4.879 1.12e-06 ***
PROX_MALL -1.217e+04 2.590e+03 -4.697 2.74e-06 ***
PROX_SUPERMARKET 2.729e+04 4.882e+03 5.589 2.46e-08 ***
WITHIN_350M_KINDERGARTEN 7.231e+03 1.108e+03 6.527 7.72e-11 ***
WITHIN_350M_CHILDCARE -1.025e+04 1.357e+03 -7.554 5.40e-14 ***
WITHIN_350M_BUS 1.428e+03 3.396e+02 4.206 2.66e-05 ***
WITHIN_1KM_PRISCH -1.960e+03 6.721e+02 -2.917 0.00356 **
---Significance stars
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 48430 on 3330 degrees of freedom
Multiple R-squared: 0.6886
Adjusted R-squared: 0.6874
F-statistic: 566.4 on 13 and 3330 DF, p-value: < 2.2e-16
***Extra Diagnostic information
Residual sum of squares: 7.809959e+12
Sigma(hat): 48341.61
AIC: 81654.95
AICc: 81655.09
BIC: 78524.4
***********************************************************************
* Results of Geographically Weighted Regression *
***********************************************************************
*********************Model calibration information*********************
Kernel function: gaussian
Adaptive bandwidth: 29 (number of nearest neighbours)
Regression points: the same locations as observations are used.
Distance metric: Euclidean distance metric is used.
****************Summary of GWR coefficient estimates:******************
Min. 1st Qu. Median 3rd Qu.
Intercept -4980253.13 -232986.04 -32049.31 169467.34
floor_area_sqm -9574.79 2464.43 3438.08 4478.22
remaining_lease_mths -3648.29 215.90 350.52 477.35
PROX_CBD -629169.82 -19192.90 -5984.63 7171.31
PROX_ELDERLYCARE -383505.02 -12968.76 7232.76 29984.77
PROX_HAWKER -450126.82 -27552.01 -2462.73 30991.70
PROX_MRT -710511.71 -62468.08 -31654.05 -5846.06
PROX_PARK -251974.70 -36228.92 -2538.87 26072.89
PROX_MALL -603433.33 -29692.24 1450.35 34098.72
PROX_SUPERMARKET -1184867.87 -38761.80 -4874.35 29857.60
WITHIN_350M_KINDERGARTEN -59965.43 -4462.57 568.81 5428.25
WITHIN_350M_CHILDCARE -108055.65 -6893.16 -985.86 4095.16
WITHIN_350M_BUS -14758.58 -1472.04 413.62 2462.64
WITHIN_1KM_PRISCH -324466.99 -6455.76 -1683.59 3402.55
Max.
Intercept 4039875.35
floor_area_sqm 15900.42
remaining_lease_mths 732.32
PROX_CBD 633909.91
PROX_ELDERLYCARE 790166.34
PROX_HAWKER 1252984.53
PROX_MRT 400584.35
PROX_PARK 536692.45
PROX_MALL 787334.47
PROX_SUPERMARKET 811138.22
WITHIN_350M_KINDERGARTEN 102394.35
WITHIN_350M_CHILDCARE 183578.37
WITHIN_350M_BUS 24473.74
WITHIN_1KM_PRISCH 454330.66
************************Diagnostic information*************************
Number of data points: 3344
Effective number of parameters (2trace(S) - trace(S'S)): 744.4552
Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2599.545
AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 78536.97
AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 77642.79
BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 78668.9
Residual sum of squares: 1.975628e+12
R-square value: 0.9212289
Adjusted R-square value: 0.8986619
***********************************************************************
Program stops at: 2023-03-25 16:53:44
7.4 Preparing Coordinates Data
The code snippet below extracts the complete, training, and test data sets’ x, y coordinates.
coords <- st_coordinates(rs_coords_sf)
coords_train <- st_coordinates(train_data)
coords_test <- st_coordinates(test_data)Dropping Geometry Field
The geometry column of the sf data will be deleted. frame using the sf package’s st drop geometry() function.
train_data <- train_data %>%
st_drop_geometry()7.5 Calibrating Random Forest Model
Using the ranger package’s random forest function, a model to forecast HDB resale price will be calibrated in this part.
set.seed(1234)
rf <- ranger(resale_price ~ floor_area_sqm +
remaining_lease_mths + PROX_CBD + PROX_ELDERLYCARE +
PROX_HAWKER + PROX_MRT + PROX_PARK + PROX_MALL +
PROX_SUPERMARKET + WITHIN_350M_KINDERGARTEN +
WITHIN_350M_CHILDCARE + WITHIN_350M_BUS +
WITHIN_1KM_PRISCH,
data=train_data)print(rf)Ranger result
Call:
ranger(resale_price ~ floor_area_sqm + remaining_lease_mths + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER + PROX_MRT + PROX_PARK + PROX_MALL + PROX_SUPERMARKET + WITHIN_350M_KINDERGARTEN + WITHIN_350M_CHILDCARE + WITHIN_350M_BUS + WITHIN_1KM_PRISCH, data = train_data)
Type: Regression
Number of trees: 500
Sample size: 3344
Number of independent variables: 13
Mtry: 3
Target node size: 5
Variable importance mode: none
Splitrule: variance
OOB prediction error (MSE): 908069907
R squared (OOB): 0.8789633
7.6 Calibrating Geographical Random Forest Model
In this section, a model to predict HDB resale price will be calibrated using the grf() function from the SpatialML package.
7.6.1 Calibrating using training data.
The code chunk below calibrate a geographic random forest model by using grf() of SpatialML package.
set.seed(1234)
gwRF_adaptive <- grf(formula = resale_price ~ floor_area_sqm +
remaining_lease_mths + PROX_CBD + PROX_ELDERLYCARE +
PROX_HAWKER + PROX_MRT + PROX_PARK + PROX_MALL +
PROX_SUPERMARKET + WITHIN_350M_KINDERGARTEN +
WITHIN_350M_CHILDCARE + WITHIN_350M_BUS +
WITHIN_1KM_PRISCH,
dframe=train_data,
bw=29,
kernel="adaptive",
coords=coords_train)Ranger result
Call:
ranger(resale_price ~ floor_area_sqm + remaining_lease_mths + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER + PROX_MRT + PROX_PARK + PROX_MALL + PROX_SUPERMARKET + WITHIN_350M_KINDERGARTEN + WITHIN_350M_CHILDCARE + WITHIN_350M_BUS + WITHIN_1KM_PRISCH, data = train_data, num.trees = 500, mtry = 4, importance = "impurity", num.threads = NULL)
Type: Regression
Number of trees: 500
Sample size: 3344
Number of independent variables: 13
Mtry: 4
Target node size: 5
Variable importance mode: impurity
Splitrule: variance
OOB prediction error (MSE): 884540419
R squared (OOB): 0.8820995
floor_area_sqm remaining_lease_mths PROX_CBD
3.475222e+12 9.707327e+12 4.463145e+12
PROX_ELDERLYCARE PROX_HAWKER PROX_MRT
7.098509e+11 1.141180e+12 1.269908e+12
PROX_PARK PROX_MALL PROX_SUPERMARKET
7.939024e+11 8.066853e+11 8.443973e+11
WITHIN_350M_KINDERGARTEN WITHIN_350M_CHILDCARE WITHIN_350M_BUS
2.359512e+11 3.502023e+11 2.990132e+11
WITHIN_1KM_PRISCH
4.452732e+11
Min. 1st Qu. Median Mean 3rd Qu. Max.
-328439.1 -12212.1 43.9 802.5 12861.1 578714.3
Min. 1st Qu. Median Mean 3rd Qu. Max.
-105158.42 -5394.42 4.79 -38.44 5277.39 114090.99
Min Max Mean StD
floor_area_sqm 0 820484857944 10422165727 38207216483
remaining_lease_mths 188662784 319055039575 17866830832 38145543472
PROX_CBD 101937448 142276839382 5761662490 13869073188
PROX_ELDERLYCARE 104095347 667016931411 7725731142 26066267291
PROX_HAWKER 127332433 139666769224 5767490124 12687436007
PROX_MRT 116170844 481039605970 7256303205 25981313263
PROX_PARK 64785454 618268503018 8041581816 33540521589
PROX_MALL 71567569 352809966932 6794351930 21399168175
PROX_SUPERMARKET 74027262 637398387744 7483835255 33844732102
WITHIN_350M_KINDERGARTEN 9237484 188055279753 2891154690 8095425136
WITHIN_350M_CHILDCARE 7782921 65577242545 2125037689 5464835513
WITHIN_350M_BUS 13267811 155314918270 4359043897 13518665397
WITHIN_1KM_PRISCH 0 90173678644 1669554060 6358926806
write_rds(gwRF_adaptive, "Data/rds/gwRF_adaptive.rds")
gwRF_adaptive <- read_rds("Data/rds/gwRF_adaptive.rds")7.6.2 Predicting using Test Data
The test data and its accompanying coordinates data will be combined using the code chunk below.
Preparing the test data
test_data <- cbind(test_data, coords_test) %>%
st_drop_geometry()Predicting with test data
The next step is to use the test data and the previously calibrated gwRF adaptive model to estimate the resale value using predict.grf() of the spatialML package.
gwRF_pred <- predict.grf(gwRF_adaptive,
test_data,
x.var.name="X",
y.var.name="Y",
local.w=1,
global.w=0)Before moving on, let us save the output into rds file for future use.
GRF_pred <- write_rds(gwRF_pred, "Data/rds/GRF_pred.rds")
GRF_pred [1] 360854.8 341081.0 426346.0 351358.3 360183.4 342303.0 376290.0 346821.4
[9] 372836.6 347931.4 394486.5 331749.2 361610.5 359481.7 350623.9 355095.5
[17] 331357.1 382630.7 558306.8 374674.6 360082.6 355718.3 399743.3 357867.1
[25] 409777.9 409777.9 339629.6 415621.2 424191.5 359733.5 348572.8 356948.7
[33] 376778.8 347296.8 359132.7 712554.9 351028.6 682193.4 288256.5 518638.2
[41] 407044.7 353523.3 378653.5 453368.5 354426.5 369694.5 384389.6 608416.7
[49] 377682.1 428617.3 571664.1 305831.9 549891.8 391439.7 342452.7 384091.7
[57] 359269.6 468631.6 495787.0 379824.4 298181.1 278505.5 281724.8 369605.7
[65] 341459.2 424189.7 386332.1 334724.1 344938.8 279686.1 688642.9 338129.1
[73] 511172.1 422589.7 422589.7 674512.5 532493.7 380143.6 453580.5 438412.5
[81] 435818.3 441749.8 454666.9 403635.2 435914.4 576370.7 354866.6 318683.2
[89] 411249.1 447697.1 444400.3 427232.3 444188.3 385928.6 373959.7 396676.7
[97] 390230.7 516517.4 517755.8 395835.9 394812.0 394435.3 406574.9 381473.3
[105] 414252.9 367982.6 329457.2 320299.7 329457.2 697795.3 287126.6 321269.1
[113] 336278.3 326926.2 336330.8 293352.0 391670.0 317423.7 312923.3 411743.9
[121] 414046.5 405408.7 413029.9 363319.0 391490.7 380698.5 366211.8 350641.9
[129] 379950.5 361405.2 398363.4 589319.2 343607.9 419407.5 370396.0 344615.4
[137] 389313.5 385226.4 393152.0 369112.6 382715.3 355648.8 365800.1 466655.8
[145] 387183.9 379817.6 363041.1 372773.9 362238.8 337970.2 347491.6 384105.3
[153] 360860.4 361522.1 346279.9 333574.2 340889.6 364571.8 370057.5 502004.1
[161] 500388.2 408828.4 616659.9 433593.0 404024.1 629606.3 378782.3 422539.7
[169] 406687.8 375327.2 408767.5 364213.0 456198.0 382009.5 361885.0 384728.8
[177] 573393.1 335196.1 358774.8 394120.0 282167.2 303173.7 280151.5 331110.2
[185] 399811.0 367296.7 336688.5 326130.9 375101.3 365059.9 364612.5 351601.0
[193] 363190.4 441703.4 333004.6 357218.2 379518.9 387983.0 300709.2 411933.1
[201] 370959.4 422325.5 319770.7 688642.9 674736.0 418631.9 511172.1 302438.8
[209] 925869.3 527664.5 563728.0 331943.6 334569.7 494106.0 405495.8 395426.2
[217] 423225.7 438725.1 432566.0 439554.4 435818.3 437956.0 448068.2 473912.2
[225] 447937.7 551419.8 359050.9 398360.6 377149.4 535868.7 402399.2 434508.3
[233] 414721.4 446889.7 446889.7 456209.7 357350.8 445962.6 393114.1 420185.2
[241] 400562.0 391198.1 386321.1 376245.3 427265.0 685604.0 314371.2 336251.7
[249] 324452.9 372819.5 295237.3 322471.4 308272.2 386892.5 331670.9 454700.2
[257] 454700.2 443042.3 454700.2 365502.9 379922.2 358920.4 416491.3 349885.5
[265] 381299.2 458283.0 347427.4 348119.8 427724.6 416358.3 543986.9 356755.2
[273] 356445.2 410747.1 397872.6 410747.1 368451.1 390672.0 376438.9 371980.3
[281] 355179.8 394242.2 356717.1 416708.9 331749.2 296533.2 367777.3 300599.0
[289] 385737.8 356158.0 319777.6 350581.4 355186.8 349801.9 381612.0 353435.3
[297] 409521.6 350054.4 359150.3 401094.7 469499.0 438067.2 331442.9 342923.1
[305] 355936.6 353987.4 388957.5 364134.7 496205.4 356868.5 381450.3 369833.2
[313] 502095.1 500388.2 521447.8 546136.4 583743.8 734527.7 309014.0 383956.5
[321] 742438.5 365824.5 381213.9 346664.2 460452.4 342354.6 313480.1 514345.5
[329] 475290.0 455530.9 541500.0 396894.7 392351.5 291889.8 289647.6 281174.2
[337] 437480.6 449258.2 286266.3 329016.3 301853.5 322062.7 309510.3 388038.6
[345] 367296.7 340920.4 317267.6 376035.0 341246.3 424763.3 372057.5 388152.7
[353] 479031.6 493028.5 350539.6 357437.0 306084.9 382118.3 423182.4 410664.1
[361] 410664.1 688642.9 386772.9 369603.1 925855.7 527604.5 560442.1 395188.8
[369] 492649.0 464928.2 464928.2 473890.1 431931.8 473912.2 435369.5 366148.6
[377] 319310.7 627602.7 675941.4 415419.6 372967.5 368651.5 318983.1 326660.8
[385] 347060.2 332093.3 632808.8 402395.3 456462.0 440793.2 434196.9 454412.2
[393] 393460.9 400928.9 426847.2 363140.3 397912.3 391366.7 396843.9 402549.4
[401] 409439.0 363111.0 359866.9 413510.9 405811.7 289540.8 317995.2 698699.7
[409] 341851.8 705785.4 683714.6 341530.9 317995.2 727183.7 339213.0 310773.2
[417] 361264.8 478318.5 295357.3 417332.4 414905.8 414456.7 380589.3 358568.4
[425] 349200.6 374435.5 345916.7 359927.8 343222.8 369891.0 430338.0 325059.1
[433] 375949.4 376058.6 352586.6 373039.5 358251.4 337799.3 348691.3 369393.5
[441] 388049.9 354045.8 380785.4 378399.8 354866.1 361440.2 374255.1 365019.3
[449] 360038.6 406042.5 344892.0 328704.3 361331.4 445648.9 368156.7 354027.1
[457] 353029.1 349256.1 444591.2 777637.0 309014.0 600236.6 391671.3 344788.8
[465] 654273.3 380175.3 414629.7 422029.8 422159.9 418701.2 416332.4 393789.2
[473] 456198.0 404152.7 532774.0 355565.6 364777.9 378652.5 360385.2 288571.6
[481] 281174.2 430527.1 314313.1 551824.8 322229.3 333275.6 363867.5 365478.6
[489] 441695.4 371802.7 368875.1 421778.1 315884.6 299299.1 360008.7 353463.4
[497] 394831.5 276426.9 365817.9 338633.2 860033.4 681448.7 681448.7 333924.8
[505] 417758.1 402244.4 454610.1 452288.7 435988.6 437475.8 444263.1 293560.0
[513] 617830.7 667490.1 667490.1 395868.2 412496.6 447322.7 462520.2 368203.0
[521] 615078.8 437202.7 461609.0 435450.3 447483.6 454410.2 464723.9 404817.6
[529] 436676.5 422139.2 444948.3 382816.2 373140.1 341258.2 342793.3 486806.5
[537] 401628.1 434071.5 428174.6 383188.1 398269.6 290067.5 343236.6 341534.8
[545] 343280.8 370837.3 363988.4 277205.7 319281.7 337514.9 333808.6 333336.7
[553] 453124.6 414879.9 414504.7 358634.6 381819.5 359044.3 347888.5 347586.0
[561] 374912.0 389063.5 400150.7 382806.3 376052.6 380896.0 392515.2 377699.2
[569] 396382.7 394082.5 392773.0 345816.7 366514.6 652620.2 378385.2 329243.7
[577] 354627.0 561452.1 360038.6 357867.1 529687.0 338788.1 416178.8 316841.6
[585] 362476.6 496205.4 346440.5 411463.8 300171.1 323405.7 384306.8 353096.5
[593] 433257.2 307206.2 380325.1 378299.9 356752.6 366051.5 395269.1 460369.4
[601] 441611.2 456703.8 377914.4 439007.4 352175.1 430527.1 300911.1 288630.3
[609] 548657.3 320491.6 377712.0 347096.5 322229.3 385659.3 386727.1 314601.6
[617] 358597.0 358873.9 373401.4 515918.7 475174.5 361066.1 371802.7 366477.6
[625] 330114.1 373703.6 274574.4 340457.9 351982.9 466248.0 369603.1 338633.2
[633] 284394.3 331857.5 555458.6 342212.3 404701.2 347150.5 333924.8 333924.8
[641] 451374.8 453509.1 435823.0 415488.9 453110.3 453687.6 453687.6 431880.4
[649] 483608.7 412556.3 644326.5 660794.1 406462.9 350738.2 437824.3 441518.8
[657] 448685.5 432712.0 460468.9 399016.9 518722.9 390872.1 414519.3 402434.2
[665] 415034.3 379932.4 384913.1 417422.1 433053.8 430143.5 398324.7 413557.3
[673] 365574.9 337402.5 347325.2 685527.9 326679.6 320921.2 323003.7 307665.7
[681] 297620.0 318918.0 333190.9 418631.2 377709.6 387189.8 385704.3 353550.2
[689] 346859.7 370562.8 363028.3 360571.2 430986.4 417860.9 587681.7 380770.7
[697] 380332.7 350835.0 356581.0 397185.7 347749.9 378464.5 367344.4 370859.4
[705] 390384.2 385458.4 361175.3 372827.2 361700.2 515058.1 395542.3 459773.7
[713] 346016.1 356665.3 348455.3 376099.3 340296.8 500518.2 690016.2 428591.4
[721] 335004.4 373934.5 349237.4 374526.5 312948.3 372517.3 412109.5 380977.3
[729] 351403.1 351403.1 397406.4 290687.2 320086.7 283550.8 286748.4 302241.7
[737] 293140.1 426329.9 357670.2 333275.6 333072.8 380132.9 403054.0 395266.0
[745] 320572.7 363024.6 365682.8 359444.8 359444.8 451733.4 340457.9 415344.2
[753] 430755.9 372080.9 348047.3 501547.5 459572.1 411808.3 472654.5 456065.2
[761] 480263.5 482170.2 412556.3 444529.9 301275.3 322504.8 375733.8 672015.4
[769] 672015.4 354437.7 462520.2 403095.2 354709.0 447520.1 427017.7 431161.8
[777] 453565.7 430418.7 430418.7 440464.4 451346.9 453384.5 454444.4 429729.7
[785] 437157.8 378126.5 421079.7 518722.9 392516.8 399285.5 403134.6 390426.7
[793] 389341.1 415175.5 415701.4 332486.6 344830.7 678424.0 335275.8 435628.2
[801] 289249.1 340598.4 319395.7 305767.6 296309.2 307330.4 430522.1 439283.7
[809] 405408.7 407987.5 399332.1 419011.3 417424.3 438165.1 435498.8 427163.9
[817] 447945.6 339715.9 409938.2 420693.6 354524.8 339088.0 419329.9 417413.6
[825] 375027.2 384330.2 358832.3 370307.2 321848.2 347558.7 380171.9 420132.0
[833] 348301.4 391761.9 375996.1 331225.3 372556.1 380305.4 349639.4 326025.2
[841] 367871.3 317674.4 333580.5 395631.4 351894.0 377165.3 424008.7 576717.8
[849] 362087.1 366968.0 381891.4 405674.4 340819.9 355010.2 356264.6 353144.3
[857] 514711.9 364686.9 521317.8 704716.8 309014.0 539767.3 700128.6 682598.4
[865] 383314.6 360171.8 699870.7 371081.3 398256.7 412963.6 395751.5 473826.1
[873] 421529.8 420797.7 406952.2 367677.7 367456.6 458630.0 411158.0 383590.1
[881] 372171.2 405595.1 396686.8 591852.0 367962.4 368837.6 303064.0 306707.6
[889] 297571.2 580107.9 302291.1 341275.6 549891.8 388452.0 404249.9 408553.8
[897] 387595.4 373328.5 338386.1 376433.1 341358.1 385624.2 377950.8 362524.2
[905] 365091.3 489507.4 442956.1 512760.9 363750.4 368431.7 414212.7 413442.1
[913] 329398.6 399580.8 317261.5 282832.1 358343.4 454326.7 370799.0 397632.7
[921] 372039.4 375648.7 373627.3 373627.3 365951.6 369410.1 369410.1 421699.8
[929] 271764.5 422611.3 300031.9 375357.8 375357.8 457812.8 435907.3 465878.0
[937] 442608.4 412838.5 492534.2 445876.9 440134.4 469119.9 323128.3 281100.1
[945] 373015.4 389494.7 462520.2 415462.4 388889.6 419005.2 419005.2 447777.1
[953] 437302.8 459335.0 451840.0 367731.3 402815.0 399092.1 336310.1 343006.2
[961] 443320.9 405303.7 467749.0 390925.8 410137.0 394402.8 428198.1 410687.5
[969] 393809.8 386958.9 475121.1 456029.7 398894.1 725154.7 339719.0 341969.5
[977] 341969.5 289249.1 396591.2 327938.3 313275.7 300247.0 312116.7 434871.2
[985] 430522.1 457244.4 457244.4 371724.8 417577.5 431065.3 355810.6 349483.6
[993] 370611.9 439966.4 447702.9 373953.7 359725.8 360409.2 396772.3 625624.2
[1001] 438541.6 435495.1 392259.5 350775.4 380278.9 652330.2 604801.4 332167.2
[1009] 342456.4 415634.1 332670.5 379668.3 375424.7 580862.5 360705.5 362330.0
[1017] 357188.0 370693.4 409521.6 362161.5 457310.3 369844.7 680244.4 690016.2
[1025] 449226.3 308984.2 358189.7 348409.6 698511.8 365229.4 381781.0 411627.1
[1033] 465885.2 322528.9 388639.7 356656.3 365306.3 652310.6 369376.6 364354.4
[1041] 379358.1 596953.7 282560.4 532813.4 549891.8 373328.5 374090.2 333004.7
[1049] 351863.4 385688.7 450966.7 366397.5 364453.6 357588.7 313535.5 336016.2
[1057] 278077.6 343606.4 344976.7 359344.2 398019.5 459950.9 712415.6 344820.2
[1065] 300031.9 560462.1 332629.3 498796.2 428350.3 417920.0 437894.8 440182.8
[1073] 419303.7 435795.9 430546.2 416430.6 673527.3 453566.8 422482.3 431033.7
[1081] 427062.4 443320.9 411221.1 410137.0 399285.5 393492.3 414018.3 378461.9
[1089] 386196.8 353163.6 375435.8 325580.1 332585.3 335923.4 461795.2 319380.5
[1097] 350392.6 316742.7 324059.0 312116.7 391750.0 457244.4 435498.8 393114.8
[1105] 425862.6 372991.8 373474.3 355955.1 353415.4 405746.8 482596.9 380698.5
[1113] 375507.0 403400.7 375803.4
Converting the predicting output into a data frame
A vector of predicted values is the result of the predict.grf() function. In order to further visualize and analyze it, it is better to transform it into a data frame.
GRF_pred <- read_rds("Data/rds/GRF_pred.rds")
GRF_pred_df <- as.data.frame(GRF_pred)In the following code block, test data is appended with the expected values using cbind().
test_data_p <- cbind(test_data, GRF_pred_df)write_rds(test_data_p, "Data/rds/test_data_p.rds")7.6.3 Calculating Root Mean Square Error
We can gauge how far predicted values are from observed values in a regression study using the root mean square error (RMSE). The RMSE is calculated in the code snippet below using the Metrics package’s rmse() function.
rmse(test_data_p$resale_price,
test_data_p$GRF_pred)[1] 31988.66
The spatial regression prediction model is shown to be pretty close to the test data resale prices by the RMSE value of 32313.2. Better fit is indicated by a lower RMSE.
7.6.4 Visualising the predicted values
The code chunk below can also be used to see the actual selling price and the anticipated resale price using a scatterplot.
ggplot(data = test_data_p,
aes(x = GRF_pred,
y = resale_price)) +
geom_point()
The scatterplot visualization shows that, if a linear relationship between the predicted and actual resale prices is drawn, they are not far apart. The visualization reveals that the majority of plots are located in the $400,000 scatter area.











