In this paper, we attempt to examine whether the use of change-point analysis techniques is appropriate for detecting fatigue based on data captured from wearable sensors. As such, we perform a secondary data analysis to the data generated in: Baghdadi et al., 2018. The reader should note that their raw data was preprocessed using:
Kalman Filter: Used to process the raw data from sensors to: (i) estimate the spatial orientation of the body with respect to the global reference frame, and (ii) to estimate the kinematics of motion.
Segmentation: The motion segments where then segmented using an algorithm that assumes the existence of two peaks in the translational acceleration of the gait cycle. This assumption was justified based on the results of Tongen and Wunderlich, 2010 as well as the authors’ preliminary analyses.
Since these are not a primary focus of our paper (where detailed in Section 3 of the paper to present the background of how the raw sensor data are transformed to features), we do not detail this analysis in this file. Instead, we refer the reader to the supplementary Matlab files in our GitHub Repository. ** Thus, this R Markdown primarily focuses on the analysis and results described in Sections 4-6 of our paper.**
The reader can show any code chunk by clicking on the code button. We chose to make the default for the code hidden since we: (a) wanted to improve the readability of this document; and (b) assumed that the readers will not be interested in reading every code chunk.
The snippet below documents the list of R libraries that were used in this research. For convenience, we used the package since it allows for installing/loading the needed libraries in one step.
rm(list = ls()) # clear global environment
graphics.off() # close all graphics
library(pacman) # needs to be installed first
# p_load is equivalent to combining both install.packages() and library()
p_load(R.matlab, plotly, extrafont, grDevices, ggpubr,
dplyr, stringr, tidyverse, utils, reshape2,
anomalize, MVN, forecast, fractal,
ecp, dfphase1,
MALDIquant, TSclust, dtwclust,
knitr, kableExtra, dplyr, data.table,
RColorBrewer, grid, gridExtra)
In the snippet below, we extract the 15 “.mat” files in the GitHub repository (where we loaded the data to allow for the reproduction of our work). Note that these files were originally produced in Baghdadi et al., 2018. Then, we perform several transformation steps: (a) extracting the data for the first three columns in the matlab arrays; and (b) computing three kinematic features from the data corresponding to these columns. Due to the differences between Matlab and R, this requires two nested for loops. The outer loop increments over the number of subjects, while the inner loop increments based on the different number of rows of data for each subject. Please see the comments within the code chunk for more details.
num_subjects <- seq(1, 15)
subject_heights <- c(1.71, 1.77, 1.71, 1.59, 1.69,
1.63, 1.60, 1.71, 1.67, 1.78,
1.68, 1.55, 1.83, 1.81, 1.89)
# Initilizing a df for summary data on participants
summary_df <- data.frame(matrix(nrow = 15, ncol = 9))
colnames(summary_df) <- c("Subject.Num", "num.rows",
"num.cols", "mean.scaled.stride.len",
"sd.scaled.stride.len",
"mean.scaled.stride.height",
"sd.scaled.stride.height",
"mean.stride.duration",
"sd.stride.duartion")
for (i in 1:length(num_subjects)) {
# Reading the .mat files from GitHub
raw_data <- readMat(paste0("./Data/Raw/Subject",num_subjects[i],".mat"))
# Compute the number of cells, and rows in each structered matrix
raw_data_size <- lengths(raw_data) # num of cells
num_rows <- raw_data_size / 17 # all data had 17 cols
# Initilizing the six lists needed for storing the data (we keep track of the top 3 for error checking)
time_in_sec <- vector("list", length = num_rows)
position_x <- vector("list", length = num_rows)
position_y <- vector("list", length = num_rows)
stride_time <- vector("list", length = num_rows)
stride_length <- vector("list", length = num_rows)
stride_height <- vector("list", length = num_rows)
stride_duration <- vector("list", length = num_rows)
# Following for loop is needed since R reads the structured array as a nested list. The list containing the data is called "M.i.k" and it transforms/reads the original array --> rowise. This means that our first three features (with the same timestamp) are always seperated with a distance equal to the total number of rows
for (j in 1:num_rows) {
position_x[[j]] <- raw_data[["M.i.k"]][[j]]
position_y[[j]] <- raw_data[["M.i.k"]][[num_rows + j]]
stride_time[[j]] <- raw_data[["M.i.k"]][[2 * num_rows + j]]
dataholder <- raw_data[["M.i.k"]][[16 * num_rows + j]] # data holder for time
# Computing the three needed kinematic features
stride_length[[j]] <-
range(position_x[[j]])[2] - range(position_x[[j]])[1]
stride_height[[j]] <-
range(position_y[[j]])[2] - range(position_y[[j]])[1]
stride_duration[[j]] <-
range(stride_time[[j]])[2] - range(stride_time[[j]])[1]
time_in_sec[[j]] <- lapply(dataholder, mean)# using mean time of stride as a time stamp
}
# Scaling and creating one data frame per subject
assign(paste0("subject", i, "_features"),
data.frame(time.from.start = unlist(time_in_sec),
scaled.stride.len = unlist(stride_length)/subject_heights[i],
scaled.stride.height = unlist(stride_height) / subject_heights[i],
stride.duration = unlist(stride_duration)
)
)
# Creating a Summary Data Frame
df_name <- paste0("subject", i, "_features")
summary_df[i, 1] <- paste0("subject", i)
summary_df[i, 2] <- get(df_name) %>% nrow()
summary_df[i, 3] <- get(df_name) %>% ncol()
summary_df[i, 4] <- get(df_name)[, 2] %>% mean() %>% round(digits = 4)
summary_df[i, 5] <- get(df_name)[, 2] %>% sd() %>% round(digits = 4)
summary_df[i, 6] <- get(df_name)[, 3] %>% mean() %>% round(digits = 4)
summary_df[i, 7] <- get(df_name)[, 3] %>% sd() %>% round(digits = 4)
summary_df[i, 8] <- get(df_name)[, 4] %>% mean() %>% round(digits = 4)
summary_df[i, 9] <- get(df_name)[, 4] %>% sd() %>% round(digits = 4)
}
# Printing the top six rows of Subject 4's data as an example
head(subject4_features) %>% round(digits = 3)
# A Summary of the features for all 15 participants
summary_df
rm(raw_data, raw_data_size, i, j, num_rows,
dataholder, num_subjects)
save.image(file = "./Data/RGenerated/FeatureGeneration.RData")
Based on the analysis above, there are three observations to be made. First, we scaled the stride length and height based on the subject’s height. This in essence allows us to capture the steps as a percentage of the person’s height. This reduces the between subject variablity in the data and is supported by the seminal work of Oberg et al., 1993. Second, we showed the first six rows from Subject 4 to provide the reader with some insights into the sampling frequeny (after the preprocessing done in Baghdadi et al. 2018). Note that the kinematic features are computed from the sensor channels provided in their paper. Third, we saved the generated data into an R.data file, which can be accessed by clicking on: FeatureGeneration.RData. We hope that this saved data allows other researchers to reproduce and/or build on our work.
First, we use a standard line plot to depict the data for each feature by person. For the sake of facilitating the visualization process, we: (a) panel the plot such that the left, center and right panels correspond to the scaled stride length, scaled stride height and step duration, respectively; and (b) we save the results of each participant (P) in a different tab.
# We are reading the data locally (you can download the file from GitHub)
load(file = "./Data/RGenerated/FeatureGeneration.RData")
for (i in 1:15) {
df_transformed <- melt(get(paste0("subject",i,"_features")),
id.vars = "time.from.start",
measure.vars=c("scaled.stride.len",
"scaled.stride.height",
"stride.duration"
)
)
# The few lines below adds a y_min and y_max column to df_transformed
# We are using these to force all partcipants to have the same y_axis
# limits since facet_wrap does not allow a y_lim variable
# This data is not in our analysis and we only use it in the ggplot
# through geom_blank() i.e., not plotted but affecting the y-axis limit
df_transformed <- data.table(df_transformed)
df_transformed[variable == "scaled.stride.len", y_min := 0]
df_transformed[variable == "scaled.stride.len", y_max := 4]
df_transformed[variable == "scaled.stride.height", y_min := 0.05]
df_transformed[variable == "scaled.stride.height", y_max := 0.4]
df_transformed[variable == "stride.duration", y_min := 0.5]
df_transformed[variable == "stride.duration", y_max := 1.5]
# Plotting the data and assigning each ggplot to a Pi variable
assign(paste0("p",i),
ggplot(data = df_transformed,
aes(x=time.from.start,y=value,group=variable,color=variable,
shape=variable)) +
geom_line() + theme_bw() +
#ylim(0,2) +
ggtitle(paste0("Participant ",i,": Pre-filtered Data")) +
theme(legend.position="none") +
facet_wrap(~variable,nrow=3, scales = "free_y") +
geom_blank(aes(y = y_min)) +
geom_blank(aes(y = y_max))
)
# We introduce a tab for each participant using tabset
cat('###',paste0("P", i), "{-}",'\n')
print(get(paste0("p",i)))
cat('\n \n')
stat_printdf <- get(paste0("subject",i,"_features"))
# Programatically print interesting summaries for each participant
cat(paste0('<source> <p> Based on the <b>raw data</b>, Participant ', i,
' has an average scaled stride length of ',
round(mean(stat_printdf[,2]),2),
', an average scaled stride height of ',
round(mean(stat_printdf[,3]),2),
' and an average stride duration of ', round(mean(stat_printdf[,4]),2),
' seconds. Based on their height of ', subject_heights[i],
' meters, their average stride length and stride height are', ' ',
round(mean(stat_printdf[,2])*subject_heights[i],2),
' and ', ' ', round(mean(stat_printdf[,3])*subject_heights[i],2),
', respectively.', '</p> </source>'
)
)
cat(' \n \n') # new line
}
From the analysis in Section 1, we have the time-series for stride length, height and duration for each participant. However, as one can see from Section 1.2, these time-seres are of different length (e.g., compare the x-axis scale for Participants 1 and 7) and exhibit a relatively large number of outliers (due to sensor measurements, segmentation and/or unexpected sudden/not-sustained movements by the participants). In addition, the data obtained from the feature engineering exhibit a high level of autocorrelation, which would make the independence assumption required by most time-series changepoint methods violated. Therefore, we examine how to address these three issues in this section.
In this section, we rescale/normalize the timescale a 0-100% scale for all participants to prepare for a timeseries clustering type analysis. Specifically, we sampled the three features from the data based on a 0.05% from start. This resulted in a standardization of the number of observations within each timeseries across our 15 paricipants. Based on our increment size, each timeseries is now comprised of 2000 observations, each at 0.05% time increment from its neighouring data points. The chunk below presents the code used to standardize the data.
# to load previously generated data
load(file = "./Data/RGenerated/FeatureGeneration.RData")
medianfilt_norm <- list()
cusum_norm <- list()
for (i in 1:15) {
df <- get(paste0("subject",i,"_features")) # Getting Data from Sec.
# resampling the data to 2000 samples
nsamples = 2000
timePerc <- seq(0, 100, by=0.05)
timeScale <- df[nrow(df),1]/nsamples
df_norm = df[FALSE,]
for (j in 1:nsamples) {
df_matchIsx <- match.closest(timeScale*(j-0.5),df[,1]) # finding the closest point
df_keep <- df[df_matchIsx,]
df_keep[,1] <- timePerc[j]
df_norm <- rbind(df_norm, df_keep)
}
assign(paste0("subject",i,"_features_norm"), df_norm)
}
save(subject1_features_norm, subject2_features_norm, subject3_features_norm,
subject4_features_norm, subject5_features_norm, subject6_features_norm,
subject7_features_norm, subject8_features_norm, subject9_features_norm,
subject10_features_norm, subject11_features_norm, subject12_features_norm,
subject13_features_norm, subject14_features_norm, subject15_features_norm,
file = "./Data/RGenerated/ScaledData.RData")
The results from rescaling the data can be found at ScaledData.RData.
From the Signal Processing and Ergonomics Literature, the the median filter is a standard approach for smoothing sensor-based data. There are three common implementations of the median filtering approach, which are:
Apply it over non-overlapping windows of the time-series;
Apply it with an overlap equivalent to half the window size; and
Apply it with an overlap equal to 1.
While the fractal package on CRAN presents an implementation of the median filter, we do not use it here since it is limited to case (C). Instead, we developed our own function med.fil
, which presents us with the flexibility to adjust the amount of overlap. There are three inputs to the function: (i) a data frame, (ii) d representing the number of overlapping observations in the window, and (iii) w, which is the window size.
# Inputs to the function are:
#x is the data
#d is the number of overlapping observations in the window
#w is the window size
med.fil<-function(x,d,w){
ms = {}
i = 1
while( w+d*(i-1) < length(x))
{
ms[i] = median(x[(1+d*(i-1)):(w+d*(i-1))])
i = i + 1
}
return(ms)
}
window.size <- 2 # To be used in the ECP package; so we can change it once
Using our customizable med.fil()
, we can then smooth the scaled sensor data. The chunk below presents the details for the normalization and allows us to generate plots of the smoothed sensor data.
load(file = "./Data/RGenerated/ScaledData.RData")
# Inputs to the med.fil function
d = 1
w = 21
medianfilt_norm <- list()
cusum_norm <- list()
for (i in 1:15) {
df <- get(paste0("subject",i,"_features_norm")) # Getting Data from Sec.
# Obtaining the Outliers for each of the three variables
percent.time.from.start <- med.fil(df[,1],d,w)
scaled.stride.len <- med.fil(df[,2],d,w)
scaled.stride.height <- med.fil(df[,3],d,w)
stride.duration <- med.fil(df[,4],d,w)
assign(paste0("subject",i,"_medianf_norm"),
data.frame(cbind(percent.time.from.start, scaled.stride.len,
scaled.stride.height, stride.duration)))
# cusum calculation
df <- get(paste0("subject",i,"_medianf_norm"))
means <- colMeans(df)
df[,2] <- cumsum(df[,2]-means[2])
df[,3] <- cumsum(df[,3]-means[3])
df[,4] <- cumsum(df[,4]-means[4])
assign(paste0("subject",i,"_cusum_norm"), df)
cusum_norm[[i]] <- get(paste0("subject",i,"_cusum_norm"))
# Preparing the data for the Line Graph
df_transformed <- melt(get(paste0("subject",i,"_medianf_norm")),
id.vars = "percent.time.from.start",
measure.vars=c("scaled.stride.len",
"scaled.stride.height",
"stride.duration"
)
) # ggplot data needs to be tall
df_transformed <- data.table(df_transformed)
df_transformed[variable == "scaled.stride.len", y_min := 0.4]
df_transformed[variable == "scaled.stride.len", y_max := 2.0]
df_transformed[variable == "scaled.stride.height", y_min := 0.04]
df_transformed[variable == "scaled.stride.height", y_max := 0.2]
df_transformed[variable == "stride.duration", y_min := 0.7]
df_transformed[variable == "stride.duration", y_max := 1.2]
assign(paste0("g",i),
ggplot(data = df_transformed,
aes(x=percent.time.from.start, y=value, group=variable,
color=variable,shape=variable)) +
geom_line() + theme_bw() +
ggtitle(paste("Rescaled and Median Filtered Data for Participant",i)) +
theme(legend.position="none",
#axis.text.x=element_text(angle=90,hjust=1),
plot.title = element_text(hjust = 0.5)) +
facet_wrap(~variable,nrow=3, scales = "free_y") +
geom_blank(aes(y = y_min)) +
geom_blank(aes(y = y_max))
)
cat('###',paste0("P", i), "{-}",'\n')
print(get(paste0("g",i)))
cat('\n')
cat(paste0('<source> <p> Based on the <b>median filter</b>, Participant ', i,
' has an average scaled stride length of ',
round(mean(scaled.stride.len),2),
', an average scaled stride height of ',
round(mean(scaled.stride.height),2),
' and an average stride duration of ', round(mean(stride.duration),2),
' seconds. Based on their height of ', subject_heights[i],
' meters, their average stride length and stride height are', ' ',
round(mean(scaled.stride.len)*subject_heights[i],2),
' and ', ' ', round(mean(scaled.stride.height)*subject_heights[i],2),
', respectively.', '</p> </source>'
)
)
cat(' \n \n')
}
save(subject1_medianf_norm, subject2_medianf_norm, subject3_medianf_norm,
subject4_medianf_norm, subject5_medianf_norm, subject6_medianf_norm,
subject7_medianf_norm, subject8_medianf_norm, subject9_medianf_norm,
subject10_medianf_norm, subject11_medianf_norm, subject12_medianf_norm,
subject13_medianf_norm, subject14_medianf_norm, subject15_medianf_norm,
file = "./Data/RGenerated/NormMedianFilteredData.RData")
save(subject1_cusum_norm, subject2_cusum_norm, subject3_cusum_norm,
subject4_cusum_norm, subject5_cusum_norm, subject6_cusum_norm,
subject7_cusum_norm, subject8_cusum_norm, subject9_cusum_norm,
subject10_cusum_norm, subject11_cusum_norm, subject12_cusum_norm,
subject13_cusum_norm, subject14_cusum_norm, subject15_cusum_norm,
file = "./Data/RGenerated/NormCusumData.RData")
The rescaled and then median filtered data can be found at: NormMedianFilteredData.RData. Additionally, the cusum of this data (which we use for interpretation) can be found at: NormCusumData.RData
In this section, we will examine the use of the approach of Matteson and James (2014) for multiple change point analysis of our trivariate data. Our analysis takes advantage of their R package titled , which is described in more details in the following paper: James and Matteson (2014). The multivariate changepoint method will be applied on the median filtered data only.
To apply the ecp approach of Matteson and James (2014) on the median filtered data, we need to use a window size (which reflects the smallest size of a shift that is to be detected). We used \(\small m_{ecp} = 2\) since it is the smallest size that converges.
load(file = "./Data/RGenerated/NormMedianFilteredData.RData")
pen <- function(x) -length(x) #Equally penalizes every additional changepoint
len=nrow(subject1_medianf_norm)
changepoints_medianf <- list()
for (i in 1:15) {
df <- get(paste0("subject",i,"_medianf_norm"))
# Making the DF divisiable by the window size
# (i.e., we are removing the remainder so that our chunks are equally sized)
rows.to.keep <- nrow(df[,2:4]) %/% window.size
remainder <- nrow(df[,2:4]) %% window.size
df.MultivariateChangepoint<- df[1:(nrow(df)-remainder),
2:4] %>% as.matrix()
# Utilizing the e.agglo() from the ecp package
mem <- rep(1:rows.to.keep, each=window.size)
y = e.agglo(X=df.MultivariateChangepoint,
member= mem,
alpha = 1,
penalty = function(cp,Xts) mean(diff(sort(cp))))
fatigue_from_ECP <- subset(y$estimates,
((y$estimates>window.size) & (y$estimates<len-window.size))
) # Returns observation number
changepoints_medianf[[i]] <- df[fatigue_from_ECP,1]
cat('###',paste0("P", i), "{-}",'\n')
df_transformed <- melt(get(paste0("subject",i,"_medianf_norm")),
id.vars = "percent.time.from.start",
measure.vars=c("scaled.stride.len",
"scaled.stride.height",
"stride.duration"
)
) # ggplot data needs to be tall
df_transformed <- data.table(df_transformed)
df_transformed[variable == "scaled.stride.len", y_min := 0.4]
df_transformed[variable == "scaled.stride.len", y_max := 2.0]
df_transformed[variable == "scaled.stride.height", y_min := 0.04]
df_transformed[variable == "scaled.stride.height", y_max := 0.2]
df_transformed[variable == "stride.duration", y_min := 0.7]
df_transformed[variable == "stride.duration", y_max := 1.2]
assign(paste0("g",i),
ggplot(data = df_transformed,
aes(x=percent.time.from.start, y=value, group=variable,
color=variable,shape=variable)) +
geom_line() + theme_bw() +
ggtitle(paste("Changepoints on Median Filtered Data for Participant",i)) +
theme(legend.position="none",
#axis.text.x=element_text(angle=90,hjust=1),
plot.title = element_text(hjust = 0.5)) +
facet_wrap(~variable,nrow=3, scales = "free_y") +
geom_blank(aes(y = y_min)) +
geom_blank(aes(y = y_max)) +
geom_vline(xintercept= changepoints_medianf[[i]])
)
print(get(paste0("g",i)))
cat('\n')
cat(paste0('<source> <p> Based on the <b>ecp package</b>, the number of changepoints for participant ', i,
' is equal to: ',
length(changepoints_medianf[[i]]),
'. These changepoints are located at the following percent time(s) from start: ',
paste(round(changepoints_medianf[[i]], digits = 2), collapse = ", "),
'. The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/tree/master/Data/RGenerated/ECPChangePointsMFData.RData">ECPChangePointsMFData.RData</a>. </p> </source>')
)
cat('\n \n')
}
save(changepoints_medianf,
file = "./Data/RGenerated/ECPChangePointsMFData.RData")
Throughout the experiments, participants provided their subjective ratings every 10 minutes. The ratings were based on the Borg 6-20 RPE scale. Recent literature on fatigue detection suggests that a rating of 13 can be used to estimate the onset of fatigue (see Maman et al. (2017) for a detailed analysis). The subjective ratings for each participant can be accessed at: subjective-ratings-raw.
RatingThresh <- 13 # based on Maman et al. 2017
RPE = read.csv("https://raw.githubusercontent.com/fmegahed/fatigue-changepoint/master/Data/Raw/RPE.csv") # to read subjective rating of RPE data from GitHub'
colnames(RPE)[1] <- "percent.time.from.start"
df_RPE_transformed <- melt(RPE,
id.vars = "percent.time.from.start",
measure.vars = colnames(RPE)[2:16])
df_RPE_transformed[,2] <- paste0(df_RPE_transformed[,2],".RPE")
# to load previously generated data and changepoints
load(file = "./Data/RGenerated/NormMedianFilteredData.RData")
load(file = "./Data/RGenerated/ECPChangePointsMFData.RData")
changepoints_subj_thr <- list()
for (i in 1:15) {
correct_subject_rows <- which(df_RPE_transformed[,2]==paste0("Subject.",i,".RPE"))
dataholder <- df_RPE_transformed[correct_subject_rows,]
# -------- Calculation of Fatigue based on RPE>= 13 -----------------
fatigue_from_RPE_threshold <- which(RPE[,i+1]>=RatingThresh) %>%
min() # returns window number
changepoints_subj_thr[[i]] <- RPE[fatigue_from_RPE_threshold,1]
df_RPE<-data.frame(d1=RPE[1],t1=RPE[i+1]) %>% na.omit() # to make dataframe of subjective ratings for ggplot
cat('####',paste0("P", i), "{-}",'\n')
df_transformed <- melt(get(paste0("subject",i,"_medianf_norm")),
id.vars = "percent.time.from.start",
measure.vars=c("scaled.stride.len",
"scaled.stride.height",
"stride.duration"
)
)
df_transformed <- rbind(df_transformed,df_RPE_transformed[correct_subject_rows,])
df_transformed <- data.table(df_transformed)
df_transformed[variable == "scaled.stride.len", y_min := 0.4]
df_transformed[variable == "scaled.stride.len", y_max := 2.0]
df_transformed[variable == "scaled.stride.height", y_min := 0.04]
df_transformed[variable == "scaled.stride.height", y_max := 0.2]
df_transformed[variable == "stride.duration", y_min := 0.7]
df_transformed[variable == "stride.duration", y_max := 1.2]
assign(paste0("g",i),
ggplot(data = df_transformed,
aes(x=percent.time.from.start, y=value, colour = variable)) +
geom_line() +
theme_bw() +
ggtitle(paste0("Changepoints & RPE overlaid on Median Filtered Data for P",i)) +
scale_colour_manual(values = c("red", "green", "blue","black","grey40")) +
facet_wrap(~variable,nrow=4, scales = "free_y") +
geom_blank(aes(y = y_min)) +
geom_blank(aes(y = y_max)) +
# labs(title = paste("A paneled time-series plot for Participant",i)) +
geom_vline(xintercept= changepoints_medianf[[i]],
show.legend = T, size=1) +
geom_vline(xintercept= changepoints_subj_thr[[i]],
linetype="solid", color="steelblue1",
show.legend = T, size=1) +
theme(legend.position="none",
plot.title = element_text(hjust = 0.5))
)
print(get(paste0("g",i)))
cat('\n \n')
cat(paste0('<source> <p> In the above figure, the colored vertical lines correspond to the following. First, the <b> black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. </b> These changepoints are what we obtained in Section 3.1.1. Second, the <font color= #63B8FF> <b>steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. </b> </font> Note that we only indicate the first occurence when the RPE>=13 on the graphs. </font> </b>',
'. </font> </b> \n \n'))
cat(paste0('<source> <p> Based on the <b>ecp package, the number of changepoints for the the median filtered data for participant ', i,
' is equal to: ',
length(changepoints_medianf[[i]]),
'. These changepoints are located at: ',
paste(round(changepoints_medianf[[i]]), collapse = ", "),
'. <font color= #63B8FF> The threshold based changepoint in the subjective RPE is located at: ',
paste(changepoints_subj_thr[[i]]),
'.', collapse = ", "),
'. </b> </font> </source>')
cat('\n \n')
}
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
Based on the ecp package, the number of changepoints for the the median filtered data for participant 2 is equal to: 1. These changepoints are located at: 48. The threshold based changepoint in the subjective RPE is located at: 53. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
Based on the ecp package, the number of changepoints for the the median filtered data for participant 3 is equal to: 32. These changepoints are located at: 2, 4, 5, 6, 7, 12, 15, 16, 19, 24, 27, 30, 31, 36, 37, 46, 48, 49, 50, 52, 53, 58, 64, 67, 69, 82, 84, 88, 90, 92, 94, 96. The threshold based changepoint in the subjective RPE is located at: 35. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
Based on the ecp package, the number of changepoints for the the median filtered data for participant 4 is equal to: 1. These changepoints are located at: 63. The threshold based changepoint in the subjective RPE is located at: 23. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
Based on the ecp package, the number of changepoints for the the median filtered data for participant 5 is equal to: 2. These changepoints are located at: 37, 94. The threshold based changepoint in the subjective RPE is located at: 23. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
Based on the ecp package, the number of changepoints for the the median filtered data for participant 6 is equal to: 2. These changepoints are located at: 11, 96. The threshold based changepoint in the subjective RPE is located at: 76. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
Based on the ecp package, the number of changepoints for the the median filtered data for participant 7 is equal to: 1. These changepoints are located at: 51. The threshold based changepoint in the subjective RPE is located at: 18. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
Based on the ecp package, the number of changepoints for the the median filtered data for participant 8 is equal to: 1. These changepoints are located at: 66. The threshold based changepoint in the subjective RPE is located at: 59. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
Based on the ecp package, the number of changepoints for the the median filtered data for participant 9 is equal to: 2. These changepoints are located at: 81, 81. The threshold based changepoint in the subjective RPE is located at: 23. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
Based on the ecp package, the number of changepoints for the the median filtered data for participant 10 is equal to: 2. These changepoints are located at: 2, 61. The threshold based changepoint in the subjective RPE is located at: 18. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
Based on the ecp package, the number of changepoints for the the median filtered data for participant 11 is equal to: 1. These changepoints are located at: 23. The threshold based changepoint in the subjective RPE is located at: 100. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
Based on the ecp package, the number of changepoints for the the median filtered data for participant 12 is equal to: 1. These changepoints are located at: 44. The threshold based changepoint in the subjective RPE is located at: 82. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
Based on the ecp package, the number of changepoints for the the median filtered data for participant 13 is equal to: 1. These changepoints are located at: 5. The threshold based changepoint in the subjective RPE is located at: 59. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
Based on the ecp package, the number of changepoints for the the median filtered data for participant 14 is equal to: 1. These changepoints are located at: 24. The threshold based changepoint in the subjective RPE is located at: 41. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
Based on the ecp package, the number of changepoints for the the median filtered data for participant 15 is equal to: 4. These changepoints are located at: 48, 80, 81, 91. The threshold based changepoint in the subjective RPE is located at: 29. .
RatingThresh <- 13 # based on Maman et al. 2017
RPE = read.csv("https://raw.githubusercontent.com/fmegahed/fatigue-changepoint/master/Data/Raw/RPE.csv") # to read subjective rating of RPE data from GitHub'
colnames(RPE)[1] <- "percent.time.from.start"
df_RPE_transformed <- melt(RPE,
id.vars = "percent.time.from.start",
measure.vars = colnames(RPE)[2:16])
df_RPE_transformed[,2] <- paste0(df_RPE_transformed[,2],".RPE")
# to load previously generated data and changepoints
load(file = "./Data/RGenerated/NormCusumData.RData")
load(file = "./Data/RGenerated/ECPChangePointsMFData.RData")
changepoints_subj_thr <- list()
for (i in 1:15) {
correct_subject_rows <- which(df_RPE_transformed[,2]==paste0("Subject.",i,".RPE"))
dataholder <- df_RPE_transformed[correct_subject_rows,]
# -------- Calculation of Fatigue based on RPE>= 13 -----------------
fatigue_from_RPE_threshold <- which(RPE[,i+1]>=RatingThresh) %>%
min() # returns window number
changepoints_subj_thr[[i]] <- RPE[fatigue_from_RPE_threshold,1]
df_RPE<-data.frame(d1=RPE[1],t1=RPE[i+1]) %>% na.omit() # to make dataframe of subjective ratings for ggplot
cat('####',paste0("P", i), "{-}",'\n')
df_transformed <- melt(get(paste0("subject",i,"_cusum_norm")),
id.vars = "percent.time.from.start",
measure.vars=c("scaled.stride.len",
"scaled.stride.height",
"stride.duration"
)
)
df_transformed <- rbind(df_transformed,df_RPE_transformed[correct_subject_rows,])
df_transformed <- data.table(df_transformed)
df_transformed[variable == "scaled.stride.len", y_min := -40]
df_transformed[variable == "scaled.stride.len", y_max := 120]
df_transformed[variable == "scaled.stride.height", y_min := -20]
df_transformed[variable == "scaled.stride.height", y_max := 20]
df_transformed[variable == "stride.duration", y_min := -50]
df_transformed[variable == "stride.duration", y_max := 30]
df_transformed[variable == paste0("Subject.",i,".RPE"), y_min := 5]
df_transformed[variable == paste0("Subject.",i,".RPE"), y_max := 20]
assign(paste0("g",i),
ggplot(data = df_transformed,
aes(x=percent.time.from.start, y=value, colour = variable)) +
geom_line() +
theme_bw() +
ggtitle(paste0("Changepoints & RPE overlaid on the CUSUMs of Median Filtered Data for P",i)) +
scale_colour_manual(values = c("red", "green", "blue","black","grey40")) +
facet_wrap(~variable,nrow=4, scales = "free_y") +
geom_blank(aes(y = y_min)) +
geom_blank(aes(y = y_max)) +
geom_vline(xintercept= changepoints_medianf[[i]],
show.legend = T, size=1) +
geom_vline(xintercept= changepoints_subj_thr[[i]],
linetype="solid", color="steelblue1",
show.legend = T, size=1) +
theme(legend.position="none",
plot.title = element_text(hjust = 0.5))
)
print(get(paste0("g",i)))
cat('\n \n')
cat(paste0('<source> <p> In the above figure, the colored vertical lines correspond to the following. First, the <b> black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. </b> These changepoints are what we obtained in Section 3.1.2. Second, the <font color= #63B8FF> <b>steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. </b> </font> Note that we only indicate the first occurence when the RPE>=13 on the graphs. </font> </b>',
'. </font> </b> \n \n'))
cat('\n \n')
}
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .
As mentioned in the paper, the time series clustering can be performed on either the scaled sensor data or on the cummulative sums of the scaled sensor data. We show the results for each approach below based on the use of heirarchical clustering, with a distance measue of . Our clustering approach utilizes the dtwclust package. For more details on the package, the reader is referred to the following dtwclust Vignette.
load(file = "./Data/RGenerated/NormMedianFilteredData.RData")
multvar_subjects_scaled <- vector("list", 15)
for (i in 1:15) {
df <- get(paste0("subject",i,"_medianf_norm"))
multvar_subjects_scaled[[i]] <- data.matrix(df[2:4], rownames.force = NA)
}
names(multvar_subjects_scaled) <- c(paste0("P", seq(1,15,1)))
multvar_subjects_scaled[[3]] <- {} # Deleting Participant 3
mvc <- tsclust(multvar_subjects_scaled, type = "hierarchical", k = 2L, distance = "dtw", seed = 390)
cat(paste0('<source> <p> To understand how the underlying multivariate time series methods are similar, we applied a heirarchical clustering algorithm based on the distances obtained using the <i> Dynamic Time Warping Method </i>. The resulting dendogram is shown in the figure below. </p> </source>'))
To understand how the underlying multivariate time series methods are similar, we applied a heirarchical clustering algorithm based on the distances obtained using the Dynamic Time Warping Method . The resulting dendogram is shown in the figure below.
# distances.matrix <- as.matrix(distances)
# heatmap(distances.matrix)
plot(mvc, hang=0.1, check = TRUE,
axes = TRUE, ann = TRUE, ylab="Height",
main = "Cluster Dendogram based on the DTWARP distances")
cat(paste0('Based on the above dendogram, one can see that the best choice for total number of clusters is equal to: 3 or 4. This number was then used to cut of the dendogram tree using the cutree function in R. We show the resulting cluster membership in the table below. The associated cluster assignments for both thresholds are as follows: \n', 'When using k=4, the assignments are as follows: \n'))
Based on the above dendogram, one can see that the best choice for total number of clusters is equal to: 3 or 4. This number was then used to cut of the dendogram tree using the cutree function in R. We show the resulting cluster membership in the table below. The associated cluster assignments for both thresholds are as follows: When using k=4, the assignments are as follows:
clusterCut3 <- cutree(mvc, 3)
clusterCut4 <- cutree(mvc, 4)
save(clusterCut3, clusterCut4, mvc,
file = "./Data/RGenerated/MFClusters.RData")
# Printing the table using kable
kable(t(clusterCut4), row.names = NA, caption = "Cluster Assignment for each Participant") %>% column_spec(1, bold = T, width = "5em") %>% kable_styling(bootstrap_options = "striped", full_width = F)
P1 | P2 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | P15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 4 | 4 | 2 | 4 | 1 | 1 | 2 | 4 | 3 | 1 |
# a R file used to create a plot based on clusterCut4
# See the file on GitHub Repo for more information
# Not directly included in the RMD since it is long
source('Secondary Analysis/clusters.R')
The results from this analysis are saved at The subjective ratings for each participant can be accessed at: MFClusters.RData.
load(file = "./Data/RGenerated/NormCusumData.RData")
multvar_subjects_scaled <- vector("list", 15)
for (i in 1:15) {
df <- get(paste0("subject",i,"_cusum_norm"))
multvar_subjects_scaled[[i]] <- data.matrix(df[2:4], rownames.force = NA)
}
names(multvar_subjects_scaled) <- c(paste0("P", seq(1,15,1)))
multvar_subjects_scaled[[3]] <- {} # Deleting Participant 3
mvc <- tsclust(multvar_subjects_scaled, type = "hierarchical", distance = "dtw", seed = 390)
cat(paste0('<source> <p> To understand how the underlying multivariate time series methods are similar, we applied a heirarchical clustering algorithm based on the distances obtained using the <i> Dynamic Time Warping Method </i>. The resulting dendogram is shown in the figure below. </p> </source>'))
To understand how the underlying multivariate time series methods are similar, we applied a heirarchical clustering algorithm based on the distances obtained using the Dynamic Time Warping Method . The resulting dendogram is shown in the figure below.
# distances.matrix <- as.matrix(distances)
# heatmap(distances.matrix)
plot(mvc, hang=0.1, check = TRUE,
axes = TRUE, ann = TRUE, ylab="Height",
main = "Cluster Dendogram based on the DTWARP distances")
cat(paste0('Based on the above dendogram, one can see that the best choice for total number of clusters is equal to: 3 or 4. This number was then used to cut of the dendogram tree using the cutree function in R. We show the resulting cluster membership in the table below. The associated cluster assignments for both thresholds are as follows: \n', 'When using k=4, the assignments are as follows: \n'))
Based on the above dendogram, one can see that the best choice for total number of clusters is equal to: 3 or 4. This number was then used to cut of the dendogram tree using the cutree function in R. We show the resulting cluster membership in the table below. The associated cluster assignments for both thresholds are as follows: When using k=4, the assignments are as follows:
clusterCut3 <- cutree(mvc, 3)
clusterCut4 <- cutree(mvc, 4)
save(clusterCut3, clusterCut4, mvc,
file = "./Data/RGenerated/CusumClusters.RData")
kable(t(clusterCut4), row.names = NA, caption = "Cluster Assignment for each Participant") %>% column_spec(1, bold = T, width = "5em") %>% kable_styling(bootstrap_options = "striped", full_width = F)
P1 | P2 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | P15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 2 | 1 | 1 | 3 | 3 | 3 | 1 | 1 | 3 | 3 | 4 | 3 |
# a R file used to create a plot based on clusterCut4
# See the file on GitHub Repo for more information
# Not directly included in the RMD since it is long
source('Secondary Analysis/clustersCUSUMs.R')
The results from this analysis are saved at The subjective ratings for each participant can be accessed at: CusumClusters.RData.