In this paper, we attempt to examine whether the use of change-point analysis techniques is appropriate for detecting fatigue based on data captured from wearable sensors. As such, we perform a secondary data analysis to the data generated in: Baghdadi et al., 2018. The reader should note that their raw data was preprocessed using:

  1. Kalman Filter: Used to process the raw data from sensors to: (i) estimate the spatial orientation of the body with respect to the global reference frame, and (ii) to estimate the kinematics of motion.

  2. Segmentation: The motion segments where then segmented using an algorithm that assumes the existence of two peaks in the translational acceleration of the gait cycle. This assumption was justified based on the results of Tongen and Wunderlich, 2010 as well as the authors’ preliminary analyses.

Since these are not a primary focus of our paper (where detailed in Section 3 of the paper to present the background of how the raw sensor data are transformed to features), we do not detail this analysis in this file. Instead, we refer the reader to the supplementary Matlab files in our GitHub Repository. ** Thus, this R Markdown primarily focuses on the analysis and results described in Sections 4-6 of our paper.**

The reader can show any code chunk by clicking on the code button. We chose to make the default for the code hidden since we: (a) wanted to improve the readability of this document; and (b) assumed that the readers will not be interested in reading every code chunk.


1 Loading Data & Generating Features

The snippet below documents the list of R libraries that were used in this research. For convenience, we used the package since it allows for installing/loading the needed libraries in one step.

rm(list = ls()) # clear global environment
graphics.off() # close all graphics
library(pacman) # needs to be installed first
# p_load is equivalent to combining both install.packages() and library()
p_load(R.matlab, plotly, extrafont, grDevices, ggpubr,
       dplyr, stringr, tidyverse, utils, reshape2,
       anomalize, MVN, forecast, fractal,
       ecp, dfphase1, 
       MALDIquant, TSclust, dtwclust,
       knitr, kableExtra, dplyr, data.table,
       RColorBrewer, grid, gridExtra)

1.1 Feature Generation

In the snippet below, we extract the 15 “.mat” files in the GitHub repository (where we loaded the data to allow for the reproduction of our work). Note that these files were originally produced in Baghdadi et al., 2018. Then, we perform several transformation steps: (a) extracting the data for the first three columns in the matlab arrays; and (b) computing three kinematic features from the data corresponding to these columns. Due to the differences between Matlab and R, this requires two nested for loops. The outer loop increments over the number of subjects, while the inner loop increments based on the different number of rows of data for each subject. Please see the comments within the code chunk for more details.

num_subjects <- seq(1, 15)
subject_heights <- c(1.71, 1.77, 1.71, 1.59, 1.69,
                     1.63, 1.60, 1.71, 1.67, 1.78,
                     1.68, 1.55, 1.83, 1.81, 1.89)

# Initilizing a df for summary data on participants
summary_df <- data.frame(matrix(nrow = 15, ncol = 9))
colnames(summary_df) <- c("Subject.Num", "num.rows",
                          "num.cols", "mean.scaled.stride.len",
                          "sd.scaled.stride.len",
                          "mean.scaled.stride.height",
                          "sd.scaled.stride.height",
                          "mean.stride.duration",
                          "sd.stride.duartion")

for (i in 1:length(num_subjects)) {
  # Reading the .mat files from GitHub
  raw_data <- readMat(paste0("./Data/Raw/Subject",num_subjects[i],".mat"))
  # Compute the number of cells, and rows in each structered matrix
  raw_data_size <- lengths(raw_data) # num of cells
  num_rows <- raw_data_size / 17 # all data had 17 cols
  # Initilizing the six lists needed for storing the data (we keep track of the top 3 for error checking)
  time_in_sec <- vector("list", length = num_rows)
  position_x <- vector("list", length = num_rows)
  position_y <- vector("list", length = num_rows)
  stride_time <- vector("list", length = num_rows)
  stride_length <- vector("list", length = num_rows)
  stride_height <- vector("list", length = num_rows)
  stride_duration <- vector("list", length = num_rows)
  
  # Following for loop is needed since R reads the structured array as a nested list. The list containing the data is called "M.i.k" and it transforms/reads the original array --> rowise. This means that our first three features (with the same timestamp) are always seperated with a distance equal to the total number of rows
  for (j in 1:num_rows) {
    position_x[[j]] <- raw_data[["M.i.k"]][[j]]
    position_y[[j]] <- raw_data[["M.i.k"]][[num_rows + j]]
    stride_time[[j]] <- raw_data[["M.i.k"]][[2 * num_rows + j]]
    dataholder <- raw_data[["M.i.k"]][[16 * num_rows + j]] # data holder for time
    # Computing the three needed kinematic features
    stride_length[[j]] <-
      range(position_x[[j]])[2] - range(position_x[[j]])[1]
    stride_height[[j]] <-
      range(position_y[[j]])[2] - range(position_y[[j]])[1]
    stride_duration[[j]] <-
      range(stride_time[[j]])[2] - range(stride_time[[j]])[1]
    time_in_sec[[j]] <- lapply(dataholder, mean)# using mean time of stride as a time stamp
  }
  
  # Scaling and creating one data frame per subject
  assign(paste0("subject", i, "_features"), 
         data.frame(time.from.start = unlist(time_in_sec), 
                    scaled.stride.len = unlist(stride_length)/subject_heights[i], 
                    scaled.stride.height = unlist(stride_height) / subject_heights[i], 
                    stride.duration = unlist(stride_duration)
                    )
         )
  
  # Creating a Summary Data Frame
  df_name <- paste0("subject", i, "_features")
  summary_df[i, 1] <- paste0("subject", i)
  summary_df[i, 2] <- get(df_name) %>% nrow()
  summary_df[i, 3] <- get(df_name) %>% ncol()
  summary_df[i, 4] <- get(df_name)[, 2] %>% mean() %>% round(digits = 4)
  summary_df[i, 5] <- get(df_name)[, 2] %>% sd() %>% round(digits = 4)
  summary_df[i, 6] <- get(df_name)[, 3] %>% mean() %>% round(digits = 4)
  summary_df[i, 7] <- get(df_name)[, 3] %>% sd() %>% round(digits = 4)
  summary_df[i, 8] <- get(df_name)[, 4] %>% mean() %>% round(digits = 4)
  summary_df[i, 9] <- get(df_name)[, 4] %>% sd() %>% round(digits = 4)
}
# Printing the top six rows of Subject 4's data as an example
head(subject4_features) %>% round(digits = 3)
# A Summary of the features for all 15 participants
summary_df
rm(raw_data, raw_data_size, i, j, num_rows, 
   dataholder, num_subjects)
save.image(file = "./Data/RGenerated/FeatureGeneration.RData")

Based on the analysis above, there are three observations to be made. First, we scaled the stride length and height based on the subject’s height. This in essence allows us to capture the steps as a percentage of the person’s height. This reduces the between subject variablity in the data and is supported by the seminal work of Oberg et al., 1993. Second, we showed the first six rows from Subject 4 to provide the reader with some insights into the sampling frequeny (after the preprocessing done in Baghdadi et al. 2018). Note that the kinematic features are computed from the sensor channels provided in their paper. Third, we saved the generated data into an R.data file, which can be accessed by clicking on: FeatureGeneration.RData. We hope that this saved data allows other researchers to reproduce and/or build on our work.

1.2 Plotting the Data

First, we use a standard line plot to depict the data for each feature by person. For the sake of facilitating the visualization process, we: (a) panel the plot such that the left, center and right panels correspond to the scaled stride length, scaled stride height and step duration, respectively; and (b) we save the results of each participant (P) in a different tab.

# We are reading the data locally (you can download the file from GitHub)
load(file = "./Data/RGenerated/FeatureGeneration.RData")
for (i in 1:15) {
  df_transformed <- melt(get(paste0("subject",i,"_features")),
                         id.vars = "time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                         )
  )
  
  # The few lines below adds a y_min and y_max column to df_transformed
  # We are using these to force all partcipants to have the same y_axis
  # limits since facet_wrap does not allow a y_lim variable
  # This data is not in our analysis and we only use it in the ggplot
  # through geom_blank() i.e., not plotted but affecting the y-axis limit
  df_transformed <- data.table(df_transformed)
  df_transformed[variable == "scaled.stride.len", y_min := 0]
  df_transformed[variable == "scaled.stride.len", y_max := 4]
  df_transformed[variable == "scaled.stride.height", y_min := 0.05]
  df_transformed[variable == "scaled.stride.height", y_max := 0.4]
  df_transformed[variable == "stride.duration", y_min := 0.5]
  df_transformed[variable == "stride.duration", y_max := 1.5]
  
  
  # Plotting the data and assigning each ggplot to a Pi variable
  assign(paste0("p",i),
         ggplot(data = df_transformed,
                aes(x=time.from.start,y=value,group=variable,color=variable,
                    shape=variable)) + 
           geom_line() + theme_bw() + 
           #ylim(0,2) +
           ggtitle(paste0("Participant ",i,": Pre-filtered Data")) + 
           theme(legend.position="none") + 
           facet_wrap(~variable,nrow=3, scales = "free_y") +
           geom_blank(aes(y = y_min)) +
           geom_blank(aes(y = y_max))
  )
  
  # We introduce a tab for each participant using tabset
  cat('###',paste0("P", i), "{-}",'\n')
  print(get(paste0("p",i)))
  cat('\n \n')
  
  stat_printdf <- get(paste0("subject",i,"_features"))
  
  # Programatically print interesting summaries for each participant
  cat(paste0('<source> <p> Based on the <b>raw data</b>, Participant ', i,
             ' has an average scaled stride length of ',
             round(mean(stat_printdf[,2]),2),
             ', an average scaled stride height of ',
             round(mean(stat_printdf[,3]),2),
             ' and an average stride duration of ', round(mean(stat_printdf[,4]),2),
             ' seconds. Based on their height of ', subject_heights[i],
             ' meters, their average stride length and stride height are', ' ',
             round(mean(stat_printdf[,2])*subject_heights[i],2),
             ' and ', ' ', round(mean(stat_printdf[,3])*subject_heights[i],2),
             ', respectively.', '</p> </source>'
  )
  )
  
  cat(' \n \n') # new line
}

P1

Based on the raw data, Participant 1 has an average scaled stride length of 1.07, an average scaled stride height of 0.09 and an average stride duration of 1.06 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.82 and 0.16, respectively.

P2

Based on the raw data, Participant 2 has an average scaled stride length of 1.63, an average scaled stride height of 0.13 and an average stride duration of 0.95 seconds. Based on their height of 1.77 meters, their average stride length and stride height are 2.88 and 0.23, respectively.

P3

Based on the raw data, Participant 3 has an average scaled stride length of 0.94, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.61 and 0.16, respectively.

P4

Based on the raw data, Participant 4 has an average scaled stride length of 1.28, an average scaled stride height of 0.12 and an average stride duration of 0.95 seconds. Based on their height of 1.59 meters, their average stride length and stride height are 2.03 and 0.2, respectively.

P5

Based on the raw data, Participant 5 has an average scaled stride length of 0.96, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.69 meters, their average stride length and stride height are 1.63 and 0.15, respectively.

P6

Based on the raw data, Participant 6 has an average scaled stride length of 0.91, an average scaled stride height of 0.11 and an average stride duration of 1.06 seconds. Based on their height of 1.63 meters, their average stride length and stride height are 1.48 and 0.18, respectively.

P7

Based on the raw data, Participant 7 has an average scaled stride length of 0.84, an average scaled stride height of 0.09 and an average stride duration of 1.12 seconds. Based on their height of 1.6 meters, their average stride length and stride height are 1.34 and 0.15, respectively.

P8

Based on the raw data, Participant 8 has an average scaled stride length of 1.53, an average scaled stride height of 0.14 and an average stride duration of 0.98 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 2.62 and 0.24, respectively.

P9

Based on the raw data, Participant 9 has an average scaled stride length of 0.87, an average scaled stride height of 0.1 and an average stride duration of 1.11 seconds. Based on their height of 1.67 meters, their average stride length and stride height are 1.46 and 0.17, respectively.

P10

Based on the raw data, Participant 10 has an average scaled stride length of 1.28, an average scaled stride height of 0.11 and an average stride duration of 1.01 seconds. Based on their height of 1.78 meters, their average stride length and stride height are 2.27 and 0.2, respectively.

P11

Based on the raw data, Participant 11 has an average scaled stride length of 1.2, an average scaled stride height of 0.09 and an average stride duration of 1.05 seconds. Based on their height of 1.68 meters, their average stride length and stride height are 2.02 and 0.15, respectively.

P12

Based on the raw data, Participant 12 has an average scaled stride length of 1.61, an average scaled stride height of 0.14 and an average stride duration of 0.89 seconds. Based on their height of 1.55 meters, their average stride length and stride height are 2.5 and 0.21, respectively.

P13

Based on the raw data, Participant 13 has an average scaled stride length of 0.67, an average scaled stride height of 0.07 and an average stride duration of 1.14 seconds. Based on their height of 1.83 meters, their average stride length and stride height are 1.23 and 0.13, respectively.

P14

Based on the raw data, Participant 14 has an average scaled stride length of 1.32, an average scaled stride height of 0.14 and an average stride duration of 0.93 seconds. Based on their height of 1.81 meters, their average stride length and stride height are 2.39 and 0.25, respectively.

P15

Based on the raw data, Participant 15 has an average scaled stride length of 1.18, an average scaled stride height of 0.1 and an average stride duration of 1.04 seconds. Based on their height of 1.89 meters, their average stride length and stride height are 2.24 and 0.2, respectively.


2 Signal Processing

From the analysis in Section 1, we have the time-series for stride length, height and duration for each participant. However, as one can see from Section 1.2, these time-seres are of different length (e.g., compare the x-axis scale for Participants 1 and 7) and exhibit a relatively large number of outliers (due to sensor measurements, segmentation and/or unexpected sudden/not-sustained movements by the participants). In addition, the data obtained from the feature engineering exhibit a high level of autocorrelation, which would make the independence assumption required by most time-series changepoint methods violated. Therefore, we examine how to address these three issues in this section.

2.1 Using %Time-from-Start to Standardize the Length of Each Participant’s Time-series

In this section, we rescale/normalize the timescale a 0-100% scale for all participants to prepare for a timeseries clustering type analysis. Specifically, we sampled the three features from the data based on a 0.05% from start. This resulted in a standardization of the number of observations within each timeseries across our 15 paricipants. Based on our increment size, each timeseries is now comprised of 2000 observations, each at 0.05% time increment from its neighouring data points. The chunk below presents the code used to standardize the data.

# to load previously generated data 
load(file = "./Data/RGenerated/FeatureGeneration.RData")

medianfilt_norm <- list() 
cusum_norm <- list()

for (i in 1:15) {
  
  df <- get(paste0("subject",i,"_features")) # Getting Data from Sec.
  
  # resampling the data to 2000 samples
  nsamples = 2000
  timePerc <- seq(0, 100, by=0.05)
  timeScale <- df[nrow(df),1]/nsamples
  df_norm = df[FALSE,]
  
  for (j in 1:nsamples) {
    df_matchIsx <- match.closest(timeScale*(j-0.5),df[,1]) # finding the closest point
    df_keep <- df[df_matchIsx,]
    df_keep[,1] <- timePerc[j]
    df_norm <- rbind(df_norm, df_keep)
  }
  assign(paste0("subject",i,"_features_norm"), df_norm)
}

save(subject1_features_norm, subject2_features_norm, subject3_features_norm,
     subject4_features_norm, subject5_features_norm, subject6_features_norm,
     subject7_features_norm, subject8_features_norm, subject9_features_norm,
     subject10_features_norm, subject11_features_norm, subject12_features_norm,
     subject13_features_norm, subject14_features_norm, subject15_features_norm,
     file = "./Data/RGenerated/ScaledData.RData")  

The results from rescaling the data can be found at ScaledData.RData.

2.2 The Application of the Median Filter

From the Signal Processing and Ergonomics Literature, the the median filter is a standard approach for smoothing sensor-based data. There are three common implementations of the median filtering approach, which are:

  1. Apply it over non-overlapping windows of the time-series;

  2. Apply it with an overlap equivalent to half the window size; and

  3. Apply it with an overlap equal to 1.

While the fractal package on CRAN presents an implementation of the median filter, we do not use it here since it is limited to case (C). Instead, we developed our own function med.fil, which presents us with the flexibility to adjust the amount of overlap. There are three inputs to the function: (i) a data frame, (ii) d representing the number of overlapping observations in the window, and (iii) w, which is the window size.

# Inputs to the function are:
  #x is the data
  #d is the number of overlapping observations in the window
  #w is the window size

med.fil<-function(x,d,w){
  ms = {}
  i = 1
  while( w+d*(i-1) < length(x))
  {
    ms[i] = median(x[(1+d*(i-1)):(w+d*(i-1))])
    i = i + 1
  }
  return(ms)
}

window.size <- 2 # To be used in the ECP package; so we can change it once

Using our customizable med.fil(), we can then smooth the scaled sensor data. The chunk below presents the details for the normalization and allows us to generate plots of the smoothed sensor data.

load(file = "./Data/RGenerated/ScaledData.RData")

# Inputs to the med.fil function
d = 1
w = 21

medianfilt_norm <- list() 
cusum_norm <- list()

for (i in 1:15) {
  df <- get(paste0("subject",i,"_features_norm")) # Getting Data from Sec.
  # Obtaining the Outliers for each of the three variables
  percent.time.from.start <- med.fil(df[,1],d,w)
  scaled.stride.len <- med.fil(df[,2],d,w)
  scaled.stride.height <- med.fil(df[,3],d,w)
  stride.duration <- med.fil(df[,4],d,w)
  
  assign(paste0("subject",i,"_medianf_norm"), 
         data.frame(cbind(percent.time.from.start, scaled.stride.len, 
                          scaled.stride.height, stride.duration)))
  

  # cusum calculation
  df <- get(paste0("subject",i,"_medianf_norm"))
  means <- colMeans(df)
  df[,2] <- cumsum(df[,2]-means[2])
  df[,3] <- cumsum(df[,3]-means[3])
  df[,4] <- cumsum(df[,4]-means[4])
  assign(paste0("subject",i,"_cusum_norm"), df)  
  cusum_norm[[i]] <- get(paste0("subject",i,"_cusum_norm"))
  
  
  # Preparing the data for the Line Graph
  df_transformed <- melt(get(paste0("subject",i,"_medianf_norm")),
                         id.vars = "percent.time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                                        )
                         ) # ggplot data needs to be tall
  
  df_transformed <- data.table(df_transformed)
  df_transformed[variable == "scaled.stride.len", y_min := 0.4]
  df_transformed[variable == "scaled.stride.len", y_max := 2.0]
  df_transformed[variable == "scaled.stride.height", y_min := 0.04]
  df_transformed[variable == "scaled.stride.height", y_max := 0.2]
  df_transformed[variable == "stride.duration", y_min := 0.7]
  df_transformed[variable == "stride.duration", y_max := 1.2]
  
  
  assign(paste0("g",i),
         ggplot(data = df_transformed,
                aes(x=percent.time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle(paste("Rescaled and Median Filtered Data for Participant",i)) + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y") +
           geom_blank(aes(y = y_min)) +
           geom_blank(aes(y = y_max))
         )
  
  
  cat('###',paste0("P", i), "{-}",'\n')
  
  print(get(paste0("g",i)))
  
  cat('\n')
  
  cat(paste0('<source> <p> Based on the <b>median filter</b>, Participant ', i,
             ' has an average scaled stride length of ',
             round(mean(scaled.stride.len),2),
             ', an average scaled stride height of ',
             round(mean(scaled.stride.height),2),
             ' and an average stride duration of ', round(mean(stride.duration),2),
             ' seconds. Based on their height of ', subject_heights[i],
             ' meters, their average stride length and stride height are', ' ',
             round(mean(scaled.stride.len)*subject_heights[i],2),
             ' and ', ' ', round(mean(scaled.stride.height)*subject_heights[i],2),
             ', respectively.', '</p> </source>'
             )
      )
    
  cat(' \n \n')

  
}

P1

Based on the median filter, Participant 1 has an average scaled stride length of 0.96, an average scaled stride height of 0.09 and an average stride duration of 1.08 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.64 and 0.15, respectively.

P2

Based on the median filter, Participant 2 has an average scaled stride length of 1.47, an average scaled stride height of 0.12 and an average stride duration of 1 seconds. Based on their height of 1.77 meters, their average stride length and stride height are 2.61 and 0.21, respectively.

P3

Based on the median filter, Participant 3 has an average scaled stride length of 0.79, an average scaled stride height of 0.11 and an average stride duration of 1.12 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.35 and 0.18, respectively.

P4

Based on the median filter, Participant 4 has an average scaled stride length of 1.25, an average scaled stride height of 0.12 and an average stride duration of 0.96 seconds. Based on their height of 1.59 meters, their average stride length and stride height are 1.99 and 0.19, respectively.

P5

Based on the median filter, Participant 5 has an average scaled stride length of 0.87, an average scaled stride height of 0.09 and an average stride duration of 1.1 seconds. Based on their height of 1.69 meters, their average stride length and stride height are 1.47 and 0.15, respectively.

P6

Based on the median filter, Participant 6 has an average scaled stride length of 0.83, an average scaled stride height of 0.11 and an average stride duration of 1.08 seconds. Based on their height of 1.63 meters, their average stride length and stride height are 1.36 and 0.17, respectively.

P7

Based on the median filter, Participant 7 has an average scaled stride length of 0.81, an average scaled stride height of 0.09 and an average stride duration of 1.13 seconds. Based on their height of 1.6 meters, their average stride length and stride height are 1.29 and 0.14, respectively.

P8

Based on the median filter, Participant 8 has an average scaled stride length of 1.47, an average scaled stride height of 0.14 and an average stride duration of 0.98 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 2.51 and 0.23, respectively.

P9

Based on the median filter, Participant 9 has an average scaled stride length of 0.84, an average scaled stride height of 0.1 and an average stride duration of 1.12 seconds. Based on their height of 1.67 meters, their average stride length and stride height are 1.4 and 0.16, respectively.

P10

Based on the median filter, Participant 10 has an average scaled stride length of 1.18, an average scaled stride height of 0.11 and an average stride duration of 1.02 seconds. Based on their height of 1.78 meters, their average stride length and stride height are 2.1 and 0.19, respectively.

P11

Based on the median filter, Participant 11 has an average scaled stride length of 1.18, an average scaled stride height of 0.09 and an average stride duration of 1.05 seconds. Based on their height of 1.68 meters, their average stride length and stride height are 1.98 and 0.15, respectively.

P12

Based on the median filter, Participant 12 has an average scaled stride length of 1.51, an average scaled stride height of 0.13 and an average stride duration of 0.93 seconds. Based on their height of 1.55 meters, their average stride length and stride height are 2.34 and 0.2, respectively.

P13

Based on the median filter, Participant 13 has an average scaled stride length of 0.66, an average scaled stride height of 0.07 and an average stride duration of 1.14 seconds. Based on their height of 1.83 meters, their average stride length and stride height are 1.21 and 0.13, respectively.

P14

Based on the median filter, Participant 14 has an average scaled stride length of 1.18, an average scaled stride height of 0.12 and an average stride duration of 0.95 seconds. Based on their height of 1.81 meters, their average stride length and stride height are 2.13 and 0.22, respectively.

P15

Based on the median filter, Participant 15 has an average scaled stride length of 1.12, an average scaled stride height of 0.1 and an average stride duration of 1.05 seconds. Based on their height of 1.89 meters, their average stride length and stride height are 2.11 and 0.19, respectively.

save(subject1_medianf_norm, subject2_medianf_norm, subject3_medianf_norm, 
     subject4_medianf_norm, subject5_medianf_norm, subject6_medianf_norm,
     subject7_medianf_norm, subject8_medianf_norm, subject9_medianf_norm,
     subject10_medianf_norm, subject11_medianf_norm, subject12_medianf_norm,
     subject13_medianf_norm, subject14_medianf_norm, subject15_medianf_norm,
     file = "./Data/RGenerated/NormMedianFilteredData.RData")  

save(subject1_cusum_norm, subject2_cusum_norm, subject3_cusum_norm,
     subject4_cusum_norm, subject5_cusum_norm, subject6_cusum_norm,
     subject7_cusum_norm, subject8_cusum_norm, subject9_cusum_norm,
     subject10_cusum_norm, subject11_cusum_norm, subject12_cusum_norm,
     subject13_cusum_norm, subject14_cusum_norm, subject15_cusum_norm,
     file = "./Data/RGenerated/NormCusumData.RData")

The rescaled and then median filtered data can be found at: NormMedianFilteredData.RData. Additionally, the cusum of this data (which we use for interpretation) can be found at: NormCusumData.RData


3 Changepoint Detection

3.1 The Nonparameteric Approach of Matteson & James (ECP)

In this section, we will examine the use of the approach of Matteson and James (2014) for multiple change point analysis of our trivariate data. Our analysis takes advantage of their R package titled , which is described in more details in the following paper: James and Matteson (2014). The multivariate changepoint method will be applied on the median filtered data only.

To apply the ecp approach of Matteson and James (2014) on the median filtered data, we need to use a window size (which reflects the smallest size of a shift that is to be detected). We used \(\small m_{ecp} = 2\) since it is the smallest size that converges.

  load(file = "./Data/RGenerated/NormMedianFilteredData.RData")
  pen <- function(x) -length(x) #Equally penalizes every additional changepoint
  len=nrow(subject1_medianf_norm)
  
  changepoints_medianf <- list()

  for (i in 1:15) {
      df <- get(paste0("subject",i,"_medianf_norm"))
      # Making the DF divisiable by the window size
      # (i.e., we are removing the remainder so that our chunks are equally sized)
      rows.to.keep <- nrow(df[,2:4]) %/% window.size
      remainder <- nrow(df[,2:4]) %% window.size
      df.MultivariateChangepoint<- df[1:(nrow(df)-remainder),
                                      2:4] %>% as.matrix()
      
      # Utilizing the e.agglo() from the ecp package
      mem <- rep(1:rows.to.keep, each=window.size)
      y = e.agglo(X=df.MultivariateChangepoint,
                  member= mem,
                  alpha = 1,
                  penalty = function(cp,Xts) mean(diff(sort(cp))))
      fatigue_from_ECP <- subset(y$estimates, 
                                 ((y$estimates>window.size) & (y$estimates<len-window.size))
                                 ) # Returns observation number
      changepoints_medianf[[i]] <- df[fatigue_from_ECP,1]
      
      cat('###',paste0("P", i), "{-}",'\n')
      df_transformed <- melt(get(paste0("subject",i,"_medianf_norm")),
                         id.vars = "percent.time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                                        )
                         ) # ggplot data needs to be tall
      
      df_transformed <- data.table(df_transformed)
      df_transformed[variable == "scaled.stride.len", y_min := 0.4]
      df_transformed[variable == "scaled.stride.len", y_max := 2.0]
      df_transformed[variable == "scaled.stride.height", y_min := 0.04]
      df_transformed[variable == "scaled.stride.height", y_max := 0.2]
      df_transformed[variable == "stride.duration", y_min := 0.7]
      df_transformed[variable == "stride.duration", y_max := 1.2]
      
      assign(paste0("g",i),
         ggplot(data = df_transformed,
                aes(x=percent.time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle(paste("Changepoints on Median Filtered Data for Participant",i)) + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y") +
           geom_blank(aes(y = y_min)) +
           geom_blank(aes(y = y_max)) +
           geom_vline(xintercept= changepoints_medianf[[i]])
         )
      
      print(get(paste0("g",i)))
  
  cat('\n')
  
  cat(paste0('<source> <p> Based on the <b>ecp package</b>, the number of changepoints for participant ', i,
             ' is equal to: ',
             length(changepoints_medianf[[i]]),
             '. These changepoints are located at the following percent time(s) from start: ',
             paste(round(changepoints_medianf[[i]], digits = 2), collapse = ", "),
             '. The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/tree/master/Data/RGenerated/ECPChangePointsMFData.RData">ECPChangePointsMFData.RData</a>. </p> </source>')
      )
  cat('\n \n')
  }

P1

Based on the ecp package, the number of changepoints for participant 1 is equal to: 3. These changepoints are located at the following percent time(s) from start: 5.5, 56.6, 85.4. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P2

Based on the ecp package, the number of changepoints for participant 2 is equal to: 1. These changepoints are located at the following percent time(s) from start: 47.9. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P3

Based on the ecp package, the number of changepoints for participant 3 is equal to: 32. These changepoints are located at the following percent time(s) from start: 1.7, 3.6, 4.6, 6.4, 6.9, 12.1, 14.7, 16.2, 19.1, 24.2, 26.8, 29.9, 30.6, 35.8, 36.7, 46.5, 47.9, 49.1, 50.4, 52, 53.2, 58.4, 63.7, 67.4, 68.9, 81.5, 84.1, 88, 89.8, 92.4, 93.6, 95.7. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P4

Based on the ecp package, the number of changepoints for participant 4 is equal to: 1. These changepoints are located at the following percent time(s) from start: 63.3. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P5

Based on the ecp package, the number of changepoints for participant 5 is equal to: 2. These changepoints are located at the following percent time(s) from start: 36.6, 93.5. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P6

Based on the ecp package, the number of changepoints for participant 6 is equal to: 2. These changepoints are located at the following percent time(s) from start: 10.9, 95.9. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P7

Based on the ecp package, the number of changepoints for participant 7 is equal to: 1. These changepoints are located at the following percent time(s) from start: 51.1. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P8

Based on the ecp package, the number of changepoints for participant 8 is equal to: 1. These changepoints are located at the following percent time(s) from start: 66.5. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P9

Based on the ecp package, the number of changepoints for participant 9 is equal to: 2. These changepoints are located at the following percent time(s) from start: 81, 81.4. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P10

Based on the ecp package, the number of changepoints for participant 10 is equal to: 2. These changepoints are located at the following percent time(s) from start: 1.5, 61.2. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P11

Based on the ecp package, the number of changepoints for participant 11 is equal to: 1. These changepoints are located at the following percent time(s) from start: 23.4. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P12

Based on the ecp package, the number of changepoints for participant 12 is equal to: 1. These changepoints are located at the following percent time(s) from start: 44.2. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P13

Based on the ecp package, the number of changepoints for participant 13 is equal to: 1. These changepoints are located at the following percent time(s) from start: 4.7. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P14

Based on the ecp package, the number of changepoints for participant 14 is equal to: 1. These changepoints are located at the following percent time(s) from start: 24.4. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P15

Based on the ecp package, the number of changepoints for participant 15 is equal to: 4. These changepoints are located at the following percent time(s) from start: 48.2, 79.6, 80.8, 91.2. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

save(changepoints_medianf,
       file = "./Data/RGenerated/ECPChangePointsMFData.RData")

3.2 Comparing the ECP Changepoints with Participants’ Subjective Ratings

Throughout the experiments, participants provided their subjective ratings every 10 minutes. The ratings were based on the Borg 6-20 RPE scale. Recent literature on fatigue detection suggests that a rating of 13 can be used to estimate the onset of fatigue (see Maman et al. (2017) for a detailed analysis). The subjective ratings for each participant can be accessed at: subjective-ratings-raw.

3.2.1 Plots with the Median Filtered Data and their ECP Changepoints

RatingThresh <- 13  # based on Maman et al. 2017  
RPE = read.csv("https://raw.githubusercontent.com/fmegahed/fatigue-changepoint/master/Data/Raw/RPE.csv") # to read subjective rating of RPE data from GitHub'
colnames(RPE)[1] <- "percent.time.from.start"
df_RPE_transformed <- melt(RPE,
                           id.vars = "percent.time.from.start",
                           measure.vars = colnames(RPE)[2:16])
df_RPE_transformed[,2] <- paste0(df_RPE_transformed[,2],".RPE")
# to load previously generated data and changepoints 
load(file = "./Data/RGenerated/NormMedianFilteredData.RData")
load(file = "./Data/RGenerated/ECPChangePointsMFData.RData")
changepoints_subj_thr <- list()

for (i in 1:15) {
  correct_subject_rows <- which(df_RPE_transformed[,2]==paste0("Subject.",i,".RPE"))
  dataholder <- df_RPE_transformed[correct_subject_rows,]
  
  
  # -------- Calculation of Fatigue based on RPE>= 13 -----------------
  fatigue_from_RPE_threshold <- which(RPE[,i+1]>=RatingThresh) %>% 
                              min() # returns window number
  changepoints_subj_thr[[i]] <- RPE[fatigue_from_RPE_threshold,1]
  df_RPE<-data.frame(d1=RPE[1],t1=RPE[i+1]) %>% na.omit() # to make dataframe of subjective ratings for ggplot
  
  
  
  
  cat('####',paste0("P", i), "{-}",'\n')
  df_transformed <- melt(get(paste0("subject",i,"_medianf_norm")),
                       id.vars = "percent.time.from.start",
                       measure.vars=c("scaled.stride.len",
                                      "scaled.stride.height",
                                      "stride.duration"
                                      )
                         )
  
  df_transformed <- rbind(df_transformed,df_RPE_transformed[correct_subject_rows,])
  
  df_transformed <- data.table(df_transformed)
  df_transformed[variable == "scaled.stride.len", y_min := 0.4]
  df_transformed[variable == "scaled.stride.len", y_max := 2.0]
  df_transformed[variable == "scaled.stride.height", y_min := 0.04]
  df_transformed[variable == "scaled.stride.height", y_max := 0.2]
  df_transformed[variable == "stride.duration", y_min := 0.7]
  df_transformed[variable == "stride.duration", y_max := 1.2]
      
  assign(paste0("g",i),
        ggplot(data = df_transformed,
               aes(x=percent.time.from.start, y=value, colour = variable)) + 
          geom_line() +
          theme_bw() +
          ggtitle(paste0("Changepoints & RPE overlaid on Median Filtered Data for P",i)) +
          scale_colour_manual(values = c("red", "green", "blue","black","grey40")) +
          facet_wrap(~variable,nrow=4, scales = "free_y") +
          geom_blank(aes(y = y_min)) +
          geom_blank(aes(y = y_max)) +
          # labs(title = paste("A paneled time-series plot for Participant",i)) + 
          geom_vline(xintercept= changepoints_medianf[[i]], 
                     show.legend = T, size=1) + 
          geom_vline(xintercept= changepoints_subj_thr[[i]],
                     linetype="solid", color="steelblue1",
                     show.legend = T, size=1)  +
          theme(legend.position="none",
                 plot.title = element_text(hjust = 0.5))
      )
      
      print(get(paste0("g",i)))
  
  cat('\n \n')
  
  cat(paste0('<source> <p> In the above figure, the colored vertical lines correspond to the following.  First, the <b> black  lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. </b> These changepoints are what we obtained in Section 3.1.1. Second, the <font color= #63B8FF> <b>steelblue line  correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. </b> </font> Note that we only indicate the first occurence when the RPE>=13 on the graphs. </font> </b>',
'.  </font> </b> \n \n'))
  
  cat(paste0('<source> <p> Based on the <b>ecp package, the number of changepoints for the the median filtered data for participant ', i,
             ' is equal to: ',
             length(changepoints_medianf[[i]]),
             '. These changepoints are located at: ',
             paste(round(changepoints_medianf[[i]]), collapse = ", "),
             '.  <font color= #63B8FF> The threshold based changepoint in the subjective RPE is located at: ',
             paste(changepoints_subj_thr[[i]]),
             '.', collapse = ", "),
             '. </b> </font> </source>')
  
  cat('\n \n')
  
  }

P1

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 1 is equal to: 3. These changepoints are located at: 6, 57, 85. The threshold based changepoint in the subjective RPE is located at: 59. .

P2

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 2 is equal to: 1. These changepoints are located at: 48. The threshold based changepoint in the subjective RPE is located at: 53. .

P3

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 3 is equal to: 32. These changepoints are located at: 2, 4, 5, 6, 7, 12, 15, 16, 19, 24, 27, 30, 31, 36, 37, 46, 48, 49, 50, 52, 53, 58, 64, 67, 69, 82, 84, 88, 90, 92, 94, 96. The threshold based changepoint in the subjective RPE is located at: 35. .

P4

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 4 is equal to: 1. These changepoints are located at: 63. The threshold based changepoint in the subjective RPE is located at: 23. .

P5

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 5 is equal to: 2. These changepoints are located at: 37, 94. The threshold based changepoint in the subjective RPE is located at: 23. .

P6

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 6 is equal to: 2. These changepoints are located at: 11, 96. The threshold based changepoint in the subjective RPE is located at: 76. .

P7

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 7 is equal to: 1. These changepoints are located at: 51. The threshold based changepoint in the subjective RPE is located at: 18. .

P8

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 8 is equal to: 1. These changepoints are located at: 66. The threshold based changepoint in the subjective RPE is located at: 59. .

P9

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 9 is equal to: 2. These changepoints are located at: 81, 81. The threshold based changepoint in the subjective RPE is located at: 23. .

P10

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 10 is equal to: 2. These changepoints are located at: 2, 61. The threshold based changepoint in the subjective RPE is located at: 18. .

P11

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 11 is equal to: 1. These changepoints are located at: 23. The threshold based changepoint in the subjective RPE is located at: 100. .

P12

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 12 is equal to: 1. These changepoints are located at: 44. The threshold based changepoint in the subjective RPE is located at: 82. .

P13

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 13 is equal to: 1. These changepoints are located at: 5. The threshold based changepoint in the subjective RPE is located at: 59. .

P14

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 14 is equal to: 1. These changepoints are located at: 24. The threshold based changepoint in the subjective RPE is located at: 41. .

P15

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 15 is equal to: 4. These changepoints are located at: 48, 80, 81, 91. The threshold based changepoint in the subjective RPE is located at: 29. .

3.2.2 Plots with the CUSUM of Median Filtered Data and ECP Changepoints

RatingThresh <- 13  # based on Maman et al. 2017  
RPE = read.csv("https://raw.githubusercontent.com/fmegahed/fatigue-changepoint/master/Data/Raw/RPE.csv") # to read subjective rating of RPE data from GitHub'
colnames(RPE)[1] <- "percent.time.from.start"
df_RPE_transformed <- melt(RPE,
                           id.vars = "percent.time.from.start",
                           measure.vars = colnames(RPE)[2:16])
df_RPE_transformed[,2] <- paste0(df_RPE_transformed[,2],".RPE")
# to load previously generated data and changepoints 
load(file = "./Data/RGenerated/NormCusumData.RData")
load(file = "./Data/RGenerated/ECPChangePointsMFData.RData")
changepoints_subj_thr <- list()

for (i in 1:15) {
  correct_subject_rows <- which(df_RPE_transformed[,2]==paste0("Subject.",i,".RPE"))
  dataholder <- df_RPE_transformed[correct_subject_rows,]
  
  
  # -------- Calculation of Fatigue based on RPE>= 13 -----------------
  fatigue_from_RPE_threshold <- which(RPE[,i+1]>=RatingThresh) %>% 
                              min() # returns window number
  changepoints_subj_thr[[i]] <- RPE[fatigue_from_RPE_threshold,1]
  df_RPE<-data.frame(d1=RPE[1],t1=RPE[i+1]) %>% na.omit() # to make dataframe of subjective ratings for ggplot
  
  
  cat('####',paste0("P", i), "{-}",'\n')
  df_transformed <- melt(get(paste0("subject",i,"_cusum_norm")),
                       id.vars = "percent.time.from.start",
                       measure.vars=c("scaled.stride.len",
                                      "scaled.stride.height",
                                      "stride.duration"
                                      )
                         )
  df_transformed <- rbind(df_transformed,df_RPE_transformed[correct_subject_rows,])
  
  df_transformed <- data.table(df_transformed)
  df_transformed[variable == "scaled.stride.len", y_min := -40]
  df_transformed[variable == "scaled.stride.len", y_max := 120]
  df_transformed[variable == "scaled.stride.height", y_min := -20]
  df_transformed[variable == "scaled.stride.height", y_max := 20]
  df_transformed[variable == "stride.duration", y_min := -50]
  df_transformed[variable == "stride.duration", y_max := 30]
  df_transformed[variable == paste0("Subject.",i,".RPE"), y_min := 5]
  df_transformed[variable == paste0("Subject.",i,".RPE"), y_max := 20]
  
  assign(paste0("g",i),
        ggplot(data = df_transformed,
               aes(x=percent.time.from.start, y=value, colour = variable)) + 
          geom_line() +
          theme_bw() +
          ggtitle(paste0("Changepoints & RPE overlaid on the CUSUMs of Median Filtered Data for P",i)) +
          scale_colour_manual(values = c("red", "green", "blue","black","grey40")) +
          facet_wrap(~variable,nrow=4, scales = "free_y") +
          geom_blank(aes(y = y_min)) +
          geom_blank(aes(y = y_max)) + 
          geom_vline(xintercept= changepoints_medianf[[i]], 
                     show.legend = T, size=1) + 
          geom_vline(xintercept= changepoints_subj_thr[[i]],
                     linetype="solid", color="steelblue1",
                     show.legend = T, size=1) +
          theme(legend.position="none",
                 plot.title = element_text(hjust = 0.5))
      )
      
      print(get(paste0("g",i)))
  
  cat('\n \n')
  
  cat(paste0('<source> <p> In the above figure, the colored vertical lines correspond to the following.  First, the <b> black  lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. </b> These changepoints are what we obtained in Section 3.1.2. Second, the <font color= #63B8FF> <b>steelblue line  correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. </b> </font> Note that we only indicate the first occurence when the RPE>=13 on the graphs. </font> </b>',
'. </font> </b> \n \n'))
             
  
  cat('\n \n')
  
  }

P1

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

P2

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

P3

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

P4

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

P5

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

P6

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

P7

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

P8

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

P9

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

P10

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

P11

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

P12

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

P13

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

P14

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .

P15

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. .


4 Time Series Clustering

As mentioned in the paper, the time series clustering can be performed on either the scaled sensor data or on the cummulative sums of the scaled sensor data. We show the results for each approach below based on the use of heirarchical clustering, with a distance measue of . Our clustering approach utilizes the dtwclust package. For more details on the package, the reader is referred to the following dtwclust Vignette.

4.1 Multivariate Time Series Clustering based on the Scaled Sensor Data

load(file = "./Data/RGenerated/NormMedianFilteredData.RData") 

multvar_subjects_scaled <- vector("list", 15) 
for (i in 1:15) {
  df <- get(paste0("subject",i,"_medianf_norm"))
  multvar_subjects_scaled[[i]] <- data.matrix(df[2:4], rownames.force = NA)
}

  names(multvar_subjects_scaled) <- c(paste0("P", seq(1,15,1)))

  multvar_subjects_scaled[[3]] <- {} # Deleting Participant 3

 mvc <- tsclust(multvar_subjects_scaled, type = "hierarchical", k = 2L, distance = "dtw", seed = 390)


cat(paste0('<source> <p> To understand how the underlying multivariate time series methods are similar, we applied a heirarchical clustering algorithm based on the distances obtained using the <i> Dynamic Time Warping Method </i>. The resulting dendogram is shown in the figure below. </p> </source>'))

To understand how the underlying multivariate time series methods are similar, we applied a heirarchical clustering algorithm based on the distances obtained using the Dynamic Time Warping Method . The resulting dendogram is shown in the figure below.

# distances.matrix <- as.matrix(distances)
# heatmap(distances.matrix)


plot(mvc, hang=0.1, check = TRUE, 
     axes = TRUE, ann = TRUE, ylab="Height",
     main = "Cluster Dendogram based on the DTWARP distances")


cat(paste0('Based on the above dendogram, one can see that the best choice for total number of clusters is equal to: 3 or 4. This number was then used to cut of the dendogram tree using the  cutree function in R. We show the resulting cluster membership in the table below. The associated cluster assignments for both thresholds are as follows: \n', 'When using k=4, the assignments are as follows: \n'))

Based on the above dendogram, one can see that the best choice for total number of clusters is equal to: 3 or 4. This number was then used to cut of the dendogram tree using the cutree function in R. We show the resulting cluster membership in the table below. The associated cluster assignments for both thresholds are as follows: When using k=4, the assignments are as follows:

clusterCut3 <- cutree(mvc, 3)
clusterCut4 <- cutree(mvc, 4)

save(clusterCut3, clusterCut4, mvc,
     file = "./Data/RGenerated/MFClusters.RData")

# Printing the table using kable 
kable(t(clusterCut4), row.names = NA, caption = "Cluster Assignment for each Participant") %>% column_spec(1, bold = T, width = "5em") %>% kable_styling(bootstrap_options = "striped", full_width = F)
Cluster Assignment for each Participant
P1 P2 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
1 2 3 4 4 4 2 4 1 1 2 4 3 1
# a R file used to create a plot based on clusterCut4
# See the file on GitHub Repo for more information
# Not directly included in the RMD since it is long
source('Secondary Analysis/clusters.R')

The results from this analysis are saved at The subjective ratings for each participant can be accessed at: MFClusters.RData.

4.2 Multivariate Time Series Clustering based on the CUSUM of the Scaled Sensor Data

load(file = "./Data/RGenerated/NormCusumData.RData") 

multvar_subjects_scaled <- vector("list", 15) 
for (i in 1:15) {
  df <- get(paste0("subject",i,"_cusum_norm"))
  multvar_subjects_scaled[[i]] <- data.matrix(df[2:4], rownames.force = NA)
}

  names(multvar_subjects_scaled) <- c(paste0("P", seq(1,15,1)))

  multvar_subjects_scaled[[3]] <- {} # Deleting Participant 3


mvc <- tsclust(multvar_subjects_scaled, type = "hierarchical", distance = "dtw", seed = 390)


cat(paste0('<source> <p> To understand how the underlying multivariate time series methods are similar, we applied a heirarchical clustering algorithm based on the distances obtained using the <i> Dynamic Time Warping Method </i>. The resulting dendogram is shown in the figure below. </p> </source>'))

To understand how the underlying multivariate time series methods are similar, we applied a heirarchical clustering algorithm based on the distances obtained using the Dynamic Time Warping Method . The resulting dendogram is shown in the figure below.

# distances.matrix <- as.matrix(distances)
# heatmap(distances.matrix)


plot(mvc, hang=0.1, check = TRUE, 
     axes = TRUE, ann = TRUE, ylab="Height",
     main = "Cluster Dendogram based on the DTWARP distances")


cat(paste0('Based on the above dendogram, one can see that the best choice for total number of clusters is equal to: 3 or 4. This number was then used to cut of the dendogram tree using the  cutree function in R. We show the resulting cluster membership in the table below. The associated cluster assignments for both thresholds are as follows: \n', 'When using k=4, the assignments are as follows: \n'))

Based on the above dendogram, one can see that the best choice for total number of clusters is equal to: 3 or 4. This number was then used to cut of the dendogram tree using the cutree function in R. We show the resulting cluster membership in the table below. The associated cluster assignments for both thresholds are as follows: When using k=4, the assignments are as follows:

clusterCut3 <- cutree(mvc, 3)
clusterCut4 <- cutree(mvc, 4)

save(clusterCut3, clusterCut4, mvc,
     file = "./Data/RGenerated/CusumClusters.RData")

kable(t(clusterCut4), row.names = NA, caption = "Cluster Assignment for each Participant") %>% column_spec(1, bold = T, width = "5em") %>% kable_styling(bootstrap_options = "striped", full_width = F)
Cluster Assignment for each Participant
P1 P2 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
1 2 2 1 1 3 3 3 1 1 3 3 4 3
# a R file used to create a plot based on clusterCut4
# See the file on GitHub Repo for more information
# Not directly included in the RMD since it is long
source('Secondary Analysis/clustersCUSUMs.R')

The results from this analysis are saved at The subjective ratings for each participant can be accessed at: CusumClusters.RData.