Title: | Time Dependent Shared Frailty Cox Model |
---|---|
Description: | Fits time-dependent shared frailty Cox model (specifically the adapted Paik et al.'s Model) based on the paper "Centre-Effect on Survival After Bone Marrow Transplantation: Application of Time-Dependent Frailty Models", by C.M. Wintrebert, H. Putter, A.H. Zwinderman and J.C. van Houwelingen (2004) <doi:10.1002/bimj.200310051>. |
Authors: | Alessandra Ragni [aut, cre] , Giulia Romani [aut], Chiara Masci [ctb] |
Maintainer: | Alessandra Ragni <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.0.0.9 |
Built: | 2024-11-22 08:27:51 UTC |
Source: | https://github.com/alessandragni/timedepfrail |
Function for studying the log-likelihood function from the point of view of a single parameter and, therefore, in a single direction. It performs both the optimization of the log-likelihood with respect to this parameter and the evaluation of the log-likelihood in several samples of the same parameter, while the other parameters can assume a constant assigned value or can vary in their range.
AdPaik_1D( formula, data, time_axis, index_param_to_vary, flag_optimal_params = FALSE, optimal_params = NULL, categories_range_min, categories_range_max, n_iter = 5, tol_optimize = 1e-06, flag_plot = FALSE, n_points = 150, cex = 0.7, cex_max = 0.8, color_bg = "black", color_max_bg = "red", pch = 21 )
AdPaik_1D( formula, data, time_axis, index_param_to_vary, flag_optimal_params = FALSE, optimal_params = NULL, categories_range_min, categories_range_max, n_iter = 5, tol_optimize = 1e-06, flag_plot = FALSE, n_points = 150, cex = 0.7, cex_max = 0.8, color_bg = "black", color_max_bg = "red", pch = 21 )
formula |
Formula object indicating the response variable, the covariates and the cluster variable. |
data |
Dataset in which the variables of the formula object are located. |
time_axis |
Partitioned time-domain. |
index_param_to_vary |
Index of the parameter, in the parameter vector, with respect to which the log-likelihood function is maximized in a one-dimensional way. The index s provided to identify the parameter under consideration inside the vector, avoiding providing its name or value. |
flag_optimal_params |
Are the other parameters extracted from the optimal vector of parameters? If so, the flag should be equal to TRUE Otherwise, the flag is equal to FALSE. |
optimal_params |
Vector of optimal parameters, determined through an entire multi-dimensional maximization of the log-likelihood function. The default value (NULL) indicates that no vector is provided and the parameters are randomly extracted in their range. |
categories_range_min |
Vector containing the minimum value assumed by each parameter category. |
categories_range_max |
Vector containing the maximum value assumed by each parameter category. |
n_iter |
Number of times the one-dimensional analysis with respect to the indicated parameter must be executed. Default value is 5. See details for more information. |
tol_optimize |
Tolerance used in the optimize R function for the one-dimensional optimization of the log-likelihood function. |
flag_plot |
Logical value for plotting the trend of the log-likelihood function with respect to the parameter under consideration. A plot for each iteration (n_iter) is reported. Defaults to FALSE. Be careful that if the optimal parameters are provided, then the trend may be always the same and therefore it may be sufficient to set n_iter = 1. On the other hand, if optimal parameters are not provided, then it is recommended to impose a higher n_iter. |
n_points |
Number of internal points in which the log-likelihood function must be evaluated, to plot it. |
cex |
Dimension of the points in the plot. |
cex_max |
Dimension of the optimal point in the plot. |
color_bg |
Color used in the plot for the points. |
color_max_bg |
Color used for the optimal point in the plot. |
pch |
Shape to be used for the points. |
The one-dimensional analysis of the log-likelihood function can be performed in two ways, with two different aims and results:
Keeping fixed the other parameters (all the parameters in the vector, except for the one under consideration) to their optimal value (flag_optimal_params = TRUE), determined through the multi-dimensional optimization. In this way, the optimized value of the parameter coincides with the one get with the general and global approach and, therefore, there is no need to repeat this procedure several times (n_iter = 1). However, this approach is really useful if we want to check the trend the log-likelihood function and to observe if it increases, decreases or is constant.
Letting the other parameters vary in their range (flag_optimal_params = FALSE). The optimized parameter value will always assume a different
value, because it depends on the value of the other parameters, and it is suggested to repeat the procedure several times (n_iter 5),
so that it is possible to identify a precise existence region for such parameter
If the flag for the plot has been activated, the function returns both the plot of the one-dimensional log-likelihood function and a class S3 object. Otherwise, only a S3 object of class 'AdPaik_1D'. This class object is composed of:
numerical vector of length @n_iter containing the optimal estimated parameter.
numerical vector of length @n_iter containing the associated one-dimensional optimized log-likelihood value
# Consider the 'Academic Dropout dataset' data(data_dropout) # Define the variables needed for the model execution formula <- time_to_event ~ Gender + CFUP + cluster(group) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) eps <- 1e-10 categories_range_min <- c(-8, -2, eps, eps, eps) categories_range_max <- c(-eps, 0, 1 - eps, 1, 10) # Choose the parameter with respect to which you want to study the \ # log-likelihood function and provide its position in the parameter vector \ # for identifying a parameter existence range index_param_to_vary <- 1 # Call the main model without providing optimal parameter result <- AdPaik_1D(formula, data_dropout, time_axis, index_param_to_vary, FALSE, NULL, categories_range_min, categories_range_max, n_iter = 5) # or for studying the log-likelihood behaviour.
# Consider the 'Academic Dropout dataset' data(data_dropout) # Define the variables needed for the model execution formula <- time_to_event ~ Gender + CFUP + cluster(group) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) eps <- 1e-10 categories_range_min <- c(-8, -2, eps, eps, eps) categories_range_max <- c(-eps, 0, 1 - eps, 1, 10) # Choose the parameter with respect to which you want to study the \ # log-likelihood function and provide its position in the parameter vector \ # for identifying a parameter existence range index_param_to_vary <- 1 # Call the main model without providing optimal parameter result <- AdPaik_1D(formula, data_dropout, time_axis, index_param_to_vary, FALSE, NULL, categories_range_min, categories_range_max, n_iter = 5) # or for studying the log-likelihood behaviour.
Function for applying the 'Adapted Paik et al.'s Model', an innovative Cox Model with time-dependent frailty, shared by individuals belonging to the same group/cluster.
To generate time-dependence, the temporal domain must be divided into a certain number of intervals. For more information about the time-domain, its relationship with the follow-up and its internal subdivision refer to Details.
The model log-likelihood function depends on a certain number of parameters and it is maximized with respect to all of them, using a reinterpretation of the 'Powell's method in multidimension', that is a multi-dimensional optimization method based on repeated one-dimensional optimization of the log-likelihood function (with respect to one parameter at the time). In this context, the one-dimensional optimization is performed through the 'optimize' R function. For more information about the unknown model parameters, their type and numerosity refer to Details.
Several quantities are estimated at the end of the optimization phase:
parameters, their standard error and confidence interval;
frailty standard deviation (and frailty variance);
posterior frailty estimates, their variance and confidence interval;
Akaike Information Criterion (AIC).
AdPaikModel( formula, data, time_axis, categories_range_min, categories_range_max, flag_fullsd = TRUE, n_extrarun = 60, tol_ll = 1e-06, tol_optimize = 1e-06, h_dd = 0.001, print_previous_ll_values = c(TRUE, 3), verbose = FALSE )
AdPaikModel( formula, data, time_axis, categories_range_min, categories_range_max, flag_fullsd = TRUE, n_extrarun = 60, tol_ll = 1e-06, tol_optimize = 1e-06, h_dd = 0.001, print_previous_ll_values = c(TRUE, 3), verbose = FALSE )
formula |
Formula object having on the left hand side the @time_to_event variable, that is the time-instant in which the individual failed. On the right hand side, it has the regressors and the cluster variable. |
data |
Dataset in which all variables of the formula object must be found and contained. This dataset can also contain other variables, but they will not be considered. It can be either a dataframe or a matrix, but in both cases the name of each column must correspond to the name of the formula variables. It is not necessary to attach it (in case of a data.frame) |
time_axis |
Temporal domain |
categories_range_min |
Vector containing the minimum value assumable by each parameter category. |
categories_range_max |
Vector containing the maximum value assumable by each parameter category. |
flag_fullsd |
If TRUE, the full frailty standard deviation is computed, otherwise the partial one that keeps into account only the time-dependent component. |
n_extrarun |
Total number of runs (iterations) are obtained summing to the number of parameters and n_extrarun. |
tol_ll |
Tolerance on the log-likelihood value. |
tol_optimize |
Internal tolerance for the one-dimensional optimization through 'optimize' R function. |
h_dd |
Discretization step used for the numerical approximation of the second derivative of the log-likelihood function. |
print_previous_ll_values |
If we want to print the previous values of the log-likelihood function. This can be useful for controlling that the optimization procedure is proceeding in a monotone way and it does not oscillate. This argument is composed of two elements: TRUE/FALSE if we want or not to print the previous values and how many values we want to print on the console. Default is (TRUE, 3), so that only the previous 3 values of the log-likelihood are printed. |
verbose |
Logical. If |
Two observation needs to made about the time-domain:
The time domain may coincide with the follow-up or it may be contained in it. Indeed, the left boundary can be greater than the beginning of the follow-up and, for instance, it can coincide with the time-instants in which the events begin to happen; conversely, the right boundary of the two must be the same.
The partition of the time domain into intervals can be made according to two selected criteria: (1) using an already existent partition of the follow-up (2) using the shape of the baseline hazard function as reference: divide the time-domain according to regions in which it has a peak or a plateau.
The parameters with respect to which the log-likelihood function must be optimized are:
baseline log-hazard (number of parameters = number of intervals of the time-domain)
data regressors
,
: parameters of the gamma distribution of
(time-independent/constant) (2 parameters)
: parameters of the gamma distribution of
(time-dependent) (number of parameters = number of intervals)
Another model parameter is
and it is get imposing the constraint that
.
As it can be notice, some parameters can be grouped into the same category (regressors, baseline log-hazard and so on)
and we can easily constraint them assigning each category both a minimum and maximum range.
The category vector is structured as follows: (baseline log-hazard, regressors,
,
,
) with dimension
(n_intervals, n_regressors, 1, 1, n_intervals).
The output of the model call 'AdPaikModel(...)' is a S3 object of class 'AdPaik', composed of the following quantities:
formula: formula object provided in input by the user and specifying the relationship between the time-to-event, the covariates of the dataset (regressors) and the cluster variable.
Regressors: categorical vector of length R, with the name of the regressors. They could be different from the original covariates of the dataset in case of categorical covariates. Indeed, each categorical covariate with n levels needs to be transformed into (n-1) dummy variables and, therefore, (n-1) new regressors.
NRegressors: number of regressors (R)
ClusterVariable: name of the variable with respect to which the individuals can be grouped.
NClusters: number of clusters/centres (also indicated with N).
NIntervals: number of intervals of the time-domain, also called with L. It corresponds to the length of the time-domain minus 1.
NParameters: number of parameters of the model. It can be computed as: .
ParametersCategories: Numerical vector of length 5, containing the numerosity of each parameter category.
ParametersRangeMin: Numerical vector of length , giving the minimum range of each parameter.
ParametersRangeMax: Numerical vector of length , giving the maximum range of each parameter.
Loglikelihood: value of the maximized log-likelihood function, at the optimal estimated parameters.
AIC: 'Akaike Information Criterion': it can be computed as .
It gives an idea of the loss of information related to the model fitting and output.
The smaller it is, the less loss of information and the better model accuracy.
Status: Logical value. Does the model reach convergence? If so, the variable is TRUE, otherwise FALSE.
NRun: Number of runs necessary to reach convergence. If the model does not reach convergence, such number is equal to the maximum number of imposed runs.
OptimalParameters: numerical vector of length , containing the optimal estimated parameters or, in other words, the parameters
that maximizes the log-likelihood function.
StandardErrorParameters: numerical vector of length , corresponding to the standard error of each estimated parameters.
ParametersCI: S3 object of class 'ParametersCI', composed of two numerical vector of length equal to : the left and right confidence
interval of each estimated parameter.
FrailtyStandardDeviation: numerical vector of length equal to L (i.e. number of intervals of the time-domain), reporting the standard deviation of the frailty.
PosteriorFrailtyEstimates: S3 object of class 'PFE.AdPaik'. See details.
PosteriorFrailtyVariance: S3 object of class 'PFV.AdPaik'. See details.
PosteriorFrailtyCI: S3 object of class 'PFCI.AdPaik'. See details.
The object of class 'PFE.AdPaik' contains the Posterior Frailty Estimates computed with the procedure indicated in the reference paper and it is composed of three elements:
'alpha': posterior frailty estimates for . It is a vector of length equal to the number of centres.
'eps': posterior frailty estimates for . Matrix of dimension (N, L).
'Z': posterior frailty estimates for . Matrix of dimension (N, L).
The object of class 'PFV.AdPaik' contains the Posterior Frailty Variances computed as indicated in the reference papaer and it is composed of three elements:
'alphaVar': posterior frailty variance for . It is a vector of length equal to the number of centres.
'epsVar': posterior frailty variance for . Matrix of dimension (N, L).
'ZVar': posterior frailty variance for . Matrix of dimension (N, L).
The object of class 'PFCI.AdPaik' contains the Posterior Frailty Confidence Interval and it is composed of two elements:
left confidence interval for the estimated . Matrix of dimension (N, L).
right confidence interval for the estimated . Matrix of dimension (N, L).
S3 object of class 'AdPaik', composed of several elements. See Details.
...
# Consider the 'Academic Dropout dataset' data(data_dropout) # Define the variables needed for the model execution formula <- time_to_event ~ Gender + CFUP + cluster(group) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) eps <- 1e-10 categories_range_min <- c(-8, -2, eps, eps, eps) categories_range_max <- c(-eps, 0, 1 - eps, 1, 10) # Call the main model result <- AdPaikModel(formula, data_dropout, time_axis, categories_range_min, categories_range_max, TRUE)
# Consider the 'Academic Dropout dataset' data(data_dropout) # Define the variables needed for the model execution formula <- time_to_event ~ Gender + CFUP + cluster(group) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) eps <- 1e-10 categories_range_min <- c(-8, -2, eps, eps, eps) categories_range_max <- c(-eps, 0, 1 - eps, 1, 10) # Call the main model result <- AdPaikModel(formula, data_dropout, time_axis, categories_range_min, categories_range_max, TRUE)
The method computes the baseline hazard step-function in each interval of the time-domain, using the estimated parameters
bas_hazard(optimal_params, time_axis)
bas_hazard(optimal_params, time_axis)
optimal_params |
Numerical vector of length equal to the number of model parameters, containing the optimal estimated parameters. |
time_axis |
Numerical vector of temporal domain. |
Numerical vector of length equal to the number of intervals of the time-domain, with the value of the baseline hazard step-function.
The method controls the multiplicative constant C is non-negative (i.e. positive).
check.C_mult(C_mult)
check.C_mult(C_mult)
C_mult |
Multiplicative constant |
An error if the condition is not satisfied.
The function controls that the provided parameters categories have a length equal to the number of categories required by the model
parameters. For the current model, the number of categories is 5 because there are five blocks of unkown parameters ().
Moreover, it also controls that the minimum value of a parameter category is actually less than or eqaul to the maximum value for the same category.
check.categories_params( n_categories, categories_range_min, categories_range_max )
check.categories_params( n_categories, categories_range_min, categories_range_max )
n_categories |
Number of categories expected by the model. For the current model, they are 5. |
categories_range_min |
Numerical vector of length 5, containing the minimum ranges for the parameters beloning to those categories. |
categories_range_max |
Numerical vector of length equal to 5, containing the maximum ranges for the parameters belonging to those categories. |
An error if the any condition is not satisfied.
The function controls that the provided cluster variable is a vector, with at least two levels. Indeed, it is not possible to apply the Time-Dependent Shared Frailty Cox Model with no clusters.
check.centre(centre)
check.centre(centre)
centre |
Numerical vector of length equal to the number of individuals in the study, containing the individual grouo/cluster membership. |
An error if any condition is not satisfied
The method controls that the dataset does not contain 'NULL', 'null', 'NaN' or 'nan' value.
check.dataset(data)
check.dataset(data)
data |
Dataset (dataframe) |
An error if any condition is not satisfied.
The function controls that one of the following condition is satisfied:
if the flag for the optimal parameters is activated, then the optimal parameters should be provided in input
if the flag is not activated, then the optimal parameters should not be provided and the parameter vector should be NU
check.flag_optimal_params(optimal_params, flag_optimal_params)
check.flag_optimal_params(optimal_params, flag_optimal_params)
optimal_params |
Either a numerical vector of length equal to the number of model parameters or a NULL value. |
flag_optimal_params |
Logical value. Did the user want to provide optimal parameters vector? If so, the variable should be TRUE; otherwise (no optimal parameters), FALSE. |
An error if any condition is not satisfied.
The function controls that the terms composing the formula object provided in input to the main model are correct. They must include:
response variable on the left hand side
covariates (numerical or categorical) on the right hand side
cluster variable (categorical) on the right hand side and specified by the function 'cluster()'
Moreover, it controls that the covariates are contained in the dataset provided.
check.formula_terms(formula, data)
check.formula_terms(formula, data)
formula |
Formula object specifying the relationship between the time-to-event, the covariates and the cluster variables. |
data |
Dataset in which these variables can be found. |
An error if any condition is not satified.
The function controls that the frailty standard deviation vector has a length equal to the number of inyervals of the time domain and that its elements are non-negative.
check.frailty_dispersion(frailty_dispersion, n_intervals)
check.frailty_dispersion(frailty_dispersion, n_intervals)
frailty_dispersion |
Frailty dispersion |
n_intervals |
Number of intervals of the time-domain |
An error if any condition is not satisfied.
The method controls that the provided input index exists: it cannot be greater than the maximum number of parameters of the current model.
check.index(index, n_params)
check.index(index, n_params)
index |
Index with respect to which the user wants to study the one dimensional behaviour of the log-likelihood function. |
n_params |
Number of parameters of the model |
An error if any condition is not satisfied.
The function controls that the input variables 'pch_type' and 'color_bg' have the correct structure, they have the same dimension of the number of clusters in the dataset and they have meaningful elements.
These variables are used for the plot of the posterio frailty estimates: the estimates for each faculty are plotted through a symbol, having color and shape indicated by the variables (for the k-th faculty, consider the k-th element of both vectors).
check.pchtype_colorbg(centre_codes, pch_type, color_bg)
check.pchtype_colorbg(centre_codes, pch_type, color_bg)
centre_codes |
Numerical vector of length equal to the number of centres/clusters in the dataset and containing the distinct centres/clusters. They correspond to the levels of the numerical vector of individual group membership. |
pch_type |
Numerical vector of length equal to the number of centres and containing the point shape for each faculty. |
color_bg |
Numerical vector of length equal to the number of centres and containing the color of the point for each faculty. |
An error if any condition is not satisfied.
The method controls that the frailty standard deviation vector has non-negative elements
check.pos_frailty_sd(sd, n_intervals)
check.pos_frailty_sd(sd, n_intervals)
sd |
Numerical vector of length equal to the number of intervals, containing the frailty standard deviation |
n_intervals |
Number of intervals of the time-domain |
An error if any condition is not satisfied.
The function controls that the provided position of the legend is correct. It can be either a vector of length 2, giving the x and y coordinates, or a string, giving the exact position among different possibilities.
check.poslegend(pos_legend)
check.poslegend(pos_legend)
pos_legend |
Either a numerical vector of length 2, with the x and y coordinates, or a string with the exact position. |
An error if any condition is not satisfied.
The function controls that a time-dependent posterior frailty estimate is computed for each centre
check.post_frailty_centre(post_frailty_est, centre_codes)
check.post_frailty_centre(post_frailty_est, centre_codes)
post_frailty_est |
An S3 class object containing the posterior frailty estimates |
centre_codes |
Numerical vector of length equal to the number of distinct centres/clusters in the study |
An error if any condition is not satisfied.
The function controls that the input parameter vector have a length equal to the theoretical one required by the model and that each parameter properly belongs to its range.
check.range_params(optimal_params, params_range_min, params_range_max)
check.range_params(optimal_params, params_range_min, params_range_max)
optimal_params |
Numerical vector of length equal to the number of model parameters. For the 'Adapted Paik et al.'s Model' it can be computed
as: |
params_range_min |
Numerical vector of length equal to the number of model parameters ( |
params_range_max |
Numerical vector of length equal to the number of model parameters ( |
An error if any condition is not satisfied.
The function controls that the structure of the input variable is coherent with the one returned by the 'AdPaikModel' execution.
check.result.AdPaik(result)
check.result.AdPaik(result)
result |
S3 object of class 'AdPaik', composed of several elements. See details. |
The output of the model call 'AdPaikModel(...)' is a S3 object of class 'AdPaik', composed of 18 quantities:
formula: formula object provided in input by the user and specifying the relationship between the time-to-event, the covariates of the dataset (regressors) and the cluster variable.
Regressors: categorical vector of length R, with the name of the regressors. They could be different from the original covariates of the dataset in case of categorical covariates. Indeed, each categorical covariate with n levels needs to be transformed into (n-1) dummy variables and, therefore, (n-1) regressors.
NRegressors: number of regressors (R)
ClusterVariable: name of the variable with respect to which the individuals can be grouped.
NClusters: number of clusters/groups/centres
NIntervals: number of intervals of the time-domain, also called with L. It corresponds to the length of the time-domain minus 1.
NParameters: number of parameters of the model. It can be computed as: .
ParametersCategories: Numerical vector of length 5, containing the numerosity of each parameter category.
ParametersRangeMin: Numerical vector of length , giving the minimum range of each parameter.
ParametersRangeMax: Numerical vector of length , giving the maximum range of each parameter.
Loglikelihood: value of the maximized log-likelihood function, at the optimal estimated parameters.
AIC: 'Akaike Information Criterion': it can be computed as .
It gives an idea of the loss of information related to the model fitting and output.
The smaller it is, the less loss of information and the better model accuracy.
Status: Logical value. Does the model reach convergence? If so, the variable is TRUE, otherwise FALSE.
NRun: Number of runs necessary to reach convergence. If the model does not reach convergence, such number is equal to the maximum number of imposed runs.
OptimalParameters: numerical vector of length , containing the optimal estimated parameters or, in other words, the parameters
that maximizes the log-likelihood function.
StandardErrorParameters: numerical vector of length , corresponding to the standard error of each estimated parameters.
ParametersCI: S3 object of class 'ParametersCI', composed of two numerical vector of length equal to : the left and right confidence
interval of each estimated parameter.
FrailtyStandardDeviation: numerical vector of length equal to L (i.e. number of intervals of the time-domain), reporting the standard deviation of the frailty.
PosteriorFrailtyEstimates: S3 object of class 'PFE.AdPaik'. See details.
PosteriorFrailtyVariance: S3 object of class 'PFV.AdPaik'. See details.
PosteriorFrailtyCI: S3 object of class 'PFCI.AdPaik'. See details.
The object of class 'PFE.AdPaik' contains the Posterior Frailty Estimates computed with the procedure indicated in the reference paper and it is composed of three elements:
'alpha': posterior frailty estimates for . It is a vector of length equal to the number of groups/centres.
'eps': posterior frailty estimates for . Matrix of dimension (N, L).
'Z': posterior frailty estimates for . Matrix of dimension (N, L).
The object of class 'PFV.AdPaik' contains the Posterior Frailty Variances computed as indicated in the reference papaer and it is composed of three elements:
'alphaVar': posterior frailty variance for . It is a vector of length equal to the number of groups/centres.
'epsVar': posterior frailty variance for . Matrix of dimension (N, L).
'ZVar': posterior frailty variance for . Matrix of dimension (N, L).
The object of class 'PFCI.AdPaik' contains the Posterior Frailty Confidence Interval and it is composed of two elements:
left confidence interval for the estimated . Matrix of dimension (N, L).
right confidence interval for the estimated . Matrix of dimension (N, L).
An error if any condition is not satisfied.
The function controls that the structure of the Parameters Confidence Intervals coincides with the theoretical one.
check.structure_paramsCI(parametersCI)
check.structure_paramsCI(parametersCI)
parametersCI |
S3 object of class 'ParametersCI', composed of two elements:
|
An error if any condition is not satisfied.
The function controls that the structure of the 'Posterior Frailty Confidence Interval' coincides with the theoretical one.
check.structure_post_frailty_CI(post_frailty_CI, n_intervals, n_centres)
check.structure_post_frailty_CI(post_frailty_CI, n_intervals, n_centres)
post_frailty_CI |
Posterior frailty estimates S3 object of class 'PFCI.AdPaik', composed of two elements:
|
n_intervals |
Number of intervals of the time-domain |
n_centres |
Number of centres/clusters. |
An error if any condition is not satisfied.
The function controls that the structure of the 'Posterior Frailty Estimates' coincides with the theoretical one.
check.structure_post_frailty_est(post_frailty_est, n_intervals, n_centres)
check.structure_post_frailty_est(post_frailty_est, n_intervals, n_centres)
post_frailty_est |
Posterior frailty estimates S3 object of class 'PFE.AdPaik', composed of three elements:
|
n_intervals |
Number of intervals of the time-domain |
n_centres |
Number of centres/clusters. |
An error if any condition is not satisfied.
The function controls that the structure of the 'Posterior Frailty Variances' coincides with the theoretical one.
check.structure_post_frailty_var(post_frailty_var, n_intervals, n_centres)
check.structure_post_frailty_var(post_frailty_var, n_intervals, n_centres)
post_frailty_var |
Posterior frailty variances S3 object of class 'PFV.AdPaik', composed of three elements:
|
n_intervals |
Number of intervals of the time-domain |
n_centres |
Number of centres/clusters. |
An error if any condition is not satisfied.
The function controls that the time domain is a vector and it has at least 2 elements and that all of them are not negative. Moreover, it checks that all its elements are non-negative and properly ordered, in an ascending way.
check.time_axis(time_axis)
check.time_axis(time_axis)
time_axis |
Numerical vector of temporal domain subdivision. |
An error is returned if any condition is not satisfied.
The function controls that all posterior frailty estimates are non-negative. Indeed, by construction the realizations of a gamma distribution are non negative.
check.value_post_frailty(post_frailty_est, n_centres, n_intervals)
check.value_post_frailty(post_frailty_est, n_centres, n_intervals)
post_frailty_est |
An S3 class object containing the posterior frailty estimates: |
n_centres |
Number of groups/clusters. |
n_intervals |
Number of intervals of the time domain |
An error if any condition is not satisfied.
This dataset is extracted from an administrative database provided by a non-specified university and tracks students enrolled in 2012 over three academic years (or 6 semesters). We are interested in understanding what factors lead to students dropping out. Dropout students with a time-instant in the first semester have been removed, for internal reasons (the university cannot take preventive action to reduce or avoid their withdrawal). The students are followed for at most 3 academic years or, equivalently, 6 semesters (follow-up periods), from the first day of lecture up to the time-instant of withdrawal (i.e. survival event) or the end of the academic year.
data_dropout
data_dropout
A data frame with 4448 rows and 4 columns:
Categorical covariate (Male or Female).
Standardized numerical covariate indicating the number of credits \ passed by the students by the end of the first semester.
Time (in semesters) at which a student decides to leave the university. \ A value greater than 6.0 indicates the student did not drop out during the follow-up (e.g. 6.1 semesters)
Categorical variable indicating the student's course of study, with 16 different levels from CosA, CosB, ... , CosP.
Data for demonstration purposes.
This function produces for a categorical variable of the dataset (covariate) the associated dummy variables: for n levels of the covariate, (n-1) dummy binary variables are generated. The chosen reference value is the first one of the list of extracted levels and cannot be changed (the first one in alphabetical order). Therefore, if an individual has null value for all dummy variables, then his/her belonging level is the reference one.
Each dummy variable has a name, corresponding to the name of the covariate + name of the level.
extract_dummy_variables(covariate, covariate_name)
extract_dummy_variables(covariate, covariate_name)
covariate |
Categorical dataset covariate, with at least 2 levels. |
covariate_name |
Name of the covariate, for assigning each dummy variable a proper name. |
The S3 class object 'DummyData' contains the variables related to the transformation of a single categorical covariate present in the dataset into (n-1) binary covariates, stored in a matrix. To be precise:
DummyMatrix: binary matrix of dimension (n_individuals, n-1), where each column corresponds to one level of the original categorical covariate.
DummyVariablesName: categorical vector of length (n-1), reporting the names of the dummy variables and, therefore, the new name of each regressor.
DummyVariablesNumber: number of dummy variables (n-1).
S3 object of class 'DummyData', composed of three elements. See details.
Function for extracting from the dataset quantities necessary to the evaluation of the posterior frailty estimates.
extract_event_data(dataset, time_to_event, centre, time_axis, phi, betar)
extract_event_data(dataset, time_to_event, centre, time_axis, phi, betar)
dataset |
Dataset containing the covariates/regressors. Their numerosity is indicated with R. |
time_to_event |
Time-instant in the follow-up in which an individual fails or faces the event. If an individual does not face the event, the time-instant assumes a default value. |
centre |
Categorical vector indicating the group/cluster membership. The number of distinct group is indicated with N. |
time_axis |
Numerical vector of the temporal domain. Its length is (L+1), where L indicates the number of intervals of the time-domain. |
phi |
Numerical vector of length L, of estimated baseline log-hazard. |
betar |
Numerical vector of length R, of estimated regressors. |
The S3 class obejct 'EventData' contains the variables necessary for the estimate of the posterior frailty and that can be extracted or computed starting from the dataset.
N_ik: matrix of dimension (N, L), containing the number of event in each interval k and group i.
N_i: numerical vector of length L, with the number of event in each group i. It can be computed as: .
e_ijk: matrix of dimension (n_individuals, L) with the evaluation of the temporal integral, for each individual j, group i and interval k.
Y_risk: binary matrix of dimension (n_individuals, L) reporting for each individual, in each interval, his/her risk of facing the event. For an individual, the risk is equal to 1 in an interval k if, in that interval, he/she has not faced the event yet; otherwise, it is equal to 0.
cum_hazard_group: matrix of dimension (N, L), where each element in position (i,k) indicates the computed cumulative hazard for all individuals belonging to group i and at interval k.
sum_cum_hazard_group: numerical vector of length N, giving the sum of the computed cumulative hazard for all intervals k and for all individuals belonging to group i. It can be computed from the previous element, summing with respect to the interval k.
S3 object of class 'EventData', composed of six elements. See details.
The function computes both the standard deviation and the variance of the time-dependent frailty of the 'Adapted Paik et al.'s Model'.
Recalling the frailty structure as being composed by a constant group-dependent term
(
) and a time and group dependent term (
), the frailty standard deviation (and variance)
can be computed in two different way:
Considering only the time-dependent spread of the clusters/groups/centre: .
In this case, the flag_fullsd should be FALSE.
Considering both the time-dependent and constant spread of the clusters: .
The new added term only moves upward the other case and the flag_fullsd should be TRUE.
The final case only depends on what we want to observe.
frailty_sd(result, flag_fullsd = TRUE)
frailty_sd(result, flag_fullsd = TRUE)
result |
S3 object of class 'AdPaik' returned by the main model output, that contains all the information for the computation of the frailty standard deviation. |
flag_fullsd |
Logical value. Do we want to compute the full frailty standard deviation? If so, the flag must be TRUE, otherwise, FALSE. |
S3 class object 'FrailtyDispersion' containing both two numerical vectors of length equal to the numbero of intervals of the time-domain:
FrailtyVariance
FrailtyStandardDevation
# Consider the 'Academic Dropout dataset' data(data_dropout) # Define the variables needed for the model execution formula <- time_to_event ~ Gender + CFUP + cluster(group) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) eps <- 1e-10 categories_range_min <- c(-8, -2, eps, eps, eps) categories_range_max <- c(-eps, 0, 1 - eps, 1, 10) # Call the main model result <- AdPaikModel(formula, data_dropout, time_axis, categories_range_min, categories_range_max, TRUE) frailty_sd(result, TRUE) frailty_sd(result, FALSE)
# Consider the 'Academic Dropout dataset' data(data_dropout) # Define the variables needed for the model execution formula <- time_to_event ~ Gender + CFUP + cluster(group) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) eps <- 1e-10 categories_range_min <- c(-8, -2, eps, eps, eps) categories_range_max <- c(-eps, 0, 1 - eps, 1, 10) # Call the main model result <- AdPaikModel(formula, data_dropout, time_axis, categories_range_min, categories_range_max, TRUE) frailty_sd(result, TRUE) frailty_sd(result, FALSE)
The function computes both the standard deviation and the variance of the time-dependent frailty of the 'Adapted Paik et al.'s Model'.
Recalling the frailty structure as being composed by a constant group-dependent term
(
) and a time and group dependent term (
), the frailty standard deviation (and variance)
can be computed in two different way:
Considering only the time-dependent spread of the clusters/groups/centre: .
In this case, the flag_fullsd should be FALSE.
Considering both the time-dependent and constant spread of the clusters: .
The new added term only moves upward the other case and the flag_fullsd should be TRUE.
The final case only depends on what we want to observe.
frailty_sd.AdPaik(result, flag_fullsd = TRUE)
frailty_sd.AdPaik(result, flag_fullsd = TRUE)
result |
S3 object of class 'AdPaik' returned by the main model output, that contains all the information for the computation of the frailty standard deviation. |
flag_fullsd |
Logical value. Do we want to compute the full frailty standard deviation? If so, the flag must be TRUE, otherwise, FALSE. |
S3 class object 'FrailtyDispersion' containing both two numerical vectors of length equal to the numbero of intervals of the time-domain:
FrailtyVariance
FrailtyStandardDevation
# Consider the 'Academic Dropout dataset' data(data_dropout) # Define the variables needed for the model execution formula <- time_to_event ~ Gender + CFUP + cluster(group) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) eps <- 1e-10 categories_range_min <- c(-8, -2, eps, eps, eps) categories_range_max <- c(-eps, 0, 1 - eps, 1, 10) # Call the main model result <- AdPaikModel(formula, data_dropout, time_axis, categories_range_min, categories_range_max, TRUE) frailty_sd(result, TRUE) frailty_sd(result, FALSE)
# Consider the 'Academic Dropout dataset' data(data_dropout) # Define the variables needed for the model execution formula <- time_to_event ~ Gender + CFUP + cluster(group) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) eps <- 1e-10 categories_range_min <- c(-8, -2, eps, eps, eps) categories_range_max <- c(-eps, 0, 1 - eps, 1, 10) # Call the main model result <- AdPaikModel(formula, data_dropout, time_axis, categories_range_min, categories_range_max, TRUE) frailty_sd(result, TRUE) frailty_sd(result, FALSE)
Model log-likelihood function to be optimized only with respect to a parameter. To correctly identify this parameter inside the model and inside the vector of all parameter, it is necessary to provide also the position (index) of this parameter in the vector.
This function is internally used by the main function @AdPaikModel to perform, as said, the one-dimensional optimization through 'optimize'. It cannot be used to evaluate the log-likelihood function at a vector of parameter and at the provided data. For this purpose, we have to use another implemented function, called @ll_AdPaik_eval.
ll_AdPaik_1D( x, index, params, dataset, centre, time_axis, dropout_matrix, e_matrix )
ll_AdPaik_1D( x, index, params, dataset, centre, time_axis, dropout_matrix, e_matrix )
x |
Value of the parameter, with respect to which the log-likelihood function has to be optimized. |
index |
Index of the parameter inside the parameter vector. For instance, if we need to optimize the log-likelihood function with respect to the first regressor, then @x will be generic but @index will be equal to (n_intervals + 1) because in the parameter vector the first regressor appears after the baseline log-hazard group (n_intervals elements). |
params |
Parameter vector. |
dataset |
Matrix containing only the formula regressors, that is the regressors appearing in the formula object provided by the user and eventually modified if they are categorical (nd therefore transformed into dummy variables). |
centre |
Individual membership to the clusters. |
time_axis |
Temporal domain. |
dropout_matrix |
Binary matrix indicating in which interval of the time domain an individual failed. For an individual, the sum of the row elements must be equal to 1 (if he/she failed) or 0 (if he/she does not failed). It has dimension equal to (n_individuals, n_intervals). |
e_matrix |
Matrix of dimension (n_individual, n_intervals), where each element contains the evaluation of the temporal integral, performed through the function @param time_int_eval. |
This function firstly divides the individuals according to their group/cluster membership, extracting group customized dataset and other variables, and then compute the group log-likelihood function through the function @ll_AdPaik_centre_1D. The produced group log-likelihood value is summed together the other values into a unique result, that corresponds to the overall (and final) log-likelihood value.
Overall log-likelihood function
This function simply implements the group log-likelihood function, following the definition. It is internally used by @ll_AdPaik_1D and, therefore, it requires as first and second argument the parameter according to which the global log-likelihood is one-dimensionally optimized and its position inside the vector of parameters.
ll_AdPaik_centre_1D( param_onedim, index_param_onedim, params, dataset, dropout_matrix, e_matrix )
ll_AdPaik_centre_1D( param_onedim, index_param_onedim, params, dataset, dropout_matrix, e_matrix )
param_onedim |
One dimensional parameter, with respect to which the log-likelihood function must be optmize. |
index_param_onedim |
Index of the previous parameter inside the parameter vector. |
params |
Parameter vector. |
dataset |
Matrix of dataset regressors, with a number of rows equal to the number of individuals in a cluster. |
dropout_matrix |
Binary matrix indicating in which interval of the time domain and individual failed. For an individual, the sum of the row elements must be equal to 1 (if he/she failed) or 0 (if he/she does not failed). It has dimension equal to (n_individuals, n_intervals) |
e_matrix |
Matrix of dimension (n_individual, n_intervals), where each element contains the evaluation of the temporal integral, performed through the function @param time_int_eval. |
Centre log-likelihood function.
Evaluation of model group log-likelihood at the provided parameter vector and data. This function is internally called by 'll_AdPaik_eval' to evaluate the log-likelihood function, considering all and only the individuals belonging to a group.
ll_AdPaik_centre_eval(params, dataset, dropout_matrix, e_matrix)
ll_AdPaik_centre_eval(params, dataset, dropout_matrix, e_matrix)
params |
Parameter vector. |
dataset |
Matrix of dataset regressors, with a number of rows equal to the number of individuals in a cluster. |
dropout_matrix |
Binary matrix indicating in which interval of the time domain and individual failed. For an individual, the sum of the row elements must be equal to 1 (if he/she failed) or 0 (if he/she does not failed). It has dimension equal to (n_individuals, n_intervals) |
e_matrix |
Matrix of dimension (n_individual, n_intervals), where each element contains the evaluation of the temporal integral, performed through the function @time_int_eval. |
Group log-likelihood evaluation
Evaluation of the log-likelihood function at the provided parameter vector and data.
ll_AdPaik_eval(params, dataset, centre, time_axis, dropout_matrix, e_matrix)
ll_AdPaik_eval(params, dataset, centre, time_axis, dropout_matrix, e_matrix)
params |
Parameter vector |
dataset |
Matrix of dimension equal to (number of individuals in the study, number of regressors), where only the regressors indicated in the formula object are considered. |
centre |
Vector of length equal to the number of individuals in the study, where each element corresponds to the individual cluster membership. |
time_axis |
Temporal domain |
dropout_matrix |
Binary matrix indicating in which interval of the time domain and individual failed. For an individual, the sum of the row elements must be equal to 1 (if he/she failed) or 0 (if he/she does not failed). It has dimension equal to (n_individuals, n_intervals) |
e_matrix |
Matrix of dimension (n_individual, n_intervals), where each element contains the evaluation of the temporal integral, performed through the function @time_int_eval. |
The function divides the individuals according to their group/cluster membership and then evaluates the group log-likelihood through another implemented function, but using all and only the individuals belonging to that group. Then the results are summed together to return the overall log-likelihood value.
Overall log-likelihood function value at the provided parameters and data
Nodes and weights for the Gauss_hermite quadrature formula for the 'Centre-Specific Frailty Model with Power Parameter'. The nodes and weights have been extracted from the 'Handbook of Mathematical functions' pag 940.
n_nodes
n_nodes
An object of class numeric
of length 1.
Nodes and weights for the Gauss-Hermite quadrature formula, for the 'Stochastic Time-Dependent Centre-Specific Frailty Model'. For the G function, the chosen nodes should not contain the zero (node) since it appears at the denominator of a fraction. Also in this case, the nodes and weights have been extracted from the 'Handbook of Mathematical functions', pag 940.
n_nodesG
n_nodesG
An object of class numeric
of length 1.
The function provides the confidence interval for each estimated parameter, using the standard error computed through another method and provided as second argument to the current function.
params_CI(optimal_params, se_params)
params_CI(optimal_params, se_params)
optimal_params |
Numerical vector of optimal estimated parameters. Its length is equal to the number of model parameters. |
se_params |
Numerical vector containing the standard error associated to each estimated parameter. |
A S3 object of class 'ParametersCI', composed of two numerical vector of length equal to the number of model parameters:
ParamsCI_left: left confidence interval for each parameter
ParamsCI_right: right confidence interval for each parameter
Function for computing the standard error of each optimal parameter, estimated through the constraint multi-dimensional optimization. The procedure for the computation is based on the numerical approximation of the second derivative of the log-likelihood function, by the 'centered finite difference scheme' with an accuracy of the second order.
params_se.AdPaik( optimal_params, params_range_min, params_range_max, dataset, centre, time_axis, dropout_matrix, e_matrix, h_dd )
params_se.AdPaik( optimal_params, params_range_min, params_range_max, dataset, centre, time_axis, dropout_matrix, e_matrix, h_dd )
optimal_params |
Numerical vector of optimal parameters. Its length (i.e. number of parameters) is equal to |
params_range_min |
Numerical vector of length equal to |
params_range_max |
Numerical vector of length equal to |
dataset |
Dataset containing the value of the regressors for all individuals in the study. |
centre |
vector containing the group membership of each individual and that induces the clustering subdivision. |
time_axis |
Temporal domain. Its number of intervals corresponds to the length of the time-domain minus 1 |
dropout_matrix |
Binary matrix of dimension (n_individuals, n_intervals). The sum of the elements of each row must be (1), if the associated individual failed in a precise interval, and (0) if the individual did not fail in the @time-axis. Therefore, if an individual failed in the time-domain, the interval in which he failed will have value (1) and the others (0). |
e_matrix |
Matrix of dimension (n_individuals, n_intervals) where each element contains the resolution of the temporal integral for that individual in that interval, thorugh the 'e_time_fun' function. |
h_dd |
Discretization step for the numerical approximation of the second derivative fo the loglikelihood function. |
The standrd error of each parameter is computed as the inverse of the square root of the 'Information matrix', that in turn is computed as the opposite of the 'Hessian matrix'. Only its diagonal is built and its elements are separatey evaluated through a numerical approximation of the second derivative of the log-likelihood function.
The function requires the optimal parameter vector and other parameters-related variables, to check:
the right numerosity of the parameter vector
the correct range existence of each parameter (i.e. each parameter lies in its range).
Vector of parameter standard error, of length equal to the number of model parameters.
This function plots the baseline hazard step-function based on the estimated parameters from the Adapted Paik et al.'s model.
plot_bas_hazard( result, xlim = c(0, length(result$TimeDomain) - 1), ylim = c(0, max(result$BaselineHazard)), xlab = "x", ylab = "y", main_title = "Baseline hazard step-function", color = "black", pch = 21, bg = "black", cex_points = 0.7 )
plot_bas_hazard( result, xlim = c(0, length(result$TimeDomain) - 1), ylim = c(0, max(result$BaselineHazard)), xlab = "x", ylab = "y", main_title = "Baseline hazard step-function", color = "black", pch = 21, bg = "black", cex_points = 0.7 )
result |
S3 object of class 'AdPaik', returned by the method call 'AdPaikModel(...)'. |
xlim |
A numeric vector of length 2 specifying the x-axis limits. Default is set to 0 and the number of time-domain intervals. |
ylim |
A numeric vector of length 2 specifying the y-axis limits. Default is 0 to the maximum value of the baseline hazard. |
xlab , ylab
|
String giving the x and y axis name. Default values are 'x' and 'y'. |
main_title |
Title of the plot. Default title is 'Baseline hazard step-function'. |
color |
Color used for plotting the horizontal segments of the step-function. Default one is 'black'. |
pch |
Symbol for marking the boundaries of each segment. Default is a dot (value 21). |
bg |
Color for the boundary symbols. Default matches the plot color ('black'). |
cex_points |
Size of the boundary symbols. Default is 0.7. |
The function plots a horizontal segment for each interval of the time domain, representing the baseline hazard. The boundaries of each segment are marked with colored dots, and subsequent segments are intentionally left unconnected to reflect the discrete nature of the intervals.
Plot of the baseline hazard step-function and value of the function in each interval.
# Import data data(data_dropout) # Define the variables needed for the model execution eps_paik <- 1e-10 categories_range_min <- c(-8, -2, eps_paik, eps_paik, eps_paik) categories_range_max <- c(-eps_paik, 0.4, 1 - eps_paik, 1, 10) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) formula <- time_to_event ~ Gender + CFUP + cluster(group) # Call the main model function result <- AdPaikModel(formula, data_dropout, time_axis, categories_range_min, categories_range_max) plot_bas_hazard(result)
# Import data data(data_dropout) # Define the variables needed for the model execution eps_paik <- 1e-10 categories_range_min <- c(-8, -2, eps_paik, eps_paik, eps_paik) categories_range_max <- c(-eps_paik, 0.4, 1 - eps_paik, 1, 10) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) formula <- time_to_event ~ Gender + CFUP + cluster(group) # Call the main model function result <- AdPaikModel(formula, data_dropout, time_axis, categories_range_min, categories_range_max) plot_bas_hazard(result)
This function generates a plot of either the frailty standard deviation or the frailty variance for the intervals in the time-domain.
plot_frailty_sd( result, frailty_sd = NULL, flag_variance = FALSE, flag_sd_external = FALSE, xlim = c(1, length(result$TimeDomain) - 1), ylim = c(0, 10), xlab = "Intervals", ylab = "Values", main_title = "Frailty standard deviation", pch = 21, color_bg = "blue", cex_points = 0.7 )
plot_frailty_sd( result, frailty_sd = NULL, flag_variance = FALSE, flag_sd_external = FALSE, xlim = c(1, length(result$TimeDomain) - 1), ylim = c(0, 10), xlab = "Intervals", ylab = "Values", main_title = "Frailty standard deviation", pch = 21, color_bg = "blue", cex_points = 0.7 )
result |
An S3 object of class 'AdPaik', returned by the main model call 'AdPaikModel(...)'. |
frailty_sd |
A numerical vector representing the evaluated frailty standard deviation, with length equal to the number of time-domain intervals. Its elements must be non-negative. |
flag_variance |
A boolean flag indicating whether to plot the frailty variance ( |
flag_sd_external |
A logical flag indicating whether the user is providing an external frailty standard deviation vector. |
xlim |
A numeric vector of length 2, defining the range for the x-axis (intervals). Default is from 1 to the number of intervals. |
ylim |
A numeric vector of length 2, defining the range for the y-axis (values). Default is |
xlab |
A string for the x-axis label. Default is |
ylab |
A string for the y-axis label. Default is |
main_title |
A string for the plot title. Default title is |
pch |
A numeric or character symbol used for plotting the frailty standard deviation values. Default is a dot ( |
color_bg |
A string specifying the color used for the symbols. Default is |
cex_points |
A numeric value indicating the size of the plotting symbols. Default is |
The plot represents the values of the frailty standard deviation or variance for each time interval. It connects these points to illustrate the trend of the chosen metric.
This function supports two modes of operation:
Plotting the frailty standard deviation or variance retrieved from the main model (contained in the S3 object of class 'AdPaik').
Plotting a user-provided vector of frailty standard deviations, which can be computed using the method frailty.sd
. This allows for flexibility in analysis without re-optimizing the log-likelihood function. For instance, users can compare frailty standard deviations computed with different model specifications (e.g., including only time-dependent terms).
The output will differentiate between these two cases, ensuring the correct values are plotted regardless of the source.
A plot displaying either the frailty standard deviation or variance across the specified intervals.
# Import data data(data_dropout) # Define the variables needed for the model execution eps_paik <- 1e-10 categories_range_min <- c(-8, -2, eps_paik, eps_paik, eps_paik) categories_range_max <- c(-eps_paik, 0.4, 1 - eps_paik, 1, 10) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) formula <- time_to_event ~ Gender + CFUP + cluster(group) # Call the main model function result <- AdPaikModel(formula, data_dropout, time_axis, categories_range_min, categories_range_max) plot_frailty_sd(result, ylim=c(0, 0.50), xlab = 'Time [intervals]', ylab = 'Standard deviation')
# Import data data(data_dropout) # Define the variables needed for the model execution eps_paik <- 1e-10 categories_range_min <- c(-8, -2, eps_paik, eps_paik, eps_paik) categories_range_max <- c(-eps_paik, 0.4, 1 - eps_paik, 1, 10) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) formula <- time_to_event ~ Gender + CFUP + cluster(group) # Call the main model function result <- AdPaikModel(formula, data_dropout, time_axis, categories_range_min, categories_range_max) plot_frailty_sd(result, ylim=c(0, 0.50), xlab = 'Time [intervals]', ylab = 'Standard deviation')
This function plots the trend of the log-likelihood function concerning a single parameter specified by its index in the parameter vector. It generates samples of the parameter, evaluates them in the log-likelihood function, and displays the results along with the maximum point of the one-dimensional log-likelihood function.
plot_ll_1D( param_1D, index_param_1D, ll_1D, params, param_range_min, param_range_max, dataset, centre, time_axis, dropout_matrix, e_matrix, n_points = 150, cex = 0.7, cex_max = 0.8, color_bg = "black", color_max_bg = "red", pch = 21 )
plot_ll_1D( param_1D, index_param_1D, ll_1D, params, param_range_min, param_range_max, dataset, centre, time_axis, dropout_matrix, e_matrix, n_points = 150, cex = 0.7, cex_max = 0.8, color_bg = "black", color_max_bg = "red", pch = 21 )
param_1D |
A numeric value representing the optimal parameter determined by maximizing the log-likelihood function for the specified parameter. |
index_param_1D |
An integer representing the index of the optimal parameter within the parameter vector. |
ll_1D |
A numeric value of the log-likelihood function evaluated at the optimal parameter |
params |
A numeric vector of length equal to the number of parameters minus one, containing the fixed values for the other parameters. |
param_range_min |
A numeric value indicating the minimum allowable value for the parameter |
param_range_max |
A numeric value indicating the maximum allowable value for the parameter |
dataset |
A data frame or matrix containing individual covariates. |
centre |
A numeric vector indicating individual cluster membership; its length must match the number of individuals in the dataset. |
time_axis |
A numeric vector corresponding to the subdivisions of the temporal domain. |
dropout_matrix |
A binary matrix indicating which interval of the time domain an individual failed. Each row should sum to 1 (if failed) or 0 (if not failed), with dimensions (n_individuals, n_intervals). |
e_matrix |
A matrix of dimensions (n_individuals, n_intervals), where each element contains the evaluation of the temporal integral performed by the function |
n_points |
An integer specifying the number of points at which to evaluate the log-likelihood function. A value that is neither too small nor too high is recommended; the default is 150. |
cex |
A numeric value specifying the size of the points used for the graphical representation of the log-likelihood function. Default is 0.7. |
cex_max |
A numeric value indicating the size of the optimal point (the one maximizing the log-likelihood function). Default is 0.8. |
color_bg |
A string specifying the color for the points representing the log-likelihood trend. Default is |
color_max_bg |
A string specifying the color for the optimal point provided as the first argument. Default is |
pch |
A numeric or character symbol representing the shape of the plotted points. Default is a circle ( |
A plot displaying the trend of the log-likelihood function concerning a single parameter, including the maximum point.
This function plots the trend of the log-likelihood function with respect to a single parameter specified by its index in the parameter vector. It generates samples of the parameter, evaluates them in the log-likelihood function, and displays the results along with the maximum point of the one-dimensional log-likelihood function.
plot_ll_1D.AdPaik( param_1D, index_param_1D, ll_1D, params, param_range_min, param_range_max, dataset, centre, time_axis, dropout_matrix, e_matrix, n_points = 150, cex = 0.7, cex_max = 0.8, color_bg = "black", color_max_bg = "red", pch = 21 )
plot_ll_1D.AdPaik( param_1D, index_param_1D, ll_1D, params, param_range_min, param_range_max, dataset, centre, time_axis, dropout_matrix, e_matrix, n_points = 150, cex = 0.7, cex_max = 0.8, color_bg = "black", color_max_bg = "red", pch = 21 )
param_1D |
A numeric value representing the optimal parameter determined by maximizing the log-likelihood function for the specified parameter. |
index_param_1D |
An integer representing the index of the optimal parameter within the parameter vector. |
ll_1D |
A numeric value of the log-likelihood function evaluated at the optimal parameter |
params |
A numeric vector of length equal to the number of parameters minus one, containing fixed values for the other parameters. |
param_range_min |
A numeric value indicating the minimum allowable value for the parameter |
param_range_max |
A numeric value indicating the maximum allowable value for the parameter |
dataset |
A data frame or matrix containing individual covariates. |
centre |
A numeric vector indicating individual cluster membership; its length must match the number of individuals in the dataset. |
time_axis |
A numeric vector corresponding to the subdivisions of the temporal domain. |
dropout_matrix |
A binary matrix indicating which interval of the time domain an individual failed. Each row should sum to 1 (if failed) or 0 (if not failed), with dimensions (n_individuals, n_intervals). |
e_matrix |
A matrix of dimensions (n_individuals, n_intervals) where each element contains the evaluation of the temporal integral performed by the function |
n_points |
An integer specifying the number of points at which to evaluate the log-likelihood function. A value that is neither too small nor too high is recommended; the default is 150. |
cex |
A numeric value specifying the size of the points used for the graphical representation of the log-likelihood function. Default is 0.7. |
cex_max |
A numeric value indicating the size of the optimal point (the one maximizing the log-likelihood function). Default is 0.8. |
color_bg |
A string specifying the color for the points representing the log-likelihood trend. Default is |
color_max_bg |
A string specifying the color for the optimal point provided as the first argument. Default is |
pch |
A numeric or character symbol representing the shape of the plotted points. Default is a circle ( |
A plot displaying the trend of the log-likelihood function with respect to a single parameter, including the maximum point.
This function plots the posterior frailty estimates for each group in each time interval. Each group’s estimates are represented by a sequence of points connected by straight lines. The function can plot either the entire posterior frailty estimate or its time-independent and time-dependent components based on user-specified flags.
plot_post_frailty_est( result, data, flag_eps = FALSE, flag_alpha = FALSE, xlim = c(0, length(result$TimeDomain) - 1), ylim = c(0, 10), xlab = "Intervals", ylab = "Values", main_title = "Posterior frailty estimates", cex = 0.7, pch_type = rep(21, length(centre_codes)), color_bg = rep("black", length(centre_codes)), cex_legend = 0.7, pos_legend = "topright" )
plot_post_frailty_est( result, data, flag_eps = FALSE, flag_alpha = FALSE, xlim = c(0, length(result$TimeDomain) - 1), ylim = c(0, 10), xlab = "Intervals", ylab = "Values", main_title = "Posterior frailty estimates", cex = 0.7, pch_type = rep(21, length(centre_codes)), color_bg = rep("black", length(centre_codes)), cex_legend = 0.7, pos_legend = "topright" )
result |
S3 object of class 'AdPaik', returned by the method call 'AdPaikModel(...)'. |
data |
Dataset (dataframe) in which all variables of the formula object must be found and contained. |
flag_eps |
Logical flag indicating whether to plot only the time-dependent posterior frailty estimates. Default is FALSE. |
flag_alpha |
Logical flag indicating whether to plot only the time-independent posterior frailty estimates. Default is FALSE. |
xlim |
Numeric vector of length 2 specifying the x-axis limits. Default is based on the number of intervals. |
ylim |
Numeric vector of length 2 specifying the y-axis limits. Default is (0, 10). |
xlab , ylab
|
String giving the x and y axis name. Default values are 'Intervals' and 'Values'. |
main_title |
Title of the plot. Default title is 'Posterior frailty estimates'. |
cex |
Dimension of the points used for plotting the estimates. |
pch_type |
Numerical vector of length equal to the number of clusters in the data, giving the symbol to be used for plotting the estimates. Default symbol (circle, 21) is the same for all clusters. |
color_bg |
Numerical vector of length equal to the number of clusters in the data, giving the color to be used for plotting the symbols for the estimates. Default ('black') is the same for all faculties. On the other hand, the same color is used throughout the intervals for the same faculty. |
cex_legend |
Dimension of the symbol in the legend. Default is 0.7. |
pos_legend |
Either a numeric vector providing the x and y coordinates for the legend or a string specifying the legend's position (e.g., 'bottomright', 'bottom', 'bottomleft', 'left', 'topleft', 'top', 'topright', 'right', 'center'). |
Recalling the frailty structure as and the posterior
frailty estimate as
,
this function allows plotting either the entire posterior frailty estimate
or its time-independent
or
time-dependent
components.
The user can control which components to display using the flag_eps and flag_alpha parameters.
Only one of these flags can be set to TRUE at a time.
The plot of the posterior frailty estimates.
# Import data data(data_dropout) # Define the variables needed for the model execution eps_paik <- 1e-10 categories_range_min <- c(-8, -2, eps_paik, eps_paik, eps_paik) categories_range_max <- c(-eps_paik, 0.4, 1 - eps_paik, 1, 10) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) formula <- time_to_event ~ Gender + CFUP + cluster(group) # Call the main model function result <- AdPaikModel(formula, data_dropout, time_axis, categories_range_min, categories_range_max) # Define variables for plotting the estimates pch_type <- c(21, seq(21,25,1), seq(21,25,1), seq(21,25,1)) color_bg <- c("darkblue", rep("red", 5), rep("purple", 5), rep("green",5)) plot_post_frailty_est(result, data_dropout, xlim = c(1, 11), ylim = c(0,3), pch_type = pch_type, color_bg = color_bg, xlab = 'Time [intervals]', ylab = 'Posterior estimates', pos_legend = 'bottomright')
# Import data data(data_dropout) # Define the variables needed for the model execution eps_paik <- 1e-10 categories_range_min <- c(-8, -2, eps_paik, eps_paik, eps_paik) categories_range_max <- c(-eps_paik, 0.4, 1 - eps_paik, 1, 10) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) formula <- time_to_event ~ Gender + CFUP + cluster(group) # Call the main model function result <- AdPaikModel(formula, data_dropout, time_axis, categories_range_min, categories_range_max) # Define variables for plotting the estimates pch_type <- c(21, seq(21,25,1), seq(21,25,1), seq(21,25,1)) color_bg <- c("darkblue", rep("red", 5), rep("purple", 5), rep("green",5)) plot_post_frailty_est(result, data_dropout, xlim = c(1, 11), ylim = c(0,3), pch_type = pch_type, color_bg = color_bg, xlab = 'Time [intervals]', ylab = 'Posterior estimates', pos_legend = 'bottomright')
Function for computing the confidence interval for each posterior frailty estimates .
post_frailty_CI.AdPaik( post_frailty_est, post_frailty_est_var, n_centres, n_intervals )
post_frailty_CI.AdPaik( post_frailty_est, post_frailty_est_var, n_centres, n_intervals )
post_frailty_est |
Posterior frailty estimates list. |
post_frailty_est_var |
Posterior frailty variance list. |
n_centres |
Number of clusters/centres |
n_intervals |
Number of intervals of the time-domain. it is equal to the length of the tima_axis minus one. |
S3 object of class 'PFCI.AdPaik' composed of two matrices of dimension (number groups, number of intervals):
PostFrailtyCI_left: left confidence interval for each posterior frailty estimates
PostFrailtyCI_right: right confidence interval for each each posterior frailty estimates
Function for computing the posterior frailty estimates and variances of the time-dependent shared frailty Cox model.
Recalling the structure of the frailty as being composed by the sum
of two independent gamma distributions:
the posterior distribution of both terms is still a gamma with different mean and variance and the
posterior frailty estimate corresponds to the 'empirical Bayes estimate', that is the previous mentioned posterior mean.
post_frailty.AdPaik(optimal_params, dataset, time_to_event, centre, time_axis)
post_frailty.AdPaik(optimal_params, dataset, time_to_event, centre, time_axis)
optimal_params |
Optimal parameters estimated by maximizing the log-likelihood function, through the constraint multi-dimensional optmization method. |
dataset |
Dataset containing all the covariates/regressors. |
time_to_event |
Time-instant, in the follow-up, in which an individual faces the event or fails. If an individual does not face the event in the follow-up, then the time-instant must assume a default value. |
centre |
Individual group/cluster membership. |
time_axis |
Temporal domain. |
S3 object of class 'PF.AdPaik' composed of two elements of different class:
PosteriorFrailtyEst: S3 object of class 'PFE.AdPaik'.
PosteriorFrailtyVar: S3 object of class 'PFV.AdPaik'.
This function displays a summary of the model output based on the class of the result object. It delegates to the appropriate summary method according to the class of the result.
summary(result)
summary(result)
result |
An object containing the output of the model call. The class of this object determines which summary method is invoked. |
A summary of the model output printed to the console.
This function provides a comprehensive summary of the results from the Adapted Paik et al.'s Time-Dependent Shared Frailty Model. It includes key information about the dataset (e.g., number of individuals, regressors, intervals, and clusters), model parameters, and output (log-likelihood, AIC). The summary also lists the estimated regressors along with their standard errors
## S3 method for class 'AdPaik' summary(result)
## S3 method for class 'AdPaik' summary(result)
result |
'S3' class object returned by the main model call, i.e. output of the 'Adapted Paik et al.'s Model'. |
The function reports the estimated regressors, their standard errors, and confidence intervals (if available).
Model summary printed on output.
# Define the variables needed for the model execution data(data_dropout) eps_paik <- 1e-10 categories_range_min <- c(-8, -2, eps_paik, eps_paik, eps_paik) categories_range_max <- c(-eps_paik, 0.4, 1 - eps_paik, 1, 10) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) formula <- time_to_event ~ Gender + CFUP + cluster(group) # Call the main model function result <- AdPaikModel(formula, data_dropout, time_axis, categories_range_min, categories_range_max) # Call the summary summary(result)
# Define the variables needed for the model execution data(data_dropout) eps_paik <- 1e-10 categories_range_min <- c(-8, -2, eps_paik, eps_paik, eps_paik) categories_range_max <- c(-eps_paik, 0.4, 1 - eps_paik, 1, 10) time_axis <- c(1.0, 1.4, 1.8, 2.3, 3.1, 3.8, 4.3, 5.0, 5.5, 5.8, 6.0) formula <- time_to_event ~ Gender + CFUP + cluster(group) # Call the main model function result <- AdPaikModel(formula, data_dropout, time_axis, categories_range_min, categories_range_max) # Call the summary summary(result)
Function for the resolution of an integral with respect to time, in the evaluation of the log-likelihood function. It is implemented as defined in the paper of Wintrebert et al.'s (2004)
time_int_eval(time_t, k, time_axis)
time_int_eval(time_t, k, time_axis)
time_t |
Event time instant |
k |
k-th interval of the time-axis |
time_axis |
Temporal domain (it may coincide with the follow-up) |
Evaluation of the temporal integral