Train_model() trains a multinomial logistic regression model with a LASSO penalty, where the outcome categories are the current CRAN Task Views and an additional "None" category.
Usage
Train_model(
  TEST = FALSE,
  limiting_n_observations = 100,
  get_input_stored = FALSE,
  get_input_path =
    "tests/testthat/fixtures/get_CRAN_logs_output/get_CRAN_logs_output.rds",
  save_output = FALSE,
  save_path = "OUTPUT/"
)
Arguments
- TEST
logical. Default is FALSE. If TRUE, only a subset of the data extracted from CRAN is used, which speeds up testing. More precisely, if TRUE, a random selection of rows from CRAN_data is taken, where the number of rows chosen is given by limiting_n_observations.
- limiting_n_observations
Integer that sets the size of the subset of CRAN_data when TEST is TRUE.
- get_input_stored
logical. If TRUE, the function uses pre-saved data as input; otherwise it runs the CTVsuggestTrain internal get_data() function.
- get_input_path
string. If get_input_stored is set to TRUE, get_input_path gives the path of the pre-saved data.
- save_output
logical. Default is FALSE. If TRUE, the list that is returned is saved to the path set by save_path.
- save_path
string. Sets the path where the list created by the function is saved. Only used when save_output is set to TRUE.
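The effect of TEST, limiting_n_observations and the stored-input arguments can be sketched in base R. This is only an illustration under stated assumptions: the CRAN_data built here is a made-up stand-in, the file name is hypothetical, and the use of sample(), saveRDS() and readRDS() is an assumption suggested by the ".rds" extension of the default get_input_path, not a description of the package internals.

```r
# Hypothetical stand-in for CRAN_data; the real object is built by the package.
set.seed(1)
CRAN_data <- data.frame(
  package   = paste0("pkg", 1:500),
  downloads = rpois(500, lambda = 200)
)

TEST <- TRUE
limiting_n_observations <- 100

# When TEST is TRUE, keep a random selection of rows of the requested size.
if (TEST) {
  keep <- sample(nrow(CRAN_data), size = limiting_n_observations)
  CRAN_data <- CRAN_data[keep, , drop = FALSE]
}

# Store the input once, then reload it instead of rebuilding it
# (the idea behind get_input_stored / get_input_path).
stored_path <- file.path(tempdir(), "stored_input.rds")  # hypothetical file name
saveRDS(CRAN_data, stored_path)

get_input_stored <- TRUE
input_data <- if (get_input_stored) readRDS(stored_path) else CRAN_data
```

Reading a stored file avoids repeating the download and feature-building steps, which is presumably why the default path points at a test fixture.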
Value
Returns a list with the following elements:
predicted_probs_for_suggestions - data.frame in which each row is the predicted probability vector for a CRAN package that is not assigned to a Task View and does not meet the monthly download threshold. predicted_probs_for_suggestions is created using the predict() function and the model object.
model - Model object.
model_accuracy - A percentage value giving the accuracy of the model on a test set.
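To make the three returned elements concrete, here is a minimal sketch of fitting a LASSO-penalised multinomial model and deriving class probabilities and a percentage accuracy. It assumes the glmnet package as the fitting backend, which this page does not confirm, and it uses simulated features with hypothetical class labels ("TaskViewA", "TaskViewB", "None") rather than real CRAN data.

```r
library(glmnet)  # assumption: the page does not name the fitting package

set.seed(42)
n <- 300
p <- 20
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Simulated outcome: the class is driven by the first two features plus noise.
scores <- cbind(x[, 1], x[, 2], 0) + matrix(rnorm(n * 3, sd = 0.5), n, 3)
y <- factor(c("TaskViewA", "TaskViewB", "None")[max.col(scores)])

train_idx <- sample(n, 200)

# alpha = 1 selects the LASSO penalty; lambda is chosen by cross-validation.
model <- cv.glmnet(x[train_idx, ], y[train_idx],
                   family = "multinomial", alpha = 1)

# One probability vector per row; predict() returns an n x classes x 1 array.
probs <- predict(model, newx = x[-train_idx, ],
                 s = "lambda.min", type = "response")
predicted_probs <- as.data.frame(probs[, , 1])

# Accuracy on held-out rows, expressed as a percentage.
pred_class <- predict(model, newx = x[-train_idx, ],
                      s = "lambda.min", type = "class")
model_accuracy <- 100 * mean(as.vector(pred_class) ==
                               as.character(y[-train_idx]))
```

In this sketch the held-out rows serve both for the probability table and for the accuracy figure. In Train_model() the probabilities are instead reported for packages without a Task View, while the accuracy comes from a separate test set.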
Details
The Train_model() function relies on four internal functions.
These four internal functions are run within each other: each one starts by calling the next. For example, Train_model() starts by running get_CRAN_logs(), which in turn starts by running get_create_features().
Hence the entire pipeline begins with get_data().
The Train_model() function itself, after running get_CRAN_logs(), carries out the model training using the response matrix and feature matrix that were constructed by the four internal functions mentioned above.