Examines the monthly downloads of packages that are assigned a Task View, then uses this data to select a monthly download threshold. The threshold decides which packages that are not assigned a Task View will be used in model training.
Usage
get_CRAN_logs(
  TEST = FALSE,
  limiting_n_observations = 100,
  get_input_stored = FALSE,
  get_input_path =
    "tests/testthat/fixtures/get_create_features_output/get_create_features_output.rds",
  save_output = FALSE,
  save_path = "tests/testthat/fixtures/get_CRAN_logs_output",
  file_name = "get_CRAN_logs_output.rds"
)
Arguments
- TEST
logical. Default is FALSE. If TRUE, a subset of the data extracted from CRAN is selected, to speed up testing. More precisely, if TRUE, a random selection of rows from CRAN_data is chosen, where the number of rows is given by limiting_n_observations.
- limiting_n_observations
Integer that decides the size of the subset of CRAN_data when TEST is TRUE.
- get_input_stored
logical. If TRUE, the function uses pre-saved data as input; otherwise it runs the CTVsuggestTrain internal get_data() function.
- get_input_path
string. If get_input_stored is set to TRUE, get_input_path gives the path of the pre-saved data.
- save_output
logical. Default is FALSE. If TRUE, the list that is returned is saved to the path set by save_path.
- save_path
string. Sets the path where the list created by the function will be saved when save_output is set to TRUE.
- file_name
string. Sets the file name for the saved object.
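The effect of TEST and limiting_n_observations can be illustrated with a minimal sketch. It assumes CRAN_data is a data frame with one row per package; the toy table, its Package column, and the cap at the number of available rows are assumptions made for illustration, not the function's actual internals.

```r
# Hypothetical stand-in for CRAN_data; the real column layout is not shown in this page.
CRAN_data <- data.frame(
  Package = paste0("pkg", seq_len(500)),
  stringsAsFactors = FALSE
)

TEST <- TRUE
limiting_n_observations <- 100

if (TEST) {
  # Draw row indices at random without replacement,
  # capped at the number of rows that are available.
  n_keep <- min(limiting_n_observations, nrow(CRAN_data))
  CRAN_data <- CRAN_data[sample(nrow(CRAN_data), n_keep), , drop = FALSE]
}

nrow(CRAN_data)
```

Because the rows are drawn at random, calling set.seed() beforehand would be needed to make a test run reproducible.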
Value
Returns
no_tsk_pckgs_meet_threshold - vector of packages that are not assigned a Task View and meet the monthly download threshold.
response_matrix, features, final_package_names, tvdb - objects created by the CTVsuggest:::get_create_features function that need to be carried forward.
Details
The get_CRAN_logs() function is run inside Train_model().
get_CRAN_logs() carries out the following steps:
First, it creates a list object that gives the monthly downloads for each package in a Task View for the past month, using cranlogs::cran_downloads(). Using this list, it computes the 75th percentile of the monthly downloads.
Next, it creates a vector of packages that are not assigned a Task View and whose monthly downloads exceed this threshold. These packages are the ones that will be labelled as not belonging to a Task View when the model is trained.
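The threshold step can be sketched as follows. This is an illustration under stated assumptions, not the function's actual code: the toy counts, the package names, and the task_view_pkgs vector are hypothetical. The strict ">" comparison and R's default quantile type are also assumptions. The input mimics the shape returned by cranlogs::cran_downloads(), which gives one row per package per day with columns date, count, and package.

```r
# Toy daily counts in the shape returned by cranlogs::cran_downloads().
# In a real run the data could come from, for example,
#   cranlogs::cran_downloads(packages = pkgs, when = "last-month")
daily <- data.frame(
  date = rep(as.Date("2024-01-01") + 0:1, times = 6),
  package = rep(c("a", "b", "c", "d", "e", "f"), each = 2),
  count = c(10, 20, 100, 200, 5, 5, 300, 100, 1, 2, 150, 250),
  stringsAsFactors = FALSE
)

# Hypothetical set of packages that are assigned a Task View.
task_view_pkgs <- c("a", "b", "c", "d")

# Sum the daily counts into one monthly total per package (named numeric vector).
monthly <- vapply(split(daily$count, daily$package), sum, numeric(1))

# 75th percentile of monthly downloads among Task View packages
# (R's default quantile type is assumed here).
threshold <- unname(stats::quantile(monthly[task_view_pkgs], probs = 0.75))

# Packages without a Task View whose monthly downloads exceed the threshold.
no_task_view_pkgs <- setdiff(names(monthly), task_view_pkgs)
selected <- no_task_view_pkgs[monthly[no_task_view_pkgs] > threshold]

threshold  # 325
selected   # "f"
```

With these toy numbers the Task View totals are 30, 300, 10, and 400, so the 75th percentile is 325, and only package "f" (400 downloads) clears it.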