
Looks at the monthly downloads of packages that are assigned a Task View and uses them to set a monthly package download threshold. This threshold decides which packages that are not assigned a Task View will be used in model training.

Usage

get_CRAN_logs(
  TEST = FALSE,
  limiting_n_observations = 100,
  get_input_stored = FALSE,
  get_input_path =
    "tests/testthat/fixtures/get_create_features_output/get_create_features_output.rds",
  save_output = FALSE,
  save_path = "tests/testthat/fixtures/get_CRAN_logs_output",
  file_name = "get_CRAN_logs_output.rds"
)

Arguments

TEST

logical. Default is FALSE. If TRUE, only a subset of the data extracted from CRAN is used. This speeds up testing.

More precisely, if TRUE, limiting_n_observations rows are drawn at random from CRAN_data.

limiting_n_observations

integer. Default is 100. The number of rows of CRAN_data to sample when TEST is TRUE.

get_input_stored

logical. Default is FALSE. If TRUE, the function uses pre-saved data as input; otherwise it runs the CTVsuggestTrain internal get_data() function.

get_input_path

string. If get_input_stored is set to TRUE, get_input_path gives the path location of the pre-saved data.

save_output

logical. Default is FALSE. If TRUE, then the list that is returned is saved to the path set by save_path.

save_path

string. Sets the path where the list returned by the function is saved when save_output is TRUE.

file_name

string. Sets the file name for the saved object.
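The TEST, limiting_n_observations, save_output, save_path and file_name arguments can be illustrated with the base-R sketch below. It is a hedged approximation, not the package's code: subset_for_test() and save_list_output() are hypothetical helpers, the toy data frame stands in for CRAN_data, and two behaviours are assumptions (capping the sample size at the number of available rows, and creating save_path if it does not exist).

```r
# Hypothetical helper mirroring TEST / limiting_n_observations:
# when test is TRUE, keep a random sample of n rows of the data.
subset_for_test <- function(cran_data, test = FALSE, n = 100) {
  if (!test) return(cran_data)
  n <- min(n, nrow(cran_data))  # assumption: never ask for more rows than exist
  cran_data[sample(nrow(cran_data), n), , drop = FALSE]
}

# Hypothetical helper mirroring save_output / save_path / file_name:
# write the returned list to save_path/file_name as an .rds file.
save_list_output <- function(x, save_output = FALSE, save_path, file_name) {
  if (!save_output) return(invisible(NULL))
  dir.create(save_path, recursive = TRUE, showWarnings = FALSE)  # assumption
  out_file <- file.path(save_path, file_name)
  saveRDS(x, out_file)
  invisible(out_file)
}

# Usage with toy data (not real CRAN data)
set.seed(1)
toy <- data.frame(package = paste0("pkg", 1:10), count = 1:10)
small <- subset_for_test(toy, test = TRUE, n = 3)

tmp_dir <- file.path(tempdir(), "get_CRAN_logs_output")
path <- save_list_output(list(a = 1), TRUE, tmp_dir, "get_CRAN_logs_output.rds")
restored <- readRDS(path)
```

Reading the file back with readRDS() is the counterpart of get_input_stored = TRUE, where a pre-saved .rds file is used as input instead of recomputing it.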

Value

Returns a list with the following elements:

  • no_tsk_pckgs_meet_threshold - vector of packages that are not assigned a Task View and meet the monthly download threshold.

  • response_matrix, features, final_package_names, tvdb - objects created by the internal CTVsuggest:::get_create_features function that are carried forward for use in later steps of model training.

Details

The get_CRAN_logs() function is run inside Train_model().

get_CRAN_logs() carries out the following steps:

  • First, it uses cranlogs::cran_downloads() to create a list giving the downloads over the past month of each package assigned to a Task View. From this list, it computes the 75th percentile of the monthly downloads, which serves as the download threshold.

  • Next, it creates a vector of packages that are not assigned a Task View and whose monthly downloads exceed this threshold. These packages are labelled as not belonging to a Task View when the model is trained.
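The two steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: monthly_totals() and select_no_tsk_pckgs() are hypothetical helpers, the download counts are made up, and the 75th percentile is computed with the stats::quantile() default (type 7), which may differ from what get_CRAN_logs() uses. The input data frames mimic the shape returned by cranlogs::cran_downloads(), with columns date, count and package.

```r
# Sum daily counts per package from a data frame shaped like the output of
# cranlogs::cran_downloads(packages = ..., when = "last-month").
# (Hypothetical helper; the toy data below avoids any network call.)
monthly_totals <- function(downloads_df) {
  vapply(split(downloads_df$count, downloads_df$package),
         function(x) sum(as.numeric(x)), numeric(1))
}

# Hypothetical helper: threshold = 75th percentile of Task View package
# totals; return the non-Task-View packages whose totals exceed it.
select_no_tsk_pckgs <- function(ctv_totals, no_ctv_totals, prob = 0.75) {
  threshold <- unname(stats::quantile(ctv_totals, probs = prob))
  names(no_ctv_totals)[no_ctv_totals > threshold]
}

# Made-up downloads over two days (not real CRAN figures)
ctv_df <- data.frame(
  date = rep(as.Date(c("2024-01-01", "2024-01-02")), times = 5),
  count = c(40, 60, 90, 110, 150, 150, 180, 220, 240, 260),
  package = rep(c("ctvA", "ctvB", "ctvC", "ctvD", "ctvE"), each = 2)
)
no_ctv_df <- data.frame(
  date = rep(as.Date(c("2024-01-01", "2024-01-02")), times = 3),
  count = c(20, 30, 200, 250, 400, 500),
  package = rep(c("otherA", "otherB", "otherC"), each = 2)
)

ctv_totals <- monthly_totals(ctv_df)        # 100, 200, 300, 400, 500
no_ctv_totals <- monthly_totals(no_ctv_df)  # 50, 450, 900
selected <- select_no_tsk_pckgs(ctv_totals, no_ctv_totals)
selected  # "otherB" "otherC" (threshold is 400)
```

The strict > comparison follows the wording "exceed"; whether get_CRAN_logs() also keeps packages sitting exactly at the threshold is not stated.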