I have a job that runs each morning using cronR. Our data engineer emailed me to say that on several days over the past week there was an absurdly high number of Athena queries coming from the ODBC connection that I have set up on RStudio.
Sure enough, I checked the cron log of the job and it looks like the job is being run over and over. I deduce this from the fact that there are hundreds of lines of output showing just the packages loading, without the completion messages of the job itself.
I realise that this is a vague post. I would like to provide more information, but I am really at a loss for what to look at or where. Any pointers or suggestions about where I should look are most welcome.
Below is a sanitized copy of the cron output log. You can see the tidyverse loading again and again, whereas it should appear just once per daily run of the script.
When the script runs fine, it accumulates a data frame for 4 time horizons and outputs messages like so:
fungame: Gathered data for cohort: 2020-03-27 horizon: 30
fungame: Gathered data for cohort: 2020-03-27 horizon: 90
fungame: Gathered data for cohort: 2020-03-27 horizon: 180
fungame: Gathered data for cohort: 2020-03-27 horizon: 365
After this, the script combines these prediction vectors into a data frame and then sends it to AWS S3. However, it looks like it's spinning its wheels, since many times I only see part of the output, e.g.
fungame: Gathered data for cohort: 2020-03-27 horizon: 30
fungame: Gathered data for cohort: 2020-03-27 horizon: 90
So in this case it only managed to get 2 of the 4 predictions.
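To put a number on this, the log can be summarised by counting how many runs started gathering data and how many reached the last horizon. This is only a minimal sketch: it assumes each run prints the `horizon: 30` line first and the `horizon: 365` line last, as in the messages above, and the function name `summarise_runs` is made up for illustration.

```shell
# Count runs that started gathering data versus runs that reached the last horizon.
# Assumption: every run logs "horizon: 30" first and "horizon: 365" last.
summarise_runs() {
  log_file="$1"
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  started=$(grep -c 'Gathered data for cohort: .* horizon: 30[[:space:]]*$' "$log_file" || true)
  finished=$(grep -c 'Gathered data for cohort: .* horizon: 365[[:space:]]*$' "$log_file" || true)
  echo "started=$started finished=$finished"
}
```

Calling `summarise_runs` with the path of `cron_run_fungame.log` from the crontab entry would then show whether most runs are cut off before the final horizon.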
If this kind of issue sounds familiar to anyone, please share your experience. I have a cron task that seems to be spinning its wheels and retrying over and over, which is a problem since we are billed on a per-query basis for AWS Athena.
When I open the cronR gui and load the file my_schedule.cron I see my job:
## cronR job
## id: fungame_revenue_predictions
## tags: predictions
## desc: Fungame daily revenue predictions
* 14 * * * /usr/lib64/R/bin/Rscript '/home/rstudio-doug/analysis/radhoc/revenue_model/cron_scripts/cron_run.R' fungame 7 >> '/home/rstudio-doug/analysis/radhoc/revenue_model/cron_scripts/cron_run_fungame.log' 2>&1
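One thing that may be worth checking in the entry above is how often its time fields match. In standard five-field crontab syntax the first field is the minute and the second is the hour, so a `*` in the minute field matches every minute of that hour. Below is a rough sketch of the arithmetic. It assumes the last three fields stay `*` and that each time field is either `*` or a single number; lists, ranges and steps are not handled. The helper name `fires_per_day` is made up for illustration.

```shell
# Rough count of how many times per day a crontab entry fires,
# based only on its minute and hour fields.
# Assumption: each field is "*" or a single number.
fires_per_day() {
  minute_field="$1"
  hour_field="$2"
  if [ "$minute_field" = "*" ]; then minutes=60; else minutes=1; fi
  if [ "$hour_field" = "*" ]; then hours=24; else hours=1; fi
  echo $((minutes * hours))
}

fires_per_day '*' 14   # minute and hour fields of the entry above; prints 60
fires_per_day 0 14     # a single run at 14:00, for comparison; prints 1
```

If the first call prints more than 1, the schedule itself would be starting the script many times a day, which could look like retries in the log.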
I'm not too familiar with how cronR and the crontab work, or with the relationship between them. Could pressing 'load crontab schedule' in the GUI several times have inadvertently duplicated the job? I don't think so, because when I run crontab -e in the terminal I only see a single job. That might be an irrelevant diversion, though. I really do not know why this is happening.
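To rule out duplicates without opening an editor, `crontab -l` prints the current user's crontab to the terminal, and the entries can be counted from there. This is a small sketch that assumes the job can be identified by its script name; `count_job_entries` is a hypothetical helper, not part of cronR.

```shell
# Count non-comment crontab lines that mention a given string.
# Reads crontab text on stdin so it can be fed from `crontab -l`.
count_job_entries() {
  # Drop comment lines (such as the "## cronR job" headers), then count
  # fixed-string matches; "|| true" covers grep's non-zero exit on 0 matches.
  grep -v '^[[:space:]]*#' | grep -cF -- "$1" || true
}

# Usage (not run here): crontab -l | count_job_entries 'cron_run.R'
```

A result of 1 would confirm that only one entry references the script, so reloading the schedule in the GUI did not duplicate it.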