estimate a parameter per period

supermarco · July 9, 2024, 2:07pm

Hello,
I would like to estimate a parameter x. I have included the code for the MSE and the sample data below.
I would like to extend the code with a loop that estimates the x corresponding to each period,
like:
5-1-2021 to 5-15-2021: first x
5-1-2021 to 6-19-2021: second x
5-1-2021 to 7-18-2021: third x
...

I tried wrapping the estimation in a loop of the form 'for(i in 1:nrow(df[1:j,])){}' but got no decent results,
so I prefer not to paste it here. Can anyone make a suggestion, please? Many thanks.

Below is the basic code:

# parameters to read
t = df$t # beginning
T = df$T # end
P = df$P # theoretical cost
Q = df$Q # real cost

f <- function(x){P^2 - P*x^3.5 + 5}
MSE <- function(x){ mean((f(x)-Q)^2) }
XX <- nlm(MSE,.5)

Below is the table (I did not know how to upload a file)

|Column 1 | Column 2 | Column 3 | Column 4|
|t | T | P | Q|
|05/01/2021 | 5-15-2021 | 28.1526245271719 | 16.2|
|05/01/2021 | 5-15-2021 | 31.0374701077521 | 16.25|
|05/01/2021 | 5-15-2021 | 33.7793286109241 | 16.3|
|05/01/2021 | 5-15-2021 | 37.4022801498769 | 16.35|
|05/01/2021 | 6-19-2021 | 40.9596399443349 | 11.75|
|05/01/2021 | 6-19-2021 | 44.8941281749715 | 11.8|
|05/01/2021 | 6-19-2021 | 48.8157358179208 | 11.9|
|05/01/2021 | 6-19-2021 | 53.3207228878127 | 12|
|05/01/2021 | 6-19-2021 | 57.7738530980319 | 12.2|
|05/01/2021 | 7-18-2021 | 62.5236845819242 | 12.75|
|05/01/2021 | 7-18-2021 | 67.3270001586133 | 13.2|
|05/01/2021 | 7-18-2021 | 72.2907124853028 | 13.25|
|05/01/2021 | 7-18-2021 | 77.1012119122497 | 13.5|
|05/01/2021 | 7-18-2021 | 81.9988293252421 | 13.6|
|05/01/2021 | 7-18-2021 | 87.0004730550886 | 13.65|
|05/01/2021 | 7-18-2021 | 92.0020964833537 | 13.7|
|05/04/2021 | 5-15-2021 | 96.9589367658646 | 21.2|
|05/04/2021 | 5-15-2021 | 102.005211124637 | 21.25|
|05/04/2021 | 5-15-2021 | 107.006712608529 | 21.3|
|05/04/2021 | 5-15-2021 | 111.926997774298 | 21.35|
|05/04/2021 | 5-15-2021 | 116.927960560155 | 21.4|
|05/04/2021 | 5-15-2021 | 121.890052104498 | 21.85|
|05/04/2021 | 5-15-2021 | 126.846498258905 | 21.9|
|05/04/2021 | 6-19-2021 | 136.847619112249 | 14.05|
|05/04/2021 | 6-19-2021 | 146.848676071273 | 14.1|
|05/04/2021 | 6-19-2021 | 151.84919455444 | 14.15|
|05/04/2021 | 6-19-2021 | 0.103524619770535 | 14.2|
|05/04/2021 | 6-19-2021 | 0.103506257603117 | 14.25|
|05/04/2021 | 6-19-2021 | 0.103441845896075 | 14.3|
|05/04/2021 | 6-19-2021 | 0.103359839336302 | 14.35|
|05/04/2021 | 6-19-2021 | 0.205395884926941 | 14.4|
|05/04/2021 | 6-19-2021 | 0.103190917272473 | 14.8|
|05/04/2021 | 6-19-2021 | 0.205284783697305 | 14.85|
|05/04/2021 | 6-19-2021 | 0.205157605182112 | 14.9|
|05/04/2021 | 6-19-2021 | 0.102999244352431 | 14.95|
|05/04/2021 | 7-17-2021 | 0.221862896678315 | 13|
|05/04/2021 | 7-17-2021 | 0.221759591534659 | 13.1|
|05/04/2021 | 7-17-2021 | 0.127137687828712 | 13.2|
|05/04/2021 | 7-17-2021 | 0.221623235949667 | 13.25|
|05/04/2021 | 7-17-2021 | 0.22150678271241 | 13.3|
|05/04/2021 | 7-17-2021 | 0.12690006301537 | 13.35|
|05/04/2021 | 7-17-2021 | 0.126809526748304 | 13.4|
|05/04/2021 | 7-17-2021 | 0.12672136985722 | 13.45|
|05/04/2021 | 7-17-2021 | 0.171509850309103 | 13.5|
|05/04/2021 | 7-17-2021 | 0.148062999444731 | 13.55|
|05/04/2021 | 7-17-2021 | 0.148006943405468 | 13.6|
|05/04/2021 | 7-17-2021 | 0.198100748961343 | 13.65|
|05/05/2021 | 5-15-2021 | 0.198018416853687 | 20.05|
|05/05/2021 | 5-15-2021 | 0.147720761862854 | 20.1|
|05/05/2021 | 5-15-2021 | 0.147672200328692 | 20.15|
|05/05/2021 | 5-15-2021 | 0.147599028101768 | 20.2|
|05/05/2021 | 5-15-2021 | 0.147537774835907 | 20.25|
|05/05/2021 | 5-15-2021 | 0.147479830307385 | 20.3|
|05/05/2021 | 5-15-2021 | 0.147420990424604 | 20.35|
|05/05/2021 | 5-15-2021 | 0.147356932923342 | 20.4|
|05/05/2021 | 5-15-2021 | 0.197267807152614 | 20.45|
|05/05/2021 | 5-15-2021 | 0.252039096686449 | 20.5|
|05/05/2021 | 5-15-2021 | 0.166873154549201 | 20.55|
|05/05/2021 | 5-15-2021 | 0.251867638945013 | 20.6|
|05/05/2021 | 5-15-2021 | 0.251802856073001 | 20.65|
|05/05/2021 | 5-15-2021 | 0.264100645685321 | 20.7|
|05/05/2021 | 5-15-2021 | 0.251616645928542 | 20.75|
|05/05/2021 | 5-15-2021 | 0.25155481886409 | 20.8|
|05/05/2021 | 5-15-2021 | 0.166453578451474 | 20.85|
|05/05/2021 | 5-15-2021 | 0.166370951569743 | 20.9|
|05/05/2021 | 5-15-2021 | 0.300404850007174 | 20.95|
|05/05/2021 | 5-15-2021 | 0.310982625879516 | 21|
|05/05/2021 | 5-15-2021 | 0.310884063211701 | 21.05|
|05/05/2021 | 5-15-2021 | 0.184458913190428 | 21.1|
|05/05/2021 | 5-15-2021 | 0.328110933303664 | 21.6|
|05/05/2021 | 5-15-2021 | 0.324826205328924 | 21.65|
|05/05/2021 | 5-15-2021 | 0.324737739322167 | 21.7|
|05/05/2021 | 5-15-2021 | 0.32464833875156 | 21.75|
|05/05/2021 | 5-15-2021 | 0.324568477528462 | 21.8|
|05/05/2021 | 5-15-2021 | 0.391365324569997 | 21.85|
|05/05/2021 | 5-15-2021 | 0.280786120728432 | 21.9|
|05/05/2021 | 5-15-2021 | 0.517069625897987 | 21.95|
|05/05/2021 | 5-15-2021 | 0.2805859248693 | 22|
|05/05/2021 | 5-15-2021 | 0.217947215371299 | 22.1|
|05/05/2021 | 5-15-2021 | 0.338056553996876 | 22.15|
|05/05/2021 | 6-19-2021 | 0.337961807201776 | 12.9|
|05/05/2021 | 6-19-2021 | 0.351786150963446 | 13|
|05/05/2021 | 6-19-2021 | 0.480374804740862 | 13.1|
|05/05/2021 | 6-19-2021 | 0.351595746920873 | 13.2|
|05/05/2021 | 6-19-2021 | 0.572031655988992 | 13.25|
|05/05/2021 | 6-19-2021 | 0.443191568986322 | 13.3|
|05/06/2021 | 5-15-2021 | 0.505750739173 | 17.45|
|05/06/2021 | 5-15-2021 | 0.459057152955033 | 17.5|
|05/06/2021 | 5-15-2021 | 0.737382755889556 | 17.55|
|05/06/2021 | 5-15-2021 | 0.339351092871443 | 17.6|
|05/06/2021 | 5-15-2021 | 0.41703590440116 | 17.65|
|05/06/2021 | 5-15-2021 | 0.547195056545363 | 17.7|
|05/06/2021 | 5-15-2021 | 0.655506047426648 | 17.75|
|05/06/2021 | 5-15-2021 | 0.608035149661355 | 17.8|
|05/06/2021 | 5-15-2021 | 0.573593680954865 | 17.85|
|05/06/2021 | 5-15-2021 | 0.336371479476565 | 17.9|
|05/06/2021 | 5-15-2021 | 0.767094268645232 | 17.95|
|05/06/2021 | 5-15-2021 | 0.35009250200978 | 18|
|05/06/2021 | 5-15-2021 | 0.349954683385633 | 18.05|
|05/06/2021 | 5-15-2021 | 0.440554070864674 | 18.1|

FJCC · July 17, 2024, 1:32pm

I understand that the x you want to store is a single number for each range of dates. You can filter the data set within a for loop as done in the following code and then store the result of each loop in a vector. I'm not sure I filtered the data the way you need, but you can adjust that. I set t to equal 2021-05-01 in all cases and I set T to equal the various values.

library(tidyverse)
DF <- read.csv("~/R/Play/Dummy.csv", sep = "|")
DF <- DF |> mutate(t = mdy(t), T = mdy(T)) #change date to be numeric dates for filtering
StartDate <- ymd("2021-05-01")
EndDates <- ymd(c("2021-05-15", "2021-06-19", "2021-07-18"))
Results <- vector("numeric", length = length(EndDates))
for(i in seq_along(EndDates)) {
  DF_filter <- DF |> filter(t == StartDate, T == EndDates[i])
 
  #Do your calculation of x using DF_filter
  
  Results[i] <- x
  
}

supermarco · July 17, 2024, 3:58pm

Hello
Thank you very much for your response and time.
I think I was not clear, sorry. Let me rephrase.
x is the parameter to estimate with the basic code I wrote. It is not a number to extract from the data.
So you are right that we need a for loop over the start/end dates, but for each period the basic code should be executed to estimate x (by minimizing the MSE). I hope that is clearer this time. Again, thank you.

FJCC · July 17, 2024, 5:20pm

I thought you knew how to calculate the best value x and needed help getting subsets of the initial data frame. If I have the filtering right, you can get the best values of x like this:

library(tidyverse)

DF <- read.csv("~/R/Play/Dummy.csv", sep = "|")
DF <- DF |> mutate(t = mdy(t), T = mdy(T))
StartDate <- ymd("2021-05-01")
EndDates <- ymd(c("2021-05-15", "2021-06-19", "2021-07-18"))
Results <- vector("numeric", length = length(EndDates))
MSE_calc <- function(x) {
  Val <- P^2 - P*x^3.5 + 5
  SE <- (Q - Val)^2
  MSE <- mean(SE)
  return(MSE)
}
for(i in seq_along(EndDates)) {
  DF_filter <- DF |> filter(t == StartDate, T == EndDates[i])
  P <- DF_filter$P
  Q <- DF_filter$Q
  X <- nlm(MSE_calc, 0.5)
  Results[i] <- X$estimate
}
Results
#> [1] 2.714719 3.065551 3.491502

Created on 2024-07-17 with reprex v2.0.2
Does that make sense?
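
An optional variation on the loop above: MSE_calc reads P and Q from the global environment, which works but is easy to break if those names are reassigned elsewhere. A minimal sketch, assuming the same model, that binds the data to the objective instead. The helper name make_mse is hypothetical, and the numbers are made-up illustration values rather than the thread's data.

```r
# Hypothetical helper: returns an objective function bound to one subset's P and Q
make_mse <- function(P, Q) {
  function(x) mean((P^2 - P * x^3.5 + 5 - Q)^2)
}

# Made-up values, generated so that x = 2 fits them exactly
P <- c(1.0, 1.5, 2.0)
Q <- P^2 - P * 2^3.5 + 5

mse <- make_mse(P, Q)
mse(2)                 # objective at the value used to generate Q
fit <- nlm(mse, 0.5)   # same optimizer and starting value as in the thread
fit$estimate
```

Inside the loop this would become something like nlm(make_mse(DF_filter$P, DF_filter$Q), 0.5), so each subset gets its own objective.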

supermarco · July 17, 2024, 7:05pm

Hello
Again, thank you very much, it is appreciated. Let me try the code and come back.
It looks fine. In case I have several dates, I would consider adding a condition so that it estimates a new X whenever the end date changes. I will come back soon.
Thanks again,

supermarco · July 18, 2024, 10:49am

Hello
Thanks a lot! I tested it and it seems to work perfectly with the sample. I am learning the correct syntax from you!

If possible, I would like to point out a problem that appears with a larger sample. Imagine the start dates (day 1, day 2, ... day 30) are far more numerous than the end dates (mid-month M, mid-month M+1, mid-month M+2). In that case the code does not compute the right number of estimates, even if I fix length(EndDates), and I get a message like "longer object length is not a multiple of shorter object length". This is why I suggested adding something to the code that triggers a new estimate when the end date changes. To illustrate, imagine I write this:
StartDate <- ymd(c("2021-05-01" ,"2021-05-04", "2021-05-05", "2021-05-06", "2021-05-07",
"2021-05-08", "2021-05-11", "2021-05-12", "2021-05-13", "2021-05-14", "2021-05-15",
"2021-05-18", "2021-05-19", "2021-05-20", "2021-05-21", "2021-05-22", "2021-05-26",
"2021-05-27", "2021-05-28", "2021-05-29"))
EndDates <- ymd(c("2021-05-15", "2021-06-19", "2021-07-17", "2021-08-21"))
Does your code work?
Please let me know if it is not clear.
Many thanks Again !

FJCC · July 18, 2024, 12:58pm

No, my code will not work if StartDate has a length greater than one. To handle that, the code would have to:

- have two for() loops, one for the StartDate and one for End Dates
- make Results big enough to store the results. I would use a matrix with one row for each StartDate and one column for each EndDate.
- have code to handle the cases where StartDate > EndDate.

Here is a sketch of the for loop structure.

Results <- matrix(0, nrow = length(StartDate), ncol = length(EndDates))
for(j in seq_along(StartDate)) {
  for(i in seq_along(EndDates)) {
    if(StartDate[j] < EndDates[i]) {
      #Do your calculation
      Results[j, i] <- X
    }
  }
}
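
A fuller version of that loop structure might look like the following. It is only a sketch: the data are made up (using the thread's column names t, T, P, Q), base R subsetting stands in for filter(), and filling unused cells with NA instead of 0 is a choice made here so that missing pairs cannot be mistaken for estimates.

```r
# Made-up data with the thread's column names (t = start date, T = end date)
DF <- data.frame(
  t = as.Date(c("2021-05-01", "2021-05-01", "2021-05-01", "2021-05-01",
                "2021-05-04", "2021-05-04")),
  T = as.Date(c("2021-05-15", "2021-05-15", "2021-06-19", "2021-06-19",
                "2021-05-15", "2021-05-15")),
  P = c(1.0, 1.5, 1.2, 1.8, 1.1, 1.6),
  Q = c(2.0, 2.5, 1.5, 1.9, 2.2, 2.4)
)

StartDate <- sort(unique(DF$t))
EndDates  <- sort(unique(DF$T))

# One row per start date, one column per end date; NA marks pairs with no estimate
Results <- matrix(NA_real_, nrow = length(StartDate), ncol = length(EndDates),
                  dimnames = list(format(StartDate), format(EndDates)))

for (j in seq_along(StartDate)) {
  for (i in seq_along(EndDates)) {
    rows <- DF$t == StartDate[j] & DF$T == EndDates[i]  # one start AND one end date
    if (StartDate[j] < EndDates[i] && any(rows)) {      # skip pairs without data
      P <- DF$P[rows]
      Q <- DF$Q[rows]
      fit <- nlm(function(x) mean((P^2 - P * x^3.5 + 5 - Q)^2), 0.5)
      Results[j, i] <- fit$estimate
    }
  }
}
Results
```

The any(rows) check matters because nlm() would otherwise be asked to minimize the mean of an empty vector.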

supermarco · July 18, 2024, 2:13pm

Thank you very much for your clear response. Indeed, StartDate has more than one value because I recompute the cost of a product each day. I notice that you added the condition StartDate < EndDate.
So I will take this piece of code and integrate the calculation into it. I will come back very soon once I see how to handle it. Many thanks again.

supermarco · July 19, 2024, 10:25am

Hello !
I ran the code and found a problem:
we should have 7 estimates of X (3 + 3 + 1) instead of only three.
Please see the example below for clarification:
StartDate -> EndDates:
2021-05-01 -> 2021-05-15 / 2021-06-19 / 2021-07-17
2021-05-04 -> 2021-05-15 / 2021-06-19 / 2021-07-17
2021-05-05 -> 2021-05-15
We should have:
Results
[,1] [,2] [,3]
[1,] X1 X2 X3
[2,] X4 X5 X6
[3,] X7 0 0

Thank you very much !!!
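
For a layout like the one described above, where only some start/end pairs exist, one option is to estimate x once per pair that is actually present in the data rather than looping over every combination. A minimal base R sketch under that assumption follows. The helper name estimate_x and the data values are made up for illustration, and the column names follow the thread.

```r
# Made-up rows; only three (t, T) pairs are present
DF <- data.frame(
  t = as.Date(c("2021-05-01", "2021-05-01", "2021-05-01", "2021-05-01",
                "2021-05-04", "2021-05-04")),
  T = as.Date(c("2021-05-15", "2021-05-15", "2021-06-19", "2021-06-19",
                "2021-05-15", "2021-05-15")),
  P = c(1.0, 1.5, 1.2, 1.8, 1.1, 1.6),
  Q = c(2.0, 2.5, 1.5, 1.9, 2.2, 2.4)
)

# Hypothetical helper: fits x on one subset with the thread's model
estimate_x <- function(d) {
  nlm(function(x) mean((d$P^2 - d$P * x^3.5 + 5 - d$Q)^2), 0.5)$estimate
}

# drop = TRUE keeps only the (t, T) combinations that occur in DF
groups <- split(DF, list(DF$t, DF$T), drop = TRUE)

# One row per existing pair: its dates, row count, and estimate
estimates <- do.call(rbind, lapply(groups, function(d) {
  data.frame(t = d$t[1], T = d$T[1], n = nrow(d), x = estimate_x(d))
}))
estimates
```

The result is a long table rather than a matrix, so the number of estimates always equals the number of pairs found in the data.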

the new sample for clarification

|t | T | P | Q|
|01/05/2021 | 05-15-2021 | 0.307226768393632 | 0.48332|
|01/05/2021 | 05-15-2021 | 0.301688731220703 | 0.48332|
|01/05/2021 | 05-15-2021 | 0.396139887693312 | 0.48332|
|01/05/2021 | 06-19-2021 | 0.323209211730937 | 0.343432|
|01/05/2021 | 06-19-2021 | 0.319602633192871 | 0.343432|
|01/05/2021 | 06-19-2021 | 0.312321320233133 | 0.343432|
|01/05/2021 | 06-19-2021 | 0.30329073088301 | 0.343432|
|01/05/2021 | 06-19-2021 | 0.322303373168933 | 0.343432|
|01/05/2021 | 06-19-2021 | 0.103883903792236 | 0.343432|
|01/05/2021 | 06-19-2021 | 0.100133319383373 | 0.343432|
|01/05/2021 | 06-19-2021 | 0.110331003036399 | 0.343432|
|01/05/2021 | 07-17-2021 | 0.303272396369116 | 0.328632|
|01/05/2021 | 07-17-2021 | 0.373632906761169 | 0.328632|
|01/05/2021 | 07-17-2021 | 0.313616062927236 | 0.328632|
|01/05/2021 | 07-17-2021 | 0.331809311366162 | 0.328632|
|01/05/2021 | 07-17-2021 | 0.303132639693213 | 0.328632|
|01/05/2021 | 07-17-2021 | 0.397136088790893 | 0.328632|
|01/05/2021 | 07-17-2021 | 0.383877823336263 | 0.328632|
|04/05/2021 | 05-15-2021 | 0.236323307903033 | 0.48332|
|04/05/2021 | 05-15-2021 | 0.233372881698608 | 0.48332|
|04/05/2021 | 05-15-2021 | 0.231780329830088 | 0.48332|
|04/05/2021 | 05-15-2021 | 0.23386123036873 | 0.48332|
|04/05/2021 | 05-15-2021 | 0.233308869933082 | 0.48332|
|04/05/2021 | 05-15-2021 | 0.229373693777893 | 0.48332|
|04/05/2021 | 05-15-2021 | 0.0928117196083069 | 0.48332|
|04/05/2021 | 05-15-2021 | 0.0916033303396606 | 0.48332|
|04/05/2021 | 05-15-2021 | 0.0936313619063331 | 0.48332|
|04/05/2021 | 06-19-2021 | 0.396189322171021 | 0.30832|
|04/05/2021 | 06-19-2021 | 0.392973268331063 | 0.30832|
|04/05/2021 | 06-19-2021 | 0.389768168238667 | 0.30832|
|04/05/2021 | 06-19-2021 | 0.386373310360718 | 0.30832|
|04/05/2021 | 06-19-2021 | 0.387233363333833 | 0.30832|
|04/05/2021 | 06-19-2021 | 0.383031632908323 | 0.30832|
|04/05/2021 | 07-17-2021 | 0.363130308318273 | 0.32372|
|04/05/2021 | 07-17-2021 | 0.362333112682333 | 0.32372|
|04/05/2021 | 07-17-2021 | 0.339339339130839 | 0.32372|
|04/05/2021 | 07-17-2021 | 0.336762032999268 | 0.32372|
|04/05/2021 | 07-17-2021 | 0.333986731161336 | 0.32372|
|04/05/2021 | 07-17-2021 | 0.361321991729736 | 0.32372|
|04/05/2021 | 07-17-2021 | 0.333963620030893 | 0.32372|
|04/05/2021 | 07-17-2021 | 0.33217086638931 | 0.32372|
|04/05/2021 | 07-17-2021 | 0.339383693376263 | 0.32372|
|04/05/2021 | 07-17-2021 | 0.0960337319166363 | 0.32372|
|04/05/2021 | 07-17-2021 | 0.0993326877973363 | 0.32372|
|05/05/2021 | 05-15-2021 | 0.383237680311373 | 0.4802|
|05/05/2021 | 05-15-2021 | 0.33337396282939 | 0.4802|
|05/05/2021 | 05-15-2021 | 0.318139362363087 | 0.4802|
|05/05/2021 | 05-15-2021 | 0.312297829337236 | 0.4802|
|05/05/2021 | 05-15-2021 | 0.306362313368632 | 0.4802|
|05/05/2021 | 05-15-2021 | 0.300638233383397 | 0.4802|
|05/05/2021 | 05-15-2021 | 0.303293118713379 | 0.4802|

FJCC · July 19, 2024, 2:05pm

Please show the code that gives you the bad result. Also please post your data as the output of the dput() function. If your data frame is named DF, run

dput(DF)

and post the output. The dput function will output code that can be used to make a copy of your data.
For both the code and the output, put lines with three back ticks just before and after the posted content, like this:
```
output of dput() or your code
```
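
To illustrate why dput() output is convenient for sharing data, here is a small round trip on a made-up data frame (the values are placeholders, not the thread's data): the printed text is itself R code, and evaluating it rebuilds the object.

```r
# Made-up two-row data frame standing in for DF
DF <- data.frame(
  t = as.Date(c("2021-05-01", "2021-05-04")),
  P = c(0.25, 0.5),
  Q = c(16.2, 21.2)
)

dput(DF)  # prints R code that rebuilds DF, including the Date class

# Round trip: capture that code as text, then evaluate it to rebuild the data
code <- capture.output(dput(DF))
DF_copy <- eval(parse(text = code))
all.equal(DF, DF_copy)
```

Anyone reading the post can paste the dput() output after `DF <-` and work with the same columns and classes.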

supermarco · July 19, 2024, 6:20pm

Sure. I'll try to do the dput() function as you ask and come back rapidly. Thank you !

supermarco · July 20, 2024, 10:45am

Hello,

Please find below the dput() function output with the code.
Again, thank you very much!

DATA:

structure(list(t = structure(c(18632, 18632, 18632, 18632, 18632, 
18632, 18632, 18632, 18632, 18632, 18632, 18632, 18632, 18632, 
18632, 18632, 18632, 18632, 18722, 18722, 18722, 18722, 18722, 
18722, 18722, 18722, 18722, 18722, 18722, 18722, 18722, 18722, 
18722, 18722, 18722, 18722, 18722, 18722, 18722, 18722, 18722, 
18722, 18722, 18722, 18752, 18752, 18752, 18752, 18752, 18752, 
18752), class = "Date"), T = structure(c(18762, 18762, 18762, 
18797, 18797, 18797, 18797, 18797, 18797, 18797, 18797, 18825, 
18825, 18825, 18825, 18825, 18825, 18825, 18762, 18762, 18762, 
18762, 18762, 18762, 18762, 18762, 18762, 18797, 18797, 18797, 
18797, 18797, 18797, 18825, 18825, 18825, 18825, 18825, 18825, 
18825, 18825, 18825, 18825, 18825, 18762, 18762, 18762, 18762, 
18762, 18762, 18762), class = "Date"), P = c(0.307226768393632, 
0.301688731220703, 0.396139887693312, 0.323209211730937, 0.319602633192871, 
0.312321320233133, 0.30329073088301, 0.322303373168933, 0.103883903792236, 
0.100133319383373, 0.110331003036399, 0.303272396369116, 0.373632906761169, 
0.313616062927236, 0.331809311366162, 0.303132639693213, 0.397136088790893, 
0.383877823336263, 0.236323307903033, 0.233372881698608, 0.231780329830088, 
0.23386123036873, 0.233308869933082, 0.229373693777893, 0.0928117196083069, 
0.0916033303396606, 0.0936313619063331, 0.396189322171021, 0.392973268331063, 
0.389768168238667, 0.386373310360718, 0.387233363333833, 0.383031632908323, 
0.363130308318273, 0.362333112682333, 0.339339339130839, 0.336762032999268, 
0.333986731161336, 0.361321991729736, 0.333963620030893, 0.33217086638931, 
0.339383693376263, 0.0960337319166363, 0.0993326877973363, 0.383237680311373, 
0.33337396282939, 0.318139362363087, 0.312297829337236, 0.306362313368632, 
0.300638233383397, 0.303293118713379), Q = c(0.48332, 0.48332, 
0.48332, 0.343432, 0.343432, 0.343432, 0.343432, 0.343432, 0.343432, 
0.343432, 0.343432, 0.328632, 0.328632, 0.328632, 0.328632, 0.328632, 
0.328632, 0.328632, 0.48332, 0.48332, 0.48332, 0.48332, 0.48332, 
0.48332, 0.48332, 0.48332, 0.48332, 0.30832, 0.30832, 0.30832, 
0.30832, 0.30832, 0.30832, 0.32372, 0.32372, 0.32372, 0.32372, 
0.32372, 0.32372, 0.32372, 0.32372, 0.32372, 0.32372, 0.32372, 
0.4802, 0.4802, 0.4802, 0.4802, 0.4802, 0.4802, 0.4802)), class = "data.frame", row.names = c(NA, -51L))

CODE:

library(tidyverse)

DF <- read.csv("C:/sample.csv")
DF <- DF |> mutate(t = mdy(t), T = mdy(T))
StartDate <- ymd(c("2021-05-01", "2021-05-04", "2021-05-05"))
EndDates <- ymd(c("2021-05-15", "2021-06-19", "2021-07-17"))
Results <- matrix(0, nrow = length(StartDate), ncol = length(EndDates))
MSE_calc <- function(x) {
  Val <- P^2 - P*x^3.5 + 5
  SE <- (Q - Val)^2
  MSE <- mean(SE)
  return(MSE)
}
for(j in seq_along(StartDate)) {
  for(i in seq_along(EndDates)) {
    if(StartDate[j] < EndDates[i]) {
      #Do your calculation
      DF_filter <- DF |> filter(t == StartDate, T == EndDates[i])
      P <- DF_filter$P
      Q <- DF_filter$Q
      X <- nlm(MSE_calc, 0.5)
      Results[j, i] <- X$estimate
    }
  }
}

Results

FJCC · July 20, 2024, 1:14pm

I see one problem with your code and another problem with your data. In the filter() function in your code

filter(t == StartDate, T == EndDates[i])

you compare t to the entire StartDate vector. You need to pick a particular StartDate using the j index.
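
To see why comparing against the whole vector goes wrong: == does not test membership, it recycles the shorter vector element by element along the longer one. A small illustration with made-up dates (t_col stands in for the t column):

```r
t_col     <- as.Date(c("2021-05-01", "2021-05-01", "2021-05-04",
                       "2021-05-04", "2021-05-05"))
StartDate <- as.Date(c("2021-05-01", "2021-05-04", "2021-05-05"))

# Whole vector: row 1 is compared with element 1, row 2 with element 2,
# row 3 with element 3, row 4 with element 1 again, and so on.
# R also warns here because 5 is not a multiple of 3.
recycled <- suppressWarnings(t_col == StartDate)

# One start date at a time, as needed inside the j loop
j <- 2
picked <- t_col == StartDate[j]

recycled
picked
```

In the filter() call this corresponds to writing t == StartDate[j].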

Your data do not match up well with your chosen start and end dates. Maybe you have not shown all your data, so this may be a problem only in this example. There are no start dates of 2021-05-01 or 2021-05-04, so when you filter for those dates you will get an empty data frame. There are start dates of 2021-05-05, but they are paired only with end dates of 2021-05-15. So you will get only one real result from your loops with this data set. If you need to filter using an == condition, you need to have data that match the chosen dates.
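
One possible cause, judging from the posted sample: the new table writes the start dates as 01/05/2021, 04/05/2021 and 05/05/2021, and a month-first parser such as mdy() reads those as 5 January, 5 April and 5 May. That agrees with the t values in the dput() output (18632, 18722, 18752). A quick check in base R, where the two format strings are the only assumption about how the file was written:

```r
raw <- c("01/05/2021", "04/05/2021", "05/05/2021")

month_first <- as.Date(raw, format = "%m/%d/%Y")  # same reading as lubridate::mdy()
day_first   <- as.Date(raw, format = "%d/%m/%Y")  # same reading as lubridate::dmy()

month_first
as.numeric(month_first)  # days since 1970-01-01, comparable with the dput() output
day_first
```

If the file really is day-first, parsing t with dmy() (while keeping mdy() for T, which is written month-first) would make the start dates line up with the chosen StartDate values.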

supermarco · July 21, 2024, 6:17pm

Hello,

Thank you very much for the useful comments. I recognize that there is a small problem with the date format in this sample (taken from the real dataset), and the other comment may be useful too. I will look into this quickly and come back if necessary, but first I have to clean up this code. Again, thanks a lot for your kindness and great help. I'm here to learn from people like you.
