Find consumption values based on previous data

I have the production waste (m3/day) and the consumption (kWh/day) for some locations. As the output below shows, I have complete information for locations 1 to 10, but no consumption value for locations 11 to 15. I would like to know whether it is possible to estimate these missing values from the consumption values I already have. They do not need to be exact; a rough estimate of what each value could be is enough.

df <- structure(list(
  Locations = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15),
  Production = c(239.936, 422.18352, 5.863376, 23.9936, 406.09168,
                 143.9616, 42.348704, 61.67968, 12.956544, 182.058268,
                 6168.5, 714.593, 268.545, 175.2, 227.5775),
  Consumption = c(467.36, 795.2, 176.2, 467.36, 738.5,
                  2226.36, 107.13, 198.63, 210.3, 1198.96,
                  "", "", "", "", "")),
  row.names = c(NA, 15L), class = "data.frame")

   Locations  Production Consumption
1          1  239.936000      467.36
2          2  422.183520       795.2
3          3    5.863376       176.2
4          4   23.993600      467.36
5          5  406.091680       738.5
6          6  143.961600     2226.36
7          7   42.348704      107.13
8          8   61.679680      198.63
9          9   12.956544       210.3
10        10  182.058268     1198.96
11        11 6168.500000            
12        12  714.593000            
13        13  268.545000            
14        14  175.200000            
15        15  227.577500

You can run a regression of consumption on production using lm() and then use predict.lm() to get predicted values.

To get started, try something like:

# Consumption must be numeric (NA for the unknown values) for lm() to work
myModel <- lm(Consumption ~ Production, data = df)
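
To make the lm() plus predict() suggestion concrete, here is a minimal sketch using the data from the question. It assumes the unknown consumption values are stored as NA instead of empty strings, and the column name Estimated is made up for illustration only.

```r
# Data from the question; unknown consumption is stored as NA rather than "".
df <- data.frame(
  Locations   = 1:15,
  Production  = c(239.936, 422.18352, 5.863376, 23.9936, 406.09168,
                  143.9616, 42.348704, 61.67968, 12.956544, 182.058268,
                  6168.5, 714.593, 268.545, 175.2, 227.5775),
  Consumption = c(467.36, 795.2, 176.2, 467.36, 738.5,
                  2226.36, 107.13, 198.63, 210.3, 1198.96,
                  rep(NA, 5))
)

# lm() drops rows with NA in the response by default,
# so only locations 1 to 10 are used for fitting.
myModel <- lm(Consumption ~ Production, data = df)

# Predict consumption for the rows where it is unknown.
missing_rows <- is.na(df$Consumption)
estimates <- predict(myModel, newdata = df[missing_rows, ])

# "Estimated" is a hypothetical column name: observed values where available,
# model predictions elsewhere.
df$Estimated <- df$Consumption
df$Estimated[missing_rows] <- estimates
print(df)
```

Keep in mind that location 11 has a production of 6168.5, far above the largest production used for fitting (about 422), so its estimate is an extrapolation and should be treated with extra caution.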

@startz, thanks for your answer. Following what you recommended, I did the following:

df <- structure(list(
  Locations = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15),
  Production = c(239.936, 422.18352, 5.863376, 23.9936, 406.09168,
                 143.9616, 42.348704, 61.67968, 12.956544, 182.058268,
                 6168.5, 714.593, 268.545, 175.2, 227.5775),
  Consumption = c(467.36, 795.2, 176.2, 467.36, 738.5,
                  2226.36, 107.13, 198.63, 210.3, 1198.96,
                  "", "", "", "", "")),
  row.names = c(NA, 15L), class = "data.frame")

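library(dplyr)  # provides %>% and mutate()

# as.numeric("") yields NA, so locations 11 to 15 become NA and lm() drops them
df <- df %>% mutate(Production = as.numeric(Production), Consumption = as.numeric(Consumption))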

myModel <- lm(Consumption~Production,data=df)

> myModel

Call:
lm(formula = Consumption ~ Production, data = df)

Coefficients:
(Intercept)   Production  
    449.406        1.357

I asked a professor to check the best fit for this case, and he gave me the following answer: Using the least squares method, the best fit for this case is:

g(p)=0.7533+46.1266√(p)

Do you know how I can find this best fit myself? I believe I have to use lm, right? Any tips?

Using lm gives the "best fit" (for a particular definition of "best", namely least squares) for the model as you specify it. But you're free to try different specifications. You've used a linear specification, which is where one usually starts. It looks like the professor found that using a square root of production on the right-hand side works better. One can also put powers of production on the right, or use logs.
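
As a rough sketch of trying different specifications, the code below fits linear, square-root, log, and quadratic models to the ten known locations and compares them with AIC. The choice of AIC as the yardstick and all object names are assumptions made for illustration. The coefficients lm() returns for the square-root model need not match the professor's numbers, since it is not known exactly how that fit was obtained.

```r
# The ten locations with known consumption, taken from the question.
known <- data.frame(
  Production  = c(239.936, 422.18352, 5.863376, 23.9936, 406.09168,
                  143.9616, 42.348704, 61.67968, 12.956544, 182.058268),
  Consumption = c(467.36, 795.2, 176.2, 467.36, 738.5,
                  2226.36, 107.13, 198.63, 210.3, 1198.96)
)

# Same response, different right-hand sides.
fit_linear <- lm(Consumption ~ Production, data = known)
fit_sqrt   <- lm(Consumption ~ sqrt(Production), data = known)
fit_log    <- lm(Consumption ~ log(Production), data = known)
fit_quad   <- lm(Consumption ~ Production + I(Production^2), data = known)

# The response is identical in every model, so the AIC values are comparable
# (lower is better). With only ten points, differences may not mean much.
comparison <- AIC(fit_linear, fit_sqrt, fit_log, fit_quad)
print(comparison)

# Estimates for locations 11 to 15 under the square-root specification.
new_locations <- data.frame(
  Production = c(6168.5, 714.593, 268.545, 175.2, 227.5775)
)
sqrt_estimates <- predict(fit_sqrt, newdata = new_locations)
print(sqrt_estimates)
```

Calling summary() on any of the fitted models shows its coefficients and R-squared if you prefer to compare them that way.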

Since the amount of data is pretty small, you might want to begin by plotting Consumption against Production to see if any particular curve pops out at you.
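
A minimal base-R sketch of such a plot, with the linear and square-root fits overlaid for comparison. The axis labels reuse the units from the question; the object names and the choice of which fits to overlay are assumptions for illustration.

```r
# The ten locations with known consumption, taken from the question.
known <- data.frame(
  Production  = c(239.936, 422.18352, 5.863376, 23.9936, 406.09168,
                  143.9616, 42.348704, 61.67968, 12.956544, 182.058268),
  Consumption = c(467.36, 795.2, 176.2, 467.36, 738.5,
                  2226.36, 107.13, 198.63, 210.3, 1198.96)
)

# Scatter plot of the raw data.
plot(Consumption ~ Production, data = known,
     xlab = "Production (m3/day)", ylab = "Consumption (kWh/day)")

# Two candidate fits to overlay.
fit_linear <- lm(Consumption ~ Production, data = known)
fit_sqrt   <- lm(Consumption ~ sqrt(Production), data = known)

# Evaluate each fit on an evenly spaced grid over the observed range.
grid <- data.frame(
  Production = seq(min(known$Production), max(known$Production),
                   length.out = 100)
)
lines(grid$Production, predict(fit_linear, newdata = grid), lty = 1)
lines(grid$Production, predict(fit_sqrt, newdata = grid), lty = 2)
legend("topleft", legend = c("linear", "square root"), lty = c(1, 2))
```

Points that sit far from both curves are worth a second look before trusting any of the fits.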
