left_join() does not provide the expected results

There are multiple issues.

  1. df2 do not have a column named Date_Time. There is no way that code generated the output that you shared at the end of your question. It should fail with this error.

Error: Join columns must be present in data.
x Problem with Date_Time.

  1. The join of time column should lead to the result you got. The left table, df has times on 8th January, while right table has times on 8th February, and these are obviously different. So the join lead to no matching row in right table, and hence all values are coming as missing.
  2. This is not a coding issue, but I just wanted to point out that your code uses left_join, while thread title says inner_join. If you actually wanted to do inner_join on this data, it should lead you to zero rows.

Hope this helps.


NOTE

As you were told in last thread, please share minimal reproducible examples when asking a question, whenever possible. That means a stand alone code (meaning it will include library imports, data definitions and actual codes) which can generate same error that you are facing on any machine, and it should be such that it is small enough to isolate the issue instead of a huge code. Also, it needs to be shared in a copy-paste friendly manner, so anyone can copy paste from your thread and then try it on their own systems. What you shared is not reproducible. There is no library imports, and there is nothing called ata.frame in R. Please go through the example in reprex FAQ, it's quite easy to follow if you just go through it.

I understand which library you are using and it's a copying issue, and I can fix it, but we would expect you to do these. All of us in community are trying to help you in your issues, surely you can help them by solving these yourself?

(Just sharing as a feedback so that you can improve your next threads, hope you will take it frankly as such. If it still offends you, I apologise.)


Edit (in response to #3)

Can you please provide a proper reproducible example showing how are you getting these results? It's unexpected, and I get different if I run same code as you shared.

The result I get is just 2 NA corresponding to last two rows. This is correct, as last two entries in df$time are on 11th January. Hence these don't match times of 8th January as present in df2.

Hope this helps.


I'm so confused on your edit. If you run the following code in a fresh R session, are you not getting the expected results?

library(dplyr)

df_1 <- data.frame(time = c("2015-01-08 10:30:00", "2015-01-08 12:30:00", "2015-01-08 13:00:00", "2015-01-11 10:00:00", "2015-01-11 11:30:00"),
                   NDVI = c(0.704001825073964, 0.664227174588314, 0.646672010715424, 0.68403345905421, 0.704133275285083))

df_2 <- data.frame(time = c("2015-01-08 07:00:00", "2015-01-08 07:30:00", "2015-01-08 08:00:00", "2015-01-08 08:30:00", "2015-01-08 09:00:00", "2015-01-08 09:30:00", "2015-01-08 10:00:00", "2015-01-08 10:30:00", "2015-01-08 11:00:00", "2015-01-08 11:30:00", "2015-01-08 12:00:00", "2015-01-08 12:30:00", "2015-01-08 13:00:00", "2015-01-08 13:30:00", "2015-01-08 14:00:00", "2015-01-08 14:30:00", "2015-01-08 15:00:00", "2015-01-08 15:30:00", "2015-01-08 16:00:00", "2015-01-08 16:30:00"),
                   GPP = c(0, 1.5599170625, 2.5339026749, 6.956438303, 6.8692502975, 7.6814880371, 7.9598636627, 8.0170240402, 9.0688500404, 8.6675083637, 7.0284771919, 11.4835240841, 8.2856509686, 0, 5.3423130512, 5.9049360752, 6.8801484108, 7.2114927769, 29.7322113514, 3.9361767769))

obtained_result <- df_1 %>%
    inner_join(df_2, by="time")

expected_result <- data.frame(time = c("2015-01-08 10:30:00", "2015-01-08 12:30:00", "2015-01-08 13:00:00"),
                              NDVI = c(0.704001825073964, 0.664227174588314, 0.646672010715424),
                              GPP = c(8.017024, 11.483524, 8.285651))

all.equal(target=expected_result,
          current=obtained_result)

sessionInfo()
Outputs
> library(dplyr)

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

> 
> df_1 <- data.frame(time = c("2015-01-08 10:30:00", "2015-01-08 12:30:00", "2015-01-08 13:00:00", "2015-01-11 10:00:00", "2015-01-11 11:30:00"),
+                    NDVI = c(0.704001825073964, 0.664227174588314, 0.646672010715424, 0.68403345905421, 0.704133275285083))
> 
> df_2 <- data.frame(time = c("2015-01-08 07:00:00", "2015-01-08 07:30:00", "2015-01-08 08:00:00", "2015-01-08 08:30:00", "2015-01-08 09:00:00", "2015-01-08 09:30:00", "2015-01-08 10:00:00", "2015-01-08 10:30:00", "2015-01-08 11:00:00", "2015-01-08 11:30:00", "2015-01-08 12:00:00", "2015-01-08 12:30:00", "2015-01-08 13:00:00", "2015-01-08 13:30:00", "2015-01-08 14:00:00", "2015-01-08 14:30:00", "2015-01-08 15:00:00", "2015-01-08 15:30:00", "2015-01-08 16:00:00", "2015-01-08 16:30:00"),
+                    GPP = c(0, 1.5599170625, 2.5339026749, 6.956438303, 6.8692502975, 7.6814880371, 7.9598636627, 8.0170240402, 9.0688500404, 8.6675083637, 7.0284771919, 11.4835240841, 8.2856509686, 0, 5.3423130512, 5.9049360752, 6.8801484108, 7.2114927769, 29.7322113514, 3.9361767769))
> 
> obtained_result <- df_1 %>%
+     inner_join(df_2, by="time")
> 
> expected_result <- data.frame(time = c("2015-01-08 10:30:00", "2015-01-08 12:30:00", "2015-01-08 13:00:00"),
+                               NDVI = c(0.704001825073964, 0.664227174588314, 0.646672010715424),
+                               GPP = c(8.017024, 11.483524, 8.285651))
> 
> all.equal(target=expected_result,
+           current=obtained_result)
[1] TRUE
Session Info
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.5.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.1.1
> 
> sessionInfo(package="dplyr")
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.5.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
character(0)

other attached packages:
[1] dplyr_1.0.7

loaded via a namespace (and not attached):
[1] compiler_4.1.1  graphics_4.1.1  tools_4.1.1     utils_4.1.1    
[5] grDevices_4.1.1 stats_4.1.1     datasets_4.1.1  methods_4.1.1  
[9] base_4.1.1 

Here I've used inner_join as your expected results has just the common rows. But if you use left_join, it should NOT be all NA. If you continue to get so, please provide a reprex along with session information (run sessionInfo() on R console) for others to help, as it's beyond my understanding. Sorry.

1 Like