Joining Data Sets

I have two data sets that I merged using the dplyr::left_join function. The commonly named columns joined properly. However, my last column, a numeric column that has no counterpart in the other data set, came through with all of its values turned into NAs. How can I keep my numeric values while joining the two data sets?

Can you please share a small part of the data sets in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

  1. If you have stored the data set in an R object, the dput() function is very handy.

  2. In case the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.
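As a rough illustration of the first option, here is a minimal sketch using a made-up three-row data frame (nba_sample is a hypothetical stand-in, not your real object). dput() prints R code that rebuilds the object, and that printed code is what you would paste into your post.

```r
# Hypothetical small data frame standing in for the real data
nba_sample <- data.frame(
    stringsAsFactors = FALSE,
    Players = c("Stephen Curry", "Chris Paul", "Russell Westbrook"),
    Salary = c(40231758, 38506482, 38178000)
)

# Keep only a small part of the data, as you would before sharing it
small_part <- head(nba_sample, 2)

# dput() prints R code that recreates the object; copy this output into the post
dput(small_part)

# Sanity check: evaluating the printed code gives back the same data
rebuilt <- eval(parse(text = capture.output(dput(small_part))))
same_data <- isTRUE(all.equal(rebuilt, small_part))
```

Anyone reading the post can then assign the pasted structure(...) call to a name and run your code against it.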

Unfortunately, I am not following how to do the last steps in the video. I'm assuming that what I have pasted below will not work:

Blocks_Per_Game Turnovers_Per_Game Offensive_Rating Defensive_Rating Salary
1 1.06 1.51 122.0 101.9 NA
2 1.29 2.82 116.2 102.2 NA
3 1.64 1.40 114.7 109.1 NA
4 1.31 1.10 131.3 101.0 NA
5 1.05 3.65 116.1 90.2 NA
6 0.47 1.72 103.2 109.0 NA

I can see why it wouldn't. Those NAs are supposed to line up under Salary, for example.

I think it would be better if you could prepare a reproducible example (reprex) illustrating your issue. Please have a look at this guide, to see how to create one:

head(nba_salaries2, 10)[,c("Players", "Salary", "Rank")]
#> Error in head(nba_salaries2, 10): object 'nba_salaries2' not found
datapasta::df_paste(head(nba_salaries2, 10)[,c("Players", "Salary", "Rank")])
#> Error in head(nba_salaries2, 10): object 'nba_salaries2' not found
nba_reg <- data.frame(
    stringsAsFactors = FALSE,
    Players = c("Stephen Curry","Chris Paul","Russell Westbrook",
                "John Wall","James Harden","LeBron James",
                "Kevin Durant","Blake Griffin","Kyle Lowry",
                "Paul George"),
    Salary = c(40231758,
               38506482,38178000,37800000,37800000,
               37436858,37199000,34234964,33296296,33005556),
    Rank = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
nba_stats4 <- dplyr::left_join(nba_stats3, nba_salaries2)
#> Error in dplyr::left_join(nba_stats3, nba_salaries2): object 'nba_stats3' not found

Created on 2020-10-31 by the reprex package (v0.3.0)

Did it work? I'm feeling pretty good about learning something new here.

To make your example reproducible, you have to provide sample data for both nba_stats3 and nba_salaries2.

So, should I do a reprex for nba_stats3 now? Was the last bit of data sufficient for what is needed for nba_salaries2? Or was that something entirely different from a sample of nba_salaries2?

Please read the guide I gave you more carefully. You need to provide sample data (in a copy/paste friendly format) that allows us to run your code on our own, see what is going on, and give you a solution.

I'm unable to fix the error messages that appear when I try to render the reprex. I'm not sure what else to try, as I may be making a bigger mess than the one I'm attempting to clean up.

This is a reproducible example of using left_join() (which means you can simply copy the code as it is and run it on your computer). Try to make one that shows your issue.

library(dplyr)

# Sample data in a copy/paste friendly format
nba_salary <- data.frame(
    stringsAsFactors = FALSE,
    Players = c("Stephen Curry","Chris Paul","Russell Westbrook",
                "John Wall","James Harden","LeBron James",
                "Kevin Durant","Blake Griffin","Kyle Lowry",
                "Paul George"),
    Salary = c(40231758,
               38506482,38178000,37800000,37800000,
               37436858,37199000,34234964,33296296,33005556),
    Rank = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)

nba_stats <- data.frame(
    stringsAsFactors = FALSE,
    Players = c("Stephen Curry","Chris Paul","Russell Westbrook",
                "John Wall","James Harden","LeBron James",
                "Kevin Durant","Blake Griffin","Kyle Lowry",
                "Paul George"),
    some_stat = rnorm(10)
)

# Relevant code
nba_stats %>% 
    left_join(nba_salary, by = "Players")
#>              Players   some_stat   Salary Rank
#> 1      Stephen Curry  0.01482341 40231758    1
#> 2         Chris Paul -0.18417913 38506482    2
#> 3  Russell Westbrook -0.42616613 38178000    3
#> 4          John Wall -0.63422415 37800000    4
#> 5       James Harden  1.34669284 37800000    5
#> 6       LeBron James -0.65364381 37436858    6
#> 7       Kevin Durant -0.69911135 37199000    7
#> 8      Blake Griffin  0.75686720 34234964    8
#> 9         Kyle Lowry -1.34995016 33296296    9
#> 10       Paul George  0.59446418 33005556   10

Created on 2020-11-01 by the reprex package (v0.3.0.9001)
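One possible cause of the all-NA column, which I can't confirm without your data, is that left_join() with no `by` argument joins on every column name the two tables share. If some other shared column holds different values in each table, no row matches and the new columns are all NA. Here is a minimal sketch with made-up data, where the shared Rank column is my assumption and not something taken from your data sets:

```r
library(dplyr)

# Made-up data: both tables contain "Players" and "Rank",
# but Rank means something different in each table
stats <- data.frame(
    stringsAsFactors = FALSE,
    Players = c("Stephen Curry", "Chris Paul"),
    Rank = c(12, 30),
    Blocks_Per_Game = c(1.06, 1.29)
)
salaries <- data.frame(
    stringsAsFactors = FALSE,
    Players = c("Stephen Curry", "Chris Paul"),
    Rank = c(1, 2),
    Salary = c(40231758, 38506482)
)

# No `by`: the join key is Players AND Rank together, so nothing matches
# and every Salary value is NA
joined_all <- left_join(stats, salaries)

# Explicit key: rows match on Players alone and Salary is filled in
# (the two Rank columns are kept as Rank.x and Rank.y)
joined_key <- left_join(stats, salaries, by = "Players")
```

When `by` is left out, dplyr prints a "Joining, by = ..." message listing the columns it used, which is a quick way to check whether this is what happened.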

OK, I was able to get the majority of the salaries joined to my stats data frame using the code you displayed. There were a few where the salary still came through as NA, but that's not a problem for my project. I'm still not entirely sure how to recreate my example in a reusable format to put on here, but I will go back through the steps and try again.
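For anyone who does need to track down the remaining NAs: left_join() leaves NA wherever a key in the left table has no exact match in the right one, so small spelling, case, or whitespace differences in player names are a common culprit. The following is only a sketch with made-up data; the name mismatches and the clean_key() helper are hypothetical and may not reflect what is in the real tables.

```r
library(dplyr)

# Made-up data with two deliberate mismatches:
# a trailing space in "Chris Paul " and different capitalisation of "Lebron James"
stats <- data.frame(
    stringsAsFactors = FALSE,
    Players = c("Stephen Curry", "Chris Paul ", "LeBron James"),
    some_stat = c(0.5, 1.2, -0.3)
)
salaries <- data.frame(
    stringsAsFactors = FALSE,
    Players = c("Stephen Curry", "Chris Paul", "Lebron James"),
    Salary = c(40231758, 38506482, 37436858)
)

# anti_join() returns the rows of stats that have no match in salaries
unmatched <- anti_join(stats, salaries, by = "Players")

# Hypothetical helper: normalise names before joining
clean_key <- function(x) tolower(trimws(x))

salary_lookup <- salaries %>%
    mutate(key = clean_key(Players)) %>%
    select(key, Salary)

fixed <- stats %>%
    mutate(key = clean_key(Players)) %>%
    left_join(salary_lookup, by = "key")
```

Inspecting unmatched first shows exactly which names fail to match, so you can decide whether a simple clean-up like this is enough or whether some names need correcting by hand.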
