Trying to find the equivalent of pwcorr in RStudio

Hello Everyone,

I am new to R and RStudio, and I have more familiarity with Stata. Right now, I am trying to run a Pearson's correlation test along with its significance in RStudio, but I cannot seem to get it right. In Stata, I would type "pwcorr b_killing hisppct, sig", for example. Would you all have any recommendations of code or packages that could help me accomplish this?

I've never used Stata, but you should look at cor.test() in base R or corr.test() in the psych package.
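
For reference, here is a minimal sketch of the base R route. The data frame `d` and its values are made up for illustration; only the column names come from the question. cor.test() returns the Pearson coefficient together with its p-value, which is roughly the information `pwcorr ..., sig` reports for a single pair of variables:

```r
# Hypothetical data standing in for the real dataset
d <- data.frame(
  b_killing = c(2, 4, 5, 7, 9, 12),
  hisppct   = c(10, 12, 15, 14, 20, 25)
)

# Pearson correlation test between the two columns
res <- cor.test(d$b_killing, d$hisppct, method = "pearson")

res$estimate  # Pearson's r
res$p.value   # two-sided p-value for the null hypothesis r = 0
```

Printing `res` shows the full test summary, including the confidence interval.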

So, I am using the psych package, and RStudio is saying "'x' must be numeric". I formatted x as a number, so why am I having this issue? x is a binary variable in my case (0 or 1).

I appreciate all of the help.

Hard for me to tell what is happening from that description. Here are some tips for writing questions that are more likely to get answers.

If you ever want to know "How do I do X in R?", you can use the ?? command. Using your initial question as an example:

??"correlation test"

This searches the documentation of all installed packages and then shows a list of related pages. When I make that query, I get a lot of pages from the nlme package and the psych::phi and stats::cor.test pages. Of course, what's returned depends on what you have installed.
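
As a small sketch of the same search done programmatically: `??` is shorthand for help.search(), and the result can be stored and inspected. Restricting the search to the stats package is my own choice here, made only to keep the output short:

```r
# ??"correlation" is shorthand for help.search("correlation").
# The package argument limits the search to one installed package.
hits <- help.search("correlation", package = "stats")

# The matching help pages are stored in the matches component
head(hits$matches)
```

Dropping the `package` argument searches every installed package, as `??` does.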

This last note has nothing to do with your question but can really help new R users: there are a lot of packages people have shared on CRAN. No matter the subject of your analysis, chances are CRAN has a package to make it easier. The best place to start looking for them is through the Task Views. Total shot in the dark, but you may be interested in the Statistics for the Social Sciences category.

@truelovewaits, how are you formatting x as a number? What is the result of class(x) for you?

library(psych)  # provides corr.test()

x <- c('0','1','1','1')
y <- c('1','0','1','1')

class(x)
#=> [1] "character"

corr.test(x,y)
#=> Error in cor(x, y, use = use, method = method) : 'x' must be numeric

Using as.numeric(), I get the expected result:

num_x <- as.numeric(x)
num_y <- as.numeric(y)
corr.test(num_x, num_y)

#=> Correlation matrix 
#=> [1] -0.33

@grosscol, I am formatting x as a number in Excel, and the environment recognizes it as a number as well.
Here's what the code looks like:

> class('bph')
[1] "character"
> #correlation test
> corr.test('bph', 'VCRate1', use = "pairwise" , method="pearson", adjust="holm", 
+           alpha=.05,ci=TRUE)
Error in cor(x, y, use = use, method = method) : 'x' must be numeric

For the as.numeric code, I get this:

> as.numeric('bph')
[1] NA
Warning message:
NAs introduced by coercion 

The quotation marks around bph mean that you're referring to it as a string. That's why it's returning "character" for its class. Here's a simple example:

name <- "mara"
'name'
#> [1] "name"
name
#> [1] "mara"

Created on 2018-11-02 by the reprex package (v0.2.1.9000)

For a (much) more in-depth explanation, see the Names and values chapter of Advanced R
https://adv-r.hadley.nz/names-values.html
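
The same distinction applies to columns of a data frame. Below is a minimal sketch with a hypothetical data frame `df` (the values are made up; only the column name bph comes from this thread):

```r
# Hypothetical data; only the column name bph comes from the thread
df <- data.frame(bph = c(0, 1, 1, 0))

class("bph")    # the literal string, so "character"
class(df$bph)   # the column itself, so "numeric"

# When only the name is available as a string, [[ ]] looks the column up
col_name <- "bph"
class(df[[col_name]])
```

So a quoted name is useful only where a function explicitly expects a string, such as inside `[[ ]]`.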

@truelovewaits, @mara is correct. In your code, you're passing the name of the variable as a string. Try passing the variable itself.

What is the result of the following code for you?

class(bph)

corr.test(bph, VCRate1, use = "pairwise" , method="pearson", adjust="holm", + alpha=.05,ci=TRUE)

@grosscol, This is what I am getting:


> class(bph)
Error: object 'bph' not found
> 
> corr.test(bph, VCRate1, use = "pairwise" , method="pearson", adjust="holm", + alpha=.05,ci=TRUE)
Error: unexpected '=' in "corr.test(bph, VCRate1, use = "pairwise" , method="pearson", adjust="holm", + alpha="

I am really sorry if I am missing some basic stuff here.

Looks like the variable doesn't exist in your environment. Are you reading the data in from an Excel document or another tabular format? Is it getting read into a data frame? Or, more simply, how are you reading the data in?

Hmm, I think we’ll need to see more of your code to understand why bph isn’t defined. One possibility: if it’s a column in a data frame, then you’ll need to refer to it as your_dataframe_name$bph.
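
To illustrate that possibility, here is a small sketch with made-up values. The data frame name follows the placeholder above and is not the poster's real object:

```r
# Made-up values; the data frame name follows the placeholder used above
your_dataframe_name <- data.frame(
  bph     = c(0, 1, 1, 0, 1, 0),
  VCRate1 = c(3.1, 4.0, 5.2, 2.8, 4.4, 3.9)
)

# Columns live inside the data frame, not in the global environment
exists("bph")   # FALSE unless a separate object called bph was created

# Either qualify each column with $ ...
r1 <- cor(your_dataframe_name$bph, your_dataframe_name$VCRate1)

# ... or evaluate the expression inside the data frame with with()
r2 <- with(your_dataframe_name, cor(bph, VCRate1))

all.equal(r1, r2)
```

Both forms compute the same correlation; with() just saves repeating the data frame name.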

Here are some resources that might be helpful for making the transition from Stata-land to the R-chipelago:

@grosscol, I imported the data using Environment > Import Dataset and then chose my CSV file. Does this not load my data? In total, my code says:

#Purpose: Run a pearson's test of significance of bph and VCRate 1 for 2005-2007
#DV-VCRate1   IV-bph
#05to071=Dataset

`05to071`

setwd("C:/Users/Jacob/OneDrive/Classes/4980H/QGIS/RStudio")

#use psych package for corr.test
library(psych)

class(bph)

corr.test(bph, VCRate1, use = "pairwise" , method="pearson", adjust="holm", + alpha=.05,ci=TRUE)

#purpose:Find descriptive statistics
summary(05to071)

@truelovewaits, I believe that does import your data. I think the wizard shows you the equivalent code in the "Code Preview" section of the import data dialog.

If your data set's name is 05to071, then you can access the columns in it using [] or $. Since the name of the data frame begins with a number, you're going to have to wrap it in backticks every time you want to reference it.

What is the result of the following code for you?

class(`05to071`)
class(`05to071`$bph)

corr.test(`05to071`$bph, `05to071`$VCRate1, use = "pairwise" , method="pearson", adjust="holm", alpha=.05, ci=TRUE)
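
As a side note on the backticks, here is a sketch using a made-up stand-in for the imported data. The name `crime0507` is hypothetical; any name that starts with a letter would do. Copying the data to a syntactic name is one way to avoid the backticks in later code:

```r
# Made-up stand-in for the imported CSV
`05to071` <- data.frame(
  bph     = c(0, 1, 1, 0, 1),
  VCRate1 = c(3.1, 4.0, 5.2, 2.8, 4.4)
)

# Without backticks, 05to071 is a syntax error; with them it works
class(`05to071`$bph)

# Copying to a syntactic name (hypothetical: crime0507) avoids the backticks
crime0507 <- `05to071`
cor.test(crime0507$bph, crime0507$VCRate1)
```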

@grosscol, If I remove the + sign from the code, I get results! Here's what it says:

> class(`05to071`)
[1] "data.frame"
> class(`05to071`$bph)
[1] "numeric"
> 
> corr.test(`05to071`$bph, `05to071`$VCRate1, use = "pairwise" , method="pearson", adjust="holm", alpha=.05,ci=TRUE)
Call:corr.test(x = `05to071`$bph, y = `05to071`$VCRate1, 
    use = "pairwise", method = "pearson", adjust = "holm", 
    alpha = 0.05, ci = TRUE)
Correlation matrix 
[1] 0.09
Sample Size 
[1] 887
Probability values  adjusted for multiple tests. 
[1] 0.01

I believe my issue has been solved!

Excellent. The plus sign in the arguments snuck in there from copy-pasting. Sorry about that. Good catch.

Glad it's working for you. In your subsequent work, try to pause and test that you can access the data you're working with and that it has the expected values. I usually write a little bit of code to check things like:

  • Print the column names of the data.
  • Print the number of columns in the data.
  • Print the 2nd column in the data.

Those will usually catch the problems with referencing the data early on.
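
A minimal sketch of those checks, run on a hypothetical data frame `dat` with made-up values. The final str() call is an extra I find handy, not one of the three checks listed:

```r
# Hypothetical data frame standing in for the imported data
dat <- data.frame(
  bph     = c(0, 1, 1, 0),
  VCRate1 = c(3.1, 4.0, 5.2, 2.8)
)

names(dat)   # column names
ncol(dat)    # number of columns
dat[[2]]     # the 2nd column as a vector
str(dat)     # compact overview: column types plus the first few values
```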

@grosscol, Thank you for all of the help. You have been amazing! I will look out for those issues in the future.
