+ sign after trying to declare a variable

kylec1729 · May 25, 2020, 4:36am

I'm trying to declare the following variable as a list of patient IDs to use for further analysis:

id = c("patientID1", "patientID2", ...)

I have a list of about 5000 patient IDs I need to do this for, so I used a comma-separated list generator. The minimal working example given in this Stack post works, but when I try it with all of my patient IDs it doesn't work. Importantly, when I tried the minimal working example from the Stack post (which worked), I typed it out manually rather than using the column-to-list converter.

Any ideas here? Is there another way that I should be making the comma separated list in R from an excel column to make sure that it works? Why is R giving me the + sign anyways? The brackets are closed.

nirgrahamuk · May 25, 2020, 4:38am

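The console limits the length of a single input line (about 4095 bytes, according to the R documentation). A very long pasted line gets cut off before its closing bracket, so R shows the + prompt while it waits for the rest of the expression.
Put your Excel column into CSV format and read the CSV file into R instead.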
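
To illustrate why a cut-off line leaves R waiting, here is a minimal sketch that uses parse() on a short, made-up command instead of a real 5000-ID paste. The strings and variable names are only illustrative; the point is that the same call without its closing bracket is not a complete expression, which is the situation in which the interactive console prints +.

```r
# A complete call parses without complaint.
complete <- 'id <- c("patientID1", "patientID2")'
parsed <- parse(text = complete)

# Simulate truncation: drop the final character, i.e. the closing bracket.
truncated <- substr(complete, 1, nchar(complete) - 1)

# parse() raises an error here because the expression never finishes.
# At the interactive console, R would show the + continuation prompt instead.
result <- tryCatch(
  parse(text = truncated),
  error = function(e) conditionMessage(e)
)
print(result)
```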

kylec1729 · May 25, 2020, 4:43am

Thank you very much! Can I declare the variable "id" as the column in Excel? Are there any guides to this? Apologies, I'm new to R.

nirgrahamuk · May 25, 2020, 5:07am

Typically the first row of a CSV is a header and is used to name the column that will be made from the rows beneath it. Any CSV-reading function supports that. You can also rename columns later.
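
A minimal sketch of that, assuming a one-column CSV whose first row is a header. The file contents, the header name patient_id, and the variable names are hypothetical; a temporary file stands in for the file exported from Excel.

```r
# Hypothetical CSV: a header row followed by one ID per row.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("patient_id", "4428814_4428814", "3490518_3490518", "3094358_3094358"),
  csv_path
)

# header = TRUE tells read.csv() to use the first row as the column name.
ids_df <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)
original_names <- names(ids_df)
print(original_names)

# Columns can be renamed after the import.
names(ids_df)[1] <- "id"
print(names(ids_df))
```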

kylec1729 · May 25, 2020, 5:14am

When I use id <- read.csv(file.choose(), header=T) I get an error in the next thing that I want to do:

> filtered <- data_rs146217251[id,]
Error in data_rs146217251[id, ] : invalid subscript type 'list'

I want to import the .csv column exactly as this:

> id <- c("4428814_4428814", "3490518_3490518", "3094358_3094358")
> id
[1] "4428814_4428814" "3490518_3490518"
[3] "3094358_3094358"

So that I can run this:

> filtered <- data_rs146217251[id,]
> filtered
                g=0 g=1 g=2
4428814_4428814   0   0   1
3490518_3490518   0   0   1
3094358_3094358   0   0   1

How do I import the .csv column in a format that would be precisely in the same form as the c() function, as in id <- c("4428814_4428814", "3490518_3490518", "3094358_3094358"), rather than the list that read.csv() gives?
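
The error can be reproduced on a small scale. This is a sketch under assumptions: dosage is a hypothetical stand-in for data_rs146217251 (a matrix with IDs as row names and made-up values), and ids_df plays the role of what read.csv() returns. A data frame is a list of columns, which is why it is rejected as a row subscript.

```r
# Hypothetical stand-in for data_rs146217251; the values are invented.
dosage <- matrix(
  c(0, 0, 1,
    0, 0, 1,
    0, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("4428814_4428814", "3490518_3490518", "3094358_3094358"),
    c("g=0", "g=1", "g=2")
  )
)

# What read.csv() hands back: a data frame, even with only one column.
ids_df <- data.frame(
  id = c("4428814_4428814", "3094358_3094358"),
  stringsAsFactors = FALSE
)
print(class(ids_df))    # a data frame ...
print(is.list(ids_df))  # ... which is a list underneath

# Using the whole data frame as a row subscript fails.
msg <- tryCatch(
  dosage[ids_df, ],
  error = function(e) conditionMessage(e)
)
print(msg)
```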

nirgrahamuk · May 25, 2020, 5:25am

Where does your data_rs146217251 come from?
The first thing you do is read the CSV into an object called id.

Secondly I don't understand where the g= information is supposed to have come from if you are loading only a list of ids?

kylec1729 · May 25, 2020, 5:48am

The data_rs146217251 comes from a main dataset called data which has a bunch of different genetic data. The g= information comes from a different file that I loaded from a .bgen file using the rbgen package (it's allele dosage data), but that's unrelated.

Here's a post which explains exactly what I'm trying to do, and for which there is a solution which I'm trying to use: https://stackoverflow.com/questions/61965995/extracting-rows-in-r-based-on-an-id-number

I tried id <- read.csv(file.choose(), header=T) using the .csv file with the list of patient IDs, but as soon as I ran > filtered <- data_rs146217251[id,] it gave me the error Error in data_rs146217251[id, ] : invalid subscript type 'list'.

So that's why I'd prefer a way that I could just load the column from the .csv in a way that replicates the same vectorized form that's given by

> id <- c("4428814_4428814", "3490518_3490518", "3094358_3094358")
> id
[1] "4428814_4428814" "3490518_3490518"
[3] "3094358_3094358"

Does that help clarify what I'm trying to do?

kylec1729 · May 25, 2020, 6:15am

Apologies, I was able to figure out my problem. What I did was:

df <- read.csv('C:\\Path\\To\\DataFile.csv')
id <- df[[1]]

to import the column as a vector and that worked!

Thank you for your help again.
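
For anyone finding this later, a minimal sketch of why df[[1]] works where the whole data frame did not. The temporary CSV and the dosage matrix are hypothetical stand-ins for the real files. One assumption worth flagging: before R 4.0.0, read.csv() converted strings to factors by default, and indexing by a factor uses its integer codes rather than its labels, so the as.character() call is a defensive step rather than part of the original solution.

```r
# Hypothetical CSV standing in for the exported Excel column.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("patient_id", "4428814_4428814", "3094358_3094358"), csv_path)
df <- read.csv(csv_path)

one_col_df <- df[1]   # single brackets: still a data frame (a list)
id <- df[[1]]         # double brackets: the column itself, as a vector

# Guard against factor columns (the read.csv() default before R 4.0.0).
id <- as.character(id)

# Hypothetical stand-in for data_rs146217251; the values are invented.
dosage <- matrix(
  c(0, 0, 1,
    0, 0, 1,
    0, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("4428814_4428814", "3490518_3490518", "3094358_3094358"),
    c("g=0", "g=1", "g=2")
  )
)

# A character vector selects rows by name; drop = FALSE keeps a matrix
# even if only one ID matches.
filtered <- dosage[id, , drop = FALSE]
print(filtered)
```

df$patient_id would give the same vector as df[[1]] here; the positional form just avoids depending on the header name.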
