New to R, so I need a hand with converting an Excel spreadsheet to a ggplot please!
I have an Excel spreadsheet with two columns: RSeq value (number) and tumour_type (Normal/Cancer).
I'm trying to convert the first column to a vector variable:
rseq=FILENAME[ ,1]
but this always saves a list, not a vector. I'm not sure why! This makes it difficult to convert to a data frame and make a box plot.
Secondly I convert the second column to a factor by:
tumour_type_factor=factor(tumour_type, levels=c( "Normal", "Cancer"))
But when I run the factor it comes up with
"N/A
levels = Normal, Cancer"
so clearly it can't read the values in the column properly.
When I generate a box plot with ggplot it returns: default method not implemented for type 'list'
I have managed to do it right one time but can't replicate it! Am I importing the data wrongly from Excel? I have tried both .xlsx and .csv formats.
I find the Excel .xlsx file in the files section and click import dataset; the viewer confirms that the first column is double and the second is char.
First row as names option is ticked.
This is the code displayed in the code preview:
library(readxl)
FILENAME <- read_excel("~/PhD/TCGA Analysis/FILENAME.xlsx")
View(FILENAME)
Data import seems fine. I'm not so sure about your data transformations, though. I don't see what purpose the vector rseq serves. You shouldn't need to convert tumour_type to a factor either; geom_boxplot() works fine with character vectors.
I was able to generate a boxplot using some dummy numbers for RSeq using the code below. Can you see if it works with your data?
library(readxl)
library(ggplot2)
FILENAME <- read_excel("~/FILENAME.xlsx")
print(FILENAME)
#> # A tibble: 14 x 2
#>     Rseq tumour_type
#>    <dbl> <chr>
#>  1    14 Normal
#>  2    35 Cancer
#>  3    47 Normal
#>  4    32 Cancer
#>  5    40 Normal
#>  6    24 Normal
#>  7    34 Cancer
#>  8    14 Normal
#>  9    41 Cancer
#> 10    12 Cancer
#> 11    28 Normal
#> 12    49 Normal
#> 13    14 Cancer
#> 14    44 Normal
ggplot(FILENAME, aes(x = tumour_type, y = Rseq)) +
  geom_boxplot()
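On the list-versus-vector problem from the question: read_excel() returns a tibble, and single-bracket subsetting such as FILENAME[ ,1] keeps the result as a one-column tibble (which R treats as a list) instead of dropping it to a vector. Below is a minimal sketch, using a small tibble called dat as a stand-in for the imported spreadsheet (the name and the four rows are only for illustration). It also shows what is likely, though not certainly, behind the NA from factor(): passing a one-column tibble rather than the column itself.

```r
library(tibble)

# Hypothetical stand-in for the imported spreadsheet
# (first four rows of the dummy data above)
dat <- tibble(
  Rseq = c(14, 35, 47, 32),
  tumour_type = c("Normal", "Cancer", "Normal", "Cancer")
)

# Single brackets keep the tibble structure, so this is still a list
col_as_tibble <- dat[, 1]
is.list(col_as_tibble)

# Double brackets or $ extract the column as a plain vector
rseq <- dat[[1]]
rseq_by_name <- dat$Rseq
is.list(rseq)

# factor() on a one-column tibble yields a single NA with the given levels
from_tibble <- factor(dat[, 2], levels = c("Normal", "Cancer"))

# factor() on the column itself gives one value per row
tumour_type_factor <- factor(dat$tumour_type, levels = c("Normal", "Cancer"))
```

None of this is needed for the plot itself, since ggplot() takes the whole data frame, but it may explain why the earlier attempt could not be replicated.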
Oh fab, this is great! When I learnt to do this, we had to extract the data from a larger spreadsheet which wasn't tidy. So because my original data is tidy, it goes into ggplot fine?
Just another quick question: how do I change the x axis so it reads Normal then Cancer?
I'm not sure what your original data looks like, but yes, ggplot2 generally plays nicer with tidy data sets.
Re-ordering the X-axis labels will require converting tumour_type to factor and specifying the order of its levels. However, this can be done on-the-fly in the aes() call without modifying your data itself.
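A minimal sketch of that on-the-fly conversion, using the first few rows of the dummy data from earlier in a data frame called dat (a stand-in name; substitute FILENAME for your own data). The labs() call is an optional extra so the axis title reads tumour_type and not the full factor(...) expression.

```r
library(ggplot2)

# Hypothetical stand-in for the imported spreadsheet
dat <- data.frame(
  Rseq = c(14, 35, 47, 32, 40, 24),
  tumour_type = c("Normal", "Cancer", "Normal", "Cancer", "Normal", "Normal")
)

# Desired left-to-right order on the x axis
type_levels <- c("Normal", "Cancer")

# Convert to factor inside aes(); dat itself is left unchanged
p <- ggplot(dat, aes(x = factor(tumour_type, levels = type_levels), y = Rseq)) +
  geom_boxplot() +
  labs(x = "tumour_type")

print(p)
```

Without the explicit levels, the character values are ordered alphabetically, which is why Cancer appears before Normal by default.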