Error in as.data.frame.data.frame(hits, hits$`Combined adj. P-value` < : invalid 'row.names', length 0 for a data frame with 3420 rows

cminnnis · April 23, 2020, 1:29pm

Hi there,

I've been having real issues generating a new data frame of values less than 0.05.

df.hits_sig <- hits[,hits$`Combined adj. P-value`<= 0.05] # is giving me 3420 rows with 0 variables when I know there are values below 0.05.

df.hits_sig <- hits[hits$`Combined adj. P-value`<= 0.05,] # is giving me 0 rows with 3 variables when I know there are values below 0.05.

This is really strange and I don't know how to solve it. Does anyone have any ideas?

Kindest regards,

Chris

nirgrahamuk · April 23, 2020, 2:09pm

I would question what you think you know about the presence of values <= 0.05 in your data.
How do you justify this knowledge?

hits <- structure(
  list(`Combined adj. P-value` = c(0.04, 0.06), another_var = c(1, 2)),
  class = "data.frame",
  row.names = c(NA, -2L)
)

df.hits_sig1 <- hits[,hits$`Combined adj. P-value`<= 0.05]
# 0.04 0.06
df.hits_sig2 <- hits[hits$`Combined adj. P-value`<= 0.05,] 
#  Combined adj. P-value another_var
# 1                  0.04           1

library(dplyr)  # filter() below is dplyr::filter()
hits_no_p <- filter(hits, `Combined adj. P-value` > 0.05)

df.hits_sig3 <- hits_no_p[,hits_no_p$`Combined adj. P-value`<= 0.05]
#data frame with 0 columns and 1 row
df.hits_sig4 <- hits_no_p[hits_no_p$`Combined adj. P-value`<= 0.05,] 
#[1] Combined adj. P-value another_var          
#<0 rows> (or 0-length row.names)

cminnnis · April 23, 2020, 3:38pm

I can see it in the hits data, so prior knowledge I guess. I've also done this to show the head with row names:

df.hits_sig <- hits[hits$`Combined adj. P-value`<= 0.05,]
head(df.hits_sig)
[1] Colony.Size.Difference.1 Colony.Size.Difference.2
[3] Combined\nadj. P-value
<0 rows> (or 0-length row.names)

Here is what the hits data set looks like:

head(hits)
             Colony.Size.Difference.1 Colony.Size.Difference.2
SPAC22F3.02             -5.996004e-05              0.008834666
SPAC23C4.08              4.870028e-04              0.016607275
SPAC3A12.08             -7.199038e-05             -0.015666830
SPCC1672.03c             1.022088e-04              0.056408217
SPAC4F10.02              6.478393e-05             -0.067926452
SPBP4H10.08             -2.452256e-04              0.036827895
             Combined\nadj. P-value
SPAC22F3.02               0.9999938
SPAC23C4.08               0.9999937
SPAC3A12.08               0.9999887
SPCC1672.03c              0.9999665
SPAC4F10.02               0.9999598
SPBP4H10.08               0.9999582

nirgrahamuk · April 23, 2020, 3:42pm

The hits data you shared has 6 rows, all with p-values close to 1.

Here's an idea: install the skimr package.
Then run skimr::skim(hits); it should give useful information about the distribution of the P-value column.

cminnnis · April 23, 2020, 3:48pm

Thanks, I've done that and I get this:

skimr::skim(hits)
── Data Summary ────────────────────────
Values
Name hits
Number of rows 3420
Number of columns 3

Column type frequency:
numeric 3

Group variables None

── Variable type: numeric ───────────────────────────────────────────────────
  skim_variable              n_missing complete_rate     mean    sd        p0
1 "Colony.Size.Difference.1"       820         0.760 -0.0323  0.200 -9.01e- 1
2 "Colony.Size.Difference.2"       820         0.760  0.00293 0.127 -1.17e+ 0
3 "Combined\nadj. P-value"         820         0.760  0.835   0.239  3.72e-14
      p25      p50    p75  p100 hist
1 -0.133  -0.0186  0.0922 1.19  ▁▅▇▁▁
2 -0.0589  0.00532 0.0724 0.636 ▁▁▂▇▁
3  0.798   0.945   0.986  1.00  ▁▁▁▁▇

Here are some of the lowest (most significant) p-values:

             Colony.Size.Difference.1 Colony.Size.Difference.2 Combined adj. P-value
SPBC2F12.15c               -0.9006921               -0.8791890          3.719247e-14
SPBC1271.12                -0.5766002               -0.5990040          1.212481e-05
SPAC17G6.04c                0.4522996                0.3608838          7.562817e-05
SPCP1E11.05c               -0.4118330               -0.2682728          1.001307e-03

nirgrahamuk · April 23, 2020, 4:17pm

Try a tidyverse approach:

library(tidyverse)
(df.hits_sig <- filter(hits,`Combined adj. P-value`<= 0.05))

cminnnis · April 23, 2020, 4:35pm

Now I'm getting other errors:

library(tidyverse)
Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
namespace ‘tibble’ 2.1.3 is already loaded, but >= 3.0.0 is required

(df.hits_sig <- filter(hits,`Combined adj. P-value`<= 0.05))
Error in filter(hits, `Combined adj. P-value` <= 0.05) :
object 'Combined adj. P-value' not found

I'm not sure how to update tibble. I tried update.packages("tibble") but it didn't work.

nirgrahamuk · April 23, 2020, 4:40pm

install.packages("tidyverse")

cminnnis · April 23, 2020, 4:42pm

Sorry, I should have said I did install the packages

I get this:

library(tidyverse)
Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
namespace ‘tibble’ 2.1.3 is already loaded, but >= 3.0.0 is required

nirgrahamuk · April 23, 2020, 4:44pm

OK, I would first try:

install.packages("tibble")
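Some context on the error above: the message says an older tibble (2.1.3) was already loaded in the running session, so a newer install can only take effect after R is restarted. Also, the first argument of update.packages() is a library location (lib.loc), not a package name, which is probably why update.packages("tibble") had no effect. Below is a minimal sketch of how one might check for this situation. The helper name needs_restart() is made up for illustration and is not part of any package.

```r
# Hypothetical helper (not from any package): TRUE when `pkg` is already
# loaded in this session at a version older than `required`.
needs_restart <- function(pkg, required) {
  isNamespaceLoaded(pkg) &&
    package_version(getNamespaceVersion(pkg)) < package_version(required)
}

# Illustrated with "stats", which is always loaded, so the sketch runs anywhere.
needs_restart("stats", "0.0.1")    # FALSE: the loaded version is new enough
needs_restart("stats", "999.0.0")  # TRUE: a restart would be needed first

# For the case in this thread one would check, in the session that failed:
# needs_restart("tibble", "3.0.0")
# and then restart R before running install.packages("tibble").
```

If the check returns TRUE, restarting R before installing avoids the "already loaded" conflict.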

cminnnis · April 23, 2020, 4:52pm

Thanks again,

So I restarted, installed the packages and then loaded the tibble and tidyverse libraries:

> library(tibble)
> library(tidyverse)
── Attaching packages ──────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.0     ✓ dplyr   0.8.5
✓ tidyr   1.0.2     ✓ stringr 1.4.0
✓ readr   1.3.1     ✓ forcats 0.5.0
✓ purrr   0.3.3     
── Conflicts ─────────────────────────────────────── tidyverse_conflicts() ──
x stringr::boundary() masks graph::boundary()
x dplyr::collapse()   masks IRanges::collapse()
x dplyr::combine()    masks Biobase::combine(), BiocGenerics::combine()
x dplyr::count()      masks matrixStats::count()
x dplyr::desc()       masks IRanges::desc()
x tidyr::expand()     masks S4Vectors::expand()
x dplyr::filter()     masks stats::filter()
x dplyr::first()      masks S4Vectors::first()
x dplyr::lag()        masks stats::lag()
x ggplot2::Position() masks BiocGenerics::Position(), base::Position()
x purrr::reduce()     masks GenomicRanges::reduce(), IRanges::reduce()
x dplyr::rename()     masks S4Vectors::rename()
x dplyr::select()     masks AnnotationDbi::select()
x purrr::simplify()   masks DelayedArray::simplify()
x dplyr::slice()      masks IRanges::slice()
> (df.hits_sig <- filter(hits,`Combined adj. P-value`<= 0.05))
Error: object 'Combined adj. P-value' not found

nirgrahamuk · April 23, 2020, 5:02pm

My apologies. In tidyverse land, when using the select/filter/mutate verbs etc., non-standard variable names within the data frames should be referenced within quotation marks, single or double, and not backticks.

> (df.hits_sig <- filter(hits,"Combined adj. P-value"<= 0.05))

cminnnis · April 23, 2020, 6:05pm

Same issue: it returns no rows. It's very strange.

(df.hits_sig <- filter(hits,"Combined adj. P-value"<= 0.05))
[1] Colony.Size.Difference.1 Colony.Size.Difference.2
[3] Combined\nadj. P-value
<0 rows> (or 0-length row.names)
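A note on why the quoted version also returns nothing: inside a comparison, a name in quotation marks is a plain string constant and not a reference to the column, so the test never looks at the data. The sketch below shows this in base R with a made-up two-row data frame called toy (an assumption for illustration, not the poster's data). The result of the string comparison depends on the collation order, but digits normally sort before letters, so it should come out FALSE.

```r
# A quoted name is a string constant. The number is coerced to the string
# "0.05" and the two strings are compared, giving one logical value.
cmp <- "Combined adj. P-value" <= 0.05
cmp

# Toy data (made up for illustration) with the same awkward column name.
toy <- data.frame(`Combined adj. P-value` = c(0.01, 0.90), check.names = FALSE)

# Recycling that single value across all rows keeps either every row or none.
n_quoted <- nrow(toy[rep(cmp, nrow(toy)), , drop = FALSE])

# Backticks refer to the column itself, so the test is applied per row.
n_backtick <- nrow(toy[toy$`Combined adj. P-value` <= 0.05, , drop = FALSE])

c(quoted = n_quoted, backtick = n_backtick)
```

So the quoted form cannot be expected to select rows by p-value, whatever the column is called.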

nirgrahamuk · April 23, 2020, 7:00pm

Sorry friend, something strange is going on. I will try to get to the bottom of it. Can you run this and share the results?

head(arrange(hits,"Combined adj. P-value"))

cminnnis · April 23, 2020, 7:14pm

I know, it's super weird:

head(arrange(hits,"Combined adj. P-value"))
Error: incorrect size (1) at position 1, expecting : 3420

nirgrahamuk · April 23, 2020, 9:45pm

I've just noticed that when skimr skimmed your hits df, the p-value variable has a newline character in its name. That's super strange.
Can you try
names(hits) <- c("a","b","c") for simplicity's sake and see if that makes a difference to the rest of the functions we've been trying to use?

cminnnis · April 24, 2020, 12:00pm

Yep, that was the issue. Why would that affect it?

Now I'm trying to link it to my ggplot and hitting different issues, but I will make a separate post for that.

Thank you so much.
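On why the newline matters: the column is actually named "Combined\nadj. P-value", so a name typed with a space in place of the newline matches nothing. With `$`, a non-matching name returns NULL, and NULL <= 0.05 is a zero-length logical vector, which selects no rows (or no columns). The sketch below reproduces this on a small made-up data frame (the values are illustrative only) and shows two possible ways around it, assuming the newline is the only problem with the name.

```r
# Made-up data whose third column name contains a newline, as in the skimr output.
hits <- data.frame(a = c(-0.90, 0.10), b = c(-0.88, 0.02), p = c(3.7e-14, 0.99))
names(hits) <- c("Colony.Size.Difference.1", "Colony.Size.Difference.2",
                 "Combined\nadj. P-value")

# The name typed with a space does not match, so `$` returns NULL ...
wrong <- hits$`Combined adj. P-value`
is.null(wrong)
# ... and NULL <= 0.05 has length zero, so no rows are selected.
n_wrong <- nrow(hits[wrong <= 0.05, ])

# Option 1: refer to the column by its exact name, newline included.
sig_exact <- hits[hits[["Combined\nadj. P-value"]] <= 0.05, ]

# Option 2: replace the newline in the names with a space, then subset as usual.
names(hits) <- gsub("\n", " ", names(hits), fixed = TRUE)
sig_clean <- hits[hits$`Combined adj. P-value` <= 0.05, ]
```

Renaming the columns to "a", "b", "c" works for the same reason: the new names contain no hidden characters. Printing names(hits) is a quick way to spot escapes such as \n before subsetting.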
