Hi there,

I've been having real trouble generating a new data frame that keeps only the values at or below 0.05.

```
df.hits_sig <- hits[, hits$`Combined adj. P-value` <= 0.05]  # gives 3420 rows with 0 variables, though I know there are values below 0.05
df.hits_sig <- hits[hits$`Combined adj. P-value` <= 0.05, ]  # gives 0 rows with 3 variables, though I know there are values below 0.05
```

This is really strange and I don't know how to solve it. Does anyone have any ideas?

Kindest regards,

Chris

I would question what you think you know about the presence of values <= 0.05 in your data.
How do you justify this knowledge?

```
library(dplyr)  # provides filter()

# Two-row example: one p-value below 0.05 and one above
hits <- structure(list(`Combined adj. P-value` = c(0.04, 0.06), another_var = c(1,
2)), class = "data.frame", row.names = c(NA, -2L))

# A logical vector in the column position selects columns, not rows
df.hits_sig1 <- hits[,hits$`Combined adj. P-value`<= 0.05]
# 0.04 0.06
# A logical vector in the row position keeps the matching rows
df.hits_sig2 <- hits[hits$`Combined adj. P-value`<= 0.05,]
# Combined adj. P-value another_var
# 1 0.04 1

# Same calls on data with no value at or below 0.05
hits_no_p <- filter(hits, `Combined adj. P-value` > 0.05)
df.hits_sig3 <- hits_no_p[,hits_no_p$`Combined adj. P-value`<= 0.05]
# data frame with 0 columns and 1 row
df.hits_sig4 <- hits_no_p[hits_no_p$`Combined adj. P-value`<= 0.05,]
# [1] Combined adj. P-value another_var
# <0 rows> (or 0-length row.names)

I can see them in the hits data, so prior knowledge, I guess. I've also run this to show the head of the result:

df.hits_sig <- hits[hits$`Combined adj. P-value` <= 0.05,]
head(df.hits_sig)
[1] Colony.Size.Difference.1 Colony.Size.Difference.2
[3] Combined\nadj. P-value
<0 rows> (or 0-length row.names)

Here is what the hits data set looks like:

head(hits)
Colony.Size.Difference.1 Colony.Size.Difference.2
SPAC22F3.02 -5.996004e-05 0.008834666
SPAC23C4.08 4.870028e-04 0.016607275
SPAC3A12.08 -7.199038e-05 -0.015666830
SPCC1672.03c 1.022088e-04 0.056408217
SPAC4F10.02 6.478393e-05 -0.067926452
SPBP4H10.08 -2.452256e-04 0.036827895
Combined\nadj. P-value
SPAC22F3.02 0.9999938
SPAC23C4.08 0.9999937
SPAC3A12.08 0.9999887
SPCC1672.03c 0.9999665
SPAC4F10.02 0.9999598
SPBP4H10.08 0.9999582

The hits data you shared has 6 rows, all with p-values close to 1.

Here's an idea: install the `skimr` package, then run `skimr::skim(hits)`. It should give useful information about the distribution of the P-value column.
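
If installing another package is not convenient, a base R cross-check along these lines may answer the same question. This is only a minimal sketch on a made-up stand-in for `hits` (the name `toy_hits` and all values are invented for illustration); it picks the p-value column by position so the check does not depend on how the column name is spelled.

```r
# Toy stand-in for `hits`; the values are invented for illustration only.
toy_hits <- data.frame(
  diff1 = c(-0.90, 0.45, 0.01, NA),
  diff2 = c(-0.88, 0.36, 0.02, NA),
  p     = c(3.7e-14, 7.6e-05, 0.99, NA)
)

p <- toy_hits[[3]]   # third column, selected by position rather than by name

summary(p)           # quartiles plus the number of NAs

# Count the rows at or below the threshold, ignoring missing values
n_sig <- sum(p <= 0.05, na.rm = TRUE)
n_sig
```

If the count is above zero but name-based subsetting still returns nothing, the column name itself becomes the main suspect.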

Thanks, I've done that and I get this:

skimr::skim(hits)
── Data Summary ────────────────────────
Values
Name hits
Number of rows 3420
Number of columns 3

Column type frequency:
numeric 3

Group variables None

── Variable type: numeric ───────────────────────────────────────────────────
skim_variable n_missing complete_rate mean sd p0
1 "Colony.Size.Difference.1" 820 0.760 -0.0323 0.200 -9.01e- 1
2 "Colony.Size.Difference.2" 820 0.760 0.00293 0.127 -1.17e+ 0
3 "Combined\nadj. P-value" 820 0.760 0.835 0.239 3.72e-14
p25 p50 p75 p100 hist
1 -0.133 -0.0186 0.0922 1.19 ▁▅▇▁▁
2 -0.0589 0.00532 0.0724 0.636 ▁▁▂▇▁
3 0.798 0.945 0.986 1.00 ▁▁▁▁▇

Here are some of the lowest p-values:

             Colony.Size.Difference.1 Colony.Size.Difference.2 Combined adj. P-value
SPBC2F12.15c               -0.9006921               -0.8791890          3.719247e-14
SPBC1271.12                -0.5766002               -0.5990040          1.212481e-05
SPAC17G6.04c                0.4522996                0.3608838          7.562817e-05
SPCP1E11.05c               -0.4118330               -0.2682728          1.001307e-03

Try a tidyverse approach:

```
library(tidyverse)
(df.hits_sig <- filter(hits,`Combined adj. P-value`<= 0.05))
```

Now I'm getting other errors:

library(tidyverse)
Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
namespace ‘tibble’ 2.1.3 is already loaded, but >= 3.0.0 is required

(df.hits_sig <- filter(hits,`Combined adj. P-value`<= 0.05))
Error in filter(hits, `Combined adj. P-value` <= 0.05) :
object 'Combined adj. P-value' not found

I'm not sure how to update "tibble". I tried update.packages("tibble"), but it didn't work.
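
One likely reason `update.packages("tibble")` had no visible effect: the first argument of `update.packages()` is `lib.loc` (a library path), not a package name, so the string is read as a directory to look in. The sketch below shows checks that are usually worth running before reinstalling. The variable names are illustrative, and the install lines are left commented out because they need a fresh session and network access.

```r
pkg <- "tibble"

# Is a copy already loaded in this session? If TRUE, restart R first:
# the session keeps using the version that is already loaded.
already_loaded <- isNamespaceLoaded(pkg)
already_loaded

# Which version is installed on disk, if any? packageVersion() reads the
# package's DESCRIPTION file and does not load the package.
installed <- pkg %in% rownames(installed.packages())
if (installed) print(packageVersion(pkg))

# In a fresh session, either of these targets the package by name:
# install.packages("tibble")
# update.packages(oldPkgs = "tibble", ask = FALSE)
```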

`install.packages("tidyverse")`

Sorry, I should have said I did install the packages

I get this:

library(tidyverse)
Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
namespace ‘tibble’ 2.1.3 is already loaded, but >= 3.0.0 is required

OK, I would first try:

`install.packages("tibble")`

Thanks again,

So I restarted, installed the packages, and then loaded the tibble and tidyverse libraries:

```
> library(tibble)
> library(tidyverse)
── Attaching packages ──────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.0 ✓ dplyr 0.8.5
✓ tidyr 1.0.2 ✓ stringr 1.4.0
✓ readr 1.3.1 ✓ forcats 0.5.0
✓ purrr 0.3.3
── Conflicts ─────────────────────────────────────── tidyverse_conflicts() ──
x stringr::boundary() masks graph::boundary()
x dplyr::collapse() masks IRanges::collapse()
x dplyr::combine() masks Biobase::combine(), BiocGenerics::combine()
x dplyr::count() masks matrixStats::count()
x dplyr::desc() masks IRanges::desc()
x tidyr::expand() masks S4Vectors::expand()
x dplyr::filter() masks stats::filter()
x dplyr::first() masks S4Vectors::first()
x dplyr::lag() masks stats::lag()
x ggplot2::Position() masks BiocGenerics::Position(), base::Position()
x purrr::reduce() masks GenomicRanges::reduce(), IRanges::reduce()
x dplyr::rename() masks S4Vectors::rename()
x dplyr::select() masks AnnotationDbi::select()
x purrr::simplify() masks DelayedArray::simplify()
x dplyr::slice() masks IRanges::slice()
> (df.hits_sig <- filter(hits,`Combined adj. P-value`<= 0.05))
Error: object 'Combined adj. P-value' not found
```

cminnnis:

(df.hits_sig <- filter(hits,`Combined adj. P-value`<= 0.05))

My apologies. In tidyverse land, when using verbs such as select/filter/mutate, non-standard variable names within the data frame should be referenced within quotation marks, single or double, and not backticks.

`> (df.hits_sig <- filter(hits,"Combined adj. P-value"<= 0.05))`

Same issue: it returns no rows. It's very strange.

(df.hits_sig <- filter(hits,"Combined adj. P-value"<= 0.05))
[1] Colony.Size.Difference.1 Colony.Size.Difference.2
[3] Combined\nadj. P-value
<0 rows> (or 0-length row.names)
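
The empty result is what R's comparison rules would predict for the quoted form. Inside double quotes, `"Combined adj. P-value"` is a character string rather than a column reference, so the test compares one fixed string with 0.05 and produces a single logical value that is then applied to every row. A minimal base R sketch of that behaviour (no dplyr needed; `toy` is a made-up data frame):

```r
# A quoted name is just a string: the number is coerced to "0.05" and the two
# strings are compared as text, which has nothing to do with the data.
string_test <- "Combined adj. P-value" <= 0.05
string_test

toy <- data.frame(p = c(0.01, 0.20, 0.04))

# Recycling that single value over all rows keeps nothing when it is FALSE.
kept_by_string <- toy[rep(string_test, nrow(toy)), , drop = FALSE]
nrow(kept_by_string)

# Comparing the column itself keeps the rows at or below the threshold.
kept_by_column <- toy[toy$p <= 0.05, , drop = FALSE]
nrow(kept_by_column)
```

In dplyr, backticks are the usual way to refer to a non-syntactic column name inside `filter()`, but they only help when the name between them matches the real column name character for character.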

Sorry friend, something strange is going on. I will try to get to the bottom of it. Can you run this and share the results?

`head(arrange(hits,"Combined adj. P-value"))`

I've just noticed that when skimr skimmed your hits df, the p-value variable has a newline character in its name. That's super strange.
Can you try `names(hits) <- c("a","b","c")` for simplicity's sake and see if that makes a difference to the rest of the functions we've been trying to use?
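
A less drastic variant, in case the original column names are worth keeping: `dput(names(...))` prints escapes such as `\n` literally, and `gsub()` can collapse stray whitespace in the names. This is a sketch on a toy data frame built with the newline name that skimr reported; the cleanup step is just one option, not something taken from the real data.

```r
# Toy data frame whose third column name contains a newline.
toy <- data.frame(a = 1:2, b = 3:4, c = c(0.04, 0.06))
names(toy)[3] <- "Combined\nadj. P-value"

dput(names(toy))   # escapes are shown literally, so the \n becomes visible

# Replace any run of whitespace (spaces, tabs, newlines) with a single space;
# names without whitespace are left unchanged.
names(toy) <- gsub("[[:space:]]+", " ", names(toy))
names(toy)

# The name typed with a space now matches the column.
sig <- toy[toy$`Combined adj. P-value` <= 0.05, ]
nrow(sig)
```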

Yep, that was the issue. Why would that affect it?
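
A probable explanation, reproduced here on a toy data frame: `$` with a name that matches no column returns `NULL`, `NULL <= 0.05` is a zero-length logical vector, and indexing with a zero-length vector selects nothing. That would account for both results in the first post (every row with 0 columns, and 0 rows with every column). The data below are invented; only the column name mirrors the one skimr reported.

```r
toy <- data.frame(a = 1:3, b = 4:6, c = c(0.01, 0.50, NA))
names(toy)[3] <- "Combined\nadj. P-value"   # newline instead of a space

# The name typed with a space does not match, so `$` finds no column.
missing_col <- toy$`Combined adj. P-value`
is.null(missing_col)

idx <- missing_col <= 0.05   # comparing NULL gives a zero-length logical
length(idx)

dim(toy[, idx])   # every row, zero columns
dim(toy[idx, ])   # zero rows, every column

# Indexing by the exact name works; which() also drops the NA row.
exact <- toy[["Combined\nadj. P-value"]]
sig <- toy[which(exact <= 0.05), ]
nrow(sig)
```

The `which()` wrapper is a design choice for data with missing p-values: a plain logical index containing `NA` would return extra all-`NA` rows.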

Now I'm trying to link it to my ggplot and hitting different issues, but I will make a separate post for that.

Thank you so much.
