number of observations

jak123 · November 3, 2021, 7:41am

Hi R community,

I have imported a CSV dataset into R as a data frame.

As you can see, the df has 43018 observations, but when I open it (View) and scroll down, I see 104139 observations.

The data frame has 4 variables (let's call them x1, x2, x3, x4), and I'm looking for a specific number in column x1, so I use:
which(df$x1 == 457)
but the output is:
integer(0)
even though I can find this number when I open the data frame and search for it.

Info:
str(df) shows that all columns are numeric.

GreyMerchant · November 3, 2021, 7:54am

Hi,

It is really hard to see what is happening in your case, as we have no data. As you can see in my mini example, I am able to get the right row to show after filtering, and the direct comparison with == works as well.

library(tidyverse)

df <- data.frame(
  x1 = c(400:500),
  x2 = c(300:400)
)

df_output <- df %>% filter(x1 == 457)

df_output
#>    x1  x2
#> 1 457 357


df$x1 == 457
#>   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
#>  [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [97] FALSE FALSE FALSE FALSE FALSE

Created on 2021-11-03 by the reprex package (v2.0.0)

Chances are the numbers in your data.frame are not truly numeric, which would lead to this problem. Create a reprex and we can help; see the FAQ: How to do a minimal reproducible example (reprex) for beginners.
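One way to test the "not truly numeric" idea is to check how each column is stored and to compare with a tolerance instead of exact equality. This is a sketch on made-up data (the data frame, the column x2 and all values are assumptions, not the original poster's data): one value only prints as 457, and one text entry has a stray leading space.

```r
# Hypothetical data: x1 has a value that prints as 457 but is not exactly 457,
# and x2 is a text column where one entry has a leading space
df <- data.frame(
  x1 = c(456, 457 + 1e-9, 458),
  x2 = c(" 457", "457", "1"),
  stringsAsFactors = FALSE
)

# Check how each column is actually stored
sapply(df, class)

# Exact comparison misses the value that only *prints* as 457
which(df$x1 == 457)                        # integer(0)

# Comparing with a small tolerance finds it
which(abs(df$x1 - 457) < 1e-6)             # 2

# For text columns, trim whitespace and convert to numbers before comparing
which(as.numeric(trimws(df$x2)) == 457)    # 1 2
```

The tolerance of 1e-6 is an arbitrary illustrative choice; a suitable value depends on the scale of the data.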

jak123 · November 3, 2021, 8:46am

Hi again,

jak123 · November 3, 2021, 9:10am

And also:

what is the "row" = 40959?
It's like the row numbers are messed up because I deleted some data.

nirgrahamuk · November 3, 2021, 9:43am

Your issue is probably row.names. You could drop the row names, or turn them into a real column that records the original row number each row came from (rather than its current row number).

Here is a recreation of your problem:

#example data
(hiris <- head(iris))
#mess with the row.names
row.names(hiris) <- 1001:1006

#check it
hiris
View(hiris)

# simple fix
row.names(hiris) <- NULL

#check again
View(hiris)
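The other option mentioned above, keeping the old row names as a real column before resetting them, could look like this in base R. The column name orig_row is only an illustrative choice.

```r
# Same setup as above: a small data frame with non-sequential row names
hiris <- head(iris)
row.names(hiris) <- 1001:1006

# Keep the original row labels as an ordinary integer column
hiris$orig_row <- as.integer(row.names(hiris))

# Then reset the row names to the default 1..n
row.names(hiris) <- NULL

# The original labels are still available as data
hiris[, c("orig_row", "Species")]
```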

jak123 · November 3, 2021, 9:55am

Why are those two values shown in the image not equal? In the Global Environment it says 43,018 observations, but when I open it, it shows 104,139 observations.
Thanks again!

GreyMerchant · November 3, 2021, 10:04am

You can clearly see in the screenshot you posted that the row numbers on the left are not continuous. You have 43018 actual rows in your data, but the row names are carried over from an original ID created by a previous action or operation.
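A small base R sketch of how this can happen: subsetting with `[` keeps each surviving row's original row name, so after deleting rows, nrow() counts the rows that remain while the labels shown in View() still run up to the original row numbers. The data here are made up for illustration.

```r
# Hypothetical data frame with 10 rows and default row names 1..10
df <- data.frame(x1 = seq(100, 1000, by = 100))

# "Delete" rows by keeping only x1 values above 500
kept <- df[df$x1 > 500, , drop = FALSE]

nrow(kept)        # 5 rows actually remain
row.names(kept)   # "6" "7" "8" "9" "10": the original row numbers

# Indexing by position and by row name are different things
kept[1, "x1"]     # first remaining row: 600
kept["6", "x1"]   # row *named* "6": also 600
kept[6, "x1"]     # NA: there is no 6th row any more
```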

jak123 · November 3, 2021, 11:35am

Haha, yeah, I see. Thanks GreyMerchant and everyone else! Should I run row.names(df) <- NULL every time I delete something?

GreyMerchant · November 3, 2021, 11:37am

It depends on what the goal is. I try to apply row names only at the very end of all my operations, and only for specific reasons (like wanting to create a table). Typically, it is better to create your ID or name column as an actual column rather than as row names.
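A sketch of that approach with a hypothetical id column created up front: because it is an ordinary column, it survives row deletions and can be searched like any other variable. The column name and values are assumptions for illustration.

```r
# Hypothetical data: add an explicit ID column before any rows are deleted
df <- data.frame(x1 = c(3, 8, 1, 9, 5))
df$id <- seq_len(nrow(df))

# Delete some rows; the id column travels with the data
df <- df[df$x1 > 2, , drop = FALSE]
row.names(df) <- NULL   # reset labels; the original position is kept in df$id

# Look up by the original ID instead of by row name
df[df$id == 4, ]
```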

nirgrahamuk · November 3, 2021, 12:06pm

If you have deleted the row names, they won't be there to be deleted again later. In other words, whether you should remove the row.names depends on whether the data.frame in question has row.names.

row.names have gone out of fashion; I rarely encounter them in modern workflows.
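To check whether a data frame currently has anything other than the default 1..n row names before deciding to reset them, one option is a small helper like the one below. The function name has_default_rownames is hypothetical and not part of base R.

```r
# Hypothetical helper: TRUE if the row names are just "1", "2", ..., "n"
has_default_rownames <- function(df) {
  identical(row.names(df), as.character(seq_len(nrow(df))))
}

df <- data.frame(x1 = 1:5)
has_default_rownames(df)                     # TRUE

sub <- df[df$x1 %% 2 == 1, , drop = FALSE]   # keeps rows 1, 3, 5
has_default_rownames(sub)                    # FALSE: row names are "1" "3" "5"

# Reset only when needed
if (!has_default_rownames(sub)) row.names(sub) <- NULL
has_default_rownames(sub)                    # TRUE
```

As the subset step shows, base R subsetting with `[` can bring back non-default row names, so the check may be worth repeating after each deletion.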

system · November 24, 2021, 12:07pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
