# Fetch the frequency of numeric values in each column

Now I want to create a function that detects the frequency of numeric values in all columns, except columns whose names contain "Text".

I was thinking about logic like this: convert all the character values to numeric, then divide the number of values that converted by the total. If the result is 100%, the column is numeric; otherwise it is a character column.

Is there any other way to do this?

library(tidyverse)
df <-  tibble(Region = c("AU","USA",65,"USA","!UK",88,"USA","CA","!UK"),
lock = c(26,18,NA,1,"Test",15,NA,"21%",13),
type= c("sale",NA,NA,"target","target",NA,"sale",NA,"target"),
state_Text=c("TX","CA","NY","OT","DE","WN","WA","PH","NJ"))

I'm not sure I understand your goal. Here are two methods for finding the fraction of each column that can be coerced to a number. The as.numeric() function does not coerce "21%" to a number, but parse_number() does. I get 0.22 for the Region column and either 0.56 (as.numeric()) or 0.67 (parse_number()) for the lock column.

library(tidyverse)
df <-  tibble(Region = c("AU","USA",65,"USA","!UK",88,"USA","CA","!UK"),
lock = c(26,18,NA,1,"Test",15,NA,"21%",13),
type= c("sale",NA,NA,"target","target",NA,"sale",NA,"target"),
state_Text=c("TX","CA","NY","OT","DE","WN","WA","PH","NJ"))
str(df)
#> tibble [9 × 4] (S3: tbl_df/tbl/data.frame)
#>  $ Region    : chr [1:9] "AU" "USA" "65" "USA" ...
#>  $ lock      : chr [1:9] "26" "18" NA "1" ...
#>  $ type      : chr [1:9] "sale" NA NA "target" ...
#>  $ state_Text: chr [1:9] "TX" "CA" "NY" "OT" ...
map_dbl(df, ~ 1 - sum(is.na(as.numeric(.x)))/nrow(df))
#> Warning in .f(.x[[i]], ...): NAs introduced by coercion

#> Warning in .f(.x[[i]], ...): NAs introduced by coercion

#> Warning in .f(.x[[i]], ...): NAs introduced by coercion

#> Warning in .f(.x[[i]], ...): NAs introduced by coercion
#>     Region       lock       type state_Text
#>  0.2222222  0.5555556  0.0000000  0.0000000

map_dbl(df, ~ 1 - sum(is.na(parse_number(.x)))/nrow(df))
#> Warning: 7 parsing failures.
#> row col expected actual
#>   1  -- a number    AU
#>   2  -- a number    USA
#>   4  -- a number    USA
#>   5  -- a number    !UK
#>   7  -- a number    USA
#> ... ... ........ ......
#> See problems(...) for more details.
#> Warning: 1 parsing failure.
#> row col expected actual
#>   5  -- a number   Test
#> Warning: 5 parsing failures.
#> row col expected actual
#>   1  -- a number sale
#>   4  -- a number target
#>   5  -- a number target
#>   7  -- a number sale
#>   9  -- a number target
#> Warning: 9 parsing failures.
#> row col expected actual
#>   1  -- a number     TX
#>   2  -- a number     CA
#>   3  -- a number     NY
#>   4  -- a number     OT
#>   5  -- a number     DE
#> ... ... ........ ......
#> See problems(...) for more details.
#>     Region       lock       type state_Text
#>  0.2222222  0.6666667  0.0000000  0.0000000

Created on 2023-10-03 with reprex v2.0.2
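
If the goal is a reusable function that also leaves out the columns with "Text" in their names, something like the following might work. It is only a sketch: `numeric_fraction()` and its `skip_pattern` argument are names made up for illustration, and it assumes "numeric" means "as.numeric() returns a non-NA value". Note that `contains()` ignores case by default, so a column named "text_state" would be skipped as well.

```r
library(tidyverse)

df <- tibble(
  Region     = c("AU", "USA", 65, "USA", "!UK", 88, "USA", "CA", "!UK"),
  lock       = c(26, 18, NA, 1, "Test", 15, NA, "21%", 13),
  type       = c("sale", NA, NA, "target", "target", NA, "sale", NA, "target"),
  state_Text = c("TX", "CA", "NY", "OT", "DE", "WN", "WA", "PH", "NJ")
)

# Hypothetical helper (not from any package): for every column whose name
# does not contain `skip_pattern`, return the fraction of values that
# as.numeric() can convert. Coercion warnings are silenced on purpose.
numeric_fraction <- function(data, skip_pattern = "Text") {
  data %>%
    select(!contains(skip_pattern)) %>%
    map_dbl(~ mean(!is.na(suppressWarnings(as.numeric(.x)))))
}

numeric_fraction(df)
```

With the example data this should return 2/9 for Region, 5/9 for lock and 0 for type, and state_Text is not reported at all. Swapping `as.numeric()` for `parse_number()` inside the helper would count "21%" as numeric too.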

In my original data frame, every column is already defined as character. Now I want to check the frequency of numeric values that each column has.

Sorry, I do not understand the difference between what I did and what you want.

My result can be summarized as this: None of the values in your original data are numbers. All of the columns are characters. In the Region column, 2 of the 9 values can be converted to numbers, and in the lock column, 5 (with as.numeric()) or 6 (with parse_number()) of the 9 values can be converted. If you do the conversion, the values that cannot be converted are changed to NA.
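
To make the last point concrete, here is a small sketch using just the lock column, typed in as the character vector that str() shows. The variable names are only for illustration. It also shows one way to tell values that were already missing apart from values that failed to convert.

```r
library(readr)

# The lock column as it is actually stored (all character, two real NAs)
lock <- c("26", "18", NA, "1", "Test", "15", NA, "21%", "13")

# Values that cannot be converted become NA; warnings are silenced here
with_as_numeric   <- suppressWarnings(as.numeric(lock))
with_parse_number <- suppressWarnings(parse_number(lock))

sum(!is.na(with_as_numeric))    # values converted by as.numeric()
sum(!is.na(with_parse_number))  # values converted by parse_number()

# Entries that were present in the data but failed the conversion
failed_as_numeric   <- lock[!is.na(lock) & is.na(with_as_numeric)]
failed_parse_number <- lock[!is.na(lock) & is.na(with_parse_number)]
failed_as_numeric
failed_parse_number
```

Here as.numeric() fails on both "Test" and "21%", while parse_number() fails only on "Test". The two NA entries were missing before any conversion, so whether they should count against the column depends on what you mean by "frequency".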

What other information do you want to get?
