Column clean and find frequency

uwmarkyo · October 8, 2019, 9:18pm

I have a csv survey, it contains ethnicity.

I used gsub or aggregate to find the frequency but failed.
Please teach me how to clean this data to this following example:
American | 300
Indian | 100
Vietnamese | 60

thanks

Mark

andresrcs · October 8, 2019, 9:25pm

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

uwmarkyo · October 8, 2019, 9:49pm

Thanks.
However, I think mine is not an issue. It's I don't know how to do it. I hope to know learn how to clean it and extract the texts.

Thanks.

Mark

andresrcs · October 8, 2019, 10:13pm

Well if you don't know how to do it, I think that is an issue for you, and in order for us to help you we need sample data on a copy/paste friendly format. (like is explained in the link I gave you).

uwmarkyo · October 8, 2019, 11:18pm

Hi, andresrcs,
Sorry for my ignorance about the term.
I thought it means reproduce the process.

However, I tried the link you provided. I've encountered some issues as follow:

I've tried copy-paste from excel, but it wasn't allowed to paste with tipple.
I also tried others. Do you mind helping me understand my mistake?
Thanks.

Mark

andresrcs · October 8, 2019, 11:33pm

Hard to know with the information you are providing, this is a nice blog post about datpasta that might be of help for you.

Another option is to share a link to your csv file so we can download it and try to help you.

andresrcs · October 8, 2019, 11:48pm

I'm going to give you a small example of how to do this task, as long as you are able to read your data as a data frame, you can do something like this.

library(tidyverse)

# Sample data / you have to replace this by your actual dataset
sample <- data.frame(stringsAsFactors = FALSE,
                     ethnicity = c("[African]", "[African] & [Latino/Hispanic] & [Mexican]", 
                                 "[African] & [Middle Eastern/North African]",
                                 "[American Indian/Alaska Native]",
                                 "[American Indian/Alaska Native] & [Black or African American]"))

sample %>% 
    separate_rows(ethnicity, sep = "\\s&\\s") %>% 
    mutate(ethnicity = str_remove_all(ethnicity, "[\\[\\]]")) %>% 
    count(ethnicity)
#> # A tibble: 6 x 2
#>   ethnicity                         n
#>   <chr>                         <int>
#> 1 African                           3
#> 2 American Indian/Alaska Native     2
#> 3 Black or African American         1
#> 4 Latino/Hispanic                   1
#> 5 Mexican                           1
#> 6 Middle Eastern/North African      1

^{Created on 2019-10-08 by the reprex package (v0.3.0.9000)}

uwmarkyo · October 9, 2019, 3:39am

Hi, andresrcs,

Sorry for the late reply. I was in the class.

Thanks for the example.
By the way, why is the difference between factor and list or data.table?
Do you mind recommending a site so that I can understand more?
Thanks.

Mark

andresrcs · October 9, 2019, 4:07am

I think I don't really understand your question, you are mentioning some object classes but I don't understand what is your doubt about them.
If you are looking for a basic introduction to R I think this online book is a good resource.

system · October 21, 2019, 5:18pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.