Extracting data from csv file help

metalgearrex35 · November 13, 2018, 9:29pm

I am trying to extract data from a certain row from a csv file. When I run this code I get all three columns of Exam 1, 2 & 3 when I only want Exam 2 and the difference in the final exam. It also does not display the gender under the heading, only N/A. Does anyone have any idea what I'm missing? I can't seem to get it right.

exam.data %>%
  mutate(Gender = factor(Gender, levels=1:3,),
         diffs = Final-Exam2)

mfherman · November 13, 2018, 9:37pm

Hey there, @metalgearrex35! First of all, it is going to be easier for folks around here to help if you include a sample of the data and code you've used to get to where you are stuck. See this post for more information about best practices for writing questions and including a reproducible example (reprex).

But in general, it sounds like you want to select a subset of columns from your whole dataset. The dplyr function for doing that is (intuitively) called select(). If you want just the Exam2 and diffs columns, you would do this:

exam.data %>%
  mutate(Gender = factor(Gender, levels=1:3,),
         diffs = Final-Exam2) %>%
  select(Exam2, diffs)

As for the other issue regarding the gender variable, it will be easier to help if you include a sample of your data as well as what you expect the output to be.

metalgearrex35 · November 13, 2018, 10:04pm

Hi, @mfherman

I have only read in the data, so this is my starting point. My csv file has 5 columns of data with the following headings: Gender, Exam 1, Exam 2, Exam 3 and Final.

I want to extract the Gender column along with Exam 2 and the difference between Exam 2 and the Final score. I tried the code above and added Gender to the select() call, but it still does not display the gender, only N/A. It does display Exam 2 and the difference, though.


exam.data = read.csv(file="Exam.csv")

exam.data %>%
  mutate(Gender = factor(Gender, levels=1:3,),
         diffs = Final-Exam2) %>%
  select(Gender, Exam2, diffs)

mfherman · November 13, 2018, 10:21pm

Got it! It's best if you can include a small sample of the data that is contained in the CSV. Since you're the only one that has that particular CSV file, I don't know what's contained in it and why you might be seeing errors. Here's a great thread about how to include sample data in a post: Best Practices: how to prepare your own data for use in a `reprex` if you can’t, or don’t know how to reproduce a problem with a built-in dataset?
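As a rough sketch of what sharing sample data could look like: base R's dput() prints R code that recreates an object, so the output can be pasted straight into a post. The small data frame here is a made-up stand-in for the real CSV (its values and the choice of columns are assumptions), and with real data you would call dput() on whatever read.csv() returned.

```r
# Made-up stand-in for the real data (all values are hypothetical)
exam.data <- data.frame(
  Gender = c("male", "female", "female"),
  Exam2  = c(88, 90, 78),
  Final  = c(90, 95, 86)
)

# head() keeps the sample small; dput() prints R code
# that rebuilds those rows when someone else runs it
dput(head(exam.data, 3))
```

Anyone reading the post can then assign the printed structure(...) call to a variable and work with the same rows.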

So for your example, I used the tribble() function from the tibble package to create some made-up data, and then we can all use that same sample data to work through the problem.

In the first line of your mutate() it seems like you are trying to convert the Gender variable to a factor. Are you doing this because you will be plotting the data later? In any case, if you remove the levels argument, I believe you'll get what you are looking for.

library(tidyverse)

exam.data <- tribble(
  ~Gender, ~Exam1, ~Exam2, ~Exam3, ~Final,
  "male", 91, 88, 74, 90,
  "female", 98, 90, 91, 95,
  "female", 88, 78, 90, 86
  )

exam.data %>%
  mutate(
    Gender = factor(Gender),
    diffs = Final - Exam2
    ) %>%
  select(Gender, Exam2, diffs)
#> # A tibble: 3 x 3
#>   Gender Exam2 diffs
#>   <fct>  <dbl> <dbl>
#> 1 male      88     2
#> 2 female    90     5
#> 3 female    78     8

Created on 2018-11-13 by the reprex package (v0.2.1)
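
As for why the original call showed only NA: factor() turns any value that is not listed in levels into NA. Here is a small sketch, assuming the Gender column holds text like the made-up data above (the real CSV may well differ). The 1 = male, 2 = female coding in the last part is purely hypothetical, just to show what levels and labels would do with numeric codes.

```r
gender_text <- c("male", "female", "female")

# None of the strings match the levels 1, 2, 3, so every entry becomes NA
factor(gender_text, levels = 1:3)

# Leaving levels out lets factor() take the levels from the data itself
factor(gender_text)

# If the column were coded numerically instead (hypothetical coding),
# labels would map the codes to readable names
gender_code <- c(1, 2, 2)
factor(gender_code, levels = 1:2, labels = c("male", "female"))
```

So if the CSV stores gender as text, dropping levels is enough; if it stores codes, levels plus labels is probably closer to what was intended.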
