Hello everyone. First of all, I want to apologise in case I ask for anything I should not ask for.
It is my first time here and I don't know the rules well.
To introduce myself: I'm a first-year Data Science and Engineering student (I study in Madrid, Spain, so my English is not great, sorry for that).
I have an assignment that I have to do in RStudio. The assignment is about the survivors of the Titanic.
I was given a table with information on every passenger: their ticket price, their cabin, the class of their ticket, etc.
The table has approximately 600 rows and the variables (columns) listed below.
For each passenger, the variables tell us whether the passenger survived, which ticket class
they had, whether they were male or female, whether they had family aboard, how much they paid for the ticket, etc.
Variable | Description
Survived | Survival (0 = No, 1 = Yes )
Pclass | Ticket class ( 1 = 1st, 2 = 2nd, 3 = 3rd )
Sex | Sex (female, male)
Age | Age in years
SibSp | Number of siblings or spouses aboard the Titanic
Parch | Number of parents or children aboard the Titanic
Ticket | Ticket number
Fare | Passenger fare
Cabin | Cabin number
Embarked | Port of Embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
I attach the file with the table, named "titanic_train.Rdata".
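Because an .Rdata file restores objects under the names they were saved with, it can help to check which names `load()` actually brings in. The sketch below does not use the real file: it saves a tiny toy data frame (invented values, only the object and column names match the assignment) to a temporary file and loads it back, just to show that `load()` returns the names of the restored objects.

```r
# Toy stand-in with invented values; the real file is titanic_train.Rdata
titanic.train <- data.frame(
  Survived = c(0, 1),
  Pclass   = c(3, 1),
  Sex      = c("male", "female")
)

# Save to a temporary file and remove the object, to imitate a fresh session
tmp <- tempfile(fileext = ".Rdata")
save(titanic.train, file = tmp)
rm(titanic.train)

# load() returns (invisibly) the names of the objects it restored
loaded <- load(tmp)
print(loaded)

# Inspect the column types of the restored data frame
str(get(loaded[1]))
```

With the real file, the same idea would be `loaded <- load("titanic_train.Rdata")` followed by `str(titanic.train)`, assuming the object in the file is indeed called `titanic.train`.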
I have to ask questions about that information, relate the variables to each other, and show the answers with frequency plots, histograms, density plots, scatter plots, boxplots and line plots.
The questions I have chosen to develop for this assignment are:
Q1: Did passengers with family members on board (siblings, spouses, children or parents) have a better chance of survival?
Q2: Did people who paid for a cabin have a higher probability of survival?
Q3: Is the ticket class related to the probability of survival?
Q4: Is the ticket number related to the probability of survival?
Q5: Are sex, age and class variables that increase the chance of survival?
And my code for Q1 is:
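# Load the data; the file provides the data frame titanic.train
load("titanic_train.Rdata")
head(titanic.train)
# TRUE for passengers with no siblings, spouses, parents or children aboard
aux <- titanic.train$Parch == 0 & titanic.train$SibSp == 0
travels_alone <- rep("No", length(aux))
travels_alone[aux] <- "Yes"
titanic.train <- cbind(titanic.train, travels_alone)
# Survival proportions within each group (each row sums to 1)
prop.table(table(titanic.train$travels_alone, titanic.train$Survived), margin = 1)
library(ggplot2)
# Stacked proportions of Sex within each travels_alone group
ggplot(titanic.train) + aes(x = travels_alone, fill = Sex) +
  geom_bar(position = position_fill())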
and this is the bar chart I got.
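Since Q1 is about survival, filling the bars by Survived rather than Sex may be closer to the question. This is only a minimal sketch: `toy` is a small made-up data frame (invented values, only the column names match the real table), and `surv_tab` and `p` are names chosen here for illustration.

```r
library(ggplot2)

# Toy stand-in for titanic.train: invented values, real column names
toy <- data.frame(
  Survived = c(0, 1, 1, 0, 0, 1, 0, 1),
  SibSp    = c(0, 1, 0, 0, 2, 0, 0, 1),
  Parch    = c(0, 0, 0, 0, 1, 2, 0, 0)
)

# A passenger travels alone when both family counts are zero
toy$travels_alone <- ifelse(toy$SibSp == 0 & toy$Parch == 0, "Yes", "No")

# Survival proportions within each group (margin = 1 makes each row sum to 1)
surv_tab <- prop.table(table(toy$travels_alone, toy$Survived), margin = 1)
print(surv_tab)

# Bars show the share of survivors inside each travels_alone group
p <- ggplot(toy, aes(x = travels_alone,
                     fill = factor(Survived, labels = c("No", "Yes")))) +
  geom_bar(position = "fill") +
  labs(x = "Travels alone", y = "Proportion", fill = "Survived")
```

Typing `p` (or `print(p)`) draws the chart. With the real data, `toy` would be replaced by `titanic.train`.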
How can I continue with the other questions? It has been very hard for me to plot even the first question, and I don't even know if it's correct.
Can somebody help me?
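One possible way to continue with Q2 and Q3 would be to repeat the Q1 pattern: build a grouping variable, tabulate it against Survived, and plot the proportions. The sketch below is hedged in two ways: `toy` is a made-up data frame (invented values, real column names), and it assumes that a passenger without a cabin has Cabin stored as NA or as an empty string, which I have not checked in the real file.

```r
library(ggplot2)

# Toy stand-in for titanic.train: invented values, real column names
toy <- data.frame(
  Survived = c(0, 1, 1, 0, 0, 1, 0, 1),
  Pclass   = c(3, 1, 2, 3, 3, 1, 2, 1),
  Cabin    = c("", "A1", NA, "", "", "B2", "", "C3"),
  stringsAsFactors = FALSE
)

# Q2 (assumption: "no cabin" is recorded as NA or "")
toy$has_cabin <- ifelse(!is.na(toy$Cabin) & toy$Cabin != "", "Yes", "No")
cabin_tab <- prop.table(table(toy$has_cabin, toy$Survived), margin = 1)
print(cabin_tab)

# Q3: survival proportions within each ticket class
class_tab <- prop.table(table(toy$Pclass, toy$Survived), margin = 1)
print(class_tab)

# Proportion of survivors per ticket class as filled bars
p_class <- ggplot(toy, aes(x = factor(Pclass),
                           fill = factor(Survived, labels = c("No", "Yes")))) +
  geom_bar(position = "fill") +
  labs(x = "Ticket class", y = "Proportion", fill = "Survived")
```

For Q5, Age is numeric, so a density plot or boxplot of Age split by Survived would probably fit better than a bar chart.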