How can I make a good boxplot, scatterplot, and bar chart with this data?

Hello guys, first of all I want to apologise in case I ask for anything I should not ask.
This is my first time here and I don't know the rules well.

Next, I want to introduce myself: I'm a first-year Data Science and Engineering student (I study in Madrid, Spain, so my English is broken, sorry for that).

I have an assignment that I have to do in RStudio. The assignment is about the survivors of the Titanic:
they gave me a table with information on every passenger: their ticket price, their cabin, the class of their ticket, etc.

The table has these variables (columns) and approximately 600 rows.
For each passenger, the variables tell us whether the passenger survived, which class of ticket
they had, whether they were male or female, whether they had family aboard, how much they paid for the ticket, etc.

Variable | Description
Survived | Survival (0 = No, 1 = Yes)
Pclass | Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
Sex | Sex (female, male)
Age | Age in years
SibSp | number of siblings or spouses aboard the Titanic
Parch | number of parents or children aboard the Titanic
Ticket | Ticket number
Fare | Passenger fare
Cabin | Cabin number
Embarked | Port of Embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)

I attach the file with the table, named "titanic_train.Rdata".

I have to ask questions about that information, relate the variables to each other, and show the answers with frequency plots, histograms, density plots, scatter plots, boxplots and line plots.

The questions I have chosen to develop for this assignment are:

Q1: Did passengers with family members on board (siblings, spouses, children or parents) have a better chance of survival?

Q2: Did people who paid for a cabin have a higher probability of survival?

Q3: Is the ticket class related to the probability of survival?

Q4: Is the ticket number related to the probability of survival?

Q5: Are sex, age and class variables that increase the chances of survival?

And my code for Q1 is:

aux=titanic.train$Parch == 0 & titanic.train$SibSp == 0
travels_alone = rep("No", length (aux))
travels_alone[aux] = "yes"
titanic.train = cbind(titanic.train, travels_alone)
prop.table(table(titanic.train$travels_alone, titanic.train$Survived), margin = 1)

library (ggplot2)
ggplot (data=titanic.train)
ggplot(titanic.train) + aes(x = travels_alone, fill = Sex) +
geom_bar(position = position_fill())

The bar chart I got is shown further below.

How can I continue with the other questions? It has been very hard for me to plot even the first question, and I don't even know if it's correct.

Can somebody help me?

The plot for the first question looks like this:
[Image: captura histo]

But I have problems with question 4, because I don't see any relation between the ticket
number and a higher survival rate...

You need to carefully consider what the plot has to show to answer your question, e.g. by sketching it with pen and paper first.

For example in your case you show the distribution of the sexes, but not the survival rate.
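Also, your ggplot call isn't quite right (although it works): the first ggplot(data = titanic.train) line on its own only draws an empty plot and can be dropped.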
Here is an example for Q1:

ggplot(data = titanic.train, 
       aes(x = travels_alone, fill = as.factor(Survived))) +
  geom_bar(position = position_fill()) + 
  labs(fill = "survived?",
       y = "proportion")

First you define the data source, then the aesthetics (aes, i.e. which variable is mapped to which visual property), and finally the geom, which defines how the data is shown.

Some hints for the other questions:
Q2: Difficult one. You could derive it from the cabin number, but I guess that column only tells you whether the cabin is known (and this might be biased towards the surviving passengers, as they could remember and report the number or still had their ticket).
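A minimal sketch of that idea, not a definitive solution. The data frame toy is made up and only stands in for titanic.train, cabin_known is a hypothetical column name, and I am assuming an unknown cabin is stored as NA or as an empty string (check this in your data first):

```r
library(ggplot2)

# Toy stand-in for titanic.train (made-up rows, for illustration only)
toy <- data.frame(
  Survived = c(1, 0, 1, 0, 0, 1),
  Cabin    = c("C85", "", "E46", NA, "", "B28"),
  stringsAsFactors = FALSE
)

# Assumption: an unknown cabin is stored as NA or as an empty string
toy$cabin_known <- ifelse(is.na(toy$Cabin) | toy$Cabin == "", "no", "yes")

# Row-wise proportions: survival rate within each cabin_known group
prop.table(table(toy$cabin_known, toy$Survived), margin = 1)

# Same kind of filled bar chart as for Q1
p_cabin <- ggplot(toy, aes(x = cabin_known, fill = as.factor(Survived))) +
  geom_bar(position = position_fill()) +
  labs(fill = "survived?", y = "proportion")
p_cabin
```

Keep the possible bias in mind when you interpret the result: a known cabin is not necessarily the same thing as having paid for a cabin.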

Q3: X-axis = Pclass, Y-axis = Survived (e.g. the proportion of survivors in each class)
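One possible way to put that on a plot, as a rough sketch: because Survived is coded 0/1, its mean per class is the proportion of survivors. The data frame toy is made up and only stands in for titanic.train, and rates is a hypothetical helper name:

```r
library(ggplot2)

# Toy stand-in for titanic.train (made-up rows, for illustration only)
toy <- data.frame(
  Pclass   = c(1, 1, 2, 2, 3, 3, 3, 3),
  Survived = c(1, 1, 1, 0, 0, 0, 1, 0)
)

# Mean of a 0/1 variable = proportion of survivors in each class
rates <- aggregate(Survived ~ Pclass, data = toy, FUN = mean)
rates

# One bar per ticket class, bar height = survival rate
p_class <- ggplot(rates, aes(x = factor(Pclass), y = Survived)) +
  geom_col() +
  labs(x = "ticket class", y = "proportion survived")
p_class
```

The filled bar chart from Q1 with x = factor(Pclass) would show the same information; this version just makes the rate itself the y-axis.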

Q4: Very difficult one. You can think of a scatterplot with x = ticket number and y = survived. There are a few problems: Survived only takes the values 0 and 1, and the ticket number contains letters that should be removed first. However, if there is no relation, that is also an answer...
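A hedged sketch of how the cleaning and the scatterplot could look. The ticket strings below are made-up examples of the kind of values I would expect, toy only stands in for titanic.train, and ticket_num is a hypothetical column name. I am assuming the numeric part of interest is the run of digits at the end of the ticket:

```r
library(ggplot2)

# Toy stand-in for titanic.train (made-up rows, for illustration only)
toy <- data.frame(
  Ticket   = c("A/5 21171", "PC 17599", "113803", "STON/O2. 3101282", "LINE"),
  Survived = c(0, 1, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Keep only the trailing digits; tickets without any stay NA
m <- regexpr("[0-9]+$", toy$Ticket)
toy$ticket_num <- rep(NA_real_, nrow(toy))
toy$ticket_num[m != -1] <- as.numeric(regmatches(toy$Ticket, m))

# Drop tickets without a number before plotting
toy_num <- toy[!is.na(toy$ticket_num), ]

# Survived is only 0 or 1, so jitter vertically to reduce overplotting
p_ticket <- ggplot(toy_num, aes(x = ticket_num, y = Survived)) +
  geom_jitter(height = 0.05, width = 0, alpha = 0.5) +
  labs(x = "ticket number (digits only)", y = "survived (0/1)")
p_ticket
```

If the points for 0 and 1 are spread in the same way along the x-axis, that suggests there is no visible relation, which is a valid answer to the question.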

Hi, welcome!

Please have a look at our homework policy. Homework-inspired questions are welcome, but they should not include verbatim instructions from your course.
