Hey everyone, beginner to R here. I am using the "starwars" dataset. Just wondering, are there any other efficient ways to combine the sum of the height column with the sum of the mass column besides the approach below? It was the easiest/most efficient way I could think of. Thank you.
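Roughly what I did (a sketch; the column names are just what I picked, and na.rm = TRUE is needed because both columns contain NAs):

```r
library(dplyr)

# Total each column separately, then combine the two totals
starwars %>%
  summarise(height_total = sum(height, na.rm = TRUE),
            mass_total   = sum(mass, na.rm = TRUE)) %>%
  mutate(combined_total = height_total + mass_total)
```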
There is no particular "goal." I was just working with the summarise, separate, and unite functions and with the concept of combining two columns. So I was wondering whether there were any other efficient ways to add the totals of the two columns together other than the way I did it, which I think is pretty efficient, but it's always nice to know other methods.
What you mean by "efficient" is unclear: from a coding perspective or a computational one? Anyway, what you did looks pretty efficient to me (from both perspectives). Of course, you could have done it in one step if you are not interested in the separate height and mass sums as well.
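Something along these lines (a sketch; na.rm = TRUE drops the missing values in both columns):

```r
library(dplyr)

# One step: add the two column sums directly
starwars %>%
  summarise(combined_total = sum(height, na.rm = TRUE) + sum(mass, na.rm = TRUE))
```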
Both, actually. I don't work with very large data sets, so I personally don't benefit all that much on the computational side, but there are reports of very impressive gains in processing time for some (though not all) functions versus base R or {dplyr}. Reading and writing a .csv file with fread() and fwrite(), for example, gives impressive time savings over read.csv() and write.csv().
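A quick way to see the difference for yourself (a sketch; the million-row data frame is made up purely for illustration):

```r
library(data.table)

# Made-up example data written to a temporary file
df <- data.frame(x = rnorm(1e6), y = rnorm(1e6))
path <- tempfile(fileext = ".csv")

system.time(write.csv(df, path, row.names = FALSE))  # base R
system.time(fwrite(df, path))                        # data.table

system.time(read.csv(path))  # base R
system.time(fread(path))     # data.table
```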
{data.table} is also reported to be considerably more memory efficient in some operations.
From a coding perspective, {data.table} syntax is often less verbose than base R or {dplyr}. Here is a very simple example.
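For instance, the same combined sum on starwars (a sketch; the as.data.table() call is just to have a data.table copy to work with):

```r
library(dplyr)
library(data.table)

# dplyr
starwars %>%
  summarise(total = sum(height, na.rm = TRUE) + sum(mass, na.rm = TRUE))

# data.table
sw <- as.data.table(starwars)
sw[, sum(height, na.rm = TRUE) + sum(mass, na.rm = TRUE)]
```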
The character count difference is not huge in this case, but it is noticeable. It can be quite impressive in more complicated statements.
One thing I have learned is that there can be severe conflicts between {data.table} and the {tidyverse}, and I am a great fan of many parts of the {tidyverse}, especially {lubridate} and {ggplot2}.
One should always load {data.table} before {tidyverse}.
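Loading in that order means the tidyverse versions of the overlapping functions take precedence (a sketch; the comments note the masking messages you would see):

```r
library(data.table)
library(tidyverse)

# dplyr now masks data.table's between(), first(), and last(),
# purrr masks transpose(), and lubridate masks several date helpers
# (hour(), month(), week(), etc.), so the tidyverse versions are
# the ones called by default.
```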
I sometimes work with large datasets but have never tried anything beyond the standard vectorization done by base R (and inherited by the tidyverse) for such simple operations. Of course, there are many ways to parallelize the computation if you need to save time. On the other hand, you're right to point out that data.table::fread() is much quicker than read.csv() or readr::read_csv(), although I prefer the Parquet format from {arrow} when dealing with large datasets.
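In case it is useful, the {arrow} round trip is just as simple (a sketch; the file name is made up):

```r
library(arrow)

# Write and read Parquet instead of CSV; it is a compressed binary
# format that preserves column types and is typically much faster
# to read back than a large CSV.
write_parquet(mtcars, "mtcars.parquet")
dat <- read_parquet("mtcars.parquet")
```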
Just for fun: I think I have almost exact coding equivalents for the original question with the starwars data. Note that I am not including the conversion of starwars to data.table format, as I am assuming it is the standard working format.
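Something like this (a sketch; it assumes starwars has already been converted with as.data.table(), and the column names are just my choices):

```r
library(dplyr)
library(data.table)

# dplyr
starwars %>%
  summarise(height_total = sum(height, na.rm = TRUE),
            mass_total   = sum(mass, na.rm = TRUE)) %>%
  mutate(combined = height_total + mass_total)

# data.table equivalent (starwars assumed to already be a data.table)
starwars[, .(height_total = sum(height, na.rm = TRUE),
             mass_total   = sum(mass, na.rm = TRUE))
         ][, combined := height_total + mass_total][]
```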