My aim is to convert this data frame into a table that adds up the numerical values for each gene to generate a summed score for each cell type. It should look something like this:
CellType Gene1 Gene2
CellA 6 30
CellB 2 10
CellC 7 35
Any suggestions for possible function/code that I can apply towards my data frame to get this type of table? Many thanks in advance!
Hi @Adam52: In principle, the specific task you mention has a straightforward solution, but as usual, the devil is often in the details, so it's best if you can post a sample of your actual data or something that mimics it as closely as possible.
The sample table you posted is a good start, but isn't easily copied and pasted by folks who would like to help, and doesn't contain details about that structure of the table. It would be helpful if instead you could paste a sample of your data like this:
Here's code that may accomplish the narrow task you described, but may not be the ideal solution in your specific context, which could recommend a restructuring of your actual data:
Is it possible to edit the code in order to display the summed values to the nearest 2 decimal places (in case I have numerical values that are not whole numbers)?
Thanks again for all your help. And I'll be sure to use reprex the next time around.
The first answer to the post Formatting Decimal places in R suggests how to control the printing of decimal places, but this is different from controlling how numbers look in tibble output.
For example, this code applies this suggestion to replace numbers by string representations of them
so they look the way you want, but they're no longer numbers. It may be possible to control how tibble output represents numbers, rather than changing numbers to strings, but I'm not sure how to do that.
Thanks, David. Now that I am thinking more about my dataset, I was wondering if its possible to modify the summarise function if my data set has characters instead of numbers. For example:
Is it possible to re-work the summarise function so it recognises text characters, sums them up and displays a total score? For instance, I want to assign a value of 0 to "NULL" or "Wt" and 1 to "Mut". The end result after summation would look something like this:
CellType Gene1 Gene2
CellA 1 1
CellB 1 0
CellC 1 2
I think I could replace the is.numeric function with is.character but not sure about the next steps.
Yes, the command case_when() should help. However, your 'after' table seems at odds with the assignment rules you gave, and could you post a sample of your 'before' table like this?
Also, can you please clarify the issue with the assignment rules in the 'after' table? Essentially, my aim is to generate a summed score per cell type if R can recognise the 3 text items I had mentioned. For example, in Cell C, Gene 2 has two mutations (i.e. 'Mut'). Since each mutation has a score of 1, then the resulting addition should produce a value of 2 as seen in the 'after' table.
Would greatly appreciate your help with coming up with a code that can allow me to get this type of result.
Hi @Adam52, yes: In tidyverse data-transformation commands, it is often possible to refer to columns by their names, in a way that mimics referring to them by their index.
For example, if you're interested in working with columns 3, 7 through 10, and 16, you could write this as 3, 7:10, 16 or in your case, as Gene2, Gene6:Gene9, Gene15. In these contexts, positive indices (or names) indicate the columns you want to include, and negative indices (or names), the ones you want to exclude. (This is reflects how indices are used more generally in vectors and other objects in R.)
Some commands, like select() can take these expressions directly, but others, like mutate_if(), require an intermediary function, like vars(), to do this.