Basic analysis (characterizing a distribution) of an Excel sheet imported into R

Hello,
I am struggling to do a very basic kind of analysis on an Excel sheet imported into R.
I am trying to sum the values of a numeric column, and I keep getting the same result: NA.

File type: I imported two Excel files: 1) a Microsoft Excel 97-2003 Worksheet (.xls) and 2) a Microsoft Excel Template (.xlt).

I imported both files using the following commands: 1) library(readxl), 2) d <- read_excel("----"), then 3) head(d) to check the column headers. Both files were imported successfully.

Class of data: the column I am analyzing is titled "Dollar Sales". I checked the column type using the class() function, and the answer is "numeric".

I am really wondering why I cannot do a very basic analysis, a simple summation, on a numeric column.

A screenshot is attached.

If one of the values is missing in the Excel file and was replaced by NA, then the whole sum will evaluate to NA unless you set na.rm = TRUE. Try:

sum(d1$`Dollar Sales`, na.rm = TRUE)
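
To see why a single missing value is enough to turn the total into NA, here is a small sketch on a made-up vector. The name `sales` and its values are invented for illustration and are not taken from the spreadsheet.

```r
# Hypothetical values standing in for the `Dollar Sales` column
sales <- c(10.5, 20, NA, 4.5)

total_default <- sum(sales)                # NA: the missing value propagates through the sum
total_no_na   <- sum(sales, na.rm = TRUE)  # missing values are dropped before summing
n_missing     <- sum(is.na(sales))         # number of missing values in the vector

print(total_default)
print(total_no_na)
print(n_missing)
```

Counting the missing values with is.na() first is a cheap way to find out whether na.rm = TRUE is hiding a few gaps or a large share of the column.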

Super, it worked.
Thank you very much.
You are right, the reason was missing values.

I have now adjusted the code I originally used to import my Excel data so that it reads as follows:

d <- read_excel("C:/Users/mebak/Desktop/Tiny Data.xlsx", sheet = "Store 312 Cleansed", range = "A1:L3528", na = "")

Now the problem is solved: I can write sum(d$`Dollar Sales`) without na.rm = TRUE and the result is displayed. I don't receive NA any more.
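
One way to confirm that an import really left no missing values behind is to count the NAs per column. The sketch below uses a small made-up data frame in place of the imported sheet; `d_demo` and its contents are hypothetical.

```r
# Hypothetical stand-in for the imported sheet
d_demo <- data.frame(
  Store = c(312, 312, 312),
  `Dollar Sales` = c(100, NA, 250),
  check.names = FALSE   # keep the space in the column name
)

# Number of missing values in each column
na_counts <- colSums(is.na(d_demo))
print(na_counts)

# Keep only the rows that are complete in every column, then sum
d_complete <- d_demo[complete.cases(d_demo), ]
total <- sum(d_complete$`Dollar Sales`)
print(total)
```

If every count is zero, sum() and the other summary functions will work without na.rm = TRUE.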

May I ask two more questions?
Question 1: I am trying to use R on the same data set to calculate the following characteristics of the distribution: A) sum, B) min, C) mode, D) median, E) mean, F) max, G) var, H) sd.
To do so, I am using 8 separate lines of code, one for each function. Is there a way to use one line of code instead of 8?
Here are my lines of code:
sum(d$`Dollar Sales`)
min(d$`Dollar Sales`)
mode(d$`Dollar Sales`)
median(d$`Dollar Sales`)
mean(d$`Dollar Sales`)
max(d$`Dollar Sales`)
var(d$`Dollar Sales`)
sd(d$`Dollar Sales`)

Is there one line of code that I can use instead?

Question 2: Why, when I tried to calculate the mode of the data, did I receive "numeric", although results were displayed for the rest (sum, min, median, mean, max, var, sd)?

The mode( ) function does not calculate the statistical mode. The description in the help file is "Get or set the ‘mode’ (a kind of ‘type’), or the storage mode of an R object."
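
The difference can be seen on a small made-up vector. In this sketch, `stat_mode` is a hypothetical helper, not a base R function. It assumes simple numeric input and, on ties, returns the smallest of the tied values.

```r
x <- c(2, 3, 3, 5, 3, 2)   # made-up values

# Base R mode(): reports the storage mode of the object, not a statistic
storage <- mode(x)
print(storage)

# Hypothetical helper: the most frequent value, found with table()
stat_mode <- function(v) {
  counts <- table(v)                            # frequency of each distinct value
  as.numeric(names(counts)[which.max(counts)])  # value with the highest frequency
}

most_common <- stat_mode(x)
print(most_common)
```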

I don't have d$`Dollar Sales`, but I do have iris$Petal.Length:

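library(tidyverse)

# Print the column being summarised
iris$Petal.Length

# One summarise() call computes every statistic; across() names each
# result <column>_<function name>
summarise(iris,
        across(Petal.Length,
               .fns = list(sum=sum,
                           min=min,
                           # statistical mode: the most frequent value
                           mode=function(x){
                             ux <- unique(x)
                             ux[which.max(tabulate(match(x, ux)))]
                           },
                           mean=mean,
                           max=max,
                           var=var,
                           sd=sd)))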
  Petal.Length_sum Petal.Length_min Petal.Length_mode Petal.Length_mean Petal.Length_max Petal.Length_var Petal.Length_sd
1            563.7                1               1.4             3.758              6.9         3.116278        1.765298
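
The same idea can be sketched in base R, with median included and a column named `Dollar Sales` as in the question. The data frame `d_demo`, its values, and the helper `stat_mode` are hypothetical stand-ins, and the sketch assumes the column has no missing values.

```r
# Hypothetical stand-in for the imported data
d_demo <- data.frame(`Dollar Sales` = c(12, 7, 7, 20, 4), check.names = FALSE)

# Hypothetical helper: most frequent value (first one wins on ties)
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Named list of the eight statistics asked for in the question
stats <- list(sum = sum, min = min, mode = stat_mode, median = median,
              mean = mean, max = max, var = var, sd = sd)

# Apply every function to the column; the result is a named numeric vector
result <- sapply(stats, function(f) f(d_demo$`Dollar Sales`))
print(result)
```

Once the list of functions is defined, the sapply() call is the single line that replaces the eight separate calls.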