Basic analysis (characterizing a distribution) of an Excel sheet imported into R

Bakr · December 18, 2021, 5:32pm

Hello,
I am struggling to do a very basic analysis of an Excel sheet imported into R.
I am trying to sum the values of a numeric column, and I always get the same result: NA.

File type: I imported two Excel files: 1) a Microsoft Excel 97-2003 Worksheet (.xls) and 2) a Microsoft Excel Template (.xlt).

I imported both files using the following commands: 1) library(readxl), 2) d <- read_excel("----"), then 3) head(d) to check the column headers. Both files were imported successfully.

Class of data: the column I am analyzing is titled "Dollar Sales". I checked the column type using the class() function, and the answer is "numeric".

I am really wondering why I cannot do such a basic analysis, a simple summation of a numeric column.

A screenshot is attached.

FJCC · December 18, 2021, 5:58pm

If one of the values is missing in the Excel file and was replaced by NA, then the whole sum will be evaluated to NA unless you set na.rm = TRUE. Try

sum(d1$`Dollar Sales`, na.rm = TRUE)
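To illustrate why this happens, here is a minimal sketch using a small made-up vector (the values are hypothetical, not taken from your file):

```r
# Hypothetical values standing in for the "Dollar Sales" column
x <- c(10.5, 20, NA, 4.5)

sum(x)                 # NA: a single missing value propagates through the sum
sum(x, na.rm = TRUE)   # 35: missing values are dropped before summing
sum(is.na(x))          # 1: how many values are missing
```

Counting the missing values with sum(is.na(...)) is a quick way to confirm whether NAs are the cause before reaching for na.rm = TRUE.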

Bakr · December 19, 2021, 9:58am

Super, it worked.
Thank you very much.
You are right, the reason is missing values.

I have now adjusted the code I originally used to import my Excel data, so that it reads as follows:

d <- read_excel("C:/Users/mebak/Desktop/Tiny Data.xlsx", sheet = "Store 312 Cleansed", range = "A1:L3528", na = "")

Now the problem is solved: I can write sum(d$`Dollar Sales`) without na.rm = TRUE and the result is displayed. I don't get NA any more.

May I ask two more questions?
Question 1: I am trying to use R on the same data set to calculate the following characteristics of the distribution: A) sum, B) min, C) mode, D) median, E) mean, F) max, G) var, H) SD.
To do so, I am using 8 different lines of code, one for each function. Is there a way to use one line of code instead of 8?
Here are my lines of code:
sum(d$`Dollar Sales`)
min(d$`Dollar Sales`)
mode(d$`Dollar Sales`)
median(d$`Dollar Sales`)
mean(d$`Dollar Sales`)
max(d$`Dollar Sales`)
var(d$`Dollar Sales`)
sd(d$`Dollar Sales`)

Is there one line of code that I can use instead?

Question 2: Why, when I tried to calculate the mode of the data, did I receive "numeric", although results were displayed for the rest (sum, min, median, mean, max, var, SD)?

EconProf · December 19, 2021, 6:12pm

The mode( ) function does not calculate the statistical mode. The description in the help file is "Get or set the ‘mode’ (a kind of ‘type’), or the storage mode of an R object."
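A minimal sketch of that behaviour, using made-up values rather than your data:

```r
x <- c(1, 2, 2, 3)   # made-up values; 2 is the most frequent one

mode(x)      # "numeric": describes how x is stored, not its most frequent value
typeof(x)    # "double"
mode("a")    # "character"
```

Base R has no built-in function for the statistical mode, so it has to be computed by hand or taken from a package.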

nirgrahamuk · December 19, 2021, 6:26pm

I don't have d$`Dollar Sales`, but I do have iris$Petal.Length:

library(tidyverse)

iris$Petal.Length

summarise(iris,
        across(Petal.Length,
               .fns = list(sum=sum,
                           min=min,
                           mode=function(x){
                             ux <- unique(x)
                             ux[which.max(tabulate(match(x, ux)))]
                           },
                           mean=mean,
                           max=max,
                           var=var,
                           sd=sd)))

  Petal.Length_sum Petal.Length_min Petal.Length_mode Petal.Length_mean Petal.Length_max Petal.Length_var Petal.Length_sd
1            563.7                1               1.4             3.758              6.9         3.116278        1.765298
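If you would rather stay in base R, something along these lines might also work. This is only a sketch: the data frame d below is a made-up stand-in for your sheet, and stats is just an illustrative name. Note the backticks, which are needed because the column name contains a space.

```r
# Made-up stand-in for the real data; check.names = FALSE keeps the space in the name
d <- data.frame(`Dollar Sales` = c(5, 3, 3, 9, NA), check.names = FALSE)

x <- d$`Dollar Sales`

# Apply each summary function to the same column, ignoring missing values
stats <- sapply(
  list(sum = sum, min = min, median = median, mean = mean,
       max = max, var = var, sd = sd),
  function(f) f(x, na.rm = TRUE)
)
stats
```

The result is a named numeric vector, so a single value can be pulled out with, for example, stats[["median"]]. The statistical mode is left out here because it needs a custom function like the one above.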
