How to handle the error "arguments imply differing number of rows" when the row count is consistent across all columns?

jaggu_ramesh · September 5, 2023, 5:39pm

Hi,
I have a data frame where one of the columns contains a time (HH:MM:SS) stored as character.
I am trying to convert it into minutes to make further calculations easier.
I wrote this function, which does the conversion with matrix multiplication:

time2sec <- function(x) {
  c(as.matrix(read.table(text = x, sep = ":")) %*% c(3600, 60, 1))
}
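For readers unfamiliar with the trick: read.table() splits each string on ":" into three columns (hours, minutes, seconds). The matrix product with c(3600, 60, 1) then weights those columns into total seconds. Below is a minimal sketch that runs the same function on a few well-formed sample times. The sample values are made up.

```r
# Same function as above: split "HH:MM:SS" strings on ":" into an
# hours/minutes/seconds matrix, then weight each column to get seconds
time2sec <- function(x) {
  c(as.matrix(read.table(text = x, sep = ":")) %*% c(3600, 60, 1))
}

# Hypothetical sample values, all well-formed
times <- c("01:30:00", "00:00:45", "02:15:30")
secs <- time2sec(times)
secs       # 5400 45 8130 (seconds)
secs / 60  # 90 0.75 135.5 (minutes)
```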

My original data frame is huge, with 1698572 rows,
so I created a subset to try it out:

subDF <- head(cleanDF,100)
subDF <- transform(subDF, ride_time_in_minutes = time2sec(ride_length)/60)

It works beautifully, giving a new numeric column 'ride_time_in_minutes'.
But when I try it on the entire data frame, it fails with the message
arguments imply differing number of rows: 1698572, 1697551
I have already checked for null values and 0s, and the row counts are consistent in the original data set.
I'm confused about how to debug this.
Any idea what is going wrong?
Thanks in advance!
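Some background on the message itself. transform() adds the new column by calling data.frame(), and data.frame() raises this error when the new column's length differs from the existing row count and does not divide it evenly. The two numbers in the message are those lengths. Here that would mean time2sec() returned 1,021 fewer values (1698572 - 1697551) than there are rows. The sketch below reproduces the message on hypothetical toy data.

```r
# A toy data frame standing in for cleanDF (hypothetical data)
df <- data.frame(id = 1:4)

# A "new column" that is one element short, as if some inputs were lost
short_col <- c(10, 20, 30)

# transform() hands the new column to data.frame(), which refuses to
# combine 4 rows with a length-3 vector (4 is not a multiple of 3)
msg <- tryCatch(
  transform(df, new = short_col),
  error = function(e) conditionMessage(e)
)
msg  # "arguments imply differing number of rows: 4, 3"
```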

technocrat · September 5, 2023, 9:27pm

Try checking with

dim(cleanDF)[1] - sum(complete.cases(cleanDF))

If NA values are present, that's the problem (definitely if the difference is 1,021, the gap between 1698572 and 1697551).
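The check above counts how many rows are affected. To see which columns hold the missing values, colSums(is.na()) gives a per-column count. A sketch on a hypothetical toy data frame follows. The column names are illustrative only.

```r
# Hypothetical toy data
toy <- data.frame(
  ride_length = c("00:10:00", NA, "00:05:30", "00:01:00"),
  station     = c("A", "B", NA, NA),
  stringsAsFactors = FALSE
)

# NA count per column: ride_length = 1, station = 2
colSums(is.na(toy))

# Rows with at least one NA anywhere (the check suggested above)
dim(toy)[1] - sum(complete.cases(toy))   # 3
```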

jaggu_ramesh · September 6, 2023, 2:31am

dim(cleanDF)[1] - sum(complete.cases(cleanDF))

When I run this, I get 15164.
But since I am passing only one column, 'ride_length', to the function, I ran it like this:

dim(cleanDF$ride_length)[1] - sum(complete.cases(cleanDF$ride_length))

This gives me integer(0).
Should I clean missing values in all columns?

technocrat · September 6, 2023, 3:57am

The aim is to get all columns to have the same number of rows. The purpose of dim(object)[1] is to get the number of rows: dim() returns a vector of length two with the row and column counts (you can also use nrow() or ncol()). But all of these require an object with more than one dimension. A single column of a data frame has only a length(). As a result, dim(cleanDF$ride_length)[1] evaluates to NULL, and that makes the result of

dim(cleanDF$ride_length)[1] - sum(complete.cases(cleanDF$ride_length))

integer(0), a zero-length vector, rather than a count. For a single column, use length(cleanDF$ride_length) in place of dim()[1].
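The sketch below illustrates this on a hypothetical one-column data frame. dim() and nrow() return NULL for a plain vector, and arithmetic with NULL gives a zero-length vector instead of a number.

```r
# Hypothetical data: one numeric column with one NA
dat <- data.frame(x = c(1, NA, 3))
col <- dat$x

dim(dat)       # 3 1 (rows, columns)
dim(col)       # NULL: a plain vector has no dim attribute
nrow(col)      # NULL for the same reason

# NULL minus a number gives a zero-length vector, not 0
dim(col)[1] - sum(complete.cases(col))   # integer(0)

# For a single column, count with length() (NROW() also works on vectors)
length(col) - sum(complete.cases(col))   # 1
sum(is.na(col))                          # 1, the same count, more directly
```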

This result

dim(cleanDF)[1] - sum(complete.cases(cleanDF))

tells us that 15,164 rows have one or more columns with missing values. With a dataset as big as this, discarding them is sensible.

recleanedDF <- cleanDF[complete.cases(cleanDF), ]

which reads: subset cleanDF, taking all rows with no NA values and all columns.
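Here is a small sketch of that subsetting on hypothetical data. It also shows na.omit(), which is not mentioned in the thread but drops the same rows from a data frame.

```r
# Hypothetical data with NAs in different columns
dat <- data.frame(
  ride_length = c("00:10:00", NA, "00:05:30"),
  member      = c("yes", "no", NA),
  stringsAsFactors = FALSE
)

# complete.cases() gives one TRUE/FALSE per row; the empty slot after
# the comma keeps all columns
recleaned <- dat[complete.cases(dat), ]
nrow(recleaned)      # 1

# For a data frame, na.omit() drops the same rows
nrow(na.omit(dat))   # 1
```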

jaggu_ramesh · September 6, 2023, 6:13am

Thanks for your reply.
Now, using

recleanedDF <- cleanDF[complete.cases(cleanDF), ]

I removed all rows with NA values.
But the same error still occurs!
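One plausible explanation, which the thread never confirms, is empty strings. complete.cases() only looks for NA, so rows where ride_length is the empty string "" survive the cleaning. read.table() inside time2sec() skips blank lines by default (blank.lines.skip = TRUE), so each such string produces no output value and the result comes back shorter than the data frame. If that is the cause, the 1,021 gap in the error message would be the number of such rows. A sketch with hypothetical values:

```r
time2sec <- function(x) {
  c(as.matrix(read.table(text = x, sep = ":")) %*% c(3600, 60, 1))
}

# Hypothetical data: one empty string, no NA
rides <- data.frame(ride_length = c("00:10:00", "", "01:00:05"),
                    stringsAsFactors = FALSE)

# "" is not NA, so complete.cases() keeps every row
sum(complete.cases(rides))              # 3

# ...but read.table() skips the blank line, so only 2 values come back
length(time2sec(rides$ride_length))     # 2

# Flag entries that are not H:MM:SS before converting
bad <- !grepl("^[0-9]+:[0-9]{2}:[0-9]{2}$", rides$ride_length)
sum(bad)                                # 1
```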

technocrat · September 6, 2023, 8:28am

I got distracted by the title of the post and went off in the wrong direction. Try this on a copy of your data:

# Load the lubridate package
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

# Create the data frame
d <- data.frame(
  hms = c("3:10:01", "23:10:02", "", NA, "asdf", "212", "23:10:02"),
  stuff = rep(TRUE, 7),
  more = 1:7,
  last = c("A", NA, LETTERS[3:7])
)

# Convert the hms column to a period object using the hms() function from lubridate
d$hms <- hms(d$hms)
#> Warning in .parse_hms(..., order = "HMS", quiet = quiet): Some strings failed
#> to parse, or all strings are NAs

# Convert the period object to seconds using the as.numeric() function
d$hms <- as.numeric(d$hms, "seconds")

# Display the modified data frame
d
#>     hms stuff more last
#> 1 11401  TRUE    1    A
#> 2 83402  TRUE    2 <NA>
#> 3    NA  TRUE    3    C
#> 4    NA  TRUE    4    D
#> 5    NA  TRUE    5    E
#> 6    NA  TRUE    6    F
#> 7 83402  TRUE    7    G

Created on 2023-09-06 with reprex v2.0.2
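If you would rather not add a package dependency, here is a base-R sketch of the same idea. It is an alternative, not what was used in the thread. The helper hms_to_seconds() is hypothetical. It returns exactly one value per input, and NA for anything that is not H:M:S, so the result always lines up with the data frame rows. On the same inputs as the reprex it gives the same seconds.

```r
# Hypothetical helper: convert "H:MM:SS" strings to seconds, one value per input.
# Anything that does not match the pattern (including "" and NA) becomes NA.
hms_to_seconds <- function(x) {
  ok <- !is.na(x) & grepl("^[0-9]+:[0-9]{1,2}:[0-9]{1,2}$", x)
  out <- rep(NA_real_, length(x))
  parts <- strsplit(x[ok], ":", fixed = TRUE)
  out[ok] <- vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)),
                    numeric(1))
  out
}

x <- c("3:10:01", "23:10:02", "", NA, "asdf", "212", "23:10:02")
hms_to_seconds(x)
# 11401 83402 NA NA NA NA 83402

# Same length as the input, so it is safe inside transform(), e.g.
# transform(cleanDF, ride_time_in_minutes = hms_to_seconds(ride_length) / 60)
```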

jaggu_ramesh · September 6, 2023, 9:28am

Worked like a charm!
I learnt a lot by getting stuck here.
Thanks a lot!

maritzawyman · September 6, 2023, 2:22pm

Thanks for sharing this wonderful knowledge.
It works beautifully with a new column 'ride_time_in_minutes' of numeric datatype. However, when I try it on the entire DataFrame, it fails with the following message:

Regards, Pro

nirgrahamuk · September 6, 2023, 2:28pm

An odd error message indeed.

Demetrius675 · September 20, 2023, 10:19am

I've run into an issue while trying to create a plot for multivariate time series data in RStudio. I have a dataset with multiple variables recorded over time, and I'm attempting to visualize the relationships between these variables. When I use the ggplot2 package or other plotting libraries to create a time series plot with multiple lines (one for each variable), I run into an error. The error message I receive is somewhat cryptic: "Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ."


nirgrahamuk · September 20, 2023, 10:48am

@Demetrius675 The issue discussed in this thread was resolved. If you have another (possibly similar) coding issue that you would like support with, I would encourage you to start a new thread where it can be discussed.

I also recommend that you review the following guide, FAQ: Tips for writing R-related questions.
For example, the guide emphasizes asking coding questions with formatted code chunks and a reprex.

You may have noticed folks here requesting minimal reprexes; that's because asking questions this way saves answerers a lot of time.

Reproducible examples:

help make your question clear and replicable,
increase the probability that folks will reach out and try to help,
reduce the number of back-and-forths required to understand the question,
and make your question and suggested solutions more useful to folks researching similar problems in the future.

system · September 27, 2023, 10:49am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.