How do I form a data frame in R from multiple datasets, and then visualize the new dataset with ggplot2?
Thanks,
Benjamin
Are you talking about multiple datasets with the same variables in them? Are you looking to generate a single plot with a line/bar/something else for each source dataset?
Thanks for responding. I am trying to take, for example, variables x, y, z from dataset A, variables b, c, d from dataset B and variables j, k, l from dataset C, form a new dataset D from them, and then plot these variables with ggplot2. The date/time values should be the same across the datasets, and all of them come from the same folder (mturkfitbit_export_3.12.16-4.11.16). Thanks!
Is there a 1-to-1 correspondence between rows in each dataset (i.e., the first row in A goes with the first rows in B and C), or some columns that tell you which row in A matches which row in B and which row in C?
The only column they have in common is the Id (identification) column, but it is not uniform across them and the rows are not in the same order. Thanks!
By the looks of things, I don't think what I am trying to do is possible or correct, so I will try something else. Thanks!
Sir, how do you split a date-time such as 3/12/2016 0:00 into separate date and time columns, assuming the values are in the ActivityHour column? Thanks!
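One way to do this in base R is to parse ActivityHour into a date-time with as.POSIXct() and then extract the two parts. This is a minimal sketch: the small hourly data frame and its Id values are made up for illustration, only the ActivityHour name and the "3/12/2016 0:00" layout come from the question, and tz = "UTC" is an assumption chosen so that daylight-saving changes cannot shift any times.

```r
# Hypothetical example data: only the ActivityHour name and its
# "month/day/year hour:minute" layout come from the question.
hourly <- data.frame(
  Id = c(101, 101, 202),
  ActivityHour = c("3/12/2016 0:00", "3/12/2016 13:00", "3/13/2016 9:00")
)

# Parse the text into a date-time; UTC (an assumption) avoids DST shifts.
stamp <- as.POSIXct(hourly$ActivityHour, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Date part as a Date, time-of-day part as "HH:MM" text.
hourly$Date <- as.Date(stamp, tz = "UTC")
hourly$Time <- format(stamp, "%H:%M")

print(hourly)
```

Keeping the date as a Date (rather than text) means it still sorts and plots correctly in ggplot2.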
Try the following example (which produces your data frame D but does not do any plotting).
# Create some test data. The stamps use two-digit years, so the format needs %y (not %Y).
times <- c("10/17/25 09:30", "10/03/25 13:10", "10/08/25 04:07", "10/11/25 13:32") |>
  as.POSIXct(format = "%m/%d/%y %H:%M", tz = "UTC") # time stamps for the data
A <- data.frame(Time = times, x = 1:4, y = 12:15, z = -3:0, q = NA, w = c(0, 1, 0, 1))
B <- data.frame(Time = sort(times), b = 8:5, c = -2, d = (1:4) * pi)
C <- data.frame(Time = sort(times, decreasing = TRUE), j = 4:1, k = 12:15, l = c(-1, 1, 2, -1),
                m = 9:12, n = -3:0)
# Merge the dataframes by time stamp, keeping the desired columns.
library(dplyr)
D <- inner_join(A, B, by = join_by(Time)) |>
  inner_join(C, by = join_by(Time)) |>
  select(Time, x, y, z, b, c, d, j, k, l) |>
  arrange(Time)
Created on 2025-10-17 with reprex v2.1.1
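To get from D to a ggplot2 visualization, one common approach is to reshape D to long format (one row per time stamp and variable) and draw one small panel per variable. The sketch below rebuilds comparable test data so it runs on its own; the use of tidyr::pivot_longer() and a facet_wrap() layout with free y-axes are choices made here for illustration, not the only way to plot these variables.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Test data similar to the example above (two-digit years, hence %y).
times <- as.POSIXct(c("10/17/25 09:30", "10/03/25 13:10", "10/08/25 04:07", "10/11/25 13:32"),
                    format = "%m/%d/%y %H:%M", tz = "UTC")
A <- data.frame(Time = times, x = 1:4, y = 12:15, z = -3:0)
B <- data.frame(Time = sort(times), b = 8:5, c = -2, d = (1:4) * pi)
C <- data.frame(Time = sort(times, decreasing = TRUE), j = 4:1, k = 12:15, l = c(-1, 1, 2, -1))

# Join on the shared key; row order in A, B and C does not matter.
D <- A |>
  inner_join(B, by = join_by(Time)) |>
  inner_join(C, by = join_by(Time)) |>
  arrange(Time)

# Long format: one row per (Time, variable, value).
D_long <- pivot_longer(D, cols = -Time, names_to = "variable", values_to = "value")

# One panel per variable, each with its own y-axis range.
p <- ggplot(D_long, aes(x = Time, y = value)) +
  geom_line() +
  geom_point() +
  facet_wrap(vars(variable), scales = "free_y") +
  labs(x = "Time", y = NULL)

print(p)
```

If all variables share a common scale, mapping colour = variable in a single panel is an alternative to faceting.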
The code assumes that the ID column has the same name in every data frame ("Time" here), but that can be worked around in the join_by() calls. It does not assume that the data frames have the same row ordering, nor that every variable is kept. If an ID value appears in some but not all of A, B and C, the rows with that ID are dropped from D, because inner_join() keeps only matching rows.
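As a sketch of the join_by() workaround for differently named keys, the example below joins two made-up tables on an Id plus a day column. All table and column names here (steps, sleep, UserId, Day, SleepDay, TotalSteps, MinutesAsleep) and their values are hypothetical. It also uses anti_join() to list the rows an inner join would drop.

```r
library(dplyr)

# Hypothetical tables: key columns are named differently and rows are in different orders.
steps <- data.frame(Id = c(1, 1, 2, 3),
                    Day = as.Date(c("2016-03-12", "2016-03-13", "2016-03-12", "2016-03-12")),
                    TotalSteps = c(5000, 7200, 3100, 8800))
sleep <- data.frame(UserId = c(2, 1, 1),
                    SleepDay = as.Date(c("2016-03-12", "2016-03-13", "2016-03-12")),
                    MinutesAsleep = c(410, 380, 455))

# Match Id to UserId and Day to SleepDay; only rows found in both tables are kept.
both <- inner_join(steps, sleep, by = join_by(Id == UserId, Day == SleepDay))

# Rows of `steps` with no partner in `sleep`, i.e. what the inner join dropped.
unmatched <- anti_join(steps, sleep, by = join_by(Id == UserId, Day == SleepDay))

print(both)
print(unmatched)
```

Joining on Id together with a date or time column matters when each Id has many rows; joining on Id alone would pair every row of one table with every row of the other for that Id. To keep unmatched rows (with NA in the missing columns) instead of dropping them, left_join() or full_join() accept the same by argument.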
Appreciated. Many thanks Sir!