How to get data between markers from dataframe in rstudio

hrsyed · March 24, 2022, 8:29am

I have 12 CSV files containing activity data for 3 different subjects (subj1_act1 ... subj3_act4). Each file contains data rows and markers (rows holding the column names), and the column-name rows repeat at random positions. I am trying to first find the markers, then use their indexes to save only the data between those markers into a separate data frame.

A sample of the human activity data frame follows:

View(df)
576 -2404   6556  12512  30.74  3788  -2249
577 -1428  10020  13024  30.69  2308  -1852
Sample aX  aY az Temp gX gY gz
554  375   2648   7412  13612  30.18  -286   -288  
Sample aX  aY az Temp gX gY gz
1 -1844 10224 11768 30.69 -332 -387 -62
2 -1876 10192 11708 30.65 -297 -435 -21
3 -1804 10332 11692 30.74 -355 -265 -78
Sample aX  aY az Temp gX gY gz
1068  375   2648   7412  13612  30.18  -286   -288  
1069  376   2760

Desired output: the data between markers, either in separate data frames or in a single data frame with a heading:

Sample aX  aY az Temp gX gY gz
1 -1844 10224 11768 30.69 -332 -387 -62
2 -1876 10192 11708 30.65 -297 -435 -21
3 -1804 10332 11692 30.74 -355 -265 -78

I tried the following:

df = readLines("file.csv")
colname_rows = df[grepl("^[a-z]", df)]
colname_values = unique(df[colname_rows,])
df = df[!colname_rows, ]
df = read.table(text = df, sep = ",")

But I am unable to resolve the dimension errors this produces.
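The dimension errors most likely come from treating the output of readLines() as a data frame. readLines() returns a plain character vector, so two-dimensional indexing such as df[colname_rows, ] fails. In addition, colname_rows holds the matching strings rather than a logical index, so !colname_rows fails too, and the case-sensitive pattern "^[a-z]" does not match a header that starts with the upper-case "Sample". Below is a minimal sketch of the same idea with one-dimensional indexing. It assumes comma-separated fields (as sep = "," suggests) and uses a small inline character vector as a stand-in for file.csv; the names is_header, col_names and data_lines are illustrative.

```r
# Stand-in for readLines("file.csv"). The values come from the sample above;
# the comma separator is an assumption.
lines <- c(
  "Sample,aX,aY,az,Temp,gX,gY,gz",
  "1,-1844,10224,11768,30.69,-332,-387,-62",
  "2,-1876,10192,11708,30.65,-297,-435,-21",
  "Sample,aX,aY,az,Temp,gX,gY,gz",
  "3,-1804,10332,11692,30.74,-355,-265,-78"
)

# Logical vector: TRUE where a line starts with a letter (a marker line).
# [A-Za-z] is used because "Sample" starts with an upper-case letter.
is_header <- grepl("^[A-Za-z]", lines)

# Take the column names from the first marker line.
col_names <- strsplit(lines[is_header][1], ",")[[1]]

# Drop the marker lines with one-dimensional indexing, then parse the rest.
data_lines <- lines[!is_header]
df <- read.table(text = data_lines, sep = ",", col.names = col_names)
```

This yields one data frame containing every data row with the repeated headers removed; it does not yet separate the rows by marker.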

A clarification of the procedure or algorithm may be helpful. I want to obtain the indexes of the start and end markers, use them to extract only the data between the markers, store that data in data frames, and name each of the variables.
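The procedure described here (find the marker indexes, then keep only the rows between consecutive markers) can be sketched in base R roughly as follows. This is one possible approach under stated assumptions, not a definitive solution: it assumes comma-separated fields, that every marker line starts with a letter, and that rows before the first marker should be discarded. The inline vector stands in for readLines() on a real file, and the names marker_idx, block_id and blocks are illustrative.

```r
# Stand-in for readLines("file.csv"); rows are taken from the sample above,
# with commas assumed as the separator.
lines <- c(
  "576,-2404,6556,12512,30.74,3788,-2249",      # before the first marker
  "Sample,aX,aY,az,Temp,gX,gY,gz",
  "554,375,2648,7412,13612,30.18,-286,-288",
  "Sample,aX,aY,az,Temp,gX,gY,gz",
  "1,-1844,10224,11768,30.69,-332,-387,-62",
  "2,-1876,10192,11708,30.65,-297,-435,-21",
  "3,-1804,10332,11692,30.74,-355,-265,-78",
  "Sample,aX,aY,az,Temp,gX,gY,gz",
  "1068,375,2648,7412,13612,30.18,-286,-288"
)

is_header  <- grepl("^[A-Za-z]", lines)  # TRUE on marker lines
marker_idx <- which(is_header)           # line indexes of the markers

# Running count of markers seen so far: 0 before the first marker,
# then 1, 2, 3, ... for the rows that follow each marker.
block_id <- cumsum(is_header)

# Keep data rows only, and only those that follow at least one marker.
keep <- !is_header & block_id > 0
col_names <- strsplit(lines[marker_idx[1]], ",")[[1]]

# One data frame per block of rows between consecutive markers.
blocks <- lapply(split(lines[keep], block_id[keep]), function(block_lines) {
  read.table(text = block_lines, sep = ",", col.names = col_names)
})
names(blocks) <- paste0("block", names(blocks))

# Optional: stack all blocks into a single data frame.
combined <- do.call(rbind, blocks)
```

Here blocks$block2 holds the three rows between the second and third markers, and combined holds all kept rows in one data frame. Using cumsum() on the logical marker vector avoids explicit loops over start and end indexes, since every row is labelled with the number of the marker that precedes it.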

Any guidance or direction on how to achieve this, or a pointer to helpful material, would be appreciated.

Thanks again

PS: I don't have any expertise in the R language.

pieterjanvc · March 24, 2022, 11:17am

Hi there,

Selecting columns by name, including those in between, is easiest with dplyr from the Tidyverse, though it can be done in base R too.

library(dplyr)

myData = data.frame(
  var1 = runif(5),
  var2 = runif(5),
  var3 = runif(5),
  var4 = runif(5),
  var5 = runif(5)
)

#Using Base R
cols = colnames(myData)
myData[,which(cols == "var2"):which(cols == "var4")]
#>        var2      var3      var4
#> 1 0.8785906 0.2597909 0.8604980
#> 2 0.2605855 0.1271064 0.2194006
#> 3 0.3269434 0.9760147 0.1657206
#> 4 0.7942637 0.6075967 0.3605893
#> 5 0.8042586 0.1958767 0.6405772

#Using dplyr (Tidyverse)
myData %>% select(var2:var4)
#>        var2      var3      var4
#> 1 0.8785906 0.2597909 0.8604980
#> 2 0.2605855 0.1271064 0.2194006
#> 3 0.3269434 0.9760147 0.1657206
#> 4 0.7942637 0.6075967 0.3605893
#> 5 0.8042586 0.1958767 0.6405772

Hope this helps,
PJ

Created on 2022-03-24 by the reprex package (v2.0.1)
