Merging matrices

Hello,

I'm very much a novice R user. I would like to merge into a large matrix, a number of different .csv files with matrices consisting of 1064 rows and 1300 columns.
Is there a general merging code or something that I could use for this please?

Thank you!

Do you mean a data frame or do you actually mean a matrix?

If you do mean a data frame, are the columns the same in the different csv files?

It will help if you provide a reproducible example.

HI, thank for your reply. I've attached screenshots of parts of the type of files that I'd like to merge.
The two images show the top section and bottom section I need, so the first 1069 rows of each file. The number of columns vary for each file.
(I wasn't able to upload the .csv files)

Thank you!

Sorry, the attached now!
Was only able to upload one image, of the top part of the file.

There is only one image.

Also, it would be more useful if you could create a reproducible example as in the article.

Here is the code I am using:
rm(list=ls())

#library (sqldf)

#library(RSQLite)

mydata1 = read.csv("C:/Users/mnie020/SpectralFiles/Data1", header=T)

mydata2 = read.csv("C:/Users/mnie020/SpectralFiles/Data2", header=T)

mydata3 = read.csv("C:/Users/mnie020/SpectralFiles/Data3", header=T)

read.dir="C:/Users/Desktop/Spectral files"

setwd(read.dir)

mydata1.T <-t(mydata1)

mydata2.T <-t(mydata2)

mydata3.T <-t(mydata3)

mytempdata = merge(mydata1.T, mydata2.T)

myfulldata=merge(mytempdata,mydata3.T)

#sql="C:/Users/Desktop/DB_MergedData.sqlite"

#db=dbConnect(SQLite(),sql)

function(x)

  • nam <- names(x)

  • x <- as.numeric(as.character(x))

  • names(x) <- namx

WriteTable(name="MergedData1_3", value=myfulldata, row.names=FALSE,append=T)

dbWriteTable(conn=db, name="MergedData1_3", value=myfulldata, row.names=FALSE,append=T)

dbDisconnect(db)

A representative file is:

Sample ID+A1:D26 spectrum1 spectrum2 spectrum3
Replicate 1 1 1
Pin Number
240 0 0 0
241 1.221127 1.234181 1.194771
242 1.211973 1.224474 1.186408
243 1.201275 1.21317 1.176557
244 1.190714 1.202022 1.166929
1293 0.999164 0.945079 0.980991
1294 0.994434 0.939917 0.975257
1295 0.987079 0.940604 0.973154
1296 0.978832 0.947709 0.975815
1297 0 0 0
1298 0 0 0
1299 0 0 0
Dilution Factor 1 1 1
Sample Weight
Time of Analysis 42979.31 42979.31 42979.31
item1
item2
itme3

I'd like to combine a number of such files, preferably with the 1st column applying to all (it is the same in each file), and just adding columns 2 onwards.
Also, just combing the files up to the "Time of Analysis" row.

I have tried to run this, and write the data into an SQLite file, as the data will be very large.

I'd be grateful for any advice on why my code doesn't work?
Thank you!

The aim of the reproducible example is so that others can help you without trying to reproduce your code.

This is sort of what is useful (slightly modified):

df <- data.frame(stringsAsFactors=FALSE,
   Sample.ID = c("Replicate", "Pin Number", "240", "241", "242", "243", "244",
                 NA, "1293", "1294", "1295", "1296", "1297", "1298", "1299",
                 "Dilution Factor", "Sample Weight", "Time of Analysis"),
   spectrum1 = c(1, NA, 0, 1.221127, 1.211973, 1.201275, 1.190714, NA,
                 0.999164, 0.994434, 0.987079, 0.978832, 0, 0, 0, 1, NA,
                 42979.31),
   spectrum2 = c(1, NA, 0, 1.234181, 1.224474, 1.21317, 1.202022, NA, 0.945079,
                 0.939917, 0.940604, 0.947709, 0, 0, 0, 1, NA, 42979.31),
   spectrum3 = c(1, NA, 0, 1.194771, 1.186408, 1.176557, 1.166929, NA,
                 0.980991, 0.975257, 0.973154, 0.975815, 0, 0, 0, 1, NA,
                 42979.31)
)

> df
          Sample.ID    spectrum1    spectrum2    spectrum3
1         Replicate     1.000000     1.000000     1.000000
2        Pin Number           NA           NA           NA
3               240     0.000000     0.000000     0.000000
4               241     1.221127     1.234181     1.194771
5               242     1.211973     1.224474     1.186408
6               243     1.201275     1.213170     1.176557
7               244     1.190714     1.202022     1.166929
8              <NA>           NA           NA           NA
9              1293     0.999164     0.945079     0.980991
10             1294     0.994434     0.939917     0.975257
11             1295     0.987079     0.940604     0.973154
12             1296     0.978832     0.947709     0.975815
13             1297     0.000000     0.000000     0.000000
14             1298     0.000000     0.000000     0.000000
15             1299     0.000000     0.000000     0.000000
16  Dilution Factor     1.000000     1.000000     1.000000
17    Sample Weight           NA           NA           NA
18 Time of Analysis 42979.310000 42979.310000 42979.310000

> t(df)
          [,1]           [,2]         [,3]           [,4]           [,5]           [,6]           [,7]           [,8] [,9]           [,10]          [,11]          [,12]          [,13]         
Sample.ID "Replicate"    "Pin Number" "240"          "241"          "242"          "243"          "244"          NA   "1293"         "1294"         "1295"         "1296"         "1297"        
spectrum1 "    1.000000" NA           "    0.000000" "    1.221127" "    1.211973" "    1.201275" "    1.190714" NA   "    0.999164" "    0.994434" "    0.987079" "    0.978832" "    0.000000"
spectrum2 "    1.000000" NA           "    0.000000" "    1.234181" "    1.224474" "    1.213170" "    1.202022" NA   "    0.945079" "    0.939917" "    0.940604" "    0.947709" "    0.000000"
spectrum3 "    1.000000" NA           "    0.000000" "    1.194771" "    1.186408" "    1.176557" "    1.166929" NA   "    0.980991" "    0.975257" "    0.973154" "    0.975815" "    0.000000"
          [,14]          [,15]          [,16]             [,17]           [,18]             
Sample.ID "1298"         "1299"         "Dilution Factor" "Sample Weight" "Time of Analysis"
spectrum1 "    0.000000" "    0.000000" "    1.000000"    NA              "42979.310000"    
spectrum2 "    0.000000" "    0.000000" "    1.000000"    NA              "42979.310000"    
spectrum3 "    0.000000" "    0.000000" "    1.000000"    NA              "42979.310000"  

I see what you are trying to do though. You are transposing the csv and then trying to merge the files. I don't have a good solution at the moment though.

Thank you :slight_smile:
I don't actually need to transpose the files first. But I will try your suggestion!

Thanks again

Maybe you could do something like this them merge them. This way you have the same column headers. If that is what you are trying to do anyway (though maybe it is not).

library(tidyverse)
df %>% 
  gather(newrows, valname, -Sample.ID) %>% 
  spread(Sample.ID, valname)

    newrows     1293     1294     1295     1296 1297 1298 1299 240      241      242      243      244 Dilution Factor Pin Number Replicate Sample Weight Time of Analysis <NA>
1 spectrum1 0.999164 0.994434 0.987079 0.978832    0    0    0   0 1.221127 1.211973 1.201275 1.190714               1         NA         1            NA         42979.31   NA
2 spectrum2 0.945079 0.939917 0.940604 0.947709    0    0    0   0 1.234181 1.224474 1.213170 1.202022               1         NA         1            NA         42979.31   NA
3 spectrum3 0.980991 0.975257 0.973154 0.975815    0    0    0   0 1.194771 1.186408 1.176557 1.166929               1         NA         1            NA         42979.31   NA

Hi William,

Yes, thank you, this is more along the lines of what I need to do. However, the files I need to merge are individual .csv files, all in the same directory. So I'd need to read in all the .csv files, and merge these. Would this series of commands allow me to read all the .csv files in the folder, and then merge them together?

# make two example csv with different columns
set.seed(42)
commonids <- sample.int(1000,size=10,replace = TRUE)

ex1 <- data.frame (id=commonids,col1=1:5)
ex2 <- data.frame (id=commonids,col2=letters[1:5])
ex3 <- data.frame (id=commonids,col3=runif(5))
ex4 <- data.frame (id=commonids,col4=letters[10:14])

#save to csv
write.csv(ex1,"_ex1.csv",row.names = FALSE)
write.csv(ex2,"_ex2.csv",row.names = FALSE)
write.csv(ex3,"_ex3.csv",row.names = FALSE)
write.csv(ex4,"_ex4.csv",row.names = FALSE)

### Now we can begin to automate read and merge
library(tidyverse)
library(purrr)

allfiles <- dir()
myfiles <- allfiles[grepl("_ex",allfiles)]

my_data <- purrr::map(myfiles,read_csv)

for (i in 1:(length(my_data)-1))
{ 
   if(i==1)
  {
    cat(i , " ",i+1,"\n")
     my_output <- full_join(my_data[[i]],my_data[[i+1]])
  }
  else
  {
    cat(i+1,"\n")
    my_output <- full_join(my_output,my_data[[i+1]])
  }
}

Thank you so much NirGrahamuk :slight_smile:

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.