Merging matrices

Mich · February 18, 2020, 10:27pm

Hello,

I'm very much a novice R user. I would like to merge into a large matrix, a number of different .csv files with matrices consisting of 1064 rows and 1300 columns.
Is there a general merging code or something that I could use for this please?

Thank you!

williaml · February 18, 2020, 10:48pm

Do you mean a data frame or do you actually mean a matrix?

If you do mean a data frame, are the columns the same in the different csv files?

It will help if you provide a reproducible example.

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

Mich · February 18, 2020, 11:15pm

HI, thank for your reply. I've attached screenshots of parts of the type of files that I'd like to merge.
The two images show the top section and bottom section I need, so the first 1069 rows of each file. The number of columns vary for each file.
(I wasn't able to upload the .csv files)

Thank you!

Mich · February 18, 2020, 11:18pm

Sorry, the attached now!
Was only able to upload one image, of the top part of the file.

williaml · February 18, 2020, 11:22pm

There is only one image.

Also, it would be more useful if you could create a reproducible example as in the article.

Mich · February 18, 2020, 11:48pm

Here is the code I am using:
rm(list=ls())

#library (sqldf)

#library(RSQLite)

mydata1 = read.csv("C:/Users/mnie020/SpectralFiles/Data1", header=T)

mydata2 = read.csv("C:/Users/mnie020/SpectralFiles/Data2", header=T)

mydata3 = read.csv("C:/Users/mnie020/SpectralFiles/Data3", header=T)

read.dir="C:/Users/Desktop/Spectral files"

setwd(read.dir)

mydata1.T <-t(mydata1)

mydata2.T <-t(mydata2)

mydata3.T <-t(mydata3)

mytempdata = merge(mydata1.T, mydata2.T)

myfulldata=merge(mytempdata,mydata3.T)

#sql="C:/Users/Desktop/DB_MergedData.sqlite"

#db=dbConnect(SQLite(),sql)

function(x)

nam <- names(x)
x <- as.numeric(as.character(x))
names(x) <- namx

WriteTable(name="MergedData1_3", value=myfulldata, row.names=FALSE,append=T)

dbWriteTable(conn=db, name="MergedData1_3", value=myfulldata, row.names=FALSE,append=T)

dbDisconnect(db)

A representative file is:

Sample ID+A1:D26	spectrum1	spectrum2	spectrum3
Replicate	1	1	1
Pin Number
240	0	0	0
241	1.221127	1.234181	1.194771
242	1.211973	1.224474	1.186408
243	1.201275	1.21317	1.176557
244	1.190714	1.202022	1.166929

1293	0.999164	0.945079	0.980991
1294	0.994434	0.939917	0.975257
1295	0.987079	0.940604	0.973154
1296	0.978832	0.947709	0.975815
1297	0	0	0
1298	0	0	0
1299	0	0	0
Dilution Factor	1	1	1
Sample Weight
Time of Analysis	42979.31	42979.31	42979.31
item1



item2
itme3

I'd like to combine a number of such files, preferably with the 1st column applying to all (it is the same in each file), and just adding columns 2 onwards.
Also, just combing the files up to the "Time of Analysis" row.

I have tried to run this, and write the data into an SQLite file, as the data will be very large.

I'd be grateful for any advice on why my code doesn't work?
Thank you!

williaml · February 19, 2020, 12:30am

The aim of the reproducible example is so that others can help you without trying to reproduce your code.

This is sort of what is useful (slightly modified):

df <- data.frame(stringsAsFactors=FALSE,
   Sample.ID = c("Replicate", "Pin Number", "240", "241", "242", "243", "244",
                 NA, "1293", "1294", "1295", "1296", "1297", "1298", "1299",
                 "Dilution Factor", "Sample Weight", "Time of Analysis"),
   spectrum1 = c(1, NA, 0, 1.221127, 1.211973, 1.201275, 1.190714, NA,
                 0.999164, 0.994434, 0.987079, 0.978832, 0, 0, 0, 1, NA,
                 42979.31),
   spectrum2 = c(1, NA, 0, 1.234181, 1.224474, 1.21317, 1.202022, NA, 0.945079,
                 0.939917, 0.940604, 0.947709, 0, 0, 0, 1, NA, 42979.31),
   spectrum3 = c(1, NA, 0, 1.194771, 1.186408, 1.176557, 1.166929, NA,
                 0.980991, 0.975257, 0.973154, 0.975815, 0, 0, 0, 1, NA,
                 42979.31)
)

> df
          Sample.ID    spectrum1    spectrum2    spectrum3
1         Replicate     1.000000     1.000000     1.000000
2        Pin Number           NA           NA           NA
3               240     0.000000     0.000000     0.000000
4               241     1.221127     1.234181     1.194771
5               242     1.211973     1.224474     1.186408
6               243     1.201275     1.213170     1.176557
7               244     1.190714     1.202022     1.166929
8              <NA>           NA           NA           NA
9              1293     0.999164     0.945079     0.980991
10             1294     0.994434     0.939917     0.975257
11             1295     0.987079     0.940604     0.973154
12             1296     0.978832     0.947709     0.975815
13             1297     0.000000     0.000000     0.000000
14             1298     0.000000     0.000000     0.000000
15             1299     0.000000     0.000000     0.000000
16  Dilution Factor     1.000000     1.000000     1.000000
17    Sample Weight           NA           NA           NA
18 Time of Analysis 42979.310000 42979.310000 42979.310000

> t(df)
          [,1]           [,2]         [,3]           [,4]           [,5]           [,6]           [,7]           [,8] [,9]           [,10]          [,11]          [,12]          [,13]         
Sample.ID "Replicate"    "Pin Number" "240"          "241"          "242"          "243"          "244"          NA   "1293"         "1294"         "1295"         "1296"         "1297"        
spectrum1 "    1.000000" NA           "    0.000000" "    1.221127" "    1.211973" "    1.201275" "    1.190714" NA   "    0.999164" "    0.994434" "    0.987079" "    0.978832" "    0.000000"
spectrum2 "    1.000000" NA           "    0.000000" "    1.234181" "    1.224474" "    1.213170" "    1.202022" NA   "    0.945079" "    0.939917" "    0.940604" "    0.947709" "    0.000000"
spectrum3 "    1.000000" NA           "    0.000000" "    1.194771" "    1.186408" "    1.176557" "    1.166929" NA   "    0.980991" "    0.975257" "    0.973154" "    0.975815" "    0.000000"
          [,14]          [,15]          [,16]             [,17]           [,18]             
Sample.ID "1298"         "1299"         "Dilution Factor" "Sample Weight" "Time of Analysis"
spectrum1 "    0.000000" "    0.000000" "    1.000000"    NA              "42979.310000"    
spectrum2 "    0.000000" "    0.000000" "    1.000000"    NA              "42979.310000"    
spectrum3 "    0.000000" "    0.000000" "    1.000000"    NA              "42979.310000"

I see what you are trying to do though. You are transposing the csv and then trying to merge the files. I don't have a good solution at the moment though.

Mich · February 19, 2020, 12:34am

Thank you
I don't actually need to transpose the files first. But I will try your suggestion!

Thanks again

williaml · February 19, 2020, 12:39am

Maybe you could do something like this them merge them. This way you have the same column headers. If that is what you are trying to do anyway (though maybe it is not).

library(tidyverse)
df %>% 
  gather(newrows, valname, -Sample.ID) %>% 
  spread(Sample.ID, valname)

    newrows     1293     1294     1295     1296 1297 1298 1299 240      241      242      243      244 Dilution Factor Pin Number Replicate Sample Weight Time of Analysis <NA>
1 spectrum1 0.999164 0.994434 0.987079 0.978832    0    0    0   0 1.221127 1.211973 1.201275 1.190714               1         NA         1            NA         42979.31   NA
2 spectrum2 0.945079 0.939917 0.940604 0.947709    0    0    0   0 1.234181 1.224474 1.213170 1.202022               1         NA         1            NA         42979.31   NA
3 spectrum3 0.980991 0.975257 0.973154 0.975815    0    0    0   0 1.194771 1.186408 1.176557 1.166929               1         NA         1            NA         42979.31   NA

Mich · February 19, 2020, 12:54am

Hi William,

Yes, thank you, this is more along the lines of what I need to do. However, the files I need to merge are individual .csv files, all in the same directory. So I'd need to read in all the .csv files, and merge these. Would this series of commands allow me to read all the .csv files in the folder, and then merge them together?

nirgrahamuk · February 19, 2020, 1:18pm

# make two example csv with different columns
set.seed(42)
commonids <- sample.int(1000,size=10,replace = TRUE)

ex1 <- data.frame (id=commonids,col1=1:5)
ex2 <- data.frame (id=commonids,col2=letters[1:5])
ex3 <- data.frame (id=commonids,col3=runif(5))
ex4 <- data.frame (id=commonids,col4=letters[10:14])

#save to csv
write.csv(ex1,"_ex1.csv",row.names = FALSE)
write.csv(ex2,"_ex2.csv",row.names = FALSE)
write.csv(ex3,"_ex3.csv",row.names = FALSE)
write.csv(ex4,"_ex4.csv",row.names = FALSE)

### Now we can begin to automate read and merge
library(tidyverse)
library(purrr)

allfiles <- dir()
myfiles <- allfiles[grepl("_ex",allfiles)]

my_data <- purrr::map(myfiles,read_csv)

for (i in 1:(length(my_data)-1))
{ 
   if(i==1)
  {
    cat(i , " ",i+1,"\n")
     my_output <- full_join(my_data[[i]],my_data[[i+1]])
  }
  else
  {
    cat(i+1,"\n")
    my_output <- full_join(my_output,my_data[[i+1]])
  }
}

Mich · February 19, 2020, 8:20pm

Thank you so much NirGrahamuk

system · March 11, 2020, 8:20pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.