Merge vectors in for loop

Dear all

I'm working on a tool to structure my bank transactions and create a comprehensive annual report. I have a list of about 600 transactions. I select the desired transactions for each journal entry with the help of the grep function.

However, I want to check if each bank transaction is included in the overview:

for(transaction in all.transactions) {
x <- grep(transaction, dataset, perl = , value = F)
print(x)
}

This prints the positions matched by grep in each iteration. However, after the loop x holds only the positions from the last iteration. I want R to merge the values of x from all iterations of the loop. How can I do this? For example, the following loop fills a result vector element by element:

result <- vector("numeric", 10)
for (i in seq_along(result)) {
squared <- i^2
result[i] <- squared
}
result

Source: http://rfaqs.com/category/r-control-structure/for-loop-in-r

Hope that my issue is clear now. Thanks in advance.

Hi,

I think I know what you're after, but can you please generate some dummy data for 'all.transactions' and 'dataset'? For example, does dataset contain everything in all.transactions, or can there be transactions that are not in the dataset?

Also, do you just need to check whether each one is present or not, or do you really need the location of each one in the original dataset?

Thanks

Thanks for the quick response.

all.transactions could be c("supplier1", "supplier2", "supplier3"), and R then links the payments that belong to each of these unique names. However, it can happen that some payments are not recognized for some reason.

dataset contains all the transactions: a data frame with column A holding the transaction amount and column B holding the supplier (supplier1, supplier2, etc.).

Hope this helps, thanks in advance again

What would be much more helpful is a reproducible example (reprex) illustrating your issue. Please have a look at this guide, to see how to create one:

Thanks for the help. I made a reprex to clarify the problem:

head(iris)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa

for(Species in iris) {
print(grep(Species, iris$Petal.Width, perl = , value = F))
}

#> [1] 1 2 3 4 5 8 9 11 12 15 21 23 25 26 28 29 30 31 34 35 36 37 39
#> [24] 40 43 47 48 49 50

Now the loop returns the row numbers of the species (I removed the warnings here because they are irrelevant to the problem). In my real data I have 50 species values, and the output looks the same but with many more row numbers. I want the loop to merge all the row numbers so that I can tell whether a value is missing. A first step:

for(Species in iris) {
x <- grep(Species, iris$Petal.Width, perl = , value = F)
}
#> Warning in grep(Species, iris$Petal.Width, perl = , value = F): argument
#> 'pattern' has length > 1 and only the first element will be used

#> Warning in grep(Species, iris$Petal.Width, perl = , value = F): argument
#> 'pattern' has length > 1 and only the first element will be used

#> Warning in grep(Species, iris$Petal.Width, perl = , value = F): argument
#> 'pattern' has length > 1 and only the first element will be used

#> Warning in grep(Species, iris$Petal.Width, perl = , value = F): argument
#> 'pattern' has length > 1 and only the first element will be used

#> Warning in grep(Species, iris$Petal.Width, perl = , value = F): argument
#> 'pattern' has length > 1 and only the first element will be used

x
#> integer(0)

However, a lot of warnings appear. I want x to contain all the row numbers where the species appear, so that I can locate the missing values.

Hi,

I'm still confused by your example, but let me give you an example of what I think you mean and let's go from there :)

So: I created two lists, one with all the data I want to test against called 'dataFull'. The other is the list of all samples (transactions) I have and want to see if they are found in dataFull, called dataSamples

dataFull = LETTERS
dataSamples = c(sample(LETTERS, 40, replace = T), "&", "%")

For simplicity, I took the alphabet as the full dataset vector, and 40 random letters plus some symbols as my samples.

Now, it's very easy to see which data in dataFull is never seen in the samples, like this:

dataFull[!dataFull %in% dataSamples]

Or do the reverse, and see which samples are not found in the full dataset

dataSamples[!dataSamples %in% dataFull]
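
As a small deterministic illustration of the two checks above, here is a minimal sketch with made-up vectors (the values and the names `unseen` and `unknown` are assumptions for this example only, not part of the original data):

```r
# Made-up vectors, for illustration only
dataFull    <- c("A", "B", "C", "D", "E")
dataSamples <- c("A", "C", "C", "&", "%")

# Elements of dataFull that never occur in dataSamples
unseen <- dataFull[!dataFull %in% dataSamples]
unseen
#> [1] "B" "D" "E"

# Samples that are not part of dataFull
unknown <- dataSamples[!dataSamples %in% dataFull]
unknown
#> [1] "&" "%"
```

For vectors without duplicates, `setdiff(dataFull, dataSamples)` should give the same answer as the first check.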

Run this, play with it and see if it answers the question you have...

Grtz
PJ

Thanks very much for your answer. Yes, indeed, thanks for the code.

However, one step is required before I can execute the code as provided by you.

In my current loop, R returns for each supplier the row number of the transaction, e.g.:

1 15 18 200 250 300
30 50 80
20 500 540

Due to the print statement in the code

for(zoekterm in zoektermen) {
print(grep(zoekterm, transacties.17.18$Omschrijving, perl = , value = F))
}

When I assign the result to x instead of printing it, x only contains the result for the last zoekterm, in this case:

20 500 540

Code:
for(zoekterm in zoektermen) {
x <- (grep(zoekterm, transacties.17.18$Omschrijving, perl = , value = F))
}
x

However, I want the following output (one vector that combines all calculated row numbers in the loop)

1 15 20 30 18 50 80 200 250 300 500 540

My final goal is that the loop returns all missing values in 1:550 or something, so all values except for 1 15 20 30 18 50 80 200 250 300 500 540

So I can identify the missing transactions

Hope it is clearer now, sorry for the misunderstanding

Could you make a reprex with a small sample of your actual data? It would be easier if we all were on the same page (i.e. looking at the same data).

Ok,

If this is not what you're looking for, you really have to provide dummy data and code in a format we can actually run, because it's still confusing :)

This is how I created some fake data based on what I understand:

#Data frame with all transactions and the supplier
transactions = data.frame(id = 1:100, description = paste("supplier", sample(1:30, 100, replace = T), sep = ""))

#List of all suppliers you have
allSuppliers = paste("supplier", 1:20, sep = "")

This is the code you wrote, adapted and applied to that data:

#Checking the transactions for all suppliers
presentValues = unlist(sapply(allSuppliers, function(supplier){
  which(supplier == transactions$description)
}))

#List transactions without supplier in allSuppliers 
missingTransactions = transactions[!1:nrow(transactions) %in% presentValues,]

If this is indeed what you're looking for, it can be done in just one line of code:

missingTransactions =  transactions[!transactions$description %in% allSuppliers,]

Again, if this is not what you want, you have to write out all the code and provide a data frame, not just an example.

Good luck
PJ

Thanks all for your responses, they are all useful.

However, I am still missing a final solution. I agree that I need to explain it better (I am a beginner in R and on this forum), so I have tried to give you a clear example of my desired output.

[Image from the source cited below]

Source: r - fill matrix in loop with vectors of different length - Stack Overflow

I've created the following example to make my current problem clearer. The code generates 3 vectors. If I print the three vectors separately, this is what happens:

for (i in 1:3){
set.seed((3434))
print(sample(i+4, replace=F ))

i = i+1
}
#> [1] 2 4 1 5 3
#> [1] 3 5 1 2 6 4
#> [1] 3 6 1 7 4 5 2

I am looking for a way to combine all the output (so iterations 1, 2 and 3) into one vector, called x. So x has to be: 2, 4, 1, 5, 3, 3, 5, 1, 2, 6, 4, 3, 6, 1, 7, 4, 5, 2

A first step to solve this problem is the next:

for (i in 1:3){
set.seed((3434))
x <- (sample(i+4, replace=F ))

i = i+1
}

x
#> [1] 3 6 1 7 4 5 2

However, now x only holds the output of the last iteration. I want x to contain the output of all iterations combined.

In case of a dataframe, the issue can be solved with the following code inside the for loop:

x <- rbind(x, x)

However, this line of code does not work, also because of the unequal lengths of the vectors.

Hopefully, now my problem is more clear to you. Looking forward to your feedback. Thanks in advance again.

You can use c() to concatenate vectors, although I still can't understand the purpose of your code.

x <- NULL
for (i in 1:3) {
    set.seed(3434)
    # append this iteration's vector to everything collected so far
    x <- c(x, sample(i + 4, replace = FALSE))
}
x
#>  [1] 4 3 1 2 5 4 3 6 1 2 5 4 3 5 2 7 6 1
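
Applied to the original grep question, the same `c()` pattern could look like the following minimal sketch. The descriptions and search terms are made up for illustration, and `found` and `unmatched` are hypothetical names; `fixed = TRUE` is used here on the assumption that the terms are plain strings rather than regular expressions.

```r
# Made-up descriptions and search terms (assumptions for this sketch)
descriptions <- c("rent jan", "supplier1 invoice", "supplier2 invoice",
                  "coffee", "supplier1 refund", "supplier3 invoice")
terms <- c("supplier1", "supplier2", "supplier3")

found <- integer(0)
for (term in terms) {
  # append this term's matching positions instead of overwriting them
  found <- c(found, grep(term, descriptions, fixed = TRUE))
}
found
#> [1] 2 5 3 6

# positions that no search term matched
unmatched <- setdiff(seq_along(descriptions), found)
unmatched
#> [1] 1 4
```

Growing a vector with `c()` inside a loop is fine at this size; for many terms, something like `unlist(lapply(terms, grep, x = descriptions, fixed = TRUE))` would avoid the repeated copying.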

Dear all

I created an example of my transactions file to make the purpose clear to you.

# show data set

# transactions 2018

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

amount.money <- c("10", "20", "50", "70", "90", "100", "150", "170", "190", "200")
party <- c("interest", "food", "drinks", "interest", "drinks", "drinks", "interest", "interest",
                              "food", "interest")


transactions <- cbind(amount.money, party) %>% as.data.frame()
colnames(transactions)[1] = "money"
colnames(transactions)[2] = "party"


search.term <- c("interest", "drinks")


for(term in search.term) {
  print(grep(term, transactions$party, perl = , value = F))
}
#> [1]  1  4  7  8 10
#> [1] 3 5 6

I created a data frame, called transactions, that shows 10 transactions made in 2018. The transactions fall into 3 categories, namely interest, food and drinks. However, as you can see, my search terms leave out the "food" transactions (rows 2 and 9 of the data frame); see the output of the for loop in the last part of the code. With the grep function, I do recognize the interest and drinks expenses. However, as already explained, the food expenses are missing from my overview.

My purpose: first, combine the two separate vectors (the output of the for loop). Next, write code that indicates which values in 1:10 are missing, in this case 2 and 9. Then I can add the food category to my search terms, so that I do not miss it in my final overview.

Hope it is much more clear now. Thanks.

Using a for loop for this seems pretty inefficient. Why not simply use a filter like this?

library(dplyr)
library(stringr)

transactions <- data.frame(stringsAsFactors = FALSE,
                           amount.money = c(10, 20, 50, 70, 90, 100, 150, 170, 190, 200),
                           party = c("interest", "food", "drinks", "interest", "drinks",
                                      "drinks", "interest", "interest", "food", "interest"))
transactions %>%
    filter(!str_detect(party, "interest|drinks"))
#>   amount.money party
#> 1           20  food
#> 2          190  food
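
If the row numbers themselves are wanted (2 and 9 in the example), a base R variant of the same idea could look like this minimal sketch. It reuses the `party` and `search.term` values from the post above; `pattern` and `unmatched_rows` are hypothetical names introduced for this example.

```r
party <- c("interest", "food", "drinks", "interest", "drinks",
           "drinks", "interest", "interest", "food", "interest")
search.term <- c("interest", "drinks")

# Combine all search terms into one regular expression: "interest|drinks"
pattern <- paste(search.term, collapse = "|")

# Row numbers that none of the search terms match
unmatched_rows <- which(!grepl(pattern, party))
unmatched_rows
#> [1] 2 9

# The categories still missing from the search terms
unique(party[unmatched_rows])
#> [1] "food"
```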

Thanks, that is much easier than a loop.

However, my example has only 3 values, while my transactions file has 50 parties. So I think I still need a loop. When I build a loop around the filter code, R returns, for each party, the transactions that do not match that specific name, and it does so 50 times.

Could you explain why? What are your concerns with this approach? I don't see how a loop would deal better with more parties than a filter.

See this example with made-up data (100 records with 25 parties)

library(dplyr)
library(stringr)

set.seed(123) # For reproducibility

# Much larger made-up sample data with 100 records and 25 unique parties.
transactions <- data.frame(stringsAsFactors = FALSE,
                           amount.money = rnorm(100),
                           party = sample(letters, 100, replace = TRUE))

# Made up search terms
search_term <- sample(letters, 10, replace = FALSE)
search_term
#>  [1] "y" "t" "c" "n" "a" "b" "d" "j" "v" "e"

transactions %>%
    filter(!str_detect(party, paste0(search_term, collapse = "|")))
#>    amount.money party
#> 1    1.55870831     p
#> 2    0.07050839     x
#> 3    1.71506499     k
#> 4    0.46091621     p
#> 5   -0.68685285     h
#> 6    0.40077145     l
#> 7   -0.55584113     q
#> 8   -1.96661716     k
#> 9   -0.47279141     w
#> 10  -1.06782371     h
#> 11  -1.02600445     u
#> 12  -0.72889123     m
#> 13  -1.68669331     k
#> 14   0.83778704     m
#> 15  -1.13813694     f
#> 16   0.42646422     h
#> 17  -0.29507148     l
#> 18   0.89512566     z
#> 19   0.82158108     m
#> 20   0.55391765     u
#> 21  -0.06191171     p
#> 22  -0.30596266     w
#> 23  -0.69470698     h
#> 24  -0.20791728     h
#> 25   1.20796200     h
#> 26  -1.12310858     r
#> 27  -0.40288484     u
#> 28  -0.46665535     i
#> 29   0.77996512     g
#> 30  -0.08336907     g
#> 31   0.25331851     z
#> 32  -0.04287046     x
#> 33  -0.22577099     w
#> 34   1.51647060     z
#> 35  -1.54875280     k
#> 36   0.21594157     s
#> 37  -0.50232345     u
#> 38  -1.01857538     m
#> 39  -1.07179123     k
#> 40   0.30352864     k
#> 41   0.92226747     z
#> 42   2.05008469     g
#> 43  -2.30916888     w
#> 44   1.00573852     z
#> 45  -0.68800862     x
#> 46  -0.28477301     i
#> 47  -1.22071771     i
#> 48  -0.13889136     w
#> 49  -0.37066003     f
#> 50  -0.22048656     z
#> 51   1.09683901     q
#> 52   0.43518149     z
#> 53  -0.32593159     q
#> 54   1.14880762     z
#> 55   0.99350386     u
#> 56   0.54839696     g
#> 57   0.23873174     u
#> 58  -0.62790608     z
#> 59   1.36065245     i
#> 60   2.18733299     f
#> 61   1.53261063     r
#> 62  -0.23570036     q

Created on 2019-07-17 by the reprex package (v0.3.0)
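
One thing to keep in mind with the pattern built by `paste0(search_term, collapse = "|")` is that it does partial, regular-expression matching, so a term also matches longer strings that contain it. A minimal sketch of the difference from exact matching, using base R's `grepl` and made-up values (`party`, `terms`, `partial` and `exact` are assumptions for this example):

```r
# Made-up values, for illustration only
party <- c("food", "seafood", "drinks")
terms <- c("food")

# Partial (regex) matching: "seafood" contains "food", so it matches too
partial <- grepl(paste0(terms, collapse = "|"), party)
partial
#> [1]  TRUE  TRUE FALSE

# Exact matching: only values identical to a term count
exact <- party %in% terms
exact
#> [1]  TRUE FALSE FALSE
```

For free-text descriptions, partial matching is probably what is wanted; for a column of exact category names, `%in%` may be the safer choice.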

This is the one I am looking for!

In my case, I used this:

transacties.17.18 %>%
    filter(!str_detect(Omschrijving, paste0(zoektermen, collapse = "|")))

Thanks all very much @andresrcs, @pieterjanvc, @mishabalyasin
