Pull data from email body, and store in a datatable (Outlook)

Hi there, each morning I get an email listing the sales of fruit and veg for the day. What I'd like to do is create a code that will search my outlook emails for a particular email subject, iterate line by line through the body of those emails for the fruit data, take the 'Date of report' from the body and then tie it altogether in a datatable.

The structure of the emails is as follows:

Date of report:,22-JAN-2022
Time report produced:,23-JAN-2022 00:11
Apples,5
Grapes,8
Grapefruit,6
Lemons,5
Oranges,1
Pears,2

I was able to achieve the desired result in Python (see code below), however as time has passed it's become necessary to achieve the result in R instead. Now, I realise there are ways to interface between Python and R by using the likes of RPy, but unfortunately that isn't going to be suitable for me.

Fruit_1 = "Apples"
Fruit_2 = "Grapefruit"
Fruit_3 = "Oranges"

fruit_search = [Fruit_1, Fruit_2, Fruit_3]

def get_vals(report, searches):
    dct = {}
    for line in report:
        term, *value = line
        if term.casefold().startswith('date'):
            dct['date'] = pd.to_datetime(value[0])
        elif term in searches:
            dct[term] = float(value[0])
    if len(dct.keys()) != len(searches):
        dct.update({x: None for x in searches if x not in dct})
    return dct

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6) 
messages = inbox.Items
messages.Sort("[ReceivedTime]", False)

result = []

for message in messages:
    if message.subject == 'FRUIT SALES':
        if Fruit_1 in message.body and Fruit_2 in message.body:
            data = [line.strip().split(",") for line in message.body.split('\n')]
            result.append(get_vals(data, fruit_search))
        else:
            pass

Fruit_vals = pd.DataFrame(result)
Fruit_vals.columns = map(str.upper, Fruit_vals.columns)

Now, I'm horrendously new to R; but I've been using what I did in Python as a bit of a guide and I've got as far as the function:

Fruit_1 <- "Apples"
Fruit_2 <- "Grapefruit"
Fruit_3 <- "Oranges"

fruit_search <- c(Fruit_1, Fruit_2, Fruit_3) ## Assuming the vectors work similarly to the previous
OutApp <- COMCreate("Outlook.Application")
outlookNameSpace = OutApp$GetNameSpace("MAPI")

search <- OutApp$AdvancedSearch("Inbox", "urn:schemas:httpmail:subject = 'FRUIT SALES'")
inbox <- outlookNameSpace$Folders(6)$Folders("Inbox")
results <- search$Results()
results$Count()

emails = inbox$Items

totalEmails = indbox$Items()$Count()

get_vals <- function(report,searches) {
  data <- read.table(text=report,sep=",")
  colnames(data) <- c('key','value')

Conceptually I get stuck around creating the function, and though I would like to use an R vector (fruit_search), what I've been reading seems to suggest vectors won't work in a comparable way to the fruit_search in python.

Any advice would be very much appreciated. :slight_smile:

So with a bit of progress, I've produced a loop to iterate through the latest email (though I'd like to iterate and pull data for all).

for (i in totalEmails:1)
{
  subject <- emails(i)$Subject(1)
  if (grepl(search, subject)[1])
  {
    text <- emails(i)$Body() ## just here to check the output
    break
  }
}

I've also created the beginnings of a function to read and find the function I want to apply in each iterative loop:

vec <- c() ## empty vector is here as something to put the data to

get_vals <- function(reports,searches) {
  data <- read.table(text=reports,sep=",")
  colnames(data) <- c('key','value')
  

  date <- data[grepl("Date of report:",sep=",",data$key,ignore.case=T),"value"]
  

  lst <- split(data$value,data$key)
  

  c(list(date=date),lst[searches])
}

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.