Failing to load Pipe Separated tables

lsatch · March 29, 2022, 12:32pm

I have a large data extract from Salesforce Marketing Cloud which is meant to be pipe separated. I want to put it in a tidy format in a tibble, but I'm failing to do so.

With some samples of the actual data, the following code worked well:

Sent_DATA <- read.delim("data.csv", header = TRUE, sep = "|", dec = ".")

However, when I try to process the full size file, I get a huge 1 column x 47,086,892 rows tibble, where I should have 10 columns:

ClientID|SendID|SubscriberKey|EmailAddress|SubscriberID|ListID|EventDate|EventType|BatchID|TriggeredSendExternalKey
7001222|2684596|sample@email.com|sample@email.com|158964051|1671|1/1/2016 12:26:42 PM|Sent|126|membership rfi correct
7001222|2684596|sample@email.com|sample@email.com|159039493|1671|1/4/2016 9:51:48 AM|Sent|129|membership rfi correct
7001222|2684596|sample@email.com|sample@email.com|159432435|1671|1/9/2016 1:21:51 AM|Sent|130|membership rfi correct
7001222|2684596|sample@email.com|sample@email.com|159564062|1671|1/12/2016 3:36:40 PM|Sent|136|membership rfi correct
7001222|2684597|sample@email.com|sample@email.com|159874316|1671|1/18/2016 4:56:39 PM|Sent|414|membership rfi
7001222|2684597|sample@email.com|sample@email.com|160022235|1671|1/20/2016 5:49:36 PM|Sent|418|membership rfi

I have been told that this dataset is inconsistent, and sometimes a row does not finish with a pipe. Sometimes it does.
I know that in some instances there are pipes in the string that should not be delimiters (re-exporting from Salesforce with commas is unfortunately not an option, this historical data has been deleted).

I would appreciate code snippets and suggestions that could help me to:

Assess the issues with my data
Isolate inconsistent rows (with more or less pipes than I should have)
Any ideas to put my data in a tibble when the method above is not working.
Unfortunately, I can't share the dataset (which makes it harder to get help.)

I have also tried—unsuccessfully:

Sent_DATA2 <- read.table("data.csv", header = FALSE, sep = "|")

Thank you for any help you can provide.

system · April 19, 2022, 12:33pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.