I have a large data extract from Salesforce Marketing Cloud which is meant to be pipe separated. I want to put it in a tidy format in a tibble, but I'm failing to do so.
With some samples of the actual data, the following code worked well:
Sent_DATA <- read.delim("data.csv", header = TRUE, sep = "|", dec = ".")
However, when I try to process the full size file, I get a huge 1 column x 47,086,892 rows tibble, where I should have 10 columns:
ClientID|SendID|SubscriberKey|EmailAddress|SubscriberID|ListID|EventDate|EventType|BatchID|TriggeredSendExternalKey
7001222|2684596|sample@email.com|sample@email.com|158964051|1671|1/1/2016 12:26:42 PM|Sent|126|membership rfi correct
7001222|2684596|sample@email.com|sample@email.com|159039493|1671|1/4/2016 9:51:48 AM|Sent|129|membership rfi correct
7001222|2684596|sample@email.com|sample@email.com|159432435|1671|1/9/2016 1:21:51 AM|Sent|130|membership rfi correct
7001222|2684596|sample@email.com|sample@email.com|159564062|1671|1/12/2016 3:36:40 PM|Sent|136|membership rfi correct
7001222|2684597|sample@email.com|sample@email.com|159874316|1671|1/18/2016 4:56:39 PM|Sent|414|membership rfi
7001222|2684597|sample@email.com|sample@email.com|160022235|1671|1/20/2016 5:49:36 PM|Sent|418|membership rfi
- I have been told that this dataset is inconsistent, and sometimes a row does not finish with a pipe. Sometimes it does.
- I know that in some instances there are pipes in the string that should not be delimiters (re-exporting from Salesforce with commas is unfortunately not an option, this historical data has been deleted).
I would appreciate code snippets and suggestions that could help me to:
- Assess the issues with my data
- Isolate inconsistent rows (with more or less pipes than I should have)
- Any ideas to put my data in a tibble when the method above is not working.
- Unfortunately, I can't share the dataset (which makes it harder to get help.)
I have also tried—unsuccessfully:
Sent_DATA2 <- read.table("data.csv", header = FALSE, sep = "|")
Thank you for any help you can provide.