Failing to load Pipe Separated tables

I have a large data extract from Salesforce Marketing Cloud which is meant to be pipe separated. I want to put it in a tidy format in a tibble, but I'm failing to do so.

With some samples of the actual data, the following code worked well:

Sent_DATA <- read.delim("data.csv", header = TRUE, sep = "|", dec = ".")

However, when I try to process the full size file, I get a huge 1 column x 47,086,892 rows tibble, where I should have 10 columns:

7001222|2684596|||158964051|1671|1/1/2016 12:26:42 PM|Sent|126|membership rfi correct
7001222|2684596|||159039493|1671|1/4/2016 9:51:48 AM|Sent|129|membership rfi correct
7001222|2684596|||159432435|1671|1/9/2016 1:21:51 AM|Sent|130|membership rfi correct
7001222|2684596|||159564062|1671|1/12/2016 3:36:40 PM|Sent|136|membership rfi correct
7001222|2684597|||159874316|1671|1/18/2016 4:56:39 PM|Sent|414|membership rfi
7001222|2684597|||160022235|1671|1/20/2016 5:49:36 PM|Sent|418|membership rfi
  • I have been told that this dataset is inconsistent, and sometimes a row does not finish with a pipe. Sometimes it does.
  • I know that in some instances there are pipes in the string that should not be delimiters (re-exporting from Salesforce with commas is unfortunately not an option, this historical data has been deleted).

I would appreciate code snippets and suggestions that could help me to:

  • Assess the issues with my data
  • Isolate inconsistent rows (with more or less pipes than I should have)
  • Any ideas to put my data in a tibble when the method above is not working.
  • Unfortunately, I can't share the dataset (which makes it harder to get help.)

I have also tried—unsuccessfully:

Sent_DATA2 <- read.table("data.csv", header = FALSE, sep = "|")

Thank you for any help you can provide. :slight_smile:

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.