So, i have this CSV that i made into a data frame and it has some lines that are split into 2 or 3 lines when they should be one
EX: line 75, 76 and 77 should be one line.
I was thinking that every correct line starts with a number and every wrong one doesn't so i could use this somehow to stick the wrong lines together adding the ";" in the process
Thanks. As often happens, there is more than the one problem. Here, it is that lines 6-8, which we want to fix, have more potential semicolon ; delimited variables than we'd like.
I'll use saude.txt as the name of the file from which the data come. From a terminal session (not the console)
Interesting! Thank you. I didn't know this awk command and i don't know if i can use it in my Rstudio on windows but you've shown me the problem.
It's not ideal but i guess i could just discard any lines that don't have 30. I'm trying to figure out how to do this.
It seems like the error in the data varies, some other lines have different missing data but they seem to always start with not a number so i was a thinking of detecting that to pick which lines to kill
This can be done within the RStudio terminal pane. You can run the awk script there, using the first one to discover how many records need to be discarded (if too many, then must address converting them) and the second to create the good.txt file to import into R in place of the original.
I normally wouldn't suggest this pre-processing non-R approach, but it vastly simpler than doing the same operation within R.
And Windows. I left it long ago. I think you need Powershell(?) installed and possibly to add awk.
Wow, i think you just solved it all. I can't find a single defective line with your method.
I had used a whole nother method to separate things that kept the defect but it was as simple as that, and you're also right with the FALSE, it needs to be TRUE
Thanks! Now i can fail at the actual data mining instead of formatting
BTW, you may want to use "read.csv2()" if you have decimal values in the data set. read.csv assumes that . is the decimal marker so we have pi = 3.1416. I do not know but would you say pi = 3,1416?