CSV format: all values crammed into one column and spilling into the following rows

Hi again!
As mentioned, I'm back with some beginner (half-stupid) questions.

While writing my paper, I ran into a problem getting one CSV file into the right format.
I'm using a specific statistic from the German "PKS" (Polizeiliche Kriminalstatistik, the police crime statistics).

Normally it looks like this (year 2018):

But for some reason the year 2020 looks like this:

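I tried something fancier, like this:

```r
library(tidyr)  # provides separate()

# Read only the first line; readLines() opens and closes the file itself
header_row <- readLines(csv_file, n = 1, encoding = "latin1")
header_row <- iconv(header_row, from = "latin1", to = "UTF-8")

column_names <- strsplit(header_row, ",", fixed = TRUE)[[1]]

# Skip the header line, which was already parsed above
data <- read.csv(csv_file, header = FALSE, skip = 1, fileEncoding = "latin1")

# Split the packed first column V1 into the named columns
data_split <- separate(data, V1, into = column_names, sep = ",")
```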

It kind of works, but not really. The best result so far is:

I would love to hear your feedback :slight_smile:
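A minimal sketch of a different route, assuming (not verified against the actual PKS file) that each line of the 2020 export is wrapped in a single pair of double quotes, so `read.csv()` sees only one field per row. The sample lines and column names below are made up for illustration. Stripping the outer quotes first lets `read.csv()` split on the commas itself, so the `separate()` step is no longer needed:

```r
# Hypothetical sample lines; with the real file you would use
# raw_lines <- readLines(csv_file, encoding = "latin1")
raw_lines <- c(
  '"Jahr,Land,Faelle"',
  '"2020,Berlin,123"',
  '"2020,Hamburg,45"'
)

# Remove one leading and one trailing double quote from each line
unquoted <- sub('^"(.*)"$', "\\1", raw_lines)

# The commas are now ordinary separators, so the normal reader works
pks <- read.csv(text = unquoted)
str(pks)
```

This breaks if fields themselves contain quotes or if some of the commas are decimal marks, so compare the result against the 2018 layout before relying on it.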

It looks to me like the 2020 data file is simply malformed. There are no separators that I can see in many of the lines. I do see some commas in some rows, but those might be decimal marks rather than column separators. Fixing the import with code is probably not practical. Can you get a clean version of the data?
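One way to check that suspicion before fighting the import: base R's `count.fields()` reports how many fields each line splits into for a given separator. In a well-formed file every line has the same count; lines with fewer fields point to missing separators, and lines with too many can mean some commas are decimal marks. The file contents here are a hypothetical stand-in written to a temporary file:

```r
# Hypothetical stand-in for the PKS export; the last line lacks separators
sample_lines <- c("Jahr,Land,Faelle", "2020,Berlin,123", "2020 Hamburg 45")
tmp <- tempfile(fileext = ".csv")
writeLines(sample_lines, tmp)

# Number of comma-separated fields on each line (NA for unbalanced quotes)
fields <- count.fields(tmp, sep = ",", quote = "\"")
print(fields)

# Positions of lines whose field count differs from the header line
bad_lines <- which(fields != fields[1])
print(bad_lines)

unlink(tmp)
```

If the file contains blank lines, the positions no longer map one to one onto line numbers, because `count.fields()` skips blank lines by default.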

Thank you for your valuable time.

Well, one of the tasks in this paper is to format and manipulate the data in files to get the values you need.
Getting a clean data file might take a while, if it happens at all, because it comes from the government. But I might send a request.

I found another kind of workaround, since this file also exists as an .xls file.

So I just saved the .xls as a CSV. It looks ugly, but I should be able to get the needed values out of it.

I'm not sure what the grader will say about this, but since the original CSV file is malformed, it could be a valid excuse.

But at least I can keep working.
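If Excel was running with German regional settings when the .xls was saved as CSV, the export most likely uses semicolons as separators and commas as decimal marks (an assumption worth checking by opening the file in a text editor). Base R's `read.csv2()` defaults to `sep = ";"` and `dec = ","`, so values like 12,5 come back as numbers. The inline data below is hypothetical:

```r
# Hypothetical excerpt of a CSV exported by a German-locale Excel
csv_text <- "Jahr;Faelle;Quote
2020;1234;12,5
2020;987;8,1"

# read.csv2() uses sep = ";" and dec = "," by default
pks_2020 <- read.csv2(text = csv_text)
str(pks_2020)
```

For the real file, pass its path instead of `text =`, and add `fileEncoding = "latin1"` if umlauts come out garbled.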

You can read an Excel file directly. There are a few packages that do that; I have used readxl and openxlsx.
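A sketch of the readxl route for this file. readxl reads both .xls and .xlsx, while openxlsx only handles .xlsx, so for an .xls download readxl is the one to try (or re-save the file as .xlsx first). The helper names `drop_empty()` and `read_pks_xls()` and the `skip` value are hypothetical; `drop_empty()` removes rows and columns that are entirely empty, which often accounts for much of the "ugly" layout after importing a spreadsheet:

```r
# Hypothetical helper: drop rows and columns in which every value is NA
drop_empty <- function(df) {
  keep_rows <- rowSums(!is.na(df)) > 0
  keep_cols <- colSums(!is.na(df)) > 0
  df[keep_rows, keep_cols, drop = FALSE]
}

# Hypothetical wrapper; readxl must be installed when this is called.
# skip = number of title lines above the real header row (file-specific)
read_pks_xls <- function(path, skip = 0) {
  sheet <- readxl::read_excel(path, sheet = 1, skip = skip)
  drop_empty(as.data.frame(sheet))
}
```

Usage would look like `pks_2020 <- read_pks_xls("pks_2020.xls", skip = 3)`, where both the file name and the number of title lines are placeholders to adjust after looking at the sheet.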


That worked. The columns and rows are still ugly, but workable. Thank you! :slight_smile:
