Hello. I have a list of (consistently) poorly formatted data frames. I've spent much of the morning trying to figure out a solution using purrr and it's proving very frustrating. Here's some example data:
Note that each data frame has the same number of columns.
Problems:
The first row of data is stored as the column names.
There are separate columns for dollar signs ($).
What I hope to accomplish:
Move the column names down into a row.
Rename each column (a simple x1:x(n) scheme is fine).
Dropping the columns containing "$" is not a problem, as I can just do it later on. Buuuuut if anyone thinks it would be better to drop them before I combine the data frames, please say so.
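For readers following along, here is a minimal sketch of the two goals above. The `dfs` list, its values, and the helper name `fix_header` are hypothetical stand-ins, not the original example data. It assumes every column can safely be treated as character while the headers are being repaired.

```r
library(purrr)

# Hypothetical stand-in for the list of imported tables: the first data row
# has ended up in the column names, and "$" sits in its own column.
dfs <- list(
  data.frame(Alice = c("Bob", "Carol"), `$` = c("$", "$"),
             `100` = c("200", "300"),
             check.names = FALSE, stringsAsFactors = FALSE),
  data.frame(Dave = "Erin", `$` = "$", `400` = "500",
             check.names = FALSE, stringsAsFactors = FALSE)
)

# Hypothetical helper: push the column names down into a first row and
# rename the columns x1..xn.
fix_header <- function(df) {
  header <- names(df)
  # Treat every column as character so the old header can sit alongside the data.
  df[] <- lapply(df, as.character)
  names(df) <- paste0("x", seq_along(df))
  first_row <- as.data.frame(
    matrix(header, nrow = 1, dimnames = list(NULL, names(df))),
    stringsAsFactors = FALSE
  )
  out <- rbind(first_row, df)
  rownames(out) <- NULL
  out
}

# Apply the helper to each data frame, then stack the results.
cleaned <- map(dfs, fix_header)
combined <- do.call(rbind, cleaned)

# Drop columns that contain nothing but "$". Doing this after combining
# works here because every data frame has the same layout.
is_dollar <- map_lgl(combined, ~ all(!is.na(.x) & .x == "$"))
result <- combined[!is_dollar]
```

Numeric columns stay as character in this sketch; something like `utils::type.convert()` could be applied afterwards if the types matter.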
PS - If you provide any rlang or tidy evaluation context in your response, I would really appreciate it. I'm trying to get a better grasp of it.
Thanks @MikeBadescu. It's not working on my actual data at the moment, but I think I'm close to a solution using your suggestion. I'll follow up later on.
@cderv - I'm a big fan of read_csv! Unfortunately, the data are stored in a series of poorly arranged tables across several PDF documents. I imported the data with the tabulizer package.
Did you use the output = "data.frame" argument in extract_tables?
If it is what I think, this function expects the table to have a header by default. Internally, the method that returns a data.frame calls read.delim, and extract_tables has a ... argument through which you can pass arguments to the method used. So if you add header = FALSE to your call to extract_tables, it should import the table with no header.
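To illustrate the mechanism without needing a PDF, here is a small sketch using read.delim directly. The sample lines are hypothetical, and the extract_tables call in the comment is only the suggestion from this thread, not something verified here.

```r
# Hypothetical tab-separated lines standing in for one extracted table.
lines <- c("Alice\t$\t100", "Bob\t$\t200", "Carol\t$\t300")

# Default behaviour: the first line is consumed as the header.
with_header <- read.delim(text = lines, stringsAsFactors = FALSE)

# header = FALSE keeps the first line as data and names the columns V1, V2, ...
no_header <- read.delim(text = lines, header = FALSE, stringsAsFactors = FALSE)

nrow(with_header)  # 2 data rows: "Alice", "$" and "100" became column names
nrow(no_header)    # 3 data rows
names(no_header)   # "V1" "V2" "V3"

# The suggestion in this thread would then look something like this
# (file name is a placeholder; untested assumption):
# extract_tables("report.pdf", output = "data.frame", header = FALSE)
```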
Could you try this and confirm?
Thanks for the question! It helped me see that the documentation is not fully correct: it says that ... is used with method, not with output. (I opened an issue about that.)
One great thing about open source software is that you can always read the source code to better understand what is feasible and how it works. Not everything is documented, so reading a function's source code can teach you a lot.
This is also why documentation is so important in software development: it helps users fully understand the power of the software.