I have found this community a great resource for R/tidyverse/RStudio questions.
I am at a different (earlier) stage of learning Python, but I have to use it alongside R.
Can anyone recommend an equivalent novice/intermediate-friendly community for Python?
For instance, I have inherited some code where string columns are still objects and I plan to convert them to true strings. Ideally I would like an elegant way of doing that, i.e. not df[['c1','c2']] = df[['c1','c2']].astype('string')
You can ask Python questions here too; there are plenty of programmers who know both.
What do you find inelegant about df[['c1','c2']] = df[['c1','c2']].astype('string')? Do you have hundreds of columns, or are you trying to do it without specifying every column?
Best,
Randy
Fair question! I do have a lot more columns than that in the file I read in (with read_excel).
What seems inelegant is having to specify the columns twice (once on each side), and then there is the issue of them having to be in the same order across the assignment.
I think I could do this neatly in R/tidyverse with across.
But don't be surprised if I have missed something obvious; I am not that fluent in Python.
There are probably half a dozen tricky ways to do this in Python; here are a few:
- if you don't explicitly need to use pandas, you could try polars, which doesn't use the object data type
- if you want/need to use pandas, you can explicitly write out the dtypes in the statement that creates the dataframe (such as pd.read_csv). That can be tedious, but this is a good trick for not writing out every column type
- (this one is probably the one I'd start with) given you already have a df, you can use df.dtypes to get the column types. Loop over that series and test "is this an object type?". If so, then do the conversion you wrote above. This means you'll only write the logic once, and it will convert however many columns you have
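The third option above could look roughly like the sketch below. The toy frame and its column names are made up for illustration, and the helper name objects_to_string is hypothetical; only the pandas calls themselves (df.dtypes, select_dtypes, astype) are real.

```python
import pandas as pd
from pandas.api.types import is_object_dtype

# Toy frame standing in for the inherited data (hypothetical columns).
df = pd.DataFrame({
    "c1": pd.Series(["a", "b"], dtype=object),
    "c2": pd.Series(["x", None], dtype=object),
    "n": pd.Series([1, 2]),
})


def objects_to_string(frame):
    """Return a copy in which every object column uses the 'string' dtype."""
    out = frame.copy()
    # df.dtypes is a Series indexed by column name, so we can loop over it.
    for col, dtype in out.dtypes.items():
        if is_object_dtype(dtype):
            out[col] = out[col].astype("string")
    return out


converted = objects_to_string(df)

# Same idea without an explicit loop: pick the object columns once,
# then hand astype a {column: dtype} mapping so each name is written only once.
obj_cols = df.select_dtypes(include="object").columns
converted_alt = df.astype({col: "string" for col in obj_cols})
```

Either way the column names never have to be typed out, so there is no ordering to keep in sync. One thing to watch: an object column can hold values that are not text, and astype("string") will turn those into strings too, so it may be worth checking the columns first.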
After that, you're probably getting further into things like "read it in polars, then convert to a pandas df" if you have to use pandas, or figure out how to make your object columns that represent numbers actually be Int/float, but I'd start with one of the choices above first.
Best,
Randy
Thanks. polars will have to wait for a new project - or at least some study.
Both the other solutions seem very doable.
I now remember that I ended up writing a read_excel2 (sorry for the name) for R which creates a col_types vector, populates it with "guess" but then overrides some columns based on lists of columns to force to date, text and number. I might need to put the same effort into helper functions that I did with R.
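A Python counterpart to that helper might be sketched as follows. This is only an assumption about how it could be shaped: build_read_kwargs, read_excel2 and their parameters are hypothetical names, while dtype and parse_dates are existing pd.read_excel arguments. Columns not listed are left for pandas to guess.

```python
import pandas as pd


def build_read_kwargs(text_cols=(), number_cols=(), date_cols=()):
    """Build read_excel keyword arguments from lists of columns to force (hypothetical helper)."""
    # Only listed columns get an explicit dtype; the rest are inferred by pandas.
    dtype = {col: "string" for col in text_cols}
    dtype.update({col: "float64" for col in number_cols})
    return {
        "dtype": dtype,
        # parse_dates takes a list of column names; False leaves dates alone.
        "parse_dates": list(date_cols) or False,
    }


def read_excel2(path, text_cols=(), number_cols=(), date_cols=(), **kwargs):
    """Read an Excel file, overriding the guessed type for selected columns."""
    overrides = build_read_kwargs(text_cols, number_cols, date_cols)
    return pd.read_excel(path, **overrides, **kwargs)
```

A call might then look like read_excel2("data.xlsx", text_cols=["c1", "c2"], date_cols=["when"]), where the file and column names are placeholders.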
If I get helpful answers like that, I'll stick with asking here!