Date Fields - Do they need processing for my analysis?

jcolomb · May 27, 2020, 1:16pm

Dates and times are always a pain.
Have a look at:
?as.Date
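
As a rough sketch of what that help page covers: the date strings and the name occ_date below are made up for illustration, and the format string is an assumption, so the real column may need a different one.

```r
# Hypothetical date strings; the actual column in the dataset may be formatted differently
occ_date <- c("2019-04-28", "2014-01-15")

# Parse the character strings into Date objects (format is an assumption)
d <- as.Date(occ_date, format = "%Y-%m-%d")

# Pull out numeric pieces that are often easier to use in an analysis
occ_year  <- as.integer(format(d, "%Y"))
occ_month <- as.integer(format(d, "%m"))

# Day-of-week names depend on the session locale
occ_dow <- weekdays(d)
```

Whether you need any of this depends on the analysis. If the dataset already has separate year, month, and day-of-week columns, parsing the raw date may not be necessary.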

tugga82 · May 28, 2020, 7:03pm

Please use the original dput structure I posted for the following question:

The first column, where you see GO-2019770786, is the event_unique_id. Although it is called unique, I see duplicates. I understand that one event can have multiple offences, i.e., MCI categories, in the dataset, and those rows are not duplicates. However, for some records I found duplicate event IDs with the same MCI. In this case, how would I drop the duplicates?

I am not sure how to proceed here.

tugga82 · May 28, 2020, 7:15pm

The first two records are duplicates whereas the last two are not.

event_unique_id	premisetype	ucr_code	ucr_ext	offence	MCI
GO-20141262553	Other	1430	100	Assault	Assault
GO-20141262553	Other	1430	100	Assault	Assault
GO-20141296470	Commercial	2120	200	B&E	Break and Enter
GO-20141296470	Commercial	1480	100	Assault - Resist/ Prevent Seiz	Assault

FJCC · May 28, 2020, 7:55pm

If you want a data frame with no duplicated rows, you can use the unique() function. If the data frame is named DF:

DF_uniq <- unique(DF)
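
To illustrate with the four rows posted above, here is a small sketch. The data frame is typed in by hand from that post, and the last step (deduplicating on only event_unique_id and MCI) is an optional variation I am assuming might be useful, not something the original data requires.

```r
# The four example rows from the post above, entered by hand
DF <- data.frame(
  event_unique_id = c("GO-20141262553", "GO-20141262553",
                      "GO-20141296470", "GO-20141296470"),
  premisetype = c("Other", "Other", "Commercial", "Commercial"),
  ucr_code = c(1430, 1430, 2120, 1480),
  ucr_ext = c(100, 100, 200, 100),
  offence = c("Assault", "Assault", "B&E", "Assault - Resist/ Prevent Seiz"),
  MCI = c("Assault", "Assault", "Break and Enter", "Assault"),
  stringsAsFactors = FALSE
)

# Drop rows that are identical in every column
DF_uniq <- unique(DF)
nrow(DF_uniq)  # 3

# duplicated() flags the second and later copies of a row
duplicated(DF)  # FALSE TRUE FALSE FALSE

# Variation: treat rows as duplicates when only these key columns match
DF_key <- DF[!duplicated(DF[, c("event_unique_id", "MCI")]), ]
```

Note that unique() only removes rows that match in all columns. If two records share the event ID and MCI but differ elsewhere, the key-column version is the one that would drop them.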

tugga82 · May 29, 2020, 3:33am

Thank you so much. This works.

tugga82 · June 1, 2020, 2:43am

If I use tree-based algorithms, say a decision tree or random forest, for training, is integer/label encoding enough for the categorical variables in the dataset (premise type, occ month, occ day of week, neighbourhood)? Is one-hot encoding required only if I use other algorithms? Also, can lat/lon be left as is for these tree-based algorithms? Sorry for so many questions. Any help will be appreciated.
Thanks,

FJCC · June 1, 2020, 3:33am

Please start a new thread for this question. It is very different from your initial question, and a new thread will be much more likely to attract someone with the right knowledge.

tugga82 · June 1, 2020, 4:17am

Okay. Sure. Thanks for your help. I just posted it.
