Find "Similar" last records?


Looking for some clever ideas..

I have a dataset which has car registrations, the time they were caught on camera, and which camera they were caught on.

Note: These are not real number plates...!

NumberPlate	ANPR	DateTime
XX65 XXX	CAMERA 3	2021-01-04 05:16:43
YY68 XXX	CAMERA 3	2021-01-04 05:18:22
XX65 XXX	CAMERA 2	2021-01-04 05:19:24
ZZ65 XXX	CAMERA 3	2021-01-04 05:19:30
AA65 XXX	CAMERA 3	2021-01-04 05:19:44
YC68 XXX	CAMERA 1	2021-01-04 05:19:49
DD67 XXX	CAMERA 3	2021-01-04 05:22:02

As you can see the Number Plate XX65 XXX was first caught on Camera 3, then Camera 2.
So using dplyr, I can do something like the below to get the previous camera each plate was seen on.

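library(dplyr)
df <- df %>%
  arrange(DateTime) %>%                      # lag() depends on row order
  group_by(NumberPlate) %>%
  mutate(PreviousCamera = lag(ANPR, 1)) %>%  # camera at this plate's previous sighting
  ungroup()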

However, the plate "YY68 XXX" appears only once, and the chances are that "YC68 XXX" is the same vehicle, with the camera misreading one of the characters. Is there a clever way to find the previous "similar" match?

Thanks all!

I feel a bit dirty assisting anything related to ANPR (boo, speed cameras, boo! :nauseated_face:), but you could experiment with the agrep() function or the stringdist package.

I tried something like this some time ago without much success, but that was based on much longer variable-length strings, so this should be easier.
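
For anyone trying this, here is a minimal sketch of the approximate-matching idea on the sample plates from the question. It uses base R only: adist() for edit distances and agrep() for fuzzy matching. The threshold of one edit is an assumption chosen for illustration, not something from the thread, and would need tuning on real data.

```r
# Plates copied from the sample data in the question
plates <- c("XX65 XXX", "YY68 XXX", "XX65 XXX", "ZZ65 XXX",
            "AA65 XXX", "YC68 XXX", "DD67 XXX")
target <- "YC68 XXX"

# Edit (Levenshtein) distance from the target to every plate;
# adist() returns a 1 x n matrix, drop() turns it into a plain vector
d <- drop(adist(target, plates))

# Assumed threshold: at most one character inserted, deleted or substituted
max_dist <- 1
similar <- plates[d <= max_dist & plates != target]
print(similar)

# agrep() does approximate matching directly; value = TRUE returns the
# matching strings rather than their positions (the target matches itself too)
agrep_hits <- agrep(target, plates, max.distance = max_dist, value = TRUE)
print(agrep_hits)
```

Note that agrep() matches the pattern approximately anywhere inside each string, which is fine for fixed-length plates but may be too loose for variable-length text. The stringdist package offers other distance measures if plain edit distance is not a good fit.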

Hi Martin,

Thanks for this - the agrep() function worked a treat!

I included this as part of a while / for loop to trace back through the records until something "similar" turned up!

Thank you!
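
The actual loop was not posted, so the following is only a sketch of what tracing back through the records could look like. It uses adist() rather than agrep() so the similarity test is an explicit whole-string edit distance. The MatchedPlate column, the max_dist threshold of 1 and the UTC time zone are all assumptions made for the example.

```r
# Sample data from the question
df <- data.frame(
  NumberPlate = c("XX65 XXX", "YY68 XXX", "XX65 XXX", "ZZ65 XXX",
                  "AA65 XXX", "YC68 XXX", "DD67 XXX"),
  ANPR = c("CAMERA 3", "CAMERA 3", "CAMERA 2", "CAMERA 3",
           "CAMERA 3", "CAMERA 1", "CAMERA 3"),
  DateTime = as.POSIXct(c("2021-01-04 05:16:43", "2021-01-04 05:18:22",
                          "2021-01-04 05:19:24", "2021-01-04 05:19:30",
                          "2021-01-04 05:19:44", "2021-01-04 05:19:49",
                          "2021-01-04 05:22:02"), tz = "UTC"),  # time zone assumed
  stringsAsFactors = FALSE
)
df <- df[order(df$DateTime), ]   # the trace-back relies on chronological order

max_dist <- 1                    # assumed threshold: at most one edit
df$PreviousCamera <- NA_character_
df$MatchedPlate   <- NA_character_   # hypothetical helper column

for (i in seq_len(nrow(df))) {
  j <- i - 1
  # Walk backwards until an identical or similar plate turns up
  while (j >= 1) {
    if (adist(df$NumberPlate[i], df$NumberPlate[j])[1, 1] <= max_dist) {
      df$PreviousCamera[i] <- df$ANPR[j]
      df$MatchedPlate[i]   <- df$NumberPlate[j]
      break
    }
    j <- j - 1
  }
}
print(df)
```

On the sample data this links the second "XX65 XXX" sighting back to CAMERA 3, and "YC68 XXX" back to the earlier "YY68 XXX" sighting on CAMERA 3. The loop is quadratic in the number of rows, so for a large dataset it may be worth limiting how far back (in rows or in time) the search is allowed to go.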
