Check if a string is a date with a regex but the result is not the expected one

Assume the following list:

string_test <- c("nk-05-2021", "nk-31-2006", "NK-nk-1965", "NK-nk-89", "00-12-2005", "12-12-nk")

I would like to check whether the strings respect the following date format: dd-mm-YYYY.
If the day or the month is unknown, it may be replaced by nk (upper or lower case).
So, the accepted strings in our example are "nk-05-2021" and "NK-nk-1965".
The others are not accepted.

I wrote the following regex, but the returned results are not what I expected:

#install.packages("stringr")
library(stringr)
result <- str_detect(string_test , "/^(nk|NK|(0[1-9]|[12]/d|3[01]))+(-)+(nk|NK|0[1-9]|1[0-2])(-)+(1|2)+(0|9)+[0-9]+[0-9]/", negate = TRUE)

I think the problem is in the regex, but I tested it on https://regexr.com/ and it works as expected there.

The problem is that you are using the JavaScript regex flavor, while stringr uses the ICU flavor. For example, in ICU you write ^ instead of \^ for the start-of-string anchor.

Sorry, this is my corrected code:

str_detect(string_test, "^(nk|NK|((0[1-9]|[12]\\d|3[01])))+(-)+(nk|NK|(0[1-9]|1[0-2]))(-)+((1|2)+(0|9)+[0-9]+[0-9])")

You are right, but where can I find a website to test my regex in the ICU flavor? I didn't find anything. Also, I don't understand why it gives me an incorrect result rather than an error.

You can test it from within R with regexplain.
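As a lighter-weight alternative to regexplain (this is a plain console check, not the regexplain addin), a pattern can also be inspected with stringr's str_match, which returns the full match and each capture group. The pattern and the sample strings below are simplified assumptions made for illustration, not the exact regex from this thread.

```r
library(stringr)

# Hypothetical sample strings: one well-formed date, one with an invalid day.
samples <- c("12-05-2021", "00-12-2005")

# Simplified illustrative pattern: day, month and year as three capture groups.
pattern <- "^(nk|NK|0[1-9]|[12][0-9]|3[01])-(nk|NK|0[1-9]|1[0-2])-([12][09][0-9]{2})$"

# Column 1 is the whole match, columns 2-4 are the groups;
# a string that does not match yields a row of NA.
m <- str_match(samples, pattern)
print(m)
```

Seeing which group captured what usually makes it easier to find the part of the pattern that rejects a string.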

Because you are escaping the metacharacter, the regex is still valid; it is just looking for a literal "^".
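A minimal sketch of that difference, using two made-up strings (the vector x is an assumption for illustration). It also includes a pattern that starts with a JavaScript-style / delimiter, as in the first attempt above: stringr treats the slash as an ordinary character, so the pattern is valid but never matches.

```r
library(stringr)

# Hypothetical inputs: an ordinary string and one that starts with a literal caret.
x <- c("nk-05-2021", "^nk-05-2021")

# Unescaped ^ is an anchor: the string must start with "nk".
anchored <- str_detect(x, "^nk")      # TRUE FALSE

# Escaped caret (written "\\^" in an R string) matches a literal "^" character.
literal <- str_detect(x, "\\^nk")     # FALSE TRUE

# A leading "/" is a literal slash, so "^" can never match after it:
# no error is raised, the pattern simply matches nothing.
delimited <- str_detect(x, "/^nk")    # FALSE FALSE

print(anchored)
print(literal)
print(delimited)
```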

Thank you. With regexplain I found the following regex to check the date format:

(0?[1-9]|[12][0-9]|3[01])([ /-])(0?[1-9]|1[012])\2([0-9][0-9][0-9][0-9])(([ -])([0-1]?[0-9]|2[0-3]):([0-5]?[0-9]):[0-5]?[0-9])?

But it only checks whether the date is in the format dd-mm-yyyy, so I tried to add the "nk" condition, but nothing works.

Do you, or anyone else, have an idea of how to add this condition?

I finally found the correct regex. I'm posting it here in case anyone has the same problem:

^(nk|NK|(0[1-9]|[12][0-9]|3[01]))(-)(nk|NK|(0[1-9]|1[0-2]))(-)((1|2)(0|9)[0-9][0-9])

It tests whether the string matches the date format dd-mm-yyyy; if the day or month is unknown, "nk" (upper or lower case) is accepted (e.g. nk-01-2021, nk-nk-2021).
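For reference, here is a sketch of how this regex could be applied with str_detect. The vector dates is modeled on the examples in this thread, and the names dates, date_pattern and strict_pattern are illustrative assumptions. The pattern as posted has no end anchor, so a string with extra characters after the year still passes; appending $ is one possible way to tighten it.

```r
library(stringr)

# Sample strings modeled on the examples in this thread.
dates <- c("nk-05-2021", "nk-31-2006", "NK-nk-1965",
           "NK-nk-89", "00-12-2005", "12-12-nk")

# The regex from the post above, unchanged.
date_pattern <- "^(nk|NK|(0[1-9]|[12][0-9]|3[01]))(-)(nk|NK|(0[1-9]|1[0-2]))(-)((1|2)(0|9)[0-9][0-9])"

is_valid <- str_detect(dates, date_pattern)
print(is_valid)  # TRUE FALSE TRUE FALSE FALSE FALSE

# Without an end anchor, trailing characters after the year are ignored.
loose <- str_detect("nk-05-20211", date_pattern)     # TRUE

# Assumed refinement: add "$" so the year must end the string.
strict_pattern <- paste0(date_pattern, "$")
strict <- str_detect("nk-05-20211", strict_pattern)  # FALSE
```

Note that this only validates the format: a string such as 31-02-2021 still matches, because the regex does not check whether the day exists in the given month.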
