I have a database of email addresses like the one below, and I want to filter out the addresses that are not valid. For example, an address is invalid if:
- it does not contain "."
- it contains more than one "@"
- it has more than one "." before and after the "@"
- it contains spaces, either inside the address or around it
- its domain is anything other than "gmail.com" (e.g. hotmail.com, live.com)
Please help me set this up so that, if I find anything else to amend in the future, I can add more conditions.
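One way to keep the checks extensible is sketched below in base R, assuming each rule can be written as a vectorised test on the address. The rules live in a named list of functions, each returning TRUE where an address breaks that rule, and only rows that break none of them are kept. The names `rules` and `is_valid_email` are hypothetical, and reading the third condition as "more than one dot on either side of the '@'" is an interpretation, not something the question states precisely.

```r
# Hypothetical rule list: each function takes a character vector of
# addresses and returns TRUE where the address BREAKS that rule.
# Adding a condition later means adding one more entry here.
rules <- list(
  no_dot      = function(x) !grepl(".", x, fixed = TRUE),
  multiple_at = function(x) nchar(gsub("[^@]", "", x)) > 1,
  # Interpretation (assumption): more than one "." before OR after the first "@"
  too_many_dots = function(x) {
    local  <- sub("@.*$", "", x)      # text before the first "@"
    domain <- sub("^[^@]*@", "", x)   # text after the first "@"
    nchar(gsub("[^.]", "", local)) > 1 | nchar(gsub("[^.]", "", domain)) > 1
  },
  has_space   = function(x) grepl("[[:space:]]", x),
  not_gmail   = function(x) sub("^[^@]*@", "", x) != "gmail.com"
)

# An address is valid when it breaks none of the rules.
is_valid_email <- function(x, rules) {
  x <- as.character(x)  # guard against factor columns
  broken <- Reduce(`|`, lapply(rules, function(rule) rule(x)), FALSE)
  !broken
}

df <- data.frame(
  email = c("abc@gmail.com", "def@gmail.com", "ghi@gmail.com", "jkl@gmail.com",
            "mno@gmail.com", "pqr@hotmail.com", "st@u@live.com", "vwx@gmail.com",
            "yza@gmail.com", "a.a.b@gmail.c.om", "aac@gmail.com", "abb@gmail.com",
            "abc@gmail.com", "cab@gmailcom", "dfc@gmail.com"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose address passes every rule.
valid <- df[is_valid_email(df$email, rules), , drop = FALSE]
valid
```

Because each rule is independent, a new condition (for example, a length limit) is one extra list entry, and `is_valid_email()` does not need to change.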
df <- data.frame(email=c("abc@gmail.com","def@gmail.com","ghi@gmail.com","jkl@gmail.com","mno@gmail.com","pqr@hotmail.com","st@u@live.com","vwx@gmail.com","yza@gmail.com","a.a.b@gmail.c.om",
"aac@gmail.com","abb@gmail.com","abc@gmail.com","cab@gmailcom","dfc@gmail.com"))
For example, the output should look like this:
Validating email addresses in general requires a complex regular expression that accounts for almost all possible cases, such as:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
See RFC 5322; see also this Stack Overflow thread.
Starting from the fifth condition in the original post (the domain must be gmail.com) reduces the complexity, however, and for this sample data makes the other tests unnecessary:
suppressPackageStartupMessages({library(dplyr)
library(stringr)
})
df <- data.frame(email=c("abc@gmail.com","def@gmail.com","ghi@gmail.com","jkl@gmail.com","mno@gmail.com","pqr@hotmail.com","st@u@live.com","vwx@gmail.com","yza@gmail.com","a.a.b@gmail.c.om","aac@gmail.com","abb@gmail.com","abc@gmail.com","cab@gmailcom","dfc@gmail.com"))
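# Anchor to the end of the string and escape the ".", so that addresses
# like "x@gmail.com.example.org" or "x@notgmail.com" are not matched
is_gmail <- "@gmail\\.com$"
df %>% filter(str_detect(email, is_gmail))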
#> email
#> 1 abc@gmail.com
#> 2 def@gmail.com
#> 3 ghi@gmail.com
#> 4 jkl@gmail.com
#> 5 mno@gmail.com
#> 6 vwx@gmail.com
#> 7 yza@gmail.com
#> 8 aac@gmail.com
#> 9 abb@gmail.com
#> 10 abc@gmail.com
#> 11 dfc@gmail.com
Created on 2020-08-27 by the reprex package (v0.3.0)
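If the rules stay as narrow as the question's, a single anchored regular expression can encode all five conditions at once. This base-R sketch assumes the local part (before the "@") may contain only ASCII letters and digits with at most one internal "."; that is an assumption made for illustration, not Gmail's actual address policy. The name `gmail_pattern` is hypothetical.

```r
# Assumption: local part is ASCII letters/digits with at most one internal
# "."; the domain must be exactly "gmail.com". The "^" and "$" anchors make
# the pattern match the whole string, so spaces, a second "@", extra dots,
# or any other domain all cause the match to fail.
gmail_pattern <- "^[A-Za-z0-9]+(\\.[A-Za-z0-9]+)?@gmail\\.com$"

emails <- c("abc@gmail.com", "pqr@hotmail.com", "st@u@live.com",
            "a.a.b@gmail.c.om", "cab@gmailcom", "a b@gmail.com",
            " abc@gmail.com", "first.last@gmail.com")

# TRUE only for addresses that match the entire pattern
is_ok <- grepl(gmail_pattern, emails)
data.frame(email = emails, valid = is_ok)
```

Applied to the question's data, `df[grepl(gmail_pattern, df$email), , drop = FALSE]` keeps the same 11 rows as the reprex, while also rejecting inputs such as "a b@gmail.com" that a domain-only check would let through.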