How get only 1 value for many repeats values in a column

M_AcostaCH · February 14, 2023, 5:58pm

Hi community

Im have this data with repeats values. The idea is get only a entry for each one. For example when all entry for specific ACC have numeric values only get 1 entry for this and if is all is NA put a values.

The data base have 19.000 rows
Im try with unique but only get a sum.

dats <- data.frame(ACC=c('BRA1290','BRA1290',
                         'CAE401','CAE401',
                         'FLA444-2','FLA444-2','FLA444-2',
                         'PER413-3','PER413-3','PER413-3',
                         'PER426','PER426','PER426',
                         'PER408','PER408','PER408'),
                   core=c('N','N','N','N','N','N','N','N','N','N','N','N','N','N','N','N'),
                   lat=c(-14.66670000000000,-14.66670000000000,NA,NA,NA,-15.8,-15.8,NA,-6.8,-6.8,NA,NA,-3.75,NA,NA,NA),
                   long=c(-52.35,-52.35,NA,NA,NA,-48.833300000000001,-48.833300000000001,NA,-63.916699999999999,-63.916699999999999,NA,NA,-73.183300000000003,NA,NA,NA))

head(dats,15)
#>         ACC core      lat     long
#> 1   BRA1290    N -14.6667 -52.3500
#> 2   BRA1290    N -14.6667 -52.3500
#> 3    CAE401    N       NA       NA
#> 4    CAE401    N       NA       NA
#> 5  FLA444-2    N       NA       NA
#> 6  FLA444-2    N -15.8000 -48.8333
#> 7  FLA444-2    N -15.8000 -48.8333
#> 8  PER413-3    N       NA       NA
#> 9  PER413-3    N  -6.8000   -63.9167
#> 10 PER413-3    N -6.8000  -63.9167
#> 11   PER426    N       NA       NA
#> 12   PER426    N       NA       NA
#> 13   PER426    N  -3.7500 -73.1833
#> 14   PER408    N       NA       NA
#> 15   PER408    N       NA       NA
Created on 2023-02-14 by the reprex package (v2.0.1)

data link:

For example, the desire df is something like that:

dats2 <- data.frame(ACC=c('BRA1290','CAE401','FLA444-2','PER413-3','PER426','PER408'),
                    core=c('N','N','N','N','N','N'),
                    lat=c(-14.6667,NA,-15.8000,-6.8000,-3.7500,NA),
                    long=c(-52.3500,NA,-48.8333,-63.9167,-73.1833,NA))
head(dats2)
#>        ACC core      lat     long
#> 1  BRA1290    N -14.6667 -52.3500
#> 2   CAE401    N       NA       NA
#> 3 FLA444-2    N -15.8000 -48.8333
#> 4 PER413-3    N  -6.8000 -63.9167
#> 5   PER426    N  -3.7500 -73.1833
#> 6   PER408    N       NA       NA

Im try with this:

dats_unique <- distinct(dats, ACC, .keep_all = TRUE)

a bad result
# ACC core      lat   long
# 1  BRA1290    N -14.6667 -52.35
# 2   CAE401    N       NA     NA
# 3 FLA444-2    N       NA     NA
# 4 PER413-3    N       NA     NA
# 5   PER426    N       NA     NA
# 6   PER408    N       NA     NA

Tnks

FJCC · February 14, 2023, 6:37pm

This gets close to your desired result. I do not understand how you want PER413-3 calculated. It has two lat and two long values and your final answer mixes the two.

dats <- data.frame(ACC=c('BRA1290','BRA1290',
                         'CAE401','CAE401',
                         'FLA444-2','FLA444-2','FLA444-2',
                         'PER413-3','PER413-3','PER413-3',
                         'PER426','PER426','PER426',
                         'PER408','PER408','PER408'),
                   core=c('N','N','N','N','N','N','N','N','N','N','N','N','N','N','N','N'),
                   lat=c(-14.66670000000000,-14.66670000000000,NA,NA,NA,-15.8,-15.8,NA,-6.8,-63.916699999999999,NA,NA,-3.75,NA,NA,NA),
                   long=c(-52.35,-52.35,NA,NA,NA,-48.833300000000001,-48.833300000000001,NA,-6.8,-63.916699999999999,NA,NA,-73.183300000000003,NA,NA,NA))

library(purrr)
CheckNA <- function(DF) {
  DF <- unique(DF)
  if(any(!is.na(DF$lat))) DF <- na.omit(DF)
  return(DF)
}
datsUniq <- map_dfr(split(dats, dats$ACC), CheckNA)
datsUniq  
#>        ACC core      lat     long
#> 1  BRA1290    N -14.6667 -52.3500
#> 2   CAE401    N       NA       NA
#> 3 FLA444-2    N -15.8000 -48.8333
#> 4   PER408    N       NA       NA
#> 5 PER413-3    N  -6.8000  -6.8000
#> 6 PER413-3    N -63.9167 -63.9167
#> 7   PER426    N  -3.7500 -73.1833

^{Created on 2023-02-14 with reprex v2.0.2}

technocrat · February 14, 2023, 7:10pm

run length encoding—rle()—along with cumsum will get there.

dats <- data.frame(ACC=c('BRA1290','BRA1290',
                         'CAE401','CAE401',
                         'FLA444-2','FLA444-2','FLA444-2',
                         'PER413-3','PER413-3','PER413-3',
                         'PER426','PER426','PER426',
                         'PER408','PER408','PER408'),
                   core=c('N','N','N','N','N','N','N','N','N','N','N','N','N','N','N','N'),
                   lat=c(-14.66670000000000,-14.66670000000000,NA,NA,NA,-15.8,-15.8,NA,-6.8,-63.916699999999999,NA,NA,-3.75,NA,NA,NA),
                   long=c(-52.35,-52.35,NA,NA,NA,-48.833300000000001,-48.833300000000001,NA,-6.8,-63.916699999999999,NA,NA,-73.183300000000003,NA,NA,NA))


dats[cumsum(rle(dats[[1]])$lengths),]
#>         ACC core      lat     long
#> 2   BRA1290    N -14.6667 -52.3500
#> 4    CAE401    N       NA       NA
#> 7  FLA444-2    N -15.8000 -48.8333
#> 10 PER413-3    N -63.9167 -63.9167
#> 13   PER426    N  -3.7500 -73.1833
#> 16   PER408    N       NA       NA

rle() returns the number of times each entry recurs, and the cumsum() of that corresponds to the row indexes first happen. In turn that allows dats to be subset.

M_AcostaCH · February 14, 2023, 7:30pm

I do not understand how you want PER413-3

Hi @FJCC, yes I'm check this error when I make the reprex, was my error.
I'm going to check in the original dats. Tnks

M_AcostaCH · February 15, 2023, 7:39pm

Im try with boths code helps but when not all ACC get unique value.
Im goig to put all data.
Im try without space in ACC column but not work.

For example, when I count repeat values FLA444-2 have 2 next to run the codes.

Is good advance because the data pass to get 7000 repeat values to only 17.

system · March 8, 2023, 7:40pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.