NHANES sample code for Sugar Sweetened Beverages not producing the expected results

I am using sample code from the National Health and Nutrition Examination Survey (NHANES) to see if I can generate the same results with their publicly available data. Specifically, I am trying to get the average number of calories from sugar sweetened beverages consumed by individuals in the dataset (years 2011-2014).

The code uses demographic data (age, RIDAGEYR) as well as dietary recall data (DR1IFDCD, are food codes that indicate a sugar sweetened beverage, and DR1DRSTZ is a status variable which indicates a valid dietary recall). Finally, each individual has a unique identified, SEQN. Finally, there are also survey design variables (SDMVSTRA,SDMVPSU, and WTDRD1).

I am using R, but their sample code is written in SUDAAN. I am ending up with a mean of 294 kcals, and in their publication they report 179.

Here is my version of their code for the relevant sections only:

# load packages 
library(tidyverse)
library(survey)

# download publicly available data

demo_g <-download.file("https://wwwn.cdc.gov/nchs/nhanes/2011-2012/DEMO_G.XPT", tf <- tempfile(), mode="wb")
DEMO_G <- foreign::read.xport(tf)[,c("SEQN","RIAGENDR","RIDAGEYR","SDMVSTRA","SDMVPSU")]

demo_h <-download.file("https://wwwn.cdc.gov/nchs/nhanes/2013-2014/DEMO_H.XPT", tf <- tempfile(), mode="wb")
DEMO_H <- foreign::read.xport(tf)[,c("SEQN","RIAGENDR","RIDAGEYR","SDMVSTRA","SDMVPSU")]

#bind demo data into one df
demo <- bind_rows(DEMO_G,DEMO_H)
rm(DEMO_G,DEMO_H)

download.file("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/DR1IFF_G.XPT",  tf <- tempfile(), mode="wb")
DIET_G <- foreign::read.xport(tf)[,c("SEQN","WTDRD1","DR1IFDCD","DR1IKCAL","DR1DRSTZ")]

download.file("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DR1IFF_H.XPT",  tf <- tempfile(), mode="wb")
DIET_H <- foreign::read.xport(tf)[,c("SEQN","WTDRD1","DR1IFDCD","DR1IKCAL","DR1DRSTZ")]

#bind diet data into one df
diet <- bind_rows(DIET_G,DIET_H)
rm(DIET_G,DIET_H)

# merge diet and demo data
diet_demo <- merge(diet, demo, by= "SEQN")

# create a list of food codes that are considered sugar sweetened beverages (SSBs)
sampcode <- c(42401010,42403010,42404010,64200100,64201010,64201500,64202010,64203020,64204010,64205010,64210010,64213010,64215010,64221010,92101600,92101800,92101920,92101930,92101950,92101960,92121000,92121010,92121020,92130000,92130001,92130020,92301060,92301130,92301160,92301190,92302200,92302400,92302600,92302800,92305000,92305040,92305050,92305800,92306020,92306040,92306090,92306100,92400000,92410110,92410310,92410315,92410330,92410340,92410360,92410390,92410410,92410510,92410550,92410610,92410710,92410810,92411510,92417010,92431000,92432000,92433000,92510610,92510650,92510720,92510730,92511010,92511250,92512050,92512090,92512110,92530410,92530510,92530610,92530950,92531030,92541010,92542000,92550030,92550110,92550350,92552030,92560000,92582100,92582110,92582120,92610010,92610110,92611010,92611100,92611510,92611600,92612010,92613010,92803000,92804000,94100300,94210200,11561010,92101650,92101670,92101925,92101935,92161005,92162005,92204000,92306200,92306610,92307500,92411520,92513000,92550610,92613510,95310200,95310400,95310500,95310550,95310555,95310560,95310600,95310700,95310750,95310800,95311000,95320200,95320500,95321000,95342000,74302000,92101820,92101921,92101923,92101926,92101928,92101931,92101933,92101936,92101938,92101955,92101965,92101970,92101975,92102000,92102010,92102020,92102030,92102040,92102050,92102060,92102070,92102080,92102090,92102100,92102110,92102450,92102600,92102601,92102602,92102610,92102611,92102612,92121001,92121050,92130001,92305910,92307400,92307510,92308000,92308030,92308500,92308530,92309000,92309010,92309500,92510955,92510960,92511015,92513010,92550200,92550360,  92550370,92550380,92610020,92610030)

# get a total number of calories from sugar sweetened beverages by participant
repro <- diet_demo %>% 
  filter((DR1IFDCD %in% sampcode) & (RIDAGEYR >=20) & (DR1DRSTZ == 1)) %>%
  group_by(SEQN) %>% summarize(ssbkcals = sum(DR1IKCAL))

# merge the new ssbkcals variable back into the main  data frame then slice it to one row per participant. Finally, use the existing survey weight variable  `WTDRD1` to create a new survey weight (analytic guidelines stipulate that you divide by the number of cycles used, here there are two cycles, i.e. 2011-12 and 2013-14).
repro_kcals <- merge(repro, diet_demo, by= "SEQN") %>% group_by(SEQN) %>% slice(1) %>% ungroup() %>% mutate(WT4YR = WTDRD1*1/2)
  
# set up the complex survey design, with PSUs nested within Strata and the newly derived survey weight
design <- svydesign(data = repro_kcals, id=~SDMVPSU, strata=~SDMVSTRA, weights=~WT4YR, nest=TRUE) 

#get mean
svymean(~ssbkcals, design)

Any guidance would be much appreciated.

It turns out that the sample code is assigning zeroes to people that don't consume SSBs and keeping those observations in the calculation of the mean, which I did not do. OK to close this topic.

4 Likes

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.