This is my Kaggle notebook.
The issue is that I want to remove observations (rows) that have missing values in their columns, and I have used drop_na(),
but the data frame still contains rows where some variables (columns) are null.
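A common cause, offered here as an assumption rather than a diagnosis because the affected rows haven't been shared yet: drop_na() removes only rows that contain true NA values. Blank cells read from a CSV file (for example an empty station name) often arrive as empty strings "", which drop_na() keeps. A second possibility is calling drop_na() without assigning its result back. The toy data frame and the helper blank_to_na() below are hypothetical, not taken from the notebook.

```r
library(tidyr)

# Hypothetical toy data: row "b" has a true NA, row "c" has an empty string
trips <- data.frame(
  trip_id            = c("a", "b", "c"),
  start_station_name = c("Broadway & Barry Ave", NA, ""),
  stringsAsFactors   = FALSE
)

# drop_na() removes only rows containing NA, so row "c" survives
only_na_dropped <- drop_na(trips)
nrow(only_na_dropped)  # 2

# drop_na() returns a new data frame and leaves `trips` unchanged,
# so the result has to be assigned, e.g. trips <- drop_na(trips)
nrow(trips)  # 3

# Hypothetical helper: turn empty strings in character columns into NA
blank_to_na <- function(x) {
  if (is.character(x)) x[!is.na(x) & x == ""] <- NA
  x
}

trips_clean <- trips
trips_clean[] <- lapply(trips_clean, blank_to_na)  # keeps the data.frame class
trips_clean <- drop_na(trips_clean)
nrow(trips_clean)  # 1
```

If the data is loaded with base read.csv(), passing na.strings = c("", "NA") converts blank cells to NA at import time instead; readr::read_csv() already treats "" as NA by default.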
Could you please share some rows from your data that have this problem? For instance, maybe you can view your data and see that rows 12,000 and 12,003 in your dataset df have this problem. You could then run dput(df[12000:12005,]) and paste the output into your question.
That code will let us create an exact replica of that data excerpt and try potential solutions without needing to follow your external link, repeat a variety of preprocessing steps, and try to identify the problematic data ourselves.
```r
dput(cycle_shares[12000:12005,])
structure(list(
  trip_id = c("2CA14DBA9D651C47", "DF207684FC2E8BFB", "7E9350C8BC7B6556",
              "484D3218D648D90D", "E4CF3205E0DFE2CF", "BA36F5E7C39871A2"),
  bike_type = c("electric_bike", "electric_bike", "electric_bike",
                "electric_bike", "electric_bike", "electric_bike"),
  start_time = c("2020-11-09 21:31:46", "2020-11-08 19:05:26", "2020-11-09 16:48:26",
                 "2020-11-08 18:10:48", "2020-11-08 22:36:04", "2020-11-08 18:47:21"),
  end_time = c("2020-11-09 21:38:41", "2020-11-08 19:08:18", "2020-11-09 17:19:51",
               "2020-11-08 18:28:21", "2020-11-08 22:59:00", "2020-11-08 19:07:31"),
  start_station_name = c("Larrabee St & Webster Ave", "Noble St & Milwaukee Ave",
                         "Leavitt St & Belmont Ave", "Broadway & Barry Ave",
                         "Clark St & Berwyn Ave", "Indiana Ave & Roosevelt Rd"),
  start_station_id = c("144", "29", "664", "300", "463", "255"),
  end_station_name = c("Southport Ave & Wrightwood Ave", "Noble St & Milwaukee Ave",
                       "Leavitt St & Belmont Ave", "Lakefront Trail & Bryn Mawr Ave",
                       "Paulina St & Montrose Ave", "Green St & Madison St"),
  end_station_id = c("190", "29", "664", "459", "297", "198"),
  start_lat = c(41.9218205, 41.9006471666667, 41.9392753333333, 41.937648,
                41.9780221666667, 41.8679126666667),
  start_lng = c(-87.6440245, -87.6626198333333, -87.6832818333333, -87.6440336666667,
                -87.6680306666667, -87.6230515),
  end_lat = c(41.928803, 41.9007388333333, 41.9394016666667, 41.9840153333333,
              41.9614741666667, 41.8819971666667),
  end_lng = c(-87.6637743333333, -87.6625855, -87.6832893333333, -87.65235,
              -87.6714206666667, -87.648773),
  user_type = c("casual", "casual", "casual", "casual", "casual", "casual")),
  row.names = 12000:12005, class = "data.frame")
```
Sorry, I wasn't clear enough. Can you please identify some examples of rows that still have missing values after running drop_na(), and share a range of rows that includes one or more of them? I don't know which rows those might be: I used the range from 12000 to 12005 as a hypothetical example, but those rows look fine as far as I can tell.
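One way to find such rows, sketched in base R under the assumption that "missing" can mean either NA or an empty string: flag every row where any column is NA or blank, list those row positions, and pass a few of them to dput(). The toy data frame below stands in for cycle_shares, and the names cell_missing and row_has_gap are hypothetical.

```r
# Hypothetical stand-in for cycle_shares after drop_na(): rows 2 and 4 have
# blank station names, row 3 has an NA latitude
cycle_shares <- data.frame(
  trip_id          = c("t1", "t2", "t3", "t4"),
  end_station_name = c("Green St & Madison St", "", "Noble St & Milwaukee Ave", ""),
  end_lat          = c(41.88, 41.90, NA, 41.93),
  stringsAsFactors = FALSE
)

# TRUE for each cell that is NA or, in a character column, an empty string
cell_missing <- function(col) is.na(col) | (is.character(col) & col == "")

# OR the per-column results together: TRUE for rows with at least one gap
row_has_gap <- Reduce(`|`, lapply(cycle_shares, cell_missing))

bad_rows <- which(row_has_gap)
bad_rows  # 2 3 4

# Print a reproducible excerpt of the first few problem rows to paste in a reply
dput(cycle_shares[head(bad_rows, 6), ])
```

Note that which() returns row positions, not row names, so on the real data the positions it reports can be used directly in the dput(cycle_shares[...]) call.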