ggplot() ugly plot vs plot() simple adjustments

Hi, I was taking a look at some data and decided to use ggplot to create a simple plot. The plot generated looked horrible and didn't represent the data correctly. I didn't pay much attention to it until a colleague showed me his plot, which was a better representation.

For my ggplot the actual axis lines weren't drawn for some reason. I'm unsure how to adjust these features on the plot, as the scale is off. Usually when I use ggplot() I get pretty plots; this was the first one that looked off. I'm guessing ggplot() is doing a bad job determining the x and y scales?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)

# Sample Data set:

df <- tibble::tribble(
  ~income,            ~p_beph,
  "91802",   1.39819857561793,
  "40969",  0.606421369842628,
  "36886",  0.192773641299542,
  "48750",   0.18018018018018,
  "29898",  0.118028917084686,
  "-",   4.06976744186047,
  "47437",   1.04676396749075,
  "43750",  0.503778337531486,
  "28508",  0.087989441267048,
  "55794",   0.32561844077961,
  "78375",   1.01564644523744,
  "104607",    3.4850640113798,
  "102188",   1.60427807486631,
  "44380",  0.173425827107791,
  "75181",   2.84633244475019,
  "53644",   1.32281994086217,
  "44410",  0.457645166406126,
  "50673",  0.295778435063189,
  "-",  0.110405741098537,
  "42182",  0.190597204574333,
  "48311",  0.246837395865474,
  "44166",  0.436652974091923,
  "106959",   1.81834025619427,
  "115392",   1.71095740808154,
  "34408",                  0,
  "66397",   1.14153428594236,
  "65299",    2.9047491757997,
  "43395",   0.43086037430995,
  "47705",  0.388222464558342,
  "61300",   0.92461255581823,
  "67122",   1.03092783505155,
  "58648",  0.994445091869635,
  "72398",  0.921153111874761,
  "61667",  0.369467228256854,
  "56590",  0.490470852017937,
  "186378",   4.70380913650535,
  "89425",   3.40295705233513,
  "100323",   3.24567738646502,
  "63042",  0.404658854991379,
  "46076",  0.576008014024543,
  "106581",   2.01897236833039,
  "85313",   1.82648401826484,
  "53492",  0.662056169351438,
  "62255",   0.60965954077593,
  "98343",   2.30803128787044,
  "62407",   1.24401254350598,
  "54247",  0.391448358928034,
  "38548",                  0,
  "54476",  0.659674120984234,
  "36450",  0.144901285998913,
  "116250",     2.356637863315,
  "51209",   0.37037037037037,
  "91090",   2.66736061414352,
  "71041",  0.488468263439816,
  "36667",  0.286532951289398,
  "225000",   1.71102661596958,
  "42880",   2.84874104024995,
  "74012",  0.957741562493182,
  "54414",  0.663732271361699,
  "46229",  0.313773578100768,
  "71576",  0.461661983139302,
  "36106",   1.19088125212657,
  "53545",  0.260276431520373,
  "82994",   2.70919816473673,
  "17694",  0.434512725015518,
  "62937",  0.868055555555556,
  "141761",   3.01863792384435,
  "102599",   3.17929691804724,
  "59583",   1.88133140376266,
  "66549",   1.06200541333499,
  "88671",   2.17663919742028,
  "32188",  0.137741046831956,
  "44444",   5.95238095238095,
  "67878",  0.896860986547085,
  "25708",  0.142585551330798,
  "34737", 0.0795544948289578,
  "37750",                  0,
  "113300",  0.675675675675676,
  "85354",   2.91218413810358,
  "51918",   1.24637771476646,
  "41169",   0.29604163682376,
  "50469",                  0,
  "46333",  0.172950536146662,
  "37778",  0.669757856774858,
  "-",   4.70588235294118,
  "77153",   2.48161764705882,
  "-",   3.82775119617225,
  "110659",   3.57568726030488,
  "58906",  0.368324125230203,
  "26094",                  0
)

ggplot(data = df,
       aes(x = income, y = p_beph)) + geom_point()

plot(df$income, df$p_beph)
#> Warning in xy.coords(x, y, xlabel, ylabel, log): NAs introduced by coercion

Created on 2021-10-26 by the reprex package (v2.0.1)

Your income column is character. Convert it to numeric.

At the moment the x axis is not in numerical order: ggplot treats a character column as discrete, so it sorts the values as text and gives every distinct value its own category. An axis crowded with one label per value is often the clue.

THEN you can address any other cosmetic details.

Your reprex shows income written as "4567" rather than 4567. The quotes tell R to treat it as a character string.
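
To see why the order looks wrong, here is a minimal sketch using four income values from the reprex. The vector name income_chr is just for illustration. Text sorting is, as far as I know, roughly what a discrete axis does with character data.

```r
# Four income values from the reprex, stored as text (illustrative vector name)
income_chr <- c("91802", "104607", "40969", "186378")

class(income_chr)
#> [1] "character"

# Sorted as text: compared character by character, so "104607" comes before "40969"
sort(income_chr)
#> [1] "104607" "186378" "40969"  "91802"

# Sorted as numbers: ascending numeric order (40969, 91802, 104607, 186378)
sort(as.numeric(income_chr))
```

Once the column is numeric, ggplot can use a continuous axis instead of one category per value.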

I suspect this is because "-" is present in the column; those entries need to be set to NA.
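
A minimal sketch of that idea in base R, assuming "-" is the only non-numeric placeholder in the column (the short vector here is illustrative, not the full data):

```r
# Illustrative vector mixing numeric strings with the "-" placeholder
income_chr <- c("91802", "-", "40969", "-", "104607")

# Mark "-" as missing first, so as.numeric() has nothing left to coerce
income_chr[income_chr == "-"] <- NA
income_num <- as.numeric(income_chr)

# The two placeholders are now NA and the rest are real numbers
sum(is.na(income_num))
#> [1] 2
```

Doing the replacement explicitly avoids the "NAs introduced by coercion" warning and makes it clear that the missing values are intentional.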

If you are importing the real data, showing a sample of that import step may help us fix this at the source.
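
For example, if the data come from a CSV file (an assumption, since the import step isn't shown), base R's read.csv() can be told which strings mean "missing". The inline text below stands in for the real file:

```r
# Stand-in for the real file: a tiny CSV with "-" marking a missing income
csv_text <- "income,p_beph
91802,1.398
-,4.070
40969,0.606"

# na.strings tells read.csv() to read "-" as NA, so income is parsed as a number
imported <- read.csv(text = csv_text, na.strings = c("-", "NA"))

is.numeric(imported$income)
#> [1] TRUE
```

With a real file you would pass the file path instead of text =; the path and the column names depend on your data.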

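This is happening because income is a character variable (most likely because missing values are recorded as "-"), and ggplot2 is stricter than base plot() about respecting variable classes. plot() coerces the column to numeric, which is where its "NAs introduced by coercion" warning comes from, while ggplot() keeps it as a discrete variable. If you explicitly convert income to a numeric variable you will get the expected result.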

library(tidyverse)

df <- tibble::tribble(
    ~income,            ~p_beph,
    "91802",   1.39819857561793,
    "40969",  0.606421369842628,
    "36886",  0.192773641299542,
    "48750",   0.18018018018018,
    "29898",  0.118028917084686,
    "-",   4.06976744186047,
    "47437",   1.04676396749075,
    "43750",  0.503778337531486,
    "28508",  0.087989441267048,
    "55794",   0.32561844077961,
    "78375",   1.01564644523744,
    "104607",    3.4850640113798,
    "102188",   1.60427807486631,
    "44380",  0.173425827107791,
    "75181",   2.84633244475019,
    "53644",   1.32281994086217,
    "44410",  0.457645166406126,
    "50673",  0.295778435063189,
    "-",  0.110405741098537,
    "42182",  0.190597204574333,
    "48311",  0.246837395865474,
    "44166",  0.436652974091923,
    "106959",   1.81834025619427,
    "115392",   1.71095740808154,
    "34408",                  0,
    "66397",   1.14153428594236,
    "65299",    2.9047491757997,
    "43395",   0.43086037430995,
    "47705",  0.388222464558342,
    "61300",   0.92461255581823,
    "67122",   1.03092783505155,
    "58648",  0.994445091869635,
    "72398",  0.921153111874761,
    "61667",  0.369467228256854,
    "56590",  0.490470852017937,
    "186378",   4.70380913650535,
    "89425",   3.40295705233513,
    "100323",   3.24567738646502,
    "63042",  0.404658854991379,
    "46076",  0.576008014024543,
    "106581",   2.01897236833039,
    "85313",   1.82648401826484,
    "53492",  0.662056169351438,
    "62255",   0.60965954077593,
    "98343",   2.30803128787044,
    "62407",   1.24401254350598,
    "54247",  0.391448358928034,
    "38548",                  0,
    "54476",  0.659674120984234,
    "36450",  0.144901285998913,
    "116250",     2.356637863315,
    "51209",   0.37037037037037,
    "91090",   2.66736061414352,
    "71041",  0.488468263439816,
    "36667",  0.286532951289398,
    "225000",   1.71102661596958,
    "42880",   2.84874104024995,
    "74012",  0.957741562493182,
    "54414",  0.663732271361699,
    "46229",  0.313773578100768,
    "71576",  0.461661983139302,
    "36106",   1.19088125212657,
    "53545",  0.260276431520373,
    "82994",   2.70919816473673,
    "17694",  0.434512725015518,
    "62937",  0.868055555555556,
    "141761",   3.01863792384435,
    "102599",   3.17929691804724,
    "59583",   1.88133140376266,
    "66549",   1.06200541333499,
    "88671",   2.17663919742028,
    "32188",  0.137741046831956,
    "44444",   5.95238095238095,
    "67878",  0.896860986547085,
    "25708",  0.142585551330798,
    "34737", 0.0795544948289578,
    "37750",                  0,
    "113300",  0.675675675675676,
    "85354",   2.91218413810358,
    "51918",   1.24637771476646,
    "41169",   0.29604163682376,
    "50469",                  0,
    "46333",  0.172950536146662,
    "37778",  0.669757856774858,
    "-",   4.70588235294118,
    "77153",   2.48161764705882,
    "-",   3.82775119617225,
    "110659",   3.57568726030488,
    "58906",  0.368324125230203,
    "26094",                  0
)

df %>% 
    mutate(income = as.numeric(income)) %>% 
    ggplot(aes(x = income, y = p_beph)) +
    geom_point()
#> Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
#> Warning: Removed 4 rows containing missing values (geom_point).

Created on 2021-10-26 by the reprex package (v2.0.1)
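
On the axis lines mentioned in the original question: the default ggplot2 theme does not draw them, so that part is a theme setting rather than a scale problem. A hedged sketch, using a few rows from the data above (rounded, with income already numeric) and assuming the scales package is installed:

```r
library(ggplot2)

# A few rows from the thread's data, rounded, with income stored as numbers
df_num <- data.frame(
  income = c(91802, 40969, 36886, 104607, 186378),
  p_beph = c(1.398, 0.606, 0.193, 3.485, 4.704)
)

p <- ggplot(df_num, aes(x = income, y = p_beph)) +
  geom_point() +
  # Show thousands separators on the income axis
  scale_x_continuous(labels = scales::label_comma()) +
  # theme_classic() draws the x and y axis lines and drops the grey background
  theme_classic()

p
```

theme_classic() is only one option; adding theme(axis.line = element_line()) to the default theme should also draw the lines while keeping the rest of the default look.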
