Need help creating a heatmap

Although I tried to create a dataframe, I did not get the heatmap that I was looking for. I tried to demonstrate a positive relationship between computer types, subscriptions, and income

On the x-axis I want computer types, on the y-axis I want subscriptions, and the fill should be income.

data = data.frame(one_or_more_computer = c(1659065, 242325, 2467069, 1031971, 12360505, 2028365, 1274056, 345186, 7383447, 3522345),
                  no_computer = c(229439, 12848, 176361, 138573, 742609, 109037, 111381, 25767, 547866, 307919),
                  subscriptions = c(1516816, 223644, 2296123, 904327, 11677634, 1922418, 1213487, 325137, 6794361,3240391),
                  subscriptions2 = c(371688, 31529, 347307, 266217, 1425480, 214984, 171950, 45816, 1136952, 589873),
                  high_income = c(646321, 132472, 1077679, 365183, 6818051, 1072011, 728121, 171113, 3032764, 1567356))

library(ggplot2)
f = ggplot(data, aes(y = subscriptions, x = one_or_more_computer, fill = high_income))
f + geom_tile()

Are you using a spreadsheet program like Excel or Google Sheets, a programming language like Python or R, or a dedicated data visualization tool?

@nnguyen I don't understand your data. The values in your computer and subscription related columns vary over a large range. What do they represent? Also, your ggplot code refers to a column named computer which does not exist in the data frame.
To make working with your data easier, I renamed your columns so they do not have spaces.

data = data.frame(One_or_more_computer = c(1659065, 242325, 2467069, 1031971, 12360505, 2028365, 1274056, 345186, 7383447, 3522345),
                  no_computer = c(229439, 12848, 176361, 138573, 742609, 109037, 111381,25767, 547866, 307919),
                  subscriptions = c(1516816, 223644, 2296123, 904327, 11677634, 1922418, 1213487, 325137, 6794361,3240391),
                  subscriptions2 = c(371688, 31529, 347307, 266217, 1425480, 214984, 171950, 45816, 1136952, 589873),
                  high_income = c(646321,132472, 1077679, 365183, 6818051,1072011,728121,171113,3032764,1567356))

I have three data sets in Excel, but I'm using R to visualize a heatmap

I have edited the post. Thanks.

These values are household counts. one_or_more_computer and no_computer show how many households have each computer type; subscriptions is the number of households that have a subscription, while subscriptions2 is the number of households without one. high_income is the number of households that earn a high income.

geom_tile() does not seem like a good choice for this data set, nor does plotting the raw household counts. The number of households per row varies from about 255 thousand to 13 million. Comparing rows 2 and 3, about ten times as many households in row 3 have computers, but row 3 also has about ten times as many households in total, so the raw counts mostly reflect the size of each row. I think plotting the fractions of households that have computers or subscriptions would be more informative. I would use geom_point() and then color the symbols by the fraction of households that have a high income.
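As a quick check of the row 2 versus row 3 comparison, here is a minimal sketch that uses only the counts from those two rows. The variable names (with_computer, ratio_total, and so on) are mine and are not part of the original data frame.

```r
# Counts copied from rows 2 and 3 of the data frame in this thread
with_computer <- c(row2 = 242325, row3 = 2467069)
no_computer   <- c(row2 = 12848,  row3 = 176361)

# Total households per row = households with + without a computer
total <- with_computer + no_computer

# How many times larger row 3 is than row 2, for each measure
ratio_with_computer <- unname(with_computer["row3"] / with_computer["row2"])
ratio_total         <- unname(total["row3"] / total["row2"])

# Fraction of households with a computer in each row
frac_with_computer <- with_computer / total

print(round(c(ratio_with_computer = ratio_with_computer,
              ratio_total = ratio_total), 2))
print(round(frac_with_computer, 3))
```

Both ratios come out at roughly ten, while the two fractions are nearly the same, which is why the fractions are easier to compare across rows than the counts.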

library(ggplot2)
data = data.frame(One_or_more_computer = c(1659065, 242325, 2467069, 1031971, 12360505, 2028365, 1274056, 345186, 7383447, 3522345),
                  no_computer = c(229439, 12848, 176361, 138573, 742609, 109037, 111381,25767, 547866, 307919),
                  subscriptions = c(1516816, 223644, 2296123, 904327, 11677634, 1922418, 1213487, 325137, 6794361,3240391),
                  subscriptions2 = c(371688, 31529, 347307, 266217, 1425480, 214984, 171950, 45816, 1136952, 589873),
                  high_income = c(646321,132472, 1077679, 365183, 6818051,1072011,728121,171113,3032764,1567356))
# Total households per row = households with + without a computer
data$TotalHouseholds <- data$One_or_more_computer + data$no_computer
# Convert the raw counts to fractions of the row total
data$FracWithComp <- data$One_or_more_computer/data$TotalHouseholds
data$FracWithSub <- data$subscriptions/data$TotalHouseholds
data$FracHighInc <- data$high_income/data$TotalHouseholds

data
#>    One_or_more_computer no_computer subscriptions subscriptions2 high_income
#> 1               1659065      229439       1516816         371688      646321
#> 2                242325       12848        223644          31529      132472
#> 3               2467069      176361       2296123         347307     1077679
#> 4               1031971      138573        904327         266217      365183
#> 5              12360505      742609      11677634        1425480     6818051
#> 6               2028365      109037       1922418         214984     1072011
#> 7               1274056      111381       1213487         171950      728121
#> 8                345186       25767        325137          45816      171113
#> 9               7383447      547866       6794361        1136952     3032764
#> 10              3522345      307919       3240391         589873     1567356
#>    TotalHouseholds FracWithComp FracWithSub FracHighInc
#> 1          1888504    0.8785075   0.8031839   0.3422397
#> 2           255173    0.9496498   0.8764407   0.5191458
#> 3          2643430    0.9332833   0.8686150   0.4076821
#> 4          1170544    0.8816166   0.7725698   0.3119772
#> 5         13103114    0.9433258   0.8912106   0.5203382
#> 6          2137402    0.9489862   0.8994181   0.5015486
#> 7          1385437    0.9196059   0.8758875   0.5255533
#> 8           370953    0.9305384   0.8764911   0.4612795
#> 9          7931313    0.9309237   0.8566502   0.3823786
#> 10         3830264    0.9196089   0.8459968   0.4092031

ggplot(data, aes(x = FracWithComp, y = FracWithSub, color = FracHighInc)) +
  geom_point(size = 4)

Created on 2024-02-15 with reprex v2.0.2
There is a strong correlation between the fraction of households with computers and the fraction with subscriptions, and higher values of both go along with a higher fraction of high-income households.
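If you want a number to go with the plot, one option is to compute the correlations directly with cor(). This is only a sketch: it uses the default Pearson correlation, the lower-case variable names are mine, and with just ten rows the values should be read as a rough indication rather than a firm result.

```r
# Household counts copied from the data frame in this thread
one_or_more_computer <- c(1659065, 242325, 2467069, 1031971, 12360505,
                          2028365, 1274056, 345186, 7383447, 3522345)
no_computer   <- c(229439, 12848, 176361, 138573, 742609,
                   109037, 111381, 25767, 547866, 307919)
subscriptions <- c(1516816, 223644, 2296123, 904327, 11677634,
                   1922418, 1213487, 325137, 6794361, 3240391)
high_income   <- c(646321, 132472, 1077679, 365183, 6818051,
                   1072011, 728121, 171113, 3032764, 1567356)

# Same fractions as in the reprex above
total     <- one_or_more_computer + no_computer
frac_comp <- one_or_more_computer / total
frac_sub  <- subscriptions / total
frac_high <- high_income / total

# Pearson correlation (the default method of cor()) for each pair
cor_comp_sub  <- cor(frac_comp, frac_sub)
cor_comp_high <- cor(frac_comp, frac_high)
cor_sub_high  <- cor(frac_sub, frac_high)

print(round(c(comp_vs_sub = cor_comp_sub,
              comp_vs_high = cor_comp_high,
              sub_vs_high = cor_sub_high), 2))
```

cor(..., method = "spearman") would give a rank-based alternative if you are worried about one or two rows dominating the result.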

