IMPROVING MY CORRELATION ANALYSIS. Correlating Metadata and Bacteria

Hello everyone, I hope you are all doing well. I've been struggling with this script for a few days because I'm not an expert in the R language. I've tried everything, and I don't know how to correlate only Metadata against Bacteria. I don't want a variable correlated with itself, for example dissolved oxygen vs. dissolved oxygen.

I would also like the bacteria to be on one side and the physicochemical parameters on the other side. I would greatly appreciate your help, really.

I wish you all happy holidays.

library(tidyverse)
library(corrplot)
#> corrplot 0.92 loaded
library(RColorBrewer)


metadata <- data.frame(tibble::tribble(
                         ~SampleID, ~`Chlorophyll-a`,   ~TOC,   ~NH3,   ~NO2,   ~NO3,    ~ON,    ~TN,   ~TKN,    ~TP,   ~PO4, ~Transparency,   ~TDS,    ~EC, ~pH, ~Salinity,  ~DO, ~TSS, ~ET, ~WT,
                              "1A",           2.1257, 2.7276,   0.02,  0.021,   0.02,  1.564,  1.625,  1.584, 0.0472, 0.0196,           0.2, 44296L, 55370L, 7.6,      36.6, 3.26,  77L, 30L, 30L,
                              "1B",              0.5, 0.6491,  0.034,  0.021, 0.0242, 3.2149, 3.2941, 3.2489, 0.0341,  0.007,           2.5, 43232L, 54040L, 8.1,      35.6,  5.1,  55L, 33L, 31L,
                              "1C",              0.5,  0.642,   0.02,  0.021,  0.046, 1.6416, 1.7286, 1.6616, 0.0348,  0.007,           1.2, 43144L, 53930L,   8,      34.8, 4.17,  45L, 33L, 30L,
                              "1D",              0.5, 0.5403, 0.0343, 0.0361, 0.5983, 1.2226, 1.8913, 1.2569, 0.0237,  0.007,           2.4, 42248L, 52810L,   8,      34.6, 4.29,  45L, 34L, 31L,
                              "1E",              0.5,    0.5,   0.02,  0.022, 0.0642, 0.0499, 0.1561, 0.0699, 0.0293,  0.007,           1.7, 67344L, 84180L, 8.4,        50, 5.44,  41L, 35L, 30L,
                              "1F",              0.5, 0.5783,  0.095,  0.021, 0.2069, 0.1923, 0.5152, 0.2873, 0.0379,  0.007,           1.8, 66152L, 82690L, 8.4,        50, 5.16,  40L, 34L, 30L,
                              "1G",              0.5,    0.5,   0.02,  0.021, 0.0862, 0.2174, 0.3446, 0.2374, 0.0271,  0.007,           2.2, 66888L, 83610L, 8.4,        50, 5.65,  38L, 35L, 31L,
                              "1H",              0.5,    0.5, 0.0233,  0.021, 0.0532, 0.4722, 0.5697, 0.4955, 0.0306,  0.007,           7.1, 68248L, 85310L, 8.4,        50, 5.06,  38L, 32L, 30L,
                              "2A",              0.5,  0.893, 0.0479,  0.021,   0.02, 0.2196, 0.3085, 0.2675, 0.0299,  0.007,           4.1, 67688L, 84610L, 8.3,        50, 5.13,  47L, 30L, 31L
                         )
)

bacteria <- data.frame (tibble::tribble(
                          ~SampleID, ~Rhodobacteraceae, ~Fusobacteriaceae, ~Vibrionaceae, ~Egicoccaceae, ~Alteromonadaceae, ~Anaerolineaceae, ~Flavobacteriaceae, ~Prolixibacteraceae, ~Bacillaceae, ~Pseudomonadaceae,
                               "1A",            37198L,                0L,          235L,           15L,             9140L,              12L,              3748L,                  0L,         448L,             7589L,
                               "1B",            37175L,                1L,          196L,           13L,            11140L,              11L,              4535L,                  1L,         497L,             8462L,
                               "1C",              799L,               33L,          150L,          591L,              119L,            5955L,              2437L,                552L,        1891L,             1772L,
                               "1D",              797L,               50L,          138L,          598L,              156L,            6281L,              2304L,                474L,        1623L,             1949L,
                               "1E",             1737L,            23105L,        54555L,            0L,               73L,              38L,              1741L,              19873L,          39L,               55L,
                               "1F",             8353L,            50217L,        46425L,            0L,              703L,              43L,              6739L,               8097L,          39L,              479L,
                               "1G",             4833L,            16027L,        44525L,            0L,              129L,              21L,              3214L,               6245L,          11L,               71L,
                               "1H",             1789L,                1L,          210L,        29402L,               41L,            3778L,               587L,                  2L,        5713L,              330L,
                               "2A",             1911L,                2L,          234L,        31403L,               56L,            3944L,               535L,                  9L,        5628L,              326L
                          )
)



datos_combinados <- merge(metadata, bacteria, by = "SampleID")
correlaciones <- cor(datos_combinados[, -1], method = "pearson")  # Exclude the SampleID column

# Show the correlation matrix
print(correlaciones)
#>                    Chlorophyll.a        TOC         NH3         NO2         NO3
#> Chlorophyll.a          1.0000000  0.9849270 -0.22867574 -0.13409666 -0.20930936
#> TOC                    0.9849270  1.0000000 -0.18268160 -0.16702207 -0.25441684
#> NH3                   -0.2286757 -0.1826816  1.00000000 -0.02515876  0.22982939
#> NO2                   -0.1340967 -0.1670221 -0.02515876  1.00000000  0.94865426
#> NO3                   -0.2093094 -0.2544168  0.22982939  0.94865426  1.00000000
#> ON                     0.2105944  0.2302813 -0.22858074  0.06644582 -0.05511900
#> TN                     0.1666225  0.1791222 -0.16362047  0.23958503  0.13327447
#> TKN                    0.2062831  0.2271552 -0.20617937  0.06619329 -0.04998375
#> TP                     0.7873282  0.7975293  0.14165810 -0.50709633 -0.46529785
#> PO4                    1.0000000  0.9849270 -0.22867574 -0.13409666 -0.20930936
#> Transparency          -0.4467028 -0.4326329 -0.01168038 -0.04459133 -0.06633015
#> TDS                   -0.3630707 -0.3657432  0.27098724 -0.40500232 -0.28251586
#> EC                    -0.3630707 -0.3657432  0.27098724 -0.40500232 -0.28251586
#> pH                    -0.7810943 -0.7921955  0.31588245 -0.22179510 -0.08272154
#> Salinity              -0.3359260 -0.3401093  0.29812474 -0.41478348 -0.28330122
#> DO                    -0.7685322 -0.7725798  0.22354109 -0.23733708 -0.13808756
#> TSS                    0.9016155  0.9300877 -0.21335475 -0.08417437 -0.21440402
#> ET                    -0.5700877 -0.6767166  0.04628581  0.24837010  0.37200918
#> WT                    -0.3162278 -0.2518177 -0.03462451  0.37664145  0.29355965
#> Rhodobacteraceae       0.6535815  0.6588637 -0.06531505 -0.25365312 -0.31504759
#> Fusobacteriaceae      -0.2140950 -0.2610845  0.71222740 -0.19542316  0.08323318
#> Vibrionaceae          -0.2478578 -0.3220707  0.27327351 -0.21154661 -0.03233862
#> Egicoccaceae          -0.1932816 -0.1092937  0.02103473 -0.19088769 -0.25731570
#> Alteromonadaceae       0.5717376  0.5833448 -0.12630049 -0.20409526 -0.29695719
#> Anaerolineaceae       -0.3040841 -0.2589108 -0.15733249  0.53820899  0.44671534
#> Flavobacteriaceae      0.1668394  0.1322887  0.60183757 -0.12288988  0.09756714
#> Prolixibacteraceae    -0.2183790 -0.2874165  0.06634308 -0.13385773 -0.05809973
#> Bacillaceae           -0.2132878 -0.1295999 -0.06272740 -0.04182650 -0.13525751
#> Pseudomonadaceae       0.5957404  0.6082168 -0.18803854 -0.06152923 -0.17762132
#>                             ON         TN         TKN          TP        PO4
#> Chlorophyll.a       0.21059437  0.1666225  0.20628312  0.78732823  1.0000000
#> TOC                 0.23028130  0.1791222  0.22715515  0.79752930  0.9849270
#> NH3                -0.22858074 -0.1636205 -0.20617937  0.14165810 -0.2286757
#> NO2                 0.06644582  0.2395850  0.06619329 -0.50709633 -0.1340967
#> NO3                -0.05511900  0.1332745 -0.04998375 -0.46529785 -0.2093094
#> ON                  1.00000000  0.9820021  0.99973664  0.30096532  0.2105944
#> TN                  0.98200209  1.0000000  0.98317779  0.21793958  0.1666225
#> TKN                 0.99973664  0.9831778  1.00000000  0.30584703  0.2062831
#> TP                  0.30096532  0.2179396  0.30584703  1.00000000  0.7873282
#> PO4                 0.21059437  0.1666225  0.20628312  0.78732823  1.0000000
#> Transparency       -0.25492159 -0.2664344 -0.25650339 -0.44153945 -0.4467028
#> TDS                -0.84042621 -0.8837610 -0.83834541 -0.28046404 -0.3630707
#> EC                 -0.84042621 -0.8837610 -0.83834541 -0.28046404 -0.3630707
#> pH                 -0.61623623 -0.6227242 -0.61194815 -0.58293161 -0.7810943
#> Salinity           -0.83674924 -0.8796479 -0.83400989 -0.24885883 -0.3359260
#> DO                 -0.42467789 -0.4438737 -0.42158484 -0.60525536 -0.7685322
#> TSS                 0.54413542  0.4988034  0.54189470  0.75452579  0.9016155
#> ET                 -0.18232190 -0.1131450 -0.18216520 -0.52283178 -0.5700877
#> WT                  0.21921192  0.2718586  0.21951917 -0.55531761 -0.3162278
#> Rhodobacteraceae    0.71522385  0.6540597  0.71734967  0.69203497  0.6535815
#> Fusobacteriaceae   -0.50156022 -0.4694236 -0.48734147  0.09323127 -0.2140950
#> Vibrionaceae       -0.59281479 -0.5913394 -0.58941110 -0.14680495 -0.2478578
#> Egicoccaceae       -0.34125322 -0.3864846 -0.34250628 -0.21164313 -0.1932816
#> Alteromonadaceae    0.80116540  0.7417678  0.80229406  0.60976365  0.5717376
#> Anaerolineaceae     0.02401039  0.1024834  0.02042467 -0.42090441 -0.3040841
#> Flavobacteriaceae   0.27220989  0.3022366  0.28779196  0.48423294  0.1668394
#> Prolixibacteraceae -0.51663158 -0.5244563 -0.51771539 -0.17462367 -0.2183790
#> Bacillaceae        -0.21657892 -0.2416836 -0.21916754 -0.24843524 -0.2132878
#> Pseudomonadaceae    0.87384855  0.8347395  0.87389437  0.59081153  0.5957404
#>                    Transparency        TDS         EC          pH   Salinity
#> Chlorophyll.a       -0.44670277 -0.3630707 -0.3630707 -0.78109430 -0.3359260
#> TOC                 -0.43263291 -0.3657432 -0.3657432 -0.79219551 -0.3401093
#> NH3                 -0.01168038  0.2709872  0.2709872  0.31588245  0.2981247
#> NO2                 -0.04459133 -0.4050023 -0.4050023 -0.22179510 -0.4147835
#> NO3                 -0.06633015 -0.2825159 -0.2825159 -0.08272154 -0.2833012
#> ON                  -0.25492159 -0.8404262 -0.8404262 -0.61623623 -0.8367492
#> TN                  -0.26643443 -0.8837610 -0.8837610 -0.62272424 -0.8796479
#> TKN                 -0.25650339 -0.8383454 -0.8383454 -0.61194815 -0.8340099
#> TP                  -0.44153945 -0.2804640 -0.2804640 -0.58293161 -0.2488588
#> PO4                 -0.44670277 -0.3630707 -0.3630707 -0.78109430 -0.3359260
#> Transparency         1.00000000  0.4944315  0.4944315  0.54080879  0.4599220
#> TDS                  0.49443154  1.0000000  1.0000000  0.84631409  0.9982972
#> EC                   0.49443154  1.0000000  1.0000000  0.84631409  0.9982972
#> pH                   0.54080879  0.8463141  0.8463141  1.00000000  0.8353902
#> Salinity             0.45992203  0.9982972  0.9982972  0.83539022  1.0000000
#> DO                   0.42420849  0.7372285  0.7372285  0.94244102  0.7328460
#> TSS                 -0.48433691 -0.5980532 -0.5980532 -0.88137360 -0.5732093
#> ET                  -0.17868291  0.1516006  0.1516006  0.49271984  0.1561237
#> WT                   0.10561482 -0.1172386 -0.1172386  0.07600114 -0.1181842
#> Rhodobacteraceae    -0.37391556 -0.5049504 -0.5049504 -0.60977028 -0.4702183
#> Fusobacteriaceae    -0.23543793  0.5055090  0.5055090  0.51367647  0.5390130
#> Vibrionaceae        -0.25851191  0.6022809  0.6022809  0.59726217  0.6276331
#> Egicoccaceae         0.84398785  0.4982883  0.4982883  0.34286792  0.4661919
#> Alteromonadaceae    -0.32812289 -0.5675921 -0.5675921 -0.60905854 -0.5391948
#> Anaerolineaceae      0.31260529 -0.2930972 -0.2930972 -0.13810318 -0.3388751
#> Flavobacteriaceae   -0.55750169 -0.2160419 -0.2160419 -0.12953991 -0.1675195
#> Prolixibacteraceae  -0.25065552  0.5002781  0.5002781  0.49726950  0.5131908
#> Bacillaceae          0.82013776  0.2944531  0.2944531  0.20524463  0.2532446
#> Pseudomonadaceae    -0.37789598 -0.7287474 -0.7287474 -0.74380737 -0.7057958
#>                             DO         TSS          ET          WT
#> Chlorophyll.a      -0.76853219  0.90161550 -0.57008771 -0.31622777
#> TOC                -0.77257983  0.93008773 -0.67671663 -0.25181766
#> NH3                 0.22354109 -0.21335475  0.04628581 -0.03462451
#> NO2                -0.23733708 -0.08417437  0.24837010  0.37664145
#> NO3                -0.13808756 -0.21440402  0.37200918  0.29355965
#> ON                 -0.42467789  0.54413542 -0.18232190  0.21921192
#> TN                 -0.44387370  0.49880342 -0.11314498  0.27185858
#> TKN                -0.42158484  0.54189470 -0.18216520  0.21951917
#> TP                 -0.60525536  0.75452579 -0.52283178 -0.55531761
#> PO4                -0.76853219  0.90161550 -0.57008771 -0.31622777
#> Transparency        0.42420849 -0.48433691 -0.17868291  0.10561482
#> TDS                 0.73722849 -0.59805320  0.15160062 -0.11723856
#> EC                  0.73722849 -0.59805320  0.15160062 -0.11723856
#> pH                  0.94244102 -0.88137360  0.49271984  0.07600114
#> Salinity            0.73284599 -0.57320934  0.15612368 -0.11818418
#> DO                  1.00000000 -0.77708582  0.54098109  0.29645619
#> TSS                -0.77708582  1.00000000 -0.63794631 -0.08329227
#> ET                  0.54098109 -0.63794631  1.00000000  0.05547002
#> WT                  0.29645619 -0.08329227  0.05547002  1.00000000
#> Rhodobacteraceae   -0.41482873  0.82706820 -0.37079668  0.04143422
#> Fusobacteriaceae    0.43627476 -0.38667625  0.52326726 -0.32251711
#> Vibrionaceae        0.60090375 -0.45681092  0.70218317 -0.19610470
#> Egicoccaceae        0.20978257 -0.21801737 -0.57276103  0.07908488
#> Alteromonadaceae   -0.39661693  0.79850071 -0.35642612  0.10186807
#> Anaerolineaceae    -0.28820678 -0.24220422 -0.16757132  0.11535328
#> Flavobacteriaceae  -0.05821080  0.19941070  0.27994741 -0.10787165
#> Prolixibacteraceae  0.49547664 -0.37040581  0.63268333 -0.31519280
#> Bacillaceae         0.05185402 -0.20525644 -0.57970997  0.07138961
#> Pseudomonadaceae   -0.54798613  0.83845904 -0.37919969  0.10474063
#>                    Rhodobacteraceae Fusobacteriaceae Vibrionaceae Egicoccaceae
#> Chlorophyll.a            0.65358148      -0.21409503  -0.24785783  -0.19328161
#> TOC                      0.65886366      -0.26108450  -0.32207073  -0.10929374
#> NH3                     -0.06531505       0.71222740   0.27327351   0.02103473
#> NO2                     -0.25365312      -0.19542316  -0.21154661  -0.19088769
#> NO3                     -0.31504759       0.08323318  -0.03233862  -0.25731570
#> ON                       0.71522385      -0.50156022  -0.59281479  -0.34125322
#> TN                       0.65405971      -0.46942358  -0.59133938  -0.38648463
#> TKN                      0.71734967      -0.48734147  -0.58941110  -0.34250628
#> TP                       0.69203497       0.09323127  -0.14680495  -0.21164313
#> PO4                      0.65358148      -0.21409503  -0.24785783  -0.19328161
#> Transparency            -0.37391556      -0.23543793  -0.25851191   0.84398785
#> TDS                     -0.50495042       0.50550897   0.60228093   0.49828828
#> EC                      -0.50495042       0.50550897   0.60228093   0.49828828
#> pH                      -0.60977028       0.51367647   0.59726217   0.34286792
#> Salinity                -0.47021832       0.53901299   0.62763307   0.46619189
#> DO                      -0.41482873       0.43627476   0.60090375   0.20978257
#> TSS                      0.82706820      -0.38667625  -0.45681092  -0.21801737
#> ET                      -0.37079668       0.52326726   0.70218317  -0.57276103
#> WT                       0.04143422      -0.32251711  -0.19610470   0.07908488
#> Rhodobacteraceae         1.00000000      -0.18892612  -0.27771369  -0.32874251
#> Fusobacteriaceae        -0.18892612       1.00000000   0.84165407  -0.33169097
#> Vibrionaceae            -0.27771369       0.84165407   1.00000000  -0.38442435
#> Egicoccaceae            -0.32874251      -0.33169097  -0.38442435   1.00000000
#> Alteromonadaceae         0.98643968      -0.28431324  -0.35411875  -0.30740993
#> Anaerolineaceae         -0.55366111      -0.51341634  -0.59913891   0.35536540
#> Flavobacteriaceae        0.48511542       0.63243078   0.34592088  -0.67065345
#> Prolixibacteraceae      -0.28800357       0.65724576   0.89001207  -0.33708091
#> Bacillaceae             -0.37931479      -0.47962417  -0.55802785   0.95992964
#> Pseudomonadaceae         0.94926144      -0.39618963  -0.48272260  -0.34737079
#>                    Alteromonadaceae Anaerolineaceae Flavobacteriaceae
#> Chlorophyll.a             0.5717376     -0.30408407        0.16683936
#> TOC                       0.5833448     -0.25891083        0.13228871
#> NH3                      -0.1263005     -0.15733249        0.60183757
#> NO2                      -0.2040953      0.53820899       -0.12288988
#> NO3                      -0.2969572      0.44671534        0.09756714
#> ON                        0.8011654      0.02401039        0.27220989
#> TN                        0.7417678      0.10248338        0.30223655
#> TKN                       0.8022941      0.02042467        0.28779196
#> TP                        0.6097636     -0.42090441        0.48423294
#> PO4                       0.5717376     -0.30408407        0.16683936
#> Transparency             -0.3281229      0.31260529       -0.55750169
#> TDS                      -0.5675921     -0.29309721       -0.21604190
#> EC                       -0.5675921     -0.29309721       -0.21604190
#> pH                       -0.6090585     -0.13810318       -0.12953991
#> Salinity                 -0.5391948     -0.33887507       -0.16751950
#> DO                       -0.3966169     -0.28820678       -0.05821080
#> TSS                       0.7985007     -0.24220422        0.19941070
#> ET                       -0.3564261     -0.16757132        0.27994741
#> WT                        0.1018681      0.11535328       -0.10787165
#> Rhodobacteraceae          0.9864397     -0.55366111        0.48511542
#> Fusobacteriaceae         -0.2843132     -0.51341634        0.63243078
#> Vibrionaceae             -0.3541187     -0.59913891        0.34592088
#> Egicoccaceae             -0.3074099      0.35536540       -0.67065345
#> Alteromonadaceae          1.0000000     -0.47251656        0.41383610
#> Anaerolineaceae          -0.4725166      1.00000000       -0.55218999
#> Flavobacteriaceae         0.4138361     -0.55218999        1.00000000
#> Prolixibacteraceae       -0.3206858     -0.48290926        0.09870659
#> Bacillaceae              -0.3338404      0.58744097       -0.72852296
#> Pseudomonadaceae          0.9756125     -0.29561560        0.36755266
#>                    Prolixibacteraceae Bacillaceae Pseudomonadaceae
#> Chlorophyll.a             -0.21837903 -0.21328785       0.59574040
#> TOC                       -0.28741653 -0.12959994       0.60821685
#> NH3                        0.06634308 -0.06272740      -0.18803854
#> NO2                       -0.13385773 -0.04182650      -0.06152923
#> NO3                       -0.05809973 -0.13525751      -0.17762132
#> ON                        -0.51663158 -0.21657892       0.87384855
#> TN                        -0.52445632 -0.24168359       0.83473954
#> TKN                       -0.51771539 -0.21916754       0.87389437
#> TP                        -0.17462367 -0.24843524       0.59081153
#> PO4                       -0.21837903 -0.21328785       0.59574040
#> Transparency              -0.25065552  0.82013776      -0.37789598
#> TDS                        0.50027805  0.29445312      -0.72874744
#> EC                         0.50027805  0.29445312      -0.72874744
#> pH                         0.49726950  0.20524463      -0.74380737
#> Salinity                   0.51319080  0.25324463      -0.70579580
#> DO                         0.49547664  0.05185402      -0.54798613
#> TSS                       -0.37040581 -0.20525644       0.83845904
#> ET                         0.63268333 -0.57970997      -0.37919969
#> WT                        -0.31519280  0.07138961       0.10474063
#> Rhodobacteraceae          -0.28800357 -0.37931479       0.94926144
#> Fusobacteriaceae           0.65724576 -0.47962417      -0.39618963
#> Vibrionaceae               0.89001207 -0.55802785      -0.48272260
#> Egicoccaceae              -0.33708091  0.95992964      -0.34737079
#> Alteromonadaceae          -0.32068580 -0.33384040       0.97561247
#> Anaerolineaceae           -0.48290926  0.58744097      -0.29561560
#> Flavobacteriaceae          0.09870659 -0.72852296       0.36755266
#> Prolixibacteraceae         1.00000000 -0.47497948      -0.42192034
#> Bacillaceae               -0.47497948  1.00000000      -0.31457698
#> Pseudomonadaceae          -0.42192034 -0.31457698       1.00000000


testRes = cor.mtest(correlaciones, conf.level = 0.95)

#diagonal
corrplot(correlaciones, p.mat = testRes$p, method = 'color', diag = FALSE, type = 'upper',
         sig.level = c(0.05), pch.cex = 0.9,
         insig = 'label_sig', pch.col = 'grey20', order = 'AOE')


# square

corrplot(correlaciones, p.mat = testRes$p, method = 'color', diag = FALSE,
         sig.level = c(0.05), pch.cex = 0.9,
         insig = 'label_sig', pch.col = 'grey20', order = 'AOE')

Created on 2023-12-23 with reprex v2.0.2

The two objects are non-conformable, meaning that the data frames have differing dimensions: they have the same number of rows but different numbers of columns. That is fixed by creating datos_combinados.

A correlation matrix will always have a diagonal with values of 1, reflecting that the correlation of a variable with itself is always 1. For display purposes, the diagonal can be suppressed in corrplot() with the diag = FALSE argument.

The output object is the upper half of a square matrix whose rows and columns are the same set of variables. What would it mean to put one type of variable on one side and the other type on the opposite side? Is that even feasible when the two groups have unequal numbers of variables? The result could not be a square matrix.

A useful framework for approaching problems using R is to solve for y = f(x) where

y is a statistical measure being sought
x is the data at hand
f is one or more functions to transform x into y

In your case x consists of nine observations of 31 variables divided into two data frames, metadata and bacteria.

The first step of f is to merge() into a single object of dim 9,30. So far, so good.
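As a minimal sketch of that merge step, here is a toy version using only three of the nine samples and two columns per data frame (values copied from the question). With the full data the same logic gives 20 + 11 - 1 = 30 columns, and a 29 x 29 correlation matrix once SampleID is dropped.

```r
# Toy subset of the question's data: 3 of the 9 samples, a few columns each.
metadata <- data.frame(
  SampleID = c("1A", "1B", "1C"),
  pH       = c(7.6, 8.1, 8.0),
  DO       = c(3.26, 5.10, 4.17)
)
bacteria <- data.frame(
  SampleID     = c("1A", "1B", "1C"),
  Vibrionaceae = c(235L, 196L, 150L),
  Bacillaceae  = c(448L, 497L, 1891L)
)

# merge() joins on the shared key, so SampleID appears only once.
datos_combinados <- merge(metadata, bacteria, by = "SampleID")
dim(datos_combinados)   # rows x columns: 3 x (3 + 3 - 1)

# Dropping the key leaves only numeric columns for cor().
correlaciones <- cor(datos_combinados[, -1])
dim(correlaciones)      # square: one row and one column per variable
```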

Here be dragons. At this point, correlaciones is a matrix showing the correlations of all the variables of datos_combinados, excluding the SampleID variable. The cor.mtest() function is then applied to it to produce a list object with the following structure:

> str(testRes)
List of 3
 $ p    : num [1:29, 1:29] 0.00 2.61e-31 7.71e-03 3.84e-01 5.21e-02 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:29] "Chlorophyll.a" "TOC" "NH3" "NO2" ...
  .. ..$ : chr [1:29] "Chlorophyll.a" "TOC" "NH3" "NO2" ...
 $ lowCI: num [1:29, 1:29] 1 0.993 -0.723 -0.503 -0.645 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:29] "Chlorophyll.a" "TOC" "NH3" "NO2" ...
  .. ..$ : chr [1:29] "Chlorophyll.a" "TOC" "NH3" "NO2" ...
 $ uppCI: num [1:29, 1:29] 1 0.9986 -0.1436 0.2116 0.0027 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:29] "Chlorophyll.a" "TOC" "NH3" "NO2" ...
  .. ..$ : chr [1:29] "Chlorophyll.a" "TOC" "NH3" "NO2" ...

of which the first element, testRes$p, is a matrix of p-values.
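One thing that may be worth checking, offered as my reading of the corrplot documentation rather than something established in this thread: cor.mtest() describes its input as a data matrix whose rows are samples. If so, passing correlaciones treats the 29 rows of the correlation matrix as observations, and the analogous call on the raw data would be cor.mtest(datos_combinados[, -1], conf.level = 0.95). The sketch below uses base R's cor.test() on two raw columns from the question to show what a single entry of such a p-value matrix represents.

```r
# Two raw columns from the question's data (all 9 samples).
pH           <- c(7.6, 8.1, 8, 8, 8.4, 8.4, 8.4, 8.4, 8.3)
Vibrionaceae <- c(235, 196, 150, 138, 54555, 46425, 44525, 210, 234)

# cor.test() works on the raw observations, so the test uses n = 9.
res <- cor.test(pH, Vibrionaceae, method = "pearson")

res$estimate    # Pearson r for this pair (about 0.597, as in the printed matrix)
res$p.value     # p-value for the null hypothesis of zero correlation
res$parameter   # degrees of freedom, n - 2 = 7
```

With only nine samples the degrees of freedom are small, so only fairly large correlations are likely to reach the 0.05 threshold.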

In turn, that is given as an argument to corrplot() along with its associated correlation matrix, correlaciones. The result is a heatmap-style grid that shows the measure of correlation and an indicator, an *, to show whether it is "significant" at an α of 0.05 (i.e., the 95% confidence level). Recall that this is merely a convention: if there were truly no correlation, random variation alone would produce a result at least this extreme about one time in twenty. Put another way, if the same experiment were run 20 times on a population with no real correlation, about one run would be expected to show "significance" by chance, and the results at hand may have drawn that unlucky card. A good rule of thumb is to treat "significance" at α = 0.05 as "that's interesting and worth looking at again with a larger N."
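A small simulation can make the one-in-twenty idea concrete. This is only a sketch under stated assumptions: the seed, the number of replicates, and the use of independent normal variables are my choices for illustration, and the sample size of 9 simply mirrors the question.

```r
set.seed(1)          # arbitrary seed, only for reproducibility
n_samples <- 9       # same sample size as in the question
n_reps    <- 2000    # number of simulated "experiments" (arbitrary)

# Each replicate draws two independent normal variables, so the true
# correlation is 0 and any "significant" result is a false positive.
p_values <- replicate(n_reps, {
  x <- rnorm(n_samples)
  y <- rnorm(n_samples)
  cor.test(x, y)$p.value
})

false_positive_rate <- mean(p_values < 0.05)
false_positive_rate   # expected to be close to 0.05
```

The same reasoning applies to every cell of a correlation plot, so with many pairs tested a few stars are expected even when nothing real is going on.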

To play with plots, it's often helpful to take a subset of the full data to reduce the dimensionality and be better able to hold the intermediate results in working memory (using the "magic number 7 ± 2" heuristic).

Thanks for your message and explanation, but I still can't manage to compare bacteria vs metadata.

Merry Christmas


Merry Christmas.

Could you please spell out the goal for me, the what rather than the how? Given the output so far, which is

  • a matrix representing
  • coefficients of correlations
  • among all variables and
  • some other object representing the 0.95 confidence yes/no and
  • a distinction between two categories (x,y) of
  • unequal number, a
  • non-conformable array (unequal dimensions column wise)

what is the interpretation of a plot of x against y? Is it the binary test of significance? Any pairwise comparison of x/y significance is logically one of

  • TRUE, TRUE
  • FALSE, FALSE
  • TRUE, FALSE
  • FALSE, TRUE

If so, what information does that convey about the variability in the data in domain terms?


I want to perform a Pearson correlation matrix between the abundance of bacteria I have and the physicochemical parameters that were measured at the sampling site where the bacterial sequencing analysis was performed. I assume that this analysis will provide me with information about the linear relationship between these two classes of variables. The correlation matrix will give me correlation coefficients for each pair of variables, indicating the strength and direction of the linear relationship. For example, if you find significant positive correlations between the abundance of certain bacteria and certain physicochemical parameters, this could indicate a positive association. For example, you could observe that as the concentration of a certain chemical component increases, the abundance of certain bacteria also tends to increase.

The purpose of this study is not to analyze the bacteria among themselves (because I have already done another analysis for this), nor the parameters among themselves. In the analysis I am obtaining, they are being analyzed among themselves and it is not what I am looking for.

I am attaching a PDF of what I am looking to represent

Correlation plots (1).pdf (179.8 KB)


Ok, now I understand. The code provided is trying to use both categories in the same plot. Subset, rather than merge, then do it for each.
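A minimal sketch of the subsetting idea, assuming the two data frames keep the names from the question and using only a few of their columns (all nine samples, values copied from the post). Base R's cor(x, y) correlates each column of x with each column of y and nothing else, so the result has parameters on one side and bacteria on the other. The corrplot call is guarded because I am assuming, not verifying, that the package is installed; as far as I know it accepts a rectangular matrix as long as the default ordering is kept.

```r
# Subset of the question's data: 9 samples, 3 parameters, 2 families.
ids <- c("1A", "1B", "1C", "1D", "1E", "1F", "1G", "1H", "2A")
metadata <- data.frame(
  SampleID = ids,
  pH       = c(7.6, 8.1, 8, 8, 8.4, 8.4, 8.4, 8.4, 8.3),
  Salinity = c(36.6, 35.6, 34.8, 34.6, 50, 50, 50, 50, 50),
  DO       = c(3.26, 5.1, 4.17, 4.29, 5.44, 5.16, 5.65, 5.06, 5.13)
)
bacteria <- data.frame(
  SampleID     = ids,
  Vibrionaceae = c(235, 196, 150, 138, 54555, 46425, 44525, 210, 234),
  Bacillaceae  = c(448, 497, 1891, 1623, 39, 39, 11, 5713, 5628)
)

# Make sure rows are paired by sample before correlating.
bacteria <- bacteria[match(metadata$SampleID, bacteria$SampleID), ]

# cor(x, y): every column of x against every column of y, no self-pairs.
cross <- cor(metadata[, -1], bacteria[, -1], method = "pearson")
dim(cross)   # 3 parameters (rows) x 2 families (columns)
cross

# Draw the rectangular matrix if corrplot is available.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cross, method = "color")
}
```

The values in cross are the same numbers that appear in the corresponding off-diagonal block of the full correlaciones matrix, just without the parameter-parameter and bacteria-bacteria blocks.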

This is the non-conformable array difficulty when trying to compare two matrix objects of unequal dimension. In this case the row dimensions are the same but the column dimensions are unequal. To make them conformable, some combination of sacrificing columns from the larger matrix and padding the smaller matrix with dummy columns is needed. One possibility is to fill out the smaller matrix with columns from the larger. Those will all be on the diagonal and be readily identifiable.

That will provide a plot of correlations of the correlations (CoC). That's not something that is routinely done with the raw data because it creates an additional layer of abstraction which may make interpretation more difficult.

There are several use cases for CoC:

  1. In aid of detecting parameter collinearity in modelling
  2. To detect specific variables that have differences of interest, i.e. reinforcing when they should be expected to be antagonistic.
  3. When seeking to understand graph (network) structure, as is done with gene expression analysis
  4. In connection with cluster analysis, to explore similarity in how variables interact, as is done in biostatistics to group genes or proteins
  5. Dimension reduction opportunity identification for feature engineering.

There are others. Those above have in common the goal of identifying similarities and differences among pairs of variables. Using a heatmap with a continuous scale might help. However, as with everything continuous, difficulty can arise either from too much granularity (too many shades to distinguish) or from too little, when binning reduces the scale to a manageable number of discrete ranges. Of course, if there are known or expected breaks (quantum-like effects), that may be exactly what is being watched for.

One approach that may be helpful once conformable matrix objects are created by some combination of omitting and padding is the examination of differences. That can be done simply by subtracting one from the other.

With the resulting matrix of differences, the diagonal and other zero differences are easily identified, as are positive (reinforcing) and negative (antagonistic) pair values. Going further, with the non-zero differences, a judgment can be made as to the magnitude required to be somehow interesting for further examination. Those that pass can be coded TRUE and the remainder FALSE. That's very easy to plot.
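The subtract-and-threshold idea can be sketched as follows. Everything here is hypothetical: the two matrices, the variable names v1 to v3, and the 0.5 cut-off are made up for illustration and are not taken from the thread's data.

```r
# Two hypothetical 3 x 3 correlation matrices with identical dimensions
# (illustrative values only).
vars <- c("v1", "v2", "v3")
cor_a <- matrix(c( 1.0, 0.8, -0.2,
                   0.8, 1.0,  0.1,
                  -0.2, 0.1,  1.0),
                nrow = 3, dimnames = list(vars, vars))
cor_b <- matrix(c( 1.0, 0.1, -0.3,
                   0.1, 1.0,  0.9,
                  -0.3, 0.9,  1.0),
                nrow = 3, dimnames = list(vars, vars))

# Element-wise difference; the diagonal is 0 because both diagonals are 1.
differences <- cor_a - cor_b

# Hypothetical cut-off for a difference that is "interesting".
threshold <- 0.5
flagged   <- abs(differences) > threshold
flagged   # logical matrix: TRUE where the two matrices disagree strongly
```

The logical matrix can then be passed to image() or a similar function to get a two-colour plot.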

The alternative is to create the separate subsets, correlaciones_bacteria and correlaciones_metadata, without trimming or padding, which will yield matrix objects that can be plotted and compared visually with twin heatmaps as in the illustration provided. The only difference is that they will have different appearances due to the difference in size.
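A minimal sketch of the separate-subsets route, again on a few columns copied from the question (the full data would give a 19 x 19 and a 10 x 10 matrix). The corrplot calls are guarded on the assumption that the package may not be installed.

```r
# Subset of the question's data: 9 samples, 3 parameters, 2 families.
ids <- c("1A", "1B", "1C", "1D", "1E", "1F", "1G", "1H", "2A")
metadata <- data.frame(
  SampleID = ids,
  pH       = c(7.6, 8.1, 8, 8, 8.4, 8.4, 8.4, 8.4, 8.3),
  Salinity = c(36.6, 35.6, 34.8, 34.6, 50, 50, 50, 50, 50),
  DO       = c(3.26, 5.1, 4.17, 4.29, 5.44, 5.16, 5.65, 5.06, 5.13)
)
bacteria <- data.frame(
  SampleID     = ids,
  Vibrionaceae = c(235, 196, 150, 138, 54555, 46425, 44525, 210, 234),
  Bacillaceae  = c(448, 497, 1891, 1623, 39, 39, 11, 5713, 5628)
)

# One square correlation matrix per group, each on its own variables.
correlaciones_metadata <- cor(metadata[, -1], method = "pearson")
correlaciones_bacteria <- cor(bacteria[, -1], method = "pearson")

dim(correlaciones_metadata)   # 3 x 3
dim(correlaciones_bacteria)   # 2 x 2

# One heatmap per group, drawn one after the other.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(correlaciones_metadata, method = "color", diag = FALSE)
  corrplot::corrplot(correlaciones_bacteria, method = "color", diag = FALSE)
}
```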

