RSQLite: How to access 'BLOB' encoded data (little-endian 64 bit doubles) ?

I am trying to access a file in sqlite3 format using library(RSQLite).
There is a column with type BLOB and it is encoded as little-endian 64 bit doubles.

How to return this binary format into double type in R.

for a data frame x:

> x
# A tibble: 113 × 3
    col1          col2        col3
   <int>        <blob>      <blob>
 1     9   <raw 448 B> <raw 231 B>
 2    36 <raw 1.01 kB> <raw 555 B>
 3    46   <raw 853 B> <raw 475 B>
 4    47 <raw 1.02 kB> <raw 629 B>
 5    51   <raw 782 B> <raw 427 B>
 6    63   <raw 871 B> <raw 497 B>
 7    65   <raw 706 B> <raw 378 B>

class of column2:

> class(x[[2]])
[1] "blob"          "vctrs_list_of" "vctrs_vctr"    "list"   

type of column2:

> typeof(x[[2]])
[1] "list"

unlist

> unlist(x[[2]])
   [1] 78 9c 13 59 e7 fe b0 ca 25 c4 61 49 81 2d d7 75 e7 30 07 06 30 08 77 10 01 8b 87 3b e8 df 55 61 6b 3c 1a ee 10 d3
  [39] 7f e8 ab 86 4b 84 c3 c1 53 0b 5d b7 dd 8e 82 f2 a3 1d ca f7 cd 97 d2 3f 1a 0d d5 1f e3 c0 d6 38 d5 b9 db 05 4a b3
  [77] c4 3a 44 58 6e 39 51 e6 16 e7 d0 fa 3a 70 87 dc d1 38 07 eb fb fe bd d3 93 12 1c 02 7a a7 e7 09 29 27 3a 80 94 e5
 [115] 34 27 3a a8 1b 72 ac 91 59 94 e8 b0 69 ee fb e5 c7 1e 43 c5 93 93 1d 9e 5e 50 ba fd f3 50 b2 83 3c c8 00 a5 54 87
 [153] a0 1d 72 ad af 1b 53 1d b4 db c5 6e 9e 7b 9c ea f0 e5 ef 95 8a 97 4a 69 0e 9f 2e f9 26 09 a8 a4 39 4c 03 6a 7b ae
 [191] 9c e1 30 59 82 25 8c 2f 34 03 62 df a2 0c 87 0f cb 8f 79 9b 2f cd 70 a8 b5 37 8d db f5 34 c3 e1 b8 b7 79 a7 23 4b
 [229] 96 43 dc 2e 4f 1e a6 e0 2c 88 fe 96 2c 88 78 4b b6 c3 c3 2a 60 08 a8 e6 3a 5c ad 78 a9 66 98 92 eb f0 78 e9 ec 23
 [267] 0a 4f 72 1d ea b3 f6 94 4c 5e 92 ef e0 94 00 74 d9 e3 7c 88 3f 83 0a 1c 3a 1d 81 02 9f 0a 20 ea 98 8a 1c d6 c8 44
 [305] a5 58 33 17 39 08 80 02 20 a8 08 12 9e 52 c5 0e f9 42 cd 07 4e 19 17 43 dc d9 58 0c 51 37 bb 18 12 4e af 8a 1d 80
 [343] bc fb fe 97 4b 1c de 00 83 ab b5 b9 cc a1 14 e4 11 93 0a 87 6f 1a c0 10 2f ae 80 f0 53 2a 21 ee 2d a9 74 00 fa 52
 [381] 77 93 71 95 c3 b6 cf c0 80 68 ae 72 48 03 81 96 2a 07 a0 69 fb e6 4f aa 72 d8 50 94 31 f1 ed 92 1a 07 49 90 c2 29
 [419] 75 90 f0 ed 6a 80 c4 db e4 66 07 1f a0 77 13 b4 db 1d a4 41 11 bd a5 cb 01 00 d5 2b e4 43 78 9c 2d 54 7b 6c 53 65
 [457] 14 8f b7 0d c1 39 89 29 03 b1 3e b2 74 45 c5 45 0d 53 b2 b0 89 fe d8 c0 07 8f 68 b7 c5 4c 9c 20 bd eb d6 5d b6 ae
 [495] bd b7 dd fa a2 eb ed 6d 0b 6c 34 04 0b 01 47 90 47 47 94 49 0c 69 26 31 93 10 68 07 ea d0 29 50 63 80 18 02 1d 31
 [533] 80 8a 7b 28 a2 92 f8 7d 3b fb fe 39 fd fa 9d c7 ef fc ce f9 5d df 8d 27 17 ce 7c b1 0e 7d ce 25 05 3f 69 f5 78 81
 [571] 9f ec 1a 74 49 c7 dd c9 a3 ef 62 86 ba b3 ba 67 59 03 59 fd 7b 88 dd b2 7c f1 c4 e9 b5 f8 95 99 98 b6 0e 17 47 ee
 [609] d4 e6 63 eb c0 9f 5b b5 f5 48 ef bd fd f1 99 fc 7a 3c b5 70 e6 91 c7 32 56 54 5e 79 33 b1 4b 10 b1 2d f3 e7 d3 0d
 [647] 3a 91 e2 1e 16 31 f0 75 c7 89 7d 56 11 ff 84 58 a1 94 88 57 16 ad 1d 7c a3 4f 44 fd 62 f6 d2 2f 22 cb dc b7 65 44
 [685] b4 fd f2 ef e1 31 66 cf 1e 58 7e 6c e2 54 23 bd 0b 36 7c b5 b2 7c cb 52 b3 0d 6f cf 7a 36 bd 57 b4 a1 2e 7f 68 cf
 [723] 50 a3 0d da c9 e1 03 cb af d9 50 ba 69 ee c5 91 bc 8d 70 89 4d a8 7a ff fa 0f a6 54 13 1a 38 10 73 33 ae fa 8b 3e
 [761] 7b 6d 7e 33 e1 cd 36 e3 23 66 56 ea ec e0 b0 d7 e8 ed f8 9e b5 75 48 b4 83 d3 30 af cf 8e 2d 4b 59 82 8c 9d f0 e6
 [799] ed f8 8b c3 bb 6e 07 2f 7f bb a4 05 fb 8d cf ff 6c b6 b6 4c e3 6d a1 38 b3 84 0b ab ad 0f d5 ab 12 ca 79 82 21 89
 [837] f2 e6 25 aa 33 2a 11 1f d5 1b e0 62 f4 a7 46 37 50 de fb 5a 69 1e a9 56 30 94 57 fd fa 36 ea 27 db 86 ef be 65 67
 [875] 99 83 fe ef 73 e0 c4 3e 56 78 d2 01 9e ae 51 d7 0e 96 ed 96 c5 dc 4e 7e 43 ed 58 51 28 94 6e 32 39 c1 c3 ef 58 9c
 [913] 68 e2 47 76 d2 dc b6 3b c1 69 2b 1e 76 52 dc 35 27 04 5e 88 d9 29 3c 82 8b f8 32 b8 a8 9f 32 17 fe 66 f0 f7 54 b9
 [951] c0 cb 17 45 5c e4 c7 ec 7f 87 c7 ce af 4e bb c0 6e ce 25 82 0c 0e a7 52 27 83 8f b5 d0 20 83 ff 5d 50 29 83 d3 73
 [989] de 22 d3 3e d5 c8 78 f5 d8 c4 bd dc
 [ reached getOption("max.print") -- omitted 98398 entries ]

Thanks for the help.

It would be really useful to have a way to double check, because I'm only ~80% sure I can get it right on the first try :slight_smile:

One more note: x is a tibble, so x[[2]] the second column, so you'd want to take x[[2]][[1]] to extract a single blob. With your unlist() you are concatenating all the rows, maybe it makes sense in your context.

Anyway, starting with a raw:

xx <- as.raw(c(0x78,0x9c,0x13,0x59,0xe7,0xfe,0xb0,0xca,0x25,0xc4,0x61,0x49,0x81,0x2d,0xd7,0x75,0xe7,0x30,0x07,0x06,0x30,0x08,0x77,0x10,0x01,0x8b,0x87,0x3b,0xe8,0xdf,0x55,0x61,0x6b,0x3c,0x1a,0xee,0x10,0xd3,
             0x7f,0xe8,0xab,0x86,0x4b,0x84,0xc3,0xc1,0x53,0x0b,0x5d,0xb7,0xdd,0x8e,0x82,0xf2,0xa3,0x1d,0xca,0xf7,0xcd,0x97,0xd2,0x3f,0x1a,0x0d,0xd5,0x1f,0xe3,0xc0,0xd6,0x38,0xd5,0xb9,0xdb,0x05,0x4a,0xb3))

typeof(xx)
#> [1] "raw"

we can use readBin(). We will save the output as what = "double"or equivalently numeric (note this is the R type, not about the fact that your blob is C doubles). The size of each element is 64 bits = 8 bytes. And we can pre-calculate the number of elements based on the length of your blob. So I'd use:

xx <- as.raw(c(0x78,0x9c,0x13,...))

readBin(xx, what = "double", size = 8L, endian = "little", n = floor(length(xx)/8))
#>  [1]  -6.358853e+51  4.454589e+259  2.373639e-229  7.688446e+160 -2.323157e+195
#>  [6]  4.210583e-254  6.765449e-166  1.501694e-245  -4.185307e-30  1.594171e+117
#> [11] -3.927280e+283 -3.727741e+207  1.676743e-298 -1.454807e-307  4.610455e+241
#> [16] -7.429313e+223 -2.294045e-214   9.481478e+81 -2.385114e-273  2.025261e-167
#> [21] -3.572079e+205  1.590221e+199 -4.232818e-132 -1.612949e+162  3.160594e-293
#> [26] -1.361732e-232   2.486213e-40  -2.856300e-40   5.819625e-83 -6.382006e-198
#> [31]  1.103055e-102  1.985392e+273 -1.193198e+200  -1.300114e-58 -4.121159e-208
#> [36] -4.937324e-293  1.613476e-257   2.345729e+23  1.225703e-296  1.637254e-237
#> [41]   1.625437e+15  1.527103e-189 -1.084945e+288  -4.344127e-99 -8.109168e-217
#> [46]  -3.123264e-86 -2.489093e-110 -2.189798e-205  2.595359e+244  6.134318e+200
#> ...

Created on 2024-03-26 with reprex v2.0.2

However, note that these numbers are really all over the place! To me that strongly suggests there is a problem (but maybe it makes sense in your context). I tried a few changes in my readBin() call and can't really see anything wrong.

1 Like

Thank you very much. You are totally right.
I had to join another column from different table containing the n. Then it all makes sense.
But then to let it work instead of having these huge numbers I had to use memDecompress function for some but not all the BLOBS.

readBin(memDecompress(as.raw(x), "gzip"), double(), n= n)
1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.