The search function of the bs4_book bookdown book does not seem to find words beyond a certain number of paragraphs. For example, I created a new book based on the bs4_book template. Next, I created a random text and copied that text to a paragraph in chapter 2, and to a paragraph in chapter 3. Searching for a word from that text finds them in both chapters.
Now, I copied the same text twice in a paragraph in chapter 2, and once in a paragraph in chapter 3. Searching for a word finds it in chapter 3, but not in chapter 2. See https://ecodiv.earth/test/_book/index.html and search e.g., for the word 'bibendum'.
Question: is there a limit to the number of words per paragraph and/or per page that can be searched and found?
Interesting. I would need to look closer at your example to see if we are missing something while creating the search list. Can you share your example repo?
We have a scoring filter and some options. As your text is copy-pasted exactly, maybe the filter is what removes it.
I don't have an answer to your question yet, as I am not an expert with Fuse.js, but I hope the hints above help you dig into this, and maybe we can adapt on our side.
From debugging your example in the browser's DevTools console, it seems that the occurrences in chapter 2 (Hello Bookdown) are filtered out because of the score set by Fuse.js.
According to the Fuse.js docs, a score of 0 is a perfect match and a score of 1 is a complete mismatch. In bs4_book() we keep only items with a score <= 0.75.
That is why the chapter 2 result is not shown: it got a score of 0.82.
This is related to the Fuse.js algorithm, and I don't know exactly how the scoring works. You could look into it for your case.
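The filtering step described above can be sketched as follows. This is only an illustration, not the actual bs4_book() code: the function name `keepGoodMatches`, the `chapter` field, and the chapter 3 score are invented for the example. Only the 0.75 cutoff and the 0.82 score for chapter 2 come from this thread.

```javascript
// Hypothetical results, in the shape Fuse.js returns when includeScore is true:
// each entry has an `item` and a `score` (0 = perfect match, 1 = mismatch).
const results = [
  { item: { chapter: "Chapter 3" }, score: 0.41 }, // invented score
  { item: { chapter: "Chapter 2" }, score: 0.82 }, // score reported in this thread
];

// Cutoff mentioned in this thread for bs4_book().
const MAX_SCORE = 0.75;

// Keep only the results whose score does not exceed the cutoff.
function keepGoodMatches(allResults, maxScore = MAX_SCORE) {
  return allResults.filter((result) => result.score <= maxScore);
}

const visible = keepGoodMatches(results);
console.log(visible.map((result) => result.item.chapter));
```

With a cutoff like this, a result is dropped purely because of its score, even if the search word appears verbatim in that chapter.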
Yes, I had noticed that the search.json contained all the text. I already had a look at the fuse help pages, but with your clues it might be easier to know what to look for, thanks.
I would recommend raising the question there. The weighting just seems completely wrong in this example.
Only thing I can think of that may be negatively affecting this is it not being English, or a “supported” language, possibly. Not sure if that’s a thing in Fuse.js
Field-length Norm: The shorter the field, the higher its relevance. If a pattern matches a short field (such as a title field) it is likely to be more relevant than the same pattern matched with a bigger field.
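To make the effect of the field-length norm concrete, here is a rough sketch. It is a simplified model and an assumption on my part, not the exact Fuse.js implementation: it assumes the norm is 1 / sqrt(number of tokens) and that the raw match score is raised to the power of that norm. The raw score of 0.001 for a near-perfect match and the token counts are also made up for illustration.

```javascript
// Simplified model of a field-length norm (assumption, not Fuse.js's exact code):
// the more tokens a field has, the smaller the norm.
function fieldNorm(tokenCount) {
  return 1 / Math.sqrt(tokenCount);
}

// Raise the raw score (0 = perfect, 1 = mismatch) to the power of the norm.
// For scores between 0 and 1, a smaller exponent pushes the result toward 1,
// so the same match looks worse in a longer field.
function normalizedScore(rawScore, tokenCount, ignoreFieldNorm = false) {
  const exponent = ignoreFieldNorm ? 1 : fieldNorm(tokenCount);
  return Math.pow(rawScore, exponent);
}

// Assumed raw score for a near-perfect match (illustrative value only).
const RAW_SCORE = 0.001;

for (const tokens of [100, 600, 1200]) {
  console.log(tokens, normalizedScore(RAW_SCORE, tokens).toFixed(2));
}
```

Under this model, the same exact match scores worse as the surrounding text grows, which would be consistent with a page that contains the text twice falling above the 0.75 cutoff while a shorter page stays below it.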
The distance, threshold, and location options together determine how much of the text is included in the search.
I tried setting the search engine options to ignore the location and the field norm, but that does not seem to work (or I am using them wrongly).
(gitbook() does not have the same issue, as no filtering is done there.)
Adding this to bs4_book() would be a feature request, but deactivating scoring would cause problems as of now, because the score value is used in the filter.
You would need to fork bookdown and change some options to see how that would work.
It is also possible to change the effect of the field length, so being able to set these parameters would be a very welcome feature. Is this the best place to make a feature request?
Is it possible that the scoring is relative, @ecodiv ? I haven't read that page yet, but I can imagine that since the exact search phrase happens more than once in the text of the book, that it then tries to rank them by adjusting their scores.
I'm still very unimpressed with the idea that an exact match for my search phrase would not even show up in the results.
Doesn't everybody agree that's really a fundamental problem? Searching for a phrase should find that exact phrase anywhere it's in the book.
What are the ramifications of adjusting the score threshold, @cderv? Does it have to be set? Can it just be left to Fuse.js to determine whether there are matching results and how to prioritize them? Or does a threshold have to be set?
Because if scoring is relative, then I'm guessing we should just get rid of the threshold if we can.
I don't know a lot about Fuse.js, as I did not implement the search feature in the first place. In bs4_book() we tried to add some logic to improve the results; in gitbook() we just used the default options.
If there is some improvement to make based on a better understanding of Fuse.js, I'll be happy to do it. It does seem a bit off that exact matches are not shown. I can't answer your question regarding Fuse.js though; we'll need to search and experiment with the JS library.
I think the threshold is there to avoid showing every result in the pop-up box when you search, but if it is not used correctly, then we need to change it.
If someone is willing to improve it, please submit a PR.
In bs4_book, the location is apparently already being ignored via ignoreLocation: true
I suspect somebody should try to modify bookdown and set ignoreFieldNorm: true
That wouldn't turn off or break scoring, so it should be a fairly simple test. And I'm guessing it's the most likely culprit, now that I've read the Fuse.js documentation.
If that doesn't help, then I'd try setting findAllMatches: true. However, since the match is apparently being returned according to @cderv's test, just scored too low, I'm guessing that's not the problem.
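The options discussed above could be collected as in the sketch below. The option names `ignoreLocation`, `ignoreFieldNorm`, `findAllMatches`, and `includeScore` are Fuse.js options; everything else is hypothetical, including the helper name `buildSearchOptions`, the `keys` values, and the `searchIndex` variable. This is not what bs4_book() currently ships.

```javascript
// Build a Fuse.js options object for testing the suggestions in this thread.
// `overrides` lets you flip individual options without editing the defaults.
function buildSearchOptions(overrides = {}) {
  return {
    keys: ["heading", "text"], // hypothetical field names in the search index
    includeScore: true,        // needed if results are filtered by score
    ignoreLocation: true,      // reportedly already set in bs4_book
    ignoreFieldNorm: true,     // first thing to try: ignore the field length
    findAllMatches: false,     // second thing to try if the above does not help
    ...overrides,
  };
}

const options = buildSearchOptions();
console.log(options);

// With Fuse.js loaded and a search index available (both assumed here),
// the options would be passed like this:
// const fuse = new Fuse(searchIndex, options);
// const results = fuse.search("bibendum");
```

Calling `buildSearchOptions({ findAllMatches: true })` would give the variant for the second test.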