{"componentChunkName":"component---src-templates-post-template-js","path":"/clamav-scan-bm-en","result":{"data":{"markdownRemark":{"id":"d83a180c-b727-52d5-9e91-7f79c744913b","html":"<blockquote>\n<p>This page has been machine-translated from the <a href=\"/clamav-scan-bm\">original page</a>.</p>\n</blockquote>\n<p>In the previous article, <a href=\"/clamav-scan-fmap\">Search Algorithms Powering AntiVirus 1</a>, I looked at an implementation of the Aho–Corasick algorithm and how it is used in ClamAV.</p>\n<p>This time, again using ClamAV’s code as a base, I will summarize Boyer–Moore (BM), Wu-Manber (WM), and related techniques that are applied to signature pattern matching.</p>\n<!-- omit in toc -->\n<h2 id=\"table-of-contents\" style=\"position:relative;\"><a href=\"#table-of-contents\" aria-label=\"table of contents permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Table of Contents</h2>\n<ul>\n<li><a href=\"#background-pattern-matching-in-clamav-recap\">(Background) Pattern Matching in ClamAV (Recap)</a></li>\n<li>\n<p><a href=\"#what-is-the-boyermoore-bm-algorithm\">What Is the Boyer–Moore (BM) Algorithm</a></p>\n<ul>\n<li><a href=\"#bad-character-rule\">Bad Character Rule</a></li>\n<li><a href=\"#implementing-the-bad-character-rule\">Implementing the Bad Character Rule</a></li>\n<li><a href=\"#good-suffix-rule\">Good Suffix Rule</a></li>\n<li><a href=\"#boyermoorehorspool-bmh\">Boyer–Moore–Horspool (BMH)</a></li>\n<li><a href=\"#wu-manber-wm\">Wu-Manber 
(WM)</a></li>\n</ul>\n</li>\n<li>\n<p><a href=\"#reading-clamavs-implementation\">Reading ClamAV’s Implementation</a></p>\n<ul>\n<li><a href=\"#function-calls\">Function Calls</a></li>\n<li><a href=\"#building-tables-in-preprocessing\">Building Tables in Preprocessing</a></li>\n<li><a href=\"#scan-processing\">Scan Processing</a></li>\n</ul>\n</li>\n<li><a href=\"#summary\">Summary</a></li>\n</ul>\n<h2 id=\"background-pattern-matching-in-clamav-recap\" style=\"position:relative;\"><a href=\"#background-pattern-matching-in-clamav-recap\" aria-label=\"background pattern matching in clamav recap permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>(Background) Pattern Matching in ClamAV (Recap)</h2>\n<p>As mentioned in the <a href=\"/clamav-scan-fmap\">previous article</a>, ClamAV’s core file-scanning function, <code class=\"language-text\">cli_scan_fmap</code>, is, simply put, a function that scans a memory map abstracted as the <code class=\"language-text\">cl_fmap</code> structure.</p>\n<p>Inside it, ClamAV performs processing such as hash-based checks and pattern matching against the loaded signature database.</p>\n<p>After initializing the scan, the <code class=\"language-text\">cli_scan_fmap</code> function reads up to SCANBUFF (0x20000) bytes from the file map as a single chunk and then passes that buffer to the <code class=\"language-text\">matcher_run</code> function for scanning.</p>\n<p><a href=\"https://kashiwaba-yuki.com/static/9227e830595fb5eaa841331b3333b68c/807a0/image-20251214111115528.png\" 
target=\"_blank\" rel=\"nofollow noopener noreferrer\"><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 960px; \"\n    >\n      <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 56.25%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAALCAIAAADwazoUAAAACXBIWXMAAAsTAAALEwEAmpwYAAABnElEQVQoz2WSbXOjIBSF/bJp4PKmRlDUGFR8TbKJTT+0O93//7P2mkm3ncnhmTuAwpwDBIfKlWVprS2LIs9z7JgvJUkSxzEA0CfBXUFe5trol5ettoXzfdt63CPZJZw9xDlXKKmklEIIrDjzWJxVqck12W6jKDRaa2OW8/lyufjOD8PQY+v8NIxD13dd55zzbZtlGSFkXWxtYXS62RCQkbF7a/O6cvXBuUM9jBN+TXQqhAJYTfwHh0jg+6lph30RtQfpG5tmey5iYCEwtCcpqC3EWIF9w+5gJ3h7u91ur+cxvM2/Zs+7trie9kWG2RTj+BMXHDCsukdmDL5B29fl9ffl2tfi5DedE1VpBp81LuMypaKgIo1iPIfMpFmiDW7F2AMAHmiT100/jhOmdK52dW9zp0ININZgPCY8JaAI5YQKyuQP/zIIIzXP09/PTzzh4/G4LFfvW6AEHqIMCFvr6hOHPwlCGU/z8f3jz/F0Hqd5WRZ8KngRlLInvibhQaCqpHT1vmoSbbUpduYg40IofFsmjHT0VFfCFal2/wDfjzee1CXLPgAAAABJRU5ErkJggg=='); background-size: cover; display: block;\"\n  ></span>\n  <picture>\n          <source\n              srcset=\"/static/fb97a835ba23b88ea1aaebfacb13d7e0/8ac56/image-20251214111115528.webp 240w,\n/static/fb97a835ba23b88ea1aaebfacb13d7e0/d3be9/image-20251214111115528.webp 480w,\n/static/fb97a835ba23b88ea1aaebfacb13d7e0/e46b2/image-20251214111115528.webp 960w\"\n              sizes=\"(max-width: 960px) 100vw, 960px\"\n              type=\"image/webp\"\n            />\n          <source\n            srcset=\"/static/fb97a835ba23b88ea1aaebfacb13d7e0/8ff5a/image-20251214111115528.png 240w,\n/static/fb97a835ba23b88ea1aaebfacb13d7e0/e85cb/image-20251214111115528.png 480w,\n/static/fb97a835ba23b88ea1aaebfacb13d7e0/d9199/image-20251214111115528.png 960w\"\n            sizes=\"(max-width: 960px) 100vw, 960px\"\n            type=\"image/png\"\n          />\n          <img\n            class=\"gatsby-resp-image-image\"\n            
src=\"/static/fb97a835ba23b88ea1aaebfacb13d7e0/d9199/image-20251214111115528.png\"\n            alt=\"image-20251214111115528\"\n            title=\"image-20251214111115528\"\n            loading=\"lazy\"\n            style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n          />\n        </picture>\n    </span></a></p>\n<p>This <code class=\"language-text\">matcher_run</code> function is called with the following parameters and scans the received buffer.</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token keyword\">static</span> <span class=\"token keyword\">inline</span> <span class=\"token class-name\">cl_error_t</span> <span class=\"token function\">matcher_run</span><span class=\"token punctuation\">(</span>\n    <span class=\"token keyword\">const</span> <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_matcher</span> <span class=\"token operator\">*</span>root<span class=\"token punctuation\">,</span>\n    <span class=\"token keyword\">const</span> <span class=\"token keyword\">unsigned</span> <span class=\"token keyword\">char</span> <span class=\"token operator\">*</span>buffer<span class=\"token punctuation\">,</span> <span class=\"token class-name\">uint32_t</span> length<span class=\"token punctuation\">,</span>\n    <span class=\"token keyword\">const</span> <span class=\"token keyword\">char</span> <span class=\"token operator\">*</span><span class=\"token operator\">*</span>virname<span class=\"token punctuation\">,</span> <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_ac_data</span> <span class=\"token operator\">*</span>mdata<span class=\"token punctuation\">,</span>\n    <span class=\"token class-name\">uint32_t</span> offset<span class=\"token punctuation\">,</span>\n    <span class=\"token keyword\">const</span> <span class=\"token keyword\">struct</span> 
<span class=\"token class-name\">cli_target_info</span> <span class=\"token operator\">*</span>tinfo<span class=\"token punctuation\">,</span>\n    <span class=\"token class-name\">cli_file_t</span> ftype<span class=\"token punctuation\">,</span>\n    <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_matched_type</span> <span class=\"token operator\">*</span><span class=\"token operator\">*</span>ftoffset<span class=\"token punctuation\">,</span>\n    <span class=\"token keyword\">unsigned</span> <span class=\"token keyword\">int</span> acmode<span class=\"token punctuation\">,</span>\n    <span class=\"token keyword\">unsigned</span> <span class=\"token keyword\">int</span> pcremode<span class=\"token punctuation\">,</span>\n    <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_ac_result</span> <span class=\"token operator\">*</span><span class=\"token operator\">*</span>acres<span class=\"token punctuation\">,</span>\n    <span class=\"token class-name\">fmap_t</span> <span class=\"token operator\">*</span>map<span class=\"token punctuation\">,</span>\n    <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_bm_off</span> <span class=\"token operator\">*</span>offdata<span class=\"token punctuation\">,</span>\n    <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_pcre_off</span> <span class=\"token operator\">*</span>poffdata<span class=\"token punctuation\">,</span>\n    cli_ctx <span class=\"token operator\">*</span>ctx\n<span class=\"token punctuation\">)</span></code></pre></div>\n<p>Here, the <code class=\"language-text\">cli_bm_scanbuff</code> function first performs scanning with a Boyer-Moore-based algorithm, after which the <code class=\"language-text\">cli_ac_scanbuff</code> function performs pattern matching with the Aho-Corasick algorithm.</p>\n<p>I covered the <code class=\"language-text\">cli_ac_scanbuff</code> function in the 
previous article, so this time I will focus on Boyer–Moore-based scanning through the <code class=\"language-text\">cli_bm_scanbuff</code> function.</p>\n<h2 id=\"what-is-the-boyermoore-bm-algorithm\" style=\"position:relative;\"><a href=\"#what-is-the-boyermoore-bm-algorithm\" aria-label=\"what is the boyermoore bm algorithm permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>What Is the Boyer–Moore (BM) Algorithm</h2>\n<p>The BM algorithm is a fast string-search algorithm for determining whether a given keyword (here, a signature) occurs in a text.</p>\n<p>Its basic idea is to preprocess the keyword in advance and then compare it against the text starting from the keyword’s last character and moving toward its first. When a comparison fails, the precomputed information lets the search skip alignments that cannot possibly match, avoiding as many unnecessary checks as possible.</p>\n<p>The standard BM algorithm generally refers to the version that applies the two rules described below: the <strong>Bad Character Rule</strong> and the <strong>Good Suffix Rule</strong>.</p>\n<p>This standard BM algorithm has a worst-case time complexity of O(N*M) (text length N × keyword length M), but adding an extension known as the Galil rule reduces the worst case to linear time, O(N + M).</p>\n<p>There is also a derivative called the Boyer–Moore–Horspool (BMH) algorithm, which further simplifies BM by dropping the Good Suffix Rule and using only a simplified form of the Bad Character Rule.</p>\n<p>In signature pattern matching for AntiVirus software such as ClamAV, this BMH method is apparently used in some cases for several reasons, so in this article I will mainly focus on BMH.</p>\n<h3 id=\"bad-character-rule\" style=\"position:relative;\"><a href=\"#bad-character-rule\" aria-label=\"bad character rule permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Bad Character Rule</h3>\n<p>The Bad Character Rule decides how far the search can skip ahead based on where the mismatched text character (the bad character) appears in the keyword.</p>\n<p>The diagram in the following article illustrates the Bad Character Rule especially clearly.</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 960px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/e4478ed6b73fca556612d6eabf7c87e0/73b94/image-20251228125008226.png\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 51.66666666666666%; position: relative; bottom: 0; left: 0; background-image: 
url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAKCAYAAAC0VX7mAAAACXBIWXMAAAsTAAALEwEAmpwYAAABS0lEQVQoz5VS2U7DMBD0/38XEhKCl8JLOSSUForSxnFuX8mw46YHqEVipfHKu+vZtceq6zoQRVGgrmu0bYtGfFNVaJomIV88oVh/wLRSp0sYyeV5Dq01dnKuEL/fl1AkqKRgGAaEEBAFvY8w1qe9DxGvN3e4f17jVju8mAHTGBHiT8RxRN/3UCTihNZaDILgLN53Bo8bjV5yjJfLJd6yDRa6R2a61NR7L83CHj6k5qxVmG2aJi6QFZ1M3cq1o3ROuRQ9GWsvgaauJUa5AgnprxFcgsIfRjLiP6b4fhSGKhtjkqr0ZVmeICpSuEqeQc+xg8o8R3DP+FFlEvOK5/DOwQoePjW+2gFjDHDOJ0EooGNevJ3rGuFKKlNuJtMflAnpa/FuLlzkNTZ1NxO6BJKS7PydWasOj09w5CzLsFqtsN1uj/Hx1+SXwG9Drm+Fcwn0I96jnQAAAABJRU5ErkJggg=='); background-size: cover; display: block;\"\n  ></span>\n  <picture>\n          <source\n              srcset=\"/static/e4478ed6b73fca556612d6eabf7c87e0/8ac56/image-20251228125008226.webp 240w,\n/static/e4478ed6b73fca556612d6eabf7c87e0/d3be9/image-20251228125008226.webp 480w,\n/static/e4478ed6b73fca556612d6eabf7c87e0/e46b2/image-20251228125008226.webp 960w,\n/static/e4478ed6b73fca556612d6eabf7c87e0/22cc9/image-20251228125008226.webp 1004w\"\n              sizes=\"(max-width: 960px) 100vw, 960px\"\n              type=\"image/webp\"\n            />\n          <source\n            srcset=\"/static/e4478ed6b73fca556612d6eabf7c87e0/8ff5a/image-20251228125008226.png 240w,\n/static/e4478ed6b73fca556612d6eabf7c87e0/e85cb/image-20251228125008226.png 480w,\n/static/e4478ed6b73fca556612d6eabf7c87e0/d9199/image-20251228125008226.png 960w,\n/static/e4478ed6b73fca556612d6eabf7c87e0/73b94/image-20251228125008226.png 1004w\"\n            sizes=\"(max-width: 960px) 100vw, 960px\"\n            type=\"image/png\"\n          />\n          <img\n            class=\"gatsby-resp-image-image\"\n            src=\"/static/e4478ed6b73fca556612d6eabf7c87e0/d9199/image-20251228125008226.png\"\n            alt=\"image-20251228125008226\"\n            title=\"image-20251228125008226\"\n            loading=\"lazy\"\n            style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        
  />\n        </picture>\n  </a>\n    </span></p>\n<p>Reference: <a href=\"https://qiita.com/t_fuki/items/408fe87dceb4c88bd036\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">文字列検索アルゴリズム③ ー Boyer-Moore法 #競技プログラミング - Qiita</a> (String Search Algorithms ③: The Boyer-Moore Method)</p>\n<p>To apply the BM algorithm, align the beginning of keyword T with the current search position in the target string S, then compare the two character by character, starting from the last character of the keyword and moving toward its beginning.</p>\n<p>When a text character does not match the corresponding keyword character, the Bad Character Rule makes the search efficient by shifting the next search position as far ahead as it safely can.</p>\n<p>To do this, the search keyword is preprocessed in advance into a shift table (a record of how many characters the search position can advance when a mismatch occurs), and the skip distance is then looked up from the text character at which the mismatch occurred.</p>\n<p>For example, in the example explained on Wikipedia, the keyword NNAAMAN is compared against the text from its last character, and the rule is applied at the position where the text character <code class=\"language-text\">N</code> causes a mismatch.</p>\n<p>If this <code class=\"language-text\">N</code> also appears in the keyword to the left of that position, the keyword is shifted so that its nearest such occurrence lines up with the <code class=\"language-text\">N</code> in the text. 
If the character does not appear in the keyword at all, then the keyword can no longer match there, so the search position is skipped to the next character after it.</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 265px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/b3e186931bfe24f0f6f7a4e5fd9f428c/78e79/image-20251228132341061.png\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 101.66666666666666%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAUCAYAAACNiR0NAAAACXBIWXMAAAsTAAALEwEAmpwYAAADFElEQVQ4y7VUW1PaQBTm/z/X57YP1WkdsVgugghodeQqt8hF7hByA0IASQgEvp7doGOd6UtrM7M5m7O7J+c75/vWs9lsYFoWLBrrNc1Nk+yaz8eairtkGmvaw77ZHsfZIBjww5jNYds29/EYdG7jOPBst1segDkdcjC75dbBarWCMZ2+WluD7Z9MJrRmc/9v52jNg3d+PPbKgiAIaDYeUS5XUK08QBRFCNUaLEkGYUavVkX6Po92s4HeYIhcJglJGaFSqZDvEclUGtZqjR3L0CLsxQJtbreRL5Qx6PegE8ySUMacBUynIdbryBaKPGC3P0A+l+GBBdrTatRRLAlYLC0O/f0h73Y7PtH1CRVbh7JQMLPmmE4NapYNw5hhMZ9DUVXMyS6enqAqEkG0ATq7ezM8nCLU/qpQQjxxi+t+HNeNG2Tu0mg0m0gR5FqthtB5GMX8PR5qj7g496PVFTk7lssl77BrHTdD1m42GBWcLfFtveIcex6Mg4wyLGPbdi0L8jozdp5n+K81272tIfsTYzsjMYPArGkucRW/QqlVxmU3hnwph2azwylSIcjxWBTFYhnZbBblUgG3yQycPck9TBUuLFctruzW6HW7kEYyWnoTnUEHiqxCVRXIioJ2p43hUEK304Esy+j0+vyM81+Uwl7iYIAOEfs+n8d4POb00DSNrIHzwA/4/EFcRqP4eXODcDgMfyCIRCIOn8+HarWOFrFhIEp76RFcWRpCIlUIDxUufF3XCdKQUyHkP8Ox14cT7wlCoRC+HXtxeurFxUUEh4dHKJFcp1MdEpXkJcPZjMi7WFBQia4qhzTp8LmqapjqU77GDmmjEfmHXJr86qIGLhZzXk/ecUYb1tWrRIy6VkIkEkGPtNrv9fD500eEI1HE4wmkUikE/H6cfvfh4OADTn1niEZjyBDp2d345csRNjuXRB72eqIMTNOCTcGfHybF8XhC2RtcfgyFSSVQFFeCBn1blslrrmkj7BXsSi8cDCCWuMbXo0OIsvbXFH/Rcj5HBKWGZDNpTHSDLz9L6e34k3+3h/wLNenjhmGW5wYAAAAASUVORK5CYII='); background-size: cover; display: block;\"\n  ></span>\n  <picture>\n          
<source\n              srcset=\"/static/b3e186931bfe24f0f6f7a4e5fd9f428c/8ac56/image-20251228132341061.webp 240w,\n/static/b3e186931bfe24f0f6f7a4e5fd9f428c/4dbec/image-20251228132341061.webp 265w\"\n              sizes=\"(max-width: 265px) 100vw, 265px\"\n              type=\"image/webp\"\n            />\n          <source\n            srcset=\"/static/b3e186931bfe24f0f6f7a4e5fd9f428c/8ff5a/image-20251228132341061.png 240w,\n/static/b3e186931bfe24f0f6f7a4e5fd9f428c/78e79/image-20251228132341061.png 265w\"\n            sizes=\"(max-width: 265px) 100vw, 265px\"\n            type=\"image/png\"\n          />\n          <img\n            class=\"gatsby-resp-image-image\"\n            src=\"/static/b3e186931bfe24f0f6f7a4e5fd9f428c/78e79/image-20251228132341061.png\"\n            alt=\"image-20251228132341061\"\n            title=\"image-20251228132341061\"\n            loading=\"lazy\"\n            style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n          />\n        </picture>\n  </a>\n    </span></p>\n<p>Reference: <a href=\"https://ja.wikipedia.org/wiki/%E3%83%9C%E3%82%A4%E3%83%A4%E3%83%BC-%E3%83%A0%E3%83%BC%E3%82%A2%E6%96%87%E5%AD%97%E5%88%97%E6%A4%9C%E7%B4%A2%E3%82%A2%E3%83%AB%E3%82%B4%E3%83%AA%E3%82%BA%E3%83%A0\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">ボイヤー-ムーア文字列検索アルゴリズム - Wikipedia</a> (Boyer–Moore string-search algorithm)</p>\n<h3 id=\"implementing-the-bad-character-rule\" style=\"position:relative;\"><a href=\"#implementing-the-bad-character-rule\" aria-label=\"implementing the bad character rule permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 
1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Implementing the Bad Character Rule</h3>\n<p>To deepen my understanding of string search with the Bad Character Rule, I created a Python script that visualizes each search step.</p>\n<p>In the <code class=\"language-text\">BMSearcher</code> class in the following script, the search keyword is preprocessed to build a Bad Character Table.</p>\n<div class=\"gatsby-highlight\" data-language=\"python\"><pre class=\"language-python\"><code class=\"language-python\"><span class=\"token keyword\">from</span> typing <span class=\"token keyword\">import</span> Dict\n\n\n<span class=\"token keyword\">class</span> <span class=\"token class-name\">BMSearcher</span><span class=\"token punctuation\">:</span>\n\n    <span class=\"token keyword\">def</span> <span class=\"token function\">__init__</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">,</span> pattern<span class=\"token punctuation\">:</span> <span class=\"token builtin\">str</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">:</span>\n        self<span class=\"token punctuation\">.</span>pattern <span class=\"token operator\">=</span> pattern\n        self<span class=\"token punctuation\">.</span>m <span class=\"token operator\">=</span> <span class=\"token builtin\">len</span><span class=\"token punctuation\">(</span>pattern<span class=\"token punctuation\">)</span>\n\n        <span class=\"token comment\"># Build the Bad Character Table</span>\n        self<span class=\"token punctuation\">.</span>_bad_char_table<span class=\"token punctuation\">:</span> Dict<span class=\"token punctuation\">[</span><span class=\"token builtin\">str</span><span class=\"token punctuation\">,</span> <span class=\"token builtin\">int</span><span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>_build_bad_char_table<span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span>\n\n    <span class=\"token keyword\">def</span> <span class=\"token function\">_build_bad_char_table</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">)</span> <span class=\"token operator\">-</span><span class=\"token operator\">></span> Dict<span class=\"token punctuation\">[</span><span class=\"token builtin\">str</span><span class=\"token punctuation\">,</span> <span class=\"token builtin\">int</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">:</span>\n        <span class=\"token comment\"># Skip table: Dict[str, int]</span>\n        <span class=\"token comment\"># Characters in the search keyword and the index of the last occurrence of each character in the keyword</span>\n        table <span class=\"token operator\">=</span> <span class=\"token punctuation\">{</span><span class=\"token punctuation\">}</span>\n        <span class=\"token keyword\">for</span> i<span class=\"token punctuation\">,</span> char <span class=\"token keyword\">in</span> <span class=\"token builtin\">enumerate</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">.</span>pattern<span class=\"token punctuation\">)</span><span class=\"token punctuation\">:</span>\n            table<span class=\"token punctuation\">[</span>char<span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> i\n        <span class=\"token keyword\">return</span> table</code></pre></div>\n<p>The Bad Character Table is defined as a dictionary whose keys are characters contained in the search keyword and whose values indicate the index where that character last appears (the rightmost occurrence) in the keyword.</p>\n<p>This makes it possible to determine how far the next search position can be skipped when a character mismatch occurs during the search.</p>\n<p>The code that implements the search process is as follows.</p>\n<div class=\"gatsby-highlight\" data-language=\"python\"><pre 
class=\"language-python\"><code class=\"language-python\"><span class=\"token keyword\">from</span> dataclasses <span class=\"token keyword\">import</span> dataclass\n<span class=\"token keyword\">from</span> typing <span class=\"token keyword\">import</span> Dict<span class=\"token punctuation\">,</span> Iterator<span class=\"token punctuation\">,</span> Optional\n\n\n<span class=\"token decorator annotation punctuation\">@dataclass</span><span class=\"token punctuation\">(</span>frozen<span class=\"token operator\">=</span><span class=\"token boolean\">True</span><span class=\"token punctuation\">)</span>\n<span class=\"token keyword\">class</span> <span class=\"token class-name\">SearchStep</span><span class=\"token punctuation\">:</span>\n    shift<span class=\"token punctuation\">:</span> <span class=\"token builtin\">int</span>                      <span class=\"token comment\"># Current alignment (text index where the pattern starts)</span>\n    mismatch_index<span class=\"token punctuation\">:</span> Optional<span class=\"token punctuation\">[</span><span class=\"token builtin\">int</span><span class=\"token punctuation\">]</span>   <span class=\"token comment\"># Pattern index (0-based from the start) where the mismatch occurred; None on a full match</span>\n    bad_char<span class=\"token punctuation\">:</span> Optional<span class=\"token punctuation\">[</span><span class=\"token builtin\">str</span><span class=\"token punctuation\">]</span>         <span class=\"token comment\"># The mismatched text character</span>\n    shift_amount<span class=\"token punctuation\">:</span> <span class=\"token builtin\">int</span>               <span class=\"token comment\"># Shift amount for the next step</span>\n    description<span class=\"token punctuation\">:</span> <span class=\"token builtin\">str</span>                <span class=\"token comment\"># Description</span>\n\n\n<span class=\"token keyword\">class</span> <span class=\"token class-name\">BMSearcher</span><span class=\"token punctuation\">:</span>\n\n    <span class=\"token keyword\">def</span> <span class=\"token function\">__init__</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">,</span> pattern<span class=\"token punctuation\">:</span> <span class=\"token builtin\">str</span><span class=\"token punctuation\">)</span><span class=\"token 
punctuation\">:</span>\n        self<span class=\"token punctuation\">.</span>pattern <span class=\"token operator\">=</span> pattern\n        self<span class=\"token punctuation\">.</span>m <span class=\"token operator\">=</span> <span class=\"token builtin\">len</span><span class=\"token punctuation\">(</span>pattern<span class=\"token punctuation\">)</span>\n\n        <span class=\"token comment\"># Build the Bad Character Table</span>\n        self<span class=\"token punctuation\">.</span>_bad_char_table<span class=\"token punctuation\">:</span> Dict<span class=\"token punctuation\">[</span><span class=\"token builtin\">str</span><span class=\"token punctuation\">,</span> <span class=\"token builtin\">int</span><span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>_build_bad_char_table<span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span>\n\n\n    <span class=\"token keyword\">def</span> <span class=\"token function\">_build_bad_char_table</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">)</span> <span class=\"token operator\">-</span><span class=\"token operator\">></span> Dict<span class=\"token punctuation\">[</span><span class=\"token builtin\">str</span><span class=\"token punctuation\">,</span> <span class=\"token builtin\">int</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">:</span>\n        <span class=\"token comment\"># Skip table: Dict[str, int]</span>\n        <span class=\"token comment\"># Characters in the search keyword and the index of the last occurrence of each character in the keyword</span>\n        table <span class=\"token operator\">=</span> <span class=\"token punctuation\">{</span><span class=\"token punctuation\">}</span>\n        <span class=\"token keyword\">for</span> i<span class=\"token punctuation\">,</span> char <span class=\"token keyword\">in</span> 
<span class=\"token builtin\">enumerate</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">.</span>pattern<span class=\"token punctuation\">)</span><span class=\"token punctuation\">:</span>\n            table<span class=\"token punctuation\">[</span>char<span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> i\n        <span class=\"token keyword\">return</span> table\n\n\n    <span class=\"token comment\"># Generator that returns the search state so the process can be visualized</span>\n    <span class=\"token keyword\">def</span> <span class=\"token function\">search_generator</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">,</span> text<span class=\"token punctuation\">:</span> <span class=\"token builtin\">str</span><span class=\"token punctuation\">)</span> <span class=\"token operator\">-</span><span class=\"token operator\">></span> Iterator<span class=\"token punctuation\">[</span>SearchStep<span class=\"token punctuation\">]</span><span class=\"token punctuation\">:</span>\n\n        n <span class=\"token operator\">=</span> <span class=\"token builtin\">len</span><span class=\"token punctuation\">(</span>text<span class=\"token punctuation\">)</span>\n        shift <span class=\"token operator\">=</span> <span class=\"token number\">0</span>\n\n        <span class=\"token keyword\">while</span> shift <span class=\"token operator\">&lt;=</span> n <span class=\"token operator\">-</span> self<span class=\"token punctuation\">.</span>m<span class=\"token punctuation\">:</span>\n            j <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>m <span class=\"token operator\">-</span> <span class=\"token number\">1</span>\n            \n            <span class=\"token comment\"># Compare the keyword and the text from the end toward the beginning</span>\n            <span class=\"token keyword\">while</span> j <span 
class=\"token operator\">>=</span> <span class=\"token number\">0</span> <span class=\"token keyword\">and</span> self<span class=\"token punctuation\">.</span>pattern<span class=\"token punctuation\">[</span>j<span class=\"token punctuation\">]</span> <span class=\"token operator\">==</span> text<span class=\"token punctuation\">[</span>shift <span class=\"token operator\">+</span> j<span class=\"token punctuation\">]</span><span class=\"token punctuation\">:</span>\n                j <span class=\"token operator\">-=</span> <span class=\"token number\">1</span>\n\n            <span class=\"token comment\"># If the string matches completely</span>\n            <span class=\"token keyword\">if</span> j <span class=\"token operator\">&lt;</span> <span class=\"token number\">0</span><span class=\"token punctuation\">:</span>\n                <span class=\"token keyword\">yield</span> SearchStep<span class=\"token punctuation\">(</span>\n                    shift<span class=\"token operator\">=</span>shift<span class=\"token punctuation\">,</span>\n                    mismatch_index<span class=\"token operator\">=</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span>\n                    bad_char<span class=\"token operator\">=</span><span class=\"token boolean\">None</span><span class=\"token punctuation\">,</span>\n                    shift_amount<span class=\"token operator\">=</span><span class=\"token number\">1</span><span class=\"token punctuation\">,</span>\n                    description<span class=\"token operator\">=</span><span class=\"token string\">\"Complete match found!\"</span>\n                <span class=\"token punctuation\">)</span>\n                shift <span class=\"token operator\">+=</span> <span class=\"token number\">1</span>\n\n            <span class=\"token comment\"># If a mismatch occurs during the search</span>\n            <span class=\"token keyword\">else</span><span class=\"token 
punctuation\">:</span>\n                text_char <span class=\"token operator\">=</span> text<span class=\"token punctuation\">[</span>shift <span class=\"token operator\">+</span> j<span class=\"token punctuation\">]</span>\n                \n                <span class=\"token comment\"># Bad Character Rule formula: shift_amount = max(1, mismatch_index - last_occurrence_index)</span>\n                last_occurrence <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>_bad_char_table<span class=\"token punctuation\">.</span>get<span class=\"token punctuation\">(</span>text_char<span class=\"token punctuation\">,</span> <span class=\"token operator\">-</span><span class=\"token number\">1</span><span class=\"token punctuation\">)</span>\n                \n                <span class=\"token comment\"># Shift amount for skipping (shift by at least 1 to avoid an infinite loop)</span>\n                shift_amount <span class=\"token operator\">=</span> <span class=\"token builtin\">max</span><span class=\"token punctuation\">(</span><span class=\"token number\">1</span><span class=\"token punctuation\">,</span> j <span class=\"token operator\">-</span> last_occurrence<span class=\"token punctuation\">)</span>\n                \n                reason <span class=\"token operator\">=</span> <span class=\"token punctuation\">(</span>\n                    <span class=\"token string-interpolation\"><span class=\"token string\">f\"Bad char '</span><span class=\"token interpolation\"><span class=\"token punctuation\">{</span>text_char<span class=\"token punctuation\">}</span></span><span class=\"token string\">' found at pattern index </span><span class=\"token interpolation\"><span class=\"token punctuation\">{</span>last_occurrence<span class=\"token punctuation\">}</span></span><span class=\"token string\">.\"</span></span>\n                    <span class=\"token keyword\">if</span> last_occurrence <span class=\"token 
!=</span> <span class=\"token operator\">-</span>">
operator\">!=</span> <span class=\"token operator\">-</span><span class=\"token number\">1</span> <span class=\"token keyword\">and</span> last_occurrence <span class=\"token operator\">&lt;</span> j\n                    <span class=\"token keyword\">else</span> <span class=\"token string-interpolation\"><span class=\"token string\">f\"Bad char '</span><span class=\"token interpolation\"><span class=\"token punctuation\">{</span>text_char<span class=\"token punctuation\">}</span></span><span class=\"token string\">' is not in the pattern or only occurs to the right of the mismatch.\"</span></span>\n                <span class=\"token punctuation\">)</span>\n\n                <span class=\"token keyword\">yield</span> SearchStep<span class=\"token punctuation\">(</span>\n                    shift<span class=\"token operator\">=</span>shift<span class=\"token punctuation\">,</span>\n                    mismatch_index<span class=\"token operator\">=</span>j<span class=\"token punctuation\">,</span>\n                    bad_char<span class=\"token operator\">=</span>text_char<span class=\"token punctuation\">,</span>\n                    shift_amount<span class=\"token operator\">=</span>shift_amount<span class=\"token punctuation\">,</span>\n                    description<span class=\"token operator\">=</span>reason\n                <span class=\"token punctuation\">)</span>\n                \n                shift <span class=\"token operator\">+=</span> shift_amount</code></pre></div>\n<p>Running a script built on the code above shows how the Bad Character Rule compares characters from the end of the keyword and how far the search position shifts after each mismatch, as in the output below.</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 765px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/3ee9198d20543aad4862bd950355a094/bbb77/image-20251228181258799.png\"\n    
style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 94.58333333333333%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAATCAYAAACQjC21AAAACXBIWXMAAAsTAAALEwEAmpwYAAAC+UlEQVQ4y11UWXbiQAz0MRII4KW9YrwvGAMmyeRj7n+gGpWMnWQ+9HqVWlUqtdU0DbIsR3o6oe87hGGItu3QdR1838fplOF4PKKua1RVhSRJ0LStrCvU4ns+n2E8D29vb9jv97Bs28ZirnOA6zqwHQeuXPLEGNR1XRwOh/menHG+GINw3O12c8BWMjmlRwTREd3tC3leimWaYZZlOA+DnKeaAW1xXIyPcOQ+zXq8P3C7XeGZEHbSIw5iOHs5/PEynYqi1EcSgU9aaMfkuGboPDO3yqpElueIowhFnqIscgQCk47kN5XsTspvj2malLdC7vMBriMJHMfCa1MjiWNYlZBN0iPZHIYL8qJAWdW4XEY0dYPxehXnQjKYeWYmjnBKXr0nz9xfIbNKrVSNl2xbiHZ9ePmENMlgywUW5/DkisF+csj5drtdg+0ZkGkPQnwYBgIt1UIYP8AxilUarDIz/Pj4wOefT0yPSe/lAnu63wVu/OsRKxLHIAgQCYdlWWrAOAqlsifhqtBg5GwpBHVojFHjnH4LZA04XkfNJAhCDRiyOPWA4WtCXGRKB7MgtM1moyMdv2U0Q5/XOwa8qtrJz0q648NEPpzAKH/cY+BxHHEiAnmA2TF78m7bjlBjtFAWW4qZkRc6dV2rkHORyllasZAznvPR9/d3sBG4rkUB1DAVMrdmpaPFQ+rHCPmVyIWvU3fNWbIppFdFf8xm8/qKl5cXhbzAJQVztef5Vvas2/2mVZ6VPreRLYr30h5e0iE1nnZNGIT6mPvUoOtw9FbZrFVmr7IjeLkoKepSq57EIbq2QszfRc4fj4dCpswIjbK5i2yI5tfnQDiLbJqm1YsKWeZlWaERK6QQvvFVKpTO0iH0oe8v2dynuxDe6+UF0pxhDD8VPv9e4VcpXndbbA+79cf5lsr3D6StN1xmyHyFQdkZ5JOBTSBzkc/etRHfGpg+W4P9b2vA8ikL/i7jeNUCqXzE+l76XGTFn+h8HrRjPM+skD1jdO38gPwPZfUwkoNmsOMAAAAASUVORK5CYII='); background-size: cover; display: block;\"\n  ></span>\n  <picture>\n          <source\n              srcset=\"/static/3ee9198d20543aad4862bd950355a094/8ac56/image-20251228181258799.webp 240w,\n/static/3ee9198d20543aad4862bd950355a094/d3be9/image-20251228181258799.webp 480w,\n/static/3ee9198d20543aad4862bd950355a094/33b41/image-20251228181258799.webp 765w\"\n              sizes=\"(max-width: 765px) 100vw, 765px\"\n              type=\"image/webp\"\n            />\n          <source\n            srcset=\"/static/3ee9198d20543aad4862bd950355a094/8ff5a/image-20251228181258799.png 
240w,\n/static/3ee9198d20543aad4862bd950355a094/e85cb/image-20251228181258799.png 480w,\n/static/3ee9198d20543aad4862bd950355a094/bbb77/image-20251228181258799.png 765w\"\n            sizes=\"(max-width: 765px) 100vw, 765px\"\n            type=\"image/png\"\n          />\n          <img\n            class=\"gatsby-resp-image-image\"\n            src=\"/static/3ee9198d20543aad4862bd950355a094/bbb77/image-20251228181258799.png\"\n            alt=\"image-20251228181258799\"\n            title=\"image-20251228181258799\"\n            loading=\"lazy\"\n            style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n          />\n        </picture>\n  </a>\n    </span></p>\n<h3 id=\"good-suffix-rule\" style=\"position:relative;\"><a href=\"#good-suffix-rule\" aria-label=\"good suffix rule permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Good Suffix Rule</h3>\n<p>The Good Suffix Rule is an additional rule that works alongside the Bad Character Rule so that the search becomes more efficient than with the Bad Character Rule alone.</p>\n<p>At a high level, the idea is this: when the keyword is compared from the end, the part that matched before the mismatch is treated as a good suffix. The keyword is then shifted to the nearest alignment where that same substring occurs again in the keyword (or, if there is none, where a prefix of the keyword matches the tail of that substring), which often allows a larger skip. The standard BM algorithm uses whichever of the two rules gives the larger shift.</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 339px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    
href=\"/static/769d3696cd437c3baaa9688b7bb9749f/16caa/image-20251228215603440.png\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 74.58333333333333%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAPCAYAAADkmO9VAAAACXBIWXMAAAsTAAALEwEAmpwYAAACgUlEQVQ4y41Ui27aQBDk/78joarSFKUSUpMmSqtKpaEEjHkEwsvYuLZxHhjbYGOfp7sHBipFak863/puPbs7O75ClmVIkgRpmu5Xnps4xmSiISFbCIEwDDEajZGSfex/+EaAR4EBN5uNPOB1azNghKmuI463Z2EYYDwe0/nBL6agOShPCYg3huCIQuB/RpomoJzk4OQKGX3YqNfRUBSMBgOoagv3tSpmpoVWi2z9DhX9F5q/W7gcXGIwfES300a1VsdDt4t6Q4Hv+5KWfclBEMDnuV5jSVz5qxVSOmTegijAYu3R6sOPfSyXSwQE8LpYwPM8rOgbBmMe9yUHBIBoDSw8UDjaCIDnZzy9vMBbLRBuVgQWwg3cN8vmpLJd3QWCh9rpIDQN4Nt3ZGoTGZUvavdo9QdQ9Coq0yraVgtf+lcIqDlRFGFNmeWTm8NN2pcso8huiDzk303KM/lHg7ZNoQfXL3b6S3cdTijiZDJBRo3CcAShTYH7OvSpRvtjas4YhmGg3+/BsufHgEJqLRcqCzohrUXrFbTpFOlwCPQfAbJB1DiWBded09YApmliRNq0LBtiz6FEFpIX5iKPtNUYCZzsmGZIZx758GC+2If9syO9ygzZcOcu+r0eug89vFBnHduW5Z6enuLs7APevyuieHKCi4tP+FgqoVgsoq40oVPJqqpiSJS8vi52gPRwHAc2gbjuk9QZz5Ck9EzS4d/NMHRoxOFE00jMD5I723bkahgzuS6X/qHkm+tr/KhU5M8/m81kZHfuEDeW5NE0Z7i7q6JE2d3efoWiNPCT/M/PS/h8dYNyuYx2u3sAPNYRqz6/gXgv7zy/55fBlu9Irsc3FI8/5WZzEEhsP7EAAAAASUVORK5CYII='); background-size: cover; display: block;\"\n  ></span>\n  <picture>\n          <source\n              srcset=\"/static/769d3696cd437c3baaa9688b7bb9749f/8ac56/image-20251228215603440.webp 240w,\n/static/769d3696cd437c3baaa9688b7bb9749f/d6e88/image-20251228215603440.webp 339w\"\n              sizes=\"(max-width: 339px) 100vw, 339px\"\n              type=\"image/webp\"\n            />\n          <source\n            srcset=\"/static/769d3696cd437c3baaa9688b7bb9749f/8ff5a/image-20251228215603440.png 240w,\n/static/769d3696cd437c3baaa9688b7bb9749f/16caa/image-20251228215603440.png 339w\"\n            sizes=\"(max-width: 339px) 100vw, 339px\"\n            
type=\"image/png\"\n          />\n          <img\n            class=\"gatsby-resp-image-image\"\n            src=\"/static/769d3696cd437c3baaa9688b7bb9749f/16caa/image-20251228215603440.png\"\n            alt=\"image-20251228215603440\"\n            title=\"image-20251228215603440\"\n            loading=\"lazy\"\n            style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n          />\n        </picture>\n  </a>\n    </span></p>\n<p>Reference: <a href=\"https://ja.wikipedia.org/wiki/%E3%83%9C%E3%82%A4%E3%83%A4%E3%83%BC-%E3%83%A0%E3%83%BC%E3%82%A2%E6%96%87%E5%AD%97%E5%88%97%E6%A4%9C%E7%B4%A2%E3%82%A2%E3%83%AB%E3%82%B4%E3%83%AA%E3%82%BA%E3%83%A0\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">ボイヤー-ムーア文字列検索アルゴリズム - Wikipedia</a></p>\n<p>In general, the algorithm that uses both the Bad Character Rule and the Good Suffix Rule is regarded as the standard BM algorithm.</p>\n<p>In some cases, the rule called the Galil rule is added as well.</p>\n<p>However, although the standard BM algorithm with the Good Suffix heuristic, or the BM algorithm extended with the Galil rule to guarantee linear-time search, is theoretically more efficient, there seem to be cases where it is not suitable in practice because of implementation complexity and the cost of preprocessing.</p>\n<p>For that reason, in use cases that require implementation simplicity and good average speed—such as general text editor search functions and AntiVirus pattern matching like the target of this article—it seems common to use BM with only the Bad Character Rule, the simplified Boyer–Moore–Horspool (BMH) algorithm, or algorithms based on it.</p>\n<p><em>Note:</em> In the case of AntiVirus software such as ClamAV, it appears that processing closer to the Wu-Manber method is used rather than a simple BMH algorithm, in order to search multiple patterns simultaneously.</p>\n<p>Reference: <a href=\"https://www.cs.arizona.edu/sites/default/files/TR94-17.pdf\" 
target=\"_blank\" rel=\"nofollow noopener noreferrer\">AFASTALGORITHMFORMULTI-PATTERNSEARCHING</a></p>\n<p>Reference: <a href=\"https://webhome.cs.uvic.ca/~nigelh/Publications/stringsearch.pdf\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Practical Fast Searching in Strings</a></p>\n<p>Reference: <a href=\"http://www-igm.univ-mlv.fr/~lecroq/string/node18.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Horspool algorithm</a></p>\n<h3 id=\"boyermoorehorspool-bmh\" style=\"position:relative;\"><a href=\"#boyermoorehorspool-bmh\" aria-label=\"boyermoorehorspool bmh permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Boyer–Moore–Horspool (BMH)</h3>\n<p>As described above, the Boyer–Moore–Horspool (BMH) algorithm is a simplified algorithm based on BM with the Good Suffix Rule removed.</p>\n<p>For the BM algorithm extended with the Galil rule, the worst-case time complexity of string-search matching is guaranteed to be O(N), so in theory it is regarded as faster than the BMH algorithm, which is O(NM).</p>\n<p>However, in ClamAV, a BMH-based algorithm is used for pattern matching.</p>\n<p>In practice, although the BM algorithm including the Good Suffix Rule and the Galil rule is theoretically faster than BMH in terms of worst-case time complexity, if you also take preprocessing cost into account, BMH can provide better average performance in some use cases, so it is not unusual for applications to adopt BMH instead of BM.</p>\n<p>Reference: <a 
href=\"https://www.studyplan.dev/pro-cpp/search-algorithms/q/boyer-moore-vs-horspool\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Boyer-Moore vs Boyer-Moore-Horspool Search Algorithms | Search Algorithms | StudyPlan.dev</a></p>\n<p>Reference: <a href=\"https://stackoverflow.com/questions/11462153/which-is-a-better-string-searching-algorithm-boyer-moore-or-boyer-moore-horspoo\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">c++ - Which is a better string searching algorithm? Boyer-Moore or Boyer Moore Horspool? - Stack Overflow</a></p>\n<p>Reference: <a href=\"https://deve68.hatenadiary.org/entry/20120205/1328454937\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Boyer-Moore algorithm - コードの恵み</a></p>\n<p>The clearest explanation I found of the difference between BMH and the BM algorithm’s Bad Character Rule was on the following page.</p>\n<p>Reference: <a href=\"https://www.nct9.ne.jp/m_hiroi/light/pyalgo11.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Algorithms with Python / 文字列の探索</a></p>\n<p>Put simply, BMH is a technique that always focuses on the last character when a pattern mismatch occurs and skips the search position as much as possible.</p>\n<p>As mentioned earlier, in the Bad Character Rule, when a mismatch occurs, the character contained in the search keyword is used as a key, and skipping is performed based on a dictionary that indicates the index where that character last appears (the rightmost occurrence) in the keyword.</p>\n<p>In other words, as in the example below, if the mismatch occurs at the second character from the end, <code class=\"language-text\">e</code>, during the first search, the Bad Character Rule performs the next matching step by aligning the position of the mismatched <code class=\"language-text\">e</code>.</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 679px; \"\n    >\n     
 <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/94e5404de2142ac0e1dc5dc7a9bceea7/1b747/image-20260109213910621.png\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 69.16666666666667%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAOCAYAAAAvxDzwAAAACXBIWXMAAAsTAAALEwEAmpwYAAACHUlEQVQ4y21U63aiYAzkOVREBAS5I4igRVt73Pd/o9mZuHRtT3/kfDcymUwSnLZtcTwekWYpxvOIY9dxn9n+cGiRpimapkFZlqjrBvePOw7tAafTCQ3Prut+M8fzPGw2G7P1ev31oP3r22xBEMD3fTOd9c2rOYqc5Zk9dmQ7jmL2ZJBnuTlutwFta2BazbgPfpjunYEA1+kNSbLHrp5wHCaESYRdFBnYzFbrarX61RaLBZbLpe0N8PZ+QxiGiJMUmYB3O6O/3+9xv3/idntHR22ld13XaLuWutb2jVhJ59nHyfPcUtzFMR0OqPmhnGKey6qCAhoIwQbKoH3f9+hYSIEo6DAMdh+RlCMngQpdVT2PZ1SsqjTUx0pjFnwulFbZXBi756qzo/a4XidzFv24/UBRjsgYba3Ket87QAFmwF81PJHuREBVUqBxElPHBAGjZezHx+MPdbyjJ+MTU1XqYi9NxTYMQuvRlL7GUE08ayYt1cQ1ZUh4LoqCWnXIuepeIHJWUfStfERCg1FVJSJ2htPwoa4rS6ttO7xdLqhYnGE62/SsFkt47jNFd9bQ/b+XvQ6Ctc00TdaHuvCpmVfG8MYCbhpiGfm0jTlJJ+m1eunNGWw9j17fn1iU67OfmL4Yp/sURZohrwqUjwvqzxE+3wt2g1KXbj9n+GuWBVIx5YSFaP6NnH4CLfWSlk3FHiRIHCemtQVkI0uvn6bh+AtrlpxBDjcVrwAAAABJRU5ErkJggg=='); background-size: cover; display: block;\"\n  ></span>\n  <picture>\n          <source\n              srcset=\"/static/94e5404de2142ac0e1dc5dc7a9bceea7/8ac56/image-20260109213910621.webp 240w,\n/static/94e5404de2142ac0e1dc5dc7a9bceea7/d3be9/image-20260109213910621.webp 480w,\n/static/94e5404de2142ac0e1dc5dc7a9bceea7/8fbda/image-20260109213910621.webp 679w\"\n              sizes=\"(max-width: 679px) 100vw, 679px\"\n              type=\"image/webp\"\n            />\n          <source\n            srcset=\"/static/94e5404de2142ac0e1dc5dc7a9bceea7/8ff5a/image-20260109213910621.png 240w,\n/static/94e5404de2142ac0e1dc5dc7a9bceea7/e85cb/image-20260109213910621.png 480w,\n/static/94e5404de2142ac0e1dc5dc7a9bceea7/1b747/image-20260109213910621.png 679w\"\n       
     sizes=\"(max-width: 679px) 100vw, 679px\"\n            type=\"image/png\"\n          />\n          <img\n            class=\"gatsby-resp-image-image\"\n            src=\"/static/94e5404de2142ac0e1dc5dc7a9bceea7/1b747/image-20260109213910621.png\"\n            alt=\"image-20260109213910621\"\n            title=\"image-20260109213910621\"\n            loading=\"lazy\"\n            style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n          />\n        </picture>\n  </a>\n    </span></p>\n<p>On the other hand, when using the BMH algorithm, regardless of where the mismatch occurs, the algorithm focuses on the last character and slides the search position so that the last character aligns with the rightmost occurrence of that character within the keyword.</p>\n<p>(As in the example below, if the last character <code class=\"language-text\">d</code> is not included in the keyword, the check is performed by aligning the beginning of the keyword with the next character.)</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 822px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/0dbaabe4f73da390a2a0d7550a41da07/f73a1/image-20260109214000177.png\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 56.25%; position: relative; bottom: 0; left: 0; background-image: 
url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAALCAYAAAB/Ca1DAAAACXBIWXMAAAsTAAALEwEAmpwYAAABvUlEQVQoz2VS2ZKjQAzjMwIJAQI0932FZEJtzf7/N2ksk6Qyuw8uuwWttmRbZVWhLEskaQqTJCiKAlEUoa5rJHJmZFmmeNu2qAQv8hzn8xmu60qc3zWzFccxgiBALCQk5uVcLhDzfX8PqT3fg+d5bywMQw1iJHqF9dgeGIcepuixrH9QtpV2QMLXT3zde9an0+kde4eu1sfjUbO13m4YhDCMDPKshBHpfHUcRzweG67XBcuy4H6/Czbs9ogNaZaqNZfLRTNVMVv0hYemabCuK+ZpRNu16hcx5mma0LQ7NgwDauId7+1+x7FBJbMwYp81CgFfDcN9EHl3RdcMcEWCbTtwHAeHwwGObeuZta21/ZbJTJzZ+v77rZ3FUQxjEkR5iyxOMT0lz/OsHVJy33c6NGOMBuUGga/NEOfZotxE16XEKBcHWpBmKpcdvySrzLZRyZTHmiScNMmJcWNkKOvzYy7+TehuC/rtiuASwBU5nzv2b83hfU6ZtRLyVf2BPyayW7mB12Tw+wyeEHtn7z8S3/+9f+89pEdd16GQwWzbhmkYsUwzxuuM+UtWSlaF/1B+3dRyHlUifec9Lvln1z8u9UIAssT8SAAAAABJRU5ErkJggg=='); background-size: cover; display: block;\"\n  ></span>\n  <picture>\n          <source\n              srcset=\"/static/0dbaabe4f73da390a2a0d7550a41da07/8ac56/image-20260109214000177.webp 240w,\n/static/0dbaabe4f73da390a2a0d7550a41da07/d3be9/image-20260109214000177.webp 480w,\n/static/0dbaabe4f73da390a2a0d7550a41da07/355f1/image-20260109214000177.webp 822w\"\n              sizes=\"(max-width: 822px) 100vw, 822px\"\n              type=\"image/webp\"\n            />\n          <source\n            srcset=\"/static/0dbaabe4f73da390a2a0d7550a41da07/8ff5a/image-20260109214000177.png 240w,\n/static/0dbaabe4f73da390a2a0d7550a41da07/e85cb/image-20260109214000177.png 480w,\n/static/0dbaabe4f73da390a2a0d7550a41da07/f73a1/image-20260109214000177.png 822w\"\n            sizes=\"(max-width: 822px) 100vw, 822px\"\n            type=\"image/png\"\n          />\n          <img\n            class=\"gatsby-resp-image-image\"\n            src=\"/static/0dbaabe4f73da390a2a0d7550a41da07/f73a1/image-20260109214000177.png\"\n            alt=\"image-20260109214000177\"\n            title=\"image-20260109214000177\"\n            loading=\"lazy\"\n            style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n          />\n        
</picture>\n  </a>\n    </span></p>\n<p>Because of this, BMH does not need to track where the comparison failed; it only has to examine the text character under the last position of the keyword. This makes the implementation simpler than the Bad Character Rule and, in many cases, lets the search position shift farther than the Bad Character Rule does.</p>\n<p>The table-building step of BMH can be implemented in Python as follows.</p>\n<div class=\"gatsby-highlight\" data-language=\"python\"><pre class=\"language-python\"><code class=\"language-python\"><span class=\"token keyword\">class</span> <span class=\"token class-name\">BMHSearcher</span><span class=\"token punctuation\">:</span>\n\n    <span class=\"token keyword\">def</span> <span class=\"token function\">__init__</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">,</span> pattern<span class=\"token punctuation\">:</span> <span class=\"token builtin\">str</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">:</span>\n        self<span class=\"token punctuation\">.</span>pattern <span class=\"token operator\">=</span> pattern\n        self<span class=\"token punctuation\">.</span>m <span class=\"token operator\">=</span> <span class=\"token builtin\">len</span><span class=\"token punctuation\">(</span>pattern<span class=\"token punctuation\">)</span>\n        self<span class=\"token punctuation\">.</span>_shift_table<span class=\"token punctuation\">:</span> Dict<span class=\"token punctuation\">[</span><span class=\"token builtin\">str</span><span class=\"token punctuation\">,</span> <span class=\"token builtin\">int</span><span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>_build_shift_table<span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span>\n\n    <span class=\"token keyword\">def</span> <span class=\"token function\">_build_shift_table</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">)</span> <span class=\"token operator\">-</span><span class=\"token operator\">></span> Dict<span class=\"token punctuation\">[</span><span class=\"token builtin\">str</span><span class=\"token punctuation\">,</span> <span class=\"token builtin\">int</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">:</span>\n\n        table <span class=\"token operator\">=</span> <span class=\"token punctuation\">{</span><span class=\"token punctuation\">}</span>\n        <span class=\"token comment\"># Register every character except the one at the last position (indices 0 to m-2); a later occurrence overwrites an earlier one, so each character keeps the distance of its rightmost occurrence</span>\n        <span class=\"token keyword\">for</span> i <span class=\"token keyword\">in</span> <span class=\"token builtin\">range</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">.</span>m <span class=\"token operator\">-</span> <span class=\"token number\">1</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">:</span>\n            char <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>pattern<span class=\"token punctuation\">[</span>i<span class=\"token punctuation\">]</span>\n            table<span class=\"token punctuation\">[</span>char<span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>m <span class=\"token operator\">-</span> <span class=\"token number\">1</span> <span class=\"token operator\">-</span> i\n        <span class=\"token keyword\">return</span> table</code></pre></div>\n<p>This is almost the same as the Bad Character Rule implementation. The difference is that, for each character, the table stores not the index of its rightmost occurrence in the keyword but its distance from the last position of the keyword. Characters that are not in the table are typically given a shift equal to the full keyword length <code class=\"language-text\">m</code>.</p>\n<h3 id=\"wu-manber-wm\" 
style=\"position:relative;\"><a href=\"#wu-manber-wm\" aria-label=\"wu manber wm permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Wu-Manber (WM)</h3>\n<p>The Wu-Manber (WM) method is a search algorithm designed to search multiple patterns at the same time based on the idea of the BMH algorithm.</p>\n<p>The BMH idea of trying to shift as far as possible by using a mismatch at the last character becomes more efficient as the probability rises that the last character does not appear in the keyword.</p>\n<p>However, if the number of patterns to search increases, the probability that some character in the pattern set matches the last character also increases, and eventually the algorithm converges to a state where a large shift width can no longer be obtained.</p>\n<p>To solve this multi-pattern search problem, the WM method evaluates and skips search positions based not on a single character but usually on blocks of 2 or 3 characters.</p>\n<p>In the WM method, in the preprocessing stage, the minimum pattern length in the pattern set is taken as <code class=\"language-text\">m</code>, the block size—usually 2 or 3—is taken as <code class=\"language-text\">B</code>, and three tables, SHIFT, HASH, and PREFIX, are created using the first <code class=\"language-text\">m</code> characters of each pattern.</p>\n<p>The SHIFT table is an extension of the BM shift table and is used during text scanning to determine how many characters in the text can be skipped.</p>\n<p>The HASH table contains pointers to 
lists of patterns that share the same suffix, that is, the same last <code class=\"language-text\">B</code> characters within their first <code class=\"language-text\">m</code> characters. The PREFIX table contains the hash value of each pattern’s prefix (its first <code class=\"language-text\">B</code> characters). Both are used during matching: when the SHIFT value for the current text block is 0, the HASH table narrows the candidates down to the patterns that end with that block, and the PREFIX hash filters them further before a full comparison.</p>\n<p>I implemented this preprocessing in Python as follows.</p>\n<div class=\"gatsby-highlight\" data-language=\"python\"><pre class=\"language-python\"><code class=\"language-python\"><span class=\"token keyword\">class</span> <span class=\"token class-name\">WuManber</span><span class=\"token punctuation\">:</span>\n\n    <span class=\"token keyword\">def</span> <span class=\"token function\">__init__</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">,</span> patterns<span class=\"token punctuation\">:</span> List<span class=\"token punctuation\">[</span><span class=\"token builtin\">str</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">,</span> block_size<span class=\"token punctuation\">:</span> <span class=\"token builtin\">int</span> <span class=\"token operator\">=</span> <span class=\"token number\">2</span><span class=\"token punctuation\">)</span> <span class=\"token operator\">-</span><span class=\"token operator\">></span> <span class=\"token boolean\">None</span><span class=\"token punctuation\">:</span>\n        <span class=\"token keyword\">if</span> <span class=\"token keyword\">not</span> patterns<span class=\"token punctuation\">:</span>\n            <span class=\"token keyword\">raise</span> ValueError<span class=\"token punctuation\">(</span><span class=\"token string\">\"Invalid patterns.\"</span><span class=\"token punctuation\">)</span>\n        \n        self<span class=\"token punctuation\">.</span>patterns <span class=\"token operator\">=</span> patterns\n        self<span class=\"token 
punctuation\">.</span>block_size <span class=\"token operator\">=</span> block_size\n        self<span class=\"token punctuation\">.</span>min_len <span class=\"token operator\">=</span> <span class=\"token builtin\">min</span><span class=\"token punctuation\">(</span><span class=\"token builtin\">len</span><span class=\"token punctuation\">(</span>p<span class=\"token punctuation\">)</span> <span class=\"token keyword\">for</span> p <span class=\"token keyword\">in</span> patterns<span class=\"token punctuation\">)</span>\n\n        <span class=\"token keyword\">if</span> self<span class=\"token punctuation\">.</span>min_len <span class=\"token operator\">&lt;</span> self<span class=\"token punctuation\">.</span>block_size<span class=\"token punctuation\">:</span>\n            <span class=\"token keyword\">raise</span> ValueError<span class=\"token punctuation\">(</span><span class=\"token string-interpolation\"><span class=\"token string\">f\"Block size </span><span class=\"token interpolation\"><span class=\"token punctuation\">{</span>self<span class=\"token punctuation\">.</span>block_size<span class=\"token punctuation\">}</span></span><span class=\"token string\"> is larger than the minimum pattern length </span><span class=\"token interpolation\"><span class=\"token punctuation\">{</span>self<span class=\"token punctuation\">.</span>min_len<span class=\"token punctuation\">}</span></span><span class=\"token string\">.\"</span></span><span class=\"token punctuation\">)</span>\n\n        <span class=\"token comment\"># SHIFT table</span>\n        self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">:</span> Dict<span class=\"token punctuation\">[</span><span class=\"token builtin\">str</span><span class=\"token punctuation\">,</span> <span class=\"token builtin\">int</span><span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> <span class=\"token punctuation\">{</span><span class=\"token 
punctuation\">}</span>\n\n        <span class=\"token comment\"># HASH table</span>\n        self<span class=\"token punctuation\">.</span>hash_table<span class=\"token punctuation\">:</span> Dict<span class=\"token punctuation\">[</span><span class=\"token builtin\">str</span><span class=\"token punctuation\">,</span> List<span class=\"token punctuation\">[</span><span class=\"token builtin\">int</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> defaultdict<span class=\"token punctuation\">(</span><span class=\"token builtin\">list</span><span class=\"token punctuation\">)</span>\n\n        <span class=\"token comment\"># PREFIX table / Python's built-in hash() returns an integer, so use it as is</span>\n        self<span class=\"token punctuation\">.</span>prefix_table<span class=\"token punctuation\">:</span> List<span class=\"token punctuation\">[</span><span class=\"token builtin\">int</span><span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> <span class=\"token punctuation\">[</span><span class=\"token punctuation\">]</span>\n\n        self<span class=\"token punctuation\">.</span>_build_tables<span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span>\n\n    <span class=\"token keyword\">def</span> <span class=\"token function\">_get_hash</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">,</span> text_block<span class=\"token punctuation\">:</span> <span class=\"token builtin\">str</span><span class=\"token punctuation\">)</span> <span class=\"token operator\">-</span><span class=\"token operator\">></span> <span class=\"token builtin\">int</span><span class=\"token punctuation\">:</span>\n        <span class=\"token keyword\">return</span> <span class=\"token builtin\">hash</span><span class=\"token punctuation\">(</span>text_block<span class=\"token punctuation\">)</span>\n\n    <span 
class=\"token keyword\">def</span> <span class=\"token function\">_build_tables</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">)</span> <span class=\"token operator\">-</span><span class=\"token operator\">></span> <span class=\"token boolean\">None</span><span class=\"token punctuation\">:</span>\n\n        <span class=\"token keyword\">for</span> idx<span class=\"token punctuation\">,</span> pattern <span class=\"token keyword\">in</span> <span class=\"token builtin\">enumerate</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">.</span>patterns<span class=\"token punctuation\">)</span><span class=\"token punctuation\">:</span>\n\n            prefix_part <span class=\"token operator\">=</span> pattern<span class=\"token punctuation\">[</span><span class=\"token punctuation\">:</span>self<span class=\"token punctuation\">.</span>block_size<span class=\"token punctuation\">]</span>\n            self<span class=\"token punctuation\">.</span>prefix_table<span class=\"token punctuation\">.</span>append<span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">.</span>_get_hash<span class=\"token punctuation\">(</span>prefix_part<span class=\"token punctuation\">)</span><span class=\"token punctuation\">)</span>\n\n            limit_pattern <span class=\"token operator\">=</span> pattern<span class=\"token punctuation\">[</span><span class=\"token punctuation\">:</span>self<span class=\"token punctuation\">.</span>min_len<span class=\"token punctuation\">]</span>\n            \n            <span class=\"token comment\"># shift_table: {'ap': 3, 'pp': 2, 'pl': 1, 'le': 0, 'am': 3}</span>\n            <span class=\"token keyword\">for</span> i <span class=\"token keyword\">in</span> <span class=\"token builtin\">range</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">.</span>min_len <span class=\"token operator\">-</span> self<span 
class=\"token punctuation\">.</span>block_size <span class=\"token operator\">+</span> <span class=\"token number\">1</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">:</span>\n                block <span class=\"token operator\">=</span> limit_pattern<span class=\"token punctuation\">[</span>i <span class=\"token punctuation\">:</span> i <span class=\"token operator\">+</span> self<span class=\"token punctuation\">.</span>block_size<span class=\"token punctuation\">]</span>\n                shift <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>min_len <span class=\"token operator\">-</span> <span class=\"token number\">1</span> <span class=\"token operator\">-</span> <span class=\"token punctuation\">(</span>i <span class=\"token operator\">+</span> self<span class=\"token punctuation\">.</span>block_size <span class=\"token operator\">-</span> <span class=\"token number\">1</span><span class=\"token punctuation\">)</span>\n                \n                <span class=\"token keyword\">if</span> block <span class=\"token keyword\">not</span> <span class=\"token keyword\">in</span> self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">:</span>\n                    self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">[</span>block<span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> shift\n                <span class=\"token keyword\">else</span><span class=\"token punctuation\">:</span>\n                    self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">[</span>block<span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> <span class=\"token builtin\">min</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">[</span>block<span class=\"token 
punctuation\">]</span><span class=\"token punctuation\">,</span> shift<span class=\"token punctuation\">)</span>\n                \n                <span class=\"token keyword\">if</span> shift <span class=\"token operator\">==</span> <span class=\"token number\">0</span><span class=\"token punctuation\">:</span>\n                    self<span class=\"token punctuation\">.</span>hash_table<span class=\"token punctuation\">[</span>block<span class=\"token punctuation\">]</span><span class=\"token punctuation\">.</span>append<span class=\"token punctuation\">(</span>idx<span class=\"token punctuation\">)</span></code></pre></div>\n<p>First, the initialization code below sets up the tables from <code class=\"language-text\">patterns</code>, the list of search keywords, and <code class=\"language-text\">block_size</code>, which defaults to 2.</p>\n<div class=\"gatsby-highlight\" data-language=\"python\"><pre class=\"language-python\"><code class=\"language-python\"><span class=\"token keyword\">def</span> <span class=\"token function\">__init__</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">,</span> patterns<span class=\"token punctuation\">:</span> List<span class=\"token punctuation\">[</span><span class=\"token builtin\">str</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">,</span> block_size<span class=\"token punctuation\">:</span> <span class=\"token builtin\">int</span> <span class=\"token operator\">=</span> <span class=\"token number\">2</span><span class=\"token punctuation\">)</span> <span class=\"token operator\">-</span><span class=\"token operator\">></span> <span class=\"token boolean\">None</span><span class=\"token punctuation\">:</span>\n    <span class=\"token keyword\">if</span> <span class=\"token keyword\">not</span> patterns<span class=\"token punctuation\">:</span>\n        <span class=\"token keyword\">raise</span> ValueError<span class=\"token punctuation\">(</span><span class=\"token 
string\">\"Invalid patterns.\"</span><span class=\"token punctuation\">)</span>\n\n    self<span class=\"token punctuation\">.</span>patterns <span class=\"token operator\">=</span> patterns\n    self<span class=\"token punctuation\">.</span>block_size <span class=\"token operator\">=</span> block_size\n    self<span class=\"token punctuation\">.</span>min_len <span class=\"token operator\">=</span> <span class=\"token builtin\">min</span><span class=\"token punctuation\">(</span><span class=\"token builtin\">len</span><span class=\"token punctuation\">(</span>p<span class=\"token punctuation\">)</span> <span class=\"token keyword\">for</span> p <span class=\"token keyword\">in</span> patterns<span class=\"token punctuation\">)</span>\n\n    <span class=\"token keyword\">if</span> self<span class=\"token punctuation\">.</span>min_len <span class=\"token operator\">&lt;</span> self<span class=\"token punctuation\">.</span>block_size<span class=\"token punctuation\">:</span>\n        <span class=\"token keyword\">raise</span> ValueError<span class=\"token punctuation\">(</span><span class=\"token string-interpolation\"><span class=\"token string\">f\"Block size </span><span class=\"token interpolation\"><span class=\"token punctuation\">{</span>self<span class=\"token punctuation\">.</span>block_size<span class=\"token punctuation\">}</span></span><span class=\"token string\"> is larger than the minimum pattern length </span><span class=\"token interpolation\"><span class=\"token punctuation\">{</span>self<span class=\"token punctuation\">.</span>min_len<span class=\"token punctuation\">}</span></span><span class=\"token string\">.\"</span></span><span class=\"token punctuation\">)</span>\n\n    <span class=\"token comment\"># SHIFT table</span>\n    self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">:</span> Dict<span class=\"token punctuation\">[</span><span class=\"token builtin\">str</span><span class=\"token 
punctuation\">,</span> <span class=\"token builtin\">int</span><span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> <span class=\"token punctuation\">{</span><span class=\"token punctuation\">}</span>\n\n    <span class=\"token comment\"># HASH table</span>\n    self<span class=\"token punctuation\">.</span>hash_table<span class=\"token punctuation\">:</span> Dict<span class=\"token punctuation\">[</span><span class=\"token builtin\">str</span><span class=\"token punctuation\">,</span> List<span class=\"token punctuation\">[</span><span class=\"token builtin\">int</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> defaultdict<span class=\"token punctuation\">(</span><span class=\"token builtin\">list</span><span class=\"token punctuation\">)</span>\n\n    <span class=\"token comment\"># PREFIX table / Python's built-in hash() returns an integer, so use it as is</span>\n    self<span class=\"token punctuation\">.</span>prefix_table<span class=\"token punctuation\">:</span> List<span class=\"token punctuation\">[</span><span class=\"token builtin\">int</span><span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> <span class=\"token punctuation\">[</span><span class=\"token punctuation\">]</span>\n\n    self<span class=\"token punctuation\">.</span>_build_tables<span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span></code></pre></div>\n<p>At this point, the length of the shortest pattern among all patterns is stored as <code class=\"language-text\">min_len</code>. 
(If <code class=\"language-text\">min_len</code> is smaller than the block size, not even one block fits in the shortest pattern, so the standard WM method cannot be applied and the constructor raises a <code class=\"language-text\">ValueError</code>.)</p>\n<p>The <code class=\"language-text\">_build_tables</code> method, which actually builds the tables, is implemented as follows.</p>\n<div class=\"gatsby-highlight\" data-language=\"python\"><pre class=\"language-python\"><code class=\"language-python\"><span class=\"token keyword\">def</span> <span class=\"token function\">_build_tables</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">)</span> <span class=\"token operator\">-</span><span class=\"token operator\">></span> <span class=\"token boolean\">None</span><span class=\"token punctuation\">:</span>\n\n    <span class=\"token keyword\">for</span> idx<span class=\"token punctuation\">,</span> pattern <span class=\"token keyword\">in</span> <span class=\"token builtin\">enumerate</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">.</span>patterns<span class=\"token punctuation\">)</span><span class=\"token punctuation\">:</span>\n\n        prefix_part <span class=\"token operator\">=</span> pattern<span class=\"token punctuation\">[</span><span class=\"token punctuation\">:</span>self<span class=\"token punctuation\">.</span>block_size<span class=\"token punctuation\">]</span>\n        self<span class=\"token punctuation\">.</span>prefix_table<span class=\"token punctuation\">.</span>append<span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">.</span>_get_hash<span class=\"token punctuation\">(</span>prefix_part<span class=\"token punctuation\">)</span><span class=\"token punctuation\">)</span>\n\n        limit_pattern <span class=\"token operator\">=</span> pattern<span class=\"token punctuation\">[</span><span class=\"token punctuation\">:</span>self<span class=\"token punctuation\">.</span>min_len<span class=\"token punctuation\">]</span>\n\n        <span class=\"token 
keyword\">for</span> i <span class=\"token keyword\">in</span> <span class=\"token builtin\">range</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">.</span>min_len <span class=\"token operator\">-</span> self<span class=\"token punctuation\">.</span>block_size <span class=\"token operator\">+</span> <span class=\"token number\">1</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">:</span>\n            block <span class=\"token operator\">=</span> limit_pattern<span class=\"token punctuation\">[</span>i <span class=\"token punctuation\">:</span> i <span class=\"token operator\">+</span> self<span class=\"token punctuation\">.</span>block_size<span class=\"token punctuation\">]</span>\n            shift <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>min_len <span class=\"token operator\">-</span> <span class=\"token number\">1</span> <span class=\"token operator\">-</span> <span class=\"token punctuation\">(</span>i <span class=\"token operator\">+</span> self<span class=\"token punctuation\">.</span>block_size <span class=\"token operator\">-</span> <span class=\"token number\">1</span><span class=\"token punctuation\">)</span>\n\n            <span class=\"token keyword\">if</span> block <span class=\"token keyword\">not</span> <span class=\"token keyword\">in</span> self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">:</span>\n                self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">[</span>block<span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> shift\n            <span class=\"token keyword\">else</span><span class=\"token punctuation\">:</span>\n                self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">[</span>block<span class=\"token punctuation\">]</span> <span class=\"token 
operator\">=</span> <span class=\"token builtin\">min</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">[</span>block<span class=\"token punctuation\">]</span><span class=\"token punctuation\">,</span> shift<span class=\"token punctuation\">)</span>\n\n            <span class=\"token keyword\">if</span> shift <span class=\"token operator\">==</span> <span class=\"token number\">0</span><span class=\"token punctuation\">:</span>\n                self<span class=\"token punctuation\">.</span>hash_table<span class=\"token punctuation\">[</span>block<span class=\"token punctuation\">]</span><span class=\"token punctuation\">.</span>append<span class=\"token punctuation\">(</span>idx<span class=\"token punctuation\">)</span></code></pre></div>\n<p>Here, the code loops over the list of patterns. First, it hashes the first <code class=\"language-text\">B</code> characters of each pattern (<code class=\"language-text\">pattern[:self.block_size]</code>) with Python’s built-in <code class=\"language-text\">hash()</code> function and appends the resulting integer to the PREFIX table.</p>\n<p><em>Note:</em> Because the PREFIX table is filled in pattern order, the hash of a pattern’s leading block can later be looked up by its pattern index.</p>\n<p>Next, the code stores the first <code class=\"language-text\">m</code> characters of the pattern as <code class=\"language-text\">limit_pattern = pattern[:self.min_len]</code> and builds the SHIFT table in a loop of <code class=\"language-text\">m - B + 1</code> iterations.</p>\n<p>In this loop, each block of length <code class=\"language-text\">B</code> is taken in turn from the first <code class=\"language-text\">m</code> characters of the pattern, and the SHIFT table records how far the search position can safely move when that block is found at the end of the current window. For the block starting at index <code class=\"language-text\">i</code>, this is <code class=\"language-text\">m - 1 - (i + B - 1) = m - B - i</code>, the distance from the end of the block to the end of the first <code class=\"language-text\">m</code> characters. If the same block occurs more than once, within one pattern or across patterns, the smallest value is kept so that no possible match is skipped.</p>\n<p>If a block is the last block of the pattern’s first <code class=\"language-text\">m</code> characters (<code 
class=\"language-text\">shift == 0</code>), the pattern index is added to the list keyed by that block, creating the HASH table.</p>\n<div class=\"gatsby-highlight\" data-language=\"python\"><pre class=\"language-python\"><code class=\"language-python\">block <span class=\"token operator\">=</span> limit_pattern<span class=\"token punctuation\">[</span>i <span class=\"token punctuation\">:</span> i <span class=\"token operator\">+</span> self<span class=\"token punctuation\">.</span>block_size<span class=\"token punctuation\">]</span>\nshift <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>min_len <span class=\"token operator\">-</span> <span class=\"token number\">1</span> <span class=\"token operator\">-</span> <span class=\"token punctuation\">(</span>i <span class=\"token operator\">+</span> self<span class=\"token punctuation\">.</span>block_size <span class=\"token operator\">-</span> <span class=\"token number\">1</span><span class=\"token punctuation\">)</span>\n\n<span class=\"token keyword\">if</span> block <span class=\"token keyword\">not</span> <span class=\"token keyword\">in</span> self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">:</span>\n    self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">[</span>block<span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> shift\n<span class=\"token keyword\">else</span><span class=\"token punctuation\">:</span>\n    self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">[</span>block<span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> <span class=\"token builtin\">min</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">[</span>block<span class=\"token punctuation\">]</span><span class=\"token punctuation\">,</span> 
shift<span class=\"token punctuation\">)</span>\n\n<span class=\"token keyword\">if</span> shift <span class=\"token operator\">==</span> <span class=\"token number\">0</span><span class=\"token punctuation\">:</span>\n    self<span class=\"token punctuation\">.</span>hash_table<span class=\"token punctuation\">[</span>block<span class=\"token punctuation\">]</span><span class=\"token punctuation\">.</span>append<span class=\"token punctuation\">(</span>idx<span class=\"token punctuation\">)</span></code></pre></div>\n<p>This completes all the tables needed for a WM search.</p>\n<p>Next, let’s look at the text search process (the <code class=\"language-text\">search</code> function) that uses these tables.</p>\n<div class=\"gatsby-highlight\" data-language=\"python\"><pre class=\"language-python\"><code class=\"language-python\"><span class=\"token keyword\">def</span> <span class=\"token function\">search</span><span class=\"token punctuation\">(</span>self<span class=\"token punctuation\">,</span> text<span class=\"token punctuation\">:</span> <span class=\"token builtin\">str</span><span class=\"token punctuation\">)</span> <span class=\"token operator\">-</span><span class=\"token operator\">></span> Iterator<span class=\"token punctuation\">[</span>Tuple<span class=\"token punctuation\">[</span><span class=\"token builtin\">int</span><span class=\"token punctuation\">,</span> <span class=\"token builtin\">str</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">:</span>\n    n <span class=\"token operator\">=</span> <span class=\"token builtin\">len</span><span class=\"token punctuation\">(</span>text<span class=\"token punctuation\">)</span>\n    m <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>min_len\n    B <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>block_size\n\n    <span 
class=\"token keyword\">if</span> n <span class=\"token operator\">&lt;</span> m<span class=\"token punctuation\">:</span>\n        <span class=\"token keyword\">return</span>\n\n    idx <span class=\"token operator\">=</span> m <span class=\"token operator\">-</span> <span class=\"token number\">1</span>\n    default_shift <span class=\"token operator\">=</span> m <span class=\"token operator\">-</span> B <span class=\"token operator\">+</span> <span class=\"token number\">1</span>\n\n    <span class=\"token keyword\">while</span> idx <span class=\"token operator\">&lt;</span> n<span class=\"token punctuation\">:</span>\n        <span class=\"token comment\"># Obtain the suffix block and determine the shift amount</span>\n        suffix_block <span class=\"token operator\">=</span> text<span class=\"token punctuation\">[</span>idx <span class=\"token operator\">-</span> B <span class=\"token operator\">+</span> <span class=\"token number\">1</span> <span class=\"token punctuation\">:</span> idx <span class=\"token operator\">+</span> <span class=\"token number\">1</span><span class=\"token punctuation\">]</span>\n        shift <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>shift_table<span class=\"token punctuation\">.</span>get<span class=\"token punctuation\">(</span>suffix_block<span class=\"token punctuation\">,</span> default_shift<span class=\"token punctuation\">)</span>\n\n        <span class=\"token comment\"># If the shift amount is 0, compare with candidate patterns (otherwise shift)</span>\n        <span class=\"token keyword\">if</span> shift <span class=\"token operator\">==</span> <span class=\"token number\">0</span><span class=\"token punctuation\">:</span>\n\n            candidate_indices <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>hash_table<span class=\"token punctuation\">.</span>get<span class=\"token punctuation\">(</span>suffix_block<span class=\"token 
punctuation\">,</span> <span class=\"token punctuation\">[</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">)</span>\n            text_start_pos <span class=\"token operator\">=</span> idx <span class=\"token operator\">-</span> m <span class=\"token operator\">+</span> <span class=\"token number\">1</span>\n            text_prefix_block <span class=\"token operator\">=</span> text<span class=\"token punctuation\">[</span>text_start_pos <span class=\"token punctuation\">:</span> text_start_pos <span class=\"token operator\">+</span> B<span class=\"token punctuation\">]</span>\n            text_prefix_hash <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>_get_hash<span class=\"token punctuation\">(</span>text_prefix_block<span class=\"token punctuation\">)</span>\n\n            <span class=\"token keyword\">for</span> p_idx <span class=\"token keyword\">in</span> candidate_indices<span class=\"token punctuation\">:</span>\n                <span class=\"token keyword\">if</span> self<span class=\"token punctuation\">.</span>prefix_table<span class=\"token punctuation\">[</span>p_idx<span class=\"token punctuation\">]</span> <span class=\"token operator\">!=</span> text_prefix_hash<span class=\"token punctuation\">:</span>\n                    <span class=\"token keyword\">continue</span>\n\n                pattern <span class=\"token operator\">=</span> self<span class=\"token punctuation\">.</span>patterns<span class=\"token punctuation\">[</span>p_idx<span class=\"token punctuation\">]</span>\n                p_len <span class=\"token operator\">=</span> <span class=\"token builtin\">len</span><span class=\"token punctuation\">(</span>pattern<span class=\"token punctuation\">)</span>\n\n                <span class=\"token keyword\">if</span> text_start_pos <span class=\"token operator\">+</span> p_len <span class=\"token operator\">&lt;=</span> n<span class=\"token punctuation\">:</span>\n        
            <span class=\"token keyword\">if</span> text<span class=\"token punctuation\">[</span>text_start_pos <span class=\"token punctuation\">:</span> text_start_pos <span class=\"token operator\">+</span> p_len<span class=\"token punctuation\">]</span> <span class=\"token operator\">==</span> pattern<span class=\"token punctuation\">:</span>\n                        <span class=\"token keyword\">yield</span> <span class=\"token punctuation\">(</span>text_start_pos<span class=\"token punctuation\">,</span> pattern<span class=\"token punctuation\">)</span>\n\n            idx <span class=\"token operator\">+=</span> <span class=\"token number\">1</span>\n\n        <span class=\"token keyword\">else</span><span class=\"token punctuation\">:</span>\n            idx <span class=\"token operator\">+=</span> shift</code></pre></div>\n<p>The <code class=\"language-text\">search</code> function examines the text through a window of <code class=\"language-text\">m</code> characters, starting with the first <code class=\"language-text\">m</code> characters. It takes the last <code class=\"language-text\">B</code> characters of the window as <code class=\"language-text\">suffix_block</code> and looks them up in the SHIFT table.</p>\n<p>If <code class=\"language-text\">suffix_block</code> is not in the SHIFT table, the search position is moved forward by <code class=\"language-text\">m - B + 1</code>. This is safe because the block does not occur anywhere in the first <code class=\"language-text\">m</code> characters of any pattern, so every window that contains the whole block can be skipped; the new window begins at the block’s second character. If the block is in the SHIFT table, the search position is moved forward by the shift amount stored for it.</p>\n<p>If the shift amount is 0, a pattern may start at the beginning of the current window, so the indices of the patterns whose first <code class=\"language-text\">m</code> characters end with that block are retrieved from the HASH table as <code class=\"language-text\">candidate_indices</code>.</p>\n<p>Next, for each candidate, the code retrieves the hash of the pattern’s leading block from the PREFIX table and compares it with the hash of the leading block of the current window in the text. 
If the hashes match, the full pattern is compared against the text at that position, and a match is reported if they are equal. After all candidates have been checked, the search position advances by one character (<code class=\"language-text\">idx += 1</code>).</p>\n<p>The PREFIX check appears to be there because, in English text, many words share the same suffix blocks, such as <code class=\"language-text\">tion</code> or <code class=\"language-text\">ee</code>, so a suffix-block match alone would trigger full comparisons frequently.</p>\n<p>Checking the leading block as well as the suffix block, and running a full comparison only for the candidates that pass both checks, therefore makes the search more efficient.</p>\n<p>Reference: <a href=\"https://www.cs.arizona.edu/sites/default/files/TR94-17.pdf\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">A Fast Algorithm for Multi-Pattern Searching</a></p>\n<p>Reference: <a href=\"https://www.ijert.org/research/variations-of-wu-manber-string-matching-algorithm-IJERTV3IS042295.pdf\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Variations of Wu-Manber String Matching Algorithm</a></p>\n<p>Reference: <a href=\"https://github.com/bubiche/wu_manber/blob/master/wu_manber.hpp\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">wu_manber/wu_manber.hpp at master · bubiche/wu_manber</a></p>\n<h2 id=\"reading-clamavs-implementation\" style=\"position:relative;\"><a href=\"#reading-clamavs-implementation\" aria-label=\"reading clamavs implementation permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Reading ClamAV’s Implementation</h2>\n<p>Up to this point, I have summarized the BM 
algorithm, the BMH algorithm, and the WM method, which supports multi-pattern matching based on the same idea.</p>\n<p>From here, I will read the actual implementation of ClamAV’s <code class=\"language-text\">cli_bm_scanbuff</code> function and examine how pattern matching is implemented in AntiVirus software.</p>\n<h3 id=\"function-calls\" style=\"position:relative;\"><a href=\"#function-calls\" aria-label=\"function calls permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Function Calls</h3>\n<p>The <code class=\"language-text\">bm</code> in the function name <code class=\"language-text\">cli_bm_scanbuff</code> stands for the BM algorithm. 
It is implemented as “Extended Boyer-Moore,” and its actual behavior is close to the WM method.</p>\n<p>The <code class=\"language-text\">cli_bm_scanbuff</code> function is called with the following parameters, and when scanning files with clamscan, the data from the scanned file is passed in as <code class=\"language-text\">buffer</code>.</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 960px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/b613b4d9193430993b2507dbf45cdfa5/29229/image-20260111223526284.png\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 5.416666666666667%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAABCAYAAADeko4lAAAACXBIWXMAAAsTAAALEwEAmpwYAAAASklEQVQI1x3LOQ6AIBQAUVu1gIiSAH8ptKDC+59uXNrJm6ndlXxleu+MMTAzWqvsR2aNzhKMOZ6ETTAVSqmo6msaIoK7/8/XUko8HDAaXBBDZH4AAAAASUVORK5CYII='); background-size: cover; display: block;\"\n  ></span>\n  <picture>\n          <source\n              srcset=\"/static/b613b4d9193430993b2507dbf45cdfa5/8ac56/image-20260111223526284.webp 240w,\n/static/b613b4d9193430993b2507dbf45cdfa5/d3be9/image-20260111223526284.webp 480w,\n/static/b613b4d9193430993b2507dbf45cdfa5/e46b2/image-20260111223526284.webp 960w,\n/static/b613b4d9193430993b2507dbf45cdfa5/3eff0/image-20260111223526284.webp 1167w\"\n              sizes=\"(max-width: 960px) 100vw, 960px\"\n              type=\"image/webp\"\n            />\n          <source\n            srcset=\"/static/b613b4d9193430993b2507dbf45cdfa5/8ff5a/image-20260111223526284.png 240w,\n/static/b613b4d9193430993b2507dbf45cdfa5/e85cb/image-20260111223526284.png 480w,\n/static/b613b4d9193430993b2507dbf45cdfa5/d9199/image-20260111223526284.png 
960w,\n/static/b613b4d9193430993b2507dbf45cdfa5/29229/image-20260111223526284.png 1167w\"\n            sizes=\"(max-width: 960px) 100vw, 960px\"\n            type=\"image/png\"\n          />\n          <img\n            class=\"gatsby-resp-image-image\"\n            src=\"/static/b613b4d9193430993b2507dbf45cdfa5/d9199/image-20260111223526284.png\"\n            alt=\"image-20260111223526284\"\n            title=\"image-20260111223526284\"\n            loading=\"lazy\"\n            style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n          />\n        </picture>\n  </a>\n    </span></p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token class-name\">cl_error_t</span> <span class=\"token function\">cli_bm_scanbuff</span><span class=\"token punctuation\">(</span>\n    <span class=\"token keyword\">const</span> <span class=\"token keyword\">unsigned</span> <span class=\"token keyword\">char</span> <span class=\"token operator\">*</span>buffer<span class=\"token punctuation\">,</span> \n    <span class=\"token class-name\">uint32_t</span> length<span class=\"token punctuation\">,</span> \n    <span class=\"token keyword\">const</span> <span class=\"token keyword\">char</span> <span class=\"token operator\">*</span><span class=\"token operator\">*</span>virname<span class=\"token punctuation\">,</span> \n    <span class=\"token keyword\">const</span> <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_bm_patt</span> <span class=\"token operator\">*</span><span class=\"token operator\">*</span>patt<span class=\"token punctuation\">,</span> \n    <span class=\"token keyword\">const</span> <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_matcher</span> <span class=\"token operator\">*</span>root<span class=\"token punctuation\">,</span> \n    <span class=\"token class-name\">uint32_t</span> 
offset<span class=\"token punctuation\">,</span> \n    <span class=\"token keyword\">const</span> <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_target_info</span> <span class=\"token operator\">*</span>info<span class=\"token punctuation\">,</span> \n    <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_bm_off</span> <span class=\"token operator\">*</span>offdata<span class=\"token punctuation\">,</span> \n    cli_ctx <span class=\"token operator\">*</span>ctx\n<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span></code></pre></div>\n<h3 id=\"building-tables-in-preprocessing\" style=\"position:relative;\"><a href=\"#building-tables-in-preprocessing\" aria-label=\"building tables in preprocessing permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Building Tables in Preprocessing</h3>\n<p>Because ClamAV’s “Extended Boyer-Moore” works much like the WM method, it needs a preprocessing step that builds tables such as the SHIFT table and the HASH table before any data is scanned.</p>\n<p>These tables are not built inside <code class=\"language-text\">cli_bm_scanbuff</code>; they are prepared in advance and stored in the <code class=\"language-text\">cli_matcher</code> structure passed in as <code class=\"language-text\">root</code>, whose definition (abridged) is shown below.</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_matcher</span> 
<span class=\"token punctuation\">{</span>\n    <span class=\"token keyword\">unsigned</span> <span class=\"token keyword\">int</span> type<span class=\"token punctuation\">;</span>\n\n    <span class=\"token comment\">/* Extended Boyer-Moore */</span>\n    <span class=\"token class-name\">uint8_t</span> <span class=\"token operator\">*</span>bm_shift<span class=\"token punctuation\">;</span>\n    <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_bm_patt</span> <span class=\"token operator\">*</span><span class=\"token operator\">*</span>bm_suffix<span class=\"token punctuation\">,</span> <span class=\"token operator\">*</span><span class=\"token operator\">*</span>bm_pattab<span class=\"token punctuation\">;</span>\n    <span class=\"token class-name\">uint32_t</span> <span class=\"token operator\">*</span>soff<span class=\"token punctuation\">,</span> soff_len<span class=\"token punctuation\">;</span> <span class=\"token comment\">/* for PE section sigs */</span>\n    <span class=\"token class-name\">uint32_t</span> bm_offmode<span class=\"token punctuation\">,</span> bm_patterns<span class=\"token punctuation\">,</span> bm_reloff_num<span class=\"token punctuation\">,</span> bm_absoff_num<span class=\"token punctuation\">;</span>\n\n    <span class=\"token comment\">/* HASH */</span>\n    <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_hash_patt</span> hm<span class=\"token punctuation\">;</span>\n    <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_hash_wild</span> hwild<span class=\"token punctuation\">;</span>\n\n    <span class=\"token comment\">/* Extended Aho-Corasick */</span>\n    <span class=\"token class-name\">uint32_t</span> ac_partsigs<span class=\"token punctuation\">,</span> ac_nodes<span class=\"token punctuation\">,</span> ac_lists<span class=\"token punctuation\">,</span> ac_patterns<span class=\"token punctuation\">,</span> ac_lsigs<span 
class=\"token punctuation\">;</span>\n    <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_ac_lsig</span> <span class=\"token operator\">*</span><span class=\"token operator\">*</span>ac_lsigtable<span class=\"token punctuation\">;</span>\n    <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_ac_node</span> <span class=\"token operator\">*</span>ac_root<span class=\"token punctuation\">,</span> <span class=\"token operator\">*</span><span class=\"token operator\">*</span>ac_nodetable<span class=\"token punctuation\">;</span>\n    <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_ac_list</span> <span class=\"token operator\">*</span><span class=\"token operator\">*</span>ac_listtable<span class=\"token punctuation\">;</span>\n    <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_ac_patt</span> <span class=\"token operator\">*</span><span class=\"token operator\">*</span>ac_pattable<span class=\"token punctuation\">;</span>\n    <span class=\"token keyword\">struct</span> <span class=\"token class-name\">cli_ac_patt</span> <span class=\"token operator\">*</span><span class=\"token operator\">*</span>ac_reloff<span class=\"token punctuation\">;</span>\n    <span class=\"token class-name\">uint32_t</span> ac_reloff_num<span class=\"token punctuation\">,</span> ac_absoff_num<span class=\"token punctuation\">;</span>\n    <span class=\"token class-name\">uint8_t</span> ac_mindepth<span class=\"token punctuation\">,</span> ac_maxdepth<span class=\"token punctuation\">;</span>\n    <span class=\"token keyword\">struct</span> <span class=\"token class-name\">filter</span> <span class=\"token operator\">*</span>filter<span class=\"token punctuation\">;</span>\n\n    <span class=\"token class-name\">uint16_t</span> maxpatlen<span class=\"token punctuation\">;</span>\n    <span class=\"token class-name\">uint8_t</span> ac_only<span class=\"token 
;</span>">
punctuation\">;</span>\n\n    <span class=\"token punctuation\">{</span>omitted<span class=\"token punctuation\">}</span>\n<span class=\"token punctuation\">}</span><span class=\"token punctuation\">;</span></code></pre></div>\n<p>The counterpart of the WM SHIFT table is <code class=\"language-text\">bm_shift</code>. The <code class=\"language-text\">cli_bm_addpatt</code> function updates its entries each time a signature pattern is added. Likewise, <code class=\"language-text\">bm_suffix</code> plays the role of the HASH table: for each hash value, it holds the list of patterns that are candidates for a full comparison.</p>\n<h3 id=\"scan-processing\" style=\"position:relative;\"><a href=\"#scan-processing\" aria-label=\"scan processing permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Scan Processing</h3>\n<p>The scan loop in <code class=\"language-text\">cli_bm_scanbuff</code> follows the same approach as the WM scan described earlier, as shown below.</p>\n<p>It takes a 3-byte block (<code class=\"language-text\">BM_BLOCK_SIZE</code>) from <code class=\"language-text\">buffer</code>, the data being scanned. It hashes that block into an integer index with the <code class=\"language-text\">HASH</code> macro and looks up the block’s shift amount in the SHIFT table.</p>\n<div class=\"gatsby-highlight\" data-language=\"c\"><pre class=\"language-c\"><code class=\"language-c\"><span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">define</span> <span class=\"token macro-name\">BM_MIN_LENGTH</span> <span class=\"token expression\"><span class=\"token number\">3</span></span></span>\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive 
keyword\">define</span> <span class=\"token macro-name\">BM_BLOCK_SIZE</span> <span class=\"token expression\"><span class=\"token number\">3</span></span></span>\n<span class=\"token macro property\"><span class=\"token directive-hash\">#</span><span class=\"token directive keyword\">define</span> <span class=\"token macro-name function\">HASH</span><span class=\"token expression\"><span class=\"token punctuation\">(</span>a<span class=\"token punctuation\">,</span> b<span class=\"token punctuation\">,</span> c<span class=\"token punctuation\">)</span> <span class=\"token punctuation\">(</span><span class=\"token number\">211</span> <span class=\"token operator\">*</span> a <span class=\"token operator\">+</span> <span class=\"token number\">37</span> <span class=\"token operator\">*</span> b <span class=\"token operator\">+</span> c<span class=\"token punctuation\">)</span></span></span>\n\n<span class=\"token punctuation\">{</span>omitted<span class=\"token punctuation\">}</span>\n\ni <span class=\"token operator\">=</span> BM_MIN_LENGTH <span class=\"token operator\">-</span> BM_BLOCK_SIZE<span class=\"token punctuation\">;</span>\n<span class=\"token keyword\">for</span> <span class=\"token punctuation\">(</span><span class=\"token punctuation\">;</span> i <span class=\"token operator\">&lt;</span> length <span class=\"token operator\">-</span> BM_BLOCK_SIZE <span class=\"token operator\">+</span> <span class=\"token number\">1</span><span class=\"token punctuation\">;</span><span class=\"token punctuation\">)</span> <span class=\"token punctuation\">{</span>\n    idx   <span class=\"token operator\">=</span> <span class=\"token function\">HASH</span><span class=\"token punctuation\">(</span>buffer<span class=\"token punctuation\">[</span>i<span class=\"token punctuation\">]</span><span class=\"token punctuation\">,</span> buffer<span class=\"token punctuation\">[</span>i <span class=\"token operator\">+</span> <span class=\"token number\">1</span><span 
class=\"token punctuation\">]</span><span class=\"token punctuation\">,</span> buffer<span class=\"token punctuation\">[</span>i <span class=\"token operator\">+</span> <span class=\"token number\">2</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n    shift <span class=\"token operator\">=</span> root<span class=\"token operator\">-></span>bm_shift<span class=\"token punctuation\">[</span>idx<span class=\"token punctuation\">]</span><span class=\"token punctuation\">;</span>\n\n    <span class=\"token keyword\">if</span> <span class=\"token punctuation\">(</span>shift <span class=\"token operator\">==</span> <span class=\"token number\">0</span><span class=\"token punctuation\">)</span> <span class=\"token punctuation\">{</span>\n        prefix <span class=\"token operator\">=</span> buffer<span class=\"token punctuation\">[</span>i <span class=\"token operator\">-</span> BM_MIN_LENGTH <span class=\"token operator\">+</span> BM_BLOCK_SIZE<span class=\"token punctuation\">]</span><span class=\"token punctuation\">;</span>\n        p      <span class=\"token operator\">=</span> root<span class=\"token operator\">-></span>bm_suffix<span class=\"token punctuation\">[</span>idx<span class=\"token punctuation\">]</span><span class=\"token punctuation\">;</span>\n        <span class=\"token keyword\">if</span> <span class=\"token punctuation\">(</span>p <span class=\"token operator\">&amp;&amp;</span> p<span class=\"token operator\">-></span>cnt <span class=\"token operator\">==</span> <span class=\"token number\">1</span> <span class=\"token operator\">&amp;&amp;</span> p<span class=\"token operator\">-></span>pattern0 <span class=\"token operator\">!=</span> prefix<span class=\"token punctuation\">)</span> <span class=\"token punctuation\">{</span>\n        <span class=\"token punctuation\">{</span>omitted<span class=\"token punctuation\">}</span></code></pre></div>\n<p>Furthermore, 
when the SHIFT table’s shift amount is 0, ClamAV checks the PREFIX table and HASH table, compares the buffer with the pattern, and if they match, ultimately returns <code class=\"language-text\">CL_VIRUS</code> to report a detection.</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 960px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/d8e5faac658b6991b16a18548f3b575d/dd507/image-20260111224939909.png\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 60.416666666666664%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAMCAYAAABiDJ37AAAACXBIWXMAAAsTAAALEwEAmpwYAAACM0lEQVQoz1VTf3OiMBR05mZqCYEAQn6RRES0tdW7Vuz0rv3+n2vvETyd+2Mn4YVssvs2C2UUtNVgCUOapjA+oN8M0I2CVRrrEFCXFYw20MagpVpdVdBKQUoZ582qRp7ncf9COgnlFBhjyLIUjIpu0+P0PmJYbxCCh6Cfi6KIKAm3byFonsVvzvmVUGo6SeFhmSBhdAoXRJoh4QWWjxxLqrOU0YFJPDRhKRJS80hIkukC080y2jdjsel36PonOGcQ2hzBFVCqJkkC1uZo5Aq8XINXPXjhsao1SfcwNtAYICoHxiukWRGxOJ/fcbl84vxzjfPhAeNrgl2n0KkKvSwQdANlHIqSfBIl2TJLFGJGls1S+RWLj8uIy8cn3o4Ob88/ML4ssSXCfRdwHDpYWaMsC+S0cfJrMn8inMaZLCEidif0YYN++4z9fsDQa/SdRGs9Nm6Pjd1Rhw1WjYFULZR2EZrWJ9m19EiFQ5o35N9Vcp4LrNcdxvED4+U3ht0Bh5cjXo8nmLaNTWApp4ZMSCOS6zg1L3rH83tTqrLEbhjw/fUH399fOByeiPyM0+kI71zsbkaR4Dy9IbvN2VXuP8kMi4wIW5L98vqGp2ciCT12+wPZMGC1aug2FKFUxEj9B6rNnS1vHY6S5SDhtuRZ18MYS9AUBZKaBzwSGkU1a+OatS1a5yMUvRjO7/m7SVa/Gti9gW89bbCUPwlR0lMUPqKh0FtriNDEdUc2eO/Q1PUcF36PzIS/aVQrKXi68LUAAAAASUVORK5CYII='); background-size: cover; display: block;\"\n  ></span>\n  <picture>\n          <source\n              srcset=\"/static/d8e5faac658b6991b16a18548f3b575d/8ac56/image-20260111224939909.webp 240w,\n/static/d8e5faac658b6991b16a18548f3b575d/d3be9/image-20260111224939909.webp 
480w,\n/static/d8e5faac658b6991b16a18548f3b575d/e46b2/image-20260111224939909.webp 960w,\n/static/d8e5faac658b6991b16a18548f3b575d/f992d/image-20260111224939909.webp 1440w,\n/static/d8e5faac658b6991b16a18548f3b575d/2be02/image-20260111224939909.webp 1528w\"\n              sizes=\"(max-width: 960px) 100vw, 960px\"\n              type=\"image/webp\"\n            />\n          <source\n            srcset=\"/static/d8e5faac658b6991b16a18548f3b575d/8ff5a/image-20260111224939909.png 240w,\n/static/d8e5faac658b6991b16a18548f3b575d/e85cb/image-20260111224939909.png 480w,\n/static/d8e5faac658b6991b16a18548f3b575d/d9199/image-20260111224939909.png 960w,\n/static/d8e5faac658b6991b16a18548f3b575d/07a9c/image-20260111224939909.png 1440w,\n/static/d8e5faac658b6991b16a18548f3b575d/dd507/image-20260111224939909.png 1528w\"\n            sizes=\"(max-width: 960px) 100vw, 960px\"\n            type=\"image/png\"\n          />\n          <img\n            class=\"gatsby-resp-image-image\"\n            src=\"/static/d8e5faac658b6991b16a18548f3b575d/d9199/image-20260111224939909.png\"\n            alt=\"image-20260111224939909\"\n            title=\"image-20260111224939909\"\n            loading=\"lazy\"\n            style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n          />\n        </picture>\n  </a>\n    </span></p>\n<h2 id=\"summary\" style=\"position:relative;\"><a href=\"#summary\" aria-label=\"summary permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 
</path></svg></a>Summary">
6z\"></path></svg></a>Summary</h2>\n<p>Following the previous article on the Aho–Corasick algorithm, this article summarized the BM algorithm and the WM method derived from it: two search algorithms that power antivirus signature matching.</p>","fields":{"slug":"/clamav-scan-bm-en","tagSlugs":["/tag/clam-av/","/tag/malware/","/tag/linux/","/tag/english/"]},"frontmatter":{"date":"2026-01-11","description":"Using ClamAV as a reference, this article summarizes the Boyer–Moore (BM) and Wu-Manber (WM) algorithms that support AntiVirus pattern matching.","tags":["ClamAV","Malware","Linux","English"],"title":"Search Algorithms Powering AntiVirus 2 - Boyer–Moore (BM) & Wu-Manber (WM)","socialImage":{"publicURL":"/static/74ab7012ed7ff070cf628e39655850e5/clamav-scan-bm.png"}}}},"pageContext":{"slug":"/clamav-scan-bm-en"}},"staticQueryHashes":["251939775","401334301","825871152"]}