Notes on Tracing the ClamAV Scan Process Until the Eicar Test File Is Detected

This page has been machine-translated from the original page.

I’ve been reading through the ClamAV source code without any particular goal, but I’ve reached a natural stopping point, so I’m publishing notes on the sequence of operations that occur from the time ClamAV scans an Eicar file until it is detected.

Scanning a File with clamscan or clamdscan
- clamscan Options
- clamdscan Options
Scan Behavior of clamdscan
Debugging clamd
- Retrieving the File Path on the clamd Side
- Executing the Scan Callback Function
About the fmap_t Structure
- File Mapping Process
Tracing the Scanner Processing
Summary

Scanning a File with clamscan or clamdscan

clamdscan is a client program that asks a running clamd instance to scan files.

clamscan, on the other hand, uses ClamAV’s libclamav to act as a scanner for files and directories.

Unlike clamdscan, clamscan does not require a running clamd instance; instead, it creates a new scan engine and loads the virus database each time it is invoked.

Reference: Scanning - ClamAV Documentation

Let’s start by actually trying scans with both clamscan and clamdscan.

The following commands were used to fetch an Eicar test file and verify detection.

# Fetch the Eicar file
wget https://secure.eicar.org/eicar.com

# Detection test
clamscan eicar.com
clamdscan --fdpass eicar.com

Note: when clamd is configured to run as the clamav user, it may be unable to access the file that clamdscan asks it to scan. The scan then fails with the error "File path check failure: Permission denied. ERROR".

For this reason, clamd was started in LocalSocket mode and scan requests were issued using the --fdpass option, which passes the file descriptor directly to clamd.

Reference: Configure clamdscan to scan all files on a system on Ubuntu 12.04 - Stack Overflow

clamscan Options

clamscan supports considerably more options than clamdscan.

The following is the help text for version 0.104.0.

$ clamscan --help

                       Clam AntiVirus: Scanner 0.104.0
           By The ClamAV Team: https://www.clamav.net/about.html#credits
           (C) 2021 Cisco Systems, Inc.

    clamscan [options] [file/directory/-]

    --help                -h             Show this help
    --version             -V             Print version number
    --verbose             -v             Be verbose
    --archive-verbose     -a             Show filenames inside scanned archives
    --debug                              Enable libclamav's debug messages
    --quiet                              Only output error messages
    --stdout                             Write to stdout instead of stderr. Does not affect 'debug' messages.
    --no-summary                         Disable summary at end of scanning
    --infected            -i             Only print infected files
    --suppress-ok-results -o             Skip printing OK files
    --bell                               Sound bell on virus detection

    --tempdir=DIRECTORY                  Create temporary files in DIRECTORY
    --leave-temps[=yes/no(*)]            Do not remove temporary files
    --gen-json[=yes/no(*)]               Generate JSON metadata for the scanned file(s). For testing & development use ONLY.
                                         JSON will be printed if --debug is enabled.
                                         A JSON file will dropped to the temp directory if --leave-temps is enabled.
    --database=FILE/DIR   -d FILE/DIR    Load virus database from FILE or load all supported db files from DIR
    --official-db-only[=yes/no(*)]       Only load official signatures
    --log=FILE            -l FILE        Save scan report to FILE
    --recursive[=yes/no(*)]  -r          Scan subdirectories recursively
    --allmatch[=yes/no(*)]   -z          Continue scanning within file after finding a match
    --cross-fs[=yes(*)/no]               Scan files and directories on other filesystems
    --follow-dir-symlinks[=0/1(*)/2]     Follow directory symlinks (0 = never, 1 = direct, 2 = always)
    --follow-file-symlinks[=0/1(*)/2]    Follow file symlinks (0 = never, 1 = direct, 2 = always)
    --file-list=FILE      -f FILE        Scan files from FILE
    --remove[=yes/no(*)]                 Remove infected files. Be careful!
    --move=DIRECTORY                     Move infected files into DIRECTORY
    --copy=DIRECTORY                     Copy infected files into DIRECTORY
    --exclude=REGEX                      Don't scan file names matching REGEX
    --exclude-dir=REGEX                  Don't scan directories matching REGEX
    --include=REGEX                      Only scan file names matching REGEX
    --include-dir=REGEX                  Only scan directories matching REGEX

    --bytecode[=yes(*)/no]               Load bytecode from the database
    --bytecode-unsigned[=yes/no(*)]      Load unsigned bytecode
                                         **Caution**: You should NEVER run bytecode signatures from untrusted sources.
                                         Doing so may result in arbitrary code execution.
    --bytecode-timeout=N                 Set bytecode timeout (in milliseconds)
    --statistics[=none(*)/bytecode/pcre] Collect and print execution statistics
    --detect-pua[=yes/no(*)]             Detect Possibly Unwanted Applications
    --exclude-pua=CAT                    Skip PUA sigs of category CAT
    --include-pua=CAT                    Load PUA sigs of category CAT
    --detect-structured[=yes/no(*)]      Detect structured data (SSN, Credit Card)
    --structured-ssn-format=X            SSN format (0=normal,1=stripped,2=both)
    --structured-ssn-count=N             Min SSN count to generate a detect
    --structured-cc-count=N              Min CC count to generate a detect
    --structured-cc-mode=X               CC mode (0=credit debit and private label, 1=credit cards only
    --scan-mail[=yes(*)/no]              Scan mail files
    --phishing-sigs[=yes(*)/no]          Enable email signature-based phishing detection
    --phishing-scan-urls[=yes(*)/no]     Enable URL signature-based phishing detection
    --heuristic-alerts[=yes(*)/no]       Heuristic alerts
    --heuristic-scan-precedence[=yes/no(*)] Stop scanning as soon as a heuristic match is found
    --normalize[=yes(*)/no]              Normalize html, script, and text files. Use normalize=no for yara compatibility
    --scan-pe[=yes(*)/no]                Scan PE files
    --scan-elf[=yes(*)/no]               Scan ELF files
    --scan-ole2[=yes(*)/no]              Scan OLE2 containers
    --scan-pdf[=yes(*)/no]               Scan PDF files
    --scan-swf[=yes(*)/no]               Scan SWF files
    --scan-html[=yes(*)/no]              Scan HTML files
    --scan-xmldocs[=yes(*)/no]           Scan xml-based document files
    --scan-hwp3[=yes(*)/no]              Scan HWP3 files
    --scan-archive[=yes(*)/no]           Scan archive files (supported by libclamav)
    --alert-broken[=yes/no(*)]           Alert on broken executable files (PE & ELF)
    --alert-broken-media[=yes/no(*)]     Alert on broken graphics files (JPEG, TIFF, PNG, GIF)
    --alert-encrypted[=yes/no(*)]        Alert on encrypted archives and documents
    --alert-encrypted-archive[=yes/no(*)] Alert on encrypted archives
    --alert-encrypted-doc[=yes/no(*)]    Alert on encrypted documents
    --alert-macros[=yes/no(*)]           Alert on OLE2 files containing VBA macros
    --alert-exceeds-max[=yes/no(*)]      Alert on files that exceed max file size, max scan size, or max recursion limit
    --alert-phishing-ssl[=yes/no(*)]     Alert on emails containing SSL mismatches in URLs
    --alert-phishing-cloak[=yes/no(*)]   Alert on emails containing cloaked URLs
    --alert-partition-intersection[=yes/no(*)] Alert on raw DMG image files containing partition intersections
    --nocerts                            Disable authenticode certificate chain verification in PE files
    --dumpcerts                          Dump authenticode certificate chain in PE files

    --max-scantime=#n                    Scan time longer than this will be skipped and assumed clean (milliseconds)
    --max-filesize=#n                    Files larger than this will be skipped and assumed clean
    --max-scansize=#n                    The maximum amount of data to scan for each container file (**)
    --max-files=#n                       The maximum number of files to scan for each container file (**)
    --max-recursion=#n                   Maximum archive recursion level for container file (**)
    --max-dir-recursion=#n               Maximum directory recursion level
    --max-embeddedpe=#n                  Maximum size file to check for embedded PE
    --max-htmlnormalize=#n               Maximum size of HTML file to normalize
    --max-htmlnotags=#n                  Maximum size of normalized HTML file to scan
    --max-scriptnormalize=#n             Maximum size of script file to normalize
    --max-ziptypercg=#n                  Maximum size zip to type reanalyze
    --max-partitions=#n                  Maximum number of partitions in disk image to be scanned
    --max-iconspe=#n                     Maximum number of icons in PE file to be scanned
    --max-rechwp3=#n                     Maximum recursive calls to HWP3 parsing function
    --pcre-match-limit=#n                Maximum calls to the PCRE match function.
    --pcre-recmatch-limit=#n             Maximum recursive calls to the PCRE match function.
    --pcre-max-filesize=#n               Maximum size file to perform PCRE subsig matching.
    --disable-cache                      Disable caching and cache checks for hash sums of scanned files.

Pass in - as the filename for stdin.

(*) Default scan settings
(**) Certain files (e.g. documents, archives, etc.) may in turn contain other
   files inside. The above options ensure safe processing of this kind of data.

clamscan provides options to control output (such as --quiet, --infected, and --no-summary), to specify the virus database used by the scan engine, and to remove or quarantine detected files.

Many additional options are available for finely controlling scan targets and exclusion criteria.

clamdscan Options

clamdscan only hands scan requests to a running clamd instance, whereas clamscan loads the engine and virus database fresh on each invocation. As a result, clamdscan has far fewer runtime scan options; in exchange, overhead is lower because the already-running clamd does the actual scanning.

$ clamdscan --help

                      Clam AntiVirus: Daemon Client 0.104.0
           By The ClamAV Team: https://www.clamav.net/about.html#credits
           (C) 2021 Cisco Systems, Inc.

    clamdscan [options] [file/directory/-]

    --help              -h             Show this help
    --version           -V             Print version number and exit
    --verbose           -v             Be verbose
    --quiet                            Be quiet, only output error messages
    --stdout                           Write to stdout instead of stderr. Does not affect 'debug' messages.
                                       (this help is always written to stdout)
    --log=FILE          -l FILE        Save scan report in FILE
    --file-list=FILE    -f FILE        Scan files from FILE
    --ping              -p A[:I]       Ping clamd up to [A] times at optional interval [I] until it responds.
    --wait              -w             Wait up to 30 seconds for clamd to start. Optionally use alongside --ping to set attempts [A] and interval [I] to check clamd.
    --remove                           Remove infected files. Be careful!
    --move=DIRECTORY                   Move infected files into DIRECTORY
    --copy=DIRECTORY                   Copy infected files into DIRECTORY
    --config-file=FILE                 Read configuration from FILE.
    --allmatch            -z           Continue scanning within file after finding a match.
    --multiscan           -m           Force MULTISCAN mode
    --infected            -i           Only print infected files
    --no-summary                       Disable summary at end of scanning
    --reload                           Request clamd to reload virus database
    --fdpass                           Pass filedescriptor to clamd (useful if clamd is running as a different user)
    --stream                           Force streaming files to clamd (for debugging and unit testing)

When using clamdscan, output control through --quiet, --infected, and --no-summary is available, similar to clamscan. Removing and quarantining detected files with --remove and --move is also supported.

Scan Behavior of clamdscan

In this section we trace the behavior when a scan is performed with clamdscan.

In the main function of clamdscan, after parsing all runtime options, the following code is executed.

date_start = time(NULL);
gettimeofday(&t1, NULL);

ret = client(opts, &infected, &err);
optfree(clamdopts);

/* TODO: Implement STATUS in clamd */
if (!optget(opts, "no-summary")->enabled) {
    struct tm tmp;

    date_end = time(NULL);
    gettimeofday(&t2, NULL);
    ds  = t2.tv_sec - t1.tv_sec;
    dms = t2.tv_usec - t1.tv_usec;
    ds -= (dms < 0) ? (1) : (0);
    dms += (dms < 0) ? (1000000) : (0);
    logg("\n----------- SCAN SUMMARY -----------\n");
    logg("Infected files: %d\n", infected);
    if (err)
        logg("Total errors: %d\n", err);
    if (notremoved) {
        logg("Not removed: %d\n", notremoved);
    }
    if (notmoved) {
        logg("Not moved: %d\n", notmoved);
    }
    logg("Time: %d.%3.3d sec (%d m %d s)\n", ds, dms / 1000, ds / 60, ds % 60);

#ifdef _WIN32
    if (0 != localtime_s(&tmp, &date_start)) {
#else
    if (!localtime_r(&date_start, &tmp)) {
#endif
        logg("!Failed to get local time for Start Date.\n");
    }
    strftime(buffer, sizeof(buffer), "%Y:%m:%d %H:%M:%S", &tmp);
    logg("Start Date: %s\n", buffer);

#ifdef _WIN32
    if (0 != localtime_s(&tmp, &date_end)) {
#else
    if (!localtime_r(&date_end, &tmp)) {
#endif
        logg("!Failed to get local time for End Date.\n");
    }
    strftime(buffer, sizeof(buffer), "%Y:%m:%d %H:%M:%S", &tmp);
    logg("End Date:   %s\n", buffer);
}

Reference: clamav/clamdscan/clamdscan.c at rel/0.104 · kash1064/clamav
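The borrow arithmetic applied to ds and dms in the summary code can be tried in isolation. The following is a minimal sketch; struct elapsed and elapsed_between are hypothetical names that do not exist in ClamAV.

```c
#include <sys/time.h>

/* Whole seconds and leftover microseconds between two timestamps. */
struct elapsed {
    long sec;
    long usec;
};

/* Same borrow logic as the summary code: when the microsecond difference is
 * negative, take one second from the seconds difference and add 1,000,000
 * microseconds to the remainder. */
struct elapsed elapsed_between(struct timeval t1, struct timeval t2)
{
    struct elapsed e;

    e.sec  = (long)(t2.tv_sec - t1.tv_sec);
    e.usec = (long)(t2.tv_usec - t1.tv_usec);
    if (e.usec < 0) {
        e.sec -= 1;
        e.usec += 1000000;
    }
    return e;
}
```

For a start of 10.9 s and an end of 12.1 s this yields 1 second and 200000 microseconds; the summary line then divides the microsecond part by 1000 to print milliseconds.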

Executing the client Function

First, the client function is called with a variable holding a pointer to the parsed options list.

The client function receives the list via const struct optstruct *opts, extracts each option and configuration value, performs the scan of the target file, and returns the result.

struct optstruct {
    char *name;
    char *cmd;
    char *strarg;
    long long numarg;
    int enabled;
    int active;
    int flags;
    int idx;
    struct optstruct *nextarg;
    struct optstruct *next;

    char **filename; /* cmdline */
};

int ds, dms, ret, infected = 0, err = 0;
struct optstruct *opts;

ret = client(opts, &infected, &err);
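Since the options arrive as a linked list, a name-based lookup amounts to walking the next pointers. The sketch below illustrates that idea only; struct mini_opt and mini_optget are hypothetical, heavily trimmed stand-ins and not the actual optstruct or optget implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct optstruct. */
struct mini_opt {
    const char *name;
    int enabled;
    struct mini_opt *next;
};

/* Walk the list and return the entry called name, or NULL if it is absent. */
const struct mini_opt *mini_optget(const struct mini_opt *opts, const char *name)
{
    for (; opts != NULL; opts = opts->next) {
        if (strcmp(opts->name, name) == 0)
            return opts;
    }
    return NULL;
}
```

With such a helper, a check like mini_optget(opts, "fdpass")->enabled reads much like the optget(opts, "fdpass")->enabled calls in client, provided the option is known to exist in the list.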

The client function is implemented in clamdscan/client.c as follows.

int client(const struct optstruct *opts, int *infected, int *err)
{
    int remote, scantype, session = 0, errors = 0, scandash = 0, maxrec, flags = 0;
    const char *fname;

    if (optget(opts, "wait")->enabled) {
        int16_t ping_result = ping_clamd(opts);
        switch (ping_result) {
            case 0:
                break;
            case 1:
                return (int)CL_ETIMEOUT;
            default:
                return (int)CL_ERROR;
        }
    }

    scandash = (opts->filename && opts->filename[0] && !strcmp(opts->filename[0], "-") && !optget(opts, "file-list")->enabled && !opts->filename[1]);
    remote   = isremote(opts) | optget(opts, "stream")->enabled;
#ifdef HAVE_FD_PASSING
    if (!remote && optget(clamdopts, "LocalSocket")->enabled && (optget(opts, "fdpass")->enabled || scandash)) {
        scantype = FILDES;
        session  = optget(opts, "multiscan")->enabled;
    } else
#endif
        if (remote || scandash) {
        scantype = STREAM;
        session  = optget(opts, "multiscan")->enabled;
    } else if (optget(opts, "multiscan")->enabled)
        scantype = MULTI;
    else if (optget(opts, "allmatch")->enabled)
        scantype = ALLMATCH;
    else
        scantype = CONT;

    maxrec    = optget(clamdopts, "MaxDirectoryRecursion")->numarg;
    maxstream = optget(clamdopts, "StreamMaxLength")->numarg;
    if (optget(clamdopts, "FollowDirectorySymlinks")->enabled)
        flags |= CLI_FTW_FOLLOW_DIR_SYMLINK;
    if (optget(clamdopts, "FollowFileSymlinks")->enabled)
        flags |= CLI_FTW_FOLLOW_FILE_SYMLINK;
    flags |= CLI_FTW_TRIM_SLASHES;

    *infected = 0;

    if (scandash) {
        int sockd, ret;
        STATBUF sb;
        if (FSTAT(0, &sb) < 0) {
            logg("client.c: fstat failed for file name \"%s\", with %s\n.",
                 opts->filename[0], strerror(errno));
            return 2;
        }
        if ((sb.st_mode & S_IFMT) != S_IFREG) scantype = STREAM;
        if ((sockd = dconnect()) >= 0 && (ret = dsresult(sockd, scantype, NULL, &ret, NULL)) >= 0)
            *infected = ret;
        else
            errors = 1;
        if (sockd >= 0) closesocket(sockd);
    } else if (opts->filename || optget(opts, "file-list")->enabled) {
        if (opts->filename && optget(opts, "file-list")->enabled)
            logg("^Only scanning files from --file-list (files passed at cmdline are ignored)\n");

        while ((fname = filelist(opts, NULL))) {
            if (!strcmp(fname, "-")) {
                logg("!Scanning from standard input requires \"-\" to be the only file argument\n");
                continue;
            }
            errors += client_scan(fname, scantype, infected, err, maxrec, session, flags);
            /* this may be too strict
    if(errors >= 10) {
logg("!Too many errors\n");
break;
    }
    */
        }
    } else {
        errors = client_scan("", scantype, infected, err, maxrec, session, flags);
    }
    return *infected ? 1 : (errors ? 2 : 0);
}

Reference: clamav/clamdscan/client.c at rel/0.104 · kash1064/clamav
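The final return expression of client maps the two counters to a status value. A minimal sketch of that mapping, using the hypothetical name client_status:

```c
/* Mirrors the last line of client():
 * 1 = at least one infected file, 2 = errors only, 0 = neither. */
int client_status(int infected, int errors)
{
    if (infected)
        return 1;
    return errors ? 2 : 0;
}
```

Note that an infection takes precedence: a run that both finds an infected file and hits errors still yields 1.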

Setting the scantype

Inside this function, after the optional --wait handling, two variables, scandash and remote, are set first.

scandash = (opts->filename && opts->filename[0] && !strcmp(opts->filename[0], "-") && !optget(opts, "file-list")->enabled && !opts->filename[1]);

remote = isremote(opts) | optget(opts, "stream")->enabled;

scandash is true only when all of the following hold for the received opts: a file argument exists (opts->filename && opts->filename[0]), that argument is - (!strcmp(opts->filename[0], "-")), the file-list option is disabled (!optget(opts, "file-list")->enabled), and no second file argument follows (!opts->filename[1]).

In other words, it checks whether - (stdin) is the only input and the file-list option is not in use.

Reference: linux - What’s the magic of ”-” (a dash) in command-line parameters? - Stack Overflow
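The scandash condition can be restated as a small standalone function. This is a sketch under the assumption that the file arguments are a NULL-terminated array, as the opts->filename[1] check suggests; is_scandash is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* filename: NULL-terminated array of file arguments, or NULL when none given.
 * file_list_enabled: non-zero when --file-list was passed.
 * Returns 1 when "-" is the one and only input, 0 otherwise. */
int is_scandash(char **filename, int file_list_enabled)
{
    return filename != NULL
        && filename[0] != NULL
        && strcmp(filename[0], "-") == 0
        && !file_list_enabled
        && filename[1] == NULL;
}
```

Passing { "-", NULL } gives 1, while { "eicar.com", NULL } or { "-", "eicar.com", NULL } gives 0.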

remote = isremote(opts) | optget(opts, "stream")->enabled; is non-zero when isremote reports that the clamd instance is remote or when the --stream option is enabled.

If either the scandash or remote flag is true, scantype is set to STREAM.

In this case, however, we are using a LocalSocket and the --fdpass option, so scantype becomes FILDES and this branch is skipped.

#ifdef HAVE_FD_PASSING
    if (!remote && optget(clamdopts, "LocalSocket")->enabled && (optget(opts, "fdpass")->enabled || scandash)) {
        scantype = FILDES;
        session  = optget(opts, "multiscan")->enabled;
    } else
#endif
    if (remote || scandash) {
        scantype = STREAM;
        session  = optget(opts, "multiscan")->enabled;
    } else if (optget(opts, "multiscan")->enabled)
        scantype = MULTI;
    else if (optget(opts, "allmatch")->enabled)
        scantype = ALLMATCH;
    else
        scantype = CONT;
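The precedence of this if/else chain is easier to check when it is written as a pure function. The sketch below assumes HAVE_FD_PASSING is defined; enum scan_kind, struct scan_inputs, and choose_scan_kind are hypothetical names, not the constants used by clamdscan.

```c
/* Hypothetical enum; the real constants live in the clamdscan sources. */
enum scan_kind { KIND_CONT, KIND_MULTI, KIND_STREAM, KIND_FILDES, KIND_ALLMATCH };

struct scan_inputs {
    int remote;       /* clamd is remote, or --stream was given */
    int local_socket; /* LocalSocket is set in the clamd configuration */
    int fdpass;       /* --fdpass was given */
    int scandash;     /* the only file argument is "-" */
    int multiscan;    /* --multiscan was given */
    int allmatch;     /* --allmatch was given */
};

/* Same order of checks as the chain above: descriptor passing first,
 * then streaming, then multiscan, then allmatch, else the default. */
enum scan_kind choose_scan_kind(const struct scan_inputs *in)
{
    if (!in->remote && in->local_socket && (in->fdpass || in->scandash))
        return KIND_FILDES;
    if (in->remote || in->scandash)
        return KIND_STREAM;
    if (in->multiscan)
        return KIND_MULTI;
    if (in->allmatch)
        return KIND_ALLMATCH;
    return KIND_CONT;
}
```

With local_socket and fdpass set and remote clear, the result is KIND_FILDES, which corresponds to the case traced in these notes. Setting remote (for example through --stream) moves the same request to KIND_STREAM.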

Calling the Scan Function

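When scandash is false and a file name is given on the command line (as in this case), client_scan is called for each name returned by filelist, together with the scantype and other variables.

*infected = 0;

if (scandash) {
    /* omitted */
} else if (opts->filename || optget(opts, "file-list")->enabled) {
    /* omitted: warning when both file arguments and --file-list are given */
    while ((fname = filelist(opts, NULL))) {
        /* omitted: rejection of "-" mixed with other arguments */
        errors += client_scan(fname, scantype, infected, err, maxrec, session, flags);
    }
} else {
    errors = client_scan("", scantype, infected, err, maxrec, session, flags);
}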

Inside client_scan, the absolute path of the target file (file) is resolved.

Based on the received session value (session = optget(opts, "multiscan")->enabled;), either serial_client_scan or parallel_client_scan is then called.

/* Recursively scans a path with the given scantype
 * Returns non zero for serious errors, zero otherwise */
static int client_scan(const char *file, int scantype, int *infected, int *err, int maxlevel, int session, int flags)
{
    int ret;
    char *real_path = NULL;
    char *fullpath  = NULL;

    /* Convert relative path to fullpath */
    fullpath = makeabs(file);

    /* Convert fullpath to the real path (evaluating symlinks and . and ..).
       Doing this early on will ensure that the scan results will appear consistent
       across regular scans, --fdpass scans, and --stream scans. */
    if (CL_SUCCESS != cli_realpath(fullpath, &real_path)) {
        logg("*client_scan: Failed to determine real filename of %s.\n", fullpath);
    } else {
        free(fullpath);
        fullpath = real_path;
    }

    if (!fullpath)
        return 0;
    if (!session)
        ret = serial_client_scan(fullpath, scantype, infected, err, maxlevel, flags);
    else
        ret = parallel_client_scan(fullpath, scantype, infected, err, maxlevel, flags);
    free(fullpath);
    return ret;
}
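The "resolve if possible, otherwise keep what we have" behavior of client_scan can be sketched with POSIX realpath(3). This assumes realpath is a fair stand-in for cli_realpath on POSIX systems; resolve_or_keep is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd path that the caller must free, or NULL when out of
 * memory. If the path cannot be resolved (for example because it does not
 * exist), a copy of the input is returned instead, similar to how
 * client_scan keeps fullpath when cli_realpath fails. */
char *resolve_or_keep(const char *path)
{
    /* With a NULL buffer, POSIX.1-2008 realpath allocates the result. */
    char *real = realpath(path, NULL);
    char *copy;

    if (real != NULL)
        return real;

    copy = malloc(strlen(path) + 1);
    if (copy != NULL)
        strcpy(copy, path);
    return copy;
}
```

Resolving early means that symlinks and . or .. components are already gone by the time the path is reported, which is the consistency the source comment refers to.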

When the multiscan option is not used, the file path, scantype, and other information are passed to serial_client_scan.

/* Non-IDSESSION handler
 * Returns non zero for serious errors, zero otherwise */
int serial_client_scan(char *file, int scantype, int *infected, int *err, int maxlevel, int flags)
{
    struct cli_ftw_cbdata data;
    struct client_serial_data cdata;
    int ftw;

    cdata.infected = 0;
    cdata.files    = 0;
    cdata.errors   = 0;
    cdata.printok  = printinfected ^ 1;
    cdata.scantype = scantype;
    data.data      = &cdata;

    ftw = cli_ftw(file, flags, maxlevel ? maxlevel : INT_MAX, serial_callback, &data, ftw_chkpath);
    *infected += cdata.infected;
    *err += cdata.errors;

    if (!cdata.errors && (ftw == CL_SUCCESS || ftw == CL_BREAK)) {
        if (cdata.printok)
            logg("~%s: OK\n", file);
        return 0;
    } else if (!cdata.files) {
        logg("~%s: No files scanned\n", file);
        return 0;
    }
    return 1;
}

A debugger confirms that char *file holds the full path of the target file at this point.

Inside this function, a client_serial_data struct variable cdata is initialized and its address is stored in the data member of a cli_ftw_cbdata struct variable data; cli_ftw is then called with the file information and other arguments.

/* wrap void*, so that we don't mix it with some other pointer */
struct cli_ftw_cbdata {
    void *data;
};

/* Used by serial_callback() */
struct client_serial_data {
    int infected;
    int scantype;
    int printok;
    int files;
    int errors;
};

struct cli_ftw_cbdata data;
struct client_serial_data cdata;
int ftw;

cdata.infected = 0;
cdata.files    = 0;
cdata.errors   = 0;
cdata.printok  = printinfected ^ 1;
cdata.scantype = scantype;
data.data      = &cdata;

ftw = cli_ftw(file, flags, maxlevel ? maxlevel : INT_MAX, serial_callback, &data, ftw_chkpath);
*infected += cdata.infected;
*err += cdata.errors;
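The pattern used here, an opaque void pointer wrapped in a struct and handed back to a callback, can be shown without libclamav. In the sketch below every name (cb_data, counters, visit_cb, visit_all, count_file) is hypothetical; it only illustrates how the callback recovers its own state from the wrapper.

```c
#include <stddef.h>

/* Wrapper around void*, in the spirit of struct cli_ftw_cbdata. */
struct cb_data {
    void *data;
};

/* Hypothetical per-walk counters, loosely modeled on client_serial_data. */
struct counters {
    int files;
    int infected;
};

typedef int (*visit_cb)(const char *path, struct cb_data *data);

/* Calls cb for each path and stops at the first non-zero return value. */
int visit_all(const char *const *paths, size_t n, visit_cb cb, struct cb_data *data)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int ret = cb(paths[i], data);
        if (ret != 0)
            return ret;
    }
    return 0;
}

/* Example callback: recovers its own struct from the opaque pointer
 * and counts the visited file. */
int count_file(const char *path, struct cb_data *data)
{
    struct counters *c = data->data;

    (void)path; /* a real callback would scan the file here */
    c->files++;
    return 0;
}
```

The walker never needs to know what struct counters looks like, which is why the same walking code can serve different callers with different per-walk state.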

Executing the cli_ftw Function

cli_ftw is a function implemented in others_common.c in libclamav and performs roughly the following operations.

- handle_filetype retrieves the file type and checks whether it should be skipped (if the type is ft_skipped_link or ft_skipped_special, the entry appears to be skipped).
- ft_skipped checks whether to skip the entry (it appears to be skipped when ft != ft_regular && ft != ft_directory is true).
- For directories, the callback function (in this case serial_callback) is called directly; for files, it is called via handle_entry.

Reference: clamav/libclamav/others_common.c at rel/0.104 · kash1064/clamav
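The "regular file, directory, or skip" decision described above can be approximated with lstat(2). This is a simplified sketch, not the libclamav logic: it never follows symlinks, whereas the real code consults the CLI_FTW_FOLLOW_* flags, and enum entry_kind and classify_entry are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Hypothetical labels; libclamav has its own ft_* values. */
enum entry_kind { ENTRY_ERROR, ENTRY_REGULAR, ENTRY_DIRECTORY, ENTRY_SKIPPED };

/* lstat() does not follow symlinks, so links and special files
 * (sockets, devices, FIFOs) all end up as ENTRY_SKIPPED. */
enum entry_kind classify_entry(const char *path)
{
    struct stat sb;

    if (lstat(path, &sb) != 0)
        return ENTRY_ERROR;
    if (S_ISREG(sb.st_mode))
        return ENTRY_REGULAR;
    if (S_ISDIR(sb.st_mode))
        return ENTRY_DIRECTORY;
    return ENTRY_SKIPPED;
}
```

Only ENTRY_REGULAR and ENTRY_DIRECTORY would proceed to a callback in this model, matching the ft != ft_regular && ft != ft_directory skip condition noted above.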

As described above, when clamdscan is invoked with a specific file, the full path of the target file is stored in the filename member of a dirent_data struct variable entry, and then handle_entry is called.

/*
 * Now call handle_entry() to either call the callback for files,
 * or recurse deeper into the file tree walk.
 * TODO: Recursion is bad, this whole thing should be iterative
 */
if (entry.is_dir) {
    entry.dirname = path;
} else {
    /* Allocate the filename for the callback function within the handle_entry function. TODO: this FTW code is spaghetti, refactor. */
    filename_for_handleentry = cli_strdup(path);
    if (NULL == filename_for_handleentry) {
        goto done;
    }

    entry.filename = filename_for_handleentry;
}
status = handle_entry(&entry, flags, maxdepth, callback, data, pathchk);

Debugging the call to handle_entry confirms that the filename member holds the full path of eicar.com.

> print *(struct dirent_data *)entry

$3 = {
  filename = 0x55555559b0c0 "/home/kash1064/Downloads/eicar.com",
  dirname = 0x0,
  statbuf = <optimized out>,
  ino = <optimized out>,
  is_dir = <optimized out>
}

handle_entry simply calls the callback function (in this case serial_callback).

The callback's fourth argument, the reason parameter of type cli_ftw_reason, receives visit_file.

static int handle_entry(struct dirent_data *entry, int flags, int maxdepth, cli_ftw_cb callback, struct cli_ftw_cbdata *data, cli_ftw_pathchk pathchk)
{
    if (!entry->is_dir) {
        return callback(entry->statbuf, entry->filename, entry->filename, visit_file, data);
    } else {
        return cli_ftw_dir(entry->dirname, flags, maxdepth, callback, data, pathchk);
    }
}

Calling serial_callback

serial_callback is implemented in clamdscan/proto.c and was passed as the callback function when cli_ftw was called from serial_client_scan.

Reference: clamav/clamdscan/proto.c at rel/0.104 · kash1064/clamav

After performing several checks, dconnect connects to the clamd daemon, and dsresult issues a scan request over the obtained socket.

dsresult returns the number of infected files as an integer; a negative return value is treated as an error by the caller.

if ((sockd = dconnect()) < 0) {
    c->errors++;
    goto done;
}
ret = dsresult(sockd, c->scantype, f, &c->printok, &c->errors);
closesocket(sockd);
if (ret < 0) {
    c->errors++;
    goto done;
}
c->infected += ret;
if (reason == visit_directory_toplev) {
    status = CL_BREAK;
    goto done;
}

Debugging the dsresult call confirms that the first argument sockd is 3 and the third argument filename holds the path of the target file.

sockd is the file descriptor of the socket obtained inside dconnect via sockd = socket(AF_UNIX, SOCK_STREAM, 0) when using LocalSocket mode.

This can be confirmed by running ls -la /proc/$(pgrep clamdscan)/fd/ or lsof -p $(pgrep clamdscan) to verify that clamdscan has the socket open using that file descriptor.
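For reference, connecting to a UNIX domain socket in LocalSocket mode boils down to a socket call followed by connect on a sockaddr_un. The sketch below is not the dconnect implementation; connect_local is a hypothetical helper, and the socket path is whatever LocalSocket is set to in the clamd configuration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Returns a connected socket descriptor, or -1 on failure. */
int connect_local(const char *socket_path)
{
    struct sockaddr_un addr;
    int sockd;

    if (strlen(socket_path) >= sizeof(addr.sun_path))
        return -1; /* path does not fit into sun_path */

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, socket_path);

    sockd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sockd < 0)
        return -1;

    if (connect(sockd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(sockd);
        return -1;
    }
    return sockd;
}
```

When nothing listens on the given path, connect fails and the helper returns -1, which corresponds to the sockd < 0 branch in serial_callback.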

Inside dsresult, after initializing several struct variables, a request is sent to clamd according to the value of scantype.

In this case, with the --fdpass option and scantype set to FILDES, len = send_fdpass(sockd, filename); is called first.

switch (scantype) {
    case MULTI:
    case CONT:
    case ALLMATCH:
        /* omitted */
    case STREAM:
        /* omitted */

#ifdef HAVE_FD_PASSING
    case FILDES:
        /* NULL filename safe in send_fdpass() */
        len = send_fdpass(sockd, filename);
        break;
#endif
}

In send_fdpass, the file descriptor of the scan target, obtained via fd = open(filename, O_RDONLY), is sent to clamd using the sendmsg system call.

iov[0].iov_base = dummy;
iov[0].iov_len  = 1;
memset(&msg, 0, sizeof(msg));
msg.msg_control         = fdbuf;
msg.msg_iov             = iov;
msg.msg_iovlen          = 1;
msg.msg_controllen      = CMSG_LEN(sizeof(int));
cmsg                    = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_len          = CMSG_LEN(sizeof(int));
cmsg->cmsg_level        = SOL_SOCKET;
cmsg->cmsg_type         = SCM_RIGHTS;
*(int *)CMSG_DATA(cmsg) = fd;
if (sendmsg(sockd, &msg, 0) == -1) {
    logg("!FD send failed: %s\n", strerror(errno));
    close(fd);
    return -1;
}
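Descriptor passing with SCM_RIGHTS can be tried end to end on a socketpair. The following sketch shows both the sending side (comparable to the excerpt above) and a receiving side; send_fd and recv_fd are hypothetical helpers, not functions from ClamAV, and the clamd command that accompanies the descriptor is left out.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctrl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Sends fd over a connected AF_UNIX socket. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char dummy = 0; /* at least one byte of real data must accompany the message */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov        = &iov;
    msg.msg_iovlen     = 1;
    msg.msg_control    = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg             = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives one descriptor. Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov        = &iov;
    msg.msg_iovlen     = 1;
    msg.msg_control    = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS || cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The number received is generally different from the number sent, because the kernel installs a new descriptor in the receiving process that refers to the same open file. This is also why the receiver can read the file even when it could not have opened the path itself.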

Debugging clamd

First, build the ClamAV components, including clamd, as a Debug build with the commands below, following the steps described in Building ClamAV from Source and Setting Up OnAccessScan.

cmake .. \
    -D CMAKE_BUILD_TYPE=Debug \
    -D OPTIMIZE=OFF \
    -D ENABLE_EXAMPLES=OFF \
    -D ENABLE_STATIC_LIB=ON \
    -D ENABLE_SYSTEMD=ON

cmake --build . --target install

With clamd running and gdb attached via gdb -p $(pgrep clamd), triggering a scan from clamdscan produces the following debug output.

Sat May 10 06:59:27 2025 -> $Got new connection, FD 12
Sat May 10 06:59:27 2025 -> $Received POLLIN|POLLHUP on fd 6
Sat May 10 06:59:27 2025 -> $fds_poll_recv: timeout after 30 seconds
Sat May 10 06:59:27 2025 -> $Received POLLIN|POLLHUP on fd 12
Sat May 10 06:59:27 2025 -> $Receveived a file descriptor: 13
Sat May 10 06:59:27 2025 -> $got command FILDES (7, 9), argument:
Sat May 10 06:59:27 2025 -> $RECVTH: FILDES command complete
Sat May 10 06:59:27 2025 -> $mode -> MODE_WAITREPLY
Sat May 10 06:59:27 2025 -> $Breaking command loop, mode is no longer MODE_COMMAND
Sat May 10 06:59:27 2025 -> $Consumed entire command
Sat May 10 06:59:27 2025 -> $Number of file descriptors polled: 1 fds
Sat May 10 06:59:27 2025 -> $fds_poll_recv: timeout after 600 seconds
Sat May 10 06:59:27 2025 -> $THRMGR: queue (single) crossed low threshold -> signaling
Sat May 10 06:59:27 2025 -> $THRMGR: queue (bulk) crossed low threshold -> signaling
LibClamAV debug: cli_get_filepath_from_filedesc: File path for fd [13] is: /home/kash1064/Downloads/eicar.com
LibClamAV debug: Recognized ASCII text
LibClamAV debug: cache_check: 44d88612fea8a8f36de82e1278abb02f is negative
LibClamAV debug: matcher_run: performing regex matching on full map: 0+68(68) >= 68
LibClamAV debug: FP SIGNATURE: 44d88612fea8a8f36de82e1278abb02f:68:Win.Test.EICAR_HDB-1
LibClamAV debug: hashtab: Freeing hashset, elements: 0, capacity: 0
LibClamAV debug: Win.Test.EICAR_HDB-1 found
LibClamAV debug: cli_magic_scan_desc: returning 1  at line 4605
Sat May 10 06:59:27 2025 -> /home/kash1064/Downloads/eicar.com: Win.Test.EICAR_HDB-1(44d88612fea8a8f36de82e1278abb02f:68) FOUND
Sat May 10 06:59:27 2025 -> $Closed fd 13
Sat May 10 06:59:27 2025 -> $Finished scanthread
Sat May 10 06:59:27 2025 -> $Scanthread: connection shut down (FD 12)
Sat May 10 06:59:27 2025 -> $THRMGR: queue (single) crossed low threshold -> signaling
Sat May 10 06:59:27 2025 -> $THRMGR: queue (bulk) crossed low threshold -> signaling

From this output we can see that libclamav functions such as cli_get_filepath_from_filedesc are being called (the message carries the LibClamAV debug prefix). As traced in the next section, this call is made from clamd’s scanner.c.

Reference: clamav/clamd/scanner.c at rel/0.104 · kash1064/clamav

Retrieving the File Path on the clamd Side

The cli_get_filepath_from_filedesc function called here passes the /proc/self/fd/<descriptor> symlink for the received file descriptor to the readlink system call, obtains the full path of the scan target, and stores it in fname.

char fname[PATH_MAX];
char link[32];

memset(&fname, 0, PATH_MAX);

snprintf(link, sizeof(link), "/proc/self/fd/%u", desc);
link[sizeof(link) - 1] = '\0';

if (-1 == (linksz = readlink(link, fname, PATH_MAX - 1))) {
    cli_dbgmsg("cli_get_filepath_from_filedesc: Failed to resolve filename for descriptor %d (%s)\n", desc, link);
    status = CL_EOPEN;
    goto done;
}

The obtained file path is then saved as evaluated_filepath and assigned to the filepath argument received by the function.

cli_dbgmsg("cli_get_filepath_from_filedesc: File path for fd [%d] is: %s\n", desc, evaluated_filepath);
status    = CL_SUCCESS;
*filepath = evaluated_filepath;

Tracing this function’s return in the debugger revealed that it is called from the scanfd function in scanner.c.

/* Try and get the real filename, for logging purposes */
if (!stream) {
    if (CL_SUCCESS != cli_get_filepath_from_filedesc(fd, &filepath)) {
        logg("*%s: Unable to determine the filepath given the file descriptor.\n", fdstr);
    } else {
        log_filename = filepath;
    }
}

Reference: clamav/clamd/scanner.c at rel/0.104 · kash1064/clamav

Executing the Scan Callback Function

The file path obtained from the file descriptor is saved as log_filename and then passed to the cl_scandesc_callback function.

thrmgr_setactivetask(fdstr, NULL);
context.filename = fdstr;
context.virsize  = 0;
context.scandata = NULL;
ret              = cl_scandesc_callback(fd, log_filename, &virname, scanned, engine, options, &context);
thrmgr_setactivetask(NULL, NULL);

if (thrmgr_group_need_terminate(conn->group)) {
    logg("*Client disconnected while scanjob was active\n");
    ret = ret == CL_ETIMEOUT ? ret : CL_BREAK;
    goto done;
}

In the code above, thrmgr_setactivetask registers fdstr as the active task and the context fields are initialized before cl_scandesc_callback is called. cl_scandesc_callback then receives the file descriptor and the file path and performs the actual scan.

/**
 * @brief Scan a file, given a file descriptor.
 *
 * This callback variant allows the caller to provide a context structure that caller provided callback functions can interpret.
 *
 * @param desc              File descriptor of an open file. The caller must provide this or the map.
 * @param filename          (optional) Filepath of the open file descriptor or file map.
 * @param[out] virname      Will be set to a statically allocated (i.e. needs not be freed) signature name if the scan matches against a signature.
 * @param[out] scanned      The number of bytes scanned.
 * @param engine            The scanning engine.
 * @param scanoptions       Scanning options.
 * @param[in,out] context   An opaque context structure allowing the caller to record details about the sample being scanned.
 * @return cl_error_t       CL_CLEAN, CL_VIRUS, or an error code if an error occured during the scan.
 */
extern cl_error_t cl_scandesc_callback(int desc, const char *filename, const char **virname, unsigned long int *scanned, const struct cl_engine *engine, struct cl_scan_options *scanoptions, void *context);

cl_scandesc_callback is implemented in libclamav’s scanners.c.

Inside this function, cli_basename extracts the file name from the received file path into filename_base, and fmap maps the file into an fmap_t structure, which ClamAV uses for file scanning.

if (NULL != filename) {
    (void)cli_basename(filename, strlen(filename), &filename_base);
}

if (NULL == (map = fmap(desc, 0, sb.st_size, filename_base))) {
    cli_errmsg("CRITICAL: fmap() failed\n");
    status = CL_EMEM;
    goto done;
}

status = scan_common(map, filename, virname, scanned, engine, scanoptions, context);

Reference: clamav/libclamav/scanners.c at rel/0.104 · kash1064/clamav

The mapped structure is passed together with the file path and other information to scan_common, where the actual virus scan is performed.

About the fmap_t Structure

ClamAV uses the cl_fmap_t (fmap_t) data structure to perform memory mapping for efficient file scanning.

struct cl_fmap;
typedef cl_fmap_t fmap_t;

struct cl_fmap {
    /* handle interface */
    void *handle;
    clcb_pread pread_cb;

    /* internal */
    time_t mtime;
    uint64_t pages;
    uint64_t pgsz;
    uint64_t paged;
    uint16_t aging;
    uint16_t dont_cache_flag;
    uint16_t handle_is_fd;

    /* memory interface */
    const void *data;

    /* common interface */
    size_t offset;        /* file offset */
    size_t nested_offset; /* buffer offset for nested scan*/
    size_t real_len;      /* amount of data mapped from file, starting at offset */
    size_t len;           /* length of data accessible via current fmap */

    /* real_len = nested_offset + len
     * file_offset = offset + nested_offset + need_offset
     * maximum offset, length accessible via fmap API: len
     * offset in cached buffer: nested_offset + need_offset
     *
     * This allows scanning a portion of an already mapped file without dumping
     * to disk and remapping (for uncompressed archives for example) */

    /* vtable for implementation */
    void (*unmap)(fmap_t *);
    const void *(*need)(fmap_t *, size_t at, size_t len, int lock);
    const void *(*need_offstr)(fmap_t *, size_t at, size_t len_hint);
    const void *(*gets)(fmap_t *, char *dst, size_t *at, size_t max_len);
    void (*unneed_off)(fmap_t *, size_t at, size_t len);
#ifdef _WIN32
    HANDLE fh;
    HANDLE mh;
#endif
    unsigned char maphash[16];
    uint64_t *bitmap;
    char *name;
};
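
The offset relations in the struct comment are easier to follow with concrete numbers. The following is only a sketch of that arithmetic: demo_window, demo_file_offset and demo_narrow are hypothetical names, and the narrowing step is my own assumption derived from the relation real_len = nested_offset + len, not ClamAV code.

```c
#include <stddef.h>

/* Hypothetical, reduced view of the offset fields; not ClamAV's struct. */
struct demo_window {
    size_t offset;        /* where the mapping starts in the file */
    size_t nested_offset; /* where the current nested view starts in the mapping */
    size_t real_len;      /* bytes mapped from the file */
    size_t len;           /* bytes visible through the current view */
};

/* Translate an offset inside the current view into an absolute file offset.
 * Returns 0 and stores the result, or -1 if [at, at + want) leaves the view. */
static int demo_file_offset(const struct demo_window *w, size_t at, size_t want, size_t *file_off)
{
    if (at > w->len || want > w->len - at)
        return -1;
    *file_off = w->offset + w->nested_offset + at;
    return 0;
}

/* Narrow the view to a sub-range of the current one, e.g. for a nested scan.
 * Keeps the relation real_len = nested_offset + len from the struct comment. */
static int demo_narrow(struct demo_window *w, size_t at, size_t new_len)
{
    if (at > w->len || new_len > w->len - at)
        return -1;
    w->nested_offset += at;
    w->len      = new_len;
    w->real_len = w->nested_offset + w->len;
    return 0;
}
```

For example, narrowing a 100-byte view to 30 bytes starting at offset 20 gives nested_offset = 20 and len = 30, so offset 5 in the nested view corresponds to file offset 25 when the mapping starts at the beginning of the file.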

File Mapping Process

For a clamdscan request such as this one, the scan target file is mapped into memory as a fmap_t structure by the fmap_check_empty function.

Reference: clamav/libclamav/fmap.c at rel/0.104 · kash1064/clamav

fmap_t *fmap_check_empty(int fd, off_t offset, size_t len, int *empty, const char *name)
{
    STATBUF st;
    fmap_t *m = NULL;

    *empty = 0;
    if (FSTAT(fd, &st)) {
        cli_warnmsg("fmap: fstat failed\n");
        return NULL;
    }

    if (!len) len = st.st_size - offset; /* bound checked later */
    if (!len) {
        cli_dbgmsg("fmap: attempted void mapping\n");
        *empty = 1;
        return NULL;
    }
    if (!CLI_ISCONTAINED(0, st.st_size, offset, len)) {
        cli_warnmsg("fmap: attempted oof mapping\n");
        return NULL;
    }
    m = cl_fmap_open_handle((void *)(ssize_t)fd, offset, len, pread_cb, 1);
    if (!m)
        return NULL;
    m->mtime = st.st_mtime;

    if (NULL != name) {
        m->name = cli_strdup(name);
        if (NULL == m->name) {
            funmap(m);
            return NULL;
        }
    }

    return m;
}

This function first calls fstat through the FSTAT macro and stores the status of the received file descriptor in st, a variable of type STATBUF (struct stat or struct stat64).

STATBUF st;
fmap_t *m = NULL;

*empty = 0;
if (FSTAT(fd, &st)) {
    cli_warnmsg("fmap: fstat failed\n");
    return NULL;
}

For the eicar.com file being scanned, the values that fstat wrote into st matched the output of the stat eicar.com command, confirming that the stat information was retrieved correctly.

After validating the mapping range of the file, cl_fmap_open_handle is called with the file descriptor and other information.
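
The range validation in fmap_check_empty can be summarized as follows. This is a simplified sketch: demo_resolve_range is a hypothetical stand-in and does not reproduce ClamAV's CLI_ISCONTAINED macro.

```c
#include <stddef.h>

/* Hypothetical stand-in for the length defaulting and bounds check in
 * fmap_check_empty; simplified and not ClamAV's CLI_ISCONTAINED macro.
 * Returns 0 when [offset, offset + *len) is a valid non-empty range,
 * -1 otherwise. *empty is set to 1 only for a zero-length request. */
static int demo_resolve_range(size_t file_size, size_t offset, size_t *len, int *empty)
{
    *empty = 0;

    if (*len == 0) {
        if (offset > file_size)
            return -1;
        *len = file_size - offset; /* default: map to end of file */
    }
    if (*len == 0) {
        *empty = 1; /* nothing to map, e.g. an empty file */
        return -1;
    }
    /* Written to avoid overflow in offset + *len. */
    if (offset > file_size || *len > file_size - offset)
        return -1;
    return 0;
}
```

With a 68-byte file, offset 0 and a requested length of 0, the sketch resolves the length to 68, which corresponds to mapping the whole file.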

Inside cl_fmap_open_handle, memory is allocated by cli_calloc (which wraps calloc) and the resulting fmap_t struct region m is populated with file information and pointers to several callbacks.

m->handle          = handle;
m->pread_cb        = pread_cb;
m->aging           = use_aging;
m->offset          = offset;
m->nested_offset   = 0;
m->len             = len; /* m->nested_offset + m->len = m->real_len */
m->real_len        = len;
m->pages           = pages;
m->pgsz            = pgsz;
m->paged           = 0;
m->dont_cache_flag = 0;
m->unmap           = unmap_handle;
m->need            = handle_need;
m->need_offstr     = handle_need_offstr;
m->gets            = handle_gets;
m->unneed_off      = handle_unneed_off;
m->handle_is_fd    = 1;

/* Calculate the fmap hash to be used by the FP check later */
if (CL_SUCCESS != fmap_get_MD5(hash, m)) {
    cli_warnmsg("fmap: failed to get MD5\n");
    goto done;
}
memcpy(m->maphash, hash, 16);

fmap_get_MD5 then computes the MD5 hash using the allocated fmap_t structure.

After this processing, the MD5 hash of eicar.com (44d88612fea8a8f36de82e1278abb02f) can be confirmed to be correctly written to the fmap_t structure.

Furthermore, inside fmap_get_MD5, the file data read via m->need (handle_need), which is called through fmap_need_off_once, can be confirmed to be stored in m->data.
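
The call cl_fmap_open_handle((void *)(ssize_t)fd, offset, len, pread_cb, 1) passes the descriptor as an opaque handle together with a read callback. The sketch below illustrates that pattern in isolation. The callback signature is my assumption modeled on the pread_cb member, and demo_fd_pread and demo_read_at are hypothetical names.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Callback type modeled on the pread_cb member; the exact typedef in
 * ClamAV's headers is assumed here, not quoted. */
typedef off_t (*demo_pread_cb)(void *handle, void *buf, size_t count, off_t offset);

/* The handle carries the descriptor value itself, not a pointer to it. */
static off_t demo_fd_pread(void *handle, void *buf, size_t count, off_t offset)
{
    return (off_t)pread((int)(ssize_t)handle, buf, count, offset);
}

/* Read exactly `count` bytes at `offset` through the callback.
 * Returns 0 on success, -1 on a short read or error. */
static int demo_read_at(demo_pread_cb cb, void *handle, void *buf, size_t count, off_t offset)
{
    size_t done = 0;

    while (done < count) {
        off_t n = cb(handle, (char *)buf + done, count - done, offset + (off_t)done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}
```

Keeping the reader behind a callback means the same mapping code can serve sources other than a plain file descriptor, which appears to be the point of the handle interface.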

Tracing the Scanner Processing

Up to this point, the scan request from clamdscan triggered cl_scandesc_callback, inside which the fmap_check_empty function (called from fmap) mapped the scan target file into memory as a fmap_t structure.

Next, scan_common is called from cl_scandesc_callback with this mapped file information as an argument.

/**
 * @brief   The main function to initiate a scan of an fmap.
 *
 * @param map               File map.
 * @param filepath          (optional, recommended) filepath of the open file descriptor or file map.
 * @param[out] virname      Will be set to a statically allocated (i.e. needs not be freed) signature name if the scan matches against a signature.
 * @param[out] scanned      The number of bytes scanned.
 * @param engine            The scanning engine.
 * @param scanoptions       Scanning options.
 * @param[inout] context    An opaque context structure allowing the caller to record details about the sample being scanned.
 * @return int              CL_CLEAN, CL_VIRUS, or an error code if an error occured during the scan.
 */
static cl_error_t scan_common(cl_fmap_t *map, const char *filepath, const char **virname, unsigned long int *scanned, const struct cl_engine *engine, struct cl_scan_options *scanoptions, void *context)

Reference: clamav/libclamav/scanners.c at main · Cisco-Talos/clamav

Arguments of the scan_common Function

Among the arguments passed to scan_common, the first argument cl_fmap_t *map contains the fmap structure mapped into memory as described in the previous section.

The second argument filepath holds the full path of the scan target file (/home/kash1064/Downloads/eicar.com).

The fifth argument const struct cl_engine *engine points to the cl_engine structure used for scanning, which holds the engine settings and the loaded signature matchers.

This structure is defined as follows.

Reference: clamav/libclamav/others.h at main · Cisco-Talos/clamav

struct cl_engine {
    uint32_t refcount; /* reference counter */
    uint32_t sdb;
    uint32_t dboptions;
    uint32_t dbversion[2];
    uint32_t ac_only;
    uint32_t ac_mindepth;
    uint32_t ac_maxdepth;
    char *tmpdir;
    uint32_t keeptmp;
    uint64_t engine_options;

    /* Limits */
    uint32_t maxscantime; /* Time limit (in milliseconds) */
    uint64_t maxscansize; /* during the scanning of archives this size
           * will never be exceeded
           */
    uint64_t maxfilesize; /* compressed files will only be decompressed
           * and scanned up to this size
           */
    uint32_t maxreclevel; /* maximum recursion level for archives */
    uint32_t maxfiles;    /* maximum number of files to be scanned
           * within a single archive
           */
    /* This is for structured data detection.  You can set the minimum
     * number of occurrences of an CC# or SSN before the system will
     * generate a notification.
     */
    uint32_t min_cc_count;
    uint32_t min_ssn_count;

    /* Roots table */
    struct cli_matcher **root;

    /* hash matcher for standard MD5 sigs */
    struct cli_matcher *hm_hdb;
    /* hash matcher for MD5 sigs for PE sections */
    struct cli_matcher *hm_mdb;
    /* hash matcher for MD5 sigs for PE import tables */
    struct cli_matcher *hm_imp;
    /* hash matcher for allow list db */
    struct cli_matcher *hm_fp;

    /* Container metadata */
    struct cli_cdb *cdb;

    /* Phishing .pdb and .wdb databases*/
    struct regex_matcher *allow_list_matcher;
    struct regex_matcher *domain_list_matcher;
    struct phishcheck *phishcheck;

    /* Dynamic configuration */
    struct cli_dconf *dconf;

    /* Filetype definitions */
    struct cli_ftype *ftypes;
    struct cli_ftype *ptypes;

    /* Container password storage */
    struct cli_pwdb **pwdbs;

    /* Pre-loading test matcher
     * Test for presence before using; cleared on engine compile.
     */
    struct cli_matcher *test_root;

    /* Ignored signatures */
    struct cli_matcher *ignored;

    /* PUA categories (to be included or excluded) */
    char *pua_cats;

    /* Icon reference storage */
    struct icon_matcher *iconcheck;

    /* Negative cache storage */
    struct CACHE *cache;

    /* Database information from .info files */
    struct cli_dbinfo *dbinfo;

    /* Signature counting, for progress callbacks */
    size_t num_total_signatures;

    /* Used for memory pools */
    mpool_t *mempool;

    /* crtmgr stuff */
    crtmgr cmgr;

    /* Callback(s) */
    clcb_pre_cache cb_pre_cache;
    clcb_pre_scan cb_pre_scan;
    clcb_post_scan cb_post_scan;
    clcb_virus_found cb_virus_found;
    clcb_sigload cb_sigload;
    void *cb_sigload_ctx;
    clcb_hash cb_hash;
    clcb_meta cb_meta;
    clcb_file_props cb_file_props;
    clcb_progress cb_sigload_progress;
    void *cb_sigload_progress_ctx;
    clcb_progress cb_engine_compile_progress;
    void *cb_engine_compile_progress_ctx;
    clcb_progress cb_engine_free_progress;
    void *cb_engine_free_progress_ctx;

    /* Used for bytecode */
    struct cli_all_bc bcs;
    unsigned *hooks[_BC_LAST_HOOK - _BC_START_HOOKS];
    unsigned hooks_cnt[_BC_LAST_HOOK - _BC_START_HOOKS];
    unsigned hook_lsig_ids;
    enum bytecode_security bytecode_security;
    uint32_t bytecode_timeout;
    enum bytecode_mode bytecode_mode;

    /* Engine max settings */
    uint64_t maxembeddedpe;      /* max size to scan MSEXE for PE */
    uint64_t maxhtmlnormalize;   /* max size to normalize HTML */
    uint64_t maxhtmlnotags;      /* max size for scanning normalized HTML */
    uint64_t maxscriptnormalize; /* max size to normalize scripts */
    uint64_t maxziptypercg;      /* max size to re-do zip filetype */

    /* Statistics/intelligence gathering */
    void *stats_data;
    clcb_stats_add_sample cb_stats_add_sample;
    clcb_stats_remove_sample cb_stats_remove_sample;
    clcb_stats_decrement_count cb_stats_decrement_count;
    clcb_stats_submit cb_stats_submit;
    clcb_stats_flush cb_stats_flush;
    clcb_stats_get_num cb_stats_get_num;
    clcb_stats_get_size cb_stats_get_size;
    clcb_stats_get_hostid cb_stats_get_hostid;

    /* Raw disk image max settings */
    uint32_t maxpartitions; /* max number of partitions to scan in a disk image */

    /* Engine max settings */
    uint32_t maxiconspe; /* max number of icons to scan for PE */
    uint32_t maxrechwp3; /* max recursive calls for HWP3 parsing */

    /* PCRE matching limitations */
    uint64_t pcre_match_limit;
    uint64_t pcre_recmatch_limit;
    uint64_t pcre_max_filesize;

#ifdef HAVE_YARA
    /* YARA */
    struct _yara_global *yara_global;
#endif
};

The sixth argument specifies scan options via the cl_scan_options structure.

/*** scan options ***/
struct cl_scan_options {
    uint32_t general;
    uint32_t parse;
    uint32_t heuristic;
    uint32_t mail;
    uint32_t dev;
};
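
Each member of this structure is a bitmask for one category of options. The following sketch shows how such fields are typically set and tested. The DEMO_ flag names and values are hypothetical and chosen only for illustration; the real CL_SCAN_* values are defined in clamav.h.

```c
#include <stdint.h>

/* Same layout as the struct quoted above, renamed to keep the sketch standalone. */
struct demo_scan_options {
    uint32_t general;
    uint32_t parse;
    uint32_t heuristic;
    uint32_t mail;
    uint32_t dev;
};

/* Hypothetical flag values for illustration only; the real CL_SCAN_* values
 * are defined in clamav.h. */
#define DEMO_GENERAL_ALLMATCHES 0x1u
#define DEMO_GENERAL_HEURISTICS 0x4u
#define DEMO_PARSE_ARCHIVE      0x1u
#define DEMO_PARSE_PDF          0x4u

static void demo_enable(uint32_t *field, uint32_t flag) { *field |= flag; }
static void demo_disable(uint32_t *field, uint32_t flag) { *field &= ~flag; }
static int demo_is_set(uint32_t field, uint32_t flag) { return (field & flag) != 0; }
```

Because every category has its own 32-bit field, the same bit value can mean different things in general and in parse, as with the two 0x1 flags in the sketch.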

The options set during this clamdscan invocation were as follows.

After ctx, a variable of type cli_ctx, is zero-initialized, the arguments received by scan_common are stored in its members.

memset(&ctx, '\0', sizeof(cli_ctx));
ctx.engine  = engine;
ctx.virname = virname;
ctx.scanned = scanned;
ctx.options = malloc(sizeof(struct cl_scan_options));
memcpy(ctx.options, scanoptions, sizeof(struct cl_scan_options));
ctx.found_possibly_unwanted = 0;
ctx.containers              = cli_calloc(sizeof(cli_ctx_container), ctx.engine->maxreclevel + 2);
if (!ctx.containers) {
    rc = CL_EMEM;
    goto done;
}
cli_set_container(&ctx, CL_TYPE_ANY, 0);
ctx.dconf  = (struct cli_dconf *)engine->dconf;
ctx.cb_ctx = context;
fmap_head  = cli_calloc(sizeof(fmap_t *), ctx.engine->maxreclevel + 3);
if (!fmap_head) {
    rc = CL_EMEM;
    goto done;
}
if (!(ctx.hook_lsig_matches = cli_bitset_init())) {
    rc = CL_EMEM;
    goto done;
}

/*
 * The first fmap in ctx.fmap must be NULL so we can fmap-- while not NULL.
 * But we need an fmap to be set so we can append viruses or report the
 * fmap's file descriptor in the virus found callback (like for deferred
 * low-seveerity alerts).
 */
ctx.fmap  = fmap_head + 1;
*ctx.fmap = map;
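
The comment about keeping the first entry NULL describes a sentinel: code can walk back from the current fmap until it hits NULL without tracking the depth separately. The following is a minimal sketch of that idea. demo_map, demo_stack_init and demo_stack_depth are hypothetical names, and the array sizing is my own simplification rather than the maxreclevel + 3 used in the excerpt.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for fmap_t. */
struct demo_map { int id; };

/* Allocate max_depth + 2 zeroed slots: slot 0 stays NULL as the sentinel,
 * slot 1 holds the root map, and the rest are for nested levels.
 * Returns the array base; *cur is set to point at slot 1. */
static struct demo_map **demo_stack_init(size_t max_depth, struct demo_map *root, struct demo_map ***cur)
{
    struct demo_map **head = calloc(max_depth + 2, sizeof(*head));
    if (head == NULL)
        return NULL;
    *cur  = head + 1;
    **cur = root;
    return head;
}

/* Count the maps from the current level back to the sentinel. */
static size_t demo_stack_depth(struct demo_map **cur)
{
    size_t depth = 0;
    while (*cur != NULL) {
        depth++;
        cur--;
    }
    return depth;
}
```

Entering a nested level then amounts to incrementing the pointer and storing the new map, and leaving it to decrementing the pointer again.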

The structure is ultimately passed from scan_common to the actual scan function with the call cli_magic_scan(&ctx, CL_TYPE_ANY).

It contains the file path of the scan target and various other status information, and is defined as follows.

/* internal clamav context */
typedef struct cli_ctx_tag {
    char *target_filepath;    /**< (optional) The filepath of the original scan target */
    const char *sub_filepath; /**< (optional) The filepath of the current file being parsed. May be a temp file. */
    char *sub_tmpdir;         /**< The directory to store tmp files at this recursion depth. */
    const char **virname;
    unsigned int num_viruses;
    unsigned long int *scanned;
    const struct cli_matcher *root;
    const struct cl_engine *engine;
    unsigned long scansize;
    struct cl_scan_options *options;
    unsigned int recursion;
    unsigned int scannedfiles;
    unsigned int found_possibly_unwanted;
    unsigned int corrupted_input;
    unsigned int img_validate;
    cli_ctx_container *containers; /* set container type after recurse */
    unsigned char handlertype_hash[16];
    struct cli_dconf *dconf;
    fmap_t **fmap; /* pointer to current fmap in an allocated array, incremented with recursion depth */
    bitset_t *hook_lsig_matches;
    void *cb_ctx;
    cli_events_t *perf;
#ifdef HAVE__INTERNAL__SHA_COLLECT
    int sha_collect;
#endif
#ifdef HAVE_JSON
    struct json_object *properties;
    struct json_object *wrkproperty;
#endif
    struct timeval time_limit;
    int limit_exceeded;
} cli_ctx;

Dumping ctx at the time cli_magic_scan is called reveals the following information.

Identifying the File Type

After the context information is passed to cli_magic_scan, following several checks, the code below is executed.

Here, cli_determine_fmap_type and cli_ftname identify the file type.

hash        = (*ctx->fmap)->maphash;
hashed_size = (*ctx->fmap)->len;

old_hook_lsig_matches = ctx->hook_lsig_matches;
if (type == CL_TYPE_PART_ANY) {
    typercg = 0;
}

/*
 * Perform file typing from the start of the file.
 */
perf_start(ctx, PERFT_FT);
if ((type == CL_TYPE_ANY) || type == CL_TYPE_PART_ANY) {
    type = cli_determine_fmap_type(*ctx->fmap, ctx->engine, type);
}
perf_stop(ctx, PERFT_FT);
if (type == CL_TYPE_ERROR) {
    cli_dbgmsg("cli_magic_scan: cli_determine_fmap_type returned CL_TYPE_ERROR\n");
    ret = CL_EREAD;
    cli_dbgmsg("cli_magic_scan: returning %d %s (no post, no cache)\n", ret, __AT__);
    goto early_ret;
}
filetype = cli_ftname(type);

The list of file types is defined in libclamav/filetypes.h. The Eicar file in this case was identified as CL_TYPE_TEXT_ASCII.

Reference: clamav/libclamav/filetypes.h at main · Cisco-Talos/clamav
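
To give a feel for why a file consisting only of printable characters ends up as a text type, here is a deliberately simplified classifier. It is a hypothetical sketch and not how cli_determine_fmap_type works internally: ClamAV's typing is driven by its filetype definitions, whereas this sketch checks only two well-known magic values and then falls back to a printable-ASCII test.

```c
#include <stddef.h>
#include <string.h>

enum demo_type { DEMO_TYPE_BINARY, DEMO_TYPE_TEXT_ASCII, DEMO_TYPE_MSEXE, DEMO_TYPE_ZIP };

/* Hypothetical classifier: magic bytes first, then a printable-ASCII
 * fallback. ClamAV's real typing is driven by its filetype definitions. */
static enum demo_type demo_classify(const unsigned char *buf, size_t len)
{
    size_t i;

    if (len >= 2 && memcmp(buf, "MZ", 2) == 0)
        return DEMO_TYPE_MSEXE;
    if (len >= 4 && memcmp(buf, "PK\x03\x04", 4) == 0)
        return DEMO_TYPE_ZIP;

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];
        int printable   = (c >= 0x20 && c <= 0x7e);
        int whitespace  = (c == '\t' || c == '\n' || c == '\r');
        if (!printable && !whitespace)
            return DEMO_TYPE_BINARY;
    }
    return len > 0 ? DEMO_TYPE_TEXT_ASCII : DEMO_TYPE_BINARY;
}
```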

Cache Check

After the file type is identified, cli_magic_scan also performs a check via dispatch_prescan_callback, but at this point ctx->engine->cb_pre_cache is NULL so nothing happens, and execution proceeds directly to the cache_check function.

This function looks up the MD5 hash of the scan target file in the cacheset stored in ctx->engine->cache. Despite the wording of the comment in the source, CL_VIRUS here does not mean a detection. As the debug message in the code shows, CL_VIRUS is the "negative" result (the hash is not in the cache, so the file still has to be scanned), while any other result is "positive" (the hash is cached and the scan can be skipped).

/* Hashes a file onto the provided buffer and looks it up the cache.
   Returns CL_VIRUS if found, CL_CLEAN if not FIXME or a recoverable error,
   and returns CL_EREAD if unrecoverable */
cl_error_t cache_check(unsigned char *hash, cli_ctx *ctx)
{
    fmap_t *map;
    int ret;

    if (!ctx || !ctx->engine || !ctx->engine->cache)
        return CL_VIRUS;

    if (ctx->engine->engine_options & ENGINE_OPTIONS_DISABLE_CACHE) {
        cli_dbgmsg("cache_check: Caching disabled. Returning CL_VIRUS.\n");
        return CL_VIRUS;
    }

    map = *ctx->fmap;
    ret = cache_lookup_hash(hash, map->len, ctx->engine->cache, ctx->recursion);
    cli_dbgmsg("cache_check: %02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x is %s\n", hash[0], hash[1], hash[2], hash[3], hash[4], hash[5], hash[6], hash[7], hash[8], hash[9], hash[10], hash[11], hash[12], hash[13], hash[14], hash[15], (ret == CL_VIRUS) ? "negative" : "positive");
    return ret;
}

In this case cache_check returned CL_VIRUS, which at first made it look as if the file had been detected as a virus at this point. However, the return value only means that the cache could not vouch for the file.

Note: cache_check also returns CL_VIRUS when the engine or cacheset is NULL, or when caching is disabled, so that scanning can proceed. A return value of CL_VIRUS from this function alone therefore does not mean the file will be reported as a virus.

if (!ctx || !ctx->engine || !ctx->engine->cache)
    return CL_VIRUS;
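
The return convention is easy to misread, so here is a small sketch of how a caller would interpret it. It assumes, as the "negative" and "positive" wording in the debug message suggests, that a cache hit yields the clean result and everything else means the file must be scanned. demo_cache, demo_cache_check and demo_must_scan are hypothetical names, and the linear lookup is only a stand-in for ClamAV's cacheset.

```c
#include <stddef.h>
#include <string.h>

enum demo_rc { DEMO_CLEAN = 0, DEMO_VIRUS = 1 };

/* Hypothetical cache of digests for files already scanned and found clean. */
struct demo_cache {
    unsigned char digests[8][16];
    size_t count;
};

/* DEMO_CLEAN: cached as clean, scanning can be skipped.
 * DEMO_VIRUS: not cached (or no cache available), so scan normally. */
static enum demo_rc demo_cache_check(const struct demo_cache *cache, int disabled, const unsigned char digest[16])
{
    size_t i;

    if (cache == NULL || disabled)
        return DEMO_VIRUS;

    for (i = 0; i < cache->count; i++) {
        if (memcmp(cache->digests[i], digest, 16) == 0)
            return DEMO_CLEAN;
    }
    return DEMO_VIRUS;
}

/* Only a cache hit allows the scan to be skipped. */
static int demo_must_scan(enum demo_rc rc)
{
    return rc != DEMO_CLEAN;
}
```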

Tracing Scan Behavior After the Cache Check

To make the scan flow easier to trace, disable the cache by setting DisableCache yes in clamd.conf.

# This option allows you to disable clamd's caching feature.
# Default: no
DisableCache yes

With this change, repeating the clamdscan scan request skips the cacheset lookup in cache_check, and cli_magic_scan continues to subsequent processing.

After several operations, cli_magic_scan calls various scan functions corresponding to the type of the scan target file.

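Since the Eicar file is classified as CL_TYPE_TEXT_ASCII, the corresponding case is taken, and cli_scan_structured(ctx) is called if structured-data heuristics (SCAN_HEURISTIC_STRUCTURED) and the DLP dconf setting are enabled.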

ctx->recursion++;
perf_nested_start(ctx, PERFT_CONTAINER, PERFT_SCAN);
/* set current level as container AFTER recursing */
cli_set_container(ctx, type, (*ctx->fmap)->len);
switch (type) {
    case CL_TYPE_IGNORED:
        break;
    /* omitted */
    case CL_TYPE_BINARY_DATA:
    case CL_TYPE_TEXT_UTF16BE:
        if (SCAN_HEURISTICS && (DCONF_OTHER & OTHER_CONF_MYDOOMLOG))
            ret = cli_check_mydoom_log(ctx);
        break;

    case CL_TYPE_TEXT_ASCII:
        if (SCAN_HEURISTIC_STRUCTURED && (DCONF_OTHER & OTHER_CONF_DLP))
            /* TODO: consider calling this from cli_scanscript() for
             * a normalised text
             */

            ret = cli_scan_structured(ctx);
        break;

    default:
        break;
}
perf_nested_stop(ctx, PERFT_CONTAINER, PERFT_SCAN);
ctx->recursion--;

However, this function is used for DLP purposes to check whether credit card numbers or SSNs are present, so it can be ignored for this case.

Continuing to trace cli_magic_scan, we find that it ultimately calls scanraw(ctx, type, typercg, &dettype, (ctx->engine->engine_options & ENGINE_OPTIONS_DISABLE_CACHE) ? NULL : hash);.

/* CL_TYPE_HTML: raw HTML files are not scanned, unless safety measure activated via DCONF */
if (type != CL_TYPE_IGNORED && (type != CL_TYPE_HTML || !(SCAN_PARSE_HTML) || !(DCONF_DOC & DOC_CONF_HTML_SKIPRAW)) && !ctx->engine->sdb) {
    res = scanraw(ctx, type, typercg, &dettype, (ctx->engine->engine_options & ENGINE_OPTIONS_DISABLE_CACHE) ? NULL : hash);
    if (res != CL_CLEAN) {
        switch (res) {
            /* List of scan halts, runtime errors only! */
/* omitted */
            case CL_VIRUS:
                ret = res;
                if (SCAN_ALLMATCHES)
                    break;
                cli_bitset_free(ctx->hook_lsig_matches);
                ctx->hook_lsig_matches = old_hook_lsig_matches;
                goto done;
            /* omitted */
        }
    }
}

scanraw performs a raw scan of the fmap that contains the mapped scan target file, so the actual file scanning happens inside this function.

/**
 * @brief Perform raw scan of current fmap.
 *
 * @param ctx           Current scan context.
 * @param type          File type
 * @param typercg       Enable type recognition (file typing scan results).
 *                      If 0, will be a regular ac-mode scan.
 * @param[out] dettype  If typercg enabled and scan detects HTML or MAIL types,
 *                      will output HTML or MAIL types after performing HTML/MAIL scans
 * @param refhash       Hash of current fmap
 * @return cl_error_t
 */
static cl_error_t scanraw(cli_ctx *ctx, cli_file_t type, uint8_t typercg, cli_file_t *dettype, unsigned char *refhash)

Within scanraw, various scan routines are again called depending on the file type. The actual Eicar detection occurs inside cli_scan_fmap.

perf_start(ctx, PERFT_RAW);
ret = cli_scan_fmap(ctx, type == CL_TYPE_TEXT_ASCII ? CL_TYPE_ANY : type, 0, &ftoffset, acmode, NULL, refhash);
perf_stop(ctx, PERFT_RAW);

Inside cli_scan_fmap, two checks are performed: a signature-matcher scan that reads the file in fixed-size buffer chunks, and a hash-based scan.

The hash-based scan uses the database (hdb) from ctx->engine->hm_hdb.

The Eicar file used in this case was detected via hash-based scanning at the following location.

hdb = ctx->engine->hm_hdb;
fp  = ctx->engine->hm_fp;

/* omitted */

virname = NULL;
for (hashtype = CLI_HASH_MD5; hashtype < CLI_HASH_AVAIL_TYPES; hashtype++) {
    const char *virname_w = NULL;
    int found             = 0;

    /* If no hash, skip to next type */
    if (!compute_hash[hashtype])
        continue;

    /* Do hash scan */
    if ((ret = cli_hm_scan(digest[hashtype], map->len, &virname, hdb, hashtype)) == CL_VIRUS) {
        found += 1;
    }
    if (!found || SCAN_ALLMATCHES) {
        if ((ret = cli_hm_scan_wild(digest[hashtype], &virname_w, hdb, hashtype)) == CL_VIRUS)
            found += 2;
    }

    /* If found, do immediate hash-only FP check */
    if (found && fp) {
        for (hashtype2 = CLI_HASH_MD5; hashtype2 < CLI_HASH_AVAIL_TYPES; hashtype2++) {
            if (!compute_hash[hashtype2])
                continue;
            if (cli_hm_scan(digest[hashtype2], map->len, NULL, fp, hashtype2) == CL_VIRUS) {
                found = 0;
                ret   = CL_CLEAN;
                break;
            } else if (cli_hm_scan_wild(digest[hashtype2], NULL, fp, hashtype2) == CL_VIRUS) {
                found = 0;
                ret   = CL_CLEAN;
                break;
            }
        }
    }

    /* If matched size-based hash ... */
    if (found % 2) {
        viruses_found = 1;
        ret           = cli_append_virus(ctx, virname);
        if (!SCAN_ALLMATCHES || ret != CL_CLEAN)
            break;
        virname = NULL;
    }
    /* If matched size-agnostic hash ... */
    if (found > 1) {
        viruses_found = 1;
        ret           = cli_append_virus(ctx, virname_w);
        if (!SCAN_ALLMATCHES || ret != CL_CLEAN)
            break;
    }
}
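
The found variable in this loop works as a two-bit flag: 1 is added for a size-specific match and 2 for a size-agnostic (wildcard) match, which is why the later checks use found % 2 and found > 1. A small sketch of that encoding, with hypothetical helper names:

```c
/* Bit 0 (value 1): a size-specific hash matched.
 * Bit 1 (value 2): a size-agnostic (wildcard) hash matched. */
enum { DEMO_MATCH_SIZED = 1, DEMO_MATCH_WILD = 2 };

static int demo_combine(int sized_hit, int wild_hit)
{
    int found = 0;
    if (sized_hit)
        found += DEMO_MATCH_SIZED;
    if (wild_hit)
        found += DEMO_MATCH_WILD;
    return found;
}

/* Same tests as in the excerpt. */
static int demo_has_sized(int found) { return found % 2; }
static int demo_has_wild(int found) { return found > 1; }
```

A value of 3 means both kinds of signature matched, which in the excerpt can only happen when SCAN_ALLMATCHES is enabled, because the wildcard lookup is otherwise skipped after a size-specific hit.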

For hash-based scanning, the hm_scan function is ultimately used.

Inside it, the hash list contained in the cli_sz_hash structure is compared against the scan target’s hash using the hm_cmp function.

/* cli_hm_scan will scan only size-specific hashes, if any */
static int hm_scan(const unsigned char *digest, const char **virname, const struct cli_sz_hash *szh, enum CLI_HASH_TYPE type)
{
    unsigned int keylen;
    size_t l, r;

    if (!digest || !szh || !szh->items)
        return CL_CLEAN;

    keylen = hashlen[type];

    l = 0;
    r = szh->items - 1;
    while (l <= r) {
        size_t c = (l + r) / 2;
        int res  = hm_cmp(digest, &szh->hash_array[keylen * c], keylen);

        if (res < 0) {
            if (!c)
                break;
            r = c - 1;
        } else if (res > 0)
            l = c + 1;
        else {
            if (virname)
                *virname = szh->virusnames[c];
            return CL_VIRUS;
        }
    }
    return CL_CLEAN;
}

Reference: clamav/libclamav/matcher-hash.c at rel/0.104 · kash1064/clamav

hm_cmp returns an ordering result (negative, zero, or positive), so the sorted hash array can be searched with a binary search, as the l, r and c variables above show.

The array searched here is also relatively small, because cli_hm_scan first calls cli_htu32_find with the size of the scan target as the key and passes only the hashes registered for that size to hm_scan.
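
The search loop in hm_scan can be reproduced as a standalone function. The sketch below uses memcmp as a stand-in for hm_cmp and a shortened 4-byte digest, both of which are my assumptions for illustration. demo_find_digest is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_KEYLEN 4 /* shortened digest width, for illustration only */

/* Binary search over `items` fixed-width digests stored back to back in
 * ascending memcmp() order. Returns the index of the match, or -1. */
static long demo_find_digest(const unsigned char *table, size_t items, const unsigned char *digest)
{
    size_t l = 0, r;

    if (table == NULL || digest == NULL || items == 0)
        return -1;

    r = items - 1;
    while (l <= r) {
        size_t c = l + (r - l) / 2;
        int res  = memcmp(digest, &table[DEMO_KEYLEN * c], DEMO_KEYLEN);

        if (res < 0) {
            if (c == 0)
                break; /* r = c - 1 would wrap around for an unsigned index */
            r = c - 1;
        } else if (res > 0) {
            l = c + 1;
        } else {
            return (long)c;
        }
    }
    return -1;
}
```

The c == 0 check mirrors the if (!c) break; in hm_scan: since the indices are unsigned, stepping below zero has to be handled explicitly.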

This is how efficient hash-based scanning detects known viruses such as Eicar.

After the Eicar file is matched through the above process, cli_hm_scan and cli_hm_scan_wild are run again, this time against the false-positive (fp) database, and the detection is kept only if neither of them finds a match there.

Summary

The implementation turned out to be simpler than I expected, but the many branches and jumps still made reading the code quite exhausting.

The second half got a bit rambling, but I managed to trace the full scan flow related to Eicar detection.

Published May 12, 2025

Aspiring Reverse Engineer and CTF Player (Team: 0nePadding). Passionate about WinDbg and Anti-Virus internals. OSCP / CISSP. Working at Microsoft Japan, but all views expressed are my own.

かしわば (@kash1064) on Twitter