ClamAV On-Access Scanning with fanotify - Learning AntiVirus on Linux Through OSS -

This page has been machine-translated from the original page.

This is a bit sudden, but at the online market for Technical Book Fest 17, which has been running since 11/2, I am distributing a free technical doujin book titled A part of Anti-Virus -Learning Windows filesystem minifilter driver from public sample code-.

In that book, I explain how AntiVirus software for Windows uses filesystem minifilter drivers for real-time file scanning (On-Access scanning).

Reference: A part of Anti-Virus:Frog’s Bookshelf

As with Magical WinDbg Vol 1/2, this one is also available for free due to various circumstances, so please feel free to pick it up.

In this article, as a kind of extra chapter to that book, I will briefly explain fanotify, the kernel framework used by AntiVirus software for Linux for real-time file scanning (On-Access scanning).

ClamAV for Linux implements On-Access scanning using fanotify, which is supported by the Linux kernel.

Major commercial AntiVirus products for Linux also use fanotify to implement On-Access scanning just like ClamAV, so understanding ClamAV at the source-code level should also be useful for understanding how those commercial AntiVirus products behave.

Reference: fanotify(7) - Linux manual page

Reference: On-Access Scanning - ClamAV Documentation

Using ClamAV as the example, I will give a rough summary of how On-Access scanning behaves in Linux AntiVirus software built on fanotify.

The source code I refer to is from ClamAV 0.104.

## Creating a fanotify test program

Before looking at ClamAV’s implementation, I first wrote a test program to understand how fanotify itself works.

### What is fanotify?

Like inotify, the filesystem event monitoring framework that has existed since Linux kernel 2.6.13, fanotify notifies userspace about filesystem events in the system. Unlike inotify, it can also allow or deny those operations. (Apparently inotify was itself preceded by an even older mechanism called dnotify.)

Reference: Linux file system notification subsystem

fanotify was added in Linux kernel 2.6.36 and enabled in 2.6.37.

Since then, various features and bug fixes seem to have been added.

As far as I could tell from the Linux-kernel-related books I own, fanotify is barely mentioned. But when you read an LWN.net newsletter article from 2009, you can see that fanotify was added in response to requests from AntiVirus vendors and customers for a practical way to implement On-Access scanning.

> So it’s back to that time. I’m not quite sure how to present fanotify. I can start sending patches (they are available), but this message is just going to be a re-into, what questions and problems are still out there?
>
> Long ago the anti-malware vendors started asking the community for a reasonable way to do on access file scanning, historically they have used syscall table rewrites and binary LSM hook hacks to get their information.
>
> Customers and Linux users keep demanding this stuff and in an effort give them a supportable method to use these products I have been working to develop fanotify.
>
> fanotify provides two things:
>
> 1. a new notification system, sorta like inotify, only instead of an arbitrary ‘watch descriptor’ which userspace has to know how to map back to an object on the filesystem, fanotify provides an open read-only fd back to the original object. It should be noted that the set of fanotify events is much smaller than the set of inotify events.
> 2. an access system in which processes may be blocked until the fanotify userspace listener has decided if the operation should be allowed.

Reference: fanotify: the fscking all notification system [LWN.net]

Note that fanotify is implemented only as an interface for On-Access scanning by AntiVirus software; fanotify itself does not detect or block malware.

The following threads were also helpful references for discussions related to fanotify’s design and implementation.

Reference: Re: [malware-list] scanner interface proposal was: [TALPA] Intro to a linux interface for on access scanning

Reference: TALPA - a threat model? well sorta.

### Creating a sample fanotify program

To better understand fanotify, I decided to create a test program and verify its behavior.

The code below is a slightly customized version of the sample code introduced in the following manual.

Reference: Ubuntu Manpage: fanotify - monitor filesystem events

/* Define _GNU_SOURCE, Otherwise we don't get O_LARGEFILE */
#define _GNU_SOURCE

#include <stdio.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <poll.h>
#include <errno.h>
#include <limits.h>
#include <sys/stat.h>
#include <sys/signalfd.h>
#include <fcntl.h>

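/* sys/fanotify.h declares fanotify_init() and fanotify_mark()
 * and pulls in linux/fanotify.h for the FAN_* constants */
#include <sys/fanotify.h>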

/* Structure to keep track of monitored directories */
typedef struct
{
    /* Path of the directory */
    char *path;
} monitored_t;

/* Size of buffer to use when reading fanotify events */
#define FANOTIFY_BUFFER_SIZE 8192

/* Enumerate list of FDs to poll */
enum
{
    FD_POLL_SIGNAL = 0,
    FD_POLL_FANOTIFY,
    FD_POLL_MAX
};

/* Setup fanotify notifications (FAN) mask. All these defined in fanotify.h. */
static uint64_t event_mask =
    (FAN_ACCESS |         /* File accessed */
     FAN_MODIFY |         /* File modified */
     FAN_CLOSE_WRITE |    /* Writable file closed */
     FAN_CLOSE_NOWRITE |  /* Non-writable file closed */
     FAN_OPEN |           /* File was opened */
     FAN_ONDIR |          /* We want to be reported of events in the directory */
     FAN_EVENT_ON_CHILD); /* We want to be reported of events in files of the directory */

/* Array of directories being monitored */
static monitored_t *monitors;
static int n_monitors;

static char *
get_program_name_from_pid(int pid,
                          char *buffer,
                          size_t buffer_size)
{
    int fd;
    ssize_t len;

    /* Try to get program name by PID */
    snprintf(buffer, buffer_size, "/proc/%d/cmdline", pid);
    if ((fd = open(buffer, O_RDONLY)) < 0)
        return NULL;

    /* Read file contents into buffer */
    if ((len = read(fd, buffer, buffer_size - 1)) <= 0)
    {
        close(fd);
        return NULL;
    }
    close(fd);

    /* cmdline is NUL-separated, so once the buffer is terminated it
     * reads as a C string holding only argv[0] */
    buffer[len] = '\0';

    return buffer;
}

static char *
get_file_path_from_fd(int fd,
                      char *buffer,
                      size_t buffer_size)
{
    ssize_t len;

    if (fd <= 0)
        return NULL;

    sprintf(buffer, "/proc/self/fd/%d", fd);
    if ((len = readlink(buffer, buffer, buffer_size - 1)) < 0)
        return NULL;

    buffer[len] = '\0';
    return buffer;
}

static void event_process(struct fanotify_event_metadata *event)
{
    char path[PATH_MAX];

    printf("Received event in path '%s'",
           get_file_path_from_fd(event->fd,
                                 path,
                                 PATH_MAX)
               ? path
               : "unknown");
    printf(" pid=%d (%s): \n",
           event->pid,
           (get_program_name_from_pid(event->pid,
                                      path,
                                      PATH_MAX)
                ? path
                : "unknown"));

    if (event->mask & FAN_OPEN)
        printf("\tFAN_OPEN\n");
    if (event->mask & FAN_ACCESS)
        printf("\tFAN_ACCESS\n");
    if (event->mask & FAN_MODIFY)
        printf("\tFAN_MODIFY\n");
    if (event->mask & FAN_CLOSE_WRITE)
        printf("\tFAN_CLOSE_WRITE\n");
    if (event->mask & FAN_CLOSE_NOWRITE)
        printf("\tFAN_CLOSE_NOWRITE\n");
    fflush(stdout);

    close(event->fd);
}

static void
shutdown_fanotify(int fanotify_fd)
{
    int i;

    for (i = 0; i < n_monitors; ++i)
    {
        /* Remove the mark, using same event mask as when creating it */
        fanotify_mark(fanotify_fd,
                      FAN_MARK_REMOVE,
                      event_mask,
                      AT_FDCWD,
                      monitors[i].path);
        free(monitors[i].path);
    }
    free(monitors);
    close(fanotify_fd);
}

static int
initialize_fanotify(int argc,
                    const char **argv)
{
    int i;
    int fanotify_fd;

    /* Create new fanotify device */
    if ((fanotify_fd = fanotify_init(FAN_CLASS_CONTENT | FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS,
                                     O_RDONLY | O_LARGEFILE)) < 0)
    {
        fprintf(stderr,
                "Couldn't setup new fanotify device: %s\n",
                strerror(errno));
        return -1;
    }

    /* Allocate array of monitor setups */
    n_monitors = argc - 1;
    monitors = malloc(n_monitors * sizeof(monitored_t));

    /* Loop all input directories, setting up marks */
    for (i = 0; i < n_monitors; ++i)
    {
        monitors[i].path = strdup(argv[i + 1]);
        /* Add new fanotify mark */
        if (fanotify_mark(fanotify_fd,
                          FAN_MARK_ADD | FAN_MARK_MOUNT,
                          event_mask,
                          AT_FDCWD,
                          monitors[i].path) < 0)
        {
            fprintf(stderr,
                    "Couldn't add monitor in directory '%s': '%s'\n",
                    monitors[i].path,
                    strerror(errno));
            return -1;
        }

        printf("Started monitoring directory '%s'...\n",
               monitors[i].path);
    }

    return fanotify_fd;
}

static void
shutdown_signals(int signal_fd)
{
    close(signal_fd);
}

static int
initialize_signals(void)
{
    int signal_fd;
    sigset_t sigmask;

    /* We want to handle SIGINT and SIGTERM in the signal_fd, so we block them. */
    sigemptyset(&sigmask);
    sigaddset(&sigmask, SIGINT);
    sigaddset(&sigmask, SIGTERM);

    if (sigprocmask(SIG_BLOCK, &sigmask, NULL) < 0)
    {
        fprintf(stderr,
                "Couldn't block signals: '%s'\n",
                strerror(errno));
        return -1;
    }

    /* Get new FD to read signals from it */
    if ((signal_fd = signalfd(-1, &sigmask, 0)) < 0)
    {
        fprintf(stderr,
                "Couldn't setup signal FD: '%s'\n",
                strerror(errno));
        return -1;
    }

    return signal_fd;
}

int main(int argc,
         const char **argv)
{
    int signal_fd;
    int fanotify_fd;
    struct pollfd fds[FD_POLL_MAX];

    /* Input arguments... */
    if (argc < 2)
    {
        fprintf(stderr, "Usage: %s directory1 [directory2 ...]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* Initialize signals FD */
    if ((signal_fd = initialize_signals()) < 0)
    {
        fprintf(stderr, "Couldn't initialize signals\n");
        exit(EXIT_FAILURE);
    }

    /* Initialize fanotify FD and the marks */
    if ((fanotify_fd = initialize_fanotify(argc, argv)) < 0)
    {
        fprintf(stderr, "Couldn't initialize fanotify\n");
        exit(EXIT_FAILURE);
    }

    /* Setup polling */
    fds[FD_POLL_SIGNAL].fd = signal_fd;
    fds[FD_POLL_SIGNAL].events = POLLIN;
    fds[FD_POLL_FANOTIFY].fd = fanotify_fd;
    fds[FD_POLL_FANOTIFY].events = POLLIN;

    /* Now loop */
    for (;;)
    {
        /* Block until there is something to be read */
        if (poll(fds, FD_POLL_MAX, -1) < 0)
        {
            fprintf(stderr,
                    "Couldn't poll(): '%s'\n",
                    strerror(errno));
            exit(EXIT_FAILURE);
        }

        /* Signal received? */
        if (fds[FD_POLL_SIGNAL].revents & POLLIN)
        {
            struct signalfd_siginfo fdsi;

            if (read(fds[FD_POLL_SIGNAL].fd, &fdsi, sizeof(fdsi)) != sizeof(fdsi))
            {
                fprintf(stderr, "Couldn't read signal, wrong size read\n");
                exit(EXIT_FAILURE);
            }

            /* Break loop if we got the expected signal */
            if (fdsi.ssi_signo == SIGINT || fdsi.ssi_signo == SIGTERM)
            {
                break;
            }

            fprintf(stderr, "Received unexpected signal\n");
        }

        /* fanotify event received? */
        if (fds[FD_POLL_FANOTIFY].revents & POLLIN)
        {
            char buffer[FANOTIFY_BUFFER_SIZE];
            ssize_t length;

            /* Read from the FD. It will read all events available up to
             * the given buffer size. */
            if ((length = read(fds[FD_POLL_FANOTIFY].fd, buffer, FANOTIFY_BUFFER_SIZE)) > 0)
            {
                struct fanotify_event_metadata *metadata;

                metadata = (struct fanotify_event_metadata *)buffer;
                while (FAN_EVENT_OK(metadata, length))
                {
                    /* event_process() closes metadata->fd, so it must not be closed again here */
                    event_process(metadata);
                    metadata = FAN_EVENT_NEXT(metadata, length);
                }
            }
        }
    }

    /* Clean exit */
    shutdown_fanotify(fanotify_fd);
    shutdown_signals(signal_fd);

    printf("Exiting fanotify example...\n");

    return EXIT_SUCCESS;
}

I will not go into a detailed explanation of the sample above because it is not the main point here, but it initializes fanotify with settings that are almost the same as ClamAV's, as shown below.

``` c
/* Setup fanotify notifications (FAN) mask. All these defined in fanotify.h. */
static uint64_t event_mask =
    (FAN_ACCESS |         /* File accessed */
     FAN_MODIFY |         /* File modified */
     FAN_CLOSE_WRITE |    /* Writable file closed */
     FAN_CLOSE_NOWRITE |  /* Non-writable file closed */
     FAN_OPEN |           /* File was opened */
     FAN_ONDIR |          /* We want to be reported of events in the directory */
     FAN_EVENT_ON_CHILD); /* We want to be reported of events in files of the directory */

fanotify_init(FAN_CLASS_CONTENT | FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS, O_RDONLY | O_LARGEFILE);

fanotify_mark(fanotify_fd,
              FAN_MARK_ADD | FAN_MARK_MOUNT,
              event_mask,
              AT_FDCWD,
              monitors[i].path);
```

If you build this source code and run the resulting program with a mount point as the argument, you can receive fanotify notifications when directory or file accesses occur within that mount point.

![image-20241027211958055](../../static/media/2024-10-23-clamav-onaccess-scanning/image-20241027211958055.png)
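
The event loop in the sample reads a batch of events into a buffer and steps through it with the `FAN_EVENT_OK` and `FAN_EVENT_NEXT` macros. As a minimal sketch of just that walk, the hypothetical helper `count_matching_events` below (my own illustration, not part of the sample or of ClamAV) counts the events whose mask contains any of the requested bits. It assumes the buffer was filled by `read` on a fanotify file descriptor.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/fanotify.h>

/* Hypothetical helper: count events in buf whose mask overlaps wanted.
 * len is the number of bytes that read() returned for the fanotify fd. */
static size_t count_matching_events(struct fanotify_event_metadata *buf,
                                    ssize_t len,
                                    uint64_t wanted)
{
    struct fanotify_event_metadata *ev = buf;
    size_t hits = 0;

    /* FAN_EVENT_OK checks that a whole event still fits in the remaining bytes */
    while (FAN_EVENT_OK(ev, len))
    {
        /* Stop if the kernel and these headers disagree on the struct layout */
        if (ev->vers != FANOTIFY_METADATA_VERSION)
            break;
        if (ev->mask & wanted)
            hits++;
        /* FAN_EVENT_NEXT subtracts event_len from len and advances the pointer */
        ev = FAN_EVENT_NEXT(ev, len);
    }
    return hits;
}
```

A real listener would also close each `ev->fd` exactly once while walking the buffer; this sketch leaves that out to keep the iteration itself visible.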

One especially confusing part here was how `FAN_MARK_MOUNT` behaves.

When `FAN_MARK_MOUNT` is passed to the `fanotify_mark` function, fanotify marks the mount that contains the path argument (the path does not have to be the mount point itself) and monitors all directories and files on that mount.

In other words, the object being monitored here seems to be the mount point itself, not a "directory" in the same sense as a Windows folder.

In the environment where I first tested this, the `/home` directory existed under the same mount point as directories such as `/usr`, so even if I specified `/home/user` as the monitor target, I still received access events for every file in that same mount point, such as `/usr/file`.

In practice, after following the steps in the article below to move `/home` to a separate partition and then editing `/etc/fstab` to assign `/home` its own mount point, I was able to monitor only accesses to directories and files under `/home` with fanotify.

Reference: [Moving an in-use home directory to another partition on Ubuntu (no LiveUSB required) - kamocyc’s blog](https://kamocyc.hatenablog.com/entry/2019/10/26/132015)

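If you want to monitor operations only on a specific directory or file, it seems better to specify `FAN_MARK_ADD` on its own, without `FAN_MARK_MOUNT`. Note that, according to `fanotify_mark(2)`, `FAN_EVENT_ON_CHILD` on a marked directory covers only its immediate children, not files in nested subdirectories.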

### Denying operations on a specific file

Once I could receive filesystem events with fanotify, I next rewrote the code so that only operations on a specific file would be denied.

The number of changes is small, but I will include the full source code below.

``` c
/* Define _GNU_SOURCE, Otherwise we don't get O_LARGEFILE */
#define _GNU_SOURCE

#include <stdio.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <poll.h>
#include <errno.h>
#include <limits.h>
#include <sys/stat.h>
#include <sys/signalfd.h>
#include <fcntl.h>

#include <sys/fanotify.h>

/* Structure to keep track of monitored directories */
typedef struct
{
    /* Path of the directory */
    char *path;
} monitored_t;

/* Size of buffer to use when reading fanotify events */
#define FANOTIFY_BUFFER_SIZE 8192

/* Enumerate list of FDs to poll */
enum
{
    FD_POLL_SIGNAL = 0,
    FD_POLL_FANOTIFY,
    FD_POLL_MAX
};

/* Setup fanotify notifications (FAN) mask. All these defined in fanotify.h. */
static uint64_t event_mask =
    (FAN_OPEN_PERM |
     FAN_ACCESS |         /* File accessed */
     FAN_MODIFY |         /* File modified */
     FAN_CLOSE_WRITE |    /* Writable file closed */
     FAN_CLOSE_NOWRITE |  /* Non-writable file closed */
     FAN_OPEN |           /* File was opened */
     FAN_ONDIR |          /* We want to be reported of events in the directory */
     FAN_EVENT_ON_CHILD); /* We want to be reported of events in files of the directory */

/* Array of directories being monitored */
static monitored_t *monitors;
static int n_monitors;

static char *
get_program_name_from_pid(int pid,
                          char *buffer,
                          size_t buffer_size)
{
    int fd;
    ssize_t len;

    /* Try to get program name by PID */
    snprintf(buffer, buffer_size, "/proc/%d/cmdline", pid);
    if ((fd = open(buffer, O_RDONLY)) < 0)
        return NULL;

    /* Read file contents into buffer */
    if ((len = read(fd, buffer, buffer_size - 1)) <= 0)
    {
        close(fd);
        return NULL;
    }
    close(fd);

    /* cmdline is NUL-separated, so once the buffer is terminated it
     * reads as a C string holding only argv[0] */
    buffer[len] = '\0';

    return buffer;
}

static char *
get_file_path_from_fd(int fd,
                      char *buffer,
                      size_t buffer_size)
{
    ssize_t len;

    if (fd <= 0)
        return NULL;

    sprintf(buffer, "/proc/self/fd/%d", fd);
    if ((len = readlink(buffer, buffer, buffer_size - 1)) < 0)
        return NULL;

    buffer[len] = '\0';
    return buffer;
}

static void event_process(int fanotify_fd, struct fanotify_event_metadata *event)
{
    char path[PATH_MAX];

    printf("Received event in path '%s'",
           get_file_path_from_fd(event->fd,
                                 path,
                                 PATH_MAX)
               ? path
               : "unknown");
    printf(" pid=%d (%s): \n",
           event->pid,
           (get_program_name_from_pid(event->pid,
                                      path,
                                      PATH_MAX)
                ? path
                : "unknown"));

    if (event->mask & FAN_OPEN)
        printf("\tFAN_OPEN\n");
    if (event->mask & FAN_ACCESS)
        printf("\tFAN_ACCESS\n");
    if (event->mask & FAN_MODIFY)
        printf("\tFAN_MODIFY\n");
    if (event->mask & FAN_CLOSE_WRITE)
        printf("\tFAN_CLOSE_WRITE\n");
    if (event->mask & FAN_CLOSE_NOWRITE)
        printf("\tFAN_CLOSE_NOWRITE\n");
    fflush(stdout);

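    /* Only permission events wait for (and accept) a response */
    if (event->mask & FAN_OPEN_PERM)
    {
        struct fanotify_response response;
        const char *target = get_file_path_from_fd(event->fd, path, PATH_MAX);

        response.fd = event->fd;
        /* If the path cannot be resolved, allow rather than crash in strcmp() */
        if (target && strcmp(target, "/home/rana/eicar") == 0)
            response.response = FAN_DENY;
        else
            response.response = FAN_ALLOW;

        if (write(fanotify_fd, &response, sizeof(response)) < 0)
            fprintf(stderr, "Couldn't write response: '%s'\n", strerror(errno));
    }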

    close(event->fd);
}

static void
shutdown_fanotify(int fanotify_fd)
{
    int i;

    for (i = 0; i < n_monitors; ++i)
    {
        /* Remove the mark, using same event mask as when creating it */
        fanotify_mark(fanotify_fd,
                      FAN_MARK_REMOVE,
                      event_mask,
                      AT_FDCWD,
                      monitors[i].path);
        free(monitors[i].path);
    }
    free(monitors);
    close(fanotify_fd);
}

static int
initialize_fanotify(int argc,
                    const char **argv)
{
    int i;
    int fanotify_fd;

    /* Create new fanotify device */
    if ((fanotify_fd = fanotify_init(FAN_CLASS_CONTENT | FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS,
                                     O_RDONLY | O_LARGEFILE)) < 0)
    {
        fprintf(stderr,
                "Couldn't setup new fanotify device: %s\n",
                strerror(errno));
        return -1;
    }

    /* Allocate array of monitor setups */
    n_monitors = argc - 1;
    monitors = malloc(n_monitors * sizeof(monitored_t));

    /* Loop all input directories, setting up marks */
    for (i = 0; i < n_monitors; ++i)
    {
        monitors[i].path = strdup(argv[i + 1]);
        /* Add new fanotify mark */
        if (fanotify_mark(fanotify_fd,
                          FAN_MARK_ADD | FAN_MARK_MOUNT,
                          event_mask,
                          AT_FDCWD,
                          monitors[i].path) < 0)
        {
            fprintf(stderr,
                    "Couldn't add monitor in directory '%s': '%s'\n",
                    monitors[i].path,
                    strerror(errno));
            return -1;
        }

        printf("Started monitoring directory '%s'...\n",
               monitors[i].path);
    }

    return fanotify_fd;
}

static void
shutdown_signals(int signal_fd)
{
    close(signal_fd);
}

static int
initialize_signals(void)
{
    int signal_fd;
    sigset_t sigmask;

    /* We want to handle SIGINT and SIGTERM in the signal_fd, so we block them. */
    sigemptyset(&sigmask);
    sigaddset(&sigmask, SIGINT);
    sigaddset(&sigmask, SIGTERM);

    if (sigprocmask(SIG_BLOCK, &sigmask, NULL) < 0)
    {
        fprintf(stderr,
                "Couldn't block signals: '%s'\n",
                strerror(errno));
        return -1;
    }

    /* Get new FD to read signals from it */
    if ((signal_fd = signalfd(-1, &sigmask, 0)) < 0)
    {
        fprintf(stderr,
                "Couldn't setup signal FD: '%s'\n",
                strerror(errno));
        return -1;
    }

    return signal_fd;
}

int main(int argc,
         const char **argv)
{
    int signal_fd;
    int fanotify_fd;
    struct pollfd fds[FD_POLL_MAX];

    /* Input arguments... */
    if (argc < 2)
    {
        fprintf(stderr, "Usage: %s directory1 [directory2 ...]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* Initialize signals FD */
    if ((signal_fd = initialize_signals()) < 0)
    {
        fprintf(stderr, "Couldn't initialize signals\n");
        exit(EXIT_FAILURE);
    }

    /* Initialize fanotify FD and the marks */
    if ((fanotify_fd = initialize_fanotify(argc, argv)) < 0)
    {
        fprintf(stderr, "Couldn't initialize fanotify\n");
        exit(EXIT_FAILURE);
    }

    /* Setup polling */
    fds[FD_POLL_SIGNAL].fd = signal_fd;
    fds[FD_POLL_SIGNAL].events = POLLIN;
    fds[FD_POLL_FANOTIFY].fd = fanotify_fd;
    fds[FD_POLL_FANOTIFY].events = POLLIN;

    /* Now loop */
    for (;;)
    {
        /* Block until there is something to be read */
        if (poll(fds, FD_POLL_MAX, -1) < 0)
        {
            fprintf(stderr,
                    "Couldn't poll(): '%s'\n",
                    strerror(errno));
            exit(EXIT_FAILURE);
        }

        /* Signal received? */
        if (fds[FD_POLL_SIGNAL].revents & POLLIN)
        {
            struct signalfd_siginfo fdsi;

            if (read(fds[FD_POLL_SIGNAL].fd, &fdsi, sizeof(fdsi)) != sizeof(fdsi))
            {
                fprintf(stderr, "Couldn't read signal, wrong size read\n");
                exit(EXIT_FAILURE);
            }

            /* Break loop if we got the expected signal */
            if (fdsi.ssi_signo == SIGINT || fdsi.ssi_signo == SIGTERM)
            {
                break;
            }

            fprintf(stderr, "Received unexpected signal\n");
        }

        /* fanotify event received? */
        if (fds[FD_POLL_FANOTIFY].revents & POLLIN)
        {
            char buffer[FANOTIFY_BUFFER_SIZE];
            ssize_t length;

            /* Read from the FD. It will read all events available up to
             * the given buffer size. */
            if ((length = read(fds[FD_POLL_FANOTIFY].fd, buffer, FANOTIFY_BUFFER_SIZE)) > 0)
            {
                struct fanotify_event_metadata *metadata;

                metadata = (struct fanotify_event_metadata *)buffer;
                while (FAN_EVENT_OK(metadata, length))
                {
                    /* event_process() closes metadata->fd, so it must not be closed again here */
                    event_process(fanotify_fd, metadata);
                    metadata = FAN_EVENT_NEXT(metadata, length);
                }
            }
        }
    }

    /* Clean exit */
    shutdown_fanotify(fanotify_fd);
    shutdown_signals(signal_fd);

    printf("Exiting fanotify example...\n");

    return EXIT_SUCCESS;
}
```

The first key point is that the `FAN_OPEN_PERM` bit is added to the `event_mask` used by the `fanotify_mark` function.

``` c
static uint64_t event_mask =
    (FAN_OPEN_PERM |
     FAN_ACCESS |         /* File accessed */
     FAN_MODIFY |         /* File modified */
     FAN_CLOSE_WRITE |    /* Writable file closed */
     FAN_CLOSE_NOWRITE |  /* Non-writable file closed */
     FAN_OPEN |           /* File was opened */
     FAN_ONDIR |          /* We want to be reported of events in the directory */
     FAN_EVENT_ON_CHILD); /* We want to be reported of events in files of the directory */
```

Once you add this, fanotify suspends the target operation until the listener writes a response that allows or denies it (FAN_ALLOW or FAN_DENY) to the file descriptor created by fanotify_init. (Apparently the default timeout is 5 seconds, although fanotify(7) does not document one, so treat this as unverified.)

After that, when a filesystem event is received from fanotify, the code checks whether the target file is /home/rana/eicar, and only if that specific file was accessed does it set the response member of the fanotify_response structure to FAN_DENY.

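/* Only permission events wait for (and accept) a response */
if (event->mask & FAN_OPEN_PERM)
{
    struct fanotify_response response;
    const char *target = get_file_path_from_fd(event->fd, path, PATH_MAX);

    response.fd = event->fd;
    if (target && strcmp(target, "/home/rana/eicar") == 0)
        response.response = FAN_DENY;
    else
        response.response = FAN_ALLOW;

    if (write(fanotify_fd, &response, sizeof(response)) < 0)
        fprintf(stderr, "Couldn't write response: '%s'\n", strerror(errno));
}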

When you run this program, only access to the file at the hard-coded path is denied, as the screenshot below shows.

image-20241027224825657
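
To make the allow/deny decision easier to reason about, it can be separated from the I/O. The following is only a sketch under my own assumptions: `decide_response` and `send_response` are hypothetical helpers, and returning 0 for "no response needed" is a convention I chose because 0 is neither `FAN_ALLOW` nor `FAN_DENY`.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/fanotify.h>

/* Hypothetical helper: return 0 when the event is not a permission event
 * (no response must be written), otherwise FAN_ALLOW or FAN_DENY. */
static uint32_t decide_response(uint64_t mask, const char *path, const char *blocked)
{
    if (!(mask & (FAN_OPEN_PERM | FAN_ACCESS_PERM)))
        return 0;
    /* An unresolved path (NULL) is allowed here; a stricter tool might deny */
    if (path != NULL && strcmp(path, blocked) == 0)
        return FAN_DENY;
    return FAN_ALLOW;
}

/* Hypothetical helper: write the verdict for one event to the fanotify fd.
 * Returns 0 on success, -1 on failure. */
static int send_response(int fan_fd, int event_fd, uint32_t verdict)
{
    struct fanotify_response response = { .fd = event_fd, .response = verdict };

    return write(fan_fd, &response, sizeof(response)) == (ssize_t)sizeof(response) ? 0 : -1;
}
```

A caller would invoke `send_response` only when `decide_response` returns a non-zero verdict, and then close the event's file descriptor.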

## Reading clamonacc

Now that I have a rough understanding of how to use fanotify, I will take a closer look at the clamonacc code that ClamAV uses when On-Access scanning is enabled.

## Checking the implementation of the clamonacc main function

In ClamAV, On-Access scanning seems to be realized by two cooperating processes: clamonacc and clamd.

Since clamd is the multithreaded daemon process that uses libclamav to perform virus scanning, I will start by looking at the clamonacc code to understand the behavior of On-Access scanning.

### Initialization

Inside the clamonacc main function, the first step is to initialize storage for onas_context by using the onas_init_context function.

struct onas_context *onas_init_context(void)
{
    struct onas_context *ctx = (struct onas_context *)cli_malloc(sizeof(struct onas_context));
    if (NULL == ctx) {
        return NULL;
    }

    memset(ctx, 0, sizeof(struct onas_context));
    return ctx;
}

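Although cli_malloc, which is used here for memory allocation, is a ClamAV-specific function, it is a thin wrapper around malloc: it rejects a size of zero or a size larger than CLI_MAX_ALLOCATION (182 MiB), logs an error if the allocation fails, and otherwise returns the pointer obtained from malloc.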

#define CLI_MAX_ALLOCATION (182 * 1024 * 1024)

void *cli_malloc(size_t size)
{
    void *alloc;

    if (!size || size > CLI_MAX_ALLOCATION) {
        cli_errmsg("cli_malloc(): Attempt to allocate %lu bytes. Please report to https://github.com/Cisco-Talos/clamav/issues\n", (unsigned long int)size);
        return NULL;
    }

    alloc = malloc(size);

    if (!alloc) {
        perror("malloc_problem");
        cli_errmsg("cli_malloc(): Can't allocate memory (%lu bytes).\n", (unsigned long int)size);
        return NULL;
    } else
        return alloc;
}

The `onas_context` structure allocated here contains a variety of information related to scanning, but in the current version it does not seem to be used that heavily.

``` c
struct optstruct {
    char *name;
    char *cmd;
    char *strarg;
    long long numarg;
    int enabled;
    int active;
    int flags;
    int idx;
    struct optstruct *nextarg;
    struct optstruct *next;

    char **filename; /* cmdline */
};

struct onas_context {
    const struct optstruct *opts;
    const struct optstruct *clamdopts;

    int printinfected;
    int maxstream;

    uint32_t ddd_enabled;

    int fan_fd;
    uint64_t fan_mask;
    uint8_t retry_on_error;
    uint8_t retry_attempts;
    uint8_t deny_on_error;

    uint64_t sizelimit;
    uint64_t extinfo;

    int scantype;
    int isremote;
    int session;
    int timeout;

    int64_t portnum;

    int32_t maxthreads;
} __attribute__((packed));
```

### Getting command-line arguments

The next few lines parse the command-line arguments for `clamonacc`.

The options parsed from the command line are stored in the `opts` member of the context, and the options parsed from the `clamd` configuration file (the path given by `config-file`) are stored in the `clamdopts` member. Both are produced by the `optparse` function.

``` c
/* Parse out all our command line options */
opts = optparse(NULL, argc, argv, 1, OPT_CLAMONACC, OPT_CLAMSCAN, NULL);
if (opts == NULL) {
    logg("!Clamonacc: can't parse command line options\n");
    return 2;
}
ctx->opts = opts;

if (optget(opts, "verbose")->enabled) {
    mprintf_verbose = 1;
    logg_verbose    = 1;
}

/* And our config file options */
clamdopts = optparse(optget(opts, "config-file")->strarg, 0, NULL, 1, OPT_CLAMD, 0, NULL);
if (clamdopts == NULL) {
    logg("!Clamonacc: can't parse clamd configuration file %s\n", optget(opts, "config-file")->strarg);
    optfree((struct optstruct *)opts);
    return 2;
}
ctx->clamdopts = clamdopts;
```

Both of these are pointers to `optstruct` structures and are managed as linked lists.

![image-20241025221022858](../../static/media/2024-10-23-clamav-onaccess-scanning/image-20241025221022858.png)
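
To illustrate what "managed as linked lists" means in practice, here is a minimal sketch of looking up an option by name in such a list. `struct opt_node` is a simplified, hypothetical stand-in for `optstruct`, and `find_opt` is my own illustration, not ClamAV's `optget` implementation.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for ClamAV's struct optstruct */
struct opt_node
{
    const char *name;      /* option name, e.g. "config-file" */
    const char *strarg;    /* string argument, may be NULL */
    int enabled;           /* non-zero when the option is set */
    struct opt_node *next; /* next option in the list, NULL at the end */
};

/* Walk the list and return the node with the given name, or NULL */
static const struct opt_node *find_opt(const struct opt_node *head, const char *name)
{
    const struct opt_node *cur;

    for (cur = head; cur != NULL; cur = cur->next)
    {
        if (strcmp(cur->name, name) == 0)
            return cur;
    }
    return NULL;
}
```

The lookup is linear in the number of options, which is reasonable for a list that is built once at startup and queried only occasionally.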

### Registering fanotify

Next comes the startup check performed by the `startup_checks` function.

If this check returns a non-zero value, execution jumps to the `done` label and the process exits after cleanup. A return value of `CL_BREAK` is treated as a normal exit, so `ret` is reset to 0 before the jump.

``` c
/* Make sure we're good to begin spinup */
ret = startup_checks(ctx);
if (ret) {
    if (ret == (int)CL_BREAK) {
        ret = 0;
    }
    goto done;
}
```

Inside the `startup_checks` function, fanotify is registered.

#if defined(_GNU_SOURCE)
ctx->fan_fd = fanotify_init(FAN_CLASS_CONTENT | FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS, O_LARGEFILE | O_RDONLY);
#else
ctx->fan_fd = fanotify_init(FAN_CLASS_CONTENT | FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS, O_RDONLY);
#endif
if (ctx->fan_fd < 0) {
    logg("!Clamonacc: fanotify_init failed: %s\n", cli_strerror(errno, faerr, sizeof(faerr)));
    if (errno == EPERM) {
        logg("!Clamonacc: clamonacc must have elevated permissions ... exiting ...\n");
    }
    ret = 2;
    goto done;
}

Registration with fanotify is performed by the `fanotify_init` function.

When I debugged this in my local environment, I confirmed that the call actually made was `fanotify_init(FAN_CLASS_CONTENT | FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS, O_LARGEFILE | O_RDONLY);`, that is, the `_GNU_SOURCE` branch.

![image-20241025225143323](../../static/media/2024-10-23-clamav-onaccess-scanning/image-20241025225143323.png)

This function creates and initializes a fanotify group.

As its return value, `fanotify_init` returns the file descriptor for the event queue associated with the group it created.

In `clamonacc`, this return value appears to be stored in the `fan_fd` member of the context.

Reference: [fanotify_init(2) - Linux manual page](https://man7.org/linux/man-pages/man2/fanotify_init.2.html)

The `flags` value passed as the first argument to `fanotify_init` includes the notification class.

In `clamonacc`, it registers `FAN_CLASS_CONTENT`, which is generally used by AntiVirus software.

This class can receive both "events that notify you when a file access has occurred" and "events used to decide whether access to that file should be permitted."

It also specifies `FAN_UNLIMITED_QUEUE`, which removes the limit on the number of queued events, and `FAN_UNLIMITED_MARKS`, which removes the limit on the number of fanotify marks.

Reference: [fanotify(7) - Linux manual page](https://man7.org/linux/man-pages/man7/fanotify.7.html)

The second argument, `event_f_flags`, is `O_LARGEFILE | O_RDONLY`. It sets the file status flags of the file descriptors that fanotify opens for each event.

`O_LARGEFILE` enables support for files larger than 2 GB, and `O_RDONLY` opens those descriptors read-only.
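The two arguments described above can be put together in a small sketch. This is only an illustration of the call, not ClamAV's code: `open_content_group` and `explain_init_error` are hypothetical helper names, and the error hints are my own paraphrase of a few `errno` values listed in the `fanotify_init(2)` manual page.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_LARGEFILE in <fcntl.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/fanotify.h>

/* Hypothetical helper: returns the fanotify fd, or -1 with errno set. */
int open_content_group(void)
{
    unsigned int flags = FAN_CLASS_CONTENT     /* access and permission events */
                         | FAN_UNLIMITED_QUEUE /* no cap on queued events */
                         | FAN_UNLIMITED_MARKS; /* no cap on marks */
    /* Status flags for the per-event file descriptors. */
    unsigned int event_f_flags = O_RDONLY | O_LARGEFILE;

    return fanotify_init(flags, event_f_flags);
}

/* Hypothetical helper: maps a few documented errno values to hints. */
const char *explain_init_error(int err)
{
    switch (err) {
        case EPERM:
            return "missing required capability (CAP_SYS_ADMIN)";
        case ENOSYS:
            return "kernel does not implement fanotify";
        case EMFILE:
            return "too many fanotify groups or open files";
        case EINVAL:
            return "invalid flags";
        default:
            return "other error";
    }
}
```

Run without elevated privileges, `open_content_group()` is expected to fail, and `explain_init_error(errno)` then gives a hint similar to the `EPERM` message that `clamonacc` logs.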

After calling `fanotify_init`, the monitoring targets are registered via the `fanotify_mark` function.

In `clamonacc`, there are several branches that call `fanotify_mark` depending on the `clamd` configuration. When a mount point is specified with the `OnAccessMountPath` option, the following code in `fanotif.c`, called from `onas_setup_fanotif`, appears to run.

![image-20241101223410446](../../static/media/2024-10-23-clamav-onaccess-scanning/image-20241101223410446.png)

Here, `fanotify_mark(onas_fan_fd, FAN_MARK_ADD | FAN_MARK_MOUNT, (*ctx)->fan_mask, (*ctx)->fan_fd, pt->strarg)` is executed, with the path specified by `OnAccessMountPath` passed as the last argument, `pt->strarg`.

Note that ClamAV cannot use the `OnAccessMountPath` option together with `OnAccessPrevention`, which blocks file access with fanotify.

> The OnAccessMountPath option uses a different fanotify api configuration which makes it incompatible with OnAccessIncludePath and the DDD System. Therefore, inotify watch-point limitations will not be a concern when using this option. Unfortunately, this also means that the following options cannot be used in conjunction with OnAccessMountPath:
>
> - OnAccessExtraScanning - is built around catching inotify events.
> - OnAccessExcludePath - is built upon the DDD System.
> - OnAccessPrevention - would lock up the system if / was selected for OnAccessMountPath. If you need OnAccessPrevention, you should use OnAccessIncludePath instead of OnAccessMountPath.

Reference: [On-Access Scanning - ClamAV Documentation](https://docs.clamav.net/manual/OnAccess.html)

That means if you specify `OnAccessMountPath` rather than `OnAccessIncludePath`, access is not blocked even when malware is detected in a file.

In fact, when you read the `clamonacc` source code, you can confirm that when `OnAccessMountPath` is enabled, the fanotify mask uses `FAN_OPEN` rather than `FAN_OPEN_PERM`.

However, for testing purposes, I modified the source code as shown below so that file access would still be blocked even when `OnAccessMountPath` was enabled.

![image-20241103155744015](../../static/media/2024-10-23-clamav-onaccess-scanning/image-20241103155744015.png)

Once the processing in `onas_setup_fanotif` described above finishes, `onas_handle_signals` and `onas_start_eloop` are called, and monitoring for On-Access scanning begins.

``` c
/* Setup fanotify */
switch (onas_setup_fanotif(&ctx)) {
    case CL_SUCCESS:
        break;
    case CL_BREAK:
        ret = 0;
        goto done;
        break;
    case CL_EARG:
    default:
        mprintf("!Clamonacc: can't setup fanotify\n");
        ret = 2;
        goto done;
        break;
}

***

/* Setup signal handling */
g_ctx = ctx;
onas_handle_signals();

logg("*Clamonacc: beginning event loops\n");
/*  Kick off event loop(s) */
ret = onas_start_eloop(&ctx);
```

### Setting up signal handling

The `onas_handle_signals` function sets up signal handling with the code below.

``` c
static void onas_handle_signals()
{
    sigset_t sigset;
    struct sigaction act;

    /* ignore all signals except SIGUSR1 */
    sigfillset(&sigset);
    sigdelset(&sigset, SIGUSR1);
    sigdelset(&sigset, SIGUSR2);
    /* The behavior of a process is undefined after it ignores a
     * SIGFPE, SIGILL, SIGSEGV, or SIGBUS signal */
    sigdelset(&sigset, SIGFPE);
    sigdelset(&sigset, SIGILL);
    sigdelset(&sigset, SIGSEGV);
    sigdelset(&sigset, SIGINT);
    sigdelset(&sigset, SIGTERM);
#ifdef SIGBUS
    sigdelset(&sigset, SIGBUS);
#endif
    pthread_sigmask(SIG_SETMASK, &sigset, NULL);
    memset(&act, 0, sizeof(struct sigaction));
    act.sa_handler = onas_clamonacc_exit;
    sigfillset(&(act.sa_mask));
    sigaction(SIGUSR2, &act, NULL);
    sigaction(SIGTERM, &act, NULL);
    sigaction(SIGSEGV, &act, NULL);
    sigaction(SIGINT, &act, NULL);
}
```

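After initializing the signal set so that it includes all signals with the `sigfillset` function, the code removes some of those signals from the set with `sigdelset`. The resulting set is then installed as the thread's signal mask with `pthread_sigmask(SIG_SETMASK, ...)`, so every signal still in the set is blocked for this thread.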

After that, it uses the newly prepared `sigaction` structure named `act` to register the `onas_clamonacc_exit` function as the handler for `SIGUSR2`, `SIGTERM`, `SIGSEGV`, and `SIGINT`.

Reference: [sigfillset(3) man page](https://nxmnpg.lemoda.net/ja/3/sigfillset)

When I actually debugged what happened after issuing `SIGTERM` with the `kill` command, I confirmed that `onas_clamonacc_exit` was called and the process terminated.

![image-20241101230336454](../../static/media/2024-10-23-clamav-onaccess-scanning/image-20241101230336454.png)
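The "fill the set, then remove the signals you still want" pattern used above can be tried in isolation. The following is a minimal sketch, not ClamAV's code: `build_blocked_set` and `apply_blocked_set` are hypothetical names, and the list of removed signals is shortened for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Hypothetical helper: builds "every signal except the ones we want". */
void build_blocked_set(sigset_t *set)
{
    assert(set != NULL);
    sigfillset(set);          /* start with all signals in the set */
    sigdelset(set, SIGUSR2);  /* then remove those that must get through */
    sigdelset(set, SIGTERM);
    sigdelset(set, SIGINT);
    sigdelset(set, SIGSEGV);
}

/* Hypothetical helper: installs the set as the calling thread's mask.
 * Signals that are members of the set are blocked for this thread. */
int apply_blocked_set(const sigset_t *set)
{
    return pthread_sigmask(SIG_SETMASK, set, NULL);
}
```

After `build_blocked_set`, `sigismember` reports `SIGTERM` as absent from the set (it will be delivered) and a signal such as `SIGHUP` as present (it will be blocked once the mask is applied).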

### Waiting for scan events

If execution reaches this point, the `onas_start_eloop` function is finally called to start the event loop.

That function in turn calls `onas_fan_eloop`.

Part of the implementation of `onas_fan_eloop` is shown below.

``` c
time_t start = time(NULL) - 30;

while (((bread = read((*ctx)->fan_fd, buf, sizeof(buf))) > 0) || (errno == EOVERFLOW || errno == EMFILE || errno == EACCES)) {

    switch (errno) {
        ***
    }

    fmd = (struct fanotify_event_metadata *)buf;
    while (FAN_EVENT_OK(fmd, bread)) {
        
        ***

        scan = 1;

        ***
    }
}
```

Just like in the sample code used earlier in this article, `read` fills `buf` with `fanotify_event_metadata` records from the fanotify file descriptor, the `FAN_EVENT_OK` macro validates each record, and then the `scan` flag is set to 1.
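The inner loop over the buffer can be exercised without a real fanotify descriptor, because `FAN_EVENT_OK` and `FAN_EVENT_NEXT` only look at the buffer and the remaining length. The sketch below is my own illustration, not ClamAV's code: `count_events` is a hypothetical helper, and the version check is an extra step taken from the example in the `fanotify(7)` manual page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/fanotify.h>
#include <sys/types.h>

/* Hypothetical helper: counts the well-formed events in a buffer that
 * was filled by read() on a fanotify file descriptor. */
size_t count_events(void *buf, ssize_t len)
{
    struct fanotify_event_metadata *fmd = buf;
    size_t n = 0;

    assert(buf != NULL);
    /* FAN_EVENT_OK checks that a complete record fits in the remaining bytes. */
    while (FAN_EVENT_OK(fmd, len)) {
        if (fmd->vers != FANOTIFY_METADATA_VERSION) {
            break; /* kernel and userspace disagree on the record layout */
        }
        n++;
        /* FAN_EVENT_NEXT advances the pointer and decrements len. */
        fmd = FAN_EVENT_NEXT(fmd, len);
    }
    return n;
}
```

A real consumer would do its work inside the loop and must also `close(fmd->fd)` for each event, since every event carries its own open file descriptor.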

After several additional checks, a variable of type `onas_scan_event` is initialized and populated.

The `onas_scan_event` structure stores information such as the file name obtained with the `readlink` function.

``` c
struct onas_scan_event {
    const char *tcpaddr;
    int64_t portnum;
    char *pathname;
    int fan_fd;
#if defined(HAVE_SYS_FANOTIFY_H)
    struct fanotify_event_metadata *fmd;
#endif
    uint8_t retry_attempts;
    uint64_t sizelimit;
    int32_t scantype;
    int64_t maxstream;
    int64_t timeout;
    uint8_t bool_opts;
} __attribute((packed));
```

``` c
if (scan) {
    struct onas_scan_event *event_data;

    event_data = cli_calloc(1, sizeof(struct onas_scan_event));
    
    ***

    /* general mapping */
    onas_map_context_info_to_event_data(*ctx, &event_data);
    scan ? event_data->bool_opts |= ONAS_SCTH_B_SCAN : scan;

    /* fanotify specific stuffs */
    event_data->bool_opts |= ONAS_SCTH_B_FANOTIFY;
    event_data->fmd = cli_malloc(sizeof(struct fanotify_event_metadata));
    
    ***
           
    memcpy(event_data->fmd, fmd, sizeof(struct fanotify_event_metadata));
    event_data->pathname = cli_strdup(fname);
    
    ***

    logg("*ClamFanotif: attempting to feed consumer queue\n");

    /* feed consumer queue */
    if (CL_SUCCESS != onas_queue_event(event_data)) {
        close(fmd->fd);
        free(event_data->pathname);
        free(event_data->fmd);
        free(event_data);
        logg("!ClamFanotif: error occurred while feeding consumer queue ... \n");
        if ((*ctx)->retry_on_error) {
            err_cnt++;
            if (err_cnt < (*ctx)->retry_attempts) {
                logg("ClamFanotif: ... recovering ...\n");
                fmd = FAN_EVENT_NEXT(fmd, bread);
                continue;
            }
        }
        return 2;
    }
}
```

The `event_data` variable created here, a pointer to an `onas_scan_event` structure, is ultimately passed to the `onas_queue_event` function.

Inside this function, after allocating a new node, it acquires a lock on `onas_queue_lock`, a `pthread_mutex_t` defined as a global variable.

It then links the new node into the queue immediately before the node pointed to by the global variable `g_onas_event_queue_tail`, stores the `event_data` received as an argument in that node, and signals the condition variable `onas_scan_queue_empty_cond`.

``` c
cl_error_t onas_queue_event(struct onas_scan_event *event_data)
{
    struct onas_event_queue_node *node = NULL;
    if (CL_EMEM == onas_new_event_queue_node(&node))
        return CL_EMEM;

    pthread_mutex_lock(&onas_queue_lock);
    node->next                                                            = g_onas_event_queue_tail;
    node->prev                                                            = g_onas_event_queue_tail->prev;
    ((struct onas_event_queue_node *)g_onas_event_queue_tail->prev)->next = node;
    g_onas_event_queue_tail->prev                                         = node;

    node->data = event_data;

    g_onas_event_queue.size++;

    pthread_cond_signal(&onas_scan_queue_empty_cond);
    pthread_mutex_unlock(&onas_queue_lock);

    return CL_SUCCESS;
}
```

So this appears to be the point where data containing information about the scan target is added to the queue.
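The pointer updates in `onas_queue_event` are easier to follow on a stripped-down queue. The sketch below assumes a doubly linked list with fixed head and tail sentinel nodes, which is how I read the code above. The `queue`, `node`, `queue_push`, and `queue_pop` names are hypothetical, and locking is left out so that only the list manipulation remains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
struct node {
    struct node *next;
    struct node *prev;
    void *data;
};

struct queue {
    struct node head; /* sentinel: head.next is the oldest element */
    struct node tail; /* sentinel: tail.prev is the newest element */
    size_t size;
};

void queue_init(struct queue *q)
{
    assert(q != NULL);
    q->head.prev = NULL;
    q->head.next = &q->tail;
    q->tail.prev = &q->head;
    q->tail.next = NULL;
    q->size      = 0;
}

/* Links n immediately before the tail sentinel (same four updates as above). */
void queue_push(struct queue *q, struct node *n, void *data)
{
    n->next            = &q->tail;
    n->prev            = q->tail.prev;
    q->tail.prev->next = n;
    q->tail.prev       = n;
    n->data            = data;
    q->size++;
}

/* Unlinks the node right after the head sentinel, or returns NULL if empty. */
struct node *queue_pop(struct queue *q)
{
    struct node *n = q->head.next;

    if (n == &q->tail) {
        return NULL;
    }
    q->head.next  = n->next;
    n->next->prev = &q->head;
    q->size--;
    return n;
}
```

Because new nodes always go in front of the tail sentinel and are removed from behind the head sentinel, events leave the queue in the order they were added.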

### Monitoring the scan queue

Although the order is a bit reversed in this explanation, the following code is executed in the `main` function of `clamonacc.c` before `onas_setup_fanotif` performs the fanotify registration.

``` c
/* Setup our event queue */
ctx->maxthreads = optget(ctx->clamdopts, "OnAccessMaxThreads")->numarg;

switch (onas_scan_queue_start(&ctx)) {
    case CL_SUCCESS:
        break;
    case CL_BREAK:
    case CL_EARG:
    case CL_ECREAT:
    default:
        ret = 2;
        logg("!Clamonacc: can't setup event consumer queue\n");
        goto done;
        break;
}
```

The `onas_scan_queue_start` function called here starts a new thread with the `pthread_create` function.

At that time, the `onas_scan_queue_th` function is set as the thread's `start_routine`.

``` c
cl_error_t onas_scan_queue_start(struct onas_context **ctx)
{

    pthread_attr_t scan_queue_attr;
    int32_t thread_started = 1;

    if (!ctx || !*ctx) {
        logg("*ClamScanQueue: unable to start clamonacc. (bad context)\n");
        return CL_EARG;
    }

    if (pthread_attr_init(&scan_queue_attr)) {
        return CL_BREAK;
    }
    pthread_attr_setdetachstate(&scan_queue_attr, PTHREAD_CREATE_JOINABLE);
    thread_started = pthread_create(&scan_queue_pid, &scan_queue_attr, onas_scan_queue_th, *ctx);

    if (0 != thread_started) {
        /* Failed to create thread */
        logg("*ClamScanQueue: Unable to start event consumer queue thread ... \n");
        return CL_ECREAT;
    }

    return CL_SUCCESS;
}
```
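The thread start-up sequence above (initialize attributes, request a joinable thread, create it) can be reduced to a few lines. This is a generic pthreads sketch rather than ClamAV's code: `start_joinable`, `worker_main`, and `struct worker_arg` are hypothetical names, and the call to `pthread_attr_destroy` is my own addition.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>

/* Hypothetical argument block handed to the thread. */
struct worker_arg {
    int input;
    int output;
};

/* Hypothetical start routine: doubles the input. */
static void *worker_main(void *arg)
{
    struct worker_arg *w = arg;

    w->output = w->input * 2;
    return NULL;
}

/* Hypothetical helper: starts fn(arg) as a joinable thread.
 * Returns 0 on success or a pthreads error number. */
int start_joinable(pthread_t *tid, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc;

    assert(tid != NULL && fn != NULL);
    rc = pthread_attr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr); /* the attribute object is no longer needed */
    return rc;
}
```

A joinable thread lets the creator wait for it with `pthread_join`. Calling `start_joinable(&tid, worker_main, &w)` followed by `pthread_join(tid, NULL)` leaves the doubled value in `w.output`.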

Inside the `onas_scan_queue_th` function, whose code is shown further below, a signal set is again built with `sigfillset` and `sigdelset`.

This partially overlaps with what the `onas_handle_signals` function introduced earlier does, but I do not fully understand this area, so I do not know why `sigfillset` is also used on the `onas_scan_queue_th` side.

When I attached a debugger, the order in which `onas_handle_signals` and `onas_scan_queue_th` ran was not consistent, and in some cases `onas_handle_signals` ran first, so I am not sure whether the two interfere with each other.

If I learn the reason for this implementation later, I will add a note.

After configuring signal handling, the `onas_init_event_queue` function initializes the `g_onas_event_queue` list of type `onas_event_queue` with size 0, and then the `onas_consume_event` function is executed in a loop.

``` c
void *onas_scan_queue_th(void *arg)
{

    /* not a ton of use for context right now, but perhaps in the future we can pass in more options */
    struct onas_context *ctx = (struct onas_context *)arg;
    sigset_t sigset;
    int ret;

    /* ignore all signals except SIGUSR2 */
    sigfillset(&sigset);
    sigdelset(&sigset, SIGUSR2);
    /* The behavior of a process is undefined after it ignores a
	 * SIGFPE, SIGILL, SIGSEGV, or SIGBUS signal */
    sigdelset(&sigset, SIGFPE);
    sigdelset(&sigset, SIGILL);
    sigdelset(&sigset, SIGSEGV);
    sigdelset(&sigset, SIGTERM);
    sigdelset(&sigset, SIGINT);
#ifdef SIGBUS
    sigdelset(&sigset, SIGBUS);
#endif

    logg("*ClamScanQueue: initializing event queue consumer ... (%d) threads in thread pool\n", ctx->maxthreads);
    onas_init_event_queue();
    threadpool thpool = thpool_init(ctx->maxthreads);
    g_thpool          = thpool;

    /* loop w/ onas_consume_event until we die */
    pthread_cleanup_push(onas_scan_queue_exit, NULL);
    logg("*ClamScanQueue: waiting to consume events ...\n");
    do {
        onas_consume_event(thpool);
    } while (1);

    pthread_cleanup_pop(1);
}
```

The `onas_consume_event` function waits on the condition variable while the queue is empty, then pops the first element from the `g_onas_event_queue` queue and hands it to the thread pool as the argument of an `onas_scan_worker` task by calling `thpool_add_work`.

Reference: [Pithikos/C-Thread-Pool: A minimal but powerful thread pool in ANSI C](https://github.com/Pithikos/C-Thread-Pool/tree/master)

``` c
static int onas_queue_is_b_empty()
{

    if (g_onas_event_queue.head->next == g_onas_event_queue.tail) {
        return 1;
    }

    return 0;
}


static int onas_consume_event(threadpool thpool)
{
    pthread_mutex_lock(&onas_queue_lock);

    while (onas_queue_is_b_empty()) {
        pthread_cond_wait(&onas_scan_queue_empty_cond, &onas_queue_lock);
    }

    struct onas_event_queue_node *popped_node = g_onas_event_queue_head->next;
    g_onas_event_queue_head->next             = g_onas_event_queue_head->next->next;
    g_onas_event_queue_head->next->prev       = g_onas_event_queue_head;
    g_onas_event_queue.size--;

    pthread_mutex_unlock(&onas_queue_lock);

    thpool_add_work(thpool, (void *)onas_scan_worker, (void *)popped_node->data);
    onas_destroy_event_queue_node(popped_node);

    return 1;
}
```
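The blocking part of `onas_consume_event` is the usual condition-variable idiom: hold the mutex, wait in a `while` loop until the predicate changes, then take one item. The sketch below shows only that idiom with a real consumer thread. It is not ClamAV's code: a simple counter stands in for the event queue, the `mailbox_*` names are hypothetical, and the `closed` flag is my own addition so that the consumer can terminate.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state: a counter stands in for the event queue. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    int pending;  /* items waiting to be consumed */
    bool closed;  /* the producer has finished */
    int consumed; /* items handled by the consumer */
};

void mailbox_init(struct mailbox *m)
{
    assert(m != NULL);
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->nonempty, NULL);
    m->pending  = 0;
    m->closed   = false;
    m->consumed = 0;
}

/* Producer side: add one item and wake the consumer. */
void mailbox_post(struct mailbox *m)
{
    pthread_mutex_lock(&m->lock);
    m->pending++;
    pthread_cond_signal(&m->nonempty);
    pthread_mutex_unlock(&m->lock);
}

/* Producer side: no more items will arrive. */
void mailbox_close(struct mailbox *m)
{
    pthread_mutex_lock(&m->lock);
    m->closed = true;
    pthread_cond_broadcast(&m->nonempty);
    pthread_mutex_unlock(&m->lock);
}

/* Consumer thread: sleeps while there is nothing to do. */
void *mailbox_consumer(void *arg)
{
    struct mailbox *m = arg;

    for (;;) {
        pthread_mutex_lock(&m->lock);
        /* Re-check the predicate after every wakeup. */
        while (m->pending == 0 && !m->closed) {
            pthread_cond_wait(&m->nonempty, &m->lock);
        }
        if (m->pending == 0) { /* closed and fully drained */
            pthread_mutex_unlock(&m->lock);
            break;
        }
        m->pending--;
        m->consumed++;
        pthread_mutex_unlock(&m->lock);
        /* a real consumer would hand the item to a worker here */
    }
    return NULL;
}
```

The `while` loop around `pthread_cond_wait` matters because a wakeup does not guarantee that the predicate holds, which is also why `onas_consume_event` re-checks `onas_queue_is_b_empty()` in a loop.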

The `onas_scan_worker` function performs the scan job after several conditional branches.

In the environment I am using for testing this time, it passes the event information retrieved from the queue to the `onas_scan_thread_handle_file` function.

![image-20241102181444683](../../static/media/2024-10-23-clamav-onaccess-scanning/image-20241102181444683.png)

After doing some simple checks and obtaining file information, the `onas_scan_thread_handle_file` function passes the initialized `infected` variable and related values to the `onas_scan_thread_scanfile` function.

``` c
ret = onas_scan_thread_scanfile(event_data, curr->fts_path, sb, &infected, &err, &ret_code);
```

The `onas_scan_thread_scanfile` function then calls `onas_scan` -> `onas_scan_safe` -> `onas_client_scan`, and eventually executes the `onas_dsresult` function, which issues the scan request to `clamd` and obtains the scan result.

``` c
onas_scan(event_data, fname, sb, infected, err, ret_code);

onas_scan_safe(event_data, fname, sb, infected, err, ret_code);

onas_client_scan(event_data->tcpaddr, event_data->portnum, event_data->scantype, event_data->maxstream, fname, fd, event_data->timeout, sb, infected, err, ret_code);

if ((ret = onas_dsresult(curl, scantype, maxstream, fname, fd, timeout, &ret, err, ret_code)) >= 0) {
    *infected = ret;
}
```

Inside the `onas_scan_thread_scanfile` function, the final decision about whether the target file should be accessible is made based on the scan result received from `clamd`.

If the scan determines that the target file is malware, the following code sets the `response` member of the `fanotify_response` structure to `FAN_DENY`.

``` c
if (b_fanotify) {
    if ((*err && *ret_code && b_deny_on_error) || *infected) {
        res.response = FAN_DENY;
    }
}
```

With that, if malware is detected in the target file, fanotify blocks access to the file.

By the way, with the fanotify mask left as `FAN_OPEN_PERM`, I could not make much progress when stepping through the fanotify code in a debugger, so this time I modified the code to use print-based debugging instead.
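The verdict logic can be separated from the I/O to make it easier to reason about. The following is a simplified sketch under my own assumptions, not ClamAV's code: `decide` and `send_verdict` are hypothetical helpers, the error condition is collapsed into a single boolean, and the default answer is assumed to be `FAN_ALLOW`. The `fanotify_response` structure and the `write` on the fanotify descriptor follow the `fanotify(7)` manual page.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/fanotify.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: builds the answer for one permission event. */
struct fanotify_response decide(int event_fd, bool infected, bool scan_error,
                                bool deny_on_error)
{
    struct fanotify_response res;

    res.fd       = event_fd;  /* identifies the event being answered */
    res.response = FAN_ALLOW; /* assumed default: let the access proceed */
    if (infected || (scan_error && deny_on_error)) {
        res.response = FAN_DENY;
    }
    return res;
}

/* Hypothetical helper: writes the answer to the fanotify descriptor.
 * Returns 0 on success, -1 on error. */
int send_verdict(int fan_fd, const struct fanotify_response *res)
{
    ssize_t n;

    assert(res != NULL);
    n = write(fan_fd, res, sizeof(*res));
    return n == (ssize_t)sizeof(*res) ? 0 : -1;
}
```

The process that attempted the open stays blocked until such a response is written, so a consumer of permission events has to answer every one of them, including when the scan itself failed.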

![image-20241103160512144](../../static/media/2024-10-23-clamav-onaccess-scanning/image-20241103160512144.png)

When I actually tested detection of the EICAR test file with On-Access scanning in my test environment, the code above executed as shown below, and I confirmed that access was denied via `FAN_DENY`.

![image-20241103154210665](../../static/media/2024-10-23-clamav-onaccess-scanning/image-20241103154210665.png)

## Summary

Using ClamAV, an open-source AntiVirus product for Linux, as a reference, I summarized how file access control and On-Access scanning are implemented with fanotify.