Date: Wed, 13 Aug 2008 11:21:49 -0700
From: Arjan van de Ven <arjan@infradead.org>
To: Theodore Tso <tytso@mit.edu>
Cc: Eric Paris <eparis@redhat.com>, linux-kernel@vger.kernel.org,
       malware-list@lists.printk.net, andi@firstfloor.org, riel@redhat.com,
       greg@kroah.com, viro@ZenIV.linux.org.uk, alan@lxorguk.ukuu.org.uk,
       peterz@infradead.org, hch@infradead.org
Subject: Re: TALPA - a threat model?  well sorta.
Message-ID: <20080813112149.2fda0fa4@infradead.org>
In-Reply-To: <20080813181549.GH8232@mit.edu>
References: <1218645375.3540.71.camel@localhost.localdomain>
	<20080813103951.1e3e5827@infradead.org>
	<20080813181549.GH8232@mit.edu>
Organization: Intel

On Wed, 13 Aug 2008 14:15:49 -0400
Theodore Tso <tytso@mit.edu> wrote:

> On Wed, Aug 13, 2008 at 10:39:51AM -0700, Arjan van de Ven wrote:
> > for the "dirty" case it gets muddy. You clearly want to scan "some
> > time" after the write, from the principle of getting rid of malware
> > that's on the disk, but it's unclear if this HAS to be synchronous.
> > (obviously, synchronous behavior hurts performance big time, so let's
> > do as little as we can of that without hurting the protection).
> 
> Something else to think about is what happens if the file is naturally
> written in pieces.  For example, I've been playing with bittorrent
> recently, and it appears that trackerd will do something... not very
> intelligent in that it will repeatedly try to index a file which is
> being written in pieces, and in some cases, it will do things like
> call pdftext that aren't exactly cheap.  A timeout *can* help (i.e.,
> don't try to scan/index this file until 15 minutes after the last
> write), but it won't help if the torrent is very large, or the
> download bitrate is very slow.  One very simple workaround is to
> disable trackerd altogether while you are downloading the file, but
> that's not a very pleasant solution; it's horribly manual.
> 
> Most of this may end up being outside of the kernel (i.e., some kind of
> interface where a bittorrent client can say, "look, this file is still
> being downloaded, so don't bother scanning it unless some process
> *other* than the bittorrent client tries to access the file").  And
> maybe there should be some other more complex policies, such as the
> bittorrent client explicitly telling the indexer/scanner that the file
> has been completely downloaded, so it's safe to index it now.
> 

> verification --- is very much a policy question where different system
> administrators will come down on different sides about what should and
> shouldn't be allowed --- and therefore this kind of policy decision
> should ****NOT**** be in the kernel.

Exactly. Even more, since this is async work, the order in which the
work gets scheduled is also a policy, and userland is again the right
place for that.
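
To make the "don't scan until N minutes after the last write" policy
from the quoted text concrete, here is a minimal userland sketch. It is
an illustration only, not code from this thread: the class name
QuietPeriodQueue, its methods, and the idea that write notifications
arrive with a timestamp are all assumptions.

```python
class QuietPeriodQueue:
    """Hypothetical userland policy: scan a file only once it has been quiet."""

    def __init__(self, quiet_seconds):
        self.quiet_seconds = quiet_seconds
        self.last_write = {}  # path -> time of the most recent write

    def note_write(self, path, now):
        # Every write pushes the earliest scan time for this path further out.
        self.last_write[path] = now

    def due(self, now):
        # Return the paths that have been quiet for at least quiet_seconds,
        # oldest write first, and remove them from the pending set.
        ready = sorted(
            (t, p) for p, t in self.last_write.items()
            if now - t >= self.quiet_seconds
        )
        for _, p in ready:
            del self.last_write[p]
        return [p for _, p in ready]
```

As the quoted text points out, a timer like this does not help with a
very large or very slow download, which is one more reason to keep the
choice of policy out of the kernel.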

> 
> > For efficiency the kernel ought to keep track of which files have
> > been declared clean, and it needs to keep track of a 'generation' of the
> > scan with which it has been found clean (so that if you update your
> > virus definitions, you can invalidate all previous scanning just by
> > bumping the 'generation' number in whatever format we use).
> 
> We have i_version support for NFSv4, so we have that already as far
> as the version of the file.  We can have a single bit which means
> "block on open" that is stored on a file, and some kind of policy
> which dictates whether or not any modification to the file contents
> should automatically set the bit.
> 
> However, questions of which version of virus database was used to scan
> a particular file should be stored outside of the filesystem, since

Well, I was assuming we only store this in memory (say, in the inode)
and just rescan the file if the in-memory inode has been destroyed.
I don't see the need for this to be persistent data; in fact I assume
(Eric, please confirm) that this data is not *supposed* to be
persistent.
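
A minimal sketch of that in-memory bookkeeping, modelled in Python
rather than kernel C. The names (CleanCache, mark_clean, needs_scan,
evict) are hypothetical, and keying on an inode number plus a file
version is an assumption based on the i_version remark in the quoted
text:

```python
class CleanCache:
    """Hypothetical model of a non-persistent, per-inode 'clean' marker."""

    def __init__(self):
        self.generation = 0  # bumped whenever the virus definitions change
        self.clean = {}      # inode number -> (file version, generation)

    def mark_clean(self, ino, file_version):
        # Record which file version was scanned, and under which generation.
        self.clean[ino] = (file_version, self.generation)

    def needs_scan(self, ino, file_version):
        # A scan is needed if the inode is unknown, the file has changed,
        # or the generation has moved on since the last scan.
        return self.clean.get(ino) != (file_version, self.generation)

    def bump_generation(self):
        # Invalidates every earlier verdict without touching the entries.
        self.generation += 1

    def evict(self, ino):
        # Models the in-memory inode being destroyed: the verdict is lost
        # and the file is simply rescanned on next use.
        self.clean.pop(ino, None)
```

Nothing here is written to disk, so a reboot or an inode eviction only
costs a rescan.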


> each product will have its own version namespace, and the questions of
> what happens if a user switches from one version checker to another is

Yes, that's a hard question; what if you have 2 virus scanners active?

(They could each register a version of their database with the kernel,
and the in-kernel version cookie could be a hash of all registered
versions, I suppose. If anything ever changes, we just rehash, and files
get rescanned as needed.)
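
A minimal sketch of such a cookie, again as an illustration only. The
function name version_cookie, the use of SHA-256, and the mapping of
scanner name to database version string are assumptions, not an
existing kernel interface:

```python
import hashlib


def version_cookie(registered):
    """Hash all registered scanner database versions into one cookie.

    registered: dict mapping scanner name -> database version string
    (a hypothetical registration format).
    """
    h = hashlib.sha256()
    for name in sorted(registered):  # sort so registration order is irrelevant
        for field in (name, registered[name]):
            data = field.encode("utf-8")
            # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
            h.update(len(data).to_bytes(4, "big"))
            h.update(data)
    return h.hexdigest()
```

Any change (a scanner registering, unregistering, or updating its
database) yields a different cookie, so files marked clean under the
old cookie no longer match and are rescanned.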

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org