2008-02-06 01:49:52

by Clem Taylor

Subject: inotify_add_watch() returning ENOSPC in 2.6.24 [watch descriptor leak?]

I'm trying to move a MIPS based embedded system from 2.6.16.16 to
2.6.24. Most things seem to be working, but I'm having trouble with
inotify. The code is using inotify to detect a file written to /tmp
(tmpfs). The writer creates a file with a temporary name and then
rename()s the tmp file over the file I'm monitoring.

With 2.6.16.16, everything works fine, but with 2.6.24, the inotify
process runs for a while (~100 events) and then inotify_add_watch()
returns ENOSPC. Once this happens, I can't add new watches, even if I
kill the process and restart it. fs.inotify.max_user_instances and
fs.inotify.max_user_watches are both 128, so I'd imagine I'm hitting
this limit. For some reason the watches aren't getting cleaned up
(even after the process is killed).
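For anyone who wants to check these limits from a program rather than with sysctl, here is a minimal sketch. It assumes the fs.inotify.* sysctls are exposed under /proc/sys/fs/inotify/, which is the usual procfs mapping; the helper name read_inotify_limit is made up for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper: return the value stored in
 * /proc/sys/fs/inotify/<name> (for example "max_user_watches"),
 * or -1 if the file cannot be opened or parsed.
 */
long read_inotify_limit(const char *name)
{
    char path[128];
    long value = -1;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/sys/fs/inotify/%s", name);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling read_inotify_limit("max_user_watches") and read_inotify_limit("max_user_instances") gives the per-user ceilings that inotify_add_watch() and inotify_init() are checked against.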

In a loop, the code is doing:
    wd = inotify_add_watch(fd, file, IN_CLOSE_WRITE|IN_DELETE_SELF|IN_ONESHOT);
    read(fd, buf, sizeof(buf));  /* blocks until an event arrives */

Has something changed in the inotify API since 2.6.16.16, or could
this be a leak?

--Clem


2008-02-06 09:51:05

by Andrew Morton

Subject: Re: inotify_add_watch() returning ENOSPC in 2.6.24 [watch descriptor leak?]

On Tue, 5 Feb 2008 20:49:42 -0500 "Clem Taylor" <[email protected]> wrote:

> I'm trying to move a MIPS based embedded system from 2.6.16.16 to
> 2.6.24. Most things seem to be working, but I'm having trouble with
> inotify. The code is using inotify to detect a file written to /tmp
> (tmpfs). The writer creates a file with a temporary name and then
> rename()s the tmp file over the file I'm monitoring.
>
> With 2.6.16.16, everything works fine, but with 2.6.24, the inotify
> process runs for a while (~100 events) and then inotify_add_watch()
> returns ENOSPC. Once this happens, I can't add new watches, even if I
> kill the process and restart it. fs.inotify.max_user_instances and
> fs.inotify.max_user_watches are both 128, so I'd imagine I'm hitting
> this limit. For some reason the watches aren't getting cleaned up
> (even after the process is killed).
>
> In a loop, the code is doing:
> wd = inotify_add_watch(fd, file, IN_CLOSE_WRITE|IN_DELETE_SELF|IN_ONESHOT);
> blocking read on notify fd
>
> Has something changed in the inotify API since 2.6.16.16, or could
> this be a leak?
>

Good bug report, thanks. That code was significantly altered in June 2006
and perhaps something broke.

It's a bit hard to find people who work on inotify, I'm afraid. If you had
the time to come up with a script or program which demonstrates the bug,
that would be super-helpful.

2008-02-06 19:41:17

by Clem Taylor

Subject: Re: inotify_add_watch() returning ENOSPC in 2.6.24 [watch descriptor leak?]

On Feb 6, 2008 4:51 AM, Andrew Morton <[email protected]> wrote:
> On Tue, 5 Feb 2008 20:49:42 -0500 "Clem Taylor" <[email protected]> wrote:
> > I'm trying to move a MIPS based embedded system from 2.6.16.16 to
> > 2.6.24. Most things seem to be working, but I'm having trouble with
> > inotify. The code is using inotify to detect a file written to /tmp
> > (tmpfs). The writer creates a file with a temporary name and then
> > rename()s the tmp file over the file I'm monitoring.
> >
> > With 2.6.16.16, everything works fine, but with 2.6.24, the inotify
> > process runs for a while (~100 events) and then inotify_add_watch()
> > returns ENOSPC. Once this happens, I can't add new watches, even if I
> > kill the process and restart it. fs.inotify.max_user_instances and
> > fs.inotify.max_user_watches are both 128, so I'd imagine I'm hitting
> > this limit. For some reason the watches aren't getting cleaned up
> > (even after the process is killed).

> Good bug report, thanks. That code was significantly altered in June 2006
> and perhaps something broke.

I also tested on a 2.6.20 x86 desktop machine. It took ~8k iterations
to fail, which matched max_user_watches. Once the program fails, it
will fail right away if it is re-run.

> It's a bit hard to find people who work on inotify, I'm afraid. If you had
> the time to come up with a script or program which demonstrates the bug,
> that would be super-helpful.

Attached is a simple example that shows off the problem. On a system
with a problem, it will only run for about
fs.inotify.max_user_watches iterations. If everything is working, it
should run forever.
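The attachment itself is not reproduced here. As a rough idea of what such a test can look like, the sketch below repeats the add-watch/read cycle described at the top of the thread. It is not the attached inotifyLeak.c: the function name oneshot_cycles is invented, and for simplicity it triggers IN_CLOSE_WRITE itself by opening and closing the watched file rather than renaming a temporary file over it, so it may not exercise exactly the same kernel path as the original program.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `iterations` one-shot watch cycles against
 * `path`, which must already exist. Returns the number of completed
 * cycles, or -1 if inotify_init() fails.
 */
int oneshot_cycles(const char *path, int iterations)
{
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    int done = 0;
    int fd = inotify_init();

    if (fd < 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        int wd = inotify_add_watch(fd, path,
                IN_CLOSE_WRITE | IN_DELETE_SELF | IN_ONESHOT);
        if (wd < 0) {
            /* ENOSPC here means the per-user watch limit was hit. */
            perror("inotify_add_watch");
            break;
        }

        /* Generate IN_CLOSE_WRITE on the watched file ourselves. */
        int wfd = open(path, O_WRONLY);
        if (wfd < 0)
            break;
        close(wfd);

        /* Blocking read; skip other events (such as IN_IGNORED for an
         * earlier watch) until this cycle's event has been seen. */
        int seen = 0;
        while (!seen) {
            ssize_t len = read(fd, buf, sizeof(buf));
            if (len <= 0)
                goto out;
            for (char *p = buf; p < buf + len; ) {
                struct inotify_event *ev = (struct inotify_event *)p;
                if (ev->wd == wd && (ev->mask & IN_CLOSE_WRITE))
                    seen = 1;
                p += sizeof(*ev) + ev->len;
            }
        }
        done++;
    }
out:
    close(fd);
    return done;
}
```

If one-shot watches are released properly, the function should complete every requested cycle no matter how the count compares to fs.inotify.max_user_watches; a return value that stalls near that limit points at watches not being freed.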

Thanks,
Clem


Attachments:
inotifyLeak.c (3.14 kB)

2008-02-07 03:04:22

by Amy Griffis

Subject: Re: inotify_add_watch() returning ENOSPC in 2.6.24 [watch descriptor leak?]

Clem Taylor wrote: [Wed Feb 06 2008, 02:40:58PM EST]
> > Good bug report, thanks. That code was significantly altered in June 2006
> > and perhaps something broke.
>
> I also tested on a 2.6.20 x86 desktop machine. It took ~8k iterations
> to fail, which matched max_user_watches. Once the program fails, it
> will fail right away if it is re-run.
>
> > It's a bit hard to find people who work on inotify, I'm afraid. If you had
> > the time to come up with a script or program which demonstrates the bug,
> > that would be super-helpful.
>
> Attached is a simple example that shows off the problem. On a system
> with a problem, it will only run for about
> fs.inotify.max_user_watches iterations. If everything is working, it
> should run forever.

I'll take a look at this. Thanks for providing a reproducer.

Amy

2008-02-07 18:54:35

by Ulisses Furquim

Subject: Re: inotify_add_watch() returning ENOSPC in 2.6.24 [watch descriptor leak?]

Hi,

On Feb 6, 2008 4:40 PM, Clem Taylor <[email protected]> wrote:
> I also tested on a 2.6.20 x86 desktop machine. It took ~8k iterations
> to fail, which matched max_user_watches. Once the program fails, it
> will fail right away if it is re-run.

Yeah, I had the same results, and it fails afterwards because it
reaches the maximum number of watches per user.

> Attached is a simple example that shows off the problem. On a system
> with a problem, it will only run for about
> fs.inotify.max_user_watches iterations. If everything is working, it
> should run forever.

Ok, I had a go with it and found the problem. We weren't releasing
one-shot watches because the test for them was wrong. We're using the
event's mask to test for one-shot watches when we should've been using
the watch's mask.
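To illustrate the difference between the two tests, here is a small user-space sketch. It is not the kernel code or the attached patch: struct sketch_watch and both function names are made up, and it assumes the mask delivered with an event carries only the event bits (for example IN_CLOSE_WRITE) and never IN_ONESHOT.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/inotify.h>

/* Hypothetical stand-in for a watch; only the requested mask matters. */
struct sketch_watch {
    uint32_t mask;   /* flags passed to inotify_add_watch() */
};

/* Wrong test: looks for IN_ONESHOT in the event's mask, where (under
 * the assumption above) it is never set, so nothing is released. */
bool should_release_buggy(const struct sketch_watch *w, uint32_t event_mask)
{
    (void)w;
    return (event_mask & IN_ONESHOT) != 0;
}

/* Right test: looks for IN_ONESHOT in the watch's own mask. */
bool should_release_fixed(const struct sketch_watch *w, uint32_t event_mask)
{
    (void)event_mask;
    return (w->mask & IN_ONESHOT) != 0;
}
```

With a watch created as IN_CLOSE_WRITE|IN_ONESHOT and an incoming IN_CLOSE_WRITE event, the first test says "keep the watch" and the second says "release it", which matches the leak described above.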

Patch against latest Linus git repo attached.

Regards,

-- Ulisses


Attachments:
patch (1.01 kB)

2008-02-07 21:25:49

by Clem Taylor

Subject: Re: inotify_add_watch() returning ENOSPC in 2.6.24 [watch descriptor leak?]

On Feb 7, 2008 1:54 PM, Ulisses Furquim <[email protected]> wrote:
> Ok, I had a go with it and found the problem. We weren't releasing
> one-shot watches because the test for them was wrong. We're using the
> event's mask to test for one-shot watches when we should've been using
> the watch's mask.

Thanks, this patch seems to fix the problem.

--Clem

2008-02-07 21:44:58

by Andrew Morton

Subject: Re: inotify_add_watch() returning ENOSPC in 2.6.24 [watch descriptor leak?]

On Thu, 7 Feb 2008 16:24:15 -0500
"Clem Taylor" <[email protected]> wrote:

> On Feb 7, 2008 1:54 PM, Ulisses Furquim <[email protected]> wrote:
> > Ok, I had a go with it and found the problem. We weren't releasing
> > one-shot watches because the test for them was wrong. We're using the
> > event's mask to test for one-shot watches when we should've been using
> > the watch's mask.
>
> Thanks, this patch seems to fix the problem.
>

Awesome. Thanks, guys.