2004-11-08 08:53:04

by Ravikiran G Thirumalai

Subject: unix_gc and file_count

I am trying to clean up uses of file_count in 2.6 as suggested by Viro
earlier. Biggest hurdle is unix_gc. I am trying to get rid of file_count in
unix_gc. But I am not sure if I understand unix_gc very well.
I thought I'd put out questions I've been having...
Appreciate if someone can clarify.

Here goes:

unix_gc pushes a root set onto a stack. This root set is not to be garbage
collected. Root set is determined by the condition:
if (open_count > atomic_read(&unix_sk(s)->inflight))
From my understanding, for a unix socket, open_count cannot be less than
inflight, simply because unix_*_sendmsg bumps up the f_count of each struct
file of the passed fd array through scm_send. inflight is later bumped up at
unix_attach_fds for unix sockets in the fd[] payload. unix_*_recvmsg bumps
down inflight for unix sockets of the payload fd[] and later does a fput for
all fds of the payload fd array. Hence, the condition for sockets to
be GC'ed is open_count == inflight. If open_count is positive for a GC-able
socket, the count exists only because the socket is part of an fd payload
waiting to be received. Some of the sockets with open_count == inflight
may not be GC'ed if they happen to be in the receive queue as an fd payload
of an in-use unix socket (on the stack). All sockets which can be GC'ed will
be in the hash list with unix_sk(s)->gc_tree == GC_ORPHAN.
unix_gc then walks through all the unix sockets in the hashtable, and
processes sockets marked for gc (GC_ORPHAN). unix_gc frees up the skbs in the
receive queue of these unix sockets which have a fd[] payload on that skb.
Other skbs are left as is.
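
The marking pass described above can be modelled in userspace. This is only a toy sketch, not kernel code: struct toy_sock, toy_mark() and the queued[] index array are hypothetical names invented for illustration, and the real unix_gc walks the socket hash table and skb queues instead.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SOCKS 8

/* Toy stand-in for an AF_UNIX socket; not the kernel's structure. */
struct toy_sock {
    int open_count;         /* models f_count of the socket's struct file */
    int inflight;           /* references held by queued SCM_RIGHTS datagrams */
    int queued[MAX_SOCKS];  /* indices of sockets whose fds sit in our receive queue */
    int nqueued;
    bool orphan;            /* models gc_tree == GC_ORPHAN after marking */
};

/* Mark phase: returns how many sockets are left as orphans. */
int toy_mark(struct toy_sock *s, int n)
{
    int stack[MAX_SOCKS];
    int top = 0;
    int orphans = 0;

    assert(n <= MAX_SOCKS);

    /* Root set: sockets held open by something other than in-flight datagrams. */
    for (int i = 0; i < n; i++) {
        s[i].orphan = true;
        if (s[i].open_count > s[i].inflight) {
            s[i].orphan = false;
            stack[top++] = i;
        }
    }

    /* An fd queued on a reachable socket may still be received, so keep it. */
    while (top > 0) {
        int cur = stack[--top];
        for (int k = 0; k < s[cur].nqueued; k++) {
            int j = s[cur].queued[k];
            if (s[j].orphan) {
                s[j].orphan = false;   /* each socket is pushed at most once */
                stack[top++] = j;
            }
        }
    }

    for (int i = 0; i < n; i++)
        if (s[i].orphan)
            orphans++;
    return orphans;
}
```

In this model, a socket with open_count == inflight stays an orphan unless it is reachable from the root set through some receive queue.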

1. Even if gc doesn't garbage collect the fd[] payload skbs, they'll be later
freed by unix_release through __fput when the f_count for that socket goes to 0.
Isn't a GC just for fd[] payloads which will anyway be freed wasteful? I
probably am missing something here........
2. Now, other skbs are left as is. What is the need for just the fd[] payload
to be freed? Either free all skbs or free none at all, no?
3. If a process does a sendmsg with an SCM_RIGHTS payload and the process which
is supposed to do the recvmsg dies, will the f_count of the payload fd[] not hang
around forever? Only the fds which are unix sockets will have their skbs carrying
an fd[] payload cleaned up. All struct files of the payload remain....

Appreciate if someone can answer the above/point out gaps in my understanding.

Thanks,
Kiran


2004-11-08 10:35:53

by Al Viro

Subject: Re: unix_gc and file_count

On Mon, Nov 08, 2004 at 02:18:25PM +0530, Ravikiran G Thirumalai wrote:
> unix_gc then walks through all the unix sockets in the hashtable, and
> processes sockets marked for gc (GC_ORPHAN). unix_gc frees up the skbs in the
> receive queue of these unix sockets which have a fd[] payload on that skb.
> Other skbs are left as is.

> 1. Even if gc doesn't garbage collect the fd[] payload skbs, they'll be later
> freed by unix_release through __fput when the f_count for that socket goes to 0.
> Isn't a GC just for fd[] payloads which will anyway be freed wasteful? I
> probably am missing something here........

You are. Put descriptor of AF_UNIX socket into an SCM_RIGHTS datagram.
Send that datagram to that socket. Close all descriptors you had opened.

We won't get the final fput() until all references to struct file are
gone. We won't get all references gone until one in SCM_RIGHTS datagram
is gone. I.e. that datagram has to die *before* we get to unix_release().
I.e. the only thing that can trigger the whole thing is GC.

That's why we need the damn thing in the first place. And that's why
socket and non-socket cases are different.
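
For reference, the recipe above can be reproduced from userspace roughly as follows. This is a sketch under assumptions: send_fd() and strand_socket_in_own_queue() are made-up helper names, and a socketpair() is used as one convenient way to get a datagram carrying a socket's descriptor into that same socket's queue (the datagram sent through sv[0] lands in the queue of sv[1] and carries sv[1]). What the kernel does afterwards is not observable from this code.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one byte over `via`, carrying descriptor `fd` as SCM_RIGHTS data. */
int send_fd(int via, int fd)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;          /* forces correct alignment of buf */
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(via >= 0 && fd >= 0);

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(via, &msg, 0) == 1 ? 0 : -1;
}

/* Queue sv[1]'s own descriptor on sv[1], then close every descriptor. */
int strand_socket_in_own_queue(void)
{
    int sv[2];
    int rc;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    rc = send_fd(sv[0], sv[1]);   /* datagram lands in sv[1]'s receive queue */
    close(sv[0]);
    close(sv[1]);                 /* no descriptors left; the queued datagram
                                     still holds a reference to sv[1]'s file */
    return rc;
}
```

After strand_socket_in_own_queue() returns 0, the only remaining reference to the file behind sv[1] is the one inside the datagram queued on that same socket, which is the situation described above.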

Funnier case to look at:

fd1 and fd2 are AF_UNIX sockets.
get an SCM_RIGHTS datagram with fd1 in it into the queue of fd2.
get an SCM_RIGHTS datagram with fd2 in it into the queue of fd1.
close all opened descriptors (or just exit)

We have two struct file, each with ->f_count == 1. Each has an AF_UNIX
socket associated with it (with inflight == 1). And the only reference
to either struct file is sitting in skb in the receiving queue of another
one.

Current algorithm is obviously correct: GC_ORPHAN is set on the sockets
that will never have anything received from them. Since we know that
we'll never receive pending SCM_RIGHTS datagrams in their queues, we can
pull all such datagrams out and kill them, which will release the references
to files held by them. And that's all we can kill - a datagram in queue
of non-GC_ORPHAN socket could be eventually received, so we can't drop the
struct file references it carries.
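
The fd1/fd2 cycle and the effect of killing a queued datagram can be followed in a small reference-counting model. This is a hedged toy, not the kernel implementation: struct toy_state, toy_fput(), toy_release() and toy_gc_purge() are hypothetical names, and each socket's queue is reduced to at most one datagram carrying one file reference.

```c
#include <assert.h>
#include <stdbool.h>

#define NFILES 2

/* File i is an AF_UNIX socket whose queue holds one datagram carrying a
 * reference to file queued_ref[i], or -1 when the queue is empty. */
struct toy_state {
    int f_count[NFILES];
    int queued_ref[NFILES];
    bool released[NFILES];
};

void toy_fput(struct toy_state *st, int i);

/* Models the release path: runs only on the final fput and purges the queue. */
void toy_release(struct toy_state *st, int i)
{
    int ref = st->queued_ref[i];

    st->released[i] = true;
    st->queued_ref[i] = -1;
    if (ref >= 0)
        toy_fput(st, ref);
}

/* Drop one reference; release the socket when the count reaches zero. */
void toy_fput(struct toy_state *st, int i)
{
    assert(st->f_count[i] > 0);
    if (--st->f_count[i] == 0)
        toy_release(st, i);
}

/* Models the sweep: kill the SCM_RIGHTS datagram queued on an orphan socket. */
void toy_gc_purge(struct toy_state *st, int orphan)
{
    int ref = st->queued_ref[orphan];

    st->queued_ref[orphan] = -1;
    if (ref >= 0)
        toy_fput(st, ref);
}
```

Starting from f_count = {2, 2} and queued_ref = {1, 0}, closing both descriptors with toy_fput() leaves each count at 1 and nothing released. A single toy_gc_purge() on either socket then drops the other file to zero, whose release in turn drops the first.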