[9p and sunrpc folks added to Cc]
On Thu, Jul 02, 2015 at 04:20:42AM +0100, Al Viro wrote:
> On Wed, Jul 01, 2015 at 07:44:08PM +0100, Al Viro wrote:
> > Mismatched reply could also be a possibility, but only if we end up
> > sending more than one request with the same tag without waiting for the
> > response to the first one.
>
> ... and I think I see what's going on. Tags are 16bit. Suppose the
> server stalls for some reason *and* we keep piling the requests up.
> New tags keep being grabbed by this:
>
> 	tag = P9_NOTAG;
> 	if (type != P9_TVERSION) {
> 		tag = p9_idpool_get(c->tagpool);
> 		if (tag < 0)
> 			return ERR_PTR(-ENOMEM);
> 	}
> tag is int here. Then we pass tag to
> req = p9_tag_alloc(c, tag, req_size);
> and that's what sets req->tc->tag. OK, but... The argument of p9_tag_alloc()
> is u16, so after 2^16 pending requests we'll wrap around. p9_idpool_get()
> will happily return values greater than 65535 - it's using idr and it's
> used (with different pools) for 16bit tags and 32bit FIDs.
>
> Now, p9_tag_alloc(c, 65539, max_size) will return the same req we'd got from
> p9_tag_alloc(c, 3, max_size). And we are fucked - as far as the server is
> concerned, we'd just sent another request with tag 3. And on the client
> there are two threads waiting for responses on the same p9_req_t. Both
> happen to be TWRITE. Response to the first request arrives and we happen
> to let the second thread go at it first. Voila - the first request had
> been for page-sized write() and got successfully handled. The _second_ one
> had been short and is very surprised to see confirmation of 4Kb worth of
> data having been written.
>
> It should be easy to confirm - in p9_client_prepare_req() add
> 	if (WARN_ON_ONCE(tag != (u16)tag)) {
> 		p9_idpool_put(tag, c->tagpool);
> 		return ERR_PTR(-ENOMEM);
> 	}
> right after
> 	tag = p9_idpool_get(c->tagpool);
> 	if (tag < 0)
> 		return ERR_PTR(-ENOMEM);
>
> and see if it triggers. I'm not sure if failing with ENOMEM is the
> right response (another variant is to sleep there until the pile
> gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely
> not for the real work, but it will do for confirming that this is what
> we are hitting.
FWIW, we probably would be better off with throttling rather than ENOMEM
in such situations. I'm not familiar enough with sunrpc to be sure how
to do that the right way (note that the RPC equivalent of 9P tags is 32-bit, so
the throttling there is based on memory shortage rather than running out
of XID space), but the interesting issues should be similar - potential
deadlocks in near-OOM situations. Suggestions?
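
To make the suspected truncation concrete, here's a throwaway userspace
illustration (toy stand-in, not the real allocator): a tag of 65539 handed
out by the idpool is already 3 by the time p9_tag_alloc() sees it.

#include <stdint.h>
#include <stdio.h>

/* toy stand-in for p9_tag_alloc(): the real one takes the tag as u16,
 * while p9_idpool_get() returns an int, so anything above 65535 is
 * silently truncated and collides with a low tag that may still be
 * in flight */
static void p9_tag_alloc_stub(uint16_t tag)
{
	printf("p9_tag_alloc() sees tag %u\n", tag);
}

int main(void)
{
	p9_tag_alloc_stub(3);		/* first request: tag 3 */
	p9_tag_alloc_stub(65539);	/* 2^16 + 3 from the idpool: shows up as 3 again */
	return 0;
}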
[repeating, since my previous email didn't reach mailing lists]
2015-07-02 7:10 GMT+03:00 Al Viro <[email protected]>:
>> It should be easy to confirm - in p9_client_prepare_req() add
>> if (WARN_ON_ONCE(tag != (u16)tag)) {
>> p9_idpool_put(tag, c->tagpool);
>> return ERR_PTR(-ENOMEM);
>> }
>> right after
>> tag = p9_idpool_get(c->tagpool);
>> if (tag < 0)
>> return ERR_PTR(-ENOMEM);
>>
>> and see if it triggers. I'm not sure if failing with ENOMEM is the
>> right response (another variant is to sleep there until the pile
>> gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely
>> not for the real work, but it will do for confirming that this is what
>> we are hitting.
>
Apparently, I'm seeing something else. That WARN_ON_ONCE didn't trigger.
On Thu, Jul 02, 2015 at 10:19:07AM +0300, Andrey Ryabinin wrote:
> On 07/02/2015 07:10 AM, Al Viro wrote:
> >>
> >> It should be easy to confirm - in p9_client_prepare_req() add
> >> if (WARN_ON_ONCE(tag != (u16)tag)) {
> >> p9_idpool_put(tag, c->tagpool);
> >> return ERR_PTR(-ENOMEM);
> >> }
> >> right after
> >> tag = p9_idpool_get(c->tagpool);
> >> if (tag < 0)
> >> return ERR_PTR(-ENOMEM);
> >>
> >> and see if it triggers. I'm not sure if failing with ENOMEM is the
> >> right response (another variant is to sleep there until the pile
> >> gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely
> >> not for the real work, but it will do for confirming that this is what
> >> we are hitting.
> >
>
> Apparently, I'm seeing something else. That WARN_ON_ONCE didn't trigger.
While the one in p9_client_write() (on rsize < count) did?
On Thu, Jul 02, 2015 at 10:50:05AM +0300, Andrey Ryabinin wrote:
> >> and see if it triggers. I'm not sure if failing with ENOMEM is the
> >> right response (another variant is to sleep there until the pile
> >> gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely
> >> not for the real work, but it will do for confirming that this is what
> >> we are hitting.
> >
>
> Apparently, I'm seeing something else. That WARN_ON_ONCE didn't trigger.
Summary for those who'd missed the beginning of the thread: what we are
seeing is p9_client_write() issuing TWRITE and getting RWRITE in reply
(tags match, packets look plausible) with count in RWRITE way more than
that in TWRITE.
IOW, we are telling the server to write e.g. 93 bytes and are getting told
that yes, the write had been successful - all 4096 bytes of it.
qemu virtio-9p for server; from my reading of the qemu side of things, it
can't be sending a reply with count greater than that in the request.
The bug I suspected to be the cause of that is in tag allocation in
net/9p/client.c - we could end up wrapping around 2^16 with enough pending
requests and that would have triggered that kind of mess. However, Andrey
doesn't see that test (tag wraparound in p9_client_prepare_req()) trigger.
BTW, was that on the run where debugging printk in p9_client_write() *did*
trigger?
On 07/02/2015 10:59 AM, Al Viro wrote:
> On Thu, Jul 02, 2015 at 10:50:05AM +0300, Andrey Ryabinin wrote:
>
>>>> and see if it triggers. I'm not sure if failing with ENOMEM is the
>>>> right response (another variant is to sleep there until the pile
>>>> gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely
>>>> not for the real work, but it will do for confirming that this is what
>>>> we are hitting.
>>>
>>
>> Apparently, I'm seeing something else. That WARN_ON_ONCE didn't trigger.
>
> Summary for those who'd missed the beginning of the thread: what we are
> seeing is p9_client_write() issuing TWRITE and getting RWRITE in reply
> (tags match, packets look plausible) with count in RWRITE way more than
> that in TWRITE.
>
> IOW, we are telling the server to write e.g. 93 bytes and are getting told
> that yes, the write had been successful - all 4096 bytes of it.
>
> qemu virtio-9p for server; from my reading of the qemu side of things, it
> can't be sending a reply with count greater than that in the request.
Besides qemu, I've also tried kvmtool with the same result. IOW I'm seeing
this under kvmtool as well. It just takes a bit longer to reproduce
this in kvmtool.
> The bug I suspected to be the cause of that is in tag allocation in
> net/9p/client.c - we could end up wrapping around 2^16 with enough pending
> requests and that would have triggered that kind of mess. However, Andrey
> doesn't see that test (tag wraparound in p9_client_prepare_req()) trigger.
> BTW, was that on the run where debugging printk in p9_client_write() *did*
> trigger?
Yes, WARN_ON_ONCE() in p9_client_prepare_req() didn't trigger,
but debug printk in p9_client_write() *did* trigger.
On Thu, Jul 02, 2015 at 11:19:03AM +0300, Andrey Ryabinin wrote:
> Besides qemu, I've also tried kvmtool with the same result. IOW I'm seeing
> this under kvmtool as well. It just takes a bit longer to reproduce
> this in kvmtool.
>
> > The bug I suspected to be the cause of that is in tag allocation in
> > net/9p/client.c - we could end up wrapping around 2^16 with enough pending
> > requests and that would have triggered that kind of mess. However, Andrey
> > doesn't see that test (tag wraparound in p9_client_prepare_req()) trigger.
> > BTW, was that on the run where debugging printk in p9_client_write() *did*
> > trigger?
>
> Yes, WARN_ON_ONCE() in p9_client_prepare_req() didn't trigger,
> but debug printk in p9_client_write() *did* trigger.
Bloody wonderful... Could you check if v9fs_write() in qemu
hw/9pfs/virtio-9p.c ever gets to
offset = 7;
err = pdu_marshal(pdu, offset, "d", total);
with total > count on your testcase?
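
For reference: offset 7 is where the count[4] field of the RWRITE reply
starts - size[4] + type[1] + tag[2] make up the first 7 bytes of any 9P
message. A throwaway userspace sketch of that layout (little-endian host
assumed for brevity, values made up):

#include <stdint.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
	unsigned char pdu[11];
	uint32_t size = 11, count = 93;	/* made-up values */
	uint16_t tag = 3;
	uint8_t type = 119;		/* Rwrite */

	memcpy(pdu + 0, &size, 4);	/* size[4] */
	memcpy(pdu + 4, &type, 1);	/* type[1] */
	memcpy(pdu + 5, &tag, 2);	/* tag[2] */
	memcpy(pdu + 7, &count, 4);	/* count[4] - the "offset = 7" above */

	printf("count lives at offset 7, value %u\n", (unsigned)count);
	return 0;
}

So total > count in that pdu_marshal() call would mean the server itself
claims to have written more than it was asked to.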
On Thu, Jul 02, 2015 at 09:25:30AM +0100, Al Viro wrote:
> On Thu, Jul 02, 2015 at 11:19:03AM +0300, Andrey Ryabinin wrote:
> > Besides qemu, I've also tried kvmtool with the same result. IOW I'm seeing
> > this under kvmtool as well. It just takes a bit longer to reproduce
> > this in kvmtool.
> >
> > > The bug I suspected to be the cause of that is in tag allocation in
> > > net/9p/client.c - we could end up wrapping around 2^16 with enough pending
> > > requests and that would have triggered that kind of mess. However, Andrey
> > > doesn't see that test (tag wraparound in p9_client_prepare_req()) trigger.
> > > BTW, was that on the run where debugging printk in p9_client_write() *did*
> > > trigger?
> >
> > Yes, WARN_ON_ONCE() in p9_client_prepare_req() didn't trigger,
> > but debug printk in p9_client_write() *did* trigger.
>
> Bloody wonderful... Could you check if v9fs_write() in qemu
> hw/9pfs/virtio-9p.c ever gets to
> offset = 7;
> err = pdu_marshal(pdu, offset, "d", total);
> with total > count on your testcase?
Another thing that might be worth checking: in p9_tag_alloc() (net/9p/client.c)
before
req->status = REQ_STATUS_ALLOC;
check that req->status == REQ_STATUS_IDLE and yell if it isn't.
BTW, the loop in there (
/* check again since original check was outside of lock */
while (tag >= c->max_tag) {
) looks fishy. If we get more than P9_ROW_MAXTAG allocations at once,
we'll have trouble, but I doubt that this is what we are hitting. In any
case, adding WARN_ON(c->reqs[row]); right after
row = (tag / P9_ROW_MAXTAG);
wouldn't hurt. I would be very surprised if that one triggered, though.
On 07/02/2015 11:42 AM, Al Viro wrote:
> On Thu, Jul 02, 2015 at 09:25:30AM +0100, Al Viro wrote:
>> On Thu, Jul 02, 2015 at 11:19:03AM +0300, Andrey Ryabinin wrote:
>>> Besides qemu, I've also tried kvmtool with the same result. IOW I'm seeing
>>> this under kvmtool as well. It just takes a bit longer to reproduce
>>> this in kvmtool.
>>>
>>>> The bug I suspected to be the cause of that is in tag allocation in
>>>> net/9p/client.c - we could end up wrapping around 2^16 with enough pending
>>>> requests and that would have triggered that kind of mess. However, Andrey
>>>> doesn't see that test (tag wraparound in p9_client_prepare_req()) trigger.
>>>> BTW, was that on the run where debugging printk in p9_client_write() *did*
>>>> trigger?
>>>
>>> Yes, WARN_ON_ONCE() in p9_client_prepare_req() didn't trigger,
>>> but debug printk in p9_client_write() *did* trigger.
>>
>> Bloody wonderful... Could you check if v9fs_write() in qemu
>> hw/9pfs/virtio-9p.c ever gets to
>> offset = 7;
>> err = pdu_marshal(pdu, offset, "d", total);
>> with total > count on your testcase?
Added:
+	if (total > count)
+		*(char *)0 = 0;
and never hit this condition.
>
> Another thing that might be worth checking: in p9_tag_alloc() (net/9p/client.c)
> before
> req->status = REQ_STATUS_ALLOC;
> check that req->status == REQ_STATUS_IDLE and yell if it isn't.
>
diff --git a/net/9p/client.c b/net/9p/client.c
index 6f4c4c8..16a17a0 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -286,6 +286,8 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
 	p9pdu_reset(req->rc);
 
 	req->tc->tag = tag-1;
+	if (WARN_ON(req->status != REQ_STATUS_IDLE))
+		pr_err("req->status: %d\n", req->status);
 	req->status = REQ_STATUS_ALLOC;
 
 	return req;
[ 150.155020] ------------[ cut here ]------------
[ 150.156700] WARNING: CPU: 2 PID: 2304 at ../net/9p/client.c:289 p9_client_prepare_req+0x3b0/0x550()
[ 150.158404] Modules linked in:
[ 150.160177] CPU: 2 PID: 2304 Comm: trinity-c84 Not tainted 4.1.0-rc8+ #409
[ 150.161794] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[ 150.165540] 0000000000000009 ffff8801ed13f7a8 ffffffff8161434b 0000000000000000
[ 150.170939] 0000000000000000 ffff8801ed13f7f8 ffffffff8107cf99 ffff8801f451d5f0
[ 150.175942] ffffffff815f6760 0000000000000003 ffff8800bbac00e0 ffff8800bbac00f0
[ 150.178393] Call Trace:
[ 150.178883] [<ffffffff8161434b>] dump_stack+0x45/0x57
[ 150.179914] [<ffffffff8107cf99>] warn_slowpath_common+0x99/0xe0
[ 150.181375] [<ffffffff815f6760>] ? p9_client_prepare_req+0x3b0/0x550
[ 150.182597] [<ffffffff8107d145>] warn_slowpath_null+0x15/0x20
[ 150.184067] [<ffffffff815f6760>] p9_client_prepare_req+0x3b0/0x550
[ 150.185043] [<ffffffff815fb1e2>] p9_client_zc_rpc.constprop.5+0xe2/0x730
[ 150.186229] [<ffffffff8161ba46>] ? _raw_spin_unlock+0x16/0x70
[ 150.187049] [<ffffffff815fb100>] ? p9_client_xattrwalk+0x1b0/0x1b0
[ 150.188477] [<ffffffff812b1b5e>] ? idr_remove+0x2ce/0x420
[ 150.189443] [<ffffffff815f5af0>] ? v9fs_unregister_trans+0x70/0x70
[ 150.190456] [<ffffffff812b1890>] ? idr_mark_full+0x80/0x80
[ 150.191489] [<ffffffff815f5af0>] ? v9fs_unregister_trans+0x70/0x70
[ 150.193911] [<ffffffff8161bae7>] ? _raw_spin_unlock_irqrestore+0x47/0xb0
[ 150.195462] [<ffffffff815fcae4>] ? p9_idpool_put+0x54/0x60
[ 150.196729] [<ffffffff812c9906>] ? iov_iter_advance+0xb6/0x240
[ 150.199766] [<ffffffff815fbea3>] p9_client_write+0x333/0x3d0
[ 150.201073] [<ffffffff811a882e>] ? kasan_kmalloc+0x5e/0x70
[ 150.202512] [<ffffffff815fbb70>] ? p9_client_readdir+0x340/0x340
[ 150.204115] [<ffffffff811c80dd>] ? rw_copy_check_uvector+0xed/0x170
[ 150.204960] [<ffffffff812d82d6>] ? __percpu_counter_add+0x26/0xb0
[ 150.206517] [<ffffffff8113574a>] ? generic_write_checks+0xfa/0x1e0
[ 150.208092] [<ffffffff8125c054>] v9fs_file_write_iter+0xc4/0x200
[ 150.209642] [<ffffffff811c9840>] ? __sb_end_write+0x80/0x80
[ 150.211305] [<ffffffff8125bf90>] ? v9fs_file_lock_dotl+0x3d0/0x3d0
[ 150.216908] [<ffffffff81128d14>] ? ctx_sched_in.isra.57+0xe4/0x2f0
[ 150.221069] [<ffffffff811c6d84>] ? rw_verify_area+0x54/0x150
[ 150.222570] [<ffffffff811c7363>] do_readv_writev+0x223/0x450
[ 150.229044] [<ffffffff811290b7>] ? perf_event_context_sched_in.isra.61+0x127/0x180
[ 150.231020] [<ffffffff8125bf90>] ? v9fs_file_lock_dotl+0x3d0/0x3d0
[ 150.231862] [<ffffffff811c7140>] ? vfs_write+0x1e0/0x1e0
[ 150.232583] [<ffffffff81129b7a>] ? __perf_event_task_sched_in+0x5a/0xa0
[ 150.233471] [<ffffffff810aee08>] ? finish_task_switch+0xa8/0x1b0
[ 150.234282] [<ffffffff8161588b>] ? __schedule+0x3db/0xc90
[ 150.235020] [<ffffffff81616197>] ? schedule+0x57/0xd0
[ 150.235709] [<ffffffff8161ba46>] ? _raw_spin_unlock+0x16/0x70
[ 150.236493] [<ffffffff81618a0f>] ? __mutex_lock_slowpath+0x1df/0x200
[ 150.239059] [<ffffffff811c9840>] ? __sb_end_write+0x80/0x80
[ 150.242120] [<ffffffff81618830>] ? __ww_mutex_lock_interruptible+0xe0/0xe0
[ 150.245274] [<ffffffff810ecfd3>] ? hrtimer_start+0x13/0x20
[ 150.247527] [<ffffffff810ee54d>] ? do_setitimer+0x30d/0x400
[ 150.249351] [<ffffffff811c764e>] vfs_writev+0x4e/0x70
[ 150.250378] [<ffffffff811c8348>] SyS_writev+0xa8/0x140
[ 150.251545] [<ffffffff811c82a0>] ? SyS_readv+0x140/0x140
[ 150.253208] [<ffffffff8161c2ae>] system_call_fastpath+0x12/0x71
[ 150.256990] ---[ end trace 4f640ea141ed3d61 ]---
[ 150.259076] 9pnet: req->status: 4
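
(For reference, the status values come from enum p9_req_status_t in
include/net/9p/client.h; with the numeric values annotated, status 4 is
REQ_STATUS_RCVD:

enum p9_req_status_t {
	REQ_STATUS_IDLE,	/* 0 */
	REQ_STATUS_ALLOC,	/* 1 */
	REQ_STATUS_UNSENT,	/* 2 */
	REQ_STATUS_SENT,	/* 3 */
	REQ_STATUS_RCVD,	/* 4 */
	REQ_STATUS_FLSHD,	/* 5 */
	REQ_STATUS_ERROR,	/* 6 */
};
)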
> BTW, the loop in there (
> /* check again since original check was outside of lock */
> while (tag >= c->max_tag) {
> ) looks fishy. If we get more than P9_ROW_MAXTAG allocations at once,
> we'll have trouble, but I doubt that this is what we are hitting. In any
> case, adding WARN_ON(c->reqs[row]); right after
I didn't get this. c->reqs[row] is always non-NULL as it should be, so this warning
will trigger all the time.
> row = (tag / P9_ROW_MAXTAG);
> wouldn't hurt. I would be very surprised if that one triggered, though.
>
On Thu, Jul 02, 2015 at 03:19:57PM +0300, Andrey Ryabinin wrote:
> Added:
> +	if (total > count)
> +		*(char *)0 = 0;
>
> and never hit this condition.
>
OK, so it's definitely a mismatched response.
> req->tc->tag = tag-1;
> + if (WARN_ON(req->status != REQ_STATUS_IDLE))
> + pr_err("req->status: %d\n", req->status);
> req->status = REQ_STATUS_ALLOC;
>
> return req;
> [ 150.259076] 9pnet: req->status: 4
IOW, REQ_STATUS_RCVD. Hmm... Stray tag seen by req_done() after we'd already
freed the tag in question? That, or it really would have had to wrap
around... Note that req_done() does *not* check anything about the req -
not even that p9_tag_lookup() hasn't returned NULL, so a server sending you
any response tagged with a number well above anything you'd ever sent will
reliably oops you.
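
Something along these lines in req_done() (net/9p/trans_virtio.c) - an
untested sketch, the status check and the ratelimited warning are made up
for illustration, not taken from any tree - would at least refuse to
complete a reply we can't match to an in-flight request:

		req = p9_tag_lookup(chan->client, rc->tag);
		if (!req || req->status != REQ_STATUS_SENT) {
			/* sketch only: drop replies whose tag doesn't map to
			 * a request we actually have in flight */
			pr_warn_ratelimited("9p: discarding reply with unexpected tag %d\n",
					    rc->tag);
			continue;
		}
		p9_client_cb(chan->client, req, REQ_STATUS_RCVD);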
Frankly, the whole thing needs fuzzing from the server side - start throwing
crap at the client and see how badly it gets fucked... Folks, it's
a network protocol, with userland servers, no less. You *can't* assume
them competent and non-malicious...
How much traffic does it take to reproduce that fun, BTW? IOW, is attempting
to log the sequence of tag {allocation,freeing}/tag of packet being {sent,
received} something completely suicidal, or is it more or less feasible?
> I didn't get this. c->reqs[row] is always non-NULL as it should be, so this warning
> will trigger all the time.
????
row = (tag / P9_ROW_MAXTAG);
c->reqs[row] = kcalloc(P9_ROW_MAXTAG,
sizeof(struct p9_req_t), GFP_ATOMIC);
and you are seeing c->reqs[row] != NULL *BEFORE* that kcalloc()? All the time,
no less? Just to make sure we are on the same page - the delta against
mainline I would like tested is this:
diff --git a/net/9p/client.c b/net/9p/client.c
index 6f4c4c8..fa88c9a 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -248,6 +248,9 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
 		/* check again since original check was outside of lock */
 		while (tag >= c->max_tag) {
 			row = (tag / P9_ROW_MAXTAG);
+
+			WARN_ON_ONCE(c->reqs[row]);	// are we about to leak?
+
 			c->reqs[row] = kcalloc(P9_ROW_MAXTAG,
 					sizeof(struct p9_req_t), GFP_ATOMIC);
@@ -286,6 +289,8 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
 	p9pdu_reset(req->rc);
 
 	req->tc->tag = tag-1;
+	if (req->status != REQ_STATUS_IDLE)
+		pr_err("using tag %d with odd status (%d)", tag, req->status);
 	req->status = REQ_STATUS_ALLOC;
 
 	return req;
@@ -425,6 +430,8 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
 	 * the other thread wakes up will indeed be seen by the waiting side.
 	 */
 	smp_wmb();
+	if (req->status == REQ_STATUS_IDLE)
+		pr_err("late delivery, tag %d already freed", req->tc->tag);
 	req->status = status;
 
 	wake_up(req->wq);
@@ -693,6 +700,10 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 		tag = p9_idpool_get(c->tagpool);
 		if (tag < 0)
 			return ERR_PTR(-ENOMEM);
+		if (WARN_ON_ONCE(tag != (u16)tag)) {	// wrapped around?
+			p9_idpool_put(tag, c->tagpool);
+			return ERR_PTR(-ENOMEM);
+		}
 	}
 
 	req = p9_tag_alloc(c, tag, req_size);
@@ -1647,7 +1658,10 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 		if (*err) {
 			trace_9p_protocol_dump(clnt, req->rc);
 			p9_free_req(clnt, req);
+			break;
 		}
+		if (rsize < count)
+			pr_err("mismatched reply [tag = %d]\n", req->tc->tag);
 		p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);
On Thu, Jul 02, 2015 at 05:43:32PM +0100, Al Viro wrote:
> req->tc->tag = tag-1;
> + if (req->status != REQ_STATUS_IDLE)
> + pr_err("using tag %d with odd status (%d)", tag, req->status);
Should be tag - 1 here, actually.
diff --git a/net/9p/client.c b/net/9p/client.c
index 6f4c4c8..9719886 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -248,6 +248,9 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
 		/* check again since original check was outside of lock */
 		while (tag >= c->max_tag) {
 			row = (tag / P9_ROW_MAXTAG);
+
+			WARN_ON_ONCE(c->reqs[row]);	// are we about to leak?
+
 			c->reqs[row] = kcalloc(P9_ROW_MAXTAG,
 					sizeof(struct p9_req_t), GFP_ATOMIC);
@@ -286,6 +289,8 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
 	p9pdu_reset(req->rc);
 
 	req->tc->tag = tag-1;
+	if (req->status != REQ_STATUS_IDLE)
+		pr_err("using tag %d with odd status (%d)", tag - 1, req->status);
 	req->status = REQ_STATUS_ALLOC;
 
 	return req;
@@ -425,6 +430,8 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
 	 * the other thread wakes up will indeed be seen by the waiting side.
 	 */
 	smp_wmb();
+	if (req->status == REQ_STATUS_IDLE)
+		pr_err("late delivery, tag %d already freed", req->tc->tag);
 	req->status = status;
 
 	wake_up(req->wq);
@@ -693,6 +700,10 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 		tag = p9_idpool_get(c->tagpool);
 		if (tag < 0)
 			return ERR_PTR(-ENOMEM);
+		if (WARN_ON_ONCE(tag != (u16)tag)) {	// wrapped around?
+			p9_idpool_put(tag, c->tagpool);
+			return ERR_PTR(-ENOMEM);
+		}
 	}
 
 	req = p9_tag_alloc(c, tag, req_size);
@@ -1647,7 +1658,10 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 		if (*err) {
 			trace_9p_protocol_dump(clnt, req->rc);
 			p9_free_req(clnt, req);
+			break;
 		}
+		if (rsize < count)
+			pr_err("mismatched reply [tag = %d]\n", req->tc->tag);
 		p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);
2015-07-02 19:43 GMT+03:00 Al Viro <[email protected]>:
> On Thu, Jul 02, 2015 at 03:19:57PM +0300, Andrey Ryabinin wrote:
>
>> Added:
>> +	if (total > count)
>> +		*(char *)0 = 0;
>>
>> and never hit this condition.
>>
>
> OK, so it's definitely a mismatched response.
>
>> req->tc->tag = tag-1;
>> + if (WARN_ON(req->status != REQ_STATUS_IDLE))
>> + pr_err("req->status: %d\n", req->status);
>> req->status = REQ_STATUS_ALLOC;
>>
>> return req;
>
>> [ 150.259076] 9pnet: req->status: 4
>
> IOW, REQ_STATUS_RCVD. Hmm... Stray tag seen by req_done() after we'd already
> freed the tag in question? That, or it really would have had to wrap
> around... Note that req_done() does *not* check anything about the req -
> not even that p9_tag_lookup() hasn't returned NULL, so a server sending you
> any response tagged with a number well above anything you'd ever sent will
> reliably oops you.
>
> Frankly, the whole thing needs fuzzing from the server side - start throwing
> crap at the client and see how badly it gets fucked... Folks, it's
> a network protocol, with userland servers, no less. You *can't* assume
> them competent and non-malicious...
>
> How much traffic does it take to reproduce that fun, BTW? IOW, is attempting
> to log the sequence of tag {allocation,freeing}/tag of packet being {sent,
> received} something completely suicidal, or is it more or less feasible?
>
No idea. Usually it takes 1-2 minutes after trinity (100 threads) starts.
>> I didn't get this. c->reqs[row] is always non-NULL as it should be, so this warning
>> will trigger all the time.
>
> ????
> row = (tag / P9_ROW_MAXTAG);
> c->reqs[row] = kcalloc(P9_ROW_MAXTAG,
> sizeof(struct p9_req_t), GFP_ATOMIC);
>
> and you are seeing c->reqs[row] != NULL *BEFORE* that kcalloc()? All the time,
> no less? Just to make sure we are on the same page - the delta against
> mainline I would like tested is this:
>
Ah, I was looking at the second ' row = tag / P9_ROW_MAXTAG;' line
which is after kcalloc().
I'll check tomorrow then.
On 07/02/2015 07:49 PM, Al Viro wrote:
> On Thu, Jul 02, 2015 at 05:43:32PM +0100, Al Viro wrote:
>> req->tc->tag = tag-1;
>> + if (req->status != REQ_STATUS_IDLE)
>> + pr_err("using tag %d with odd status (%d)", tag, req->status);
>
> Should be tag - 1 here, actually.
So, with this change:
[ 81.654695] 9pnet: late delivery, tag 2 already freed
[ 81.655186] 9pnet: mismatched reply [tag = 2]
[ 85.014795] 9pnet: late delivery, tag 4 already freed
[ 85.015753] 9pnet: late delivery, tag 4 already freed
[ 85.016473] 9pnet: late delivery, tag 4 already freed
[ 87.275842] 9pnet: mismatched reply [tag = 5]
[ 88.943223] 9pnet: late delivery, tag 4 already freed
[ 88.943790] 9pnet: late delivery, tag 4 already freed
[ 88.944320] 9pnet: late delivery, tag 4 already freed
[ 88.945050] 9pnet: mismatched reply [tag = 4]
[ 105.551220] 9pnet: late delivery, tag 3 already freed
[ 105.552065] 9pnet: late delivery, tag 3 already freed
[ 114.528467] 9pnet: late delivery, tag 4 already freed
[ 164.791295] 9pnet: late delivery, tag 2 already freed
[ 164.792156] 9pnet: using tag 2 with odd status (4)
[ 164.864398] 9pnet: mismatched reply [tag = 4]
[ 166.419993] 9pnet: mismatched reply [tag = 3]
[ 174.062625] 9pnet: late delivery, tag 3 already freed
[ 174.063121] 9pnet: late delivery, tag 3 already freed
[ 174.063637] 9pnet: late delivery, tag 3 already freed
[ 174.064018] 9pnet: late delivery, tag 3 already freed
[ 174.064547] 9pnet: mismatched reply [tag = 3]
[ 175.418729] 9pnet: late delivery, tag 3 already freed
[ 177.911727] 9pnet: mismatched reply [tag = 1]
[ 235.748520] 9pnet: late delivery, tag 1 already freed
[ 235.749595] 9pnet: using tag 1 with odd status (4)
[ 243.496782] 9pnet: late delivery, tag 6 already freed
[ 243.497697] 9pnet: late delivery, tag 6 already freed
[ 243.499079] 9pnet: mismatched reply [tag = 6]
[ 243.736388] 9pnet: late delivery, tag 1 already freed
[ 243.740019] 9pnet: late delivery, tag 1 already freed
[ 248.337712] 9pnet: late delivery, tag 2 already freed
[ 249.872950] 9pnet: mismatched reply [tag = 9]
[ 252.448858] 9pnet: mismatched reply [tag = 8]
[ 252.668878] 9pnet: mismatched reply [tag = 11]
On Fri, Jul 03, 2015 at 11:19:31AM +0300, Andrey Ryabinin wrote:
> On 07/02/2015 07:49 PM, Al Viro wrote:
> > On Thu, Jul 02, 2015 at 05:43:32PM +0100, Al Viro wrote:
> >> req->tc->tag = tag-1;
> >> + if (req->status != REQ_STATUS_IDLE)
> >> + pr_err("using tag %d with odd status (%d)", tag, req->status);
> >
> > Should be tag - 1 here, actually.
>
> So, with this change:
>
> [ 81.654695] 9pnet: late delivery, tag 2 already freed
> [ 81.655186] 9pnet: mismatched reply [tag = 2]
> [ 85.014795] 9pnet: late delivery, tag 4 already freed
> [ 85.015753] 9pnet: late delivery, tag 4 already freed
> [ 85.016473] 9pnet: late delivery, tag 4 already freed
> [ 87.275842] 9pnet: mismatched reply [tag = 5]
> [ 88.943223] 9pnet: late delivery, tag 4 already freed
> [ 88.943790] 9pnet: late delivery, tag 4 already freed
> [ 88.944320] 9pnet: late delivery, tag 4 already freed
> [ 88.945050] 9pnet: mismatched reply [tag = 4]
> [ 105.551220] 9pnet: late delivery, tag 3 already freed
> [ 105.552065] 9pnet: late delivery, tag 3 already freed
> [ 114.528467] 9pnet: late delivery, tag 4 already freed
> [ 164.791295] 9pnet: late delivery, tag 2 already freed
> [ 164.792156] 9pnet: using tag 2 with odd status (4)
> [ 164.864398] 9pnet: mismatched reply [tag = 4]
> [ 166.419993] 9pnet: mismatched reply [tag = 3]
> [ 174.062625] 9pnet: late delivery, tag 3 already freed
> [ 174.063121] 9pnet: late delivery, tag 3 already freed
> [ 174.063637] 9pnet: late delivery, tag 3 already freed
> [ 174.064018] 9pnet: late delivery, tag 3 already freed
> [ 174.064547] 9pnet: mismatched reply [tag = 3]
> [ 175.418729] 9pnet: late delivery, tag 3 already freed
> [ 177.911727] 9pnet: mismatched reply [tag = 1]
> [ 235.748520] 9pnet: late delivery, tag 1 already freed
> [ 235.749595] 9pnet: using tag 1 with odd status (4)
> [ 243.496782] 9pnet: late delivery, tag 6 already freed
> [ 243.497697] 9pnet: late delivery, tag 6 already freed
> [ 243.499079] 9pnet: mismatched reply [tag = 6]
> [ 243.736388] 9pnet: late delivery, tag 1 already freed
> [ 243.740019] 9pnet: late delivery, tag 1 already freed
> [ 248.337712] 9pnet: late delivery, tag 2 already freed
> [ 249.872950] 9pnet: mismatched reply [tag = 9]
> [ 252.448858] 9pnet: mismatched reply [tag = 8]
> [ 252.668878] 9pnet: mismatched reply [tag = 11]
FWIW, I'd reproduced it here as well. With checks added in p9_client_cb()
for the tag already having been freed in the IDR, for ->status not being
"sent", and with Tflush traffic logged, I've got this:
[ 2754.713015] 9pnet: flushing 1
[ 2755.516123] 9pnet: flush 1 [3]done
[ 2758.443265] 9pnet: flushing 16
[ 2768.655768] 9pnet: flush 16 [8]done
[ 2783.939538] 9pnet: flushing 30
[ 2786.067856] 9pnet: flush 30 [3]done
[ 2809.784119] 9pnet: [ffff880113213000] delivery in 0, tag 1
[ 2809.948681] 9pnet: [ffff880113213000] using tag 1 with odd status (4)
Here we have delivery when that sucker is in p9_free_req(), but hasn't been
removed from the IDR yet.
[ 2809.948681] 9pnet: [ffff880113213000] delivery in 0, tag 1
[ 2816.767861] 9pnet: [ffff880113213000] using tag 1 with odd status (4)
Ditto.
[ 2816.767861] 9pnet: flushing 4
[ 2816.769484] 9pnet: flush 4 [5]done
[ 2846.327753] 9pnet: flushing 3
[ 2854.876131] 9pnet: flush 3 [4]done
[ 2856.492801] 9pnet: flushing 6
[ 2857.675642] 9pnet: flush 6 [1]done
[ 2860.051701] 9pnet: [ffff880113213000] delivery in 0, tag 1
[ 2860.052941] 9pnet: [ffff880113213000] late delivery, tag 1 already freed in IDR
[ 2860.680181] 9pnet: [ffff880113213000] using tag 1 with odd status (4)
This has hit between p9_free_req() and reallocating that request.
[ 2909.911815] 9pnet: [ffff880113213000] delivery in 4, tag 1
[ 2909.913145] 9pnet: [ffff880113213000] late delivery, tag 1 already freed in IDR
... while _this_ one has raced with p9_free_req() in a different way.
[ 2910.852202] 9pnet: flushing 3
[ 2917.985941] 9pnet: flush 3 [4]done
[ 2937.600652] 9pnet: flushing 2
[ 2939.775354] 9pnet: flush 2 [4]done
[ 2961.521600] 9pnet: [ffff880113213000] delivery in 0, tag 1
[ 2962.320383] 9pnet: [ffff880113213000] using tag 1 with odd status (4)
Same as with the very first one.
[ 2962.320383] 9pnet: [ffff880113213000] delivery in 4, tag 8
[ 3001.578372] 9pnet: [ffff880113213000] delivery in 4, tag 6
Those had hit before p9_free_req().
AFAICS, we get occasional stray responses from somewhere. And no, it doesn't
seem to be related to flushes or to dropping chan->lock in req_done() (this
run had been with chan->lock taken on the outside of the loop).
What I really don't understand is why it's playing with p9_tag_lookup() at
all - it stashes req->tc via the virtqueue_add_sgs() opaque data argument,
fetches it back in req_done(), then picks ->tag from it and uses
p9_tag_lookup() to find the req. Why not simply pass req instead? I had been
wrong about p9_tag_lookup() being able to return NULL, but why bother with it?
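
For completeness, the extra instrumentation in p9_client_cb() that produced
the "delivery in N" / "already freed in IDR" lines above was roughly along
these lines (reconstructed from the output, not the exact hunk that was run),
sitting just before req->status gets overwritten:

	if (req->status != REQ_STATUS_SENT)
		pr_err("[%p] delivery in %d, tag %d\n",
		       c, req->status, req->tc->tag);
	if (!p9_idpool_check(req->tc->tag, c->tagpool))
		pr_err("[%p] late delivery, tag %d already freed in IDR\n",
		       c, req->tc->tag);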
On Fri, Jul 03, 2015 at 10:42:10AM +0100, Al Viro wrote:
> AFAICS, we get occasional stray responses from somewhere. And no, it doesn't
> seem to be related to flushes or to dropping chan->lock in req_done() (this
> run had been with chan->lock taken on the outside of the loop).
>
> What I really don't understand is why it's playing with p9_tag_lookup() at
> all - it stashes req->tc via the virtqueue_add_sgs() opaque data argument,
> fetches it back in req_done(), then picks ->tag from it and uses
> p9_tag_lookup() to find the req. Why not simply pass req instead? I had been
> wrong about p9_tag_lookup() being able to return NULL, but why bother with it?
Got it. What happens is that on the zero-copy path a signal hitting near
the end of p9_virtio_zc_request() is treated as "it hadn't been sent, got
an error, fuck off and mark the tag ready for reuse". No TFLUSH issued,
etc. As a result, when the reply finally *does* arrive (we had actually
sent the request), it plays hell with the entire thing - the tag might very
well have been reused by then and an unrelated request sent with the
same tag. Depending on the timing, results can get rather ugly.
There are still other bogosities found in this thread, and at the very
least we need to cope with genuinely corrupted responses from the server,
but the patch below fixes the problem with stray responses and stops the
"what do you mean, you'd written 4K? I've only sent 30 bytes!" complaints
here. Ten minutes of trinity running without triggering it, while without
that patch it triggers in 2-3 minutes.
Could you verify that the patch below deals with your setup as well?
If it does, I'm going to put it into tonight's pull request, after I get
some sleep... Right now I'm about to crawl in direction of bed - 25 hours
of uptime is a bit too much... ;-/
diff --git a/net/9p/client.c b/net/9p/client.c
index 6f4c4c8..8c4941d 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -843,7 +843,8 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 	if (err < 0) {
 		if (err == -EIO)
 			c->status = Disconnected;
-		goto reterr;
+		if (err != -ERESTARTSYS)
+			goto reterr;
 	}
 	if (req->status == REQ_STATUS_ERROR) {
 		p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err);
@@ -1647,7 +1648,10 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 		if (*err) {
 			trace_9p_protocol_dump(clnt, req->rc);
 			p9_free_req(clnt, req);
+			break;
 		}
+		if (rsize < count)
+			pr_err("mismatched reply [tag = %d]\n", req->tc->tag);
 		p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);
2015-07-03 18:00 GMT+03:00 Al Viro <[email protected]>:
> Could you verify that the patch below deals with your setup as well?
> If it does, I'm going to put it into tonight's pull request, after I get
> some sleep... Right now I'm about to crawl in direction of bed - 25 hours
> of uptime is a bit too much... ;-/
>
Works for me.
Tested-by: Andrey Ryabinin <[email protected]>