In-Reply-To: <20181011131045.GA32030@nautica>
References: <000000000000ca61cd0571178677@google.com> <000000000000fddb150577c15af6@google.com> <20181009020949.GA29622@nautica> <20181010144059.GA20918@nautica> <20181010155814.GC20918@nautica> <20181011131045.GA32030@nautica>
From: Dmitry Vyukov
Date: Thu, 11 Oct 2018 15:27:41 +0200
Subject: Re: BUG: corrupted list in p9_read_work
To: Dominique Martinet
Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer@lists.sourceforge.net
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 11, 2018 at 3:10 PM, Dominique Martinet wrote:
> Dmitry Vyukov wrote on Thu, Oct 11, 2018:
>> > That's still the tricky part, I'm afraid... Making a separate server
>> > would have been easy because I could have reused some of my junk for the
>> > actual connection handling (some rdma helper library I wrote ages
>> > ago[1]), but if you're going to just embed C code you'll probably want
>> > something lower level? I've never seen syzkaller use any library call,
>> > and I'm not even sure I would know how to create a qp without
>> > libibverbs; would standard stuff be OK?
>>
>> Raw syscalls preferably.
>> What does 'rxe_cfg start ens3' do at the syscall level? Some netlink?
>
> modprobe rdma_rxe (and a bunch of other rdma modules before that) then
> writes the interface name in /sys/module/rdma_rxe/parameters/add
> apparently; then checks that it worked.
> this part could be done in C directly without too much trouble, as
> long as the proper kernel configuration/modules are available

Now we are talking!
We generally assume that all modules are simply compiled into the
kernel. At least that's what we have on syzbot. If somebody can't
compile them in, we can suggest adding modprobe to init.
So this boils down to just writing to /sys/module/rdma_rxe/parameters/add.

>> Any libraries and utilities are a hellish pain in the linux world. Will
>> it work in Android userspace? gVisor? Who will explain to all syzkaller
>> users where to get this for their who-knows-what distro, which is 10
>> years old because of corp policies, and debug why their version of the
>> library is slightly incompatible?
>> For example, after figuring out that rxe_cfg actually comes from
>> rdma-core (which is a separate delight on linux), my debian
>> distribution failed to install it because of some conflicts around
>> /etc/modprobe.d/mlx4.conf, and my ubuntu distro does not know about
>> such a package. And we've just started :)
>
> The rdma ecosystem is a pain, I'll easily agree with that...
>
>> Syscalls tend to be simpler and more reliable. If it gives ENOSUPP,
>> ok, that's it. If it works, great, we can use it.
>
> I'll have to look into it a bit more; libibverbs abstracts a lot of
> stuff into per-nic userspace drivers (the files I cited in a previous
> mail) and basically, with the mellanox cards I'm familiar with, the
> whole user session looks like this:
> * common libibverbs/rdmacm code opens /dev/infiniband/rdma_cm and
> /dev/infiniband/uverbs0 (plus a bunch of files to figure out the abi
> version, which user driver to load, etc.)
> * it and the userspace driver issue "commands" over these two files' fds
> to set up the connection; some commands are standard but some are
> specific to the interface and defined in the driver.

But we will use some kind of virtual/stub driver, right? We don't have
real hardware. So all these commands should be fixed and known for the
virtual/stub driver.

> There are many facets to a connection in RDMA: a protection domain used
> to register memory with the nic, a queue pair that is the actual tx/rx
> connection, optionally a completion channel that will be another fd to
> listen on for events that tell you something happened, and finally some
> memory regions to directly communicate with the nic from userspace,
> depending on the specific driver.
> * then there's the actual usage, more commands through the uverbs0 char
> device to register the memory you'll use, and once that's done it's
> entirely up to the driver - for example the mellanox lib can do
> everything in userspace playing with the memory regions it registered,
> but I'd wager the rxe driver makes more calls through the uverbs0 fd...
>
> Honestly I'm not keen on reimplementing all of this; the interface
> itself pretty much depends on your version of the kernel (there is a
> common ABI defined, but as far as specific nics are concerned, if your
> kernel module doesn't match the user library version you can get some
> nasty surprises), and it's far from the black or white of a good ol'
> ENOSUPP error.
>
> I'll see if I can figure out whether there is a common subset of verbs
> commands that are standard, sufficient to set up a listening connection
> and exchange data, and supported by all devices, which would let us
> reimplement just that; but while I hear your point about android and
> ten years in the future, I think it's more likely that ten years from
> now the verbs abi will have changed but libibverbs will just have the
> new version implemented and hide the change :P

But again, we don't need to support all of the available hardware.
For example, we test the net stack from the external side using tun.
tun is a very simple, virtual abstraction of a network card. It allows
us to test all of the generic net stack starting from L2 without
dealing with any real drivers and their differences at all. I had the
impression that we are talking about something similar here too. Or not?

Also, I am missing some context about the rdma<->9p interface. Do we
need to set up all these ring buffers just to satisfy the parts that 9p
needs? Does 9p actually read data directly from these ring buffers? Or
is there some higher-level rdma interface that 9p uses?