2019-01-14 09:02:52

by Joel Nider

Subject: updating user verbs documentation

A small patchset to update the verbs API documentation with some
information regarding the ioctl syscall. The first patch converts the
file format to ReST, since this is the new preferred format. The second
patch links this file to the main index so we can actually find
it by browsing (search will work in any case). The third patch adds the
new content, documenting a bit of the internal workings of the
kernel side of the API functions. The goal is to make it easier
for developers unfamiliar with the structure to understand what
is going on when adding a new function.

[PATCH 1/3] docs-rst: infiniband: Convert user verbs doc to rst
[PATCH 2/3] docs-rst: update index file with infiniband docs
[PATCH 3/3] docs-rst: infiniband: update verbs API details



2019-01-14 09:03:00

by Joel Nider

Subject: [PATCH 1/3] docs-rst: infiniband: Convert user verbs doc to rst

Replace the existing Documentation/infiniband/user_verbs.txt with
Documentation/infiniband/user_verbs.rst. No substantial changes to
the content - just some minor reformatting to have the rendering
come out nicely.
This is in preparation for updating the content in a subsequent
patch.

Signed-off-by: Joel Nider <[email protected]>
---
Documentation/infiniband/user_verbs.rst | 70 +++++++++++++++++++++++++++++++++
Documentation/infiniband/user_verbs.txt | 69 --------------------------------
2 files changed, 70 insertions(+), 69 deletions(-)
create mode 100644 Documentation/infiniband/user_verbs.rst
delete mode 100644 Documentation/infiniband/user_verbs.txt

diff --git a/Documentation/infiniband/user_verbs.rst b/Documentation/infiniband/user_verbs.rst
new file mode 100644
index 0000000..ffc4aec
--- /dev/null
+++ b/Documentation/infiniband/user_verbs.rst
@@ -0,0 +1,70 @@
+======================
+Userspace Verbs Access
+======================
+The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
+enables direct userspace access to IB hardware via "verbs," as
+described in chapter 11 of the InfiniBand Architecture Specification.
+
+To use the verbs, the libibverbs library, available from
+https://github.com/linux-rdma/rdma-core, is required. libibverbs contains a
+device-independent API for using the ib_uverbs interface.
+libibverbs also requires appropriate device-dependent kernel and
+userspace driver for your InfiniBand hardware. For example, to use
+a Mellanox HCA, you will need the ib_mthca kernel module and the
+libmthca userspace driver be installed.
+
+User-kernel communication
+=========================
+Userspace communicates with the kernel for slow path, resource
+management operations via the /dev/infiniband/uverbsN character
+devices. Fast path operations are typically performed by writing
+directly to hardware registers mmap()ed into userspace, with no
+system call or context switch into the kernel.
+
+Commands are sent to the kernel via write()s on these device files.
+The ABI is defined in drivers/infiniband/include/ib_user_verbs.h.
+The structs for commands that require a response from the kernel
+contain a 64-bit field used to pass a pointer to an output buffer.
+Status is returned to userspace as the return value of the write()
+system call.
+
+Resource management
+===================
+Since creation and destruction of all IB resources is done by
+commands passed through a file descriptor, the kernel can keep track
+of which resources are attached to a given userspace context. The
+ib_uverbs module maintains idr tables that are used to translate
+between kernel pointers and opaque userspace handles, so that kernel
+pointers are never exposed to userspace and userspace cannot trick
+the kernel into following a bogus pointer.
+
+This also allows the kernel to clean up when a process exits and
+prevent one process from touching another process's resources.
+
+Memory pinning
+==============
+Direct userspace I/O requires that memory regions that are potential
+I/O targets be kept resident at the same physical address. The
+ib_uverbs module manages pinning and unpinning memory regions via
+get_user_pages() and put_page() calls. It also accounts for the
+amount of memory pinned in the process's locked_vm, and checks that
+unprivileged processes do not exceed their RLIMIT_MEMLOCK limit.
+
+Pages that are pinned multiple times are counted each time they are
+pinned, so the value of locked_vm may be an overestimate of the
+number of pages pinned by a process.
+
+/dev files
+==========
+To create the appropriate character device files automatically with
+udev, a rule like::
+
+ KERNEL=="uverbs*", NAME="infiniband/%k"
+
+can be used. This will create device nodes named::
+
+ /dev/infiniband/uverbs0
+
+and so on. Since the InfiniBand userspace verbs should be safe for
+use by non-privileged processes, it may be useful to add an
+appropriate MODE or GROUP to the udev rule.
diff --git a/Documentation/infiniband/user_verbs.txt b/Documentation/infiniband/user_verbs.txt
deleted file mode 100644
index df049b9..0000000
--- a/Documentation/infiniband/user_verbs.txt
+++ /dev/null
@@ -1,69 +0,0 @@
-USERSPACE VERBS ACCESS
-
- The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
- enables direct userspace access to IB hardware via "verbs," as
- described in chapter 11 of the InfiniBand Architecture Specification.
-
- To use the verbs, the libibverbs library, available from
- https://github.com/linux-rdma/rdma-core, is required. libibverbs contains a
- device-independent API for using the ib_uverbs interface.
- libibverbs also requires appropriate device-dependent kernel and
- userspace driver for your InfiniBand hardware. For example, to use
- a Mellanox HCA, you will need the ib_mthca kernel module and the
- libmthca userspace driver be installed.
-
-User-kernel communication
-
- Userspace communicates with the kernel for slow path, resource
- management operations via the /dev/infiniband/uverbsN character
- devices. Fast path operations are typically performed by writing
- directly to hardware registers mmap()ed into userspace, with no
- system call or context switch into the kernel.
-
- Commands are sent to the kernel via write()s on these device files.
- The ABI is defined in drivers/infiniband/include/ib_user_verbs.h.
- The structs for commands that require a response from the kernel
- contain a 64-bit field used to pass a pointer to an output buffer.
- Status is returned to userspace as the return value of the write()
- system call.
-
-Resource management
-
- Since creation and destruction of all IB resources is done by
- commands passed through a file descriptor, the kernel can keep track
- of which resources are attached to a given userspace context. The
- ib_uverbs module maintains idr tables that are used to translate
- between kernel pointers and opaque userspace handles, so that kernel
- pointers are never exposed to userspace and userspace cannot trick
- the kernel into following a bogus pointer.
-
- This also allows the kernel to clean up when a process exits and
- prevent one process from touching another process's resources.
-
-Memory pinning
-
- Direct userspace I/O requires that memory regions that are potential
- I/O targets be kept resident at the same physical address. The
- ib_uverbs module manages pinning and unpinning memory regions via
- get_user_pages() and put_page() calls. It also accounts for the
- amount of memory pinned in the process's locked_vm, and checks that
- unprivileged processes do not exceed their RLIMIT_MEMLOCK limit.
-
- Pages that are pinned multiple times are counted each time they are
- pinned, so the value of locked_vm may be an overestimate of the
- number of pages pinned by a process.
-
-/dev files
-
- To create the appropriate character device files automatically with
- udev, a rule like
-
- KERNEL=="uverbs*", NAME="infiniband/%k"
-
- can be used. This will create device nodes named
-
- /dev/infiniband/uverbs0
-
- and so on. Since the InfiniBand userspace verbs should be safe for
- use by non-privileged processes, it may be useful to add an
- appropriate MODE or GROUP to the udev rule.
--
2.7.4
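
The resource-management section converted above says that ib_uverbs
translates between kernel pointers and opaque userspace handles through
idr tables. As a rough illustration only, the toy Python model below shows
the idea. It is not kernel code: the names HandleTable and Context, and the
choice of one table per context, are assumptions made for this sketch.

```python
import itertools


class HandleTable:
    """Toy stand-in for an idr table: maps small integer handles to objects."""

    def __init__(self):
        self._next_handle = itertools.count(1)
        self._objects = {}

    def alloc(self, obj):
        # Userspace only ever sees the integer, never a reference to obj.
        handle = next(self._next_handle)
        self._objects[handle] = obj
        return handle

    def lookup(self, handle):
        # An unknown handle is rejected instead of being dereferenced.
        try:
            return self._objects[handle]
        except KeyError:
            raise LookupError(f"invalid handle {handle}") from None

    def release_all(self):
        # Models cleanup when the owning process exits.
        released = list(self._objects.values())
        self._objects.clear()
        return released


class Context:
    """Hypothetical per-open-file context that owns its own handle table."""

    def __init__(self):
        self.handles = HandleTable()
```

Because each context in this model has its own table, a handle allocated in
one context does not resolve in another, and release_all() gives a single
place to tear down everything a context still owns.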


2019-01-14 09:03:22

by Joel Nider

Subject: [PATCH 2/3] docs-rst: update index file with infiniband docs

Link the previously converted Documentation/infiniband/user_verbs.rst
to the main index by creating a new subsystem (Infiniband) under the
root document. This manifests as a new section under "Kernel API
Documentation" in the index.html, as well as a new section in the
table of contents pane.

This has been tested with 'make htmldocs'.
---
Documentation/conf.py | 2 ++
Documentation/index.rst | 1 +
Documentation/infiniband/conf.py | 10 ++++++++++
Documentation/infiniband/index.rst | 9 +++++++++
4 files changed, 22 insertions(+)
create mode 100644 Documentation/infiniband/conf.py
create mode 100644 Documentation/infiniband/index.rst

diff --git a/Documentation/conf.py b/Documentation/conf.py
index 72647a3..ff71088 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -389,6 +389,8 @@ latex_documents = [
'ext4 Data Structures and Algorithms', 'ext4 Community', 'manual'),
('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide',
'The kernel development community', 'manual'),
+ ('infiniband/index', 'infiniband.tex', 'Infiniband subsystem',
+ 'The kernel development community', 'manual'),
('input/index', 'linux-input.tex', 'The Linux input driver subsystem',
'The kernel development community', 'manual'),
('kernel-hacking/index', 'kernel-hacking.tex', 'Unreliable Guide To Hacking The Linux Kernel',
diff --git a/Documentation/index.rst b/Documentation/index.rst
index c858c2e..8d91ea5 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -82,6 +82,7 @@ needed).
core-api/index
media/index
networking/index
+ infiniband/index
input/index
gpu/index
security/index
diff --git a/Documentation/infiniband/conf.py b/Documentation/infiniband/conf.py
new file mode 100644
index 0000000..dc42d33
--- /dev/null
+++ b/Documentation/infiniband/conf.py
@@ -0,0 +1,10 @@
+# -*- coding: utf-8; mode: python -*-
+
+project = "Linux Infiniband Documentation"
+
+tags.add("subproject")
+
+latex_documents = [
+ ('index', 'infiniband.tex', project,
+ 'The kernel development community', 'manual'),
+]
diff --git a/Documentation/infiniband/index.rst b/Documentation/infiniband/index.rst
new file mode 100644
index 0000000..2dedc65
--- /dev/null
+++ b/Documentation/infiniband/index.rst
@@ -0,0 +1,9 @@
+Infiniband Documentation
+========================
+
+Contents:
+
+.. toctree::
+ :maxdepth: 1
+
+ user_verbs
--
2.7.4


2019-01-14 09:05:04

by Joel Nider

Subject: [PATCH 3/3] docs-rst: infiniband: update verbs API details

It is important to understand the existing framework when implementing
a new verb. The majority of existing API functions are implemented using
the write syscall, but this has been superseded by the ioctl syscall
for new commands. This patch updates the documentation regarding how
to go about implementing a new verb, focusing on the new ioctl
interface.

The documentation is far from complete, but this is a good step in the
right direction. Future patches can add more detail as needed.
The interface is still undergoing substantial changes, so an effort was
made to document only the stable parts. This avoids recording incorrect
information, since documentation changes tend to lag behind code
changes.

Signed-off-by: Joel Nider <[email protected]>
---
Documentation/infiniband/user_verbs.rst | 69 ++++++++++++++++++++++++++++++++-
1 file changed, 68 insertions(+), 1 deletion(-)

diff --git a/Documentation/infiniband/user_verbs.rst b/Documentation/infiniband/user_verbs.rst
index ffc4aec..f0c7cd3 100644
--- a/Documentation/infiniband/user_verbs.rst
+++ b/Documentation/infiniband/user_verbs.rst
@@ -21,12 +21,79 @@ devices. Fast path operations are typically performed by writing
directly to hardware registers mmap()ed into userspace, with no
system call or context switch into the kernel.

-Commands are sent to the kernel via write()s on these device files.
+There are currently two methods for executing commands in the kernel: write() and ioctl().
+Older commands are sent to the kernel via write()s on the device files
+mentioned earlier. New commands must use the ioctl() method. For completeness,
+both mechanisms are described here.
+
+The interface between userspace and kernel is kept in sync by checking the
+version number. In the kernel, it is defined by IB_USER_VERBS_ABI_VERSION
+(in include/uapi/rdma/ib_user_verbs.h).
+
+Write system call
+-----------------
The ABI is defined in drivers/infiniband/include/ib_user_verbs.h.
The structs for commands that require a response from the kernel
contain a 64-bit field used to pass a pointer to an output buffer.
Status is returned to userspace as the return value of the write()
system call.
+The entry point to the kernel is the ib_uverbs_write() function, which is
+invoked as a response to the 'write' system call. The requested function is
+looked up from an array called uverbs_cmd_table which contains function pointers
+to the various command handlers.
+
+Write Command Handlers
+~~~~~~~~~~~~~~~~~~~~~~
+These command handler functions are declared
+with the IB_UVERBS_DECLARE_CMD macro in drivers/infiniband/core/uverbs.h. There
+are also extended commands, which are kept in a similar manner in the
+uverbs_ex_cmd_table. The extended commands use 64-bit values in the command
+header, as opposed to the 32-bit values used in the regular command table.
+
+
+Ioctl system call
+-----------------
+The entry point for the 'ioctl' system call is the ib_uverbs_ioctl() function.
+Unlike write(), ioctl() accepts a 'cmd' parameter, which must have the value
+defined by RDMA_VERBS_IOCTL. More documentation regarding the ioctl numbering
+scheme can be found in: Documentation/ioctl/ioctl-number.txt. The
+command-specific information is passed as a pointer in the 'arg' parameter,
+which is cast as a 'struct ib_uverbs_ioctl_hdr*'.
+
+The way command handler functions (methods) are looked up is more complicated
+than the array index used for write(). Here, the ib_uverbs_cmd_verbs() function
+uses a radix tree to search for the correct command handler. If the lookup
+succeeds, the method is invoked by ib_uverbs_run_method().
+
+Ioctl Command Handlers
+~~~~~~~~~~~~~~~~~~~~~~
+Command handlers (also known as 'methods') for ioctl are declared with the
+UVERBS_HANDLER macro. The handler is registered for use by the
+DECLARE_UVERBS_NAMED_METHOD macro, which binds the name of the handler with its
+attributes. By convention, the methods are implemented in files named with the
+prefix 'uverbs_std_types_'.
+
+Each method can accept a set of parameters called attributes. There are 6
+types of attributes: idr, fd, pointer, enum, const and flags. The idr attribute
+declares an indirect (translated) handle for the method, and specifies
+the object that the method will act upon. The first attribute should be
+a handle to the uobj (ib_uobject), which contains private data. There
+may be zero or more additional attributes, including other handles. The
+'pointer' attribute must be specified as 'in' or 'out', depending on
+whether it is an input from userspace or is meant to return a value to
+userspace.
+
+The method also needs to be bound to an object, which is done with the
+DECLARE_UVERBS_NAMED_OBJECT macro. This macro takes a variable
+number of methods and stores them in an array attached to the object.
+
+Objects are declared using the DECLARE_UVERBS_NAMED_OBJECT macro. Most of the
+objects (including pd, mw, cq, etc.) are defined in uverbs_std_types.c,
+and the remaining objects are declared in files that are prefixed with the
+name 'uverbs_std_types_'.
+
+Object trees are declared using the DECLARE_UVERBS_OBJECT_TREE macro. This
+combines all of the objects.

Resource management
===================
--
2.7.4
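
The patch above contrasts the two dispatch paths: write() indexes an array
of handlers by command number, while ioctl() looks up a method that is
bound to an object and declares its attributes. The Python sketch below is
a toy model of that contrast, not the kernel implementation. A plain dict
stands in for the radix tree, and every name in it (dispatch_write,
dispatch_ioctl, Method, build_tree, the "cq"/"create" entry and its
attributes) is hypothetical.

```python
def query_device(payload):
    return {"cmd": "query_device", "payload": payload}


def alloc_pd(payload):
    return {"cmd": "alloc_pd", "payload": payload}


# write() path: the command number is a plain index into a handler table.
WRITE_CMD_TABLE = [query_device, alloc_pd]


def dispatch_write(cmd, payload):
    if not 0 <= cmd < len(WRITE_CMD_TABLE):
        raise ValueError(f"unknown write command {cmd}")
    return WRITE_CMD_TABLE[cmd](payload)


class Method:
    """A handler plus the attribute names it accepts ('in') and returns ('out')."""

    def __init__(self, name, handler, attrs_in=(), attrs_out=()):
        self.name = name
        self.handler = handler
        self.attrs_in = frozenset(attrs_in)
        self.attrs_out = tuple(attrs_out)


def build_tree(objects):
    """Flatten {object: [methods]} into one lookup keyed by (object, method)."""
    return {(obj, m.name): m for obj, methods in objects.items() for m in methods}


def dispatch_ioctl(tree, obj, method, attrs):
    m = tree.get((obj, method))
    if m is None:
        raise LookupError(f"no method {method!r} on object {obj!r}")
    unexpected = set(attrs) - m.attrs_in
    if unexpected:
        raise ValueError(f"unexpected attributes: {sorted(unexpected)}")
    result = m.handler(attrs)
    # Only attributes declared as 'out' are passed back to the caller.
    return {name: result[name] for name in m.attrs_out}


def cq_create(attrs):
    return {"handle": 1, "cqe": attrs["cqe"], "internal": "not returned"}


TREE = build_tree(
    {"cq": [Method("create", cq_create, attrs_in=["cqe"], attrs_out=["handle", "cqe"])]}
)
```

In this model, dispatch_ioctl(TREE, "cq", "create", {"cqe": 64}) returns
only the declared 'out' attributes, and asking for a method that was never
bound to the object fails at lookup time instead of reaching a handler.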


2019-01-14 16:55:38

by Jason Gunthorpe

Subject: Re: [PATCH 2/3] docs-rst: update index file with infiniband docs

On Mon, Jan 14, 2019 at 11:00:50AM +0200, Joel Nider wrote:
> Link the previously converted Documentation/infiniband/user_verbs.rst
> to the main index by creating a new subsystem (Infiniband) under the
> root document. This manifests as a new section under "Kernel API
> Documentation" in the index.html, as well as a new section in the
> table of contents pane.
>
> This has been tested with 'make htmldocs'.
> ---

Missed signed-off-by

> Documentation/conf.py | 2 ++
> Documentation/index.rst | 1 +
> Documentation/infiniband/conf.py | 10 ++++++++++
> Documentation/infiniband/index.rst | 9 +++++++++
> 4 files changed, 22 insertions(+)
> create mode 100644 Documentation/infiniband/conf.py
> create mode 100644 Documentation/infiniband/index.rst
>
> diff --git a/Documentation/conf.py b/Documentation/conf.py
> index 72647a3..ff71088 100644
> --- a/Documentation/conf.py
> +++ b/Documentation/conf.py
> @@ -389,6 +389,8 @@ latex_documents = [
> 'ext4 Data Structures and Algorithms', 'ext4 Community', 'manual'),
> ('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide',
> 'The kernel development community', 'manual'),
> + ('infiniband/index', 'infiniband.tex', 'Infiniband subsystem',
> + 'The kernel development community', 'manual'),
> ('input/index', 'linux-input.tex', 'The Linux input driver subsystem',
> 'The kernel development community', 'manual'),
> ('kernel-hacking/index', 'kernel-hacking.tex', 'Unreliable Guide To Hacking The Linux Kernel',
> diff --git a/Documentation/index.rst b/Documentation/index.rst
> index c858c2e..8d91ea5 100644
> --- a/Documentation/index.rst
> +++ b/Documentation/index.rst
> @@ -82,6 +82,7 @@ needed).
> core-api/index
> media/index
> networking/index
> + infiniband/index
> input/index
> gpu/index
> security/index
> diff --git a/Documentation/infiniband/conf.py b/Documentation/infiniband/conf.py
> new file mode 100644
> index 0000000..dc42d33
> --- /dev/null
> +++ b/Documentation/infiniband/conf.py
> @@ -0,0 +1,10 @@
> +# -*- coding: utf-8; mode: python -*-
> +
> +project = "Linux Infiniband Documentation"
> +
> +tags.add("subproject")
> +
> +latex_documents = [
> + ('index', 'infiniband.tex', project,
> + 'The kernel development community', 'manual'),
> +]
> diff --git a/Documentation/infiniband/index.rst b/Documentation/infiniband/index.rst
> new file mode 100644
> index 0000000..2dedc65
> --- /dev/null
> +++ b/Documentation/infiniband/index.rst
> @@ -0,0 +1,9 @@
> +Infiniband Documentation
> +========================
> +
> +Contents:
> +
> +.. toctree::
> + :maxdepth: 1
> +
> + user_verbs
> --
> 2.7.4
>

2019-01-14 16:57:58

by Jason Gunthorpe

Subject: Re: [PATCH 1/3] docs-rst: infiniband: Convert user verbs doc to rst

On Mon, Jan 14, 2019 at 11:00:49AM +0200, Joel Nider wrote:
> Replace the existing Documentation/infiniband/user_verbs.txt with
> Documentation/infiniband/user_verbs.rst. No substantial changes to
> the content - just some minor reformatting to have the rendering
> come out nicely.
> This is in preparation for updating the content in a subsequent
> patch.
>
> Signed-off-by: Joel Nider <[email protected]>
> ---
> Documentation/infiniband/user_verbs.rst | 70 +++++++++++++++++++++++++++++++++
> Documentation/infiniband/user_verbs.txt | 69 --------------------------------
> 2 files changed, 70 insertions(+), 69 deletions(-)
> create mode 100644 Documentation/infiniband/user_verbs.rst
> delete mode 100644 Documentation/infiniband/user_verbs.txt

Thanks for getting this going Joel, I've been mulling over writing more
docs for this area for a while now.

Jonathan/linux-doc: Can you Ack at least the build system parts of
this please? I can take it through the rdma tree, unless you prefer
otherwise?

Cheers,
Jason

2019-01-14 17:36:02

by Jonathan Corbet

Subject: Re: [PATCH 1/3] docs-rst: infiniband: Convert user verbs doc to rst

On Mon, 14 Jan 2019 09:56:21 -0700
Jason Gunthorpe <[email protected]> wrote:

> > Documentation/infiniband/user_verbs.rst | 70 +++++++++++++++++++++++++++++++++
> > Documentation/infiniband/user_verbs.txt | 69 --------------------------------
> > 2 files changed, 70 insertions(+), 69 deletions(-)
> > create mode 100644 Documentation/infiniband/user_verbs.rst
> > delete mode 100644 Documentation/infiniband/user_verbs.txt
>
> Thanks for getting this going Joel, I've been mulling over writing more
> docs for this area for a while now.
>
> Jonathan/linux-doc: Can you Ack at least the build system parts of
> this please? I can take it through the rdma tree, unless you prefer
> otherwise?

Can I make a request? This appears to be user-oriented documentation; can
we please place it into the userspace-api manual, rather than keeping it
in its own silo?

Thanks,

jon

2019-01-14 18:53:45

by Joel Nider

Subject: Re: [PATCH 1/3] docs-rst: infiniband: Convert user verbs doc to rst

Jonathan Corbet <[email protected]> wrote on 01/14/2019 07:34:21 PM:

> From: Jonathan Corbet <[email protected]>
> To: Jason Gunthorpe <[email protected]>
> Cc: Joel Nider <[email protected]>, Leon Romanovsky <[email protected]>,
> Doug Ledford <[email protected]>, Mike Rapoport <[email protected]>,
> linux-[email protected], [email protected]
> Date: 01/14/2019 07:37 PM
> Subject: Re: [PATCH 1/3] docs-rst: infiniband: Convert user verbs doc to rst
>
> On Mon, 14 Jan 2019 09:56:21 -0700
> Jason Gunthorpe <[email protected]> wrote:
>
> > > Documentation/infiniband/user_verbs.rst | 70 +++++++++++++++++++++++++++++++++
> > > Documentation/infiniband/user_verbs.txt | 69 --------------------------------
> > > 2 files changed, 70 insertions(+), 69 deletions(-)
> > > create mode 100644 Documentation/infiniband/user_verbs.rst
> > > delete mode 100644 Documentation/infiniband/user_verbs.txt
> >
> > Thanks for getting this going Joel, I've been mulling over writing more
> > docs for this area for a while now.
> >
> > Jonathan/linux-doc: Can you Ack at least the build system parts of
> > this please? I can take it through the rdma tree, unless you prefer
> > otherwise?
>
> Can I make a request? This appears to be user-oriented documentation; can
> we please place it into the userspace-api manual, rather than keeping it
> in its own silo?

I knew someone was going to ask me to do that :-)
I agree that userspace stuff should all be together, but I guess for
historical reasons this one stayed in the infiniband section. So if Jason
is ok with moving the doc, I'll take a look in the morning to see how best
to work that into the patchset.

> Thanks,
>
> jon
>



2019-01-14 18:58:57

by Jason Gunthorpe

Subject: Re: [PATCH 1/3] docs-rst: infiniband: Convert user verbs doc to rst

On Mon, Jan 14, 2019 at 08:52:14PM +0200, Joel Nider wrote:
> > > Jonathan/linux-doc: Can you Ack at least the build system parts of
> > > this please? I can take it through the rdma tree, unless you prefer
> > > otherwise?
> >
> > Can I make a request? This appears to be user-oriented documentation; can
> > we please place it into the userspace-api manual, rather than keeping it
> > in its own silo?
>
> I knew someone was going to ask me to do that :-)
> I agree that userspace stuff should all be together, but I guess for
> historical reasons this one stayed in the infiniband section. So if Jason
> is ok with moving the doc, I'll take a look in the morning to see how best
> to work that into the patchset.

I don't mind

Jason