Hi,
I'm trying to fix the inside-secure driver to pass all testmgr
tests and I have one final issue remaining with the AEAD ciphers.
As it was not clear at all what the exact problem was, I spent
some time reverse engineering testmgr and I got the distinct
impression that it is using scatter particles that cross page
boundaries. On purpose, even.
The inside-secure driver, however, is built on the premise that
scatter particles are contiguous in device space, and I can't
think of any reason why you would want to scatter/gather other
than to handle virtual-to-physical address translation ...
In any case, this should affect all other operations as
well, but maybe those just got "lucky" by getting particles
that were still contiguous in device space, despite the page
crossing (to *really* verify this, you would have to fully
randomize your page allocation!)
Anyway, assuming that I *should* be able to handle particles
that are *not* contiguous in device space, then there should
probably already exist some function in the kernel API that
converts a scatterlist with non-contiguous particles into a
scatterlist with contiguous particles, taking into account the
presence of an IOMMU? Considering pretty much every device
driver would need to do that?
Does anyone know which function(s) to use for that?
Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Inside Secure
On Wed, 17 Apr 2019 at 12:51, Pascal Van Leeuwen
<[email protected]> wrote:
>
> [...]
Hello Pascal,
Scatterlists are made up of struct page/offset tuples, and so they
should map transparently onto physical ranges.
It looks like the AEAD skcipher walk API lacks a *_async() variant
setting the SKCIPHER_WALK_PHYS bit, like we have for the ordinary
block ciphers. Plumbing that into crypto/skcipher.c should be rather
straight-forward.
--
Ard.
Hi Pascal,
On Wed, Apr 17, 2019 at 07:51:08PM +0000, Pascal Van Leeuwen wrote:
> [...]
Indeed, since v5.1, testmgr tests scatterlist elements that cross a page.
However, the pages are guaranteed to be *physically* contiguous. Does
dma_map_sg() not handle this?
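(For reference, a rough sketch of the usual pattern - the function name
is made up and error handling is trimmed. The driver builds its
descriptors from the segment count dma_map_sg() returns; that count can
shrink when entries are merged, but it never grows:)

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/printk.h>

/* Sketch only: map a scatterlist for DMA and walk the returned segments. */
static int sketch_map_for_device(struct device *dev,
                                 struct scatterlist *sgl, int nents)
{
        struct scatterlist *sg;
        int mapped, i;

        mapped = dma_map_sg(dev, sgl, nents, DMA_TO_DEVICE);
        if (!mapped)
                return -ENOMEM;

        for_each_sg(sgl, sg, mapped, i) {
                dma_addr_t addr = sg_dma_address(sg);
                unsigned int len = sg_dma_len(sg);

                /* each (addr, len) is contiguous in device address space */
                pr_debug("segment %d: %pad + %u\n", i, &addr, len);
        }

        dma_unmap_sg(dev, sgl, nents, DMA_TO_DEVICE);
        return 0;
}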
BTW, this isn't just a theoretical case. Many crypto API users do crypto on
kmalloced buffers, and those can cross a page boundary, especially if they are
large. All software crypto algorithms handle this case.
The fact that these types of issues are just being considered now certainly
isn't raising my confidence in the hardware crypto drivers in the kernel...
- Eric
> > [...]
> Hello Pascal,
>
> Scatterlists are made up of struct page/offset tuples, and so they
> should map transparently onto physical ranges.
>
> It looks like the AEAD skcipher walk API lacks a *_async() variant
> setting the SKCIPHER_WALK_PHYS bit, like we have for the ordinary
> block ciphers. Plumbing that into crypto/skcipher.c should be rather
> straight-forward.
>
> --
> Ard.
Ard,
Am I reading this correctly as "your driver should indeed expect
those particles to be contiguous, but there is some problem in some
other location in the API causing this to not always be the case"?
I took a quick peek at skcipher.c, whose design is unfamiliar to
me, but the first thing that strikes me is that there is an
"skcipher_walk_async" that indeed sets this SKCIPHER_WALK_PHYS bit,
but no matching "skcipher_walk_aead_async" function that I would sort
of expect to do a similar thing for AEAD ciphers ...
Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Inside Secure
On Wed, 17 Apr 2019 at 13:49, Pascal Van Leeuwen
<[email protected]> wrote:
>
> > > [...]
> > Hello Pascal,
> >
> > Scatterlists are made up of struct page/offset tuples, and so they
> > should map transparently onto physical ranges.
> >
> > It looks like the AEAD skcipher walk API lacks a *_async() variant
> > setting the SKCIPHER_WALK_PHYS bit, like we have for the ordinary
> > block ciphers. Plumbing that into crypto/skcipher.c should be rather
> > straight-forward.
> >
> > --
> > Ard.
> Ard,
>
>
> Am I reading this correctly as "your driver should indeed expect
> those particles to be contiguous, but there is some problem in some
> other location in the API causing this to not always be the case"?
>
Indeed.
> I took a quick peek at skcipher.c, which is a design unfamiliar to
> me, but the first thing that strikes me is that there is an
> "skcipher_walk_async", that indeed sets this SKCIPHER_WALK_PHYS bit,
> but no matching "skcipher_walk_aead_async" function that I would sort
> of expect doing a similar thing for AEAD ciphers ...
>
That is precisely my point.
> -----Original Message-----
> From: Eric Biggers [mailto:[email protected]]
> Sent: Wednesday, April 17, 2019 10:24 PM
> To: Pascal Van Leeuwen <[email protected]>
> Cc: [email protected]; Herbert Xu
> <[email protected]>
> Subject: Re: Question regarding crypto scatterlists / testmgr
>
> Hi Pascal,
>
> On Wed, Apr 17, 2019 at 07:51:08PM +0000, Pascal Van Leeuwen wrote:
> > [...]
>
> Indeed, since v5.1, testmgr tests scatterlist elements that cross a
> page.
> However, the pages are guaranteed to be *physically* contiguous. Does
> dma_map_sg() not handle this?
>
I'm not entirely sure, and the API documentation is not particularly
clear on *what* dma_map_sg() actually does, but I highly doubt it,
considering the particle count is only an input parameter (i.e. it
can't output an increase in particles that would be required).
So I think it just ensures the pages are actually flushed to memory
and accessible by the device (in case an IOMMU interferes) and not
much more than that.
In any case, scatter particles to be used by hardware should *not*
cross any physical page boundaries.
But also see the thread I had on this with Ard - seems like the crypto
API already has some mechanism for enforcing this but it's not enabled
for AEAD ciphers?
>
> BTW, this isn't just a theoretical case. Many crypto API users do
> crypto on
> kmalloced buffers, and those can cross a page boundary, especially if
> they are
> large. All software crypto algorithms handle this case.
>
Software sits behind the CPU's MMU and sees virtual memory as
contiguous. It does not need to "handle" anything; it gets it for free.
Hardware does not have that luxury, unless you have a functioning IOMMU,
but that is still pretty rare.
So for hardware, you need to break down your buffers into individual
pages and stitch those together. That's the main use case of a scatter
list, and it requires the particles to NOT cross physical pages.
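E.g., stitching pages together typically amounts to something like this
(sketch only; the function name is made up):

#include <linux/mm.h>
#include <linux/scatterlist.h>

/* Sketch: build a scatterlist from individual pages, one page per entry. */
static void sketch_stitch_pages(struct scatterlist *sgl,
                                struct page **pages, unsigned int npages)
{
        unsigned int i;

        sg_init_table(sgl, npages);
        for (i = 0; i < npages; i++)
                sg_set_page(&sgl[i], pages[i], PAGE_SIZE, 0);
}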
> The fact that these types of issues are just being considered now
> certainly
> isn't raising my confidence in the hardware crypto drivers in the
> kernel...
>
Actually, this is *not* a problem with the hardware drivers. It's a
problem with the API and/or how you are trying to use it. Hardware
does NOT see the nice contiguous virtual memory that SW sees.
If the driver should expect to receive particles that cross page
boundaries - if that's the spec - fine, but then it will have to
break those down into individual pages by itself. However, whoever
created the inside-secure driver was under the impression that this
was not supposed to be the case. And I don't know who's right or
wrong there, but from a side discussion with Ard I got the impression
that the Crypto API should fix this up before it reaches the driver.
Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Inside Secure
> [...]
Long story short: testmgr appears to be doing nothing wrong AND the
driver appears to be doing nothing wrong, but it seems like there's a
bug in the Crypto API itself with the scatter walk for AEAD ciphers.
Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Inside Secure
Hi Pascal,
On Wed, Apr 17, 2019 at 09:16:54PM +0000, Pascal Van Leeuwen wrote:
> [...]
> Software sits behind the CPU's MMU and sees virtual memory as
> contiguous. It does not need to "handle" anything, it gets it for free.
> Hardware does not have that luxury, unless you have a functioning IOMMU
> but that is still pretty rare.
> So for hardware, you need to break down your buffers until individual
> pages and stitch those together. That's the main use case of a scatter
> list and it requires the particles to NOT cross physical pages.
>
> > The fact that these types of issues are just being considered now
> > certainly
> > isn't raising my confidence in the hardware crypto drivers in the
> > kernel...
> >
> Actually, this is *not* a problem with the hardware drivers. It's a
> problem with the API and/or how you are trying to use it. Hardware
> does NOT see the nice contiguous virtual memory that SW sees.
>
I don't understand why you keep talking about virtual memory. The memory in
each scatterlist element is referenced by struct page, not by virtual address.
It may cross page boundaries; however, all pages referenced by each element are
guaranteed to be adjacent, i.e. physically contiguous. Am I missing something?
Note that memory allocated by kmalloc() is both virtually and physically
contiguous. That's why it works to use sg_init_one() on a kmalloc()'ed
buffer.
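E.g., something like this (a sketch, function name made up) gives you a
single scatterlist element that spans a page boundary but is still
physically contiguous:

#include <linux/scatterlist.h>
#include <linux/slab.h>

static int sketch_one_element(struct scatterlist *sg)
{
        u8 *buf = kmalloc(8192, GFP_KERNEL);    /* larger than one page */

        if (!buf)
                return -ENOMEM;

        sg_init_one(sg, buf, 8192);     /* one element, crosses a page boundary */
        return 0;
}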
- Eric
On Wed, 17 Apr 2019 at 14:17, Pascal Van Leeuwen
<[email protected]> wrote:
>
> > [...]
> > Indeed, since v5.1, testmgr tests scatterlist elements that cross a
> > page.
> > However, the pages are guaranteed to be *physically* contiguous. Does
> > dma_map_sg() not handle this?
> >
> I'm not entirely sure and the API documentation is not particularly
> clear on *what* dma_map_sg() actually does, but I highly doubt it
> considering the particle count is only an input parameter (i.e. it
> can't output an increase in particles that would be required).
> So I think it just ensures the pages are actually flushed to memory
> and accessible by the device (in case an IOMMU interferes) and not
> much than that.
>
> In any case, scatter particles to be used by hardware should *not*
> cross any physical page boundaries.
> But also see the thread I had on this with Ard - seems like the crypto
> API already has some mechanism for enforcing this but it's not enabled
> for AEAD ciphers?
>
It has simply never been implemented because nobody had a need for it.
> >
> > BTW, this isn't just a theoretical case. Many crypto API users do
> > crypto on
> > kmalloced buffers, and those can cross a page boundary, especially if
> > they are
> > large. All software crypto algorithms handle this case.
> >
> Software sits behind the CPU's MMU and sees virtual memory as
> contiguous. It does not need to "handle" anything, it gets it for free.
> Hardware does not have that luxury, unless you have a functioning IOMMU
> but that is still pretty rare.
> So for hardware, you need to break down your buffers until individual
> pages and stitch those together. That's the main use case of a scatter
> list and it requires the particles to NOT cross physical pages.
>
kmalloc() is guaranteed to return physically contiguous memory, but
assuming that this results in contiguous DMA memory requires the DMA
map call to cover the whole thing, or the IOMMU may end up mapping it
in some other way.
The safe approach (which the async walk seems to take) is just to
carve up each scatterlist entry so it does not cross any page
boundaries, and return it as discrete steps in the walk.
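Roughly along these lines, i.e. (illustrative sketch only, not lifted
from skcipher.c; the function name is made up):

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>

/* Sketch: carve one physically contiguous scatterlist entry into
 * chunks that never cross a page boundary.
 */
static void sketch_walk_entry(struct scatterlist *sg)
{
        phys_addr_t phys = sg_phys(sg);
        unsigned int left = sg->length;

        while (left) {
                unsigned int len = min_t(unsigned int, left,
                                         PAGE_SIZE - offset_in_page(phys));

                /* hand (phys, len) to the next hardware descriptor here */

                phys += len;
                left -= len;
        }
}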
> > The fact that these types of issues are just being considered now
> > certainly
> > isn't raising my confidence in the hardware crypto drivers in the
> > kernel...
> >
> Actually, this is *not* a problem with the hardware drivers. It's a
> problem with the API and/or how you are trying to use it. Hardware
> does NOT see the nice contiguous virtual memory that SW sees.
>
> If the driver may expect to receive particles that cross page
> boundaries - if that's the spec - fine, but then it will have to
> break those down into individual pages by itself. However, whomever
> created the inside-secure driver was under the impression that this
> was not supposed to be the case. And I don't know who's right or
> wrong there, but from a side discussion with Ard I got the impression
> that the Crypto API should fix this up before it reaches the driver.
>
To be clear, is that driver upstream? And if so, where does it reside?
> -----Original Message-----
> From: Eric Biggers [mailto:[email protected]]
> Sent: Wednesday, April 17, 2019 11:43 PM
> To: Pascal Van Leeuwen <[email protected]>
> Cc: [email protected]; Herbert Xu
> <[email protected]>
> Subject: Re: Question regarding crypto scatterlists / testmgr
>
> Hi Pascal,
>
> On Wed, Apr 17, 2019 at 09:16:54PM +0000, Pascal Van Leeuwen wrote:
> > [...]
> I don't understand why you keep talking about virtual memory. The
> memory in
> each scatterlist element is referenced by struct page, not by virtual
> address.
> It may cross page boundaries; however, all pages referenced by each
> element are
> guaranteed to be adjacent, i.e. physically contiguous. Am I missing
> something?
>
Ok, I'm not super at home with the behavior of all these kernel API
calls yet - just learning - so I did not really know that. I thought you
were trying to say that the pages themselves were contiguous, not that
the pages were contiguous with respect to each other.
The pages not being contiguous seemed to be a perfect explanation for the
behavior I was seeing, and testmgr *does* try to make data cross page
boundaries, which does not seem to be super useful if they're guaranteed
to be contiguous anyway. However, I just thought of another reason that
could explain the same behavior (and why it affects only AEAD) ... let me
explore that tomorrow.
> Note that memory allocated by kmalloc() is both virtually and
> physically contiguous. That's why it works to use sg_init_one() on a
> kmalloc()'ed buffer.
>
> - Eric
Thanks,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines
> -----Original Message-----
> From: Ard Biesheuvel [mailto:[email protected]]
> Sent: Wednesday, April 17, 2019 11:43 PM
> To: Pascal Van Leeuwen <[email protected]>
> Cc: Eric Biggers <[email protected]>; [email protected];
> Herbert Xu <[email protected]>
> Subject: Re: Question regarding crypto scatterlists / testmgr
>
> > [...]
> kmalloc() is guaranteed to return physically contiguous memory, but
> assuming that this results in contiguous DMA memory requires the DMA
> map call to cover the whole thing, or the IOMMU may end up mapping it
> in some other way.
>
> The safe approach (which the async walk seems to take) is just to
> carve up each scatterlist entry so it does not cross any page
> boundaries, and return it as discrete steps in the walk.
>
That's interesting. Is that actually true, though, or just an assumption?
If the pages are guaranteed to be contiguous, then why break up the
scatter chain further into individual pages?
For our hardware, the number of particles may become a performance
bottleneck, so the fewer particles the better. Also, the work to walk
the chain and break it up would take up precious CPU cycles.
>
> > > The fact that these types of issues are just being considered now
> > > certainly
> > > isn't raising my confidence in the hardware crypto drivers in the
> > > kernel...
> > >
> > Actually, this is *not* a problem with the hardware drivers. It's a
> > problem with the API and/or how you are trying to use it. Hardware
> > does NOT see the nice contiguous virtual memory that SW sees.
> >
> > If the driver may expect to receive particles that cross page
> > boundaries - if that's the spec - fine, but then it will have to
> > break those down into individual pages by itself. However, whomever
> > created the inside-secure driver was under the impression that this
> > was not supposed to be the case. And I don't know who's right or
> > wrong there, but from a side discussion with Ard I got the impression
> > that the Crypto API should fix this up before it reaches the driver.
> >
>
> To be clear, is that driver upstream? And if so, where does it reside?
>
FYI: the original driver I started with is upstream:
drivers/crypto/inside-secure
Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Inside Secure
On Wed, 17 Apr 2019 at 20:16, Pascal Van Leeuwen
<[email protected]> wrote:
>
> > [...]
> That's interesting. Is that actually true though or just assumption?
> If the pages are guaranteed to be contiguous, then why break up the
> scatter chain further into individual pages?
> For our hardware, the number of particles may become a performance
> bottleneck, so the less particles the better. Also, the work to walk
> the chain and break it up would take up precious CPU cycles.
>
Seems like I was misreading the code: we have the following code in
skcipher_walk_next
        if (!err && (walk->flags & SKCIPHER_WALK_PHYS)) {
                walk->src.phys.page = virt_to_page(walk->src.virt.addr);
                walk->dst.phys.page = virt_to_page(walk->dst.virt.addr);
                walk->src.phys.offset &= PAGE_SIZE - 1;
                walk->dst.phys.offset &= PAGE_SIZE - 1;
        }
but all that does is normalize the offset. In fact, this code looks
slightly dodgy to me, given that, if the offset /does/ exceed
PAGE_SIZE, it normalizes the offset but does not advance the page
pointers accordingly.
The thing to be aware of is that struct pages are not guaranteed to be
mapped on the CPU, and so a lot of the virt handling deals with
mapping/unmapping on the *cpu* side rather than the device side. So a
phys walk gives you each physically contiguous entry in turn, and it
is up to the device driver to map it for DMA if needed.
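I.e., for each step the driver would do something along these lines
(sketch only; the function name is made up and error handling is
minimal):

#include <linux/dma-mapping.h>

/* Sketch: map one physically contiguous (page, offset, len) step for
 * the device, point a descriptor at it, then unmap when done.
 */
static int sketch_dma_one_step(struct device *dev, struct page *page,
                               unsigned int offset, unsigned int len)
{
        dma_addr_t addr = dma_map_page(dev, page, offset, len,
                                       DMA_TO_DEVICE);

        if (dma_mapping_error(dev, addr))
                return -ENOMEM;

        /* ... program the hardware descriptor with (addr, len) here ... */

        dma_unmap_page(dev, addr, len, DMA_TO_DEVICE);
        return 0;
}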
To satisfy my curiosity, I looked at the existing async drivers, and
very few actually appear to be using any of this stuff. So perhaps my
attempt to clarify things ended up achieving the opposite, and we are
really only interested in whether dma_map_sg() does what you expect in
your driver.
> >
> > [...]
> > To be clear, is that driver upstream? And if so, where does it reside?
> >
> FYI: the original driver I started with is upstream:
> drivers/crypto/inside-secure
>
OK, so indeed, you are using dma_map_sg(), which seems absolutely fine
if your hardware supports that model. So apologies for the noise ...
On Wed, Apr 17, 2019 at 08:29:59PM -0700, Ard Biesheuvel wrote:
>
> Seems like I was misreading the code: we have the following code in
> skcipher_walk_next
>
>         if (!err && (walk->flags & SKCIPHER_WALK_PHYS)) {
>                 walk->src.phys.page = virt_to_page(walk->src.virt.addr);
>                 walk->dst.phys.page = virt_to_page(walk->dst.virt.addr);
>                 walk->src.phys.offset &= PAGE_SIZE - 1;
>                 walk->dst.phys.offset &= PAGE_SIZE - 1;
>         }
>
> but all that does is normalize the offset. In fact, this code looks
> slightly dodgy to me, given that, if the offset /does/ exceed
> PAGE_SIZE, it normalizes the offset but does not advance the page
> pointers accordingly.
I wouldn't be surprised if the async walk code is buggy. Hardly
anybody uses this.
Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt