Improper calculation of the number of pages causes bio_alloc() to
be called with nr_iovecs=0, and slab corruption later.
For example, a simple scatterlist that fails: {(3644,452), (0, 60)},
(offset, size). bufflen=512 => nr_pages=1 => breakage. The proper
page count for this example is 2.
Signed-off-by: Dan Aloni <[email protected]>
---
commit 8faa94b01e6fd4518b760ce39a2db0ede9444ded
tree c2e3c6ee5f59a4c1e166e4798ddc6e938f448de2
parent c4a1745aa09fc110afdefea0e5d025043e348bae
author Dan Aloni <[email protected]> Tue, 21 Mar 2006 10:19:11 +0200
committer Dan Aloni <[email protected]> Tue, 21 Mar 2006 10:19:11 +0200
drivers/scsi/scsi_lib.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 701a328..a42f3aa 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -368,13 +368,20 @@ static int scsi_req_map_sg(struct reques
int nsegs, unsigned bufflen, gfp_t gfp)
{
struct request_queue *q = rq->q;
- int nr_pages = (bufflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ int nr_pages = 0;
unsigned int data_len = 0, len, bytes, off;
struct page *page;
struct bio *bio = NULL;
int i, err, nr_vecs = 0;
for (i = 0; i < nsegs; i++) {
+ off = sgl[i].offset;
+ len = sgl[i].length;
+
+ nr_pages += ((off + len + PAGE_SIZE - 1) >> PAGE_SHIFT) - (off >> PAGE_SHIFT);
+ }
+
+ for (i = 0; i < nsegs; i++) {
page = sgl[i].page;
off = sgl[i].offset;
len = sgl[i].length;
--
Dan Aloni
[email protected], [email protected], [email protected], [email protected]
This is a good email to discuss on the scsi list:
[email protected]; whom I've added to the cc list.
On Tue, 2006-03-21 at 10:38 +0200, Dan Aloni wrote:
> Improper calculation of the number of pages causes bio_alloc() to
> be called with nr_iovecs=0, and slab corruption later.
>
> For example, a simple scatterlist that fails: {(3644,452), (0, 60)},
> (offset, size). bufflen=512 => nr_pages=1 => breakage. The proper
> page count for this example is 2.
Such a scatterlist would likely violate the device's underlying
boundaries and is not legal ... there's supposed to be special code
checking the queue alignment and copying the bio to an aligned buffer if
the limits are violated. Where are you generating these scatterlists
from?
James
On Tue, Mar 21, 2006 at 09:54:54AM -0600, James Bottomley wrote:
> This is a good email to discuss on the scsi list:
> [email protected]; whom I've added to the cc list.
>
> On Tue, 2006-03-21 at 10:38 +0200, Dan Aloni wrote:
> > Improper calculation of the number of pages causes bio_alloc() to
> > be called with nr_iovecs=0, and slab corruption later.
> >
> > For example, a simple scatterlist that fails: {(3644,452), (0, 60)},
> > (offset, size). bufflen=512 => nr_pages=1 => breakage. The proper
> > page count for this example is 2.
>
> Such a scatterlist would likely violate the device's underlying
> boundaries and is not legal ... there's supposed to be special code
> checking the queue alignment and copying the bio to an aligned buffer if
> the limits are violated. Where are you generating these scatterlists
> from?
These scatterlists can be generated using the sg driver. Though I am
actually running a customized version of the sg driver, it seems the
conversion from a userspace array of sg_iovec_t to scatterlist stays
the same and also applies to the original driver (see
st_map_user_pages()).
--
Dan Aloni
[email protected], [email protected], [email protected], [email protected]
On Tuesday 21 March 2006 10:19, Dan Aloni wrote:
> On Tue, Mar 21, 2006 at 09:54:54AM -0600, James Bottomley wrote:
> > This is a good email to discuss on the scsi list:
> > [email protected]; whom I've added to the cc list.
> >
> > On Tue, 2006-03-21 at 10:38 +0200, Dan Aloni wrote:
> > > Improper calculation of the number of pages causes bio_alloc() to
> > > be called with nr_iovecs=0, and slab corruption later.
> > >
> > > For example, a simple scatterlist that fails: {(3644,452), (0, 60)},
> > > (offset, size). bufflen=512 => nr_pages=1 => breakage. The proper
> > > page count for this example is 2.
> >
> > Such a scatterlist would likely violate the device's underlying
> > boundaries and is not legal ... there's supposed to be special code
> > checking the queue alignment and copying the bio to an aligned buffer if
> > the limits are violated. Where are you generating these scatterlists
> > from?
>
> These scatterlists can be generated using the sg driver. Though I am
> actually running a customized version of the sg driver, it seems the
> conversion from a userspace array of sg_iovec_t to scatterlist stays
> the same and also applies to the original driver (see
> st_map_user_pages()).
Hello,
I am seeing the same issue when using direct io with sg. sg will perform
direct io on any date that is aligned with the devices dma_align. The
default for drivers that do not specify is 512. sg builds the scatter gather
list from the user specified location, offsetting the first entry in the list
if not page aligned. This is the case that causes the improper allocation of
"nr_iovec" in scsi_req_map_sgand the later slab corruption.
I don't think it is necessary to calculate nr_pages from the entire list.
Only sgl[0] is allowed to have an offset, so we can calculate from that as
follows.
--- a/drivers/scsi/scsi_lib.c 2006-03-03 13:17:22.000000000 -0600
+++ b/drivers/scsi/scsi_lib.c 2006-03-21 11:36:39.389763804 -0600
@@ -368,12 +368,15 @@
int nsegs, unsigned bufflen, gfp_t gfp)
{
struct request_queue *q = rq->q;
- int nr_pages = (bufflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ int nr_pages = 0;
unsigned int data_len = 0, len, bytes, off;
struct page *page;
struct bio *bio = NULL;
int i, err, nr_vecs = 0;
-
+
+ if (nsegs)
+ nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
for (i = 0; i < nsegs; i++) {
page = sgl[i].page;
off = sgl[i].offset;
--
Bryan Holty
Bryan Holty wrote:
> On Tuesday 21 March 2006 10:19, Dan Aloni wrote:
>
>>On Tue, Mar 21, 2006 at 09:54:54AM -0600, James Bottomley wrote:
>>
>>>This is a good email to discuss on the scsi list:
>>>[email protected]; whom I've added to the cc list.
>>>
>>>On Tue, 2006-03-21 at 10:38 +0200, Dan Aloni wrote:
>>>
>>>>Improper calculation of the number of pages causes bio_alloc() to
>>>>be called with nr_iovecs=0, and slab corruption later.
>>>>
>>>>For example, a simple scatterlist that fails: {(3644,452), (0, 60)},
>>>>(offset, size). bufflen=512 => nr_pages=1 => breakage. The proper
>>>>page count for this example is 2.
>>>
>>>Such a scatterlist would likely violate the device's underlying
>>>boundaries and is not legal ... there's supposed to be special code
>>>checking the queue alignment and copying the bio to an aligned buffer if
>>>the limits are violated. Where are you generating these scatterlists
>>>from?
>>
>>These scatterlists can be generated using the sg driver. Though I am
>>actually running a customized version of the sg driver, it seems the
>>conversion from a userspace array of sg_iovec_t to scatterlist stays
>>the same and also applies to the original driver (see
>>st_map_user_pages()).
>
>
> Hello,
> I am seeing the same issue when using direct io with sg. sg will perform
> direct io on any date that is aligned with the devices dma_align. The
> default for drivers that do not specify is 512. sg builds the scatter gather
> list from the user specified location, offsetting the first entry in the list
> if not page aligned. This is the case that causes the improper allocation of
> "nr_iovec" in scsi_req_map_sgand the later slab corruption.
>
> I don't think it is necessary to calculate nr_pages from the entire list.
> Only sgl[0] is allowed to have an offset, so we can calculate from that as
> follows.
> --- a/drivers/scsi/scsi_lib.c 2006-03-03 13:17:22.000000000 -0600
> +++ b/drivers/scsi/scsi_lib.c 2006-03-21 11:36:39.389763804 -0600
> @@ -368,12 +368,15 @@
> int nsegs, unsigned bufflen, gfp_t gfp)
> {
> struct request_queue *q = rq->q;
> - int nr_pages = (bufflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
> + int nr_pages = 0;
> unsigned int data_len = 0, len, bytes, off;
> struct page *page;
> struct bio *bio = NULL;
> int i, err, nr_vecs = 0;
> -
> +
> + if (nsegs)
you can drop that test
> + nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> +
I think we can do this without looping but I think this is broken. If we
had a slight variant of Dan's example but we have a page and some change
in the first entry {3644, 4548} and 0,60} in the last one, that would
would only calculate two pages but we want three.
I think we can have to calculate the first and last entries but the
middle ones we can assume have no offset and lengths that are multiples
of a page.
On Tuesday 21 March 2006 13:17, Mike Christie wrote:
> Bryan Holty wrote:
> > On Tuesday 21 March 2006 10:19, Dan Aloni wrote:
> >>On Tue, Mar 21, 2006 at 09:54:54AM -0600, James Bottomley wrote:
> >>>This is a good email to discuss on the scsi list:
> >>>[email protected]; whom I've added to the cc list.
> >>>
> >>>On Tue, 2006-03-21 at 10:38 +0200, Dan Aloni wrote:
> >>>>Improper calculation of the number of pages causes bio_alloc() to
> >>>>be called with nr_iovecs=0, and slab corruption later.
> >>>>
> >>>>For example, a simple scatterlist that fails: {(3644,452), (0, 60)},
> >>>>(offset, size). bufflen=512 => nr_pages=1 => breakage. The proper
> >>>>page count for this example is 2.
> >>>
> >>>Such a scatterlist would likely violate the device's underlying
> >>>boundaries and is not legal ... there's supposed to be special code
> >>>checking the queue alignment and copying the bio to an aligned buffer if
> >>>the limits are violated. Where are you generating these scatterlists
> >>>from?
> >>
> >>These scatterlists can be generated using the sg driver. Though I am
> >>actually running a customized version of the sg driver, it seems the
> >>conversion from a userspace array of sg_iovec_t to scatterlist stays
> >>the same and also applies to the original driver (see
> >>st_map_user_pages()).
> >
> > Hello,
> > I am seeing the same issue when using direct io with sg. sg will
> > perform direct io on any date that is aligned with the devices dma_align.
> > The default for drivers that do not specify is 512. sg builds the
> > scatter gather list from the user specified location, offsetting the
> > first entry in the list if not page aligned. This is the case that
> > causes the improper allocation of "nr_iovec" in scsi_req_map_sgand the
> > later slab corruption.
> >
> > I don't think it is necessary to calculate nr_pages from the entire
> > list. Only sgl[0] is allowed to have an offset, so we can calculate from
> > that as follows.
> > --- a/drivers/scsi/scsi_lib.c 2006-03-03 13:17:22.000000000 -0600
> > +++ b/drivers/scsi/scsi_lib.c 2006-03-21 11:36:39.389763804 -0600
> > @@ -368,12 +368,15 @@
> > int nsegs, unsigned bufflen, gfp_t gfp)
> > {
> > struct request_queue *q = rq->q;
> > - int nr_pages = (bufflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > + int nr_pages = 0;
> > unsigned int data_len = 0, len, bytes, off;
> > struct page *page;
> > struct bio *bio = NULL;
> > int i, err, nr_vecs = 0;
> > -
> > +
> > + if (nsegs)
>
> you can drop that test
>
> > + nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > +
>
> I think we can do this without looping but I think this is broken. If we
> had a slight variant of Dan's example but we have a page and some change
> in the first entry {3644, 4548} and 0,60} in the last one, that would
> would only calculate two pages but we want three.
>
> I think we can have to calculate the first and last entries but the
> middle ones we can assume have no offset and lengths that are multiples
> of a page.
You are absolutely correct.
bufflen is not page masked and when added to the offset of the first entry,
will generate the correct nr_pages.
nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
==
nr_pages = ((bufflen & PAGE_MASK) + (PAGE_SIZE-1) + sgl[0].offset +
sgl[nsegs-1].length) >> PAGE_SHIFT;
Either will calculate correctly.
--
Bryan Holty
On Tuesday 21 March 2006 14:48, Bryan Holty wrote:
> On Tuesday 21 March 2006 13:17, Mike Christie wrote:
> > Bryan Holty wrote:
> > > On Tuesday 21 March 2006 10:19, Dan Aloni wrote:
> > >>On Tue, Mar 21, 2006 at 09:54:54AM -0600, James Bottomley wrote:
> > >>>This is a good email to discuss on the scsi list:
> > >>>[email protected]; whom I've added to the cc list.
> > >>>
> > >>>On Tue, 2006-03-21 at 10:38 +0200, Dan Aloni wrote:
> > >>>>Improper calculation of the number of pages causes bio_alloc() to
> > >>>>be called with nr_iovecs=0, and slab corruption later.
> > >>>>
> > >>>>For example, a simple scatterlist that fails: {(3644,452), (0, 60)},
> > >>>>(offset, size). bufflen=512 => nr_pages=1 => breakage. The proper
> > >>>>page count for this example is 2.
> > >>>
> > >>>Such a scatterlist would likely violate the device's underlying
> > >>>boundaries and is not legal ... there's supposed to be special code
> > >>>checking the queue alignment and copying the bio to an aligned buffer
> > >>> if the limits are violated. Where are you generating these
> > >>> scatterlists from?
> > >>
> > >>These scatterlists can be generated using the sg driver. Though I am
> > >>actually running a customized version of the sg driver, it seems the
> > >>conversion from a userspace array of sg_iovec_t to scatterlist stays
> > >>the same and also applies to the original driver (see
> > >>st_map_user_pages()).
> > >
> > > Hello,
> > > I am seeing the same issue when using direct io with sg. sg will
> > > perform direct io on any date that is aligned with the devices
> > > dma_align. The default for drivers that do not specify is 512. sg
> > > builds the scatter gather list from the user specified location,
> > > offsetting the first entry in the list if not page aligned. This is
> > > the case that causes the improper allocation of "nr_iovec" in
> > > scsi_req_map_sgand the later slab corruption.
> > >
> > > I don't think it is necessary to calculate nr_pages from the entire
> > > list. Only sgl[0] is allowed to have an offset, so we can calculate
> > > from that as follows.
> > > --- a/drivers/scsi/scsi_lib.c 2006-03-03 13:17:22.000000000 -0600
> > > +++ b/drivers/scsi/scsi_lib.c 2006-03-21 11:36:39.389763804 -0600
> > > @@ -368,12 +368,15 @@
> > > int nsegs, unsigned bufflen, gfp_t gfp)
> > > {
> > > struct request_queue *q = rq->q;
> > > - int nr_pages = (bufflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > + int nr_pages = 0;
> > > unsigned int data_len = 0, len, bytes, off;
> > > struct page *page;
> > > struct bio *bio = NULL;
> > > int i, err, nr_vecs = 0;
> > > -
> > > +
> > > + if (nsegs)
> >
> > you can drop that test
> >
> > > + nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > +
> >
> > I think we can do this without looping but I think this is broken. If we
> > had a slight variant of Dan's example but we have a page and some change
> > in the first entry {3644, 4548} and 0,60} in the last one, that would
> > would only calculate two pages but we want three.
> >
> > I think we can have to calculate the first and last entries but the
> > middle ones we can assume have no offset and lengths that are multiples
> > of a page.
>
> You are absolutely correct.
> bufflen is not page masked and when added to the offset of the first entry,
> will generate the correct nr_pages.
>
> nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> ==
> nr_pages = ((bufflen & PAGE_MASK) + (PAGE_SIZE-1) + sgl[0].offset +
> sgl[nsegs-1].length) >> PAGE_SHIFT;
>
> Either will calculate correctly.
>
> --
> Bryan Holty
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
Based on above, I think the most intuitive fix would be the offset addition of
the first entry to the initialization of nr_pages.
Without this change, for instance, with 4K io's every sg io that is
dma_aligned for direct io, but not page aligned will cause slab corruption
and an oops
I am able to run a number of tests with sg that cause the boundary to be
crossed, and with this fix there is no slab corruption or data corruption.
Thanks Dan, I had been hunting for this for a couple of days!!
Thoughts??
Signed-off-by: Bryan Holty <[email protected]>
--- a/drivers/scsi/scsi_lib.c 2006-03-03 13:17:22.000000000 -0600
+++ b/drivers/scsi/scsi_lib.c 2006-03-22 06:09:09.669599539 -0600
@@ -368,7 +368,7 @@
int nsegs, unsigned bufflen, gfp_t gfp)
{
struct request_queue *q = rq->q;
- int nr_pages = (bufflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ int nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
unsigned int data_len = 0, len, bytes, off;
struct page *page;
struct bio *bio = NULL;
--
Bryan Holty
On Tue, Mar 21, 2006 at 06:19:12PM +0200, Dan Aloni wrote:
> These scatterlists can be generated using the sg driver. Though I am
> actually running a customized version of the sg driver, it seems the
> conversion from a userspace array of sg_iovec_t to scatterlist stays
> the same and also applies to the original driver (see
> st_map_user_pages()).
What kernel version did you reproduce this with? Since 2.6.16 sg should
obey all request size/alingment limitations. If not that's a bug in
scsi_execute_async and it's helpers and should be fixed there.
On Thursday 23 March 2006 08:52, Christoph Hellwig wrote:
> On Tue, Mar 21, 2006 at 06:19:12PM +0200, Dan Aloni wrote:
> > These scatterlists can be generated using the sg driver. Though I am
> > actually running a customized version of the sg driver, it seems the
> > conversion from a userspace array of sg_iovec_t to scatterlist stays
> > the same and also applies to the original driver (see
> > st_map_user_pages()).
>
> What kernel version did you reproduce this with? Since 2.6.16 sg should
> obey all request size/alingment limitations. If not that's a bug in
> scsi_execute_async and it's helpers and should be fixed there.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
I am able to reproduce this with 2.6.16-rc5 - 2.6.16. There is a problem in
scsi_req_map_sg which is called by scsi_execute_async.
Currently, scsi_req_map_sg assumes every sgl entry is page aligned. It will
cause later slab corruption by under-allocating the number of bio entries if
sgl[0].offset + sgl[last].length > PAGE_SIZE.
Dan pointed this out, and I have submitted a patch that I believe correctly
fixes the issue. Just waiting for some feedback.
--
Bryan Holty
Bryan Holty wrote:
> On Tuesday 21 March 2006 14:48, Bryan Holty wrote:
>> On Tuesday 21 March 2006 13:17, Mike Christie wrote:
>>> Bryan Holty wrote:
>>>> On Tuesday 21 March 2006 10:19, Dan Aloni wrote:
>>>>> On Tue, Mar 21, 2006 at 09:54:54AM -0600, James Bottomley wrote:
>>>>>> This is a good email to discuss on the scsi list:
>>>>>> [email protected]; whom I've added to the cc list.
>>>>>>
>>>>>> On Tue, 2006-03-21 at 10:38 +0200, Dan Aloni wrote:
>>>>>>> Improper calculation of the number of pages causes bio_alloc() to
>>>>>>> be called with nr_iovecs=0, and slab corruption later.
>>>>>>>
>>>>>>> For example, a simple scatterlist that fails: {(3644,452), (0, 60)},
>>>>>>> (offset, size). bufflen=512 => nr_pages=1 => breakage. The proper
>>>>>>> page count for this example is 2.
>>>>>> Such a scatterlist would likely violate the device's underlying
>>>>>> boundaries and is not legal ... there's supposed to be special code
>>>>>> checking the queue alignment and copying the bio to an aligned buffer
>>>>>> if the limits are violated. Where are you generating these
>>>>>> scatterlists from?
>>>>> These scatterlists can be generated using the sg driver. Though I am
>>>>> actually running a customized version of the sg driver, it seems the
>>>>> conversion from a userspace array of sg_iovec_t to scatterlist stays
>>>>> the same and also applies to the original driver (see
>>>>> st_map_user_pages()).
>>>> Hello,
>>>> I am seeing the same issue when using direct io with sg. sg will
>>>> perform direct io on any date that is aligned with the devices
>>>> dma_align. The default for drivers that do not specify is 512. sg
>>>> builds the scatter gather list from the user specified location,
>>>> offsetting the first entry in the list if not page aligned. This is
>>>> the case that causes the improper allocation of "nr_iovec" in
>>>> scsi_req_map_sgand the later slab corruption.
>>>>
>>>> I don't think it is necessary to calculate nr_pages from the entire
>>>> list. Only sgl[0] is allowed to have an offset, so we can calculate
>>>> from that as follows.
>>>> --- a/drivers/scsi/scsi_lib.c 2006-03-03 13:17:22.000000000 -0600
>>>> +++ b/drivers/scsi/scsi_lib.c 2006-03-21 11:36:39.389763804 -0600
>>>> @@ -368,12 +368,15 @@
>>>> int nsegs, unsigned bufflen, gfp_t gfp)
>>>> {
>>>> struct request_queue *q = rq->q;
>>>> - int nr_pages = (bufflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
>>>> + int nr_pages = 0;
>>>> unsigned int data_len = 0, len, bytes, off;
>>>> struct page *page;
>>>> struct bio *bio = NULL;
>>>> int i, err, nr_vecs = 0;
>>>> -
>>>> +
>>>> + if (nsegs)
>>> you can drop that test
>>>
>>>> + nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
>>>> +
>>> I think we can do this without looping but I think this is broken. If we
>>> had a slight variant of Dan's example but we have a page and some change
>>> in the first entry {3644, 4548} and 0,60} in the last one, that would
>>> would only calculate two pages but we want three.
>>>
>>> I think we can have to calculate the first and last entries but the
>>> middle ones we can assume have no offset and lengths that are multiples
>>> of a page.
>> You are absolutely correct.
>> bufflen is not page masked and when added to the offset of the first entry,
>> will generate the correct nr_pages.
>>
>> nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
>> ==
>> nr_pages = ((bufflen & PAGE_MASK) + (PAGE_SIZE-1) + sgl[0].offset +
>> sgl[nsegs-1].length) >> PAGE_SHIFT;
>>
>> Either will calculate correctly.
>>
>> --
>> Bryan Holty
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>
> Based on above, I think the most intuitive fix would be the offset addition of
> the first entry to the initialization of nr_pages.
>
> Without this change, for instance, with 4K io's every sg io that is
> dma_aligned for direct io, but not page aligned will cause slab corruption
> and an oops
>
> I am able to run a number of tests with sg that cause the boundary to be
> crossed, and with this fix there is no slab corruption or data corruption.
>
> Thanks Dan, I had been hunting for this for a couple of days!!
>
> Thoughts??
>
> Signed-off-by: Bryan Holty <[email protected]>
>
> --- a/drivers/scsi/scsi_lib.c 2006-03-03 13:17:22.000000000 -0600
> +++ b/drivers/scsi/scsi_lib.c 2006-03-22 06:09:09.669599539 -0600
> @@ -368,7 +368,7 @@
> int nsegs, unsigned bufflen, gfp_t gfp)
> {
> struct request_queue *q = rq->q;
> - int nr_pages = (bufflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
> + int nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> unsigned int data_len = 0, len, bytes, off;
> struct page *page;
> struct bio *bio = NULL;
> --
Was this patch ok? I was looking at some other problem and noticed this
was not merged. The original calculation that I did in the code is
wrong. I thought the calculation in this patch was correct, but I have
showed that my math skills are lacking so I may be wrong.
On Friday 26 May 2006 01:13, Mike Christie wrote:
> Bryan Holty wrote:
> > On Tuesday 21 March 2006 14:48, Bryan Holty wrote:
> >> On Tuesday 21 March 2006 13:17, Mike Christie wrote:
> >>> Bryan Holty wrote:
> >>>> On Tuesday 21 March 2006 10:19, Dan Aloni wrote:
> >>>>> On Tue, Mar 21, 2006 at 09:54:54AM -0600, James Bottomley wrote:
> >>>>>> This is a good email to discuss on the scsi list:
> >>>>>> [email protected]; whom I've added to the cc list.
> >>>>>>
> >>>>>> On Tue, 2006-03-21 at 10:38 +0200, Dan Aloni wrote:
> >>>>>>> Improper calculation of the number of pages causes bio_alloc() to
> >>>>>>> be called with nr_iovecs=0, and slab corruption later.
> >>>>>>>
> >>>>>>> For example, a simple scatterlist that fails: {(3644,452), (0,
> >>>>>>> 60)}, (offset, size). bufflen=512 => nr_pages=1 => breakage. The
> >>>>>>> proper page count for this example is 2.
> >>>>>>
> >>>>>> Such a scatterlist would likely violate the device's underlying
> >>>>>> boundaries and is not legal ... there's supposed to be special code
> >>>>>> checking the queue alignment and copying the bio to an aligned
> >>>>>> buffer if the limits are violated. Where are you generating these
> >>>>>> scatterlists from?
> >>>>>
> >>>>> These scatterlists can be generated using the sg driver. Though I am
> >>>>> actually running a customized version of the sg driver, it seems the
> >>>>> conversion from a userspace array of sg_iovec_t to scatterlist stays
> >>>>> the same and also applies to the original driver (see
> >>>>> st_map_user_pages()).
> >>>>
> >>>> Hello,
> >>>> I am seeing the same issue when using direct io with sg. sg will
> >>>> perform direct io on any date that is aligned with the devices
> >>>> dma_align. The default for drivers that do not specify is 512. sg
> >>>> builds the scatter gather list from the user specified location,
> >>>> offsetting the first entry in the list if not page aligned. This is
> >>>> the case that causes the improper allocation of "nr_iovec" in
> >>>> scsi_req_map_sgand the later slab corruption.
> >>>>
> >>>> I don't think it is necessary to calculate nr_pages from the
> >>>> entire list. Only sgl[0] is allowed to have an offset, so we can
> >>>> calculate from that as follows.
> >>>> --- a/drivers/scsi/scsi_lib.c 2006-03-03 13:17:22.000000000 -0600
> >>>> +++ b/drivers/scsi/scsi_lib.c 2006-03-21 11:36:39.389763804 -0600
> >>>> @@ -368,12 +368,15 @@
> >>>> int nsegs, unsigned bufflen, gfp_t gfp)
> >>>> {
> >>>> struct request_queue *q = rq->q;
> >>>> - int nr_pages = (bufflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
> >>>> + int nr_pages = 0;
> >>>> unsigned int data_len = 0, len, bytes, off;
> >>>> struct page *page;
> >>>> struct bio *bio = NULL;
> >>>> int i, err, nr_vecs = 0;
> >>>> -
> >>>> +
> >>>> + if (nsegs)
> >>>
> >>> you can drop that test
> >>>
> >>>> + nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >>
> >>>> PAGE_SHIFT; +
> >>>
> >>> I think we can do this without looping but I think this is broken. If
> >>> we had a slight variant of Dan's example but we have a page and some
> >>> change in the first entry {3644, 4548} and 0,60} in the last one, that
> >>> would would only calculate two pages but we want three.
> >>>
> >>> I think we can have to calculate the first and last entries but the
> >>> middle ones we can assume have no offset and lengths that are multiples
> >>> of a page.
> >>
> >> You are absolutely correct.
> >> bufflen is not page masked and when added to the offset of the first
> >> entry, will generate the correct nr_pages.
> >>
> >> nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> >> ==
> >> nr_pages = ((bufflen & PAGE_MASK) + (PAGE_SIZE-1) + sgl[0].offset +
> >> sgl[nsegs-1].length) >> PAGE_SHIFT;
> >>
> >> Either will calculate correctly.
> >>
> >> --
> >> Bryan Holty
> >> -
> >> To unsubscribe from this list: send the line "unsubscribe linux-kernel"
> >> in the body of a message to [email protected]
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >> Please read the FAQ at http://www.tux.org/lkml/
> >
> > Based on above, I think the most intuitive fix would be the offset
> > addition of the first entry to the initialization of nr_pages.
> >
> > Without this change, for instance, with 4K io's every sg io that is
> > dma_aligned for direct io, but not page aligned will cause slab
> > corruption and an oops
> >
> > I am able to run a number of tests with sg that cause the boundary to be
> > crossed, and with this fix there is no slab corruption or data
> > corruption.
> >
> > Thanks Dan, I had been hunting for this for a couple of days!!
> >
> > Thoughts??
> >
> > Signed-off-by: Bryan Holty <[email protected]>
> >
> > --- a/drivers/scsi/scsi_lib.c 2006-03-03 13:17:22.000000000 -0600
> > +++ b/drivers/scsi/scsi_lib.c 2006-03-22 06:09:09.669599539 -0600
> > @@ -368,7 +368,7 @@
> > int nsegs, unsigned bufflen, gfp_t gfp)
> > {
> > struct request_queue *q = rq->q;
> > - int nr_pages = (bufflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > + int nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > unsigned int data_len = 0, len, bytes, off;
> > struct page *page;
> > struct bio *bio = NULL;
> > --
>
> Was this patch ok? I was looking at some other problem and noticed this
> was not merged. The original calculation that I did in the code is
> wrong. I thought the calculation in this patch was correct, but I have
> showed that my math skills are lacking so I may be wrong.
Yes, this patch fixes the problem. We have been running with it since it was
originally posted without problem. We crash near immediately without the
patch in place. We are using sg with direct_io heavily.
--
Thanks,
Bryan Holty