2013-03-04 15:04:52

by Anton Altaparmakov

Subject: Re: cifs: bugfix for unreclaimed writeback pages in cifs_writev_requeue()

Hi,

The commit below, which is present in 3.9-rc1, is buggy. In the error path it releases the page with page_cache_release(), at which point the page may no longer exist, and only afterwards calls unlock_page() on it. Even if you are somehow getting away with it, I think it is an explosion/memory corruption waiting to happen...

Best regards,

Anton

On 2 Mar 2013, at 19:55, Linux Kernel Mailing List <[email protected]> wrote:

> Gitweb: http://git.kernel.org/linus/;a=commit;h=c51bb0ea40ca038da26b1fa7d450f4078124af03
> Commit: c51bb0ea40ca038da26b1fa7d450f4078124af03
> Parent: 0b7bc84000d71f3647ca33ab1bf5bd928535c846
> Author: Ouyang Maochun <[email protected]>
> AuthorDate: Mon Feb 18 09:54:52 2013 -0600
> Committer: Steve French <[email protected]>
> CommitDate: Thu Feb 28 09:01:47 2013 -0600
>
> cifs: bugfix for unreclaimed writeback pages in cifs_writev_requeue()
>
> Pages get the PG_writeback flag set before cifs sends its
> request to the SMB server in cifs_writepages(). If the SMB service
> goes down, cifs may try to recommit the write requests in
> cifs_writev_requeue(). However, it does not clear the PG_writeback
> flag or reclaim the pages even if it fails again in
> cifs_writev_requeue(), which may lead to the hanging of the
> processes accessing the cifs directory. This patch just clears
> the PG_writeback flags and reclaims the pages under those circumstances.
>
> Steps to reproduce the bug (trying several times may trigger the issue):
> 1. Write from the cifs client continuously (e.g. dd if=/dev/zero of=<cifs file>).
> 2. Stop the SMB service on the server (e.g. service smb stop).
> 3. Wait for two minutes, and then start the SMB service on the
> server (e.g. service smb start).
> 4. The processes which are accessing the cifs directory may hang.
>
> Signed-off-by: Ouyang Maochun <[email protected]>
> Signed-off-by: Jiang Yong <[email protected]>
> Tested-by: Zhang Xianwei <[email protected]>
> Reviewed-by: Wang Liang <[email protected]>
> Reviewed-by: Cai Qu <[email protected]>
> Reviewed-by: Jiang Biao <[email protected]>
> Reviewed-by: Jeff Layton <[email protected]>
> Reviewed-by: Pavel Shilovsky <[email protected]>
> Signed-off-by: Steve French <[email protected]>
> ---
> fs/cifs/cifssmb.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> index 00e12f2..7353bc5 100644
> --- a/fs/cifs/cifssmb.c
> +++ b/fs/cifs/cifssmb.c
> @@ -1909,8 +1909,11 @@ cifs_writev_requeue(struct cifs_writedata *wdata)
>          } while (rc == -EAGAIN);
>
>          for (i = 0; i < wdata->nr_pages; i++) {
> -                if (rc != 0)
> +                if (rc != 0) {
>                          SetPageError(wdata->pages[i]);
> +                        end_page_writeback(wdata->pages[i]);
> +                        page_cache_release(wdata->pages[i]);
> +                }
>                  unlock_page(wdata->pages[i]);
>          }
>

--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/


2013-03-04 19:19:14

by Jeff Layton

Subject: Re: cifs: bugfix for unreclaimed writeback pages in cifs_writev_requeue()

On Mon, 4 Mar 2013 15:04:49 +0000
Anton Altaparmakov <[email protected]> wrote:

> Hi,
>
> The commit below, which is present in 3.9-rc1, is buggy. In the error path it releases the page with page_cache_release(), at which point the page may no longer exist, and only afterwards calls unlock_page() on it. Even if you are somehow getting away with it, I think it is an explosion/memory corruption waiting to happen...
>
> Best regards,
>
> Anton
>
> On 2 Mar 2013, at 19:55, Linux Kernel Mailing List <[email protected]> wrote:
>
> > Gitweb: http://git.kernel.org/linus/;a=commit;h=c51bb0ea40ca038da26b1fa7d450f4078124af03
> > Commit: c51bb0ea40ca038da26b1fa7d450f4078124af03
> > Parent: 0b7bc84000d71f3647ca33ab1bf5bd928535c846
> > Author: Ouyang Maochun <[email protected]>
> > AuthorDate: Mon Feb 18 09:54:52 2013 -0600
> > Committer: Steve French <[email protected]>
> > CommitDate: Thu Feb 28 09:01:47 2013 -0600
> >
> > cifs: bugfix for unreclaimed writeback pages in cifs_writev_requeue()
> >
> > Pages get the PG_writeback flag set before cifs sends its
> > request to the SMB server in cifs_writepages(). If the SMB service
> > goes down, cifs may try to recommit the write requests in
> > cifs_writev_requeue(). However, it does not clear the PG_writeback
> > flag or reclaim the pages even if it fails again in
> > cifs_writev_requeue(), which may lead to the hanging of the
> > processes accessing the cifs directory. This patch just clears
> > the PG_writeback flags and reclaims the pages under those circumstances.
> >
> > Steps to reproduce the bug (trying several times may trigger the issue):
> > 1. Write from the cifs client continuously (e.g. dd if=/dev/zero of=<cifs file>).
> > 2. Stop the SMB service on the server (e.g. service smb stop).
> > 3. Wait for two minutes, and then start the SMB service on the
> > server (e.g. service smb start).
> > 4. The processes which are accessing the cifs directory may hang.
> >
> > Signed-off-by: Ouyang Maochun <[email protected]>
> > Signed-off-by: Jiang Yong <[email protected]>
> > Tested-by: Zhang Xianwei <[email protected]>
> > Reviewed-by: Wang Liang <[email protected]>
> > Reviewed-by: Cai Qu <[email protected]>
> > Reviewed-by: Jiang Biao <[email protected]>
> > Reviewed-by: Jeff Layton <[email protected]>
> > Reviewed-by: Pavel Shilovsky <[email protected]>
> > Signed-off-by: Steve French <[email protected]>
> > ---
> > fs/cifs/cifssmb.c | 5 ++++-
> > 1 files changed, 4 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> > index 00e12f2..7353bc5 100644
> > --- a/fs/cifs/cifssmb.c
> > +++ b/fs/cifs/cifssmb.c
> > @@ -1909,8 +1909,11 @@ cifs_writev_requeue(struct cifs_writedata *wdata)
> >          } while (rc == -EAGAIN);
> >
> >          for (i = 0; i < wdata->nr_pages; i++) {
> > -                if (rc != 0)
> > +                if (rc != 0) {
> >                          SetPageError(wdata->pages[i]);
> > +                        end_page_writeback(wdata->pages[i]);
> > +                        page_cache_release(wdata->pages[i]);
> > +                }
> >                  unlock_page(wdata->pages[i]);
> >          }
> >
>

Well spotted...

We definitely should be unlocking the page before releasing it. I
think it's sufficient to simply move the unlock call before the
check of "rc". I'll send out a patch to do just that once I've
smoke-tested it.
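
To make the ordering issue concrete, here is a small userspace sketch, not kernel code and not the actual patch. struct mock_page and the mock_* helpers are hypothetical stand-ins for struct page and the real page helpers, and the model assumes the reference dropped in the error path is the last one, which is the worst case described above. It only records whether a page is touched after its refcount reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page (NOT the kernel type). */
struct mock_page {
    int refcount;
    bool locked;
    bool writeback;
    bool error;
    bool freed;            /* set once the last reference is dropped */
    bool used_after_free;  /* set if any helper touches a freed page */
};

/* Record an access; flags a use-after-free if the page is already gone. */
static void mock_touch(struct mock_page *p)
{
    if (p->freed)
        p->used_after_free = true;
}

static void mock_set_page_error(struct mock_page *p)
{
    mock_touch(p);
    p->error = true;
}

static void mock_end_page_writeback(struct mock_page *p)
{
    mock_touch(p);
    p->writeback = false;
}

static void mock_unlock_page(struct mock_page *p)
{
    mock_touch(p);
    p->locked = false;
}

static void mock_page_cache_release(struct mock_page *p)
{
    mock_touch(p);
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        p->freed = true;
}

/* Ordering from the commit under discussion: release first, unlock last. */
static void requeue_cleanup_release_then_unlock(struct mock_page **pages,
                                                size_t nr_pages, int rc)
{
    for (size_t i = 0; i < nr_pages; i++) {
        if (rc != 0) {
            mock_set_page_error(pages[i]);
            mock_end_page_writeback(pages[i]);
            mock_page_cache_release(pages[i]);
        }
        mock_unlock_page(pages[i]);
    }
}

/* Proposed ordering: unlock before the check of rc, release last. */
static void requeue_cleanup_unlock_first(struct mock_page **pages,
                                         size_t nr_pages, int rc)
{
    for (size_t i = 0; i < nr_pages; i++) {
        mock_unlock_page(pages[i]);
        if (rc != 0) {
            mock_set_page_error(pages[i]);
            mock_end_page_writeback(pages[i]);
            mock_page_cache_release(pages[i]);
        }
    }
}

typedef void (*cleanup_fn)(struct mock_page **, size_t, int);

/* Run one cleanup variant on a page whose only reference is ours. */
static bool last_ref_used_after_free(cleanup_fn cleanup)
{
    struct mock_page page = {
        .refcount = 1, .locked = true, .writeback = true,
    };
    struct mock_page *pages[] = { &page };

    cleanup(pages, 1, -5); /* -5 stands in for an I/O error such as -EIO */
    return page.used_after_free;
}
```

Calling last_ref_used_after_free() with requeue_cleanup_release_then_unlock reports an access to a freed page, while requeue_cleanup_unlock_first does not, because the release is then the last thing done to each page.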

Thanks,
--
Jeff Layton <[email protected]>