I'm getting a huge sendfile regression in 2.6.1; when serving a large
file on an IPv4 TCP socket via sendfile64, the transfer starts at about
4 MB (2.6.0: >7 MB) and the hard disk like stays on 100% (which normally
only happens if the file is badly fragmented, fragmentation of this file
is 0%). Then, suddenly, the network performance drops dramatically,
getting worse and worse. strace shows that the process is hanging
inside sendfile64 (which should not happen since the socket is
non-blocking). The process then stays inside sendfile for up to a
minute or so and can't be interrupted or killed in that time.
Please fix!
Felix
Felix von Leitner <[email protected]> wrote:
>
> I'm getting a huge sendfile regression in 2.6.1; when serving a large
> file on an IPv4 TCP socket via sendfile64, the transfer starts at about
> 4 MB (2.6.0: >7 MB) and the hard disk like stays on 100% (which normally
> only happens if the file is badly fragmented, fragmentation of this file
> is 0%). Then, suddenly, the network performance drops dramatically,
> getting worse and worse. strace shows that the process is hanging
> inside sendfile64 (which should not happen since the socket is
> non-blocking). The process then stays inside sendfile for up to a
> minute or so and can't be interrupted or killed in that time.
>
Probably it is the ill-advised readahead tweak. Does the below patch fix
it up?
diff -puN mm/filemap.c~a mm/filemap.c
--- 25/mm/filemap.c~a Fri Jan 9 16:20:32 2004
+++ 25-akpm/mm/filemap.c Fri Jan 9 16:20:46 2004
@@ -596,12 +596,6 @@ void do_generic_mapping_read(struct addr
offset = *ppos & ~PAGE_CACHE_MASK;
last = (*ppos + desc->count) >> PAGE_CACHE_SHIFT;
- /*
- * Let the readahead logic know upfront about all
- * the pages we'll need to satisfy this request
- */
- for (; index < last; index++)
- page_cache_readahead(mapping, ra, filp, index);
index = *ppos >> PAGE_CACHE_SHIFT;
for (;;) {
_
On Fri, 2004-01-09 at 16:21, Andrew Morton wrote:
> Probably it is the ill-advised readahead tweak. Does the below patch fix
> it up?
>
> diff -puN mm/filemap.c~a mm/filemap.c
> --- 25/mm/filemap.c~a Fri Jan 9 16:20:32 2004
> +++ 25-akpm/mm/filemap.c Fri Jan 9 16:20:46 2004
> @@ -596,12 +596,6 @@ void do_generic_mapping_read(struct addr
> offset = *ppos & ~PAGE_CACHE_MASK;
> last = (*ppos + desc->count) >> PAGE_CACHE_SHIFT;
>
> - /*
> - * Let the readahead logic know upfront about all
> - * the pages we'll need to satisfy this request
> - */
> - for (; index < last; index++)
> - page_cache_readahead(mapping, ra, filp, index);
> index = *ppos >> PAGE_CACHE_SHIFT;
>
> for (;;) {
>
There is a small mistake in Andrew's patch. The call to
page_cache_readahead() is missing.
Try this one.
*** linux-2.6.1-rc2/mm/filemap.c Fri Jan 9 00:47:04 2004
--- linux-2.6.1-rc2/mm/filemap.c.1 Fri Jan 9 00:46:43 2004
***************
*** 587,608 ****
read_actor_t actor)
{
struct inode *inode = mapping->host;
! unsigned long index, offset, last;
struct page *cached_page;
int error;
cached_page = NULL;
index = *ppos >> PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;
- last = (*ppos + desc->count) >> PAGE_CACHE_SHIFT;
-
- /*
- * Let the readahead logic know upfront about all
- * the pages we'll need to satisfy this request
- */
- for (; index < last; index++)
- page_cache_readahead(mapping, ra, filp, index);
- index = *ppos >> PAGE_CACHE_SHIFT;
for (;;) {
struct page *page;
--- 587,599 ----
read_actor_t actor)
{
struct inode *inode = mapping->host;
! unsigned long index, offset;
struct page *cached_page;
int error;
cached_page = NULL;
index = *ppos >> PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;
for (;;) {
struct page *page;
***************
*** 621,626 ****
--- 612,618 ----
}
cond_resched();
+ page_cache_readahead(mapping, ra, filp, index);
nr = nr - offset;
find_page:
> _
>
>
Ram Pai <[email protected]> wrote:
>
> There is a small mistake in Andrew's patch. The call to
> page_cache_readahead() is missing.
>
> Try this one.
Third time lucky.
diff -puN mm/filemap.c~readahead-partial-backout mm/filemap.c
--- 25/mm/filemap.c~readahead-partial-backout 2004-01-09 17:41:14.000000000 -0800
+++ 25-akpm/mm/filemap.c 2004-01-09 17:41:14.000000000 -0800
@@ -587,22 +587,13 @@ void do_generic_mapping_read(struct addr
read_actor_t actor)
{
struct inode *inode = mapping->host;
- unsigned long index, offset, last;
+ unsigned long index, offset;
struct page *cached_page;
int error;
cached_page = NULL;
index = *ppos >> PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;
- last = (*ppos + desc->count) >> PAGE_CACHE_SHIFT;
-
- /*
- * Let the readahead logic know upfront about all
- * the pages we'll need to satisfy this request
- */
- for (; index < last; index++)
- page_cache_readahead(mapping, ra, filp, index);
- index = *ppos >> PAGE_CACHE_SHIFT;
for (;;) {
struct page *page;
@@ -621,6 +612,7 @@ void do_generic_mapping_read(struct addr
}
cond_resched();
+ page_cache_readahead(mapping, ra, filp, index);
nr = nr - offset;
find_page:
_
On Sat, Jan 10, 2004 at 01:01:28AM +0100, Felix von Leitner wrote:
> strace shows that the process is hanging
> inside sendfile64 (which should not happen since the socket is
> non-blocking).
What if the data you're sending is not in the page cache?
--L
On Sat, 10 Jan 2004, Lennert Buytenhek wrote:
> On Sat, Jan 10, 2004 at 01:01:28AM +0100, Felix von Leitner wrote:
>
> > strace shows that the process is hanging
> > inside sendfile64 (which should not happen since the socket is
> > non-blocking).
>
> What if the data you're sending is not in the page cache?
It will always block on the actual page cache, although we could try to
change that. However, even if it blocks, it should only block at one page
at a time (or "incidental" blockage due to memory allocations etc).
Blocking for long times implies a bug.
Linus
On Sun, Jan 11, 2004 at 06:58:55PM -0800, Linus Torvalds wrote:
> > > strace shows that the process is hanging
> > > inside sendfile64 (which should not happen since the socket is
> > > non-blocking).
> >
> > What if the data you're sending is not in the page cache?
>
> It will always block on the actual page cache, although we could try to
> change that.
My impression is that this is the reason why AIO sendfile was attempted.
Although my wild guess is that that probably would only work on raw block
devices and still not on regular filesystem files.
> However, even if it blocks, it should only block at one page
> at a time (or "incidental" blockage due to memory allocations etc).
>
> Blocking for long times implies a bug.
(Even if it only blocks a page at a time, it will happily block on the
next page and the one after that as long as there is write space in the
destination socket?)
--L