|author||Ying Han <email@example.com>||2009-01-06 14:40:18 -0800|
|committer||Linus Torvalds <firstname.lastname@example.org>||2009-01-06 15:59:08 -0800|
mm: make get_user_pages() interruptible
The initial implementation of checking TIF_MEMDIE covers the cases of OOM killing. If the process has been OOM killed, the TIF_MEMDIE is set and it return immediately. This patch includes: 1. add the case that the SIGKILL is sent by user processes. The process can try to get_user_pages() unlimited memory even if a user process has sent a SIGKILL to it(maybe a monitor find the process exceed its memory limit and try to kill it). In the old implementation, the SIGKILL won't be handled until the get_user_pages() returns. 2. change the return value to be ERESTARTSYS. It makes no sense to return ENOMEM if the get_user_pages returned by getting a SIGKILL signal. Considering the general convention for a system call interrupted by a signal is ERESTARTNOSYS, so the current return value is consistant to that. Lee: An unfortunate side effect of "make-get_user_pages-interruptible" is that it prevents a SIGKILL'd task from munlock-ing pages that it had mlocked, resulting in freeing of mlocked pages. Freeing of mlocked pages, in itself, is not so bad. We just count them now--altho' I had hoped to remove this stat and add PG_MLOCKED to the free pages flags check. However, consider pages in shared libraries mapped by more than one task that a task mlocked--e.g., via mlockall(). If the task that mlocked the pages exits via SIGKILL, these pages would be left mlocked and unevictable. Proposed fix: Add another GUP flag to ignore sigkill when calling get_user_pages from munlock()--similar to Kosaki Motohiro's 'IGNORE_VMA_PERMISSIONS flag for the same purpose. We are not actually allocating memory in this case, which "make-get_user_pages-interruptible" intends to avoid. We're just munlocking pages that are already resident and mapped, and we're reusing get_user_pages() to access those pages. ?? Maybe we should combine 'IGNORE_VMA_PERMISSIONS and '_IGNORE_SIGKILL into a single flag: GUP_FLAGS_MUNLOCK ??? [Lee.Schermerhorn@hp.com: ignore sigkill in get_user_pages during munlock] Signed-off-by: Paul Menage <email@example.com> Signed-off-by: Ying Han <firstname.lastname@example.org> Reviewed-by: KOSAKI Motohiro <email@example.com> Reviewed-by: Pekka Enberg <firstname.lastname@example.org> Cc: Nick Piggin <email@example.com> Cc: Hugh Dickins <firstname.lastname@example.org> Cc: Oleg Nesterov <email@example.com> Cc: Lee Schermerhorn <firstname.lastname@example.org> Cc: Rohit Seth <email@example.com> Cc: David Rientjes <firstname.lastname@example.org> Signed-off-by: Lee Schermerhorn <email@example.com> Signed-off-by: Andrew Morton <firstname.lastname@example.org> Signed-off-by: Linus Torvalds <email@example.com>
Diffstat (limited to 'mm/mlock.c')
1 files changed, 5 insertions, 4 deletions
diff --git a/mm/mlock.c b/mm/mlock.c
index 3035a56e761..e125156c664 100644
@@ -173,12 +173,13 @@ static long __mlock_vma_pages_range(struct vm_area_struct *vma,
(atomic_read(&mm->mm_users) != 0));
- * mlock: don't page populate if page has PROT_NONE permission.
- * munlock: the pages always do munlock althrough
- * its has PROT_NONE permission.
+ * mlock: don't page populate if vma has PROT_NONE permission.
+ * munlock: always do munlock although the vma has PROT_NONE
+ * permission, or SIGKILL is pending.
- gup_flags |= GUP_FLAGS_IGNORE_VMA_PERMISSIONS;
+ gup_flags |= GUP_FLAGS_IGNORE_VMA_PERMISSIONS |
if (vma->vm_flags & VM_WRITE)
gup_flags |= GUP_FLAGS_WRITE;