aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-12-06Linux 3.10.62v3.10.62Greg Kroah-Hartman
2014-12-06nfsd: Fix ACL null pointer derefSergio Gelato
BugLink: http://bugs.launchpad.net/bugs/1348670 Fix regression introduced in pre-3.14 kernels by cherry-picking aa07c713ecfc0522916f3cd57ac628ea6127c0ec (NFSD: Call ->set_acl with a NULL ACL structure if no entries). The affected code was removed in 3.14 by commit 4ac7249ea5a0ceef9f8269f63f33cc873c3fac61 (nfsd: use get_acl and ->set_acl). The ->set_acl methods are already able to cope with a NULL argument. Signed-off-by: Sergio Gelato <Sergio.Gelato@astro.su.se> [bwh: Rewrite the subject] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Moritz Mühlenhoff <muehlenhoff@univention.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06powerpc/powernv: Honor the generic "no_64bit_msi" flagBenjamin Herrenschmidt
commit 360743814c4082515581aa23ab1d8e699e1fbe88 upstream. Instead of the arch specific quirk which we are deprecating and that drivers don't understand. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06bnx2fc: do not add shared skbs to the fcoe_rx_listMaurizio Lombardi
commit 01a4cc4d0cd6a836c7b923760e8eb1cbb6a47258 upstream. In some cases, the fcoe_rx_list may contains multiple instances of the same skb (the so called "shared skbs"). the bnx2fc_l2_rcv thread is a loop that extracts a skb from the list, modifies (and destroys) its content and then proceed to the next one. The problem is that if the skb is shared, the remaining instances will be corrupted. The solution is to use skb_share_check() before adding the skb to the fcoe_rx_list. [ 6286.808725] ------------[ cut here ]------------ [ 6286.808729] WARNING: at include/scsi/fc_frame.h:173 bnx2fc_l2_rcv_thread+0x425/0x450 [bnx2fc]() [ 6286.808748] Modules linked in: bnx2x(-) mdio dm_service_time bnx2fc cnic uio fcoe libfcoe 8021q garp stp mrp libfc llc scsi_transport_fc scsi_tgt sg iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel e1000e ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper ptp cryptd hpilo serio_raw hpwdt lpc_ich pps_core ipmi_si pcspkr mfd_core ipmi_msghandler shpchp pcc_cpufreq mperf nfsd auth_rpcgss nfs_acl lockd sunrpc dm_multipath xfs libcrc32c ata_generic pata_acpi sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit ata_piix drm_kms_helper ttm drm libata i2c_core hpsa dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mdio] [ 6286.808750] CPU: 3 PID: 1304 Comm: bnx2fc_l2_threa Not tainted 3.10.0-121.el7.x86_64 #1 [ 6286.808750] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013 [ 6286.808752] 0000000000000000 000000000b36e715 ffff8800deba1e00 ffffffff815ec0ba [ 6286.808753] ffff8800deba1e38 ffffffff8105dee1 ffffffffa05618c0 ffff8801e4c81888 [ 6286.808754] ffffe8ffff663868 ffff8801f402b180 ffff8801f56bc000 ffff8800deba1e48 [ 6286.808754] Call Trace: [ 6286.808759] [<ffffffff815ec0ba>] dump_stack+0x19/0x1b [ 6286.808762] [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80 [ 6286.808763] [<ffffffff8105e00a>] warn_slowpath_null+0x1a/0x20 [ 6286.808765] [<ffffffffa054f415>] bnx2fc_l2_rcv_thread+0x425/0x450 [bnx2fc] [ 6286.808767] [<ffffffffa054eff0>] ? bnx2fc_disable+0x90/0x90 [bnx2fc] [ 6286.808769] [<ffffffff81085aef>] kthread+0xcf/0xe0 [ 6286.808770] [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 [ 6286.808772] [<ffffffff815fc76c>] ret_from_fork+0x7c/0xb0 [ 6286.808773] [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 [ 6286.808774] ---[ end trace c6cdb939184ccb4e ]--- Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Acked-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06nfsd4: fix leak of inode reference on delegation failureJ. Bruce Fields
commit bf7bd3e98be5c74813bee6ad496139fb0a011b3b upstream. This fixes a regression from 68a3396178e6688ad7367202cdf0af8ed03c8727 "nfsd4: shut down more of delegation earlier". After that commit, nfs4_set_delegation() failures result in nfs4_put_delegation being called, but nfs4_put_delegation doesn't free the nfs4_file that has already been set by alloc_init_deleg(). This can result in an oops on later unmounting the exported filesystem. Note also delaying the fi_had_conflict check we're able to return a better error (hence give 4.1 clients a better idea why the delegation failed; though note CONFLICT isn't an exact match here, as that's supposed to indicate a current conflict, but all we know here is that there was one recently). Reported-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> [tuomasjjrasanen: backported to 3.10 Conflicts fs/nfsd/nfs4state.c: Delegation type flags have been removed from upstream code. In 3.10-series, they still exists and therefore the commit caused few conflicts in function signatures. ] Signed-off-by: Tuomas Räsänen <tuomasjjrasanen@opinsys.fi> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06nfsd: Fix slot wake up race in the nfsv4.1 callback codeTrond Myklebust
commit c6c15e1ed303ffc47e696ea1c9a9df1761c1f603 upstream. The currect code for nfsd41_cb_get_slot() and nfsd4_cb_done() has no locking in order to guarantee atomicity, and so allows for races of the form. Task 1 Task 2 ====== ====== if (test_and_set_bit(0) != 0) { clear_bit(0) rpc_wake_up_next(queue) rpc_sleep_on(queue) return false; } This patch breaks the race condition by adding a retest of the bit after the call to rpc_sleep_on(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06rt2x00: do not align payload on modern H/WStanislaw Gruszka
commit cfd9167af14eb4ec21517a32911d460083ee3d59 upstream. RT2800 and newer hardware require padding between header and payload if header length is not multiple of 4. For historical reasons we also align payload to to 4 bytes boundary, but such alignment is not needed on modern H/W. Patch fixes skb_under_panic problems reported from time to time: https://bugzilla.kernel.org/show_bug.cgi?id=84911 https://bugzilla.kernel.org/show_bug.cgi?id=72471 http://marc.info/?l=linux-wireless&m=139108549530402&w=2 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1087591 Panic happened because we eat 4 bytes of skb headroom on each (re)transmission when sending frame without the payload and the header length not being multiple of 4 (i.e. QoS header has 26 bytes). On such case because paylad_aling=2 is bigger than header_align=0 we increase header_align by 4 bytes. To prevent that we could change the check to: if (payload_length && payload_align > header_align) header_align += 4; but not aligning payload at all is more effective and alignment is not really needed by H/W (that has been tested on OpenWrt project for few years now). Reported-and-tested-by: Antti S. Lankila <alankila@bel.fi> Debugged-by: Antti S. Lankila <alankila@bel.fi> Reported-by: Henrik Asp <solenskiner@gmail.com> Originally-From: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06can: dev: avoid calling kfree_skb() from interrupt contextThomas Körper
commit 5247a589c24022ab34e780039cc8000c48f2035e upstream. ikfree_skb() is Called in can_free_echo_skb(), which might be called from (TX Error) interrupt, which triggers the folloing warning: [ 1153.360705] ------------[ cut here ]------------ [ 1153.360715] WARNING: CPU: 0 PID: 31 at net/core/skbuff.c:563 skb_release_head_state+0xb9/0xd0() [ 1153.360772] Call Trace: [ 1153.360778] [<c167906f>] dump_stack+0x41/0x52 [ 1153.360782] [<c105bb7e>] warn_slowpath_common+0x7e/0xa0 [ 1153.360784] [<c158b909>] ? skb_release_head_state+0xb9/0xd0 [ 1153.360786] [<c158b909>] ? skb_release_head_state+0xb9/0xd0 [ 1153.360788] [<c105bc42>] warn_slowpath_null+0x22/0x30 [ 1153.360791] [<c158b909>] skb_release_head_state+0xb9/0xd0 [ 1153.360793] [<c158be90>] skb_release_all+0x10/0x30 [ 1153.360795] [<c158bf06>] kfree_skb+0x36/0x80 [ 1153.360799] [<f8486938>] ? can_free_echo_skb+0x28/0x40 [can_dev] [ 1153.360802] [<f8486938>] can_free_echo_skb+0x28/0x40 [can_dev] [ 1153.360805] [<f849a12c>] esd_pci402_interrupt+0x34c/0x57a [esd402] [ 1153.360809] [<c10a75b5>] handle_irq_event_percpu+0x35/0x180 [ 1153.360811] [<c10a7623>] ? handle_irq_event_percpu+0xa3/0x180 [ 1153.360813] [<c10a7731>] handle_irq_event+0x31/0x50 [ 1153.360816] [<c10a9c7f>] handle_fasteoi_irq+0x6f/0x120 [ 1153.360818] [<c10a9c10>] ? handle_edge_irq+0x110/0x110 [ 1153.360822] [<c1011b61>] handle_irq+0x71/0x90 [ 1153.360823] <IRQ> [<c168152c>] do_IRQ+0x3c/0xd0 [ 1153.360829] [<c1680b6c>] common_interrupt+0x2c/0x34 [ 1153.360834] [<c107d277>] ? finish_task_switch+0x47/0xf0 [ 1153.360836] [<c167c27b>] __schedule+0x35b/0x7e0 [ 1153.360839] [<c10a5334>] ? console_unlock+0x2c4/0x4d0 [ 1153.360842] [<c13df500>] ? n_tty_receive_buf_common+0x890/0x890 [ 1153.360845] [<c10707b6>] ? process_one_work+0x196/0x370 [ 1153.360847] [<c167c723>] schedule+0x23/0x60 [ 1153.360849] [<c1070de1>] worker_thread+0x161/0x460 [ 1153.360852] [<c1090fcf>] ? __wake_up_locked+0x1f/0x30 [ 1153.360854] [<c1070c80>] ? rescuer_thread+0x2f0/0x2f0 [ 1153.360856] [<c1074f01>] kthread+0xa1/0xc0 [ 1153.360859] [<c1680401>] ret_from_kernel_thread+0x21/0x30 [ 1153.360861] [<c1074e60>] ? kthread_create_on_node+0x110/0x110 [ 1153.360863] ---[ end trace 5ff83639cbb74b35 ]--- This patch replaces the kfree_skb() by dev_kfree_skb_any(). Signed-off-by: Thomas Körper <thomas.koerper@esd.eu> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06spi: dw: Fix dynamic speed change.Thor Thayer
commit 0a8727e69778683495058852f783eeda141a754e upstream. An IOCTL call that calls spi_setup() and then dw_spi_setup() will overwrite the persisted last transfer speed. On each transfer, the SPI speed is compared to the last transfer speed to determine if the clock divider registers need to be updated (did the speed change?). This bug was observed with the spidev driver using spi-config to update the max transfer speed. This fix: Don't overwrite the persisted last transaction clock speed when updating the SPI parameters in dw_spi_setup(). On the next transaction, the new speed won't match the persisted last speed and the hardware registers will be updated. On initialization, the persisted last transaction clock speed will be 0 but will be updated after the first SPI transaction. Move zeroed clock divider check into clock change test because chip->clk_div is zero on startup and would cause a divide-by-zero error. The calculation was wrong as well (can't support odd #). Reported-by: Vlastimil Setka <setka@vsis.cz> Signed-off-by: Vlastimil Setka <setka@vsis.cz> Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06iser-target: Handle DEVICE_REMOVAL event on network portal listener correctlySagi Grimberg
commit 3b726ae2de02a406cc91903f80132daee37b6f1b upstream. In this case the cm_id->context is the isert_np, and the cm_id->qp is NULL, so use that to distinct the cases. Since we don't expect any other events on this cm_id we can just return -1 for explicit termination of the cm_id by the cma layer. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06target: Don't call TFO->write_pending if data_length == 0Roland Dreier
commit 885e7b0e181c14e4d0ddd26c688bad2b84c1ada9 upstream. If an initiator sends a zero-length command (e.g. TEST UNIT READY) but sets the transfer direction in the transport layer to indicate a data-out phase, we still shouldn't try to transfer data. At best it's a NOP, and depending on the transport, we might crash on an uninitialized sg list. Reported-by: Craig Watson <craig.watson@vanguard-rugged.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06srp-target: Retry when QP creation fails with ENOMEMBart Van Assche
commit ab477c1ff5e0a744c072404bf7db51bfe1f05b6e upstream. It is not guaranteed to that srp_sq_size is supported by the HCA. So if we failed to create the QP with ENOMEM, try with a smaller srp_sq_size. Keep it up until we hit MIN_SRPT_SQ_SIZE, then fail the connection. Reported-by: Mark Lehrer <lehrer@gmail.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06Input: xpad - use proper endpoint typeGreg Kroah-Hartman
commit a1f9a4072655843fc03186acbad65990cc05dd2d upstream. The xpad wireless endpoint is not a bulk endpoint on my devices, but rather an interrupt one, so the USB core complains when it is submitted. I'm guessing that the author really did mean that this should be an interrupt urb, but as there are a zillion different xpad devices out there, let's cover out bases and handle both bulk and interrupt endpoints just as easily. Signed-off-by: "Pierre-Loup A. Griffais" <pgriffais@valvesoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06ARM: 8222/1: mvebu: enable strex backoff delayThomas Petazzoni
commit 995ab5189d1d7264e79e665dfa032a19b3ac646e upstream. Under extremely rare conditions, in an MPCore node consisting of at least 3 CPUs, two CPUs trying to perform a STREX to data on the same shared cache line can enter a livelock situation. This patch enables the HW mechanism that overcomes the bug. This fixes the incorrect setup of the STREX backoff delay bit due to a wrong description in the specification. Note that enabling the STREX backoff delay mechanism is done by leaving the bit *cleared*, while the bit was currently being set by the proc-v7.S code. [Thomas: adapt to latest mainline, slightly reword the commit log, add stable markers.] Fixes: de4901933f6d ("arm: mm: Add support for PJ4B cpu and init routines") Signed-off-by: Nadav Haklai <nadavh@marvell.com> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Acked-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06ARM: 8216/1: xscale: correct auxiliary register in suspend/resumeDmitry Eremin-Solenikov
commit ef59a20ba375aeb97b3150a118318884743452a8 upstream. According to the manuals I have, XScale auxiliary register should be reached with opc_2 = 1 instead of crn = 1. cpu_xscale_proc_init correctly uses c1, c0, 1 arguments, but cpu_xscale_do_suspend and cpu_xscale_do_resume use c1, c1, 0. Correct suspend/resume functions to also use c1, c0, 1. The issue was primarily noticed thanks to qemu reporing "unsupported instruction" on the pxa suspend path. Confirmed in PXA210/250 and PXA255 XScale Core manuals and in PXA270 and PXA320 Developers Guides. Harware tested by me on tosa (pxa255). Robert confirmed on pxa270 board. Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06ALSA: usb-audio: Add ctrl message delay quirk for Marantz/Denon devicesJurgen Kramer
commit 6e84a8d7ac3ba246ef44e313e92bc16a1da1b04a upstream. This patch adds a USB control message delay quirk for a few specific Marantz/Denon devices. Without the delay the DACs will not work properly and produces the following type of messages: Nov 15 10:09:21 orwell kernel: [ 91.342880] usb 3-13: clock source 41 is not valid, cannot use Nov 15 10:09:21 orwell kernel: [ 91.343775] usb 3-13: clock source 41 is not valid, cannot use There are likely other Marantz/Denon devices using the same USB module which exhibit the same problems. But as this cannot be verified I limited the patch to the devices I could test. The following two devices are covered by this path: - Marantz SA-14S1 - Marantz HD-DAC1 Signed-off-by: Jurgen Kramer <gtmkramer@xs4all.nl> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06can: esd_usb2: fix memory leak on disconnectAlexey Khoroshilov
commit efbd50d2f62fc1f69a3dcd153e63ba28cc8eb27f upstream. It seems struct esd_usb2 dev is not deallocated on disconnect. The patch adds the missing deallocation. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Matthias Fuchs <matthias.fuchs@esd.eu> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06USB: xhci: don't start a halted endpoint before its new dequeue is setMathias Nyman
commit c3492dbfa1050debf23a5b5cd2bc7514c5b37896 upstream. A halted endpoint ring must first be reset, then move the ring dequeue pointer past the problematic TRB. If we start the ring too early after reset, but before moving the dequeue pointer we will end up executing the same problematic TRB again. As we always issue a set transfer dequeue command after a reset endpoint command we can skip starting endpoint rings at reset endpoint command completion. Without this fix we end up trying to handle the same faulty TD for contol endpoints. causing timeout, and failing testusb ctrl_out write tests. Fixes: e9df17e (USB: xhci: Correct assumptions about number of rings per endpoint.) Tested-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06usb-quirks: Add reset-resume quirk for MS Wireless Laser Mouse 6000Hans de Goede
commit 263e80b43559a6103e178a9176938ce171b23872 upstream. This wireless mouse receiver needs a reset-resume quirk to properly come out of reset. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1165206 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06usb: serial: ftdi_sio: add PIDs for Matrix Orbital productsTroy Clark
commit 204ec6e07ea7aff863df0f7c53301f9cbbfbb9d3 upstream. Add PIDs for new Matrix Orbital GTT series products. Signed-off-by: Troy Clark <tclark@matrixorbital.ca> [johan: shorten commit message ] Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06USB: serial: cp210x: add IDs for CEL MeshConnect USB StickPreston Fick
commit ffcfe30ebd8dd703d0fc4324ffe56ea21f5479f4 upstream. Signed-off-by: Preston Fick <pffick@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06USB: keyspan: fix tty line-status reportingJohan Hovold
commit 5d1678a33c731b56e245e888fdae5e88efce0997 upstream. Fix handling of TTY error flags, which are not bitmasks and must specifically not be ORed together as this prevents the line discipline from recognising them. Also insert null characters when reporting overrun errors as these are not associated with the received character. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06USB: keyspan: fix overrun-error reportingJohan Hovold
commit 855515a6d3731242d85850a206f2ec084c917338 upstream. Fix reporting of overrun errors, which are not associated with a character. Instead insert a null character and report only once. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06USB: ssu100: fix overrun-error reportingJohan Hovold
commit 75bcbf29c284dd0154c3e895a0bd1ef0e796160e upstream. Fix reporting of overrun errors, which should only be reported once using the inserted null character. Fixes: 6b8f1ca5581b ("USB: ssu100: set tty_flags in ssu100_process_packet") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06iio: Fix IIO_EVENT_CODE_EXTRACT_DIR bit maskCristina Ciocan
commit ccf54555da9a5e91e454b909ca6a5303c7d6b910 upstream. The direction field is set on 7 bits, thus we need to AND it with 0111 111 mask in order to retrieve it, that is 0x7F, not 0xCF as it is now. Fixes: ade7ef7ba (staging:iio: Differential channel handling) Signed-off-by: Cristina Ciocan <cristina.ciocan@intel.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06powerpc/pseries: Fix endiannes issue in RTAS call from xmonLaurent Dufour
commit 3b8a3c01096925a824ed3272601082289d9c23a5 upstream. On pseries system (LPAR) xmon failed to enter when running in LE mode, system is hunging. Inititating xmon will lead to such an output on the console: SysRq : Entering xmon cpu 0x15: Vector: 0 at [c0000003f39ffb10] pc: c00000000007ed7c: sysrq_handle_xmon+0x5c/0x70 lr: c00000000007ed7c: sysrq_handle_xmon+0x5c/0x70 sp: c0000003f39ffc70 msr: 8000000000009033 current = 0xc0000003fafa7180 paca = 0xc000000007d75e80 softe: 0 irq_happened: 0x01 pid = 14617, comm = bash Bad kernel stack pointer fafb4b0 at eca7cc4 cpu 0x15: Vector: 300 (Data Access) at [c000000007f07d40] pc: 000000000eca7cc4 lr: 000000000eca7c44 sp: fafb4b0 msr: 8000000000001000 dar: 10000000 dsisr: 42000000 current = 0xc0000003fafa7180 paca = 0xc000000007d75e80 softe: 0 irq_happened: 0x01 pid = 14617, comm = bash cpu 0x15: Exception 300 (Data Access) in xmon, returning to main loop xmon: WARNING: bad recursive fault on cpu 0x15 The root cause is that xmon is calling RTAS to turn off the surveillance when entering xmon, and RTAS is requiring big endian parameters. This patch is byte swapping the RTAS arguments when running in LE mode. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06powerpc/pseries: Honor the generic "no_64bit_msi" flagBenjamin Herrenschmidt
commit 415072a041bf50dbd6d56934ffc0cbbe14c97be8 upstream. Instead of the arch specific quirk which we are deprecating Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06of/base: Fix PowerPC address parsing hackBenjamin Herrenschmidt
commit 746c9e9f92dde2789908e51a354ba90a1962a2eb upstream. We have a historical hack that treats missing ranges properties as the equivalent of an empty one. This is needed for ancient PowerMac "bad" device-trees, and shouldn't be enabled for any other PowerPC platform, otherwise we get some nasty layout of devices in sysfs or even duplication when a set of otherwise identically named devices is created multiple times under a different parent node with no ranges property. This fix is needed for the PowerNV i2c busses to be exposed properly and will fix a number of other embedded cases. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Grant Likely <grant.likely@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06ASoC: wm_adsp: Avoid attempt to free buffers that might still be in useCharles Keepax
commit 9da7a5a9fdeeb76b2243f6b473363a7e6147ab6f upstream. We should not free any buffers associated with writing out coefficients to the DSP until all the async writes have completed. This patch updates the out of memory path when allocating a new buffer to include a call to regmap_async_complete. Reported-by: JS Park <aitdark.park@samsung.com> Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06ASoC: sgtl5000: Fix SMALL_POP bit definitionFabio Estevam
commit c251ea7bd7a04f1f2575467e0de76e803cf59149 upstream. On a mx28evk with a sgtl5000 codec we notice a loud 'click' sound to happen 5 seconds after the end of a playback. The SMALL_POP bit should fix this, but its definition is incorrect: according to the sgtl5000 manual it is bit 0 of CHIP_REF_CTRL register, not bit 1. Fix the definition accordingly and enable the bit as intended per the code comment. After applying this change, no loud 'click' sound is heard after playback Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06PCI/MSI: Add device flag indicating that 64-bit MSIs don't workBenjamin Herrenschmidt
commit f144d1496b47e7450f41b767d0d91c724c2198bc upstream. This can be set by quirks/drivers to be used by the architecture code that assigns the MSI addresses. We additionally add verification in the core MSI code that the values assigned by the architecture do satisfy the limitation in order to fail gracefully if they don't (ie. the arch hasn't been updated to deal with that quirk yet). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06ipx: fix locking regression in ipx_sendmsg and ipx_recvmsgJiri Bohac
[ Upstream commit 01462405f0c093b2f8dfddafcadcda6c9e4c5cdf ] This fixes an old regression introduced by commit b0d0d915 (ipx: remove the BKL). When a recvmsg syscall blocks waiting for new data, no data can be sent on the same socket with sendmsg because ipx_recvmsg() sleeps with the socket locked. This breaks mars-nwe (NetWare emulator): - the ncpserv process reads the request using recvmsg - ncpserv forks and spawns nwconn - ncpserv calls a (blocking) recvmsg and waits for new requests - nwconn deadlocks in sendmsg on the same socket Commit b0d0d915 has simply replaced BKL locking with lock_sock/release_sock. Unlike now, BKL got unlocked while sleeping, so a blocking recvmsg did not block a concurrent sendmsg. Only keep the socket locked while actually working with the socket data and release it prior to calling skb_recv_datagram(). Signed-off-by: Jiri Bohac <jbohac@suse.cz> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06pptp: fix stack info leak in pptp_getname()Mathias Krause
[ Upstream commit a5f6fc28d6e6cc379c6839f21820e62262419584 ] pptp_getname() only partially initializes the stack variable sa, particularly only fills the pptp part of the sa_addr union. The code thereby discloses 16 bytes of kernel stack memory via getsockname(). Fix this by memset(0)'ing the union before. Cc: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06qmi_wwan: Add support for HP lt4112 LTE/HSPA+ Gobi 4G ModemMartin Hauke
[ Upstream commit bb2bdeb83fb125c95e47fc7eca2a3e8f868e2a74 ] Added the USB VID/PID for the HP lt4112 LTE/HSPA+ Gobi 4G Modem (Huawei me906e) Signed-off-by: Martin Hauke <mardnh@gmx.de> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06ieee802154: fix error handling in ieee802154fake_probe()Alexey Khoroshilov
[ Upstream commit 8c2dd54485ccee7fc4086611e188478584758c8d ] In case of any failure ieee802154fake_probe() just calls unregister_netdev(). But it does not look safe to unregister netdevice before it was registered. The patch implements straightforward resource deallocation in case of failure in ieee802154fake_probe(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06ipv4: Fix incorrect error code when adding an unreachable routePanu Matilainen
[ Upstream commit 49dd18ba4615eaa72f15c9087dea1c2ab4744cf5 ] Trying to add an unreachable route incorrectly returns -ESRCH if if custom FIB rules are present: [root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4 RTNETLINK answers: Network is unreachable [root@localhost ~]# ip rule add to 55.66.77.88 table 200 [root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4 RTNETLINK answers: No such process [root@localhost ~]# Commit 83886b6b636173b206f475929e58fac75c6f2446 ("[NET]: Change "not found" return value for rule lookup") changed fib_rules_lookup() to use -ESRCH as a "not found" code internally, but for user space it should be translated into -ENETUNREACH. Handle the translation centrally in ipv4-specific fib_lookup(), leaving the DECnet case alone. On a related note, commit b7a71b51ee37d919e4098cd961d59a883fd272d8 ("ipv4: removed redundant conditional") removed a similar translation from ip_route_input_slow() prematurely AIUI. Fixes: b7a71b51ee37 ("ipv4: removed redundant conditional") Signed-off-by: Panu Matilainen <pmatilai@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06inetdevice: fixed signed integer overflowVincent BENAYOUN
[ Upstream commit 84bc88688e3f6ef843aa8803dbcd90168bb89faf ] There could be a signed overflow in the following code. The expression, (32-logmask) is comprised between 0 and 31 included. It may be equal to 31. In such a case the left shift will produce a signed integer overflow. According to the C99 Standard, this is an undefined behavior. A simple fix is to replace the signed int 1 with the unsigned int 1U. Signed-off-by: Vincent BENAYOUN <vincent.benayoun@trust-in-soft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06sparc64: Fix constraints on swab helpers.David S. Miller
[ Upstream commit 5a2b59d3993e8ca4f7788a48a23e5cb303f26954 ] We are reading the memory location, so we have to have a memory constraint in there purely for the sake of showing the data flow to the compiler. Reported-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUMEAndy Lutomirski
commit 82975bc6a6df743b9a01810fb32cb65d0ec5d60b upstream. x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but not on non-paranoid returns. I suspect that this is a mistake and that the code only works because int3 is paranoid. Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround for the x86 bug. With that bug fixed, we can remove _TIF_NOTIFY_RESUME from the uprobes code. Reported-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06x86, mm: Set NX across entire PMD at bootKees Cook
commit 45e2a9d4701d8c624d4a4bcdd1084eae31e92f58 upstream. When setting up permissions on kernel memory at boot, the end of the PMD that was split from bss remained executable. It should be NX like the rest. This performs a PMD alignment instead of a PAGE alignment to get the correct span of memory. Before: ---[ High Kernel Mapping ]--- ... 0xffffffff8202d000-0xffffffff82200000 1868K RW GLB NX pte 0xffffffff82200000-0xffffffff82c00000 10M RW PSE GLB NX pmd 0xffffffff82c00000-0xffffffff82df5000 2004K RW GLB NX pte 0xffffffff82df5000-0xffffffff82e00000 44K RW GLB x pte 0xffffffff82e00000-0xffffffffc0000000 978M pmd After: ---[ High Kernel Mapping ]--- ... 0xffffffff8202d000-0xffffffff82200000 1868K RW GLB NX pte 0xffffffff82200000-0xffffffff82e00000 12M RW PSE GLB NX pmd 0xffffffff82e00000-0xffffffffc0000000 978M pmd [ tglx: Changed it to roundup(_brk_end, PMD_SIZE) and added a comment. We really should unmap the reminder along with the holes caused by init,initdata etc. but thats a different issue ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20141114194737.GA3091@www.outflux.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06x86: Require exact match for 'noxsave' command line optionDave Hansen
commit 2cd3949f702692cf4c5d05b463f19cd706a92dd3 upstream. We have some very similarly named command-line options: arch/x86/kernel/cpu/common.c:__setup("noxsave", x86_xsave_setup); arch/x86/kernel/cpu/common.c:__setup("noxsaveopt", x86_xsaveopt_setup); arch/x86/kernel/cpu/common.c:__setup("noxsaves", x86_xsaves_setup); __setup() is designed to match options that take arguments, like "foo=bar" where you would have: __setup("foo", x86_foo_func...); The problem is that "noxsave" actually _matches_ "noxsaves" in the same way that "foo" matches "foo=bar". If you boot an old kernel that does not know about "noxsaves" with "noxsaves" on the command line, it will interpret the argument as "noxsave", which is not what you want at all. This makes the "noxsave" handler only return success when it finds an *exact* match. [ tglx: We really need to make __setup() more robust. ] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20141111220133.FE053984@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06x86_64, traps: Rework bad_iretAndy Lutomirski
commit b645af2d5905c4e32399005b867987919cbfc3ae upstream. It's possible for iretq to userspace to fail. This can happen because of a bad CS, SS, or RIP. Historically, we've handled it by fixing up an exception from iretq to land at bad_iret, which pretends that the failed iret frame was really the hardware part of #GP(0) from userspace. To make this work, there's an extra fixup to fudge the gs base into a usable state. This is suboptimal because it loses the original exception. It's also buggy because there's no guarantee that we were on the kernel stack to begin with. For example, if the failing iret happened on return from an NMI, then we'll end up executing general_protection on the NMI stack. This is bad for several reasons, the most immediate of which is that general_protection, as a non-paranoid idtentry, will try to deliver signals and/or schedule from the wrong stack. This patch throws out bad_iret entirely. As a replacement, it augments the existing swapgs fudge into a full-blown iret fixup, mostly written in C. It's should be clearer and more correct. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06x86_64, traps: Stop using IST for #SSAndy Lutomirski
commit 6f442be2fb22be02cafa606f1769fa1e6f894441 upstream. On a 32-bit kernel, this has no effect, since there are no IST stacks. On a 64-bit kernel, #SS can only happen in user code, on a failed iret to user space, a canonical violation on access via RSP or RBP, or a genuine stack segment violation in 32-bit kernel code. The first two cases don't need IST, and the latter two cases are unlikely fatal bugs, and promoting them to double faults would be fine. This fixes a bug in which the espfix64 code mishandles a stack segment violation. This saves 4k of memory per CPU and a tiny bit of code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in CAndy Lutomirski
commit af726f21ed8af2cdaa4e93098dc211521218ae65 upstream. There's nothing special enough about the espfix64 double fault fixup to justify writing it in assembly. Move it to C. This also fixes a bug: if the double fault came from an IST stack, the old asm code would return to a partially uninitialized stack frame. Fixes: 3891a04aafd668686239349ea58f3314ea2af86b Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06MIPS: Loongson: Make platform serial setup always built-in.Aaro Koskinen
commit 26927f76499849e095714452b8a4e09350f6a3b9 upstream. If SERIAL_8250 is compiled as a module, the platform specific setup for Loongson will be a module too, and it will not work very well. At least on Loongson 3 it will trigger a build failure, since loongson_sysconf is not exported to modules. Fix by making the platform specific serial code always built-in. Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reported-by: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Huacai Chen <chenhc@lemote.com> Cc: Markos Chandras <Markos.Chandras@imgtec.com> Patchwork: https://patchwork.linux-mips.org/patch/8533/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06MIPS: oprofile: Fix backtrace on 64-bit kernelAaro Koskinen
commit bbaf113a481b6ce32444c125807ad3618643ce57 upstream. Fix incorrect cast that always results in wrong address for the new frame on 64-bit kernels. Signed-off-by: Aaro Koskinen <aaro.koskinen@nsn.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/8110/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21Linux 3.10.61v3.10.61Greg Kroah-Hartman
2014-11-21mm: memcg: handle non-error OOM situations more gracefullyJohannes Weiner
commit 4942642080ea82d99ab5b653abb9a12b7ba31f4a upstream. Commit 3812c8c8f395 ("mm: memcg: do not trap chargers with full callstack on OOM") assumed that only a few places that can trigger a memcg OOM situation do not return VM_FAULT_OOM, like optional page cache readahead. But there are many more and it's impractical to annotate them all. First of all, we don't want to invoke the OOM killer when the failed allocation is gracefully handled, so defer the actual kill to the end of the fault handling as well. This simplifies the code quite a bit for added bonus. Second, since a failed allocation might not be the abrupt end of the fault, the memcg OOM handler needs to be re-entrant until the fault finishes for subsequent allocation attempts. If an allocation is attempted after the task already OOMed, allow it to bypass the limit so that it can quickly finish the fault and invoke the OOM killer. Reported-by: azurIt <azurit@pobox.sk> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21mm: memcg: do not trap chargers with full callstack on OOMJohannes Weiner
commit 3812c8c8f3953921ef18544110dafc3505c1ac62 upstream. The memcg OOM handling is incredibly fragile and can deadlock. When a task fails to charge memory, it invokes the OOM killer and loops right there in the charge code until it succeeds. Comparably, any other task that enters the charge path at this point will go to a waitqueue right then and there and sleep until the OOM situation is resolved. The problem is that these tasks may hold filesystem locks and the mmap_sem; locks that the selected OOM victim may need to exit. For example, in one reported case, the task invoking the OOM killer was about to charge a page cache page during a write(), which holds the i_mutex. The OOM killer selected a task that was just entering truncate() and trying to acquire the i_mutex: OOM invoking task: mem_cgroup_handle_oom+0x241/0x3b0 mem_cgroup_cache_charge+0xbe/0xe0 add_to_page_cache_locked+0x4c/0x140 add_to_page_cache_lru+0x22/0x50 grab_cache_page_write_begin+0x8b/0xe0 ext3_write_begin+0x88/0x270 generic_file_buffered_write+0x116/0x290 __generic_file_aio_write+0x27c/0x480 generic_file_aio_write+0x76/0xf0 # takes ->i_mutex do_sync_write+0xea/0x130 vfs_write+0xf3/0x1f0 sys_write+0x51/0x90 system_call_fastpath+0x18/0x1d OOM kill victim: do_truncate+0x58/0xa0 # takes i_mutex do_last+0x250/0xa30 path_openat+0xd7/0x440 do_filp_open+0x49/0xa0 do_sys_open+0x106/0x240 sys_open+0x20/0x30 system_call_fastpath+0x18/0x1d The OOM handling task will retry the charge indefinitely while the OOM killed task is not releasing any resources. A similar scenario can happen when the kernel OOM killer for a memcg is disabled and a userspace task is in charge of resolving OOM situations. In this case, ALL tasks that enter the OOM path will be made to sleep on the OOM waitqueue and wait for userspace to free resources or increase the group's limit. But a userspace OOM handler is prone to deadlock itself on the locks held by the waiting tasks. For example one of the sleeping tasks may be stuck in a brk() call with the mmap_sem held for writing but the userspace handler, in order to pick an optimal victim, may need to read files from /proc/<pid>, which tries to acquire the same mmap_sem for reading and deadlocks. This patch changes the way tasks behave after detecting a memcg OOM and makes sure nobody loops or sleeps with locks held: 1. When OOMing in a user fault, invoke the OOM killer and restart the fault instead of looping on the charge attempt. This way, the OOM victim can not get stuck on locks the looping task may hold. 2. When OOMing in a user fault but somebody else is handling it (either the kernel OOM killer or a userspace handler), don't go to sleep in the charge context. Instead, remember the OOMing memcg in the task struct and then fully unwind the page fault stack with -ENOMEM. pagefault_out_of_memory() will then call back into the memcg code to check if the -ENOMEM came from the memcg, and then either put the task to sleep on the memcg's OOM waitqueue or just restart the fault. The OOM victim can no longer get stuck on any lock a sleeping task may hold. Debugged by Michal Hocko. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: azurIt <azurit@pobox.sk> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21mm: memcg: rework and document OOM waiting and wakeupJohannes Weiner
commit fb2a6fc56be66c169f8b80e07ed999ba453a2db2 upstream. The memcg OOM handler open-codes a sleeping lock for OOM serialization (trylock, wait, repeat) because the required locking is so specific to memcg hierarchies. However, it would be nice if this construct would be clearly recognizable and not be as obfuscated as it is right now. Clean up as follows: 1. Remove the return value of mem_cgroup_oom_unlock() 2. Rename mem_cgroup_oom_lock() to mem_cgroup_oom_trylock(). 3. Pull the prepare_to_wait() out of the memcg_oom_lock scope. This makes it more obvious that the task has to be on the waitqueue before attempting to OOM-trylock the hierarchy, to not miss any wakeups before going to sleep. It just didn't matter until now because it was all lumped together into the global memcg_oom_lock spinlock section. 4. Pull the mem_cgroup_oom_notify() out of the memcg_oom_lock scope. It is proctected by the hierarchical OOM-lock. 5. The memcg_oom_lock spinlock is only required to propagate the OOM lock in any given hierarchy atomically. Restrict its scope to mem_cgroup_oom_(trylock|unlock). 6. Do not wake up the waitqueue unconditionally at the end of the function. Only the lockholder has to wake up the next in line after releasing the lock. Note that the lockholder kicks off the OOM-killer, which in turn leads to wakeups from the uncharges of the exiting task. But a contender is not guaranteed to see them if it enters the OOM path after the OOM kills but before the lockholder releases the lock. Thus there has to be an explicit wakeup after releasing the lock. 7. Put the OOM task on the waitqueue before marking the hierarchy as under OOM as that is the point where we start to receive wakeups. No point in listening before being on the waitqueue. 8. Likewise, unmark the hierarchy before finishing the sleep, for symmetry. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: azurIt <azurit@pobox.sk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>