aboutsummaryrefslogtreecommitdiff
path: root/drivers/ata/libata-eh.c
AgeCommit message (Collapse)Author
2013-11-13libata: make ata_eh_qc_retry() bump scmd->allowed on bogus failuresGwendal Grignou
commit f13e220161e738c2710b9904dcb3cf8bb0bcce61 upstream. libata EH decrements scmd->retries when the command failed for reasons unrelated to the command itself so that, for example, commands aborted due to suspend / resume cycle don't get penalized; however, decrementing scmd->retries isn't enough for ATA passthrough commands. Without this fix, ATA passthrough commands are not resend to the drive, and no error is signalled to the caller because: - allowed retry count is 1 - ata_eh_qc_complete fill the sense data, so result is valid - sense data is filled with untouched ATA registers. Signed-off-by: Gwendal Grignou <gwendal@google.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-14libata: update "Maintained by:" tagsTejun Heo
Jeff moved on to a greener pasture. s/Maintained by: Jeff Garzik/Maintained by: Tejun Heo/g Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jeff Garzik <jgarzik@pobox.com>
2013-01-25[libata] pm: differentiate system and runtime pm for ata portAaron Lu
We need to do different things for system PM and runtime PM, e.g. we do not need to enable runtime wake for ZPODD when we are doing system suspend, etc. Currently, we use PMSG_SUSPEND for both system suspend and runtime suspend and PMSG_ON for both system resume and runtime resume. Change this by using PMSG_AUTO_SUSPEND for runtime suspend and PMSG_AUTO_RESUME for runtime resume. And since PMSG_ON means no transition, it is changed to PMSG_RESUME for ata port's system resume. The ata_acpi_set_state is modified accordingly, and the sata case and pata case is seperated for easy reading. Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2013-01-21libata: handle power transition of ODDAaron Lu
When ata port is runtime suspended, it will check if the ODD attched to it is a zero power(ZP) capable ODD and if the ZP capable ODD is in zero power ready state. And if this is not the case, the highest acpi state will be limited to ACPI_STATE_D3_HOT to avoid powering off the ODD. And if the ODD can be powered off, runtime wake capability needs to be enabled and powered_off flag will be set to let resume code knows that the ODD was in powered off state. And on resume, before it is powered on, if it was powered off during suspend, runtime wake capability needs to be disabled. After it is recovered, the ODD is considered functional, post power on processing like eject tray if the ODD is drawer type is done, and several ZPODD related fields will also be reset. Signed-off-by: Aaron Lu <aaron.lu@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2013-01-21libata: check zero power ready status for ZPODDAaron Lu
Per the Mount Fuji spec, the ODD is considered zero power ready when: - For slot type ODD, no media inside; - For tray type ODD, no media inside and tray closed. The information can be retrieved by either the returned information of command GET_EVENT_STATUS_NOTIFICATION(the command is used to poll for media event) or sense code. The information provided by the media status byte is not accurate, it is possible that after a new disc is just inserted, the status byte still returns media not present. So this information can not be used as the deciding factor, we use sense code to decide if zpready status is true. When we first sensed the ODD in the zero power ready state, the zp_sampled will be set and timestamp will be recoreded. And after ODD stayed in this state for some pre-defined period, the ODD is considered as power off ready and the zp_ready flag will be set. The zp_ready flag serves as the deciding factor other code will use to see if power off is OK for the ODD. The Mount Fuji spec suggests a delay should be used here, to avoid the case user ejects the ODD and then instantly inserts a new one again, so that we can avoid a power transition. And some ODDs may be slow to place its head to the home position after disc is ejected, so a delay here is generally a good idea. And the delay time can be changed via the module param zpodd_poweroff_delay. The zero power ready status check is performed in the ata port's runtime suspend code path, when port is not frozen yet, as we need to issue some IOs to the ODD. Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2013-01-14[libata] ahci: Fix lack of command retry after a success error handler.Bian Yu
It should be a mistake introduced by commit 8d899e70c1b3afff. qc->flags can't be set AC_ERR_* Signed-off-by: Bian Yu <bianyu@kedacom.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2012-12-03libata: set dma_mode to 0xff in resetAaron Lu
ata_device->dma_mode's initial value is zero, which is not a valid dma mode, but ata_dma_enabled will return true for this value. This patch sets dma_mode to 0xff in reset function, so that ata_dma_enabled will not return true for this case, or it will cause problem for pata_acpi. The corrsponding bugzilla page is at: https://bugzilla.kernel.org/show_bug.cgi?id=49151 Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Aaron Lu <aaron.lu@intel.com> Tested-by: Szymon Janc <szymon@janc.net.pl> Tested-by: Dutra Julio <dutra.julio@gmail.com> Acked-by: Alan Cox <alan@linux.intel.com> Cc: <stable@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2012-10-02Merge tag 'scsi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull first round of SCSI updates from James Bottomley: "This is a large set of updates, mostly for drivers (qla2xxx [including support for new 83xx based card], qla4xxx, mpt2sas, bfa, zfcp, hpsa, be2iscsi, isci, lpfc, ipr, ibmvfc, ibmvscsi, megaraid_sas). There's also a rework for tape adding virtually unlimited numbers of tape drives plus a set of dif fixes for sd and a fix for a live lock on hot remove of SCSI devices. This round includes a signed tag pull of isci-for-3.6 Signed-off-by: James Bottomley <JBottomley@Parallels.com>" Fix up trivial conflict in drivers/scsi/qla2xxx/qla_nx.c due to new PCI helper function use in a function that was removed by this pull. * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (198 commits) [SCSI] st: remove st_mutex [SCSI] sd: Ensure we correctly disable devices with unknown protection type [SCSI] hpsa: gen8plus Smart Array IDs [SCSI] qla4xxx: Update driver version to 5.03.00-k1 [SCSI] qla4xxx: Disable generating pause frames for ISP83XX [SCSI] qla4xxx: Fix double clearing of risc_intr for ISP83XX [SCSI] qla4xxx: IDC implementation for Loopback [SCSI] qla4xxx: update copyrights in LICENSE.qla4xxx [SCSI] qla4xxx: Fix panic while rmmod [SCSI] qla4xxx: Fail probe_adapter if IRQ allocation fails [SCSI] qla4xxx: Prevent MSI/MSI-X falling back to INTx for ISP82XX [SCSI] qla4xxx: Update idc reg in case of PCI AER [SCSI] qla4xxx: Fix double IDC locking in qla4_8xxx_error_recovery [SCSI] qla4xxx: Clear interrupt while unloading driver for ISP83XX [SCSI] qla4xxx: Print correct IDC version [SCSI] qla4xxx: Added new mbox cmd to pass driver version to FW [SCSI] scsi_dh_alua: Enable STPG for unavailable ports [SCSI] scsi_remove_target: fix softlockup regression on hot remove [SCSI] ibmvscsi: Fix host config length field overflow [SCSI] ibmvscsi: Remove backend abstraction ...
2012-09-13ahci: implement aggressive SATA device sleep supportShane Huang
Device Sleep is a feature as described in AHCI 1.3.1 Technical Proposal. This feature enables an HBA and SATA storage device to enter the DevSleep interface state, enabling lower power SATA-based systems. Aggressive Device Sleep enables the HBA to assert the DEVSLP signal as soon as there are no commands outstanding to the device and the port specific Device Sleep idle timer has expired. This enables autonomous entry into the DevSleep interface state without waiting for software in power sensitive systems. This patch enables Aggressive Device Sleep only if both host controller and device support it. Tested on AMD reference board together with Device Sleep supported device sample. Signed-off-by: Shane Huang <shane.huang@amd.com> Reviewed-by: Aaron Lu <aaron.lwe@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2012-08-24[SCSI] libata: reset onceDan Williams
Hotplug testing with libsas currently encounters a 55 second wait for link recovery to give up. In the case where the user trusts the response time of their devices permit the recovery attempts to be limited to one. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-25Merge branch 'master' [vanilla Linus master] into libata-dev.git/upstreamJeff Garzik
Two bits were appended to the end of the bitfield list in struct scsi_device. Resolve that conflict by including both bits. Conflicts: include/scsi/scsi_device.h
2012-07-25libata-eh.c: local functions should not be exposed globallyH Hartley Sweeten
The function ata_ering_clear_cb is only referenced in this file and should be marked static to prevent it from being exposed globally. This quiets the sparse warning: warning: symbol 'ata_ering_clear_cb' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2012-07-20[SCSI] libata, libsas: introduce sched_eh and end_eh port opsDan Williams
When managing shost->host_eh_scheduled libata assumes that there is a 1:1 shost-to-ata_port relationship. libsas creates a 1:N relationship so it needs to manage host_eh_scheduled cumulatively at the host level. The sched_eh and end_eh port port ops allow libsas to track when domain devices enter/leave the "eh-pending" state under ha->lock (previously named ha->state_lock, but it is no longer just a lock for ha->state changes). Since host_eh_scheduled indicates eh without backing commands pinning the device it can be deallocated at any time. Move the taking of the domain_device reference under the port_lock to guarantee that the ata_port stays around for the duration of eh. Reviewed-by: Jacek Danecki <jacek.danecki@intel.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-05-07libata-eh don't waste time retrying media errors (v3)Mark Lord
ATA and SATA drives have had built-in retries for media errors for as long as they've been commonplace in computers (early 1990s). When libata stumbles across a bad sector, it can waste minutes sitting there doing retry after retry before finally giving up and letting the higher layers deal with it. This patch removes retries for media errors only. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2012-05-03libata: skip old error history when counting probe trialsLin Ming
Commit d902747("[libata] Add ATA transport class") introduced ATA_EFLAG_OLD_ER to mark entries in the error ring as cleared. But ata_count_probe_trials_cb() didn't check this flag and it still counts the old error history. So wrong probe trials count is returned and it causes problem, for example, SATA link speed is slowed down from 3.0Gbps to 1.5Gbps. Fix it by checking ATA_EFLAG_OLD_ER in ata_count_probe_trials_cb(). Cc: stable <stable@vger.kernel.org> # 2.6.37+ Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2012-02-19[SCSI] libsas: execute transport link resets with libata-eh via host workqueueDan Williams
Link resets leave ata affiliations intact, so arrange for libsas to make an effort to avoid dropping the device due to a slow-to-recover link. Towards this end carry out reset in the host workqueue so that it can check for ata devices and kick the reset request to libata. Hard resets, in contrast, bypass libata since they are meant for associating an ata device with another initiator in the domain (tears down affiliations). Need to add a new transport_sas_phy_reset() since the current sas_phy_reset() is a utility function to libsas lldds. They are not prepared for it to loop back into eh. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-11-09[libata] Issue SRST to Sil3726 PMPGwendal Grignou
Reenable sending SRST to devices connected behind a Sil3726 PMP. This allow staggered spinups and handles drives that spins up slowly. While the drives spin up, the PMP will not accept SRST. Most controller reissues the reset until the drive is ready, while some [Sil3124] returns an error. In ata_eh_error, wait 10s before reset the ATA port and try again. Signed-off-by: Gwendal Grignou <gwendal@google.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-10-31ide/ata: Add export.h for EXPORT_SYMBOL/THIS_MODULE where neededPaul Gortmaker
They were getting this implicitly by an include of module.h from device.h -- but we are going to clean that up and break that include chain, so include export.h explicitly now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-08libata-eh: ata_eh_followup_srst_needed() does not need 'classes' parameterSergei Shtylyov
... since it does not use it. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-07-23libata: leave port thawed after reset failureTejun Heo
libata EH intentionally left a port frozen if it failed ata_eh_reset(). The intention was avoiding continuous loop of resets when the controller or attached device is flaky and reporting spurious hotplug events. Once port enters this state, it can be recovered with manual rescan, which seemed reasonable. However, outside of my convoluted test setup, there have been very few reports justifying this choice while there have been more cases where the automatic freezing of the port after hotplug attempt of a faulty device caused confusion and led to unnecessary resets. This patch changes the behavior so that the port is thawed after reset failure. This change doesn't necessarily solve but makes it easier and more intuitive to work around hotplug related problems (ie. re-pluggin or power cycling the device) as reported in the followings. https://bugzilla.kernel.org/show_bug.cgi?id=34712 http://thread.gmane.org/gmane.linux.kernel/1123265/focus=49548 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Reartes Guillermo <rtguille@gmail.com> Reported-by: Bruce Stenning <b.stenning@indigovision.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-07-23ata: Convert ata_<foo>_printk(KERN_<LEVEL> to ata_<foo>_<level>Joe Perches
Saves text by removing nearly duplicated text format strings by creating ata_<foo>_printk functions and printf extension %pV. ata defconfig size shrinks ~5% (~8KB), allyesconfig ~2.5% (~13KB) Format string duplication comes from: #define ata_link_printk(link, lv, fmt, args...) do { \ if (sata_pmp_attached((link)->ap) || (link)->ap->slave_link) \ printk("%sata%u.%02u: "fmt, lv, (link)->ap->print_id, \ (link)->pmp , ##args); \ else \ printk("%sata%u: "fmt, lv, (link)->ap->print_id , ##args); \ } while(0) Coalesce long formats. $ size drivers/ata/built-in.* text data bss dec hex filename 544969 73893 116584 735446 b38d6 drivers/ata/built-in.allyesconfig.ata.o 558429 73893 117864 750186 b726a drivers/ata/built-in.allyesconfig.dev_level.o 141328 14689 4220 160237 271ed drivers/ata/built-in.defconfig.ata.o 149567 14689 4220 168476 2921c drivers/ata/built-in.defconfig.dev_level.o Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-06-07libata: fix unexpectedly frozen port after ata_eh_reset()Tejun Heo
To work around controllers which can't properly plug events while reset, ata_eh_reset() clears error states and ATA_PFLAG_EH_PENDING after reset but before RESET is marked done. As reset is the final recovery action and full verification of devices including onlineness and classfication match is done afterwards, this shouldn't lead to lost devices or missed hotplug events. Unfortunately, it forgot to thaw the port when clearing EH_PENDING, so if the condition happens after resetting an empty port, the port could be left frozen and EH will end without thawing it, making the port unresponsive to further hotplug events. Thaw if the port is frozen after clearing EH_PENDING. This problem is reported by Bruce Stenning in the following thread. http://thread.gmane.org/gmane.linux.kernel/1123265 stable: I think we should weather this patch a bit longer in -rcX before sending it to -stable. Please wait at least a month after this patch makes upstream. Thanks. -v2: Fixed spelling in the comment per Dave Howorth. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Bruce Stenning <b.stenning@indigovision.com> Cc: stable@kernel.org Cc: Dave Howorth <dhoworth@mrc-lmb.cam.ac.uk> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-05-19libata: Power off empty portsKristen Carlson Accardi
Give users the option of completely powering off unoccupied SATA ports using the existing min_power link_power_management_policy option. When the use selects this option on an empty port, we will power the port off by setting DET to off. For occupied ports, behavior is unchanged. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-05-14libata: fix oops when LPM is used with PMPTejun Heo
ae01b2493c (libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65) added ATA_FLAG_NO_DIPM and made ata_eh_set_lpm() check the flag. However, @ap is NULL if @link points to a PMP link and thus the unconditional @ap->flags dereference leads to the following oops. BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510 ... Pid: 295, comm: scsi_eh_4 Tainted: P 2.6.38.5-core2 #1 System76, Inc. Serval Professional/Serval Professional RIP: 0010:[<ffffffff813f98e1>] [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510 RSP: 0018:ffff880132defbf0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880132f40000 RCX: 0000000000000000 RDX: ffff88013377c000 RSI: ffff880132f40000 RDI: 0000000000000000 RBP: ffff880132defce0 R08: ffff88013377dc58 R09: ffff880132defd98 R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000000 R13: 0000000000000000 R14: ffff88013377c000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8800bf700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000018 CR3: 0000000001a03000 CR4: 00000000000406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process scsi_eh_4 (pid: 295, threadinfo ffff880132dee000, task ffff880133b416c0) Stack: 0000000000000000 ffff880132defcc0 0000000000000000 ffff880132f42738 ffffffff813ee8f0 ffffffff813eefe0 ffff880132defd98 ffff88013377f190 ffffffffa00b3e30 ffffffff813ef030 0000000032defc60 ffff880100000000 Call Trace: [<ffffffff81400867>] sata_pmp_error_handler+0x607/0xc30 [<ffffffffa00b273f>] ahci_error_handler+0x1f/0x70 [libahci] [<ffffffff813faade>] ata_scsi_error+0x5be/0x900 [<ffffffff813cf724>] scsi_error_handler+0x124/0x650 [<ffffffff810834b6>] kthread+0x96/0xa0 [<ffffffff8100cd64>] kernel_thread_helper+0x4/0x10 Code: 8b 95 70 ff ff ff b8 00 00 00 00 48 3b 9a 10 2e 00 00 48 0f 44 c2 48 89 85 70 ff ff ff 48 8b 8d 70 ff ff ff f6 83 69 02 00 00 01 <48> 8b 41 18 0f 85 48 01 00 00 48 85 c9 74 12 48 8b 51 08 48 83 RIP [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510 RSP <ffff880132defbf0> CR2: 0000000000000018 Fix it by testing @link->ap->flags instead. stable: ATA_FLAG_NO_DIPM was added during 2.6.39 cycle but was backported to 2.6.37 and 38. This is a fix for that and thus also applicable to 2.6.37 and 38. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: "Nathan A. Mourey II" <nmoureyii@ne.rr.com> LKML-Reference: <1304555277.2059.2.camel@localhost.localdomain> Cc: Connor H <cmdkhh@gmail.com> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65Tejun Heo
NVIDIA mcp65 familiy of controllers cause command timeouts when DIPM is used. Implement ATA_FLAG_NO_DIPM and apply it. This problem was reported by Stefan Bader in the following thread. http://thread.gmane.org/gmane.linux.ide/48841 stable: applicable to 2.6.37 and 38. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stefan Bader <stefan.bader@canonical.com> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-03-31Fix common misspellingsLucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-02libata: separate error handler into usable componentsJames Bottomley
Right at the moment, the libata error handler is incredibly monolithic. This makes it impossible to use from composite drivers like libsas and ipr which have to handle error themselves in the first instance. The essence of the change is to split the monolithic error handler into two components: one which handles a queue of ata commands for processing and the other which handles the back end of readying a port. This allows the upper error handler fine grained control in calling libsas functions (and making sure they only get called for ATA commands whose lower errors have been fixed up). Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02libata: fix eh lockingJames Bottomley
The SCSI host eh_cmd_q should be protected by the host lock (not the port lock). This probably doesn't matter that much at the moment, since we try to serialise the add and eh pieces, but it might matter in future for more convenient error handling. Plus this switches libata to the standard eh pattern where you lock, remove from the cmd queue to a local list and unlock and then operate on the local list. Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02libata: fix hotplug for drivers which don't implement LPMTejun Heo
ata_eh_analyze_serror() suppresses hotplug notifications if LPM is being used because LPM generates spurious hotplug events. It compared whether link->lpm_policy was different from ATA_LPM_MAX_POWER to determine whether LPM is enabled; however, this is incorrect as for drivers which don't implement LPM, lpm_policy is always ATA_LPM_UNKNOWN. This disabled hotplug detection for all drivers which don't implement LPM. Fix it by comparing whether lpm_policy is greater than ATA_LPM_MAX_POWER. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-12-24libata: issue DIPM enable commands with LPM state updatedTejun Heo
Low level drivers may behave differently depending on the current link->lpm_policy. During ata_eh_set_lpm(), DIPM enable commands are issued after the successful completion of ap->ops->set_lpm(), which means that the controller is already in the target state. This causes DIPM enable commands to be processed with mismatching controller power state and link->lpm_policy value. In ahci, link->lpm_policy is used to ignore certain PHY events if LPM is enabled; however, as DIPM commands are issued with stale link->lpm_policy, they sometimes end up triggering these conditions and get aborted leading to LPM configuration failure. Fix it by updating link->lpm_policy before issuing DIPM enable commands. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Kyle McMartin <kyle@mcmartin.ca> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21libata: implement cross-port EH exclusionTejun Heo
In libata, the non-EH code paths should always take and release ap->lock explicitly when accessing hardware or shared data structures. However, once EH is active, it's assumed that the port is owned by EH and EH methods don't explicitly take ap->lock unless race from irq handler or other code paths are expected. However, libata EH didn't guarantee exclusion among EHs for ports of the same host. IOW, multiple EHs may execute in parallel on multiple ports of the same controller. In many cases, especially in SATA, the ports are completely independent of each other and this doesn't cause problems; however, there are cases where different ports share the same resource, which lead to obscure timing related bugs such as the one fixed by commit 213373cf (ata_piix: fix locking around SIDPR access). This patch implements exclusion among EHs of the same host. When EH begins, it acquires per-host EH ownership by calling ata_eh_acquire(). When EH finishes, the ownership is released by calling ata_eh_release(). EH ownership is also released whenever the EH thread goes to sleep from ata_msleep() or explicitly and reacquired after waking up. This ensures that while EH is actively accessing the hardware, it has exclusive access to it while allowing EHs to interleave and progress in parallel as they hit waiting stages, which dominate the time spent in EH. This achieves cross-port EH exclusion without pervasive and fragile changes while still allowing parallel EH for the most part. This was first reported by yuanding02@gmail.com more than three years ago in the following bugzilla. :-) https://bugzilla.kernel.org/show_bug.cgi?id=8223 Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Reported-by: yuanding02@gmail.com Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21libata: add @ap to ata_wait_register() and introduce ata_msleep()Tejun Heo
Add optional @ap argument to ata_wait_register() and replace msleep() calls with ata_msleep() which take optional @ap in addition to the duration. These will be used to implement EH exclusion. This patch doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21libata: implement LPM support for port multipliersTejun Heo
Port multipliers can do DIPM on fan-out links fine. Implement support for it. Tested w/ SIMG 57xx and marvell PMPs. Both the host and fan-out links enter power save modes nicely. SIMG 37xx and 47xx report link offline on SStatus causing EH to detach the devices. Blacklisted. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21libata: reimplement link power managementTejun Heo
The current LPM implementation has the following issues. * Operation order isn't well thought-out. e.g. HIPM should be configured after IPM in SControl is properly configured. Not the other way around. * Suspend/resume paths call ata_lpm_enable/disable() which must only be called from EH context directly. Also, ata_lpm_enable/disable() were called whether LPM was in use or not. * Implementation is per-port when it should be per-link. As a result, it can't be used for controllers with slave links or PMP. * LPM state isn't managed consistently. After a link reset for whatever reason including suspend/resume the actual LPM state would be reset leaving ap->lpm_policy inconsistent. * Generic/driver-specific logic boundary isn't clear. Currently, libahci has to mangle stuff which libata EH proper should be handling. This makes the implementation unnecessarily complex and fragile. * Tied to ALPM. Doesn't consider DIPM only cases and doesn't check whether the device allows HIPM. * Error handling isn't implemented. Given the extent of mismatch with the rest of libata, I don't think trying to fix it piecewise makes much sense. This patch reimplements LPM support. * The new implementation is per-link. The target policy is still port-wide (ap->target_lpm_policy) but all the mechanisms and states are per-link and integrate well with the rest of link abstraction and can work with slave and PMP links. * Core EH has proper control of LPM state. LPM state is reconfigured when and only when reconfiguration is necessary. It makes sure that LPM state is reset when probing for new device on the link. Controller agnostic logic is now implemented in libata EH proper and driver implementation only has to deal with controller specifics. * Proper error handling. LPM config failure is attributed to the device on the link and LPM is disabled for the link if it fails repeatedly. * ops->enable/disable_pm() are replaced with single ops->set_lpm() which takes @policy and @hints. This simplifies driver specific implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21libata: clean up lpm related symbols and sysfs show/store functionsTejun Heo
Link power management related symbols are in confusing state w/ mixed usages of lpm, ipm and pm. This patch cleans up lpm related symbols and sysfs show/store functions as follows. * lpm states - NOT_AVAILABLE, MIN_POWER, MAX_PERFORMANCE and MEDIUM_POWER are renamed to ATA_LPM_UNKNOWN and ATA_LPM_{MIN|MAX|MED}_POWER. * Pre/postfixes are unified to lpm. * sysfs show/store functions for link_power_management_policy were curiously named get/put and unnecessarily complex. Renamed to show/store and simplified. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21[libata] Add ATA transport classGwendal Grignou
This is a scheleton for libata transport class. All information is read only, exporting information from libata: - ata_port class: one per ATA port - ata_link class: one per ATA port or 15 for SATA Port Multiplier - ata_device class: up to 2 for PATA link, usually one for SATA. Signed-off-by: Gwendal Grignou <gwendal@google.com> Reviewed-by: Grant Grundler <grundler@google.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-09-09libata: skip EH autopsy and recovery during suspendTejun Heo
For some mysterious reason, certain hardware reacts badly to usual EH actions while the system is going for suspend. As the devices won't be needed until the system is resumed, ask EH to skip usual autopsy and recovery and proceed directly to suspend. Signed-off-by: Tejun Heo <tj@kernel.org> Tested-by: Stephan Diestelhorst <stephan.diestelhorst@amd.com> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-08-07Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits) workqueue: mark init_workqueues() as early_initcall() workqueue: explain for_each_*cwq_cpu() iterators fscache: fix build on !CONFIG_SYSCTL slow-work: kill it gfs2: use workqueue instead of slow-work drm: use workqueue instead of slow-work cifs: use workqueue instead of slow-work fscache: drop references to slow-work fscache: convert operation to use workqueue instead of slow-work fscache: convert object to use workqueue instead of slow-work workqueue: fix how cpu number is stored in work->data workqueue: fix mayday_mask handling on UP workqueue: fix build problem on !CONFIG_SMP workqueue: fix locking in retry path of maybe_create_worker() async: use workqueue for worker pool workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead workqueue: implement unbound workqueue workqueue: prepare for WQ_UNBOUND implementation libata: take advantage of cmwq and remove concurrency limitations workqueue: fix worker management invocation without pending works ... Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
2010-08-01[libata] add ATA_CMD_DSM to ata_get_cmd_descriptFUJITA Tomonori
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-07-02libata: take advantage of cmwq and remove concurrency limitationsTejun Heo
libata has two concurrency related limitations. a. ata_wq which is used for polling PIO has single thread per CPU. If there are multiple devices doing polling PIO on the same CPU, they can't be executed simultaneously. b. ata_aux_wq which is used for SCSI probing has single thread. In cases where SCSI probing is stalled for extended period of time which is possible for ATAPI devices, this will stall all probing. #a is solved by increasing maximum concurrency of ata_wq. Please note that polling PIO might be used under allocation path and thus needs to be served by a separate wq with a rescuer. #b is solved by using the default wq instead and achieving exclusion via per-port mutex. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jeff Garzik <jgarzik@pobox.com>
2010-05-19libata-sff: separate out BMDMA EHTejun Heo
Some of error handling logic in ata_sff_error_handler() and all of ata_sff_post_internal_cmd() are for BMDMA. Create ata_bmdma_error_handler() and ata_bmdma_post_internal_cmd() and move BMDMA part into those. While at it, change DMA protocol check to ata_is_dma(), fix post_internal_cmd to call ap->ops->bmdma_stop instead of directly calling ata_bmdma_stop() and open code hardreset selection so that ata_std_error_handler() doesn't have to know about sff hardreset. As these two functions are BMDMA specific, there's no reason to check for bmdma_addr before calling bmdma methods if the protocol of the failed command is DMA. sata_mv and pata_mpc52xx now don't need to set .post_internal_cmd to ATA_OP_NULL and pata_icside and sata_qstor don't need to set it to their bmdma_stop routines. ata_sff_post_internal_cmd() becomes noop and is removed. This fixes p3 described in clean-up-BMDMA-initialization patch. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-05-19libata-sff: port_task is SFF specificTejun Heo
port_task is tightly bound to the standard SFF PIO HSM implementation. Using it for any other purpose would be error-prone and there's no such user and if some drivers need such feature, it would be much better off using its own. Move it inside CONFIG_ATA_SFF and rename it to sff_pio_task. The only function which is exposed to the core layer is ata_sff_flush_pio_task() which is renamed from ata_port_flush_task() and now also takes care of resetting hsm_task_state to HSM_ST_IDLE, which is possible as it's now specific to PIO HSM. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-04-22libata: ensure NCQ error result taskfile is fully initializedJeff Garzik
before returning it via qc->result_tf. Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-04-22libata: fix locking around blk_abort_request()Tejun Heo
blk_abort_request() expectes queue lock to be held by the caller. Grab it before calling the function. Lack of this synchronization led to infinite loop on corrupt q->timeout_list. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-01-20libata: retry FS IOs even if it has failed with AC_ERR_INVALIDTejun Heo
libata currently doesn't retry if a command fails with AC_ERR_INVALID assuming that retrying won't get it any further even if retried. However, a failure may be classified as invalid through hardware glitch (incorrect reading of the error register or firmware bug) and there isn't whole lot to gain by not retrying as actually invalid commands will be failed immediately. Also, commands serving FS IOs are extremely unlikely to be invalid. Retry FS IOs even if it's marked invalid. Transient and incorrect invalid failure was seen while debugging firmware related issue on Samsung n130 on bko#14314. http://bugzilla.kernel.org/show_bug.cgi?id=14314 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Johannes Stezenbach <js@sig21.net> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03libata: retry failed FLUSH if device didn't fail itTejun Heo
If ATA device failed FLUSH, it means that the device failed to write out some amount of data and the error needs to be reported to upper layers. As retries can't recover the lost data, FLUSH failures need to be reported immediately in general. However, if FLUSH fails due to transmission errors, the FLUSH needs to be retried; otherwise, filesystems may switch to RO mode and/or raid array may drop a drive for a random transmission glitch. This condition can be rather easily reproduced on certain ahci controllers which go through a PHY event after powersave mode switch + ext4 combination. Powersave mode switch is often closely followed by flush from the filesystem failing the FLUSH with ATA bus error which makes the filesystem code believe that data is lost and drop to RO mode. This was reported in the following bugzilla bug. http://bugzilla.kernel.org/show_bug.cgi?id=14543 This patch makes libata EH retry FLUSH if it wasn't failed by the device. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-10-16libata: fix PMP initializationTejun Heo
Commit 842faa6c1a1d6faddf3377948e5cf214812c6c90 fixed error handling during attach by not committing detected device class to dev->class while attaching a new device. However, this change missed the PMP class check in the configuration loop causing a new PMP device to go through ata_dev_configure() as if it were an ATA or ATAPI device. As PMP device doesn't have a regular IDENTIFY data, this makes ata_dev_configure() tries to configure a PMP device using an invalid data. For the most part, it wasn't too harmful and went unnoticed but this ends up clearing dev->flags which may have ATA_DFLAG_AN set by sata_pmp_attach(). This means that SATA_PMP_FEAT_NOTIFY ends up being disabled on PMPs and on PMPs which honor the flag breaks hotplug support. This problem was discovered and reported by Ethan Hsiao. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Ethan Hsiao <ethanhsiao@jmicron.com> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-10-06libata: fix incorrect link online check during probeTejun Heo
While trying to work around spurious detection retries for non-existent devices on slave links, commit 816ab89782ac139a8b65147cca990822bb7e8675 incorrectly added link offline check logic before ata_eh_thaw() was called. This means that if an occupied link goes down briefly at the time that offline check was performed, device class will be cleared to ATA_DEV_NONE and libata wouldn't retry thus failing detection of the device. The offline check should be done after the port is thawed together with online check so that such link glitches can be detected by the interrupt handler and handled properly. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Tim Blechmann <tim@klingt.org> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-01libata: add command name parsing for error outputRobert Hancock
This patch improve libata's output for error/notification messages to allow easier comprehension and debugging: When ATAPI commands issued through the SCSI layer fail, use SCSI functions to print the CDB in human-readable form instead of just dumping out the CDB in hex. Print out the name of the failed command (as defined by the ATA specification) in error handling output along with the raw register contents. When reporting status of ACPI taskfile commands executed on resume, also output the names of the commands being executed (or not) in readable form. Since the extra data for printing command names increases kernel size slightly, a config option has been added to allow disabling command name output (as well as some of the error register parsing) for those highly sensitive to kernel text size. Signed-off-by: Robert Hancock <hancockrwd@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-01libata: clear eh_info on reset completionTejun Heo
Resets are done with port frozen but some controllers still issue interrupts during reset and they may end up recording error conditions in ehi leading to unnecessary EH retrials. This patch makes ata_eh_reset() clear ehi on reset completion. As reset is the most severe recovery action, there's nothing to lose by clearing ehi on its completion. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Zdenek Kaspar <zkaspar82@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>