aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/mthca/mthca_catas.c
AgeCommit message (Collapse)Author
2011-01-11IB/mthca: Fix driver when sizeof (phys_addr_t) > sizeof (long)John L. Burr
Some systems have PCI addresses that don't fit in unsigned long (eg some 32-bit PowerPC 440 systems have 36-bit bus addresses). Fix up the driver by using phys_addr_t where appropriate, so we don't truncate any PCI resource addresses before ioremapping them. Signed-off-by: John L. Burr <jlburr@cadence.com> [ Update to apply to current driver source. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-24IB/mthca: Fix access to freed memory in catastrophic event handlingJack Morgenstein
catas_reset() uses a pointer to mthca_dev, but mthca_dev is not valid after the call to __mthca_restart_one(). Based on a similar patch for mlx4 (634354d7, "mlx4: Fix access to freed memory") by Vitaliy Gusev <vgusev@openvz.org> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05IB/mthca: Don't allow userspace open while recovering from catastrophic errorJack Morgenstein
Userspace apps are supposed to release all ib device resources if they receive a fatal async event (IBV_EVENT_DEVICE_FATAL). However, the app has no way of knowing when the device has come back up, except to repeatedly attempt ibv_open_device() until it succeeds. However, currently there is no protection against the open succeeding while the device is in being removed following the fatal event. In this case, the open will succeed, but as a result the device waits in the middle of its removal until the new app releases its resources -- and the new app will not do so, since the open succeeded at a point following the fatal event generation. This patch adds an "active" flag to the device. The active flag is set to false (in the fatal event flow) before the "fatal" event is generated, so any subsequent ibv_dev_open() call to the device will fail until the device comes back up, thus preventing the above deadlock. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-29IB/mthca: Use pci_request_regions()Roland Dreier
Back in prehistoric (pre-git!) days, the kernel's MSI-X support did request_mem_region() on a device's MSI-X tables, which meant that a driver that enabled MSI-X couldn't use pci_request_regions() (since that would clash with the PCI layer's MSI-X request). However, that was removed (by me!) years ago, so mthca can just use pci_request_regions() and pci_release_regions() instead of its own much more complicated code that avoids requesting the MSI-X tables. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14IB/mthca: Use round_jiffies() for catastrophic error polling timerRoland Dreier
Exactly when the catastrophic error polling timer function runs is not important, so use round_jiffies() to save unnecessary wakeups. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14IB/mthca: Remove "stop" flag for catastrophic error polling timerRoland Dreier
Since we use del_timer_sync() anyway, there's no need for an additional flag to tell the timer not to rearm. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14RDMA: Remove subversion $Id tagsRoland Dreier
They don't get updated by git and so they're worse than useless. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-22WorkStruct: make allyesconfigDavid Howells
Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>
2006-09-22IB/mthca: Recover from catastrophic errorsJack Morgenstein
Trigger device remove and then add when a catastrophic error is detected in hardware. This, in turn, will cause a device reset, which we hope will recover from the catastrophic condition. Since this might interefere with debugging the root cause, add a module option to suppress this behaviour. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-10Merge branch 'for-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
2005-11-10[IB] mthca: fix typo in catastrophic error pollingRoland Dreier
Fix a typo in the rearming of the catastrophic error polling timer: we should rearm the timer as long as the stop flag is _not_ set. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-07[PATCH] fix remaining missing includesTim Schmielau
Fix more include file problems that surfaced since I submitted the previous fix-missing-includes.patch. This should now allow not to include sched.h from module.h, which is done by a followup patch. Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-27[IB] mthca: first pass at catastrophic error reportingRoland Dreier
Add some initial support for detecting and reporting catastrophic errors reported by Mellanox HCAs. We start a periodic timer which polls the catastrophic error reporting buffer in device memory. If an error is detected, we dump the contents of the buffer for port-mortem debugging, and report a fatal asynchronous error to higher levels. In the future we can try to recover from these errors by resetting the device, but this will require some work in higher-level code as well. Let's get this in now, so that we at least get catastrophic errors reported in logs. Signed-off-by: Roland Dreier <rolandd@cisco.com>