libitm: Fixed conversion to C++11 atomics.

	libitm/
	* beginend.cc (GTM::gtm_thread::begin_transaction): Add comment.
	(GTM::gtm_thread::try_commit): Change memory order.
	* config/linux/alpha/futex_bits.h (sys_futex0): Take atomic int
	as parameter.
	* config/linux/x86/futex_bits.h (sys_futex0): Same.
	* config/linux/sh/futex_bits.h (sys_futex0): Same.
	* config/linux/futex_bits.h (sys_futex0): Same.
	* config/linux/futex.cc (futex_wait, futex_wake): Same.
	* config/linux/futex.h (futex_wait, futex_wake): Same.
	* config/linux/rwlock.h (gtm_rwlock::writers,
	gtm_rwlock::writer_readers, gtm_rwlock::readers): Change to atomic
	ints.
	* config/linux/rwlock.cc (gtm_rwlock::read_lock,
	gtm_rwlock::write_lock_generic, gtm_rwlock::read_unlock,
	gtm_rwlock::write_unlock): Fix memory orders and fences.
	* config/posix/rwlock.cc (gtm_rwlock::read_lock,
	gtm_rwlock::write_lock_generic, gtm_rwlock::read_unlock,
	gtm_rwlock::write_unlock): Same.
	* config/posix/rwlock.h (gtm_rwlock::summary): Change to atomic int.
	* method-gl.cc (gl_mg::init, gl_wt_dispatch::memtransfer_static,
	gl_wt_dispatch::memset_static, gl_wt_dispatch::begin_or_restart):
	Add comments.
	(gl_wt_dispatch::pre_write, gl_wt_dispatch::validate,
	gl_wt_dispatch::load, gl_wt_dispatch::store,
	gl_wt_dispatch::try_commit, gl_wt_dispatch::rollback): Fix memory
	orders and fences.  Add comments.

From-SVN: r182674
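
As a rough illustration of the futex-related ChangeLog entries above, the
wrappers now take a std::atomic<int> rather than a plain int.  The sketch
below is not the libitm sources; the exact names, signatures, and the cast
back to a plain int for the kernel are assumptions for illustration only.

    // Illustration only -- not libitm code.  A Linux futex wrapper that
    // operates on std::atomic<int>, so callers need no type punning.
    #include <atomic>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/futex.h>

    static inline long
    sys_futex0 (std::atomic<int> *addr, int op, int val)
    {
      // Assumes std::atomic<int> has the layout of a plain int, which
      // holds on the Linux targets of interest.
      return syscall (SYS_futex, (int *) addr, op, val, nullptr, nullptr, 0);
    }

    static inline void
    futex_wait (std::atomic<int> *addr, int val)
    {
      // Sleeps only while *addr still equals val, so a concurrent change
      // followed by futex_wake cannot be missed.
      sys_futex0 (addr, FUTEX_WAIT, val);
    }

    static inline long
    futex_wake (std::atomic<int> *addr, int count)
    {
      return sys_futex0 (addr, FUTEX_WAKE, count);
    }
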
diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index bcc8516..d0ad5a7 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -259,6 +259,9 @@
   else
     {
 #ifdef HAVE_64BIT_SYNC_BUILTINS
+      // We don't really care which block of TIDs we get but only that we
+      // acquire one atomically; therefore, relaxed memory order is
+      // sufficient.
       tx->id = global_tid.fetch_add(tid_block_size, memory_order_relaxed);
       tx->local_tid = tx->id + 1;
 #else
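
The fetch_add in the hunk above only needs atomicity, not ordering: any
block of TIDs is acceptable as long as no two threads receive the same one.
A minimal standalone sketch of that pattern (the initial value and block
size here are arbitrary; this is not the libitm code):

    #include <atomic>
    #include <cstdint>

    // Global counter from which each thread grabs a block of IDs.
    static std::atomic<uint64_t> global_tid (1);
    static const uint64_t tid_block_size = 1 << 16;

    static uint64_t
    acquire_tid_block ()
    {
      // Relaxed ordering suffices: the operation is still atomic, so every
      // caller gets a distinct block, but no other memory accesses are
      // ordered by it and no fence is required on weakly ordered targets.
      return global_tid.fetch_add (tid_block_size, std::memory_order_relaxed);
    }
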
@@ -471,17 +474,28 @@
       // Ensure privatization safety, if necessary.
       if (priv_time)
 	{
+          // There must be a seq_cst fence between the following loads of the
+          // other transactions' shared_state and the dispatch-specific stores
+          // that signal updates by this transaction (e.g., lock
+          // acquisitions).  This ensures that if we read prior to other
+          // reader transactions setting their shared_state to 0, then those
+          // readers will observe our updates.  We can reuse the seq_cst fence
+          // in serial_lock.read_unlock() however, so we don't need another
+          // one here.
 	  // TODO Don't just spin but also block using cond vars / futexes
 	  // here. Should probably be integrated with the serial lock code.
-	  // TODO For C++0x atomics, the loads of other threads' shared_state
-	  // should have acquire semantics (together with releases for the
-	  // respective updates). But is this unnecessary overhead because
-	  // weaker barriers are sufficient?
 	  for (gtm_thread *it = gtm_thread::list_of_threads; it != 0;
 	      it = it->next_thread)
 	    {
 	      if (it == this) continue;
-	      while (it->shared_state.load(memory_order_relaxed) < priv_time)
+	      // We need to load other threads' shared_state using acquire
+	      // semantics (matching the release semantics of the respective
+	      // updates).  This is necessary to ensure that the other
+	      // threads' memory accesses happen before our actions that
+	      // assume privatization safety.
+	      // TODO Are there any platform-specific optimizations (e.g.,
+	      // merging barriers)?
+	      while (it->shared_state.load(memory_order_acquire) < priv_time)
 		cpu_relax();
 	    }
 	}
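
A rough standalone sketch of the acquire/release pairing described in the
new comments (names invented; this is not the libitm implementation):

    #include <atomic>
    #include <cstdint>

    struct reader_thread
    {
      // Published snapshot time; a real reader also touches shared data.
      std::atomic<uint64_t> shared_state;
      int *data;

      void finish_reads (uint64_t now)
      {
        // ... plain reads of *data happen here ...
        // Release store: everything above happens-before any acquire load
        // that observes this (or a later) value of shared_state.
        shared_state.store (now, std::memory_order_release);
      }
    };

    static void
    wait_for_privatization_safety (reader_thread *r, uint64_t priv_time)
    {
      // Acquire load matches the release store above.  Once the reader has
      // published a time >= priv_time, its earlier accesses to the shared
      // data are guaranteed to have completed, so that memory can then be
      // reused or freed safely by the privatizing transaction.
      while (r->shared_state.load (std::memory_order_acquire) < priv_time)
        ;  // cpu_relax () in the real code
    }
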