aboutsummaryrefslogtreecommitdiff
path: root/security
AgeCommit message (Collapse)Author
2011-01-07Merge branch 'vfs-scale-working' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits) fs: scale mntget/mntput fs: rename vfsmount counter helpers fs: implement faster dentry memcmp fs: prefetch inode data in dcache lookup fs: improve scalability of pseudo filesystems fs: dcache per-inode inode alias locking fs: dcache per-bucket dcache hash locking bit_spinlock: add required includes kernel: add bl_list xfs: provide simple rcu-walk ACL implementation btrfs: provide simple rcu-walk ACL implementation ext2,3,4: provide simple rcu-walk ACL implementation fs: provide simple rcu-walk generic_check_acl implementation fs: provide rcu-walk aware permission i_ops fs: rcu-walk aware d_revalidate method fs: cache optimise dentry and inode for rcu-walk fs: dcache reduce branches in lookup path fs: dcache remove d_mounted fs: fs_struct use seqlock fs: rcu-walk for path lookup ...
2011-01-07fs: rcu-walk for path lookupNick Piggin
Perform common cases of path lookups without any stores or locking in the ancestor dentry elements. This is called rcu-walk, as opposed to the current algorithm which is a refcount based walk, or ref-walk. This results in far fewer atomic operations on every path element, significantly improving path lookup performance. It also avoids cacheline bouncing on common dentries, significantly improving scalability. The overall design is like this: * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk. * Take the RCU lock for the entire path walk, starting with the acquiring of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are not required for dentry persistence. * synchronize_rcu is called when unregistering a filesystem, so we can access d_ops and i_ops during rcu-walk. * Similarly take the vfsmount lock for the entire path walk. So now mnt refcounts are not required for persistence. Also we are free to perform mount lookups, and to assume dentry mount points and mount roots are stable up and down the path. * Have a per-dentry seqlock to protect the dentry name, parent, and inode, so we can load this tuple atomically, and also check whether any of its members have changed. * Dentry lookups (based on parent, candidate string tuple) recheck the parent sequence after the child is found in case anything changed in the parent during the path walk. * inode is also RCU protected so we can load d_inode and use the inode for limited things. * i_mode, i_uid, i_gid can be tested for exec permissions during path walk. * i_op can be loaded. When we reach the destination dentry, we lock it, recheck lookup sequence, and increment its refcount and mountpoint refcount. RCU and vfsmount locks are dropped. This is termed "dropping rcu-walk". If the dentry refcount does not match, we can not drop rcu-walk gracefully at the current point in the lokup, so instead return -ECHILD (for want of a better errno). This signals the path walking code to re-do the entire lookup with a ref-walk. Aside from the final dentry, there are other situations that may be encounted where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take a reference on the last good dentry) and continue with a ref-walk. Again, if we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup using ref-walk. But it is very important that we can continue with ref-walk for most cases, particularly to avoid the overhead of double lookups, and to gain the scalability advantages on common path elements (like cwd and root). The cases where rcu-walk cannot continue are: * NULL dentry (ie. any uncached path element) * parent with d_inode->i_op->permission or ACLs * dentries with d_revalidate * Following links In future patches, permission checks and d_revalidate become rcu-walk aware. It may be possible eventually to make following links rcu-walk aware. Uncached path elements will always require dropping to ref-walk mode, at the very least because i_mutex needs to be grabbed, and objects allocated. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache rationalise dget variantsNick Piggin
dget_locked was a shortcut to avoid the lazy lru manipulation when we already held dcache_lock (lru manipulation was relatively cheap at that point). However, how that the lru lock is an innermost one, we never hold it at any caller, so the lock cost can now be avoided. We already have well working lazy dcache LRU, so it should be fine to defer LRU manipulations to scan time. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache remove dcache_lockNick Piggin
dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache scale subdirsNick Piggin
Protect d_subdirs and d_child with d_lock, except in filesystems that aren't using dcache_lock for these anyway (eg. using i_mutex). Note: if we change the locking rule in future so that ->d_child protection is provided only with ->d_parent->d_lock, it may allow us to reduce some locking. But it would be an exception to an otherwise regular locking scheme, so we'd have to see some good results. Probably not worthwhile. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache scale d_unhashedNick Piggin
Protect d_unhashed(dentry) condition with d_lock. This means keeping DCACHE_UNHASHED bit in synch with hash manipulations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1436 commits) cassini: Use local-mac-address prom property for Cassini MAC address net: remove the duplicate #ifdef __KERNEL__ net: bridge: check the length of skb after nf_bridge_maybe_copy_header() netconsole: clarify stopping message netconsole: don't announce stopping if nothing happened cnic: Fix the type field in SPQ messages netfilter: fix export secctx error handling netfilter: fix the race when initializing nf_ct_expect_hash_rnd ipv4: IP defragmentation must be ECN aware net: r6040: Return proper error for r6040_init_one dcb: use after free in dcb_flushapp() dcb: unlock on error in dcbnl_ieee_get() net: ixp4xx_eth: Return proper error for eth_init_one include/linux/if_ether.h: Add #define ETH_P_LINK_CTL for HPNA and wlan local tunnel net: add POLLPRI to sock_def_readable() af_unix: Avoid socket->sk NULL OOPS in stream connect security hooks. net_sched: pfifo_head_drop problem mac80211: remove stray extern mac80211: implement off-channel TX using hw r-o-c offload mac80211: implement hardware offload for remain-on-channel ...
2011-01-05af_unix: Avoid socket->sk NULL OOPS in stream connect security hooks.David S. Miller
unix_release() can asynchornously set socket->sk to NULL, and it does so without holding the unix_state_lock() on "other" during stream connects. However, the reverse mapping, sk->sk_socket, is only transitioned to NULL under the unix_state_lock(). Therefore make the security hooks follow the reverse mapping instead of the forward mapping. Reported-by: Jeremy Fitzhardinge <jeremy@goop.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-03ima: fix add LSM rule bugMimi Zohar
If security_filter_rule_init() doesn't return a rule, then not everything is as fine as the return code implies. This bug only occurs when the LSM (eg. SELinux) is disabled at runtime. Adding an empty LSM rule causes ima_match_rules() to always succeed, ignoring any remaining rules. default IMA TCB policy: # PROC_SUPER_MAGIC dont_measure fsmagic=0x9fa0 # SYSFS_MAGIC dont_measure fsmagic=0x62656572 # DEBUGFS_MAGIC dont_measure fsmagic=0x64626720 # TMPFS_MAGIC dont_measure fsmagic=0x01021994 # SECURITYFS_MAGIC dont_measure fsmagic=0x73636673 < LSM specific rule > dont_measure obj_type=var_log_t measure func=BPRM_CHECK measure func=FILE_MMAP mask=MAY_EXEC measure func=FILE_CHECK mask=MAY_READ uid=0 Thus without the patch, with the boot parameters 'tcb selinux=0', adding the above 'dont_measure obj_type=var_log_t' rule to the default IMA TCB measurement policy, would result in nothing being measured. The patch prevents the default TCB policy from being replaced. Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Cc: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: David Safford <safford@watson.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-26Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/ipv4/fib_frontend.c
2010-12-23KEYS: Don't call up_write() if __key_link_begin() returns an errorDavid Howells
In construct_alloc_key(), up_write() is called in the error path if __key_link_begin() fails, but this is incorrect as __key_link_begin() only returns with the nominated keyring locked if it returns successfully. Without this patch, you might see the following in dmesg: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- mount.cifs/5769 is trying to release lock (&key->sem) at: [<ffffffff81201159>] request_key_and_link+0x263/0x3fc but there are no more locks to release! other info that might help us debug this: 3 locks held by mount.cifs/5769: #0: (&type->s_umount_key#41/1){+.+.+.}, at: [<ffffffff81131321>] sget+0x278/0x3e7 #1: (&ret_buf->session_mutex){+.+.+.}, at: [<ffffffffa0258e59>] cifs_get_smb_ses+0x35a/0x443 [cifs] #2: (root_key_user.cons_lock){+.+.+.}, at: [<ffffffff81201000>] request_key_and_link+0x10a/0x3fc stack backtrace: Pid: 5769, comm: mount.cifs Not tainted 2.6.37-rc6+ #1 Call Trace: [<ffffffff81201159>] ? request_key_and_link+0x263/0x3fc [<ffffffff81081601>] print_unlock_inbalance_bug+0xca/0xd5 [<ffffffff81083248>] lock_release_non_nested+0xc1/0x263 [<ffffffff81201159>] ? request_key_and_link+0x263/0x3fc [<ffffffff81201159>] ? request_key_and_link+0x263/0x3fc [<ffffffff81083567>] lock_release+0x17d/0x1a4 [<ffffffff81073f45>] up_write+0x23/0x3b [<ffffffff81201159>] request_key_and_link+0x263/0x3fc [<ffffffffa026fe9e>] ? cifs_get_spnego_key+0x61/0x21f [cifs] [<ffffffff812013c5>] request_key+0x41/0x74 [<ffffffffa027003d>] cifs_get_spnego_key+0x200/0x21f [cifs] [<ffffffffa026e296>] CIFS_SessSetup+0x55d/0x1273 [cifs] [<ffffffffa02589e1>] cifs_setup_session+0x90/0x1ae [cifs] [<ffffffffa0258e7e>] cifs_get_smb_ses+0x37f/0x443 [cifs] [<ffffffffa025a9e3>] cifs_mount+0x1aa1/0x23f3 [cifs] [<ffffffff8111fd94>] ? alloc_debug_processing+0xdb/0x120 [<ffffffffa027002c>] ? cifs_get_spnego_key+0x1ef/0x21f [cifs] [<ffffffffa024cc71>] cifs_do_mount+0x165/0x2b3 [cifs] [<ffffffff81130e72>] vfs_kern_mount+0xaf/0x1dc [<ffffffff81131007>] do_kern_mount+0x4d/0xef [<ffffffff811483b9>] do_mount+0x6f4/0x733 [<ffffffff8114861f>] sys_mount+0x88/0xc2 [<ffffffff8100ac42>] system_call_fastpath+0x16/0x1b Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-23SELinux: indicate fatal error in compat netfilter codeEric Paris
The SELinux ip postroute code indicates when policy rejected a packet and passes the error back up the stack. The compat code does not. This patch sends the same kind of error back up the stack in the compat code. Based-on-patch-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-23SELinux: Only return netlink error when we know the return is fatalEric Paris
Some of the SELinux netlink code returns a fatal error when the error might actually be transient. This patch just silently drops packets on potentially transient errors but continues to return a permanant error indicator when the denial was because of policy. Based-on-comments-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17SELinux: return -ECONNREFUSED from ip_postroute to signal fatal errorEric Paris
The SELinux netfilter hooks just return NF_DROP if they drop a packet. We want to signal that a drop in this hook is a permanant fatal error and is not transient. If we do this the error will be passed back up the stack in some places and applications will get a faster interaction that something went wrong. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15capabilities/syslog: open code cap_syslog logic to fix build failureEric Paris
The addition of CONFIG_SECURITY_DMESG_RESTRICT resulted in a build failure when CONFIG_PRINTK=n. This is because the capabilities code which used the new option was built even though the variable in question didn't exist. The patch here fixes this by moving the capabilities checks out of the LSM and into the caller. All (known) LSMs should have been calling the capabilities hook already so it actually makes the code organization better to eliminate the hook altogether. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: APPARMOR: Fix memory leak of apparmor_init() APPARMOR: Fix memory leak of alloc_namespace()
2010-11-12Restrict unprivileged access to kernel syslogDan Rosenberg
The kernel syslog contains debugging information that is often useful during exploitation of other vulnerabilities, such as kernel heap addresses. Rather than futilely attempt to sanitize hundreds (or thousands) of printk statements and simultaneously cripple useful debugging functionality, it is far simpler to create an option that prevents unprivileged users from reading the syslog. This patch, loosely based on grsecurity's GRKERNSEC_DMESG, creates the dmesg_restrict sysctl. When set to "0", the default, no restrictions are enforced. When set to "1", only users with CAP_SYS_ADMIN can read the kernel syslog via dmesg(8) or other mechanisms. [akpm@linux-foundation.org: explain the config option in kernel.txt] Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Eugene Teo <eugeneteo@kernel.org> Acked-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-11APPARMOR: Fix memory leak of apparmor_init()wzt.wzt@gmail.com
set_init_cxt() allocted sizeof(struct aa_task_cxt) bytes for cxt, if register_security() failed, it will cause memory leak. Signed-off-by: Zhitong Wang <zhitong.wangzt@alibaba-inc.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-11-11APPARMOR: Fix memory leak of alloc_namespace()wzt.wzt@gmail.com
policy->name is a substring of policy->hname, if prefix is not NULL, it will allocted strlen(prefix) + strlen(name) + 3 bytes to policy->hname in policy_init(). use kzfree(ns->base.name) will casue memory leak if alloc_namespace() failed. Signed-off-by: Zhitong Wang <zhitong.wangzt@alibaba-inc.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-29convert get_sb_single() usersAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-28Fix install_process_keyring error handlingAndi Kleen
Fix an incorrect error check that returns 1 for error instead of the expected error code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) split invalidate_inodes() fs: skip I_FREEING inodes in writeback_sb_inodes fs: fold invalidate_list into invalidate_inodes fs: do not drop inode_lock in dispose_list fs: inode split IO and LRU lists fs: switch bdev inode bdi's correctly fs: fix buffer invalidation in invalidate_list fsnotify: use dget_parent smbfs: use dget_parent exportfs: use dget_parent fs: use RCU read side protection in d_validate fs: clean up dentry lru modification fs: split __shrink_dcache_sb fs: improve DCACHE_REFERENCED usage fs: use percpu counter for nr_dentry and nr_dentry_unused fs: simplify __d_free fs: take dcache_lock inside __d_path fs: do not assign default i_ino in new_inode fs: introduce a per-cpu last_ino allocator new helper: ihold() ...
2010-10-26Merge branch 'ima-memory-use-fixes'Linus Torvalds
* ima-memory-use-fixes: IMA: fix the ToMToU logic IMA: explicit IMA i_flag to remove global lock on inode_delete IMA: drop refcnt from ima_iint_cache since it isn't needed IMA: only allocate iint when needed IMA: move read counter into struct inode IMA: use i_writecount rather than a private counter IMA: use inode->i_lock to protect read and write counters IMA: convert internal flags from long to char IMA: use unsigned int instead of long for counters IMA: drop the inode opencount since it isn't needed for operation IMA: use rbtree instead of radix tree for inode information cache
2010-10-26IMA: fix the ToMToU logicEric Paris
Current logic looks like this: rc = ima_must_measure(NULL, inode, MAY_READ, FILE_CHECK); if (rc < 0) goto out; if (mode & FMODE_WRITE) { if (inode->i_readcount) send_tomtou = true; goto out; } if (atomic_read(&inode->i_writecount) > 0) send_writers = true; Lets assume we have a policy which states that all files opened for read by root must be measured. Lets assume the file has permissions 777. Lets assume that root has the given file open for read. Lets assume that a non-root process opens the file write. The non-root process will get to ima_counts_get() and will check the ima_must_measure(). Since it is not supposed to measure it will goto out. We should check the i_readcount no matter what since we might be causing a ToMToU voilation! This is close to correct, but still not quite perfect. The situation could have been that root, which was interested in the mesurement opened and closed the file and another process which is not interested in the measurement is the one holding the i_readcount ATM. This is just overly strict on ToMToU violations, which is better than not strict enough... Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26IMA: explicit IMA i_flag to remove global lock on inode_deleteEric Paris
Currently for every removed inode IMA must take a global lock and search the IMA rbtree looking for an associated integrity structure. Instead we explicitly mark an inode when we add an integrity structure so we only have to take the global lock and do the removal if it exists. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26IMA: drop refcnt from ima_iint_cache since it isn't neededEric Paris
Since finding a struct ima_iint_cache requires a valid struct inode, and the struct ima_iint_cache is supposed to have the same lifetime as a struct inode (technically they die together but don't need to be created at the same time) we don't have to worry about the ima_iint_cache outliving or dieing before the inode. So the refcnt isn't useful. Just get rid of it and free the structure when the inode is freed. Signed-off-by: Eric Paris <eapris@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26IMA: only allocate iint when neededEric Paris
IMA always allocates an integrity structure to hold information about every inode, but only needed this structure to track the number of readers and writers currently accessing a given inode. Since that information was moved into struct inode instead of the integrity struct this patch stops allocating the integrity stucture until it is needed. Thus greatly reducing memory usage. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26IMA: move read counter into struct inodeEric Paris
IMA currently allocated an inode integrity structure for every inode in core. This stucture is about 120 bytes long. Most files however (especially on a system which doesn't make use of IMA) will never need any of this space. The problem is that if IMA is enabled we need to know information about the number of readers and the number of writers for every inode on the box. At the moment we collect that information in the per inode iint structure and waste the rest of the space. This patch moves those counters into the struct inode so we can eventually stop allocating an IMA integrity structure except when absolutely needed. This patch does the minimum needed to move the location of the data. Further cleanups, especially the location of counter updates, may still be possible. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26IMA: use i_writecount rather than a private counterEric Paris
IMA tracks the number of struct files which are holding a given inode readonly and the number which are holding the inode write or r/w. It needs this information so when a new reader or writer comes in it can tell if this new file will be able to invalidate results it already made about existing files. aka if a task is holding a struct file open RO, IMA measured the file and recorded those measurements and then a task opens the file RW IMA needs to note in the logs that the old measurement may not be correct. It's called a "Time of Measure Time of Use" (ToMToU) issue. The same is true is a RO file is opened to an inode which has an open writer. We cannot, with any validity, measure the file in question since it could be changing. This patch attempts to use the i_writecount field to track writers. The i_writecount field actually embeds more information in it's value than IMA needs but it should work for our purposes and allow us to shrink the struct inode even more. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26IMA: use inode->i_lock to protect read and write countersEric Paris
Currently IMA used the iint->mutex to protect the i_readcount and i_writecount. This patch uses the inode->i_lock since we are going to start using in inode objects and that is the most appropriate lock. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26IMA: convert internal flags from long to charEric Paris
The IMA flags is an unsigned long but there is only 1 flag defined. Lets save a little space and make it a char. This packs nicely next to the array of u8's. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26IMA: use unsigned int instead of long for countersEric Paris
Currently IMA uses 2 longs in struct inode. To save space (and as it seems impossible to overflow 32 bits) we switch these to unsigned int. The switch to unsigned does require slightly different checks for underflow, but it isn't complex. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26IMA: drop the inode opencount since it isn't needed for operationEric Paris
The opencount was used to help debugging to make sure that everything which created a struct file also correctly made the IMA calls. Since we moved all of that into the VFS this isn't as necessary. We should be able to get the same amount of debugging out of just the reader and write count. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26IMA: use rbtree instead of radix tree for inode information cacheEric Paris
The IMA code needs to store the number of tasks which have an open fd granting permission to write a file even when IMA is not in use. It needs this information in order to be enabled at a later point in time without losing it's integrity garantees. At the moment that means we store a little bit of data about every inode in a cache. We use a radix tree key'd on the inode's memory address. Dave Chinner pointed out that a radix tree is a terrible data structure for such a sparse key space. This patch switches to using an rbtree which should be more efficient. Bug report from Dave: "I just noticed that slabtop was reporting an awfully high usage of radix tree nodes: OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME 4200331 2778082 66% 0.55K 144839 29 2317424K radix_tree_node 2321500 2060290 88% 1.00K 72581 32 2322592K xfs_inode 2235648 2069791 92% 0.12K 69864 32 279456K iint_cache That is, 2.7M radix tree nodes are allocated, and the cache itself is consuming 2.3GB of RAM. I know that the XFS inodei caches are indexed by radix tree node, but for 2 million cached inodes that would mean a density of 1 inode per radix tree node, which for a system with 16M inodes in the filsystems is an impossibly low density. The worst I've seen in a production system like kernel.org is about 20-25% density, which would mean about 150-200k radix tree nodes for that many inodes. So it's not the inode cache. So I looked up what the iint_cache was. It appears to used for storing per-inode IMA information, and uses a radix tree for indexing. It uses the *address* of the struct inode as the indexing key. That means the key space is extremely sparse - for XFS the struct inode addresses are approximately 1000 bytes apart, which means the closest the radix tree index keys get is ~1000. Which means that there is a single entry per radix tree leaf node, so the radix tree is using roughly 550 bytes for every 120byte structure being cached. For the above example, it's probably wasting close to 1GB of RAM...." Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-25fs: take dcache_lock inside __d_pathChristoph Hellwig
All callers take dcache_lock just around the call to __d_path, so take the lock into it in preparation of getting rid of dcache_lock. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25fs: do not assign default i_ino in new_inodeChristoph Hellwig
Instead of always assigning an increasing inode number in new_inode move the call to assign it into those callers that actually need it. For now callers that need it is estimated conservatively, that is the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-22Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bklLinus Torvalds
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek
2010-10-21selinux: include vmalloc.h for vmalloc_userStephen Rothwell
Include vmalloc.h for vmalloc_user (fixes ppc build warning). Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21selinux: implement mmap on /selinux/policyEric Paris
/selinux/policy allows a user to copy the policy back out of the kernel. This patch allows userspace to actually mmap that file and use it directly. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21SELinux: allow userspace to read policy back out of the kernelEric Paris
There is interest in being able to see what the actual policy is that was loaded into the kernel. The patch creates a new selinuxfs file /selinux/policy which can be read by userspace. The actual policy that is loaded into the kernel will be written back out to userspace. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21SELinux: drop useless (and incorrect) AVTAB_MAX_SIZEEric Paris
AVTAB_MAX_SIZE was a define which was supposed to be used in userspace to define a maximally sized avtab when userspace wasn't sure how big of a table it needed. It doesn't make sense in the kernel since we always know our table sizes. The only place it is used we have a more appropiately named define called AVTAB_MAX_HASH_BUCKETS, use that instead. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21SELinux: deterministic ordering of range transition rulesEric Paris
Range transition rules are placed in the hash table in an (almost) arbitrary order. This patch inserts them in a fixed order to make policy retrival more predictable. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21security: secid_to_secctx returns len when data is NULLEric Paris
With the (long ago) interface change to have the secid_to_secctx functions do the string allocation instead of having the caller do the allocation we lost the ability to query the security server for the length of the upcoming string. The SECMARK code would like to allocate a netlink skb with enough length to hold the string but it is just too unclean to do the string allocation twice or to do the allocation the first time and hold onto the string and slen. This patch adds the ability to call security_secid_to_secctx() with a NULL data pointer and it will just set the slen pointer. Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21secmark: make secmark object handling genericEric Paris
Right now secmark has lots of direct selinux calls. Use all LSM calls and remove all SELinux specific knowledge. The only SELinux specific knowledge we leave is the mode. The only point is to make sure that other LSMs at least test this generic code before they assume it works. (They may also have to make changes if they do not represent labels as strings) Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Paul Moore <paul.moore@hp.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21AppArmor: Ensure the size of the copy is < the buffer allocated to hold itJohn Johansen
Actually I think in this case the appropriate thing to do is to BUG as there is currently a case (remove) where the alloc_size needs to be larger than the copy_size, and if copy_size is ever greater than alloc_size there is a mistake in the caller code. Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21TOMOYO: Print URL information before panic().Tetsuo Handa
Configuration files for TOMOYO 2.3 are not compatible with TOMOYO 2.2. But current panic() message is too unfriendly and is confusing users. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21security: remove unused parameter from security_task_setscheduler()KOSAKI Motohiro
All security modules shouldn't change sched_param parameter of security_task_setscheduler(). This is not only meaningless, but also make a harmful result if caller pass a static variable. This patch remove policy and sched_param parameter from security_task_setscheduler() becuase none of security module is using it. Cc: James Morris <jmorris@namei.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21selinux: fix up style problem on /selinux/statusKaiGai Kohei
This patch fixes up coding-style problem at this commit: 4f27a7d49789b04404eca26ccde5f527231d01d5 selinux: fast status update interface (/selinux/status) Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21selinux: change to new flag variablematt mooney
Replace EXTRA_CFLAGS with ccflags-y. Signed-off-by: matt mooney <mfm@muteddisk.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21selinux: really fix dependency causing parallel compile failure.Paul Gortmaker
While the previous change to the selinux Makefile reduced the window significantly for this failure, it is still possible to see a compile failure where cpp starts processing selinux files before the auto generated flask.h file is completed. This is easily reproduced by adding the following temporary change to expose the issue everytime: - cmd_flask = scripts/selinux/genheaders/genheaders ... + cmd_flask = sleep 30 ; scripts/selinux/genheaders/genheaders ... This failure happens because the creation of the object files in the ss subdir also depends on flask.h. So simply incorporate them into the parent Makefile, as the ss/Makefile really doesn't do anything unique. With this change, compiling of all selinux files is dependent on completion of the header file generation, and this test case with the "sleep 30" now confirms it is functioning as expected. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: James Morris <jmorris@namei.org>