aboutsummaryrefslogtreecommitdiff
path: root/fs/open.c
AgeCommit message (Collapse)Author
2013-02-26get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zeroAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22propagate error from get_empty_filp() to its callersAl Viro
Based on parts from Anatol's patch (the rest is the next commit). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22new helper: file_inode(file)Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20vfs: make fchownat retry once on ESTALE errorsJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20vfs: make fchmodat retry once on ESTALE errorsJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20vfs: have chroot retry once on ESTALE errorJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20vfs: have chdir retry lookup and call once on ESTALE errorJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20vfs: have faccessat retry once on an ESTALE errorJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20vfs: have do_sys_truncate retry once on an ESTALE errorJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20VFS: Make more complete truncate operation available to CacheFilesDavid Howells
Make a more complete truncate operation available to CacheFiles (including security checks and suchlike) so that it can use this to clear invalidated cache files. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-19vfs: Allow chroot if you have CAP_SYS_CHROOT in your user namespaceEric W. Biederman
Once you are confined to a user namespace applications can not gain privilege and escape the user namespace so there is no longer a reason to restrict chroot. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-10-12vfs: make path_openat take a struct filename pointerJeff Layton
...and fix up the callers. For do_file_open_root, just declare a struct filename on the stack and fill out the .name field. For do_filp_open, make it also take a struct filename pointer, and fix up its callers to call it appropriately. For filp_open, add a variant that takes a struct filename pointer and turn filp_open into a wrapper around it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12vfs: define struct filename and have getname() return itJeff Layton
getname() is intended to copy pathname strings from userspace into a kernel buffer. The result is just a string in kernel space. It would however be quite helpful to be able to attach some ancillary info to the string. For instance, we could attach some audit-related info to reduce the amount of audit-related processing needed. When auditing is enabled, we could also call getname() on the string more than once and not need to recopy it from userspace. This patchset converts the getname()/putname() interfaces to return a struct instead of a string. For now, the struct just tracks the string in kernel space and the original userland pointer for it. Later, we'll add other information to the struct as it becomes convenient. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12audit: set the name_len in audit_inode for parent lookupsJeff Layton
Currently, this gets set mostly by happenstance when we call into audit_inode_child. While that might be a little more efficient, it seems wrong. If the syscall ends up failing before audit_inode_child ever gets called, then you'll have an audit_names record that shows the full path but has the parent inode info attached. Fix this by passing in a parent flag when we call audit_inode that gets set to the value of LOOKUP_PARENT. We can then fix up the pathname for the audit entry correctly from the get-go. While we're at it, clean up the no-op macro for audit_inode in the !CONFIG_AUDITSYSCALL case. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: - big one - consolidation of descriptor-related logics; almost all of that is moved to fs/file.c (BTW, I'm seriously tempted to rename the result to fd.c. As it is, we have a situation when file_table.c is about handling of struct file and file.c is about handling of descriptor tables; the reasons are historical - file_table.c used to be about a static array of struct file we used to have way back). A lot of stray ends got cleaned up and converted to saner primitives, disgusting mess in android/binder.c is still disgusting, but at least doesn't poke so much in descriptor table guts anymore. A bunch of relatively minor races got fixed in process, plus an ext4 struct file leak. - related thing - fget_light() partially unuglified; see fdget() in there (and yes, it generates the code as good as we used to have). - also related - bits of Cyrill's procfs stuff that got entangled into that work; _not_ all of it, just the initial move to fs/proc/fd.c and switch of fdinfo to seq_file. - Alex's fs/coredump.c spiltoff - the same story, had been easier to take that commit than mess with conflicts. The rest is a separate pile, this was just a mechanical code movement. - a few misc patches all over the place. Not all for this cycle, there'll be more (and quite a few currently sit in akpm's tree)." Fix up trivial conflicts in the android binder driver, and some fairly simple conflicts due to two different changes to the sock_alloc_file() interface ("take descriptor handling from sock_alloc_file() to callers" vs "net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries" adding a dentry name to the socket) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits) MAX_LFS_FILESIZE should be a loff_t compat: fs: Generic compat_sys_sendfile implementation fs: push rcu_barrier() from deactivate_locked_super() to filesystems btrfs: reada_extent doesn't need kref for refcount coredump: move core dump functionality into its own file coredump: prevent double-free on an error path in core dumper usb/gadget: fix misannotations fcntl: fix misannotations ceph: don't abuse d_delete() on failure exits hypfs: ->d_parent is never NULL or negative vfs: delete surplus inode NULL check switch simple cases of fget_light to fdget new helpers: fdget()/fdput() switch o2hb_region_dev_write() to fget_light() proc_map_files_readdir(): don't bother with grabbing files make get_file() return its argument vhost_set_vring(): turn pollstart/pollstop into bool switch prctl_set_mm_exe_file() to fget_light() switch xfs_find_handle() to fget_light() switch xfs_swapext() to fget_light() ...
2012-10-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...
2012-09-26switch simple cases of fget_light to fdgetAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26switch fchmod(2) to fget_light()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26switch fallocate(2) to fget_light()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26switch ftruncate(2) to fget_lightAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26don't leak O_CLOEXEC into ->f_flagsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26take descriptor-related part of close() to file.cAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26move put_unused_fd() and fd_install() to fs/file.cAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-21userns: Teach security_path_chown to take kuids and kgidsEric W. Biederman
Don't make the security modules deal with raw user space uid and gids instead pass in a kuid_t and a kgid_t so that security modules only have to deal with internal kernel uids and gids. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: James Morris <james.l.morris@oracle.com> Cc: John Johansen <john.johansen@canonical.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-15vfs: canonicalize create mode in build_open_flags()Miklos Szeredi
Userspace can pass weird create mode in open(2) that we canonicalize to "(mode & S_IALLUGO) | S_IFREG" in vfs_create(). The problem is that we use the uncanonicalized mode before calling vfs_create() with unforseen consequences. So do the canonicalization early in build_open_flags(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com> CC: stable@vger.kernel.org
2012-08-04missed mnt_drop_write() in do_dentry_open()Al Viro
This one ought to be __mnt_drop_write(), to match __mnt_want_write() in the beginning... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31fs: Protect write paths by sb_start_write - sb_end_writeJan Kara
There are several entry points which dirty pages in a filesystem. mmap (handled by block_page_mkwrite()), buffered write (handled by __generic_file_aio_write()), splice write (generic_file_splice_write), truncate, and fallocate (these can dirty last partial page - handled inside each filesystem separately). Protect these places with sb_start_write() and sb_end_write(). ->page_mkwrite() calls are particularly complex since they are called with mmap_sem held and thus we cannot use standard sb_start_write() due to lock ordering constraints. We solve the problem by using a special freeze protection sb_start_pagefault() which ranks below mmap_sem. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31fs: Add freezing handling to mnt_want_write() / mnt_drop_write()Jan Kara
Most of places where we want freeze protection coincides with the places where we also have remount-ro protection. So make mnt_want_write() and mnt_drop_write() (and their _file alternative) prevent freezing as well. For the few cases that are really interested only in remount-ro protection provide new function variants. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29take grabbing f->f_path to do_dentry_open()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-23switch dentry_open() to struct path, make it grab references itselfAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14VFS: Make chown() and lchown() call fchownat()David Howells
Make the chown() and lchown() syscalls jump to the fchownat() syscall with the appropriate extra arguments. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14do_dentry_open(): close the race with mark_files_ro() in failure exitAl Viro
we want to take it out of mark_files_ro() reach *before* we start checking if we ought to drop write access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14do_dentry_open(): take initialization of file->f_path to callerAl Viro
... and get rid of a couple of arguments and a pointless reassignment in finish_open() case. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14fold __dentry_open() into its sole callerAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14switch do_dentry_open() to returning intAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14make finish_no_open() return intAl Viro
namely, 1 ;-) That's what we want to return from ->atomic_open() instances after finish_no_open(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14kill struct opendataAl Viro
Just pass struct file *. Methods are happier that way... There's no need to return struct file * from finish_open() now, so let it return int. Next: saner prototypes for parts in namei.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14kill opendata->{mnt,dentry}Al Viro
->filp->f_path is there for purpose... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14don't modify od->filp at allAl Viro
make put_filp() conditional on flag set by finish_open() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14->atomic_open() prototype change - pass int * instead of bool *Al Viro
... and let finish_open() report having opened the file via that sucker. Next step: don't modify od->filp at all. [AV: FILE_CREATE was already used by cifs; Miklos' fix folded] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14vfs: remove open intents from nameidataMiklos Szeredi
All users of open intents have been converted to use ->atomic_{open,create}. This patch gets rid of nd->intent.open and related infrastructure. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14vfs: add i_op->atomic_open()Miklos Szeredi
Add a new inode operation which is called on the last component of an open. Using this the filesystem can look up, possibly create and open the file in one atomic operation. If it cannot perform this (e.g. the file type turned out to be wrong) it may signal this by returning NULL instead of an open struct file pointer. i_op->atomic_open() is only called if the last component is negative or needs lookup. Handling cached positive dentries here doesn't add much value: these can be opened using f_op->open(). If the cached file turns out to be invalid, the open can be retried, this time using ->atomic_open() with a fresh dentry. For now leave the old way of using open intents in lookup and revalidate in place. This will be removed once all the users are converted. David Howells noticed that if ->atomic_open() opens the file but does not create it, handle_truncate() will be called on it even if it is not a regular file. Fix this by checking the file type in this case too. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-07vfs: make O_PATH file descriptors usable for 'fchdir()'Linus Torvalds
We already use them for openat() and friends, but fchdir() also wants to be able to use O_PATH file descriptors. This should make it comparable to the O_SEARCH of Solaris. In particular, O_PATH allows you to access (not-quite-open) a directory you don't have read persmission to, only execute permission. Noticed during development of multithread support for ksh93. Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org # O_PATH introduced in 3.0+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-01vfs: nameidata_to_filp(): don't throw away file on errorMiklos Szeredi
If open fails, don't put the file. This allows it to be reused if open needs to be retried. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01vfs: nameidata_to_filp(): inline __dentry_open()Miklos Szeredi
Copy __dentry_open() into nameidata_to_filp(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01vfs: do_dentry_open(): don't put filpMiklos Szeredi
Move put_filp() out to __dentry_open(), the only caller now. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01vfs: split __dentry_open()Miklos Szeredi
Split __dentry_open() into two functions: do_dentry_open() - does most of the actual work, doesn't put file on failure open_check_o_direct() - after a successful open, checks direct_IO method This will allow i_op->atomic_open to do just the file initialization and leave the direct_IO checking to the VFS. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace enhancements from Eric Biederman: "This is a course correction for the user namespace, so that we can reach an inexpensive, maintainable, and reasonably complete implementation. Highlights: - Config guards make it impossible to enable the user namespace and code that has not been converted to be user namespace safe. - Use of the new kuid_t type ensures the if you somehow get past the config guards the kernel will encounter type errors if you enable user namespaces and attempt to compile in code whose permission checks have not been updated to be user namespace safe. - All uids from child user namespaces are mapped into the initial user namespace before they are processed. Removing the need to add an additional check to see if the user namespace of the compared uids remains the same. - With the user namespaces compiled out the performance is as good or better than it is today. - For most operations absolutely nothing changes performance or operationally with the user namespace enabled. - The worst case performance I could come up with was timing 1 billion cache cold stat operations with the user namespace code enabled. This went from 156s to 164s on my laptop (or 156ns to 164ns per stat operation). - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value. Most uid/gid setting system calls treat these value specially anyway so attempting to use -1 as a uid would likely cause entertaining failures in userspace. - If setuid is called with a uid that can not be mapped setuid fails. I have looked at sendmail, login, ssh and every other program I could think of that would call setuid and they all check for and handle the case where setuid fails. - If stat or a similar system call is called from a context in which we can not map a uid we lie and return overflowuid. The LFS experience suggests not lying and returning an error code might be better, but the historical precedent with uids is different and I can not think of anything that would break by lying about a uid we can't map. - Capabilities are localized to the current user namespace making it safe to give the initial user in a user namespace all capabilities. My git tree covers all of the modifications needed to convert the core kernel and enough changes to make a system bootable to runlevel 1." Fix up trivial conflicts due to nearby independent changes in fs/stat.c * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits) userns: Silence silly gcc warning. cred: use correct cred accessor with regards to rcu read lock userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq userns: Convert cgroup permission checks to use uid_eq userns: Convert tmpfs to use kuid and kgid where appropriate userns: Convert sysfs to use kgid/kuid where appropriate userns: Convert sysctl permission checks to use kuid and kgids. userns: Convert proc to use kuid/kgid where appropriate userns: Convert ext4 to user kuid/kgid where appropriate userns: Convert ext3 to use kuid/kgid where appropriate userns: Convert ext2 to use kuid/kgid where appropriate. userns: Convert devpts to use kuid/kgid where appropriate userns: Convert binary formats to use kuid/kgid where appropriate userns: Add negative depends on entries to avoid building code that is userns unsafe userns: signal remove unnecessary map_cred_ns userns: Teach inode_capable to understand inodes whose uids map to other namespaces. userns: Fail exec for suid and sgid binaries with ids outside our user namespace. userns: Convert stat to return values mapped from kuids and kgids userns: Convert user specfied uids and gids in chown into kuids and kgid userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs ...
2012-05-03userns: Convert user specfied uids and gids in chown into kuids and kgidEric W. Biederman
Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03userns: Convert capabilities related permsion checksEric W. Biederman
- Use uid_eq when comparing kuids Use gid_eq when comparing kgids - Use make_kuid(user_ns, 0) to talk about the user_namespace root uid Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>