Here's a proposal for some useful code transformations the kernel janitors
could do as opposed to running checkpatch.pl.
Most ioctl handlers still running implicitely under the big kernel
lock (BKL). But long term Linux is trying to get away from that. There is a new
->unlocked_ioctl entry point that allows ioctls without BKL, but the code
needs to be explicitely converted to use this.
The first step of getting rid of the BKL is typically to make it visible
in the source. Once it is visible people will have incentive to eliminate it.
That is how the BKL conversion project for Linux long ago started too.
On 2.0 all system calls were still implicitely BKL and in 2.1 the
lock/unlock_kernel()s were moved into the various syscall functions and then
step by step eliminated.
And now it's time to do the same for all the ioctls too.
So my proposal is to convert the ->ioctl handlers all over the tree
to ->unlocked_ioctl with explicit lock_kernel()/unlock_kernel.
It is not a completely trivial conversion. You will have to
at least read the source, although not completely understand it.
But it is not very difficult.
Rough recipe:
Grep the source tree for "struct file_operations". You should
fine something like
static int xyz_ioctl(struct inode *i, struct file *f, unsigned cmd, unsigned long arg)
{
switch (cmd) {
case IOCTL1:
err = ...;
...
break;
case IOCTL2:
...
err = ...;
break;
default:
err = -ENOTTY;
}
return err;
}
...
struct file_operations xyz_ops = {
...
.ioctl = xyz_ioctl
};
The conversion is now to change the prototype of xyz_ioctl to
static long xyz_ioctl(struct file *f, unsigned cmd, unsigned long arg)
{
}
This means return type from int to long and drop the struct inode * argument
Then add lock_kernel() to the entry point and unlock_kernel() to all exit points.
So you should get
static long xyz_ioctl(struct file *f, unsigned cmd, unsigned long arg)
{
lock_kernel();
...
unlock_kernel();
return ...[Andi Kleen - Tue, Jan 08, 2008 at 05:40:15PM +0100] | | Here's a proposal for some useful code transformations the kernel janitors | could do as opposed to running checkpatch.pl. | [...snip...] | -Andi | | got it, thanks - Cyrill - --
Thanks, Andi! I think it'd very useful change. --
> Thanks, Andi! I think it'd very useful change. Reminds me this is something that should be actually flagged in checkpatch.pl too Andy, it would be good if checkpatch.pl complained about .ioctl = as opposed to .unlocked_ioctl = ... Also perhaps if a whole new file_operations with a ioctl is added complain about missing compat_ioctl as a low prioritity warning? (might be ok if it's architecture specific on architectures without compat layer) -Andi --
This is rather hard, as there are different data structures that all contain ->ioctl and/or ->unlocked_ioctl function pointers. Some of them already use ->ioctl in an unlocked fashion only, Also, not every data structure that provides a ->ioctl callback also has a ->compat_ioctl, although there should be fewer exceptions here. Arnd <>< --
I imagined it would check for
+struct file_operations ... = {
+ ...
+ .ioctl = ...
That wouldn't catch the case of someone adding only .ioctl to an
already existing file_operations which is not visible in the patch context,
but that should be hopefully rare. The more common case is adding
That's probably a bug in general. e.g. those likely won't work
at all on the "compat by default" architectures like sparc or ppc64.
-Andi
--
Right, this would work fine. We can probably even have a list of data structures that work like file_operations in this regard. Arnd <>< --
file_operations & block_device_operations are the only two that I can find. -- Kevin Winchester --
Because "diff -p" format used by the kernel mailing list takes
the most recent line that begins with an identifier letter and
put that on the hunk header line, even in such a case, you will
see:
@@ -l,k +m,n @@ struct file_operations ... = {
...
...
+ ...
+ .ioctl = ...
+ ...
...
which would be a good enough cue that the new .ioctl member is
added to file_operations.
--
You should probably restrict yourself to files that you can at least compile ... -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
Yes of course, I've been silly in didn't verify whether the file compile but I would appreciate to know whether I'm on the right track or not. Switching to a different file now... Thanks. Ciao, -- Paolo http://paolo.ciarrocchi.googlepages.com/ --
Well ... you're not. You've deleted the definition of rtc_fops for no reason I can tell. You put an unlock_kernel() on every place that invokes break;. You should instead have put an unlock_kernel() before every return;. You needed to put a lock_kernel() at the beginning of rtc_ioctl(). You needed to adjust the prototype of rtc_ioctl(). Honestly, if your C skills are as weak as they seem to be, you may be better off trying to gain experience with some other project -- the patch you sent appeared to be simple cargo-cult programming, and we don't need that kind of contribution. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
Here's what a correct conversion might look like. I haven't tried to
compile it, so I'm copying and pasting it in order to damage whitespace
and make sure nobody tries to compile it.
index bf1075e..0c543a8 100644
--- a/arch/arm/common/rtctime.c
+++ b/arch/arm/common/rtctime.c
@@ -174,8 +174,7 @@ static unsigned int rtc_poll(struct file *file, poll_table *
return data != 0 ? POLLIN | POLLRDNORM : 0;
}
-static int rtc_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
- unsigned long arg)
+static long rtc_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
struct rtc_ops *ops = file->private_data;
struct rtc_time tm;
@@ -183,6 +182,8 @@ static int rtc_ioctl(struct inode *inode, struct file *file,
void __user *uarg = (void __user *)arg;
int ret = -EINVAL;
+ lock_kernel();
+
switch (cmd) {
case RTC_ALM_READ:
ret = rtc_arm_read_alarm(ops, &alrm);
@@ -277,6 +278,9 @@ static int rtc_ioctl(struct inode *inode, struct file *file,
ret = ops->ioctl(cmd, arg);
break;
}
+
+ unlock_kernel();
+
return ret;
}
@@ -334,7 +338,7 @@ static const struct file_operations rtc_fops = {
.llseek = no_llseek,
.read = rtc_read,
.poll = rtc_poll,
- .ioctl = rtc_ioctl,
+ .unlocked_ioctl = rtc_ioctl,
.open = rtc_open,
.release = rtc_release,
.fasync = rtc_fasync,
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
--
Thank you Matthew, I definitely need to back studying before submitting another patch. Ciao, -- Paolo http://paolo.ciarrocchi.googlepages.com/ --
It seems that the kernel janitors project needs to explicitly describe the prerequisites a person should meet before tackling the enterprise's toughest technical challenges such as operating system kernel development. On the face of it, it seems quite ridiculous to me that a team of extremely qualified kernel engineers spend their valuable time teaching the basics of the C language using this mailing list. Time is precious, and even more so when we are having that many unresolved problems, e.g. when the Kernel Bug Tracker lists more than 1300 open bugs: http://bugzilla.kernel.org/buglist.cgi?order=relevance&bug_status=__open__ Paolo, I have nothing against you personally as you seem to have adequately reacted to what is going on and went on strengthening your C skills; my point is that an operating system kernel development facility such as LKML is most probably not the right place to set up a correspondence course in programming --
Nice start, Paolo, really! I'd recommend to start supervising all this effort, if you wish to participate. This doesn't require C knowledge at all, not every sw manager shall have lang:c [+]. But you will save other's valuable time and effort spent. Also you will eliminate rude and sucking comments like <20080111223651.GA5942@kroah.com>. If you know how to make simple "unlocked_ioctl progress" web page for that on wiki, and to look after it, please do so on kernelnewbies.org site. For now i see on the list following: Mathieu Segaud, Sergio Luis: 'PCI: change procfs' Andre Noll: 'drivers/scsi/sg.c to unlocked_ioctl' Kevin Winchester: 'drivers/char/drm to use .unlocked_ioctl' Nikanth Karthikesan: 'drm_ioctl as an unlocked_ioctl' 'x86 Machine check handler to use unlocked_iocl' 'paride driver to use unlocked_ioctl' 'Random number driver: make random_ioctl as an unlocked_ioctl' Also you can make a "Tools" section there, where experience and tools may be presented. This knowledge may improve future source tree processing efforts, and save one's time and bandwidth (searching archives, wrong patches, etc.) as well. I refer to the _text_ processing tools and techniques, i'm going to present below (skip it guys, if you don't like it). == Tools == As discussions show, knowledge of C doesn't mean solving this task at all. Even worse, one may re-factor/change functions in the way they like, but maintainers don't. Compile-time "testing" is a distraction from the cause. Note, however, that my vision of the root also may have some flaws. Documentation and tutorials. While i'm willing to help in any technical things, i know, please look at this first: http://unix.org.ua/orelly/unix/ "UNIX Power Tools" is a must (without perl part). Keywords: kernel, shell, BRE, sed. So, expressions. Let's see on the error you've made. This is not a C error at first, this is an expression error: what you've did is introducing more special symbols, to ...
On Tue, 8 Jan 2008 20:58:04 +0100 This is a struct with function pointers, not a function. No wonder it doesn't compile after you tried turning it into one :) -- All rights reversed. --
> I grepped and tried to do what you suggested. First a quick question: how would you rate your C knowledge? Did you ever write a program yourself? It's probably better to only do that on files which you can easily compile. In this case it would be better to just put the unlock_kernel() directly before the single return. You only need to sprinkle them all over when the function has multiple returns. Or then change the flow to only No the lock_kernel() of course has to be in the function, not in the structure definition which does not contain any code. Also the _operations structure stays the same (except for .ioctl -> .unlocked_ioctl); I would suggest to only work on files that compile. e.g. do a make allyesconfig make -j$[$(grep -c processor /proc/cpuinfo)*2] &1 |tee LOG (will probably take a long time) first and then only modify files when are mentioned in "LOG" -Andi --
Yes I did but I probably beeing inactive for too much time, Thanks you. Ciao, -- Paolo http://paolo.ciarrocchi.googlepages.com/ --
I did grep for "struct file_operations" in mm:
paolo@paolo-desktop:~/linux-2.6/mm$ grep "struct file_operations" *
shmem.c:static const struct file_operations shmem_file_operations;
shmem.c:static const struct file_operations shmem_file_operations = {
swapfile.c:static const struct file_operations proc_swaps_operations = {
Am I right in saying that both the files don't need to be modified?
There is nothing like:
struct file_operations xyz_ops = {
...
.ioctl = xyz_ioctl
};
in there.
So I guess I need a smarter trick to find out which files need to be modified
as you previously suggested.
Ciao,
--
Paolo
http://paolo.ciarrocchi.googlepages.com/
--
grep -P '\.ioctl.*=' $(grep -rl 'struct file_operations' * ) should work. There are also special multiline greps iirc that might also be able to do this better (like sgrep) -Andi --
You'll need to change the prototype, the unlocked version doesn't take an inode. And you'll need to make sure that nothing in the function uses This is not a return statement. You only need to unlock before the Heh. -- Mathematics is the supreme nostalgia of our time. --
This old chestnut again. Perhaps we could have inode passed to unlocked_ioctl? I never understood why it wasn't there in the first place if the plan was for .unlocked_ioctl to supercede .ioctl whenever possible. Device-mapper for example in dm_blk_ioctl(), has no need for BKL so drops it immediately, but it does need the inode parameter, so it is unable to switch as things stand. Alasdair -- agk@redhat.com --
If you still need inode use struct inode *inode = file->f_dentry->d_inode; -Andi --
And oops if that's not defined? Alasdair -- agk@redhat.com --
Isn't this basically identical to what was being passed in to .ioctl()? Chris --
Not in every case, unless someone's been through and created fake structures in all the remaining places that pass in a NULL 'file' because there isn't one available. Alasdair -- agk@redhat.com --
From 2.6.23's fs/ioctl.c - do_ioctl():
if (filp->f_op->unlocked_ioctl) {
error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
if (error == -ENOIOCTLCMD)
error = -EINVAL;
goto out;
} else if (filp->f_op->ioctl) {
lock_kernel();
error = filp->f_op->ioctl(filp->f_path.dentry->d_inode,
filp, cmd, arg);
unlock_kernel();
}
As a sidenote, please don't use f_dentry and f_vfsmnt, since they are just
#defines for the correct fields. They were meant to be temporary transition
helpers, but (alas) have refused to die thus far. If noone beats me to it,
I'll take a look-see at deprecating them all the way.
-- Vadim Lobanov
--
Ah - you're talking about struct file_operations of course; I was talking about struct block_device_operations. Alasdair -- agk@redhat.com --
For file_operations which we talk about here it always is defined. Block_device is a different story, but it'll get a completely different prototype soon with neither file nor inode passed to it. --
Nothing - patches to do this are queued for 2.6.25: http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-ioctl-remove-... http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-ioctl-move-co... Alasdair -- agk@redhat.com --
Nice. This removes a deadlock we hit, where if creating a device mapper target blocks indefinitely (say on network IO) then nobody else can complete a device mapper operation because BKL is held. If completing the device create depends on some other device mapper operation, then it is game over. Our current workaround is to test for and drop BKL, ugh. Thanks for the cleanup. Regards, Daniel --
Just put unlock kernel near return ret;... -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
<snip> I notice that every driver in drivers/ata uses a .ioctl that points to ata_scsi_ioctl(). I could add the BKL to that function, and then change all of the drivers to .unlocked_ioctl, but I assume this would be a candidate to actually clean up by determining why the lock is needed and removing it if necessary. Does anyone know off-hand the reason for needing the lock (I assume someone does or it wouldn't have survived this long)? If the lock is absolutely required, then I can write the patch to add lock_kernel() and unlock_kernel(). -- Kevin Winchester --
Sorry about the noise here - I now notice that not all .ioctl function pointers have the option of changing to .unlocked_ioctl. In this case, the ioctl is in the struct scsi_host_template, rather than struct file_operations. I'll try to be a little more careful about the git grepping in the future. -- Kevin Winchester --
Well it just points to another area that needs to be improved. Clearly scsi_host_template should have a unlocked_ioctl too. -Andi --
On 17:40, Andi Kleen wrote:
Here's my take on drivers/scsi/sg.c. It's only compile-tested on x86-64.
Please review.
Andre
---
Convert sg.c to the new unlocked_ioctl entry point.
Signed-off-by: Andre Noll <maan@systemlinux.org>
---
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index f1871ea..3063307 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -48,6 +48,7 @@ static int sg_version_num =3D 30534; /* 2 digits for each=
component */
#include <linux/blkdev.h>
#include <linux/delay.h>
#include <linux/scatterlist.h>
+#include <linux/smp_lock.h>
=20
#include "scsi.h"
#include <scsi/scsi_dbg.h>
@@ -764,20 +765,22 @@ sg_srp_done(Sg_request *srp, Sg_fd *sfp)
return done;
}
=20
-static int
-sg_ioctl(struct inode *inode, struct file *filp,
- unsigned int cmd_in, unsigned long arg)
+static long
+sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
{
void __user *p =3D (void __user *)arg;
int __user *ip =3D p;
- int result, val, read_only;
+ int val, read_only;
Sg_device *sdp;
Sg_fd *sfp;
Sg_request *srp;
unsigned long iflags;
+ long result;
=20
+ lock_kernel();
+ result =3D -ENXIO;
if ((!(sfp =3D (Sg_fd *) filp->private_data)) || (!(sdp =3D sfp->parentdp=
)))
- return -ENXIO;
+ goto out;
SCSI_LOG_TIMEOUT(3, printk("sg_ioctl: %s, cmd=3D0x%x\n",
sdp->disk->disk_name, (int) cmd_in));
read_only =3D (O_RDWR !=3D (filp->f_flags & O_ACCMODE));
@@ -787,57 +790,66 @@ sg_ioctl(struct inode *inode, struct file *filp,
{
int blocking =3D 1; /* ignore O_NONBLOCK flag */
=20
+ result =3D -ENODEV;
if (sdp->detached)
- return -ENODEV;
+ goto out;
+ result =3D -ENXIO;
if (!scsi_block_when_processing_errors(sdp->device))
- return -ENXIO;
+ goto out;
+ result =3D -EFAULT;
if (!access_ok(VERIFY_WRITE, p, SZ_SG_IO_HDR))
- return -EFAULT;
- result =3D
- sg_new_write(sfp, p, SZ_SG_IO_HDR,
- blocking, read_only, &srp);
+ goto out;
+ result =3D ...... and x86 with all(yes|mod)config. :)
Would had preferred:
if (x) {
result = -Exxxx;
goto out;
}
then:
result = -Exxxx;
if (x)
goto out;
but it looks correct.
cu,
Richard
--
AFAIK, the second form is preferred in Linux because it is better readable and it generates slightly better code. Thanks for the review. Andre --=20 The only person who always got his work done by Friday was Robinson Crusoe
Can you explain the rationale behind that running on the BKL? What type of= =20 things needs to be protected so that this huge hammer is needed? What would= =20 be an earlier point to release the BKL? Greetings, Eike
> Can you explain the rationale behind that running on the BKL? What type of It used to always run with the BKL because everything used to and originally nobody wanted to review all ioctl handlers in tree to see if That depends on the driver. A lot don't need BKL at all and in others it can be easily eliminated. But it needs case-by-case review of the locking situation. The goal of the proposal here is just to make it more visible. -Andi --
So if I write my own driver and have never heard of ioctls running on BKL=20 before I can rather be sure that I just can change the interface of the ioc= tl=20 function, put it in unlocked_ioctl and are fine? Cool. Eike
If you know the BKL is not needed in your code you should use unlocked_ioctl correct. -Andi --
