Re: [JANITOR PROPOSAL] Switch ioctl functions to ->unlocked_ioctl

Previous thread: [2.6.24-rc6-mm1]Build failure in drivers/net/ehea/ehea_main.c by Sudhir Kumar on Tuesday, January 8, 2008 - 9:33 am. (15 messages)

Next thread: [PATCH][KJ] input: remove duplicate includes by Andre Haupt on Tuesday, January 8, 2008 - 9:47 am. (2 messages)
From: Andi Kleen
Date: Tuesday, January 8, 2008 - 9:40 am

Here's a proposal for some useful code transformations the kernel janitors
could do as opposed to running checkpatch.pl.

Most ioctl handlers still running implicitely under the big kernel 
lock (BKL). But long term Linux is trying to get away from that. There is a new
->unlocked_ioctl entry point that allows ioctls without BKL, but the code
needs to be explicitely converted to use this.

The first step of getting rid of the BKL is typically to make it visible
in the source. Once it is visible people will have incentive to eliminate it.
That is how the BKL conversion project for Linux long ago started too.
On 2.0 all system calls were still implicitely BKL and in 2.1 the
lock/unlock_kernel()s were moved into the various syscall functions and then
step by step eliminated.

And now it's time to do the same for all the ioctls too.

So my proposal is to convert the ->ioctl handlers all over the tree 
to ->unlocked_ioctl with explicit lock_kernel()/unlock_kernel.

It is not a completely trivial conversion. You will have to 
at least read the source, although not completely understand it.
But it is not very difficult.

Rough recipe:

Grep the source tree for "struct file_operations". You should
fine something like

static int xyz_ioctl(struct inode *i, struct file *f, unsigned cmd, unsigned long arg) 
{
	switch (cmd) { 
	case IOCTL1:
		err = ...;
		...	
		break;
	case IOCTL2:
		...
		err = ...;
		break;
	default:
		err = -ENOTTY;
	} 
	return err;
}
...

struct file_operations xyz_ops = {
	...
	.ioctl = xyz_ioctl
};

The conversion is now to change the prototype of xyz_ioctl to 

static long xyz_ioctl(struct file *f, unsigned cmd, unsigned long arg)
{
}

This means return type from int to long and drop the struct inode * argument

Then add lock_kernel() to the entry point and unlock_kernel() to all exit points.
So you should get

static long xyz_ioctl(struct file *f, unsigned cmd, unsigned long arg)
{
	lock_kernel();
	...
	unlock_kernel();
	return ...
From: Cyrill Gorcunov
Date: Tuesday, January 8, 2008 - 10:05 am

[Andi Kleen - Tue, Jan 08, 2008 at 05:40:15PM +0100]
| 
| Here's a proposal for some useful code transformations the kernel janitors
| could do as opposed to running checkpatch.pl.
| 
[...snip...]
| -Andi
| 
| 

got it, thanks

		- Cyrill -
--

From: Alexey Dobriyan
Date: Tuesday, January 8, 2008 - 11:52 am

Thanks, Andi! I think it'd very useful change.
--

From: Andi Kleen
Date: Tuesday, January 8, 2008 - 12:18 pm

> Thanks, Andi! I think it'd very useful change.

Reminds me this is something that should be actually flagged
in checkpatch.pl too

Andy, it would be good if checkpatch.pl complained about .ioctl = 
as opposed to .unlocked_ioctl = ...

Also perhaps if a whole new file_operations with a ioctl is added
complain about missing compat_ioctl as a low prioritity warning?
(might be ok if it's architecture specific on architectures without
compat layer) 

-Andi
--

From: Arnd Bergmann
Date: Tuesday, January 8, 2008 - 5:40 pm

This is rather hard, as there are different data structures that
all contain ->ioctl and/or ->unlocked_ioctl function pointers.
Some of them already use ->ioctl in an unlocked fashion only,

Also, not every data structure that provides a ->ioctl callback
also has a ->compat_ioctl, although there should be fewer exceptions
here.

	Arnd <><
--

From: Andi Kleen
Date: Tuesday, January 8, 2008 - 5:47 pm

I imagined it would check for 

+struct file_operations ... = { 
+      ...
+	.ioctl = ...

That wouldn't catch the case of someone adding only .ioctl to an 
already existing file_operations which is not visible in the patch context, 
but that should be hopefully rare. The more common case is adding

That's probably a bug in general. e.g. those likely won't work 
at all on the "compat by default" architectures like sparc or ppc64.

-Andi

--

From: Arnd Bergmann
Date: Tuesday, January 8, 2008 - 6:19 pm

Right, this would work fine. We can probably even have a list of
data structures that work like file_operations in this regard.

	Arnd <><
--

From: Kevin Winchester
Date: Tuesday, January 8, 2008 - 6:31 pm

file_operations & block_device_operations are the only two that I can find.

-- 
Kevin Winchester
--

From: Junio C Hamano
Date: Wednesday, January 9, 2008 - 3:00 am

Because "diff -p" format used by the kernel mailing list takes
the most recent line that begins with an identifier letter and
put that on the hunk header line, even in such a case, you will
see:

	@@ -l,k +m,n @@ struct file_operations ... = {
		...
		...
	+	...
	+	.ioctl = ...
	+	...
		...

which would be a good enough cue that the new .ioctl member is
added to file_operations.
--

From: Matthew Wilcox
Date: Tuesday, January 8, 2008 - 1:00 pm

You should probably restrict yourself to files that you can at least
compile ...

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Paolo Ciarrocchi
Date: Tuesday, January 8, 2008 - 1:03 pm

Yes of course, I've been silly in didn't verify whether the file compile
but I would appreciate to know whether I'm on the right track or not.

Switching to a different file now...

Thanks.

Ciao,
-- 
Paolo
http://paolo.ciarrocchi.googlepages.com/
--

From: Matthew Wilcox
Date: Tuesday, January 8, 2008 - 1:16 pm

Well ... you're not.

You've deleted the definition of rtc_fops for no reason I can tell.

You put an unlock_kernel() on every place that invokes break;.  You
should instead have put an unlock_kernel() before every return;.

You needed to put a lock_kernel() at the beginning of rtc_ioctl().

You needed to adjust the prototype of rtc_ioctl().

Honestly, if your C skills are as weak as they seem to be, you may be
better off trying to gain experience with some other project -- the
patch you sent appeared to be simple cargo-cult programming, and we
don't need that kind of contribution.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Matthew Wilcox
Date: Tuesday, January 8, 2008 - 1:21 pm

Here's what a correct conversion might look like.  I haven't tried to
compile it, so I'm copying and pasting it in order to damage whitespace
and make sure nobody tries to compile it.

index bf1075e..0c543a8 100644
--- a/arch/arm/common/rtctime.c
+++ b/arch/arm/common/rtctime.c
@@ -174,8 +174,7 @@ static unsigned int rtc_poll(struct file *file, poll_table *
        return data != 0 ? POLLIN | POLLRDNORM : 0;
 }
 
-static int rtc_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
-                    unsigned long arg)
+static long rtc_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
        struct rtc_ops *ops = file->private_data;
        struct rtc_time tm;
@@ -183,6 +182,8 @@ static int rtc_ioctl(struct inode *inode, struct file *file,
        void __user *uarg = (void __user *)arg;
        int ret = -EINVAL;
 
+       lock_kernel();
+
        switch (cmd) {
        case RTC_ALM_READ:
                ret = rtc_arm_read_alarm(ops, &alrm);
@@ -277,6 +278,9 @@ static int rtc_ioctl(struct inode *inode, struct file *file,
                        ret = ops->ioctl(cmd, arg);
                break;
        }
+
+       unlock_kernel();
+
        return ret;
 }
 
@@ -334,7 +338,7 @@ static const struct file_operations rtc_fops = {
        .llseek         = no_llseek,
        .read           = rtc_read,
        .poll           = rtc_poll,
-       .ioctl          = rtc_ioctl,
+       .unlocked_ioctl = rtc_ioctl,
        .open           = rtc_open,
        .release        = rtc_release,
        .fasync         = rtc_fasync,


-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Paolo Ciarrocchi
Date: Tuesday, January 8, 2008 - 1:26 pm

Thank you Matthew,
I definitely need to back studying before submitting another patch.

Ciao,
-- 
Paolo
http://paolo.ciarrocchi.googlepages.com/
--

From: Dmitri Vorobiev
Date: Tuesday, January 8, 2008 - 4:55 pm

It seems that the kernel janitors project needs to explicitly describe the
prerequisites a person should meet before tackling the enterprise's toughest
technical challenges such as operating system kernel development.

On the face of it, it seems quite ridiculous to me that a team of extremely
qualified kernel engineers spend their valuable time teaching the basics of
the C language using this mailing list. Time is precious, and even more so
when we are having that many unresolved problems, e.g. when the Kernel Bug
Tracker lists more than 1300 open bugs:

http://bugzilla.kernel.org/buglist.cgi?order=relevance&bug_status=__open__

Paolo, I have nothing against you personally as you seem to have adequately
reacted to what is going on and went on strengthening your C skills; my point
is that an operating system kernel development facility such as LKML is most
probably not the right place to set up a correspondence course in programming

--


Nice start, Paolo, really!

I'd recommend to start supervising all this effort, if you wish to
participate. This doesn't require C knowledge at all, not every sw
manager shall have lang:c [+]. But you will save other's valuable time
and effort spent. Also you will eliminate rude and sucking comments like
<20080111223651.GA5942@kroah.com>.

If you know how to make simple "unlocked_ioctl progress" web page for that
on wiki, and to look after it, please do so on kernelnewbies.org site.

For now i see on the list following:

Mathieu Segaud, Sergio Luis:
  'PCI: change procfs'

Andre Noll:
  'drivers/scsi/sg.c to unlocked_ioctl'

Kevin Winchester:
  'drivers/char/drm to use .unlocked_ioctl'

Nikanth Karthikesan:
  'drm_ioctl as an unlocked_ioctl'
  'x86 Machine check handler to use unlocked_iocl'
  'paride driver to use unlocked_ioctl'
  'Random number driver: make random_ioctl as an unlocked_ioctl'


Also you can make a "Tools" section there, where experience and tools
may be presented. This knowledge may improve future source tree
processing efforts, and save one's time and bandwidth (searching
archives, wrong patches, etc.) as well.

I refer to the _text_ processing tools and techniques, i'm going to
present below (skip it guys, if you don't like it).

== Tools ==

As discussions show, knowledge of C doesn't mean solving this task at
all. Even worse, one may re-factor/change functions in the way they like,
but maintainers don't. Compile-time "testing" is a distraction from the
cause. Note, however, that my vision of the root also may have some flaws.


Documentation and tutorials. While i'm willing to help in any technical
things, i know, please look at this first:

http://unix.org.ua/orelly/unix/

"UNIX Power Tools" is a must (without perl part).
Keywords: kernel, shell, BRE, sed.


So, expressions. Let's see on the error you've made. This is not a C
error at first, this is an expression error:


what you've did is introducing more special symbols, to ...
From: Rik van Riel
Date: Tuesday, January 8, 2008 - 1:22 pm

On Tue, 8 Jan 2008 20:58:04 +0100

This is a struct with function pointers, not a function.

No wonder it doesn't compile after you tried turning it into one :)

-- 
All rights reversed.
--

From: Andi Kleen
Date: Tuesday, January 8, 2008 - 1:42 pm

> I grepped and tried to do what you suggested.

First a quick question: how would you rate your C knowledge? Did you
ever write a program yourself? 


It's probably better to only do that on files which you can easily compile.

In this case it would be better to just put the unlock_kernel() directly
before the single return. You only need to sprinkle them all over when
the function has multiple returns. Or then change the flow to only

No the lock_kernel() of course has to be in the function, not in the structure
definition which does not contain any code.

Also the _operations structure stays the same (except for .ioctl -> .unlocked_ioctl); 

I would suggest to only work on files that compile. e.g. do a

make allyesconfig
make -j$[$(grep -c processor /proc/cpuinfo)*2] &1 |tee LOG             (will probably take a long time) 

first and then only modify files when are mentioned in "LOG" 

-Andi
--

From: Paolo Ciarrocchi
Date: Tuesday, January 8, 2008 - 1:45 pm

Yes I did but I probably beeing inactive for too much time,



Thanks you.

Ciao,
-- 
Paolo
http://paolo.ciarrocchi.googlepages.com/
--

From: Paolo Ciarrocchi
Date: Tuesday, January 8, 2008 - 4:43 pm

I did grep for "struct file_operations" in mm:

paolo@paolo-desktop:~/linux-2.6/mm$ grep "struct file_operations" *
shmem.c:static const struct file_operations shmem_file_operations;
shmem.c:static const struct file_operations shmem_file_operations = {
swapfile.c:static const struct file_operations proc_swaps_operations = {

Am I right in saying that both the files don't need to be modified?

There is nothing like:
struct file_operations xyz_ops = {
       ...
       .ioctl = xyz_ioctl
};

in there.

So I guess I need a smarter trick to find out which files need to be modified
as you previously suggested.

Ciao,
-- 
Paolo
http://paolo.ciarrocchi.googlepages.com/
--

From: Andi Kleen
Date: Tuesday, January 8, 2008 - 5:03 pm

grep -P '\.ioctl.*=' $(grep -rl 'struct file_operations' * )

should work. There are also special multiline greps iirc that might also be able
to do this better (like sgrep)

-Andi
--

From: Matt Mackall
Date: Wednesday, January 9, 2008 - 1:12 pm

You'll need to change the prototype, the unlocked version doesn't take
an inode. And you'll need to make sure that nothing in the function uses

This is not a return statement. You only need to unlock before the

Heh.

-- 
Mathematics is the supreme nostalgia of our time.

--

From: Alasdair G Kergon
Date: Wednesday, January 9, 2008 - 3:40 pm

This old chestnut again.  Perhaps we could have inode passed to unlocked_ioctl?
I never understood why it wasn't there in the first place if the plan was for
.unlocked_ioctl to supercede .ioctl whenever possible.

Device-mapper for example in dm_blk_ioctl(), has no need for BKL so drops it
immediately, but it does need the inode parameter, so it is unable to switch as
things stand.

Alasdair
-- 
agk@redhat.com
--

From: Andi Kleen
Date: Wednesday, January 9, 2008 - 3:46 pm

If you still need inode use

struct inode *inode = file->f_dentry->d_inode;

-Andi

--

From: Alasdair G Kergon
Date: Wednesday, January 9, 2008 - 3:45 pm

And oops if that's not defined?

Alasdair
-- 
agk@redhat.com
--

From: Chris Friesen
Date: Wednesday, January 9, 2008 - 3:58 pm

Isn't this basically identical to what was being passed in to .ioctl()?

Chris
--

From: Alasdair G Kergon
Date: Wednesday, January 9, 2008 - 4:05 pm

Not in every case, unless someone's been through and created fake structures in
all the remaining places that pass in a NULL 'file' because there isn't one
available.

Alasdair
-- 
agk@redhat.com
--

From: Vadim Lobanov
Date: Wednesday, January 9, 2008 - 4:31 pm

From 2.6.23's fs/ioctl.c - do_ioctl():

        if (filp->f_op->unlocked_ioctl) {
                error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
                if (error == -ENOIOCTLCMD)
                        error = -EINVAL;
                goto out;
        } else if (filp->f_op->ioctl) {
                lock_kernel();
                error = filp->f_op->ioctl(filp->f_path.dentry->d_inode,
                                          filp, cmd, arg);
                unlock_kernel();
        }

As a sidenote, please don't use f_dentry and f_vfsmnt, since they are just 
#defines for the correct fields. They were meant to be temporary transition 
helpers, but (alas) have refused to die thus far. If noone beats me to it, 
I'll take a look-see at deprecating them all the way.

-- Vadim Lobanov
--

From: Alasdair G Kergon
Date: Wednesday, January 9, 2008 - 5:00 pm

Ah - you're talking about struct file_operations of course;
I was talking about struct block_device_operations.

Alasdair
-- 
agk@redhat.com
--

From: Vadim Lobanov
Date: Wednesday, January 9, 2008 - 9:59 pm

Good point. :-)

-- Vadim Lobanov
--

From: Christoph Hellwig
Date: Thursday, January 10, 2008 - 1:34 am

For file_operations which we talk about here it always is defined.
Block_device is a different story, but it'll get a completely different
prototype soon with neither file nor inode passed to it.
--

From: Daniel Phillips
Date: Thursday, January 10, 2008 - 2:49 am

[Empty message]
From: Daniel Phillips
Date: Thursday, January 10, 2008 - 3:55 pm

Nice.  This removes a deadlock we hit, where if creating a device mapper 
target blocks indefinitely (say on network IO) then nobody else can 
complete a device mapper operation because BKL is held.  If completing 
the device create depends on some other device mapper operation, then 
it is game over.

Our current workaround is to test for and drop BKL, ugh.  Thanks for the 
cleanup.

Regards,

Daniel
--

From: Pavel Machek
Date: Friday, January 11, 2008 - 1:33 am

Just put unlock kernel near return ret;...

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Kevin Winchester
Date: Tuesday, January 8, 2008 - 4:50 pm

<snip>

I notice that every driver in drivers/ata uses a .ioctl that points to
ata_scsi_ioctl().  I could add the BKL to that function, and then change
all of the drivers to .unlocked_ioctl, but I assume this would be a
candidate to actually clean up by determining why the lock is needed and
removing it if necessary.  Does anyone know off-hand the reason for
needing the lock (I assume someone does or it wouldn't have survived
this long)?  If the lock is absolutely required, then I can write the
patch to add lock_kernel() and unlock_kernel().

-- 
Kevin Winchester

--

From: Kevin Winchester
Date: Tuesday, January 8, 2008 - 5:17 pm

Sorry about the noise here - I now notice that not all .ioctl function
pointers have the option of changing to .unlocked_ioctl.  In this case,
the ioctl is in the struct scsi_host_template, rather than struct
file_operations.

I'll try to be a little more careful about the git grepping in the future.

-- 
Kevin Winchester
--

From: Andi Kleen
Date: Tuesday, January 8, 2008 - 5:27 pm

Well it just points to another area that needs to be improved. Clearly
scsi_host_template should have a unlocked_ioctl too.

-Andi
--

From: Andre Noll
Date: Wednesday, January 9, 2008 - 3:34 am

On 17:40, Andi Kleen wrote:

Here's my take on drivers/scsi/sg.c. It's only compile-tested on x86-64.

Please review.
Andre
---

Convert sg.c to the new unlocked_ioctl entry point.

Signed-off-by: Andre Noll <maan@systemlinux.org>

---
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index f1871ea..3063307 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -48,6 +48,7 @@ static int sg_version_num =3D 30534;	/* 2 digits for each=
 component */
 #include <linux/blkdev.h>
 #include <linux/delay.h>
 #include <linux/scatterlist.h>
+#include <linux/smp_lock.h>
=20
 #include "scsi.h"
 #include <scsi/scsi_dbg.h>
@@ -764,20 +765,22 @@ sg_srp_done(Sg_request *srp, Sg_fd *sfp)
 	return done;
 }
=20
-static int
-sg_ioctl(struct inode *inode, struct file *filp,
-	 unsigned int cmd_in, unsigned long arg)
+static long
+sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
 {
 	void __user *p =3D (void __user *)arg;
 	int __user *ip =3D p;
-	int result, val, read_only;
+	int val, read_only;
 	Sg_device *sdp;
 	Sg_fd *sfp;
 	Sg_request *srp;
 	unsigned long iflags;
+	long result;
=20
+	lock_kernel();
+	result =3D -ENXIO;
 	if ((!(sfp =3D (Sg_fd *) filp->private_data)) || (!(sdp =3D sfp->parentdp=
)))
-		return -ENXIO;
+		goto out;
 	SCSI_LOG_TIMEOUT(3, printk("sg_ioctl: %s, cmd=3D0x%x\n",
 				   sdp->disk->disk_name, (int) cmd_in));
 	read_only =3D (O_RDWR !=3D (filp->f_flags & O_ACCMODE));
@@ -787,57 +790,66 @@ sg_ioctl(struct inode *inode, struct file *filp,
 		{
 			int blocking =3D 1;	/* ignore O_NONBLOCK flag */
=20
+			result =3D -ENODEV;
 			if (sdp->detached)
-				return -ENODEV;
+				goto out;
+			result =3D -ENXIO;
 			if (!scsi_block_when_processing_errors(sdp->device))
-				return -ENXIO;
+				goto out;
+			result =3D -EFAULT;
 			if (!access_ok(VERIFY_WRITE, p, SZ_SG_IO_HDR))
-				return -EFAULT;
-			result =3D
-			    sg_new_write(sfp, p, SZ_SG_IO_HDR,
-					 blocking, read_only, &srp);
+				goto out;
+			result =3D ...
From: Richard Knutsson
Date: Wednesday, January 9, 2008 - 6:17 am

... and x86 with all(yes|mod)config. :)

Would had preferred:

if (x) {
    result = -Exxxx;
    goto out;
}

then:

result = -Exxxx;
if (x)
    goto out;

but it looks correct.

cu,
Richard

--

From: Andre Noll
Date: Wednesday, January 9, 2008 - 6:33 am

AFAIK, the second form is preferred in Linux because it is better
readable and it generates slightly better code.

Thanks for the review.
Andre
--=20
The only person who always got his work done by Friday was Robinson Crusoe
From: Rolf Eike Beer
Date: Thursday, January 10, 2008 - 1:52 am

Can you explain the rationale behind that running on the BKL? What type of=
=20
things needs to be protected so that this huge hammer is needed? What would=
=20
be an earlier point to release the BKL?

Greetings,

Eike
From: Andi Kleen
Date: Thursday, January 10, 2008 - 2:25 am

> Can you explain the rationale behind that running on the BKL? What type of 

It used to always run with the BKL because everything used to 
and originally nobody wanted to review all ioctl handlers in tree to see if 

That depends on the driver. A lot don't need BKL at all and 
in others it can be easily eliminated. But it needs case-by-case
review of the locking situation.

The goal of the proposal here is just to make it more visible.

-Andi
--

From: Rolf Eike Beer
Date: Thursday, January 10, 2008 - 3:02 am

So if I write my own driver and have never heard of ioctls running on BKL=20
before I can rather be sure that I just can change the interface of the ioc=
tl=20
function, put it in unlocked_ioctl and are fine? Cool.

Eike
From: Andi Kleen
Date: Thursday, January 10, 2008 - 3:06 am

If you know the BKL is not needed in your code you should use unlocked_ioctl
correct.

-Andi
--

Previous thread: [2.6.24-rc6-mm1]Build failure in drivers/net/ehea/ehea_main.c by Sudhir Kumar on Tuesday, January 8, 2008 - 9:33 am. (15 messages)