Btrfs 0.12, Performance Improvements

Submitted by Jeremy on February 6, 2008 - 4:05pm.

"I wasn't planning on releasing v0.12 yet, and it was supposed to have some initial support for multiple devices. But, I have made a number of performance fixes and small bug fixes, and I wanted to get them out there before the (destabilizing) work on multiple-devices took over," explained Chris Mason regarding the 0.12 release of his new btrfs filesystem. Btrfs was first announced in June of 2007, as an alpha-quality filesystem offering checksumming of all files and metadata, extent-based file storage, efficient packing of small files, dynamic inode allocation, writable snapshots, object-level mirroring and striping, and fast offline filesystem checks, among other features. The project's website explains, "Linux has a wealth of filesystems to choose from, but we are facing a number of challenges with scaling to the large storage subsystems that are becoming common in today's data centers. Filesystems need to scale in their ability to address and manage large storage, and also in their ability to detect, repair and tolerate errors in the data stored on disk." Regarding the latest release, Chris offered:

"So, here's v0.12. It comes with a shiny new disk format (sorry), but the gain is dramatically better random writes to existing files. In testing here, the random write phase of tiobench went from 1MB/s to 30MB/s. The fix was to change the way back references for file extents were hashed."


From: Chris Mason <chris.mason@...>
Subject: [ANNOUNCE] Btrfs v0.12 released
Date: Feb 6, 1:00 pm 2008

Hello everyone,

I wasn't planning on releasing v0.12 yet, and it was supposed to have some 
initial support for multiple devices.  But, I have made a number of 
performance fixes and small bug fixes, and I wanted to get them out there 
before the (destabilizing) work on multiple-devices took over.

So, here's v0.12.  It comes with a shiny new disk format (sorry), but the gain 
is dramatically better random writes to existing files.  In testing here, the 
random write phase of tiobench went from 1MB/s to 30MB/s.  The fix was to 
change the way back references for file extents were hashed.

Other changes:

Insert and delete multiple items at once in the btree where possible.  Back 
references added more tree balances, and it showed up in a few benchmarks.  
With v0.12, backrefs have no real impact on performance.

Optimize bio end_io routines.  Btrfs was spending way too much CPU time in the 
bio end_io routines, leading to lock contention and other problems.

Optimize read ahead during transaction commit.  The old code was trying to 
read far too much at once, which made the end_io problems really stand out.

mount -o ssd option, which clusters file data writes together regardless of 
the directory the files belong to.  There are a number of other performance 
tweaks for SSD, aimed at clustering metadata and data writes to better take 
advantage of the hardware.

mount -o max_inline=size option, to override the default max inline file data 
size (default is 8k).  Any value up to the leaf size is allowed (default 
16k).

Simple -ENOSPC handling.  Emphasis on simple, but it prevents accidentally 
filling the disk most of the time.  With enough threads/procs banging on 
things, you can still easily crash the box.

-chris


just stabilize it already

February 7, 2008 - 7:14am
Anonymous (not verified)

Just having an extent-based file system with fast fscks, tail packing and dynamic inodes is more than I could wish for. I'd love to have this subset of btrfs stable enough for everyday use. I am a bit worried that Chris Mason is too ambitious when he wants to add multiple devices with striping etc. to it. What additional benefits would it give compared to running it over the device mapper?

I agree completely

February 7, 2008 - 2:03pm
Anonymous (not verified)

Release 1.0 before any crazy pseudo-raid/lvm support. That could be 2.0.
After all, most people interested in lvm/raid already have it set up, and they don't mind the current implementation. Is there something wrong with dm? Then let's fix dm.

Many of us desperately need btrfs so much, prio nr1 is 1.0. Just to get rid of ext.
Please Chris, reconsider this.

Disagree, Chris is right

February 7, 2008 - 4:21pm
Robert Devi (not verified)

I'm eagerly awaiting Btrfs too but file systems are not applications or even kernel modules. If an application or kernel module has a problem or limitation, 9999 times out of 10000 you can silently replace it without affecting things too much. However, if a file system has a problem, you're stuck with it. Worse, if there is potential for data corruption, you can't trust any of your data. That's why ext4 is a relatively tame upgrade of ext3. Ext3 is a good workhorse that's good for most users, but even the authors know there are some inherent limitations to what can be done without serious breakage. And since no-one wants serious breakage, the problems have to be just accepted.

If Btrfs were stabilized today and released to production within 6 months, the fundamental problems in Btrfs (which haven't been found yet) would be mostly set in stone and people would start complaining about Btrfs (especially if the competition had the features).

The important thing is to have a clear and defined set of requirements so that feature creep doesn't keep Btrfs in the HURD stage for the next 20 years. The specs of ZFS show what is currently realistically possible, so they might be a good goal. If Btrfs reaches the point where "it's obvious how to implement all of ZFS's features given enough time without breaking the disk format", then it's ready for beta and then final release. But not before.

Is there something wrong

February 7, 2008 - 5:58pm
Anonymous (not verified)


Is there something wrong with dm? Then let's fix dm.

Many of us desperately need btrfs so much, prio nr1 is 1.0. Just to get rid of ext.


Is there something wrong with ext? Then let's fix ext.

;-)

faster and easier integrity checking, rebuild, self-repair

February 7, 2008 - 2:27pm
Miguel Sousa Filipe (not verified)

Hi,

Well, if it's the filesystem that handles device replication/mirroring, it can make a rebuild much faster, because it doesn't need to resilver/resync the whole span of the device volume, only the data and metadata (that can make a huge difference).
(You can observe this by comparing an md RAID 5 with a ZFS raidz: fail one disk, replace it with a new one, and watch the speed of the reconstruction.)
Any kind of reconstruction, be it extending/growing with new devices, replacing devices, etc., can be faster with this kind of tight coupling between filesystem and volume management.
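
To make the difference concrete, here is a minimal toy sketch in Python, assuming a device is just a list of blocks and that the filesystem can name the set of block numbers it has allocated. The function names and the in-memory "devices" are hypothetical illustrations, not how md, ZFS or Btrfs are implemented.

```python
def rebuild_block_level(source, target):
    """Block-layer resync: copy every block, whether it is in use or not."""
    copied = 0
    for i, block in enumerate(source):
        target[i] = block
        copied += 1
    return copied


def rebuild_fs_aware(source, target, allocated):
    """Filesystem-aware rebuild: copy only the blocks known to be allocated.

    `allocated` is a set of block numbers holding data or metadata
    (a hypothetical stand-in for the filesystem's allocation information).
    """
    copied = 0
    for i in sorted(allocated):
        target[i] = source[i]
        copied += 1
    return copied


if __name__ == "__main__":
    total = 1000
    source = [bytes([i % 256]) * 4 for i in range(total)]
    allocated = {0, 1, 2, 10, 500}  # a mostly empty filesystem

    full = rebuild_block_level(source, [b""] * total)
    aware = rebuild_fs_aware(source, [b""] * total, allocated)
    print(f"block-level resync copied {full} blocks")
    print(f"filesystem-aware rebuild copied {aware} blocks")
```

The emptier the filesystem, the larger the gap between the two counts; on a completely full volume both approaches copy the same number of blocks.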

Another thing that becomes easier is detecting silent data corruption on one drive by using checksums: go to the next device holding that data, retrieve it, check it, and, if it is good, replace the corrupted data on the first drive with the good data. (Something that marketing guys would call "self-healing".)

Notice that without having control of the mirrored volumes, the FS would have more difficulty in discovering that one device had silently corrupted data and the other did not. (When the md raid returned the block, which device did it come from? Which device silently corrupted that block?)

Basically, having "inside" information about disk layout and state provides a faster/easier way to implement this kind of "health-checking", "self-healing" ..etc..
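
The checksum-and-repair idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each mirror is modelled as a list of blocks, the expected checksum is assumed to be stored somewhere trustworthy by the filesystem, and `zlib.crc32` is used as a stand-in checksum. The names `checksum` and `read_with_repair` are hypothetical and do not come from any real filesystem.

```python
import zlib


def checksum(data: bytes) -> int:
    """Stand-in checksum (CRC-32 from zlib), chosen only for illustration."""
    return zlib.crc32(data)


def read_with_repair(mirrors, block_no, expected_sum):
    """Read one block from a set of mirrors, repairing bad copies.

    Every mirror's copy is checked against the expected checksum. The first
    copy that verifies is returned, and any copies that failed verification
    are overwritten with it. Returns (data, list_of_repaired_mirror_indexes).
    """
    good = None
    bad = []
    for idx, device in enumerate(mirrors):
        data = device[block_no]
        if checksum(data) == expected_sum:
            if good is None:
                good = data
        else:
            bad.append(idx)  # we know exactly which device returned bad data

    if good is None:
        raise IOError(f"block {block_no}: no mirror holds a valid copy")

    for idx in bad:
        mirrors[idx][block_no] = good  # rewrite the corrupted copy
    return good, bad


if __name__ == "__main__":
    blocks = [b"aaaa", b"bbbb"]
    sums = [checksum(b) for b in blocks]
    mirror0, mirror1 = list(blocks), list(blocks)
    mirror0[1] = b"bXbb"  # simulate silent corruption on the first drive

    data, repaired = read_with_repair([mirror0, mirror1], 1, sums[1])
    print(data, repaired, mirror0[1])
```

The point of the sketch is that the code choosing which copy to read is the same code that holds the checksum, so it can tell which device was wrong. A filesystem sitting on top of an opaque mirror only sees one answer per read.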

It would still be possible to have these features with the traditional approach, but it's harder and more labour intensive (you have to add new logic and new information channels between the virtual block devices and the filesystem layer).

That's what I can think of.

Miguel Sousa Filipe

The focus on Btrfs

February 7, 2008 - 4:40pm
Chris Mason (not verified)

The focus on Btrfs development has been getting the code production ready as quickly as possible. I've made a lot of tradeoffs that favor development speed over perfection...

I really appreciate that people are anxious to start using Btrfs, but a big part of the Btrfs story is being robust in the face of metadata corruption.

A key component of that is metadata mirroring, even in single disk configurations. For multiple spindles, MD and DM based mirroring don't make it easy to read an alternate copy of the block from the mirror set, and they make it very difficult for the FS to understand the underlying storage topology.

The Btrfs chunking design aims to solve that, and I hope to push it down a layer so that other filesystems can take advantage of it. It will also be a key component in taking advantage of the SSD combo drives that are coming out, saving power and improving performance.

Most importantly, adding these features after the disk format is frozen would greatly complicate life, and I think lower the quality of the FS as a whole. A few months spent hammering it out will give us a much better long term code base.

thanks

February 8, 2008 - 5:41am
Anonymous (not verified)

Thank you for explaining your thoughts around this. We mere mortals are anxiously awaiting the time when we can sink our teeth into this... Just please keep the discussion with the other powers that be alive so we don't risk issues such as the ones that plagued Reiser from the start. What I mean is that if you reinvent too much stuff then integrating this important piece of code will be hard, for both technical and social reasons. A lot of people actually want to use this. (Yesterday, if possible :) )

A few months spent hammering

February 8, 2008 - 1:44pm

A few months spent hammering it out [...]

Pun intended ? ;-)

Btrfs

February 7, 2008 - 4:23pm
Anonymous (not verified)

I agree with the first two comments. We need an extendable filesystem now. Even managing multiple OS's on desktop/laptop systems with large HD's is a problem without an _easily extendable_ filesystem.

How does this compare to

February 7, 2008 - 4:50pm
Anonymous (not verified)

How does this compare to Reiser4?

Re: How does this compare to

February 8, 2008 - 4:40am
Anonymous (not verified)

Btrfs still has an active developer.

Reiser 4 also has an active,

February 8, 2008 - 12:18pm
Anonymous (not verified)

Reiser 4 also has an active, unpaid developer.
Please check your facts before trolling.
http://chichkin_i.zelnet.ru/namesys/

compression, anyone?

February 11, 2008 - 7:36am
mangoo (not verified)

Will Linux ever have a filesystem with transparent compression? Currently, only JFFS2 has it, but it's for flash-media only.

Already, kind of

October 25, 2008 - 8:42am
Anonymous (not verified)

Linux already has the following compressed filesystems. Unfortunately each has problems that prevent general purpose use:
Reiserfs 4: Requires invasive patch, not integrated in the Linux Kernel.
Squashfs: Read-only.
compFUSEd: Still experimental, and is a loopback system, so don't expect big performance gains from the reduction in IO.
ZFS on FUSE: Not especially optimised, cannot be integrated into kernel due to licensing concerns.

Unfortunately, JFFS2 is not just limited to flash media; I don't think it even supports flash with an IDE interface, such as you find on an Eee PC.
