|
The Future of Computing
It's all about file systems
Wednesday, February 22, 2006
Fifty years from now, when you save a document it will no
longer be a single file. You will not even know exactly where that document has
been saved. Automatically, the system will create a read-only copy of it and
save it to yet another location. You will be able to group your documents,
e-mail and other content into categories and say who has what kind of access to
them. You will also be able to instruct data to either self-destruct under certain
conditions or remain alive somewhere for all time. Your PC will be a workstation
connected to a vast network of both peers and servers all over the world and it
will seamlessly exchange information over well-defined and architected P2P
protocols. Whenever you need your document, you will be able to see all the
different versions of it and open one of them as you need.
Sounds like fantasy? Think again. File systems (FS) that exist
today already do most if not all of this, although largely as discrete
islands. Where we are headed is the scenario above: interconnected,
seamless systems that provide a global computing experience. Now for the first
important question: how will all this be accomplished? It will happen because
of the radical changes under way in how our computers store and handle data.
How that change is coming about is what our story is about. Let's dive in.
What is a file system?
Let's start with an analogy. In your office, you put away paper files into
filing cabinets and maintain separate records of where each file is. A file
system in a computer is the same. If the hard disk is the equivalent of your
filing cabinet, then the FS is your index of what files lie within the filing
cabinet.
FS differ in how much information they maintain and how they maintain
it. Some limit themselves to basic descriptions of what files are
stored. Others add capabilities like access control, encryption and compression,
and let you create links to refer to files/folders. The reason for this
differentiation is the platform or need that a file system serves. And the way
an FS works largely depends on what OS it runs under.
As far as FS go, there isn't a perfect solution for all needs yet and the
evolution continues. Every major OS release is accompanied by a minor or major
improvement or overhaul of the FS it depends on. NTFS, for instance, has been
revised five times so far, and each revision has added a feature or
enhancement. In comparison, the “ext” FS for Linux has seen three technical
evolutions.
Tied to an OS?
The earliest operating systems singled out one FS (more if the FS
was backward-compatible with another one) that they could use. However, that
trend faded as one OS evolved out of another and needed to run the older one's
applications. This brought in the era where you could use multiple FS, sometimes
quite different from one another, under one OS. The way we compute and deal with
data also changed. We're now in an era where we save documents to a server
somewhere on the Web that does not even belong to us. This brings in
new challenges. The average-Joe user does not care where his files are or
what's happening behind the scenes. All he wants is to be able to store
his files and get them back when he needs them. This is why concepts like
distributed FS came in.
New requirements
The original requirements of needing to keep track of what files are stored
on a disk still hold for today's FS. But there are some new ones too. Let's
take them one at a time below.
Scalability:
A 630 MB hard disk can no longer fulfill your needs. Today, it is common
to find even a 120 or 200 GB hard disk filled up in a couple of months. If this
is a common global pattern, then instead of each disk maintaining a separate,
discrete FS, a common external FS would be more efficient. This FS would be
scalable, with no fixed limit on how many disks can be added. Already, this is
an urgent need. Specialized storage systems like SAN and DAS already use
such FS to virtualize and manage many hot-swappable disks, any of which may be
removed and replaced with a blank one at any time.
Virtualization: The need is for a virtualizing FS that can present
(at least to the user and his applications) a standard set of features and
capabilities, regardless of what the system can do behind the scenes. Storage
systems take care of this by running their own OS and exposing a virtual FS
over the network.
Longevity and Destructibility: Both are now required by laws like the
Sarbanes-Oxley Act (Sarb-Ox). You need data to be available for long periods of
time and destroyed permanently at the end of that time. Traditional FS may let
you do the first easily. But deleted data can often be retrieved from a hard
disk, and there is plenty of recovery software that does so. The best permanent
destruction solution today is to overwrite the data several times with
complicated bit patterns. But is there something better? If the previous FS
records can still be accessed, then it's the logic of the FS that needs to
improve, not the means to hide content.
Content independence: Information in the enterprise can be in many
different forms and formats. The usual solution to retrieve and make use of them
is to use an enterprise application. But if the FS itself can help do this, that
would be ideal.
Performance: The performance of the FS is a function of the ease and
speed with which the OS can locate a file and access its content. Some FS, like
FAT, are optimized for floppies and small hard disks. Others, like Red Hat's
GFS and Sun's ZFS, are meant for high-capacity storage. So, we have different
file systems for different sizes of storage, and performance depends on which
one you select.
How are these factors driving changes in FS? Let's
take up different areas of FS usage and examine them in detail.
 |
| Despite their closed nature, FAT and NTFS together have the biggest base of third party tools for management (partitioning, recovery, etc) |
HPFS386 (HPFS for servers) was originally designed to be what NTFS is today, with reduced fragmentation, mixed code-page, and more |
The file system supporting the largest single-volume size today is UFS2 (UNIX File System), at 1 yottabyte (10^24 bytes) |
Keep the users happy
| ZFS will reach its files limit in 9,000
years if you create 1,000 files a second |
One of the challenges for the administrator at the desktop
and server level is to strike a balance between security and recoverability. In
this segment, when we talk of “server”, we refer to the basic file server,
which may be performing additional tasks like authentication. The enterprise
user needs to be given the right amount of security for his files. At the same
time, when data is lost for some reason, it should be easily recoverable. A
third key element in deciding the right desktop file system is efficient usage
of limited capacity. Unlike servers, desktops have limited storage (say 40
GB). This space has to be utilized as efficiently as possible.
 |
|
Traditional file
systems that we know and use today are 64-bit at best. This lets them
store a few terabytes of data independently before you need to start
thinking about clustering many of them to address more content. A 128-bit
system can in principle address 2^128 (roughly 3.4 x 10^38) bytes, far
more than all of human knowledge up to the end of the previous millennium.
This is the first notable point about Sun's ZFS file system. The
theoretical limit for a single file in ZFS is 16 exabytes. Beyond this, it
has other features that let it be used in mission-critical environments.
Loss prevention
Data is susceptible to corruption. File systems use various mechanisms
to prevent or minimize this loss. One reason for data loss is that blocks
get overwritten on the disk because file sizes keep changing. In ZFS,
this is prevented by writing new data to new locations on the disk and
only then deleting the old information. This way, file expansion is less
likely to overwrite adjacent blocks belonging to other files. This
principle is similar to the one used by the WAFL file system (see later in
this story for a discussion on this). To verify the integrity of
files, most modern file systems use checksums. Ordinarily, these are
32-bit checksums. But ZFS uses 64-bit checksums, letting it protect data a
little more aggressively. ZFS also minimizes the performance problems that
journaling file systems face due to excessive writes, by grouping write
operations into 'transaction blocks' and then treating each group as
one.
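The write-to-a-new-location idea described above can be sketched in a few lines of Python. This is only an illustration of the principle, not ZFS's actual on-disk format: the class name `CowStore` and its methods are made up for this example, and `zlib.crc32` (a 32-bit checksum) merely stands in for whatever checksum a real file system would use.

```python
import zlib


class CowStore:
    """Toy copy-on-write block store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = {}     # block address -> (checksum, data)
        self.file_map = {}   # file name -> address of its current block
        self.next_addr = 0

    def write(self, name, data):
        # Always write to a fresh address; the live block is never overwritten.
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = (zlib.crc32(data), data)
        old = self.file_map.get(name)
        self.file_map[name] = addr   # switch the pointer to the new block
        if old is not None:
            del self.blocks[old]     # only now release the old copy
        return addr

    def read(self, name):
        checksum, data = self.blocks[self.file_map[name]]
        # Verify integrity on every read.
        if zlib.crc32(data) != checksum:
            raise IOError("checksum mismatch: block is corrupt")
        return data
```

Because the old block is released only after the pointer has moved, a failure in the middle of `write` leaves the previous version of the file intact.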
Like NTFS, ZFS also takes snapshots in a content-sensitive way. Normally,
snapshots are copies of entire files, which end up occupying a lot of
space as data grows. Sun's implementation will snapshot only that part
of the data that has changed, letting the file system simply use pointers
instead of copies of unchanged information. This way, the disk is also
utilized more efficiently.
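To see why pointer-based snapshots are cheap, consider this minimal Python sketch. It is an assumption-laden toy, not Sun's implementation: `SnapshotVolume` and its methods are hypothetical names, and whole files stand in for disk blocks.

```python
class SnapshotVolume:
    """Toy volume whose snapshots share unchanged data (hypothetical)."""

    def __init__(self):
        self.live = {}        # file name -> immutable bytes object
        self.snapshots = {}   # snapshot label -> {file name -> bytes object}

    def write(self, name, data):
        # A write replaces the pointer in the live table only.
        self.live[name] = bytes(data)

    def snapshot(self, label):
        # Copy the table of pointers, not the data the pointers refer to.
        self.snapshots[label] = dict(self.live)

    def read(self, name, label=None):
        table = self.live if label is None else self.snapshots[label]
        return table[name]
```

After a snapshot, files that have not changed are referenced by both the live table and the snapshot table; only a file that is rewritten gets new storage.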
Solaris 10, the parent OS for ZFS, also bundles virtualization
technologies. ZFS makes use of this by adding storage virtualization at a
very low level. It also removes the necessity for separate volume
management for each storage device. This also makes ZFS highly scalable,
since you can add more capacity without needing to make changes anywhere
else. This ease of management is also enhanced by the new ZFS paradigm of
creating policies instead of actions. Here, you can instruct the system on
a policy to apply (like quotas) instead of actually doing a step-by-step
implementation.
ZFS natively supports mirroring data to other disks in the storage pool.
This means that, like EMC's CAS storage system, ZFS can correct corrupted
data blocks by using checksums to detect them and then retrieving the data
from locations that hold a correct copy. Since it can do this
auto-magically, there is no need to perform a
'fsck' on a ZFS volume. In spite of all this functionality, the file
system appears to the user as a normal POSIX system. And if you're not
satisfied with the way ZFS works, the file system implementation is
available under CDDL with source code. It is a part of OpenSolaris as
well.
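The checksum-plus-mirror repair described above might look roughly like the following Python sketch. The function names and the use of SHA-256 are assumptions made for this example; each "disk" is simply a Python list of blocks, and the expected checksum is assumed to be stored separately from the data it protects.

```python
import hashlib


def checksum(data):
    # SHA-256 is a stand-in here; the choice of checksum is an assumption.
    return hashlib.sha256(data).hexdigest()


def read_with_repair(mirrors, index, expected):
    """Return a valid copy of block `index`, repairing bad mirrors in place."""
    good = None
    for disk in mirrors:
        if checksum(disk[index]) == expected:
            good = disk[index]
            break
    if good is None:
        raise IOError("no mirror holds a valid copy of this block")
    for disk in mirrors:
        if checksum(disk[index]) != expected:
            disk[index] = good   # overwrite the corrupt copy with the good one
    return good
```

For instance, with `disk_a = [b"hello"]` and a corrupted `disk_b = [b"hellp"]`, calling `read_with_repair([disk_b, disk_a], 0, checksum(b"hello"))` returns the good data and leaves both lists holding `b"hello"`.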
Shortcomings
It's not as if ZFS is free of shortcomings. Two major ones at the moment
(although unconfirmed) are these. One, you cannot mount ZFS volumes under
any other OS should you need to. But we can probably expect at least
third-party solutions for this in the near future. Two, you
cannot convert between the existing UNIX File System (UFS) and ZFS formats,
meaning you need to install from scratch if you want ZFS on the system. |
Now in a typical enterprise layout, users would be storing
their files mostly in a central file server where the above aspects have been
taken care of. Then the only files that would need recovery at the desktop would
be downloaded e-mail, instant message logs and any files that have not yet been
stored on the file server. If your organization uses centralized messaging
storage (as is likely if you're using something like Lotus Domino/Notes or MS
Exchange), this aspect is transferred to the messaging server. Also, because of
standardization of software in the enterprise, the desktops would all be on the
same OS (some flavor of Linux or Windows) and hence on ext3 or NTFS.
But all this is set to change. Remember we said that capacity
is growing. What this also means is that the amount of data being stored is
increasing. A challenge therefore is to not only store TBs of data safely, but
also find it quickly when your user wants it. Traditional FS like FAT, NTFS or
ext3 are not geared for such activity. FAT is meant for low-volume storage and
has no security features. Ext3 and NTFS can address a lot of storage and provide
strong security but they are not inherently search-friendly. Some of the FS that
take care of these problems are Reiser4, WinFS and ZFS. Strangely, a look at
the specialties of each of the three reveals no discernible pattern.
Limitless expansion:
While there are no known limits to the capacity WinFS or Reiser4 can address,
Reiser4 limits each file to a maximum of 8 TB. ZFS has a limit of 16 Exabytes.
For the uninitiated in the world of numbers, the entire human knowledge in any
known format at the end of the last millennium was only 12 Exabytes. ZFS will be
able to address data for a long time.
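A quick back-of-the-envelope check of these numbers, in Python. Two assumptions are made here that the article does not state: an exabyte is taken as 2^60 bytes, and the "9,000 years" figure quoted in the sidebar is assumed to refer to a limit of 2^48 files.

```python
EXABYTE = 2 ** 60                  # binary exabyte (assumption)

zfs_limit_bytes = 2 ** 64          # 16 exabytes expressed in bytes
zfs_limit_eb = zfs_limit_bytes / EXABYTE

# How the limit compares with the 12 exabytes of human knowledge cited above.
ratio_to_human_knowledge = zfs_limit_eb / 12

# Time needed to create 2**48 files at 1,000 files per second.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
FILE_LIMIT = 2 ** 48               # assumed limit behind the sidebar figure
years_to_fill = FILE_LIMIT / 1000 / SECONDS_PER_YEAR

print(zfs_limit_eb, round(ratio_to_human_knowledge, 2), round(years_to_fill))
```

Under these assumptions the file-count limit works out to a little under 9,000 years, which is in line with the sidebar's claim.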
Journaling: Logging changes before they are made is called
'Journaling'. If there is a power failure or some error condition before
all required operations have been done, then the changes can be rolled back. A
journaling file system (JFS) can spare you the frequent CHKDSK/FSCK screens at
boot-up. A performance limitation is that there are two write operations per
actual write. So, if you simply saved this file in a JFS, the engine would write
down that it saved the file and then also save the file itself.
The Reiser4 FS features “wandering logs” where B* Trees not populated to a
certain degree are not saved to disk unless the underlying transaction is
complete. Reiser4 is supposed to be much faster than ext3. NTFS also has
journaling, using B+ Trees. ZFS uses out-of-place copies.
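The journaling idea described above, including the two writes per save and the rollback after a failure, can be sketched as follows. This is a simplified illustration only: `JournaledStore` and the `crash_midway` flag are invented for the example and do not mirror any particular file system.

```python
class JournaledStore:
    """Toy store that journals each change before making it (hypothetical)."""

    def __init__(self):
        self.files = {}      # file name -> content
        self.journal = []    # entries describing in-flight changes

    def save(self, name, data, crash_midway=False):
        # Write 1: log the intent, plus the old content, before touching the file.
        entry = {"name": name, "old": self.files.get(name), "done": False}
        self.journal.append(entry)
        if crash_midway:
            # Simulate a power failure that leaves a torn, partial write.
            self.files[name] = data[: len(data) // 2]
            return
        # Write 2: the actual file content.
        self.files[name] = data
        entry["done"] = True

    def recover(self):
        # Roll back every change that was logged but never completed.
        for entry in reversed(self.journal):
            if not entry["done"]:
                if entry["old"] is None:
                    self.files.pop(entry["name"], None)
                else:
                    self.files[entry["name"]] = entry["old"]
        self.journal.clear()
```

Running `recover()` at "boot" replaces a full consistency scan: only the journal needs to be examined, not the entire disk.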
Finding data: WinFS is known for its search-optimization. As we said
in our earlier article (“WinFS Beta 1”, Oct 2005), it would let you run
almost SQL-like queries on your file system to locate what you want. In a
retrogressive move, WinFS is slated to flatten the FS, meaning no more folders.
Instead, it would categorize data by adding attributes to your files and search
based on those values.
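The attribute-based lookup described above can be illustrated with a small Python sketch. This is not the WinFS API: the `find` function, the attribute names and the sample items are all made up to show what searching by attribute values instead of folder paths could look like.

```python
def find(items, **criteria):
    """Return names of items whose attributes match every given criterion."""
    return [
        item["name"]
        for item in items
        if all(item.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical items: no folders, only attributes attached to each one.
items = [
    {"name": "budget.xls", "type": "spreadsheet", "author": "asha", "year": 2005},
    {"name": "minutes.doc", "type": "document", "author": "ravi", "year": 2006},
    {"name": "plan.doc", "type": "document", "author": "asha", "year": 2006},
]

print(find(items, type="document", year=2006))   # every 2006 document
print(find(items, author="asha"))                # everything by one author
```

The same item can turn up under any combination of attributes, which is something a single folder hierarchy cannot offer.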
So, we're going to see the ability to address more
capacity, locate our data faster and store it reliably.
Time Line

|