
The Future of Computing

It's all about file systems

Wednesday, February 22, 2006


Fifty years from now, when you save a document, it will no longer be a single file. You will not even know exactly where that document has been saved. The system will automatically create a read-only copy of it and save that copy to yet another location. You will be able to group your documents, e-mail and other content into categories and specify who has what kind of access to them. You will also be able to instruct data either to self-destruct under certain conditions or to remain alive somewhere for all time. Your PC will be a workstation connected to a vast network of peers and servers all over the world, and it will seamlessly exchange information over well-defined P2P protocols. Whenever you need your document, you will be able to see all its versions and open whichever one you need.

Sounds like fantasy? Think again. File systems (FS) that exist today already do most, if not all, of this, although largely as discrete islands. Where we seem to be headed is a time when such systems are seamlessly interconnected and provide a global computing experience. Now for the first important question: how will all this be accomplished? It will happen because of radical changes in the way our computers store and handle data. How that is about to happen is what our story is about. Let's dive in.

What is a file system?
Let's start with an analogy. In your office, you put away paper files into filing cabinets and maintain separate records of where each file is. A file system in a computer does the same. If the hard disk is the equivalent of your filing cabinet, then the FS is your index of which files lie within it.

FS differ in how much information they maintain. Some limit themselves to basic descriptions of what files are stored. Others add capabilities like access control, encryption and compression, and let you create links that refer to files and folders. These differences arose from the platform or need each file system was built to serve, and the way an FS works has largely depended on the OS it ran under. As far as FS go, there isn't a perfect solution for all needs yet, and the evolution continues. Every major OS release is accompanied by a minor or major improvement or overhaul of the FS it depends on. NTFS, for instance, has been revised five times so far, and each revision has added a feature or enhancement. In comparison, the “ext” FS for Linux has seen three technical evolutions.

Tied to an OS?
The earliest operating systems singled out one FS that they could use (more than one only if an FS was backward-compatible with another). That trend faded as one OS evolved out of another and needed to run the older one's applications. This brought in the era where you could use multiple FS, sometimes quite different from one another, under a single OS. The way we compute and deal with data also changed. We're now in an era where we save documents to a server somewhere on the Web that does not even belong to us. This brings new challenges. The average user does not care where his files are or what's happening behind the scenes. All he wants is to be able to store his files and get them back when he needs them. This is why concepts like distributed FS came in.

New requirements
The original requirements of needing to keep track of what files are stored on a disk still hold for today's FS. But there are some new ones too. Let's take them one at a time below.

Scalability:
A 630 MB hard disk can no longer fulfill your needs. Today, it is common to find even a 120 or 200 GB hard disk filled up in a couple of months. If this pattern is common worldwide, a shared external FS would be more efficient than each disk maintaining its own discrete FS. Such an FS would scale without a fixed ceiling: more disks could be added whenever capacity runs short. This is already an urgent need. Specialized storage systems such as SAN and DAS already use such FS to virtualize and manage large numbers of hot-swappable disks, any of which may be removed and replaced with a blank one at any time.
Virtualization:
The need is for a virtualizing FS that can present (at least to the user and his applications) a standard set of features and capabilities, regardless of what the system can do behind the scenes. Storage systems take care of this by running their own OS and exposing a virtual FS over the network.
Longevity and Destructibility:
Both are now required by laws like the Sarbanes-Oxley Act (Sarb-Ox). You need data to be available for long periods of time and then destroyed permanently at the end of that time. Traditional FS may let you do the first easily. But when it comes to the second, plenty of recovery software exists that can retrieve deleted data from a hard disk. The best permanent-destruction solution today is to overwrite the data several times with complicated hash values. But is there something better? If the previous FS records can still be accessed, then it's the logic of the FS that needs to improve, not the means of hiding content.
Content independence:
Information in the enterprise can be in many different forms and formats. The usual way to retrieve and make use of it is through an enterprise application. But if the FS itself can help do this, that would be ideal.
Performance:
The performance of the FS is a function of the ease and speed with which the OS can locate a file and access its content. Some FS, like FAT, are optimized for floppies and small hard disks. Others, like Red Hat's GFS and Sun's ZFS, are meant for high-capacity storage. So, we have different file systems for different sizes of storage, and performance depends on which one you select.
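
The multi-pass overwrite mentioned under Longevity and Destructibility can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `overwrite_and_delete` and the default of three passes are hypothetical, and random bytes stand in for the "complicated hash values" the text refers to. On journaling, copy-on-write or flash-backed storage, old blocks may survive elsewhere, which is the FS-logic problem that item raises.

```python
import os


def overwrite_and_delete(path: str, passes: int = 3) -> None:
    """Overwrite a file in place with random bytes, then remove it.

    Hypothetical helper for illustration; it cannot guarantee erasure
    on file systems that relocate blocks when data is rewritten.
    """
    size = os.path.getsize(path)
    chunk = 1024 * 1024  # write in 1 MiB pieces to bound memory use
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push this pass to the device
    os.remove(path)
```

Each pass is flushed and synced before the next begins, so the passes are not merged in the OS cache before reaching the disk.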

How are these factors driving changes in FS? Let's take up different areas of FS usage and examine them in detail.

Despite their closed nature, FAT and NTFS together have the biggest base of third-party tools for management (partitioning, recovery, etc).

HPFS386 (HPFS for servers) was originally designed to be what NTFS is today, with reduced fragmentation, mixed code-page, and more.

The file system supporting the maximum single-volume size today is UFS2 (UNIX File System), at 1 YottaByte (10^24 bytes).

Keep the users happy

ZFS will reach its file-count limit in 9,000 years if you create 1,000 files a second

One of the challenges for the administrator at the desktop and server level is to strike a balance between security and recoverability. In this segment, when we talk of “server”, we refer to the basic file server, which may be performing additional tasks like authentication. The enterprise user needs to be given the right amount of security for his files. At the same time, when data is lost for some reason, it should be easily recoverable. A third key element in deciding the right desktop file system is efficient usage of limited capacity. Unlike servers, desktops have limited storage (say 40 GB). This space has to be utilized as efficiently as possible.

 

Traditional file systems that we know and use today are 64-bit at best. This lets them store a few terabytes of data independently before you need to start thinking about clustering many of them to address more content. A 128-bit system can store about 10^21 bytes of data, which is a thousand times more than all of human knowledge up to the end of the previous millennium. This is the first notable point about Sun's ZFS file system. The theoretical capacity of ZFS is 16 Exabytes per system. Beyond this, it has other features that suit it to mission-critical environments.
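
The headline numbers can be checked with a little arithmetic. The sketch below shows that 16 Exabytes (binary) is what a 64-bit byte offset can address, and estimates how long it takes to hit a file-count limit at 1,000 files a second. The limit of 2**48 files is an assumption made for this example; the article itself does not state the figure.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def bytes_to_exbibytes(n: int) -> float:
    """Convert a byte count to binary exabytes (2**60 bytes each)."""
    return n / 2**60


# The most bytes a 64-bit offset can address.
limit_64bit = 2**64

# Assumed file-count limit (hypothetical for this sketch).
assumed_file_limit = 2**48
files_per_second = 1000
years_to_fill = assumed_file_limit / files_per_second / SECONDS_PER_YEAR

print(bytes_to_exbibytes(limit_64bit))  # 16.0
print(round(years_to_fill))             # a little over 8,900 years
```

Under that assumption the result lands close to the "9,000 years" quoted above.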

Loss prevention
    Data is susceptible to corruption. File systems use various mechanisms to prevent or minimize this loss. One reason for data loss is that blocks get overwritten on the disk because file sizes keep changing. ZFS prevents this by writing new data to new locations on the disk and only then deleting the old information. This way, file expansion is less likely to overwrite adjacent blocks belonging to other files. The WAFL file system (discussed later in this story) works on the same principle. To verify the integrity of files, most modern file systems use checksums. Ordinarily, these are 32-bit checksums, but ZFS uses 256-bit checksums, letting it protect data more aggressively. ZFS also minimizes the performance problems that journaling file systems face due to excessive writes by grouping write operations into 'transaction blocks' and then treating each group as one.
    Like NTFS, ZFS takes snapshots in a content-sensitive way. Normally, snapshots are copies of entire files, which end up occupying a lot of space as data grows. Sun's implementation snapshots only the part of the data that has changed, letting the file system simply use pointers to unchanged information instead of copies of it. This way, the disk is also utilized more efficiently.
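
The two ideas above, writing new data to a fresh location and letting snapshots share pointers to unchanged blocks, can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions, not how ZFS is implemented: the class `CowVolume` and its methods are hypothetical, and freeing of blocks that nothing references any more is left out.

```python
class CowVolume:
    """Toy copy-on-write volume: a block is never overwritten in place."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes, immutable once written
        self.live = {}       # file name -> list of block ids
        self.snapshots = {}  # snapshot name -> frozen pointer table
        self._next_id = 0

    def _alloc(self, data: bytes) -> int:
        bid = self._next_id
        self._next_id += 1
        self.blocks[bid] = data
        return bid

    def write_block(self, name: str, index: int, data: bytes) -> None:
        ptrs = list(self.live.get(name, []))
        if index > len(ptrs):
            raise IndexError("cannot leave a hole in the file")
        new_id = self._alloc(data)  # new data always goes to a new location
        if index < len(ptrs):
            ptrs[index] = new_id    # repoint; the old block stays untouched
        else:
            ptrs.append(new_id)
        self.live[name] = ptrs

    def snapshot(self, snap: str) -> None:
        # Only pointers are copied, never the data blocks themselves.
        self.snapshots[snap] = {n: tuple(p) for n, p in self.live.items()}

    def read(self, name: str, snap: str = None) -> bytes:
        table = self.snapshots[snap] if snap else self.live
        return b"".join(self.blocks[b] for b in table[name])


vol = CowVolume()
vol.write_block("report", 0, b"draft ")
vol.write_block("report", 1, b"v1")
vol.snapshot("monday")
vol.write_block("report", 1, b"v2")

print(vol.read("report"))            # b'draft v2'
print(vol.read("report", "monday"))  # b'draft v1'
```

After the second write, the live file and the snapshot still point at the same first block; only the changed block exists twice.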
    Solaris 10, the parent OS for ZFS, also bundles virtualization technologies. ZFS makes use of this by adding storage virtualization at a very low level, which removes the need for separate volume management for each storage device. This also makes ZFS highly scalable, since you can add more capacity without needing to make changes anywhere else. Ease of management is further enhanced by the new ZFS paradigm of creating policies instead of actions: you tell the system which policy to apply (such as a quota) instead of carrying out a step-by-step implementation yourself.
    ZFS natively supports mirroring data to other disks in the storage pool. This means that, like EMC's CAS storage system, ZFS can correct corrupted data blocks by using checksums to detect them and retrieving the data from locations that hold a correct copy. Since it can do this auto-magically, there is no need to perform an 'fsck' on a ZFS volume. In spite of all this functionality, the file system appears to the user as a normal POSIX file system. And if you're not satisfied with the way ZFS works, the implementation is available with source code under the CDDL. It is a part of OpenSolaris as well.
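
The self-healing read described above can be sketched as follows: verify each mirrored copy against a stored checksum, return the first good one, and rewrite any copy that failed. This is an illustrative model only. The class `MirroredBlock` is hypothetical, and SHA-256 from Python's standard library stands in for whatever checksum a real file system uses.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is a stand-in chosen for this sketch.
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """One logical block stored as several copies plus its expected checksum."""

    def __init__(self, data: bytes, copies: int = 2):
        self.expected = checksum(data)
        self.copies = [bytearray(data) for _ in range(copies)]

    def read(self) -> bytes:
        good = None
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                break
        if good is None:
            raise IOError("all mirror copies are corrupt")
        # Repair every copy that fails verification from the good one.
        for i, copy in enumerate(self.copies):
            if checksum(bytes(copy)) != self.expected:
                self.copies[i] = bytearray(good)
        return good


blk = MirroredBlock(b"payroll")
blk.copies[0][0] ^= 0xFF  # simulate silent corruption on one disk
print(blk.read())         # b'payroll'
```

The caller never sees the damaged copy, and after the read both mirrors hold correct data again.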

Shortcomings
    It's not as if ZFS is free of shortcomings. Two major ones at the moment (although unconfirmed) are these. First, you cannot mount ZFS volumes under any other OS, should you need to, but we can probably expect at least third-party solutions for this in the near future. Second, you cannot convert between the existing UNIX File System (UFS) and ZFS formats, meaning you need to install from scratch if you want ZFS on the system.

Now, in a typical enterprise layout, users would store their files mostly on a central file server where the above aspects have been taken care of. The only files that would then need recovery at the desktop are downloaded e-mail, instant message logs and any files that have not yet been stored on the file server. If your organization uses centralized messaging storage (as is likely if you're using something like Lotus Domino/Notes or MS Exchange), this aspect is transferred to the messaging server. Also, because of standardization of software in the enterprise, the desktops would all be on the same OS (some flavor of Linux or Windows) and hence on ext3 or NTFS.

But all this is set to change. Remember we said that capacity is growing. This also means that the amount of data being stored is increasing. A challenge, therefore, is not only to store TBs of data safely, but also to find it quickly when your user wants it. Traditional FS like FAT, NTFS or ext3 are not geared for such activity. FAT is meant for low-volume storage and has no security features. Ext3 and NTFS can address a lot of storage and provide strong security, but they are not inherently search-friendly. Some of the FS that take care of these problems are Reiser4, WinFS and ZFS. Strangely, a look at the specialties of each of the three reveals no discernible pattern.

Limitless expansion:
While there are no known limits to the capacity WinFS or Reiser4 can address, Reiser4 limits each file to a maximum of 8 TB. ZFS has a limit of 16 Exabytes. For the uninitiated in the world of numbers, the entire human knowledge in any known format at the end of the last millennium was only 12 Exabytes. ZFS will be able to address data for a long time.
Journaling:
Logging changes before they are made is called 'journaling'. If there is a power failure or some error condition before all the required operations have been done, the changes can be rolled back. A journaling file system (JFS) can do away with the frequent CHKDSK/FSCK screens at boot-up. A performance limitation is that there are two write operations per actual write. So, if you simply saved this file in a JFS, the engine would write down that it saved the file and then also save the file itself.
Reiser4 features “wandering logs”, where B* Trees not populated to a certain degree are not saved to disk unless the underlying transaction is complete. Reiser4 is supposed to be much faster than ext3. NTFS also has journaling, using B+ Trees. ZFS uses out-of-place copies.
Finding data:
WinFS is known for its search optimization. As we said in our earlier article (“WinFS Beta 1”, Oct 2005), it would let you run almost SQL-like queries on your file system to locate what you want. In a retrogressive move, WinFS is slated to flatten the FS, meaning no more folders. Instead, it would categorize data by adding attributes to your files and search based on those values.
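
To give a feel for attribute-based lookup without folders, here is a small sketch that keeps file attributes in an in-memory SQLite table and queries them. It is not the WinFS API: the table layout, the attribute names and the `find` helper are all made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (name TEXT, author TEXT, kind TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [
        ("budget.xls", "asha", "spreadsheet", 2005),
        ("roadmap.doc", "ravi", "document", 2006),
        ("minutes.doc", "asha", "document", 2006),
    ],
)

ALLOWED = {"name", "author", "kind", "year"}


def find(**attrs):
    """Return the names of files whose attributes match every given value."""
    unknown = set(attrs) - ALLOWED
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    # Column names come from the whitelist; values are bound as parameters.
    where = " AND ".join(f"{k} = ?" for k in attrs) if attrs else "1 = 1"
    sql = f"SELECT name FROM files WHERE {where} ORDER BY name"
    return [row[0] for row in conn.execute(sql, tuple(attrs.values()))]


print(find(author="asha", kind="document"))  # ['minutes.doc']
```

There is no folder path anywhere in the query: a file is located purely by the values of its attributes.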

So, we're going to see the ability to address more capacity, locate our data faster and store it reliably.
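
The journaling behavior described earlier, two writes per actual write and rollback after a failure, can be modeled with a small undo journal. This is a deliberately simplified sketch: the class `JournaledStore` is hypothetical, the "disk" is a dictionary, and an undo log is only one of several ways real file systems implement journaling.

```python
class JournaledStore:
    """Toy store that logs the previous state before every change."""

    def __init__(self):
        self.disk = {}     # file name -> contents
        self.journal = []  # undo records: (name, previous contents or None)
        self.writes = 0    # physical writes, to show the doubled cost

    def write(self, name, data):
        # Write 1: record what is about to change.
        self.journal.append((name, self.disk.get(name)))
        self.writes += 1
        # Write 2: the actual data.
        self.disk[name] = data
        self.writes += 1

    def commit(self):
        # The transaction is complete; its undo records are no longer needed.
        self.journal.clear()

    def recover(self):
        """Roll back every change of a transaction that never committed."""
        for name, old in reversed(self.journal):
            if old is None:
                self.disk.pop(name, None)
            else:
                self.disk[name] = old
        self.journal.clear()


store = JournaledStore()
store.write("a.txt", "v1")
store.commit()

store.write("a.txt", "v2")
store.write("b.txt", "new")
# Power fails here, before commit() is reached.
store.recover()

print(store.disk)    # {'a.txt': 'v1'}
print(store.writes)  # 6
```

Three logical writes cost six physical ones, and recovery returns the store to its last committed state without scanning the whole disk.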

Time Line
