Thursday, November 20, 2008  
Google
Web pcquest.com

CIOL Network sites

Search by Issue | CD Search | Sitemap | Advanced Search

"Ad: Nortel data network solutions are 40% more energy efficient" "Ad:Discover Green Intelligence, make your business strong"
   
 Home > Technology

Petabytes of Data @ Your Fingertips: How?

What makes a relational database tick? How does it efficiently store information? How can it trace a particular value from a cluster of hundreds or thousands of data stores? Find the answers inside

Sujay V Sarma

Saturday, December 09, 2006

A database today is a vastly complex affair. The 'big' implementations can be a cluster of thousands of database servers spread across the geography. Yet, as enterprises scale up and grow beyond those geographical boundaries, they continue to demand and expect that the data will always be there, at their fingertips. The magic that works behind the scenes to make this happen is what makes a modern relational database tick. If you consider a database deployment as vast as those of a Google or a Yahoo, the amount of data that is there, with the primary ambition of being found in response to any number of any manner of queries runs into Petabytes.

How does this work? What is the secret databases engines hold that stores all this information efficiently and in a manner that they can be queried and processed in fractions of minutes? The technologies that apply are the very same ones that power your enterprise. And we're going to plumb the depths of a modern RDBMS system and see why and how it just works. Before we move on, let's get an overview of what a modern RDBMS looks like on the inside.

Quick refresher
A modern RDBMS has a few logical sections to it, although you may not see them as such discreet pieces (as in separate modules) in reality. Exactly how the different pieces fit together also differs between each implementation. But broadly speaking, they would have the following:

Storage engine: This would determine where each data element goes and stores it in the best possible way. Sometimes, compression is also applied to make efficient usage of storage capacity.

Buffer/Cache engine: This takes care of buffering and caching, both data as well as instructions between the database backend server and the user front-end (which may be over a Web connection). This is also important when cursors several data pages long are being retrieved for reporting or analysis.

Transaction engine: RDBMS operations are atomized into transactions and sets of transactions can be rolled back in case of failure. This is well known. The engine that makes this happen is the transaction engine, which has built-in atomization logic, along with the ability to detect different kinds of failures (recoverable and irrecoverable) and handle them appropriately.

Recovery engine: In case a transaction fails or data is corrupted, the recovery engine is called upon to get things to working order again. This means that you get the data in a previously 'known' and 'sane' state as against something that's completely unknown or damaged.

Query engine: This engine is responsible for fetching the queries (from the application or user), parsing it, optimizing it and then queuing it for execution.

Each of these engines, in turn, would use sub-engines that perform further specialized tasks. For instance, the storage engine would need to call on one of several discreet file system drivers to perform the storage depending on if the storage is happening locally, over the network onto a SAN or if it is in a distributed environment.

Now let's turn to some of the key players in the RDBMS arena and see what technologies make them work better for us.

Databases for SOA
There is a new set of databases aimed at getting better performance out of SOA applications that use databases. IBM's DB2 9 'VIPER' and MS SQL Server 2005 are two such SOA ready databases.

SOA enables databases to interact better with SOA applications like web services, and are targeted at large e-commerce deployments. Regular databases and database applications are not geared to meet the demands of SOA deployments. To do that, an RDBMS must do the following at a minimum:

1. Be able to talk and understand XML. All SOA transactions happen via XML so it is important for the database engine to know how these messages look and work so to know what are the really interesting ones that it must store and what is part of regular chatter that can be safely thrown away.

Special database engines
ORACLE Berkeley DB XML
This is an open source database engine based around XML and built on top of the Berkeley DB database engine. It provides a quick and easy way to store and retrieve XML based data, using XQuery. The engine runs in-process with the application that uses it and does not require external administration. As such, it provides ACID transactions along with XML validation, document meta-data and document indexing features. It is supported on multiple platforms and useable with all the major developer languages.

ORACLE Berkeley DB Java Edition
Again built around the Berkeley DB architecture, this database is entirely built with Java and meant for EJB object persistence. The idea is that instead of having to use external databases for Java EJB objects and then using SQL queries to store and retrieve them and incur the penalty of having to translate Java objects into SQL entities and back each time, one can use powerful Java statements to do the same. Thus, the Berkeley DB Java Edition uses Java code via a DPL (Direct Persistence Layer) to persist and retrieve EJB objects in a relational database. The database functions as a Managed Transaction Resource within your EJB application as it exposes a JTA (Java Transaction API) and JCA (Java Connector Architecture). Data storage is always local and in-process.

2. It must do HTTP and SOAP. Most databases simply communicate on a predetermined TCP port (like 1433 for MS SQL Server over TDS [Tabular Data Stream] protocol,1521 for Oracle and port 3306 for MySQL) and this has nothing to do with regular SOA traffic. A true SOA database must communicate on the regular channels like HTTP (port 80) atleast.

In order to act as a service provider for SOA applications or data, the RDBMS must be further aware of SOAP authentication methods and be able to proxy that authentication between token providers and consumers in the SOA chain. Future databases will need to be SODA instead of just an RDBMS to serve the purposes of modern computing.

Oracle 10g
Automatic Storage Mgmt: ASM for short, think of it as your SAN's storage manager embedded into the kernel of your RDBMS engine. Hierarchically, the ASM would fit in between the different database instances (see Jargon Buster in this article for more on this) and the storage they use. Thus, it virtualizes the storage available to Oracle databases across a clustered storage environment and exposes them as a single available set of locations to the database. Where the exact file (or file segments) are located and how they are laid out, accessed or managed becomes transparent to the database engine, with the positive side effect of removing the headache of managing storage from the DBA's workload. ASM is used by the Oracle kernel only for the database files and cannot be used for other storage.

A set of physical disks are grouped into 'disk groups' and a set of these are available to a storage node. An ASM manages storage at the node level and different nodes are formed into a storage cluster. ASMs among different nodes in a cluster can communicate among themselves to manage storage between them.

At start up, the kernel would fire up the ASM which would mount the disk groups after making sure everything is ok (or performing maintenance on them). It is only after this that the DB instances are started up, which in turn mount the databases.

MySQL Server allows DBAs to choose from a set of storage engines on the fly, each engine is optimized for a specific app area from archival to cluster computing

Cluster File System: When an Oracle RAC (Real Application Cluster) is spread across several cluster-nodes, a robust and high performance CFS (Cluster File System) is required to efficiently access and store data across the cluster. Oracle's open source OCFS (Oracle Cluster File System) is a contribution in this direction. Unlike what we said about the ASM above, OCFS lets the database store data files, log files as well as archives. But, regular files other than Oracle database files cannot be stored on the OCFS and this support is planned for OCFS 2.0 (current release is 1.0.14-1 at the time of going to press). The OCFS can work only on a SAN or a DAS where local servers have direct control over the storage. The system runs on a RHEL Advanced Server based server, and standard Linux commands (FSCK, MKFS, etc) are available to manage and maintain OCFS storage.

MS SQL Server
Automatic Consistency Checking: Since version 7.0, SQL Server has supported file based databases. This lets a single database be extended among different storage units and managed together from a single interface. SQL Server 2000 removed the need to run DBCC (consistency checks) on databases prior to backup, since the engine anyway performs these checks at regular intervals. Also, as and when the storage engine encounters a problem, it is handled without needing a separate management operation. With the 2005 edition, running a DBCC CHECK (although still available) is no longer a necessary step, as it is completely automated.

JARGON BUSTER
Server Instance: Many RDBMS backends can run multiple instances, and each of these instances service one or more applications.

This is an efficient way to fine tune the engine configuration depending on the application(s) being serviced, instead of being forced to run at the same configuration for all databases
and applications.

Each instance can use the same database, but due to file locking when an instance mounts a database, only one of it can use it at a time (the other instances will need to place the database offline or un-mount it).

Cursor: A cursor is a range ('page') of information being pulled from a database in response to a query. Cursors are used to reduce bandwidth utilization while processing a large amount of data. They do this by accessing only a portion of it at a time.

Transaction: Database operations are usually multi-step. Each step has a possibility of failure. If the operation fails and data has been written to the database (or deleted), then the database can be in an inconsistent state since the entire operation has not completed (for eg, the database may have recorded that the order was dispatched, but details like how it was paid for and how it was sent away were not yet saved). Transactions help batch these operations and on failure within a batch that batch or the entire set can be undone ('rolled back') and retried afresh after corrective measures.

Atomization: The process of breaking up an operation or information into smaller parts is called atomization. The smaller a piece, the less likely the entire operation fails. Also errors can be caught and handled better.

Copy-on-write: Primarily a file system/backup related technology, it refers to the type of data copy operation that occurs only when a WRITE occurs on a corresponding file. This means a CoW agent will ignore all READ operations as well as idle states of the data and gets triggered only when data content is changed in some way. This is a more efficient way to snapshot/backup information.

Data partition: When a database contains a large amount/types of data, it becomes difficult to analyze properly, since the application will be looking at all of it at a time. The RDBMS if it supports partitioning, can let you create 'windows' into the data, much like the VIEWS that let you see only a subset of that data. Different types of partitioning exist (range, hash, key, list and composite). Sub-partitioning lets you create smaller subsets from existing data partitions.

Page-wise Checksum: Instead of checksums being computed at the file level, SQL Server 2005 checksums pages of data, which can increase data protection. This can, for instance catch problems that are not reported by the OS or I/O hardware --- an ALTER DATABASE option enables page level checksums for databases.

Read-only Database Compression: Another improvement in this edition is that when you have compressed NTFS volumes, you can place a database file group that's been marked 'read only' onto that compressed volume. Since the data in these files are not subject to change (since they are read only), the file does not grow and hence compression can help better utilize its space.

Quick Initialization: To improve performance, SQL Server 2005 permits creating or extending a database's data files (not log files) without first zeroing the space allocated to it. But, to be able to do this, the SQL Server instance needs the SE_MANAGE_VOLUME_NAME privilege, which is not available to the Network Service account (default for the service) and hence by default this facility is turned off. This kind of quick-initialization potentially exposes uninitialized portions of the disk that may contain previous data to the application or user. But, the storage engine itself restricts access only to those portions where data has been written to by the database engine, preventing such leaks. In a broad manner of speaking, this is similar to the 'quick format' option for disks available in major OSs.

Snapshots: Point-in-time backups can be taken using database snapshots. This uses NTFS Sparse File technology to efficiently manage the space used. Pages of data are copied from the database to the snapshot on a copy-on-write basis, letting the snapshot file attain similar sizes as the original database.

Row-wise versioning: And imagine having different versions of each row in a database! That's also now supported with SQL Server 2005's row-level versioning. This also works on a copy-on-write basis, but stamps each row with its transaction ID to
enable a trace back.

Data partitioning: Data in the latest version of this RDBMS can be partitioned into independent segments and accessed and modified separately as long as they are assigned to different file groups. Thus, the storage engine in SQL Server 2005 also handles file groups (actually data partitions), along with switching partitions between file groups and aligning them to different boundaries.

MySQL
Pluggable Engines: This well-known open source enterprise class database uses a pluggable storage architecture. While about eight different types are shipped with different editions of MySQL, about five engines are also available from their partners and the MySQL community. Each storage engine is specifically written and optimized for a particular application area. Some handle read-intensive operations more suited to Web based deployments, while others are raw OLTP, clustering and warehousing optimized. The default storage engine is called 'MyISAM'. Other native engines include: Cluster, Federated, Archive, Merge, Memory, CSV and BlackHole. Oracle/InnoDB's own InnoDB engine is tasked with transactional data storage with row-level locking and referential integrity. PrimeBase XT, Open OLAP and Berkeley DB (Oracle) are among the community developed contributions.

Till now, this architecture allowed MySQL DBAs to select the best available storage engine for a MySQL instance. From version 5.1, the architecture has been changed to allow database developers change the storage engine on the fly using what MySQL terms as a 'clap-on
clap-off' approach with special LOAD and UNLOAD statements.

Other enhancements include the transactional engine, OLAP engine and multi-dimensional engine. The MySQL community also got its long standing wish to be able to do data partitioning (range, hash, key, list, composite and sub-partition) on its databases with the latest version, although the support for this is rather shaky and under development.

Falcon: This new database engine for MySQL was originally supposed to enter public beta testing in summer this year, has not yet come into the open even with broad specifications at the time of this going to press. All that is known about this engine is that it is based on the Netfrastructure engine by Jim Starkey that makes better use of modern processor and RAM capabilities. More cores in a processor means the engine would use them to marshal data from one place to another more efficiently and go multi-threaded to do so. Larger memories mean more caching, but Falcon wants to do better than that by using in-memory databases that fill up from disk-based content. Tuning databases are also rumored to be a thing of the past with Falcon handling your database. Multiple versions of a table will exist in memory while there would be just a single version on the disk,
making versioning faster and more consistent. Similarly, transactions are also performed in memory with the outputs flowing to the disk. For the rest, we'll have to wait till a version of Falcon actually hits the beta servers.

IBM DB2 VIPER
Not just relational DBMS: This is version 9 of the IBM DB2 and was almost named “DB3” given the wide ranging changes to the DB2 system. IBM says that as of Viper, DB2 is no longer a pure relational environment but provides an XML plus relational environment.

pureXML: DB2 Viper is intensely an XML environment. XML documents can be stored in a database using several methods: you could break up the document into its constituent parts and store each one separately or, store entire XML as a single piece. Both have their pros and cons. When you break up the file, you can store it more efficiently, replacing duplicates with pointers to the already stored copy. But, this has a performance penalty when a document has to be reconstructed in response to a query. DB2 Viper stores XML documents as a single LOB (Large Object) entity, making it compliance friendly, as when you break up something into pieces and recombine it later, you do not end up with the original (for eg, digital certificates will fail to verify as the original document). Storing XML as a LOB overcomes this problem as well.

SOA friendly: Having a database that can directly do XML is great. But, having it do a part of SOA for you as well would make for a great deal. This is what DB2 Viper also enables with its pureXML. So, instead of having separate DB2 instances for each platform or application that needs the database, you can have just a single DB2 deployment that can serve data via XML to the various applications, thus complying with the primary requirement of SOA: disconnectedness.

Tokenized Compression: To compress data effectively, DB2 v9 replaces duplicate data even within columns with tokens. This reduces redundancy. DB2 will look for duplication across the database and replace the duplicates with numeric tokens each time it appears. A central token 'dictionary' helps refresh the data item with the actual information when it is retrieved. Numeric values, LOB, XML and indexes are not currently tokenized.

Optimized for SAP: It not onl detects its deployed environment on installation, but can also detect a SAP deployment in that environment. It can discover SAP's configuration and use these values to work better with that application. DB2 can be deployed in a special 'silent install' mode as a part of the SAP installation. The information picked up from SAP is used for database tuning, indexing and multi-dimensional clustering.

Security partitioning: This is a form of data partitioning (see Jargon Buster) where DB2 lets you partition data based on security tokens. This means you can put confidential information in one partition, top secret data in another and see-this-and-die information in a third one. You can do partitioning using an ALTER TABLE command. This function is useful when you need to perform decision-support analysis (what-if, scenario building, etc) where you may need to temporarily 'delete' data for the analysis query period.

Replicated partitions: This is a feature that is a part of data warehouse appliances and may soon find its way into DB2. Partition replication is useful in parallel processing environments where you can replicate a partition onto another instance.

Self-Tuning: In both memory and storage, the database engine can automatically detect application workloads (especially SAP) and tune itself to better performance. This is done by redistributing memory among the different DB2 processes to optimize the usage.

ILM: The DMS (Data Managed Storage) module of Viper lets you have disks of different speeds in the same system and it will automatically allocate data to each of them based on their age. That is, older data will go onto slower disks and current data on faster ones. Thus, DB2 has a built-in ILM module as well.

Page(s)   1  



Untitled 1


Does your business have Green Intelligence


What is SDSIASWODB?


No.1 Linux platform for SAP Applications


   
 


 
 

Magazine Subscription | RQS | Contact Us | Team PCQuest