
THE NATURAL DYNAMICS OF DATA

CHALLENGE - SHARING LEGACY INFORMATION


We identified XML as a standard for exchanging disparate information types

THE PROBLEMS


MONTGOMERY, Ala. - The military must jettison its long-standing concepts regarding information ownership and integrate information technology systems to enable the military to fight jointly, according to two high-ranking generals.

"Our legacy systems just don't talk to each other," said Gen. Lance Smith, commander of the U.S. Joint Forces Command. "And the reason for that is somebody thought that there was data that was unique to them. We have to build a culture that is gathering that kind of information and making it available to commanders in the field."

Smith called for the military to agree on an enforceable, integrated data strategy. The military services have set up data fiefdoms to protect their information, Smith said. "We don't have good standards out there," he added. "We have to agree on a common data model, which is our responsibility."

(Reported in Federal Computer Week - 21 Aug 06)

THE BACKGROUND


Openkast has examined legacy information integration projects and come to the following conclusions:

We identified XML as a standard for exchanging disparate information types. However, using RDBMS technology to host complex XML data structures is highly inefficient.

IBM has recently announced a development effort to add native XML database capabilities to its DB2 technology, but recognises the "accepted" problems with XML data management:

- Data bloat (huge extra disk resources, plus the concomitant management resources required)

Measurements have been performed on sample Openkast test data sets. The test data has been generated using a test tool that creates a stream of transaction records that represent concurrent user activity.

The total size of all the transaction records was 310 Mbytes of XML text (including mark-up and content). The total number of records was 500,000. Once stored and encoded in Openkast, the size is reduced to 182 Mbytes, a reduction to roughly 59% of the original XML data size.
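The size figures above can be sanity-checked with simple arithmetic. The sketch below uses only the measured totals quoted in this section:

```python
# Check the quoted Openkast compression figures.
original_mb = 310    # total XML text, including mark-up and content
encoded_mb = 182     # size after storage and encoding in Openkast
records = 500_000

ratio = encoded_mb / original_mb
print(f"encoded size is {ratio:.1%} of the original")        # 58.7%
print(f"average raw record size: {original_mb * 1024 / records:.2f} KB")
```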

- Slow update performance of the database

The acquisition and storage rate was measured as 14 minutes for 100,000 transaction records, or about 119 records per second. The host system used for the measurements was equipped with a 1.6GHz Pentium CPU and a 5400rpm disk drive.
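The sustained ingest rate follows directly from the measurement quoted above:

```python
# Derive the sustained ingest rate from the quoted measurement.
records = 100_000
minutes = 14

rate = records / (minutes * 60)     # records per second
print(f"{rate:.0f} records/second")  # 119
```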

Retrieval performance depends on the query and the content. The tracker allows free-text queries to complete very quickly.

Best-case searches are those that can be completed by processing the query against the tracker alone and do not require further processing. Typically the tracker will be cached in memory after several searches, and a search will then produce a result within milliseconds on a repository of several million records.

THE SOLUTION


The XML "silver bullet" has problems, which we have addressed with a technology called Openkast.

We have understood from the beginning that information integration is all about collections of databases with many schemas. Openkast has only one index mode: total indexing. Full text plus a full-context, full-path index.
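Openkast's internals are not public, so the following is only a minimal toy sketch of what full-text-plus-full-path indexing means: every text token is indexed together with the complete element path at which it occurs, so queries can be both free-text and contextual. The record structure and field names here are invented for illustration.

```python
# Toy illustration of "total indexing": each text token is indexed
# together with the full element path at which it occurs.
# This only sketches the idea; it is not Openkast's implementation.
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_record(doc_id, xml_text, index):
    """Add every (path, token) pair in the record to the index."""
    root = ET.fromstring(xml_text)
    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        for token in (elem.text or "").split():
            index[(here, token.lower())].add(doc_id)  # full path + full text
        for child in elem:
            walk(child, here)
    walk(root, "")

index = defaultdict(set)  # (element path, token) -> set of record ids
index_record(1, "<txn><user>alice</user><action>login</action></txn>", index)
index_record(2, "<txn><user>bob</user><action>login</action></txn>", index)

# A contextual query: which records mention "login" under /txn/action?
print(index[("/txn/action", "login")])  # {1, 2}
```

Because the path is part of the key, the same token can be distinguished by where it appears in the document, which is the "full context" part of the claim.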

This is achieved without the disk-usage explosion and painful transaction throughput rates normally associated with full XML indexation.