infoTECH Feature

November 07, 2013

Updated Version of dtSearch Web Offers Improved Filters for Document Text Retrieval

The explosion of digital data has made vast amounts of information accessible with just a few keystrokes, yet finding what you are looking for in all those documents can be a challenge.

Software developer dtSearch Corp. aims to help professionals with this problem through its text retrieval and full-text search engine. Its search products work by building an index that stores the location of each word within each document. Once an index has been built, the company reports, searches generally take less than a second.
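To illustrate the general idea of an index that records where each word occurs, here is a minimal Python sketch of a positional inverted index. It is a generic textbook structure, not dtSearch's implementation, which the article does not describe. The simple tokenizer and the function names `tokenize`, `build_index` and `search_phrase` are hypothetical. Because each word maps directly to the documents and positions where it appears, a query looks up a few dictionary entries instead of rescanning every document.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercase runs of word characters only. Real engines
# apply far richer, language-aware rules.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Return the lowercase word tokens of text, in order."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def build_index(docs):
    """Map each word to {doc_id: [positions]} for every document containing it."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def search_phrase(index, phrase):
    """Return sorted ids of documents containing the phrase's words consecutively."""
    words = tokenize(phrase)
    if not words:
        return []
    # Candidate documents must contain every word of the phrase.
    candidates = set(index.get(words[0], {}))
    for w in words[1:]:
        candidates &= set(index.get(w, {}))
    hits = []
    for doc_id in candidates:
        # The phrase matches if, for some start position of the first word,
        # word i of the phrase occurs at position start + i.
        if any(all(start + i in index[w][doc_id] for i, w in enumerate(words))
               for start in index[words[0]][doc_id]):
            hits.append(doc_id)
    return sorted(hits)


docs = {
    1: "Full-text search engines build an index.",
    2: "The index stores each word position.",
    3: "Search the text index quickly.",
}
idx = build_index(docs)
print(search_phrase(idx, "text search"))  # [1]
```

Storing positions, not just which documents contain a word, is what makes phrase matching possible: "text search" matches document 1 but not document 3, because only in document 1 does "search" sit immediately after "text".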

The company’s enterprise and developer products can reportedly hold more than a terabyte of text in a single index, spanning multiple directories, emails and attachments, online data and databases. dtSearch products can create and search any number of indexes, the company said, adding that the product line also supports highly concurrent, multithreaded searching of online and other shared-access repositories.

Available in several editions to suit different users' needs, the latest version of the dtSearch product line includes updated .NET, C++ and Java APIs for dtSearch Engine developers, with embedded document filters for parsing, converting and extracting data and for displaying retrieved documents with highlighted hits.

One such product, dtSearch Web with Spider, is available in a beta version that provides HTML5 template enhancements for publishing instantly searchable data to an Internet or Intranet site. The dtSearch Web product is designed to provide search capability for both static and dynamic online data.

The dtSearch products use document filters to parse, index and search full-text and metadata content, including integrated images. The software supports both dynamic and static Web content, including HTML, XML/XSL, PDF, ASP.NET, PHP, SharePoint and more. Database support includes XML, Access, XBASE, CSV and others. The dtSearch Engine APIs also support SQL-type data, along with the full text of BLOB data.

For users of Microsoft Office and related software products, dtSearch also supports Word, PowerPoint, Excel, Access and OneNote documents, as well as PDF, OpenOffice, RAR, ZIP, GZIP/TAR and other file formats.

Both emails and attachments can be searched, including those from Outlook, Exchange and Thunderbird. For example, the dtSearch document filters can index and search an email attachment that is a ZIP archive containing both a PDF and an Access database, where the database in turn contains an embedded PowerPoint presentation with embedded images.
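Handling an attachment like this amounts to walking containers recursively until only leaf documents remain. The minimal Python sketch below assumes, for illustration only, that every container is a ZIP archive; the function name `walk_container` is hypothetical, and the format-specific filters that would actually parse PDF, Access or PowerPoint content are left out. It is not a description of how dtSearch's filters work internally.

```python
import io
import zipfile


def walk_container(data, path="archive.zip"):
    """Yield (path, bytes) for every non-ZIP leaf inside a possibly nested ZIP.

    Nested ZIP members are opened in memory and walked recursively, so each
    leaf's path records the chain of containers it was found in.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            member = zf.read(info)
            member_path = f"{path}/{info.filename}"
            if zipfile.is_zipfile(io.BytesIO(member)):
                # A container inside a container: descend instead of yielding it.
                yield from walk_container(member, member_path)
            else:
                # A real engine would pass these bytes to a format-specific
                # filter (PDF, Access, PowerPoint, ...); here we yield them raw.
                yield member_path, member
```

Office Open XML files such as .docx and .pptx are themselves ZIP packages, so this sketch would also descend into them and yield their internal XML parts rather than treating them as single documents.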

The dtSearch product line offers 25+ search types, including special forensics search options. The company’s products provide Unicode support for international language text, including support for right-to-left languages as well as special Chinese/Japanese/Korean character options.




Edited by Blaise McNamee