ICgrep: The fastest way to search text for patterns
ICgrep embodies a completely new algorithmic approach to high-performance regular expression matching. In contrast to the byte-at-a-time approach of NFA, DFA, and backtracking matchers, ICgrep processes UTF-8 input streams 128 code units at a time using SSE2 instructions, or 256 code units at a time using Intel's AVX2 instructions. Regular expressions are widely used to identify patterns in data files and data streams, with applications in security, search engines, biomedical and genome research, database systems, and a wide range of other big data applications.
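To make the "many code units at a time" idea concrete, here is a minimal Python sketch of parallel bitstream matching, assuming that Python's arbitrary-precision integers can stand in for SIMD registers (one bit per input byte). It only matches a single literal string over bytes and is not ICgrep's actual implementation; the function names `transpose`, `byte_class` and `match_literal` are hypothetical. The input is split into eight basis bitstreams, a character class is computed from them with bitwise logic alone, and a marker stream is shifted forward one position per pattern character so that each step examines every input position at once.

```python
def transpose(data: bytes) -> list[int]:
    """Split a byte sequence into 8 basis bitstreams (hypothetical helper).

    basis[k] has bit i set iff bit k of data[i] is 1. Python ints stand in
    for SIMD registers of arbitrary width.
    """
    basis = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                basis[k] |= 1 << i
    return basis


def byte_class(basis: list[int], value: int, all_ones: int) -> int:
    """Bitstream marking every position whose byte equals `value`.

    Uses only bitwise AND/NOT over the basis streams, so each operation
    covers all positions simultaneously.
    """
    result = all_ones
    for k in range(8):
        if (value >> k) & 1:
            result &= basis[k]
        else:
            result &= ~basis[k]
    return result


def match_literal(data: bytes, pattern: bytes) -> list[int]:
    """Return start offsets of every (possibly overlapping) occurrence."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    all_ones = (1 << len(data)) - 1
    basis = transpose(data)
    # markers: bit i is set iff pattern[:j + 1] ends at position i
    markers = byte_class(basis, pattern[0], all_ones)
    for ch in pattern[1:]:
        # Advance every marker by one position, then keep only the
        # markers that land on the next pattern character.
        markers = (markers << 1) & byte_class(basis, ch, all_ones)
    end = len(pattern) - 1
    return [i - end for i in range(len(data)) if (markers >> i) & 1]


print(match_literal(b"the cat sat on a cat mat", b"cat"))  # [4, 17]
```

For readability the transposition here is a per-byte Python loop; a production implementation would perform that step with SIMD instructions, and would also need to handle full regular expressions and multi-byte UTF-8 sequences rather than one byte-level literal.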
An accelerated 'grep' utility with parallel regular expression processing and Unicode support
While many libraries have been developed to support regular expression processing, the most commonly used utility is 'grep', which is built into the Linux and Mac OS operating systems (and is a commonly downloaded utility for the Windows platform). ICgrep accepts ASCII or UTF-8 input files and provides a full suite of Unicode processing features meeting the requirements of Unicode Level 1 support of Unicode Technical Standard #18. See Unicode Level 1 Support in ICgrep for details. Visit our Downloads page to try it out, or browse the source code and build it yourself for your own machines and environments.
ICgrep builds on the patented parallel bitstream software technology of International Characters Inc. (IC). IC's patented technologies are dedicated to free use in open source software, experimentation, research and teaching. (See the IC covenant for details.)
Processes text up to 100 times faster
ICgrep has been shown to run significantly faster than other grep utilities on regular expression processing tasks, as demonstrated at the 23rd International Conference on Parallel Architectures and Compilation Techniques (PACT). Download our ICgrep Demonstration Paper, and visit our technical website for more details.