Search Formula-1: a distributed, high-performance massive data engine for enterprise/vertical search
A distributed massive data engine for vertical search, written in C++.
RDBMS
Collections can be managed fully dynamically, without stopping the server instance.
Commercially proven. SF1R has been proven in commercial environments with both complicated situations and ultra-high concurrency.
To satisfy different kinds of requirements, SF1R supports three kinds of indices: a Lucene-like file-based inverted index, a pure memory-based inverted index with ultra-high decompression performance, and a succinct self-index.
In a practical deployment of a search cloud with both distributed and non-distributed verticals, all verticals sit behind a single nginx-based HTTP reverse proxy that provides a unified entry point.
Extensible mining components. In the early stages of SF1R, tens of mining components were attached, such as duplicate detection, taxonomy generation, query recommendation, and collaborative filtering. To keep the repository as light as possible, we removed most of them. However, the architecture of SF1R retains the flexibility to reintroduce any of them. In fact, one of the indices, the succinct self-index, is encapsulated as a mining component for convenience.
The documentation is available in Chinese, and we have also prepared an English technical report.
We recently switched SF1R to C++11, so GCC 4.8 is required to build it. We do not recommend building the project on Ubuntu because of the nested references among many libraries; CentOS, Red Hat, Gentoo, and CoreOS are the preferred platforms. You also need CMake and Boost 1.56 to build the repository.
The dependent repositories are:
In addition, some third-party repositories are required:
Cassandra client in izenelib.
Additionally, there are two extra projects:
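The compiler and CMake requirements stated above can be checked before starting a build. The following is a minimal sketch, not part of SF1R: the helper names `version_ge` and `check_toolchain` are hypothetical, and the version comparison assumes a `sort` that supports `-V` (version sort), as GNU coreutils does. It does not check Boost, since the location of the Boost headers varies between systems.

```shell
#!/usr/bin/env bash

# version_ge A B: succeeds when dotted version A is greater than or equal to B.
# Assumption: `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_toolchain: hypothetical helper that verifies gcc and cmake are on PATH
# and that gcc meets the 4.8 minimum mentioned in this README.
check_toolchain() {
    local min_gcc="4.8" gcc_version
    command -v gcc >/dev/null 2>&1 || { echo "gcc not found" >&2; return 1; }
    command -v cmake >/dev/null 2>&1 || { echo "cmake not found" >&2; return 1; }
    gcc_version="$(gcc -dumpversion)"
    if ! version_ge "$gcc_version" "$min_gcc"; then
        echo "gcc $gcc_version is older than $min_gcc" >&2
        return 1
    fi
    echo "toolchain ok: gcc $gcc_version"
}
```

After sourcing the script, running `check_toolchain` reports either the detected GCC version or the first requirement that is not met.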
To use SF1R, you need configuration files located in the config directory. After that:
$ cd bin
$ ./CobraProcess -F config
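The two commands above can be wrapped in a small function that fails early when the binary or the configuration directory is missing. This is only a sketch: `start_sf1r` is a hypothetical name, and it assumes that the `-F` argument is resolved relative to the working directory, as the `cd bin` step suggests.

```shell
#!/usr/bin/env bash

# start_sf1r [BIN_DIR] [CONFIG_DIR]: hypothetical wrapper around
# `cd bin && ./CobraProcess -F config` with basic sanity checks.
start_sf1r() {
    local bin_dir="${1:-bin}" config_dir="${2:-config}"
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$bin_dir" || exit 1
        if [ ! -x ./CobraProcess ]; then
            echo "CobraProcess not found or not executable in $bin_dir" >&2
            exit 1
        fi
        # Assumption: CONFIG_DIR is interpreted relative to BIN_DIR.
        if [ ! -d "$config_dir" ]; then
            echo "configuration directory '$config_dir' not found" >&2
            exit 1
        fi
        exec ./CobraProcess -F "$config_dir"
    )
}
```

Calling `start_sf1r` with no arguments from the repository root is equivalent to the two commands shown above.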
Please see the documentation for further usage.
The SF1R project is published under the Apache License, Version 2.0:
http://www.apache.org/licenses/LICENSE-2.0