JuangaCovas.info

The personal page of Juan Gabriel Covas


linux:howtos:manticore-playground

Differences

Shows the differences between two versions of the page.

linux:howtos:manticore-playground [21/08/2021 23:27] – [Windows] Juanga Covas
linux:howtos:manticore-playground [23/08/2021 19:15] (current) – [Comments] Juanga Covas
Line 1: Line 1:
-====== Manticore search quirks ======
 +====== Manticore Search adventures ======
 + 
 +These are my notes from experimenting with Manticore Search while evaluating it for adoption.
 + 
 +**[[https://manticoresearch.com/about/|Manticore Search]]** is an open-source **search engine designed specifically for search, including full-text search**, with a focus on low latency and high throughput. It was born in 2017 as a continuation of the well-known //Sphinx// search engine.
 + 
 +**Things I like**: 
 +  * SQL-first: you can connect to the server using just a MySQL/MariaDB client or the MySQL connection library of your choice, from any language.
 +    * Default port: 9306 instead of 3306 (the MySQL default)
 +  * Official PHP interface (complete HTTP API integration via cURL) for index maintenance, etc. (Searches can be run either over the MySQL protocol or through the API.)
 +  * Real-time indexes that allow instant updates
 +    * Attaching a plain index to a real-time index: a plain (static) index can be converted into a real-time index or added to an existing real-time index.
 +  * Supports the main+delta scheme: often the total dataset is too big to be reindexed from scratch frequently, while the number of new records is rather small. Example: a forum with 1,000,000 archived posts but only 1,000 new posts per day. In this case, "live" (almost real-time) index updates can be implemented using the so-called "main+delta" scheme.
 +  * Fast geospatial search
 +  * A distributed architecture is available for faster indexing and searching over petabytes of data
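The main+delta idea from the list above can be illustrated without a running search server. The following is a minimal sketch, assuming a hypothetical ''posts'' table and a hypothetical ''index_checkpoint'' table that remembers the highest document id covered by the last full rebuild; it uses SQLite only so that it runs standalone. With Manticore, the two SELECT statements would be the ''sql_query'' of the main and the delta source respectively.

```python
import sqlite3


def split_main_delta(conn):
    """Return (main_ids, delta_ids) based on the stored checkpoint."""
    cur = conn.cursor()
    # Highest id covered by the last full rebuild (hypothetical table).
    (checkpoint,) = cur.execute(
        "SELECT max_doc_id FROM index_checkpoint WHERE name = 'main'"
    ).fetchone()
    # The big, rarely rebuilt index: everything up to the checkpoint.
    main_ids = [row[0] for row in cur.execute(
        "SELECT id FROM posts WHERE id <= ? ORDER BY id", (checkpoint,))]
    # The small, frequently rebuilt index: everything newer.
    delta_ids = [row[0] for row in cur.execute(
        "SELECT id FROM posts WHERE id > ? ORDER BY id", (checkpoint,))]
    return main_ids, delta_ids


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE index_checkpoint (name TEXT PRIMARY KEY, max_doc_id INTEGER);
""")
conn.executemany(
    "INSERT INTO posts (id, body) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 11)],
)
# Pretend the main index was last rebuilt when the newest post had id 7.
conn.execute("INSERT INTO index_checkpoint VALUES ('main', 7)")

main_ids, delta_ids = split_main_delta(conn)
print("main:", main_ids)
print("delta:", delta_ids)
```

Only the delta set has to be reindexed between full rebuilds, which is why the scheme stays cheap as long as the checkpoint is advanced whenever the main index is rebuilt.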
 + 
 +**Other notes** 
 +  * A confusing point is the difference between the //mode// in which //searchd// runs ("RT mode" or "Plain mode") and the index //types// (which are also called RT and Plain). **RT mode is __required__ if you want to enable //replication//**, and it does __NOT__ allow creating plain indexes from the config. RT mode is enabled simply by setting ''data_dir'' in the config.\\ \\ Basically:
 +    * REAL-TIME MODE **requires** that the configuration file contains no index definitions and that the searchd section has a //data_dir// directive. Index files are stored inside this ''data_dir''. __Replication is available only in this mode.__
 +    * PLAIN MODE lets you specify the index schema in the config, which is read when Manticore starts; indexes are created if missing. This mode is especially useful for plain indexes that need to be built from external storage. Dropping an index is only possible by removing it from the configuration file or by removing its path setting, and then sending a HUP signal to the server or restarting it.\\ **You can still use REAL-TIME INDEXES (RT indexes) in Plain Mode**, since [[https://manual.manticoresearch.com/Creating_an_index/Local_indexes#Index-types-and-modes|it supports ALL index types]].
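The rule above (a ''data_dir'' in the searchd section means RT mode, its absence means Plain mode) can be written down as a small check. This is only a sketch of the rule as described in these notes, not how Manticore itself parses its configuration; the function name ''searchd_mode'' is made up for illustration, and the parsing is deliberately naive (it assumes a single ''searchd { ... }'' block with no nested braces).

```python
import re


def searchd_mode(conf_text):
    """Return 'rt' if the searchd section sets data_dir, else 'plain'."""
    match = re.search(r"searchd\s*\{(.*?)\}", conf_text, re.DOTALL)
    if match is None:
        raise ValueError("no searchd section found")
    for line in match.group(1).splitlines():
        # Ignore comments: keep only the text before the first unescaped '#'.
        setting = re.split(r"(?<!\\)#", line, maxsplit=1)[0].strip()
        if re.match(r"data_dir\s*=", setting):
            return "rt"
    return "plain"


plain_conf = """
searchd {
    listen = 127.0.0.1:9306:mysql
#   data_dir = E:/manticore36/data-rtmode
}
"""
rt_conf = """
searchd {
    listen = 127.0.0.1:9306:mysql
    data_dir = E:/manticore36/data-rtmode
}
"""
print(searchd_mode(plain_conf))
print(searchd_mode(rt_conf))
```

A commented-out ''data_dir'' leaves the server in Plain mode, which is the setup used in the Windows config further down this page.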
  
-This is my page of playing with Manticore Search 
  
 ===== Windows =====
Line 18: Line 36:
     listen = 127.0.0.1:9312
     listen = 127.0.0.1:9306:mysql
-#   listen = 127.0.0.1:9308:http
 +#   listen = 127.0.0.1:9308:http # the http(s) port can be the same as the binary protocol port (9312)
     log = E:/manticore36/log/searchd.log
     query_log = E:/manticore36/log/query.log
     pid_file = E:/manticore36/log/searchd.pid
-    data_dir = E:/manticore36/data
 +#   PLAIN MODE is enabled by omitting "data_dir" (permits ALL index types, RT and Plain)
 +#   RT MODE is required only if you need to enable REPLICATION
 +#   data_dir = E:/manticore36/data-rtmode # warning: data_dir *enables RT MODE* and does NOT allow index definitions (plain indexes) in this config
     query_log_format = sphinxql
 }
Line 40: Line 60:
   .\bin\searchd -c manticore.conf.in
  
-To ensure a fast connection, use ''127.0.0.1'' and **not** ''localhost'':
 +To ensure a fast connection, use ''127.0.0.1'' and **not** ''localhost'', which can be slow to resolve:
   mysql -P9306 -h127.0.0.1
  
Line 91: Line 111:
 # can also be escaped using \. Escaping is required if # is present in database credentials in source declarations.
  
 +===== Source =====
 +
 +A nice use of ''sql_query_pre'', ''sql_query_range'', ''sql_range_step'', and ''sql_query_post_index'', as seen [[https://blog.ardabeyazoglu.com/using-manticoresphinx-search-with-mysql-cjxs1lei300221ws1t83ubagj|here]].
 +
 +A table to keep some indexing information:
 +  CREATE TABLE `product_search_status`  (
 +    `id` varchar(30) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
 +    `value` bigint(20) UNSIGNED NOT NULL,
 +    PRIMARY KEY (`id`) USING BTREE
 +  ) ENGINE = InnoDB;
 +
 +<code|''**source configuration in manticore**''>
 +      # we set unicode charset and wait_timeout to a high value to prevent connection timeout errors
 +      sql_query_pre = SET NAMES utf8
 +      sql_query_pre = SET SESSION wait_timeout=3600
 +      # we store the index time for information
 +      sql_query_pre = REPLACE INTO product_search_status (id, value) VALUES ('last_indexed_time', UNIX_TIMESTAMP())
 +      # we set start-end document ids so that manticore will know where to start and stop indexing 
 +      sql_query_range = SELECT MIN(id), MAX(id) FROM product
 +      sql_range_step = 10000
 +      # this is the main query to create documents
 +      sql_query = SELECT \
 +                       id, \
 +                       name AS name_ft, \
 +                       categories AS categories_ft, \
 +                       name \
 +                 FROM product \
 +                 WHERE id >= $start AND id <= $end
 +      # we store the most recent document id for information
 +      sql_query_post_index = REPLACE INTO product_search_status (id, value) VALUES ('last_indexed_id', $maxid)
 +</code>
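To make the ranged fetch in the source above more concrete: the indexer first runs ''sql_query_range'' to get the minimum and maximum id, then runs ''sql_query'' once per window, substituting ''$start'' and ''$end''. The following sketch only computes those windows; ''ranged_windows'' is a made-up helper name, and the assumption is that windows are inclusive and advance by ''sql_range_step'' ids, as in the documented behaviour of ranged queries.

```python
def ranged_windows(min_id, max_id, step):
    """Yield inclusive (start, end) id pairs covering min_id..max_id."""
    if step <= 0:
        raise ValueError("step must be positive")
    start = min_id
    while start <= max_id:
        # The last window is clipped to max_id.
        end = min(start + step - 1, max_id)
        yield start, end
        start = end + 1


# Example: ids 1..25000 with sql_range_step = 10000.
windows = list(ranged_windows(1, 25000, 10000))
for start, end in windows:
    print(f"WHERE id >= {start} AND id <= {end}")
```

Each window becomes a separate, smaller query against the database, which avoids holding one huge result set or one very long lock while indexing.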
  
linux/howtos/manticore-playground.1629581267.txt.bz2 · Last modified: 21/08/2021 23:27 by Juanga Covas