# kor_search

Korean text search extension for PostgreSQL.

## Description

`kor_search` is a PostgreSQL extension that provides text search across Korean and English. It was developed without relying on external APIs such as translators or morphological analyzers. It supports sentence-to-sentence searches to some extent, but it is primarily designed and optimized for word-based searches.

The extension can be used wherever PostgreSQL is installed. It also provides functions for environments where external extensions are restricted, such as RDS.

### Key Features

- **LIKE Search**: Checks if the input text matches or includes the specified search term.
- **tsvector Search**: Converts text to tsvector format to support similar word searches.
- **Regex Search**: Provides regex search functionality for complex search conditions.
- **Similarity Search**: Evaluates sentence similarity based on a dictionary of synonyms.

### Flexible Custom Search

`kor_search` lets you modify its internal table data to tailor search to a specific industry or field. For example, for 산업의 역군, the extension was customized for construction-related searches by loading a large amount of construction-related data.

### Performance Considerations

Search speed depends on the size of the dictionary, so performance analysis is essential when querying large amounts of data. The functions provided for environments where external extensions are restricted (such as RDS) mirror the extension's behavior, but they may not perform as well as the extension itself.

### kor_search_similar

#### Usage Example:

1. Sentence Similarity Check:

```sql
-- '밥 먹다' is semantically similar to 'I eat rice', so TRUE is expected
SELECT kor_search_similar('I eat rice', '밥 먹다'); -- Result: true

-- '서울 살다' is semantically similar to 'She lives in Seoul', so TRUE is expected
SELECT kor_search_similar('She lives in Seoul', '서울 살다'); -- Result: true

-- '차가 빠르다' is semantically similar to 'The car is fast', so TRUE is expected
SELECT kor_search_similar('The car is fast', '차가 빠르다'); -- Result: true
```

### kor_like

- `kor_like(input_text text, search_text text)`: Checks if the synonyms corresponding to `search_text` are included in `input_text` using a LIKE query.

#### Usage Example:

1. Word Inclusion Check:

```sql
-- Search for 'lg' keyword with '엘지', '앨지'
SELECT kor_like('이것은 엘지 제품입니다', 'lg'); -- Result: true
SELECT kor_like('이것은 LG 제품입니다', '엘지'); -- Result: true

-- Search for 'apple' keyword with '애플', '사과'
SELECT kor_like('애플은 훌륭한 과일입니다', 'apple'); -- Result: true
SELECT kor_like('사과를 좋아합니다', 'apple'); -- Result: true
SELECT kor_like('Apple은 과일입니다', '사과'); -- Result: true
```

### kor_search_tsvector

- `kor_search_tsvector(input_text text, search_text text)`: Checks if the synonyms corresponding to `search_text` are included in the tsvector of `input_text`.

#### Usage Example:

1. Search for Similar Words Using tsvector:

```sql
-- Search for 'data science' keyword with '데이터 과학', '데이터 사이언스'
SELECT kor_search_tsvector('데이터 과학은 미래의 유망한 분야입니다', 'data science'); -- Result: true
SELECT kor_search_tsvector('데이터 사이언스는 많은 가능성을 제공합니다', 'data science'); -- Result: true
SELECT kor_search_tsvector('Data Science는 많은 가능성을 제공합니다', '데이터 과학'); -- Result: true

-- Search for 'machine learning' keyword with '머신러닝', '기계학습'
SELECT kor_search_tsvector('머신러닝 기술이 발전하고 있습니다', 'machine learning'); -- Result: true
SELECT kor_search_tsvector('기계학습 알고리즘을 연구합니다', 'machine learning'); -- Result: true
SELECT kor_search_tsvector('Machine Learning 알고리즘을 연구합니다', '기계학습'); -- Result: true
```

### kor_regex_search

- `kor_regex_search(input_text text, pattern text)`: Checks if a regex pattern matches the `input_text`.

#### Usage Example:

1. Regex Search:

```sql
-- Search for specific word patterns using regex
SELECT kor_regex_search('자바는 강력한 언어입니다', '자바|파이썬'); -- Result: true
SELECT kor_regex_search('파이썬은 배우기 쉬운 언어입니다', '자바|파이썬'); -- Result: true
SELECT kor_regex_search('JAVA와 PYTHON은 인기있는 언어입니다', '(?i)자바|파이썬'); -- Result: true

-- Search for 'big data' and '대용량 데이터' using regex
SELECT kor_regex_search('빅데이터 분석이 중요합니다', '빅데이터|대용량 데이터'); -- Result: true
SELECT kor_regex_search('대용량 데이터를 처리합니다', '빅데이터|대용량 데이터'); -- Result: true
SELECT kor_regex_search('Big Data는 현대 기술의 핵심입니다', '(?i)빅데이터|대용량 데이터'); -- Result: true
```

## Managing the Word Conversion Table

You can add new keywords and synonyms to the word conversion table. This lets you implement custom searches tailored to specific industries or business needs.
For example, to add a synonym for the 'apple' keyword, do the following:

```sql
INSERT INTO kor_search_word_transform (keyword) VALUES ('apple');
INSERT INTO kor_search_word_synonyms (keyword_id, synonym) VALUES ((SELECT id FROM kor_search_word_transform WHERE keyword = 'apple'), '애플');
```
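After registering a keyword, you may want to add more synonyms, check what is stored, and try the result in a query. The following is a minimal sketch, not a definitive workflow. It assumes the 'apple' keyword row from the example above already exists and that `kor_like` returns a boolean, as the results shown earlier suggest. The `demo_products` table and its columns are hypothetical and exist only for illustration.

```sql
-- Add a second synonym for the existing 'apple' keyword.
INSERT INTO kor_search_word_synonyms (keyword_id, synonym)
SELECT id, '사과'
FROM kor_search_word_transform
WHERE keyword = 'apple';

-- List every synonym currently registered for 'apple'.
SELECT t.keyword, s.synonym
FROM kor_search_word_transform AS t
JOIN kor_search_word_synonyms AS s ON s.keyword_id = t.id
WHERE t.keyword = 'apple'
ORDER BY s.synonym;

-- Hypothetical table, used only for this illustration.
CREATE TEMP TABLE demo_products (
    id   serial PRIMARY KEY,
    name text NOT NULL
);
INSERT INTO demo_products (name) VALUES ('사과 주스'), ('오렌지 주스');

-- Use kor_like as a filter (assumes it returns a boolean).
SELECT id, name
FROM demo_products
WHERE kor_like(name, 'apple');

-- Measure the query on your own data, since dictionary size affects search speed.
EXPLAIN ANALYZE
SELECT id, name
FROM demo_products
WHERE kor_like(name, 'apple');
```

Running `EXPLAIN ANALYZE` before and after loading a large batch of synonyms is one way to see how dictionary growth affects query time in your environment.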