As text documents held in cloud storage grow in size, the time and cost of string and keyword searches grow with them. However, when searching for words or sentences in documents, most string search algorithms do not take into account the lexical structure used in the real world or the compositional characteristics of characters. In particular, previous string search algorithms have not considered that well-formatted official documents (articles, news, novels, academic papers, patents, etc.) draw on a limited set of characters and compositions. In this paper, we propose a vowel-oriented binary tree that considers the probability of occurrence of each character in real-world documents and its compositional characteristics in well-formatted documents and well-formatted words. Based on the vowel-oriented binary tree, we propose a vowel-centered string search algorithm that searches for a specific word in a document. The frequency and pattern of occurrence of vowels and consonants were analyzed using several dictionaries (Free Dictionary Project Dictionary, Scrabble Helper). We propose a strategy and an algorithm for constructing a vowel-oriented binary tree that can express the frequency and probability patterns of vowel occurrence. The vowel-oriented binary tree is reconstructed according to the characteristics of vowel occurrence, and the consonants that lie between vowels are distinguished and expressed. In addition, based on the vowel-oriented binary tree, we propose an enhanced vowel-oriented string search algorithm that quickly searches for words that can occur in real-world documents.
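The abstract does not describe the tree's layout, so the Python sketch below is only an illustration of the general idea and should not be read as the authors' algorithm. It assumes a plain (unbalanced) binary search tree keyed by each word's vowel sequence, with the words that share a vowel sequence (and so differ only in their consonants) stored at the same node. The names `vowel_signature`, `VowelTree`, and `build_tree` are hypothetical, and treating only `a, e, i, o, u` as vowels is also an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumption: English vowels only; 'y' is treated as a consonant.
VOWELS = set("aeiou")


def vowel_signature(word: str) -> str:
    """Return the vowels of `word` in order, e.g. 'search' -> 'ea'."""
    return "".join(ch for ch in word.lower() if ch in VOWELS)


@dataclass
class Node:
    signature: str
    # Words sharing this vowel signature, mapped to their token positions.
    positions: Dict[str, List[int]] = field(default_factory=dict)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class VowelTree:
    """Hypothetical binary search tree ordered by vowel signature."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, word: str, position: int) -> None:
        word = word.lower()
        sig = vowel_signature(word)
        if self.root is None:
            self.root = Node(sig)
        node = self.root
        # Walk down by signature, creating nodes where needed.
        while node.signature != sig:
            if sig < node.signature:
                if node.left is None:
                    node.left = Node(sig)
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(sig)
                node = node.right
        node.positions.setdefault(word, []).append(position)

    def find(self, word: str) -> List[int]:
        """Return token positions of `word`, or [] if it is absent."""
        word = word.lower()
        sig = vowel_signature(word)
        node = self.root
        while node is not None and node.signature != sig:
            node = node.left if sig < node.signature else node.right
        if node is None:
            return []
        # Consonants distinguish words that share the same vowels.
        return node.positions.get(word, [])


def build_tree(text: str) -> VowelTree:
    """Index every alphabetic token of `text` by its position."""
    tree = VowelTree()
    for index, token in enumerate(re.findall(r"[A-Za-z]+", text)):
        tree.insert(token, index)
    return tree
```

In this sketch a lookup first narrows the search by vowel sequence and only then compares full words, so a query whose vowel sequence never occurs in the document is rejected without any full-word comparison. The frequency-driven reconstruction of the tree mentioned in the abstract is not modeled here.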
Title of host publication: Advances in Information and Communication - Proceedings of the 2022 Future of Information and Communication Conference, FICC
Publisher: Springer Science and Business Media Deutschland GmbH
Publication status: Published - 2022
Event: Future of Information and Communication Conference, FICC 2022 - Virtual, Online
Duration: 2022 Mar 3 → 2022 Mar 4
Publication series: Lecture Notes in Networks and Systems
Bibliographical note
Funding Information:
Acknowledgments. This work was supported by the 2021 Korea National Open University Research Fund.
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Keywords
- Occurrence frequency of vowels
- Repetition pattern of vowels
- String search
- Vowel-based string search
- Vowel-oriented binary tree
- Vowel-oriented string search algorithm
ASJC Scopus subject areas
- Control and Systems Engineering
- Signal Processing
- Computer Networks and Communications