A Stemming Algorithm for the Balinese Language Using an N-Gram Approach
Abstract
Regional languages are part of the nation's heritage and must be preserved. According to a UNESCO survey, 11% of the world's languages are found in Indonesia. The available data show that 9 regional languages in Papua have already become extinct and that more than 50 regional languages are threatened with extinction. Balinese is one of Indonesia's regional languages and also faces this threat. This study aims to develop a stemming algorithm that recovers root words from affixed words in Balinese. Stemming is one of the most important algorithms in information retrieval and text mining, and the algorithm developed here is intended as a starting point for research in these fields for Balinese. The results show that an approach based on n-gram analysis alone produces unsatisfactory accuracy. An additional step was therefore introduced: the candidate root words are matched against the entries of a dictionary. With this step, accuracy increases from 54% to 80%, depending on the number of root words in the dictionary.
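The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of "n-gram analysis combined with dictionary matching", not the authors' implementation. It assumes character bigrams, the Dice coefficient as the similarity measure, and a cut-off of 0.5. The function names (`char_ngrams`, `dice`, `stem`), the threshold, and the two-entry toy dictionary `ROOTS` are all hypothetical and chosen for illustration; none of them come from the paper.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word (bigrams by default)."""
    word = word.lower()
    if len(word) < n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b):
    """Dice coefficient between two n-gram sets: 2|A & B| / (|A| + |B|)."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def stem(word, dictionary, n=2, threshold=0.5):
    """Return the dictionary root most similar to `word`.

    Assumption: a word that has no dictionary root with a similarity of
    at least `threshold` is returned unchanged.
    """
    word = word.lower()
    if word in dictionary:
        return word  # already a root word
    grams = char_ngrams(word, n)
    best, best_score = word, 0.0
    for root in sorted(dictionary):  # sorted for deterministic tie-breaking
        score = dice(grams, char_ngrams(root, n))
        if score > best_score:
            best, best_score = root, score
    return best if best_score >= threshold else word


# Toy dictionary of root words (illustrative entries, not the paper's data).
ROOTS = {"jalan", "tulis"}

if __name__ == "__main__":
    for affixed in ("majalan", "katulis"):
        print(affixed, "->", stem(affixed, ROOTS))
```

In this sketch the result can only be as good as the dictionary's coverage: a word whose root is missing from `ROOTS` is left unstemmed, which is consistent with the abstract's remark that accuracy depends on the number of root words in the dictionary.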
Copyright (c) 2020 Smart-Techno

This work is licensed under a Creative Commons Attribution 4.0 International License.