STEMMING ALGORITHM FOR THE BALINESE LANGUAGE USING AN N-GRAM APPROACH
Abstract
Regional languages are part of the nation's heritage and must be preserved. According to a survey conducted by UNESCO, 11% of the world's languages are found in Indonesia. The data show that 9 regional languages in Papua have become extinct and that more than 50 regional languages are threatened with extinction. Balinese is one of Indonesia's regional languages and also faces a threat of extinction. This study aims to create a stemming algorithm that recovers base words from affixed words in Balinese. Stemming is one of the most important algorithms in information retrieval and text mining. The algorithm developed here is intended as a starting point for text-mining and information-retrieval research in Balinese. The results show that the n-gram analysis technique on its own produces unsatisfactory accuracy. The study therefore adds a dictionary to the algorithm and matches candidate base words against the dictionary entries. With this addition, accuracy increases from 54% to 80%, depending on the number of base words in the dictionary.
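As a rough illustration of the idea summarized in the abstract, the sketch below maps an affixed word to the dictionary entry with which it shares the most character n-grams. It is not the authors' implementation: the function names, the use of the Dice coefficient, the bigram size, the 0.5 threshold, and the sample words are all assumptions made for this example.

```python
def char_ngrams(word, n=2):
    """Return the character n-grams of a word (lowercased, no padding)."""
    word = word.lower()
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def dice_similarity(a, b, n=2):
    """Dice coefficient over the sets of character n-grams of a and b."""
    grams_a, grams_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def stem(word, dictionary, n=2, threshold=0.5):
    """Return the dictionary base word most similar to `word`.

    The input is returned unchanged when the dictionary is empty or when
    no entry reaches the (assumed) similarity threshold. Ties are broken
    by taking the alphabetically last base word.
    """
    candidates = [(dice_similarity(word, base, n), base) for base in dictionary]
    if not candidates:
        return word
    score, base = max(candidates)
    return base if score >= threshold else word


# Hypothetical mini-dictionary of base words, for illustration only.
base_words = ["tulis", "jemak", "beli"]

print(stem("tulisang", base_words))  # tulis
print(stem("katulis", base_words))   # tulis
print(stem("xyz", base_words))       # xyz (no match, returned unchanged)
```

In a sketch like this, a word can only be stemmed correctly if its base form is present in the dictionary, which is consistent with the abstract's remark that accuracy depends on the number of base words available.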
Copyright (c) 2020 Smart-Techno
This work is licensed under a Creative Commons Attribution 4.0 International License.