STEMMING ALGORITHM FOR THE BALINESE LANGUAGE USING THE N-GRAM APPROACH

  • I Putu Satwika, S.Kom., M.Kom., STMIK Primakara
  • Helmy Syahk Alam, STMIK Primakara
Keywords: stemming, Balinese language, algorithm, information retrieval

Abstract

Regional languages are part of the nation's heritage and must be preserved. According to a survey conducted by UNESCO, 11% of the world's languages are found in Indonesia. The data show that 9 regional languages in Papua have become extinct and that more than 50 regional languages are threatened with extinction. Balinese is one of Indonesia's regional languages and also faces a risk of extinction. This study aims to create a stemming algorithm that obtains root words from affixed words in Balinese. Stemming is one of the most important algorithms in information retrieval and text mining. The algorithm developed here is intended as a starting point for research on text mining and information retrieval in Balinese. The results show that an approach based on n-gram analysis alone produces unsatisfactory accuracy. The study therefore adds a dictionary to the algorithm and matches candidate root words against the dictionary entries. With this addition, accuracy increases from 54% to 80%, depending on the number of root words in the dictionary.
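
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of combining character n-grams with a root-word dictionary, not the authors' method. It assumes character bigrams, the Dice coefficient as the similarity measure, a threshold of 0.5, and a tiny hypothetical dictionary; the function names and the sample words are illustrative choices.

```python
def bigrams(word):
    """Return the set of character bigrams of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a, b):
    """Dice coefficient between the bigram sets of two words."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def stem(word, roots, threshold=0.5):
    """Return the dictionary root most similar to `word`.

    Assumption: if no root reaches the threshold, the word is
    returned unchanged rather than forced onto a poor match.
    """
    if word in roots:
        return word
    best = max(roots, key=lambda root: dice(word, root))
    return best if dice(word, best) >= threshold else word


# Hypothetical root-word dictionary, for illustration only.
ROOTS = ["adep", "beli", "jemak", "tulis"]

print(stem("adepang", ROOTS))
print(stem("katulis", ROOTS))
```

In a sketch like this, the size of the dictionary directly limits which affixed words can be resolved, which is in line with the abstract's remark that accuracy depends on the number of root words available.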

References

UNESCO, "Biodiversity and linguistic diversity," 23 April 2014. [Online]. Available: http://www.unesco.org/new/en/culture/themes/endangered-languages/biodiversity-and-linguistic-diversity/.

Rinci Kembang Hapsari and Yunus Juli Santoso, “Stemming Artikel Berbahasa Indonesia Dengan Pendekatan Confix-Stripping”. Prosiding Seminar Nasional Manajemen Teknologi XXII, 2015.

Anita Guterres, Gunawan, Joan Santoso, “Stemming Bahasa Tetun Menggunakan Pendekatan Rule Based”, 2019.

Rahardyan Bisma Setya Putra, Ema Utami, Suwanto Raharjo, “Optimalisasi Stemming Kata Berimbuhan Tidak Baku Pada Bahasa Indonesia Dengan Levenshtein Distance”. Jurnal Pengembangan IT (JPIT), Vol.03, No.02, 2018.

DPA, "Home: 140 Bahasa Daerah Di Indonesia Terancam Punah," 23 April 2014. [Online]. Available: http://www.suarapembaruan.com/home/140-bahasa-daerah-di-indonesia-terancam-punah/50053.

JPNN, "Berita: Ratusan Bahasa Daerah Terancam Punah," 05 September 2012. [Online]. Available: http://www.dikti.go.id/id/2012/09/05/ratusan-bahasa-daerah-terancam-punah/.

A. Nazief and M. Adriani, "Confix Stripping: Approach to Stemming Algorithm for Bahasa Indonesia," in ACM Transactions on Asian Language Information Processing, 1996.

L. Agusta, "Perbandingan Algoritma Stemming Porter dengan Algoritma Nazief & Adriani untuk Stemming Dokumen Teks Bahasa Indonesia," in Konferensi Nasional Sistem dan Informatika, Bali, 2009.

Dep. P&K, Tata Bahasa Bali: Proyek Pengembangan Bahasa dan Sastra Indonesia dan Daerah, Denpasar, 1984/1985.

I. Tinggen, Pedoman Perubahan Ejaan Bahasa Bali dengan Huruf Latin dan Huruf Bali, Denpasar, 1987.

P. D. T. I. Dinas Pengajaran Bali, Ejaan Bahasa Daerah Bali yang Disempurnakan (Huruf Latin)., Denpasar, 1990.

A. Hanafi, "Ensiclopedia: Metode N-Gram," Digital Library ITS, 30 April 2009. [Online]. Available: http://digilib.ittelkom.ac.id/index.php?option=com_content&view=article&id=531:metode-n-gram&catid=20:informatika&Itemid=14. [Accessed 23 April 2014].

F. Rahmawan, Implementasi Question Answering System pada Dokumen Bahasa Indonesia menggunakan Metode N-Gram, Bogor: Fak. MIPA, IPB, 2011.

L. Sendy Andrian Sugianto, "PEMBUATAN APLIKASI PREDICTIVE TEXT MENGGUNAKAN METODE N-GRAM-BASED," JURNAL INFRA, vol. I, no. 2, 2013.

H. Sujaini, A. Purwarianti, A. Arman and Kuspriyanto, "Extended word similarity based clustering on unsupervised PoS induction to improve English-Indonesian statistical machine translation," in Conference on Asian Spoken Language Research and Evaluation, 2013.

Published
2020-09-14
Section
Articles