Alta Tokenizer is a tool for converting Kinyarwanda text into tokens.
It was trained on Kinyarwanda text, which gives it a higher compression rate on Kinyarwanda than general-purpose tokenizers trained mostly on other languages.
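One common way to compare tokenizer compression is characters per token: the more characters each token covers on average, the fewer tokens a model needs for the same text. The sketch below illustrates the metric; the Kinyarwanda sentence and token IDs are made-up examples, not real Alta output.

```python
# Sketch of a simple tokenizer-efficiency metric: characters per token.
# A higher value means the tokenizer compresses the text better.

def chars_per_token(text, token_ids):
    """Average number of characters covered by each token."""
    return len(text) / len(token_ids)

# Hypothetical numbers for illustration only (not real Alta measurements):
kinyarwanda_text = "Muraho, amakuru yawe?"
token_ids = [101, 7, 42, 9]  # example output of some tokenizer
print(chars_per_token(kinyarwanda_text, token_ids))  # -> 5.25
```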
What is Tokenization?
Tokenization is the process of converting text into tokens. Language models do not read raw text directly; instead, a tokenizer turns the text into a sequence of integers (token IDs) that the model processes.
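The idea above can be sketched with a toy tokenizer. This example splits on whitespace and assigns each unique word an integer ID; real tokenizers like Alta use learned subword units instead, but the text-to-integers mapping works the same way. All names and sample words here are illustrative.

```python
# Toy word-level tokenizer: text -> integer token IDs -> text.

def build_vocab(corpus):
    """Assign a unique integer ID to each word seen in the corpus."""
    vocab = {}
    for word in corpus.split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def encode(text, vocab):
    """Convert text into a sequence of integer token IDs."""
    return [vocab[word] for word in text.split()]

def decode(ids, vocab):
    """Convert token IDs back into text."""
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)

corpus = "muraho neza muraho mwese"
vocab = build_vocab(corpus)          # {'muraho': 0, 'neza': 1, 'mwese': 2}
ids = encode("muraho mwese", vocab)  # [0, 2]
print(ids)
print(decode(ids, vocab))            # muraho mwese
```

A production tokenizer differs mainly in how the vocabulary is built (subword units learned from data rather than whole words), which is what lets it handle words it has never seen.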
You can use the playground below to see how the tokenizer converts text into tokens.