Constructed Languages Asked by Oliver Mason on August 20, 2021
I’m currently working on an NLP project using toki pona, analysing and generating sentences. I was wondering if there was a structured dictionary available for it, or whether people have attempted to create one.
Let me explain what I mean by “structured dictionary”:
There are only 120-odd words in toki pona. The word jan refers to a person. Compounds are used to be more specific, i.e. jan pona is “friend”, jan utala is “fighter”, jan alasa is “hunter”, etc. This is reminiscent of expressing the meaning of nouns through semantic primitives, a bit like Wilks (1975). So the set of words described by jan would be {“person”, “friend”, “fighter”, “hunter”, …}. There are subsets which are more specific, so jan utala is {“fighter”, “soldier”, “mercenary”, …}. jan utala pi ma would be {“soldier”, “private”, “general”, …}.
You can envisage it as a set of trees, one for each ‘top’ primitive, where the leaves are the word meanings and each root encompasses all of the meanings below it. As you wander down a tree, the path is an ever longer chain of toki pona words, and the set of meanings covered by those words becomes smaller and smaller.
Another example would be ilo, “tool”. ilo toki is a tool for communication: {“telephone”, “telegraph”, “vhf radio”, “loud hailer”, …}; ilo toki uta suli (tool talk mouth big) could be a loud hailer. Another sub-tree would capture wireless communication devices, perhaps ilo toki pi kon.
I know this somewhat goes against the toki pona philosophy of being a simple and small language, but it seems to me to more or less accidentally provide a useful set of semantic primitives that can be used to describe word meanings in general. So before I embark on creating such a structure, has anyone already attempted something similar? Surely there must be dictionaries of multi-word toki pona expressions? I haven’t been able to find a good one yet.
Wilks, Y. (1975). An Intelligent Analyzer and Understander of English. Communications of the ACM 18(5): 264-274.
I think you may find some of what you're looking for in the paper Basic concepts and tools for the Toki Pona minimal and constructed language by Renato Fabbri.
A first Toki Pona Wordnet was constructed relating each of the TP words in the dictionary to English Wordnet synsets [19] through the English lemmas. The canonical (i.e. Princeton) Wordnet only contains nouns, adjectives, verbs, and adverbs. Thus, particles were not considered. Numbers were considered adjectives. Words presented as adjectives in the dictionary were considered both as adjectives and adverbs. Prepositions were considered in all classes. [19]
The TPWordnet class [10], provides such tentative TP Wordnets in their simplest form: the TP words are keys in a dictionary that returns the corresponding synsets.
A "synset" in WordNet is a set of cognitive synonyms, so this seems like a start towards what you want. pali pona!
The references are:
Answered by mattdm on August 20, 2021
It sounds like you're after a trie.
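A minimal sketch of what that could look like, keyed on whole toki pona words rather than characters. The class and method names (`Node`, `PhraseTrie`, `add`, `meanings`) are made up for illustration, and the glosses are simply the examples from the question, not an existing dictionary:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node per toki pona word on the path down from a root primitive."""
    children: dict = field(default_factory=dict)  # next word -> Node
    glosses: set = field(default_factory=set)     # meanings attached exactly here


class PhraseTrie:
    """Hypothetical word-level trie: longer phrases cover fewer meanings."""

    def __init__(self):
        self.root = Node()

    def add(self, phrase, *glosses):
        """Attach one or more English glosses to a toki pona phrase."""
        node = self.root
        for word in phrase.split():
            node = node.children.setdefault(word, Node())
        node.glosses.update(glosses)

    def meanings(self, phrase):
        """Return every gloss stored at the phrase or anywhere below it."""
        node = self.root
        for word in phrase.split():
            node = node.children.get(word)
            if node is None:
                return set()
        found = set()
        stack = [node]
        while stack:
            current = stack.pop()
            found |= current.glosses
            stack.extend(current.children.values())
        return found


trie = PhraseTrie()
trie.add("jan", "person")
trie.add("jan pona", "friend")
trie.add("jan utala", "fighter", "mercenary")
trie.add("jan utala pi ma", "soldier", "general")
trie.add("ilo toki", "telephone", "telegraph")
trie.add("ilo toki uta suli", "loud hailer")

print(trie.meanings("jan utala"))
print(trie.meanings("ilo toki"))
```

Looking up `jan utala` collects its own glosses plus those of `jan utala pi ma`, which matches the "ever smaller set" idea from the question. Treating `pi` as just another path element is a simplification; a real version would probably want to parse the phrase structure first.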
The closest example I can think of is this glosbe site. It has a search tool, which orders matches in the way you want, but not in the structure you want.
The only issue is that there are fan-created word combinations which may not be 'official'.
Answered by Pureferret on August 20, 2021
Proving a negative is always difficult, but as far as I can tell there is no obvious candidate for what you're looking for, and if you made one yourself it would presumably fill an empty niche in the Toki Pona universe. At least, I haven't yet found any sites written in English that seem to fit the bill.
The closest thing I can find to what you are looking for is a Toki Pona corpus search tool, which seems like it would be useful for tracking down examples of usage and might help you find compounds containing a particular word that have already been coined.
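If you have a local copy of some Toki Pona text, a rough way to surface compound candidates for a given head word is to count the word sequences that start with it. This is only a sketch under that assumption: it does not use the search tool's interface, the function name `compound_candidates` is made up, and the three sample sentences are invented for the example.

```python
from collections import Counter


def compound_candidates(sentences, head, max_len=3):
    """Count word sequences of length 2..max_len that begin with `head`."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word != head:
                continue
            for n in range(2, max_len + 1):
                if i + n <= len(words):
                    counts[" ".join(words[i:i + n])] += 1
    return counts


# Invented sample sentences, standing in for a real corpus.
sample = [
    "jan utala li tawa",
    "mi lukin e jan utala",
    "jan pona mi li pona",
]

counts = compound_candidates(sample, "jan")
for phrase, n in counts.most_common():
    print(n, phrase)
```

Sequences that run across particles such as li or e (for example `jan utala li`) are noise rather than compounds, so a real version would want to cut candidates at those particles or filter them out afterwards.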
Also, I have yet to find two simpler things that would be good stepping stones towards a hierarchically organized Toki Pona dictionary:
All of the Toki Pona dictionaries I have found so far define just the basic vocabulary and do not provide entries for any multi-morpheme lexemes, even famous ones like jan utala.
Many of the dictionaries out there are more or less copies of the official dictionary, such as this one. Other dictionaries like this Wiktionary appendix do not provide definitions for whole Toki Pona expressions at a time.
Toki Pona-English parallel texts would help distinguish lexeme boundaries within complex phrases, since the presence or absence of pi is not a completely reliable cue as to whether a "real lexeme boundary" was intended.
Answered by Gregory Nisbet on August 20, 2021