WordNet Provides Morphological Processing Functions
Posted by Gentaro "hibariya" Terada
The WordNet library has handy functions for dealing with irregular inflections, such as morphstr(). Given a word and a part of speech (pos), it tries to find the base form of the word in the database.
When you look up a word in WordNet, the word is often unfamiliar to you. The context of the sentence may tell you whether it is singular or plural, or in the present or past tense. However, you cannot be 100% sure about its base form, because the word may inflect in an unexpected, irregular way. If you are not sure about the base form of a word, morphstr() can help.
For example, morphstr() returns the base forms of irregular words as shown below.
morphstr("automata", NOUN); // "automaton"
morphstr("forsaken", VERB); // "forsake"
morphstr("sought", VERB); // "seek"
morphstr("flown", VERB); // "fly"
morphstr("cacti", NOUN); // "cactus"
morphstr("alumni", NOUN); // "alumnus"
morphstr("axes", NOUN); // "ax"
morphstr("corpora", NOUN); // "corpus"
morphstr("criteria", NOUN); // "criterion"
morphstr("errata", NOUN); // "erratum"
morphstr("feet", NOUN); // "foot"
morphstr("geese", NOUN); // "goose"
morphstr("lice", NOUN); // "louse"
morphstr("mice", NOUN); // "mouse"
morphstr("phenomena", NOUN); // "phenomenon"
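None of these forms can be reached by simply removing a suffix (no rule turns "mice" into "mouse"), which is why WordNet's morphology code consults exception lists kept per part of speech. The following is only a toy sketch of that idea: `toy_exception_lookup` and its table are hypothetical, hold just a few nouns from the list above, and are not part of the WordNet API. The real morphstr() reads its exception lists from the database files and also applies regular suffix rules.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy exception list: a few irregular nouns from the examples
 * above. WordNet keeps its real lists in database files; this table only
 * illustrates the lookup idea. */
struct exception_entry {
    const char *inflected;
    const char *base;
};

static const struct exception_entry toy_noun_exceptions[] = {
    {"cacti", "cactus"},
    {"feet", "foot"},
    {"geese", "goose"},
    {"mice", "mouse"},
    {"phenomena", "phenomenon"},
};

/* Returns the base form of an irregular noun, or NULL if the word is not
 * in the toy table. Regular plurals such as "dogs" are not handled here. */
const char *toy_exception_lookup(const char *word)
{
    size_t n = sizeof toy_noun_exceptions / sizeof toy_noun_exceptions[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(toy_noun_exceptions[i].inflected, word) == 0)
            return toy_noun_exceptions[i].base;
    }
    return NULL;
}
```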
Note that a word might have multiple base forms. To get the next base form of the same word, call the function again with NULL as the first argument and the same part of speech. When there are no more base forms, the function returns NULL.
morphstr("axes", NOUN); // "ax"
morphstr(NULL, NOUN); // "axis"
morphstr(NULL, NOUN); // NULL
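With an installed WordNet library, this usually becomes a loop such as `for (s = morphstr(word, NOUN); s != NULL; s = morphstr(NULL, NOUN))`, run after the database has been opened with wninit(). Because that requires a WordNet installation, the sketch below models only the calling convention, using a hypothetical `toy_morph` function over a small hard-coded table. It is not how WordNet implements morphstr(); `toy_morph`, `toy_table`, and `collect_base_forms` are illustrative names.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_BASES 3

/* Hypothetical toy entry: one inflected form and its base forms, in the
 * order they should be returned. Not WordNet's data format. */
struct toy_entry {
    const char *inflected;
    const char *bases[TOY_MAX_BASES]; /* unused slots are NULL */
};

static const struct toy_entry toy_table[] = {
    {"axes", {"ax", "axis", NULL}},
    {"mice", {"mouse", NULL, NULL}},
};

/* Mimics the convention described above: pass a word to start a lookup,
 * then pass NULL to get the next base form of the same word.
 * Returns NULL when there are no (more) base forms. */
const char *toy_morph(const char *word)
{
    static const struct toy_entry *current = NULL; /* entry being iterated */
    static size_t next = 0;                        /* index of next base form */

    if (word != NULL) {
        current = NULL;
        next = 0;
        for (size_t i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++) {
            if (strcmp(toy_table[i].inflected, word) == 0) {
                current = &toy_table[i];
                break;
            }
        }
    }
    if (current == NULL || next >= TOY_MAX_BASES || current->bases[next] == NULL)
        return NULL;
    return current->bases[next++];
}

/* Collects every base form of `word` into `out` (at most `max` entries)
 * with the "first call with the word, then NULL" loop. Returns the count. */
size_t collect_base_forms(const char *word, const char **out, size_t max)
{
    size_t count = 0;
    for (const char *s = toy_morph(word); s != NULL && count < max; s = toy_morph(NULL))
        out[count++] = s;
    return count;
}
```

For "axes" the loop collects "ax" and then "axis", matching the morphstr() calls above; for a word missing from the toy table it collects nothing.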
Since this function returns a pointer to a static character buffer, subsequent calls with a non-NULL first argument overwrite that buffer. Callers that need to keep a result should therefore copy the returned string before looking up another word, rather than keep using the original pointer.
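The sketch below shows why the copy matters, using a hypothetical `toy_static_result` function that, like morphstr(), returns a pointer to a static buffer (it merely drops a trailing "s", which is not WordNet's logic). A second call overwrites the first result, while the hypothetical `copy_result` helper keeps its own heap copy. The copy is made with malloc() and strcpy(); POSIX strdup() would do the same where available.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy function that, like morphstr(), returns a pointer to a
 * static buffer. It copies `word` and removes one trailing 's'. */
const char *toy_static_result(const char *word)
{
    static char buf[64];
    size_t len = strlen(word);
    if (len >= sizeof buf)
        return NULL; /* word does not fit in the toy buffer */
    memcpy(buf, word, len + 1);
    if (len > 0 && buf[len - 1] == 's')
        buf[len - 1] = '\0';
    return buf;
}

/* Returns a heap copy of the result that survives later calls, or NULL.
 * The caller must free() the returned string. */
char *copy_result(const char *word)
{
    const char *s = toy_static_result(word);
    if (s == NULL)
        return NULL;
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}
```

Calling `toy_static_result("cats")` and then `toy_static_result("dogs")` leaves both returned pointers pointing at "dog", whereas a string obtained from `copy_result("cats")` still reads "cat" until it is freed.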