Synset Representation in The WordNet Library
Posted on by Gentaro "hibariya" Terada
In the WordNet library, a synset is represented by a struct called Synset. There are 26 members in this struct. While experimenting with traversing the WordNet graph, I had a chance to use half of them. Let me introduce them.
Basic Information About The Synset Itself
char* defn is a short description of the synset (a.k.a. gloss). It often includes a couple of short example sentences, too.
char* pos is the part of speech of the synset. It is a short string such as “n” for noun or “v” for verb. Elsewhere in the library a part of speech is usually represented by an integer, but for some reason that is not the case here. You can get the integer representation of the corresponding character with the utility function int getpos(char*).
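To make the string-to-integer conversion concrete, here is a minimal sketch of that kind of mapping. It is a stand-in written for illustration, not the library's getpos: the function name pos_to_int and the SKETCH_ constants are hypothetical, the numeric codes are my assumption about the values wn.h uses, and folding “s” (satellite adjective) into the adjective code is also an assumption.

```c
#include <stddef.h>

/* Assumed numeric codes. The real library defines its own constants in wn.h. */
enum {
    SKETCH_UNKNOWN = 0,
    SKETCH_NOUN    = 1,
    SKETCH_VERB    = 2,
    SKETCH_ADJ     = 3,
    SKETCH_ADV     = 4
};

/* Hypothetical stand-in for getpos(): only the first character is inspected. */
int pos_to_int(const char *pos)
{
    if (pos == NULL)
        return SKETCH_UNKNOWN;

    switch (pos[0]) {
    case 'n': return SKETCH_NOUN;
    case 'v': return SKETCH_VERB;
    case 'a':
    case 's': return SKETCH_ADJ;   /* assumption: satellites count as adjectives */
    case 'r': return SKETCH_ADV;
    default:  return SKETCH_UNKNOWN;
    }
}
```

In real code you would call getpos(synset->pos) instead. The sketch only shows why a one-letter string is enough to recover the integer.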
A synset consists of one or more pairs of a word and a sense number. int wcount is the number of words in the synset. char** words is an array of the actual words. A parallel array, int* wnsns, holds each word's sense number.
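Since words and wnsns are parallel arrays of length wcount, walking them together gives the word/sense pairs. The sketch below uses a stand-in struct (mini_synset) that contains only these three members, so it compiles without the library. The struct name, the format_senses helper, and the "word#sense" output format are my own choices for illustration.

```c
#include <stdio.h>

/* Stand-in with only the members discussed above; not the real Synset. */
struct mini_synset {
    int    wcount;  /* number of words in the synset */
    char **words;   /* the words themselves */
    int   *wnsns;   /* sense number of each word, parallel to words */
};

/*
 * Writes "word#sense" pairs separated by ", " into buf.
 * Returns the number of pairs that fit completely.
 */
int format_senses(const struct mini_synset *ss, char *buf, size_t size)
{
    size_t used = 0;
    int i;

    if (size == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < ss->wcount; i++) {
        int n = snprintf(buf + used, size - used, "%s%s#%d",
                         i > 0 ? ", " : "", ss->words[i], ss->wnsns[i]);
        if (n < 0 || (size_t)n >= size - used) {
            buf[used] = '\0';   /* drop the pair that did not fit */
            break;
        }
        used += (size_t)n;
    }
    return i;
}
```

With the real struct, the same loop works on synset->wcount, synset->words, and synset->wnsns.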
When loading a synset with read_synset(int, long, char*), the word you're looking for can be passed as the third argument. The returned synset stores the position of that word in the words array as int whichword. The position starts at 1, so subtract 1 from it before using it as an index into words.
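The off-by-one is easy to get wrong, so here is a small sketch of the lookup with a range check. It uses a stand-in struct rather than the real Synset, and the helper name searched_word is hypothetical. I am assuming that a whichword outside 1..wcount (for example 0) means no word was matched, and I treat that case as "not found".

```c
#include <stddef.h>

/* Stand-in with only the members needed here; not the real Synset. */
struct mini_synset {
    int    wcount;     /* number of words in the synset */
    char **words;      /* the words themselves */
    int    whichword;  /* 1-based position of the searched word */
};

/* Returns the searched word, or NULL when whichword is out of range. */
const char *searched_word(const struct mini_synset *ss)
{
    if (ss->whichword < 1 || ss->whichword > ss->wcount)
        return NULL;
    return ss->words[ss->whichword - 1];  /* convert 1-based to 0-based */
}
```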
Additionally, the structure holds its own offset in the database as long hereiam.
The Links to Its Adjacent Synsets
Synsets in the WordNet database are connected to each other in a network. By following the synsets adjacent to a given synset, you can traverse the graph as far as you want. A link from one synset to another is called a pointer. int ptrcount is the number of pointers. The following members are arrays that hold one entry per pointer.
long* ptroff: the offset (location) of the target synset in the WordNet database
int* ppos: the part of speech of the target synset
int* ptrtyp: the pointer type, such as ANTPTR (antonym), HYPERPTR (hypernym), and HYPOPTR (hyponym). If the type of a pointer is ANTPTR, the synset it points to is an antonym of the synset that holds the pointer.
int* pto: the position of the word in the target synset. Since the number starts from 1, the value should be decremented to use it as an array index, like another_synset->words[synset->pto[i] - 1].
int* pfrm: the position of the word in the source synset, i.e. the one that holds the pointer. This also starts from 1.
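Putting the pointer arrays together, the sketch below filters pointers by type and resolves the word a pointer refers to in the target synset. It uses a stand-in struct with only the members it needs, the helper names are hypothetical, and the SKETCH_ constants are placeholders for the real ANTPTR, HYPERPTR, and HYPOPTR from wn.h. As far as I understand the database format, a position of 0 marks a pointer between whole synsets rather than between specific words, so the sketch returns NULL in that case instead of indexing with -1.

```c
#include <stddef.h>

/* Placeholder pointer types; real code would use the constants from wn.h. */
enum { SKETCH_ANTPTR = 1, SKETCH_HYPERPTR = 2, SKETCH_HYPOPTR = 3 };

/* Stand-in with only the members used below; not the real Synset. */
struct mini_synset {
    int    wcount;    /* number of words */
    char **words;     /* the words themselves */
    int    ptrcount;  /* number of pointers */
    int   *ptrtyp;    /* type of each pointer */
    long  *ptroff;    /* database offset of each target synset */
    int   *pto;       /* 1-based word position in each target synset */
};

/* Copies the offsets of all pointers of the given type into out (at most max). */
int offsets_of_type(const struct mini_synset *ss, int type, long *out, int max)
{
    int i, n = 0;

    for (i = 0; i < ss->ptrcount && n < max; i++) {
        if (ss->ptrtyp[i] == type)
            out[n++] = ss->ptroff[i];
    }
    return n;
}

/*
 * Returns the word in target that pointer i of src refers to.
 * Returns NULL when the position is 0 or otherwise out of range.
 */
const char *target_word(const struct mini_synset *src, int i,
                        const struct mini_synset *target)
{
    int pos;

    if (i < 0 || i >= src->ptrcount)
        return NULL;
    pos = src->pto[i];
    if (pos < 1 || pos > target->wcount)
        return NULL;
    return target->words[pos - 1];  /* convert 1-based to 0-based */
}
```

In a real traversal, each offset collected this way would be passed, together with the matching ppos entry, to read_synset to load the adjacent synset.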