Synset Representation in The WordNet Library
Posted on by Gentaro "hibariya" Terada
In the WordNet library, a synset is represented by a struct called Synset. The struct has 26 members. While experimenting with tracing the WordNet graph, I ended up using half of them, so let me introduce those.
Basic Information About The Synset Itself
char* defn is a short description of the synset (a.k.a. gloss). It often includes a couple of short example sentences, too.
char* pos is the part of speech of the synset. It is a one-letter string such as "n" for "noun" or "v" for "verb". In many other places the library represents a part of speech as an integer, but for some reason not here. You can get the integer representation of the corresponding character with the utility function
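The idea of mapping the letter to an integer can be sketched as below. This is a hypothetical helper, pos_to_int, and not the library's own utility function. The constant values are assumed to match NOUN, VERB, ADJ and ADV in wn.h, so check your copy of the header. Treating the adjective-satellite letter "s" as a plain adjective is also an assumption made here.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the integer part-of-speech constants.
   Assumed to match NOUN, VERB, ADJ and ADV in wn.h; verify locally. */
enum { MY_NOUN = 1, MY_VERB = 2, MY_ADJ = 3, MY_ADV = 4 };

/* Map the one-letter pos string stored in a synset to an integer.
   Adjective satellites ("s") are folded into adjectives (an assumption).
   Returns 0 for NULL or an unrecognised letter. */
int pos_to_int(const char *pos)
{
    if (pos == NULL)
        return 0;
    switch (pos[0]) {
    case 'n': return MY_NOUN;
    case 'v': return MY_VERB;
    case 'a':
    case 's': return MY_ADJ;
    case 'r': return MY_ADV;
    default:  return 0;
    }
}
```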
A synset consists of one or more pairs of a word and its sense.
int wcount is the number of words in the synset.
char** words is an array of the actual words. Another array, int* wnsns, holds each word's sense number.
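Here is a minimal sketch of walking a synset's word/sense pairs. So that it compiles without the WordNet headers, it uses a hypothetical cut-down struct, MiniSynset, which keeps only the members named above. With the real library you would pass a Synset* instead. The word#sense output format is just a choice for illustration.

```c
#include <stdio.h>

/* Hypothetical cut-down stand-in for the library's Synset struct,
   keeping only the members discussed in this section. */
typedef struct {
    int wcount;    /* number of words in the synset */
    char **words;  /* the words themselves */
    int *wnsns;    /* sense number of each word, parallel to words */
} MiniSynset;

/* Print each word with its sense number, one per line (e.g. "dog#1").
   words and wnsns are parallel arrays of length wcount.
   Returns the number of lines printed. */
int print_word_senses(const MiniSynset *ss, FILE *out)
{
    int i;
    for (i = 0; i < ss->wcount; i++)
        fprintf(out, "%s#%d\n", ss->words[i], ss->wnsns[i]);
    return i;
}
```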
When loading a synset with read_synset(int, long, char*), you can pass the word you're looking for as the third argument. The returned synset then stores that word's position in the array as int whichword. The position starts from 1, so subtract 1 from the value to read the word from
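The 1-based to 0-based conversion can be wrapped in a small guard, sketched below. MiniSynset is again a hypothetical stand-in for Synset. The sketch assumes whichword is 0 when no word was matched, and it returns NULL in that case or whenever the value is out of range.

```c
#include <stddef.h>

/* Hypothetical stand-in for Synset with only the members used here. */
typedef struct {
    int wcount;     /* number of words in the synset */
    char **words;   /* the words themselves */
    int whichword;  /* 1-based position of the searched word; 0 assumed to mean "none" */
} MiniSynset;

/* Return the word that was searched for, or NULL if whichword is 0
   or out of range. Converts the 1-based position to a 0-based index. */
const char *searched_word(const MiniSynset *ss)
{
    if (ss->whichword < 1 || ss->whichword > ss->wcount)
        return NULL;
    return ss->words[ss->whichword - 1];
}
```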
Additionally, the structure holds its offset in the database as
The Links to Its Adjacent Synsets
Synsets in the WordNet database are connected to each other in a network. By following a synset's links to its adjacent synsets, you can trace the network as far as you want. A link from one synset to another is called a pointer.
int ptrcount is the number of pointers. Each of the following members is an array with one entry per pointer, so valid indices run from 0 to ptrcount - 1.
long* ptroff: the offset (location) of the target synset in the WordNet database
int* ppos: the part of speech of the target synset
int* ptrtyp: the pointer type, such as HYPOPTR (hyponym). If the type is ANTPTR, the pointer connects a word in this synset to its antonym, a word in the target synset.
int* pto: the position of the word in the synset that is pointed to. Since the position starts from 1, decrement it to use it as an array index, as in another_synset->words[synset->pto[i] - 1]. For a semantic pointer, which links whole synsets rather than particular words, the value is 0, so check for that before indexing.
int* pfrm: the position of the word in the synset that points to the other synset. This also starts from 1, and is 0 for a semantic pointer.
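The sketch below puts the pointer members together. It finds pointers of a given type and turns pto and pfrm into words. MiniSynset and MY_ANTPTR are hypothetical stand-ins, and MY_ANTPTR is assumed to equal ANTPTR from wn.h. Loading the target synset is left to the caller, for example with read_synset using ppos[i] and ptroff[i].

```c
#include <stddef.h>

/* Assumed to equal ANTPTR in wn.h; check your header. */
#define MY_ANTPTR 1

/* Hypothetical cut-down stand-in for Synset with the pointer members. */
typedef struct {
    int wcount;
    char **words;
    int ptrcount;  /* length of each array below */
    int *ptrtyp;   /* pointer type of each pointer */
    long *ptroff;  /* database offset of each target synset */
    int *pto;      /* 1-based word position in the target, 0 = semantic pointer */
    int *pfrm;     /* 1-based word position in the source, 0 = semantic pointer */
} MiniSynset;

/* Return the index of the first pointer of the given type at or after
   `start`, or -1 if there is none. */
int next_pointer(const MiniSynset *ss, int type, int start)
{
    for (int i = start; i < ss->ptrcount; i++)
        if (ss->ptrtyp[i] == type)
            return i;
    return -1;
}

/* Turn a 1-based word position into a word of `ss`. Returns NULL when
   the position is 0 (a semantic pointer) or out of range. */
const char *word_at(const MiniSynset *ss, int pos)
{
    if (pos < 1 || pos > ss->wcount)
        return NULL;
    return ss->words[pos - 1];
}
```

For pointer i of a synset `from` whose target has been loaded as `to`, word_at(from, from->pfrm[i]) gives the source word and word_at(to, from->pto[i]) gives the target word. Keeping the bounds check in one helper means pto and pfrm are handled the same way, and a semantic pointer simply yields NULL.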