Synset Representation in The WordNet Library

In the WordNet library, a synset is represented by a struct called Synset, which has 26 members. While experimenting with tracing the WordNet graph, I ended up using about half of them, so let me introduce those.

Basic Information About The Synset Itself

char* defn is a short description of the synset (a.k.a. the gloss). It often includes a couple of short example sentences, too. char* pos is the part of speech of the synset. It is a one-character string like “n” for “noun” or “v” for “verb”. Elsewhere in the library a part of speech is usually represented by an integer, but for some reason that is not the case here. You can get the integer representation of the corresponding character with the utility function int getpos(char*).
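To illustrate the character-to-integer conversion, here is a minimal sketch that does not call the library at all. demo_pos_code and the DEMO_ constants are names I made up; the numeric values and the folding of “s” (adjective satellite) into the adjective code reflect my reading of wn.h and getpos, so treat them as assumptions and use the real getpos in actual code. Unlike this sketch, I would not rely on getpos returning 0 for an unknown character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the part-of-speech constants in wn.h.
   The values are assumptions, not taken from the article. */
enum { DEMO_NOUN = 1, DEMO_VERB = 2, DEMO_ADJ = 3, DEMO_ADV = 4 };

/* Maps a pos string such as "n" or "v" to an integer code.
   Only the first character is inspected. Returns 0 for anything
   this sketch does not recognize. */
int demo_pos_code(const char *pos)
{
    assert(pos != NULL);
    switch (pos[0]) {
    case 'n': return DEMO_NOUN;
    case 'v': return DEMO_VERB;
    case 'a':               /* adjective */
    case 's': return DEMO_ADJ;  /* adjective satellite, assumed to share the code */
    case 'r': return DEMO_ADV;
    default:  return 0;     /* unknown in this sketch */
    }
}
```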

A synset consists of one or more pairs of a word and a sense. int wcount is the number of words in the synset. char** words is an array of the actual words. Another array, int* wnsns, holds each word's sense number at the same index.
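The parallel arrays can be walked with a single index. The sketch below uses a stand-in struct, DemoSynset, that mirrors only the three members just described, so that it compiles without WordNet installed; DemoSynset, print_word_senses, and sense_number_of are names I made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the library's Synset: only the members discussed here. */
typedef struct {
    int wcount;     /* number of words in the synset */
    char **words;   /* the words themselves */
    int *wnsns;     /* sense number of words[i], at the same index */
} DemoSynset;

/* Prints every word together with its sense number, one pair per line. */
void print_word_senses(const DemoSynset *ss)
{
    assert(ss != NULL);
    for (int i = 0; i < ss->wcount; i++)
        printf("%s (sense %d)\n", ss->words[i], ss->wnsns[i]);
}

/* Returns the sense number stored for `word`,
   or -1 if the synset does not contain that word. */
int sense_number_of(const DemoSynset *ss, const char *word)
{
    assert(ss != NULL && word != NULL);
    for (int i = 0; i < ss->wcount; i++) {
        if (strcmp(ss->words[i], word) == 0)
            return ss->wnsns[i];
    }
    return -1;
}
```

With the real library, the same loops should work on a Synset* unchanged, since they touch only wcount, words, and wnsns.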

When loading a synset with read_synset(int, long, char*), you can pass the word you're looking for as the third argument. The returned synset then stores the position of that word in the words array as int whichword. The position starts at 1, so subtract 1 from it before using it as an index into words.
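The off-by-one is easy to get wrong, so here is a small sketch of the lookup. It again uses a made-up stand-in struct (DemoSynset) and a made-up helper (searched_word) instead of the real library. Treating any out-of-range whichword as "no match" is my assumption about what you get when no word was passed or matched.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the library's Synset: only the members discussed here. */
typedef struct {
    int wcount;      /* number of words in the synset */
    char **words;    /* the words themselves */
    int whichword;   /* 1-based position of the searched word */
} DemoSynset;

/* Returns the word that whichword refers to, or NULL when
   whichword is outside 1..wcount (assumed to mean "no match"). */
const char *searched_word(const DemoSynset *ss)
{
    assert(ss != NULL);
    if (ss->whichword < 1 || ss->whichword > ss->wcount)
        return NULL;
    return ss->words[ss->whichword - 1];  /* convert 1-based to 0-based */
}
```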

Additionally, the structure holds its offset in the database as long hereiam.

Synsets in the WordNet database are connected to each other and form a network. By following the synsets adjacent to a synset, you can trace the graph as far as you want. A link from one synset to another is called a pointer. int ptrcount is the number of pointers the synset has. The following members are parallel arrays of length ptrcount, each holding one piece of information about every pointer.

  • long* ptroff: the offset (location) of the target synset in the WordNet database
  • int* ppos: the part of speech of the target synset
  • int* ptrtyp: the pointer type such as ANTPTR (antonym), HYPERPTR (hypernym), and HYPOPTR (hyponym). If the pointer type of a pointer is ANTPTR, the synset that is pointed to is an antonym of the synset that points to it.
  • int* pto: the position of the word in the synset that is pointed to. Since the number starts from 1, the value should be decremented to use it as an array index, like another_synset->words[synset->pto[i] - 1].
  • int* pfrm: the position of the word in the synset that points to the other synset. This also starts from 1.
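To show how these parallel arrays fit together, here is a sketch that collects the offsets of all pointers of one type. It runs against a made-up stand-in struct (DemoSynset) rather than the library's Synset, and the DEMO_ constants are placeholders for ANTPTR, HYPERPTR, and HYPOPTR whose values I am assuming. As far as I can tell the "from" array is named pfrm in wn.h, and a pto or pfrm value of 0 means the pointer relates whole synsets rather than particular words; word_index treats 0 that way, which is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder pointer types; the real constants live in wn.h
   and these values are assumed. */
enum { DEMO_ANTPTR = 1, DEMO_HYPERPTR = 2, DEMO_HYPOPTR = 3 };

/* Stand-in for the library's Synset: only the pointer-related members. */
typedef struct {
    int ptrcount;   /* number of pointers */
    int *ptrtyp;    /* type of pointer i */
    long *ptroff;   /* database offset of the synset pointer i leads to */
    int *ppos;      /* part of speech of that synset */
    int *pto;       /* 1-based word position in the target synset */
    int *pfrm;      /* 1-based word position in this synset */
} DemoSynset;

/* Copies the offsets of all pointers of `type` into `out` (at most `max`
   of them) and returns how many pointers of that type exist in total. */
int offsets_of_type(const DemoSynset *ss, int type, long *out, int max)
{
    int found = 0;
    assert(ss != NULL && out != NULL);
    for (int i = 0; i < ss->ptrcount; i++) {
        if (ss->ptrtyp[i] != type)
            continue;
        if (found < max)
            out[found] = ss->ptroff[i];
        found++;
    }
    return found;
}

/* Converts a 1-based pto/pfrm value to a 0-based index into words,
   or -1 when the value is 0 (assumed: no particular word). */
int word_index(int position)
{
    return position >= 1 ? position - 1 : -1;
}
```

With the real library, each collected offset together with the matching ppos entry is what you would hand to read_synset to load the neighboring synset and keep tracing.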

Gentaro "hibariya" Terada

Otaka-no-mori, Chiba, Japan

Likes Ruby, Internet, and Programming.