Playing With WordNet as an English Learner

TL;DR: WordNet helps English learners get a clearer grasp of the relationships between words. Its API lets users explore the whole network of English words in their own way, according to their purpose.

WordNet is a huge lexical database of English. Each sense of a word is represented as a “synset” (a set of synonyms) in the database. Each synset has a short definition, often with a couple of example sentences, and pointers to related words.

Unlike entries in ordinary dictionaries, synsets are linked to other synsets, and each link has a relationship type such as synonym, antonym, hypernym, hyponym, holonym, or meronym. For example, both “left” and “wrong” are linked to “right” as its antonyms. “delete” is linked to “remove” as its hyponym (a word that is more specific than a given word). “publication” is linked to “book” as its hypernym (a word that is more generic than a given word).
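
Many of these relationship types come in pairs: if “delete” is a hyponym of “remove”, then “remove” is a hypernym of “delete”. Here is a minimal sketch of that idea in plain C. It does not use the WordNet library at all; the Relation enum and the function names are made up for illustration and are not the constants from “wn.h”.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical relation types for this sketch; NOT the constants from wn.h. */
typedef enum {
  REL_SYNONYM,
  REL_ANTONYM,
  REL_HYPERNYM,
  REL_HYPONYM,
  REL_HOLONYM,
  REL_MERONYM,
  REL_COUNT
} Relation;

/* Human-readable name of a relation type. */
const char *relation_name(Relation r) {
  static const char *const names[REL_COUNT] = {
    "synonym", "antonym", "hypernym", "hyponym", "holonym", "meronym"
  };
  assert((int)r >= 0 && (int)r < REL_COUNT);
  return names[r];
}

/* The same link as seen from its other end. */
Relation inverse_relation(Relation r) {
  switch (r) {
    case REL_HYPERNYM: return REL_HYPONYM;
    case REL_HYPONYM:  return REL_HYPERNYM;
    case REL_HOLONYM:  return REL_MERONYM;
    case REL_MERONYM:  return REL_HOLONYM;
    default:           return r; /* synonym and antonym read the same both ways */
  }
}

/* Prints a link in both directions, using the phrasing from the text above. */
void describe_link(const char *a, Relation r, const char *b) {
  printf("%s is linked to %s as its %s\n", a, b, relation_name(r));
  printf("%s is linked to %s as its %s\n", b, a, relation_name(inverse_relation(r)));
}
```

Calling describe_link("delete", REL_HYPONYM, "remove") prints the link once from each side: delete as a hyponym of remove, then remove as a hypernym of delete.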

To learn a second language, we don't have to be linguists. That said, exploring the relationships between words gives us a more accurate picture of each word. If you know that “delete” is a hyponym of “remove”, you can make an expression more specific by replacing “remove” with “delete” in some contexts. WordNet is a great tool for getting such information.

Using WordNet With the Built-in Interfaces

The easiest way to use WordNet is usually through its web or command-line interface. For example, the following command shows each sense of the given word “flour” with a short description.

$ wn flour -over
Overview of noun flour

The noun flour has 1 sense (first 1 from tagged texts)

1. (3) flour -- (fine powdery foodstuff obtained by grinding and sifting the meal of a cereal grain)

Overview of verb flour

The verb flour has 2 senses (no senses from tagged texts)

1. flour -- (cover with flour; "flour fish or meat before frying it")
2. flour -- (convert grain into flour)

It can also display related words that match a given search type option. The command below displays hyponyms of “impede” as a verb.

$ wn impede -hypov
Troponyms (hyponyms) of verb impede

2 senses of impede

Sense 1
impede, hinder
       => obstruct, obturate, impede, occlude, jam, block, close up
       => inhibit
       => repress
       => interfere
       => set back
       => hobble
       => stunt
...

The rules for these options are a bit tricky. The suffix changes depending on the part of speech of the word, as in -hypov and -hypon. You can also just pass the word you want to look up: without any options, wn tells us which options are available for that word.
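
To make the suffix rule concrete, here is a small sketch that glues a search name and a part-of-speech letter into an option string. build_wn_option is a hypothetical helper of mine, not part of WordNet. I am assuming the letters are n, v, a, and r (noun, verb, adjective, adverb); only n and v appear in the examples here. It also does not know which combinations wn actually accepts for a given word, so running wn with just the word remains the reliable way to find out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes an option such as "-hypov" into out, given a base such as "hypo"
 * and a part-of-speech letter. Hypothetical helper, not part of WordNet.
 * Returns 0 on success, -1 if the letter is not one of n/v/a/r or the
 * buffer is too small.
 */
int build_wn_option(char *out, size_t out_size, const char *base, char pos_letter) {
  assert(out != NULL && base != NULL);

  /* Check '\0' first: strchr would otherwise match the string terminator. */
  if (pos_letter == '\0' || strchr("nvar", pos_letter) == NULL) {
    return -1;
  }

  int written = snprintf(out, out_size, "-%s%c", base, pos_letter);
  if (written < 0 || (size_t)written >= out_size) {
    return -1; /* encoding error or truncated output */
  }
  return 0;
}
```

With a 16-byte buffer, build_wn_option(buf, sizeof buf, "hypo", 'v') fills buf with "-hypov", the option used for “impede” above.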

$ wn association
Information available for noun association
        -antsn          Antonyms
        -hypen          Hypernyms
        -hypon, -treen  Hyponyms & Hyponym Tree
        -synsn          Synonyms (ordered by estimated frequency)
        -membn          Has Member Meronyms
...

The output of the command above shows there are antonyms, hypernyms, hyponyms, and more for “association”.

Using WordNet API

WordNet also provides a C library API. As you have seen, synsets in the database are connected to each other. As the name suggests, WordNet is like a huge graph of English words. So even if the existing interfaces do not quite fit your purpose, you can still do what you want with the API. The whole graph can be traced in any way you like.

Find Synsets for a Word and POS

First off, let me explain more about synsets. A synset expresses one particular sense; it does not represent a whole word. A word can belong to several parts of speech and have multiple senses. “right” can be used as a noun, adjective, adverb, and also verb. In one sense, “right” expresses a direction, but in another sense, it means a legal entitlement. The point is that a word and a synset are different concepts, and a single word can appear in multiple synsets.
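
Before touching the real API, the word/synset distinction can be sketched with a tiny hand-written table. It stores one row per sense, so the same word shows up once for every synset it belongs to. The ToySense struct and the functions are hypothetical, and the rows are just the three senses of “flour” from the wn output earlier (with the example sentence left out), not data read from the database.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row per sense. Hypothetical layout, unrelated to the structs in wn.h. */
typedef struct {
  const char *word;
  char pos;            /* 'n' = noun, 'v' = verb */
  const char *gloss;
} ToySense;

static const ToySense toy_senses[] = {
  {"flour", 'n', "fine powdery foodstuff obtained by grinding and sifting the meal of a cereal grain"},
  {"flour", 'v', "cover with flour"},
  {"flour", 'v', "convert grain into flour"},
};

#define TOY_SENSE_COUNT (sizeof toy_senses / sizeof toy_senses[0])

/* Counts how many senses the toy table holds for a word and part of speech. */
int count_senses(const char *word, char pos) {
  int count = 0;
  assert(word != NULL);
  for (size_t i = 0; i < TOY_SENSE_COUNT; i++) {
    if (toy_senses[i].pos == pos && strcmp(toy_senses[i].word, word) == 0) {
      count++;
    }
  }
  return count;
}

/* Prints the matching senses as a numbered list. */
void print_senses(const char *word, char pos) {
  int number = 0;
  assert(word != NULL);
  for (size_t i = 0; i < TOY_SENSE_COUNT; i++) {
    if (toy_senses[i].pos == pos && strcmp(toy_senses[i].word, word) == 0) {
      printf("%d. %s -- (%s)\n", ++number, word, toy_senses[i].gloss);
    }
  }
}
```

Here count_senses("flour", 'n') is 1 and count_senses("flour", 'v') is 2: one word, three senses. The real API makes the same split, with a word and part of speech on one side and a list of synsets on the other.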

Here's an example function that prints each sense of a word.

void display_synsets(char* word, int pos) { // pos (part of speech) can be NOUN, VERB, ADJ, ADV, or SATELLITE
  wninit();

  Index* index = getindex(word, pos);

  if (!index) {
    puts("Failed to get the index.");
    exit(1);
  }

  for (int i = 0; i < index->off_cnt; i++) {
    Synset* synset = read_synset(pos, index->offset[i], word);

    for (int j = 0; j < synset->wcount; j++) {
      printf("%s#%d; ", synset->words[j], synset->wnsns[j]);
    }
    puts(synset->defn);
    display_linked_synsets(synset);
    puts("");

    free_synset(synset);
  }

  free_index(index);
}

Before calling any other API function, we have to open the database files with wninit.

getindex takes a word and a pos (part of speech) and returns an index that contains information about the synsets that match them. The index has off_cnt (the number of synsets) and offset (an array holding the location of each synset).

read_synset reads a synset from the database at the given pos and offset and returns it.

A synset consists of a set of words that represent a particular sense, plus its metadata. The set of words can be iterated with wcount and words. wnsns holds the sense number for each word. A synset also has a short description of the sense as defn.

An index and a synset have to be freed with free_index and free_synset when you are done with them.

Trace the Graph

The ability to express relationships between different senses is the beauty of WordNet. There's no reason not to take advantage of it. Let's trace the graph.

A synset holds information about other related synsets, such as its hypernyms and hyponyms. By reading these adjacent synsets one by one, you can move around the graph connected to the first synset you got.
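
One thing to watch when moving around a graph like this: if links can lead back to where you came from, a naive walk never ends. The sketch below shows a breadth-first walk with a visited list over a tiny hand-made graph. Nothing in it calls the WordNet library; ToySynset and walk_graph are hypothetical names, and the back links from the three neighbours to “facilitate#1” are my assumption for the sake of the example.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_NODES 4
#define MAX_LINKS 3

/* Hypothetical stand-in for a synset: a label plus indexes of linked nodes. */
typedef struct {
  const char *label;
  int link_count;
  int links[MAX_LINKS];
} ToySynset;

static const ToySynset toy_graph[TOY_NODES] = {
  {"facilitate#1",   3, {1, 2, 3}},
  {"help#1",         1, {0}},      /* assumed back link */
  {"facilitation#1", 1, {0}},      /* assumed back link */
  {"facilitator#1",  1, {0}},      /* assumed back link */
};

/*
 * Breadth-first walk from start. order[] doubles as the queue and, when the
 * function returns, holds the nodes in the order they were reached.
 * Returns the number of nodes reached.
 */
int walk_graph(int start, int order[TOY_NODES]) {
  int visited[TOY_NODES] = {0};
  int head = 0;
  int tail = 0;

  assert(start >= 0 && start < TOY_NODES);
  visited[start] = 1;
  order[tail++] = start;

  while (head < tail) {
    const ToySynset *node = &toy_graph[order[head++]];
    for (int i = 0; i < node->link_count; i++) {
      int next = node->links[i];
      if (!visited[next]) {   /* skip nodes we have already queued */
        visited[next] = 1;
        order[tail++] = next;
      }
    }
  }
  return tail;
}

/* Prints the labels of every node reachable from start. */
void print_walk(int start) {
  int order[TOY_NODES];
  int reached = walk_graph(start, order);
  for (int i = 0; i < reached; i++) {
    printf("%s\n", toy_graph[order[i]].label);
  }
}
```

Starting from node 1 (“help#1”), the walk reaches all four nodes exactly once even though every link has a partner pointing back. With the real database you would presumably key the visited list on the part of speech and offset of each synset instead of a small array index.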

The following example reads all the synsets linked from the given one, and prints the words and the definition for each of them.

void display_linked_synsets(Synset* synset) {
  for (int i = 0; i < synset->ptrcount; i++) {
    printf("  -- %s --> ", type_labels[synset->ptrtyp[i]]);
    Synset* another = read_synset(synset->ppos[i], synset->ptroff[i], "");
    for (int j = 0; j < another->wcount; j++) {
      printf("%s(%s)#%d; ", another->words[j], another->pos, another->wnsns[j]);
    }
    puts("");
    free_synset(another);
  }
}

ptrcount is the number of synsets linked from the synset. The following members hold information about each linked synset.

  • ptrtyp: The type of the relationship (e.g., hypernym). Corresponding constants can be found in “wn.h” (from ANTPTR to INSTANCES).
  • ppos: The part of speech of the synset. Corresponding constants can be found in “wn.h” (from NOUN to SATELLITE).
  • ptroff: The offset of the synset.
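
Since ptrtyp is used directly as an array index when printing a label, it seems safer to guard the lookup in case a value falls outside the table. Below is a minimal sketch of such a guard. The table is deliberately shortened to the first few entries, and toy_labels and label_for are hypothetical names of mine, not something “wn.h” provides.

```c
#include <stddef.h>

/* Shortened, hypothetical label table; index 0 is the fallback entry. */
static const char *const toy_labels[] = {
  "(unknown)",
  "ANTPTR",
  "HYPERPTR",
  "HYPOPTR"
};

#define TOY_LABEL_COUNT (sizeof toy_labels / sizeof toy_labels[0])

/* Returns the label for a pointer type, or "(unknown)" when out of range. */
const char *label_for(int type) {
  if (type < 0 || (size_t)type >= TOY_LABEL_COUNT) {
    return toy_labels[0];
  }
  return toy_labels[type];
}
```

With this table, label_for(2) gives "HYPERPTR", while label_for(99) and label_for(-1) both fall back to "(unknown)" instead of reading past the end of the array.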

Conclusion

Go to TL;DR

Example Code

#include <stdio.h>
#include <stdlib.h>
#include "wn.h"

static char* type_labels[] = {
  "(unknown)",
  "ANTPTR",
  "HYPERPTR",
  "HYPOPTR",
  "ENTAILPTR",
  "SIMPTR",
  "ISMEMBERPTR",
  "ISSTUFFPTR",
  "ISPARTPTR",
  "HASMEMBERPTR",
  "HASSTUFFPTR",
  "HASPARTPTR",
  "MERONYM",
  "HOLONYM",
  "CAUSETO",
  "PPLPTR",
  "SEEALSOPTR",
  "PERTPTR",
  "ATTRIBUTE",
  "VERBGROUP",
  "DERIVATION",
  "CLASSIFICATION",
  "CLASS",
  "SYNS",
  "FREQ",
  "FRAMES",
  "COORDS",
  "RELATIVES",
  "HMERONYM",
  "HHOLONYM",
  "WNGREP",
  "OVERVIEW",
  "CLASSIF_CATEGORY",
  "CLASSIF_USAGE",
  "CLASSIF_REGIONAL",
  "CLASS_CATEGORY",
  "CLASS_USAGE",
  "CLASS_REGIONAL",
  "INSTANCE",
  "INSTANCES"
};

void display_linked_synsets(Synset* synset) {
  for (int i = 0; i < synset->ptrcount; i++) {
    printf("  -- %s --> ", type_labels[synset->ptrtyp[i]]);
    Synset* another = read_synset(synset->ppos[i], synset->ptroff[i], "");
    for (int j = 0; j < another->wcount; j++) {
      printf("%s(%s)#%d; ", another->words[j], another->pos, another->wnsns[j]);
    }
    puts("");
    free_synset(another);
  }
}

void display_synsets(char* word, int pos) { // pos (part of speech) can be NOUN, VERB, ADJ, ADV, or SATELLITE
  wninit();

  Index* index = getindex(word, pos);

  if (!index) {
    puts("Failed to get the index.");
    exit(1);
  }

  for (int i = 0; i < index->off_cnt; i++) {
    Synset* synset = read_synset(pos, index->offset[i], word);

    for (int j = 0; j < synset->wcount; j++) {
      printf("%s#%d; ", synset->words[j], synset->wnsns[j]);
    }
    puts(synset->defn);
    display_linked_synsets(synset);
    puts("");

    free_synset(synset);
  }

  free_index(index);
}

int main(int argc, char *argv[]) {
  // For example, display synsets for "facilitate" as a verb
  display_synsets("facilitate", VERB);

  return 0;
}

Output

facilitate#1; ease#3; alleviate#2; (make easier; "you could facilitate the process by sharing your knowledge")
  -- HYPERPTR --> help(v)#1; assist(v)#1; aid(v)#1;
  -- DERIVATION --> easing(n)#1; moderation(n)#2; relief(n)#7;
  -- DERIVATION --> facilitative(s)#1;
  -- DERIVATION --> facilitation(n)#1;
  -- DERIVATION --> facilitation(n)#3;
  -- DERIVATION --> facilitator(n)#1;

help#3; facilitate#2; (be of use; "This will help to prevent accidents")
  -- HYPERPTR --> serve(v)#3;
  -- DERIVATION --> facilitation(n)#3;
  -- DERIVATION --> aid(n)#1; assistance(n)#2; help(n)#3;
  -- DERIVATION --> aid(n)#2; assist(n)#1; assistance(n)#1; help(n)#1;

facilitate#3; (increase the likelihood of (a response); "The stimulus facilitates a delayed impulse")
  -- HYPERPTR --> cause(v)#1; do(v)#5; make(v)#5;
  -- CLASSIF_USAGE --> physiology(n)#2;
  -- DERIVATION --> facilitatory(s)#1;
  -- DERIVATION --> facilitation(n)#2;

Gentaro "hibariya" Terada

Otaka-no-mori, Chiba, Japan

Loves Ruby, Internet, and Programming.