NGordnet — WordNet Hyponym Explorer
Web-based linguistic analysis tool built in Java as part of UC Berkeley's CS61B Data Structures course. Uses graph traversal and historical language data to explore semantic relationships between words.
NGordnet combines two large datasets: Princeton WordNet (semantic relationships between words) and Google N-Gram historical word frequencies. The system allows users to search for words and explore semantic relationships such as hyponyms — more specific versions of a word — while optionally ranking results by historical usage frequency.
For example, a query for the word change returns related hyponyms such as alteration, increase, jump, leap, modification, transition, and variation — ranked using historical language usage data from the Google Books corpus.
The Challenge
The goal of the project was to extend an existing N-Gram analysis tool with semantic graph queries. Given one or more words, the system must:
- Find all matching WordNet synsets
- Traverse a directed graph of semantic relationships
- Return all reachable hyponyms
- Compute intersections for multi-word queries
- Rank results using historical frequency data
- Return the top-k most frequently used words
The implementation needed to handle datasets containing tens of thousands of words and relationships efficiently.
Technical Implementation
The project is built around three primary components:
Graph
A custom directed graph implementation using adjacency lists. Supports graph construction, efficient neighbor lookup, Depth-First Search (DFS), and reachability queries.
WordNet
Parses synset files and hyponym relationship files. Builds mappings between word → synset IDs and synset ID → words, and exposes APIs for semantic lookups.
HyponymsHandler
The web-facing query handler. Responsible for processing user requests, computing graph traversals, finding intersections between multiple word queries, integrating N-Gram frequency rankings, and returning sorted results.
What I Learned
This project strengthened my understanding of:
- Graph theory and directed graph design
- Depth-First Search (DFS) traversal
- Large-scale data parsing
- HashMaps and adjacency-list representations
- Algorithm design
- Software architecture
- Building APIs around complex data structures
Key Contributions
- Implemented directed graph data structures from scratch
- Built graph traversal algorithms for semantic search
- Parsed and integrated large WordNet datasets
- Added frequency-based ranking using N-Gram data
- Implemented multi-word query intersections
- Developed backend query handling in Java