Morph-it! is a free (as in free speech and free beer) morphological resource for the Italian language (more legal stuff at the bottom of the page).
Morph-it! is a lexicon of inflected forms, each paired with its lemma and morphological features. For example:
The lexicon currently contains 505,074 entries and 35,056 lemmas.
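To make the "entry" vs. "lemma" distinction concrete, here is a minimal sketch that reads lexicon lines and counts both. It assumes a plain-text layout with one entry per line and three tab-separated fields (inflected form, lemma, feature string). The sample lines and feature tags such as `NOUN-M:p` are illustrative assumptions, not copied from the distributed file.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Entry:
    form: str      # inflected form, e.g. "cani"
    lemma: str     # citation form, e.g. "cane"
    features: str  # morphological description (tag format assumed here)


def parse_lexicon(lines: Iterable[str]) -> List[Entry]:
    """Parse lines assumed to be 'form<TAB>lemma<TAB>features'."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        form, lemma, features = line.split("\t", 2)
        entries.append(Entry(form, lemma, features))
    return entries


def count_entries_and_lemmas(entries: List[Entry]) -> Tuple[int, int]:
    """Entries are form/lemma/feature triples; lemmas are distinct lemma strings."""
    return len(entries), len({e.lemma for e in entries})


# Hypothetical sample lines; the tags are illustrative only.
SAMPLE = [
    "cane\tcane\tNOUN-M:s",
    "cani\tcane\tNOUN-M:p",
    "canta\tcantare\tVER:ind+pres+3+s",
]

entries = parse_lexicon(SAMPLE)
print(count_entries_and_lemmas(entries))  # (3, 2)
```

Counting distinct lemma strings is a simplification: homographous lemmas with different parts of speech collapse into one, so this count may not match the official figure exactly.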
Morph-it! can be used as a data source for a lemmatizer / morphological analyzer / morphological generator.
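As a rough sketch of how the same list of triples can feed all three tools, the code below builds two lookup tables: form to (lemma, features) for analysis and lemmatization, and (lemma, features) to form for generation. It is plain dictionary lookup on hypothetical sample data, not the finite-state approach used by the automata described below, and the feature tags are assumed for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, str, str]  # (form, lemma, features); tag format assumed


def build_tables(entries: Iterable[Entry]):
    """Return (analyses, generations) lookup tables built from the lexicon."""
    analyses: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    generations: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for form, lemma, features in entries:
        analyses[form].append((lemma, features))
        generations[(lemma, features)].append(form)
    return dict(analyses), dict(generations)


def analyze(analyses, form: str) -> List[Tuple[str, str]]:
    """Morphological analysis: every (lemma, features) pair for a form."""
    return analyses.get(form, [])


def lemmatize(analyses, form: str) -> List[str]:
    """Lemmatization: the distinct lemmas of a form, sorted."""
    return sorted({lemma for lemma, _ in analyses.get(form, [])})


def generate(generations, lemma: str, features: str) -> List[str]:
    """Morphological generation: the form(s) for a lemma plus features."""
    return generations.get((lemma, features), [])


# "canto" is ambiguous: noun "canto" (song) or verb "cantare" (I sing).
SAMPLE: List[Entry] = [
    ("canto", "canto", "NOUN-M:s"),
    ("canto", "cantare", "VER:ind+pres+1+s"),
    ("cantiamo", "cantare", "VER:ind+pres+1+p"),
]

analyses, generations = build_tables(SAMPLE)
print(lemmatize(analyses, "canto"))                         # ['cantare', 'canto']
print(generate(generations, "cantare", "VER:ind+pres+1+p"))  # ['cantiamo']
```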
As an example application, we provide pre-compiled versions of the lexicon for use with the
SFST Tools and with Jan Daciuk's
FSA utilities. You can download both automata from the box on
the right. Jan Daciuk's automaton (codenamed barba) is platform independent, while the SFST automata (codenamed pippi) are compiled for the i386
platform; if you are on a different platform, we can provide advice on the compilation process.
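The automata store the lexicon so that a word is looked up by following one transition per character. The sketch below illustrates that idea with a plain trie (a deterministic acyclic automaton without minimization). It is a stand-in for illustration only: it does not read or reproduce the SFST or FSA file formats, and the class and method names are hypothetical.

```python
from typing import Dict, List, Tuple


class TrieNode:
    __slots__ = ("children", "analyses")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}     # one transition per character
        self.analyses: List[Tuple[str, str]] = []     # non-empty on accepting states


class LexiconTrie:
    """Hypothetical trie-based lexicon: shared prefixes share states."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, form: str, lemma: str, features: str) -> None:
        node = self.root
        for ch in form:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.analyses.append((lemma, features))

    def lookup(self, form: str) -> List[Tuple[str, str]]:
        node = self.root
        for ch in form:
            node = node.children.get(ch)
            if node is None:
                return []  # no path for this string: unknown form
        return list(node.analyses)

    def node_count(self) -> int:
        """Count states reachable from the root."""
        count, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            count += 1
            stack.extend(node.children.values())
        return count


trie = LexiconTrie()
trie.add("cane", "cane", "NOUN-M:s")   # illustrative tags
trie.add("cani", "cane", "NOUN-M:p")
print(trie.lookup("cani"))   # [('cane', 'NOUN-M:p')]
print(trie.node_count())     # 6: root + c, a, n shared, then e and i
```

A trie only shares prefixes; minimized automata also merge identical suffix structure (such as common inflectional endings), which is a large part of why compiled lexica are compact.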
You can see a demonstration of both automata in the "Demo" section of this page: the SFST Tools provide morphological
analysis, and Jan Daciuk's fsa_guess is the engine behind the guesser part of the demo.
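A guesser proposes analyses for words that are not in the lexicon, typically by looking at their endings. The sketch below is a simplified longest-suffix-match guesser written for illustration; it is not fsa_guess's algorithm or data format. It learns rules of the form "strip k characters, append a string, assign these features" from known entries, indexes them by word ending (up to an assumed maximum of 4 characters), and applies the rules attached to the longest ending an unknown word shares with the lexicon.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Rule = Tuple[int, str, str]  # (chars to strip from form, string to append, features)


def lemma_rule(form: str, lemma: str, features: str) -> Rule:
    """Describe how to turn a form into its lemma, via their longest common prefix."""
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return (len(form) - i, lemma[i:], features)


class SuffixGuesser:
    """Hypothetical guesser: longest matching ending wins, ties by frequency."""

    def __init__(self, entries: Iterable[Tuple[str, str, str]], max_suffix: int = 4):
        self.max_suffix = max_suffix
        self.table: Dict[str, Counter] = defaultdict(Counter)
        for form, lemma, features in entries:
            rule = lemma_rule(form, lemma, features)
            for k in range(1, min(max_suffix, len(form)) + 1):
                if rule[0] <= k:  # the ending must cover every stripped character
                    self.table[form[-k:]][rule] += 1

    def guess(self, word: str, top: int = 3) -> List[Tuple[str, str]]:
        for k in range(min(self.max_suffix, len(word)), 0, -1):
            counts = self.table.get(word[-k:])
            if counts:
                return [
                    (word[: len(word) - strip] + add, feats)
                    for (strip, add, feats), _ in counts.most_common(top)
                ]
        return []  # no known ending matches


# Illustrative training entries (tags assumed, not from the real lexicon).
guesser = SuffixGuesser([
    ("cani", "cane", "NOUN-M:p"),
    ("pani", "pane", "NOUN-M:p"),
    ("canta", "cantare", "VER:ind+pres+3+s"),
    ("parla", "parlare", "VER:ind+pres+3+s"),
])
print(guesser.guess("pesci"))  # [('pesce', 'NOUN-M:p')]
print(guesser.guess("balla"))  # [('ballare', 'VER:ind+pres+3+s')]
```

With such a tiny training set the guesses are easy to get wrong (a form like "gatti" would be mapped to "gatte"); a real guesser trained on the full lexicon sees many competing rules per ending and ranks them.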
The data for Morph-it! were prepared by Marco Baroni and
Eros Zanchetta using a mixture of corpus-based methods,
regular-expression-based rules and manual checking (see the papers below for all the gory details).
Morph-it! is constantly under construction and there is little doubt that it contains errors, gaps, unlikely forms, etc. If you find any problems, we would be grateful if you pointed them out to us (email@example.com).
- Eros Zanchetta and Marco Baroni (2005) Morph-it! A free corpus-based morphological resource for the Italian language. In Proceedings of Corpus Linguistics 2005, University of Birmingham, Birmingham, UK.
- Helmut Schmid, Marco Baroni, Eros Zanchetta and Achim Stein (2007) The Enriched TreeTagger System. In Proceedings of the Evalita 2007 Workshop (10th Congress of the Italian Association for Artificial Intelligence, AI*IA 2007), University of Roma "Tor Vergata", Rome, Italy.
Who's using Morph-it!
Please cite the CL 2005 article if you use Morph-it! in your research.