"Lorca Corpora Collection"
The Lorca corpus is a one-million-word corpus of the complete works of the Spanish poet and playwright Federico García Lorca. It was developed as part of the ECEI (Edición Crítica Electrónica Integral) project.
The project aims to publish a hypertextual edition of Lorca's work that lets users both browse the texts and carry out advanced linguistic research on them. To support the latter, the corpus is tokenized, part-of-speech (POS) tagged and lemmatized.
The corpus has been indexed using the IMS Corpus WorkBench.
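Corpora indexed with the IMS Corpus WorkBench are usually prepared in a "vertical" format: one token per line, with tab-separated positional attributes and XML-like tags marking structural attributes. The Python sketch below parses a small fragment of that kind. It assumes the positional attributes are word form, POS tag and lemma (the annotation layers mentioned above) and uses a hypothetical `poem` element with a `title` attribute; the tags and attribute order are illustrative only, not taken from the actual Lorca corpus files.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical fragment in CWB-style vertical format: one token per line with
# tab-separated positional attributes (word, pos, lemma), plus XML-like tags
# for a structural attribute. Tags and attribute order are illustrative only.
SAMPLE = (
    '<poem title="Romance sonámbulo">\n'
    "Verde\tADJ\tverde\n"
    "que\tCONJ\tque\n"
    "te\tPRON\ttú\n"
    "quiero\tVERB\tquerer\n"
    "verde\tADJ\tverde\n"
    "</poem>\n"
)


@dataclass
class Token:
    word: str
    pos: str
    lemma: str
    poem: Optional[str]  # title of the enclosing <poem> region, if any


def parse_vertical(text: str) -> List[Token]:
    """Turn vertical-format text into Token objects."""
    tokens: List[Token] = []
    current_poem: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("</"):
            # A closing tag ends the current structural region.
            current_poem = None
        elif line.startswith("<"):
            # Opening tag: crudely pull out the (hypothetical) title attribute.
            if 'title="' in line:
                current_poem = line.split('title="', 1)[1].split('"', 1)[0]
            else:
                current_poem = None
        else:
            # Token line: split the tab-separated positional attributes.
            word, pos, lemma = line.split("\t")
            tokens.append(Token(word, pos, lemma, current_poem))
    return tokens


for tok in parse_vertical(SAMPLE):
    print(tok.word, tok.pos, tok.lemma, tok.poem, sep="\t")
```

Keeping the structural region as a field on each token makes it easy to restrict later counts or searches to a single poem, which loosely mirrors how structural attributes are used in queries.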
From this page you can access three collections of poems included in the main corpus:
: Primer Romancero Gitano
: Sonetos del Amor Oscuro
The three subcorpora can be queried separately.
For a quick tutorial on querying the corpus with the CQP query language, and for information about the properties we encoded as positional and structural attributes, please read the advanced query how-to (see the link in the left sidebar).
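To give a rough idea of what a single-token CQP query such as `[lemma="verde"]` does, here is a Python sketch that mimics its matching behaviour: each attribute constraint is a regular expression that must match the whole attribute value, and several constraints are combined with AND. This only illustrates the semantics; it is not how CQP is implemented, and the attribute names (`word`, `pos`, `lemma`) and tag values are assumptions, not the corpus's documented attributes.

```python
import re
from typing import Dict, List

# A toy corpus: one dict of positional attributes per token. Attribute
# names and tag values are hypothetical, not the corpus's documented ones.
CORPUS: List[Dict[str, str]] = [
    {"word": "Verde", "pos": "ADJ", "lemma": "verde"},
    {"word": "que", "pos": "CONJ", "lemma": "que"},
    {"word": "te", "pos": "PRON", "lemma": "tú"},
    {"word": "quiero", "pos": "VERB", "lemma": "querer"},
    {"word": "verde", "pos": "ADJ", "lemma": "verde"},
]


def matches(token: Dict[str, str], constraints: Dict[str, str],
            ignore_case: bool = False) -> bool:
    """True if every attribute value fully matches its regex (constraints ANDed)."""
    flags = re.IGNORECASE if ignore_case else 0
    return all(
        re.fullmatch(pattern, token[attr], flags) is not None
        for attr, pattern in constraints.items()
    )


def query(corpus: List[Dict[str, str]], constraints: Dict[str, str],
          ignore_case: bool = False) -> List[int]:
    """Return the corpus positions of tokens satisfying the constraints."""
    return [i for i, tok in enumerate(corpus)
            if matches(tok, constraints, ignore_case)]


print(query(CORPUS, {"lemma": "verde"}))                   # like [lemma="verde"] → [0, 4]
print(query(CORPUS, {"word": "verde"}))                    # case-sensitive → [4]
print(query(CORPUS, {"word": "verde"}, ignore_case=True))  # like [word="verde"%c] → [0, 4]
print(query(CORPUS, {"pos": "ADJ|VERB", "lemma": "q.*"}))  # like [pos="ADJ|VERB" & lemma="q.*"] → [3]
```

As in CQP, matching is case-sensitive unless case folding is requested (the `%c` flag in CQP), which is why searching the word form `verde` misses the capitalised first token while searching the lemma finds both.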
For information on extracting n-gram frequency lists, please read the frequency lists how-to (see the link in the left sidebar).
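As a rough illustration of what an n-gram frequency list contains, the Python sketch below counts every contiguous sequence of n tokens in a list of word forms. It is not the extraction procedure described in the how-to; the input is a short hypothetical token list chosen only for illustration.

```python
from collections import Counter
from typing import List


def ngram_frequencies(tokens: List[str], n: int) -> Counter:
    """Count every contiguous sequence of n tokens."""
    # Zipping n staggered copies of the list yields the n-grams as tuples.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


# Hypothetical input: lowercased word forms of a short sample phrase.
tokens = "verde que te quiero verde verde viento verdes ramas".split()

unigrams = ngram_frequencies(tokens, 1)
bigrams = ngram_frequencies(tokens, 2)

print(unigrams.most_common(1))       # → [(('verde',), 3)]
print(bigrams[("verde", "viento")])  # → 1
```

This sketch counts word forms, so `verdes` is kept apart from `verde`; counting lemmas instead would merge them. It also lets n-grams run across line and poem boundaries, which a real frequency list may choose to avoid.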
For the tags used in POS tagging, please consult the Tag List (see the link in the left sidebar).
Detailed, dynamically generated information about the corpus can be found on the local Corpus Information page.
This interface is still experimental and under constant revision: please check back often, and let us know what you think.