The Perl scripts included in the BootCaT toolkit implement an
iterative procedure to bootstrap specialized corpora and terms from
the web, requiring only a list of "seeds" (terms that are expected
to be typical of the domain of interest) as input.
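
As a rough illustration of the iterative procedure described above, the following sketch grows a corpus and a term list from a handful of seeds. It is written in Python rather than Perl and is not the toolkit's actual code: all function names are hypothetical, a small in-memory list of documents stands in for the web, and both the query strategy (random tuples of terms) and the term extraction (plain word frequency) are simplifying assumptions made for this example.

```python
import random
from collections import Counter


def build_queries(terms, tuple_size, n_queries, rng):
    """Combine terms into random tuples; each tuple stands in for one search query.

    Assumption of this sketch: a query is a small random combination of terms.
    """
    pool = sorted(terms)  # sorted for reproducibility with a fixed seed
    return [tuple(rng.sample(pool, tuple_size)) for _ in range(n_queries)]


def retrieve(queries, documents):
    """Toy 'search engine': keep documents containing every term of some query."""
    hits = []
    for doc in documents:
        words = set(doc.lower().split())
        if any(all(term in words for term in query) for query in queries):
            hits.append(doc)
    return hits


def extract_terms(corpus, stopwords, top_n):
    """Pick the most frequent non-stopword tokens as candidate terms (assumption)."""
    counts = Counter(
        word
        for doc in corpus
        for word in doc.lower().split()
        if word not in stopwords
    )
    return [word for word, _ in counts.most_common(top_n)]


def bootstrap(seeds, documents, stopwords, rounds=2, tuple_size=2,
              n_queries=5, top_n=5, seed=0):
    """Iterate: terms -> queries -> documents -> corpus -> new terms."""
    rng = random.Random(seed)
    terms = set(seeds)
    corpus = []
    for _ in range(rounds):
        queries = build_queries(terms, tuple_size, n_queries, rng)
        for doc in retrieve(queries, documents):
            if doc not in corpus:
                corpus.append(doc)
        # Terms found in this round feed the next round's queries.
        terms |= set(extract_terms(corpus, stopwords, top_n))
    return corpus, sorted(terms)
```

Calling bootstrap(["insulin", "patient"], documents, stopwords) on a small list of strings returns the documents that were reached and the enlarged term list. Each step is a separate function, so any one of them can be run or replaced on its own.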
In implementing the algorithm, we followed the old UNIX adage that
each program should do only one thing, but do it well. Thus, we
developed a small, independent tool for each separate subtask of the
algorithm.
As a result, BootCaT is extremely modular: One can easily run a subset
of the programs, look at intermediate output files, add new tools to
the suite, or change one program without having to worry about the
others.
For more information about the BootCaT tools, please take a look at
the Readme file, available from the download page.