Web as Corpus
Post-processing
Tools for post-processing downloaded text
Available tools:
shared_ngrams_collector.pl
version 0.41 (2005-07-26)
Perl script for near-duplicate detection.
PotaModule
version 1.0 (2005-09-07)
Perl module for boilerplate stripping. The archive contains the module and sample scripts; see the readme file for more information.
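As a rough illustration of what near-duplicate detection by shared n-grams can look like, here is a minimal Python sketch. It is not the code of shared_ngrams_collector.pl and makes no claim about how that script works; the function names, the whitespace tokenization, the n-gram length of 5, and the `min_shared` threshold are all assumptions chosen for the example.

```python
from itertools import combinations


def ngrams(text, n=5):
    """Return the set of word n-grams in text (hypothetical helper)."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngram_pairs(docs, n=5, min_shared=2):
    """List document pairs that share at least min_shared n-grams.

    docs maps a document name to its text. Each result is a tuple
    (name_a, name_b, number_of_shared_ngrams).
    """
    grams = {name: ngrams(text, n) for name, text in docs.items()}
    pairs = []
    for a, b in combinations(sorted(grams), 2):
        shared = len(grams[a] & grams[b])  # size of the set intersection
        if shared >= min_shared:
            pairs.append((a, b, shared))
    return pairs


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox jumps over the lazy dog today",
        "b": "the quick brown fox jumps over the lazy cat today",
        "c": "completely different text about corpora and crawling the web",
    }
    for a, b, shared in shared_ngram_pairs(docs):
        print(a, b, shared)
```

Comparing every pair of documents is quadratic in the number of documents, so this sketch only suits small collections; a larger crawl would more plausibly sample n-grams or index them first.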
©2004 SSLMIT (University of Bologna).