University Library Catalogue


Title The New York times annotated corpus [electronic resource].

Published [Philadelphia, Pa.] : Linguistic Data Consortium, c2008.


Location           Call No.        Status
UniM Bail CD-ROM   025.40285 NEW   ASK AT DESK
Physical description 1 DVD-ROM ; 4 3/4 in.
Series LDC corpora ; LDC2008T19
Notes Title from disc label.
Restrictions Access restricted to University of Melbourne staff and students only.
Use of the data must comply with the LDC Terms and Conditions:
Summary The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007, with article metadata provided by the New York Times Newsroom, the New York Times Indexing Service and the online production staff. The corpus includes: over 1.8 million articles (excluding wire service articles that appeared during the covered period); over 650,000 article summaries written by library scientists; over 1,500,000 articles manually tagged by library scientists with tags drawn from a normalized indexing vocabulary of people, organizations, locations and topic descriptors; over 275,000 algorithmically tagged articles that have been hand-verified by the online production staff; and Java tools for parsing corpus documents from .xml into a memory-resident object. As part of the New York Times' indexing procedures, most articles are manually summarized and tagged by a staff of library scientists. This collection contains over 650,000 article-summary pairs, which may prove useful in the development and evaluation of algorithms for automated document summarization. Also, over 1.5 million documents have at least one tag. Articles are tagged for persons, places, organizations, titles and topics using a controlled vocabulary that is applied consistently across articles.
Other author Linguistic Data Consortium.
New York Times Company
Subject New York times -- Abstracting and indexing -- Databases
New York times -- Data processing -- Databases
Computational linguistics -- Databases.
English language -- Data processing -- Databases.
ISBN 1585634865