Explorations in Automatic Thesaurus Discovery

Author: Gregory Grefenstette

Publisher: Springer Science & Business Media

Published: 2012-12-06

Total Pages: 313

ISBN-10: 1461527104

Explorations in Automatic Thesaurus Discovery presents an automated method for creating a first-draft thesaurus from raw text. It describes the natural language processing steps of tokenization, surface syntactic analysis, and syntactic attribute extraction. From these attributes, word and term similarity is calculated, and a thesaurus is created showing important common terms and their relation to each other, common verb-noun pairings, common expressions, and word family members. The techniques are tested on twenty different corpora, ranging from baseball newsgroups, assassination archives, medical X-ray reports, and abstracts on AIDS to encyclopedia articles on animals, and even on the text of the book itself. The corpora range from 40,000 to 6 million characters of text, and results are presented for each in the Appendix. The methods described in the book have undergone extensive evaluation. Their time and space complexity are shown to be modest. The results are shown to converge to a stable state as the corpus grows. The similarities calculated are compared to those produced by psychological testing. A method of evaluation using Artificial Synonyms is tested. Gold Standards evaluations show that the techniques significantly outperform non-linguistic-based techniques for the most important words in corpora. Explorations in Automatic Thesaurus Discovery includes applications to the fields of information retrieval using established testbeds, existing thesaural enrichment, and semantic analysis. Also included are applications showing how to create, implement, and test a first-draft thesaurus.


Survey of Text Mining

Author: Michael W. Berry

Publisher: Springer Science & Business Media

Published: 2013-03-14

Total Pages: 251

ISBN-10: 147574305X

Extracting content from text continues to be an important research problem for information processing and management. Approaches to capture the semantics of text-based document collections may be based on Bayesian models, probability theory, vector space models, statistical models, or even graph theory. As the volume of digitized textual media continues to grow, so does the need for designing robust, scalable indexing and search strategies (software) to meet a variety of user needs. Knowledge extraction or creation from text requires systematic yet reliable processing that can be codified and adapted for changing needs and environments. This book will draw upon experts in both academia and industry to recommend practical approaches to the purification, indexing, and mining of textual information. It will address document identification, clustering and categorizing documents, cleaning text, and visualizing semantic models of text.


Spotting and Discovering Terms Through Natural Language Processing

Author: Christian Jacquemin

Publisher: MIT Press

Published: 2001

Total Pages: 406

ISBN-13: 9780262100854

The acquired parsed terms can then be applied for precise retrieval and assembly of information. (From the book jacket.)


Computational Linguistics and Intelligent Text Processing

Author: Alexander Gelbukh

Publisher: Springer

Published: 2003-08-03

Total Pages: 664

ISBN-10: 3540364560

CICLing 2003 (www.CICLing.org) was the 4th annual Conference on Intelligent Text Processing and Computational Linguistics. It was intended to provide a balanced view of the cutting-edge developments in both the theoretical foundations of computational linguistics and the practice of natural language text processing with its numerous applications. A feature of CICLing conferences is their wide scope that covers nearly all areas of computational linguistics and all aspects of natural language processing applications. The conference is a forum for dialogue between the specialists working in these two areas. This year we were honored by the presence of our keynote speakers Eric Brill (Microsoft Research, USA), Aravind Joshi (U. Pennsylvania, USA), Adam Kilgarriff (Brighton U., UK), and Ted Pedersen (U. Minnesota, USA), who delivered excellent extended lectures and organized vivid discussions. Of 92 submissions received, after careful reviewing 67 were selected for presentation; 43 as full papers and 24 as short papers, by 150 authors from 23 countries: Spain (23 authors), China (20), USA (16), Mexico (13), Japan (12), UK (11), Czech Republic (8), Korea and Sweden (7 each), Canada and Ireland (5 each), Hungary (4), Brazil (3), Belgium, Germany, Italy, Romania, Russia and Tunisia (2 each), Cuba, Denmark, Finland and France (1 each).


Natural Language Processing – IJCNLP 2004

Author: Keh-Yih Su

Publisher: Springer Science & Business Media

Published: 2005-01-31

Total Pages: 827

ISBN-10: 3540244751

This book constitutes the thoroughly refereed post-proceedings of the First International Joint Conference on Natural Language Processing, IJCNLP 2004, held in Hainan Island, China, in March 2004. The 84 revised full papers presented in this volume were carefully selected during two rounds of reviewing and improvement from 211 papers submitted. The papers are organized in topical sections on dialogue and discourse; FSA and parsing algorithms; information extraction and question answering; information retrieval; lexical semantics, ontologies, and linguistic resources; machine translation and multilinguality; NLP software and applications; semantic disambiguation; statistical models and machine learning; taggers, chunkers, and shallow parsers; text and sentence generation; text mining; theories and formalisms for morphology, syntax, and semantics; word segmentation; NLP in mobile information retrieval and user interfaces; and text mining in bioinformatics.


Encyclopedia of Information Science and Technology, First Edition

Author: Mehdi Khosrow-Pour, D.B.A.

Publisher: IGI Global

Published: 2005-01-31

Total Pages: 3807

ISBN-10: 159140794X

Comprehensive coverage of critical issues related to information science and technology.


The Role of Digital Libraries in a Time of Global Change

Author: Gobinda Chowdhury

Publisher: Springer

Published: 2010-06-18

Total Pages: 281

ISBN-10: 3642136540

The year 2010 was a landmark in the history of digital libraries because for the first time this year the ACM/IEEE Joint Conference on Digital Libraries (JCDL) and the annual International Conference on Asia-Pacific Digital Libraries (ICADL) were held together at the Gold Coast in Australia. The combined conferences provided an opportunity for digital library researchers, academics and professionals from across the globe to meet in a single forum to disseminate, discuss, and share their valuable research. For the past 12 years ICADL has remained a major forum for digital library researchers and professionals from around the world in general, and for the Asia-Pacific region in particular. Research and development activities in digital libraries that began almost two decades ago have gone through some distinct phases: digital libraries have evolved from mere networked collections of digital objects to robust information services designed for both specific applications and global audiences. Consequently, researchers have focused on various challenges ranging from technical issues such as networked infrastructure and the creation and management of complex digital objects to user-centric issues such as usability, impact and evaluation. Simultaneously, digital preservation has emerged and remained as a major area of influence for digital library research. Research in digital libraries has also been influenced by several socio-economic and legal issues such as the digital divide, intellectual property, sustainability and business models, and so on. More recently, Web 2.


Progress in Artificial Intelligence

Author: Luís Seabra Lopes

Publisher: Springer

Published: 2009-10-07

Total Pages: 690

ISBN-10: 364204686X

This book contains a selection of high-quality, reviewed papers from the 14th Portuguese Conference on Artificial Intelligence, EPIA 2009, held in Aveiro, Portugal, in October 2009. The 55 revised full papers presented were carefully reviewed and selected from a total of 163 submissions. The papers are organized in topical sections on artificial intelligence in transportation and urban mobility (AITUM), artificial life and evolutionary algorithms (ALEA), computational methods in bioinformatics and systems biology (CMBSB), computational logic with applications (COLA), emotional and affective computing (EAC), general artificial intelligence (GAI), intelligent robotics (IROBOT), knowledge discovery and business intelligence (KDBI), multi-agent systems (MASTA), social simulation and modelling (SSM), text mining and applications (TEMA), as well as web and network intelligence (WNI).


Computational Processing of the Portuguese Language

Author: Nuno J. Mamede

Publisher: Springer Science & Business Media

Published: 2003-06-18

Total Pages: 282

ISBN-10: 3540404368

This book constitutes the refereed proceedings of the 6th International Workshop on Computational Processing of the Portuguese Language, PROPOR 2003, held in Faro, Portugal, in June 2003. The 24 revised full papers and 17 revised short papers presented were carefully reviewed and selected from 64 submissions. The papers are organized in topical sections on speech analysis and recognition; speech synthesis; pragmatics, discourse, semantics, syntax, and the lexicon; tools, resources, and applications; dialogue systems; summarization and information extraction; and evaluation.


Survey of Text Mining II

Author: Michael W. Berry

Publisher: Springer Science & Business Media

Published: 2007-12-10

Total Pages: 243

ISBN-10: 1848000464

This Second Edition brings readers thoroughly up to date with the emerging field of text mining, the application of techniques of machine learning in conjunction with natural language processing, information extraction, and algebraic/mathematical approaches to computational information retrieval. The book explores a broad range of issues, ranging from the development of new learning approaches to the parallelization of existing algorithms. Authors highlight open research questions in document categorization, clustering, and trend detection. In addition, the book describes new application problems in areas such as email surveillance and anomaly detection.

