Bookmark

An Efficient Way to Extract the Main Topics from a Sentence | The Tokenizer

thetokenizer.com/2013/05/09/efficient-way-to-extract-the-main-topics-of-a-sentence/, posted 2013 by peter in language nlp python toread

Last week, while working on new features for our product, I had to find a quick and efficient way to extract the main topics/objects from a sentence. Since I’m using Python, I initially thought that it would be a very easy task to achieve with NLTK. However, when I tried its default tools (POS tagger, Parser…), I got quite accurate results, but performance was pretty bad. So I had to find a better way. Like I did in my previous post, I’ll start with the bottom line: here you can find my code for extracting the main topics/noun phrases from a given sentence. It works fine with real sentences (from a blog/news article). It’s a bit less accurate than the default NLTK tools, but it works much faster!
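
To make the idea of noun-phrase extraction concrete, here is a minimal sketch. It is not the article's code. It assumes the tokens have already been tagged with Penn Treebank style tags (`NN*` for nouns, `JJ*` for adjectives), and the function name `extract_noun_phrases` is hypothetical.

```python
def extract_noun_phrases(tagged_tokens):
    """Group runs of adjective/noun tokens that end in a noun into phrases.

    tagged_tokens: iterable of (word, tag) pairs, tags assumed to be
    Penn Treebank style (an assumption of this sketch).
    """
    phrases = []
    current = []  # (word, tag) pairs of the run being built

    def flush():
        # A phrase must end in a noun, so drop trailing adjectives.
        while current and not current[-1][1].startswith("NN"):
            current.pop()
        if current:
            phrases.append(" ".join(word for word, _ in current))
        current.clear()

    for word, tag in tagged_tokens:
        if tag.startswith("NN") or tag.startswith("JJ"):
            current.append((word, tag))
        else:
            # Any other tag ends the current run.
            flush()
    flush()
    return phrases


tagged = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"),
    ("lazy", "JJ"), ("dog", "NN"),
]
print(extract_noun_phrases(tagged))
```

Most of the cost in a real pipeline is in the tagging step, which this sketch leaves out entirely.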

Bookmark

translate.google.com/toolkit, posted 2013 by peter in conversion free language nlp online

Google Translator Toolkit is a powerful and easy-to-use editor that helps translators work faster and better.

Bookmark

Delver - a natural language interface to your app

delver.io/, posted 2013 by peter in development language nlp software toread

Down in the depths of your organisation, you have a treasure-trove of valuable data. But how hard is it for your users to retrieve it? Salvage your data with a natural language interface - ask your app English questions, get clear answers and reports back.

Bookmark

High Scalability - DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing

highscalability.com/blog/2013/1/28/duckduckgo-architecture-1-million-deep-searches-a-day-and-gr.html, posted 2013 by peter in development nlp scalability search

This is an interview with Gabriel Weinberg, founder of Duck Duck Go and general all around startup guru, on what DDG’s architecture looks like in 2012.

Bookmark

BBC News - Phone call translator app to be offered by NTT Docomo

www.bbc.co.uk/news/technology-20004210, posted 2012 by peter in japan language mobile nlp voice

An app offering real-time translations is to allow people in Japan to speak to foreigners over the phone with both parties using their native tongue.
NTT Docomo - the country's biggest mobile network - will initially convert Japanese to English, Mandarin and Korean, with other languages to follow.

Even though the translations are bound to be hilariously bad sometimes, this may still be useful in some situations.

Bookmark

Is Writing Style Sufficient to Deanonymize Material Posted Online? « 33 Bits of Entropy

33bits.org/2012/02/20/is-writing-style-sufficient-to-deanonymize-material-posted-online/, posted 2012 by peter in language nlp privacy science

So what exactly did we achieve? Our research has dramatically increased the number of authors that can be distinguished using writing-style analysis: from about 300 to 100,000. More importantly, the accuracy of our algorithms drops off gently as the number of authors increases, so we can be confident that they will continue to perform well as we scale the problem even further. Our work is therefore the first time that stylometry has been shown to have serious implications for online anonymity.

Bookmark

Pattern | CLiPS

www.clips.ua.ac.be/pages/pattern, posted 2011 by peter in development free nlp python software

Pattern is a web mining module for the Python programming language.
It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks).
The module is bundled with 30+ example scripts.
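
For readers unfamiliar with the tf-idf + cosine similarity metric mentioned above, here is a minimal standard-library sketch. It does not use Pattern's API. The function names, the whitespace tokenizer, and the unsmoothed `log(N / df)` weighting are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "python web mining",
    "python text mining",
    "graph network visualization",
]
vecs = tf_idf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # shares two terms with doc 0
print(cosine_similarity(vecs[0], vecs[2]))  # shares no terms with doc 0
```

Terms that occur in every document get a weight of zero under this weighting, so they do not contribute to the similarity.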

Bookmark

The Easy Way to Extract Useful Text from Arbitrary HTML - AI Depot

ai-depot.com/articles/the-easy-way-to-extract-useful-text-from-arbitrary-html/, posted 2011 by peter in ai development nlp python scraping

This article shows you how to write a relatively simple script to extract text paragraphs from large chunks of HTML code, without knowing its structure or the tags used. It works on news articles and blogs pages with worthwhile text content, among others…
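
One common heuristic for this kind of task is to keep only the lines where most of the characters are text rather than markup. The sketch below illustrates that idea and is not the article's script. The function name `extract_text_lines`, the thresholds, and the regex-based tag stripping are assumptions, and the regex is deliberately crude (it does not handle inline scripts or styles).

```python
import re
from html import unescape

# Crude tag matcher, good enough for a sketch.
TAG_RE = re.compile(r"<[^>]*>")


def extract_text_lines(html, min_density=0.5, min_length=20):
    """Keep lines whose text-to-markup ratio and text length are high enough."""
    kept = []
    for line in html.splitlines():
        line = line.strip()
        if not line:
            continue
        text = unescape(TAG_RE.sub("", line)).strip()
        # Share of the line that is visible text rather than tags.
        density = len(text) / len(line)
        if density >= min_density and len(text) >= min_length:
            kept.append(text)
    return kept


sample = "\n".join([
    "<html><head><title>t</title></head>",
    '<div class="nav"><a href="/">Home</a> <a href="/about">About</a></div>',
    "<p>This paragraph holds the actual article text that a reader cares about.</p>",
    '<div id="footer"><span class="c">(c)</span></div>',
])
print(extract_text_lines(sample))
```

Navigation bars and footers tend to be short and tag-heavy, so both thresholds filter them out, while long paragraphs pass.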

Bookmark

Python Package Index : jellyfish 0.1.2

pypi.python.org/pypi/jellyfish/0.1.2, posted 2010 by peter in development free language math nlp python

Jellyfish is a python library for doing approximate and phonetic matching of strings.

...

String comparison:
* Levenshtein Distance
* Damerau-Levenshtein Distance
* Jaro Distance
* Jaro-Winkler Distance
* Match Rating Approach Comparison
* Hamming Distance

Phonetic encoding:
* American Soundex
* Metaphone
* NYSIIS (New York State Identification and Intelligence System)
* Match Rating Codex
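
As an illustration of the first metric in the list, here is a pure-Python sketch of Levenshtein distance using the classic dynamic-programming recurrence. It is not Jellyfish's implementation, and the function name is chosen for this example only.

```python
def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory use grows with the shorter string rather than with the product of both lengths.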

Bookmark

Natural Language Toolkit

www.nltk.org/, posted 2010 by peter in ai development free language nlp python software

Open source Python modules, linguistic data and documentation for research and development in natural language processing and text analytics, with distributions for Windows, Mac OSX and Linux.
