Perfection is unattainable: Learning English as a lingua franca (ELF) involves approaching the language as a tongue shared by non-native speakers around the world rather than as a lingo that must be mastered to native-speaker level. Letting go of the idea of speaking 'perfect English' could do wonders for Japanese students' confidence.

While a lot of work has been done on tone and intonation, there has been no large-scale test of whether lexical tone and intonation are, in fact, in competition diachronically. If a functional dependency between lexical tone and intonation exists, tone languages should be more likely than intonational languages to develop grammatical devices, such as particles, word affixes, and changes in word order, to encode utterance-level meaning. On the other hand, if an optimal division of the phonetic space between lexical tone and intonation is often reached cross-linguistically, tonal and intonational languages should exhibit grammatical devices for encoding utterance-level meanings at a similar frequency.

People who create web forms, databases, or ontologies are often unaware of how different people’s names can be in other countries. They build their forms or databases in a way that assumes too much on the part of foreign users. This article will first introduce you to some of the different styles used for personal names, and then to some of the possible implications for handling those on the Web.

Click on the parts that are in the kanji you are looking for. You can click on them again to de-select them.

Amongst the thousands of languages spoken across the world, here are just eighty. How many can you distinguish between?

Nicholas Ostler, author of Ad Infinitum, a history of Latin, and the Chairman of the Foundation for Endangered Languages, compares Latin's presence on the internet (interretialis) to that of a small European language: it is comparable to "Icelandic, Lithuanian or Slovenian". Ostler emails his brother in Latin for fun, and enthusiasts maintain websites such as Circulus Latinus Interretialis (Internet Latin Circle), Grex Latine Loquentium (Flock of those Speaking Latin) and the connected online paper Ephemeris. The Finnish radio station YLE even broadcasts news in Latin.

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, translation, and more.

Last week, while working on new features for our product, I had to find a quick and efficient way to extract the main topics/objects from a sentence. Since I’m using Python, I initially thought it would be a very easy task to achieve with NLTK. However, when I tried its default tools (POS tagger, Parser…), I got quite accurate results, but performance was pretty bad, so I had to find a better way. As in my previous post, I’ll start with the bottom line: here you can find my code for extracting the main topics/noun phrases from a given sentence. It works fine with real sentences (from a blog or news article). It’s a bit less accurate than the default NLTK tools, but it runs much faster!
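
The author's code is linked from the post and is not reproduced here. As a rough illustration of the general idea only, the sketch below groups runs of adjectives and nouns into noun phrases, assuming the sentence has already been tagged with Penn Treebank style tags (JJ for adjectives, NN for nouns). The function name `extract_noun_phrases` and the grouping rule are assumptions made for this example, not the post's actual method.

```python
from typing import List, Tuple

# A token paired with its part-of-speech tag, e.g. ("fox", "NN").
TaggedToken = Tuple[str, str]


def extract_noun_phrases(tagged: List[TaggedToken]) -> List[str]:
    """Collect maximal runs of adjectives followed by at least one noun.

    Hypothetical helper for illustration: tags starting with "JJ" count
    as adjectives and tags starting with "NN" count as nouns.
    """
    phrases: List[str] = []
    current: List[str] = []   # words in the run being built
    has_noun = False          # True once the run contains a noun

    def flush() -> None:
        """Emit the current run if it contains a noun, then reset."""
        nonlocal current, has_noun
        if has_noun:
            phrases.append(" ".join(current))
        current = []
        has_noun = False

    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append(word)
            has_noun = True
        elif tag.startswith("JJ"):
            if has_noun:
                flush()       # an adjective after a noun starts a new phrase
            current.append(word)
        else:
            flush()           # any other tag ends the current run
    flush()
    return phrases


# Hand-tagged input, so the example needs no tagger or corpus.
sentence = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"),
    ("lazy", "JJ"), ("dog", "NN"),
]
print(extract_noun_phrases(sentence))
```

A single pass over the tags like this is cheap compared with full parsing, which is one plausible reason a simpler approach can trade some accuracy for speed.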


translate.google.com/toolkit, posted May '13 by peter in conversion free language nlp online

Google Translator Toolkit is a powerful and easy-to-use editor that helps translators work faster and better.

So what characters can you count on nearly everyone being able to see? To answer this question, I looked at the characters in the intersection of several common fonts: Verdana, Georgia, Times New Roman, Arial, Courier New, and Droid Sans. My thought was that this would make a very conservative set of characters. There are 585 characters supported by all the fonts listed above. Most of the characters with code points up to U+01FF are included. This range includes the code blocks for Basic Latin, Latin-1 Supplement, Latin Extended-A, and some of Latin Extended-B. The rest of the characters in the intersection are Greek and Cyrillic letters and a few scattered symbols. Flat, natural, sharp, and gradient didn’t make the cut.
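
The intersection described above can be sketched with Python sets. This is a minimal illustration, assuming the code points each font supports have already been collected into a set per font. The font names and coverage data below are made-up toy values, not the real coverage of the fonts named in the text, and `common_characters` is a hypothetical helper.

```python
from typing import Dict, Set


def common_characters(coverage: Dict[str, Set[int]]) -> Set[int]:
    """Return the code points present in every font's coverage set."""
    if not coverage:
        return set()
    return set.intersection(*coverage.values())


# Toy data (assumption): three imaginary fonts that all cover A-Z and
# Greek small alpha, while only some cover the flat sign or nabla.
latin_upper = set(range(0x41, 0x5B))          # U+0041..U+005A
coverage = {
    "FontA": latin_upper | {0x3B1, 0x266D},   # adds alpha, music flat sign
    "FontB": latin_upper | {0x3B1},           # adds alpha only
    "FontC": latin_upper | {0x3B1, 0x2207},   # adds alpha, nabla
}

common = common_characters(coverage)
print(len(common))
print("".join(chr(cp) for cp in sorted(common)))
```

A character survives only if every font supports it, which is why a symbol covered by just one or two fonts drops out of the result.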
