Wastholm.com

Language and Technology

Hello, you are reading Peter Wastholm's brand new blog about linguistics, natural language processing and computers in general.

Reverting a Zaurus to the Original Sharp Settings

posted 11 Apr by peter

Back to normal. Or at least to the original, Sharp-supplied software.

A few months ago, I installed OpenZaurus on my Zaurus SL-C3100 handheld computer, but I was never really happy with it. This week, I spent a frustrating evening reverting the machine to factory settings. Just in case someone out there might want to do something similar, here's how I pulled it off.

The Problem with OpenZaurus

"Upgrading" from the Sharp ROM that the machine ships with to OpenZaurus seemed like a good idea on paper: a more modern desktop, sorry, palmtop environment (a recent Opie instead of a somewhat old Qtopia); a more recent kernel; and, I thought, a larger selection of applications. That last bit may or may not be true, but all in all, the reality was not so cut and dried.

The Opie environment and the Konqueror Embedded web browser were swell; other things, not so much. The biggest problem was that the volume applet didn't seem to do anything, so I had to use the terminal program alsamixer a lot. I also never managed to get full-screen video to work, or to figure out how to close the lid while the music player was running without suspending the machine. I even had to resort to an ugly hack to get software package management to work. (The package manager was, in itself, great, but it used wget to download stuff, and the busybox version of wget used under OpenZaurus didn't understand all the command-line switches, so I had to hack up a script that pretended to be wget, discarded the offending switches and invoked the actual program. Yuck.) And I missed the neat "combined English and Japanese" feature more than I thought I would. So, rather than moving ahead and trying out the latest and greatest Ångström, I decided to go back to the trusty old Sharp ROM.
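For illustration, here is a minimal sketch of what a wget wrapper of that kind might look like, written in Python rather than whatever the original hack used. The switch names and the location of the real binary are placeholders only: the post doesn't say which switches busybox choked on, so the sets below are assumptions to be replaced with whatever actually causes trouble.

```python
import os

# Hypothetical location of the real (busybox) wget after renaming it.
REAL_WGET = "/usr/bin/wget.real"

# Placeholder switches; the post does not list the ones busybox rejected.
UNSUPPORTED_FLAGS = {"--passive-ftp", "--no-check-certificate"}
# Placeholder switches that are followed by a separate value argument.
UNSUPPORTED_WITH_VALUE = {"-t", "--tries"}


def filter_args(args):
    """Return args with the unsupported switches (and their values) dropped."""
    kept = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This is the value belonging to a dropped switch.
            skip_next = False
            continue
        if arg in UNSUPPORTED_FLAGS:
            continue
        if arg in UNSUPPORTED_WITH_VALUE:
            skip_next = True
            continue
        kept.append(arg)
    return kept


def main(argv):
    """Replace this process with the real wget, minus the offending switches."""
    os.execv(REAL_WGET, [REAL_WGET] + filter_args(argv[1:]))

# In the wrapper script itself, finish with: main(sys.argv) (after "import sys").
```

The sketch only handles switches given as separate words; a form like --tries=3 would need its own check.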

Reinstalling the Sharp ROM

I found these instructions, but following them only got me halfway towards my goal. After I had completed the process outlined on that page, my Zaurus booted nicely — the first time. I then applied the patch to switch the language setting from "all…

The Spamalyzer Gets Her Own Site

posted 8 Mar by peter

The Spamalyzer hard at work.

Assuming that there must be something wrong with the people who send spam, I opened a free advice column for spammers a few years ago. I am not qualified to give such advice myself, but fortunately, the Spamalyzer is.

The Spamalyzer is a slightly customized version of the well-known Eliza psychoanalytical conversation robot; above all, the script that the robot follows is specially crafted to generate reasonable responses (or at least comical non sequiturs) to phrases likely to appear in spam messages. I set it up so that a few of the hundreds of spam messages I receive every day are passed to the Spamalyzer (we are of course unable to help everyone). Every couple of days, I then post her best responses to the advice column, which recently moved from a page here to a site all its own at DearSpamalyzer.com.

Yes, this is silly, but entertaining at times, if you are so inclined. On a more useful note, I am working on software to collect some statistics on the spam I receive. I don't mean quantitative statistics (there are plenty of those already) but rather qualitative statistics, like the "spamminess" of words and phrases and other forms of linguistic analysis. Things like this can be used in rule-based spam filters, like SpamAssassin and its ilk. I'll post the results here and on DearSpamalyzer.com as they become available.
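To make "spamminess" a little more concrete, here is one very simple way such a score could be computed. This is only a sketch under assumptions of my choosing, not the software mentioned above: the function name is made up, messages are split on whitespace, and a word's score is just the fraction of the messages containing it that are spam.

```python
from collections import Counter


def spamminess(spam_messages, ham_messages):
    """Map each word to the share of messages containing it that are spam.

    Assumption: a word counts once per message, however often it occurs there.
    """
    spam_counts = Counter(
        word for message in spam_messages for word in set(message.lower().split())
    )
    ham_counts = Counter(
        word for message in ham_messages for word in set(message.lower().split())
    )
    scores = {}
    for word in set(spam_counts) | set(ham_counts):
        in_spam = spam_counts[word]
        in_ham = ham_counts[word]
        scores[word] = in_spam / (in_spam + in_ham)
    return scores
```

A score near 1.0 means the word shows up almost only in spam. The raw ratio ignores how many spam and non-spam messages there are in total, so a real filter would probably want to normalize for that.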

Toddlers as Data Miners

posted 6 Feb by peter

In an interesting experiment, some researchers at Indiana University have demonstrated that toddlers (in this case, 12- to 14-month-olds) are capable of learning language by "mining" data presented to them. The researchers showed 28 young test subjects a number of pictures, two at a time, while playing two pre-recorded words, without any indication of which word went with which picture. The pictures and words occurred in various combinations, and the children were largely able to figure out which word described which picture through inference.
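The kind of inference involved can be sketched as plain co-occurrence counting. To be clear, this is not the researchers' model, just a toy illustration with made-up names: each trial pairs a set of words with a set of pictures, and each word is finally matched to the picture it appeared with most often.

```python
from collections import defaultdict


def learn_word_meanings(trials):
    """Guess a picture for each word from ambiguous (words, pictures) trials."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, pictures in trials:
        for word in words:
            for picture in pictures:
                # No trial says which word goes with which picture,
                # so every pairing gets counted.
                counts[word][picture] += 1
    # Pick, for each word, the picture it co-occurred with most often.
    return {word: max(seen, key=seen.get) for word, seen in counts.items()}
```

With three trials such as (ball, dog), (ball, cup) and (dog, cup), each word shares two trials with its own picture and only one with each of the others, so the right pairing wins even though no single trial gives it away.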

The researchers are quoted as saying that this "changes completely how we understand children's word learning." I'm not so sure about that, but it's an interesting result.

I found this via Slashdot, which today also ran a story on a lawyer who accidentally outed the proposed terms of a $1 billion legal settlement, simply by misaddressing an email message. (Yeah yeah, cue lawyer jokes.) You'd think, perhaps, that someone entrusted with the handling of such delicate matters would have some kind of idea of how to secure their communication (like, I don't know, not sending sensitive messages unencrypted over the public Internet). Apparently, you'd be wrong.

Startling Discoveries

posted 14 Jan by peter (updated 21 Jan)

Martha Washington. Image from Wikipedia.

Apparently, I have been married before. To a really really famous person. To a really really famous man. A former president of the United States, no less. Strangely, I have no recollection of this.

Apparently, you see, I was the First Lady to U.S. President George Washington. At least according to a page about me at AllMyQuotes.com (if they should change it, here's an earlier version of the page from the Internet Archive).

Oh, and apparently, I died in 1802.

I can only assume that this is the result of a screen scraping program or database import script gone haywire. Both the description they have of me and the years of birth and death match the info in the White House's bio on Martha Washington. But AllMyQuotes.com instead describes Martha as the "First President of the USA," a description that, in real life, of course fits George himself. So it seems the lines are crossed all over the place.

In fact, you might want to explore more of AllMyQuotes.com's author pages to learn that Vladimir Putin was a 19th-century theologian, that the CEO of Sun Microsystems is supermodel Elle Macpherson, that there was once an American historian with the peculiar name French Revolution Motto, and many other fascinating facts.

Full disclosure: I am the editor and webmaster of Aphorisms Galore!, which, I am sure, also contains factual errors, misattributions and other mistakes. I'm working on eliminating them.

Old PC + Linux = Green Computing

posted 10 Dec by peter

So I hear the UK government has concluded that Linux is more eco-friendly than Windows. The reason, of course, is that Windows users are forced to replace their computers more frequently, resulting in more e-waste. I can't say I'm surprised (though, admittedly, I can't say the environment is the reason I run Linux, either). In recent years, Windows has become more or less irrelevant to me, so I haven't tried Vista myself, but I have of course heard of its outrageous hardware requirements. And Windows has always seemed more resource-hungry to me than Linux.

My main computer is an Acer Ferrari 3000 laptop (yes, that bright red one). It's, um, around four years old, maybe, and only has 384 MB of RAM, so it appears I couldn't run Vista on it even if I wanted to. And I don't even know how old my server is. I got it for free several years ago when, emblematically, it had to be replaced because of some software upgrade (I know the people I got it from ran Windows on it, but I don't know which version). I have since stuck a couple of additional hard disks in it, but it's otherwise the same old clunker as ever. Despite this, using it as a file and web server, and even for watching movies, is no problem at all. I could probably have done this, more or less, under Windows too, but that would have meant sticking with some old and out-of-date version of the OS. Instead, I run a somewhat recent version of Fedora on it, and plan on upgrading it to the latest Xubuntu... some day when I have the time.

The Irregular Verb Deadpool

posted Oct '07 by peter

Not that it should come as a surprise to linguists, but a research team at Harvard have demonstrated mathematically that rare irregular verbs regularize more quickly than common ones. They did this by studying works like Beowulf (from the 8th or 10th century, depending on who you ask), The Canterbury Tales (from the 14th century) and even Harry Potter. Over the period these works span, the seven conjugation classes used in Old English have become just one in Modern English. Yes, they only studied English, apparently.

As for which irregular English verb is next in line to go regular, their money is on "wed." Wanna bet? We should have the answer in just a couple of centuries.

Oh My Gosh, I've Been Nominated!

posted Oct '07 by peter

Today, I received notification by email that I have been nominated for inclusion in the "Honors Section" of the upcoming 2007–2008 edition of the well-known Hamilton Who's Who.

Okay, so I've never heard of it, but apparently, it includes "biographies of the world's most accomplished Professionals" (yes, so very accomplished that they're "Professionals" with a capital P). And apparently, "inclusion is considered by many as the single highest mark of achievement." So you can see that this has got to be legit.

Naturally, I decided to check out their web site, to see who else shared this honor with me. I immediately found a testimonial from a gentleman named Luís Gonzales, who was "delighted with the reach and caliber of the individuals using this network." See, they've got testimonials and everything. There can't possibly be anything fishy here.

I was a bit disappointed, of course, when Mr. Luís Gonzales' company name, Jorge Gonzales Group (perhaps it was started by his father), only yielded 28 hits in Google (even after I expanded the list by "including very similar pages"). Not to mention a bit puzzled to find that every single one of these hits, spread out across various domain names, somehow related to Mr. Gonzales' inclusion in Hamilton Who's Who. But there could be a perfectly natural explanation. Maybe Mr. Gonzales is very successful in a field where being easy to find on Google just isn't a priority. Maybe he just really likes his privacy and takes great pains to stay off the Net altogether. Except where his inclusion in Hamilton Who's Who is concerned, of course.

And Mark Fallon, who was featured on their "Sample Bio" page, and his employer, the Creative Edge Advertising Company, were apparently even prouder of having been accepted into the directory. Every single one of this successful advertising firm's nine Google hits somehow mentioned Mr. Fallon's inclusion in Hamilton Who's Who.

And now it was my turn. Solemnly, I entered all my contact information…

Got Duplicates?

posted Sep '07 by peter

Isn't it annoying when you have a directory (or, worse, an entire directory tree) full of files and you suspect there are duplicates here and there — two or more files with the exact same content — silently wasting your precious disk space, but you don't know which files are identical to which?

Well, I thought so anyway, so I whipped up this little Perl script a couple of years ago. Only recently, it occurred to me that maybe someone else somewhere sometimes has this same problem.

So here it is, for your downloading enjoyment: dups. The script currently uses MD5 hashes to determine whether or not two files are identical, so it requires Digest::MD5.

By default, dups scans the current working directory and prints out identical files in pairs, one pair per line, e.g. foo.txt == bar.txt, but you can give an arbitrary number of names of directories and/or files to have it check those instead. There are also a couple of switches you can play with. -a, for instance, makes it also check files whose names begin with a period (when you're checking a directory; file names given on the command line will always be checked). -r tells it to scan directory trees recursively (i.e., check subdirectories, and their subdirectories, and so on). -u prints out all unique files found, instead of the pairs mentioned earlier. And there are a couple more. If you get confused, just do dups -h and the script will helpfully tell you all the switches it understands.
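For readers who would rather see the idea than download the script, here is a rough Python sketch of the same approach: hash every file with MD5 and pair up files whose hashes match. It is not a port of dups; the function names are made up, the keyword arguments only loosely mirror the -r and -a switches, and skipping hidden directories as well as hidden files is an assumption on my part.

```python
import hashlib
import os
from collections import defaultdict
from itertools import combinations


def md5_of(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in chunks to spare memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_pairs(directory=".", recursive=False, include_hidden=False):
    """Return sorted (path, path) pairs of files with identical MD5 hashes."""
    by_hash = defaultdict(list)
    for root, dirs, files in os.walk(directory):
        if not include_hidden:
            # Assumption: dot-directories are skipped along with dot-files.
            dirs[:] = [d for d in dirs if not d.startswith(".")]
            files = [f for f in files if not f.startswith(".")]
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                by_hash[md5_of(path)].append(path)
        if not recursive:
            break  # only look at the top-level directory
    pairs = []
    for paths in by_hash.values():
        pairs.extend(combinations(sorted(paths), 2))
    return sorted(pairs)
```

Printing the result in the format described above is then a one-liner: for a, b in find_duplicate_pairs("."): print(a, "==", b).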

An obvious improvement would be to have the script actually do something with the duplicates (or unique files) it finds — soft- or hardlink duplicates to each other, simply remove duplicates, or something along those lines. Another might be a "picky mode," in which the script would use diff to make absolutely sure that two different files didn't just happen to get the same MD5 hash.
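The "picky mode" idea boils down to a byte-for-byte comparison of files whose hashes already match. A minimal sketch in Python, using the standard filecmp module as a stand-in for diff (the function names are hypothetical):

```python
import filecmp


def really_identical(path_a, path_b):
    """Compare two files by content, not just by hash or metadata."""
    # shallow=False forces filecmp to read and compare the file contents.
    return filecmp.cmp(path_a, path_b, shallow=False)


def confirm_pairs(pairs):
    """Keep only the candidate pairs whose contents really are the same."""
    return [(a, b) for a, b in pairs if really_identical(a, b)]
```

This only costs extra reads for files that already look like duplicates, so the overhead stays small unless the tree is mostly duplicates to begin with.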

Send your ideas, bug reports, thanks, complaints etc. this way.

Time to Blow the Crap Out of Your Computer?

posted Sep '07 by peter

The other day, I had to remove (and later replace) a dead memory module from my laptop. As long as I had the little latch over the memory socket open, I blew into the opening to remove a few visible specks of dust. What happened shocked me.

A large cloud of gray dust billowed out from every aperture of the computer casing. You know in horror movies, how they always find some ancient book that conveniently explains how to slay the monster that has already killed off all the extras and is now bent on an especially gruesome end for the main characters? Well, they find this book and, for some reason, blow hard on it, and a backlit shot shows a large cloud of dust to demonstrate to us just how very ancient and untouched and no doubt magical this book is. That's what this cloud looked like.

It had never occurred to me how much gunk might build up over time, so this prompted me to open up the computer as much as possible, without actually resorting to disassembling it, and repeat the "blow hard" operation for every socket, latch or opening. Having thus removed an amount of dust probably exceeding the amount NASA have collected from the Moon in all missions combined (though I'm sure their dust is more interesting than mine), I started the computer again. I was amazed when I looked at the temperature indicator, which I always keep running in a corner of the screen because overheating has been a problem with this particular laptop since day one.

In the three or so years I've had this laptop, I've gotten used to seeing internal temperatures around 65–75 degrees C; putting some stress on the CPU could easily bring it over 100 degrees and cause the computer to turn itself off (measures can be taken to avoid this, though; more on that some other time). Now, the indicator hovered around 51 degrees, and even giving the machine a good workout barely brought it over 70. Under normal working conditions, my computer is now around 20 degrees cooler than it used to be. That's quite a difference.

So, allow me to humbly suggest that you…

© Wastholm Media 1997–2008