Teach yourself programming in ten years

Here’s an interesting article by Peter Norvig about how wannabe programmers are in such a hurry to become full-fledged “developers” that they buy into Teach Yourself <some language> in 21 Days books and the like.

I’ve been programming for a while now, and to become a great developer, you simply cannot take shortcuts.

Every aspiring or veteran developer should read or re-read Norvig’s Teach Yourself Programming in Ten Years article.

Computer program detects author gender

I read a piece on the Nature website by Philip Ball about a computer program, developed by Moshe Koppel and his colleagues, that can guess whether the author of a book is a man or a woman.  Its algorithm (see below), says Ball, basically scans for keywords and syntax to accomplish this feat, and is surprisingly “around 80 percent accurate.”

Ball states:

The program’s success seems to confirm the stereotypical perception of differences in male and female language use.  Crudely put, men talk about objects, and women more about relationships.  Female writers use more pronouns (I, you, she, their, myself), say the program’s developers, Moshe Koppel, and colleagues.  Males prefer words that identify or determine nouns (a, the, that) and words that quantify them (one, two, more).

There’s also a Perl module on CPAN called Lingua::EN::Gender that uses the algorithm below.

Moshe Koppel and colleagues’ algorithm

Take any piece of fiction and do the following:

1. Count the number of words in the document.

2. For each appearance in the document of the following words ADD the number of points indicated:
‘the’ (17)
‘a’ (6)
‘some’ (6)
any number, written in digits or in words (5)
‘it’ (2)

3. For each appearance in the document of the following words SUBTRACT the number of points indicated:
‘with’ (14)
possessives ending in ‘’s’ (5)
possessive pronouns, such as ‘mine’, ‘yours’, ‘his’, ‘hers’ (3)
‘for’ (4)
‘not’ or any word ending with ‘n’t’ (4)

4. If the total score (after adding and subtracting as indicated) is greater than the total number of words in the document, then the author of the document is probably a male. Otherwise, the author is probably a female.
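
The four steps above can be sketched in a few lines of Python. This is only my rough reading of the list, not Koppel’s actual program or the CPAN module: the tokenizer, the helper names (tokenize, score, guess_gender), the set of number words, and the extra possessive pronouns beyond the four listed are all my own assumptions.

```python
import re

# Point values taken from steps 2 and 3 above.
ADD_POINTS = {"the": 17, "a": 6, "some": 6, "it": 2}
SUBTRACT_POINTS = {"with": 14, "for": 4, "not": 4}

# Assumption: the list says "such as", so 'ours' and 'theirs' are my additions.
POSSESSIVE_PRONOUNS = {"mine", "yours", "his", "hers", "ours", "theirs"}

# Assumption: a small, incomplete set of numbers written as words.
NUMBER_WORDS = {
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "twenty", "thirty", "forty", "fifty",
    "hundred", "thousand", "million",
}


def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    text = text.replace("’", "'")  # treat curly apostrophes like straight ones
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def score(text):
    """Return (total score, word count) for the given text."""
    words = tokenize(text)
    total = 0
    for word in words:
        if word in ADD_POINTS:
            total += ADD_POINTS[word]
        elif word.isdigit() or word in NUMBER_WORDS:
            total += 5
        elif word in SUBTRACT_POINTS:
            total -= SUBTRACT_POINTS[word]
        elif word.endswith("n't"):
            total -= 4
        elif word in POSSESSIVE_PRONOUNS:
            total -= 3
        elif word.endswith("'s"):
            # Crude: this also catches contractions such as "it's".
            total -= 5
    return total, len(words)


def guess_gender(text):
    """Step 4: a score above the word count suggests a male author."""
    total, word_count = score(text)
    return "male" if total > word_count else "female"


if __name__ == "__main__":
    sample = "The cat sat with the dog."
    print(score(sample), guess_gender(sample))
```

A one-sentence sample like this only shows the mechanics. The algorithm is meant for whole pieces of fiction, so short inputs say very little about the author.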