Computer program detects author gender

I read a piece on the Nature website by Philip Ball about a computer program, developed by Moshe Koppel and his colleagues, that can guess whether [a book] was written by a man or a woman.  Its algorithm (see below), says Ball, basically scans for keywords and syntax to accomplish this feat, and is surprisingly “around 80 percent accurate.”

Ball states:

The program’s success seems to confirm the stereotypical perception of differences in male and female language use.  Crudely put, men talk about objects, and women more about relationships.  Female writers use more pronouns (I, you, she, their, myself), say the program’s developers, Moshe Koppel, and colleagues.  Males prefer words that identify or determine nouns (a, the, that) and words that quantify them (one, two, more).

There’s also a Perl module on CPAN called Lingua::EN::Gender that uses the algorithm below.

Moshe Koppel and colleagues’ algorithm

Take any piece of fiction and do the following:

1. Count the number of words in the document.

2. For each appearance in the document of the following words ADD the number of points indicated:
‘the’ (17)
‘a’ (6)
‘some’ (6)
any number, written in digits or in words (5)
‘it’ (2)

3. For each appearance in the document of the following words SUBTRACT the number of points indicated:
‘with’ (14)
possessives ending in ’s (5)
possessive pronouns, such as ‘mine’, ‘yours’, ‘his’, ‘hers’ (3)
‘for’ (4)
‘not’ or any word ending with ‘n’t’ (4)

4. If the total score (after adding and subtracting as indicated) is greater than the total number of words in the document, then the author of the document is probably a male. Otherwise, the author is probably a female.
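
To make the four steps concrete, here is a rough Python sketch of how I read them. It is not the developers' program and not the CPAN module's code. The tokenizer, the short list of number words, and the names score_text and guess_gender are my own assumptions, and the possessive pronoun list adds ‘ours’ and ‘theirs’ to the four examples given above.

```python
import re

# Step 2: points ADDED per occurrence.
ADD_WORDS = {"the": 17, "a": 6, "some": 6, "it": 2}
NUMBER_POINTS = 5

# Step 3: points SUBTRACTED per occurrence.
SUBTRACT_WORDS = {"with": 14, "for": 4, "not": 4}
NEGATION_POINTS = 4            # any word ending in n't
POSSESSIVE_POINTS = 5          # any word ending in 's
POSSESSIVE_PRONOUN_POINTS = 3

# Assumption: a small, incomplete set of numbers written as words.
NUMBER_WORDS = {
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "twenty", "thirty",
    "forty", "fifty", "hundred", "thousand", "million",
}

# The steps name mine, yours, his, hers; ours and theirs are my addition.
POSSESSIVE_PRONOUNS = {"mine", "yours", "his", "hers", "ours", "theirs"}

# Assumption: a word is a run of letters/digits with an optional
# apostrophe suffix, so "1,000" counts as two tokens.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def score_text(text):
    """Return (score, word_count) following steps 1 to 3."""
    tokens = TOKEN_RE.findall(text.lower().replace("\u2019", "'"))
    score = 0
    for tok in tokens:
        if tok in ADD_WORDS:
            score += ADD_WORDS[tok]
        elif tok.isdigit() or tok in NUMBER_WORDS:
            score += NUMBER_POINTS
        elif tok in SUBTRACT_WORDS:
            score -= SUBTRACT_WORDS[tok]
        elif tok.endswith("n't"):
            score -= NEGATION_POINTS
        elif tok.endswith("'s"):
            # Cannot tell a possessive from a contraction such as "it's".
            score -= POSSESSIVE_POINTS
        elif tok in POSSESSIVE_PRONOUNS:
            score -= POSSESSIVE_PRONOUN_POINTS
    return score, len(tokens)


def guess_gender(text):
    """Step 4: 'male' if the score exceeds the word count, else 'female'."""
    score, word_count = score_text(text)
    return "male" if score > word_count else "female"
```

Calling guess_gender on a few sentences will give an answer, but the steps are meant for a whole piece of fiction, so short inputs say very little. The ’s check is the crudest part of this sketch, since it also penalizes contractions.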
