The name John is the most used by email fraudsters

This is the story of how participants in an international innovation competition attempted to cheat their way to the prize money.

In 2013, a consortium of international organisations created a joint venture to support innovations and new technologies that promote good governance, transparency, and citizen empowerment. To this end, the association launched an international innovation competition to award seed capital to the most promising ideas submitted by participants. The competition ran for three years, ending in 2017, and received concepts from over 50 countries.


The competition set £300,000 (Sh30 million) as prize money to be shared among winning ideas. To stand a chance of winning, submitted business plans went through three evaluation stages: a public "ballot", a project evaluation, and an expert review. The first stage is the one of interest to this analysis, as it draws a parallel with election voting. In this stage, individuals posted business concepts on an online portal and solicited votes from the general public.

The voting rules specified one vote per person for each business plan, although an individual could vote for several plans. The highest-ranking concepts at this stage proceeded to the next round. Corrective measures such as wild cards and regional representation were used to balance the number of votes.

Data on the public voting phase for the three years that the contest lasted was obtained from the consortium for analysis. Organisers had warned participants that they would be disqualified if they engaged in unfair practices. Even so, the fickle nature of the human psyche cannot resist the urge to cheat whenever there is an incentive for cash. The task at hand, therefore, is to find out how people cheat.

Singularity

When individuals seek to cheat, they make the most of their personal resources: gadgets, multiple email addresses, different IP addresses, domain names and so on. Thus, the task of finding fraudulent votes centres on identifying two or more votes that belong to the same person. Clustering the variables attached to each entry is a suitable method for investigating how similar votes are. To that end, we look for votes that have similar-looking email addresses, the same IP address, and close voting times. The diagram atop the page shows the initial closely matched votes from the idea that garnered the highest number of public votes.
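The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method actually used in the analysis: the record layout (`email`, `ip`, `time`), the one-minute window and the sample records are all assumptions made for the example. It covers only two of the three signals, the shared IP address and the close voting times.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_bursts(votes, window=timedelta(minutes=1), min_votes=2):
    """Group votes by IP and return runs of votes cast close together.

    A run is a sequence of votes from one IP in which each vote follows
    the previous one by no more than `window`.
    """
    by_ip = defaultdict(list)
    for vote in votes:
        by_ip[vote["ip"]].append(vote)

    bursts = {}
    for ip, group in by_ip.items():
        group.sort(key=lambda v: v["time"])
        runs, current = [], [group[0]]
        for prev, nxt in zip(group, group[1:]):
            if nxt["time"] - prev["time"] <= window:
                current.append(nxt)
            else:
                # Gap too large: close the current run and start a new one.
                if len(current) >= min_votes:
                    runs.append(current)
                current = [nxt]
        if len(current) >= min_votes:
            runs.append(current)
        if runs:
            bursts[ip] = runs
    return bursts


# Invented sample records (hypothetical, not taken from the competition data).
sample_votes = [
    {"email": "vp1@example.org", "ip": "203.0.113.7",
     "time": datetime(2016, 1, 1, 21, 0, 5)},
    {"email": "vp2@example.org", "ip": "203.0.113.7",
     "time": datetime(2016, 1, 1, 21, 0, 30)},
    {"email": "vp3@example.org", "ip": "203.0.113.7",
     "time": datetime(2016, 1, 1, 21, 0, 50)},
    {"email": "alice@example.com", "ip": "198.51.100.2",
     "time": datetime(2016, 1, 1, 10, 0, 0)},
]

bursts = find_bursts(sample_votes)
```

Only the first IP address is flagged here, because three of its votes fall within a minute of one another. A flagged run is a prompt for closer inspection, not proof of fraud: people in one office or household can share an IP address.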

We can observe that in the first vote the voter makes use of a personal email address. In the subsequent sections, there are votes from the same IP address that bear the same initials (vp) as the first email. It is in this section that we observe four fascinating facts. The first is that the initial three email addresses were used to vote within a minute of one another. These addresses also share the same web address. On further investigation, the purchase date of that web address matched the initial voting day. Hence it is probable that the user bought the domain and created several email addresses for voting purposes.

The second intriguing fact lies in the next two emails in this section. These emails come from an internet domain that registers temporary email addresses. Several websites such as https://10minutemail.net and https://temp-mail.org/en/ provide temporary email addresses that expire after a set duration. A user can create an unlimited number of these throwaway addresses and thus conjure up non-existent users. On initial scrutiny, the email addresses passed the verification test. Running the same test several weeks later returned a negative result. In this scenario, it seems the user exhausted the email addresses available on the purchased web domain and opted to create temporary ones.
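One simple way to flag such addresses is to compare the domain part of each email against a list of known temporary-mail domains. The sketch below is an assumption-laden illustration: the blocklist holds only the two provider sites named above, whereas such services typically issue mailboxes on other, frequently changing domains, so a working list would need regular maintenance. The function names are hypothetical.

```python
# Illustrative blocklist only: these are the provider sites named in the text,
# not a complete or current list of the domains their mailboxes use.
DISPOSABLE_DOMAINS = {"10minutemail.net", "temp-mail.org"}


def email_domain(address):
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def is_disposable(address, blocklist=DISPOSABLE_DOMAINS):
    """True if the address sits on a blocklisted domain or one of its subdomains."""
    domain = email_domain(address)
    return any(domain == d or domain.endswith("." + d) for d in blocklist)


# Invented addresses for illustration.
checks = {
    "vp9@temp-mail.org": is_disposable("vp9@temp-mail.org"),
    "vp10@mail.10minutemail.net": is_disposable("vp10@mail.10minutemail.net"),
    "alice@example.com": is_disposable("alice@example.com"),
}
```

A check of this kind works at the moment of voting, which matters because, as noted above, a verification test run weeks later behaves differently once the address has expired.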

The third impressive fact arises from the time of day the user under examination prefers to place votes. None of the recorded times falls within working hours (8am – 5pm), which suggests that the user was employed and carried out the nefarious activities during free time using home resources. A prime candidate for this profile thus far is a web developer or software engineer.
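The working-hours observation can be expressed as a simple share of votes cast outside the 8am to 5pm window. The sketch below assumes the timestamps are already in the voter's local time and treats weekends as outside working hours; both are assumptions made for the example, as are the sample timestamps.

```python
from datetime import datetime, time

WORK_START, WORK_END = time(8, 0), time(17, 0)  # 8am to 5pm, as in the text


def outside_working_hours(ts):
    """True for weekend timestamps (an assumption) or times outside 8am-5pm."""
    is_weekend = ts.weekday() >= 5  # Saturday is 5, Sunday is 6
    return is_weekend or not (WORK_START <= ts.time() < WORK_END)


def off_hours_share(timestamps):
    """Fraction of timestamps that fall outside working hours."""
    if not timestamps:
        return 0.0
    return sum(outside_working_hours(t) for t in timestamps) / len(timestamps)


# Invented timestamps: a Tuesday evening, a Tuesday early morning
# and a Wednesday at noon.
sample_times = [
    datetime(2016, 3, 1, 21, 15),
    datetime(2016, 3, 1, 7, 30),
    datetime(2016, 3, 2, 12, 0),
]

share = off_hours_share(sample_times)  # two of the three are off-hours
```

A share close to 1.0 for a single IP address would match the pattern described above, though time zones need care: a server log kept in UTC would have to be shifted before the comparison means anything.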

The fourth enchanting feature is the use of initials (vp) in all the email addresses. The voter introduces a ‘marker’ that serves as a starting point for creating new email addresses. In behavioural economics, this phenomenon is known as anchoring and adjustment. The theory states that when people encounter a new situation, they start from an implicitly suggested reference point (“the anchor”) and then make adjustments to it to fulfil a task.

For example, travellers visiting a city for the first time will often make the tallest building their anchor for when they get lost, and adjust from that anchor to find their way to new destinations. The user, in his or her bid not to get lost in the chaos of email names, opted for anchors that could be updated to create sets of email addresses, as shown in the example below.

The voter anchored this set of emails on the name Rob and updated it to create multiple aliases. All these emails came from the same IP address and used both the bought web address and temporary domains, a pattern we have established to be suspicious. It seems almost impossible for a user to create hundreds of random email addresses without repeating specific keywords or creating an anchor, and that is the trap that exposes fraudulent votes. With this theory in mind, we analyse the occurrence of names in the email addresses that have anchors from the top idea, as shown in the chart below.
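A rough way to surface candidate anchors is to take the leading run of letters in each address and count how often it repeats. This is a hedged sketch of the idea, not the procedure used for the analysis: the leading-letters heuristic, the `min_count` threshold and the sample addresses are assumptions, and the heuristic would miss anchors placed in the middle or at the end of an address.

```python
import re
from collections import Counter

LEADING_LETTERS = re.compile(r"[a-z]+")


def leading_token(address):
    """Return the leading alphabetic run of the part before '@', lower-cased."""
    local_part = address.split("@", 1)[0].lower()
    match = LEADING_LETTERS.match(local_part)
    return match.group(0) if match else ""


def candidate_anchors(addresses, min_count=2):
    """Count leading tokens and keep those shared by at least `min_count` addresses."""
    counts = Counter(leading_token(a) for a in addresses)
    return {tok: n for tok, n in counts.most_common() if tok and n >= min_count}


# Invented addresses for illustration.
sample_addresses = [
    "rob1@example.org",
    "rob.k@example.org",
    "Rob_22@example.net",
    "maria@example.com",
]

anchors = candidate_anchors(sample_addresses)  # {'rob': 3}
```

Common first names will recur in any large set of genuine voters, so a repeated token is only meaningful alongside the other signals, such as a shared IP address.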

The top names used as anchors are John, James, George, David, Peter, Bill and William. The preference for male names suggests the voter may also be male. Comparing the top anchor names with the most popular American male names reveals an unexpected similarity. Here are the top American male names as provided by the Social Security Administration.

  • James
  • John
  • Robert
  • Michael
  • William
  • David
  • Richard
  • Charles
  • Joseph
  • Thomas

There is a close match between the top American names and the top anchor names, an indication that the voter may have used an American fake name generator website such as https://www.fakenamegenerator.com/ to produce usernames for the email addresses. About 80 per cent of votes for this idea bore these hallmarks of cheating.
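The comparison can be made explicit with the two sets of names given above. The snippet uses exact, case-insensitive matching, which is an assumption made for simplicity; the helper name is hypothetical.

```python
# Top anchor names reported in the text.
anchor_names = ["John", "James", "George", "David", "Peter", "Bill", "William"]

# The ten names listed above.
listed_names = ["James", "John", "Robert", "Michael", "William",
                "David", "Richard", "Charles", "Joseph", "Thomas"]


def shared_names(anchors, reference):
    """Return the anchors that also appear in the reference list (case-insensitive)."""
    reference_set = {name.lower() for name in reference}
    return [name for name in anchors if name.lower() in reference_set]


matches = shared_names(anchor_names, listed_names)
share = len(matches) / len(anchor_names)
# matches -> ['John', 'James', 'David', 'William'], i.e. 4 of the 7 anchor names
```

Exact matching does not count Bill, which is commonly a short form of William, so this figure is conservative.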

Duplicity

The next two business plans with a high number of votes exhibited a puzzling pattern: 97 per cent of their voters were the same people. Although the same individual submitted both proposals, it is highly improbable that almost exactly the same set of people would vote for both ideas. The likelier explanation is a deliberate attempt to ensure both ideas got a high number of votes. An inspection of the email addresses shows that a quarter originated from a domain hosted in a country whose language differs from that of the idea's country of origin.
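An overlap figure of this kind can be computed by comparing the sets of voter email addresses for the two ideas. How the 97 per cent was actually calculated is not stated; the sketch below assumes shared voters divided by all distinct voters across both ideas, and the sample addresses are invented.

```python
def normalise(address):
    """Lower-case and trim an address so trivial variations compare equal."""
    return address.strip().lower()


def voter_overlap(voters_a, voters_b):
    """Share of distinct voters (across both ideas) who voted for both."""
    a = {normalise(v) for v in voters_a}
    b = {normalise(v) for v in voters_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented voter lists for two ideas.
idea_one = ["Ann@example.com", "bob@example.com", "cy@example.com"]
idea_two = ["ann@example.com ", "bob@example.com", "dee@example.com"]

overlap = voter_overlap(idea_one, idea_two)  # 2 shared out of 4 distinct voters
```

Dividing by the smaller voter list instead of the union would give a higher figure, so the denominator should be stated whenever such a percentage is reported.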

This fact is an odd occurrence as it is unlikely that a quarter of votes originate from a country not conversant with the language of the business idea. An examination of the ‘foreign’ email addresses confirms the suspicion. Unlike in the previous business plan, the usernames did not have an anchor and included both male and female names. Nonetheless, a strange detail persists – all the usernames were English names. You wouldn’t expect people in a non-English-speaking country to create their email addresses using English names.

The only verdict is vice, with over a thousand of the ‘foreign’ votes stemming from nine IP addresses. A firm conclusion is that the owner of the two ideas hired a foreigner over the internet to vote for their business concept. The contracted party already had these email addresses created a while ago for use in nefarious activities around the web – however, they never counted on the language difference to be a red flag in this particular use case.
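The concentration of votes on a handful of IP addresses can be measured by counting votes per address and asking what share the busiest few account for. This is an illustrative sketch: the function name, the `top_n` parameter and the sample addresses are assumptions.

```python
from collections import Counter


def ip_concentration(ips, top_n=9):
    """Return the `top_n` busiest IPs with their vote counts, and the share of all votes they cover."""
    counts = Counter(ips)
    top = counts.most_common(top_n)
    covered = sum(n for _, n in top)
    share = covered / len(ips) if ips else 0.0
    return top, share


# Invented data: ten votes spread over four addresses.
sample_ips = (["203.0.113.1"] * 5 + ["203.0.113.2"] * 3
              + ["198.51.100.9", "198.51.100.10"])

top_ips, top_share = ip_concentration(sample_ips, top_n=2)
# top_ips -> [('203.0.113.1', 5), ('203.0.113.2', 3)], covering 8 of 10 votes
```

Applied to the ‘foreign’ votes with `top_n=9`, a share near 1.0 would correspond to the pattern described above.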

Aftermath

The top three business ideas in this competition were disqualified on account of unfair practices. Such challenges are endemic to online voting systems. As the world goes digital and hopes gather momentum for replacing long queues with electronic voting, two key technologies will guard against fraud: artificial intelligence and blockchain.

Machine learning, a branch of artificial intelligence, provides robust techniques for tracking ever-changing human behaviour. Instead of relying on fixed rules, it adapts to changing patterns of fraud through statistical learning, and is thus a suitable choice for building predictive models for fraud detection systems.

As for newer technology, blockchain promises a foolproof database that stores every operation in a system. Given that a blockchain is an immutable database, it protects records from manipulation. Together with AI, blockchain secures the future of democracy.

*The writer is a data scientist