Hey @Twitter, here are some suggestions for dealing with spam
I am befuddled by how @Twitter can miss some blatant cases of spam accounts, so much so that I have come close to concluding that these are paid accounts and thus won’t be removed no matter how often they are flagged and/or blocked. Based on what I have observed of spammers on Twitter, here are some suggestions for spam-matching rules to improve the catch ratio. The accounts I use as examples were hand-picked, so my points are open to interpretation, and they could be greatly improved with data that Twitter has, such as tweet rate, number of spam flags and blocks, etc. For example, these checks could be triggered in escalating order according to the number of users flagging an account for spam.
[Update] @Ed has replied to my tweet and part of this post:
I said “I have come close to concluding…”, not that they are paid accounts. But it defies all logic how an account like @kredits can still be up and running after close to 64,000 (yes, that’s sixty-four thousand) spam tweets that break many of the rules/filters I describe below. It is not a question of writing an algorithm for every variation, but of writing a set of rules that each assign an individual score, plus a minimum total score at which an account is suspended. Basically, an x-spam-score followed by an x-spam-status that determines account suspension, or lack thereof.
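To make the x-spam-score / x-spam-status idea concrete, here is a minimal Python sketch, loosely modelled on the way SpamAssassin adds up rule scores against a required threshold. The rule names, weights, feature names and the 5.0 threshold are all hypothetical. Each rule stands in for one of the checks described below and reads from per-account features that Twitter would have to compute.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """One hypothetical spam rule: a name, the score it adds, and a test."""
    name: str
    score: float
    matches: Callable[[dict], bool]


def spam_status(features: dict, rules: list, threshold: float = 5.0):
    """Return (x-spam-score, x-spam-status, names of the rules that fired)."""
    hits = [rule for rule in rules if rule.matches(features)]
    score = sum(rule.score for rule in hits)
    return score, score >= threshold, [rule.name for rule in hits]


# Hypothetical rules and weights, one per check described in this post.
RULES = [
    Rule("FOLLOW_RATIO_NEAR_ONE", 1.5,
         lambda f: abs(f.get("follow_ratio", 0.0) - 1.0) <= 0.05),
    Rule("SAME_URL_EVERY_TWEET", 2.5,
         lambda f: f.get("repeated_url_share", 0.0) >= 0.9),
    Rule("KEYWORD_EVERY_TWEET", 2.0,
         lambda f: f.get("keyword_share", 0.0) >= 0.9),
]

account = {"follow_ratio": 1.01, "repeated_url_share": 1.0, "keyword_share": 1.0}
print(spam_status(account, RULES))
# (6.0, True, ['FOLLOW_RATIO_NEAR_ONE', 'SAME_URL_EVERY_TWEET', 'KEYWORD_EVERY_TWEET'])
```

Because each rule contributes only part of the score, no single check has to be perfect. Tuning the weights and the threshold is where Twitter’s own data would come in.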
Follower to following ratio
Some spam accounts use aggressive follow techniques to try and spread their trash, and this gets mirrored by auto-follow bots that follow them back. The result is accounts with following/followers ratios close to one. Case examples: @Bqe1212 with a ratio of 1.01, or @vidalconsulting with 1.04. Others do not take this approach and follow only a few accounts. For example, @kredits has only 168 followers and follows 49.
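Here is a sketch of the ratio check. It assumes the ratio is computed as accounts followed divided by followers. The 0.05 tolerance and the 50-follow minimum are guesses, not tuned values. As @kredits shows, this rule alone misses spammers who follow only a few accounts, which is why it should be one score among several.

```python
def follow_ratio(following: int, followers: int) -> float:
    """Accounts followed divided by followers; infinite when there are no followers."""
    if followers == 0:
        return float("inf")
    return following / followers


def ratio_near_one(following: int, followers: int,
                   tolerance: float = 0.05, min_following: int = 50) -> bool:
    """True when the ratio is within `tolerance` of 1.0.

    Accounts following fewer than `min_following` accounts are skipped, since
    a ratio built from a handful of follows says very little.
    """
    if following < min_following:
        return False
    return abs(follow_ratio(following, followers) - 1.0) <= tolerance


print(ratio_near_one(1010, 1000))  # True  (ratio 1.01)
print(ratio_near_one(49, 168))     # False (ratio about 0.29, like @kredits)
```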
Tweet rate
One case I observed (the account is now suspended, so kudos there) had the particularity that tweets were pushed out every three minutes exactly. Twenty-four hours a day. This is something -very- easy to catch (and equally easy to defeat, but hey, some spammers are dumb).
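Catching the clockwork pattern comes down to checking whether the gaps between consecutive tweets are nearly constant. This is a minimal sketch that assumes we have the account’s tweet timestamps. The 5-second jitter allowance and the 20-tweet minimum are arbitrary choices.

```python
from datetime import datetime, timedelta
from statistics import pstdev


def is_clockwork(timestamps: list, max_jitter_s: float = 5.0,
                 min_tweets: int = 20) -> bool:
    """True when the gaps between consecutive tweets are (nearly) constant.

    `max_jitter_s` is the largest standard deviation of the gaps, in seconds,
    that is still treated as "scheduled".
    """
    if len(timestamps) < min_tweets:
        return False
    ordered = sorted(timestamps)
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    return pstdev(gaps) <= max_jitter_s


# Example: one tweet every three minutes, around the clock (480 tweets a day).
start = datetime(2010, 9, 20)
bot = [start + timedelta(minutes=3 * i) for i in range(480)]
print(is_clockwork(bot))  # True
```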
Tweet content
We can split this check into various sub-checks:
1. Keywords
In the case of @jenlock1014, the word ‘money’ appears in almost every single tweet pushed out. The actual text of the tweets varies, as do the linked URLs, but the keyword is there. Other common keywords are ‘free’, ‘cash’, and so on.
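A keyword rule can measure what fraction of an account’s tweets contain each suspicious word. This is a sketch. The keyword list contains only the three words mentioned above, and the crude tokenizer splits on anything that is not an ASCII letter or an apostrophe.

```python
import re

SPAM_KEYWORDS = frozenset({"money", "free", "cash"})  # illustrative list only


def keyword_share(tweets: list, keywords=SPAM_KEYWORDS) -> dict:
    """Fraction of tweets that contain each keyword as a whole word (case-insensitive)."""
    counts = {keyword: 0 for keyword in keywords}
    for tweet in tweets:
        words = set(re.findall(r"[a-z']+", tweet.lower()))
        for keyword in keywords:
            if keyword in words:
                counts[keyword] += 1
    total = len(tweets) or 1  # avoid dividing by zero on an empty timeline
    return {keyword: count / total for keyword, count in counts.items()}
```

A share close to 1.0 for a word like ‘money’, as with @jenlock1014, is the signal that would feed into the score.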
2. Linking the same URL
In some cases we see links to the same URL in every tweet, such as @Bqe1212, with tweets like:
http://twttr.me/dbxV Q&A: HOW CAN I MAKE MONEY FAST ON THE INTERNET FOR FREE!! NO …: by Chri… http://bit.ly/aJVi6W http://twttr.me/dbxV
and
http://twttr.me/dbxV How to Make Money Online With Online Writing Sites: There are many sites … http://bit.ly/cichxl http://twttr.me/dbxV
The short URL pointing to the target site is different in each tweet, but every tweet contains copies of the same twttr.me short link (two in this case!). Again, both tweets would also trigger rule #1 above for keywords.
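Detecting a link that repeats across an account’s tweets needs only a URL extractor and a count per tweet. This is a minimal sketch. The regular expression is deliberately simple, and the 0.9 share threshold is an assumption. Applied to the two @Bqe1212 tweets above, it reports only the twttr.me link.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")


def repeated_urls(tweets: list, min_share: float = 0.9) -> dict:
    """URLs that appear in at least `min_share` of the tweets, with their share."""
    if not tweets:
        return {}
    tweets_with_url = Counter()
    for tweet in tweets:
        # A set, so two copies in one tweet still count once for that tweet.
        tweets_with_url.update(set(URL_RE.findall(tweet)))
    total = len(tweets)
    return {url: n / total for url, n in tweets_with_url.items()
            if n / total >= min_share}


tweets = [
    "http://twttr.me/dbxV Q&A: HOW CAN I MAKE MONEY FAST ON THE INTERNET FOR FREE!! "
    "NO …: by Chri… http://bit.ly/aJVi6W http://twttr.me/dbxV",
    "http://twttr.me/dbxV How to Make Money Online With Online Writing Sites: "
    "There are many sites … http://bit.ly/cichxl http://twttr.me/dbxV",
]
print(repeated_urls(tweets))  # {'http://twttr.me/dbxV': 1.0}
```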
3. Linking the same URL with differing URL shorteners
One technique often used is to spread the target link across various URL shorteners. This is what @kredits does, using snurl.com, ej.uz, short.ie, bit.ly, and others, all of which redirect to the same final URL. A simple check, once an account is flagged for processing, is to follow all the shortened URLs and look for patterns in the final URLs. For example:
- Exactly the same URL.
- Same host, same path, but varying query string (often used to track sources).
- Same host, varying path, but same query string.
- Same host with both varying path and query string.
- Varying subdomains of the same host.
A combination of the above can be used to determine a spam score for a given set of URLs. When fuzzing techniques are used on the final URL, an extra check is to parse the target site’s content, looking for similar headers, keywords, image URIs, Google Analytics account IDs, etc.
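The pattern list above maps naturally onto Python’s `urllib.parse.urlsplit`. This sketch assumes the short links have already been expanded to their final URLs, for example by following the HTTP redirects. That step needs network access and is left out here. The relation names, the weights and the naive two-label domain rule are my own assumptions, not anything Twitter uses.

```python
from itertools import combinations
from urllib.parse import urlsplit


def registered_domain(host: str) -> str:
    """Naive: keep the last two labels ("a.example.com" -> "example.com").

    This is wrong for suffixes such as ".co.uk"; a real system would use the
    Public Suffix List instead.
    """
    return ".".join(host.lower().split(".")[-2:])


def url_relation(a: str, b: str) -> str:
    """Classify two expanded (final) URLs into the patterns listed above."""
    ua, ub = urlsplit(a), urlsplit(b)
    host_a, host_b = ua.hostname or "", ub.hostname or ""
    if host_a == host_b:
        same_path, same_query = ua.path == ub.path, ua.query == ub.query
        if same_path and same_query:
            return "same_url"  # may still differ in scheme, port or #fragment
        if same_path:
            return "same_host_path_varying_query"
        if same_query:
            return "same_host_query_varying_path"
        return "same_host_varying_path_and_query"
    if registered_domain(host_a) == registered_domain(host_b):
        return "varying_subdomains"
    return "unrelated"


# Hypothetical weights: the closer the match, the more suspicious.
RELATION_WEIGHTS = {
    "same_url": 1.0,
    "same_host_path_varying_query": 0.9,
    "same_host_query_varying_path": 0.7,
    "same_host_varying_path_and_query": 0.5,
    "varying_subdomains": 0.4,
    "unrelated": 0.0,
}


def url_set_score(final_urls: list) -> float:
    """Average relation weight over all pairs of URLs (0.0 to 1.0)."""
    pairs = list(combinations(final_urls, 2))
    if not pairs:
        return 0.0
    return sum(RELATION_WEIGHTS[url_relation(a, b)] for a, b in pairs) / len(pairs)


print(url_relation("http://example.com/loan?src=a", "http://example.com/loan?src=b"))
# same_host_path_varying_query
```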
Reaction tweets™
Often, a spammer searches for certain keywords and sends a reaction tweet whenever one is found. As an example, when I sent this reply to a tweet by Ed Shahzade (@Ed) about auto-follower bots and spam, I received this tweet from @atraiskredits:
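@mikepuchol Problēmu var atrisināt ātrais kredīts? Izvērtēs kredīta piedāvājumu! Atver www.opencredit.lv un gaidi naudu savā kontā.
(Latvian for: “Can a quick loan solve the problem? The loan offer will be evaluated! Open www.opencredit.lv and wait for the money in your account.”)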
Obviously this is not English. It was sent as a blind reaction to my tweet mentioning @kredits, with no regard for the language I write in or whether I understand the content of the tweet.
For a flagged account, it should be very easy to detect when reaction tweets are being sent: accumulate the words used in the original triggering tweets, and test whether each word occurs in all, or a high percentage, of them. As another example, 10 minutes after @kredits sent this tweet to @djsandman813 and he replied with this, @atraiskredits sent this reaction tweet. Screenshots below in case they go missing:
* OK, “Reaction tweets” is not really trademarked, but maybe it should be!
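The word-accumulation check for reaction tweets can be sketched as follows. It assumes we already have the tweets that the flagged account reacted to, such as the tweets it replied to within a few minutes. The 0.8 share, the four-character minimum word length and the example tweets are hypothetical.

```python
import re
from collections import Counter


def trigger_words(triggering_tweets: list, min_share: float = 0.8,
                  min_len: int = 4) -> dict:
    """Words (including @mentions and #hashtags) found in at least `min_share`
    of the tweets that provoked a reaction from the flagged account."""
    if not triggering_tweets:
        return {}
    counts = Counter()
    for tweet in triggering_tweets:
        words = {w for w in re.findall(r"[@#]?\w+", tweet.lower())
                 if len(w) >= min_len}
        counts.update(words)  # each word counted once per tweet
    total = len(triggering_tweets)
    return {w: n / total for w, n in counts.items() if n / total >= min_share}


triggers = [
    "Watch out for @kredits spam",
    "@kredits keeps tweeting me",
    "Blocked @kredits again",
]
print(trigger_words(triggers))  # {'@kredits': 1.0}
```

If a word like @kredits shows up in every triggering tweet, and the same account replies each time, that is exactly the reaction pattern described above.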
Account aggregation
Spammers can try to avoid being flagged, or delay detection, by spreading their activity across multiple accounts. The way to detect this is to run the above filters across all flagged accounts together, e.g. catching several accounts that all send reaction tweets with the same short URL.
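One way to run the filters across accounts is to treat shared URLs as links between accounts and then group everything that is connected. This is a minimal union-find sketch. The account names and URLs in the example are made up.

```python
from collections import defaultdict


def accounts_by_shared_url(account_urls: dict) -> dict:
    """Map each URL to the set of flagged accounts that posted it (if more than one)."""
    by_url = defaultdict(set)
    for account, urls in account_urls.items():
        for url in urls:
            by_url[url].add(account)
    return {url: accounts for url, accounts in by_url.items() if len(accounts) > 1}


def clusters(account_urls: dict) -> list:
    """Group accounts linked directly or transitively by a shared URL (union-find)."""
    parent = {account: account for account in account_urls}

    def find(account):
        while parent[account] != account:
            parent[account] = parent[parent[account]]  # path halving
            account = parent[account]
        return account

    for accounts in accounts_by_shared_url(account_urls).values():
        first, *rest = sorted(accounts)
        for other in rest:
            parent[find(other)] = find(first)

    groups = defaultdict(set)
    for account in account_urls:
        groups[find(account)].add(account)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical flagged accounts and the (expanded) URLs they posted.
flagged = {
    "spam_a": {"http://short.example/x", "http://short.example/y"},
    "spam_b": {"http://short.example/x"},
    "spam_c": {"http://short.example/y"},
    "bystander": {"http://example.org/"},
}
print(clusters(flagged))  # one group containing spam_a, spam_b and spam_c
```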
Account name
Many spammers are not too creative and simply throw random words and letters into the account name – this can also be an indicator of a spammer account.
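Randomness in a handle is hard to measure reliably, but a few crude signals are easy to compute. This is a sketch with hypothetical thresholds. Plenty of legitimate handles end in a year or some other number, so this check should only ever add a small amount to the score.

```python
import re

CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]+")


def name_features(handle: str) -> dict:
    """A few crude signals computed from a Twitter handle."""
    name = handle.lstrip("@").lower()
    return {
        "trailing_digits": len(re.search(r"\d*$", name).group()),
        "digit_share": sum(c.isdigit() for c in name) / len(name) if name else 0.0,
        "longest_consonant_run": max(
            (len(run) for run in CONSONANT_RUN.findall(name)), default=0),
    }


def looks_generated(handle: str) -> bool:
    """Hypothetical thresholds; legitimate handles like "john1985" also match,
    so this should add a small score, never suspend on its own."""
    f = name_features(handle)
    return (f["trailing_digits"] >= 4
            or f["digit_share"] >= 0.5
            or f["longest_consonant_run"] >= 5)


print(looks_generated("@Bqe1212"), looks_generated("@mikepuchol"))  # True False
```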
Reaction flags
When a user receives a spam tweet, their first reaction is usually to block the sender or flag it as spam. An accumulation of such flags should be enough to suspend an account, particularly when combined with other indicators. One such indicator is a single tweet sent to a user that is followed by a flag from that user, which denotes a directed, one-way message rather than a conversation.
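The one-way-message-then-flag pattern can be counted for each sender. This sketch assumes a hypothetical data model: the sender’s mentions of other users, the set of users who ever replied to the sender, and the time at which each user flagged the sender. The one-hour window is an arbitrary choice, and the user names are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Mention:
    recipient: str
    sent_at: datetime


def one_way_flags(mentions: list, replied: set, flagged_at: dict,
                  window: timedelta = timedelta(hours=1)) -> int:
    """Count recipients who never replied to the sender but flagged it as spam
    within `window` after being mentioned."""
    counted = set()
    for mention in mentions:
        if mention.recipient in replied:
            continue  # a conversation, not a one-way message
        when = flagged_at.get(mention.recipient)
        if when is not None and timedelta(0) <= when - mention.sent_at <= window:
            counted.add(mention.recipient)
    return len(counted)


t0 = datetime(2010, 9, 20, 12, 0)
mentions = [Mention(user, t0) for user in ("alice", "bob", "carol", "dave")]
flags = {
    "alice": t0 + timedelta(minutes=5),  # flagged right away: counted
    "bob": t0 + timedelta(hours=3),      # outside the window: ignored
    "carol": t0 + timedelta(minutes=1),  # carol replied, so ignored
}
print(one_way_flags(mentions, replied={"carol"}, flagged_at=flags))  # 1
```

The resulting count could feed a rule in the scoring scheme near the top of this post, rather than triggering a suspension on its own.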
What else?
I’m sure there are many other possible checks, but I have to get back to work – so, @delbius, do I get a job offer? Just kidding – I was thinking of the guy who got offered a job at YouTube after writing ‘YouTube Instant’.










