
Hey @Twitter, here are some suggestions for dealing with spam

Twitter spam

I am befuddled by how @Twitter can miss some blatant cases of spam accounts. So much so that I have come close to conclude that these are paid accounts, and thus won’t be removed no matter how often they are flagged and/or blocked. Here are some suggestions, based on what I have observed with spammers on Twitter, for spam-matching rules to improve the catch ratio. The accounts I use as examples have been hand-picked, so my points are open to interpretation, and could be much improved with data that Twitter has, such as tweet rate, number of spam flags and blocks, etc. These checks could, for example, be triggered in escalating order according to the number of users flagging an account for spam.

[Update] @Ed has replied to my tweet and part of this post:

I said “I have come close to conclude…”, not that they are paid accounts. But it defies all logic how an account like @kredits can still be up and running after close to 64,000 (yes, that’s sixty-four thousand) spam tweets that break many of the rules/filters I describe below. It is not a question of writing an algorithm for every variation, but of a set of rules that each give an individual score, plus a minimum total score to suspend an account. Basically, an x-spam-score followed by an x-spam-status that determines account suspension, or lack thereof.
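To make the x-spam-score idea concrete, here is a minimal sketch in Python. The rule names, weights, threshold and account fields are all made up for illustration; none of them come from Twitter, and real values would have to be tuned against real data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    weight: float                   # hypothetical score contribution
    check: Callable[[Dict], bool]   # returns True when the rule fires


def spam_score(account: Dict, rules: List[Rule]) -> float:
    """Sum the weights of every rule that fires for this account."""
    return sum(rule.weight for rule in rules if rule.check(account))


def spam_status(account: Dict, rules: List[Rule], threshold: float) -> str:
    """Suspend only when the accumulated score reaches the minimum score."""
    return "suspend" if spam_score(account, rules) >= threshold else "keep"


# Illustrative rules only; the account fields are assumed to be precomputed.
RULES = [
    Rule("keyword_heavy", 2.0, lambda a: a["keyword_ratio"] > 0.8),
    Rule("repeated_url", 3.0, lambda a: a["same_url_ratio"] > 0.8),
    Rule("many_flags", 2.5, lambda a: a["spam_flags"] >= 10),
]

account = {"keyword_ratio": 0.95, "same_url_ratio": 0.9, "spam_flags": 3}
print(spam_score(account, RULES), spam_status(account, RULES, threshold=4.0))
```

No single rule has to be perfect here: an account is only suspended when several independent indicators add up.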

Follower to following ratio

Some spam accounts use aggressive follow techniques to try and spread their trash, and this gets reciprocated by auto-follow bots. The result is accounts with following/follower ratios close to one. Case examples: @Bqe1212 with a ratio of 1.01, or @vidalconsulting with 1.04. Others do not take this approach and follow only a few accounts, for example @kredits, with only 168 followers and following 49.
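A rough sketch of the ratio check could look like this. The tolerance and the minimum account size are assumptions of mine, added so that small personal accounts with a naturally balanced ratio are not caught; on its own this should only ever add to a score, never suspend anyone.

```python
def follow_ratio(following: int, followers: int) -> float:
    """Following/follower ratio; infinite when nobody follows the account."""
    if followers == 0:
        return float("inf")
    return following / followers


def looks_like_follow_farming(following: int, followers: int,
                              tolerance: float = 0.05,
                              min_following: int = 500) -> bool:
    """True when a large account has a ratio suspiciously close to one.

    tolerance and min_following are illustrative values, not measured ones.
    """
    if following < min_following:
        return False
    return abs(follow_ratio(following, followers) - 1.0) <= tolerance


print(looks_like_follow_farming(following=1010, followers=1000))
print(looks_like_follow_farming(following=49, followers=168))
```

As the @kredits numbers show, plenty of spammers will not trip this rule at all, which is why it is only one indicator among several.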

Tweet rate

One case I observed (the account is now suspended, so kudos there) had the particularity that tweets were pushed out every three minutes exactly. Twenty-four hours a day. This is something -very- easy to catch (and equally easy to defeat, but hey, some spammers are dumb).
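Catching that kind of clockwork posting takes only a few lines. This sketch assumes the tweet timestamps are available as seconds since the epoch; the jitter tolerance and minimum sample size are guesses, not tuned values.

```python
from statistics import pstdev
from typing import Sequence


def has_clockwork_rate(timestamps: Sequence[float],
                       max_jitter_s: float = 2.0,
                       min_tweets: int = 20) -> bool:
    """True when the gaps between consecutive tweets barely vary.

    max_jitter_s and min_tweets are illustrative thresholds.
    """
    if len(timestamps) < max(min_tweets, 2):
        return False  # too little data to say anything
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # A standard deviation near zero means the account posts like a metronome.
    return pstdev(gaps) <= max_jitter_s


every_three_minutes = [i * 180 for i in range(30)]
print(has_clockwork_rate(every_three_minutes))
```

A spammer who adds random delays defeats this immediately, but as noted above, not all of them bother.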

Tweet content

We can split this check into various sub-checks:

1. Keywords

In the case of @jenlock1014, the word ‘money’ appears in almost every single tweet pushed out. The actual text of the tweets varies, as do the linked URLs, but the keyword is there. Other usual keywords are ‘free’, ‘cash’, and so on.
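As a sketch, the keyword check boils down to measuring what share of an account's tweets contain at least one suspicious word. The keyword set below is just the three words mentioned above; a real list would be longer and language-aware.

```python
import re
from typing import Iterable, List

SPAM_KEYWORDS = {"money", "free", "cash"}  # the examples from this post


def keyword_hit_ratio(tweets: List[str],
                      keywords: Iterable[str] = SPAM_KEYWORDS) -> float:
    """Fraction of tweets that contain at least one of the keywords."""
    if not tweets:
        return 0.0
    wanted = {k.lower() for k in keywords}
    hits = 0
    for text in tweets:
        words = set(re.findall(r"[a-z']+", text.lower()))
        if words & wanted:
            hits += 1
    return hits / len(tweets)


sample = ["Make MONEY fast", "free cash today", "lunch was great"]
print(keyword_hit_ratio(sample))
```

Ordinary users say ‘free’ and ‘money’ too, so what matters is a ratio close to one across many tweets, not any single hit.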

2. Linking the same URL

In some cases we see links to the same URL in every tweet, such as @Bqe1212, with tweets like:

http://twttr.me/dbxV Q&A: HOW CAN I MAKE MONEY FAST ON THE INTERNET FOR FREE!! NO …: by Chri… http://bit.ly/aJVi6W http://twttr.me/dbxV

and

http://twttr.me/dbxV How to Make Money Online With Online Writing Sites: There are many sites … http://bit.ly/cichxl http://twttr.me/dbxV

The target site’s linked short URL is different, but every tweet contains (two in this case!) copies of the same short link. Again, both tweets would also trigger rule #1 above for keywords.
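A minimal sketch of this check: extract every URL from each tweet and see whether one of them shows up in (nearly) all of them. The URL regex is deliberately crude, and the example tweets are shortened versions of the two quoted above.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

URL_RE = re.compile(r"https?://\S+")


def most_repeated_url(tweets: List[str]) -> Tuple[Optional[str], float]:
    """Return the most common link and the fraction of tweets containing it."""
    counts: Counter = Counter()
    for text in tweets:
        # set() so a URL pasted twice in one tweet still counts once per tweet
        for url in set(URL_RE.findall(text)):
            counts[url] += 1
    if not counts:
        return None, 0.0
    url, n = counts.most_common(1)[0]
    return url, n / len(tweets)


example_tweets = [
    "http://twttr.me/dbxV Q&A: HOW CAN I MAKE MONEY FAST http://bit.ly/aJVi6W http://twttr.me/dbxV",
    "http://twttr.me/dbxV How to Make Money Online http://bit.ly/cichxl http://twttr.me/dbxV",
]
print(most_repeated_url(example_tweets))
```

A legitimate account may link to its own site a lot, so this is again a score contribution rather than a verdict.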

3. Linking the same URL with differing URL shorteners

One technique often used is to spread the target link among various URL shorteners. This is the case of @kredits, which uses snurl.com, ej.uz, short.ie, bit.ly, and others, all of which redirect to the same final URL. A simple check, once an account is flagged for processing, is to follow all shortened URLs and look for patterns. For example:

  • Exactly the same URL.
  • Same host, same path, but varying query string (oft used to track sources).
  • Same host, varying path, but same query string.
  • Same host with both varying path and query string.
  • Varying subdomains of the same host.

A combination of the above can be used to determine a spam score for a set of given URLs. An extra check when fuzzing techniques are used on the final URL is to parse the target site’s content, looking for similar headers, keywords, image URIs, Google Analytics account IDs, etc.
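The pattern list above can be sketched with Python's standard `urllib.parse`. This assumes the short URLs have already been resolved to their final destinations (following the redirects is not shown), and the "same domain" test is a naive last-two-labels comparison that is wrong for suffixes like co.uk.

```python
from urllib.parse import urlsplit


def base_domain(host: str) -> str:
    """Naive registrable domain: the last two labels (wrong for e.g. co.uk)."""
    return ".".join(host.lower().split(".")[-2:])


def compare_urls(a: str, b: str) -> str:
    """Classify how two resolved URLs relate; scheme and fragment are ignored."""
    ua, ub = urlsplit(a), urlsplit(b)
    host_a, host_b = ua.hostname or "", ub.hostname or ""
    if host_a != host_b:
        if base_domain(host_a) == base_domain(host_b):
            return "same domain, different subdomain"
        return "unrelated"
    same_path = ua.path == ub.path
    same_query = ua.query == ub.query
    if same_path and same_query:
        return "identical"
    if same_path:
        return "same path, different query"
    if same_query:
        return "different path, same query"
    return "same host only"


print(compare_urls("http://example.com/offer?src=1", "http://example.com/offer?src=2"))
```

Each label could then carry its own weight in the spam score, with "identical" weighing the most and "same host only" the least.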

Reaction tweets™

Many times a spammer searches for certain keywords, and sends a reaction tweet when one is found. As an example, when I sent this reply to Ed Shahzade (@Ed) about his tweet on auto-follower bots and spam, I received this other tweet from @atraiskredits:

@mikepuchol Problēmu var atrisināt ātrais kredīts? Izvērtēs kredīta piedāvājumu! Atver www.opencredit.lv un gaidi naudu savā kontā.

Obviously this is not English, and thus it was sent as a blind reply to my tweet mentioning @kredits, without caring much about the language I was writing in, or whether I would understand the content of the tweet.

On a flagged account, it should be very easy to check when response tweets are sent, by accumulating the words used in the original triggering tweets, and testing the occurrence of each word in all, or a high percentage, of them. As another example, 10 minutes after @djsandman813 was sent this tweet by @kredits and replied with this, @atraiskredits sent this reaction tweet. Screenshots below in case they go missing:

* OK “Reaction tweets” is not really trademarked, but maybe it should be!
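The word-accumulation check for reaction tweets could be sketched as below. It assumes the tweets that each reaction replied to have already been collected for the flagged account; the 90% share is an arbitrary choice.

```python
import re
from collections import Counter
from typing import Dict, List


def trigger_words(triggering_tweets: List[str],
                  min_share: float = 0.9) -> Dict[str, float]:
    """Words (or @mentions/#tags) present in at least min_share of the tweets."""
    if not triggering_tweets:
        return {}
    counts: Counter = Counter()
    for text in triggering_tweets:
        # set() so each word counts once per triggering tweet
        counts.update(set(re.findall(r"[@#]?\w+", text.lower())))
    total = len(triggering_tweets)
    return {w: n / total for w, n in counts.items() if n / total >= min_share}


triggers = [
    "@kredits is a spam account",
    "why is @kredits still up",
    "blocking @kredits again",
]
print(trigger_words(triggers))
```

With more data, very common words would also pass the threshold, so a real version would need a stop-word list or a comparison against normal word frequencies.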

Account aggregation

Spammers can try to avoid being flagged, or delay detection, by spreading their activity across multiple accounts. The way to detect this is to run a check among flagged accounts for the above filters, e.g. catching various accounts all sending reaction tweets with the same short URL.
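As a sketch, cross-checking flagged accounts is an inverted index from URL to the accounts that tweeted it. The account names and URLs below are invented, and the input is assumed to be the already-extracted (ideally already-resolved) links per flagged account.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def accounts_sharing_urls(urls_by_account: Dict[str, Iterable[str]],
                          min_accounts: int = 2) -> Dict[str, Set[str]]:
    """Map each URL to the flagged accounts using it, keeping only shared ones."""
    accounts_by_url: Dict[str, Set[str]] = defaultdict(set)
    for account, urls in urls_by_account.items():
        for url in urls:
            accounts_by_url[url].add(account)
    return {url: accounts for url, accounts in accounts_by_url.items()
            if len(accounts) >= min_accounts}


# Hypothetical flagged accounts and links, for illustration only.
flagged = {
    "acct_a": ["http://short.example/x1", "http://short.example/x2"],
    "acct_b": ["http://short.example/x1"],
    "acct_c": ["http://other.example/y"],
}
print(accounts_sharing_urls(flagged))
```

Any group that comes out of this can then be scored together, so that flags against one account count against its siblings too.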

Account name

Many spammers are not too creative and simply throw random words and letters into the account name – this can also be an indicator of a spammer account.
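Measuring "randomness" in a name is fuzzy, but a couple of cheap signals can be sketched. Both signals and any cut-off applied to them are my own assumptions; plenty of real people have digits in their handle, so this should carry very little weight by itself.

```python
import re
from typing import Dict


def name_signals(screen_name: str) -> Dict[str, float]:
    """Cheap indicators of a thrown-together account name."""
    if not screen_name:
        return {"digit_share": 0.0, "longest_consonant_run": 0}
    digits = sum(ch.isdigit() for ch in screen_name)
    letters = re.sub(r"[^a-z]", "", screen_name.lower())
    # Runs of letters without a vowel (y counted as a vowel here).
    runs = re.findall(r"[^aeiouy]+", letters)
    return {
        "digit_share": digits / len(screen_name),
        "longest_consonant_run": max((len(r) for r in runs), default=0),
    }


print(name_signals("Bqe1212"))
print(name_signals("mikepuchol"))
```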

Reaction flags

When a user receives a spam tweet, their initial reaction will usually be to block the sender or flag it as spam. An accumulation of such flags, particularly combined with other indicators such as a single tweet towards a user followed by a flag (denoting not a conversation but a directed one-way message), should be enough to suspend an account.

What else?

I’m sure there are many other checks possible, but I have to get back to work – so, @delbius, do I get a job offer? Just kidding – was thinking of the guy who got offered a job at YouTube after writing ‘YouTube Instant’.

343 FDNY Never Forget

It has been nine years. On that day, 343 brave men lost their lives while saving thousands more at the World Trade Center, New York. Never forget. Below is the full list of those who never made it back from the towers.

A
Joseph Agnello, Lad.118 Lt. Brian Ahearn, Bat.13 Eric Allen, Sqd.18 (D) Richard Allen, Lad.15 Cpt. James Amato, Sqd.1 Calixto Anaya Jr., Eng.4 Joseph Angelini, Res.1 (D) Joseph Angelini Jr., Lad.4 Faustino Apostol Jr., Bat.2 David Arce, Eng.33 Louis Arena, Lad.5 (D) Carl Asaro, Bat.9 Lt. Gregg Atlas, Eng.10 Gerald Atwood, Lad.21

B
Gerald Baptiste, Lad.9 A.C. Gerard Barbara, Cmd. Ctr. Matthew Barnes, Lad.25 Arthur Barry, Lad.15 Lt. Steven Bates, Eng.235 Carl Bedigian, Eng.214 Stephen Belson, Bat.7 John Bergin, Res.5 Paul Beyer, Eng.6 Peter Bielfeld, Lad.42 Brian Bilcher, Sqd.1 Carl Bini, Res.5 Christopher Blackwell, Res.3 Michael Bocchino, Bat.48 Frank Bonomo, Eng.230 Gary Box, Sqd.1 Michael Boyle, Eng.33 Kevin Bracken, Eng.40 Michael Brennan, Lad.4 Peter Brennan, Res.4 Cpt. Daniel Brethel, Lad.24 (D) Cpt. Patrick Brown, Lad.3 Andrew Brunn, Lad.5 (D) Cpt. Vincent Brunton, Lad.105 F.M. Ronald Bucca Greg Buck, Eng.201 Cpt. William Burke Jr., Eng.21 A.C. Donald Burns, Cmd. Ctr. John Burnside, Lad.20 Thomas Butler, Sqd.1 Patrick Byrne, Lad.101
(more…)

GPS adventures with a MiFi 2352

So I got a MiFi 2352 from Vodafone, which at 40€/mo unlimited data at up to 7.2Mbps seemed like a good deal, but it actually sucks where I am right now, getting at best, during the night, 300kbps. But I digress.

The MiFi is sold by Vodafone factory-unlocked, which also seems like a good deal as there is no penalty for contract cancellation – naturally, it came with firmware version 5.15, which is ancient, and suffers from many drawbacks, one of which is poor HSUPA support. The one that caught my eye, however, was that the MiFi comes with a built-in GPS, which in theory provides positioning data to devices such as the WiFi-only iPad.
(more…)

So Dave Winer is tolerant and open-minded…not

[Update] After a tweet from @GadgetDon I thought I’d try to fix things, and thus deleted a couple of tweets that could have been offensive, and removed this post, with the thought of emailing Dave to ask what offended him so much. However, during the few hours since, I have been reading and researching Dave Winer’s background, and it seems I am not alone in what happened. It doesn’t seem to take much to be blocked out of Winer’s world, ergo, Winer’s world is by definition boring, uninteresting and dated. I have been going back through his Twitter feed (hint Dave: anyone can read your tweets just by logging out of Twitter, so blocking is pointless, duh!), and there hasn’t been anything that I didn’t find through other means, nor any meaningful opinions or worthwhile information. Looks like I’m not going to be missing much. I’m moving on and re-posting this; there are tons of interesting people to follow on Twitter and blogs.

(more…)

On Nikodemo, venture capital, and WebTV

First of all, and since I know how Albert feels right now, I want to send him all my encouragement for his new project, the WebSeries Festival. On the other hand, I cannot stay on the sidelines of all the ink that has been spilled over the WebTV model, venture capital, and entrepreneurs, both for better and for worse. I have myself experienced being told “no” time and again, and hearing that the project is not getting “traction”, or that it is missing things. As a final consequence, the repeated “no” forced the sale of Whisher under less than optimal conditions (however much my ex-partner tries to dress it up on his LinkedIn profile, although that is another story that is beside the point here).

It saddens me to say goodbye to series like Cálico Electrónico, which in its early days led us to contact Albert about the possibility of having them create an animated intro video for Whisher – although in the end it was not done because of substantial changes to our website. The closure of Nikodemo & Co. was forced by the failure to find funding that could sustain the project, which still had negative financial results – although positive ones in terms of audience and social impact. Albert complains about the lack of “risk” in the “capital riesgo” (venture capital, literally “risk capital”) equation, though perhaps the first mistake was the choice of funding sources. Pure venture capital (hereinafter VC, as in contracts), as it is understood in the entrepreneurial world, is unfortunately very scarce in Spain. A few funds come to mind, such as Nauta, Debaeque, Adara, or Perennius. More abundant are the “business angels”, who are like a VC but without such a big stick for when things go wrong. Below that level we have the countless funds, loans, grants, incubators, technology parks, and pseudo-VCs. The most worrying are the last of these, since in the former cases things are quite clear from the start. When you take out a NEOTEC-type loan, the terms are clear:

The company will repay the aid to CDTI as it generates positive cash flow. To that end, the company undertakes to provide CDTI with its closed annual accounts every year. The annual repayment installment will be up to 20% of the positive cash flow generated, until the loan is fully repaid.

In other words, you run no risk. If the company never gets established, you do not have to mortgage or sell your house and go live under a bridge to pay back loans. CDTI also shields its own risk somewhat, in this way:

CDTI advances to the company, upon signing of the contract governing the NEOTEC aid, between 40% and 60% of the approved aid. The remainder will be paid to the company upon completion and technical and financial justification of the approved project/business plan.

If your accounts do not yield results, CDTI will have lost at most 60% of, in turn, the 70% of the total project cost, which is what they grant. Other types of official aid are governed by similar terms, and become a good seed capital option. The only problem is the arduous application and processing procedure, which can sometimes drag on for months, too long for a startup. To partly solve this problem, a number of companies have appeared that advise startups through the process, in exchange for monthly fees and/or percentages of the capital obtained – also a topic for another time.
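To put numbers on the NEOTEC terms quoted above, here is a small worked sketch. The percentages (70% funded, up to 60% advanced, repayments of up to 20% of positive cash flow) are the ones stated in this post; the project cost and cash-flow figures are invented for the example.

```python
def cdti_max_exposure(project_cost: float,
                      funded_share: float = 0.70,
                      max_advance_share: float = 0.60) -> float:
    """Most CDTI can lose if the company fails right after the advance."""
    approved_aid = project_cost * funded_share
    return approved_aid * max_advance_share


def annual_repayment_cap(positive_cash_flow: float,
                         cap_share: float = 0.20) -> float:
    """Upper bound on one year's repayment; nothing is due without cash flow."""
    return max(positive_cash_flow, 0.0) * cap_share


# Hypothetical project costing 100,000 (any currency).
print(cdti_max_exposure(100_000))
print(annual_repayment_cap(50_000))
print(annual_repayment_cap(-20_000))
```

So at worst CDTI is exposed to about 42% of the total project cost, and the founder repays only out of cash the company actually generates.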

(more…)

My PCB business card flashes its LEDs!

Finally, I received the new PCBs from the manufacturer, after the first batch was found to have defective track continuity (possibly due to overly aggressive etching). This is a short video showing how the first one I assembled and programmed works:

Merry Christmas to All!

Enough said.

Trust

n. Firm reliance on the integrity, ability, or character of a person or thing.

Google search for “trust”:

Must be a pretty important thing.

Trust is something that can take years to build, but only a second to destroy.

Few things are more painful than someone breaking your trust.

My new policy against trolls

I don’t like to censor people’s comments, even when they are repeatedly stupid, anonymous, empty in content, or just “why don’t you say your own company is shit” type of crap. Most blogs, particularly those widely read, are actually critical, rather than full of praise. They do of course offer some praise when it’s due, but they don’t look like corporate blogs or paid shills. Same goes for this blog – I tend to write mostly when I get pissed off at how something works, or someone acts. I also write positive comments when something is actually done right, or above and beyond the call of duty.

It so happens that some people feel it’s their duty to attack with empty words, without any substance or fact, in defence of what they believe is “right”. They will post a comment, with false names and email addresses, not revealing on whose behalf they are actually posting. They are called trolls. Thus, my new policy on troll comments will be this: I will approve the comment, and ask for an explanation of the attack (as in, actual facts), plus the real identity of the commenter. If either fails to turn up in three days, the comment will be deleted. Let the retard games begin!