Archive for the ‘general’ Category

Form protection

Friday, August 5th, 2011

I’ve seen a discussion recently on how to protect your forms from spammers/bots that come and fill the forms to either fill your database with crap data or fill your page with porn links. When I read the answers I figured out that none of the people had read the amazing article I read years ago, so I decided to try to remember what it said. So, a big fat disclaimer: I read this in an article somewhere and I don’t remember where. If you know the original article please post it in the comments, I’d love to link to it, I bet it has way more info than this one.

The problem

Almost all websites now have some forms on them, some of them are contact / registration forms, others use the data submitted and display it on the site itself (comment forms). But letting others submit data to your site/database opens you to all sorts of attacks. If you actually show the content of the submitted form, you’ll get a bunch of spammers posting comments with lots of links. If you only store data and don’t show it anywhere you’re still at risk – if you don’t notice, your disk can fill up, your database may grow beyond its limits,… So what we want to do is to prevent bogus form posting.

Spammer approach

If you think about writing a spam-bot that will try to spam as many sites as you possibly can, you have three basic approaches.

Record / replay

This is a very simple approach – you use a person to submit the form, preferably with something that looks like real input and record the request made. Then you hand that data off to the bot, it changes some content and tries to resubmit it.

Automation based on heuristics

I wanted to say AI, but it really isn’t. What it is is a set of simple rules and randomized values that the bot thinks might trick your site into accepting the submit. Let’s say you have three fields, two are inputs with field names “name” and “email” and the third field is “comment”. A simple script can fill these with “valid” data and try to submit it.

Human entry

By far the simplest, but also most costly for spammers. Go on Amazon Turk or whatever other service, send a link to a Google Spreadsheet and have people manually enter the stuff into your forms. This is the source of “Sorry, this is my job” endings to spam comments.

Popular solutions

Turing test

Add a field to the form that the user must fill with something that humans can do easily, but machines can’t. The biggest group here are Captchas (display an image with distorted characters, possibly an audio link that reads out the same characters, and have the user somehow figure it out and write the solution), but there have been others, like a “Human?” checkbox, or “3 + 4 =” fields, “Is this a kitten?” with a pic next to it.

2-step process

Supposedly by far the easiest way to do this is by introducing a 2-step process. After the initial submit, you get back a preview of the submitted data, possibly with a checkbox that says “Is this correct?” and another submit button. Robots are usually not good at following through and thus don’t actually submit the content.

Both solutions have an impact on user experience. With Captchas it’s sometimes really hard to see what they are and even if they have a “different image” link, it just seems like the owner of the site wants to make your life hell before you can submit your data. The other challenges might be easier on the user, but also easier to figure out if you’re being targeted by bots. The 2-step process works great for comments, which usually don’t have an edit link, so it might actually be good for user experience if done right (not Wikipedia style), but it is less appropriate on other types of forms.

Protect yourself differently

These are the techniques that should prevent most bogus form entries from random passing bots, except “Human entry” – no protection for that, even though Captchas try hard. There is not much you can do when you’re targeted…

Honeypot field

Use this field to trick autoguessing bots into submitting something in a field you know should be empty.

  • Add an input field to your form with a regular name (state, maiden-name,…) that does not appear on your form otherwise.
  • Use a label that will clearly communicate that it needs to be empty.
  • Hide it with CSS, preferably not by adding class=”hidden”.

If the form post includes content in this field discard it and redirect back to the form. The trick is to make sure the bots don’t figure out this is a honeypot, so use valid looking but nonsensical classes…
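A minimal sketch of the server-side check, assuming the submitted form arrives as a plain dictionary of strings. The field name `maiden-name` and the function names are made up for illustration; use whatever fits your form and framework.

```python
from typing import Mapping

# Hypothetical honeypot field: a plausible name that real users never see or fill,
# because the field is hidden with CSS.
HONEYPOT_FIELD = "maiden-name"


def looks_like_bot(form: Mapping[str, str], honeypot: str = HONEYPOT_FIELD) -> bool:
    """Return True when the honeypot field came back with any non-blank content."""
    return bool(form.get(honeypot, "").strip())


def handle_post(form: Mapping[str, str]) -> str:
    """Decide what to do with a form post (return values are placeholders)."""
    if looks_like_bot(form):
        # Discard the data and quietly send the sender back to the form.
        return "redirect-to-form"
    return "accept"
```

A missing or blank field counts as a pass here, so browsers that skip the hidden input are not punished.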

Date field

Use it to prevent resubmit of data too far from the creation date. Allow users a few hours to post the form.

To prevent manual modification you can use either proper encryption (symmetric or asymmetric) that will allow you to decode it on form post or use this date in combination with the onetime token.
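The age check itself could look something like this. It is only a sketch: it assumes the date travels in a hidden field as an ISO 8601 string with a timezone, and the three hour limit is my reading of “a few hours”. It does not protect the value against modification on its own.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "a few hours" is taken to mean three.
MAX_FORM_AGE = timedelta(hours=3)


def form_is_fresh(issued_at: str,
                  now: Optional[datetime] = None,
                  max_age: timedelta = MAX_FORM_AGE) -> bool:
    """Accept the post only if the form was created within max_age before now."""
    try:
        issued = datetime.fromisoformat(issued_at)
    except ValueError:
        return False  # not a date at all
    if issued.tzinfo is None:
        return False  # refuse naive timestamps, they cannot be compared safely
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - issued
    # A negative age means the date lies in the future, which is also bogus.
    return timedelta(0) <= age <= max_age
```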

Onetime token

Use this field to prevent replay of request data. If you can, save it into the database. It is a good idea to build this token so that it cannot be faked (say, one character changed and you have a valid one). This can be done by hashing data or with encryption.

This one can be as tricky as you want. What I usually do (disclaimer: I don’t know much about encryption so this might be crap advice) is use a plain datetime field with the onetime token generated from IP address, UserAgent and the date field with HMAC. There is no need for this token to be reversible – I can recreate the same thing with the data from the form post and check if it matches.
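Roughly what that looks like in code, as a sketch and not a vetted security design. The secret key, the choice of SHA-256, the newline separator and the in-memory set standing in for the database are all my assumptions for the example.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice load it from configuration.
SECRET_KEY = b"change-me"

# Stand-in for a database table of tokens that were already used.
_used_tokens = set()


def make_token(ip: str, user_agent: str, issued_at: str,
               key: bytes = SECRET_KEY) -> str:
    """Build the onetime token as an HMAC over IP address, user agent and date."""
    # Newline works as a separator because HTTP header values cannot contain one.
    message = "\n".join([ip, user_agent, issued_at]).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def check_token(token: str, ip: str, user_agent: str, issued_at: str,
                key: bytes = SECRET_KEY) -> bool:
    """Recreate the token from the posted data and accept it at most once."""
    expected = make_token(ip, user_agent, issued_at, key)
    # Constant-time comparison on bytes, so odd characters cannot raise an error.
    if not hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8")):
        return False
    if token in _used_tokens:
        return False  # replayed request
    _used_tokens.add(token)
    return True
```

When rendering the form you call `make_token` and put the result next to the date in hidden fields; on post you call `check_token` with the values from the request. A token that fails only because the IP changed might still come from a real user, so treat that case gently.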

When using these techniques make sure you take care of the user experience. If you detect a problem on what might be valid user input (“timeout” on the date field with an unused onetime token, wrong onetime token from an IP change by the service provider), you might want to display a second step from the “2-step process”. Whatever you do, don’t call your users spammers or bots – be nice, bots don’t read the text anyway.

Did I miss anything?

I know of no plugin that uses all of these techniques, but I haven’t really looked for it. What I do know is that I don’t want to ever use a Captcha, cause it often keeps me out, and the 2-step process is just too weird sometimes. Hope this helps. And again – if you find the original article (must be some 5 years old now at least if not more) or have any other solutions you use or endorse, do leave a comment.

Hiring developers: King of the Hill effect

Friday, August 14th, 2009

You might have noticed that we’re hiring at Zemanta. As we’re a start-up we’re looking for experienced developers that can get to work right away. The environment of browsers and, as if that’s not enough, blogging platforms and rich-text editors is very challenging, so the bar is set quite high. As we were talking about hiring a new person we tried to write a job description and application invitation that would set the bar high and also test some things we wanted to know before conducting the interviews. It seems we might have gone a bit too far as we only got a few applications.

You always expect some people to send you a job application request that will not conform to what you’re requesting and that’s ok – you can easily ignore these applications. You will also always get some people who don’t fit the requirements by a mile but are trying to get “on the shortlist” for possible openings in the future – that’s ok too. And you’ll also get a lot of people that will value their own knowledge far more than they should – these are the ones I wanna talk about here.

You’ve probably met a bunch of “airheads”, “egomaniacs” or whatever you call the people that are full of themselves and describe their knowledge as expert but turn silent when you pop a simple advanced question. I call them “king of the hill” types.

Assessing your own knowledge is hard

It’s in our nature to compare. My “house” is bigger than your “house” is part of our minds, even more so in Slovenia, a small market where basically everybody knows everybody. So it’s only natural that we assess our knowledge based on comparison. There’s an obvious problem with that – I will have no idea who you’re comparing yourself with and therefore your score will make no sense to me. In that sense it’s similar to confidence levels in search.

You might think setting a comparison chart would make the scores better, but it really doesn’t. If you tell a developer that he should assess his knowledge of a language based on a scale where 1 is “can read it” and 10 is “i invented it” you’ll get a lot of 8s. Which would mean they’re basically the best developer for that language in the country. When you do, you can easily think that the guy saying it is a moron and discard him as a viable candidate. And you’d be wrong doing that, at least sometimes.

Are you King of the Hill?

When people overestimate their knowledge it’s because of two basic reasons:

  1. They are genuine asswipes that think they know more than the guy that invented the language, but know showing they’re an egomaniac on the interview is not smart. So they’ll lower the score to an 8 to make you feel good. These guys are usually easy to recognize as they’ll be defensive and dismiss any questions they don’t know with something like “that sucks”, “i never need that”, …
  2. They really have no idea what else is out there. I was sure that in the age of internet such people don’t exist anymore, but even in computer related industries you can easily find people that got stuck in a particular part of the web of amateur forums and people of the previous variety. This means they do actually know everything, but don’t realize that everything is a lot bigger than they know.

But this is only a part of the story.

Who to hire then?

I’ve drawn a simple graph to help explain this:

Don’t hire the King of the Hill!

The beginners

As you start learning about something you’ll know that you know next to nothing. You’ll also see a lot you can learn and might even see where you want to get – as near to the “i invented it” as possible. At that time you think you know less than you really do – these are the people you should hire only if you are willing to wait until they learn more and if you believe they can. As you learn and overcome the problems you thought were “the big problems” you’ll go over the equilibrium and become a smart-ass.

The kings

This is where it gets interesting. Some people like to be king of the hill so they will ignore everything beyond that point. They will also make sure that people lower on the actual knowledge axis will not see over that point. They’ll have all sorts of reasons why everything beyond this point is crap. These are the people that make the most damage to development communities, as they’re usually the vocal ones. You should not hire them.

The enlightened

But as I said before they might really just have no idea what’s beyond that point. That’s easily solvable even during the interview – you can show them some code, throw around some ideas and arguments on why that’s good and some people will say “Wow, I didn’t even know all that is possible!” (yep, I actually got a response like that). You should hire these people immediately – seriously, don’t let them leave the interview without signing a contract. Tell them they’re the last interview before the decision and that you decided already and don’t need to wait. And because they’re now sure they know less than they really do, you’ll be able to get great value for money.

You must however be really careful with developers in this stage. They will climb the curve fast and soon they’ll start whining about lack of challenging work. They’ll also start to want more money. They might even want to say they want to be a team leader. When they do, they’re really just saying they want more money. Don’t confuse expert developer knowledge for managerial skills!

The experts

The best you can do at this stage is making them architect or senior developer. And with that done you need to start sending them to topnotch conferences and encourage them to write papers trying to get a gig at one of them. They’ll meet with the inventors, see that they’re actually normal people with a unique set of knowledge that is much wider than they expected. The new “lack of focus” will keep them busy and inspire them. They might become a better developer or stay at the same level, but they’ll be happy. They’ll be the kid with a new toy. With the ability to find a new toy when this one gets boring. That’s what supporting extracurricular activities and flexibility is good for.


Firefox 3 Release Event 2008

Tuesday, June 24th, 2008

So I’ll be talking at Firefox 3 Release Event at Kiberpipa today. Feel free to come listen to the talks or just come to the party. If you can’t come you can watch the whole thing online (the link is likely to be available somewhere on the event page).

Review: Adria Airways and NLB

Monday, June 16th, 2008

Recently two more big and very frequented Slovenian sites relaunched and I think they too deserve a mention.

Adria Airways

The first page I want to put to the test is the new page of the first and the biggest Slovenian airline. It was recently launched by my ex colleagues at Parsek as the second version to be made there. The first edition was designed and prepared in another agency and Parsek only did the backend while the new version is all Parsek. To be fair the biggest and the most important part – the reservation module – is still made by the French company Amadeus.

The new design tries to incorporate a leaner navigation with fewer elements even though it became wider, almost reaching the 1000px mark. The front page is much more sales oriented, displaying a lot of useful information. I can’t get past the color scheme that is really too dull. There are quite a few validation errors, the ones in HTML mostly due to non-escaped ampersands, while those in CSS are just sloppy coding without checking the validator.

I was surprised to see that some stuff doesn’t work well with Firefox 3 and Safari 3 even though the first one isn’t released yet (will be tomorrow) and the second one doesn’t have a lot of users in Slovenia. I’d still stick to what Yahoo! has to say in their Graded Browser support table for browser support.

I was positively surprised at how well some inside pages are designed down to the last dot and icon and negatively how bad the pages that “only” present CMS content look. I don’t know whose fault this is and I don’t even care, it doesn’t matter for the end user. I’m sure the guys at Parsek will check these pages out and try to make changes that will make them better. When I first saw the design while I was still at Parsek I wasn’t sure if the title on the right would work but now that I’m surfing the page I actually think it does. There is one problem there though – if you visit this page (screenshot) you’ll see that you can see its title “About us” four times in a very small area. It’s nice to know where you are but isn’t this a little bit too much?

NLB

The next big redesign is the biggest Slovenian bank which redesigned their site after quite a while. I don’t really know what to say about the redesign – the last one was horrendous so this one is easy on the eye. It too got wider and restructured so people can find relevant information easier. The home page lists all the products for residents and businesses so you can access them directly.

If the design got overhauled the backend didn’t – if it did it got its fashion tips from the 90s. Validation returns a lot of errors and – prepare for a shock – the encoding is iso-8859-2. The number of non semantic elements is significant and inline scripts are there too (<SCRIPT language=JavaScript>).

The most interesting thing about the new page is the fact that it now uses “friendly URLs”. And how utterly broken they are. You could also say this page is a textbook case for how wrong things can go when you don’t think about them. So you’ll have two pages, one at /nalozbe-v-vrednostne-papirje and the other at /nalozbe-v-vrednostne-papirje1. I have no idea how that tells you anything about how the content behind these links is different. It would tell you more if the first was prefixed with /residential and the second one with /businesses.

Another funny thing I noticed is how banners are designed to look as if they weren’t images but rather just HTML parts of the page. The reason I noticed is that I was on the Mac while checking the page and since font rendering is different it looks really weird. I think I might have seen the same difference on Vista with ClearType on.


Leaving Parsek

Friday, May 23rd, 2008

After quite a few years devoted to growing and developing Parsek I’ve come to the end of this very interesting and challenging project. I decided to continue my career elsewhere.

New challenges lie ahead and I will delve into them with everything I have.

Video is the new AJAX

Monday, April 14th, 2008

Recently there’s been a lot of video news sites popping up here in Slovenia. In addition to TV networks almost every newspaper site now has a video section. I understand that these sites need to evolve and that media is changing. Every year we see statistics changing telling us we read more on the web and less newspapers. Even TV is losing ground. The media business is changing and in this ever changing world the easiest and the cheapest solution is to follow what others are doing. Unfortunately this also means that you do things without thinking them over thoroughly.

When you do that you have a problem – you’re thinking that you’re giving readers what they really want but in turn you’re giving them what you want. Or what you think they want – either way you’re not on the right track. That made me think of the ways I watch video online and the ways I want to watch it.


Most video I watch is actually not on the internet – it gets downloaded (almost) automatically into iTunes. I don’t watch the podcasts everyday even though some podcasts are daily news reports.

So local media companies are adding podcast feeds to their video content and hoping that people watch them[1]. Newsflash – podcasts are not a technical issue. Most people don’t even know what feeds are (another story), why do you think that they know what podcasts are?

The solution here is quite simple – for a quick start of course. Make real podcasts, use the news you’re making or providing on your site already. This way you can leverage your existing content while providing something that people might actually watch. Focus on local news[2] and target the younger audience, with daily episodes not exceeding 4 minutes in length. A very important thing is choosing the presenter – they need to reflect your goals and suit your target content and audience. This means that your average TV anchormen won’t work – check the most popular podcasts to get the feeling what you’re looking for, keywords probably being humorous, personal, friendly.

Such podcasts have a few possible ways of monetizing themselves. One possibility is to add commercials (add them at the end, not the beginning), you might have weekly or monthly sponsors that you display in the background or even at the beginning of the show (not more than one screenshot). Since you can differentiate your subscribers from random web users you can adjust advertising to get most from both worlds. Be creative!

TV shows

Fortunately both local TV networks now have ways of watching locally produced shows I’ve missed. I do that quite often[3] since I can’t really fit some of them into my already busy schedule. When I’m watching such a show on the internet that’s probably the only thing I’m doing at that specific time and means that the computer is actually acting as if it was a TV.

Since I can move the slider you can’t push ads to me as you would on TV. That doesn’t mean you can’t have ads in such shows, you just need to think about them differently. What I do often is pause the video to check my email, browse around or just wait for the show to download – perfect time for placing ads. When I come back there’ll be an ad waiting, I’ll click next and continue watching the show.

The idea is not mine – when I was in the Netherlands a few years ago I went to the movies – in the middle of the movie there was a commercial which announced a brief break during the movie. I don’t remember the commercials going on while the break lasted (we all left the theater) but they were on again when we started coming back.

A great option with watching TV shows would also be to allow me to set the shows in my profile – that way I could see when something will be on TV and when I can watch it online. If I have a few shows to check you should allow me to add them to a playlist much as I would in iTunes or on my iPod. And I wouldn’t mind ads in between – if I’m watching a show that has already preloaded you could preload an ad into memory and play it while you start buffering the next show in my playlist – I’d have to wait anyway. You could also create a podcast that would push the shows I added or subscribed to.

Video news

This is the one that most media providers do currently and get it wrong most of the time. When reading news on the internet I’ll have many tabs open since what I’m doing is browsing. This means I’ll start at the homepage and then click on random news there, maybe click a category I’m really interested in, when news open I might click some related news and so on. This “trip” is rather random and fast.

Since I’m in browsing mode I’m more likely to only skim the information on the news page. This means that when I come to a page that only provides a video I’ll have nothing to skim and will close that tab immediately. I won’t see the ad in front of the video and I won’t see the video. In a month I might discover that I’m not getting quality information and move on to another site that will let me skim what I want to skim and fully concentrate on what I want to see.

Video as add-on

One solution to this problem is to use the video to convey information that text can’t. For example if you’re talking about a football match you might add video of the best move or all the goals scored. Another possibility would be that you’re pushing news on Britney and you add video of the incident. This way I can skim the news, figure if I want to see the video and check it if it interests me enough.

Video as primary content

When you think the only way to present content is video (I don’t think that ever happens but some do) you could use the idea already mentioned – profiles and sort-of bookmarks. I first saw this implemented on the International Herald Tribune website for text only articles – while browsing and skimming for interesting news you add what you want to read to your profile. At the end you can sit back and read what you saved or in this case check your own news show. Hey, you could even add social features to this with sharing of such shows (technically speaking playlists) with friends,… This also makes ads less invasive since you can add them less often than on every video I watch.


Some of you might know that I hate AJAX and I do for the same reason I hate video on the web currently. There’s a bunch of sites shoving it down my throat in totally inappropriate ways and I really hate being bothered this way. Technologies are here to solve problems and the only way they can do that is if people think what problems they solve better than others. That way we can read the news, watch the video, get an AJAXy[4] experience when and where we want to and where that specific technology solves our problems best.

  1. I’d really love to see the statistics on that. Anybody know where to get them?
  2. We get world news in other podcasts or from other sources – keep it linked to what you know best.
  3. More often on PopTV since I prefer their way of delivering content – via a fullscreen Flash interface – opposed to a small Windows Media / Real player on RTV Slovenija.
  4. By the way – with all the AJAX around home pages of both local media houses reload automatically (which could really be an asynchronous request to retrieve the latest news) – one with a meta refresh tag and the other with inline JavaScript.