We take them for granted but without them — and the unsung tech-head heroes who write them — we wouldn’t be able to get to work, sort our emails from the spam, search the Web or do a gazillion other things that algorithms make possible.
Here are just some of the algorithms we use every day without even thinking about them.
The Bayesian Algorithm Keeping Spam at Bay
The anti-spam filters used by email programs, and by Google to keep spam out of its search results, commonly rely on Bayesian filtering.
Bayes’ Law is a result from probability theory which, when used in a spam filter, estimates the likelihood that content is spam based on the probability of certain words appearing together. The formula itself looks like this:
P(spam | words) = P(words | spam) × P(spam) / P(words)
This means that the probability that an email is spam if it contains particular words is equal to the probability of finding those words in a spam email, multiplied by the probability that any email is spam, divided by the probability of finding those words in any email.
Or to put it another way, if it looks like spam, it probably is.
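To make the arithmetic concrete, here is a minimal Python sketch of that calculation. The function name and the figures fed into it are made up for illustration; a real filter would estimate each probability from a large body of mail.

```python
def spam_probability(p_words_given_spam, p_spam, p_words):
    """Bayes' rule: P(spam | words) = P(words | spam) * P(spam) / P(words)."""
    if p_words <= 0:
        raise ValueError("p_words must be positive")
    return p_words_given_spam * p_spam / p_words


# Hypothetical figures: the words appear in 40% of spam, half of all
# email is spam, and the words appear in 21% of all email.
posterior = spam_probability(p_words_given_spam=0.40, p_spam=0.50, p_words=0.21)
print(f"Chance this email is spam: {posterior:.2f}")  # prints 0.95
```

With those invented numbers, words that are common in spam but rare in mail overall push the estimate to about 95%.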
Bayesian filtering is also flexible and “intelligent.” It can be trained to differentiate between spam and legitimate email based on a user’s online activities and a subsequent analysis of the email’s content, headers, HTML code and meta tags. It can also assess the probability of one online newsletter being spam rather than another based on web surfing patterns.
Of course, filters still have to decide which words are likely to be part of spam, and probability isn’t certain so you’ll still get some emails you don’t want — and lose some you do. But a quick look at the contents of your spam filter will show you just how useful Bayes can be.
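The training step can be sketched as a toy "naive Bayes" filter that learns word counts from emails someone has already labelled. This is an illustration under simplifying assumptions, not how any particular mail program works: the class name, the training messages and the add-one smoothing are all choices made for this example, and it looks only at words, not at headers or HTML.

```python
import math
from collections import Counter


class TinySpamFilter:
    """Toy naive Bayes filter trained on labelled example messages."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham" (ham = legitimate email).
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text):
        if min(self.message_counts.values()) == 0:
            raise ValueError("train on at least one spam and one ham message")
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: the share of training messages carrying this label.
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # Add-one smoothing so an unseen word never forces a zero.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
            log_scores[label] = score
        # Turn the two log scores into a normalised probability of spam.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = TinySpamFilter()
spam_filter.train("win cash prize now", "spam")
spam_filter.train("cheap pills win big", "spam")
spam_filter.train("meeting notes for monday", "ham")
spam_filter.train("lunch on monday", "ham")

print(spam_filter.spam_probability("win a cash prize") > 0.5)   # True
print(spam_filter.spam_probability("lunch on monday") > 0.5)    # False
```

Every new message the user marks as spam or not spam would simply be another call to train, which is why such filters get better the longer they are used.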
Porter Carries Search Engines
Porter’s stemming algorithm is widely used by search engines to understand search terms. The algorithm strips words down to their roots, then searches for the root plus all the possible related linguistic constructions.
Martin Porter wrote the first widely used stemming algorithm in 1980 and it soon became the main algorithm used for stemming in the English language. Since then, several types of stemming algorithms have been developed, using a range of different techniques. Lookup tables can map an inflected form directly to its root, for example, while other approaches rely on rules for stripping out suffixes.
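The two techniques can be sketched in a few lines of Python. To be clear, this is not Porter's algorithm, which applies many more rules in ordered steps with conditions on what remains of the word; the suffix list, the lookup table and the function name here are invented purely for illustration.

```python
# Illustrative suffix list only, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")

# A small lookup table for irregular forms that suffix rules cannot handle.
IRREGULAR = {"ran": "run", "mice": "mouse"}


def simple_stem(word, min_stem_length=3):
    """Return a crude root for word using a lookup table, then suffix rules."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        # Refuse to strip if too little of the word would be left ("bus").
        if word.endswith(suffix) and len(stem) >= min_stem_length:
            return stem
    return word


words = ["searching", "searched", "searches", "search", "ran"]
print([simple_stem(w) for w in words])
# ['search', 'search', 'search', 'search', 'run']
```

A search engine using something like this would index and query on the root, so a search for "searching" would also match pages that say "searched".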
The more complex the language, the harder it is to design a stemming algorithm, which has to take into account character encoding, verb inflections, noun declensions etc. Russian is difficult; Hebrew and Arabic are worse.
Martin Porter kindly made his algorithm available for public use and it is distributed freely over the Net to anyone who wants to use it to develop an information retrieval system. Porter’s original paper, written in 1979 and published in 1980, can be read on his website.
Flickr’s “Interesting” Algorithm
Not all algorithms are as serious and complex as search engine stemming or spam mail filtering. Some are fun too. Photopreneur recently ran an interview with Serguei Mourachov, a programmer at Flickr who worked on the algorithm that decides which photos make it to the site’s Explore page.
According to Serguei, the algorithm calculates the Probability of reaching the Explore Page (PEP) by counting the number of times an image is faved, commented on and viewed. It then makes deductions from the score if the photo appears in more than “15-20” groups, if the photographer has been highlighted recently or if it appears in groups with lots of unsafe photos or requirements that would increase the PEP.
One of the best things about the algorithm (from Flickr’s point of view, if not the view of ambitious Flickr members) is that it can be easily adjusted to give different weightings to faves, comments and views to suit the “current climate of [the] Flickrverse.”
And you’ll still need to take a good photo.
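For the curious, a weighted score with deductions of the kind described above might look something like this in Python. Flickr has not published its formula, so every weight, penalty and name below is a guess made for illustration, and the unsafe-groups deduction is left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    faves: int
    comments: int
    views: int
    group_count: int
    recently_highlighted: bool


# Hypothetical weights and penalties; the real values are not public.
WEIGHTS = {"faves": 5.0, "comments": 3.0, "views": 0.1}
GROUP_LIMIT = 15           # the interview mentions "15-20" groups
GROUP_PENALTY = 20.0
HIGHLIGHT_PENALTY = 30.0


def explore_score(photo, weights=WEIGHTS):
    """Weighted sum of faves, comments and views, minus deductions."""
    score = (weights["faves"] * photo.faves
             + weights["comments"] * photo.comments
             + weights["views"] * photo.views)
    if photo.group_count > GROUP_LIMIT:
        score -= GROUP_PENALTY
    if photo.recently_highlighted:
        score -= HIGHLIGHT_PENALTY
    return score


modest = Photo(faves=10, comments=4, views=200, group_count=3,
               recently_highlighted=False)
spread_thin = Photo(faves=10, comments=4, views=200, group_count=25,
                    recently_highlighted=True)
print(explore_score(modest) > explore_score(spread_thin))  # True
```

Passing a different weights dictionary is all it would take to favor comments over faves, which is the sort of easy adjustment the interview describes.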
Traffic Light Controls
Of course, not all algorithms are used online. One of the non-web uses of algorithms is in traffic control.
It’s estimated that intelligent traffic control systems can save billions of dollars a year by reducing idling time at the lights and lowering fuel consumption. These systems are based on dynamic programming algorithms that take into account the time a car needs to pass through a string of lights on the way to its destination. They then calculate the “green wave”: the timing that lets a car traveling at a steady speed reach each light just as it turns green.
Next time you’re sitting in gridlock then, don’t blame the rubberneckers. It’s probably a programmer’s fault.
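The basic idea behind a green wave can be sketched with nothing more than distance divided by speed. This is a deliberately simplified illustration, not the dynamic programming used by real traffic systems: it assumes one direction of travel, a constant speed and lights that all share the same cycle length, and the function name and figures are made up.

```python
def green_wave_offsets(distances_m, speed_m_per_s, cycle_s):
    """Return when each light should turn green, in seconds into the cycle.

    distances_m[i] is the distance from light i to light i + 1, so the
    result has one more entry than distances_m.
    """
    if speed_m_per_s <= 0 or cycle_s <= 0:
        raise ValueError("speed and cycle length must be positive")
    offsets = [0.0]  # the first light turns green at the start of the cycle
    travelled = 0.0
    for distance in distances_m:
        travelled += distance
        # Delay each light by the travel time from the first light,
        # wrapped around to fit within one signal cycle.
        offsets.append((travelled / speed_m_per_s) % cycle_s)
    return offsets


# Hypothetical street: lights 300 m, 450 m and 150 m apart, traffic
# moving at 15 m/s (54 km/h), and a 60-second signal cycle.
print(green_wave_offsets([300, 450, 150], speed_m_per_s=15, cycle_s=60))
# [0.0, 20.0, 50.0, 0.0]
```

Real systems have to juggle cross streets, two-way traffic and changing speeds, which is where the heavier algorithms come in.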