Implementing the Apriori Data Mining Algorithm with JavaScript

I’ve been working on my thesis for a little too long. Hopefully I’m nearly finished, but that is beside the point. Part of my work revolves around the Apriori data mining algorithm. If you already know what Apriori is and are looking for a way to implement it, this post is for you.

I’ve created a JavaScript implementation of Apriori. Admittedly, JavaScript probably isn’t the most efficient language for implementing Apriori; however, I was constrained to use it for my project [1]. There have been many improvements to the algorithm since Agrawal and Srikant proposed it in 1994, but I chose instead to create a simple implementation of the original algorithm. First, learn how the algorithm works; second, learn how to optimize it.
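The post itself uses JavaScript, but to keep the examples on this page in one language, here is a minimal Python sketch of the original level-wise idea: find the frequent single items, then repeatedly join frequent (k-1)-itemsets into k-item candidates, prune any candidate with an infrequent subset, and count the survivors. The function name `apriori` and the use of an absolute support count (rather than a fraction) are my own assumptions, not the post’s code.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset.

    Assumption: min_support is an absolute transaction count.
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the data: count how many transactions contain each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    current = count({frozenset([item]) for t in transactions for item in t})
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    ]
    for itemset, support in apriori(baskets, 3).items():
        print(sorted(itemset), support)
```

The prune step is what makes this Apriori rather than brute force: any superset of an infrequent itemset is skipped before the data is scanned again.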

YUI Image Uploader Example with TurboGears

After completing the YUI Image Uploader, I received a lot of requests for a working example. I didn’t create one originally because it requires server-side functionality that this server didn’t have. I’ve remedied the situation and completed an example with TurboGears. Of course, any server-side language or framework will do, as long as you can upload and store an image.

Similarity of texts: The Vector Space Model with Python

I’m working on a little task that compares the similarity of text documents. One of the most common methods for doing this is the Vector Space Model. In short, you map the words of each document you want to compare onto a vector whose dimensions are the words found across all documents. Then you compute the cosine of the angle between the vectors of the documents you want to compare. This is called the cosine measure. With non-negative term weights such as word counts, it ranges from 0 to 1: a value of 0 means the documents share no words, and a value of 1 means their vectors point in the same direction, as happens when the documents contain the same words in the same proportions.
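A minimal sketch of the cosine measure using only the Python standard library might look like this. It assumes raw word counts as vector weights and a simple lowercase regex tokenizer, with no stop-word removal, stemming or tf-idf weighting; the names `tokenize`, `vectorize`, `cosine` and `similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase, keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def vectorize(doc, vocabulary):
    # One dimension per vocabulary word; the weight is the raw word count.
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocabulary]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


def similarity(doc_a, doc_b, corpus):
    # The vector space is built from the words found in all documents.
    vocabulary = sorted({word for doc in corpus for word in tokenize(doc)})
    return cosine(vectorize(doc_a, vocabulary), vectorize(doc_b, vocabulary))


if __name__ == "__main__":
    docs = ["the cat sat", "the cat sat the cat sat", "dogs bark loudly"]
    print(similarity(docs[0], docs[1], docs))  # same proportions: close to 1
    print(similarity(docs[0], docs[2], docs))  # no shared words: 0.0
```

Because the cosine ignores vector length, a document and a copy of itself repeated twice score 1, which is why the measure compares word proportions rather than exact equality.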

I found an example implementation of a basic document search engine, written in Perl by Maciej Ceglowski. I thought I’d find the equivalent libraries in Python and code up an implementation of my own.

An Image Upload Extension for YUI Rich Text Editor

Before you begin: Read the updates at the bottom of the page. This post was written for an older version of the YUI library.

I’ve had nothing but good things to say about the Yahoo User Interface tools. It seems to me that the developers keep folding the features I like from other libraries into one easy-to-use, well-documented, high-quality library.

The new rich text editor has left me no less pleased. I can now ditch the other editors I’ve used in the past in favor of one that will be maintained. (I’ve tried two that worked but were becoming outdated and no longer really supported.)

Programming a client for the WHOIS protocol

I have a little task that involves programmatically determining whether DNS servers are set correctly for a domain. Since this project is written in Python, I first set out to see if there were any “whois” clients already available for Python. I eventually found rwhois.py, which is a whois client with recursive ability. I noticed it hasn’t changed since 2003, but thought that if it works, that shouldn’t be much of a problem.

My first run of the program resulted in an error. The client successfully found the registrar information for my domain but failed to parse and display it, raising a “NoParser for: whois.godaddy.com” exception. I set out to analyze the rwhois.py client and the WHOIS protocol to see whether I could either fix it or write a replacement.
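The WHOIS protocol itself is small (RFC 3912): open a TCP connection to port 43, send the query terminated by CRLF, and read until the server closes the connection. The sketch below shows that exchange plus a naive recursive lookup that follows a referral line such as “Registrar WHOIS Server: whois.godaddy.com”. It is not rwhois.py’s code or my eventual replacement. The function names, the default starting server `whois.verisign-grs.com` (which only covers .com/.net), the referral-line pattern and the hop limit are all assumptions, and registries differ in the exact query syntax and output format they use.

```python
import re
import socket

WHOIS_PORT = 43  # RFC 3912: WHOIS runs over TCP port 43

# Assumption: thin registries announce the registrar's server on a line like
# "Whois Server: ..." or "Registrar WHOIS Server: ...".
_REFERRAL = re.compile(
    r"^[ \t]*(?:Registrar[ \t]+)?WHOIS Server:[ \t]*(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def whois_query(query, server, timeout=10.0):
    """Send one WHOIS query and return the raw response text."""
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")  # query ends with CRLF
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def find_referral(response):
    """Return the referred WHOIS server named in a response, or None."""
    match = _REFERRAL.search(response)
    return match.group(1).lower() if match else None


def recursive_whois(domain, server="whois.verisign-grs.com", max_hops=3):
    """Follow referrals and return a list of (server, raw_response) pairs."""
    responses = []
    for _ in range(max_hops):
        text = whois_query(domain, server)
        responses.append((server, text))
        referral = find_referral(text)
        if referral is None or referral == server:
            break
        server = referral
    return responses
```

Returning the raw text from each server, instead of parsing it, sidesteps the per-registrar parsers that raised the “NoParser” error; the caller can then search the last response for the fields it needs, such as the name servers.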
