All My Brain
Where stuff from my brain lands

November 12, 2007

Implementing the Apriori Data Mining Algorithm with JavaScript

Filed under: Programming, posted by Dennis @ 10:29 am

I’ve been working on my thesis for a little too long. I’m hopefully about finished, but that is beside the point. Part of what I’ve been working on revolves around the Apriori Data Mining algorithm. If you know what Apriori is, and you are looking for how to implement it, then this post is for you.

I’ve created a JavaScript implementation of Apriori. Admittedly, JavaScript probably isn’t the most efficient language for implementing Apriori; however, I was constrained to use it for my project [1]. There have been many improvements to the Apriori algorithm since Agrawal and Srikant proposed it in 1994. I chose instead to create a simple implementation of the original algorithm. First, learn how the algorithm works; second, learn how to optimize it.
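To give a feel for what the original algorithm does, here is a minimal sketch in Python. It is not the JavaScript implementation described in this post. The function name `apriori`, the `min_support` parameter (an absolute transaction count), and the use of frozensets are all assumptions made for illustration. It finds frequent itemsets level by level: count single items, join frequent (k-1)-itemsets into k-item candidates, prune any candidate with an infrequent subset, then count what is left.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for every frequent itemset.

    min_support is an absolute count of transactions (an assumption of
    this sketch; many descriptions use a fraction instead).
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: scan the transactions once per candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result
```

The pair-wise join here is deliberately naive. The classic formulation joins only itemsets that share their first k-2 items, which generates fewer duplicate candidates, but the prune step gives the same final answer either way.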

October 22, 2007

YUI Image Uploader Example with TurboGears

Filed under: Programming, posted by Dennis @ 12:53 pm

After completing the YUI Image Uploader, I received a lot of requests for a working example. I didn’t originally create one, because it requires server functionality that this server didn’t have. I’ve remedied the situation and have completed an example with TurboGears. Of course, any server-side language or framework will do as long as you have the ability to upload and store an image.

October 19, 2007

Similarity of texts: The Vector Space Model with Python

I’m working on a little task that compares the similarity of text documents. One of the most common methods of doing this is the Vector Space Model. In short, you map each document onto a vector with one dimension per distinct word found across all documents, where each component is a weight for that word, such as its count in the document. Then you compute the cosine of the angle between the vectors of the documents you want to compare. This is called the cosine measure. When the cosine measure is 0, the documents share no words. A value of 1 means the vectors point in the same direction, which is the case when the documents are identical.
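As a rough illustration of the cosine measure, here is a small Python sketch using only the standard library. The names `term_vector` and `cosine_measure` are made up for this example, and the tokenizer (lowercase runs of letters and digits, raw counts as weights) is a simplifying assumption. A real implementation would probably add stop-word removal, stemming, or tf-idf weighting.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Map a document to a sparse vector of word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_measure(a, b):
    """Cosine of the angle between two sparse term vectors."""
    # Only words present in both documents contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty document has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)
```

Storing each document as a `Counter` is equivalent to a vector over every word in the collection, since any word missing from a document simply has a count of zero.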

I found an example implementation of a basic document search engine, written in Perl by Maciej Ceglowski. I thought I’d find the equivalent libraries in Python and code up an implementation of my own.

October 16, 2007

An Image Upload Extension for YUI Rich Text Editor

Filed under: Programming, posted by Dennis @ 1:24 pm

Before you begin: Read the updates at the bottom of the page. This post was written for an older version of the YUI library.

I’ve had nothing but good things to say about the Yahoo User Interface tools. It seems to me that the developers keep taking the things I like from other libraries and folding them into one simple-to-use, well-documented, good-quality library.

The new addition of the rich text editor has left me no less pleased. I can now ditch the other editors I’ve used in the past in favor of one that will be maintained. (I’ve tried two that worked, but they were becoming outdated and didn’t really have support.)

October 10, 2007

Programming a client for the WHOIS protocol

Filed under: Programming, Web, posted by Dennis @ 7:54 pm

I have a little task that involves programmatically determining whether DNS servers are set correctly for a domain. Since this project is written in Python, I first set out to see if there were any “whois” clients already available for Python. I eventually found rwhois.py, which is a whois client with recursive ability. I noticed it hasn’t changed since 2003, but thought that if it works, that shouldn’t be much of a problem.

My first run of the program resulted in an error. The client successfully found the registrar information for my domain, but failed to parse and display it, raising a “NoParser for: whois.godaddy.com” exception. I set out to analyze the rwhois.py client and the whois protocol to see whether I could either fix the client or write a replacement.
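For context, the whois protocol itself is very small: open a TCP connection to port 43, send the query followed by CRLF, and read until the server closes the connection. The sketch below shows that exchange in Python, plus two helpers that pull a referral server and name servers out of a response. It is not how rwhois.py works. The function names are invented for this example, and the field labels ("Whois Server:", "Registrar WHOIS Server:", "Name Server:") are assumptions, since the response format is free text and varies from one registry or registrar to another.

```python
import re
import socket

WHOIS_PORT = 43

# Assumed field labels; real servers differ, so treat these as a starting point.
REFERRAL_RE = re.compile(
    r"^[ \t]*(?:Registrar[ \t]+)?Whois Server:[ \t]*(\S+)",
    re.IGNORECASE | re.MULTILINE,
)
NAME_SERVER_RE = re.compile(
    r"^[ \t]*Name Server:[ \t]*(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def whois_query(server, query, timeout=10.0):
    """Send one whois query and return the raw text response."""
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as sock:
        # The request is the query text terminated by CRLF.
        # Plain ASCII only; internationalized names would need IDNA encoding.
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def find_referral(response):
    """Return the registrar whois server named in a response, or None."""
    match = REFERRAL_RE.search(response)
    return match.group(1).lower() if match else None


def find_name_servers(response):
    """Return the sorted, de-duplicated name servers listed in a response."""
    return sorted({name.lower() for name in NAME_SERVER_RE.findall(response)})
```

A recursive lookup would call `whois_query` against the registry's server, pass the result to `find_referral`, and query again if a registrar server comes back. Keeping the parsing loose, rather than requiring one parser per registrar, is one way to avoid the kind of "NoParser" failure described above, at the cost of less structured output.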