Monday, April 28, 2008

Tokenizer and Beyond

Wrote the tokenizer last week. Works nicely.

Started the new and improved (i.e., GPU) version of the word count functions this weekend.
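
For reference, here is a rough CPU sketch of what a word count computes, the kind of baseline a GPU version could be checked against. The regex tokenizer and the names `tokenize` and `word_count` are stand-ins made up for illustration, not the actual tokenizer or functions from this project.

```python
import re
from collections import Counter


def tokenize(text):
    """Stand-in tokenizer (an assumption): lowercase runs of letters, digits, apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_count(text):
    """Return a Counter mapping each token to how often it occurs in the text."""
    return Counter(tokenize(text))


if __name__ == "__main__":
    counts = word_count("The cat saw the other cat.")
    for word, n in counts.most_common():
        print(word, n)
```

Each token is counted independently of the others, which is what makes the counting step a natural candidate for running in parallel.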

Figured out a new and better question to ask. Rather than asking about the political slant of an article, which is too hard to ask even humans to label, my tool will try to determine the source of an article (CNN, FOX, etc.). That is roughly the same question, but much easier to annotate.
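
To show why source labels are cheap to get, here is a minimal sketch that reads the label straight off an article's URL. It assumes articles are collected along with their URLs; the domain table and the name `label_from_url` are hypothetical, chosen only for this example.

```python
from urllib.parse import urlparse

# Hypothetical domain-to-label table; extend it with whatever sources are collected.
SOURCE_BY_DOMAIN = {
    "cnn.com": "CNN",
    "foxnews.com": "FOX",
}


def label_from_url(url):
    """Return the source label for an article URL, or None if the domain is unknown."""
    host = urlparse(url).hostname or ""
    # Accept the bare domain or any subdomain of it (e.g. www.cnn.com).
    for domain, label in SOURCE_BY_DOMAIN.items():
        if host == domain or host.endswith("." + domain):
            return label
    return None


if __name__ == "__main__":
    print(label_from_url("http://www.cnn.com/2008/POLITICS/story.html"))
```

No human judgment is involved in this kind of labeling, so the training set can grow as fast as articles can be downloaded.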

Started thinking of how to parallelize the learning algorithm.

1 comment:

Pedro said...

Are you interested in a summer job offer?

It would be 3 months. GPGPU is the primary focus.