Monday, April 28, 2008

Tokenizer and Beyond

Wrote the tokenizer last week. Works nicely.

Started the new and improved (i.e. GPU version) word count functions this weekend.
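
For anyone wondering what "word count" means here, below is a rough sketch of the general idea in plain Python, not the GPU code itself. It assumes the text is split into independent chunks, each chunk is counted on its own, and the partial counts are merged at the end, which is the shape a parallel version would typically take. The names `tokenize`, `count_chunk`, and `word_count`, the regex, and the chunking scheme are all illustrative assumptions.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and pull out runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_chunk(lines):
    """Count tokens in one chunk of lines (the work a single worker would do)."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


def word_count(lines, num_chunks=4):
    """Split lines into chunks, count each chunk independently, then merge."""
    # Strided split: chunk i gets lines i, i + num_chunks, i + 2 * num_chunks, ...
    chunks = [lines[i::num_chunks] for i in range(num_chunks)]
    total = Counter()
    # Each count_chunk call is independent, so this map could run in parallel.
    for partial in map(count_chunk, chunks):
        total += partial
    return total
```

Calling `word_count(["the cat sat", "The dog sat"])` gives a `Counter` with `the` and `sat` at 2 and `cat` and `dog` at 1. The merge step is the only part that needs coordination between workers.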

Figured out a new and better question to ask. Rather than asking about the political leaning of an article, which is too hard even for humans to label, my tool will try to determine the source of an article (CNN, FOX, etc.). That is roughly the same question, but much easier to annotate, since every article already comes with its source.

Started thinking of how to parallelize the learning algorithm.

Tuesday, April 8, 2008

old beginnings, new beginnings

Wrote a word count program on the CPU this weekend. Turned out to be not what the doctor ordered...

Started over last night. Wrote the beginnings of a tokenizer on the GPU. My emulator at home was giving me problems, so I spent a while debugging.

Hoping to finish the tokenizer + get a tagged corpus this week.