MSTU5031/Final
Teachers College • Columbia University
MSTU5031 Navigator
Fall 2008
Thursday, 6:50 PM - 9:10 PM
Location: 234 HM
Antonios Saravanos, Instructor
- The Problem
Write a program that "crawls" a website. Starting at the root page, it should visit every page on the site to gather information. Specifically, it will generate a table that tracks, for each page, the internal and external links out of that page as well as the in-links to it. It will write this data out to a text file.
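One common way to visit every reachable page exactly once is a breadth-first traversal with a "visited" set. The sketch below only illustrates that idea and is not a required design: the class `CrawlOrder`, the method `visitAll`, and the `linksOf` function (a stand-in for "download this page and return its internal links") are hypothetical names that do not appear in the exam.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical helper illustrating traversal order only; no network access. */
class CrawlOrder {

    /**
     * Visits every page reachable from root, breadth first.
     * linksOf is an assumed stand-in for fetching a page and
     * extracting the internal links it contains.
     */
    static Set<String> visitAll(String root, Function<String, List<String>> linksOf) {
        Set<String> visited = new LinkedHashSet<>(); // remembers discovery order
        Deque<String> queue = new ArrayDeque<>();
        visited.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            for (String link : linksOf.apply(page)) {
                // add() returns false for pages already seen, so cycles terminate
                if (visited.add(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }
}
```

The visited set is what keeps the crawl from looping forever when pages link back to each other, which is typical of real sites.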
You are welcome to consult any written sources and to use any portion of the Java API to complete this exam. You must not consult other students or outside programmers for help, and you must not use third-party APIs. If you are not sure what is permissible, ask the instructor.
You can use the following stub to get started:
public class LinkChecker {

    public static void main(String[] args) {
        String site = "http://www.tc.edu";
        String outputFile = "/tmp/tc-links.csv";
        LinkChecker checker = new LinkChecker();
        checker.checkLinks(site, outputFile);
    }

    public LinkChecker() {
    }

    public void checkLinks(String url, String outputFilePath) {
        //go for it
    }
}
The program should generate a file of comma-separated values (CSV) with the following columns: page, internal links, external links, in-links. The program must follow all of the internal links from the initial page, but not any external links.
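Deciding which links to follow requires telling internal links from external ones. The sketch below shows one possible test using only `java.net.URI` from the Java API. It assumes that "internal" means "same host as the site root", which the exam text does not define; under that assumption a subdomain would count as external. `LinkClassifier` and `isInternal` are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper, not part of the exam stub. */
class LinkClassifier {

    /**
     * Resolves href against the page it appeared on, then compares hosts.
     * Assumption: a link is internal when its host equals the host of
     * siteRoot, ignoring case.
     */
    static boolean isInternal(String siteRoot, String pageUrl, String href) {
        try {
            URI root = new URI(siteRoot);
            // resolve() turns relative links such as "detail.asp?Id=1" into absolute ones
            URI target = new URI(pageUrl).resolve(href.trim());
            String host = target.getHost(); // null for mailto: and similar schemes
            return host != null && host.equalsIgnoreCase(root.getHost());
        } catch (URISyntaxException | IllegalArgumentException e) {
            // In this sketch, links that cannot be parsed are not followed
            return false;
        }
    }
}
```

For instance, `isInternal("http://www.tc.edu", "http://www.tc.edu/mst/ccte/", "/mst/")` is true under this definition, while a `mailto:` link is not.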
Example output:
/mst/ccte/,43,3,25
/mst/ccte/detail.asp?Id=2007%2D08+Faculty+%26+Student+Presentations&Info=AERA+%28American+Education+Research+Association%29#1,12,1,3
...
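In-link counts are not known until every page has been crawled, because each page's count depends on the links found on other pages. The following sketch derives them from the collected out-links and formats one row per page in the column order given above. The class `LinkTable`, its methods, and the two input maps are hypothetical. It also assumes that every occurrence of a link counts, including repeats on the same page, and that a page name containing a comma or quote should be quoted; the exam does not specify either point.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of the exam stub. */
class LinkTable {

    /**
     * internal maps each crawled page to the internal pages it links to;
     * external maps a page to its number of external links.
     * Returns rows of the form: page,internal links,external links,in-links
     */
    static List<String> toCsvRows(Map<String, List<String>> internal,
                                  Map<String, Integer> external) {
        // Count how many times each page is the target of an internal link
        Map<String, Integer> inLinks = new HashMap<>();
        for (List<String> targets : internal.values()) {
            for (String target : targets) {
                inLinks.merge(target, 1, Integer::sum);
            }
        }
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : internal.entrySet()) {
            String page = entry.getKey();
            rows.add(quote(page) + "," + entry.getValue().size() + ","
                    + external.getOrDefault(page, 0) + ","
                    + inLinks.getOrDefault(page, 0));
        }
        return rows;
    }

    /** Wraps a field in double quotes when it contains a comma or a quote. */
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }
}
```

The returned rows could then be written to the output file one per line, for example with a `java.io.PrintWriter`.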
If you have any questions or problems, please contact the instructor right away.
