Final Project

GOAL:

The goal is to view large geometric data over a thinwire. Specifically, we want to (I) process, (II) simplify and (III) display the TIGER Map data for 16 counties in the Metropolitan NYC area. Call these Tasks I, II and III, respectively. It is important to understand that we view the abstract Map as an overlay of three layers: Polygonal Subdivision (Sigma), Transportation Network (Gamma) and Point Landmarks (Lambda).

OVERVIEW OF TASKS

Here is a brief description of the main challenges in each task:

-- Task I requires Postgresql and JDBC interfaces to read and join TIGER data files. Some issues here are (a) converting the fixed-field format of TIGER files into the more flexible (array type) fields of Postgresql, (b) merging data from more than one county.

-- Task II requires understanding the data to create effective simplifications. You will need to figure out the nesting structure of the polygons, and determine effective rules for simplification.

-- Task III calls for a Client program that uses JDBC to access the database. It requires an effective use of (a) image buffering, (b) hashing to avoid redundant bandwidth usage, (c) design for responsiveness. It is important to note that we put all the "smartness" into your client program as there is no Server Program except for the Postgresql server.
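
Point (b) of Task III can be pictured as a small client-side cache keyed by Tlid: before asking the database for line details, the client looks in a hash map and requests only the IDs it has not seen yet. The sketch below is one possible shape, not a required design. The class LineCache and the Fetcher interface are hypothetical names; in a real client the fetcher would wrap a JDBC query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical client-side cache: maps a Tlid to the detail already downloaded.
class LineCache {
    // Stand-in for the JDBC call that downloads details for the given Tlids.
    interface Fetcher {
        Map<Integer, int[]> fetch(List<Integer> tlids);
    }

    private final Map<Integer, int[]> cache = new HashMap<>();
    private final Fetcher fetcher;
    private int fetchedCount = 0; // number of lines actually requested from the server

    LineCache(Fetcher fetcher) {
        this.fetcher = fetcher;
    }

    // Returns details for all requested IDs, downloading only the missing ones.
    Map<Integer, int[]> get(List<Integer> tlids) {
        Set<Integer> missing = new LinkedHashSet<>();
        for (Integer id : tlids) {
            if (!cache.containsKey(id)) missing.add(id);
        }
        if (!missing.isEmpty()) {
            cache.putAll(fetcher.fetch(new ArrayList<>(missing)));
            fetchedCount += missing.size();
        }
        Map<Integer, int[]> result = new HashMap<>();
        for (Integer id : tlids) result.put(id, cache.get(id));
        return result;
    }

    int fetchedCount() {
        return fetchedCount;
    }

    public static void main(String[] args) {
        // Fake fetcher standing in for the database: returns an empty detail per ID.
        LineCache cache = new LineCache(ids -> {
            Map<Integer, int[]> m = new HashMap<>();
            for (Integer id : ids) m.put(id, new int[0]);
            return m;
        });
        cache.get(Arrays.asList(1, 2, 3));
        cache.get(Arrays.asList(2, 3, 4)); // only Tlid 4 is new here
        System.out.println("lines downloaded: " + cache.fetchedCount());
    }
}
```

The same idea applies to polygons keyed by PolyId. A real client would also need an eviction policy once the cache grows large, which this sketch leaves out.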


INTERFACES BETWEEN TASKS

Since the three tasks are sequential, it is very important to adhere to the following Interface Specification: Basically, the interface comprises tables (relations) that will be stored in the Postgresql Database.

PLEASE NOTE THAT THE INTERFACE SPECS ARE SUBJECT TO CHANGE. IF YOU WANT SOMETHING CHANGED HERE, PLEASE LET ME KNOW.

I-II INTERFACE

This comprises three tables, Tlines (storing tiger line information), TPolys (for polygon information), Landmarks (for point landmarks). This is the first table:
	Tlines (Tlid	INT,
		StartPt	POINT,
		EndPt	POINT,
		Detail	PATH,		-- type "INT[]" is also OK
		BBox	BOX,		-- bounding box	
		CFCC	CHAR(3),
		Name	CHAR(30),
		Side	CHAR(1),
		LPolyId CHAR(15),
		RPolyId CHAR(15))
	
The CFCC, Name and Side information are all found in RT1. Side is either 1 or 2, and identifies boundary lines. [NOTE: In RT1, the 2-sided Tlines are given a NULL value, and 1-sided Tlines are given the value 1.] Basically, CFCC and Name provide the "merging information" for tiger lines. The Detail field includes (redundantly) the StartPt and EndPt, and represents a PATH. However, as noted, you can also represent it by an array of INT's. This may be more efficient: we do not need the PATH datatype to support fast range queries, since that function is already provided by the BBox field.
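
As a rough illustration of the INT[] alternative for Detail, the sketch below packs a polyline into a flat integer array and computes its bounding box. It assumes coordinates are kept as integers in millionths of a degree (TIGER coordinates carry six implied decimal places) and that the array interleaves longitude and latitude. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper for a Tline whose Detail is stored as INT[].
class TlineDetail {
    // Packs {lon, lat} pairs in degrees into [lon0, lat0, lon1, lat1, ...] in micro-degrees.
    static int[] pack(double[][] lonLat) {
        int[] out = new int[2 * lonLat.length];
        for (int i = 0; i < lonLat.length; i++) {
            out[2 * i] = (int) Math.round(lonLat[i][0] * 1e6);
            out[2 * i + 1] = (int) Math.round(lonLat[i][1] * 1e6);
        }
        return out;
    }

    // Bounding box as {minLon, minLat, maxLon, maxLat}, in the same integer units.
    static int[] bbox(int[] detail) {
        if (detail.length < 2 || detail.length % 2 != 0) {
            throw new IllegalArgumentException("detail must hold whole lon/lat pairs");
        }
        int[] box = { detail[0], detail[1], detail[0], detail[1] };
        for (int i = 2; i < detail.length; i += 2) {
            box[0] = Math.min(box[0], detail[i]);
            box[1] = Math.min(box[1], detail[i + 1]);
            box[2] = Math.max(box[2], detail[i]);
            box[3] = Math.max(box[3], detail[i + 1]);
        }
        return box;
    }

    public static void main(String[] args) {
        int[] detail = pack(new double[][] {
            { -73.985, 40.758 }, { -73.980, 40.760 }, { -73.990, 40.755 }
        });
        System.out.println(Arrays.toString(bbox(detail)));
    }
}
```

The four numbers returned by bbox are what would go into the BBox column (after converting back to degrees if BBox is stored as a Postgresql BOX).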

The LPolyId and RPolyId are the polygons to the left and right of the Tline. Note that these are given by a character string of length 15. The original Polygon id's are not unique across counties: we make them unique by concatenating the Census File ID (CenId) information. The original PolyId (before concatenation) and the CenId information are found in Record Type A files. These two fields are 10 and 5 characters, respectively. They are essentially integers (except that CenId is always the letter "C" followed by 4 digits). So we could combine them and view the result as a 14 digit integer. However, for the sake of convenience for Group III, we will simply view it as a 15 character string. [Compared to integers, this is less efficient in comparisons, and wastes space. Perhaps storing it as a 14 digit DOUBLE is best.]
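
A minimal sketch of the key construction follows. The text above does not fix the order of concatenation, so this sketch assumes the 5-character CenId comes first, followed by the PolyId zero-padded to 10 digits. The class PolyKey and its methods are hypothetical, and the numeric variant uses a Java long rather than a DOUBLE.

```java
// Hypothetical helper: builds a polygon key that is unique across counties.
class PolyKey {
    // Assumed layout: CenId ("C" + 4 digits) followed by the zero-padded 10-digit PolyId.
    static String toKey(String cenId, long polyId) {
        if (cenId == null || !cenId.matches("C\\d{4}")) {
            throw new IllegalArgumentException("CenId must be C followed by 4 digits: " + cenId);
        }
        if (polyId < 0 || polyId > 9999999999L) {
            throw new IllegalArgumentException("PolyId must fit in 10 digits: " + polyId);
        }
        return cenId + String.format("%010d", polyId);
    }

    // Numeric alternative: drop the leading letter and read the 14 digits as a long.
    static long toLong(String key) {
        if (key == null || key.length() != 15) {
            throw new IllegalArgumentException("key must have 15 characters");
        }
        return Long.parseLong(key.substring(1));
    }

    public static void main(String[] args) {
        String key = toKey("C1234", 567L); // sample values, not taken from real data
        System.out.println(key + " -> " + toLong(key));
    }
}
```

A 14 digit value fits comfortably in a 64-bit integer, so the numeric form loses no information as long as the leading letter is always "C".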

Here is the second table:

	TPolys (PolyId	CHAR(15),		-- combined with Census ID
		BoundaryList	ARRAY[int],	-- unordered list of Tlines
		BBox	BOX,			-- bounding box for Polygon
		Place90	INT,			-- FIPS 55 Code.
		Watermark CHAR(1),
		Name	CHAR(60))		-- using "TEXT" type is also OK
	
The TPolys table has an entry for each original polygon in the Tiger data. BoundaryList is a list of all the Tlid's that bound the current polygon. This list is in arbitrary order -- Task II will sort this out later. Note that we have a BBox to facilitate range queries on TPolys. Place90 is the information from RTA and helps identify the type of this polygon -- it is the same as the Fips field in RTC. Name is a field from RTC, and again helps in merging and coloring area landmarks (color pink). To get this information, you can join RTA and RTC:
	=> SELECT Place90, Name FROM RTA, RTC
	->    WHERE RTA.Place90 = RTC.Fips and RTC.Fips>0;
	
Note that the "Fips>0" clause assumes that when the Fips field is a NULL in RTC, we replace it by -1. Watermark will identify this polygon as water or not: this information can be found in RTS.
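
The BoundaryList column can be assembled in one pass over the Tlines rows by grouping each Tlid under its left and right polygon. The sketch below does this in memory. The class BoundaryLists and the reduced Tline row type are hypothetical, and two choices are assumptions rather than part of the spec: a NULL polygon id is skipped, and a line with the same polygon on both sides is recorded once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory grouping of Tlids by the polygons they bound.
class BoundaryLists {
    // Reduced stand-in for a Tlines row: only the fields needed here.
    static class Tline {
        final int tlid;
        final String lPolyId;
        final String rPolyId;

        Tline(int tlid, String lPolyId, String rPolyId) {
            this.tlid = tlid;
            this.lPolyId = lPolyId;
            this.rPolyId = rPolyId;
        }
    }

    // Maps each PolyId to the Tlids of the lines that have it on the left or right.
    static Map<String, List<Integer>> build(List<Tline> lines) {
        Map<String, List<Integer>> result = new HashMap<>();
        for (Tline t : lines) {
            add(result, t.lPolyId, t.tlid);
            // Assumption: a line with the same polygon on both sides is listed once.
            if (t.rPolyId != null && !t.rPolyId.equals(t.lPolyId)) {
                add(result, t.rPolyId, t.tlid);
            }
        }
        return result;
    }

    private static void add(Map<String, List<Integer>> m, String polyId, int tlid) {
        if (polyId == null) return; // assumption: NULL means no polygon on this side
        List<Integer> list = m.get(polyId);
        if (list == null) {
            list = new ArrayList<>();
            m.put(polyId, list);
        }
        list.add(tlid);
    }

    public static void main(String[] args) {
        List<Tline> rows = Arrays.asList(
            new Tline(1, "A", "B"), new Tline(2, "A", null), new Tline(3, "B", "B"));
        System.out.println(build(rows));
    }
}
```

The lists come out in the order the rows were read, which matches the "arbitrary order" allowed for BoundaryList.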

Here is the third table:

	Landmarks (LandId  INT,		-- landmark identifier number
		CFCC	   CHAR(3),	-- CFCC code
		Name	   CHAR(30),	-- name
		Position   POINT	-- lon/lat
		)
	
This Landmarks table is derived from Record Type 7. However, we should make sure that only point landmarks are kept in this list (a record is a point landmark when its Lon/Lat fields in the RT7 file are filled in; records whose Lon/Lat values are NULL are not point landmarks).
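
The filtering step can be sketched as a fixed-width parser that keeps a record only when its coordinate fields are filled in, on the assumption that a point landmark is exactly a record that carries its own longitude and latitude. The column offsets below are placeholders and must be checked against the TIGER record layout; the class LandmarkParser is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical fixed-width reader for landmark records.
class LandmarkParser {
    // Placeholder column ranges (0-based, end exclusive); verify against the TIGER documentation.
    static final int CFCC_START = 21, CFCC_END = 24;
    static final int NAME_START = 24, NAME_END = 54;
    static final int LON_START = 54, LON_END = 64;
    static final int LAT_START = 64, LAT_END = 73;

    // Extracts one trimmed field, or "" when the record is too short.
    static String field(String record, int start, int end) {
        if (record.length() < end) return "";
        return record.substring(start, end).trim();
    }

    // Returns {lon, lat} in degrees, or null when the record has no coordinates.
    static double[] position(String record) {
        String lon = field(record, LON_START, LON_END);
        String lat = field(record, LAT_START, LAT_END);
        if (lon.isEmpty() || lat.isEmpty()) return null;
        // Coordinates are signed integers with six implied decimal places.
        return new double[] { Long.parseLong(lon) / 1e6, Long.parseLong(lat) / 1e6 };
    }

    // Reads records from standard input and prints name, lon, lat for point landmarks.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            double[] pos = position(line);
            if (pos == null) continue; // no coordinates: not a point landmark
            System.out.println(field(line, NAME_START, NAME_END) + "\t" + pos[0] + "\t" + pos[1]);
        }
    }
}
```

A malformed coordinate field makes Long.parseLong throw a NumberFormatException; a loader would probably want to log and skip such records instead.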

II-III INTERFACE:

This comprises 6 tables, TLineX and TPolyX where X = 0, 1 or 2. There is a 7th table for landmarks, but for now, we just pass it straight through from Task I. The indicator X tells us the level of detail (LOD), with X=0 being the highest level of detail. The basic idea (following Ken Been) is to first merge polygons from TPolys as long as their "type information" agrees. This gives us TPoly0. TLine0 is similarly obtained from Tlines, by merging lines as long as their "type information" agrees. But now there is an additional requirement: the left and right MERGED polygons of the two lines must also be the same in order for the lines to be mergeable.
	TPolyX (PolyId	INT,
		OuterLoop ARRAY[INT],
		BBox	BOX,		-- bounding box
		Parent	INT,
		Color	INT,		-- white, blue, green, yellow, pink
		Name	TEXT
		)
	
	TLineX (Tlid	INT,
		StartPt	POINT,
		EndPt	POINT,
		Detail	PATH,		-- could be ARRAY[INT] if you like
		BBox	BOX,		-- bounding box
		CFCC	CHAR(3),
		Border	CHAR(1),
		Name	TEXT,
		Usage	CHAR(1),
		LPolyId INT,
		RPolyId INT
		)
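
One way to keep track of merged polygons (or lines) is a union-find structure in which the representative of each group is its smallest ID, so that a merged object automatically carries the smaller of the original ID's. The sketch below assumes the objects have already been given integer ID's; deciding which pairs are mergeable (matching type information, matching merged neighbors) is left to the caller. The class MergeSets is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical union-find in which each group is represented by its smallest ID.
class MergeSets {
    private final Map<Integer, Integer> parent = new HashMap<>();

    // Returns the representative (smallest ID) of the group containing id.
    int find(int id) {
        if (!parent.containsKey(id)) parent.put(id, id);
        int root = id;
        while (parent.get(root) != root) root = parent.get(root);
        // Path compression: point every visited ID directly at the root.
        while (parent.get(id) != root) {
            int next = parent.get(id);
            parent.put(id, root);
            id = next;
        }
        return root;
    }

    // Merges the groups of a and b; the smaller representative survives.
    int union(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra == rb) return ra;
        int keep = Math.min(ra, rb);
        int drop = Math.max(ra, rb);
        parent.put(drop, keep);
        return keep;
    }

    public static void main(String[] args) {
        MergeSets polys = new MergeSets();
        polys.union(7, 3); // sample ID's, not taken from real data
        polys.union(9, 7);
        System.out.println("merged ID of 9: " + polys.find(9));
    }
}
```

After all merges, find(p) applied to the left and right polygon of each line gives the merged polygon ID's needed for the line-merging test.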
	
NOTES:
  1. When two polygons merge, their union takes on the smaller of the two original ID's. Similarly for the merger of two lines.
  2. Level 1 (TPoly1 and TLine1) is obtained from Level 0 (TLine0 and TPoly0) by simplification of details. We recommend Douglas-Peucker for this. IMPORTANT: simplification is controlled by a single parameter "s1" which is the maximum error in meters. You need to convert Lon/Lat coordinates into meters for this purpose. This value is empirically chosen and must be stored somewhere for Task III.
  3. The simplification from Level 1 to Level 2 is similarly obtained and controlled by the parameter "s2". Here, we choose "s2" to be large enough so that the Level 2 data set has a manageable size (can be downloaded at the start of the visualization).
  4. The Parent field in TPolyX points to the polygon that contains it. This field could be null. We also have the color of polygons explicitly coded in.
  5. In TLineX: each line has a Usage field coded as follows: 1 = used in transportation network only, 2 = used in subdivision only, 3 = used in both transportation and subdivision.
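
The simplification in notes 2 and 3 can be sketched as follows. The notes do not say how to convert Lon/Lat into meters, so this sketch assumes a simple equirectangular approximation around a reference latitude, which should be adequate over a region the size of the NYC area. It measures error as the distance from a point to the chord segment. The class Simplifier, its method names and the sample tolerance are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Douglas-Peucker simplifier working on points in meters.
class Simplifier {
    static final double EARTH_RADIUS_M = 6371000.0;

    // Projects lon/lat in degrees to local meters (equirectangular, around lat0).
    static double[] toMeters(double lon, double lat, double lat0) {
        double x = Math.toRadians(lon) * Math.cos(Math.toRadians(lat0)) * EARTH_RADIUS_M;
        double y = Math.toRadians(lat) * EARTH_RADIUS_M;
        return new double[] { x, y };
    }

    // Distance from point p to the segment ab.
    static double distToSegment(double[] p, double[] a, double[] b) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double len2 = dx * dx + dy * dy;
        double t = len2 == 0 ? 0 : ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2;
        t = Math.max(0, Math.min(1, t));
        return Math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy));
    }

    // Keeps the endpoints plus every point needed to stay within tol of the original.
    static List<double[]> simplify(List<double[]> pts, double tol) {
        int n = pts.size();
        if (n < 3) return new ArrayList<>(pts);
        boolean[] keep = new boolean[n];
        keep[0] = true;
        keep[n - 1] = true;
        mark(pts, 0, n - 1, tol, keep);
        List<double[]> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (keep[i]) out.add(pts.get(i));
        }
        return out;
    }

    // Recursive step: keep the farthest point if it is more than tol from the chord.
    private static void mark(List<double[]> pts, int lo, int hi, double tol, boolean[] keep) {
        if (hi <= lo + 1) return;
        int far = -1;
        double max = tol;
        for (int i = lo + 1; i < hi; i++) {
            double d = distToSegment(pts.get(i), pts.get(lo), pts.get(hi));
            if (d > max) {
                max = d;
                far = i;
            }
        }
        if (far < 0) return;
        keep[far] = true;
        mark(pts, lo, far, tol, keep);
        mark(pts, far, hi, tol, keep);
    }

    public static void main(String[] args) {
        double s1 = 10.0;   // hypothetical Level 1 tolerance in meters
        double lat0 = 40.7; // reference latitude, roughly NYC
        double[][] lonLat = {
            { -74.0000, 40.70000 }, { -73.9990, 40.70001 },
            { -73.9980, 40.70000 }, { -73.9970, 40.70100 }
        };
        List<double[]> pts = new ArrayList<>();
        for (double[] p : lonLat) pts.add(toMeters(p[0], p[1], lat0));
        System.out.println(simplify(pts, s1).size() + " of " + pts.size() + " points kept");
    }
}
```

Because the endpoints of every line are always kept, lines that share an endpoint still meet after simplification. Running the same routine with the larger tolerance "s2" on the Level 1 data gives Level 2.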


IMPLEMENTATION NOTES AND HINTS:


  1. -- Please re-read Lecture VI for information about TIGER files. We also suggest downloading and printing Chapter 6 of the TIGER Document -- this contains a complete description of all the TIGER file formats.
    -- The 16 counties we are interested in are listed below. First, we suggest working with the following 8 counties:
    	
    	NY:
    	36005	Bronx
    	36047	Kings (Brooklyn)
    	36061	New York (Manhattan)
    	36081	Queens
    	36085	Richmond (Staten Island)
    
    	NJ:
    	34003   Bergen (Northern NJ)
    	34017	Hudson (main connection from Manhattan to NJ)
    	34039	Union (connected to Staten Island)
    	
    The remaining 8 counties are
    	36059   Nassau  (on Long Island)
    	36103   Suffolk (on Long Island)
    	36087   Rockland (Upstate NY)
    	36071   Orange (Upstate NY)
    	36119   Westchester (Upstate NY)
    	34013   Essex (NJ, west of Hudson)
    	34023   Middlesex (Southern NJ)
    	34025   Monmouth (Southern NJ)
    	

    -- For Group II, we suggest that you initially work with Prince William County, Manassas and Manassas Park in Virginia. This is because these are smaller data sets but most of the topological problems in Tiger files show up here. For your convenience, these counties are unpacked in /usr/unsupported/packages/visual/tgr/rt/:
    		Prince William	51153
    		Manassas	51683
    		Manassas Park	51685
    		

    -- For Task III, you can begin to test your programs without waiting for Group II. We suggest that you use the "visual" server and the database called "us300".
    -- Speeding up your processing on Postgres:
    		1) Use indexes.  E.g., for Tlines, you probably
    		   should create an index on Tlid.
    		2) To process the boundary lines for each
    		   group of counties, you probably want to keep them
    		   in a separate temporary table at first -- the
    		   number of these boundary lines is MUCH smaller than
    		   the total number of lines.
    		


INSTRUCTIONS FOR SUBMITTING THE FINAL PROJECT:

  1. The deadline for the project is Tuesday May 13. Since many of you are seniors, you will want to have the grades submitted in a timely fashion for graduation.
  2. Send me an email with the SUBJECT "project" and with a tar file called ProjX.tar (X is I, II or III) as attachment. Your tar file should contain
    (1) ALL java programs that are needed to run your code,
    (2) A 2-Page Report,
    (3) ALL data files (except for raw TIGER files) and
    (4) a Makefile with various targets to compile and to run your programs.
  3. Be sure to cc your email to EACH member of the team. This helps me when I send my response to the whole team.
  4. DETAILS for Makefile: I want to be able to type "make" or "make XXX" to immediately test your code. Here is the minimal set of targets I would like from each group:
    	GROUP I Targets: 
    	   > make 		-- compile all the programs
    	   > make tlines	-- create the Tlines table from scratch
    	   > make tpolys	-- create the TPolys table from scratch
    	   > make landmarks	-- create the Landmarks table from scratch
    	GROUP II Targets:
    	   > make 		-- compile all the programs
    	   > make N		-- where N = 0, 1 or 2.
    	   			   It should create the Level N tables
    				   from scratch
    	GROUP III Targets:
    	   > make 		-- compile all the programs
    	   > make md		-- start the MapDisplay program
    	
    You can insert other targets as you find useful. Put verbal instructions as comments in the Makefile. Remember that each target can have several equivalent names (e.g., "make one" could be equivalent to "make TigerLoader").
  5. DETAILS for Report: This should be a text file called REPORT.txt. It should have a title and the names of the team members (and other info like phone, emails, etc). There should be four sections: OVERVIEW, DETAILS, CODE and CONCLUSION. OVERVIEW describes the goals, the relation of your work to the other groups, assumptions you make, etc. DETAILS should be explicit, and may address technical difficulties you encountered and their solutions, etc. CODE is a brief overview of each java file, each class (important methods or variables in the class), and dependencies among the files and classes. CONCLUSION can include what you have learned in this work, ideas for extension of your work, etc.
  6. Please be sure that all your programs are minimally commented, and can compile properly before you submit.