
14 October 2023

Lab 6 Topic 3 Scale Effect and Spatial Data Aggregation

Part 1b Scale Effects on Vector and Raster Data

This week’s lab examined the effect of scale and resolution on vector and raster data. Another part of the lab analyzed boundaries with the Modifiable Areal Unit Problem (MAUP), which involved looking at gerrymandering in U.S. Congressional Districts.

For the vector data, the scales were 1:1200, 1:24,000, and 1:100,000. Because maps come in different scales, greater emphasis should be placed on preserving spatial accuracy as much as possible. Understanding the effect of scale on vector data differs from observing the effect of resolution on raster data.

In the first part of the lab, we used the Clip tool on our hydrography datasets with the county boundary as the clip feature. After clipping all the data to the county, we added fields and calculated geometry to get the length, area, and total feature count at each scale.
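As a rough sketch of that workflow in arcpy (the geodatabase path, layer names, and field name here are placeholders, not the lab's actual data):

```python
import arcpy

# Hypothetical workspace and layer names; the lab's actual datasets differ.
arcpy.env.workspace = r"C:\GIS\Lab6\Lab6.gdb"

for scale_fc in ["hydro_1200", "hydro_24000", "hydro_100000"]:
    clipped = scale_fc + "_county"
    # Clip each hydrography dataset to the county boundary
    arcpy.analysis.Clip(scale_fc, "county_boundary", clipped)
    # Add a length field and calculate geodesic length in kilometers
    arcpy.management.CalculateGeometryAttributes(
        clipped, [["Length_km", "LENGTH_GEODESIC"]], length_unit="KILOMETERS"
    )
```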

As resolution decreases, accuracy and detail diminish. Scale expresses the amount of detail for vector data; the hydrographic features here are vector polylines. Because the large-scale map shows more detail and the small-scale map shows less, comparing the two reveals how scale affects these hydrography data.

Map Scale 1:1500 (scale and resolution effects)



Map Scale 1:20,000

Part 2b Gerrymandering

The Merriam-Webster Dictionary defines gerrymandering as “dividing or arranging a territorial unit into election districts in a way that gives one political party an unfair advantage in elections.” The practice was named in the early 1800s, though it was known before then. The Modifiable Areal Unit Problem (MAUP) is an issue of boundaries and scale in spatial analysis; it highlights how the way areas are delineated can create bias within voting areas such as congressional districts.

In this final part of the lab, the feature class covered the continental U.S. I used the Dissolve tool to amalgamate the districts, which let me find the number of polygons each Congressional District (CD) comprised. The picture below shows CD 01, whose compactness score from the Polsby-Popper test was the lowest of all the districts we examined in this lab, making it the "worst offender" for bizarrely shaped legislative districts.

Congressional District 01
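The Polsby-Popper score itself is simple: 4π times the area divided by the squared perimeter, so 1.0 is a perfect circle and values near 0 indicate contorted shapes. A minimal illustration in Python (the sample shapes are hypothetical, not actual district geometries):

```python
import math

def polsby_popper(area, perimeter):
    """Polsby-Popper compactness: 4 * pi * area / perimeter ** 2."""
    return 4 * math.pi * area / perimeter ** 2

# A unit square is fairly compact; a 10 x 0.1 rectangle is not.
print(polsby_popper(1.0, 4.0))    # ~0.785
print(polsby_popper(1.0, 20.2))   # ~0.031
```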



04 October 2023

Lab 5 M2.2 Surface Interpolation

This week’s lab focused on surface interpolation, using water quality in Tampa Bay as the case study. It is always interesting to learn the different ways of studying data and interpreting results. We worked with several interpolation methods: Thiessen polygons, Inverse Distance Weighted (IDW), and Spline (Regularized and Tension). The study data (BOD_MGL) measured Biochemical Oxygen Demand (BOD) in milligrams per liter (mg/L) at sample points across Tampa Bay. We needed to determine areas of low and high water quality based on the results of the different interpolation techniques.

The techniques we used gave somewhat similar results. The Thiessen surface reproduced the same summary statistics as the non-spatial data. The IDW results were very similar to Thiessen's, differing mainly in standard deviation. Spline was the technique that varied most from the others. Interpolation offers a way to study the spatial distribution of phenomena from a limited set of sample points; these are a few of the available options.


Thiessen: This technique divides the study area into proximal zones (polygons), each defined around a single input point; every location within a polygon is closer to that point than to any other point. Thiessen polygons are also called Voronoi polygons or Voronoi diagrams.


Inverse Distance Weighted (IDW): As the name suggests, this technique weights sample points by inverse distance, placing the most emphasis on the nearest ones. A sampled location's influence on the mapped variable decreases as distance from it increases.


Spline: This technique has two types, Regularized and Tension. Regularized produces a smoothly changing surface whose values may fall outside the range of the sample data. Tension produces a less smooth surface whose values adhere more closely to the sample data range. For both, the number of points and the weight can be adjusted when running the tool.
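To make the IDW idea concrete, here is a minimal pure-Python sketch of the weighting scheme (simplified: no search radius or other options the ArcGIS tool exposes, and the sample values are made up):

```python
import math

def idw(points, query, power=2):
    """IDW estimate at a query location from (x, y, value) samples."""
    num, den = 0.0, 0.0
    for x, y, value in points:
        d = math.hypot(x - query[0], y - query[1])
        if d == 0:
            return value          # query sits exactly on a sample point
        w = 1.0 / d ** power      # nearer samples get larger weights
        num += w * value
        den += w
    return num / den

samples = [(0, 0, 2.0), (10, 0, 6.0), (0, 10, 4.0)]
print(idw(samples, (2, 2)))       # ~2.6, dominated by the nearby 2.0 sample
```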

Thiessen Polygons



Spline Regularized


Spline Tension


Inverse Distance Weighted

Comparing these interpolation techniques shows that they are similar but offer different levels of insight into the study area.

20 September 2023

Lab 4 M2.1 Surfaces - TINs and DEMs

This week’s lab delved into TINs (Triangulated Irregular Networks) and DEMs (Digital Elevation Models); we created, edited, and analyzed each of these. ESRI's help documentation, which is very informative, distinguishes between the two: a TIN is vector-based and preserves the precision of the input data, while a DEM is a raster-based representation of a continuous surface. In that sense, the TIN is more accurate than a DEM, and this is visible in the results, where the triangles are formed directly from the points that represent the terrain surface, preserving the data. Dr. Zandbergen's lecture is also clear on the distinction: “[For a DEM] there is nothing special about its data structure or values. So you have to know that the cell values represent elevation to be sure that it is indeed a DEM. This is different from a TIN, where the data model itself defines a 3D surface.”

In ArcGIS, we worked in a Local Scene to better represent the data in 3D. Once again there were several tools and sequential steps to follow to get the desired output for the deliverables. For both the TIN and the DEM, the vertical exaggeration was set to 2.00 and the contour interval to 100 m. What makes the TIN more accurate is that its measured elevation points can be irregularly spaced, concentrating detail where the terrain demands it. One of the biggest differences is that the DEM's contour lines appear more widely spaced in some areas and closer together where the terrain is very steep. Even though the DEM closely resembles the TIN, it is apparent that its accuracy is not as complete, even with the interval set at 100 m for each. Still, the DEM remains a close representation of the elevation surface.
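A hedged arcpy sketch of that surface-building sequence (layer names and paths are placeholders; the actual lab data and parameters may differ):

```python
import arcpy
arcpy.CheckOutExtension("3D")  # 3D Analyst license

arcpy.env.workspace = r"C:\GIS\Lab4\Lab4.gdb"  # hypothetical workspace

# Build a TIN from 3D elevation points used as mass points
arcpy.ddd.CreateTin(
    out_tin=r"C:\GIS\Lab4\elevation_tin",
    in_features=[["elevation_points", "Shape.Z", "Mass_Points", "<None>"]],
)

# Derive 100 m contours from each surface for comparison
arcpy.ddd.SurfaceContour(r"C:\GIS\Lab4\elevation_tin", "tin_contours", 100)
arcpy.ddd.Contour("elevation_dem", "dem_contours", 100)
```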

Comparable views of nearly the same location
I have not worked with TINs very much, so it is interesting to see the differences between a TIN and a DEM. They are close, and both serve a purpose: each represents terrain effectively, but for better accuracy go with the TIN. The image below shows a TIN with the slope, aspect, and edges renderers added; once these are set, you can click on any triangle to read its values.
I like this view of a TIN.




13 September 2023

Lab 3 M1.3 Data Quality - Assessment

 

Initial map showing visual difference

This week’s lab assignment was about Data Quality - Assessment, specifically road network quality and completeness. The data comprise TIGER 2000 roads and Street Centerline data for Jackson County, OR. The task is to determine the length of roads in each dataset in kilometers and decide which dataset is more complete.


When comparing datasets, the first step is to make sure both are in the same coordinate system to ensure consistency. For each feature class, the attribute table showed which field held the measured length, so summarizing that length field was next. I then converted the results from international feet into kilometers. This was interesting because international feet are less familiar to me than US survey feet, which appear frequently in our lab datasets. The two units are very similar (an international foot is exactly 0.3048 m, while a US survey foot is 1200/3937 m, about 0.30480061 m), but the small difference matters for precision over large totals.
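The conversion itself is a one-line calculation; a quick sketch showing both foot definitions (the 36,000,000 ft total is an arbitrary example, not a lab figure):

```python
INTL_FT_TO_M = 0.3048          # international foot: exactly 0.3048 m
SURVEY_FT_TO_M = 1200 / 3937   # US survey foot: ~0.30480061 m

def intl_feet_to_km(length_ft):
    """Convert a length in international feet to kilometers."""
    return length_ft * INTL_FT_TO_M / 1000.0

total_ft = 36_000_000
print(intl_feet_to_km(total_ft))           # 10972.8 km
print(total_ft * SURVEY_FT_TO_M / 1000.0)  # ~10972.82 km, about 22 m longer
```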

After getting these results, the next step was to measure the total length of both road features within each grid cell. I used the Summarize Within geoprocessing tool for this: the grid was the input polygons, and each road feature class was the input summary features. For the tool's summary fields, I used the length fields to total the road length in each output cell. After spatially joining the two road summaries, I added a field to calculate the percentage difference of roads within each cell. Overall, the TIGER Road data are the more complete dataset compared to the Street Centerlines.
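In arcpy, that step might look like the sketch below (grid, road, and field names are placeholders from my description, not the exact lab layers):

```python
import arcpy

arcpy.env.workspace = r"C:\GIS\Lab3\Lab3.gdb"  # hypothetical workspace

# Summarize total road length per grid cell, once per road dataset
for roads, out_fc in [("TIGER_Roads", "Grid_TIGER_Sum"),
                      ("Street_Centerlines", "Grid_Centerline_Sum")]:
    arcpy.analysis.SummarizeWithin(
        in_polygons="Grid",
        in_sum_features=roads,
        out_feature_class=out_fc,
        keep_all_polygons="KEEP_ALL",
        sum_fields=[["Length_km", "Sum"]],
    )
```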

Total grids in which TIGER data are more complete: 165

Total grids in which Centerlines are more complete: 132

Road Length total of TIGER Roads: 11,253.5 km

Road Length total of Centerlines Roads: 10,671.2 km

The formula for calculating the results is:

% π‘‘π‘–π‘“π‘“π‘’π‘Ÿπ‘’π‘›π‘π‘’ = (π‘‘π‘œπ‘‘π‘Žπ‘™ π‘™π‘’π‘›π‘”π‘‘β„Ž π‘œπ‘“ π‘π‘’π‘›π‘‘π‘’π‘Ÿπ‘™π‘–π‘›π‘’π‘  − π‘‘π‘œπ‘‘π‘Žπ‘™ π‘™π‘’π‘›π‘”π‘‘β„Ž π‘œπ‘“ 𝑇𝐼𝐺𝐸𝑅 π‘…π‘œπ‘Žπ‘‘π‘ )/(π‘‘π‘œπ‘‘π‘Žπ‘™ π‘™π‘’π‘›π‘”π‘‘β„Ž π‘œπ‘“ π‘π‘’π‘›π‘‘π‘’π‘Ÿπ‘™π‘–π‘›π‘’π‘ ) × 100%

The Street Centerline data “is intended for use as a reference map layer for GIS maps and applications and represents all known roads and trails within Jackson County, Oregon. The accuracy of this data is +/- 10 feet for state highways and in the urbanized portions of the county and +/- 30 feet in the remote rural areas.” (Street Centerlines Metadata) 

The TIGER data used are from 2000, with a tested horizontal positional accuracy of 5.975-6.531 meters. Seo reminds us that “TIGER/Line data are produced by the US Census Bureau for 1:100,000 scale maps and contain various topographic features and administrative boundaries and their nominal accuracy is about 50 m (US Census Bureau 2000).” The version we used in this lab was remapped with a survey-grade GPS unit, which took the same data and mapped it to greater accuracy.

Seo, S., & O'Hara, C. G. (2009). Quality assessment of linear data. International Journal of Geographical Information Science, 23(12), 1503-1525. First published 22 September 2008 (iFirst).

Final map showing percentage difference between roads




06 September 2023

Lab 2 M1.2 Data Quality - Standards

Street Map USA points

Albuquerque street points

Reference points

This week’s lab is about data quality standards, specifically road network quality and horizontal positional accuracy. To determine road network quality, we used the National Standard for Spatial Data Accuracy (NSSDA) procedures. The data were street networks for Albuquerque: we compared Street Map USA data (from ESRI's TeleAtlas) and the City of Albuquerque's own street data against orthophotos of a study area. The orthophotos provide the reference points against which the two test datasets are measured. Street intersections are the easiest features to identify consistently, so they make good comparison points.

We were to choose at least 20 points, so I selected 30 intersections, digitizing test points for each street dataset and 30 matching reference points. I created these by first creating a point feature class in the geodatabase, then, in edit mode, placing points at intersections that all three datasets had in common. Once these were completed, I exported the x and y coordinates into MS Excel for formatting and access to formulas. The formulas are simple calculations: differencing, squaring, averaging, and taking the square root to get the root-mean-square error (RMSE). Each of these results fed into the final NSSDA error statistic.

My numbers were rather large, perhaps due to choosing 30 points instead of 20. I maintained the same zoom level on the map when creating the points. The RMSE is used to calculate the NSSDA result: multiply the RMSE by 1.7308. According to the Federal Geographic Data Committee in Geospatial Positioning Accuracy Standards Part 3: National Standard for Spatial Data Accuracy, “A minimum of 20 check points shall be tested, distributed to reflect the geographic area of interest and the distribution of error in the dataset. When 20 points are tested, the 95% confidence level allows one point to fail the threshold given in product specifications.” Based on this and Dr. Morgan's lab exercise calling for a minimum of 20 points, I ended up with 30. Using 30 test points does not necessarily produce greater accuracy or error in the results, and the standard does not discuss going beyond 20 points. A journal article by Zandbergen et al. (2011), required reading for this lab, describes a test that used 100 sample points to assess the positional accuracy of census road data. I mention this to compare the two studies; the census road study was obviously more comprehensive and covered a larger area.
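The core calculation is short enough to sketch in Python; this assumes matched test and reference coordinate pairs and the NSSDA case where the x and y RMSEs are approximately equal (hence the 1.7308 multiplier):

```python
import math

def nssda_horizontal(test_pts, ref_pts):
    """NSSDA horizontal accuracy at the 95% confidence level.

    test_pts, ref_pts: matched lists of (x, y) coordinates.
    """
    n = len(test_pts)
    sum_sq = sum((tx - rx) ** 2 + (ty - ry) ** 2
                 for (tx, ty), (rx, ry) in zip(test_pts, ref_pts))
    rmse_r = math.sqrt(sum_sq / n)   # horizontal (radial) RMSE
    return 1.7308 * rmse_r           # NSSDA statistic when RMSE_x ~ RMSE_y
```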

Zandbergen, P., Ignizio, D., & Lenzer, K. (2011). Positional accuracy of TIGER 2000 and 2009 road networks. Transactions in GIS, 15, 495-519. doi:10.1111/j.1467-9671.2011.01277.x

The results of the NSSDA test were 4716.28 feet horizontal accuracy for the Albuquerque street data and 4481.29 feet for the StreetMap USA data. These numbers seem far too large to be considered accurate, and I am not sure where I went wrong in gathering the data or running the calculations.

For the ABQ street data set: Tested 4716.28 feet horizontal accuracy at 95% confidence level. 

For the Street Map data set: Tested 4481.29 feet horizontal accuracy at 95% confidence level.



30 August 2023

Lab 1 M1.1 Fundamentals

This is the first week of Special Topics in GIS, and it looks to be a class of new challenges with a different approach to data. At its core, though, it is still data analysis and maps, which is what makes GIS fun and challenging to me. The title of this lab is Calculating Metrics for Spatial Data Quality. We learned the importance of accuracy versus precision and how one does not guarantee the other. Today’s cell phones and GPS units are quite accurate for the most part; they can be used to map data, but not before their horizontal and vertical positional accuracy and precision are determined.



For this lab, we added shapefile data from the UWF repository that included waypoints and a reference point shapefile. We assessed the precision and accuracy of 50 waypoints mapped with a handheld Garmin unit rated at an accuracy of less than 15 meters, according to the owner's manual. The points were scattered, with the true location unclear. Once added to ArcGIS, we created buffers of 1, 2, and 5 meters around the average waypoint location to make the spread easier to see. We then created three more buffers representing the 50th, 68th, and 95th percentiles of the point distances; the 68th percentile (roughly one standard deviation) is the most common summary of positional precision. After this was completed, we added the reference point shapefile, mapped with a Trimble Pathfinder GPS unit accurate to less than one meter when set up correctly, according to the manufacturer's specifications.

Great! Now what? We have a map, and we need to make sense of the data. For this we used the root-mean-square error (RMSE) and the cumulative distribution function (CDF) to describe the accuracy and precision of the results. We did this part in MS Excel, which was a bit of a challenge when working out the formulas for the CDF. It was a large dataset of x and y points, 200 values in all, so some knowledge of Excel helps. We calculated each point's horizontal distance from the average position using the x and y coordinates, then computed the RMSE of those distances. Once the results were determined, we graphed them as a CDF, a scatterplot of cumulative percentage against xy error, which helps with visualizing the results.
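For reference, the same metrics can be sketched outside Excel; this minimal Python version assumes a list of (x, y) waypoint coordinates and measures each point's distance from the average position:

```python
import math

def error_metrics(points):
    """RMSE and percentile radii of waypoints around their mean position."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    dists = sorted(math.hypot(x - mx, y - my) for x, y in points)

    def pct(p):  # nearest-rank percentile of the sorted distances
        return dists[min(n - 1, max(0, math.ceil(p * n) - 1))]

    rmse = math.sqrt(sum(d * d for d in dists) / n)
    return rmse, pct(0.50), pct(0.68), pct(0.95)
```

Plotting the sorted distances against cumulative percentage gives the CDF scatterplot described above.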

XY Coordinate errors and percentages

Error metrics for the GPS positions

The main point of this week's lab is determining accuracy and precision. The two can be confusing and may seem interchangeable, but they are not. According to Bolstad in GIS Fundamentals, “accuracy measures how close an object is to the true value,” while precision is “how dispersed a set of repeat measurements are from the average measurement.” From these definitions you can see how related, yet distinct, they are. Bias is also not something to overlook in data; it will certainly affect accuracy and precision, and can be introduced by measuring in the wrong coordinate system or by faulty equipment.

