Best approach to optimized rendering

Ted · March 22, 2016, 7:13am

We display many layers that are surface representations based upon small polygons. We display only area styles... no labels or outlines. The field to be themed is numeric. We'll tend to use about 20 class breaks. The feature source will have from 5,000 to 50,000 "cells".

What is the most efficient way to theme this? I'm looking for some insight into the ValueStyle and ClassBreakStyle themes. In the Feature, I have to convert my numeric value and provide it to you as a string. And, in a class break style, you convert that back to a number and then dive into the class break table? When you do that, do you do a sequential search through the breaks, or a binary search? Do you cache the most recently used break and test for it first, assuming that there is a good chance that sequential geometries will fall within the same class break (which is the case with our layers)?

As an alternative... I have an optimized style picker that works on the numerical data w/o conversion, caches the most recently used break, and uses a binary search when the value doesn't fall within the cached break. Should I use this object within my FeatureSource and create a FeatureCollection where the field is a resultant index for a ValueStyle, rather than the numeric value? Do you have a mechanism that would find which of 20 value style members should be used that would be a lot faster than determining which of 20 class breaks should be used?
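For readers following along, the kind of picker described above can be sketched roughly as follows. This is only an illustration in Python, not Ted's implementation and not any MapSuite API; the class name `BreakPicker`, the method `pick`, and the convention that each break is the lower bound of its class are all assumptions made for the example.

```python
from bisect import bisect_right


class BreakPicker:
    """Maps a numeric value to a class-break index (hypothetical helper).

    breaks holds the ascending lower bound of each class: class i covers
    breaks[i] <= value < breaks[i + 1], and the last class is open-ended.
    """

    def __init__(self, breaks):
        self._breaks = list(breaks)
        if not self._breaks or self._breaks != sorted(self._breaks):
            raise ValueError("breaks must be a non-empty ascending sequence")
        self._last = 0  # index of the most recently matched class

    def pick(self, value):
        b, i = self._breaks, self._last
        upper = b[i + 1] if i + 1 < len(b) else float("inf")
        # Fast path: the value falls in the same class as the previous one.
        if b[i] <= value < upper:
            return i
        # Slow path: binary search. Values below the first break clamp to 0.
        i = max(bisect_right(b, value) - 1, 0)
        self._last = i
        return i
```

With breaks of 5.5, 5.8, 6.1 and 6.4, consecutive values such as 5.6 and 5.7 hit the cached class without a search, and only a value that leaves the class (say 6.2) pays for the binary search.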

Do you offer a style where I could have my field be an actual index into the right member, rather than you having to search for the right member?

Do I have any options for building custom renderers that would optimize the display of this type of data?

Thanks!

PS: When I have asked this question of other vendors over the past several years, the response has been that I should represent my data as a raster. That is a topic worth discussing, but it leads into many questions about displaying a lon/lat WGS84 raster in a UTM NAD83 projection, etc, etc. It's a worthwhile discussion, but it doesn't change my interest in knowing how to optimize the application of styles.

David · March 22, 2016, 8:14am

Ted,

   I have a few questions about this.

1. Where is this data stored? How long does it take to read the data to display on a typical screen? Are the different breaks indexed, so that you can query out just a given break from a rectangle?
2. How many polygons are typically drawn on a screen at a given time? How many points are they made up of on average?
3. Are the polygons square cells, or can they be irregular shapes?
4. Is this data dynamic and changing often? If so, how often?

I have some ideas, though some of them require more information.

1. This sounds like grid territory. You create a grid file, it is loaded into memory, and the values are colored based on a color range. I am not sure if you are familiar with this, but if not we can help you with it.
2. If you are concerned about class breaks being slow to loop through, then you may be better off setting two colors, the first at the low end of the range and the other at the high end. Instead of falling into a break, the value of a cell is colored based on where it fits between the high and low values, and you create a color in between. In this way it is a single calculation and not a loop. This might save you some time. You could also create a break effect like this by specifying the number of steps that the process takes. This will mean that from Light Red to Dark Red there are only five steps, and the math has a rounding and a ceiling or floor type calculation.
3. If the query from your source is really fast, then another great way is to not use a decision-based style like a value or class break, but to have separate layers where each one focuses on a different color. We do this for our world map kit. When we get the original data, it has all road types lumped into one shapefile. We then create custom indexes that each contain only one type of road, such as minor roads, major roads, paths, highways, interstates, etc. We load each road type as a different layer; each layer uses the same shapefile but a different index. In this way we do not even have to look at a field to determine how to draw. All of the drawing is hard coded to the particular layer. This works very well because our index is an R-Tree and is very efficient at finding shapes in a rectangle quickly. One thing to remember is that we can help you use our R-Tree index for your own custom data source. The R-Tree index could be a first-pass filter that then tells you what records to fetch from your real underlying data.
4. Of course, whatever you can cache in memory would be good. We have some InMemoryFeatureLayers that also have R-Tree caches bolted onto them; there is an example in the Developers Blog forum. If you can load it all into memory, that would be good. When I heard you were using a binary tree to search through 20 class breaks, I knew you were not messing around. My only question is whether the searching really takes that much time. I would imagine that an optimization somewhere else might yield more fruit than this. I can't imagine this taking a lot of time versus getting the data, doing validation, etc. Of course, I have no answers to the questions above, so it might be a good optimization to make.
5. Of course, we can share the code for the ClassBreakStyle with you if you would like. However, it is not very high tech, and I think you could put one together that is faster for your particular needs. Many of the classes we provide are general purpose and have not been tweaked for all-out speed.
6. Another approach is to cache the rendered tiles. If your data doesn't change that often, we can cache what is drawn as tiles on the disk. This is kind of like what the other vendors told you, but in this way we cache what is drawn, in tile sizes of your choosing. The tiles are generated on demand. The more the client uses the system, the more will be cached. When you change projections, you specify a different CacheId and we will start a new cache. You can flip between both caches by changing the CacheId. This also cuts down on drawing time as the user pans, etc. If the user pans 40%, then we only need to redraw a smaller part of the screen, as the tiles will populate the rest. We can also help you write a little routine to pre-create the tiles for a few zoom levels if you want. In this way you are in control. The nice thing about the caching of tiles is that it is pretty transparent for you. When an overlay gets the Draw method called, we determine what we have in cache and only call into your layer for the rest of what we don't have. We then stitch together what we had and what you give us, and send it back.
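The two-color idea in item 2 can be illustrated with a short sketch. This is Python written purely for illustration, not ThinkGeo code; the function name `stepped_gradient`, its parameters, and the choice of a floor-style snap to evenly spaced levels are assumptions, since the post does not specify the exact rounding.

```python
def stepped_gradient(value, low, high, low_rgb, high_rgb, steps=None):
    """Color a value by its position between low and high (hypothetical helper).

    With steps=None the result is a smooth blend. With steps=N the position
    is snapped to one of N evenly spaced levels, giving a class-break look
    from a single calculation instead of a search through breaks.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    # Position of the value in the range, clamped to [0, 1].
    t = min(max((value - low) / (high - low), 0.0), 1.0)
    if steps is not None:
        if steps < 2:
            raise ValueError("steps must be at least 2")
        # Floor to a level index, then map the level back onto [0, 1].
        level = min(int(t * steps), steps - 1)
        t = level / (steps - 1)
    return tuple(round(a + (b - a) * t) for a, b in zip(low_rgb, high_rgb))
```

For example, with a light red of (255, 200, 200) at 0 and a dark red of (139, 0, 0) at 10, a cell value of 3 with `steps=5` lands on the second of five levels. Note that this only reproduces equal-interval breaks; user-defined breaks of uneven width still need a lookup.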

I hope this helps, sounds like you have some interesting work going on there!

David

Ted · March 22, 2016, 8:14am

David, I have uploaded a PowerPoint Slide Show that illustrates our application and the requirements we have from a mapping engine. Within the 20 or 30 slides, there are 5 or 6 that explicitly illustrate why I have this specific question on rendering performance.

Brief summary: We are a company that builds software for "Precision Agriculture". This is the application of GIS and GPS technology to crop production. Our customers have GPS on their machines, and they log operating information every second as they drive through a field. As such, in a 1/2 mile by 1/2 mile field (160 acres), there might be 150,000 logged data records. Our goal is to display that data on a map, such that the user can see high and low yield areas, product application rates, soil density, nutrient analysis, etc, etc for any area in the field. This is what we call Dense point data. 90% of the time, we display all of the data for a field within the same map frame. The ability to select a specific subset as we zoom in is a low priority. An example of a map of "raw" cotton yield data is shown on the 10th slide, entitled "Field Op Records".

On the 13th slide, entitled Spatially Aware Sensors, you can see that we do have reason to zoom in on a map at times. And when we do, we are displaying a set of dots that are sized dynamically to reflect the operating width of the machine, and overlapped in time series so that it is easy to see in what direction the machine was travelling. These are the gray dots. The green dots represent the operation of just a single part of the machine, as farmers may have one variety in each half of a planter. Our special skill within the industry is our ability to map and display these individual operating parts of a machine.

We also deal with Sparse data... manually pulled soil test sites within a field, or things like soil type and soil testing polygons.

For Dense and Sparse sites, we sometimes generate surfaces. These are persisted within our system as layers of many small rectangular polygons, with the polygons that are intersected by the containing field boundary being irregularly shaped. These are the 5,000 to 50,000 record surface layers to which I was referring in the original post. There is an illustration of one of these on the IntelliCalc Multi-Layer Processing slide. That's an old slide, from our COM system, so the cells are about 10 meters, and were rendered with outlines. In our new .Net stuff, the cells are 2 to 5 meters, and rendered w/o outlines. But, in any case, you can see that we are displaying them all at one time, in most cases.

On the FieldOps data, for the first five years we used this package, we drew this data directly into the map window ourselves, using GDI. On a 1 GHz machine, we could render 50,000 yield points within a couple of seconds. That was compared to 10 to 15 seconds with the out-of-the-box renderers drawing from a shapefile. We were finally able to contract with the vendor to write an optimized renderer plug-in. They took their grid renderer and made it available for vector layers, and added a parameter for a size field when rendering dots. This let us render size and color in the same pass. On a 2 GHz machine today, I can render up to 50,000 sites in about 1/2 second. The issue is the time it takes to copy my data into their required recordset for rendering. That takes about 3 seconds, now that they have given us a bulk load method; before that, it was 10 seconds. However, in their architecture, once I have created that recordset, I'm done... as the user zooms in and out, or pans, the rendering happens automatically w/o a requirement that I reload the recordset. I've accomplished this in MapSuite by caching the feature collection that I build. But I don't know how hard you work against that collection.

In our .Net world, we store these 50,000 records as a single compressed blob in a database (or in a flat file), and read them into a .Net dataset. The geometry is represented as an NTS geometry object. In a separate question, I'll ask about that. Once I have the "blob" out of the database or flat file, I can instantiate my populated dataset in less than a second. That becomes my datasource. The time to get the blob depends on whether it is coming down the wire from a web service, or from opening a local flat file, etc.

As you review our slides, you will see that we operate from a business navigational tree. A farmer has farms, farms have fields, and fields have cropzones. Cropzones have these FieldOp records and soil test layers, etc. The user will navigate around by clicking on those nodes. We expect no longer than a 3 second response for displaying the "yield map" for a large field after the user clicks on the node. It may be longer the first time they click on the node if we have to cache the data locally from a remote source.

Clearly my other question about filtering from a feature source was related to having a layer with all of the field boundaries for the farmer, themed in one manner, the boundaries of the currently selected farm themed in a second manner, and the boundary of a currently selected field themed in a third manner. If the user clicks on a farm node, then there is no field layer, but I wanted to keep the placeholder on the map so I didn't have to reorder layers when they go back and click on a different field (or a different farm). As the farm and field layers are subsets of the full grower (farmer) layer, I wanted to only have one copy of the data and filter it for the subset. We now know of two ways to handle that efficiently.

All of our data is stored as WGS84 Lon/Lat, and we dynamically project it into UTM NAD83 at display. On our COM stuff, we could run a bulk transform on the high-frequency data and project 50,000 sites in about 0.2 seconds. It was amazing. Proj4Net appears to be pretty fast, but I'm not sure whether you run a bulk transform in your drawing stuff or not. That's kind of where I'm looking for a flow diagram. Where does the transform occur, and is it performed each time the map is redrawn? It would appear that it is, as you ask for a new FeatureCollection as the map is zoomed and panned. So, that likely means that in my FeatureSource, I would like to be aware of the target projection, and do the projection one time as I build the cached feature collection, preventing you from doing it over and over again?
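The "project once, then reuse" idea in the last sentence can be sketched as a small cache keyed by target projection. This is an illustrative Python sketch only; `ProjectedCache`, its `get` method, and the `transform(projection_id, points)` callback are hypothetical names, and nothing here reflects how MapSuite or Proj4Net actually schedule their transforms.

```python
class ProjectedCache:
    """Holds source lon/lat points and caches one projected copy per target.

    transform is a caller-supplied function (hypothetical hook) taking a
    projection id and a list of (lon, lat) tuples and returning the
    projected (x, y) tuples in a single bulk call.
    """

    def __init__(self, lonlat_points, transform):
        self._source = list(lonlat_points)
        self._transform = transform
        self._cache = {}  # projection id -> projected points

    def get(self, projection_id):
        # Run the bulk transform only on the first request for a projection.
        if projection_id not in self._cache:
            self._cache[projection_id] = self._transform(
                projection_id, self._source
            )
        return self._cache[projection_id]
```

Repeated pans and zooms would then call `get` with the same projection id and receive the already-projected points; a different id (for example, another UTM zone) triggers one more bulk transform and is cached alongside the first.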

We have been using NTS and GeoAPI since we started our port to .Net two years ago. We have a strong collection of supporting geometry and GIS objects that we have written around them (surfacing, topology cleaners, more robust intersections and unions, etc.). I won't be using any of the ThinkGeo components for geometry processing, per se... just presentation.

So.... let me see what I've missed on your questions: I think I covered the first 4.

I am somewhat familiar with a Grid. But, we do have a need to display the original raw GPS sites in some instances, and the grid is not applicable there. And, when we do build surfaces, we build them square to the geographic coordinate system, but want to display them in a projected coordinate system. Is that an option? And, then there is the issue of the "edge" cells of the grid needing to be clipped to an overlaying field boundary. That's not an option, is it? These are the issues that have kept us from pursuing the Grid approach in the past.

My personal preference is to theme surfaces and sites with a smooth gradient legend, and yes... if it were known that this was happening, then we could calculate the index prorated between min and max values, and dive into the correct break. We also have the option for "custom" breaks, though, and many of our customers want to define that pH values between 5.5 and 5.8 are red, 5.8 and 6.1 are yellow, etc. So, we could optimize the renderer for indexing in a gradient theme, but we cannot assume that is always what the user will want to use.

I do think the option of having all cells that are between 5.5 and 5.8 be in one layer, 5.8 to 6.1 in another layer, etc is quite intriguing. My datasource is always going to be all 50,000 points, but each layer could filter for only the appropriate cells. I need to understand more about how to implement a Layer object that is a collection of layers, and get them all drawn. I think this could be very promising.
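As a rough illustration of the layer-per-break idea, the classification can be done once when the data is loaded, so that each layer holds only the cells for its own break and draws them in a single hard-coded color. The sketch below is Python for illustration only; `partition_by_break` and the `(cell_id, value)` tuple shape are assumptions, and wiring the resulting groups into actual MapSuite layers is not shown.

```python
from bisect import bisect_right


def partition_by_break(cells, breaks):
    """Group cell ids by class break (hypothetical helper).

    cells is an iterable of (cell_id, value) pairs. breaks holds the
    ascending lower bound of each class. Returns one list of cell ids per
    break, in break order; values below the first break go to group 0.
    """
    groups = [[] for _ in breaks]
    for cell_id, value in cells:
        index = max(bisect_right(breaks, value) - 1, 0)
        groups[index].append(cell_id)
    return groups
```

With the pH breaks mentioned above (5.5, 5.8, 6.1), each returned group would back one layer, and no per-feature style decision is needed at draw time. The cost moves to load time and is paid again only when the user edits the breaks.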

My only connection to a database (or file system) is to get the data the first time the grower requests it, and, to the extent possible, to keep it cached in our in-memory .Net typed datasets.

Does the searching for the class break really take enough time to warrant this discussion? Very good point. All I know is that using other packages in the past, the rendering was extremely slow compared to what we could do. I'm still testing some of this stuff in MapSuite. 15,000 points is not an issue. 75,000 points hung my box. And then I had to get back to my day job. But I'll be exploring more this weekend, and was asking questions based upon lots of experience with other packages, and a little with MapSuite.

Sharing the code for the class break style would let me see how you are doing it, and I really think you will find that caching the last value found and testing it first on the next point could be a big help, if you are not doing that. But, I'm not clear on whether I would have the opportunity to write my own alternative class break style by deriving from one of your classes. It appears your components are very extensible in many areas, but I don't really find that magic document that defines the full scope of your extensibility :) That's not a complaint. I hear the same thing from people that integrate my stuff into their applications, too :)

I'm not ready to explore tile caching. I think we have lots of other options, first.

Thanks, David. I appreciate the design brainstorming. I hope I have provided some useful information to help flesh out the discussion. I'll be happy to visit by phone, but the background I've provided here should help that conversation if we need it.

The document that you have someone starting on, covering the rendering workflow, is really the most significant piece of documentation that would help me know where to spend optimization resources. If that document can highlight which workflow steps are "overridable", that would be perfect. I guess any method you have called *Core is overridable?

Ted · March 22, 2016, 8:14am

I just reviewed the custom style video again. After some experience with building custom layers and data sources, it made a lot more sense, and I think it addresses many of the questions that I had raised here.

Thanks!

David · March 22, 2016, 8:14am

Ted,

   I am glad you found the video helpful. Just let me know which of the above issues it did not address, and I would be happy to give you any insight or suggestions I have.

David