Loading Layers Twice for Labels: Performance Implications

Steve · March 22, 2016, 7:13am

I have read Brenden and David's post (gis.thinkgeo.com/Support/Dis...fault.aspx) about why the layer needs to be loaded twice for labels to display properly, but I am not clear on what performance implications this has compared to the 2.0 control. The post mentions that some testing should be conducted to find the most efficient method; does anyone have any numbers to share?

Common sense would say that we are using double the memory and creating double the I/O overhead at some point in the bitmap creation process, because we are opening the same shapefile twice regardless of the extent. Is this incorrect?

Are the performance implications, in terms of memory usage and processing overhead, different for shapefiles, SQL2008 layers, and PostGIS/GRE layers when the layer is declared twice for labeling purposes?

Thanks!

David · March 22, 2016, 8:14am

Steve,

   I don’t have very recent numbers, but I can share what we have found in general.  Typically most of the time is spent on disk I/O to fetch the features.  When we render twice, the data read for the first layer is almost always still in either the OS cache or the hard disk cache.  This applies, of course, to shapefiles and other file-based sources.  We see this over and over in our tests, especially since the two layers render very close together in time.  With SQL sources the second query does add overhead; however, we find that rendering complete maps from spatial databases is normally fairly slow anyway, as flat files are typically far faster.  We also found that even though we designed the 3.x framework to do much more validation, use more virtual methods, and follow a more OO design approach, it runs a bit faster than 2.x in most cases.  Also, once a layer is open in the Desktop Edition we never close it, so there is not much overhead there.  The caches for file-based layers are shared as far as I know.

   We separate the drawing for a number of reasons.  I am not sure how many I covered in the thread you referred to, but there are quite a few.  If you really want to draw the labels at the same time you draw the lines, you can.  We would create a modified TextStyle that draws the labels at a higher drawing level than the default.  The only issue is that these labels will be covered by any layers that are drawn on top of the roads layer.

   We could also simulate the way we did it in 2.x.  What we could do is this:  Create a modified layer of whatever type (shapefile, SQL, etc.) and add a labels layer property to it.  The labels layer would be an InMemoryFeatureLayer; you would assign it to that property, add it to your overlays, and give it a TextStyle.  Then, in the DrawCore of the custom layer, you would take the features that are about to draw and add them to the InMemoryFeatureLayer.  A little later in the drawing order we reach your InMemoryFeatureLayer, which already holds the features from the original query, ready to label.  In this way you only query the data once and get the speed especially on the server based sources.  I am sorry if I am not explaining this well, but I think I could work up a sample to show you.  What do you think?
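A rough illustration of the query-once idea described above, written in Python rather than against the Map Suite API. Every name here (`CountingSource`, `ShapeLayerWithLabels`, `InMemoryLabelLayer`, `draw_map`) is a hypothetical stand-in, not a Map Suite class, and the "canvas" is just a list of draw commands. It is only a sketch of the data flow, assuming the shape layer draws before its label layer.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    x: float
    y: float


class CountingSource:
    """Hypothetical stand-in for a slow, server-based feature source."""

    def __init__(self, features):
        self._features = list(features)
        self.query_count = 0  # how many times the source was actually queried

    def query(self, extent):
        self.query_count += 1
        xmin, ymin, xmax, ymax = extent
        return [f for f in self._features
                if xmin <= f.x <= xmax and ymin <= f.y <= ymax]


class InMemoryLabelLayer:
    """Holds features handed over by another layer; never touches the source."""

    def __init__(self):
        self.features = []

    def draw(self, extent, canvas):
        for f in self.features:
            canvas.append(("label", f.name))


class ShapeLayerWithLabels:
    """Draws shapes and passes the queried features on to its label layer."""

    def __init__(self, source, label_layer):
        self.source = source
        self.label_layer = label_layer

    def draw(self, extent, canvas):
        features = self.source.query(extent)    # the only query per redraw
        self.label_layer.features = features    # replace, so stale features do not pile up
        for f in features:
            canvas.append(("shape", f.name))


def draw_map(layers, extent):
    """Draw layers in order; label layers are placed later in the list."""
    canvas = []
    for layer in layers:
        layer.draw(extent, canvas)
    return canvas
```

With the label layer placed after the shape layer in the list passed to `draw_map`, the labels are drawn last while the source is queried only once per redraw.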

David

Steve · March 22, 2016, 8:14am

David,

I think this is a great idea as long as it is easy to follow the order in which the symbols and the labels are drawn. I am intrigued by this statement: "In this way you only query the data once and get the speed especially on the server based sources."

The way you explained the I/O when the layer is declared twice is that the first time the layer is read in, the disc or OS cache stores that layer. The second time we declare the layer (for labeling) Map Suite will instead go to the disc/OS cache (instead of the shapefile on the disc) and read the data in for the labels.

If we were to use the same setup as in 2.x, we would load the layer the first time to display the symbols, and then when the labels were drawn (later on) we would be pulling the label information from memory, correct?

So it seems that the difference between these two approaches would be whatever the performance gap is between the computer's memory and the disc/OS cache (which I would not expect to be much), plus one read cycle.

Although in my opinion the fewer I/O operations the better, you are the map control designer and would have a better idea of what would be easier to implement and what meets good software design standards.

David · March 22, 2016, 8:14am

Steve,

   You have reason to think this is a slower method than before, because it is.  The statement you found interesting comes from what we counted on when we separated the labeling from the drawing: that the files would be cached by the hard drive itself and/or the OS.  We didn’t think about sources like SQL Server, Oracle, etc.  In those cases the OS or hard drive cannot cache the data, so we need to make a separate query.  That is why I think the solution I mentioned would work well for those kinds of feature sources.

  — “The way you explained the I/O when the layer is declared twice is that the first time the layer is read in, the disc or OS cache stores that layer. The second time we declare the layer (for labeling) Map Suite will instead go to the disc/OS cache (instead of the shapefile on the disc) and read the data in for the labels.”

What I mean here is that we do not do anything special to go to the disk or OS cache.  We just use the same file stream; however, caching happens at various levels below us that we cannot control.  When we access the file stream for the same file the caching kicks in without us doing anything, and this is the beauty of it.
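As a small illustration of that point, here is a Python sketch (not Map Suite code; `timed_read`, `read_twice`, and `make_sample_file` are made-up helper names) that reads the same file twice using nothing but ordinary file streams. Any speed-up on the second read would come from caching below the application. The timings are only indicative: they vary by machine, and a file that was just written is probably already in the OS cache before the first read.

```python
import os
import tempfile
import time


def timed_read(path):
    """Read the whole file through a plain stream and time it."""
    start = time.perf_counter()
    with open(path, "rb") as stream:
        data = stream.read()
    return data, time.perf_counter() - start


def read_twice(path):
    """Two identical reads; no cache-specific code anywhere."""
    first, first_seconds = timed_read(path)
    second, second_seconds = timed_read(path)
    return first, second, first_seconds, second_seconds


def make_sample_file(size_bytes=1_000_000):
    """Create a throwaway file of random bytes and return its path."""
    handle, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(handle, "wb") as stream:
        stream.write(os.urandom(size_bytes))
    return path
```

Calling `read_twice(make_sample_file())` returns both buffers and both durations, so the two reads can be compared on a given machine.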

I agree that, in isolation, fewer I/O operations are better, and in 3.x we do more than in 2.x.  That is not the whole story, though, as we did this to open up avenues that we could not address with the 2.x architecture.  The main reason was that while the 2.x system was more efficient, it also caused some major structural problems.  In 2.x, to ensure the labels would draw last, we in essence cached all of the drawing plans in memory, and only after we had cached the drawing for every layer did we draw them all.  This was like winding up a bunch of commands and then executing them all at the end in a specific order.  This also took memory, and when drawing lots of things it consumed much more memory.  It also limited how we could handle things like Overlays, threaded drawing, tiling, etc.
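To make the trade-off concrete, here is a simplified Python sketch of the two strategies. It is not the actual 2.x or 3.x implementation; `render_deferred`, `render_immediate`, and the pass constants are hypothetical names, and a "draw command" is just a tuple. The deferred version queues every command before executing any, so the queue grows with the whole map. The immediate version holds one layer's commands at a time, but needs the label layers listed separately after the shape layers.

```python
SHAPE_PASS = 0
LABEL_PASS = 1


def render_deferred(layers):
    """2.x-style sketch: queue every command, then execute them all at the end."""
    plan = []
    for names in layers:
        for name in names:
            plan.append((SHAPE_PASS, "shape", name))
            plan.append((LABEL_PASS, "label", name))
    peak_queued = len(plan)  # everything is held in memory at once
    # Stable sort: labels move after all shapes, layer order is kept within a pass.
    plan.sort(key=lambda command: command[0])
    canvas = [(kind, name) for _, kind, name in plan]
    return canvas, peak_queued


def render_immediate(layers):
    """3.x-style sketch: draw each layer as soon as it is visited."""
    canvas = []
    peak_queued = 0
    for kind, names in layers:
        commands = [(kind, name) for name in names]
        peak_queued = max(peak_queued, len(commands))  # one layer at a time
        canvas.extend(commands)
    return canvas, peak_queued
```

For example, `render_deferred([roads, rivers])` and `render_immediate([("shape", roads), ("shape", rivers), ("label", roads), ("label", rivers)])` produce the same drawing order, but the first holds every command in its queue at once, while the second never holds more than one layer's worth.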

   I can’t wait for you to see the things we have been working on for the next version.  We have really gone way beyond what we had before, and I hope that these new changes will address many of your concerns.  At the same time, I think the idea I had above needs to be fleshed out a bit more, and maybe we can integrate that solution directly into the framework and make it transparent to you.  I will champion this and let you know what we come up with.

Also, it took me a while to get back to you as I was out for a few days.

David