How to speed up InMemoryLayers that use Classbreaks

granthamm · March 22, 2016, 7:43am

Hey everyone,

I've been doing a bit of development with the ThinkGeo platform, and a few days ago I ran into a snag. I was using a fairly large amount of data (6900 columns, 3100 rows) to dynamically render shapes against classbreak values. The trouble is, as ThinkGeo has pointed out, that the InMemoryLayer wasn't really designed with sizes like that in mind. So ThinkGeo came to our rescue and helped us create an InMemoryLayer that uses an R-tree index to speed things up a bit (see the separate topic for the code on that one). This helped a good bit, but the performance compared to an indexed shapefile still wasn't there.

After testing, I realized that the problem is an inefficiency in the way ThinkGeo renders its class breaks. I don't know how to get at the actual method, so I can only guess, but I suspect that for each feature it performs a GetColumnByName, or something to that effect, grabs the value, matches it, and then moves on to the next shape. Unfortunately, when you get to data tables of the size I'm working with, that slows everything down. A lot. So, here's what we came up with. It's a bit of a hack, and I'm sure ThinkGeo can do better, but I thought I'd share it for them and anyone else interested.

Step 1.

Don't store the data in the layer, only the shape and the ID. Instead, we created a custom class that is essentially a collection of hashtables. Each column is a hashtable, and each entry is the shapeID and the value.

    Imports System.Collections

    Public Class DataContainer
        ' Maps each column name to a Hashtable of FeatureID -> FeatureValue.
        Private Columns As New Hashtable
        ' Maintain a reference to the last column used to improve efficiency.
        Private WorkingHashtable As Hashtable
        Private WorkingTableName As String

        Public Sub Add(ByVal ColumnName As String, ByVal FeatureID As String, ByVal Value As String)
            Dim ColumnHash As Hashtable = DirectCast(Columns(ColumnName), Hashtable)
            If ColumnHash Is Nothing Then
                ' First value for this column: create its table.
                ColumnHash = New Hashtable
                Columns.Add(ColumnName, ColumnHash)
            End If
            ColumnHash.Add(FeatureID, Value)
        End Sub

        Public Function GetValue(ByVal ColumnName As String, ByVal FeatureID As String) As Double
            ' Only look the column up again when it differs from the last call.
            If ColumnName <> WorkingTableName Then
                WorkingHashtable = DirectCast(Columns(ColumnName), Hashtable)
                WorkingTableName = ColumnName
            End If
            ' Unknown column: fall back to 0.
            If WorkingHashtable Is Nothing Then Return 0
            ' CDbl(Nothing) is 0, so an unknown FeatureID also yields 0.
            Return CDbl(WorkingHashtable(FeatureID))
        End Function
    End Class
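For anyone who prefers generics, here is a minimal sketch of the same column-of-hashtables idea using Dictionary(Of TKey, TValue) instead of Hashtable. It is not the class we actually run; the name TypedDataContainer, the column name and the feature IDs are made up for illustration. It stores values as Double up front so nothing is converted at render time, and it leaves out the last-column cache to stay short.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical typed variant of DataContainer (not the original class).
Public Class TypedDataContainer
    ' Column name -> (feature ID -> numeric value).
    Private ReadOnly columns As New Dictionary(Of String, Dictionary(Of String, Double))

    Public Sub Add(columnName As String, featureId As String, value As Double)
        Dim column As Dictionary(Of String, Double) = Nothing
        If Not columns.TryGetValue(columnName, column) Then
            ' First value for this column: create its table.
            column = New Dictionary(Of String, Double)
            columns.Add(columnName, column)
        End If
        ' The indexer overwrites a duplicate ID instead of throwing.
        column(featureId) = value
    End Sub

    Public Function GetValue(columnName As String, featureId As String) As Double
        Dim column As Dictionary(Of String, Double) = Nothing
        Dim value As Double
        If columns.TryGetValue(columnName, column) AndAlso column.TryGetValue(featureId, value) Then
            Return value
        End If
        ' Same fallback as the original: 0 for a missing column or feature.
        Return 0
    End Function
End Class

Module Demo
    Sub Main()
        Dim data As New TypedDataContainer()
        ' Illustrative column name and feature IDs only.
        data.Add("Rainfall_Day1", "feature-1", 12.5)
        data.Add("Rainfall_Day1", "feature-2", 0.0)
        Console.WriteLine(data.GetValue("Rainfall_Day1", "feature-1"))
        Console.WriteLine(data.GetValue("NoSuchColumn", "feature-1"))
    End Sub
End Module
```

One behavioural difference to be aware of: Hashtable.Add throws on a duplicate key, whereas the indexer assignment above silently replaces the old value. Pick whichever suits your load process.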

Step 2.

Use the IndexedInMemoryLayer from the other post to build a new output layer that contains only the shapes inside the bounding box, each carrying just the one value you want to render. In the code below, mData is a reference to the class above.

    ' Create new featuresource columns for output
    Dim fsc As New Collection(Of FeatureSourceColumn)
    fsc.Add(New FeatureSourceColumn("DValue"))

    ' Create outputlayer for featuresource
    Dim outputlayer As New IndexedInMemoryFeatureLayer(fsc)
    outputlayer.Open()
    outputlayer.EditTools.BeginTransaction()

    ' Get the features in the bounding box and add them to the featuresource
    For Each item As Feature In ActiveLayer.FeatureSource.GetFeaturesForDrawing(BoundingBox, 256, 256, New Collection(Of String))
        Dim r As New Feature(item.GetShape)
        r.ColumnValues("DValue") = mData.GetValue(AmendedColumnName, item.Id)
        outputlayer.EditTools.Add(r)
    Next
    outputlayer.EditTools.CommitTransaction()

Step 3.

Use the newly created Layer and add your classbreak classes to it, setting the render column to the "DValue" column we created to hold the data we want to display.
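As a rough sketch of what that can look like, the snippet below attaches a ClassBreakStyle keyed on "DValue" to a layer. The break values and colors are placeholders, the helper name ApplyClassBreaks is made up, and the API names are the Map Suite ones as I understand them (ThinkGeo.MapSuite.Core namespace), so please check them against the version you have installed.

```vbnet
Imports ThinkGeo.MapSuite.Core

Module ClassBreakSetup
    ' Hypothetical helper: attaches a class break style that reads "DValue".
    ' Break values and colors are illustrative placeholders only.
    Public Sub ApplyClassBreaks(ByVal layer As FeatureLayer)
        Dim breaks As New ClassBreakStyle("DValue")

        ' Each ClassBreak pairs a starting value with the style used from there up.
        breaks.ClassBreaks.Add(New ClassBreak(0, AreaStyles.CreateSimpleAreaStyle(GeoColor.SimpleColors.Green)))
        breaks.ClassBreaks.Add(New ClassBreak(10, AreaStyles.CreateSimpleAreaStyle(GeoColor.SimpleColors.Yellow)))
        breaks.ClassBreaks.Add(New ClassBreak(25, AreaStyles.CreateSimpleAreaStyle(GeoColor.SimpleColors.Red)))

        ' Use the same style at every zoom level.
        layer.ZoomLevelSet.ZoomLevel01.CustomStyles.Add(breaks)
        layer.ZoomLevelSet.ZoomLevel01.ApplyUntilZoomLevel = ApplyUntilZoomLevel.Level20
    End Sub
End Module
```

You would call ApplyClassBreaks(outputlayer) once the transaction from Step 2 has been committed, assuming your indexed layer class derives from FeatureLayer.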

Essentially, this condenses the objects such that they may be rendered much more efficiently. It takes much less time to create the new layer and render breaks against it than the original. In my tests, the render speed was roughly 10 times faster.

Finally, this will not work for everyone, as hashtables are very inefficient to add items to. In my case it does not matter, as my web server will persist the created shapes and data layers for weeks at a time with no alterations. This will also not help you much if you only have 50 or so columns. But if you want to use vast amounts of data, and get away from shapefiles with their 255 column limit, this is the fastest way to do it that we've seen. Perhaps ThinkGeo might even be able to find a way to include some performance enhancements in future revisions. I think the InMemoryLayer can have many more uses this way.

If anyone has any questions, feel free to ask!

Regards,

Grant Hamm

Developer,

AWhere, Inc.

David · March 22, 2016, 10:28am

Grant,

I will take a look at this and see what I can find. I find it strange, because the column data for an InMemoryFeatureLayer is stored in a dictionary with the key being the column name, so the lookup should be rather fast. When you go to draw, how many of the 3,000+ records are drawn? Also, when you specify the columns you want returned, how many do you specify?

David