Shapefile indexes

Elisa · March 22, 2016, 7:21am

Is there a way to create an index for all the values in a particular field, and then use that index when running GetFeaturesByColumnValue? This function is quite slow on a large shapefile, and I was hoping an index would speed things up. I could not find any examples of indexing a field without specifying a value, nor any examples of switching between indexes at runtime.

Thanks.

Elisa

David · March 22, 2016, 8:47am

Elisa,

That is a good idea, and we are going to see what it would take. Give me a few days to figure this out.

David

Elisa · March 22, 2016, 8:47am

David:

Thanks, I appreciate the assistance and look forward to the solution.

Just a further note: most of the shapefiles that I work with have a unique identifier field (a string 10 characters long) and 350,000+ records on average. These shapefiles were exported from a PersonalGeodatabase, and the unique identifier is the link back to the source file. The unique identifier is the field I search on most often, so it is the one I would need indexed for use with GetFeaturesByColumnValue.

Elisa

ThinkGeo · March 22, 2016, 8:47am

Elisa,

Thanks for your case description.

As I understand it, we could do this in the following steps:
1) Extract all the unique identifiers and sort them alphabetically.
2) Save that information in memory or in a disk file in a dictionary format (the key is the identifier, the value is the recordId in the dbf).
3) When an identifier needs to be queried, do a binary search on the dictionary keys to get the recordId; binary search makes this very fast.
4) Use GetFeatureById to fetch the feature we want.
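
The steps above can be sketched roughly as follows. This is only an illustration in Python, not ThinkGeo code: `build_index` and `lookup` are hypothetical helper names, and the (identifier, recordId) pairs are assumed to have already been read out of the dbf.

```python
from bisect import bisect_left

def build_index(records):
    """Steps 1 and 2: sort (identifier, record_id) pairs by identifier.

    `records` is assumed to be an iterable of pairs already read from the dbf.
    Returns two parallel lists: sorted identifiers and their record ids.
    """
    pairs = sorted(records)
    keys = [identifier for identifier, _ in pairs]
    ids = [record_id for _, record_id in pairs]
    return keys, ids

def lookup(keys, ids, identifier):
    """Step 3: binary search on the sorted identifiers.

    Returns the record id, or None when the identifier is not present.
    """
    i = bisect_left(keys, identifier)
    if i < len(keys) and keys[i] == identifier:
        return ids[i]
    return None

# Made-up sample data for illustration only.
keys, ids = build_index([("C003", 3), ("A001", 1), ("B002", 2)])
record_id = lookup(keys, ids, "B002")  # this id would then go to GetFeatureById (step 4)
```

Each lookup costs O(log n) comparisons, so roughly 19 comparisons for 350,000 identifiers, compared with scanning every dbf record.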

Of course, this is just a workaround; I hope it works. David and I are now working together on support for dbf indexing. It is quite complicated, but we hope to get it done in the coming weeks.
clicketyclick.dk/databases/xbase/format/ndx_example.html#NDX_STRUCT

If you have any more questions, just let me know.

Thanks.

Yale

Elisa · March 22, 2016, 8:47am

Yale:

The method you mentioned is what I have had to use.

This method becomes a problem when I add or delete a feature. After adding or deleting, you need to call ShapeFileFeatureLayer.Rebuild, which reassigns the feature IDs, so I would need to re-extract the unique IDs and feature IDs and rebuild the lookup.

ShapeFileFeatureLayer.Rebuild takes quite a bit of time on a large file (350,000+ records), so this method is inefficient, which is why I am asking about an index on the unique ID instead.

As a suggestion, would it be possible to adjust the ShapeFileFeatureLayer.Rebuild function so that it does NOT change existing feature IDs, and only removes the deleted ones and adds the new ones to the internal indexes? That way the suggested method would work very well.
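
To illustrate the behaviour being requested, here is a toy Python sketch of a store whose feature IDs are never reassigned. It is purely hypothetical and says nothing about how ShapeFileFeatureLayer.Rebuild works internally; the class and method names are invented for the example.

```python
class StableIdStore:
    """Toy feature store in which existing feature ids are never renumbered."""

    def __init__(self):
        self._next_id = 1      # ids only ever grow; deleted ids are not reused
        self._features = {}    # feature id -> unique identifier
        self._by_key = {}      # unique identifier -> feature id (the lookup)

    def add(self, key):
        """Add a feature and update the lookup incrementally."""
        fid = self._next_id
        self._next_id += 1
        self._features[fid] = key
        self._by_key[key] = fid
        return fid

    def delete(self, fid):
        """Remove one feature; every other feature keeps its id."""
        key = self._features.pop(fid)
        del self._by_key[key]

    def find(self, key):
        """Return the feature id for a unique identifier, or None."""
        return self._by_key.get(key)

store = StableIdStore()
a = store.add("A001")
b = store.add("B002")
c = store.add("C003")
store.delete(b)          # no renumbering: "C003" still maps to the same id
d = store.add("D004")    # the new feature gets a fresh id
```

Because surviving IDs stay put, the identifier lookup only needs one entry removed or added per edit instead of a full re-extraction.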

Elisa

ThinkGeo · March 22, 2016, 8:47am

Elisa,

That is a problem we have to think about. I agree that, in that solution, the lookup needs to be rebuilt whenever an edit is committed to the dbf. So this simple solution can only be applied to stable (rarely edited) data, which may not be your case.

Next I will do some research on using a B+ tree to store the data instead; that way we may be able to dynamically edit (insert/add/remove) features. This investigation will take a few days, to see whether we can support it in our products.
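
As a rough illustration of an index that can be edited without a full rebuild, here is a Python sketch. Note that it is not a B+ tree: it swaps in a plain sorted in-memory list maintained with the standard `bisect` module, and the `SortedIndex` class is a hypothetical name. Lookups are O(log n), but each insert or remove shifts list elements (O(n)); a B+ tree would offer the same ordered lookup with logarithmic, page-based updates that also suit disk storage.

```python
from bisect import bisect_left, insort

class SortedIndex:
    """Ordered identifier -> record id index (a stand-in, not a B+ tree)."""

    def __init__(self):
        self._entries = []  # list of (identifier, record_id), kept sorted

    def insert(self, identifier, record_id):
        """Insert one entry while keeping the list sorted."""
        insort(self._entries, (identifier, record_id))

    def remove(self, identifier):
        """Remove one entry; return True if it was present."""
        i = self._find(identifier)
        if i is None:
            return False
        del self._entries[i]
        return True

    def get(self, identifier):
        """Return the record id for an identifier, or None."""
        i = self._find(identifier)
        return None if i is None else self._entries[i][1]

    def _find(self, identifier):
        # (identifier,) sorts just before any (identifier, record_id) pair,
        # so bisect_left lands on the first entry with that identifier.
        i = bisect_left(self._entries, (identifier,))
        if i < len(self._entries) and self._entries[i][0] == identifier:
            return i
        return None

index = SortedIndex()
index.insert("B002", 2)
index.insert("A001", 1)
index.remove("B002")     # a single edit, with no full rebuild of the index
```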

If you have any more questions, just let me know.

Thanks.

Yale

David · March 22, 2016, 8:47am

Elisa,

   Just to let you know we are still working on this…

David

Elisa · March 22, 2016, 8:47am

Any idea how far along this is?
Elisa

ThinkGeo · March 22, 2016, 8:47am

Elisa,

We have made some enhancements for it, and right now we are still testing them.

We will let you know if we get it working.

Thanks
James