
Calling GetAllFeatures on a very large FeatureSource

I am getting an ESRI “General function failure” exception when I call GetAllFeatures() on a geodatabase table with roughly a million features in it (I know this is a huge dataset).

I need all of the features because I have implemented an attribute table and must load the features into it in the order they appear in the geodatabase table. I am not too stressed about the exception, as I am sure that even if it didn’t fail it would take an extreme amount of time to fetch that many features from the gdb.

My initial thought is to adopt lazy loading, where I grab only the first couple thousand features and load more on an as-needed basis. Where I am stuck is finding a ThinkGeo method that lets me get some number of features in the order they appear in the geodatabase.

In looking over the existing ‘GetFeatureBy…’ methods I see lots for spatial query options and a couple for column contents and feature IDs but I am unsure how to translate this into something that is data-agnostic (will work on any large geodatabase).

Ideally, something like GetFeaturesByRows(0, 3000) followed by GetFeaturesByRows(3001, 6000) would suit my needs, but I am open to other ideas or solutions.

Thanks,
Sean Jamieson

Hi Sean, I had a similar issue and solved it by first fetching the features with no columns returned, and then splitting the returned IDs into batches so that the rows can be loaded one batch at a time. Maybe see if that already works?
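
The ID-splitting step described above can be sketched as follows. This is only an illustration, not ThinkGeo code: IdBatching and SplitIntoBatches are hypothetical names, and the sketch assumes the IDs have already been collected in table order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of ThinkGeo): splits an ordered sequence of
// feature IDs into consecutive batches so rows can be fetched a batch at a time.
public static class IdBatching
{
    public static IEnumerable<IReadOnlyList<string>> SplitIntoBatches(
        IEnumerable<string> ids, int batchSize)
    {
        if (ids == null) throw new ArgumentNullException(nameof(ids));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<string>(batchSize);
        foreach (var id in ids)
        {
            batch.Add(id);
            if (batch.Count == batchSize)
            {
                // Hand out the full batch and start a fresh list for the next one.
                yield return batch;
                batch = new List<string>(batchSize);
            }
        }

        // The last batch may be smaller than batchSize.
        if (batch.Count > 0) yield return batch;
    }
}
```

Each batch could then be passed to a call such as GetFeaturesByIds(batch, ReturningColumnsType.AllColumns), so that only the rows currently needed are loaded.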

Hi Julian,

That sounds promising, I will give that a go after lunch here.

Thank you!

Let me know if that works. I assume it would be cleaner to have an internal batch API though, as that would cut down on a lot of function calls.

Hi guys,

Julian’s approach will help but still has room to improve: GetAllFeatures() always fetches geometries, so it will still be slow even when limiting the returned columns to none.

Please use FeatureSource.GetFeatureIds() instead, which returns only the IDs without any geometry data.

However, I found that FileGeoDatabaseFeatureSource didn’t have its own implementation of this method, so it was falling back to the base class, which calls GetAllFeatures() internally. We’ve just fixed it in v15.0.0-beta006, so pull the latest and give it a try.

@Julian_Thoms, again thanks for chiming in and sharing your thoughts!

Thanks,
Ben

Just to follow up: the following FeatureSources already have GetFeatureIdsCore() implemented:

SqlServerFeatureSource, PostgreSqlFeatureSource, SqliteFeatureSource, ShapeFileFeatureSource,
MultipleShapeFileFeatureSource, TabFeatureSource, WkbFileFeatureSource, NauticalChartsFeatureSource, GridFeatureSource, DelimitedFeatureSource, FileGeoDatabaseFeatureSource

The following don’t have it yet, and we plan to add it in a future version:
GdalFeatureSource, CadFeatureSource

Excellent to hear, Ben!

I am partway through implementing a solution that uses GetFeaturesByIds and have not tested it yet, but it seems you have already gotten ahead of my next roadblock (most of our large data is in GDBs).

I will get on the newest beta and give it a go once I have it all implemented on my end.

Thanks again @Ben and @Julian_Thoms, this was a huge help!

You are always welcome!

Hi Ben,

Sorry I didn’t get the chance to crack back into this until recently.

I see the new implementation of FeatureSource.GetFeatureIds(), but unfortunately it doesn’t quite accomplish what I was hoping for. Generally speaking, OBJECTID tends to start at 1 and is sequential, so I can generate the IDs on the fly. What I need is all of the columns, as I will be displaying them in a table. My hope was that I could grab a handful of rows at a time rather than the whole 1 million (for performance).

What I have is this:

if (totalRows > _attributeTableMaxRecordLoadCount)
{
    IEnumerable<string> ids = Enumerable.Range(1, _attributeTableBatchSize).Select(n => n.ToString());
    features = featureLayer.FeatureSource.GetFeaturesByIds(ids, ReturningColumnsType.AllColumns);
}
else
{
    features = featureLayer.FeatureSource.GetAllFeatures(ReturningColumnsType.AllColumns);
}

The idea here is that if the total number of rows is greater than some threshold (in this case 100 000), I batch the features (in this case the first batch of 10 000) and lazy load them using FeatureSource.GetFeaturesByIds(). This is working, but unfortunately performance is still very poor: around 4 minutes for FeatureSource.GetFeaturesByIds() to return a range of 10 000 features.

Digging into it, I can see that GetFeaturesByIds() calls GetFeaturesByIdsCore(), which unfortunately calls GetAllFeaturesCore(), meaning we are still grabbing all features every time.

Maybe there is another way around this. Is there a way to read only some of the rows in the GDB? Or possibly to return all of the columns and all of the rows but without the geometries?

Thanks and sorry again for the delay in getting back to this,
Sean
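
The snippet in the post above only builds the IDs for the first batch. A minimal sketch of generating the IDs for any later batch is shown below. It relies on the same assumption stated in the post, namely that OBJECTID starts at 1 and is sequential; if rows have been deleted the IDs may have gaps and a page could come back short. ObjectIdPaging and GetPageIds are hypothetical names, not ThinkGeo APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper (not part of ThinkGeo): computes the OBJECTID strings for
// one page, ASSUMING IDs run from 1 to totalRows with no gaps.
public static class ObjectIdPaging
{
    public static IReadOnlyList<string> GetPageIds(int pageIndex, int batchSize, int totalRows)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException(nameof(pageIndex));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        // First ID on this page; computed as long to avoid int overflow.
        long first = (long)pageIndex * batchSize + 1;
        if (first > totalRows) return Array.Empty<string>();

        // The last page may hold fewer than batchSize rows.
        int count = (int)Math.Min(batchSize, totalRows - first + 1);

        return Enumerable.Range((int)first, count)
            .Select(n => n.ToString(CultureInfo.InvariantCulture))
            .ToList();
    }
}
```

With a batch size of 10 000, page 0 yields IDs 1 to 10 000 and page 1 yields 10 001 to 20 000; each list could be handed to GetFeaturesByIds() when the user scrolls to that page.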

Hi Sean,

Which FeatureSource are you using? GetFeaturesByIdsCore() calls GetAllFeaturesCore() in the abstract base class, but the following classes override the method with a much more efficient implementation.

SqlServerFeatureSource, PostgreSqlFeatureSource, SqliteFeatureSource, ShapeFileFeatureSource,
MultipleShapeFileFeatureSource, TabFeatureSource, WkbFileFeatureSource, NauticalChartsFeatureSource, GridFeatureSource, DelimitedFeatureSource, FileGeoDatabaseFeatureSource

Let us know which class you are using and we can dig into it. Also, can you cast it to your specific feature source (such as ((ShapeFileFeatureSource)layer.FeatureSource)) and see if that helps?

Thanks,
Ben

Hi Ben,

The feature source is a GDB, and inspecting it in the debugger shows that FeatureSource is a FileGeoDatabaseFeatureSource at runtime. I also changed my code to the following, which produced identical results (~4 min method call):

IEnumerable<string> ids = Enumerable.Range(1, _attributeTableBatchSize).Select(n => n.ToString());
FileGeoDatabaseFeatureSource gdbFeatureSource = featureLayer.FeatureSource as FileGeoDatabaseFeatureSource;
if (gdbFeatureSource != null)
    features = gdbFeatureSource.GetFeaturesByIds(ids, ReturningColumnsType.AllColumns);

I will say that if I go back to using FeatureSource.GetAllFeatures() I do still get the general function failure, so it is clear FeatureSource.GetFeaturesByIds() is doing something substantially different.

I understand that using something like QGIS as a benchmark may be an unfair comparison, but nonetheless I am seeing ~10 second load times for the QGIS attribute table on this particular dataset. I will look into exporting this larger dataset to a shapefile on Monday and test whether there is any substantial performance difference with the ShapeFileFeatureSource class’s implementation of GetFeaturesByIds().

Thanks,
Sean

Hi Sean,

We just added SQL query support for FileGdb. Can you pull the latest beta (v15.0.0-beta049) and try the following code?

var source = new FileGeoDatabaseFeatureSource(
    @".\Data\FileGeoDatabase\zoning.gdb",
    "zoning",
    "OBJECTID");

try
{
    source.Open();

    var table = source.ExecuteQuery("SELECT OBJECTID, zoning, pd FROM zoning WHERE OBJECTID <= 10");
    var count = source.ExecuteScalar("SELECT COUNT(*) AS TotalCount FROM zoning");
    var canExecuteSqlQuery = source.CanExecuteSqlQuery; // now returns true

    Console.WriteLine($"CanExecuteSqlQuery = {canExecuteSqlQuery}");
    Console.WriteLine($"Rows = {table.Rows.Count}");
    Console.WriteLine($"TotalCount = {count}");

    if (table.Rows.Count > 0)
    {
        Console.WriteLine($"{table.Rows[0]["OBJECTID"]}, {table.Rows[0]["zoning"]}, {table.Rows[0]["pd"]}");
    }
}
finally
{
    source.Close();
}

Thanks,
Ben
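
For the batched loading discussed earlier in the thread, the same ExecuteQuery call could be fed one ID range at a time. The sketch below only builds the SQL text for a page. PagedQuery and BuildPageQuery are hypothetical names, and it is an assumption that the FileGDB SQL dialect accepts a two-sided range on OBJECTID in the WHERE clause (the example above only shows a single <= comparison) and that OBJECTID order matches table order.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper (not part of ThinkGeo): builds the SQL text for one page
// of rows, selected by an ID range rather than by row offset.
public static class PagedQuery
{
    public static string BuildPageQuery(
        string tableName, string idColumn, IEnumerable<string> columns,
        long lastSeenId, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        // Table and column names are concatenated directly, so they must come
        // from trusted code, never from user input.
        string columnList = string.Join(", ", columns);
        long upperId = lastSeenId + batchSize;

        // Rows with lastSeenId < id <= lastSeenId + batchSize.
        return string.Format(
            CultureInfo.InvariantCulture,
            "SELECT {0} FROM {1} WHERE {2} > {3} AND {2} <= {4}",
            columnList, tableName, idColumn, lastSeenId, upperId);
    }
}
```

For the first page of the zoning table this yields a query equivalent to the one in the example, and the string could be passed to source.ExecuteQuery(...). After each page, lastSeenId would be set to the largest OBJECTID returned.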