I am still having problems with your explanations.
1. That's correct: with your example it seems to behave OK. But the original address I provided, "1314 University Ave, Sewanee TN 37375", tells a different story. Because of the ThinkGeo data issue outlined in my previous post, the geocoder fails to match anything when the ZIP code is included in the string. If I remove the ZIP code and submit "1314 University Ave, Sewanee TN", the geocoder takes 1028 ms to return, even with the changes you proposed to the geocoder initialization. Here is the sample code I ran:
var results = geoCoder.Match("1314 University Ave, Sewanee TN");
Since I don't have alternative ZIP codes for this address (it is what I received in the source client data), and since ThinkGeo does not understand the address with the correct ZIP code anyway, my only option is to match without the ZIP code. I am trying to geocode about 1.5 million records, of which about 500,000 fail to geocode with the ZIP code, so the fallback for the problematic records would take about 140 hours, which is not acceptable.
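For reference, the 140-hour figure is simple arithmetic on the numbers above. A minimal sketch, assuming the fallback runs one record at a time at the 1028 ms measured for the single test address (both are assumptions; real timings will vary per address):

```python
# Back-of-the-envelope estimate from the figures quoted in this post.
failed_records = 500_000     # records that fail to geocode with the ZIP code
seconds_per_match = 1.028    # 1028 ms measured for one Match call without the ZIP

# Assumption: sequential processing, same cost for every record.
total_seconds = failed_records * seconds_per_match
total_hours = total_seconds / 3600

print(f"{total_hours:.1f} hours")  # prints "142.8 hours"
```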
2. From your explanation, streets.dbf was built incorrectly and the logic behind it seems to be faulty. Basically, you are missing quite a lot of data in your streets.dbf file. Having different ZIP codes on the left and right sides of a street is quite common in the US, since that is how the Postal Service defines ZIP code boundaries. In the example file I supplied in my previous post, which covered only one US county, there were 130 cases where the ZIP codes differed between the left and right sides of the street. In addition, one TLID record can match multiple address ranges. Here is another example of the data problems in your data set:
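Census data:

| TLID | FROMHN | TOHN | SIDE | ZIP | FROMARMID | TOARMID | ARID | MTFCC | FULLNAME | NAME | PRETYPABRV | SUFTYPABRV | PRETYP |
|------|--------|------|------|-----|-----------|---------|------|-------|----------|------|------------|------------|--------|
| 614844025 | 4073 | 4071 | R | 37345 | 0 | 0 | 4002318771642 | D1000 | John Hunter Hwy | John Hunter | | Hwy | |
| 614844025 | 4073 | 4071 | R | 37345 | 0 | 0 | 4002318771642 | D1000 | State Hwy 122 | 122 | State Hwy | | 579 |
| 614844025 | 4074 | 4072 | L | 37345 | 0 | 0 | 4002318771632 | D1000 | John Hunter Hwy | John Hunter | | Hwy | |
| 614844025 | 4074 | 4072 | L | 37345 | 0 | 0 | 4002318771632 | D1000 | State Hwy 122 | 122 | State Hwy | | 579 |
| 614844025 | 4142 | 4100 | L | 37328 | 0 | 0 | 4002318771639 | D1000 | John Hunter Hwy | John Hunter | | Hwy | |
| 614844025 | 4142 | 4100 | L | 37328 | 0 | 0 | 4002318771639 | D1000 | State Hwy 122 | 122 | State Hwy | | 579 |
| 614844025 | 4143 | 4101 | R | 37328 | 0 | 0 | 4002318771626 | D1000 | John Hunter Hwy | John Hunter | | Hwy | |
| 614844025 | 4143 | 4101 | R | 37328 | 0 | 0 | 4002318771626 | D1000 | State Hwy 122 | 122 | State Hwy | | 579 |

(PLUS4, PREDIRABRV, PREQUALABR, SUFDIRABRV, SUFQUALABR and PREDIR are empty in every row and left out. Every row also has the value I in one of the FROMTYP/TOTYP fields.)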
All 8 of those records get condensed into just one in your dataset. So in this example you are not only losing multiple ZIP codes, you are also losing the fact that a single TLID translates into multiple street names. Some other things to consider: county names can differ on each side of the street, and so can city names. Think of Lake Cook Road in the Chicago area, for example: that road separates Lake and Cook counties.
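To make the loss concrete, here is a rough sketch that groups the eight Census records above by TLID and counts what a one-record-per-TLID index would have to throw away. The tuple layout and the `summarize` helper are illustrative only; this is not how streets.dbf is actually built, just a way to check the source data.

```python
from collections import defaultdict

# (TLID, FROMHN, TOHN, SIDE, ZIP, FULLNAME) copied from the eight records above.
records = [
    (614844025, 4073, 4071, "R", "37345", "John Hunter Hwy"),
    (614844025, 4073, 4071, "R", "37345", "State Hwy 122"),
    (614844025, 4074, 4072, "L", "37345", "John Hunter Hwy"),
    (614844025, 4074, 4072, "L", "37345", "State Hwy 122"),
    (614844025, 4142, 4100, "L", "37328", "John Hunter Hwy"),
    (614844025, 4142, 4100, "L", "37328", "State Hwy 122"),
    (614844025, 4143, 4101, "R", "37328", "John Hunter Hwy"),
    (614844025, 4143, 4101, "R", "37328", "State Hwy 122"),
]


def summarize(rows):
    """Collect the distinct ZIPs, street names and address ranges per TLID."""
    summary = defaultdict(lambda: {"zips": set(), "names": set(), "ranges": set()})
    for tlid, from_hn, to_hn, side, zip_code, full_name in rows:
        entry = summary[tlid]
        entry["zips"].add(zip_code)
        entry["names"].add(full_name)
        entry["ranges"].add((side, from_hn, to_hn, zip_code))
    return summary


summary = summarize(records)
for tlid, entry in summary.items():
    print(tlid, sorted(entry["zips"]), sorted(entry["names"]), len(entry["ranges"]))
# prints: 614844025 ['37328', '37345'] ['John Hunter Hwy', 'State Hwy 122'] 4
```

So this one TLID carries two ZIP codes, two street names and four distinct address ranges, none of which survive if it is stored as a single record.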
So, to your last question, whether I want you to improve your index data: the answer is a definite YES, because in its current state it's not exactly usable.