HTTP Connection Status

Dennis · September 26, 2022, 2:14pm

MapSuite Team,

I’m using WmsServer V10.6 to receive WmsRasterLayer requests from our local client workstations and pass them on to our WMS provider, which is hosted on AWS. It is a very simple application proxy.

Our system worked fine for years until our provider upgraded their server applications. Now we are having issues because of the high number of simultaneous requests we send them, even though we sent the same volume to their old system.

There was no code change on our applications except to accommodate a new URL.

A major difference between their old and new systems is that the old system did not do load sharing and the new one does. You can see this in the last lines of the netstat output below.

This issue only arises when we have roughly 50 or more outstanding requests to them. What we see on our side is the IIS Worker Process running at nearly 100% CPU, with high memory usage.

I’ve noticed that when CPU and memory usage are high, netstat shows many connections in TIME_WAIT status (I have also seen CLOSE_WAIT). When the system is functioning normally there are rarely any TIME_WAITs. Below are two examples of our netstat output. Some IP addresses have been masked for security purposes.

TIME_WAIT suggests to me that an HTTP connection is being kept open, but why would that be? Does it mean we are waiting for a response from our WMS provider?

Why are ESTABLISHED connections listed? Shouldn’t a connection be closed once the response is received?

Can you offer any hints as to what this problem might be related to?

Thanks,
Dennis Berry

Local Address is our WmsServer. Foreign Address is a client workstation on our LAN, except in the final lines of each listing, where it is our WMS provider on the Internet.

This netstat output was captured while the IIS Worker Process was at 100% CPU:
Proto Local Address Foreign Address State
TCP oo.oo.oo.oo:5049 10.3.1.106:57246 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.1.113:49643 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.1.129:64574 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.1.130:60591 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.1.136:65371 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.1.140:65456 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.1.161:55814 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.1.174:54492 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.1.189:59135 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.1.220:56147 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.3.123:54561 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.3.125:50718 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.3.131:56899 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.3.213:63749 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.3.213:63793 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.3.223:57635 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.5.35:57709 TIME_WAIT
TCP oo.oo.oo.oo:5049 203.3.16.102:54671 TIME_WAIT
TCP oo.oo.oo.oo:5049 203.3.16.104:59833 TIME_WAIT
TCP oo.oo.oo.oo:47908 10.3.3.188:60964 CLOSE_WAIT
TCP oo.oo.oo.oo:47908 10.3.3.188:60965 ESTABLISHED
TCP oo.oo.oo.oo:47908 10.3.5.31:61829 ESTABLISHED

TCP oo.oo.oo.oo:51769 server-xx-xx-xx-xx:https CLOSE_WAIT WMS Provider via AWS
TCP oo.oo.oo.oo:51777 server-xx-xx-xx-xx:https ESTABLISHED WMS Provider via AWS

This netstat output shows what it looks like when everything is functioning normally:
Proto Local Address Foreign Address State
TCP oo.oo.oo.oo:5049 10.3.1.103:50206 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.1.106:54443 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.1.124:65069 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.1.130:51647 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.1.136:58355 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.1.146:57037 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.1.148:56306 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.1.163:61172 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.1.164:55635 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.1.186:57084 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.1.196:55399 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.1.218:50353 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.1.220:63986 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.3.104:50341 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.3.108:57702 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.3.123:63737 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.3.145:51796 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.3.147:56940 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.3.151:52293 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.3.162:63251 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.3.173:55128 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.3.178:65027 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.3.217:58853 ESTABLISHED
TCP oo.oo.oo.oo:5049 10.3.3.221:58095 ESTABLISHED
TCP oo.oo.oo.oo:47908 10.3.3.188:52594 ESTABLISHED

TCP oo.oo.oo.oo:52999 server-xx-xx-xx-xx:https ESTABLISHED WMS Provider via AWS
TCP oo.oo.oo.oo:53021 server-xx-xx-xx-xx:https TIME_WAIT WMS Provider via AWS
TCP oo.oo.oo.oo:53022 server-xx-xx-xx-xx:https ESTABLISHED WMS Provider via AWS
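Snapshots like these are easier to compare if you count the states by local port. Below is a minimal Python sketch. It assumes the netstat output has been saved as text in the column layout shown above: Proto, Local Address, Foreign Address, State, optionally followed by a note. The function name `tally_states` is hypothetical.

```python
from collections import Counter

def tally_states(netstat_text: str) -> Counter:
    """Count (local port, state) pairs in netstat-style text.

    Assumes the layout used in this thread: TCP <local> <foreign> <state> [note].
    Header and blank lines are skipped; any trailing note is ignored.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0] != "TCP":
            continue
        local, state = parts[1], parts[3]
        port = local.rsplit(":", 1)[-1]  # keep only the port after the last colon
        counts[(port, state)] += 1
    return counts

sample = """Proto Local Address Foreign Address State
TCP oo.oo.oo.oo:5049 10.3.1.106:57246 TIME_WAIT
TCP oo.oo.oo.oo:5049 10.3.1.113:49643 TIME_WAIT
TCP oo.oo.oo.oo:47908 10.3.3.188:60964 CLOSE_WAIT
TCP oo.oo.oo.oo:47908 10.3.3.188:60965 ESTABLISHED
"""

for (port, state), n in sorted(tally_states(sample).items()):
    print(port, state, n)
```

Running this periodically during an incident shows whether TIME_WAIT is piling up on the WmsServer port (5049) or on the outbound connections to the provider.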

Kyle_Day · September 26, 2022, 5:46pm

Hey @Dennis,

This is common for servers without load balancing that come under heavy load. What typically happens is that we can connect to the remote machine but then wait a very long time for it to respond. That in turn puts a heavy load on the server making those requests, which is simply sitting there waiting for responses. It’s a common side effect when multiple users load maps at the same time, since with today’s monitor resolutions it isn’t unusual to request 15 or more tiles at a given zoom level.
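Here is a back-of-the-envelope Python sketch to put rough numbers on that. It assumes 256-pixel square tiles and a 1920x1080 map view; neither figure comes from the thread, so adjust both to your setup. The +1 on each axis covers a view that isn't aligned to the tile grid.

```python
import math

def tiles_per_view(width_px: int, height_px: int, tile_px: int = 256) -> int:
    """Worst-case number of tiles needed to cover one map view.

    tile_px=256 is an assumption, not a MapSuite setting taken from the thread.
    A view not aligned to the tile grid can straddle one extra column and
    one extra row, hence the +1 on each axis.
    """
    cols = math.ceil(width_px / tile_px) + 1
    rows = math.ceil(height_px / tile_px) + 1
    return cols * rows

per_view = tiles_per_view(1920, 1080)  # 9 columns x 6 rows
print(per_view)       # 54
print(per_view * 3)   # three users opening a map at the same moment: 162
```

Under these assumptions, a single full-screen view can already exceed the roughly 50 outstanding requests at which the problem appears.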

It’s unfortunate that your provider made these changes. I assume you don’t have access to their servers to find out why they stall after a certain number of requests. It could be that they need to reindex the backend database holding your data, cache more aggressively, re-enable load balancing, or use higher-spec EC2 instances on AWS.

Usually, these requests should time out when nothing comes back. The default for the WmsRasterLayer’s TimeoutInSecond property should be 20 seconds. Are you seeing these requests persist beyond that on the proxy server? You may also be able to cancel a request manually by hooking into the SendingWebRequest event and setting e.Cancel = true if you don’t get a response soon enough. You’d probably also want to make sure that the e.WebRequest.Timeout property is set to what you expect. The only problem with this method is how you handle a cancelled request on the client side. Do you queue up another request and risk overloading the remote anyway, or just tell the client that the tile can’t be retrieved at that time?
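The cancel-if-too-slow idea above relies on MapSuite’s SendingWebRequest event, whose exact usage isn’t reproduced here. The Python sketch below is a generic illustration of the same pattern: wait for an upstream response up to a deadline, then report the tile as unavailable instead of waiting indefinitely. `fetch_or_give_up`, the injected `fetch` callable and the URL are hypothetical.

```python
import concurrent.futures as cf
import time

# Shared worker pool; the size is an arbitrary choice for this sketch.
_pool = cf.ThreadPoolExecutor(max_workers=8)

def fetch_or_give_up(fetch, url, timeout_s):
    """Call fetch(url) but stop waiting after timeout_s seconds.

    Returns fetch's result, or None if the deadline passed. The worker thread
    is not killed, so the real HTTP call should carry its own socket timeout
    to make sure the abandoned request eventually ends.
    """
    future = _pool.submit(fetch, url)
    try:
        return future.result(timeout=timeout_s)
    except cf.TimeoutError:
        future.cancel()  # only has an effect if the call has not started yet
        return None

def slow_fetch(url):
    time.sleep(0.5)  # stands in for an overloaded upstream server
    return b"tile bytes"

print(fetch_or_give_up(slow_fetch, "https://wms.example/wms", timeout_s=0.1))  # None
```

Returning None leaves the open question above to the caller: retry later, or tell the client the tile is unavailable for now.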

Another thing you can do is cache the remote tiles, assuming they do not change often, since that would spare the remote server from having to query and draw each tile on every request. Even if that cache expired after an hour, performance would probably improve considerably.
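WmsServer has its own tile cache, so the Python sketch below only illustrates the time-to-live idea Kyle describes. A tile is fetched from the remote server once and then reused until it is older than the TTL (one hour in the example). The class, its method and the (zoom, column, row) key are hypothetical.

```python
import time

class TileCache:
    """Tiny in-memory tile cache with a fixed time-to-live (TTL).

    Hypothetical illustration: keys are (zoom, column, row) tuples and values
    are encoded image bytes. The clock is injectable so expiry can be tested
    without sleeping.
    """

    def __init__(self, ttl_s, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._items = {}  # key -> (stored_at, data)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        hit = self._items.get(key)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]  # fresh copy: no request to the remote server
        data = fetch(key)  # miss or expired: ask the remote WMS once
        self._items[key] = (now, data)
        return data

calls = []

def fetch_from_remote(key):
    calls.append(key)
    return b"png bytes"

cache = TileCache(ttl_s=3600)
cache.get_or_fetch((12, 100, 200), fetch_from_remote)
cache.get_or_fetch((12, 100, 200), fetch_from_remote)
print(len(calls))  # 1: the second request was served from the cache
```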

Thanks,
Kyle

Dennis · September 27, 2022, 2:17pm

Hi Kyle,

Thanks for your thorough explanation. I have all sorts of timers on both the WmsServer and client, so I am going to review the timer values and how they all interact.

We don’t have access to their server, nor do I know anything about it. One thing that troubles me is that no matter what I do with our applications, our load is our load. Even if we split our workstations among several ASP.NET websites or across more than one server, I don’t know whether they can handle it.

A few more questions if I may:

In this netstat line, where exactly is the TIME_WAIT happening? In the line below, oo.oo.oo.oo is our ‘proxy’ server and 10.3.1.106 is one of the client workstations. Which machine is doing the waiting, and on which end of the connection?
TCP oo.oo.oo.oo:5049 10.3.1.106:57246 TIME_WAIT

In this netstat line, does ESTABLISHED signify that there is active communication between our ‘proxy’ server and our client workstation? The number of ESTABLISHED connections is always varying, as you’d expect. I’ve noticed, though, that this state persists even after no more requests are coming in from the client. With no activity, they can remain for minutes.
TCP oo.oo.oo.oo:5049 10.3.1.103:50206 ESTABLISHED

The value for TimeoutInSecond is set to 10 seconds, but I never see this timeout expire in our logs.

I’m not setting e.WebRequest.Timeout, what is the default value?

The client has timers: if there is no response from the server, it waits 90 seconds before making another request, and keeps retrying every 90 seconds until it gets a response. The 90-second check is based on a successful GetCapabilities response from our ‘proxy’ server, made the first time the application runs or after a previous error. If there is no response from the server, we fall back to an older MrSid image stored locally on each workstation, then try again after 90 seconds.
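The client behaviour described above amounts to a small state machine. The Python sketch below is a hypothetical model of it, not Dennis’s code. `server_ok` stands in for whatever request the client actually makes, such as the GetCapabilities round trip. The class and method names are made up.

```python
RETRY_INTERVAL_S = 90  # from the thread: wait 90 s before contacting the server again

class ImagerySource:
    """Decides whether to use the WMS proxy or the local fallback image.

    Hypothetical model of the client behaviour described above: after a failed
    request the client shows the local image and does not contact the server
    again until RETRY_INTERVAL_S seconds have passed.
    """

    def __init__(self):
        self.next_attempt_at = 0.0  # contact the server immediately at start-up

    def choose(self, now, server_ok):
        """Return "wms" or "local". server_ok() is only called when an attempt is due."""
        if now < self.next_attempt_at:
            return "local"
        if server_ok():  # e.g. a GetCapabilities round trip succeeded
            return "wms"
        self.next_attempt_at = now + RETRY_INTERVAL_S
        return "local"

src = ImagerySource()
print(src.choose(0, lambda: False))   # local (failure, next attempt at t=90)
print(src.choose(45, lambda: True))   # local (still waiting, server not contacted)
print(src.choose(90, lambda: True))   # wms
```

This model makes one interaction visible. If a stalled request on the proxy takes longer to fail than the client’s 90-second window, the client may already be on the local image before the proxy gives up.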

Our ‘proxy’ server makes use of the MapSuite Tile Cache. In the past the tiles were aged out after 12 hours, but with the new system I set it to 25 hours.

Much appreciate your help.
Dennis

Kyle_Day · September 27, 2022, 5:15pm

Hey @Dennis,

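I’m not much of an expert on networking, but TIME_WAIT means that the local side (the proxy) was the one that closed the connection to the remote (the client). By that point the closing handshake has already finished. The proxy simply holds on to that connection’s address and port pair for a fixed period, from tens of seconds to a few minutes depending on the operating system. This ensures that any delayed packets from the old connection are discarded rather than mistaken for part of a new one. Each entry costs very little and clears on its own when the timer expires. Large numbers of them mainly indicate that many short-lived connections are being opened and closed.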
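The teardown sequence can be written down as a small state table. This Python sketch simplifies the TCP state machine from RFC 793. It leaves out simultaneous close and the case where the peer’s ACK and FIN arrive together, and it spells states the way netstat prints them. It shows that only the side that closes first reaches TIME_WAIT. By contrast, CLOSE_WAIT (seen in the first snapshot above) means the peer has closed and the local application has not yet closed its socket.

```python
# Simplified TCP teardown transitions (after the RFC 793 state diagram).
# Keys are (current_state, event); values are the next state.
TEARDOWN = {
    ("ESTABLISHED", "app_close"): "FIN_WAIT_1",  # we send FIN first (active close)
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",     # we ACK the peer's FIN
    ("TIME_WAIT", "timer_2msl"): "CLOSED",       # wait 2*MSL, then free the socket
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",   # peer closed first (passive close)
    ("CLOSE_WAIT", "app_close"): "LAST_ACK",     # our application finally closes
    ("LAST_ACK", "recv_ack"): "CLOSED",
}

def run(events, state="ESTABLISHED"):
    """Apply events in order and return the list of states visited."""
    path = [state]
    for ev in events:
        state = TEARDOWN[(state, ev)]
        path.append(state)
    return path

print(run(["app_close", "recv_ack", "recv_fin"]))  # active closer ends in TIME_WAIT
print(run(["recv_fin"]))                           # passive closer, app not closed yet
```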

It’s normal for ESTABLISHED connections to behave that way. HTTP keep-alive holds TCP connections open for a while so the client doesn’t have to reconnect and repeat the handshake for every request. Depending on your client, you may have multiple established connections with varying keep-alive times.
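Here is a self-contained Python sketch showing why idle ESTABLISHED entries are normal. It uses only the standard library and a loopback server, with nothing MapSuite-specific. It demonstrates HTTP/1.1 keep-alive: three requests travel over a single TCP connection, so the server sees only one client port. The `/tile` path is a placeholder.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen_client_ports = []

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections open by default

    def do_GET(self):
        seen_client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # lets the client reuse the socket
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
for _ in range(3):
    conn.request("GET", "/tile")
    resp = conn.getresponse()
    resp.read()  # the body must be fully read before the connection can be reused
conn.close()
server.shutdown()
server.server_close()

# One TCP connection carried all three requests, so only one client port appears.
print(len(seen_client_ports), len(set(seen_client_ports)))
```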

The default Timeout value for HttpWebRequest is 100 seconds (the property is in milliseconds, so 100,000). You can adjust it in the SendingWebRequest event handler I mentioned previously. Lowering it might keep the proxy from stalling on the remote WMS and, in turn, help your client retry after 90 seconds.

Something to think about with your proxy servers though: consider consolidating the cache into one shared location, if possible. That way you aren’t caching the same tile on multiple servers, which would also reduce the load on the remote server a bit.

You may also want to consider letting the clients invalidate the cache by tacking on rebuild=True to the proxy’s query string. That way the client doesn’t have to wait 25 hours in order to get fresh tiles.
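On the client side, forcing a fresh tile would then mean adding that parameter to the request URL. Below is a Python sketch using only the standard library. The host name and layer are placeholders. Whether the proxy actually honors rebuild=True depends on how the WmsServer is set up, as Kyle describes it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_rebuild(url: str) -> str:
    """Return url with rebuild=True added (replacing any existing rebuild value).

    urlencode percent-encodes reserved characters such as commas in BBOX;
    servers decode these, so the request is equivalent.
    """
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() != "rebuild"]
    query.append(("rebuild", "True"))
    return urlunsplit(parts._replace(query=urlencode(query)))

print(with_rebuild("http://proxy.example/wms?SERVICE=WMS&REQUEST=GetMap&LAYERS=imagery"))
```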

I do wonder a bit about your remote WMS, though. From what I understand, the data you receive from them is constantly being updated and is a raster MrSid file on the backend. Maybe you could have a service that regularly refreshes the proxy servers’ cache by looping through the most frequently visited areas, however you choose to track those. I’m just trying to think of some way to avoid overloading their server with your request load.
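One simple way to decide which areas count as "frequently visited" is to count requests per tile on the proxy and refresh the top entries during quiet hours. Below is a Python sketch. `HitTracker`, `refresh_hot_tiles` and the (zoom, column, row) key are hypothetical. `refresh` stands in for whatever re-requests a tile from the provider and stores it in the cache.

```python
from collections import Counter

class HitTracker:
    """Counts tile requests so a background job can refresh the busiest ones."""

    def __init__(self):
        self.hits = Counter()

    def record(self, zoom, col, row):
        self.hits[(zoom, col, row)] += 1

    def hottest(self, n):
        """The n most-requested tiles, most popular first."""
        return [tile for tile, _ in self.hits.most_common(n)]

def refresh_hot_tiles(tracker, refresh, n=100):
    """Re-fetch the top-n tiles, e.g. from a job scheduled for off-peak hours."""
    for tile in tracker.hottest(n):
        refresh(tile)

tracker = HitTracker()
for tile in [(10, 1, 1), (10, 1, 1), (10, 2, 2)]:
    tracker.record(*tile)
refresh_hot_tiles(tracker, refresh=print, n=1)  # prints (10, 1, 1)
```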

Or maybe they no longer have a load balancer but still have multiple servers? In that case, you could add multiple URIs to the WmsRasterLayer to round-robin the requests. I’m not sure, but it may be worth asking them what solutions they have, because I’m sure they don’t want their servers pegged for all their customers either.

Thanks,
Kyle

Dennis · September 27, 2022, 7:46pm

Kyle,

Thanks again for your insight. I’m not much of a network person either, and even less of an HTTP person, so this is a learning experience for me.

A few days ago, we changed the value of TcpTimedWaitDelay in the Registry to 30 seconds and that seemed to help. The TIME_WAIT states now appear only when the IIS Worker Process goes to 100% CPU.
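Little’s law gives a rough estimate of why the registry change helps. The number of sockets sitting in TIME_WAIT is about the rate at which connections close, multiplied by how long each one stays in that state. The Python sketch below assumes 50 closes per second, which is a made-up figure. It compares 240 seconds, the value commonly cited as the Windows default for TcpTimedWaitDelay (check it for your Windows version), against the 30 seconds used here. Each outgoing connection, such as the proxy’s connections to the provider, uses a local ephemeral port. A TIME_WAIT entry keeps that port tied up, so the count is also compared against the default Windows dynamic port range.

```python
def sockets_in_time_wait(closes_per_second: float, time_wait_s: float) -> float:
    """Little's law: average count in a state = arrival rate x time spent in it."""
    return closes_per_second * time_wait_s

# Default Windows dynamic (ephemeral) port range is 49152-65535.
DYNAMIC_PORTS = 65535 - 49152 + 1  # 16384

CLOSE_RATE = 50  # assumed connections closed per second; not measured in the thread

for delay_s in (240, 30):  # 240 s: commonly cited default (assumption); 30 s: value set in the thread
    n = sockets_in_time_wait(CLOSE_RATE, delay_s)
    print(f"{delay_s:>3} s -> about {n:.0f} sockets in TIME_WAIT "
          f"({n / DYNAMIC_PORTS:.0%} of the dynamic port range)")
```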

I’ll definitely add code to change the HttpWebRequest timeout value. I keep all of these values in Web.config so they’re easy to change while I look for the best settings.

Does MapSuite set the Linger time for the TCP connection?

Our cache of the imagery is done via WmsServer, with the tiles stored on the same server.

Our WMS Provider provides new imagery every few months so setting retention to 25 hours is not really an issue.

I’m not sure what format the provider’s images are in. In the URL I request JPEG format. We only use MrSid when clients cannot reach the WMS, and that is all processed on each client (no server required).

How do you suggest adding multiple URI to the WmsRasterLayer? I can see only one property for URI. Or is this something that I would have to develop?

Thanks,
Dennis

Kyle_Day · September 28, 2022, 3:52pm

Hey @Dennis,

I’ve done plenty of web service work in the past, but it never went deep into the nitty-gritty of managing the underlying TCP connections. Good to hear that changing TcpTimedWaitDelay helps a bit.

Does MapSuite set the Linger time for the TCP connection?

No, we don’t manage anything at that low a level. Whatever the standard WebRequest classes do by default usually works best.
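For context, "linger" is a per-socket option (SO_LINGER) that controls what close() does with unsent data. The Python sketch below is not MapSuite code. It sets the option and reads it back on a POSIX system. The "ii" packing matches the Linux/macOS struct linger, while the Winsock struct uses two unsigned 16-bit fields, so the packing would differ on Windows. A linger time of 0 makes close() abort the connection with a reset and skip TIME_WAIT. That is generally discouraged because it can discard data still in flight.

```python
import socket
import struct

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# struct linger { int l_onoff; int l_linger; } on Linux and macOS (assumed platform).
# l_onoff=1, l_linger=5: close() may block up to 5 s while unsent data drains.
s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 5))
onoff, seconds = struct.unpack("ii", s.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER, 8))
print(onoff, seconds)
s.close()
```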

Our cache of the imagery is done via WmsServer, with the tiles stored on the same server.

Ok, good. I just wanted to make sure, because you implied that you had several proxy servers set up at one point. Multiple proxy servers with individual caches would just exacerbate the issue.

Our WMS Provider provides new imagery every few months so setting retention to 25 hours is not really an issue.

If that’s the case, you may get away with setting the cache retention higher, like 3 days or even a week. I was under the impression that the data was updated daily.

How do you suggest adding multiple URI to the WmsRasterLayer? I can see only one property for URI. Or is this something that I would have to develop?

That’s my bad. I got WmsRasterLayer mixed up with WmtsLayer. WMTS supports the idea of multiple URIs, but WMS really doesn’t. You could make a custom WmsRasterLayer to bounce between the different servers, but I wouldn’t recommend it. It’d be better if your provider just enabled load balancing across their different servers, assuming they have multiple servers.
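For reference, the round-robin idea behind WMTS-style multiple URIs is simply to hand out equivalent endpoints in rotation. Below is a Python sketch. It assumes the provider exposes several interchangeable endpoints, and the URLs are placeholders. As noted above, load balancing on the provider’s side is the better fix.

```python
import itertools
import threading

class RoundRobin:
    """Hands out base URLs in rotation; safe to call from several threads."""

    def __init__(self, urls):
        if not urls:
            raise ValueError("need at least one URL")
        self._cycle = itertools.cycle(list(urls))
        self._lock = threading.Lock()  # guard next() when request threads share one instance

    def next_url(self):
        with self._lock:
            return next(self._cycle)

# Placeholder endpoints; assumes each serves identical WMS content.
rr = RoundRobin(["https://wms-a.example/wms", "https://wms-b.example/wms"])
print([rr.next_url() for _ in range(3)])
```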

Thanks,
Kyle