UNC Shares and "java.io.ioexception: insufficient system resources exist to complete the request"
I recently had cause to investigate a production problem related to the error message:
java.io.ioexception: insufficient system resources exist to complete the request
and noticed during my investigation that there is quite a bit of cruft floating around this issue. In particular, if you enter the error message into your favourite generic internet search engine, you will most likely hit a bug logged with Sun in 2003 that offers nothing whatsoever in the way of help.
Before I go any further, let me fill in some of the background.
In our production setup we had a 64-bit Windows Server 2008 box running a long-lived Java-based daemon/service. This process is the man in the middle for an SFTP and bulk load process; all day it does something like this:
- Poll UNC share advertised by Samba on a Solaris box for files uploaded via SFTP.
- Determine that those files are no longer being written to, and perform some other validation that is unimportant to this explanation.
- Copy files to the database server (another similar box).
- Send message to database server that files have arrived and bulk upload into database can begin.
Of course, all of this had been tested in the various stages of our production release cycle and passed with flying colors. In production, it was now failing to write to the database server.
In light of that, if we now add UNC to our search query we get this Sun bug ticket, which is a bit more enlightening:
I have seen similar when a java application on Windows tries a "write" over a UNC path.
When this is just a "write" there is an application limit of 64MB, "writes" cannot use Windows cache manager if the destination is a remote file systems. If you change this to a "read/write" it will work. The restriction of 64MB is not configurable and is set to prevent Applications starting by issuing a lot of large IO. This issue only happens for remote file systems, local systems will work. For local cases, MS can cache the data because they can enforce the “write-only” logic in the OS. They don’t allow a “write only” handle to read, so its OK if the cache manager caches the write. They can’t do this for remote files because the server can’t trust the client to enforce that logic.
Hmm. This seemed more promising to me, but the 64MB file size just didn’t gel with what we were seeing: the files it was failing to copy were well below 64MB. So even if the code had been doing terrible, nasty, heap-abusing things, failing because the files were over 64MB wasn’t one of them in this case.
After much digging about (this Microsoft support article about failing backups turned out to be the key to understanding this) we determined that the following was happening:
- java.nio was being used in the application code, and it wasn’t using any sort of chunking or buffering - it was just passing a whacking great multi-megabyte array of bytes to Windows.
- When a file is copied to a UNC share in this fashion (this doesn’t occur for local copies), the client machine will allocate, from the paged kernel memory, a region approximately the size of (and never less than) the file to be copied. You can see your current paged kernel memory value in the Windows task manager.
- The size of the paged kernel memory will vary depending on a number of factors, including but not limited to the uptime of the machine, the amount of RAM in the machine, and the usage and I/O profile of the machine over its uptime - I’m no Windows internals guru but let’s just say the thing can change ;)
What we discovered is that if the current size of the paged kernel memory, plus the size of the buffer that Windows is attempting to allocate out of it (which is itself derived from the size of the file you’re trying to copy), exceeds 160MB, then the "insufficient system resources" exception will be raised.
It turned out for us that all of our internal and pre-production release machines had a suitably small paged kernel memory size and as such didn’t pop this seemingly arbitrary 160MB limit. Our production machines, much longer lived and more heavily trafficked, did. The file in question was 47MB in size and our test machines had 22MB of paged kernel memory, whereas our production machines were at 120MB. 47 + 22 = 69MB was a flying pass in the tests; 47 + 120 = 167MB popped the limit by 7MB, and it was game over for files of that size or above.
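To make the arithmetic concrete, here is a minimal sketch of the check as we understood it. The class and method names are hypothetical, and the 160MB figure is simply the ceiling we observed, not a constant I can point to in any Windows documentation.

```java
final class PagedPoolCheck {
    // Ceiling observed during this investigation (an empirical figure, not a documented constant).
    static final int LIMIT_MB = 160;

    // True when the current paged kernel memory plus the buffer needed
    // for the file would go past the observed ceiling.
    static boolean exceedsLimit(int pagedKernelMemoryMb, int fileSizeMb) {
        return pagedKernelMemoryMb + fileSizeMb > LIMIT_MB;
    }

    public static void main(String[] args) {
        System.out.println(exceedsLimit(22, 47));  // test machines: 69MB, prints false
        System.out.println(exceedsLimit(120, 47)); // production: 167MB, prints true
    }
}
```

By the same sum, a production box sitting at 120MB would have had room for files of up to roughly 40MB.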
There is a registry key that can be edited to raise the 160MB upper limit for the paged kernel memory and therefore allow drivers to allocate more memory for buffers, but that isn’t really the way to fix problems such as these. In my case the solution was to explain all of the above to the developers and have them change their java.nio code to copy the file in suitably small "chunks".
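For illustration, a chunked copy with java.nio might look something like the sketch below. This is not the developers' actual code: the class name, method name and 1MB chunk size are my own assumptions, chosen only to show the shape of the fix. The point is that each write handed to Windows is bounded by the chunk size rather than by the size of the file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class ChunkedCopy {
    // Illustrative chunk size (an assumption, not a value from the incident).
    static final int CHUNK_BYTES = 1024 * 1024;

    // Copies source to target through a reusable buffer of chunkBytes,
    // so no single write is larger than one chunk. Returns bytes written.
    static long copy(Path source, Path target, int chunkBytes) throws IOException {
        if (chunkBytes <= 0) {
            throw new IllegalArgumentException("chunkBytes must be positive");
        }
        long total = 0;
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.TRUNCATE_EXISTING,
                     StandardOpenOption.WRITE)) {
            ByteBuffer buffer = ByteBuffer.allocate(chunkBytes);
            while (in.read(buffer) != -1) {
                buffer.flip();                 // switch the buffer from filling to draining
                while (buffer.hasRemaining()) {
                    total += out.write(buffer); // a write may be partial, so loop until drained
                }
                buffer.clear();                // ready for the next chunk
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ChunkedCopy <source> <target>, where target may be a UNC path.
        copy(Paths.get(args[0]), Paths.get(args[1]), CHUNK_BYTES);
    }
}
```

What counts as "suitably small" depends on how much headroom your machines have; I would treat the chunk size as something to tune rather than a magic number.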

