A potential memory leak – Metaspace leak

For the last few weeks, I have been benchmarking a product that was developed on JDK 1.8u65 and deployed on WebLogic 12c.

I started with initial load tests just to see how the application behaves, observing DB and JVM performance along the way. A few issues showed up during the load tests.

  1. Inserts into the DB were happening one row at a time.
  2. A complete refresh was happening every 30 seconds.
  3. Full GCs were observed.

The issue with point 1 is too many network round trips to the database; in addition, log file sync was the top wait event. I asked the dev team to change the code so that records are committed to the database in batches, thereby reducing the number of network round trips.
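
A minimal sketch of what batched inserts could look like with plain JDBC. The table name, column names, and the `insertInBatches` helper are hypothetical (the actual application code is not shown in this post); only `addBatch`, `executeBatch`, and `commit` are standard JDBC calls.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsertSketch {

    // Hypothetical table and columns, used only for illustration.
    static final String INSERT_SQL =
            "INSERT INTO load_test_events (id, payload) VALUES (?, ?)";

    /**
     * Inserts all rows using JDBC batching and commits once per batch
     * instead of once per row. Returns the number of batches sent.
     */
    static int insertInBatches(Connection conn, List<Object[]> rows, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // otherwise every statement commits on its own
        int batchesSent = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (Object[] row : rows) {
                ps.setObject(1, row[0]);
                ps.setObject(2, row[1]);
                ps.addBatch();
                pending++;
                if (pending == batchSize) {
                    ps.executeBatch(); // one submission for the whole batch
                    conn.commit();     // one commit per batch, not per row
                    batchesSent++;
                    pending = 0;
                }
            }
            if (pending > 0) {         // flush the last, partially filled batch
                ps.executeBatch();
                conn.commit();
                batchesSent++;
            }
        } catch (SQLException e) {
            conn.rollback();           // discard the uncommitted batch
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return batchesSent;
    }
}
```

With 7 rows and a batch size of 3, this sends 3 batches and issues 3 commits instead of 7. The right batch size depends on the row size and the driver, so it is worth measuring under load rather than guessing.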

The issue with point 2 is that every MV refresh processes all the records in the database. We don't see much impact while the database is only a few MB in size. The real problem would arise when it grows to a few GB or TB. I asked the dev team to implement incremental refresh: a complete refresh happens only the first time, and from then on only the delta records are refreshed.

In the meantime, I started analyzing the GC logs to find the reason for the Full GCs. At first glance, the Full GCs seemed okay. On further analysis, though, the metaspace was also getting resized (space getting reclaimed) during the Full GCs, which is not normal. We know that the young and old regions get resized (space getting reclaimed) during minor and major GCs. But why was metaspace getting resized, and that too in a steady state?
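
Metaspace holds class metadata, so one way to cross-check what the GC logs show is to watch metaspace usage next to the loaded and unloaded class counters from inside the JVM. The sketch below is only an illustration, not what I ran during the tests; it assumes a HotSpot JVM, where the memory pool is named `Metaspace`, and the class and method names are made up.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class MetaspaceWatchSketch {

    /** Returns the pool named "Metaspace" (HotSpot, JDK 8+), or null if absent. */
    static MemoryPoolMXBean findMetaspacePool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool;
            }
        }
        return null;
    }

    /** One line with metaspace usage and the class loading counters. */
    static String snapshot() {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        MemoryPoolMXBean pool = findMetaspacePool();
        MemoryUsage usage = (pool == null) ? null : pool.getUsage();
        String used = (usage == null) ? "n/a" : (usage.getUsed() / 1024) + "K";
        String committed = (usage == null) ? "n/a" : (usage.getCommitted() / 1024) + "K";
        return String.format(
                "metaspaceUsed=%s metaspaceCommitted=%s loaded=%d totalLoaded=%d unloaded=%d",
                used, committed,
                classes.getLoadedClassCount(),
                classes.getTotalLoadedClassCount(),
                classes.getUnloadedClassCount());
    }

    public static void main(String[] args) throws InterruptedException {
        // Number of samples to print; defaults to 3, one per second.
        int samples = (args.length > 0) ? Integer.parseInt(args[0]) : 3;
        for (int i = 0; i < samples; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000);
        }
    }
}
```

In a steady state the `totalLoaded` and `unloaded` counters would be expected to stay roughly flat. If both keep climbing while the load is constant, classes are being created and thrown away repeatedly, which would fit metaspace being reclaimed at every Full GC.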

There might be two reasons for this.

  1. The load testing process (the way the requests are sent). This can be ruled out, as I do the initialization once and then the main transactions run.
  2. New instances are getting created again and again. This is the only option left to explore. At this point, I cannot involve the developers, as they would ask me which instances those are.

As a next step, I took three complete heap dumps at different stages of heap occupancy (immediately after a Full GC, when the heap was half full, and when the heap was about to be Full GC'ed).
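
For reference, a heap dump can also be triggered from code through the HotSpot diagnostic MXBean. This is a sketch under assumptions, not the way the dumps above were taken: it dumps the JVM it runs in, so it would have to be called from inside the application server, and the `dumpHeap` helper and file naming are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

final class HeapDumpSketch {

    /**
     * Writes a heap dump of the current JVM to directory/heap-LABEL.hprof.
     * liveOnly=false keeps unreachable objects too, which is closer to a
     * "complete" dump of the heap as it is at that moment.
     * The underlying call throws IOException if the file already exists.
     */
    static Path dumpHeap(Path directory, String label, boolean liveOnly)
            throws IOException {
        Path target = directory.resolve("heap-" + label + ".hprof");
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveOnly);
        return target;
    }
}
```

Labels such as `after-full-gc`, `half-full`, and `before-full-gc` keep the three dumps easy to tell apart when loading them into an analyzer later.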

I analyzed all the heap dumps and found that instances of one particular class were responsible. I could see the number of instances of this class increasing across all three dumps.
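
The same comparison can be roughly automated with class histograms instead of full dumps. The sketch below assumes text in the `jmap -histo` layout (rank, instance count, bytes, class name) and reports the classes whose instance count rises in every snapshot; the class and method names are hypothetical, and this is not the tool I used for the analysis.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HistogramGrowthSketch {

    // Matches rows such as "   1:   12345   6789012  com.example.Foo".
    private static final Pattern ROW =
            Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+).*$");

    /** Parses histogram text into a map of class name to instance count. */
    static Map<String, Long> parseInstanceCounts(String histogramText) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : histogramText.split("\\R")) {
            Matcher m = ROW.matcher(line);
            if (m.matches()) { // header and separator lines do not match
                counts.put(m.group(3), Long.parseLong(m.group(1)));
            }
        }
        return counts;
    }

    /** Classes whose instance count strictly increases in every snapshot. */
    static List<String> steadilyGrowing(List<Map<String, Long>> snapshots) {
        List<String> result = new ArrayList<>();
        if (snapshots.size() < 2) {
            return result;
        }
        for (String cls : snapshots.get(0).keySet()) {
            boolean growing = true;
            for (int i = 1; i < snapshots.size() && growing; i++) {
                Long prev = snapshots.get(i - 1).get(cls);
                Long cur = snapshots.get(i).get(cls);
                growing = cur != null && cur > prev;
            }
            if (growing) {
                result.add(cls);
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

Short-lived objects also pile up between two collections, so the result is only a list of candidates. A class that keeps growing even in snapshots taken right after a Full GC is the stronger suspect.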

Now that I had the required data, I checked with the developer whether this particular class was being used in the code. He confirmed that his part of the code was creating the instances.

Voilà, got the fix. I repeated the same process, and this time I could see only a few instances of that class. The leak is fixed.

Hope it helps.
