Classic Java OutOfMemoryError

Few months back, one of the performance testing team in the hospitality industry encountered OutOfMemoryError during the endurance runs at the end of 2 hours. They repeated the endurance run again, but the end result was same.  Without a second thought, the performance testing team increased the heap to 8 GB from the existing 4 GB. This time the OutOfMemoryError had occurred at the end of 4 hours.

About the application: The AUT (Application under test) was very simple, users login into the system and does 3000 iterations of invoking a static page and then logs out.

Requirements: At any given point of time 8000 users have to be logged into the system. There are around 100K users available in the system

Analysis: Ideally there should not be any problem with such simple applications. Knowing this team for a while, asked them few questions:

  • What does the application perform?
  • How the users login and what does the user do?
  • Will the same users logins into the system or a new user logins every time?

Based on the responses provided by the team, came to a conclusion that the session is not getting in-validated when the user logs out.

Looked at the Session Keep-Alive time and found that this value has been configured to 24 hours.

After reducing the Session Keep-Alive time to 30 minutes, the OutOfMemoryError has been resolved even without looking at the heapdump (lots of time has been saved!!).

Conclusion: As a performance tester, very first thing is understand the application behavior.  Never change the values without fully understanding the impact of the change.


A potential memory leak – Metaspace leak

For the last few weeks, I have been working on Benchmarking a Product that was devloped using JDK 1.8u65 and deployed on Weblogic 12C.

Started the initial load tests  just to see how the application is behaving and during that process was observing DB and JVM performance. Found few issues during the load testing process.

  1. Inserts in the DB was happening 1 at a time
  2. Complete Refresh was happening every 30 seconds.
  3. Full GC’s were observed

The issue with point 1 is, too many network round trips to the database and also the log file sync wait event is the top wait event. Informed the dev team to make changes in the code so that the records get commited to the database in batches, thereby reducing the no. of network round trips.

The issue with point 2 is, MV refresh happens for the  all the records in the database every time the refresh happens. We dont see much impact when the number of records in the database size is in few MBs. The real problem would arise when the database grows to few GB/TBs. Asked the Dev team to implement incremental refresh.  Only for the first time complete refresh happens there onwards refresh happens for the delta records.

In the meantime, started analyzing  the GC logs and the reason for the Full GCs. At first glance, full gcs seems to be okay. On further analyzing the full gcs, the metaspace was also getting resized (space getting reclaimed) which is not normal. We know that young and old region gets resized (space getting reclaimed) during minor and major gcs. But why is metaspace is getting resized that too in a steady state.

There might be two reasons for this.

  1. The load testing process (the way the requests are sent). This can be ruled out as I do the inilization once and then the main transactions run.
  2. New instances are getting created again and again. This is the only option left to be explored. At this point, I cannot involve the developers as they would ask me what are those instances?

As a next step, took 3 complete heap dumps when the heap occupancy was at different stages (immediately after the full gc, when heap is half full and when the heap is about get full gc’ed).

Analyzed all the heap dumps, and found out a particular class instance was responsible for this. I could see the  no. of instance of this particular class increasing in all those three dumps.

Now, I had the required data, checked with the developer if the particular class instance is being used in the code, he confirmed that his part of the code is creating the class instances.

Oooola.. got the fix.  Again repeated the same process. This time I could see only a few instances of those class. The leak is fixed.

Hope it helps.