Setting Timeout in Elastic Load Balancer

AWS Elastic Load Balancer has a default idle timeout of 60 seconds. If there is no activity for 60 seconds, the connection is torn down and an HTTP 504 error is returned to the client. Here are the steps to change the idle timeout value in the AWS Elastic Load Balancer:

  1. Sign in to the AWS Console
  2. Go to the EC2 service
  3. In the left panel, click Load Balancing > Load Balancers
  4. In the top panel, select the load balancer whose idle timeout you want to change
  5. In the bottom panel, under the ‘Attributes’ section, click the ‘Edit idle timeout’ button. The default value is 60 seconds; change it to the value you want (say, 180 seconds)
  6. Click the ‘Save’ button
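
The same change can also be scripted. Below is a minimal sketch using the AWS SDK for Java (v1) classic ELB API; the load balancer name my-elb and the 180-second value are illustrative, and credentials/region are assumed to come from the default provider chain:

```java
import com.amazonaws.services.elasticloadbalancing.AmazonElasticLoadBalancing;
import com.amazonaws.services.elasticloadbalancing.AmazonElasticLoadBalancingClientBuilder;
import com.amazonaws.services.elasticloadbalancing.model.ConnectionSettings;
import com.amazonaws.services.elasticloadbalancing.model.LoadBalancerAttributes;
import com.amazonaws.services.elasticloadbalancing.model.ModifyLoadBalancerAttributesRequest;

public class SetIdleTimeout {
    public static void main(String[] args) {
        AmazonElasticLoadBalancing elb = AmazonElasticLoadBalancingClientBuilder.defaultClient();
        // Raise the idle timeout from the default 60 seconds to 180 seconds
        ModifyLoadBalancerAttributesRequest request = new ModifyLoadBalancerAttributesRequest()
                .withLoadBalancerName("my-elb")            // hypothetical load balancer name
                .withLoadBalancerAttributes(new LoadBalancerAttributes()
                        .withConnectionSettings(new ConnectionSettings().withIdleTimeout(180)));
        elb.modifyLoadBalancerAttributes(request);
    }
}
```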

 


Configuring workloads using JMeter – Defined Percentage Probability

Here is the use case: I have 3 scenarios named A, B and C which are to be load tested with 6, 3 and 1 threads respectively.

These 3 scenarios have 7 use cases (T1 to T7) and are to be executed using defined percentages as shown below:

[screenshot: use-case percentage distribution across scenarios A, B and C]

How do we configure this in JMeter? Had it been just the users for the 3 scenarios, we would have configured them in Thread Groups. But what about T1 to T7?

First create 3 thread groups with the desired number of users as shown below: [screenshot]

Then, under each thread group, add a Throughput Controller (found under Logic Controllers). Configure the percentage and add the request under the Throughput Controller as shown below: [screenshot]

Hope this helps.

Load Testing Oracle/Postgres JDBC Query Through Apache JMeter

The requirement is to load test Oracle and Postgres databases using the JMeter (Apache open source) load testing tool.

I need to work on two scripts: one pointing to the Postgres database and the other pointing to the Oracle database.

  • Postgres Database:
  • Oracle Database:
    • Query using bind variables
    • Executing a procedure with bind variables

Creating the JMeter script is largely the same for both Postgres and Oracle databases; there are a few subtle differences.

  • Add the JDBC driver jar matching your database version to the test plan (download the driver jar from the vendor), as shown below: [screenshot]
  • From the Config Element menu, add a JDBC Connection Configuration as shown below for the Postgres database: [screenshot]
  • In case of an Oracle database, change the Database URL to jdbc:oracle:thin:@//<ip:port>/dbname and the JDBC Driver class to oracle.jdbc.OracleDriver
  • Under Connection Pool Configuration, I have set Max Number of Connections to 0, which means that each JMeter thread will use a single dedicated database connection.
  • Variable Name is the name you choose for the connection pool. You will reference the same name in the JDBC Request.
  • Add a JDBC Request sampler to the thread group: [screenshot]
  • For an Oracle query, use: [screenshot]
  • For an Oracle stored procedure: [screenshot]
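
For reference, the plain-JDBC equivalent of what the JDBC Request sampler executes looks roughly like the sketch below (the connection URL, credentials, table and procedure names are illustrative; for Postgres, swap in a jdbc:postgresql:// URL):

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JdbcQuerySketch {
    public static void main(String[] args) throws SQLException {
        // Oracle-style URL; for Postgres use something like jdbc:postgresql://host:5432/dbname
        String url = "jdbc:oracle:thin:@//dbhost:1521/dbname";   // hypothetical host/service
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            // Query with a bind variable (a "Prepared Select Statement" in the sampler)
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT order_id, status FROM orders WHERE customer_id = ?")) {  // hypothetical table
                ps.setInt(1, 42);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("status"));
                    }
                }
            }
            // Stored procedure with bind variables (a "Callable Statement" in the sampler)
            try (CallableStatement cs = conn.prepareCall("{call update_order_status(?, ?)}")) {  // hypothetical procedure
                cs.setInt(1, 42);
                cs.setString(2, "SHIPPED");
                cs.execute();
            }
        }
    }
}
```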

Application Performance and Antipatterns

  • Excessive Layering – Many underlying performance problems start with the excessive layering antipattern. The application design has grown around the use of controllers, commands and facades, and in order to decouple each layer, the designers add facades at each tier. Now, for every request at the web tier, the call goes through multiple layers just to fetch the results. Imagine doing this for thousands of incoming requests and the load the JVM needs to handle to process them. The objects created and destroyed in making these calls add memory overhead, which further limits the number of requests each server node can handle. Based on the size of the application, the deployment model and the number of users, an appropriate decision needs to be taken to reduce the number of layers. For example, if the entire application is deployed in the same container, there is no need to create multiple layers of process beans, service beans (business beans), data access objects, etc. Similarly, when developing an internet-scale application, a large number of layers starts adding overhead to request processing. Remember, a large number of layers means a large number of classes, which eventually impacts overall application maintainability.
  • Round Tripping – With the advent of ORM mappings and Session/DAO objects, the programmer starts making calls to beans for every piece of data. This leads to excessive calls between the layers. A side issue is the number of method calls each layer must expose to support this model. The worst case is when the beans are web-service based: a client tier making multiple web service calls within a single user request has a direct impact on application performance. To reduce the round tripping, the application needs to combine multiple requests at the business tier (see the sketch after this list).
  • Overstuffed Session – The session object is a feature provided by the JEE container to track a user's session during a web site visit. The application starts with the promise of keeping very minimal information in the session, but over time the session object keeps growing. Too much data, or the wrong kind of data, is stuffed into the session. Large objects placed in the session linger until the session object is destroyed, which limits the number of users an application server node can serve. Further, I have seen applications use session clustering to meet availability requirements, adding significant overhead to network traffic and reducing the application's ability to handle a higher number of users. To unstuff the session object, take an inventory of everything that goes in there, see what is necessary, and see which objects can be moved to request scope. For the rest, remove the objects from the session when their usage is over.
  • Golden Hammer (Everything is a Service) – With the advent of SOA, there is a tendency to expose business services that can be orchestrated into process services. In older applications, one can observe a similar pattern implemented with EJBs. This pattern, coupled with a bottom-up design approach, at times means exposing each and every data entity as a business service. Such a design might work correctly functionally, but from a performance and maintenance point of view it soon becomes a nightmare. Every web service call adds overhead in terms of data serialization and deserialization, and at times the data (XML) passed with web service calls is also huge, leading to performance issues. The use of services or EJBs should be evaluated from the application usage perspective, and attention needs to be paid to the contract design.
  • Chatty Services – Another pattern observed is a service implemented via multiple web service calls, each of which communicates a small piece of data. This results in an explosion of web services, which leads to degraded performance and unmaintainable code. From the deployment perspective, the application also starts running into problems. I have come across projects with a hundred-plus services all crammed into a single deployment unit; when the application comes up, the base heap requirement is already in the 2 GB range, leaving not much space for the application to run. If the application has too many fine-grained services, it is an indication of this antipattern.
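
To make the Round Tripping item concrete, here is a minimal sketch (the OrderService interface and Order type are hypothetical) contrasting N fine-grained calls with one coarse-grained call at the business tier:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types used only to illustrate the antipattern.
record Order(long id, String status) {}

interface OrderService {
    Order findById(long id);                 // fine-grained call
    List<Order> findByIds(List<Long> ids);   // coarse-grained, combined call
}

class RoundTrippingExample {
    // Antipattern: one call across the layers (and typically one SQL) per item.
    static List<Order> loadOneByOne(OrderService service, List<Long> ids) {
        List<Order> orders = new ArrayList<>();
        for (Long id : ids) {
            orders.add(service.findById(id));   // N round trips
        }
        return orders;
    }

    // Better: combine the requests into a single call at the business tier.
    static List<Order> loadInOneCall(OrderService service, List<Long> ids) {
        return service.findByIds(ids);          // one round trip, e.g. one SQL IN (...) query
    }
}
```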

Refer to : https://www.linkedin.com/pulse/application-performance-antipatterns-munish-kumar-gupta

 

Top tips to troubleshoot & monitor JEE Web Application

How do you troubleshoot and monitor a Java/JEE application that has performance and scalability problems? Here are the techniques used on production systems.

  1. Perform a series of JDK thread dumps to locate the following possible problems (see the sketch after this list):
    1. Application bottleneck: Identify application bottlenecks by locating the most common stack trace. Optimize the requests that appear most often in the stack traces.
    2. Bad SQLs: If most threads are waiting on JDBC calls, trace the bad SQLs down to the DB.
    3. Slow DB: If many SQLs are having problems, conduct DB profiling to locate the DB problem.
    4. DB or external system outages: Check whether a lot of threads are waiting to make external connections.
    5. Concurrency issue: Check whether many stack traces are waiting on the same lock in the same code.
    6. Infinite loop: Verify whether threads remain running for minutes at the same part of the source code.
    7. Connectivity problem: An unexpectedly low idle thread count indicates that requests are not reaching the application server.
    8. Thread count misconfiguration: Increase the thread count if CPU utilization is low yet most threads are in the runnable state.
  2. Monitor CPU utilization
    1. High CPU utilization implies design or coding inefficiency. Take a thread dump to locate the bottleneck. If no problems are found, the system may have reached full capacity.
    2. Low CPU utilization with abnormally high response times implies many threads are blocked. Take a thread dump to narrow down the problem.
    3. Monitor process health, including the Java application server.
    4. Monitor whether all web servers, application servers, middle-tier systems and the DB server are running. Configure them as services so they are automatically restarted when a process dies suddenly.
  3. Monitor the Java Heap Utilization
    1. Monitor the amount of Java heap memory that can be reclaimed after a major garbage collection. If the reclaimed amount keeps dropping consistently, the application is leaking memory; perform memory profiling to locate the leak. If no memory is leaking yet major garbage collections are frequent, tune the Java heap accordingly.
  4. Monitor unusual exceptions in the application log and application server log
  5. Monitor and resolve any exceptions detected in the application and server logs. Examine the source code to ensure all resources, in particular DB, file, socket and JMS resources, are properly closed when the application throws an exception.
  6. Monitor memory & paging activities
    1. Growing resident (native) memory implies a memory leak in native code. The source of the leak may include the application's non-Java native code, C code in the JVM, and third-party libraries. Also monitor paging activity closely; frequent paging means memory misconfiguration.
  7. Perform DB profiling
  8. Monitor the following metrics closely
    1. Top SQLs by logical reads, latency and execution counts – Rewrite or tune poorly performing SQLs or DB programming code.
    2. Top DB wait and latch events – Identify bad DB coding or bad DB instance or table configuration.
    3. Amount of hard parses – Identify scalability problems caused by improper DB programming.
    4. Hit ratios for the different buffers and caches – Evidence of bad SQLs or improper buffer size configuration.
    5. File I/O statistics – Evidence of bad SQLs, or disk misconfiguration or layout.
    6. Rollback ratio – Identify improper application logic.
    7. Sorting efficiency – Improper sort buffer configuration.
    8. Undo log or rollback segment performance – Identify DB tuning problems.
    9. Number of SQL statements and transactions per second – A sudden jump reveals bad application coding.
  9. JMS Resources
    1. Monitor the Queue length and resource utilization
    2. Poison messages: Check whether many messages stay unprocessed in the queues for a long time.
    3. JMS queue deadlocks: Check whether no messages can be de-queued and finished.
    4. JMS listener problems: Check whether no messages are processed in a particular queue.
    5. Memory consumption: Ensure queues holding a large number of pending messages can be paged out of physical memory.
    6. JMS retry: Ensure failed messages are not re-processed immediately; otherwise, poison messages may consume most of the CPU.
  10. Monitor file I/O performance
    1. Trend the I/O access and wait times. Re-design or re-configure the disk layout if necessary for better I/O performance, in particular for the DB server.
  11. Monitor resource utilization including file descriptors
    1. Monitor resources closely to identify whether any application code is depleting OS-level resources.
  12. Monitor HTTP access
    1. Monitor the top IP addresses accessing the system. Detect any intruder trying to steal the content and data on the web site. Use the access log to trace any non-200 HTTP responses.
  13. Monitor security access log
    1. Monitor the OS-level security log and the web server log to detect intrusions. They also give hints on how attackers are probing the system.
  14. Monitor network connectivity and TCP status
    1. Run netstat regularly to monitor the TCP socket states.
    2. A high number of sockets in the idle wait state implies TCP misconfiguration.
    3. A high number of connections in SYN or FIN state implies a possible denial-of-service (DoS) attack.
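
As a small illustration of the thread-dump technique in item 1 (normally you would run jstack <pid> or kill -3 against the server process), the standard ThreadMXBean API can capture the same information programmatically; the class name below is illustrative:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {
    public static void main(String[] args) {
        ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
        // lockedMonitors/lockedSynchronizers = true so we can see which locks threads hold or wait on
        for (ThreadInfo info : threadMXBean.dumpAllThreads(true, true)) {
            System.out.printf("%s  state=%s%n", info.getThreadName(), info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
        // Reading the dump: the most common stack trace points to the bottleneck; many threads
        // BLOCKED on the same lock indicate a concurrency issue; many threads waiting inside
        // JDBC or socket calls point to a slow DB or external system.
    }
}
```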

Top J2EE application performance problems

What are the most common performance and scalability problems in a J2EE (Java EE) web application? Here are the most common problems, and tips for addressing them, found in real production systems.

  1. Bad Caching Strategy: It is rare that users require absolutely real-time information. Simply caching HTML content for 60 seconds can already dramatically reduce the load on the application server and, most importantly, on the DB for a high-traffic web site. Cache HTML segments for the home page and the most-visited pages. Implement other caching strategies in the business service layer or the DB layer; for example, use Spring AOP to cache data returned from a business service, or configure Hibernate to cache DB query results.
  2. Missing DB indexes: After a new code push, indexes may be missing for the new SQL code. A data query can be slow if the table is huge and the missing index forces a full table scan. Most development DBs have very small data sets, so the problem goes undetected. Check the DB log or profile in production for long-running SQLs and add indexes if needed.
  3. Bad SQLs: The second most common DB performance problem is bad SQLs. Check the DB log or profile for long-running queries. Most problems can be resolved by rewriting the SQLs; pay attention to sub-queries and SQLs with complicated joins. Occasionally, DB table tuning may be required.
  4. Too many fine-grained calls to the service, data or DB layer: Developers may use an iteration loop to retrieve a list of data. Each iteration may make a middle-tier call, which results in multiple SQL calls. If the list is long, the total number of DB requests can be huge. Developers should write a new service call and retrieve the list in a single DB call.
  5. All application server threads are waiting for the DB or an external system: A web server has a limited number of threads. When an HTTP request is processed, a thread is exclusively dedicated to that request until it completes. Hence, if an external system like the DB is very slow, all web server threads may end up waiting. When this happens, the web server pauses all new incoming requests, and from an end-user perspective the system appears unresponsive. Add timeout logic when communicating with external systems. Increasing the thread count will only delay the problem and is in some cases counterproductive.
  6. SQLs retrieve too many rows of data: Do not retrieve hundreds of rows of data just to display a few of them. Check the DB log or profile regularly for unexpectedly heavy SQLs that retrieve a lot of rows.
  7. Not using prepared statements for the DB: Always use prepared statements to avoid DB-side SQL hard parsing. SQL hard parsing causes serious DB scalability problems as the number of DB requests increases (see the sketch after this list).
  8. Lack of, or improper, pagination of data: Implement pagination to display a long list of data. Do not retrieve all the data from the database and use Java code to filter it; always use the database for data filtering and pagination.
  9. Non-optimized connection pool configuration: The maximum/minimum pool size and the retention policy for idle pool connections can significantly impact application performance. The web server will sit idle waiting for a DB connection if the pool size is too low. The retention policy is important because most DB pool creation code has very low concurrency and cannot handle a sudden surge of concurrent requests.
  10. Frequent garbage collection caused by a memory leak: When memory is leaking, the JVM performs frequent garbage collections (GC) even though they cannot reclaim much memory. Eventually, the web server spends most of its time executing GC rather than processing HTTP requests. Rebooting the server can temporarily relieve the problem, but only fixing the leak solves it.
  11. Processing a large amount of data at once: For requests involving a large amount of data, in particular batch processes, sub-divide the large data set into chunks and process them separately. Otherwise, the request may deplete the Java heap or stack memory and crash the JVM.
  12. Concurrency problems in synchronization blocks: Code synchronization blocks carefully. Use established libraries to manage system and application resources like the DB connection pool. For a system with concurrency problems, CPU utilization remains low even when traffic is increased significantly.
  13. Bad DB tuning: If DB responses are slow regardless of the SQLs, DB instance tuning is needed. Monitor memory paging activity closely to identify any memory misconfiguration. Also monitor file I/O wait times and DB memory usage closely.
  14. Not processing data in batches: To reduce DB requests, combine DB requests and process them in a single batch. Use SQL batching where possible instead of a large volume of small SQL requests (see the sketch after this list).
  15. JMS or application deadlock: Avoid cyclic loops when making JMS requests. A request may be sent to Queue A, which then sends a message to Queue B, which sends again to Queue A. This circular loop will trigger deadlocks under high request volume.
  16. Bad Java heap configuration: Configure the maximum heap size, the minimum heap size, the young generation size and the garbage collection algorithm correctly. Bigger is not always better; the right settings depend on the application.
  17. Bad application server thread configuration: Too high a thread count triggers high context-switching overhead, while too low a thread count causes low concurrency. Tune it according to the application's needs and behavior, and size the connection pool according to the thread count.
  18. Internal bugs in third-party libraries or the application server: If new third-party libraries are added to the application, monitor for concurrency and memory leak issues closely.
  19. Out of file descriptors: If the application does not close file or network resources correctly, in particular within exception handling, it may run out of file descriptors and stop processing new requests.
  20. Infinite loop in the application code: An iteration loop may run into an infinite loop and trigger high CPU utilization. It can be data sensitive and happen only for a small portion of the traffic. If CPU utilization remains high during low-traffic periods, monitor the threads closely.
  21. Wrong firewall configuration: Some firewall configurations limit the number of concurrent connections from a single IP. This can be problematic if a web server connects to the DB server through a firewall. Verify the firewall configuration if the application achieves much higher concurrency when tested within a local network.
  22. Bad TCP tuning: Improper TCP tuning causes an unreasonably high number of sockets waiting to be closed (TIME_WAIT). Newer OS versions are usually tuned correctly for web servers; change the default TCP tuning parameters only if needed. Direct TCP programming may sometimes need special parameters for short but frequent TCP messages.
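
Relating to items 7 and 14, here is a minimal JDBC sketch (table, column and class names are illustrative) showing bind variables, which let the DB parse the SQL once, combined with statement batching, which turns many small requests into one round trip:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PreparedBatchSketch {
    static void insertOrders(Connection conn, long[] customerIds) throws SQLException {
        String sql = "INSERT INTO orders (customer_id, status) VALUES (?, ?)";  // hypothetical table
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (long customerId : customerIds) {
                ps.setLong(1, customerId);   // bind variables keep the SQL text constant,
                ps.setString(2, "NEW");      // so the DB hard-parses it only once
                ps.addBatch();               // queue the row instead of executing immediately
            }
            ps.executeBatch();               // one round trip for the whole batch
        }
    }
}
```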

Types of OOM Error & Java Memory Leak Causes

Types of OOM:

  • java.lang.OutOfMemoryError: Java heap space
  • java.lang.OutOfMemoryError: PermGen space
  • java.lang.OutOfMemoryError: GC overhead limit exceeded
  • java.lang.OutOfMemoryError: unable to create new native thread
  • java.lang.OutOfMemoryError: nativeGetNewTLA
  • java.lang.OutOfMemoryError: Requested array size exceeds VM limit
  • java.lang.OutOfMemoryError: request <size> bytes for <reason>. Out of swap
  • java.lang.OutOfMemoryError: <reason> <stack trace> (Native method)
  • java.lang.OutOfMemoryError: Metaspace

Here are the typical causes of Java memory leaks:

  • Not closing DB, file, socket, JMS and other external resources properly
  • Not closing resources properly when an exception is thrown
  • Adding objects to a cache, HashMap, Hashtable, Vector or ArrayList without ever expiring the old entries (see the sketch after this list)
  • Not implementing hashCode() and equals() correctly for the key used in a cache
  • Session data that is too large
  • A leak in a third-party library or the application server
  • An infinite loop in application code (also a likely cause of high CPU)
  • Leaking memory in native code
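
The unbounded-cache cause above is the one seen most often in practice. A minimal sketch of it (the class name and entry size are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class LeakyCache {
    // Static, so the map lives for the lifetime of the class loader.
    private static final Map<String, byte[]> CACHE = new HashMap<>();

    public static byte[] lookup(String key) {
        // Entries are added but never expired or evicted, so under sustained load the heap
        // keeps growing until java.lang.OutOfMemoryError: Java heap space is thrown.
        return CACHE.computeIfAbsent(key, k -> new byte[1024 * 1024]);
    }
}
```

The fix is to bound the cache and expire old entries, for example with a LinkedHashMap that overrides removeEldestEntry, or a proper caching library.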

Performance Engineering Questions

A few questions that can help in solving performance issues:

Load balance issues

  • What type of load balancing scheme is used? (Round robin, sticky IP, least connections, subnet based?)
  • What is the timeout of LB table?
  • Does it do any connection pooling?
  • Is it doing any content filtering?
  • Is it checking for HTTP response status?
  • Are there application dependencies associated with the LB timeout settings?
  • What failover strategies are employed?
  • What is the connection persistence timeout?
  • What are the timeouts for critical functions?

Firewall issues

  • What is the throughput capacity?
  • What is the connection capacity and rate?
  • What is the DMZ operation?
  • What are the throughput policies from a single IP?
  • What are the connection policies from a single IP?

Firewalls and multiple DMZs

  • Does the firewall do content filtering?
  • Is it sensitive to inbound and/or outbound traffic?
  • What is its upper connection limit?
  • Are there policies associated with maximum connection or throughput per IP address?
  • Are there multiple firewalls in the architecture (multiple DMZs)?
  • If it has multiple DMZs, is it sensitive to data content?

Web server issues

  • How many connections can the server handle?
  • How many open file descriptors or handles is the server configured to handle?
  • How many processes or threads is the server configured to handle?
  • Does it release and renew threads and connections correctly?
  • How large is the server’s listen queue?
  • What is the server’s “page push” capacity?
  • What type of caching is done?
  • Is there any page construction done here?
  • Is there dynamic browsing?
  • What type of server-side scripting is done? (ASP, JSP, Perl, JavaScript, PHP, etc.)
  • Are there any SSL acceleration devices in front of the web server?
  • Are there any content caching devices in front of the web server?
  • Can server extensions and their functions be validated? (ASP, JSP, PHP, Perl, CGI, servlets, ISAPI filter/app, etc.)
  • Monitoring (Pools: threads, processes, connections, etc. Queues: ASP, sessions, etc. General: CPU, memory, I/O, context switch rate, paging, etc.)

Application server issues

  • Is there any page construction done here?
  • How is session management done and what is the capacity?
  • Are there any clustered configurations?
  • Is there any load balancing done?
  • If there is software load balancing, which one is the load balancer?
  • What is the page construction capacity?
  • Do components have a specific interface to peripheral and external systems?

Database server issues

  • Have both small and large data sets been tested?
  • What is the connection pooling configuration?
  • What are its upper limits?

The experienced performance engineer asks questions like:

  • Why is the application updating all these tables on an order creation?
  • Why is it calling the remote pricing call three times?
  • Why are you creating a new object for the same customer or product?
  • Why is the database connection handler making so many connections for a static number of users?
  • Did you expect your users/customers to come from a slow wireless connection? Did you test for that?
  • Did you realize the application servers were in one data center and the database was in another?
  • Who set the JVM memory configuration?
  • Why are the indexes on the same volumes as the files?
  • The performance testing database was only one quarter the size of the production database.
  • How many physical CPUs did you really allocate to the database server?
  • How was the peak volume determined?

“CodeCache is full. Compiler has been disabled”

The JVM JIT compiler generates compiled code and stores it in a memory area called the CodeCache. The default maximum size of the CodeCache on most platforms is 48 MB. If an application needs to compile a large number of methods, resulting in a huge amount of compiled code, the CodeCache may become full. When it becomes full, the compiler is disabled to stop any further compilation of methods, and a message like the following gets logged:

Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.

Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=

Code Cache  [0xffffffff77400000, 0xffffffff7a390000, 0xffffffff7a400000) total_blobs=11659 nmethods=10690 adapters=882 free_code_cache=909Kb largest_free_block=502656

When this situation occurs, the JVM may invoke sweeping and flushing of this space to make some room available in the CodeCache. There is a JVM option, UseCodeCacheFlushing, that can be used to control flushing of the CodeCache. With this option enabled, the JVM invokes an emergency flushing that discards the older half of the compiled code (nmethods) to make space available in the CodeCache. In addition, it disables the compiler until the available free space exceeds the configured CodeCacheMinimumFreeSpace. The default value of the CodeCacheMinimumFreeSpace option is 500 KB.

UseCodeCacheFlushing is set to false by default in JDK 6, and is enabled by default since JDK 7u4. This essentially means that in JDK 6, when the CodeCache becomes full it is not swept and flushed and further compilations are simply disabled, whereas in JDK 7u4+ an emergency flushing is invoked when the CodeCache becomes full. Enabling this option by default made some issues related to CodeCache flushing visible in JDK 7u4+ releases. The following are two known problems in JDK 7u4+ with respect to CodeCache flushing:

1. The compiler may not get restarted even after the CodeCache occupancy drops down to almost half after the emergency flushing.
2. The emergency flushing may cause high CPU usage by the compiler threads leading to overall performance degradation.

This performance issue, and the problem of the compiler not getting re-enabled, have been addressed in JDK 8. To work around them in JDK 7u4+, we can increase the code cache size using the -XX:ReservedCodeCacheSize option, setting it to a value larger than the compiled-code footprint so that the CodeCache never becomes full. Another solution is to disable CodeCache flushing with the -XX:-UseCodeCacheFlushing JVM option.
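
For example, either of the following launch lines applies the workarounds above (a sketch only; the 256m value and the MyApp class name are placeholders, while the flags themselves are the ones discussed in this post):

```
java -XX:ReservedCodeCacheSize=256m MyApp    # make the CodeCache larger than the compiled-code footprint
java -XX:-UseCodeCacheFlushing MyApp         # or disable the emergency flushing behaviour
```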

Reference: https://blogs.oracle.com/poonam/entry/why_do_i_get_message

Bad state of performance testing … in the so called technology company

The code has already been pushed to production without a load test. Now the performance team has started the load test. We are seeing high response times and not getting the desired TPS.

The client's performance engineer looks at the production access logs, says that even in production we are getting high response times, and concludes the code is bad. Neither the client's performance engineer nor the performance team bothers to find out the root cause of the high response times. To avoid the high response times, the timeout on the load generators has been set to one second; any response taking longer than 1 second is marked as failed. This performance team is going to send these results to the management.

The performance team does not bother to do code profiling. The fact is, they don't know that code profilers exist and that profiling should be done in such cases.

I feel very bad working with such lousy people when they are not ready to listen. The performance team does not understand anything.

Initially I tried to educate the client and the performance team, but it was futile. At the end of one year, I realised that they are dumb fellows who are not ready to listen to anything. I stopped educating them…

Thanks, Frustrated Performance Engineer.