SET SERVEROUTPUT ON

-- Create the tuning task; the returned task name (here TASK_1342) is used in the steps below
DECLARE
  stmt_task VARCHAR2(64);
BEGIN
  stmt_task := DBMS_SQLTUNE.CREATE_TUNING_TASK(sql_id => '8bk0dw24d58jg');
  DBMS_OUTPUT.PUT_LINE(stmt_task);
END;
/

EXEC DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => 'TASK_1342');

SET LONG 9999999
SET PAGESIZE 1000
SET LINESIZE 32767
SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('TASK_1342') AS recommendations FROM dual;
Below are a few of the Spark configurations I used while tuning an Apache Spark job.
- "-XX:MetaspaceSize=100M": Before this parameter was added, full GCs were observed due to metaspace resizing. After adding it, no full GCs caused by metaspace resizing were observed.
- "-XX:+DisableExplicitGC": In standalone mode, the code was invoking System.gc() every 30 minutes, which triggers a full GC each time (not a good practice). After adding this parameter, no full GCs caused by System.gc() were observed.
- "-Xmx2G": OOM errors were observed with the default heap size (1 GB) when running with more than 140K messages in the aggregation window. After the heap was increased to 2 GB (the maximum allowed on this box), 221K messages were processed successfully in the aggregation window; at 250K messages we still hit OOM.
- spark.memory.fraction = 0.85: In Spark 1.6.0, the default is 0.75 for the fraction of the heap shared by storage and execution memory. The value was increased to give more memory to storage/execution and avoid OOM.
- Storage level changed to DISK_ONLY: Before the change, we were getting OOM when processing 250K messages during the 300-second aggregation window. After the change, we could process 540K messages in the window without OOM. Even though in-memory storage gives better performance, hardware limitations forced DISK_ONLY here.
- spark.serializer set to KryoSerializer: the Java serializer has a bigger memory footprint; to avoid that and for better performance we used Kryo.
- "-Xloggc:~/etl-gc.log -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCCause": these flags are good performance practice and make problems easier to diagnose from the GC logs; their overhead in production is minimal.
- spark.streaming.backpressure.enabled = true: This lets Spark Streaming control the receiving rate based on the current batch scheduling delays and processing times, so the system receives only as fast as it can process.
- spark.io.compression.codec set to org.apache.spark.io.LZFCompressionCodec: The codec used to compress internal data such as RDD partitions, broadcast variables, and shuffle outputs.
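The settings above can be passed together on a single spark-submit invocation. Here is a sketch, assuming a streaming ETL job; the class name, JAR name, and GC log path are placeholders. Note that heap size goes through --executor-memory/--driver-memory, since Spark rejects -Xmx inside extraJavaOptions:

```shell
# Placeholder class/JAR names; GC flags shown for the executors.
spark-submit \
  --class com.example.StreamingEtl \
  --executor-memory 2G \
  --conf spark.memory.fraction=0.85 \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec \
  --conf "spark.executor.extraJavaOptions=-XX:MetaspaceSize=100M -XX:+DisableExplicitGC -Xloggc:/tmp/etl-gc.log -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCCause" \
  streaming-etl.jar
```

The DISK_ONLY storage level is set in application code (e.g. rdd.persist(StorageLevel.DISK_ONLY)), not on the command line.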
Hope it helps.
AWS Elastic Load Balancer has an idle timeout value set at 60 seconds. If there is no activity for 60 seconds, the connection is torn down and an HTTP 504 error is returned to the customer. Here are the steps to change the idle timeout value on an AWS Elastic Load Balancer:
- Sign in to the AWS Console
- Go to EC2 Services
- In the left panel, click Load Balancing > Load Balancers
- In the top panel, select the load balancer whose idle timeout you want to change
- In the bottom panel, under the 'Attributes' section, click the 'Edit idle timeout' button. The default value is 60 seconds; change it to the value you want (say, 180 seconds)
- Click the 'Save' button
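The same change can be scripted with the AWS CLI (for a Classic Load Balancer); the load balancer name below is a placeholder:

```shell
# Set the idle timeout to 180 seconds on a Classic Load Balancer
aws elb modify-load-balancer-attributes \
  --load-balancer-name my-load-balancer \
  --load-balancer-attributes "{\"ConnectionSettings\":{\"IdleTimeout\":180}}"

# Confirm the new value
aws elb describe-load-balancer-attributes \
  --load-balancer-name my-load-balancer
```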
Here is the use case: I have 3 scenarios named A, B, and C, which are to be load tested with 6, 3, and 1 threads respectively.
These 3 scenarios cover 7 use cases (T1 to T7), which are to be executed with the percentages defined below:
How do we configure this in JMeter? Had it been just the users with 3 scenarios, we would have configured them in Thread Groups. But what about T1 to T7?
First, create 3 thread groups with the desired number of users, as shown below:
Then, under each thread group, add a Throughput Controller (found under Logic Controllers). Configure the percentage and add the request to the Throughput Controller, as shown below:
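For reference, this is roughly what one configured Throughput Controller looks like inside the saved .jmx test plan; the testname and the 50% figure are placeholders, and the element layout is a sketch based on JMeter's standard ThroughputController (style 1 means "Percent Executions"):

```xml
<ThroughputController guiclass="ThroughputControllerGui" testclass="ThroughputController"
                      testname="T1 - 50 percent" enabled="true">
  <!-- 1 = Percent Executions -->
  <intProp name="ThroughputController.style">1</intProp>
  <boolProp name="ThroughputController.perThread">false</boolProp>
  <intProp name="ThroughputController.maxThroughput">1</intProp>
  <FloatProperty>
    <name>ThroughputController.percentThroughput</name>
    <value>50.0</value>
    <savedValue>0.0</savedValue>
  </FloatProperty>
</ThroughputController>
```

The HTTP sampler for the corresponding use case (T1 here) is then nested under this controller in the tree.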
Hope this helps.