4 Troubleshoot Performance Issues Using JFR

This chapter shows how to identify and debug performance issues in a Java application using the Java Flight Recorder.

The Java Flight Recorder (JFR) is a commercial feature. You can use it for free on developer desktops or laptops, and for evaluation purposes in test, development, and production environments. However, to enable JFR on a production server, you must have a commercial license. Using the JMC UI for other purposes on the JDK does not require a commercial license.

To learn more about the JFR commercial license, see the license agreement. To learn more about creating a flight recording, see How to Produce a Flight Recording.

The Java Flight Recorder is a great tool for investigating performance issues. No other tool gives as much profiling data without skewing the results with its own performance overhead. This chapter gives examples of performance issues that you can identify and debug using the Java Flight Recorder.

This chapter contains the following sections:

JFR Overhead
Find Bottlenecks
Garbage Collection Performance
Synchronization Performance
I/O Performance
Code Execution Performance

JFR Overhead

When you measure performance, it is important to consider any performance overhead added by the flight recorder itself. The overhead differs depending on the application. If you have performance tests set up, you can measure whether there is any noticeable overhead on your specific application.

That said, the overhead for recording a standard profiling recording using the default settings is less than 2 percent for most applications. Running with a standard continuous recording generally has no measurable performance effect.

One major contributor to the overhead is the Heap Statistics event, which is disabled by default. Enabling Heap Statistics triggers an old garbage collection at the beginning and at the end of the test run. These old GCs add some extra pause times to the application, so if you are measuring latency or if your environment is sensitive to pause times, don't run with Heap Statistics enabled. Heap Statistics are great when debugging memory leaks or when investigating the live set of the application. See Debug a Memory Leak Using Java Flight Recorder.

Note:

For performance profiling use cases, heap statistics are usually not necessary.

Find Bottlenecks

Different applications have different bottlenecks. For some applications, a bottleneck may be waiting for I/O or networking, it may be synchronization between threads, or it may be actual CPU usage. For others, a bottleneck may be garbage collection times. It is possible that an application has more than one bottleneck.

One way to find out the application bottlenecks is to look at the Events tab. This is an advanced tab, and there are a few things to do. First, click the Events tab, which opens the Event Types tab on the left side of the JFR window. This is where you select the events that you are interested in looking at. For now, select all Java Application events except for Statistics and Allocation, as shown in Figure 4-1.

Figure 4-1 Find Bottlenecks - Java Application Events


Now, in all the Events tabs, you will only see these events. Next, from the Graph tab, look at the main threads for the Java application, as shown in Figure 4-2.

Figure 4-2 Find Bottlenecks - Main Threads from the Graph Tab


The Graph tab may be hard to grasp at first. Each row is a thread, and each thread can have several lines. In Figure 4-2, each thread has one line, which represents the Java Application events that were enabled in the Event Types tab for this recording. The selected Java Application events all share an important property: they are thread-stalling events, which means the thread was not running application code during the event. They are also all duration events, so each event measures how long the thread was stalled.

From the Event Types tab, look at the color of each event. For example, yellow represents Java Monitor Wait events. The yellow parts are where threads are waiting for an object, which often means that the thread is idle, perhaps waiting for a task. Red represents the Java Monitor Blocked events, that is, synchronization events. If your Java application's important threads spend a lot of time being blocked, then a critical section of the application is single threaded, which is a bottleneck. Blue represents the Socket Read and Socket Write events. If the Java application spends a lot of time waiting for sockets, then the main bottleneck may be in the network or with the other machines that the application communicates with.

In Figure 4-2, green represents the parts that don't have any events. The green parts mean that the thread is not sleeping, waiting, reading from or writing to a socket, or being blocked; in general, this is where the application code is running. If your Java application's important threads spend a lot of time without generating any application events, then the bottleneck is the time spent executing code, or the CPU itself.
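The mapping from colors to code is easier to see with a concrete example. The following is a minimal sketch (the StallDemo class and thread names are hypothetical, not from this guide): run under a flight recording, the waiter thread produces a long yellow Java Monitor Wait event, the contender thread that loses the race for the monitor produces a red Java Monitor Blocked event, and the busy loop shows up as green time with no stall events.

public class StallDemo {

    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        // Java Monitor Wait (yellow): an idle thread parked in Object.wait(),
        // typically waiting for work to arrive.
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try {
                    lock.wait(60_000); // recorded as a long Java Monitor Wait event
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "waiter");
        waiter.setDaemon(true); // let the JVM exit while the waiter is still waiting
        waiter.start();
        Thread.sleep(100);      // give the waiter time to take the monitor and call wait()

        // Java Monitor Blocked (red): two threads competing for the same monitor;
        // the losing thread records a blocked event while it waits to enter.
        Runnable contender = () -> {
            synchronized (lock) {
                busyWork(); // "green" time: running application code, no stall event
            }
        };
        Thread t1 = new Thread(contender, "contender-1");
        Thread t2 = new Thread(contender, "contender-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }

    private static void busyWork() {
        long end = System.nanoTime() + 100_000_000L; // spin for about 100 ms
        while (System.nanoTime() < end) {
            // busy loop
        }
    }
}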

Note:

For most Java Application event types, only events longer than 20 ms are recorded. (This threshold can be modified when starting the flight recording.) This means that an area may have no recorded events simply because the application is doing many short tasks, such as writing to a file a small part at a time, or entering and leaving synchronized sections for very short amounts of time.

Each of the previous bottlenecks can be further investigated within the flight recording.

The Events tab does not show garbage collections, so it cannot tell you whether garbage collection is a bottleneck. See the next section about garbage collection performance.

Garbage Collection Performance

Java application issues with garbage collections can be diagnosed using JFR.

Tuning the HotSpot Garbage Collector can have a big effect on performance. See the Garbage Collection Tuning Guide for general information.

First, take a profiling flight recording of your application when it is up and running. Do not include the heap statistics, because that will trigger extra old collections. To get a good sample, take a longer recording, for example 1 hour.

Select the Memory tab, and then select the GC Times subtab. GC Times is a good tab to investigate the overall performance impact of the GC. From the top-right corner, see the All Collections Pause Time section, and look at the Average Sum of Pauses, Maximum Sum of Pauses, and Total Pause Time from the recording. The Sum of Pauses is the total amount of time that the application was paused during a GC. Many GCs do most of their work in the background. In those cases, the length of the GC does not matter and what matters is how long the application actually had to stop. Therefore, the Sum of Pauses is a good measure for the GC effect.

Figure 4-3 shows a flight recording of 5 minutes (as seen in the time select bar). During this time, the average sum of pauses was 16 ms, the maximum sum of pauses was 49 ms, and the total pause time was 2 s 86 ms.
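To put those numbers in perspective: a total pause time of 2 s 86 ms over a 300 s recording means the application was paused for roughly 2.086 / 300, or about 0.7 percent of the time, and dividing the total pause time by the 16 ms average sum of pauses suggests on the order of 130 collections during the recording.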

Figure 4-3 Garbage Collection Performance - GC Pauses


The main performance problems with garbage collections are usually either that individual GCs take too long, or that too much time is spent in paused GCs (total GC pauses).

When an individual GC takes too long, you may need to change the GC strategy. Different GCs have different trade-offs between pause times and throughput performance. See Behavior-Based Tuning.

You may also need to fix your application so that it makes less use of finalizers and reference objects (soft, weak, and phantom references).
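As an illustration, the following minimal sketch (the Handle classes are hypothetical) shows the usual fix for heavy finalizer use. Objects with a finalize() method must be queued for the finalizer thread, so they survive at least one extra GC cycle, whereas an AutoCloseable resource used with try-with-resources is released deterministically and can be reclaimed by the next GC.

// Before: relies on finalization, which delays reclamation.
class LeakyHandle {
    @Override
    protected void finalize() {
        release(); // runs on the finalizer thread, at some later GC cycle
    }
    void release() { /* free the underlying resource */ }
}

// After: AutoCloseable lets the caller free the resource deterministically.
class Handle implements AutoCloseable {
    @Override
    public void close() {
        release(); // runs immediately when the try block exits
    }
    void release() { /* free the underlying resource */ }
}

class HandleDemo {
    void use() {
        try (Handle h = new Handle()) {
            // work with the resource; it is released as soon as this block exits
        }
    }
}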

When the application spends too much time paused, there are different ways to work around that.

One way is to increase the Java heap size. Look at the Garbage Collection subtab to estimate the heap size used by the application, and then set -Xms and -Xmx to a higher value, for example -Xms2g -Xmx2g. The bigger the Java heap, the longer the time between GCs. Watch out for any memory leaks in the Java application, because they may cause more and more frequent GCs until an OutOfMemoryError is thrown. For more information, see Debug a Memory Leak Using Java Flight Recorder.

Another way to reduce the number of GCs is to allocate fewer temporary objects. Under the Allocations tab, look at how much memory is allocated over the course of the recording. Small objects are allocated inside TLABs (thread-local allocation buffers), and large objects are allocated outside TLABs. Often, the majority of allocations happen inside TLABs.

Last, to reduce the need for GCs, decrease the allocation rate. Select the Allocations tab, and then select the Allocation in new TLAB subtab to look at the allocation sites and stack traces that create the most memory pressure. You can either view the allocations by class, or select Allocation by Thread to see which threads allocate the most.
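For example, the following minimal sketch (hypothetical methods, not from this guide) shows the kind of allocation site that tends to dominate the Allocation in new TLAB view, together with a rewrite that allocates far less:

class AllocDemo {

    // Before: each += allocates a new String plus a temporary StringBuilder
    // per iteration, so this method shows up high in the allocation views.
    static String joinIds(int[] ids) {
        String result = "";
        for (int id : ids) {
            result += id + ",";
        }
        return result;
    }

    // After: a single StringBuilder for the whole loop removes most of the
    // temporary allocations, lowering the allocation rate and GC frequency.
    static String joinIdsLowAlloc(int[] ids) {
        StringBuilder sb = new StringBuilder(ids.length * 8);
        for (int id : ids) {
            sb.append(id).append(',');
        }
        return sb.toString();
    }
}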

For general details about the JFR Allocation tab, see Inspect a Flight Recording.

Some other settings may also increase the GC performance of the Java application. See the Java Platform, Standard Edition HotSpot Virtual Machine Garbage Collection Tuning Guide for more information about GC performance.

Synchronization Performance

To debug Java application synchronization issues, that is, places where application threads spend a lot of time waiting to enter a monitor, look at the Contention tab in the Threads tab group.

Take a look at the locks that are contended the most and the stack trace of the threads waiting to acquire the lock, as shown in Figure 4-4.

Figure 4-4 Synchronization Performance - Contention Tab


In Figure 4-4, the range selector at the top shows where in the recording the events took place. Zoom in with the range selector to look at the contention events in a selected time range.

Typically, look for contention that you did not think would be an issue. Logging is a common area that can be an unexpected bottleneck in some applications.
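As a minimal sketch of how that can happen (the LogContentionDemo class is hypothetical), the following program funnels every worker thread through one synchronized method. Under enough load, or with a lowered contention threshold, the Contention tab shows this single monitor as the most contended lock, with the log method at the top of the waiting stack traces:

public class LogContentionDemo {

    private static final StringBuilder buffer = new StringBuilder();

    // Every thread takes the same monitor (the LogContentionDemo.class object),
    // so this method is a single-threaded section of the application.
    static synchronized void log(String message) {
        buffer.append(message).append('\n');
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 100_000; n++) {
                    log("unit of work finished");
                }
            }, "worker-" + i);
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }
}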

When you see performance degradation after a program update, or at specific times in the Java application, take one flight recording when things are good and another when things are bad, and compare them to look for a synchronization site whose contention has increased significantly.

Note:

The events shown in the range selector are not all synchronization events. By default, contention events with a duration longer than 20 ms are recorded. (This threshold can be modified when starting the flight recording.) Shorter thresholds give more events and also potentially more overhead. If you believe contention is an issue, then you could take a shorter recording with a very low threshold (only a few milliseconds). When this is done on a live application, make sure to start with a very short recording, and monitor the performance overhead.

I/O Performance

You can diagnose I/O issues in an application by monitoring the Socket Read tab under the I/O group.

When a Java application spends a lot of time either in Socket Read, Socket Write, File Read, or File Write, then I/O or networking may be the bottleneck. To diagnose I/O issues in applications, look at the Socket Read tab under the I/O group, as shown in Figure 4-5.

Figure 4-5 I/O Performance Issues - Socket Read Tab


Figure 4-5 shows that the application had 100 reads from the remote address 198.51.100.0. The total number of bytes read is 356 bytes, and the total time spent waiting is 1 min 57 s. Select the By Event tab at the top-left corner, and look at each event to analyze the time spent and data read.
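Numbers like these already point to a diagnosis: 1 min 57 s of wait time spread over 100 reads averages more than a second per read, yet only 356 bytes were read in total, so the time is being spent waiting for the remote endpoint to send data rather than transferring it.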

File and network I/O issues are diagnosed in a similar fashion. Look at the files read from or written to the most, and then look at each file read or write and the time spent on I/O.

By default, all the tabs in the I/O group list events with a duration longer than 20 ms. When starting a flight recording, you can lower the File I/O Threshold or the Socket I/O Threshold to gather more data, potentially at a higher performance cost.

Code Execution Performance

Code execution performance can be monitored using the Java Mission Control Call Tree tab.

When there are not a lot of Java Application events, the main bottleneck of your application may be the running code. First, look at the Threads tab and select the Overview subtab. See CPU Usage Over Time, which shows the CPU usage of the JVM being recorded and the total CPU usage on the machine. If the JVM CPU usage is low but the CPU usage of the machine is high, some other application is taking a lot of CPU. In that case, look at the other applications running on the system in the Processes tab in the System tab group. However, you may not see their CPU usage there, so it is usually easier to use OS tools, such as top or the task manager, to find out which processes use a lot of CPU.

If your application uses a lot of CPU time, select the Code tab group and look at the Hot Threads tab. This tab shows the threads that use the most CPU time. However, this information is based on method sampling, so it may not be 100% accurate if the sample count is low. While a flight recording is running, the JVM samples the threads. By default, a continuous recording does only some method sampling, while a profiling recording does as much as possible. Method sampling gathers data only from threads that are running code; threads that are waiting for I/O, sleeping, waiting for locks, and so on are not sampled. Therefore, threads with a lot of method samples are the ones using the most CPU time, although exactly how much CPU each thread uses is not known.

The Hot Methods tab in the Code tab group helps find out where your application spends most of the execution time. This tab shows all the samples grouped by top method in the stack. Use the Call Tree tab to start with the lowest method in the stack traces and then move upward. Figure 4-6 starts with Thread.run, and then looks at the calls that have been most sampled.

Figure 4-6 Code Execution Performance - Call Tree Tab
