Application Level Monitoring
Application-level resource utilization monitoring is built into EcoSys. This data, called SERVERMETRICS, is written to the application log at a user-specified interval (the default is every two minutes). You can also view this data interactively in the EcoSys user interface.
Viewing Real-time Server Metrics Graphs
To view real-time graphs of the server metrics within the EcoSys application, click Admin > System Status & Logs > System Information, and then select the SERVER METRICS tab. The screen displays several panels with chronological values for resource utilization. You can control the time range using the parameters at the top of the screen.
Viewing the System Status Monitor Dashboard
In addition to the Server Metrics view, a System Status Monitor dashboard is available to monitor the application externally. You can reach this view using the System Status link on the Server Metrics view. You can also load the dashboard by appending /systemstatus to the EcoSys URL (for example, type https://server.example.com/ecosys/systemstatus). The system status dashboard displays a snapshot of the recent resource utilization, with averages over the last 1, 5, and 15 minutes. This is useful for monitoring the health of and load on a given instance.
You can also view the basic metrics data in text or XML format using the links at the top. This can be incorporated into an automated monitoring system, as needed.
Some of the metrics values are available only in EcoSys Version 6+.
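As a minimal sketch of how an external monitor might probe the dashboard, the following Python builds the /systemstatus URL from the example base URL above and checks whether it responds. It assumes the page can be reached without authentication, which may not hold for a given installation. The function names are hypothetical, and the URLs of the text and XML links are not assumed here.

```python
import urllib.request


def build_status_url(base_url: str) -> str:
    """Append /systemstatus to the EcoSys base URL, avoiding a double slash."""
    return base_url.rstrip("/") + "/systemstatus"


def dashboard_reachable(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if the dashboard URL answers with HTTP 200.

    Assumption: no login is required to load the page.
    """
    try:
        with urllib.request.urlopen(build_status_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError.
        return False


print(build_status_url("https://server.example.com/ecosys"))
```

A scheduler could call dashboard_reachable at a fixed interval and raise an alert after several consecutive failures.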
Server Metrics in the Application Log
Use System Admin > Display log or access the application server’s log directory (for example, C:\EcoSys\log) and filter on the tag SERVERMETRICS to extract the server metrics data from a given system. This data is written on an interval and is useful for correlating system load with specific user activities over time. For more information, see the section on Application Tracing.
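The filtering step can be sketched in a few lines of Python. This example only selects lines that contain the SERVERMETRICS tag. It makes no assumption about the rest of the log line layout, and the sample lines and function names are placeholders rather than real EcoSys output.

```python
from pathlib import Path
from typing import Iterable, Iterator

TAG = "SERVERMETRICS"


def filter_metrics(lines: Iterable[str], tag: str = TAG) -> Iterator[str]:
    """Yield only the lines that contain the given tag."""
    for line in lines:
        if tag in line:
            yield line.rstrip("\n")


def metrics_from_file(log_path: str) -> list[str]:
    """Read a log file and return its SERVERMETRICS lines."""
    with Path(log_path).open(encoding="utf-8", errors="replace") as fh:
        return list(filter_metrics(fh))


# Placeholder lines, not real log output.
sample = [
    "placeholder line without the tag\n",
    "placeholder line with SERVERMETRICS in it\n",
]
print(list(filter_metrics(sample)))
```

To use it against a real system, pass the path of a log file from the application server's log directory to metrics_from_file.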
Interpreting Server Metrics Values
Server Metrics are a periodic capture of system-wide resource utilization. By default, these metrics are written to the application log every 2 minutes, and each entry summarizes resource utilization over the 2 minutes immediately before it was written. Server Metrics are independent of any specific user session or request because they aggregate across all users and activity on the application instance (the JVM).
You can control the granularity of Server Metrics using the Application Tracing screen. Finer-grained capture is appropriate for troubleshooting. For example, you can set the capture rate to 10 seconds and the logging rate to 30 seconds during a troubleshooting session to see more frequent updates of system-wide resource utilization.
Most Server Metrics values are expressed as a per-second average value over the logging interval. This means that you can compare any value to any corresponding value in another sample since they all use the same units, irrespective of the sample interval or logging interval. The following table describes the specific metrics.
Some of the metrics values are available only in EcoSys Version 6+.
Units | Description |
---|---|
#/s | count per second, an average rate over the logging interval |
s/s | seconds per second, or percent of a single thread or single core on the system |
# | count, peak value for each sample interval averaged over each logging interval |
% | percent, peak value for each sample interval averaged over each logging interval |
s | seconds for logging interval |
KB | kilobytes average for logging interval |
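The arithmetic behind the #/s and s/s units can be illustrated with a short Python sketch. The counts and intervals below are made-up numbers chosen for illustration, and the helper names are hypothetical.

```python
def per_second(total: float, interval_s: float) -> float:
    """Convert a total accumulated over an interval into a per-second average."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return total / interval_s


def busy_percent(seconds_per_second: float) -> float:
    """Express an s/s value as a percent of a single thread or core."""
    return seconds_per_second * 100.0


# 240 events in a 120-second interval and 60 events in a 30-second
# interval are the same rate, so the two samples compare directly.
print(per_second(240, 120), per_second(60, 30))  # 2.0 2.0

# 30 seconds of wait time accumulated over a 120-second interval is
# 0.25 s/s, or 25% of one thread.
print(busy_percent(per_second(30, 120)))  # 25.0
```

Because every rate is normalized by its own interval, a sample logged every 30 seconds during troubleshooting can be compared with one logged at the default 2 minutes.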
Key & Units | Description |
---|---|
cpu(%) | Summary percent of system waits for CPU, percent of total capacity of all cores |
db(%) | Summary percent of system waits for database, an approximation of capacity by cores (could exceed 100%) |
thblk(%) | Summary percent of system waits for concurrency blocks |
mem(%) | Amount of Java heap memory in use (average over interval) |
totsess(#) | Count of total user sessions logged in |
actsess(#) | Count of active user sessions logged in (communicating with server within last 5 minutes) |
webreq(#/s) | Count of web requests per second, includes both data and static content (excludes real time monitors) |
scpu(%) | Percent of system CPU resources in use (total CPU, both system and application) |
ucpu(%) | Percent of user (application) CPU resources in use |
blk(s/s) | Concurrency block time among Java threads, or percent of a full thread blocked |
blk(#/s) | Concurrency block instance count |
gc(s/s) | Garbage collection time (Java), or percent of a single core spent on GC |
gc(#/s) | Garbage collection invocation count |
dbqsel(s/s) | Database query wait time for type SELECT |
dbfetdata(s/s) | Database fetch data wait time, including fetching rows and CLOB cell values |
dbcmmt(s/s) | Database COMMIT wait time across all write queries |
dbqins(s/s) | Database query wait time for type INSERT |
dbqupd(s/s) | Database query wait time for type UPDATE |
dbqdel(s/s) | Database query wait time for type DELETE |
dbqproc(s/s) | Database stored procedure wait time |
dbqsel(#/s) | Database query count for type SELECT |
dbqcch(#/s) | Database query count that are satisfied by the query cache (not sent to database) |
dbfetrow(#/s) | Database row fetch count |
dbcmmt(#/s) | Database COMMIT count |
dbqins(#/s) | Database query count for type INSERT |
dbqupd(#/s) | Database query count for type UPDATE |
dbqdel(#/s) | Database query count for type DELETE |
dbqproc(#/s) | Database stored procedure invocation count |
dbcon(#/s) | Database connection used (leased from connection pool) |
dbcondef(#/s) | Database connections deferred (not leased or activated) |
dbconwait(s/s) | Database connection pool lease wait time (consistently > 0 indicates undersized connection pool) |
thpwait(s/s) | Thread pool lease wait time (consistently > 0 with no other waits indicates undersized thread pool) |
thpwait(#/s) | Thread pool lease wait count (LANA engine threads only, not web app server threads) |
thpsize(#/s) | Thread pool size (LANA engine threads) |
thpactv(#/s) | Thread pool active count within pool (LANA engine threads) |
ssapi(#/s) | Static SOAP web service API call count |
ssapi(s/s) | Static SOAP web service API wait times (aggregated only upon completion of each call) |
globcch(KB) | Global enterprise data cache size within heap (approximated) |
qrycch(KB) | Query cache size within heap (approximated) |
avail(K) | Available Java heap memory |
logerr(#) | Number of ERROR log events within log interval |
logwrn(#) | Number of WARN log events within log interval |
sampint(s) | Server metrics sample interval (the granularity of the background capture) |
logint(s) | Server metrics log interval (aggregates samples within each log interval) |
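The table notes that a dbconwait(s/s) value that is consistently greater than 0 points to an undersized connection pool. The following Python sketch shows one way to test for that pattern. It assumes that "consistently" means every value in a window of at least five consecutive samples is positive, which is an illustrative threshold and not an EcoSys rule. The sample values are made up.

```python
from typing import Sequence


def consistently_positive(samples: Sequence[float], min_samples: int = 5) -> bool:
    """Return True if there are at least min_samples values and all are > 0.

    Assumption: the window size of 5 is an arbitrary choice for illustration.
    """
    return len(samples) >= min_samples and all(v > 0 for v in samples)


# Made-up dbconwait(s/s) readings from consecutive log intervals.
sustained = [0.02, 0.05, 0.01, 0.04, 0.03]
single_burst = [0.0, 0.0, 0.3, 0.0, 0.0]

print(consistently_positive(sustained))     # True
print(consistently_positive(single_burst))  # False
```

The same check could be applied to thpwait(s/s), keeping in mind that the table qualifies that indicator with "no other waits", so the other wait metrics would need to be examined alongside it.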
EcoSys SERVERMETRICS values reflect only those resources used within the Java application environment. Resources used by separate Java processes or other applications running on the server are not reflected in the SERVERMETRICS data, even though the Java VM shares system-level resources with those processes. For suggestions about monitoring these external resources, refer to the operating system monitoring section.
Performance Summary Report Available
You can analyze an EcoSys application log to produce a summary report of server metrics over time. To access this function, click System Admin > Display Log > Download Log, and then select Analysis. You may also submit one or more application log files to EcoSys technical support for analysis. For more information about producing and interpreting a performance log file analysis, refer to the section in this document on log file analysis.
Operating System Level Monitoring for Windows
We recommend that you monitor the following components: CPU, memory, network I/O, and swap space. Also check the Windows Event Viewer logs for errors.
Operating System Level Monitoring for Unix/Linux
Using standard Unix/Linux tools such as top, htop, dstat, vmstat, and nmon, verify the following:
Memory
Verify that server paging is minimal and that the Java virtual machine has enough physical memory to run entirely in RAM. Also consider the maximum memory allocated to the Java heap, specified by the -Xmx option. For more information, see the Java Heap sizing section.
CPU
Ensure that no process or service other than the application server (Java process) is consuming CPU resources intensively on the server. During heavy load on the system, CPU usage naturally spikes as the Java process consumes the resources it needs. After the activity is over, CPU usage should drop to less than 2%. EcoSys recommends a minimum of two CPUs allocated to the application server.
I/O
Ensure that there is no intensive I/O activity by any other process running on the server.
Disk Space
Ensure that there is at least 5 GB of available disk space on the mount where the application server is installed and running.
Make sure the data collected includes timestamps so it can be correlated with the application log and other performance data.
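As one hedged illustration of a timestamped check, the Python sketch below reports free disk space on a mount and compares it with the 5 GB guideline above. It treats 5 GB as 5 × 1024³ bytes, which is an assumption, and the output format and function names are hypothetical. The mount path "/" is only a default; substitute the mount where the application server is installed.

```python
import shutil
from datetime import datetime, timezone

# Assumption: 5 GB is interpreted as 5 * 1024**3 bytes.
MIN_FREE_BYTES = 5 * 1024**3


def has_enough_space(free_bytes: int, minimum: int = MIN_FREE_BYTES) -> bool:
    """Return True if the free space meets the minimum."""
    return free_bytes >= minimum


def disk_report(mount: str = "/") -> str:
    """Return one timestamped line describing free space on the mount."""
    usage = shutil.disk_usage(mount)
    # A UTC ISO 8601 timestamp makes the line easy to correlate with other logs.
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    status = "OK" if has_enough_space(usage.free) else "LOW"
    return f"{stamp} mount={mount} free_bytes={usage.free} status={status}"


print(disk_report("/"))
```

If the application log uses local time rather than UTC, convert one of the two before correlating them.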
Database Monitoring for Oracle
Monitor CPU, memory, I/O, alert logs, SGA utilization, object fragmentation, row chaining, and event waits. Make sure that the data collected includes timestamps, so that the data can be correlated with the application log and other performance data. Also review the output of Oracle utilities, such as AWR and ADDM.
Database Monitoring for SQL Server
Monitor CPU, memory, I/O, object fragmentation, and SQL Server logs. Make sure that the data collected includes timestamps, so that the data can be correlated with the application log and other performance data.