The Job Tracker tool lets you monitor the run history of jobs that have run in the past and the progress of jobs that are currently running. It also lets you view the runtime log history of the jobs you are interested in. Perform the following steps to select and track the jobs you are interested in:
You also have the option of monitoring "Running Jobs" or "Queued Jobs". This is a useful feature if you are interested in tracking jobs that are still in progress. You can filter for these kinds of jobs by setting the "Filter By Status" options.
If you are using JobServer Pro, you can also filter by the Agents that the jobs have run on. You can search for jobs running on Primary/Secondary hosts or on specific Agent servers.
JobServer captures all Log4J and Java Logging API messages
emitted by the Tasklet and makes them available for viewing in
the Job Tracker tool. Note that the Log4J Logger must have
additivity=true for its logs to be captured and reported by Job Tracker.
This filter option allows you to filter for jobs that have certain runtime conditions. You can search for jobs that have one or more of the following conditions:
In addition to the search filters above, you can also choose to view jobs that are still "Running" or are in the "Queued" state. This will return any job that is running or queued. You can also choose to look only for jobs that have "Finished" running and thus completed processing. "Finished" also includes jobs that were terminated or failed; either way, they are no longer running.
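The status filter described above can be sketched as a simple lookup. This is a minimal illustration, not JobServer's implementation: the status names are taken from the status list later on this page, while the mapping table and the function name `matches_status_filter` are assumptions made for the example. Statuses the text does not classify (such as "expired" and "started") are deliberately left out.

```python
# Status names come from the status list later on this page; the mapping
# itself is an assumption for illustration, not JobServer's implementation.
FILTER_TO_STATUSES = {
    "Running": {"running", "running (kill pending)"},
    "Queued": {"queued"},
    "Finished": {"finished", "failure", "terminated"},
}


def matches_status_filter(status, selected_filters):
    """Return True if a job run status falls under any selected filter."""
    base = status.rstrip("*")  # a trailing "*" (e.g. "terminated*") marks a retried run
    return any(base in FILTER_TO_STATUSES[name] for name in selected_filters)


print(matches_status_filter("running (kill pending)", ["Running", "Queued"]))  # True
print(matches_status_filter("terminated*", ["Finished"]))  # True
print(matches_status_filter("queued", ["Finished"]))  # False
```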
Search for the jobs that you are interested in. You can search by job name or by job id. For job id, you can enter more than one id as a comma-separated list. The character "*" acts as a wildcard when placed before and/or after a search pattern. For example, "foo*" will return any job whose name starts with "foo". If you enter only "*", all jobs within the specified Group and Partition will be returned. If you enter a number, the search looks for a job with that exact job id. Once you execute the search (by clicking the "go" button or pressing the "Enter" key), all jobs that match the search criteria are placed in the drop-down list below the search field.
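To make the wildcard behavior concrete, here is a minimal sketch of how a "*" pattern could be matched against job names. It is an illustration only: the function names are hypothetical, and it assumes matching is case-sensitive and that a pattern without "*" must match the whole name, neither of which is stated above.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a search pattern in which "*" matches any run of characters."""
    # Escape every literal piece, then join the pieces with ".*" for each "*".
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(pieces))


def match_jobs(pattern, job_names):
    """Return the job names that match the whole pattern."""
    regex = wildcard_to_regex(pattern)
    return [name for name in job_names if regex.fullmatch(name)]


jobs = ["foo", "foobar", "barfoo", "report"]
print(match_jobs("foo*", jobs))  # ['foo', 'foobar']
print(match_jobs("*foo", jobs))  # ['foo', 'barfoo']
print(match_jobs("*", jobs))     # ['foo', 'foobar', 'barfoo', 'report']
```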
There is also a special option for this field that lets you find a specific job run based on its unique run id. In the search text field, enter the unique run id prefixed by "++", for example "++1276". This will search for the job run with run id 1276 when you click the "go" button. If the run id being searched is part of a retry chain, then all job runs of the original run id will be shown.
If the search pattern finds matching jobs, they appear in the drop-down list. If you choose "All Listed Jobs", then all the jobs in the drop-down list will be searched for; otherwise only the single selected job will be searched for.
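The different kinds of input the search field accepts (run id with a "++" prefix, one or more job ids, or a name pattern) can be told apart with a few checks. The sketch below shows one way to do that; the function name `parse_search` and the returned labels are hypothetical and are not part of JobServer.

```python
def parse_search(text):
    """Classify search field text as a run id, a list of job ids, or a name pattern."""
    text = text.strip()
    # "++1276" style input selects a single run by its unique run id.
    if text.startswith("++") and text[2:].isdecimal():
        return ("run_id", int(text[2:]))
    # One number, or several comma-separated numbers, are treated as job ids.
    parts = [part.strip() for part in text.split(",")]
    if all(part.isdecimal() for part in parts):
        return ("job_ids", [int(part) for part in parts])
    # Anything else is a job name pattern, possibly containing "*".
    return ("name_pattern", text)


print(parse_search("++1276"))     # ('run_id', 1276)
print(parse_search("12, 15,20"))  # ('job_ids', [12, 15, 20])
print(parse_search("foo*"))       # ('name_pattern', 'foo*')
```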
The "Paging" field allows you to limit how many jobs are shown at once on the screen. You will be able to page backwards and forwards through the entire list as defined by the paging size. For example, if your search result returns 1000 matches and your paging size is 200, then you will need to page 5 times to get to the end of the list.
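The paging arithmetic in the example above is a ceiling division. The following sketch reproduces it; the helper names `page_count` and `page_slice` are illustrative and assume pages are numbered from 1.

```python
def page_count(total_matches, page_size):
    """Number of pages needed to show every match."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Ceiling division without floating point.
    return (total_matches + page_size - 1) // page_size


def page_slice(results, page_number, page_size):
    """Return the results shown on a 1-based page number."""
    start = (page_number - 1) * page_size
    return results[start:start + page_size]


results = list(range(1, 1001))        # stand-in for 1000 matching job runs
print(page_count(len(results), 200))  # 5
last_page = page_slice(results, 5, 200)
print(last_page[0], last_page[-1])    # 801 1000
```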
Now specify what time range you want to search within. This will search for jobs that started running between the start and end dates specified.
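Because the filter applies to the time a job started, a job that began before the range and was still running inside it would not match. A minimal sketch of that rule follows; the record layout (`run_id`, `started`) is hypothetical, and treating both boundaries as inclusive is an assumption.

```python
from datetime import datetime


def started_between(runs, start, end):
    """Keep runs whose start time lies within [start, end] (inclusive, by assumption)."""
    return [run for run in runs if start <= run["started"] <= end]


runs = [
    {"run_id": 1276, "started": datetime(2024, 3, 1, 23, 50)},
    {"run_id": 1277, "started": datetime(2024, 3, 2, 0, 10)},
    {"run_id": 1278, "started": datetime(2024, 3, 3, 8, 0)},
]
window = started_between(runs, datetime(2024, 3, 2), datetime(2024, 3, 2, 23, 59, 59))
print([run["run_id"] for run in window])  # [1277]
```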
If there are jobs that are running or have run in the past that match the criteria specified, they will be displayed. You can drill down and view more details by clicking on the RunID in the first column. This will allow you to view the job's logging information all the way down to each individual Tasklet. If the job had multiple Tasklets, each one will have an entry and you can drill down and see more detailed logging information per Tasklet.
The download feature allows you to download the entire report to your browser in the form of a tab-delimited file. Click the "Download" button to start the download. The downloaded report can contain "All" the columns/fields available to be reported on, or just the columns that were "Selected" and visible for display on the screen.
By default you will see the following columns for each job run result:
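A downloaded tab-delimited report can be processed with standard tooling. The sketch below reads such a file with Python's `csv` module. It assumes the first row of the file holds the column names; the sample data and the "Job Name" and "Status" headers are made up for illustration, so check them against an actual download.

```python
import csv
import io


def read_report(text):
    """Parse tab-delimited report text into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


# Hypothetical sample; real column names depend on the fields you selected.
sample = (
    "RunID\tJob Name\tStatus\n"
    "1276\tnightly-load\tfinished\n"
    "1277\tnightly-load\tterminated\n"
)
rows = read_report(sample)
print([row["RunID"] for row in rows if row["Status"] == "terminated"])  # ['1277']
```

For a file on disk, pass `open(path, newline="").read()` to `read_report`, or hand the open file object to `csv.DictReader` directly.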
running- Job is running on a local node, a remote agent node, or a Mesos slave node.
running (kill pending)- Job is still running, but a user has requested the job to be killed.
expired- Job was scheduled outside its allowed scheduling window, so it expired and never ran.
queued- Job is waiting in the queue and is ready to run.
started- This state only applies to Mesos jobs and means the job is preparing to run in the Mesos cluster.
finished- Job can finish with or without errors.
failure- Failure will often have a tooltip detailing the failure reason message. Failures are severe errors that stop the job from running.
terminated- Terminated means the job stopped processing due to some user action or internal problem. If a job is later retried then it will be marked with "*", e.g. terminated*, and will have a tooltip indicating it was/will be retried.
NA- Typically means job is still running.
normal- Job finished without any issues.
exit event- Job threw an exit condition exception via Tasklet API.
exit on failure- Job threw a failure exception via Tasklet API.
exit on internal error- Job threw a low-level internal error exception. This will typically have a tooltip indicating reason for the internal error.
killed- Job killed due to a request from a user. Tooltip will indicate kill details.
system shutdown- Job terminated because of system shutdown or job system crash.
queue delete- Job deleted by a user from the queue.
queue failure- Job failed while it was being loaded into the queue.
mesos lost- Means job never returned a TASK_FINISHED event from Mesos and is therefore determined to be terminated. This could mean the Mesos slave was lost or the connection to the Mesos master was lost.
local- Job ran in shared JVM on primary or secondary node.
local iso- Job ran in isolated JVM on primary or secondary node.
agent- Job ran on shared JVM on distributed agent node.
agent iso- Job ran on isolated JVM on distributed agent node.
mesos cmd- Job ran on Mesos managed cluster using Mesos default command executor. This is limited to only running a remote script on any Mesos slave and can't run the full JobServer Tasklet API.
mesos iso- Job ran on Mesos managed cluster using custom JobServer JVM executor.
Jobs running in an isolated JVM (local iso or agent iso Run Mode) or on a Mesos slave are easier and faster to kill/terminate because they run in their own isolated process. Jobs in all other run modes (local and agent) can still be killed, but only at defined Tasklet API checkpoints, so a kill is not guaranteed and takes effect only when the job reaches such a checkpoint.
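The relationship between run mode and kill behavior can be summarized as a lookup. This is a sketch based on the run mode descriptions above, not JobServer code; the set names and the `kill_behavior` function are hypothetical, and grouping both Mesos modes with the isolated modes follows from the statement that jobs on a Mesos slave run in their own process.

```python
# Run mode names are the ones listed above; the grouping is an illustration.
ISOLATED_MODES = {"local iso", "agent iso", "mesos cmd", "mesos iso"}
SHARED_JVM_MODES = {"local", "agent"}


def kill_behavior(run_mode):
    """Describe how a kill request is expected to take effect for a run mode."""
    if run_mode in ISOLATED_MODES:
        return "process terminated directly"
    if run_mode in SHARED_JVM_MODES:
        return "stops at next Tasklet API checkpoint"
    raise ValueError(f"unknown run mode: {run_mode!r}")


for mode in ("local", "local iso", "agent", "mesos iso"):
    print(f"{mode}: {kill_behavior(mode)}")
```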
You can select additional fields to display by using the "Set Display Fields" button. You can add the following fields:
Note that any field that ends with a ".." will have a tooltip associated with it that provides additional information. If you hover the mouse over the label you will see the tooltip.
Queued jobs can be deleted by selecting them and then clicking the "Delete Queued" button.
Running jobs can be killed by selecting them and then clicking the "Kill Running" button. Jobs that are running in their own JVM are faster and easier to kill and are usually terminated immediately upon the kill action. Jobs running in the shared JVM can only be effectively killed when they reach certain checkpoints, such as transitioning from one Tasklet to another or calling the logging API or certain other API calls, so please give them time to be terminated.
This is a simple message showing the state of the overall JobServer engine, such as "Running" or "Idle". "Idle", for example, means that the job scheduler is not running and no jobs can be scheduled or processed. However, if you are using remote Agents, jobs may still be running on those Agents.
Note that Mesos-related features are only available for JobServer server installations running on Linux.
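The checkpoint behavior for shared-JVM jobs is a form of cooperative cancellation: the kill request only sets a flag, and the job notices it the next time it reaches a checkpoint. The sketch below illustrates the general pattern in Python. It is not the JobServer Tasklet API (which is Java); `run_job`, the tasklet tuples, and the flag are all hypothetical.

```python
import threading


def run_job(tasklets, kill_flag):
    """Run tasklets in order, checking for a kill request between them."""
    completed = []
    for name, work in tasklets:
        # Checkpoint: the transition from one tasklet to the next.
        if kill_flag.is_set():
            return completed, "killed"
        work()  # a running tasklet is never interrupted midway
        completed.append(name)
    return completed, "normal"


kill_flag = threading.Event()
tasklets = [
    # The first tasklet sets the flag to simulate a user clicking
    # "Kill Running" while that tasklet is still executing.
    ("extract", kill_flag.set),
    ("load", lambda: None),
]
print(run_job(tasklets, kill_flag))  # (['extract'], 'killed')
```

The first tasklet still runs to completion even though the kill arrived while it was executing, which is why a kill on a shared-JVM job can take some time to be honored.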