- 0e96028 [autotest] Consolidate methods required to setup a scheduler. by Prashanth B · 11 years ago
- 372613d [autotest] Sanity check host assignments. by Prashanth B · 11 years ago
- cc9fc70 [autotest] RDB Refactor II + Request/Response API. by beeps · 11 years ago
- 76af802 [autotest] abort Starting suite job leads to scheduler crash by Dan Shi · 11 years ago
- 7d8273b [autotest] RDB refactor I by beeps · 11 years ago
- 5e2bb4a [autotest] Scheduler refactor. by beeps · 11 years ago
- 7d8a1b1 [autotest] De-prioritize hostless hqes in favor of tests. by beeps · 11 years ago
- d0e09ab [autotest] Fix SelfThrottledTask._num_running_processes when suite job is aborted by Dan Shi · 11 years ago
- 1f23b69 [autotest] reenable django or simplejson requiring unit tests by Aviv Keshet · 12 years ago
- 7282202 [autotest] Do not write queue or .machines files. by Alex Miller · 12 years ago
- aa51336 Host scheduler refactoring. Move HostScheduler out of monitor_db. by Dale Curtis · 14 years ago
- dd77e01 by jamesren · 15 years ago
- 76fcf19 Add ability to associate drone sets with jobs. This restricts a job to by jamesren · 15 years ago
- 47bd737 Set hostless queue entries to STARTING upon scheduling the agent. This by jamesren · 15 years ago
- c44ae99 Refactor scheduler models into a separate module, scheduler_models. This module doesn't depend on monitor_db, only the other way around. The separation and isolation of dependencies should help us organize the scheduler code a bit better. by jamesren · 15 years ago
- 883492a First iteration of pluggable metahost handlers. This change adds the basic framework and moves the default, label-based metahost assignment code into a handler. It includes some refactorings to the basic scheduling code to make things a bit cleaner. by jamesren · 15 years ago
- 64a9595 When using Django models from a script, make the current user default to an actual database user named "autotest_system". This allows for simpler, more consistent code. by showard · 15 years ago
- 78f5b01 Update to Django 1.1.1. I want to use a new feature for my RESTful interface prototyping (direct inclusion of URL patterns in URLconfs). by showard · 15 years ago
- eab66ce Rename the tables in the databases, by prefixing the app name. This is by showard · 15 years ago
- f13a9e2 Add periodic CPython garbage collector statistics logging to aid in by showard · 15 years ago
- f65b740 Fix a rather brittle scheduler unit test by showard · 15 years ago
- d119565 Make drone_manager track running processes counts using only the information passed in from the scheduler. Currently it also uses process counts derived from "ps", but that is an unreliable source of information. This improves accuracy and consistency and gives us full control over the process. by showard · 15 years ago
- d07a5f3 The check for enough pending hosts after the delay to wait for others to by showard · 15 years ago
- 418785b Some improvements to process tracking in the scheduler. by showard · 15 years ago
- 9bb960b Support restricting access to drones by user. Administrators can put lines like by showard · 15 years ago
- e60e44e Special tasks show "Failed" as their status instead of "Completed" if by showard · 15 years ago
- 7ca9e01 Remove the synch_job_start_timeout_minutes scheduler "feature" as it is by showard · 15 years ago
- 8375ce0 Fix unindexable object error raised on the error path within by showard · 15 years ago
- d201482 When a delayed call task finishes waiting for extra hosts to enter by showard · 15 years ago
- dae680a Ignore microsecond differences in datetimes when checking existing in by showard · 15 years ago
- ec6a3b9 Make the pidfile timeout in the scheduler configurable. Raise the by showard · 15 years ago
- db50276 Write host keyvals for all verify/cleanup/repair tasks. by showard · 15 years ago
- 8cc058f Make scheduler more stateless. Agents are now scheduled only by the by showard · 15 years ago
- cdaeae8 Fixed bug where scheduler would crash if the autoserv process is lost by showard · 15 years ago
- 6631273 Make a bunch of stuff executable by mbligh · 15 years ago
- 58721a8 One-off fix to address the issue where a scheduler shutdown immediately by showard · 15 years ago
- 6d1c143 Fix scheduler's handling of jobs when the PID file can't be found. by showard · 15 years ago
- 708b352 Do not go through a DelayedCallTask on atomic group jobs when all Hosts by showard · 15 years ago
- 1ef218d This is the result of a batch reindent.py across our tree. by mbligh · 15 years ago
- a5288b4 Upgrade from Django 0.96 to Django 1.0.2. by showard · 15 years ago
- a640b2d Fix scheduler bug with aborting a pre-job task. Scheduler was by showard · 15 years ago
- 8ac6f2a When a SpecialAgentTask is passed an existing SpecialTask, set the _working_directory upon object construction. It was previously set in prolog(), but recovery agents don't run prolog, but they still need _working_directory sometimes (i.e. when a RepairTask fails). by showard · 15 years ago
- 381341a Enter the mock objects created in AgentTasksTest of monitor_db_unittest by showard · 15 years ago
- cfd4a7e With the new SpecialTask recovery code, a RepairTask can be passed a queue entry that was previously requeued. So make sure the task leaves the HQE alone in that case. by showard · 15 years ago
- b6681aa SpecialAgentTasks can be aborted if they're tied to a job that gets aborted while they're active. In that case, we still need to update the SpecialTask entry to mark it as complete. by showard · 15 years ago
- ed2afea make SpecialTasks recoverable. this involves quite a few changes. by showard · 15 years ago
- 6157c63 Make the scheduler robust to finding a HostQueueEntry with more than one by showard · 15 years ago
- 2fe3f1d Enter all Verify/Cleanup/Repair tasks into the special_tasks table. Also by showard · 15 years ago
- e7d9c60 Make the job executiontag available in both the server and client side job by mbligh · 15 years ago
- e9c6936 Pass --verbose flag for verify/repair/cleanup. Since we currently log these via piped console output, we want verbose output. by showard · 15 years ago
- b562645 ensure hosts get cleaned up even in the rare but possible case that a QueueTask finds no process at all by showard · 15 years ago
- 2924b0a Ensure one-time-hosts aren't in the Everyone ACL, and make the scheduler ignore this. by showard · 15 years ago
- af8b4ca Fix _atomic_and_has_started() to check *only* for states that are a by showard · 15 years ago
- 7718256 Have the scheduler wait a configurable amount of time before starting by showard · 15 years ago
- 184a5e8 make AgentTasksTest inherit from BaseSchedulerTest. it didn't used to, since it didn't have any DB dependencies, but the recent introduction of SpecialTasks has changed that, so we need AgentTasksTest to setup the DB now like everything else. It doesn't increase the unit test runtime too drastically. by showard · 15 years ago
- b6d1662 fix JobManager.get_status_counts, which was returning incorrect counts in some cases when jobs were aborted. the problem was that it's possible for a complete entry to have aborted set or not and have the same full status, which was violating an assumption of the method. by showard · 15 years ago
- 5add1c8 Make recovered tasks correctly handle being aborted before being started. Unlike other tasks, recovered tasks are effectively "started" as soon as they're created, since they're recovering a previously started task. So implement that properly so that when they're aborted, they do all the necessary killing and cleanup stuff. by showard · 15 years ago
- 54c1ea9 Sort hosts when choosing them for use in an atomic group and when by showard · 15 years ago
- ebc0fb7 Add an extra check for existence of Autoserv results in GatherLogsTask -- in certain recovery cases this can be false, previously leading to an exception. by showard · 16 years ago
- 12f3e32 Add job maximum runtime, a new per-job timeout that counts time since the job actually started. by showard · 16 years ago
- 2d7c8bd Fix scheduler unittest for parser's new -P flag by mbligh · 16 years ago
- a1e74b3 Add job option for whether or not to parse failed repair results as part of a job, with a default value in global_config. Since the number of options associated with a job is getting out of hand, I packaged them up into a dict in the RPC entry point and passed them around that way from then on. by showard · 16 years ago
- f1ae354 Represent a group of machines with either the atomic group label name, by showard · 16 years ago
- 597bfd3 Only run crashinfo collection when Autoserv exited due to some signal -- not just when it failed. Also make a minor fixup to some logging during process recovery. by showard · 16 years ago
- 08a3641 Change Agent.abort() again. This time, it runs through its queue of AgentTasks, aborting them until it reaches one that ignores the abort (or exhausts the queue). With the previous logic, we might have an Agent with a GatherLogsTasks that should ignore the abort, but if the Agent got aborted before starting it would never run the task. I hope I've really got it right this time. by showard · 16 years ago
- 0bbfc21 Make autoserv --collect_crashinfo only run when Autoserv actually failed (exit status nonzero) or was aborted. I was being lazy and always running it, but it seems that introduced very annoying latency into job runs. by showard · 16 years ago
- 20f9bdd fix Agent.abort() when it's called before the agent has started (in that case, it should do nothing -- but the logic was making it basically ignore the abort). this should fix jobs being aborted in the "starting" phase (a phase that lasts one cycle before "running" starts). by showard · 16 years ago
- d920518 Make RepairTask write job_queued and job_finished keyvals so they can be parsed into TKO when failed repair results are parsed. by showard · 16 years ago
- 6b73341 Fix two bugs introduced in previous change to add collect_crashinfo support. by showard · 16 years ago
- d3dc199 Add support to the scheduler to run autoserv --collect_crashinfo after a job finishes or is aborted. by showard · 16 years ago
- 915958d Fix monitor_db_unittest, broken by previous change to refactor cleanup code. Two main things here: by showard · 16 years ago
- 87ba02a extract code for generated autoserv command lines to a common place, including support for -l and -u params, and make verify, repair and cleanup tasks pass those params. this should make failed repairs include the right user and job name when parsed into tko. by showard · 16 years ago
- 76e29d1 Fix monitor_db.DBObject.save() to handle None values as NULL properly. by showard · 16 years ago
- 205fd60 by showard · 16 years ago
- ccbd6c5 Ensure RepairTasks aren't associated with the queue entries that spawned them, so that if the QE is aborted during repair the repair task will continue running (and just leave the QE alone from then on). by showard · 16 years ago
- 89f84db by showard · 16 years ago
- a3c5857 a) Reduce the number of instances of DBObject classes created for the same row by showard · 16 years ago
- 35162b0 by showard · 16 years ago
- 25cbdbd by showard · 16 years ago
- d9ac445 by showard · 16 years ago
- 678df4f by showard · 16 years ago
- 8bcd23a Move all MySQLdb imports after the 'import common' so that a MySQLdb by mbligh · 16 years ago
- de634ee by showard · 16 years ago
- c9ae178 by showard · 16 years ago
- ade14e2 by showard · 16 years ago
- 324bf81 by showard · 16 years ago
- 2fa5169 by showard · 16 years ago
- d1ee1dd * move some scheduler config options into a separate module, scheduler_config by showard · 16 years ago
- 170873e Attached is a very large patch that adds support for running a by showard · 16 years ago
- e58e3f8 Set HQEs to "Verifying" instead of "Starting" when we're about to run verify on them. We need to set them to an active status, but if we use "Starting" then we can't tell which stage they're in, and we need that information to know when to "stop" synchronous jobs. by showard · 16 years ago
- 8fe93b5 Make CleanupTask copy results to job dir on failure. Did this by extracting code from VerifyTask into a common superclass. by showard · 16 years ago
- e788ea6 -make get_group_entries() return a list instead of a generator, since all callers want it that way anyway by showard · 16 years ago
- e77ac67 Set queue entries to "Starting" when the VerifyTask is created for them. This perennial source of problems cropped up again in the latest change to the job.run() code (as part of the synch_count changes). by showard · 16 years ago
- 2bab8f4 Implement sync_count. The primary change here is replacing the job.synch_type field with a synch_count field. There is no longer just a distinction between synchronous and asynchronous jobs. Instead, every job has a synch_count, with synch_count = 1 corresponding to the old concept of synchronous jobs. This required: by showard · 16 years ago
- 9d9ffd5 don't reboot hosts when aborting inactive jobs. by showard · 16 years ago
- 45ae819 Add a formal cleanup phase to the scheduler flow. by showard · 16 years ago
- fa8629c -ensure Django connection is autocommit enabled, when used from monitor_db by showard · 16 years ago
- 97aed50 Rewrite final reparse code in scheduler. the final reparse is now handled by a separate AgentTask, and there's a "Parsing" status for queue entries. This is a cleaner implementation that allows us to still implement parse throttling with ease and get proper recovery of reparses after a system crash fairly easily. by showard · 16 years ago
- 9886397 Add job start timeout for synchronous jobs. This timeout applies to synchronous jobs that are holding a public pool machine (i.e. in the Everyone ACL) as "Pending". This includes a new global config option, scheduler code to enforce the timeout and a unit test. by showard · 16 years ago
- 3dd6b88 Two simple scheduler fixes: by showard · 16 years ago