1. 0e96028 [autotest] Consolidate methods required to set up a scheduler. by Prashanth B · 11 years ago
  2. 372613d [autotest] Sanity check host assignments. by Prashanth B · 11 years ago
  3. cc9fc70 [autotest] RDB Refactor II + Request/Response API. by beeps · 11 years ago
  4. 76af802 [autotest] abort Starting suite job leads to scheduler crash by Dan Shi · 11 years ago
  5. 7d8273b [autotest] RDB refactor I by beeps · 11 years ago
  6. 5e2bb4a [autotest] Scheduler refactor. by beeps · 11 years ago
  7. 7d8a1b1 [autotest] De-prioritize hostless hqes in favor of tests. by beeps · 11 years ago
  8. d0e09ab [autotest] Fix SelfThrottledTask._num_running_processes when suite job is aborted by Dan Shi · 11 years ago
  9. 1f23b69 [autotest] reenable django or simplejson requiring unit tests by Aviv Keshet · 12 years ago
  10. 7282202 [autotest] Do not write queue or .machines files. by Alex Miller · 12 years ago
  11. aa51336 Host scheduler refactoring. Move HostScheduler out of monitor_db. by Dale Curtis · 14 years ago
  12. dd77e01 by jamesren · 15 years ago
  13. 76fcf19 Add ability to associate drone sets with jobs. This restricts a job to by jamesren · 15 years ago
  14. 47bd737 Set hostless queue entries to STARTING upon scheduling the agent. This by jamesren · 15 years ago
  15. c44ae99 Refactor scheduler models into a separate module, scheduler_models. This module doesn't depend on monitor_db, only the other way around. The separation and isolation of dependencies should help us organize the scheduler code a bit better. by jamesren · 15 years ago
  16. 883492a First iteration of pluggable metahost handlers. This change adds the basic framework and moves the default, label-based metahost assignment code into a handler. It includes some refactorings to the basic scheduling code to make things a bit cleaner. by jamesren · 15 years ago
  17. 64a9595 When using Django models from a script, make the current user default to an actual database user named "autotest_system". This allows for simpler, more consistent code. by showard · 15 years ago
  18. 78f5b01 Update to Django 1.1.1. I want to use a new feature for my RESTful interface prototyping (direct inclusion of URL patterns in URLconfs). by showard · 15 years ago
  19. eab66ce Rename the tables in the databases, by prefixing the app name. This is by showard · 15 years ago
  20. f13a9e2 Add periodic CPython garbage collector statistics logging to aid in by showard · 15 years ago
  21. f65b740 Fix a rather brittle scheduler unit test by showard · 15 years ago
  22. d119565 Make drone_manager track running processes counts using only the information passed in from the scheduler. Currently it also uses process counts derived from "ps", but that is an unreliable source of information. This improves accuracy and consistency and gives us full control over the process. by showard · 15 years ago
  23. d07a5f3 The check for enough pending hosts after the delay to wait for others to by showard · 15 years ago
  24. 418785b Some improvements to process tracking in the scheduler. by showard · 15 years ago
  25. 9bb960b Support restricting access to drones by user. Administrators can put lines like by showard · 15 years ago
  26. e60e44e Special tasks show "Failed" as their status instead of "Completed" if by showard · 15 years ago
  27. 7ca9e01 Remove the synch_job_start_timeout_minutes scheduler "feature" as it is by showard · 15 years ago
  28. 8375ce0 Fix unindexable object error raised on the error path within by showard · 15 years ago
  29. d201482 When a delayed call task finishes waiting for extra hosts to enter by showard · 15 years ago
  30. dae680a Ignore microsecond differences in datetimes when checking existing in by showard · 15 years ago
  31. ec6a3b9 Make the pidfile timeout in the scheduler configurable. Raise the by showard · 15 years ago
  32. db50276 Write host keyvals for all verify/cleanup/repair tasks. by showard · 15 years ago
  33. 8cc058f Make scheduler more stateless. Agents are now scheduled only by the by showard · 15 years ago
  34. cdaeae8 Fixed bug where scheduler would crash if the autoserv process is lost by showard · 15 years ago
  35. 6631273 Make a bunch of stuff executable by mbligh · 15 years ago
  36. 58721a8 One-off fix to address the issue where a scheduler shutdown immediately by showard · 15 years ago
  37. 6d1c143 Fix scheduler's handling of jobs when the PID file can't be found. by showard · 15 years ago
  38. 708b352 Do not go through a DelayedCallTask on atomic group jobs when all Hosts by showard · 15 years ago
  39. 1ef218d This is the result of a batch reindent.py across our tree. by mbligh · 15 years ago
  40. a5288b4 Upgrade from Django 0.96 to Django 1.0.2. by showard · 15 years ago
  41. a640b2d Fix scheduler bug with aborting a pre-job task. Scheduler was by showard · 15 years ago
  42. 8ac6f2a When a SpecialAgentTask is passed an existing SpecialTask, set the _working_directory upon object construction. It was previously set in prolog(), but recovery agents don't run prolog, but they still need _working_directory sometimes (i.e. when a RepairTask fails). by showard · 15 years ago
  43. 381341a Enter the mock objects created in AgentTasksTest of monitor_db_unittest by showard · 15 years ago
  44. cfd4a7e With the new SpecialTask recovery code, a RepairTask can be passed a queue entry that was previously requeued. So make sure the task leaves the HQE alone in that case. by showard · 15 years ago
  45. b6681aa SpecialAgentTasks can be aborted if they're tied to a job that gets aborted while they're active. In that case, we still need to update the SpecialTask entry to mark it as complete. by showard · 15 years ago
  46. ed2afea make SpecialTasks recoverable. this involves quite a few changes. by showard · 15 years ago
  47. 6157c63 Make the scheduler robust to finding a HostQueueEntry with more than one by showard · 15 years ago
  48. 2fe3f1d Enter all Verify/Cleanup/Repair tasks into the special_tasks table. Also by showard · 15 years ago
  49. e7d9c60 Make the job executiontag available in both the server and client side job by mbligh · 15 years ago
  50. e9c6936 Pass --verbose flag for verify/repair/cleanup. Since we currently log these via piped console output, we want verbose output. by showard · 15 years ago
  51. b562645 ensure hosts get cleaned up even in the rare but possible case that a QueueTask finds no process at all by showard · 15 years ago
  52. 2924b0a Ensure one-time-hosts aren't in the Everyone ACL, and make the scheduler ignore this. by showard · 15 years ago
  53. af8b4ca Fix _atomic_and_has_started() to check *only* for states that are a by showard · 15 years ago
  54. 7718256 Have the scheduler wait a configurable amount of time before starting by showard · 15 years ago
  55. 184a5e8 make AgentTasksTest inherit from BaseSchedulerTest. it didn't use to, since it didn't have any DB dependencies, but the recent introduction of SpecialTasks has changed that, so we need AgentTasksTest to set up the DB now like everything else. It doesn't increase the unit test runtime too drastically. by showard · 15 years ago
  56. b6d1662 fix JobManager.get_status_counts, which was returning incorrect counts in some cases when jobs were aborted. the problem was that it's possible for a complete entry to have aborted set or not and have the same full status, which was violating an assumption of the method. by showard · 15 years ago
  57. 5add1c8 Make recovered tasks correctly handle being aborted before being started. Unlike other tasks, recovered tasks are effectively "started" as soon as they're created, since they're recovering a previously started task. So implement that properly so that when they're aborted, they do all the necessary killing and cleanup stuff. by showard · 15 years ago
  58. 54c1ea9 Sort hosts when choosing them for use in an atomic group and when by showard · 15 years ago
  59. ebc0fb7 Add an extra check for existence of Autoserv results in GatherLogsTask -- in certain recovery cases this can be false, previously leading to an exception. by showard · 16 years ago
  60. 12f3e32 Add job maximum runtime, a new per-job timeout that counts time since the job actually started. by showard · 16 years ago
  61. 2d7c8bd Fix scheduler unittest for parser's new -P flag by mbligh · 16 years ago
  62. a1e74b3 Add job option for whether or not to parse failed repair results as part of a job, with a default value in global_config. Since the number of options associated with a job is getting out of hand, I packaged them up into a dict in the RPC entry point and passed them around that way from then on. by showard · 16 years ago
  63. f1ae354 Represent a group of machines with either the atomic group label name, by showard · 16 years ago
  64. 597bfd3 Only run crashinfo collection when Autoserv exited due to some signal -- not just when it failed. Also make a minor fixup to some logging during process recovery. by showard · 16 years ago
  65. 08a3641 Change Agent.abort() again. This time, it runs through its queue of AgentTasks, aborting them until it reaches one that ignores the abort (or exhausts the queue). With the previous logic, we might have an Agent with a GatherLogsTasks that should ignore the abort, but if the Agent got aborted before starting it would never run the task. I hope I've really got it right this time. by showard · 16 years ago
  66. 0bbfc21 Make autoserv --collect_crashinfo only run when Autoserv actually failed (exit status nonzero) or was aborted. I was being lazy and always running it, but it seems that introduced very annoying latency into job runs. by showard · 16 years ago
  67. 20f9bdd fix Agent.abort() when it's called before the agent has started (in that case, it should do nothing -- but the logic was making it basically ignore the abort). this should fix jobs being aborted in the "starting" phase (a phase that lasts one cycle before "running" starts). by showard · 16 years ago
  68. d920518 Make RepairTask write job_queued and job_finished keyvals so they can be parsed into TKO when failed repair results are parsed. by showard · 16 years ago
  69. 6b73341 Fix two bugs introduced in previous change to add collect_crashinfo support. by showard · 16 years ago
  70. d3dc199 Add support to the scheduler to run autoserv --collect_crashinfo after a job finishes or is aborted. by showard · 16 years ago
  71. 915958d Fix monitor_db_unittest, broken by previous change to refactor cleanup code. Two main things here: by showard · 16 years ago
  72. 87ba02a extract code for generated autoserv command lines to a common place, including support for -l and -u params, and make verify, repair and cleanup tasks pass those params. this should make failed repairs include the right user and job name when parsed into tko. by showard · 16 years ago
  73. 76e29d1 Fix monitor_db.DBObject.save() to handle None values as NULL properly. by showard · 16 years ago
  74. 205fd60 by showard · 16 years ago
  75. ccbd6c5 Ensure RepairTasks aren't associated with the queue entries that spawned them, so that if the QE is aborted during repair the repair task will continue running (and just leave the QE alone from then on). by showard · 16 years ago
  76. 89f84db by showard · 16 years ago
  77. a3c5857 a) Reduce the number of instances of DBObject classes created for the same row by showard · 16 years ago
  78. 35162b0 by showard · 16 years ago
  79. 25cbdbd by showard · 16 years ago
  80. d9ac445 by showard · 16 years ago
  81. 678df4f by showard · 16 years ago
  82. 8bcd23a Move all MySQLdb imports after the 'import common' so that a MySQLdb by mbligh · 16 years ago
  83. de634ee by showard · 16 years ago
  84. c9ae178 by showard · 16 years ago
  85. ade14e2 by showard · 16 years ago
  86. 324bf81 by showard · 16 years ago
  87. 2fa5169 by showard · 16 years ago
  88. d1ee1dd * move some scheduler config options into a separate module, scheduler_config by showard · 16 years ago
  89. 170873e Attached is a very large patch that adds support for running a by showard · 16 years ago
  90. e58e3f8 Set HQEs to "Verifying" instead of "Starting" when we're about to run verify on them. We need to set them to an active status, but if we use "Starting" then we can't tell which stage they're in, and we need that information to know when to "stop" synchronous jobs. by showard · 16 years ago
  91. 8fe93b5 Make CleanupTask copy results to job dir on failure. Did this by extracting code from VerifyTask into a common superclass. by showard · 16 years ago
  92. e788ea6 -make get_group_entries() return a list instead of a generator, since all callers want it that way anyway by showard · 16 years ago
  93. e77ac67 Set queue entries to "Starting" when the VerifyTask is created for them. This perennial source of problems cropped up again in the latest change to the job.run() code (as part of the synch_count changes). by showard · 16 years ago
  94. 2bab8f4 Implement sync_count. The primary change here is replacing the job.synch_type field with a synch_count field. There is no longer just a distinction between synchronous and asynchronous jobs. Instead, every job has a synch_count, with synch_count = 1 corresponding to the old concept of synchronous jobs. This required: by showard · 16 years ago
  95. 9d9ffd5 don't reboot hosts when aborting inactive jobs. by showard · 16 years ago
  96. 45ae819 Add a formal cleanup phase to the scheduler flow. by showard · 16 years ago
  97. fa8629c -ensure Django connection is autocommit enabled, when used from monitor_db by showard · 16 years ago
  98. 97aed50 Rewrite final reparse code in scheduler. the final reparse is now handled by a separate AgentTask, and there's a "Parsing" status for queue entries. This is a cleaner implementation that allows us to still implement parse throttling with ease and get proper recovery of reparses after a system crash fairly easily. by showard · 16 years ago
  99. 9886397 Add job start timeout for synchronous jobs. This timeout applies to synchronous jobs that are holding a public pool machine (i.e. in the Everyone ACL) as "Pending". This includes a new global config option, scheduler code to enforce the timeout and a unit test. by showard · 16 years ago
  100. 3dd6b88 Two simple scheduler fixes: by showard · 16 years ago