1. 2924b0a Ensure one-time-hosts aren't in the Everyone ACL, and make the scheduler ignore this. by showard · 15 years ago
  2. e39ebe9 temporary fix for bug in scheduling when at capacity. if no drone has capacity, pick the one with the least load. by showard · 15 years ago
  3. cbe6f94 add a log message to the scheduler that's useful for debugging atomic groups by showard · 15 years ago
  4. af8b4ca Fix _atomic_and_has_started() to check *only* for states that are a by showard · 15 years ago
  5. 08356c1 Do not call .set_host if the host is already set. by showard · 15 years ago
  6. 043c62a Ensure all entry points get the import-time logging logic executed before other autotest imports. by showard · 15 years ago
  7. 136e6dc Make scheduler and babysitter use the new logging_manager system. by showard · 15 years ago
  8. 6d7b2ff Redesign the reverify hosts feature. Host status is no longer changed by showard · 15 years ago
  9. 7718256 Have the scheduler wait a configurable amount of time before starting by showard · 15 years ago
  10. f098ebd convert a few straggling print statements in the scheduler code to logging calls by showard · 15 years ago
  11. 5613c66 Add an option to global config to disable the scheduler so it isn't accidentally started on drones. by showard · 15 years ago
  12. 5debf85 Add logging info for drones so we know what drone drone_utility is running on. This will help identify slow drones and also keep track of where we are spending time. by showard · 15 years ago
  13. a64e52a Change behavior of Force Reverify: no longer executes cleanup before. by showard · 15 years ago
  14. 01a5167 Have the scheduler check for and sometimes clean up various DB inconsistencies. by showard · 15 years ago
  15. 184a5e8 make AgentTasksTest inherit from BaseSchedulerTest. it didn't use to, since it didn't have any DB dependencies, but the recent introduction of SpecialTasks has changed that, so we need AgentTasksTest to set up the DB now like everything else. It doesn't increase the unit test runtime too drastically. by showard · 15 years ago
  16. 844960a make the readonly connection fall back to the regular Django connection when running in the scheduler. this is really important, because otherwise the readonly connection is not autocommit and bad, bad things could happen, though i'm not sure exactly what existing problems there might have been. we used to do this only for testing, but since we do it in another context here, i renamed the method to be more generic and appropriate. by showard · 15 years ago
  17. b6d1662 fix JobManager.get_status_counts, which was returning incorrect counts in some cases when jobs were aborted. the problem was that it's possible for a complete entry to have aborted set or not and have the same full status, which was violating an assumption of the method. by showard · 15 years ago
  18. 5add1c8 Make recovered tasks correctly handle being aborted before being started. Unlike other tasks, recovered tasks are effectively "started" as soon as they're created, since they're recovering a previously started task. So implement that properly so that when they're aborted, they do all the necessary killing and cleanup stuff. by showard · 15 years ago
  19. 29caa4b Explicitly catch SystemExit so we don't stack trace when we exit with sys.exit by showard · 15 years ago
  20. 54c1ea9 Sort hosts when choosing them for use in an atomic group and when by showard · 15 years ago
  21. 1ff7b2e Add ability to reverify a host from the Host List. by showard · 15 years ago
  22. 83d41dd Update debug_scheduler logging config to use INFO instead of debug. by showard · 15 years ago
  23. a9435c0 Fix recurring run code to reflect recent changes to rpc_utils.create_new_job(). by showard · 15 years ago
  24. ebc0fb7 Add an extra check for existence of Autoserv results in GatherLogsTask -- in certain recovery cases this can be false, previously leading to an exception. by showard · 15 years ago
  25. 12f3e32 Add job maximum runtime, a new per-job timeout that counts time since the job actually started. by showard · 15 years ago
  26. 2d7c8bd Fix scheduler unittest for parser's new -P flag by mbligh · 15 years ago
  27. 9e93640 Add post-parse site hooks (parse -P to trigger, default = off) by mbligh · 15 years ago
  28. a1e74b3 Add job option for whether or not to parse failed repair results as part of a job, with a default value in global_config. Since the number of options associated with a job is getting out of hand, I packaged them up into a dict in the RPC entry point and passed them around that way from then on. by showard · 15 years ago
  29. f1ae354 Represent a group of machines with either the atomic group label name, by showard · 15 years ago
  30. 597bfd3 Only run crashinfo collection when Autoserv exited due to some signal -- not just when it failed. Also make a minor fixup to some logging during process recovery. by showard · 15 years ago
  31. ef51921 Pick hosts out of an atomic group in order rather than randomly so that by showard · 15 years ago
  32. 08a3641 Change Agent.abort() again. This time, it runs through its queue of AgentTasks, aborting them until it reaches one that ignores the abort (or exhausts the queue). With the previous logic, we might have an Agent with a GatherLogsTask that should ignore the abort, but if the Agent got aborted before starting it would never run the task. I hope I've really got it right this time. by showard · 15 years ago
  33. 83c1e9e Call out to site_monitor_db: site_init_monitor_db by mbligh · 15 years ago
  34. 29f7cd2 Here is a patch, which extends the autotest system with recurring job by showard · 15 years ago
  35. 0bbfc21 Make autoserv --collect_crashinfo only run when Autoserv actually failed (exit status nonzero) or was aborted. I was being lazy and always running it, but it seems that introduced very annoying latency into job runs. by showard · 15 years ago
  36. 20f9bdd fix Agent.abort() when it's called before the agent has started (in that case, it should do nothing -- but the logic was making it basically ignore the abort). this should fix jobs being aborted in the "starting" phase (a phase that lasts one cycle before "running" starts). by showard · 15 years ago
  37. b82b1f2 Make a couple of errant files executable by mbligh · 15 years ago
  38. d920518 Make RepairTask write job_queued and job_finished keyvals so they can be parsed into TKO when failed repair results are parsed. by showard · 15 years ago
  39. 6b73341 Fix two bugs introduced in previous change to add collect_crashinfo support. by showard · 15 years ago
  40. c84a950 Change the client, babysitter, scheduler logging configs to append to by jadmanski · 15 years ago
  41. d3dc199 Add support to the scheduler to run autoserv --collect_crashinfo after a job finishes or is aborted. by showard · 16 years ago
  42. 915958d Fix monitor_db_unittest, broken by previous change to refactor cleanup code. Two main things here: by showard · 16 years ago
  43. 87ba02a extract code for generated autoserv command lines to a common place, including support for -l and -u params, and make verify, repair and cleanup tasks pass those params. this should make failed repairs include the right user and job name when parsed into tko. by showard · 16 years ago
  44. 701f626 Add information collecting method so we can see what state the system was in when by showard · 16 years ago
  45. aa085e9 Change connect_timeout default from 30 seconds to 5 minutes by showard · 16 years ago
  46. 76e29d1 Fix monitor_db.DBObject.save() to handle None values as NULL properly. by showard · 16 years ago
  47. 159edc0 This gives us fixed width, and saves some space. by mbligh · 16 years ago
  48. dc41731 New prefix for file logging by mbligh · 16 years ago
  49. f3294cc Move clean up functions into separate file/classes by mbligh · 16 years ago
  50. 27f3387 Ensure exception information from monitor_db goes to logs. by showard · 16 years ago
  51. 50e463b Add a check for AUTOTEST_SCHEDULER_LOG_DIR by showard · 16 years ago
  52. f2839f6 Change killing %d to %s by showard · 16 years ago
  53. c9895aa Move monitor_db_babysitter to using utils.run to start monitor_db with environment variable for monitor_db's logs. by mbligh · 16 years ago
  54. fb67603 Add write_pid to common code. Call write_pid from scheduler and babysitter. by mbligh · 16 years ago
  55. 7629f14 by showard · 16 years ago
  56. 205fd60 by showard · 16 years ago
  57. ccbd6c5 Ensure RepairTasks aren't associated with the queue entries that spawned them, so that if the QE is aborted during repair the repair task will continue running (and just leave the QE alone from then on). by showard · 16 years ago
  58. b18134f As discussed on the mailing list, we implemented logging with a single by showard · 16 years ago
  59. 89f84db by showard · 16 years ago
  60. cca334f by showard · 16 years ago
  61. a3c5857 a) Reduce the number of instances of DBObject classes created for the same row by showard · 16 years ago
  62. 35162b0 by showard · 16 years ago
  63. de700d3 by showard · 16 years ago
  64. 6ae5ea9 by showard · 16 years ago
  65. 25cbdbd by showard · 16 years ago
  66. a5cb406 by mbligh · 16 years ago
  67. a038235 by showard · 16 years ago
  68. 73ec044 by showard · 16 years ago
  69. d9ac445 by showard · 16 years ago
  70. 678df4f by showard · 16 years ago
  71. 8bcd23a Move all MySQLdb imports after the 'import common' so that a MySQLdb by mbligh · 16 years ago
  72. 6bb7c29 by showard · 16 years ago
  73. de634ee by showard · 16 years ago
  74. c9ae178 by showard · 16 years ago
  75. 6adf837 Fail quickly if we are accidentally started as root by mbligh · 16 years ago
  76. ade14e2 by showard · 16 years ago
  77. 324bf81 by showard · 16 years ago
  78. 67831ae by showard · 16 years ago
  79. 78d4d97 by showard · 16 years ago
  80. 0205a3e by showard · 16 years ago
  81. 2fa5169 by showard · 16 years ago
  82. 4fd61be by showard · 16 years ago
  83. c5afc46 by showard · 16 years ago
  84. c408c5e by showard · 16 years ago
  85. 55b4b54 by showard · 16 years ago
  86. 4f9e537 by showard · 16 years ago
  87. d1ee1dd * move some scheduler config options into a separate module, scheduler_config by showard · 16 years ago
  88. 170873e Attached is a very large patch that adds support for running a by showard · 16 years ago
  89. 37eceaa Add entries to the config file to control which server is used rather by mbligh · 16 years ago
  90. 6355f6b by showard · 16 years ago
  91. ac9ce22 Only schedule jobs that are "Queued". Now that state "Parsing" is an active=complete=0 state, we need to explicitly check for this. by showard · 16 years ago
  92. ff059d7 Don't abort running entries from synch start timeout (only queued/starting/verifying/pending ones). by showard · 16 years ago
  93. d876f45 gps pointed out that "== and != work in most cases but it's better to use is by mbligh · 16 years ago
  94. c85c21b * allow scheduler email "from" address to be specified in global config by showard · 16 years ago
  95. e58e3f8 Set HQEs to "Verifying" instead of "Starting" when we're about to run verify on them. We need to set them to an active status, but if we use "Starting" then we can't tell which stage they're in, and we need that information to know when to "stop" synchronous jobs. by showard · 16 years ago
  96. cbd7461 When aborting a running job, write an INFO line to the status.log. by showard · 16 years ago
  97. 8fe93b5 Make CleanupTask copy results to job dir on failure. Did this by extracting code from VerifyTask into a common superclass. by showard · 16 years ago
  98. e788ea6 -make get_group_entries() return a list instead of a generator, since all callers want it that way anyway by showard · 16 years ago
  99. e77ac67 Set queue entries to "Starting" when the VerifyTask is created for them. This perennial source of problems cropped up again in the latest change to the job.run() code (as part of the synch_count changes). by showard · 16 years ago
  100. 2bab8f4 Implement synch_count. The primary change here is replacing the job.synch_type field with a synch_count field. There is no longer just a distinction between synchronous and asynchronous jobs. Instead, every job has a synch_count, with synch_count = 1 corresponding to the old concept of synchronous jobs. This required: by showard · 16 years ago