- 2924b0a Ensure one-time-hosts aren't in the Everyone ACL, and make the scheduler ignore this. by showard · 15 years ago
- e39ebe9 temporary fix for bug in scheduling when at capacity. if no drone has capacity, pick the one with the least load. by showard · 15 years ago
- cbe6f94 add a log message to the scheduler that's useful for debugging atomic groups by showard · 15 years ago
- af8b4ca Fix _atomic_and_has_started() to check *only* for states that are a by showard · 15 years ago
- 08356c1 Do not call .set_host if the host is already set. by showard · 15 years ago
- 043c62a Ensure all entry points get the import-time logging logic executed before other autotest imports. by showard · 15 years ago
- 136e6dc Make scheduler and babysitter use the new logging_manager system. by showard · 15 years ago
- 6d7b2ff Redesign the reverify hosts feature. Host status is no longer changed by showard · 15 years ago
- 7718256 Have the scheduler wait a configurable amount of time before starting by showard · 15 years ago
- f098ebd convert a few straggling print statements in the scheduler code to logging calls by showard · 15 years ago
- 5613c66 Add an option to global config to disable the scheduler so it isn't accidentally started on drones. by showard · 15 years ago
- 5debf85 Add logging info for drones so we know what drone drone_utility is running on. This will help identify slow drones and also keep track of where we are spending time. by showard · 15 years ago
- a64e52a Change behavior of Force Reverify: no longer executes cleanup before. by showard · 15 years ago
- 01a5167 Have the scheduler check for and sometimes clean up various DB inconsistencies. by showard · 15 years ago
- 184a5e8 make AgentTasksTest inherit from BaseSchedulerTest. it didn't used to, since it didn't have any DB dependencies, but the recent introduction of SpecialTasks has changed that, so we need AgentTasksTest to set up the DB now like everything else. It doesn't increase the unit test runtime too drastically. by showard · 15 years ago
- 844960a make the readonly connection fall back to the regular Django connection when running in the scheduler. this is really important, because otherwise the readonly connection is not autocommit and bad, bad things could happen, though i'm not sure exactly what existing problems there might have been. we used to do this only for testing, but since we do it in another context here, i renamed the method to be more generic and appropriate. by showard · 15 years ago
- b6d1662 fix JobManager.get_status_counts, which was returning incorrect counts in some cases when jobs were aborted. the problem was that it's possible for a complete entry to have aborted set or not and have the same full status, which was violating an assumption of the method. by showard · 15 years ago
- 5add1c8 Make recovered tasks correctly handle being aborted before being started. Unlike other tasks, recovered tasks are effectively "started" as soon as they're created, since they're recovering a previously started task. So implement that properly so that when they're aborted, they do all the necessary killing and cleanup stuff. by showard · 15 years ago
- 29caa4b Explicitly catch SystemExit so we don't stack trace when we exit with sys.exit by showard · 15 years ago
- 54c1ea9 Sort hosts when choosing them for use in an atomic group and when by showard · 15 years ago
- 1ff7b2e Add ability to reverify a host from the Host List. by showard · 15 years ago
- 83d41dd Update debug_scheduler logging config to use INFO instead of debug. by showard · 15 years ago
- a9435c0 Fix recurring run code to reflect recent changes to rpc_utils.create_new_job(). by showard · 15 years ago
- ebc0fb7 Add an extra check for existence of Autoserv results in GatherLogsTask -- in certain recovery cases this can be false, previously leading to an exception. by showard · 15 years ago
- 12f3e32 Add job maximum runtime, a new per-job timeout that counts time since the job actually started. by showard · 15 years ago
- 2d7c8bd Fix scheduler unittest for parser's new -P flag by mbligh · 15 years ago
- 9e93640 Add post-parse site hooks (parse -P to trigger, default = off) by mbligh · 15 years ago
- a1e74b3 Add job option for whether or not to parse failed repair results as part of a job, with a default value in global_config. Since the number of options associated with a job is getting out of hand, I packaged them up into a dict in the RPC entry point and passed them around that way from then on. by showard · 15 years ago
- f1ae354 Represent a group of machines with either the atomic group label name, by showard · 15 years ago
- 597bfd3 Only run crashinfo collection when Autoserv exited due to some signal -- not just when it failed. Also make a minor fixup to some logging during process recovery. by showard · 15 years ago
- ef51921 Pick hosts out of an atomic group in order rather than randomly so that by showard · 15 years ago
- 08a3641 Change Agent.abort() again. This time, it runs through its queue of AgentTasks, aborting them until it reaches one that ignores the abort (or exhausts the queue). With the previous logic, we might have an Agent with a GatherLogsTasks that should ignore the abort, but if the Agent got aborted before starting it would never run the task. I hope I've really got it right this time. by showard · 15 years ago
- 83c1e9e Call out to site_monitor_db: site_init_monitor_db by mbligh · 15 years ago
- 29f7cd2 Here is a patch, which extends the autotest system with recurring job by showard · 15 years ago
- 0bbfc21 Make autoserv --collect_crashinfo only run when Autoserv actually failed (exit status nonzero) or was aborted. I was being lazy and always running it, but it seems that introduced very annoying latency into job runs. by showard · 15 years ago
- 20f9bdd fix Agent.abort() when it's called before the agent has started (in that case, it should do nothing -- but the logic was making it basically ignore the abort). this should fix jobs being aborted in the "starting" phase (a phase that lasts one cycle before "running" starts). by showard · 15 years ago
- b82b1f2 Make a couple of errant files executable by mbligh · 15 years ago
- d920518 Make RepairTask write job_queued and job_finished keyvals so they can be parsed into TKO when failed repair results are parsed. by showard · 15 years ago
- 6b73341 Fix two bugs introduced in previous change to add collect_crashinfo support. by showard · 15 years ago
- c84a950 Change the client, babysitter, scheduler logging configs to append to by jadmanski · 15 years ago
- d3dc199 Add support to the scheduler to run autoserv --collect_crashinfo after a job finishes or is aborted. by showard · 16 years ago
- 915958d Fix monitor_db_unittest, broken by previous change to refactor cleanup code. Two main things here: by showard · 16 years ago
- 87ba02a extract code for generated autoserv command lines to a common place, including support for -l and -u params, and make verify, repair and cleanup tasks pass those params. this should make failed repairs include the right user and job name when parsed into tko. by showard · 16 years ago
- 701f626 Add information collecting method so we can see what state the system was in when by showard · 16 years ago
- aa085e9 Change connect_timeout default from 30 seconds to 5 minutes by showard · 16 years ago
- 76e29d1 Fix monitor_db.DBObject.save() to handle None values as NULL properly. by showard · 16 years ago
- 159edc0 This gives us fixed width, and saves some space. by mbligh · 16 years ago
- dc41731 New prefix for file logging by mbligh · 16 years ago
- f3294cc Move clean up functions into separate file/classes by mbligh · 16 years ago
- 27f3387 Ensure exception information from monitor_db goes to logs. by showard · 16 years ago
- 50e463b Add a check for AUTOTEST_SCHEDULER_LOG_DIR by showard · 16 years ago
- f2839f6 Change killing %d to %s by showard · 16 years ago
- c9895aa Move monitor_db_babysitter to using utils.run to start monitor_db with environment variable for monitor_db's logs. by mbligh · 16 years ago
- fb67603 Add write_pid to common code. Call write_pid from scheduler and babysitter by mbligh · 16 years ago
- 7629f14 by showard · 16 years ago
- 205fd60 by showard · 16 years ago
- ccbd6c5 Ensure RepairTasks aren't associated with the queue entries that spawned them, so that if the QE is aborted during repair the repair task will continue running (and just leave the QE alone from then on). by showard · 16 years ago
- b18134f As discussed on the mailing list, we implemented logging with a single by showard · 16 years ago
- 89f84db by showard · 16 years ago
- cca334f by showard · 16 years ago
- a3c5857 a) Reduce the number of instances of DBObject classes created for the same row by showard · 16 years ago
- 35162b0 by showard · 16 years ago
- de700d3 by showard · 16 years ago
- 6ae5ea9 by showard · 16 years ago
- 25cbdbd by showard · 16 years ago
- a5cb406 by mbligh · 16 years ago
- a038235 by showard · 16 years ago
- 73ec044 by showard · 16 years ago
- d9ac445 by showard · 16 years ago
- 678df4f by showard · 16 years ago
- 8bcd23a Move all MySQLdb imports after the 'import common' so that a MySQLdb by mbligh · 16 years ago
- 6bb7c29 by showard · 16 years ago
- de634ee by showard · 16 years ago
- c9ae178 by showard · 16 years ago
- 6adf837 Fail quickly if we are accidentally started as root by mbligh · 16 years ago
- ade14e2 by showard · 16 years ago
- 324bf81 by showard · 16 years ago
- 67831ae by showard · 16 years ago
- 78d4d97 by showard · 16 years ago
- 0205a3e by showard · 16 years ago
- 2fa5169 by showard · 16 years ago
- 4fd61be by showard · 16 years ago
- c5afc46 by showard · 16 years ago
- c408c5e by showard · 16 years ago
- 55b4b54 by showard · 16 years ago
- 4f9e537 by showard · 16 years ago
- d1ee1dd * move some scheduler config options into a separate module, scheduler_config by showard · 16 years ago
- 170873e Attached is a very large patch that adds support for running a by showard · 16 years ago
- 37eceaa Add entries to the config file to control which server is used rather by mbligh · 16 years ago
- 6355f6b by showard · 16 years ago
- ac9ce22 Only schedule jobs that are "Queued". Now that state "Parsing" is an active=complete=0 state, we need to explicitly check for this. by showard · 16 years ago
- ff059d7 Don't abort running entries from synch start timeout (only queued/starting/verifying/pending ones). by showard · 16 years ago
- d876f45 gps pointed out that "== and != work in most cases but its better to use is by mbligh · 16 years ago
- c85c21b * allow scheduler email "from" address to be specified in global config by showard · 16 years ago
- e58e3f8 Set HQEs to "Verifying" instead of "Starting" when we're about to run verify on them. We need to set them to an active status, but if we use "Starting" then we can't tell which stage they're in, and we need that information to know when to "stop" synchronous jobs. by showard · 16 years ago
- cbd7461 When aborting a running job, write an INFO line to the status.log. by showard · 16 years ago
- 8fe93b5 Make CleanupTask copy results to job dir on failure. Did this by extracting code from VerifyTask into a common superclass. by showard · 16 years ago
- e788ea6 -make get_group_entries() return a list instead of a generator, since all callers want it that way anyway by showard · 16 years ago
- e77ac67 Set queue entries to "Starting" when the VerifyTask is created for them. This perennial source of problems cropped up again in the latest change to the job.run() code (as part of the synch_count changes). by showard · 16 years ago
- 2bab8f4 Implement sync_count. The primary change here is replacing the job.synch_type field with a synch_count field. There is no longer just a distinction between synchronous and asynchronous jobs. Instead, every job has a synch_count, with synch_count = 1 corresponding to the old concept of synchronous jobs. This required: by showard · 16 years ago