move_pages12: handle errno EBUSY for madvise(..., MADV_SOFT_OFFLINE)

Test #2 simulates the race condition where move_pages() and soft offline
are called on a single hugetlb page concurrently. However, madvise()
sometimes returns EBUSY when soft-offlining a hugepage that is being
moved, and the test then reports FAIL.

The root cause seems to be that a call to page_huge_active() returns
false, so the soft offline action fails to isolate the hugepage and
returns EBUSY, as shown in the call traces below:

In Parent:
  madvise(..., MADV_SOFT_OFFLINE)
  ...
    soft_offline_page
      soft_offline_in_use_page
        soft_offline_huge_page
          isolate_huge_page
            page_huge_active
             # returns false here

In Child:
  move_pages()
  ...
    do_move_pages
      do_move_pages_to_node
        add_page_for_migration
          isolate_huge_page
            # it has already isolated the hugepage
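
The collision in the two traces above can be pictured with a toy model.
This is only an illustrative sketch, not kernel code: struct toy_hugepage,
toy_isolate() and toy_soft_offline() are hypothetical names, and the
"active" flag only loosely stands in for what page_huge_active() checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a hugepage (hypothetical, not a kernel structure). */
struct toy_hugepage {
	bool active;
};

/* Only the first caller isolates the page; later callers find it inactive. */
static bool toy_isolate(struct toy_hugepage *page)
{
	if (!page->active)
		return false;
	page->active = false;
	return true;
}

/* Models the soft-offline path: a failed isolation is reported as -EBUSY. */
static int toy_soft_offline(struct toy_hugepage *page)
{
	assert(page != NULL);
	return toy_isolate(page) ? 0 : -EBUSY;
}
```

If the child's path calls toy_isolate() first, the parent's
toy_soft_offline() on the same page returns -EBUSY, which matches the
failure seen in the test.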

In this patch, I simply treat the returned EBUSY as a normal situation and
mask it in the error handler. Because move_pages() calls
add_page_for_migration() to isolate the hugepage before migrating it, the
two paths can easily collide on the same page, and madvise() then returns
EBUSY.
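
A minimal sketch of that error handling, assuming the caller passes in the
madvise() return value and the errno it observed. The enum and
classify_soft_offline() are hypothetical names used for illustration; the
actual test code may be structured differently.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome codes, not part of the LTP API. */
enum offline_outcome { OFFLINE_OK, OFFLINE_RACED, OFFLINE_FAILED };

/*
 * Classify the result of madvise(..., MADV_SOFT_OFFLINE).
 * ret is the madvise() return value, err is the errno saved right after it.
 */
static enum offline_outcome classify_soft_offline(int ret, int err)
{
	assert(ret == 0 || ret == -1);	/* madvise() returns 0 or -1 */

	if (ret == 0)
		return OFFLINE_OK;
	if (err == EBUSY)
		return OFFLINE_RACED;	/* expected collision with move_pages() */
	return OFFLINE_FAILED;		/* anything else is a real failure */
}
```

Only OFFLINE_FAILED would be reported as FAIL; OFFLINE_RACED is the masked
case described above.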

Error log:
----------
move_pages12.c:235: INFO: Free RAM 8386256 kB
move_pages12.c:253: INFO: Increasing 2048kB hugepages pool on node 0 to 4
move_pages12.c:263: INFO: Increasing 2048kB hugepages pool on node 1 to 6
move_pages12.c:179: INFO: Allocating and freeing 4 hugepages on node 0
move_pages12.c:179: INFO: Allocating and freeing 4 hugepages on node 1
move_pages12.c:169: PASS: Bug not reproduced
move_pages12.c:81: FAIL: madvise failed: SUCCESS
move_pages12.c:81: FAIL: madvise failed: SUCCESS
move_pages12.c:143: BROK: mmap((nil),4194304,3,262178,-1,0) failed: ENOMEM
move_pages12.c:114: FAIL: move_pages failed: EINVAL

Dmesg:
------
[165435.492170] soft offline: 0x61c00 hugepage failed to isolate
[165435.590252] soft offline: 0x61c00 hugepage failed to isolate
[165435.725493] soft offline: 0x61400 hugepage failed to isolate

Other fixes in this patch:
 * use TERRNO (not TTERRNO) to report the madvise(..., MADV_SOFT_OFFLINE) errno
 * retry mmap() when hugepage allocation fails with ENOMEM
 * retry numa_move_pages() when it hits ENOMEM
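
The two retry fixes follow the same pattern, which could be sketched
roughly as below. This is an assumption-laden illustration:
retry_on_enomem(), op_fn and the flaky_alloc() stub are hypothetical, and
the retry limit is a made-up parameter, not a value taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical operation type: returns 0 on success, -1 with errno set. */
typedef int (*op_fn)(void *ctx);

/* Call op until it succeeds, fails with something other than ENOMEM,
 * or max_tries attempts have been made. Returns the last result. */
static int retry_on_enomem(op_fn op, void *ctx, int max_tries)
{
	int ret = -1;

	assert(op != NULL && max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		errno = 0;
		ret = op(ctx);
		if (ret == 0 || errno != ENOMEM)
			break;
	}
	return ret;
}

/* Demonstration stub: fails with ENOMEM until *remaining reaches zero. */
static int flaky_alloc(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		errno = ENOMEM;
		return -1;
	}
	return 0;
}
```

In the test itself, the operation would wrap the hugepage mmap() or
numa_move_pages() call rather than a stub.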

Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Cyril Hrubis <chrubis@suse.cz>
1 file changed
tree: c211f5a50db88115fcd9c3b94970c19f0f3e6a1b
README.md

Linux Test Project

Linux Test Project is a joint project started by SGI, OSDL and Bull, and developed and maintained by IBM, Cisco, Fujitsu, SUSE, Red Hat, Oracle and others. The project goal is to deliver tests to the open source community that validate the reliability, robustness, and stability of Linux.

The LTP testsuite contains a collection of tools for testing the Linux kernel and related features. Our goal is to improve the Linux kernel and system libraries by bringing test automation to the testing effort. Interested open source contributors are encouraged to join.

Project pages are located at: http://linux-test-project.github.io/

The latest image is always available at: https://github.com/linux-test-project/ltp/releases

The discussion about the project happens at ltp mailing list: http://lists.linux.it/listinfo/ltp

The git repository is located at GitHub at: https://github.com/linux-test-project/ltp

The patchwork instance is at: https://patchwork.ozlabs.org/project/ltp/list/

Warning!

Be careful with these tests!

Don't run them on production systems. Growfiles, doio, and iogen in particular stress the I/O capabilities of systems and while they should not cause problems on properly functioning systems, they are intended to find (or cause) problems.

Quick guide to running the tests

If you have git, autoconf, automake, m4, the Linux headers and the common developer packages installed, chances are the following will work.

$ git clone https://github.com/linux-test-project/ltp.git
$ cd ltp
$ make autotools
$ ./configure

Now you can continue either with compiling and running a single test or with compiling and installing the whole testsuite.

Shortcut to running a single test

If you need to execute a single test, you do not actually need to compile the whole LTP. If you want to run a syscall testcase, the following should work.

$ cd testcases/kernel/syscalls/foo
$ make
$ PATH=$PATH:$PWD ./foo01

Shell testcases are a bit more complicated since they need a path to a shell library as well as to compiled binary helpers, but generally the following should work.

$ cd testcases/lib
$ make
$ cd ../commands/foo
$ PATH=$PATH:$PWD:$PWD/../../lib/ ./foo01.sh

Open Posix Testsuite has its own build system which needs Makefiles to be generated first; after that, compilation should work in subdirectories as well.

$ cd testcases/open_posix_testsuite/
$ make generate-makefiles
$ cd conformance/interfaces/foo
$ make
$ ./foo_1-1.run-test

Compiling and installing all testcases

$ make
$ make install

This will install LTP to /opt/ltp.

  • If you have a problem see doc/mini-howto-building-ltp-from-git.txt.
  • If you still have a problem see INSTALL and ./configure --help.
  • Failing that, ask for help on the mailing list or GitHub.

Some tests will be disabled if the configure script cannot find their build dependencies.

  • If a test returns TCONF due to a missing component, check the ./configure output.
  • If a test fails due to a missing user or group, see the Quick Start section of INSTALL.

To run all the test suites

$ cd /opt/ltp
$ ./runltp

Note that many test cases have to be executed as root.

To run a particular test suite

$ ./runltp -f syscalls

To run all tests with madvise in the name

$ ./runltp -f syscalls -s madvise

Also see

$ ./runltp --help

Test suites (e.g. syscalls) are defined in the runtest directory. Each file contains a list of test cases in a simple format; see doc/ltp-run-files.txt.

Each test case has its own executable or script, which can be executed directly

$ testcases/bin/abort01

Some have arguments

$ testcases/bin/fork13 -i 37

The vast majority of test cases accept the -h (help) switch

$ testcases/bin/ioctl01 -h

Many require certain environment variables to be set

$ LTPROOT=/opt/ltp PATH="$PATH:$LTPROOT/testcases/bin" testcases/bin/wc01.sh

Most commonly, PATH and LTPROOT need to be set, but there are a number of other variables as well; runltp usually sets these for you.

Note that all shell scripts need PATH to be set. This is not limited to shell scripts, however: many C based tests need environment variables as well.

Developers corner

Before you start, you should read the following documents:

  • doc/test-writing-guidelines.txt
  • doc/build-system-guide.txt

There is also a step-by-step tutorial:

  • doc/c-test-tutorial-simple.txt

If something is not covered there, don't hesitate to ask on the LTP mailing list. Also note that these documents are available online at:

Although we accept GitHub pull requests, the preferred way is sending patches to our mailing list.

It's a good idea to test patches on Travis CI before posting them to the mailing list. Our Travis setup covers various architectures and distributions in order to make sure LTP compiles cleanly on the most common configurations. For testing, you need to sign up to Travis CI, enable running builds on your LTP fork on https://travis-ci.org/account/repositories and push your branch.