Age | Commit message | Author |
|
Since August 2023, we've seen occasional GitHub Windows build test runs
terminate with 0xC00000FD: stack overflow. We've usually responded by bumping
up the default coroutine stack size.
On closer examination, it's always llleap_test.cpp that blows up that way --
and llleap_test.cpp doesn't appear to use coroutines at all. So apparently
we've been consuming more address space for ALL viewer coroutines without
actually addressing the problem.
Reset the default coroutine stack size to where it was before we started
bumping it up in response to these llleap_test.cpp stack overflow failures.
Note that LLCoros already catches and reports Windows structured exceptions,
underscoring that the observed stack overflow is not from within a coroutine.
While at it, restore the Windows llleap_test.cpp data volume to match Posix.
We think the problem that led to reducing that data volume was an APR bug,
which we hope has been fixed.
Equip test.cpp, the test driver program for all our TUT unit and integration
tests, with a Windows structured exception handler. Try to treat a Windows
structured exception as a test failure -- instead of silently terminating with
0xC00000FD. Moreover, when a structured exception occurs, output a stack trace
so we can try to track it down.
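A hedged sketch of that idea (not the actual test.cpp change): wrap the TUT driver in MSVC's __try/__except so an SEH fault such as 0xC00000FD surfaces as a reported failure with a rough stack trace. run_all_tests() is a placeholder for the real test runner.

```cpp
#include <windows.h>
#include <cstdio>

int run_all_tests() { return 0; }   // placeholder for the real TUT driver

static LONG seh_filter(unsigned long code)
{
    std::fprintf(stderr, "Windows structured exception 0x%08lX\n", code);
    // Capture a few raw return addresses as a rough stack trace. (For a
    // genuine stack overflow, production code must first recover stack
    // space, e.g. with _resetstkoflw(), before doing this much work.)
    void* frames[32];
    USHORT depth = CaptureStackBackTrace(0, 32, frames, nullptr);
    for (USHORT i = 0; i < depth; ++i)
        std::fprintf(stderr, "  frame %2u: %p\n", i, frames[i]);
    return EXCEPTION_EXECUTE_HANDLER;
}

int main()
{
    __try
    {
        return run_all_tests();
    }
    __except (seh_filter(GetExceptionCode()))
    {
        return 1;   // report the fault as a test failure, not a silent crash
    }
}
```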
|
|
LF, and trim trailing whitespace as needed
|
|
|
|
Turns out that the pathname of the Python executable wasn't the issue.
This reverts commit 7dc6211ad5ea83685a35c6fff740278343aa8b9d.
|
|
On GitHub Windows runners, trying to make build.yaml set PYTHON=python in the
environment doesn't work: integration tests still fail with "Access is denied"
because they're still trying to execute the interpreter's full pathname.
Instead, make llprocess_test and llleap_test detect when they're running on a
GitHub Windows runner and override the environment variable PYTHON with a
baked-in string constant "python".
|
|
Introducing indirection via test_python_script.py did NOT address the "Access
is denied" errors on GitHub Windows runners.
|
|
It's cool to be able to write 'arg1 << "stuff" << var ...;' for a lambda
accepting a std::ostream reference, but cascading compile errors mean it's no
longer worth trying to make that work now that we have actual C++ lambdas.
Also clean up a lingering BOOST_FOREACH() and a boost::bind() while at it.
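For comparison, a small sketch of the plain-lambda form that replaces the Phoenix expression (names here are illustrative):

```cpp
#include <functional>
#include <iostream>

// Accept any callable that writes to a std::ostream.
void emit(const std::function<void(std::ostream&)>& writer)
{
    writer(std::cout);
}

int main()
{
    int var = 42;
    // The explicit parameter replaces Phoenix's implicit arg1.
    emit([var](std::ostream& out) { out << "stuff " << var << '\n'; });
}
```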
|
|
It seems the problem addressed by aab769e wasn't some synergy between
Boost.Phoenix and Boost.Function, but rather the lack of a Phoenix header file
introducing operator<<().
|
|
On GitHub Windows Actions runners, we're getting permissions errors trying to
tell the Python interpreter to run a NamedTempFile script. Try using
NamedExtTempFile to give each such script a .py extension.
|
|
# Conflicts:
# indra/cmake/CMakeLists.txt
# indra/llcommon/llsdserialize.cpp
# indra/llcommon/llsdserialize.h
# indra/llcommon/tests/llleap_test.cpp
# indra/newview/llfilepicker.h
# indra/newview/llfilepicker_mac.h
# indra/newview/llfilepicker_mac.mm
# indra/newview/skins/default/xui/en/strings.xml
|
|
When sending multiple LEAP packets in the same file (for testing convenience),
use a length prefix instead of delimiting with '\n'. Now that we allow a
serialization format that includes an LLSD format header (e.g.
"<?llsd/binary?>"), '\n' is part of the packet content. But in fact, testing
binary LLSD means we can't pick any delimiter guaranteed not to appear in the
packet content.
Using a length prefix also lets us pass a specific max_bytes to the subject
C++ LLSD parser.
Make llleap_test.cpp use the new freestanding Python llsd package when
available. Update the Python-side LEAP protocol code to work directly with the
encoded byte stream, avoiding bytes<->str encoding and decoding, which breaks
binary LLSD.
Make LLSDSerialize::deserialize() recognize LLSD format header case-
insensitively. Python emits and checks for "llsd/binary", while LLSDSerialize
emits and checks for "LLSD/Binary". Once any of the headers is recognized,
pass corrected max_bytes to the specific parser.
Make deserialize() more careful about the no-header case: preserve '\n' in
content. Introduce debugging code (disabled) because the failure is a little
tricky to recreate.
Revert LLLeap child process stdout parser from LLSDSerialize::deserialize() to
the specific LLSDNotationParser(), as at present: the generic parser fails one
of LLLeap's integration tests for reasons that remain mysterious.
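A hedged sketch of the length-prefix framing (the function name is illustrative): each packet travels as "<decimal byte count>:<payload>", so binary payloads containing '\n' need no delimiter, and the receiver knows max_bytes before parsing.

```cpp
#include <sstream>
#include <string>

std::string frame_packet(const std::string& serialized)
{
    std::ostringstream out;
    out << serialized.size() << ':' << serialized;   // e.g. "5:hello"
    return out.str();
}

int main()
{
    std::string packet = frame_packet("hello");      // yields "5:hello"
}
```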
|
|
|
|
|
|
|
|
Always search for python3[.exe] instead of plain 'python'. macOS Monterey no
longer bundles Python 2 at all.
Explicitly make PYTHON_EXECUTABLE a cached value so if the user edits it in
CMakeCache.txt, it won't be overwritten by indra/cmake/Python.cmake.
Do NOT set DYLD_LIBRARY_PATH for test executables! That has Bad Effects, as
discussed in https://stackoverflow.com/q/73418423/5533635. Instead, create
symlinks from build-mumble/sharedlibs/Resources -> Release/Resources and from
build-mumble/test/Resources -> ../sharedlibs/Release/Resources. For test
executables in sharedlibs/RelWithDebInfo and test/RelWithDebInfo, this
supports our dylibs' baked-in load path @executable_path/../Resources. That
load path assumes running in a standard app bundle (which the viewer in fact
does), but we've been avoiding creating an app bundle for every test program.
These symlinks allow us to continue doing that while avoiding
DYLD_LIBRARY_PATH.
Add indra/llcommon/apply.h. The LL::apply() function and its wrapper macro
VAPPLY were very useful in diagnosing the problem.
Tweak llleap_test.cpp. This source was modified extensively for diagnostic
purposes; these are the small improvements that remain.
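For readers unfamiliar with the idiom, LL::apply() is in the spirit of C++17's std::apply, shown here (this is the standard facility, not the viewer's header):

```cpp
#include <iostream>
#include <tuple>

int main()
{
    auto report = [](const char* name, int value)
    { std::cout << name << " = " << value << '\n'; };
    // Call the lambda with arguments unpacked from a tuple.
    std::apply(report, std::make_tuple("size", 1024));
}
```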
|
|
This changeset makes it possible to build the Second Life viewer using
Python 3. It is designed to be used with an equivalent Autobuild branch
so that a developer can compile without needing Python 2 on their
machine.
Breaking change: Python 2 support ending
Rather than supporting two versions of Python, including one that was
discontinued at the beginning of the year, this branch focuses on
pouring future effort into Python 3 only. As a result, scripts do not
need to be backwards compatible. This means that build environments,
be they on personal computers or on build agents, need to have a
compatible interpreter.
Notes
- SLVersionChecker will still use Python 2 on macOS
- Fixed the message template url used by template_verifier.py
|
|
The previous implementation went to some effort to crash if anyone attempted
to create or destroy an LLInstanceTracker subclass instance during traversal.
That restriction is manageable within a single thread, but becomes unworkable
if it's possible that a given subclass might be used on more than one thread.
Remove LLInstanceTracker::instance_iter, beginInstances(), endInstances(),
also key_iter, beginKeys() and endKeys(). Instead, introduce key_snapshot()
and instance_snapshot(), the only means of iterating over LLInstanceTracker
instances. (These are intended to resemble functions, but in fact the current
implementation simply presents the classes.) Iterating over a captured
snapshot defends against container modifications during traversal. The term
'snapshot' reminds the coder that a new instance created during traversal will
not be considered. To defend against instance deletion during traversal, a
snapshot stores std::weak_ptrs which it lazily dereferences, skipping on the
fly any that have expired.
Dereferencing instance_snapshot::iterator gets you a reference rather than a
pointer. Because some use cases want to delete all existing instances, add an
instance_snapshot::deleteAll() method that extracts the pointer. Those cases
used to require explicitly copying instance pointers into a separate
container; instance_snapshot() now takes care of that. It remains the caller's
responsibility to ensure that all instances of that LLInstanceTracker subclass
were allocated on the heap.
Replace unkeyed static LLInstanceTracker::getInstance(T*) -- which returned
nullptr if that instance had been destroyed -- with new getWeak() method
returning std::weak_ptr<T>. Caller must detect expiration of that weak_ptr.
Adjust tests accordingly.
Use of std::weak_ptr to detect expired instances requires engaging
std::shared_ptr in the constructor. We now store shared_ptrs in the static
containers (std::map for keyed, std::set for unkeyed).
Make LLInstanceTrackerBase a template parameterized on the type of the static
data it manages. For that reason, hoist static data class declarations out of
the class definitions to an LLInstanceTrackerStuff namespace.
Remove the static atomic sIterationNestDepth and its methods incrementDepth(),
decrementDepth() and getDepth(), since they were used only to forbid creation
and destruction during traversal.
Add a std::mutex to static data. Introduce an internal LockStatic class that
locks the mutex while providing a pointer to static data, making that the only
way to access the static data.
The LLINSTANCETRACKER_DTOR_NOEXCEPT macro goes away because we no longer
expect ~LLInstanceTracker() to throw an exception in test programs.
That affects LLTrace::StatBase as well as LLInstanceTracker itself.
Adapt consumers to the new LLInstanceTracker API.
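A compressed sketch of the snapshot idea (using int as a stand-in for the tracked class; not the real LLInstanceTracker code): the registry holds shared_ptrs, a snapshot captures weak_ptrs, and traversal lazily skips anything deleted in the meantime.

```cpp
#include <iostream>
#include <memory>
#include <vector>

int main()
{
    std::vector<std::shared_ptr<int>> registry{
        std::make_shared<int>(1), std::make_shared<int>(2)};

    // Capture the snapshot up front...
    std::vector<std::weak_ptr<int>> snap(registry.begin(), registry.end());

    // ...then delete an instance mid-traversal, as a caller might.
    registry.pop_back();

    for (auto& weak : snap)
        if (auto strong = weak.lock())      // lazily dereference
            std::cout << *strong << '\n';   // expired entries are skipped
}
```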
|
|
Use them in place of awkward try/catch test boilerplate.
|
|
Use LLStringUtil::getenv() or getoptenv() whenever we fetch a string that will
be used as a pathname.
Use LLFile::tmpdir() instead of getenv("TEMP").
As an added extra-special bonus, finally clean up $TMP/llcontrol-test-zzzzzz
directories that have been accumulating every time we run a local build!
|
|
|
|
|
|
|
|
|
|
|
|
The new toolchain may (!) have fixed a longstanding bug in LLLeap / APR when
we try to pump large volumes of data through a Windows named pipe using APR
nonblocking I/O. This used to fail pretty consistently because the APR
nonblocking write call would sometimes spuriously return "would block" when in
fact the data buffer was completely written; the caller would later retry,
which of course would duplicate some of the data in the pipe. Preliminary
experiments with VS 2013 suggest this may have been resolved. This changeset
is to propagate the experiment to a wider range of Windows systems; we may
need to revert it if in fact the bug persists.
|
|
https://svn.boost.org/trac/boost/ticket/10864
I've used boost::lambda with boost::function in a number of creative ways over
the years. But the clang 6 shipped with Xcode 6 seems to have somehow broken
lambda + function in Boost 1.57. boost::phoenix is a partial workaround.
Sadly, lambda's comma-operator overload doesn't seem to be supported,
necessitating a couple ugly workarounds.
With real lambdas now supported by current compilers, I'm sure the Boost
community has little incentive to repair the lambda + function problem.
Presumably we'll be able to use such features ourselves Real Soon Now...
|
|
|
|
|
|
|
|
from this tree
|
|
This test must not be subject to spurious environmental failures, else some
kind soul will disable it entirely. We observe that APR specifies a hard-coded
buffer size of 64Kbytes for pipe creation -- use that and cross fingers.
|
|
Sigh, the rejoicing was premature.
|
|
If in fact we've managed to fix the APR bug writing to a Windows named pipe,
it should no longer be necessary to try to work around it by testing with a
much smaller data volume on Windows!
|
|
Ideally we'd love to be able to nail the underlying bug, but log output
suggests it may actually go all the way down to the OS level. To move forward,
try to bypass it.
|
|
We want to write a robust test that consistently works. On Windows, that
appears to require constraining the max message size. I, the coder, could try
submitting test runs of varying sizes to TC until I found a size that works...
but that could take quite a while. If I were clever, I might even use a manual
binary search. But computers are good at binary searching; there are even
prepackaged algorithms in the STL. If I were cleverer still, I could make the
test program itself search for a size that works.
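The search itself is only a few lines; a hedged sketch (test_at_size() is a placeholder for actually running the pipe test at a given size):

```cpp
#include <cstddef>
#include <iostream>

bool test_at_size(std::size_t size)
{
    return size <= 300000;   // placeholder: pretend sizes up to 300000 work
}

int main()
{
    // Invariant: lo is a known-good size, hi a known-bad (or untested) bound.
    std::size_t lo = 0, hi = 1 << 20;
    while (lo + 1 < hi)
    {
        std::size_t mid = lo + (hi - lo) / 2;
        (test_at_size(mid) ? lo : hi) = mid;
    }
    std::cout << "largest working size: " << lo << '\n';
}
```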
|
|
Apparently, at least on Mac, there are circumstances in which the very-large-
message test can take several times longer than normal, yet still complete
successfully. This is always the problem with timeouts: does timeout
expiration mean that the code in question is actually hung, or would it
complete if given a bit longer?
If very-large-message test fails, retry a few times with smaller sizes to try
to find a size at which the test runs reliably. The default size, ca 1MB, is
intended to be substantially larger than anything we'll encounter in the wild.
Is that "unreasonably" large? Is there a "reasonable" size at which the test
could consistently pass? Is that "reasonable" size still larger than what we
expect to encounter in practice? Need more information, hence this code.
|
|
It seems that under certain circumstances, write logic was duplicating a chunk
of the data being streamed down our pipe. But since this condition is only
triggered by a very large data stream, eyeballing that data stream is tedious. Add
code to compare the raw received data with the expected stream, reporting
where and how they first differ.
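Something like std::mismatch does the heavy lifting; a small sketch of that reporting (illustrative, not the actual test code):

```cpp
#include <algorithm>
#include <iostream>
#include <string>

void report_first_diff(const std::string& expect, const std::string& actual)
{
    auto [e, a] = std::mismatch(expect.begin(), expect.end(),
                                actual.begin(), actual.end());
    if (e == expect.end() && a == actual.end())
        return;                              // streams are identical
    std::cout << "streams differ at offset " << (e - expect.begin())
              << ": expected "
              << (e == expect.end() ? std::string("<end>") : std::string(1, *e))
              << ", got "
              << (a == actual.end() ? std::string("<end>") : std::string(1, *a))
              << '\n';
}

int main()
{
    report_first_diff("length:12345", "length:12335");   // differ at offset 10
}
```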
|
|
While we're accumulating the 'length:' prefix, the present socket-based logic
reads 20 characters, then reads 'length' more, then discards any excess (in
case the whole 'length:data' packet ends up being less than 20 characters).
That's probably a bug: whatever characters follow that packet, however short
it may be, are probably the 'length:' prefix of the next packet. We probably
only get away with it because we probably never send packets that short.
Earlier llleap_test.cpp plugin logic still read 20 characters, then, if there
were any left after the present packet, cached them as the start of the next
packet. This is probably more correct, but complicated. Easier just to read
individual characters until we've seen 'length:', then try for exactly the
specified length over however many reads that requires.
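A hedged sketch of that simplified reader (std::istream stands in for the actual pipe):

```cpp
#include <cstddef>
#include <istream>
#include <string>

std::string read_packet(std::istream& in)
{
    // Consume characters one at a time until ':' ends the length prefix.
    std::size_t length = 0;
    char c;
    while (in.get(c) && c != ':')
        if (c >= '0' && c <= '9')
            length = length * 10 + (c - '0');

    // Then try for exactly 'length' bytes, however many reads it takes.
    std::string payload(length, '\0');
    std::size_t got = 0;
    while (got < length)
    {
        in.read(&payload[got], static_cast<std::streamsize>(length - got));
        std::size_t n = static_cast<std::size_t>(in.gcount());
        if (n == 0) break;                   // EOF or error: give up
        got += n;
    }
    payload.resize(got);
    return payload;
}
```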
|
|
In load testing, we have observed intermittent failures on Windows in which
LLSDNotationStreamer into std::ostringstream seems to bump into a hard limit
of 1048590 bytes. ostringstream reports that much buffered data and returns
that much -- even though, on examination, the notation-serialized stream is
incomplete at that point. It's our intention to load-test LLLeap and
LLProcess, not the local iostream implementation; we hope that this kind of
data volume is comfortably greater than actual usage. Back off the
load-testing max size a bit.
|
|
New llleap_test.cpp load testing turned up a Windows issue in which the plugin
process received a corrupt packet, producing LLSDParseError. Add code to dump
the bad packet in that case -- though if LLSDParseError is willing to state the
offset of the problem, dump only the portion around that offset, not ALL of
the packet.
Quiet MSVC warning about little internal base class needing virtual destructor.
|
|
|
|
|
|
These tests rule out corruption as we cross buffer boundaries in OS pipes and
the LLLeap implementation itself.
|
|
It only took a few examples of trying to wrangle notation LLSD as string data
to illustrate how clumsy that is. I'd forgotten that a couple other TUT tests
already invoke Python code that depends on the llsd module. The trick is to
recognize that at least as of now, there's still an obsolete version of the
module in the viewer's own source tree. Python code is careful to try
importing llbase.llsd before indra.base.llsd, so that if/when we finally do
clear indra/lib/python from the viewer repo, we need only require that llbase
be installed on every build machine.
|
|
Migrate logic from specific test to common reader module, notably parsing the
wakeup message containing the reply-pump name.
Make test script post to Result struct to communicate success/failure to C++
TUT test, rather than just writing to log.
Make test script insensitive to key order in serialized LLSD::Map.
|
|
Instantiating LLLeap with a command to execute a particular child process sets
up machinery to speak LLSD Event API Plugin protocol with that child process.
LLLeap is an LLInstanceTracker subclass, so the code that instantiates it need
not hold the pointer. LLLeap monitors child-process termination and deletes
itself when done.
|