Age | Commit message (Collapse) | Author |
|
# Conflicts:
# .github/workflows/build.yaml
# indra/newview/app_settings/shaders/class2/deferred/alphaF.glsl
# indra/newview/app_settings/shaders/class3/deferred/reflectionProbeF.glsl
# indra/newview/app_settings/shaders/class3/deferred/softenLightF.glsl
# indra/newview/llfilepicker.cpp
|
|
Introduce AlwaysReturn<void> specialization, which always discards any result
of calling the specified callable with specified args.
Derive new Windows_SEH_exception from LLException, not std::runtime_error.
Put the various SEH functions in LL::seh nested namespace, e.g.
LL::seh::catcher() as the primary API.
Break out more levels of Windows SEH handler to work around the restrictions on
functions containing __try/__except.
The triadic catcher() overload now does little save declare a std::string
stacktrace before forwarding the call to catcher_inner(), passing a reference
to stacktrace along with the trycode, filter and handler functions.
catcher_inner() accepts the stacktrace and the three function template
arguments. It contains the __try/__except logic. It calls a new filter_()
wrapper template, which calls fill_stacktrace() before forwarding the call to
the caller's filter function. fill_stacktrace(), in the .cpp file, contains
the logic to populate the stacktrace string -- unless the Structured Exception
is stack overflow, in which case it puts an explanatory string instead.
catcher_inner()'s __except clause passes not only the code, but also the
stacktrace string, to the caller's handler function. It wraps the caller's
handler function in always_return<rtype>(), where rtype is the type returned
by the trycode function. This allows a handler to return a value, while also
supporting the void handler case, e.g. one that throws a C++ exception. (This
is why we need AlwaysReturn<void>: some trycode() functions are themselves
void.)
For the dyadic catcher() overload, introduce common_filter() containing the
logic to distinguish a C++ exception from any other kind of Structured
Exception. The fact that the stacktrace is captured before the filter function
is called should permit capturing a stacktrace for a C++ exception as well as
for most other Structured Exceptions.
As before, the monadic catcher() overload supplies the rethrow() handler, in
the .cpp file.
Change existing calls from seh_catcher() to LL::seh::catcher().
|
|
|
|
Since August 2023, we've seen occasional GitHub Windows build test runs
terminate with 0xC00000FD: stack overflow. We've usually responded by bumping
up the default coroutine stack size.
On closer examination, it's always llleap_test.cpp that blows up that way --
and llleap_test.cpp doesn't appear to use coroutines at all. So apparently
we've been consuming more address space for ALL viewer coroutines without
actually addressing the problem.
Reset the default coroutine stack size to where it was before we started
bumping it up in response to these llleap_test.cpp stack overflow failures.
Note that LLCoros already catches and reports Windows structured exceptions,
underscoring that the observed stack overflow is not from within a coroutine.
While at it, restore the Windows llleap_test.cpp data volume to match Posix.
We think the problem that led to reducing that data volume was an APR bug,
which we hope has been fixed.
Equip test.cpp, the test driver program for all our TUT unit and integration
tests, with a Windows structured exception handler. Try to treat a Windows
structured exception as a test failure -- instead of silently terminating with
0xC00000FD. Moreover, when a structured exception occurs, output a stack trace
so we can try to track it down.
|
|
Maintenance X
|
|
(cherry picked from commit dc0b3aed4782e4e4835fd6b9d59d1d70b78be4a7)
|
|
LF, and trim trailing whitespaces as needed
|
|
# Conflicts:
# .github/workflows/build.yaml
|
|
Under debug LL_ERRS will show a message as well, but release won't show
anything and will quit silently so show a notification when applicable.
|
|
# Conflicts:
# indra/newview/llspatialpartition.cpp
|
|
On a Windows CI host, we got the dreaded rc 3221225725 aka c00000fd aka stack
overflow.
|
|
# Conflicts:
# autobuild.xml
# indra/llcommon/tests/llleap_test.cpp
# indra/newview/viewer_manifest.py
|
|
A test executable on a GitHub Windows runner failed with C00000FD, which
reports stack overflow.
(cherry picked from commit aab7b4ba3812e5876b1205285bcfd8cff96bcac9)
|
|
For unknown reason allocations of these coroutines often crash on client machines.
1. Limit quantity of coros running in parallel by reducing retries and wait time
2. Print out more diagnostic info
|
|
This partially reverts commit 935c1362a222f192bf913270d01f6c31c16e175b.
Reporting seems to have stoped working, trying the same way mac works.
|
|
# Conflicts:
# indra/newview/app_settings/settings.xml
# indra/newview/llfloatersearch.cpp
# indra/newview/llgroupactions.cpp
# indra/newview/llvovolume.cpp
|
|
SL-14961 works better for windows than rethrow
|
|
Rename 'winlevel()' to 'sehandle()'; change it from a static member function
to a free function, thus eliminating the conditional in llcoros.h.
Elsewhere than Windows, provide a zero-cost pass-through sehandle()
implementation, eliminating the conditional in toplevel().
# Conflicts:
# indra/llcommon/llcoros.cpp
# indra/llcommon/llcoros.h
|
|
This mechanism uses a queue of std::exception_ptrs to transport an (otherwise)
uncaught exception from a terminated coroutine to the thread's main fiber. The
main loop calls LLCoros::rethrow() just after giving some cycles to ready
coroutines that frame.
# Conflicts:
# indra/llcommon/llcoros.cpp
# indra/llcommon/llcoros.h
# indra/newview/llappviewer.cpp
|
|
Will be replaced with retrow from nat
|
|
# Conflicts:
# indra/newview/llfloatereditextdaycycle.cpp
# indra/newview/llviewerinput.cpp
|
|
# Conflicts:
# autobuild.xml
# indra/llcommon/llerror.cpp
# indra/llui/llnotifications.h
# indra/newview/llappviewer.cpp
# indra/newview/llappviewermacosx.cpp
|
|
|
|
|
|
Superficially looks like an out of memory crash, redirect allocation failures into LL_ERRS.
|
|
|
|
|
|
For the main coroutine on each thread, show the 'main0' (or whatever) name
instead of the empty-string name.
|
|
Actually, introduce static LLCoros::logname() and make the namespaced free
function an alias for that.
Because CoroData is a subclass of LLInstanceTracker with a key, every instance
requires a distinct key. That conflicts with our "getName() returns empty
string for default coroutine on thread" convention. Introduce a new CoroData
constructor, specifically for the default coroutine on each thread, that
initializes the getName() name to empty string while providing a distinct
"mainN" key. Make get_CoroData() use that new constructor for its thread_local
instance, passing an atomic<int> incremented each time we initialize one for a
new thread.
Then LLCoros::logname() returns either the getName() name or the key.
|
|
|
|
The new LLCoros::Stop exception is intended to terminate long-lived coroutines
-- not interrupt mainstream shutdown processing. Only throw it on an
explicitly-launched coroutine.
Make LLCoros::getName() (used by the above test) static. As with other LLCoros
methods, it might be called after the LLCoros LLSingleton instance has been
deleted. Requiring the caller to call instance() implies a possible need to
also call wasDeleted(). Encapsulate that nuance into a static method instead.
|
|
Instead of heap-allocating a CoroData instance per coroutine, storing the
pointer in a ptr_map and deleting it from the ptr_map once the
fiber_specific_ptr for that coroutine is cleaned up -- just declare a stack
instance on the top-level stack frame, the simplest C++ lifespan management.
Derive CoroData from LLInstanceTracker to detect potential name collisions and
to enumerate instances.
Continue registering each coroutine's CoroData instance in our
fiber_specific_ptr, but use a no-op deleter function.
Make ~LLCoros() directly pump the fiber scheduler a few times, instead of
having a special "LLApp" listener.
|
|
|
|
By the time "LLApp" listeners are notified that the app is quitting, the
mainloop is no longer running. Even though those listeners do things like
close work queues and inject exceptions into pending promises, any coroutines
waiting on those resources must regain control before they can notice and shut
down properly. Add a final "LLApp" listener that resumes ready coroutines a
few more times.
Make sure every other "LLApp" listener is positioned before that new one.
|
|
Introduce LLCoros::Stop exception, with subclasses Stopping, Stopped and
Shutdown. Add LLCoros::checkStop(), intended to be called periodically by any
coroutine with nontrivial lifespan. It checks the LLApp status and, unless
isRunning(), throws one of these new exceptions.
Make LLCoros::toplevel() catch Stop specially and log forcible coroutine
termination.
Now that LLApp status matters even in a test program, introduce a trivial
LLTestApp subclass whose sole function is to make isRunning() true.
(LLApp::setStatus() is protected: only a subclass can call it.) Add LLTestApp
instances to lleventcoro_test.cpp and lllogin_test.cpp.
Make LLCoros::toplevel() accept parameters by value rather than by const
reference so we can continue using them even after context switches.
Make private LLCoros::get_CoroData() static. Given that we've observed some
coroutines living past LLCoros destruction, making the caller call
LLCoros::instance() is more dangerous than encapsulating it within a static
method -- since the encapsulated call can check LLCoros::wasDeleted() first
and do something reasonable instead. This also eliminates the need for both a
const and non-const overload.
Defend LLCoros::delete_CoroData() (cleanup function for fiber_specific_ptr for
CoroData, implicitly called after coroutine termination) against calls after
~LLCoros().
Add a status string to coroutine-local data, with LLCoro::setStatus(),
getStatus() and RAII class TempStatus.
Add an optional 'when' string argument to LLCoros::printActiveCoroutines().
Make ~LLCoros() print the coroutines still active at destruction.
|
|
|
|
Delete the test for SRV timeout: lllogin no longer issues an SRV query. That
test only confuses the test program without exercising any useful paths in
production code.
As with other tests dating from the previous LLCoros implementation, we need a
few llcoro::suspend() calls sprinkled in so that a fiber marked ready -- by
fulfilling the future for which it is waiting -- gets a chance to run.
Clear LLEventPumps between test functions.
|
|
Longtime fans will remember that the "dcoroutine" library is a Google Summer
of Code project by Giovanni P. Deretta. He originally called it
"Boost.Coroutine," and we originally added it to our 3p-boost autobuild
package as such. But when the official Boost.Coroutine library came along
(with a very different API), and we still needed the API of the GSoC project,
we renamed the unofficial one "dcoroutine" to allow coexistence.
The "dcoroutine" library had an internal low-level API more or less analogous
to Boost.Context. We later introduced an implementation of that internal API
based on Boost.Context, a step towards eliminating the GSoC code in favor of
official, supported Boost code.
However, recent versions of Boost.Context no longer support the API on which
we built the shim for "dcoroutine." We started down the path of reimplementing
that shim using the current Boost.Context API -- then realized that it's time
to bite the bullet and replace the "dcoroutine" API with the Boost.Fiber API,
which we've been itching to do for literally years now.
Naturally, most of the heavy lifting is in llcoros.{h,cpp} and
lleventcoro.{h,cpp} -- which is good: the LLCoros layer abstracts away most of
the differences between "dcoroutine" and Boost.Fiber.
The one feature Boost.Fiber does not provide is the ability to forcibly
terminate some other fiber. Accordingly, disable LLCoros::kill() and
LLCoprocedureManager::shutdown(). The only known shutdown() call was in
LLCoprocedurePool's destructor.
We also took the opportunity to remove postAndSuspend2() and its associated
machinery: FutureListener2, LLErrorEvent, errorException(), errorLog(),
LLCoroEventPumps. All that dual-LLEventPump stuff was introduced at a time
when the Responder pattern was king, and we assumed we'd want to listen on one
LLEventPump with the success handler and on another with the error handler. We
have never actually used that in practice. Remove associated tests, of course.
There is one other semantic difference that necessitates patching a number of
tests: with "dcoroutine," fulfilling a future IMMEDIATELY resumes the waiting
coroutine. With Boost.Fiber, fulfilling a future merely marks the fiber as
ready to resume next time the scheduler gets around to it. To observe the test
side effects, we've inserted a number of llcoro::suspend() calls -- also in
the main loop.
For a long time we retained a single unit test exercising the raw "dcoroutine"
API. Remove that.
Eliminate llcoro_get_id.{h,cpp}, which provided llcoro::get_id(), which was a
hack to emulate fiber-local variables. Since Boost.Fiber has an actual API for
that, remove the hack.
In fact, use (new alias) LLCoros::local_ptr for LLSingleton's dependency
tracking in place of llcoro::get_id().
In CMake land, replace BOOST_COROUTINE_LIBRARY with BOOST_FIBER_LIBRARY. We
don't actually use the Boost.Coroutine for anything (though there exist
plausible use cases).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
autobuild 1.1 now supports expanding $variables within a config file --
support that was explicitly added to address this very problem. So now the
windows platform in autobuild.xml uses $AUTOBUILD_ADDRSIZE,
$AUTOBUILD_WIN_VSPLATFORM and $AUTOBUILD_WIN_CMAKE_GEN, which should handle
most of the deltas between the windows platform and windows64.
This permits removing the windows64 platform definition from autobuild.xml.
The one remaining delta between the windows64 and windows platform definitions
was -DLL_64BIT_BUILD=TRUE. But we can handle that instead by checking
ADDRESS_SIZE. Change all existing references to WORD_SIZE to ADDRESS_SIZE
instead, and set ADDRESS_SIZE to $AUTOBUILD_ADDRSIZE. Change the one existing
LL_64BIT_BUILD reference to test (ADDRESS_SIZE EQUAL 64) instead.
|
|
|
|
Until now, the "main coroutine" (the initial context) of each thread left
LLCoros::Current() NULL. The trouble with that is that llcoro::get_id()
returns that CoroData* as an opaque token, and we want distinct values for
every stack in the process. That would not be true if the "main coroutine" on
thread A returned the same value (NULL) as the "main coroutine" on thread B,
and so forth. Give each thread's "main coroutine" a dummy heap CoroData
instance of its own.
|
|
|
|
|