Age | Commit message | Author |
|
|
|
mesh upload
Tried to add consistent mesh upload retries in the HTTP work but a
combination of bad status choices in the upload service and the
one-shot nature of the upload capabilities means that status
information can be lost. For now, retain the wonderful manual
retry logic. At some future point, we might fix the services or
add application-level retry.
|
|
Problem involved a 3-way livelock between the main, upload
and decomposition threads. Viewer is shutting down but an
upload is in the 'generate hulls' state. Main thread asks
upload request to discard and spins waiting for it to finish.
Upload thread is in generateHulls spinning waiting for the
decomposition thread to process a mesh request. Decomposition
thread is sleeping waiting for main thread to deliver work
that upload thread has asked the decomposition thread to do.
|
|
the change made for MAINT-2347. Large transfers are still
10 minutes. Add/update to-do list and add some more info to
the FAQ in the Readme.
|
|
Start using DNS cache in legacy LLCurl code. Go to 15 seconds
particularly as we're using threaded resolver at this point.
Documentation cleanup. Add libcurl status checking and logging
for curl_easy_setopt() operations that fail. Shouldn't happen
and we'll just continue anyway but there's info in the logs to
track these down now. Cleaned up logic around FASTTIMER enable
defines used to evaluate pipeline stalls in main thread.
Removed long-standing thread race around caps strings and
URL construction. Not a significant risk but refactoring the
code to get rid of them removed one huge eyesore. It can be
made even slicker if desired (see notes).
|
|
behavior and swap() as well. Should probably do this
for the other queues at some point.
|
|
traditional swap-less object transfer.
|
|
Much earlier identified some thread races including two on mskininfoq
and mdecompositionq. I actually hit one in testing the other day so
I'm fixing them now. Put them under the mMutex lock and use the
mutex in such a way that main thread stalls are not added.
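
A minimal sketch of the pattern described above: guard the queue with a mutex but keep the critical section tiny by swapping the whole queue out, so the main thread never processes items while holding the lock. The class and member names are hypothetical and are not the viewer's own code.

```cpp
#include <cassert>
#include <deque>
#include <mutex>

// Hypothetical queue shared between a worker thread (producer) and the
// main thread (consumer). Names are illustrative only.
class SharedRequestQueue {
public:
    // Producer side: hold the lock only long enough to append one item.
    void push(int request) {
        std::lock_guard<std::mutex> lock(mMutex);
        mQueue.push_back(request);
    }

    // Consumer side: swap the whole queue out under the lock, then let the
    // caller process the items with no lock held.
    std::deque<int> drain() {
        std::deque<int> local;
        {
            std::lock_guard<std::mutex> lock(mMutex);
            local.swap(mQueue);
        }
        return local;
    }

private:
    std::mutex mMutex;
    std::deque<int> mQueue;
};
```

The swap is constant time, so the lock is held for a bounded, short interval regardless of how many items are queued.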
|
|
While giving myself a full project code review, found a bug in
the MeshUseGetMesh1 setting. Mostly defaulted to old configuration
but used the GetMesh2 caps which would have been a huge DoS
resource sink. Did some documentation maintenance as well while
I was in there. More for the to-do list, etc.
|
|
Added toTerseString() conversion on HttpStatus to generate a string
that's more descriptive than the hex value of the HttpStatus value
but still forms a short, searchable token (e.g. "Http_503" or
"Core_7"). Using this throughout the viewer now, no live cases
of toHex(), I believe.
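
For illustration, a rough sketch of how a terse status token of this shape could be built. The `TerseStatus` struct and `StatusType` enum are hypothetical stand-ins, not the real HttpStatus class, and only the two prefixes quoted above are modeled.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a status value: a category plus a numeric code.
enum class StatusType { Core, Http };

struct TerseStatus {
    StatusType type;
    int code;
};

// Build a short, searchable token such as "Http_503" or "Core_7".
std::string toTerseString(const TerseStatus& status) {
    const char* prefix = (status.type == StatusType::Http) ? "Http_" : "Core_";
    return std::string(prefix) + std::to_string(status.code);
}
```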
|
|
Added 'MeshUseGetMesh1' and 'MeshUseHttpRetryAfter' debug settings
to control mesh transport behavior. First forces the use of the
legacy mesh fetch style with high concurrency and connection churn.
The second, on by default, honors Retry-After values if they are
reasonable. If off or unreasonable, internal delay times are used.
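
A sketch of the delay selection this implies, under stated assumptions: the function name, its parameters, and the 0 to 30 second "reasonable" window are all invented for illustration; the commit does not say what bounds the viewer actually applies.

```cpp
#include <cassert>

// Assumed bounds for a "reasonable" Retry-After value, in seconds.
constexpr int kMinRetryAfterSecs = 0;
constexpr int kMaxRetryAfterSecs = 30;

// Returns the delay to use before retrying a request.
// honorRetryAfter models the MeshUseHttpRetryAfter setting.
int chooseRetryDelaySecs(bool honorRetryAfter, bool haveHeader,
                         int retryAfterSecs, int internalDelaySecs) {
    const bool reasonable = haveHeader &&
                            retryAfterSecs >= kMinRetryAfterSecs &&
                            retryAfterSecs <= kMaxRetryAfterSecs;
    // Fall back to the internal delay when the setting is off, the header
    // is missing, or the server's value is outside the accepted window.
    return (honorRetryAfter && reasonable) ? retryAfterSecs : internalDelaySecs;
}
```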
|
|
In case of HTTP errors or parsing/processing errors, fail the
fetch request rather than do a retry spin. Add logging for all
such failure paths. Added a development/debug flag to create
probabilistic failures to test these modes and general error
recovery by higher-level layers.
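
A minimal sketch of probabilistic failure injection combined with fail-fast handling. `FailureInjector`, `FetchResult` and `processResponse` are hypothetical names; the real debug flag and its wiring are not shown in this log.

```cpp
#include <cassert>
#include <random>

// Hypothetical debug helper: decides whether to fail a request on purpose.
class FailureInjector {
public:
    // failureRate is clamped to [0, 1]; the seed is fixed so runs repeat.
    explicit FailureInjector(double failureRate, unsigned seed = 12345u)
        : mRate(failureRate < 0.0 ? 0.0 : (failureRate > 1.0 ? 1.0 : failureRate)),
          mEngine(seed),
          mUniform(0.0, 1.0) {}

    bool shouldFail() {
        if (mRate <= 0.0) return false;  // injection disabled
        if (mRate >= 1.0) return true;   // always fail
        return mUniform(mEngine) < mRate;
    }

private:
    double mRate;
    std::mt19937 mEngine;
    std::uniform_real_distribution<double> mUniform;
};

enum class FetchResult { Ok, Failed };

// Fail the request outright (no retry spin) on a bad response or when the
// injector fires; the caller is expected to log and report upward.
FetchResult processResponse(bool responseOk, FailureInjector& injector) {
    if (!responseOk || injector.shouldFail()) {
        return FetchResult::Failed;
    }
    return FetchResult::Ok;
}
```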
|
|
|
|
the last time and that repo can soon be abandoned (QA function
only).
|
|
|
|
lists to reflect current. Describe the functional flow of things
for a single LOD request. Put together to-do list for follow-on
work. Knock down the low/high water limits for GetMesh a bit,
100/200 too high, 75/150 should be better, vis-a-vis pathological
failures.
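
One way to read the low/high water limits, sketched under an assumption: when the number of in-flight requests drops below the low mark, the queue is topped up to the high mark. That refill policy and the names below are illustrative; only the 75/150 values come from the message above.

```cpp
#include <algorithm>
#include <cassert>

struct WaterMarks {
    int low;
    int high;
};

// How many queued requests to hand to the HTTP layer right now.
int requestsToIssue(int inFlight, int queued, WaterMarks marks) {
    assert(marks.low >= 0 && marks.low <= marks.high);
    if (inFlight >= marks.low) {
        return 0;  // pipeline is still comfortably fed
    }
    // Below the low mark: refill up to the high mark, limited by the queue.
    return std::min(queued, marks.high - inFlight);
}
```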
|
|
replace llinfos, lldebugs, etc with new LL_INFOS(), LL_DEBUGS(), etc.
|
|
GetMesh2 capabilities. They should be independent now.
|
|
Have the ::notifyLoadedMeshes() method doing correct locking
and stall avoidance at the same time. This method now does
lazy mutex lock acquisition (trylock()) and if it fails on
either, it gives up and comes back later. Capture the maximum
number of sequential failures and report this at the end of
the run in the log. (So far, with big mesh regions, I've
only seen 1s and 2s.) Locking/mutex requirements sorted in
other locations as well. LLMutex gets trylock() method as
well as new LLMutexTrylock scoped locking class. Clean up
some documentation, more to do.
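
A sketch of the lazy-acquisition idea using the standard library rather than LLMutex. `ScopedTryLock`, `StallStats` and `tryNotify` are hypothetical names, and the guard only approximates what a scoped trylock class such as LLMutexTrylock might look like.

```cpp
#include <cassert>
#include <mutex>

// Scoped try-lock guard: never blocks, and releases on destruction only if
// it actually acquired the mutex.
template <typename Mutex>
class ScopedTryLock {
public:
    explicit ScopedTryLock(Mutex& m) : mMutex(m), mLocked(m.try_lock()) {}
    ~ScopedTryLock() { if (mLocked) mMutex.unlock(); }
    ScopedTryLock(const ScopedTryLock&) = delete;
    ScopedTryLock& operator=(const ScopedTryLock&) = delete;
    bool isLocked() const { return mLocked; }

private:
    Mutex& mMutex;
    bool mLocked;
};

// Tracks consecutive failed attempts so the worst case can be logged later.
struct StallStats {
    int currentFailures = 0;
    int maxFailures = 0;
};

// Try both locks; if either is busy, give up and let the caller come back
// on a later frame instead of stalling the main thread.
template <typename Mutex>
bool tryNotify(Mutex& first, Mutex& second, StallStats& stats) {
    ScopedTryLock<Mutex> lockFirst(first);
    ScopedTryLock<Mutex> lockSecond(second);
    if (!lockFirst.isLocked() || !lockSecond.isLocked()) {
        ++stats.currentFailures;
        if (stats.currentFailures > stats.maxFailures) {
            stats.maxFailures = stats.currentFailures;
        }
        return false;  // anything acquired is released by the destructors
    }
    stats.currentFailures = 0;
    // ... deliver loaded-mesh notifications here ...
    return true;
}
```

With `std::mutex` the call is `tryNotify(mutexA, mutexB, stats)`, and `stats.maxFailures` is what would be reported at the end of the run.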
|
|
added a Mesh status line to the texture fetch console. Mesh is
often in competition with textures and so the mesh information
seems appropriate there. Do get a nice feel for progress and
you definitely see when the throttles kick in.
|
|
While linking GetMesh2 to the old setting was simpler from a user
point-of-view, they really shouldn't be linked and the old one will
go away. This one may be renamed to AssetMaxConcurrentRequests or
something similar if we get to the mesh/texture unification step.
|
|
This really extended into the client-side request throttling.
Moved this from llmeshrepository (which doesn't really want
to do connection management) into llcorehttp. It's now a
class option with configurable rate. This still isn't the
right thing to do as it creates coupling between viewer
and services. When we get to pipelining, this notion becomes
invalid.
|
|
Generally sorted the mesh timeout parameters for maximum
transport time (staying with default 30 for connect). 60S
for normal meshes, 600S for large. Also documented default
option values in httpoptions.h. Useful to have these. In
the future, the timeouts might go into standard llsd options
where they can be tracked a bit more.
|
|
With this checkin, legacy LLCurl is out of the mesh code.
Uploaders and responders are converted and functioning. Logging
has been cleaned up throughout the code to use the macro form
with tag/subtag capability. DEBUGS-level logging added for some
upload path milestones. Better error information flow from
failed responses to viewer alert boxes but I'd really, really
like to do better here. Mesh upload problems are completely
opaque as a user. Minor cleanups (removed dead members,
method signatures tidied, less data conversion). Could almost
call this complete, will likely have platform cleanups, however.
|
|
|
Viewer modified for preference for GetMesh2 caps. When found,
uses this cap and uses 1/4 the connection concurrency specified
by MeshMaxConcurrentRequests. Also uses a modified calculation
for high/low water feeding into the llcorehttp library.
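
The concurrency rule can be sketched as below. The function name is hypothetical, and the floor of one connection is an assumption added so that small settings do not round down to zero.

```cpp
#include <algorithm>
#include <cassert>

// Connection concurrency to request from the HTTP layer.
// meshMaxConcurrentRequests models the MeshMaxConcurrentRequests setting.
unsigned effectiveConcurrency(unsigned meshMaxConcurrentRequests, bool haveGetMesh2) {
    if (!haveGetMesh2) {
        return meshMaxConcurrentRequests;  // legacy GetMesh: use the setting as-is
    }
    // GetMesh2: one quarter of the setting, never less than one (assumed floor).
    return std::max(1u, meshMaxConcurrentRequests / 4u);
}
```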
|
|
Taught llappcorehttp to register signals on the settings values
that change behavior. Have initialization and settings changes
sweep through settings and change them. Dynamic changes are tried
but have no effect (produce a warning message) as dynamic settings
still aren't supported but the plumbing is now connected. Just
need to change llcorehttp. Bounced the 'teleport started' signal
around and it ended up back where it started with some cleanup.
This is making me less angry...
|
|
|
|
Fixed the logic and have it covering all five types of requests now
with validation via an assert (when enabled). Should keep things
working smoothly and avoid floods of 503s when in debug modes. Also
started a round of file-level documentation detailing thread usage
and mutex coverage. More to do, more to describe. But the high-
water stuff is functioning correctly.
|
|
Mesh code.
Pay correct attention to status codes coming back from services. Generate
better and consistent error messages when problems arise. There's more to
do in error handling, need a way to cleanly fail all request types, only
have that for LOD at this point. Do better keeping the HTTP pipeline between
the low and high water marks. This was made challenging because the
outermost code couldn't really see what's going on internally (whose actions are
delayed in a worker thread). More to do here, the debug-like requests don't
honor limits, that will come later. Made retry counts available from llcorehttp
which can be used by the throttle-anticipating logic to advance the count.
It helps but it reinforces the coupling between viewer and server which I
do not like.
|
|
Mesh repo is using three policy classes now: one for
large objects, one for GetMesh2 regions, one for
GetMesh regions. It's also detecting the presence
of the cap and using the correct class. Class
initialization cleaned up significantly in llappcorehttp
using data-directed code. Pulled in the changes to
HttpHeader done for sunshine-internal then did a
refactoring pass on the header callback which now
uses a unified approach to clean up and deliver
header information to all interested parties. Added
support for using Retry-After header information on
503 retries.
|
|
|
Normalize deadman timer's args on U64/F64. Internals remain the
same. Modify mesh to collect and output enhanced CPU metrics.
|
|
Integrated as a ctor-time option to LLDeadmanTimer and have mesh
use this mode for the stats I'm gathering.
|
|
|
|
One of the metrics calls was running in an LLCurl-owned thread
doing responder invocation. Deleted that invocation and will
make do with the other safe ones. Added a boost signal on the
TeleportStarted message which is now used to restart the metrics
timer. I think I'd like to move the metric blob into a free-
standing entity later...
|
|
Timer interface violated my design rules and I paid for it
with clumsiness and silent errors. Cleaned it up mainly
removing the evil default values. Found better integration
points in the mesh downloader and it's producing fairly
consistent numbers on the MeshTest2 test region (about
5500 downloads, ~90 seconds, +/- 10 seconds). Will review
with davep and do an early timer stop on teleport which
invalidates a timing sequence.
|
|
Added second mesh class as well as an asset upload class.
Refactored initialization to use less code and more data to
cleanly get http started. Modified mesh to use the new
http class for large requests (>2MB for now). Added additional
timeout setting to llcorehttp to distinguish connection timeout
from transport timeout. Mesh now uses transport timeout
values for large asset downloads that may need more time.
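
A rough sketch of size-based class selection with separate connect and transport timeouts. The enum, struct and function names are hypothetical. Reading "2MB" as 2 * 1024 * 1024 bytes is an assumption, and the 30/60/600 second values are borrowed from a later entry in this log rather than from this commit.

```cpp
#include <cassert>
#include <cstddef>

enum class PolicyClass { Mesh, LargeMesh };

// Assumed interpretation of the ">2MB for now" threshold.
constexpr std::size_t kLargeThresholdBytes = 2u * 1024u * 1024u;

struct Timeouts {
    long connectSecs;    // time allowed to establish the connection
    long transportSecs;  // time allowed for the whole transfer
};

// Requests strictly larger than the threshold go to the large-object class.
PolicyClass classForRequest(std::size_t bytes) {
    return bytes > kLargeThresholdBytes ? PolicyClass::LargeMesh : PolicyClass::Mesh;
}

// Same connect timeout for both classes; large transfers get more time.
Timeouts timeoutsFor(PolicyClass policy) {
    if (policy == PolicyClass::LargeMesh) {
        return Timeouts{30L, 600L};
    }
    return Timeouts{30L, 60L};
}
```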
|
|
after delete, erase() on end() iterator, a few more like that.
Killed a dead variable.
|