📕 subnode [[@flancian/agora speed]]
in 📚 node [[agora-speed]]
- The Agora started out very fast (it was pleasant) and then got very slow: partly because of added features, but mostly because of the number of nodes and edges it now handles (see stats in [[nodes]] if interested).
- It could be fast again -- we just need to fix some issues :)
-
As of right now, the Agora very often takes almost 10s (!) to load some pages, particularly when under load. There are various reasons this happens -- I could write about them here but right now I'd rather just go ahead and try to fix them ;)
- Doing pomodoros today, [[2022-03-20]], to try to address this issue.
-
When I started benchmarking, dev.anagora.org/do was regularly taking 3.7s to load (measured with [[time curl]]). In production this is often longer, but this seems like a good test node (because of the number of pushes it has, which I think is related to the slowdown).
- The first load after restarting the dev server is often 10s, which is closer to what we'd see in prod (because it needs to load+process the whole Agora graph).
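The cold-versus-warm difference above can be measured with a small script instead of repeated [[time curl]] runs. This is only a sketch: `fetch` and `time_loads` are hypothetical helper names, and the `https://` scheme in the usage note is an assumption. It reports the first request separately from the median of the remaining ones.

```python
import statistics
import time
import urllib.request


def fetch(url):
    """Fetch url and return the number of bytes read (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return len(response.read())


def time_loads(url, runs=5, fetcher=fetch, clock=time.perf_counter):
    """Return (cold, warm_median) load times in seconds.

    The first request is reported on its own because it may include
    one-off work, such as loading the whole graph after a restart.
    """
    timings = []
    for _ in range(runs):
        start = clock()
        fetcher(url)
        timings.append(clock() - start)
    cold, warm = timings[0], timings[1:]
    # With a single run there are no warm samples; fall back to the cold time.
    return cold, (statistics.median(warm) if warm else cold)
```

Calling `time_loads("https://dev.anagora.org/do")` right after restarting the dev server would return the cold load time and the median of the following warm loads; `fetcher` and `clock` are injectable so the function can be checked without touching the network.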
-
Same on [[2022-04-09]].
- Increased organic traffic + crawler bot activity.
- Changed the number of workers (it seems more is not necessarily better; trying 10).
- Changed cache expiry to a random range to prevent thundering herd by workers hitting cache TTL in unison.
- Cached calls to node().
- I wonder if there's low-hanging fruit remaining; dev.anagora.org just tends to feel so fast in comparison with prod, perhaps I'm missing where the bottleneck is. Will add debugging data.
- HA.
- The Agora workers were restarting very often, way more often than CACHE_TTL, because the bots that are causing most of the increase in load were also hitting URLs that produce 500s. That's a nice favour really, as they call attention to an obvious bug we need to fix in [[journals]] (they link out to bad URLs due to an issue with the renderer). In the meantime, though, they were essentially nuking the performance of the Agora and keeping all the recent performance work from taking effect.
- Now that this is fixed I can happily report that the Agora feels noticeably snappier :)
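The two caching changes above (a randomized expiry and cached calls to node()) can be sketched together as one decorator. This is a minimal illustration, not the Agora's actual code: `jittered_ttl_cache`, the `CACHE_TTL` value, the jitter range and the stand-in `node` function are all assumptions. Each entry gets its own random lifetime, so workers that filled their caches at the same moment do not all recompute at the same moment.

```python
import functools
import random
import time

CACHE_TTL = 120  # seconds; hypothetical value, the real setting is not given here


def jittered_ttl_cache(ttl=CACHE_TTL, jitter=0.5, clock=time.monotonic, rng=random.uniform):
    """Cache results per argument tuple.

    Each entry expires after a random duration between ttl and
    ttl * (1 + jitter), which spreads out expiry across workers.
    """
    def decorator(func):
        entries = {}  # args -> (expires_at, value)

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = entries.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip the expensive call
            value = func(*args)
            entries[args] = (now + rng(ttl, ttl * (1 + jitter)), value)
            return value

        return wrapper

    return decorator


@jittered_ttl_cache()
def node(name):
    """Stand-in for an expensive graph lookup (hypothetical)."""
    return {"name": name}
```

Note that a cache like this lives in each worker's memory, so it is lost whenever a worker restarts; if workers restart more often than the TTL, the cache rarely gets a chance to help.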
📄
pushed from garden/flancian/journal/2022_04_09.md by @flancian
-
#push [[agora speed]]
-
[[scaling]] must happen
- but we still have time thanks to the block below :)
-
DONE hmm, but there is [[low hanging fruit]]: the per-worker caches should not all expire in unison (!)
:LOGBOOK:
CLOCK: [2022-04-09 Sat 17:52:37]--[2022-04-09 Sat 19:10:28] => 01:17:51
:END:
- also I wasn't caching calls to G.node() (?).
- AND, much more importantly, the Agora was restarting all the time due to 500s in URLs hit by bots -- so none of the performance work I was doing was taking effect. Now that that's fixed it feels much snappier! I am happy about this development.
-
[[scaling]] must happen
📖 stoas
- public document at doc.anagora.org/agora-speed
- video call at meet.jit.si/agora-speed