Commit graph

124 commits

Author SHA1 Message Date
Till
232aef016c
Add basic runtime tracing (#2996)
This allows us in almost all places to use regions to further trace down
long running tasks.
Also removes an unused function.
2023-03-13 16:45:14 +01:00
Till
9bcd0a2105
Make redaction check easier to read (#2995)
We need to check the redaction PL in Dendrite, if we do it in GMSL, we
end up not sending the event to the output stream because it will be
rejected.

---------

Co-authored-by: kegsay <kegan@matrix.org>
2023-03-03 14:03:17 +01:00
Till
6c20f8f742
Refactor StoreEvent, add MaybeRedactEvent, create an EventDatabase (#2989)
This PR changes the following:
- `StoreEvent` now only stores an event (and possibly prev event),
instead of also doing redactions
- Adds a `MaybeRedactEvent` (pulled out from `StoreEvent`), which should
be called after storing events
- a few other things
2023-03-01 17:06:47 +01:00
Till
ad07b169b8
Refactor StoreEvent and create a new RoomDatabase interface (#2985)
This PR changes a few things:
- It pulls out the creation of several NIDs from the `StoreEvent`
function to make the functions more reusable
- Uses more caching when using those NIDs to avoid DB round trips
2023-02-24 09:40:20 +01:00
Till
caf310fd79
AWSY missing federation tests (#2943)
In an attempt to fix the missing AWSY tests and to get to 100%
server-server compliance.
2023-01-20 15:18:06 +01:00
Till
5eed31fea3
Handle guest access [1/2?] (#2872)
Needs https://github.com/matrix-org/sytest/pull/1315, as otherwise the
membership events aren't persisted yet when hitting `/state` after
kicking guest users.

Makes the following tests pass:
```
Guest users denied access over federation if guest access prohibited
Guest users are kicked from guest_access rooms on revocation of guest_access
Guest users are kicked from guest_access rooms on revocation of guest_access over federation
```

Todo (in a follow up PR):
- Restrict access to CS API Endpoints as per
https://spec.matrix.org/v1.4/client-server-api/#client-behaviour-14

Co-authored-by: kegsay <kegan@matrix.org>
2022-12-22 13:05:59 +01:00
Neil Alexander
9b8bb55430
Don't get blacklisted hosts when querying joined servers (#2880)
Otherwise we just waste time/CPU.
2022-11-15 17:21:16 +00:00
Neil Alexander
6650712a1c
Federation fixes for virtual hosting 2022-11-15 15:05:23 +00:00
Till
2a77a910eb
Handle remote room upgrades (#2866)
Makes the following tests pass
```
/upgrade moves remote aliases to the new room
Local and remote users' homeservers remove a room from their public directory on upgrade
```
2022-11-14 12:07:13 +00:00
Neil Alexander
16c2a95900
Improve logging for processEventWithMissingState 2022-11-02 11:30:49 +00:00
Neil Alexander
cd8f7e1251
Set inactivity threshold on durable consumers in the roomserver input API (#2795)
This prevents us from holding onto durable consumers indefinitely for
rooms that have long since turned inactive, since they do have a bit of
a processing overhead in the NATS Server. If we clear up a consumer and
then a room becomes active again, the consumer gets recreated as needed.

The threshold is set to 24 hours for now, we can tweak it later if needs
be.
2022-10-14 15:14:29 +01:00
Till
088ad1dd21
Fix outliers whose auth_events are in a different room are correctly rejected (#2791)
Fixes `outliers whose auth_events are in a different room are correctly
rejected`, by validating that auth events are all from the same room and
not using rejected events for event auth.
2022-10-14 09:14:54 +02:00
Neil Alexander
3f82bceb70
Don't try to talk to ourselves when finding missing events 2022-10-06 10:51:06 +01:00
Neil Alexander
f022fc1397
Remove origin field from PDUs (#2737)
This nukes the `origin` field from PDUs as per
matrix-org/matrix-spec#998, matrix-org/gomatrixserverlib#341.
2022-09-26 17:35:35 +01:00
Neil Alexander
fc1d8e479b
Ensure that all state event IDs are included in the added section when rewriting state (#2725)
This should hopefully fix an entire class of problems where components
downstream from the roomserver (i.e. the sync API) could just lose a
whole bunch of state after a rewrite operation like a federated join.

The root of the bug is that we set `RewritesState` in the output event
which instructs downstream components to purge their copy of any room
state, but then didn't send the entire state snapshot in
`adds_state_event_ids` so the downstream state ends up being incomplete
as a result.
2022-09-16 10:35:32 +01:00
Neil Alexander
7f89fed1e4
Revert 482914aef4 2022-09-14 09:55:50 +01:00
Neil Alexander
482914aef4
Use AckNone on the ephemeral room input consumer 2022-09-13 15:25:02 +01:00
Neil Alexander
2792d0490f
Fix missing signature check on the /get_missing_events response 2022-09-12 13:30:51 +01:00
Neil Alexander
31f4ae8997
Use a single context instead 2022-09-07 16:24:43 +01:00
Neil Alexander
5014b35bd7
Update state reset capture to Sentry 2022-09-07 16:23:22 +01:00
Neil Alexander
cd22ba22b0
Improve Sentry reporting 2022-09-05 17:25:11 +01:00
Neil Alexander
ecee5f10f4
Tweak logging for detected state resets 2022-09-05 17:08:54 +01:00
Neil Alexander
d1f87e63f1
Move SetLatestEvents call 2022-09-05 13:16:14 +01:00
Neil Alexander
cd7fa34595
Tweak logging and Sentry reporting for roomserver input 2022-08-25 10:57:27 +01:00
Neil Alexander
2668050e53
Tweak soft-failure handling in roomserver
commit 1929b688e31987c46e0c8a546f0f9cb0a46bf9a3
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Aug 22 10:09:44 2022 +0100

    Still process state-before for soft-failed events

commit e83c0b701d40d78b92072c4643f6bc6f71b72800
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Aug 22 10:06:50 2022 +0100

    Improve logging

commit 29e26124bc27cb83d449de2a4214b253c594aa93
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Aug 22 09:58:13 2022 +0100

    Don't store soft-failed events as rejected
2022-08-22 10:34:07 +01:00
Neil Alexander
6b48ce0d75
State handling tweaks (#2652)
This tweaks how rejected events are handled in room state and also to not apply checks we can't complete to outliers.
2022-08-18 17:06:13 +01:00
Neil Alexander
59bc0a6f4e
Reprocess rejected input events (#2647)
* Reprocess outliers that were previously rejected

* Might as well do all events this way

* More useful errors

* Fix queries

* Tweak condition

* Don't wrap errors

* Report more useful error

* Flatten error on `r.Queryer.QueryStateAfterEvents`

* Some more debug logging

* Flatten error in `QueryRestrictedJoinAllowed`

* Revert "Flatten error in `QueryRestrictedJoinAllowed`"

This reverts commit 1238b4184c30e0c31ffb0f364806fa1275aba483.

* Tweak `QueryStateAfterEvents`

* Handle MissingStateError too

* Scope to room

* Clean up

* Fix the error

* Only apply rejection check to outliers
2022-08-18 10:37:47 +01:00
Till
05cafbd197
Implement history visibility on /messages, /context, /sync (#2511)
* Add possibility to set history_visibility and user AccountType

* Add new DB queries

* Add actual history_visibility changes for /messages

* Add passing tests

* Extract check function

* Cleanup

* Cleanup

* Fix build on 386

* Move ApplyHistoryVisibilityFilter to internal

* Move queries to topology table

* Add filtering to /sync and /context
Some cleanup

* Add passing tests; Remove failing tests :(

* Re-add passing tests

* Move filtering to own function to avoid duplication

* Re-add passing test

* Use newly added GMSL HistoryVisibility

* Update gomatrixserverlib

* Set the visibility when creating events

* Default to shared history visibility

* Remove unused query

* Update history visibility checks to use gmsl
Update tests

* Remove unused statement

* Update migrations to set "correct" history visibility

* Add method to fetch the membership at a given event

* Tweaks and logging

* Use actual internal rsAPI, default to shared visibility in tests

* Revert "Move queries to topology table"

This reverts commit 4f0d41be9c194a46379796435ce73e79203edbd6.

* Remove noise/unneeded code

* More cleanup

* Try to optimize database requests

* Fix imports

* PR peview fixes/changes

* Move setting history visibility to own migration, be more restrictive

* Fix unit tests

* Lint

* Fix missing entries

* Tweaks for incremental syncs

* Adapt generic changes

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: kegsay <kegan@matrix.org>
2022-08-11 18:23:35 +02:00
Neil Alexander
c45d0936b5
Generic-based internal HTTP API (#2626)
* Generic-based internal HTTP API (tested out on a few endpoints in the federation API)

* Add `PerformInvite`

* More tweaks

* Fix metric name

* Fix LookupStateIDs

* Lots of changes to clients

* Some serverside stuff

* Some error handling

* Use paths as metric names

* Revert "Use paths as metric names"

This reverts commit a9323a6a343f5ce6461a2e5bd570fe06465f1b15.

* Namespace metric names

* Remove duplicate entry

* Remove another duplicate entry

* Tweak error handling

* Some more tweaks

* Update error behaviour

* Some more error tweaking

* Fix API path for `PerformDeleteKeys`

* Fix another path

* Tweak federation client proxying

* Fix another path

* Don't return typed nils

* Some more tweaks, not that it makes any difference

* Tweak federation client proxying

* Maybe fix the key backup test
2022-08-11 15:29:33 +01:00
Till
1b7f84250a
Fix linter issues (#2624)
* Try that again

* All hail the mighty linter?

* And once again

* goimport all the things
2022-08-05 11:12:41 +02:00
Neil Alexander
3bf5ae5ffe
Try more servers when calling /state_ids (#2610)
* Try more servers when calling `/state_ids`

* More logging

* Maybe fix concurrent map write

* Revert "Maybe fix concurrent map write"

This reverts commit da0dbb836207a911afe77e6f6d63c4809669693c.

* Enforce a limit of 20s per server, 5 mins total
2022-08-03 17:37:27 +01:00
Neil Alexander
f4345dafde
Fix data race in lookupMissingStateViaStateIDs 2022-08-02 13:01:03 +01:00
Neil Alexander
ca3fa58388
Various roominfo tweaks (#2607) 2022-08-02 12:27:15 +01:00
Neil Alexander
f0c8a03649
Membership updater refactoring (#2541)
* Membership updater refactoring

* Pass in membership state

* Use membership check rather than referring to state directly

* Delete irrelevant membership states

* We don't need the leave event after all

* Tweaks

* Put a log entry in that I might stand a chance of finding

* Be less panicky

* Tweak invite handling

* Don't freak if we can't find the event NID

* Use event NID from `types.Event`

* Clean up

* Better invite handling

* Placate the almighty linter

* Blacklist a Sytest which is otherwise fine under Complement for reasons I don't understand

* Fix the sytest after all (thanks @S7evinK for the spot)
2022-07-22 14:44:04 +01:00
Neil Alexander
3ea21273bc
Ristretto cache (#2563)
* Try Ristretto cache

* Tweak

* It's beautiful

* Update GMSL

* More strict keyable interface

* Fix that some more

* Make less panicky

* Don't enforce mutability checks for now

* Determine mutability using deep equality

* Tweaks

* Namespace keys

* Make federation caches mutable

* Update cost estimation, add metric

* Update GMSL

* Estimate cost for metrics better

* Reduce counters a bit

* Try caching events

* Some guards

* Try again

* Try this

* Use separate caches for hopefully better hash distribution

* Fix bug with admitting events into cache

* Try to fix bugs

* Check nil

* Try that again

* Preserve order jeezo this is messy

* thanks VS Code for doing exactly the wrong thing

* Try this again

* Be more specific

* aaaaargh

* One more time

* That might be better

* Stronger sorting

* Cache expiries, async publishing of EDUs

* Put it back

* Use a shared cache again

* Cost estimation fixes

* Update ristretto

* Reduce counters a bit

* Clean up a bit

* Update GMSL

* 1GB

* Configurable cache sizees

* Tweaks

* Add `config.DataUnit` for specifying friendly cache sizes

* Various tweaks

* Update GMSL

* Add back some lazy loading caching

* Include key in cost

* Include key in cost

* Tweak max age handling, config key name

* Only register prometheus metrics if requested

* Review comments @S7evinK

* Don't return errors when creating caches (it is better just to crash since otherwise we'll `nil`-pointer exception everywhere)

* Review comments

* Update sample configs

* Update GHA Workflow

* Update Complement images to Go 1.18

* Remove the cache test from the federation API as we no longer guarantee immediate cache admission

* Don't check the caches in the renewal test

* Possibly fix the upgrade tests

* Update to matrix-org/gomatrixserverlib#322

* Update documentation to refer to Go 1.18
2022-07-11 14:31:31 +01:00
Till
5087b36af0
Fix QuerySharedUsers for the SyncAPI keychange consumer (#2554)
* Make more use of base.BaseDendrite

* Fix QuerySharedUsers if no UserIDs are supplied
2022-07-05 14:50:56 +02:00
Neil Alexander
b50a24c666
Roomserver producers package (#2546)
* Give the roomserver a producers package

* Change init point

* Populate ACLs API

* Fix build issues

* `RoomEventProducer` naming
2022-07-01 10:54:07 +01:00
Neil Alexander
4c2a10f1a6
Handle state before, send history visibility in output (#2532)
* Check state before event

* Tweaks

* Refactor a bit, include in output events

* Don't waste time if soft failed either

* Tweak control flow, comments, use GMSL history visibility type
2022-06-13 15:11:10 +01:00
Neil Alexander
27948fb304
Optimise loadAuthEvents, add roomserver tracing 2022-06-07 14:23:26 +01:00
Neil Alexander
02e5c74101
Revert #2457
Squashed commit of the following:

commit 2bd0daf4d61376d2dd56628eaff267b0bc63e116
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Jun 1 09:55:54 2022 +0100

    Revert resolving old extremities as well as new

    This may no longer be needed with the new state fixes and probably just burns more CPU time than is strictly necessary.
2022-06-01 10:09:27 +01:00
Neil Alexander
3d9fe20748
Fix bugs related to state resolution (#2507)
* Fix bugs related to state resolution

* Clean up `resolve-state`

* Don't panic when entries can't be found

* Ensure we have state entries for the auth events

* Revert "Ensure we have state entries for the auth events"

This reverts commit 9b13b7ed37f40ce6d1301d9cb423a27b0db9c897.

* Revert "Revert "Ensure we have state entries for the auth events""

This reverts commit d86db197e3e317f7d64ec6722cc60533872f4617.

* Fix bug

* Try that again

* Update gomatrixserverlib

* Remove recursion from `loadAuthEvents`
2022-06-01 09:46:21 +01:00
Neil Alexander
9eb4fec33b
Make logging output for state deletions a bit better 2022-05-26 10:38:46 +01:00
Neil Alexander
6940c7c7dd
Try to spot state deletions when they happen (#2489) 2022-05-25 16:40:31 +01:00
kegsay
6de29c1cd2
bugfix: E2EE device keys could sometimes not be sent to remote servers (#2466)
* Fix flakey sytest 'Local device key changes get to remote servers'

* Debug logs

* Remove internal/test and use /test only

Remove a lot of ancient code too.

* Use FederationRoomserverAPI in more places

* Use more interfaces in federationapi; begin adding regression test

* Linting

* Add regression test

* Unbreak tests

* ALL THE LOGS

* Fix a race condition which could cause events to not be sent to servers

If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.

* Unbreak new tests

* Linting
2022-05-17 13:23:35 +01:00
Neil Alexander
be9be2553f
Resolve over old and new extremities (#2457)
* Feed existing state into state res when calculating state from new extremities

* Remove duplicates

* Fix bug

* Sort and unique

* Update to matrix-org/gomatrixserverlib#308

* Trim the slice properly

* Update gomatrixserverlib again

* Update to matrix-org/gomatrixserverlib#308
2022-05-13 11:52:04 +01:00
Neil Alexander
09d754cfbf
One NATS instance per BaseDendrite (#2438)
* One NATS instance per `BaseDendrite`

* Fix roomserver
2022-05-09 14:15:24 +01:00
Neil Alexander
6bc6184d70
Simplify calculateLatest (#2430)
* Simplify `calculateLatest`

* Comments
2022-05-06 15:52:44 +01:00
kegsay
9957752a9d
Define component interfaces based on consumers (2/2) (#2425)
* convert remaining interfaces

* Tidy up the userapi interfaces
2022-05-05 19:30:38 +01:00
Neil Alexander
530fd488a9
Don't log consumer errors on shutdown 2022-05-05 13:29:39 +01:00
Neil Alexander
4ad5f9c982
Global database connection pool (for monolith mode) (#2411)
* Allow monolith components to share a single database pool

* Don't yell about missing connection strings

* Rename field

* Setup tweaks

* Fix panic

* Improve configuration checks

* Update config

* Fix lint errors

* Update comments
2022-05-03 16:35:06 +01:00