Commit graph

155 commits

Author SHA1 Message Date
Till
11d9b9db0e
Remove polylith/API mode (#2967)
This removes most of the code used for polylith/API mode.

This removes the `/api` internal endpoints entirely. 

Binary size change roughly 5%: 
```
51437560 Feb 13 10:15 dendrite-monolith-server # old
48759008 Feb 13 10:15 dendrite-monolith-server # new
```
2023-02-14 12:47:47 +01:00
Till
7d2344049d
Cleanup stale device lists for users we don't share a room with anymore (#2857)
The stale device lists table might contain entries for users we don't
share a room with anymore. This now asks the roomserver about left users
and removes those entries from the table.

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-12-12 08:20:59 +01:00
Till
e245a26f6b
Enable/Disable internal metrics (#2899)
Basically enables us to use `test.WithAllDatabases` when testing
internal HTTP APIs, as this would otherwise result in Prometheus
complaining about already registered metric names.
2022-12-05 13:53:36 +01:00
Neil Alexander
6650712a1c
Federation fixes for virtual hosting 2022-11-15 15:05:23 +00:00
Neil Alexander
529df30b56
Virtual hosting schema and logic changes (#2876)
Note that virtual users cannot federate correctly yet.
2022-11-11 16:41:37 +00:00
Till Faelligen
205a15621a
Add custom build flag to satisfy Sytest 2022-11-07 15:07:47 +01:00
Neil Alexander
a2706e6498
Refactor claimRemoteKeys 2022-10-27 15:34:26 +01:00
Neil Alexander
a553fe7705
Fix slow querying of cross-signing signatures 2022-10-24 10:07:50 +01:00
devonh
9041491201
Mutex protect query keys response (#2812) 2022-10-20 15:54:18 +00:00
Neil Alexander
8cbe14bd6d
Fix lock contention 2022-10-19 12:27:34 +01:00
Neil Alexander
c1463db6c9
Fix concurrent map write in key server 2022-10-19 12:03:12 +01:00
Till
b9d0e9f7ed
Add test for QueryDeviceMessages (#2773)
Adds tests for `QueryDeviceMessages` and also includes some
optimizations to reduce allocations in the DB layer.
2022-10-07 10:54:42 +02:00
Till
ec5d1d681d
Always return one_time_key_counts on /keys/upload (#2769)
The OTK count is
[required](https://spec.matrix.org/v1.4/client-server-api/#post_matrixclientv3keysupload)
in responses to `/keys/upload`, so return those.
2022-10-06 12:30:24 +02:00
Neil Alexander
6f602bb096
Demote Failed to query device keys for some users warning to level=debug
Many of these warnings are due to dead servers and are quite annoying when they fill up the logs.
2022-10-05 11:16:05 +01:00
Neil Alexander
fec3ee384b
Stop CPU burn in PerformMarkAsStaleIfNeeded 2022-10-03 12:59:56 +01:00
Neil Alexander
8a82f10046
Allow more time for device list updates (#2749)
This updates the device list updater so that it has a context
per-request, rather than a global 30 seconds for the entire server. This
could mean that talking to a slow remote server or requesting a lot of
user IDs was pretty much guaranteed to fail.

It also uses the process context to allow correct cancellation when
Dendrite wants to shut down cleanly.
2022-09-30 09:41:16 +01:00
Till
9005e5b4a8
Add /_dendrite/admin/refreshDevices/{userID} (#2746)
Allows to immediately query `/devices/{userID}` over federation to
(hopefully) resolve E2EE issues.
2022-09-30 09:32:31 +01:00
Till
e007b8038f
Mark device list as stale, if we don't have the requesting device (#2728)
This hopefully makes E2EE chats a little bit more reliable by re-syncing
devices if we don't have the `requesting_device_id` in our database. (As
seen in
[Synapse](c52abc1cfd/synapse/handlers/devicemessage.py (L157-L201)))
2022-09-20 11:32:03 +02:00
Till
100fa9b235
Check unique constraint errors when manually inserting migrations (#2712)
This should avoid unnecessary logging on startup if the migration (were
we need `InsertMigration`) was already executed.
This now checks for "unique constraint errors" for SQLite and Postgres
and fails the startup process if the migration couldn't be manually
inserted for some other reason.
2022-09-13 08:07:43 +02:00
Neil Alexander
e1bc4f6a1e
Fix database transaction for keyserver DeleteDeviceKeys 2022-09-09 13:31:55 +01:00
Till
8196b29657
Change detection of already executed migrations (#2665)
This changes the detection of already executed migrations for the
roomserver state block and keychange refactor. It now uses schema tables
provided by the database engine to check if the column was already
removed. We now also store the migration in the migrations table.

This should stop e.g. Postgres from logging errors like `ERROR: column
"event_nid" does not exist at character 8`.
2022-09-09 13:14:52 +01:00
Till
42a82091a8
Fix issue with stale device lists (#2702)
We were only sending the last entry to the worker, so most likely missed
updates.
2022-09-08 12:03:44 +02:00
Till
0d697f6754
Add HTTP status code to FederationClientError (#2699)
Also ensures we wait on more HTTP status codes.
2022-09-07 16:14:09 +02:00
Till
7e8c605f98
Avoid unneeded JSON operations (#2698)
We were `json.Unmarshal`ing the EDU and `json.Marshal`ing right before
sending the EDU to the stream. Those are now removed and the consumer
does `json.Unmarshal` once.
2022-09-07 12:16:04 +02:00
Till Faelligen
4e352390b6
Re-add waitTime if we're not blacklisted and no RetryAfter was
specified.
2022-09-07 12:13:02 +02:00
Till
2cfcfddecc
Add a SigningKeyUpdate producer (#2697)
This adds a new stream for signing key updates, this should ensure we
don't lose any updates over federation.
2022-09-07 11:45:12 +02:00
Till
440eb0f3a2
Handle errors differently in the DeviceListUpdater (#2695)
`If a device list update goes missing, the server resyncs on the next
one` was failing because a previous test would receive a `waitTime` of
1h, resulting in the test timing out.
This now tries to handle the returned errors differently, e.g. by using
the default `waitTime` of 2s. Also doesn't try further users in the
list, if one of the errors would cause a longer `waitTime`.
2022-09-07 11:44:27 +02:00
Neil Alexander
175f65407a
Allow batching in JetStreamConsumer (#2686)
This allows us to receive more than one message from NATS at a time if we want.
2022-08-31 12:21:56 +01:00
Brian Meek
704cc5c9f5
Race in keyserver intialization (#2619)
Signed-off-by: Brian Meek <brian@hntlabs.com>
2022-08-29 09:10:42 +02:00
Neil Alexander
5513f182cc
Enforce device list backoffs (#2653)
This ensures that if the device list updater is already backing off a node, we don't try to call processServer again anyway for server just because the server name arrived in the channel. Otherwise we can keep trying to hit a remote server that is offline or not behaving every second and that spams the logs too.
2022-08-19 10:23:09 +01:00
Neil Alexander
c45d0936b5
Generic-based internal HTTP API (#2626)
* Generic-based internal HTTP API (tested out on a few endpoints in the federation API)

* Add `PerformInvite`

* More tweaks

* Fix metric name

* Fix LookupStateIDs

* Lots of changes to clients

* Some serverside stuff

* Some error handling

* Use paths as metric names

* Revert "Use paths as metric names"

This reverts commit a9323a6a343f5ce6461a2e5bd570fe06465f1b15.

* Namespace metric names

* Remove duplicate entry

* Remove another duplicate entry

* Tweak error handling

* Some more tweaks

* Update error behaviour

* Some more error tweaking

* Fix API path for `PerformDeleteKeys`

* Fix another path

* Tweak federation client proxying

* Fix another path

* Don't return typed nils

* Some more tweaks, not that it makes any difference

* Tweak federation client proxying

* Maybe fix the key backup test
2022-08-11 15:29:33 +01:00
Till
03ddd98f5e
Fix issues with migrations not getting executed (#2628)
* Fix issues with migrations not getting executed

* Check actual postgres error

* Return error if it's not "column does not exist"
2022-08-08 10:18:57 +02:00
Neil Alexander
c8935fb53f
Do not use ioutil as it is deprecated (#2625) 2022-08-05 10:26:59 +01:00
Till
1b7f84250a
Fix linter issues (#2624)
* Try that again

* All hail the mighty linter?

* And once again

* goimport all the things
2022-08-05 11:12:41 +02:00
Brian Meek
de78eab63a
Add race testing to tests, and fix a few small race conditions in the tests (#2587)
* Add race testing to tests, and fix a few small race conditions in the tests

* Enable run-sytest on MacOS

* Remove deadlock detecting mutex, per code review feedback

* Remove autoformatting related changes and a closure that is not needed

* Adjust to importing nats client as 'natsclient'

Signed-off-by: Brian Meek <brian@hntlabs.com>

* Clarify the use of gooseMutex to proect goose internal state

Signed-off-by: Brian Meek <brian@hntlabs.com>

* Remove no longer needed mutex for guarding goose

Signed-off-by: Brian Meek <brian@hntlabs.com>
2022-08-05 09:19:33 +01:00
Till
9fe509b18d
Fix syncapi shared users query & device lists (#2614)
* Fix query issue, only add "changed" users if we actually share a room

* Avoid log spam if context is done

* Undo changes to filterSharedUsers

* Add logging again..

* Fix SQLite shared users query

* Change query to include invited users
2022-08-03 18:35:17 +02:00
Till
081f5e7226
Update database migrations, remove goose (#2264)
* Add new db migration

* Update migrations
Remove goose

* Add possibility to test direct upgrades

* Try to fix WASM test

* Add checks for specific migrations

* Remove AddMigration
Use WithTransaction
Add Dendrite version to table

* Fix linter issues

* Update tests

* Update comments, outdent if

* Namespace migrations

* Add direct upgrade tests, skipping over one version

* Split migrations

* Update go version in CI

* Fix copy&paste mistake

* Use contexts in migrations

Co-authored-by: kegsay <kegan@matrix.org>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-07-25 10:39:22 +01:00
Till
f29cdb26f6
Use new testrig for key changes tests (#2552)
* Use new testrig for tests

* Log the error message
2022-07-05 14:50:24 +02:00
Neil Alexander
7120eb6bc9
Add InputDeviceListUpdate to the keyserver, remove old input API (#2536)
* Add `InputDeviceListUpdate` to the keyserver, remove old input API

* Fix copyright

* Log more information when a device list update fails
2022-06-15 14:27:07 +01:00
Neil Alexander
70cd8c68c2
Reduce error levels on device list update 2022-06-01 09:49:46 +01:00
kegsay
6de29c1cd2
bugfix: E2EE device keys could sometimes not be sent to remote servers (#2466)
* Fix flakey sytest 'Local device key changes get to remote servers'

* Debug logs

* Remove internal/test and use /test only

Remove a lot of ancient code too.

* Use FederationRoomserverAPI in more places

* Use more interfaces in federationapi; begin adding regression test

* Linting

* Add regression test

* Unbreak tests

* ALL THE LOGS

* Fix a race condition which could cause events to not be sent to servers

If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.

* Unbreak new tests

* Linting
2022-05-17 13:23:35 +01:00
Till
58af7f61b6
Fix OTK upload spam (#2448)
* Fix OTK spam

* Update comment

* Optimize selectKeysCountSQL to only return max 100 keys

* Return CurrentPosition if the request timed out

* Revert "Return CurrentPosition if the request timed out"

This reverts commit 7dbdda964189f5542048c06ce5ffc6d4da1814e6.

Co-authored-by: kegsay <kegan@matrix.org>
2022-05-11 17:15:18 +01:00
Neil Alexander
09d754cfbf
One NATS instance per BaseDendrite (#2438)
* One NATS instance per `BaseDendrite`

* Fix roomserver
2022-05-09 14:15:24 +01:00
Neil Alexander
4c15c73b3a
Add (user_id, device_id) index on OTK table (#2435) 2022-05-09 11:13:04 +01:00
kegsay
85704eff20
Clean up interface definitions (#2427)
* tidy up interfaces

* remove unused GetCreatorIDForAlias

* Add RoomserverUserAPI interface

* Define more interfaces

* Use AppServiceInternalAPI for consistent naming

* clean up federationapi constructor a bit

* Fix monolith in -http mode
2022-05-06 12:39:26 +01:00
kegsay
9957752a9d
Define component interfaces based on consumers (2/2) (#2425)
* convert remaining interfaces

* Tidy up the userapi interfaces
2022-05-05 19:30:38 +01:00
kegsay
506de4bb3d
Define component interfaces based on consumers (1/2) (#2423)
* Specify interfaces used by appservice, do half of clientapi

* convert more deps of clientapi to finer-grained interfaces

* Convert mediaapi and rest of clientapi

* Somehow this got missed
2022-05-05 13:17:38 +01:00
kegsay
d86dcbef66
syncapi: define specific interfaces for internal HTTP communications (#2416)
* syncapi: use finer-grained interfaces when making the syncapi

* Use specific interfaces for syncapi-roomserver interactions

* Define query access token api for shared http auth code
2022-05-05 09:56:03 +01:00
Neil Alexander
4ad5f9c982
Global database connection pool (for monolith mode) (#2411)
* Allow monolith components to share a single database pool

* Don't yell about missing connection strings

* Rename field

* Setup tweaks

* Fix panic

* Improve configuration checks

* Update config

* Fix lint errors

* Update comments
2022-05-03 16:35:06 +01:00
Neil Alexander
31799a3b2a
Device list display name fixes (#2405)
* Get device names from `unsigned` in `/user/devices`

* Fix display name updates

* Fix bug

* Fix another bug
2022-04-29 16:02:55 +01:00