Fix database master queries from HTTP GET/HEAD before active-active multi-dc
Open, Medium, Public

Description

Most of MediaWiki was written with the assumption that MediaWiki and its data stores are collocated and connected via reliable, low-latency network links. Until recently, MediaWiki had minimal facilities for maintaining consistency and partition tolerance across wide-area network links. As a result, although the Wikimedia Foundation operates data centers in multiple locations, we only run MediaWiki in one location at any one time.

This has several practical consequences. First, we are not as fault-tolerant as we'd like to be. We have a secondary data center with enough capacity to serve our traffic if our primary data center goes down, but it is in cold standby, meaning it takes some time (and some manual effort) to get it running. Second, site performance is poor for logged-in users who are geographically remote from Ashburn, Virginia, because of the time it takes to transmit and receive data across long-distance links. Third, in some basic cases, like parsing pages, the master database must be up, which makes it a single point of failure (SPOF).

It's going to take a lot of work to fix this completely, but we are getting closer to being able to serve some traffic from a secondary datacenter. Specifically, we would like to serve "reads" -- requests that don't require a master database connection -- from a secondary datacenter.

In order to serve reads from a different datacenter, we need to be able to predict which incoming requests will modify data, so that we can route them accordingly. We need to be able to make this determination at the edge -- i.e., the outermost layers of the infrastructure, so it cannot be complicated or slow.

The solution we have is to use the HTTP request method (T91820): GETs/HEADs are read-only, while POSTs are not. This was already true for most cases, but there is a long tail of actions with side-effects that are done via GET, such as purge, rollback, markpatrolled.
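
To make the routing rule concrete, here is a minimal Python sketch of the decision an edge layer could make. It is illustrative only: `pick_datacenter` and the datacenter labels are hypothetical names, not part of MediaWiki or the Wikimedia edge configuration, and the sketch assumes every GET/HEAD is already free of master queries.

```python
# Hypothetical sketch of method-based routing; names are placeholders.
SAFE_METHODS = frozenset({"GET", "HEAD"})


def pick_datacenter(method: str, primary: str, nearest: str) -> str:
    """Return the datacenter that should handle a request.

    Assumption: GET/HEAD requests never need the master database, so they
    can go to the nearest datacenter. Everything else goes to the primary.
    """
    if method.upper() in SAFE_METHODS:
        return nearest
    return primary
```

The check only looks at the request method, which keeps it cheap enough to run at the edge. It is also why GET actions with side effects (purge, rollback, markpatrolled) have to be fixed first.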

This task mostly involves fixing DBPerformance log warnings. Warnings can be dealt with by:
a) Changing DB master reads to use DB slaves
b) Moving the database updates to POST requests, the jobqueue, or at least to post-send updates via DeferredUpdates
c) Disabling warnings for a few exceptional cases like CentralAuth.
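
Option (b) can be sketched in Python as follows. This is not MediaWiki's DeferredUpdates implementation; `DeferredUpdateQueue` and `handle_get` are hypothetical names used only to show the idea of queueing a write during the request and running it after the response has been produced.

```python
from typing import Callable, List


class DeferredUpdateQueue:
    """Collects write callbacks during a request and runs them post-send."""

    def __init__(self) -> None:
        self._pending: List[Callable[[], None]] = []

    def add(self, update: Callable[[], None]) -> None:
        """Queue a callback instead of writing immediately."""
        self._pending.append(update)

    def run_all(self) -> int:
        """Run queued updates in FIFO order and return how many ran.

        The loop re-checks the queue, so updates that enqueue further
        updates are also processed.
        """
        ran = 0
        while self._pending:
            update = self._pending.pop(0)
            update()
            ran += 1
        return ran


def handle_get(queue: DeferredUpdateQueue, writes: List[str]) -> str:
    """Build the response first; the write happens only when the queue runs."""
    queue.add(lambda: writes.append("touch user row"))
    return "response body"
```

In this sketch the caller would send the return value of `handle_get` to the client and only then call `run_all()`, so the write no longer delays the response. It still originates from a GET, though, which is why the task lists this as the "at least" option after POST and the job queue.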

See +channel:DBPerformance on logstash.wikimedia.org

Most of these warnings are writes or master queries on HTTP GET requests, which would be cross-DC in an active-active setup for some users. Ideally we could eventually get these to zero.
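
As a rough illustration of what such a warning represents, the following Python sketch flags master queries and writes made while serving a GET/HEAD. It is a simplified stand-in, not the actual MediaWiki profiling code; the `QueryExpectations` class, its method names, and the log message format are assumptions made for this example.

```python
import logging
from typing import List

# The logger name mirrors the channel mentioned above; the rest is hypothetical.
logger = logging.getLogger("DBPerformance")


class QueryExpectations:
    """Records master queries and writes issued during a GET/HEAD request."""

    def __init__(self, method: str) -> None:
        self.read_only_request = method.upper() in ("GET", "HEAD")
        self.violations: List[str] = []

    def record(self, role: str, is_write: bool, caller: str) -> None:
        """Note one query; warn if it breaks the read-only expectation."""
        if not self.read_only_request:
            return  # POST and friends may use the master freely
        if is_write:
            kind = "write"
        elif role == "master":
            kind = "master query"
        else:
            return  # replica read on GET/HEAD is what we want
        self.violations.append(f"{kind} from {caller}")
        logger.warning("Unexpected %s on GET/HEAD from %s", kind, caller)
```

Getting the warnings "to zero" corresponds to `violations` staying empty for every GET/HEAD request.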

Details

Repo | Branch | Lines +/-
mediawiki/core | master | +41 -53
mediawiki/core | master | +19 -27
mediawiki/core | master | +17 -8
mediawiki/extensions/NewUserMessage | master | +45 -3
mediawiki/extensions/CentralAuth | master | +8 -3
mediawiki/extensions/CentralAuth | master | +1 -1
mediawiki/core | master | +19 -7
mediawiki/extensions/CentralAuth | master | +3 -3
mediawiki/extensions/Translate | master | +1 -1
mediawiki/extensions/PageTriage | master | +6 -0
mediawiki/extensions/CentralAuth | master | +7 -3
mediawiki/core | master | +8 -3
mediawiki/extensions/CentralAuth | master | +9 -8
mediawiki/core | master | +77 -23
mediawiki/extensions/CentralAuth | master | +25 -14
mediawiki/core | master | +22 -1
mediawiki/extensions/CentralAuth | REL1_27 | +3 -2
mediawiki/extensions/CentralAuth | master | +3 -2
mediawiki/extensions/CentralAuth | REL1_27 | +6 -5
mediawiki/extensions/CentralAuth | master | +6 -5
mediawiki/extensions/Math | master | +10 -8
mediawiki/core | master | +5 -1
mediawiki/extensions/CentralAuth | master | +29 -27
mediawiki/extensions/FlaggedRevs | master | +0 -1
mediawiki/core | master | +33 -21
mediawiki/extensions/LiquidThreads | master | +1 -1
mediawiki/extensions/Wikibase | master | +5 -2
mediawiki/extensions/CentralAuth | wmf/1.28.0-wmf.2 | +3 -2
mediawiki/extensions/EducationProgram | master | +4 -1
mediawiki/extensions/CentralAuth | master | +5 -6
mediawiki/extensions/Translate | master | +10 -9
mediawiki/extensions/LiquidThreads | master | +4 -1
mediawiki/extensions/FlaggedRevs | master | +2 -1
mediawiki/extensions/FlaggedRevs | master | +29 -27
mediawiki/core | master | +3 -1
mediawiki/extensions/LiquidThreads | master | +3 -1
mediawiki/core | master | +4 -4
mediawiki/core | master | +4 -1
mediawiki/extensions/AbuseFilter | master | +3 -1
mediawiki/extensions/CentralAuth | master | +9 -11
mediawiki/extensions/CentralAuth | master | +11 -2
mediawiki/extensions/CentralAuth | master | +9 -10
mediawiki/extensions/VisualEditor | master | +3 -1
mediawiki/core | master | +4 -2
mediawiki/extensions/VisualEditor | master | +1 -1
mediawiki/extensions/Translate | master | +25 -10
mediawiki/core | master | +3 -7
mediawiki/extensions/LiquidThreads | master | +3 -3
mediawiki/core | master | +1 -1
mediawiki/core | master | +2 -2
mediawiki/core | wmf/1.27.0-wmf.8 | +266 -419
mediawiki/core | master | +266 -419
mediawiki/core | master | +33 -17
mediawiki/extensions/CentralAuth | master | +133 -30
mediawiki/extensions/CentralNotice | master | +24 -21
mediawiki/extensions/Translate | master | +4 -1
mediawiki/extensions/ContentTranslation | master | +4 -1
mediawiki/extensions/BetaFeatures | master | +4 -1
mediawiki/core | master | +6 -9
mediawiki/extensions/Flow | master | +4 -1
mediawiki/extensions/Echo | master | +7 -8
mediawiki/extensions/PageTriage | master | +5 -3
mediawiki/extensions/Echo | master | +3 -1
mediawiki/extensions/MobileFrontend | master | +3 -1
mediawiki/extensions/UniversalLanguageSelector | master | +5 -2
mediawiki/core | master | +14 -5
mediawiki/core | master | +3 -0
mediawiki/extensions/CentralAuth | master | +3 -1
mediawiki/core | master | +1 -3
mediawiki/extensions/CentralAuth | master | +3 -0
mediawiki/extensions/FlaggedRevs | master | +1 -1
mediawiki/extensions/LiquidThreads | master | +3 -2
mediawiki/extensions/CentralAuth | master | +6 -3
mediawiki/core | master | +8 -0
mediawiki/extensions/OAI | master | +2 -0
mediawiki/extensions/Echo | master | +4 -1
mediawiki/core | master | +4 -1
mediawiki/core | master | +7 -2
mediawiki/extensions/GettingStarted | master | +5 -1
mediawiki/core | master | +1 -1
mediawiki/core | master | +28 -39
mediawiki/extensions/TimedMediaHandler | master | +30 -10
mediawiki/core | REL1_25 | +150 -174
mediawiki/core | master | +150 -174
mediawiki/extensions/TimedMediaHandler | master | +35 -63
mediawiki/extensions/FlaggedRevs | master | +67 -21
mediawiki/extensions/LiquidThreads | master | +8 -7
mediawiki/core | master | +12 -2
mediawiki/extensions/FlaggedRevs | master | +7 -11
mediawiki/core | master | +4 -5
mediawiki/core | master | +1 -1
mediawiki/core | master | +1 -4
mediawiki/extensions/CodeReview | master | +2 -2
mediawiki/core | wmf/1.25wmf23 | +4 -1
mediawiki/core | wmf/1.25wmf24 | +4 -1
mediawiki/core | master | +3 -1
mediawiki/core | master | +7 -3
mediawiki/core | master | +1 -0
mediawiki/extensions/CentralAuth | master | +1 -3
mediawiki/core | master | +1 -1
mediawiki/core | master | +3 -12
mediawiki/extensions/MobileFrontend | master | +1 -0
mediawiki/core | master | +50 -4
mediawiki/extensions/ConfirmEdit | master | +9 -4
mediawiki/core | master | +25 -10
mediawiki/extensions/MobileFrontend | master | +1 -1
mediawiki/core | master | +1 -11
mediawiki/core | master | +12 -12
mediawiki/extensions/LiquidThreads | master | +1 -1

Related Objects

Status | Subtype | Assigned
Resolved | - | aaron
Open | - | None
Resolved | - | aude
Duplicate | - | Gilles
Duplicate | - | aaron
Resolved | - | aaron
Duplicate | - | None
Resolved | - | hoo
Duplicate | - | None
Duplicate | - | None
Resolved | - | aaron
Open | - | None
Open | - | None
Resolved | PRODUCTION ERROR | aaron
Duplicate | - | None
Resolved | - | aaron
Resolved | - | Nikerabbit
Resolved | - | aaron
Resolved | - | aaron
Duplicate | - | None
Duplicate | - | None
Resolved | - | MarcoAurelio
Resolved | - | aaron
Resolved | - | tstarling
Resolved | - | Ladsgroup
Resolved | - | Krinkle
Duplicate | - | aaron
Resolved | - | kostajh
Resolved | - | Huji
Resolved | - | aaron
Resolved | - | aaron
Resolved | - | aaron
Open | PRODUCTION ERROR | None

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 348030 had a related patch set uploaded (by Aaron Schulz):
[mediawiki/extensions/CentralAuth@master] Make opportunistic password hash upgrades post-send

https://gerrit.wikimedia.org/r/348030

Change 348030 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Make opportunistic password hash upgrades post-send

https://gerrit.wikimedia.org/r/348030

Change 348658 had a related patch set uploaded (by Aaron Schulz):
[mediawiki/extensions/CentralAuth@master] Avoid triggering master queries in ApiValidatePassword

https://gerrit.wikimedia.org/r/348658

Change 350969 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Avoid master queries in loadAndLazyInit() for miser mode

https://gerrit.wikimedia.org/r/350969

Change 350969 merged by jenkins-bot:
[mediawiki/core@master] Avoid master queries in loadAndLazyInit() for miser mode

https://gerrit.wikimedia.org/r/350969

Change 351790 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/extensions/CentralAuth@master] Add $flags parameter to renameInProgressOn()

https://gerrit.wikimedia.org/r/351790

Change 351790 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Add $flags parameter to renameInProgressOn()

https://gerrit.wikimedia.org/r/351790

Change 353823 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/CentralAuth@master] Avoid master queries in beginSecondaryAuthentication()

https://gerrit.wikimedia.org/r/353823

Change 353824 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/PageTriage@master] Avoid DB_MASTER queries on HTTP GET in ArticleMetadata->getMetadata

https://gerrit.wikimedia.org/r/353824

Change 353825 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/CentralAuth@master] Avoid master queries in SpecialGlobalRenameProgress

https://gerrit.wikimedia.org/r/353825

Change 353827 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Avoid DB_MASTER queries in User::newSystemUser() when possible

https://gerrit.wikimedia.org/r/353827

Change 353829 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/Translate@master] Use TranslateUtils::getSafeReadDB() in loadAggregateGroups

https://gerrit.wikimedia.org/r/353829

Change 353824 merged by jenkins-bot:
[mediawiki/extensions/PageTriage@master] Avoid DB_MASTER queries on HTTP GET in ArticleMetadata->getMetadata

https://gerrit.wikimedia.org/r/353824

Change 353829 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Use TranslateUtils::getSafeReadDB() in loadAggregateGroups

https://gerrit.wikimedia.org/r/353829

Change 353825 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Avoid master queries in SpecialGlobalRenameProgress

https://gerrit.wikimedia.org/r/353825

Change 353827 merged by jenkins-bot:
[mediawiki/core@master] Avoid DB_MASTER queries in User::newSystemUser() when possible

https://gerrit.wikimedia.org/r/353827

Change 353823 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Avoid master queries in beginSecondaryAuthentication()

https://gerrit.wikimedia.org/r/353823

Change 348658 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Avoid triggering master queries in ApiValidatePassword

https://gerrit.wikimedia.org/r/348658

Change 499969 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Use the main stash for basic user talk page notifications

https://gerrit.wikimedia.org/r/499969

Change 499985 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Add UserOptionsUpdateJob class and use it for namespaces at SpecialSearch

https://gerrit.wikimedia.org/r/499985

Change 499990 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/NewUserMessage@master] Use a job for triggering new user talk messages

https://gerrit.wikimedia.org/r/499990

Change 499995 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] [WIP] Avoid Category count refresh DB writes on HTTP GET

https://gerrit.wikimedia.org/r/499995

Change 499990 merged by jenkins-bot:
[mediawiki/extensions/NewUserMessage@master] Use a job for triggering new user talk messages

https://gerrit.wikimedia.org/r/499990

Change 499985 abandoned by Aaron Schulz:
Add UserOptionsUpdateJob class and use it for namespaces at SpecialSearch

Reason:
Taking a different route

https://gerrit.wikimedia.org/r/499985

aaron removed aaron as the assignee of this task. May 21 2019, 9:04 PM
aaron removed a project: Patch-For-Review.
Krinkle renamed this task from Fix problematic database master queries performed on HTTP GET/HEAD to Fix database master queries from HTTP GET/HEAD before active-active multid-dc. Jul 2 2020, 2:53 PM
Krinkle subscribed.

I'm slightly rescoping this to give the tracking task a natural end, namely to only track that which we want to get done prior to starting to serve active-active. For the rest we can file individual Sustainability and/or Performance Issue tasks that we might track on our Performance-Team (Radar).

Krinkle renamed this task from Fix database master queries from HTTP GET/HEAD before active-active multid-dc to Fix database master queries from HTTP GET/HEAD before active-active multi-dc. Jul 2 2020, 4:08 PM
Krinkle reassigned this task from Krinkle to aaron.
Krinkle moved this task from Inbox, needs triage to Doing (old) on the Performance-Team board.

Change 499995 abandoned by Aaron Schulz:

[mediawiki/core@master] [WIP] Avoid Category count refresh DB writes on HTTP GET

Reason:

https://gerrit.wikimedia.org/r/499995

Aklapper removed a subscriber: Gilles.

Removing task assignee due to inactivity as this open task has been assigned for more than two years. See the email sent to the task assignee on August 22nd, 2022.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome!
If this task has been resolved in the meantime, or should not be worked on ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator. Thanks!

Change 499969 abandoned by Aaron Schulz:

[mediawiki/core@master] Use the main stash for basic user talk page notifications

Reason:

https://gerrit.wikimedia.org/r/499969