Page MenuHomePhabricator

Add checkbox on Special:ListUsers to display only users in temporary user groups
Closed, ResolvedPublic

Description

Can a "Show only users in temporary user groups" be added to Special:ListUsers, please? It should work for the selected user group or in "(all)" mode. Thanks.

Event Timeline

Change 374357 had a related patch set uploaded (by Framawiki; owner: Framawiki):
[mediawiki/core@master] Add checkbox on Special:ListUsers to display only users in temporary user groups

https://gerrit.wikimedia.org/r/374357

Framawiki triaged this task as Medium priority.

@Framawiki As you added the user-notice tag: you have a timeline for when this will go into production?

@Framawiki As you added the user-notice tag: you have a timeline for when this will go into production?

Considering that I have just proposed a new modification in my patch, I can't tell you when it will be merged. But I suppose that @ReleaseTaggerBot will add a project when it'll be part of the train ?

To answer to @Anomie's question What are the database queries that result when your new checkbox is checked? on gerrit, here is an example from my vagrant's mediawiki-wiki-debug.log :

[DBQuery] wiki SELECT /* IndexPager::buildQueryInfo (UsersPager) Admin */  user_name,MAX(user_id) AS `user_id`,MAX(user_editcount) AS `edits`,MIN(user_registration) AS `creation`,MAX(ipb_deleted) AS `ipb_deleted`  FROM `user` LEFT JOIN `user_groups` ON ((user_id=ug_user)) LEFT JOIN `ipblocks` ON ((user_id=ipb_user) AND ipb_auto = '0')   WHERE (ug_expiry >= '20170831185747')  GROUP BY user_name ORDER BY user_name LIMIT 51

ug_expiry only (ug_expiry >= '20170831185747') case was added, previously both ug_expiry > or null (ug_expiry IS NULL OR ug_expiry >= '20170831185747') selector could happen.

Given the expected pattern that most rows in user_groups are going to have ug_expiry as null, the (ug_expiry IS NULL OR ug_expiry >= '20170831185747') condition is reasonably safe because it should filter out very few rows and the planner should in fact decide to use that as a filter.

mysql:wikiadmin@db1089 [enwiki]> explain SELECT /* IndexPager::buildQueryInfo (UsersPager) Admin */  user_name,MAX(user_id) AS `user_id`,MAX(user_editcount) AS `edits`,MIN(user_registration) AS `creation`,MAX(ipb_deleted) AS `ipb_deleted`  FROM `user` LEFT JOIN `user_groups` ON ((user_id=ug_user)) LEFT JOIN `ipblocks` ON ((user_id=ipb_user) AND ipb_auto = '0')   WHERE (ug_expiry >= '20170831185747' OR ug_expiry IS NULL)  GROUP BY user_name ORDER BY user_name LIMIT 51;
+------+-------------+-------------+-------+---------------+-----------+---------+---------------------+------+-------------+
| id   | select_type | table       | type  | possible_keys | key       | key_len | ref                 | rows | Extra       |
+------+-------------+-------------+-------+---------------+-----------+---------+---------------------+------+-------------+
|    1 | SIMPLE      | user        | index | NULL          | user_name | 257     | NULL                |   51 |             |
|    1 | SIMPLE      | user_groups | ref   | PRIMARY       | PRIMARY   | 4       | enwiki.user.user_id |    1 | Using where |
|    1 | SIMPLE      | ipblocks    | ref   | ipb_user      | ipb_user  | 4       | enwiki.user.user_id |    1 | Using where |
+------+-------------+-------------+-------+---------------+-----------+---------+---------------------+------+-------------+

Given that same assumption, the condition without the IS NULL bit is probably going to make the planner decide to query user_groups first and then filesort. And that does seem to be the case.

mysql:wikiadmin@db1089 [enwiki]> explain SELECT /* IndexPager::buildQueryInfo (UsersPager) Admin */  user_name,MAX(user_id) AS `user_id`,MAX(user_editcount) AS `edits`,MIN(user_registration) AS `creation`,MAX(ipb_deleted) AS `ipb_deleted`  FROM `user` LEFT JOIN `user_groups` ON ((user_id=ug_user)) LEFT JOIN `ipblocks` ON ((user_id=ipb_user) AND ipb_auto = '0')   WHERE (ug_expiry >= '20170831185747')  GROUP BY user_name ORDER BY user_name LIMIT 51;
+------+-------------+-------------+--------+-------------------+-----------+---------+----------------------------+------+-----------------------------------------------------------+
| id   | select_type | table       | type   | possible_keys     | key       | key_len | ref                        | rows | Extra                                                     |
+------+-------------+-------------+--------+-------------------+-----------+---------+----------------------------+------+-----------------------------------------------------------+
|    1 | SIMPLE      | user_groups | range  | PRIMARY,ug_expiry | ug_expiry | 17      | NULL                       |   14 | Using where; Using index; Using temporary; Using filesort |
|    1 | SIMPLE      | user        | eq_ref | PRIMARY           | PRIMARY   | 4       | enwiki.user_groups.ug_user |    1 |                                                           |
|    1 | SIMPLE      | ipblocks    | ref    | ipb_user          | ipb_user  | 4       | enwiki.user_groups.ug_user |    1 | Using where                                               |
+------+-------------+-------------+--------+-------------------+-----------+---------+----------------------------+------+-----------------------------------------------------------+

This isn't a bad thing when there really are only 14 rows matching ug_expiry >= '20170831185747', filesorting a handful of rows isn't a big deal. The concern is what happens if/when there aren't so few and the choice is between fetching many rows for filesorting versus scanning many rows to find the fraction that pass the filter.

OTOH, we might just assume that few enough users will ever have advanced permissions so user_groups will always be so small that even a full table scan isn't worth worrying about.

It'd probably be a good idea to get an opinion from @jcrespo or @Marostegui on this one.

Using temporary; Using filesort are a non concern when applied over a small amount of rows- it normally just means a single pass of an in-memory sort, so completely not a problem- we should not always aim to eliminate it, in same cases it is better to have a filesort than a bad index selection.

As Anomie says, if in the future the table general content balance changes, we should reevaluate, but I agree with him on that the current state is more than good enough:

root@db1052[enwiki]> flush status;
Query OK, 0 rows affected (0.02 sec)

root@db1052[enwiki]> SELECT /* IndexPager::buildQueryInfo (UsersPager) Admin */  user_name,MAX(user_id) AS `user_id`,MAX(user_editcount) AS `edits`,MIN(user_registration) AS `creation`,MAX(ipb_deleted) AS `ipb_deleted`  FROM `user` LEFT JOIN `user_groups` ON ((user_id=ug_user)) LEFT JOIN `ipblocks` ON ((user_id=ipb_user) AND ipb_auto = '0')   WHERE (ug_expiry >= '20170831185747')  GROUP BY user_name ORDER BY user_name LIMIT 51;
...
14 rows in set (0.02 sec)

root@db1052[enwiki]> SHOW STATUS like 'Hand%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 43    |
| Handler_read_last          | 0     |
| Handler_read_next          | 14    |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 14    |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 15    |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 14    |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
26 rows in set (0.00 sec)

root@db1052[enwiki]> SHOW STATUS like '%sort%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| Sort_merge_passes         | 0     |
| Sort_priority_queue_sorts | 0     |
| Sort_range                | 0     |
| Sort_rows                 | 14    |
| Sort_scan                 | 1     |
+---------------------------+-------+
5 rows in set (0.00 sec)

root@db1052[enwiki]> SHOW STATUS like '%tmp%'; 
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0     |
| Created_tmp_files       | 0     |
| Created_tmp_tables      | 1     |
| Handler_tmp_update      | 0     |
| Handler_tmp_write       | 14    |
| Rows_tmp_read           | 59    | # Note this changes on every SHOW STATUS execution
+-------------------------+-------+
6 rows in set (0.00 sec)

Change 374357 merged by jenkins-bot:
[mediawiki/core@master] Add checkbox in Special:ListUsers to display only users in temporary user groups

https://gerrit.wikimedia.org/r/374357