source: ValBot/Docs/Generator and replace options.txt

Last change on this file was 1169, checked in by iritscen, 3 years ago

ValBot: Reorganized files. Updated docs with more helpful information.

File size: 23.1 KB
This bot will make direct text replacements.

It will retrieve information on which pages might need changes either from
an XML dump or a text file, or only change a single page.

These command line parameters can be used to specify which pages to work on:

GENERATOR OPTIONS
=================

-cat              Work on all pages which are in a specific category.
                  Argument can also be given as "-cat:categoryname" or
                  as "-cat:categoryname|fromtitle" (using # instead of |
                  is also allowed in this one and the following).

-catr             Like -cat, but also recursively includes pages in
                  subcategories, sub-subcategories etc. of the
                  given category.
                  Argument can also be given as "-catr:categoryname" or
                  as "-catr:categoryname|fromtitle".

-subcats          Work on all subcategories of a specific category.
                  Argument can also be given as "-subcats:categoryname" or
                  as "-subcats:categoryname|fromtitle".

-subcatsr         Like -subcats, but also includes sub-subcategories etc. of
                  the given category.
                  Argument can also be given as "-subcatsr:categoryname" or
                  as "-subcatsr:categoryname|fromtitle".

-uncat            Work on all pages which are not categorised.

-uncatcat         Work on all categories which are not categorised.

-uncatfiles       Work on all files which are not categorised.

-file             Read a list of pages to treat from the named text file.
                  Page titles in the file may be either enclosed with
                  [[brackets]], or be separated by new lines.
                  Argument can also be given as "-file:filename".

-filelinks        Work on all pages that use a certain image/media file.
                  Argument can also be given as "-filelinks:filename".

-search           Work on all pages that are found in a MediaWiki search
                  across all namespaces.

-logevents        Work on articles that were on a specified Special:Log.
                  The value may be a comma-separated list of these values:

                      logevent,username,start,end

                  or for backward compatibility:

                      logevent,username,total

                  Note: 'start' is the most recent date and log events are
                  iterated from present to past. If 'start' is not provided,
                  it means 'now'; if 'end' is not provided, it means 'since
                  the beginning'.

                  To use the default value, use an empty string.
                  The log event parameter selects the type of log and can
                  be one of the following:

                      spamblacklist, titleblacklist, gblblock, renameuser,
                      globalauth, gblrights, gblrename, abusefilter,
                      massmessage, thanks, usermerge, block, protect, rights,
                      delete, upload, move, import, patrol, merge, suppress,
                      tag, managetags, contentmodel, review, stable,
                      timedmediahandler, newusers

                  By default, 10 pages are returned.

                  Examples:

                      -logevents:move gives pages from move log (usually
                      redirects)
                      -logevents:delete,,20 gives 20 pages from deletion log
                      -logevents:protect,Usr gives pages from protect log by
                      user Usr
                      -logevents:patrol,Usr,20 gives 20 patrolled pages by Usr
                      -logevents:upload,,20121231,20100101 gives upload pages
                      from 2010, 2011, and 2012
                      -logevents:review,,20121231 gives review pages from the
                      beginning until 31 Dec 2012
                      -logevents:review,Usr,20121231 gives review pages by
                      user Usr from the beginning until 31 Dec 2012

                  In some cases it must be given as -logevents:"move,Usr,20"
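
The field layout of the -logevents value can be illustrated with a small
parser. This is only a sketch, not the bot's own code: the names
LogEventsArgs and parse_logevents are hypothetical, and two rules are
assumptions made for the illustration: a third field shorter than eight
characters is read as a page count rather than a date, and the default of
10 pages applies only when neither a date nor a count is given.

```python
from typing import NamedTuple, Optional


class LogEventsArgs(NamedTuple):
    """Parsed fields of a -logevents value (hypothetical helper type)."""
    logtype: str
    user: Optional[str]
    start: Optional[str]   # None stands for "now"
    end: Optional[str]     # None stands for "since the beginning"
    total: Optional[int]   # None stands for "no explicit page count"


def parse_logevents(value, default_total=10):
    """Split 'logevent,username,start,end' or 'logevent,username,total'."""
    fields = value.split(',')
    fields += [''] * (4 - len(fields))   # pad missing fields with ''
    logtype, user, start, end = fields[:4]
    total = None
    # Assumption: a short third field is the backward-compatible count.
    if start and len(start) < 8:
        total, start = int(start), ''
    # Assumption: the default count applies only without dates or count.
    if not start and not end and total is None:
        total = default_total
    return LogEventsArgs(logtype, user or None, start or None,
                         end or None, total)


print(parse_logevents('delete,,20'))
print(parse_logevents('upload,,20121231,20100101'))
```

An empty field, as in "delete,,20", leaves the user unset, which matches
the "use an empty string" rule described above.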

-interwiki        Work on the given page and all equivalent pages in other
                  languages. This can, for example, be used to fight
                  multi-site spamming.
                  Attention: this will cause the bot to modify pages on
                  several wiki sites. This is not well tested, so check
                  your edits!

-links            Work on all pages that are linked from a certain page.
                  Argument can also be given as "-links:linkingpagetitle".

-liverecentchanges Work on pages from the live recent changes feed. If used
                  as -liverecentchanges:x, work on x recent changes.

-imagesused       Work on all images that are contained on a certain page.
                  Can also be given as "-imagesused:linkingpagetitle".

-newimages        Work on the most recent new images. If given as
                  -newimages:x, will work on x newest images.

-newpages         Work on the most recent new pages. If given as -newpages:x,
                  will work on x newest pages.

-recentchanges    Work on the pages with the most recent changes. If
                  given as -recentchanges:x, will work on the x most recently
                  changed pages. If given as -recentchanges:offset,duration
                  it will work on pages changed from 'offset' minutes with
                  'duration' minutes of timespan. rctags are supported too.
                  The rctag must be the very first parameter part.

                  Examples:

                      -recentchanges:20 gives the 20 most recently changed
                      pages
                      -recentchanges:120,70 will give pages with 120 offset
                      minutes and 70 minutes of timespan
                      -recentchanges:visualeditor,10 gives the 10 most
                      recently changed pages marked with 'visualeditor'
                      -recentchanges:"mobile edit,60,35" will retrieve pages
                      marked with 'mobile edit' for the given offset and
                      timespan
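
The four example forms above can be told apart mechanically. The sketch
below shows one way to do it; parse_recentchanges is a hypothetical helper
written for this illustration, and it assumes that a first field that is
not a plain number is the rctag.

```python
def parse_recentchanges(value):
    """Split a -recentchanges value into tag, total, offset and duration."""
    parts = [p.strip() for p in value.split(',')] if value else []
    result = {'tag': None, 'total': None, 'offset': None, 'duration': None}
    # The rctag, if present, must be the very first part.
    if parts and not parts[0].isdigit():
        result['tag'] = parts.pop(0)
    if len(parts) == 1:
        result['total'] = int(parts[0])
    elif len(parts) == 2:
        result['offset'] = int(parts[0])
        result['duration'] = int(parts[1])
    elif parts:
        raise ValueError('too many values: %r' % value)
    return result


for example in ('20', '120,70', 'visualeditor,10', 'mobile edit,60,35'):
    print(example, '->', parse_recentchanges(example))
```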

-unconnectedpages Work on the most recent unconnected pages to the Wikibase
                  repository. Given as -unconnectedpages:x, will work on the
                  x most recent unconnected pages.

-ref              Work on all pages that link to a certain page.
                  Argument can also be given as "-ref:referredpagetitle".

-start            Specifies that the robot should go alphabetically through
                  all pages on the home wiki, starting at the named page.
                  Argument can also be given as "-start:pagetitle".

                  You can also include a namespace. For example,
                  "-start:Template:!" will make the bot work on all pages
                  in the template namespace.

                  The default value is start:!

-prefixindex      Work on pages commencing with a common prefix.

-transcludes      Work on all pages that use a certain template.
                  Argument can also be given as "-transcludes:Title".

-unusedfiles      Work on all description pages of images/media files that
                  are not used anywhere.
                  Argument can be given as "-unusedfiles:n" where
                  n is the maximum number of articles to work on.

-lonelypages      Work on all articles that are not linked from any other
                  article.
                  Argument can be given as "-lonelypages:n" where
                  n is the maximum number of articles to work on.

-unwatched        Work on all articles that are not watched by anyone.
                  Argument can be given as "-unwatched:n" where
                  n is the maximum number of articles to work on.

-property:name    Work on all pages with a given property name from
                  Special:PagesWithProp.

-usercontribs     Work on all articles that were edited by a certain user.
                  (Example: -usercontribs:DumZiBoT)

-weblink          Work on all articles that contain an external link to
                  a given URL; may be given as "-weblink:url".

-withoutinterwiki Work on all pages that don't have interlanguage links.
                  Argument can be given as "-withoutinterwiki:n" where
                  n is the total to fetch.

-mysqlquery       Takes a MySQL query string like
                  "SELECT page_namespace, page_title FROM page
                  WHERE page_namespace = 0" and treats
                  the resulting pages. See
                  https://www.mediawiki.org/wiki/Manual:Pywikibot/MySQL
                  for more details.

-sparql           Takes a SPARQL SELECT query string including ?item
                  and works on the resulting pages.

-sparqlendpoint   Specify SPARQL endpoint URL (optional).
                  (Example: -sparqlendpoint:http://myserver.com/sparql)

-searchitem       Takes a search string and works on Wikibase pages that
                  contain it.
                  Argument can be given as "-searchitem:text", where text
                  is the string to look for, or "-searchitem:lang:text",
                  where lang is the language to search items in.

-wantedpages      Work on pages that are linked, but do not exist;
                  may be given as "-wantedpages:n" where n is the maximum
                  number of articles to work on.

-wantedcategories Work on categories that are used, but do not exist;
                  may be given as "-wantedcategories:n" where n is the
                  maximum number of categories to work on.

-wantedfiles      Work on files that are used, but do not exist;
                  may be given as "-wantedfiles:n" where n is the maximum
                  number of files to work on.

-wantedtemplates  Work on templates that are used, but do not exist;
                  may be given as "-wantedtemplates:n" where n is the
                  maximum number of templates to work on.

-random           Work on random pages returned by [[Special:Random]].
                  Can also be given as "-random:n" where n is the number
                  of pages to be returned.

-randomredirect   Work on random redirect pages returned by
                  [[Special:RandomRedirect]]. Can also be given as
                  "-randomredirect:n" where n is the number of pages to be
                  returned.

-google           Work on all pages that are found in a Google search.
                  You need a Google Web API license key. Note that Google
                  doesn't give out license keys anymore. See google_key in
                  config.py for instructions.
                  Argument can also be given as "-google:searchstring".

-page             Work on a single page. Argument can also be given as
                  "-page:pagetitle", and supplied multiple times for
                  multiple pages.

-pageid           Work on a single pageid. Argument can also be given as
                  "-pageid:pageid1,pageid2,." or
                  "-pageid:'pageid1|pageid2|..'"
                  and supplied multiple times for multiple pages.

-linter           Work on pages that contain lint errors. Extension Linter
                  must be available on the site.
                  -linter selects all categories.
                  -linter:high, -linter:medium or -linter:low selects all
                  categories for that priority.
                  Single categories can be selected with commas as in
                  -linter:cat1,cat2,cat3

                  Adding '/int' identifies the Lint ID to start querying
                  from: e.g. -linter:high/10000

                  -linter:show just shows available categories.

-querypage:name   Work on pages provided by a QueryPage-based special page,
                  see https://www.mediawiki.org/wiki/API:Querypage.
                  (tip: use -limit:n to fetch only n pages).

                  -querypage shows special pages available.


FILTER OPTIONS
==============

-catfilter        Filter the page generator to only yield pages in the
                  specified category. See -cat generator for argument format.

-grep             A regular expression that needs to match the article
                  text, otherwise the page won't be returned.
                  Multiple -grep:regexpr can be provided and the page will
                  be returned if content is matched by any of the regexpr
                  provided.
                  Case insensitive regular expressions will be used and
                  dot matches any character, including a newline.
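
The matching rules described for -grep (case insensitive, dot matches a
newline, any one pattern is enough) can be reproduced with Python's re
module. This is a minimal sketch of those rules, not the bot's own
implementation; grep_matches is a hypothetical name.

```python
import re


def grep_matches(text, patterns):
    """Return True if any pattern matches somewhere in the page text."""
    flags = re.IGNORECASE | re.DOTALL   # ignore case, '.' spans newlines
    return any(re.search(pattern, text, flags) for pattern in patterns)


page_text = "Hello\nWorld"
print(grep_matches(page_text, ["hello.world"]))   # one pattern across lines
print(grep_matches(page_text, ["foo", "WORLD"]))  # second pattern matches
print(grep_matches(page_text, ["foo"]))           # nothing matches
```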

-grepnot          Like -grep, but return the page only if the regular
                  expression does not match.

-intersect        Work on the intersection of all the provided generators.

-limit            When used with any other argument that specifies a set
                  of pages, -limit:n makes the bot work on no more than
                  n pages in total.

-namespaces       Filter the page generator to only yield pages in the
-namespace        specified namespaces. Separate multiple namespace
-ns               numbers or names with commas.

                  Examples:

                      -ns:0,2,4
                      -ns:Help,MediaWiki

                  You may use a leading "not" to exclude the namespace.

                  Examples:

                      -ns:not:2,3
                      -ns:not:Help,File

                  If used with -newpages/-random/-randomredirect/-linter
                  generators, -namespace/-ns must be provided before
                  -newpages/-random/-randomredirect/-linter.
                  If used with -recentchanges generator, efficiency is
                  improved if -namespace is provided before -recentchanges.

                  If used with -start generator, -namespace/-ns shall contain
                  only one value.
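
As an illustration of the value syntax, the sketch below splits a
-namespace value into an exclusion flag and a list of namespace numbers or
names. The helper name parse_namespace_arg is hypothetical, and the sketch
assumes that a leading "not:" applies to every namespace in the list, as
the examples above suggest.

```python
def parse_namespace_arg(value):
    """Return (exclude, namespaces) for a value such as 'not:2,3'."""
    prefix = 'not:'
    exclude = value.startswith(prefix)
    if exclude:
        value = value[len(prefix):]
    namespaces = []
    for item in value.split(','):
        if not item:
            continue   # ignore empty entries such as a trailing comma
        # Numeric entries become ints; names are kept as strings.
        namespaces.append(int(item) if item.isdigit() else item)
    return exclude, namespaces


print(parse_namespace_arg('0,2,4'))
print(parse_namespace_arg('not:Help,File'))
```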

-onlyif           A claim the page needs to contain, otherwise the item won't
                  be returned.
                  The format is property=value,qualifier=value. Any number
                  of qualifiers (including none) can be passed, separated
                  by commas.

                  Examples:

                      P1=Q2 (property P1 must contain value Q2),
                      P3=Q4,P5=Q6,P6=Q7 (property P3 with value Q4 and
                      qualifiers: P5 with value Q6 and P6 with value Q7).

                  Value can be page ID, coordinate in format:
                  latitude,longitude[,precision] (all values are in decimal
                  degrees), year, or plain string.
                  The argument can be provided multiple times and the item
                  page will be returned only if all claims are present.
                  Argument can also be given as "-onlyif:expression".

-onlyifnot        A claim the page must not contain, otherwise the item won't
                  be returned.
                  For usage and examples, see -onlyif above.

-ql               Filter pages based on page quality.
                  This is only applicable if contentmodel equals
                  'proofread-page'; otherwise it has no effect.
                  Valid values are in range 0-4.
                  Multiple values can be comma-separated.

-subpage          -subpage:n filters pages to only those that have a depth
                  of at most n, i.e. a depth of 0 filters out all pages that
                  are subpages, and a depth of 1 filters out all pages that
                  are subpages of subpages.
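
The depth rule of -subpage can be sketched as follows. The sketch assumes
that the depth of a page is simply the number of "/" separators in its
title, which is a simplification; the function names are hypothetical.

```python
def subpage_depth(title):
    """Return the number of '/' separators in a page title."""
    return title.count('/')


def filter_by_depth(titles, max_depth):
    """Keep only the titles whose depth does not exceed max_depth."""
    return [title for title in titles if subpage_depth(title) <= max_depth]


titles = ['Manual', 'Manual/Install', 'Manual/Install/Linux']
print(filter_by_depth(titles, 0))   # subpages are filtered out
print(filter_by_depth(titles, 1))   # subpages of subpages are filtered out
```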

-titleregex       A regular expression that needs to match the article title,
                  otherwise the page won't be returned.
                  Multiple -titleregex:regexpr can be provided and the page
                  will be returned if title is matched by any of the regexpr
                  provided.
                  Case insensitive regular expressions will be used and
                  dot matches any character.

-titleregexnot    Like -titleregex, but return the page only if the regular
                  expression does not match.

Furthermore, the following command line parameters are supported:

-mysqlquery       Retrieve information from a local database mirror.
                  If no query is specified, the bot searches for pages with
                  the given replacements.

-xml              Retrieve information from a local XML dump
                  (pages-articles or pages-meta-current, see
                  https://dumps.wikimedia.org). Argument can also
                  be given as "-xml:filename".

-regex            Make replacements using regular expressions. If this
                  argument isn't given, the bot will make simple text
                  replacements.

-nocase           Use case insensitive regular expressions.

-dotall           Make the dot match any character at all, including a
                  newline. Without this flag, '.' will match anything except
                  a newline.

-multiline        '^' and '$' will now match begin and end of each line.
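
The effect of these four switches corresponds closely to flags of Python's
re module. The sketch below shows one possible mapping; compile_pattern is
a hypothetical helper, and treating plain (non -regex) text as an escaped
pattern is an assumption made for the illustration.

```python
import re


def compile_pattern(old, regex=False, nocase=False, dotall=False,
                    multiline=False):
    """Compile the 'old' text the way the switches above describe."""
    flags = 0
    if nocase:
        flags |= re.IGNORECASE   # -nocase
    if dotall:
        flags |= re.DOTALL       # -dotall: '.' also matches a newline
    if multiline:
        flags |= re.MULTILINE    # -multiline: '^' and '$' match per line
    if not regex:
        old = re.escape(old)     # without -regex, match the text literally
    return re.compile(old, flags)


print(bool(compile_pattern('a.c').search('abc')))               # literal dot
print(bool(compile_pattern('a.c', regex=True).search('abc')))   # regex dot
print(bool(compile_pattern('^b$', regex=True,
                           multiline=True).search('a\nb\nc')))  # per line
```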

-xmlstart         (Only works with -xml) Skip all articles in the XML dump
                  before the one specified (may also be given as
                  -xmlstart:Article).

-addcat:cat_name  Adds "cat_name" category to every altered page.

-excepttitle:XYZ  Skip pages with titles that contain XYZ. If the -regex
                  argument is given, XYZ will be regarded as a regular
                  expression.

-requiretitle:XYZ Only do pages with titles that contain XYZ. If the -regex
                  argument is given, XYZ will be regarded as a regular
                  expression.

-excepttext:XYZ   Skip pages which contain the text XYZ. If the -regex
                  argument is given, XYZ will be regarded as a regular
                  expression.

-exceptinside:XYZ Skip occurrences of the to-be-replaced text which lie
                  within XYZ. If the -regex argument is given, XYZ will be
                  regarded as a regular expression.

-exceptinsidetag:XYZ  Skip occurrences of the to-be-replaced text which lie
                  within an XYZ tag.
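
The idea behind -exceptinside and -exceptinsidetag, replacing a match only
when it does not lie inside a protected region, can be sketched in a few
lines. This is an illustration of the technique rather than the bot's own
code; replace_except is a name chosen for this sketch, and the tag is
expressed here as an ordinary regular expression.

```python
import re


def replace_except(text, old, new, exceptions):
    """Replace regex 'old' with 'new' outside all exception regions."""
    # Collect the (start, end) span of every protected region.
    protected = [match.span()
                 for exception in exceptions
                 for match in re.finditer(exception, text)]

    def substitute(match):
        inside = any(start <= match.start() and match.end() <= end
                     for start, end in protected)
        # Keep protected occurrences unchanged, expand the others.
        return match.group(0) if inside else match.expand(new)

    return re.sub(old, substitute, text)


text = 'foo <nowiki>foo</nowiki> foo'
print(replace_except(text, 'foo', 'bar', [r'<nowiki>.*?</nowiki>']))
```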

-summary:XYZ      Set the summary message text for the edit to XYZ, bypassing
                  the predefined message texts with original and replacements
                  inserted. Can't be used with -automaticsummary.

-automaticsummary Uses an automatic summary for all replacements which don't
                  have a summary defined. Can't be used with -summary.

-sleep:123        With -fix, several regexes may be checked on every page.
                  Because the bot checks them one after another without
                  waiting, this can use a lot of CPU. This option makes the
                  bot pause between one regex and the next in order not to
                  waste too much CPU.

-fix:XYZ          Perform one of the predefined replacements tasks, which are
                  given in the dictionary 'fixes' defined inside the files
                  fixes.py and user-fixes.py.

                  Currently available predefined fixes are:

                  * HTML        - Convert HTML tags to wiki syntax, and
                                  fix XHTML.
                  * isbn        - Fix badly formatted ISBNs.
                  * syntax      - Try to fix bad wiki markup. Do not run
                                  this in automatic mode, as the bot may
                                  make mistakes.
                  * syntax-safe - Like syntax, but less risky, so you can
                                  run this in automatic mode.
                  * case-de     - Fix upper/lower case errors in German.
                  * grammar-de  - Fix grammar and typography in German.
                  * vonbis      - Replace hyphen/dash with "bis" in German.
                  * music       - Links to disambiguation pages in German.
                  * datum       - Specific date formats in German.
                  * correct-ar  - Typo corrections for Arabic Wikipedia and
                                  any Arabic wiki.
                  * yu-tld      - Fix links to .yu domains because the
                                  domain is disabled, see:
                  https://lists.wikimedia.org/pipermail/wikibots-l/2009-February/000290.html
                  * fckeditor   - Try to convert FCKeditor HTML tags to wiki
                                  syntax.

-manualinput      Request manual replacements via the command line input even
                  if replacements are already defined. If this option is set
                  (or no replacements are defined via -fix or the arguments)
                  it'll ask for additional replacements at start.

-pairsfile        Lines from the given file name(s) will be read as
                  replacement arguments. For example, a file containing the
                  lines "a" and "b", used as:

                      python pwb.py replace -page:X -pairsfile:file c d

                  will replace 'a' with 'b' and 'c' with 'd'.
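
How lines from a pairs file and command line arguments combine into
replacement pairs can be sketched as below. load_pairs is a hypothetical
helper; placing the file's pairs before the command line pairs and
rejecting an odd number of items are assumptions of this sketch.

```python
import io


def load_pairs(lines, cli_args=()):
    """Group file lines plus command line arguments into (old, new) pairs."""
    items = [line.rstrip('\n') for line in lines] + list(cli_args)
    if len(items) % 2:
        raise ValueError('each old text needs a replacement text')
    # Items 0, 2, 4, ... are old texts; items 1, 3, 5, ... are new texts.
    return list(zip(items[::2], items[1::2]))


pairs_file = io.StringIO('a\nb\n')   # stands in for an opened pairs file
print(load_pairs(pairs_file, ['c', 'd']))
```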

-always           Don't prompt you for each replacement.

-recursive        Recurse replacement as long as possible. Be careful, this
                  might lead to an infinite loop.
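
Why recursive replacement can loop forever is easy to see in a small
sketch. The function below repeats a plain text replacement until nothing
changes and stops after a fixed number of rounds; the name
replace_recursively and the max_rounds guard are inventions of this
sketch, not options of the bot.

```python
def replace_recursively(text, old, new, max_rounds=10):
    """Repeat a replacement until the text stops changing."""
    for _ in range(max_rounds):
        updated = text.replace(old, new)
        if updated == text:
            return text   # nothing left to replace
        text = updated
    raise RuntimeError('replacement did not settle; possible endless loop')


# Collapsing runs of spaces settles after a few rounds.
print(replace_recursively('a    b', '  ', ' '))

# Replacing 'a' with 'aa' always produces new matches and never settles.
try:
    replace_recursively('a', 'a', 'aa', max_rounds=5)
except RuntimeError as error:
    print(error)
```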

-allowoverlap     When occurrences of the pattern overlap, replace all of
                  them. Be careful, this might lead to an infinite loop.

-fullsummary      Use one large summary for all command line replacements.

other:            First argument is the old text, second argument is the new
                  text. If the -regex argument is given, the first argument
                  will be regarded as a regular expression, and the second
                  argument might contain expressions like \1 or \g<name>.
                  It is possible to introduce more than one pair of old text
                  and replacement.

Examples
--------

If you want to change templates from the old syntax, e.g. {{msg:Stub}}, to the
new syntax, e.g. {{Stub}}, download an XML dump file (pages-articles) from
https://dumps.wikimedia.org, then use this command:

    python pwb.py replace -xml -regex "{{msg:(.*?)}}" "{{\1}}"
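
The substitution performed by this command can be tried out on a sample
string with Python's re module. This is only a sketch of the regular
expression at work, outside the bot; the function names are hypothetical,
and the braces are escaped here to keep the pattern unambiguous.

```python
import re


def convert_msg_templates(text):
    """Rewrite {{msg:Name}} to {{Name}} using a numbered group (\\1)."""
    return re.sub(r"\{\{msg:(.*?)\}\}", r"{{\1}}", text)


def convert_msg_templates_named(text):
    """Same replacement, using a named group and \\g<name>."""
    return re.sub(r"\{\{msg:(?P<name>.*?)\}\}", r"{{\g<name>}}", text)


sample = "{{msg:Stub}} and {{msg:Cleanup}}"
print(convert_msg_templates(sample))
print(convert_msg_templates_named(sample))
```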

If you have a dump called foobar.xml and want to fix typos in articles, e.g.
Errror -> Error, use this:

    python pwb.py replace -xml:foobar.xml "Errror" "Error" -namespace:0

If you want to do more than one replacement at a time, use this:

    python pwb.py replace -xml:foobar.xml "Errror" "Error" "Faail" "Fail" \
        -namespace:0

If you have a page called 'John Doe' and want to fix the format of ISBNs, use:

    python pwb.py replace -page:John_Doe -fix:isbn

This command will change 'referer' to 'referrer', but not in pages which
talk about HTTP, where the typo has become part of the standard:

    python pwb.py replace referer referrer -file:typos.txt -excepttext:HTTP

Please type "python pwb.py replace -help | more" if you can't read
the top of the help.


GLOBAL OPTIONS
==============
For global options use -help:global or run pwb.py -help