This bot will make direct text replacements.

It will retrieve information on which pages might need changes either from
an XML dump or a text file, or only change a single page.

These command line parameters can be used to specify which pages to work on:

GENERATOR OPTIONS
=================

-cat              Work on all pages which are in a specific category.
                  Argument can also be given as "-cat:categoryname" or
                  as "-cat:categoryname|fromtitle" (using # instead of |
                  is also allowed in this one and the following)

-catr             Like -cat, but also recursively includes pages in
                  subcategories, sub-subcategories etc. of the
                  given category.
                  Argument can also be given as "-catr:categoryname" or
                  as "-catr:categoryname|fromtitle".

-subcats          Work on all subcategories of a specific category.
                  Argument can also be given as "-subcats:categoryname" or
                  as "-subcats:categoryname|fromtitle".

-subcatsr         Like -subcats, but also includes sub-subcategories etc. of
                  the given category.
                  Argument can also be given as "-subcatsr:categoryname" or
                  as "-subcatsr:categoryname|fromtitle".

-uncat            Work on all pages which are not categorised.

-uncatcat         Work on all categories which are not categorised.

-uncatfiles       Work on all files which are not categorised.

-file             Read a list of pages to treat from the named text file.
                  Page titles in the file may be either enclosed with
                  [[brackets]], or be separated by new lines.
                  Argument can also be given as "-file:filename".

-filelinks        Work on all pages that use a certain image/media file.
                  Argument can also be given as "-filelinks:filename".

-search           Work on all pages that are found in a MediaWiki search
                  across all namespaces.

-logevents        Work on articles that were on a specified Special:Log.
                  The value may be a comma separated list of these values:

                      logevent,username,start,end

                  or for backward compatibility:

                      logevent,username,total

                  Note: 'start' is the most recent date and log events are
                  iterated from present to past. If 'start' is not provided,
                  it means 'now'; if 'end' is not provided, it means 'since
                  the beginning'.

                  To use the default value, use an empty string.
                  The logevent parameter selects the type of log and can be
                  one of the following:

                      spamblacklist, titleblacklist, gblblock, renameuser,
                      globalauth, gblrights, gblrename, abusefilter,
                      massmessage, thanks, usermerge, block, protect, rights,
                      delete, upload, move, import, patrol, merge, suppress,
                      tag, managetags, contentmodel, review, stable,
                      timedmediahandler, newusers

                  The default number of pages is 10.

                  Examples:

                  -logevents:move gives pages from the move log (usually
                      redirects)
                  -logevents:delete,,20 gives 20 pages from the deletion log
                  -logevents:protect,Usr gives pages from the protect log by
                      user Usr
                  -logevents:patrol,Usr,20 gives 20 patrolled pages by Usr
                  -logevents:upload,,20121231,20100101 gives upload pages
                      from 2010 through 2012
                  -logevents:review,,20121231 gives review pages from the
                      beginning until 31 Dec 2012
                  -logevents:review,Usr,20121231 gives review pages by user
                      Usr from the beginning until 31 Dec 2012

                  In some cases it must be given as -logevents:"move,Usr,20"

-interwiki        Work on the given page and all equivalent pages in other
                  languages. This can, for example, be used to fight
                  multi-site spamming.
                  Attention: this will cause the bot to modify pages on
                  several wiki sites. This is not well tested, so check
                  your edits!

-links            Work on all pages that are linked from a certain page.
                  Argument can also be given as "-links:linkingpagetitle".

-liverecentchanges Work on pages from the live recent changes feed. If used as
                  -liverecentchanges:x, work on x recent changes.

-imagesused       Work on all images that are contained on a certain page.
                  Can also be given as "-imagesused:linkingpagetitle".

-newimages        Work on the most recent new images. If given as
                  -newimages:x, will work on the x newest images.

-newpages         Work on the most recent new pages. If given as -newpages:x,
                  will work on the x newest pages.

-recentchanges    Work on the pages with the most recent changes. If
                  given as -recentchanges:x, will work on the x most recently
                  changed pages. If given as -recentchanges:offset,duration
                  it will work on pages changed within a timespan of
                  'duration' minutes at an offset of 'offset' minutes.
                  Recent changes tags (rctags) are supported too; the rctag
                  must be the very first part of the parameter.

                  Examples:

                  -recentchanges:20 gives the 20 most recently changed pages
                  -recentchanges:120,70 gives pages changed within a
                      70-minute timespan at an offset of 120 minutes
                  -recentchanges:visualeditor,10 gives the 10 most recently
                      changed pages marked with 'visualeditor'
                  -recentchanges:"mobile edit,60,35" retrieves pages
                      marked with 'mobile edit' for the given offset and
                      timespan

-unconnectedpages Work on the most recent pages that are not connected to the
                  Wikibase repository. If given as -unconnectedpages:x, will
                  work on the x most recent unconnected pages.

-ref              Work on all pages that link to a certain page.
                  Argument can also be given as "-ref:referredpagetitle".

-start            Specifies that the robot should go alphabetically through
                  all pages on the home wiki, starting at the named page.
                  Argument can also be given as "-start:pagetitle".

                  You can also include a namespace. For example,
                  "-start:Template:!" will make the bot work on all pages
                  in the template namespace.

                  The default value is -start:!

-prefixindex      Work on pages commencing with a common prefix.

-transcludes      Work on all pages that use a certain template.
                  Argument can also be given as "-transcludes:Title".

-unusedfiles      Work on all description pages of images/media files that
                  are not used anywhere.
                  Argument can be given as "-unusedfiles:n" where
                  n is the maximum number of articles to work on.

-lonelypages      Work on all articles that are not linked from any other
                  article.
                  Argument can be given as "-lonelypages:n" where
                  n is the maximum number of articles to work on.

-unwatched        Work on all articles that are not watched by anyone.
                  Argument can be given as "-unwatched:n" where
                  n is the maximum number of articles to work on.

-property:name    Work on all pages with a given property name from
                  Special:PagesWithProp.

-usercontribs     Work on all articles that were edited by a certain user.
                  (Example: -usercontribs:DumZiBoT)

-weblink          Work on all articles that contain an external link to
                  a given URL; may be given as "-weblink:url"

-withoutinterwiki Work on all pages that don't have interlanguage links.
                  Argument can be given as "-withoutinterwiki:n" where
                  n is the total number of pages to fetch.

-mysqlquery       Takes a MySQL query string like
                  "SELECT page_namespace, page_title FROM page
                  WHERE page_namespace = 0" and works on
                  the resulting pages. See
                  https://www.mediawiki.org/wiki/Manual:Pywikibot/MySQL
                  for more details.

-sparql           Takes a SPARQL SELECT query string including ?item
                  and works on the resulting pages.

-sparqlendpoint   Specify SPARQL endpoint URL (optional).
                  (Example: -sparqlendpoint:http://myserver.com/sparql)

-searchitem       Takes a search string and works on Wikibase pages that
                  contain it.
                  Argument can be given as "-searchitem:text", where text
                  is the string to look for, or "-searchitem:lang:text",
                  where lang is the language to search items in.

-wantedpages      Work on pages that are linked, but do not exist;
                  may be given as "-wantedpages:n" where n is the maximum
                  number of articles to work on.

-wantedcategories Work on categories that are used, but do not exist;
                  may be given as "-wantedcategories:n" where n is the
                  maximum number of categories to work on.

-wantedfiles      Work on files that are used, but do not exist;
                  may be given as "-wantedfiles:n" where n is the maximum
                  number of files to work on.

-wantedtemplates  Work on templates that are used, but do not exist;
                  may be given as "-wantedtemplates:n" where n is the
                  maximum number of templates to work on.

-random           Work on random pages returned by [[Special:Random]].
                  Can also be given as "-random:n" where n is the number
                  of pages to be returned.

-randomredirect   Work on random redirect pages returned by
                  [[Special:RandomRedirect]]. Can also be given as
                  "-randomredirect:n" where n is the number of pages to be
                  returned.

-google           Work on all pages that are found in a Google search.
                  You need a Google Web API license key. Note that Google
                  doesn't give out license keys anymore. See google_key in
                  config.py for instructions.
                  Argument can also be given as "-google:searchstring".

-page             Work on a single page. Argument can also be given as
                  "-page:pagetitle", and supplied multiple times for
                  multiple pages.

-pageid           Work on a single pageid. Argument can also be given as
                  "-pageid:pageid1,pageid2,..." or
                  "-pageid:'pageid1|pageid2|...'"
                  and supplied multiple times for multiple pages.

-linter           Work on pages that contain lint errors. The Linter
                  extension must be available on the site.
                  -linter selects all categories.
                  -linter:high, -linter:medium or -linter:low selects all
                  categories of that priority.
                  Single categories can be selected with commas as in
                  -linter:cat1,cat2,cat3

                  Appending '/int' specifies the Lint ID to start querying
                  from, e.g. -linter:high/10000

                  -linter:show just shows the available categories.

-querypage:name   Work on pages provided by a QueryPage-based special page,
                  see https://www.mediawiki.org/wiki/API:Querypage.
                  (tip: use -limit:n to fetch only n pages).

                  -querypage shows the available special pages.
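The compound values accepted by -logevents, -recentchanges and -pageid can be
illustrated with a small Python sketch. It is not pywikibot's own parser: the
function names are hypothetical, and two readings are assumptions of this
sketch, namely that an 8-digit third -logevents field is a YYYYMMDD date while
any other number is a page total, and that the -recentchanges window ends
'offset' minutes ago and reaches 'duration' minutes further back.

```python
from datetime import datetime, timedelta, timezone


def parse_logevents(value):
    """Split a -logevents value into its parts (hypothetical helper).

    Accepts "logevent,username,start,end" or the legacy
    "logevent,username,total"; empty fields mean "use the default".
    Assumption: an 8-digit third field is a YYYYMMDD start date and
    any other number is a page total.
    """
    fields = value.split(',')
    fields += [''] * (4 - len(fields))  # pad omitted trailing fields
    logtype, user, third, end = fields[:4]
    result = {'logtype': logtype, 'user': user or None,
              'total': None, 'start': None, 'end': None}
    if third:
        if len(third) == 8 and third.isdigit():
            result['start'] = datetime.strptime(third, '%Y%m%d').date()
        else:
            result['total'] = int(third)
    if end:
        result['end'] = datetime.strptime(end, '%Y%m%d').date()
    return result


def recentchanges_window(offset, duration, now=None):
    """Return (oldest, newest) for -recentchanges:offset,duration.

    Assumption: the window ends `offset` minutes before `now` and
    covers the `duration` minutes before that.
    """
    now = now or datetime.now(timezone.utc)
    newest = now - timedelta(minutes=offset)
    oldest = newest - timedelta(minutes=duration)
    return oldest, newest


def parse_pageids(value):
    """Split "1,2,3" or "'1|2|3'" into a list of integer page IDs."""
    value = value.strip('\'"')
    return [int(part) for part in value.replace('|', ',').split(',') if part]


print(parse_logevents('delete,,20'))
print(parse_logevents('upload,,20121231,20100101'))
print(parse_pageids("'12|34'"))
```

Under these readings, -logevents:delete,,20 yields a total of 20, and
-recentchanges:120,70 covers changes made between 190 and 120 minutes ago.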

FILTER OPTIONS
==============

-catfilter        Filter the page generator to only yield pages in the
                  specified category. See the -cat generator for the
                  argument format.

-grep             A regular expression that needs to match the article text,
                  otherwise the page won't be returned.
                  Multiple -grep:regexpr can be provided and the page will
                  be returned if content is matched by any of the regexpr
                  provided.
                  Case insensitive regular expressions will be used and
                  dot matches any character, including a newline.

-grepnot          Like -grep, but return the page only if the regular
                  expression does not match.

-intersect        Work on the intersection of all the provided generators.

-limit            When used with any other argument that specifies a set
                  of pages, -limit:n makes the bot work on no more than n
                  pages in total.

-namespaces       Filter the page generator to only yield pages in the
-namespace        specified namespaces. Separate multiple namespace
-ns               numbers or names with commas.

                  Examples:

                      -ns:0,2,4
                      -ns:Help,MediaWiki

                  You may prefix the list with "not" to exclude the
                  given namespaces.

                  Examples:

                      -ns:not:2,3
                      -ns:not:Help,File

                  If used with -newpages/-random/-randomredirect/-linter
                  generators, -namespace/ns must be provided before
                  -newpages/-random/-randomredirect/-linter.
                  If used with -recentchanges generator, efficiency is
                  improved if -namespace is provided before -recentchanges.

                  If used with -start generator, -namespace/ns must contain
                  only one value.

-onlyif           A claim the page needs to contain, otherwise the item won't
                  be returned.
                  The format is property=value,qualifier=value. Multiple (or
                  no) qualifiers can be passed, separated by commas.

                  Examples:

                      P1=Q2 (property P1 must contain value Q2),
                      P3=Q4,P5=Q6,P6=Q7 (property P3 with value Q4 and
                      qualifiers: P5 with value Q6 and P6 with value Q7).

                  A value can be a page ID, a coordinate in the format
                  latitude,longitude[,precision] (all values are in decimal
                  degrees), a year, or a plain string.
                  The argument can be provided multiple times and the item
                  page will be returned only if all claims are present.
                  Argument can also be given as "-onlyif:expression".

-onlyifnot        A claim the page must not contain, otherwise the item won't
                  be returned.
                  For usage and examples, see -onlyif above.

-ql               Filter pages based on page quality.
                  This is only applicable if contentmodel equals
                  'proofread-page', otherwise it has no effect.
                  Valid values are in range 0-4.
                  Multiple values can be comma-separated.

-subpage          -subpage:n filters pages to only those that have a depth of
                  at most n, i.e. a depth of 0 filters out all pages that are
                  subpages, and a depth of 1 filters out all pages that are
                  subpages of subpages.

-titleregex       A regular expression that needs to match the article title,
                  otherwise the page won't be returned.
                  Multiple -titleregex:regexpr can be provided and the page
                  will be returned if title is matched by any of the regexpr
                  provided.
                  Case insensitive regular expressions will be used and
                  dot matches any character.

-titleregexnot    Like -titleregex, but return the page only if the regular
                  expression does not match.
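The filter semantics above can be mimicked with Python's re module on plain
(title, namespace, text) tuples. This is a hypothetical sketch, not
pywikibot's filter code: the tuple layout, the namespace-name table and the
function names are invented for illustration, subpage depth is assumed to be
the number of '/' characters in the title, and the -onlyif helper only splits
the expression without querying Wikibase.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL  # as described for -grep and -titleregex


def regex_filter(pages, patterns, *, field='text', invert=False):
    """Yield pages whose field matches any pattern (-grep/-titleregex).

    With invert=True, yield only pages matching none of the patterns
    (-grepnot/-titleregexnot). Pages are (title, ns, text) tuples.
    """
    regexes = [re.compile(p, FLAGS) for p in patterns]
    index = {'title': 0, 'text': 2}[field]
    for page in pages:
        matched = any(r.search(page[index]) for r in regexes)
        if matched != invert:
            yield page


def parse_namespaces(value, names):
    """Parse "0,2,4", "Help,MediaWiki" or "not:2,3" (-ns values).

    `names` maps namespace names to numbers (a hypothetical table).
    Returns (set_of_numbers, exclude_flag).
    """
    exclude = value.startswith('not:')
    if exclude:
        value = value[len('not:'):]
    numbers = {int(v) if v.lstrip('-').isdigit() else names[v]
               for v in (part.strip() for part in value.split(','))}
    return numbers, exclude


def namespace_ok(ns, numbers, exclude):
    """Apply the parsed -ns selection to one namespace number."""
    return (ns not in numbers) if exclude else (ns in numbers)


def subpage_ok(title, max_depth):
    """-subpage:n keeps pages whose depth ('/' count) is at most n."""
    return title.count('/') <= max_depth


def parse_claim(expression):
    """Split "P3=Q4,P5=Q6" into ('P3', 'Q4', {'P5': 'Q6'}) (-onlyif).

    A part without '=' continues the previous value, so a coordinate
    such as "P625=52.5,13.4" stays in one piece.
    """
    pairs = []
    for part in expression.split(','):
        if '=' in part:
            key, val = part.split('=', 1)
            pairs.append([key, val])
        else:
            pairs[-1][1] += ',' + part
    (prop, val), *qualifiers = pairs
    return prop, val, dict(qualifiers)


pages = [('Alpha', 0, 'Foo\nbar'), ('Beta', 2, 'nothing'), ('Gamma', 12, 'FOO')]
print([p[0] for p in regex_filter(pages, ['foo.bar'])])
print(parse_claim('P3=Q4,P5=Q6,P6=Q7'))
```

Because DOTALL is set, 'foo.bar' matches across the line break in the first
page, which is the behaviour the -grep description promises.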

Furthermore, the following command line parameters are supported:

-mysqlquery       Retrieve information from a local database mirror.
                  If no query is specified, the bot searches for pages with
                  the given replacements.

-xml              Retrieve information from a local XML dump
                  (pages-articles or pages-meta-current, see
                  https://dumps.wikimedia.org). Argument can also
                  be given as "-xml:filename".

-regex            Make replacements using regular expressions. If this argument
                  isn't given, the bot will make simple text replacements.

-nocase           Use case insensitive regular expressions.

-dotall           Make the dot match any character at all, including a newline.
                  Without this flag, '.' will match anything except a newline.

-multiline        '^' and '$' will match the beginning and end of each line.

-xmlstart         (Only works with -xml) Skip all articles in the XML dump
                  before the one specified (may also be given as
                  -xmlstart:Article).

-addcat:cat_name  Adds the "cat_name" category to every altered page.

-excepttitle:XYZ  Skip pages with titles that contain XYZ. If the -regex
                  argument is given, XYZ will be regarded as a regular
                  expression.

-requiretitle:XYZ Only process pages with titles that contain XYZ. If the
                  -regex argument is given, XYZ will be regarded as a regular
                  expression.

-excepttext:XYZ   Skip pages which contain the text XYZ. If the -regex
                  argument is given, XYZ will be regarded as a regular
                  expression.

-exceptinside:XYZ Skip occurrences of the to-be-replaced text which lie
                  within XYZ. If the -regex argument is given, XYZ will be
                  regarded as a regular expression.

-exceptinsidetag:XYZ Skip occurrences of the to-be-replaced text which lie
                  within an XYZ tag.

-summary:XYZ      Set the summary message text for the edit to XYZ, bypassing
                  the predefined message texts with original and replacements
                  inserted. Can't be used with -automaticsummary.

-automaticsummary Uses an automatic summary for all replacements which don't
                  have a summary defined. Can't be used with -summary.

-sleep:123        If you use -fix, you can check multiple regexes at the same
                  time on every page. This can waste a lot of CPU because the
                  bot checks every regex without pausing, using all available
                  resources. -sleep makes the bot pause between one regex and
                  the next so that it does not use too much CPU.

-fix:XYZ          Perform one of the predefined replacement tasks, which are
                  given in the dictionary 'fixes' defined inside the files
                  fixes.py and user-fixes.py.

                  Currently available predefined fixes are:

                  * HTML        - Convert HTML tags to wiki syntax, and
                                  fix XHTML.
                  * isbn        - Fix badly formatted ISBNs.
                  * syntax      - Try to fix bad wiki markup. Do not run
                                  this in automatic mode, as the bot may
                                  make mistakes.
                  * syntax-safe - Like syntax, but less risky, so you can
                                  run this in automatic mode.
                  * case-de     - Fix upper/lower case errors in German
                  * grammar-de  - Fix grammar and typography in German
                  * vonbis      - Replace hyphens/dashes with "bis" in German
                  * music       - Links to disambiguation pages in German
                  * datum       - Specific date formats in German
                  * correct-ar  - Typo corrections for Arabic Wikipedia and
                                  any Arabic wiki.
                  * yu-tld      - Fix links to .yu domains because the
                                  domain has been disabled, see:
                                  https://lists.wikimedia.org/pipermail/wikibots-l/2009-February/000290.html
                  * fckeditor   - Try to convert FCKeditor HTML tags to wiki
                                  syntax.

-manualinput      Request manual replacements via the command line input even
                  if replacements are already defined. If this option is set
                  (or no replacements are defined via -fix or the arguments)
                  it will ask for additional replacements at start.

-pairsfile        Lines from the given file name(s) will be read as replacement
                  arguments. For example, a file containing the lines "a" and
                  "b", used as:

                      python pwb.py replace -page:X -pairsfile:file c d

                  will replace 'a' with 'b' and 'c' with 'd'.

-always           Don't prompt for each replacement.

-recursive        Recurse replacement as long as possible. Be careful, this
                  might lead to an infinite loop.

-allowoverlap     When occurrences of the pattern overlap, replace all of them.
                  Be careful, this might lead to an infinite loop.

-fullsummary      Use one large summary for all command line replacements.

other:            First argument is the old text, second argument is the new
                  text. If the -regex argument is given, the first argument
                  will be regarded as a regular expression, and the second
                  argument might contain expressions like \1 or \g<name>.
                  It is possible to introduce more than one pair of old text
                  and replacement.
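How -regex, -nocase, -dotall and -multiline could combine, and what
-exceptinside protects, can be sketched with Python's re module. The helpers
below are hypothetical and simplified (pywikibot's own replacement routine
handles many more exception types); modelling the non -regex case as
re.escape of the old text is an assumption that mirrors the "simple text
replacements" described for -regex.

```python
import re


def build_pattern(old, regex=False, nocase=False, dotall=False,
                  multiline=False):
    """Compile the old text according to the option flags."""
    flags = 0
    if nocase:
        flags |= re.IGNORECASE
    if dotall:
        flags |= re.DOTALL
    if multiline:
        flags |= re.MULTILINE
    # Without -regex the old text is plain text, so escape metacharacters.
    return re.compile(old if regex else re.escape(old), flags)


def replace_except_inside(text, pattern, new, exceptions):
    """Replace matches of `pattern` unless they lie inside a match of one
    of the `exceptions` regexes (a rough model of -exceptinside).

    `new` is a re replacement template, so it may use \\1 or \\g<name>.
    """
    protected = [m.span() for exc in exceptions for m in exc.finditer(text)]

    def substitute(match):
        start, end = match.span()
        if any(s <= start and end <= e for s, e in protected):
            return match.group(0)  # keep the protected occurrence unchanged
        return match.expand(new)

    return pattern.sub(substitute, text)


nowiki = re.compile(r'<nowiki>.*?</nowiki>')
print(replace_except_inside('foo <nowiki>foo</nowiki>',
                            build_pattern('foo'), 'bar', [nowiki]))
```

Only the first "foo" is replaced; the one inside the nowiki span is left as
it was.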

Examples
--------

If you want to change templates from the old syntax, e.g. {{msg:Stub}}, to the
new syntax, e.g. {{Stub}}, download an XML dump file (pages-articles) from
https://dumps.wikimedia.org, then use this command:

    python pwb.py replace -xml -regex "{{msg:(.*?)}}" "{{\1}}"
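What the regular expression in this command does can be checked with
Python's re.sub on a made-up string. This models only the substitution
itself, not page selection or exceptions; the braces are backslash-escaped in
the sketch so that they are unambiguously literal.

```python
import re

# Pattern and replacement from the command above; the braces are escaped
# here only to make them unambiguously literal in Python's regex syntax.
OLD = r'\{\{msg:(.*?)\}\}'
NEW = r'{{\1}}'

sample = 'Intro {{msg:Stub}} and {{msg:Cleanup}}.'
result = re.sub(OLD, NEW, sample)
print(result)  # Intro {{Stub}} and {{Cleanup}}.

# A greedy (.*) would swallow everything between the first "{{msg:" and
# the last "}}", leaving the second template unconverted.
greedy = re.sub(r'\{\{msg:(.*)\}\}', NEW, sample)
print(greedy)  # Intro {{Stub}} and {{msg:Cleanup}}.
```

The non-greedy `.*?` is what keeps each template a separate match.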

If you have a dump called foobar.xml and want to fix typos in articles, e.g.
Errror -> Error, use this:

    python pwb.py replace -xml:foobar.xml "Errror" "Error" -namespace:0

If you want to do more than one replacement at a time, use this:

    python pwb.py replace -xml:foobar.xml "Errror" "Error" "Faail" "Fail" \
        -namespace:0
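Positional arguments are consumed as old/new pairs, and lines read with
-pairsfile are described above as replacement arguments too. A minimal
sketch of that pairing follows. It assumes the pairs are applied one after
another in the order given; the helper names are hypothetical, and plain
string replacement stands in for the non -regex mode.

```python
def to_pairs(args):
    """Group positional arguments into (old, new) pairs."""
    if len(args) % 2:
        raise ValueError('replacements must come in old/new pairs')
    return list(zip(args[::2], args[1::2]))


def apply_pairs(text, pairs):
    """Apply plain-text replacements one after another, in order."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text


pairs = to_pairs(['Errror', 'Error', 'Faail', 'Fail'])
print(pairs)  # [('Errror', 'Error'), ('Faail', 'Fail')]
print(apply_pairs('Errror: Faail', pairs))  # Error: Fail
```

Because the pairs run in sequence in this sketch, a later pair sees the
output of earlier ones: a -> b followed by b -> c turns 'a' into 'c'.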

If you have a page called 'John Doe' and want to fix the format of ISBNs, use:

    python pwb.py replace -page:John_Doe -fix:isbn

This command will change 'referer' to 'referrer', but not in pages which
talk about HTTP, where the typo has become part of the standard:

    python pwb.py replace referer referrer -file:typos.txt -excepttext:HTTP
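The effect of -excepttext in this command can be pictured as a page-level
check that runs before any replacement. The helper names are hypothetical,
and the sketch assumes a case-sensitive search (no -nocase given) with the
exception text matched literally (no -regex given).

```python
import re


def skip_page(text, excepttext, regex=False):
    """Return True if the page contains the -excepttext value."""
    pattern = excepttext if regex else re.escape(excepttext)
    return re.search(pattern, text) is not None


def treat(text):
    # Leave pages that mention HTTP untouched, otherwise fix the typo.
    if skip_page(text, 'HTTP'):
        return text
    return text.replace('referer', 'referrer')


print(treat('Check the referer.'))       # Check the referrer.
print(treat('The HTTP referer header'))  # The HTTP referer header
```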

Please type "python pwb.py replace -help | more" if you can't read
the top of the help.
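The warning attached to -recursive (repeat the replacement until nothing
changes, possibly forever) is easiest to see in a sketch. The helper below is
hypothetical; its max_rounds cap is an addition of this sketch, not an
option of the bot.

```python
import re


def replace_recursive(text, pattern, new, max_rounds=100):
    """Re-apply a substitution until the text stops changing.

    max_rounds is a safety cap invented for this sketch: a replacement that
    recreates its own match would otherwise loop forever.
    """
    regex = re.compile(pattern)
    for _ in range(max_rounds):
        updated = regex.sub(new, text)
        if updated == text:
            return text
        text = updated
    raise RuntimeError('replacement did not converge')


# One pass of "two spaces -> one space" leaves a double space behind ...
print(repr(re.sub('  ', ' ', 'a    b')))             # 'a  b'
# ... while repeating until stable collapses the whole run.
print(repr(replace_recursive('a    b', '  ', ' ')))  # 'a b'
```

Replacing 'x' with 'xx' never stabilises, which is exactly the infinite loop
the option text warns about; here it ends in a RuntimeError instead.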

GLOBAL OPTIONS
==============
For global options, use -help:global or run pwb.py -help.