This bot will make direct text replacements. It will retrieve
information on which pages might need changes either from an XML dump
or a text file, or it can change a single page only.
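For instance, a minimal invocation that fixes a single typo on one page
might look like this (the page title and replacement pair are
hypothetical):

    python pwb.py replace -page:Sandbox "teh" "the"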
These command line parameters can be used to specify which pages to
work on:

GENERATOR OPTIONS
=================

-cat                Work on all pages which are in a specific category.
                    Argument can also be given as "-cat:categoryname" or
                    as "-cat:categoryname|fromtitle" (using # instead of
                    | is also allowed in this one and the following).

-catr               Like -cat, but also recursively includes pages in
                    subcategories, sub-subcategories etc. of the given
                    category. Argument can also be given as
                    "-catr:categoryname" or as
                    "-catr:categoryname|fromtitle".

-subcats            Work on all subcategories of a specific category.
                    Argument can also be given as "-subcats:categoryname"
                    or as "-subcats:categoryname|fromtitle".

-subcatsr           Like -subcats, but also includes sub-subcategories
                    etc. of the given category. Argument can also be
                    given as "-subcatsr:categoryname" or as
                    "-subcatsr:categoryname|fromtitle".

-uncat              Work on all pages which are not categorised.

-uncatcat           Work on all categories which are not categorised.

-uncatfiles         Work on all files which are not categorised.

-file               Read a list of pages to treat from the named text
                    file. Page titles in the file may either be enclosed
                    in [[brackets]] or be separated by new lines.
                    Argument can also be given as "-file:filename".

-filelinks          Work on all pages that use a certain image/media
                    file. Argument can also be given as
                    "-filelinks:filename".

-search             Work on all pages that are found in a MediaWiki
                    search across all namespaces.

-logevents          Work on articles that were on a specified
                    Special:Log. The value may be a comma separated list
                    of these values:

                        logevent,username,start,end

                    or, for backward compatibility:

                        logevent,username,total

                    Note: 'start' is the most recent date, and log
                    events are iterated from present to past. If 'start'
                    is not provided, it means 'now'; if 'end' is not
                    provided, it means 'since the beginning'. To use the
                    default value, use an empty string.

                    The log event parameter may be one of the following
                    log types: spamblacklist, titleblacklist, gblblock,
                    renameuser, globalauth, gblrights, gblrename,
                    abusefilter, massmessage, thanks, usermerge, block,
                    protect, rights, delete, upload, move, import,
                    patrol, merge, suppress, tag, managetags,
                    contentmodel, review, stable, timedmediahandler,
                    newusers

                    The default number of pages is 10.

                    Examples:

                    -logevents:move gives pages from the move log
                        (usually redirects)
                    -logevents:delete,,20 gives 20 pages from the
                        deletion log
                    -logevents:protect,Usr gives pages from the protect
                        log by user Usr
                    -logevents:patrol,Usr,20 gives 20 pages patrolled
                        by Usr
                    -logevents:upload,,20121231,20100101 gives pages
                        uploaded between 2010-01-01 and 2012-12-31
                    -logevents:review,,20121231 gives review pages from
                        the beginning until 31 Dec 2012
                    -logevents:review,Usr,20121231 gives review pages
                        by user Usr from the beginning until 31 Dec 2012

                    In some cases the argument must be quoted, as in
                    -logevents:"move,Usr,20".

-interwiki          Work on the given page and all equivalent pages in
                    other languages. This can, for example, be used to
                    fight multi-site spamming. Attention: this will
                    cause the bot to modify pages on several wiki sites;
                    this is not well tested, so check your edits!

-links              Work on all pages that are linked from a certain
                    page. Argument can also be given as
                    "-links:linkingpagetitle".

-liverecentchanges  Work on pages from the live recent changes feed. If
                    used as -liverecentchanges:x, work on x recent
                    changes.

-imagesused         Work on all images that are contained on a certain
                    page. Can also be given as
                    "-imagesused:linkingpagetitle".

-newimages          Work on the most recent new images. If given as
                    -newimages:x, will work on the x newest images.

-newpages           Work on the most recent new pages. If given as
                    -newpages:x, will work on the x newest pages.

-recentchanges      Work on the pages with the most recent changes. If
                    given as -recentchanges:x, will work on the x most
                    recently changed pages. If given as
                    -recentchanges:offset,duration, it will work on
                    pages changed within the timespan that starts
                    'offset' minutes ago and lasts 'duration' minutes.
                    rctags are supported too; the rctag must be the very
                    first parameter part.

                    Examples:

                    -recentchanges:20 gives the 20 most recently changed
                        pages
                    -recentchanges:120,70 gives pages with an offset of
                        120 minutes and a timespan of 70 minutes
                    -recentchanges:visualeditor,10 gives the 10 most
                        recently changed pages marked with 'visualeditor'
                    -recentchanges:"mobile edit,60,35" retrieves pages
                        marked with 'mobile edit' for the given offset
                        and timespan

-unconnectedpages   Work on the most recent pages that are not connected
                    to the Wikibase repository. Given as
                    -unconnectedpages:x, will work on the x most recent
                    unconnected pages.

-ref                Work on all pages that link to a certain page.
                    Argument can also be given as
                    "-ref:referredpagetitle".

-start              Specifies that the robot should go alphabetically
                    through all pages on the home wiki, starting at the
                    named page. Argument can also be given as
                    "-start:pagetitle". You can also include a
                    namespace; for example, "-start:Template:!" will
                    make the bot work on all pages in the template
                    namespace. The default value is "-start:!".

-prefixindex        Work on pages commencing with a common prefix.

-transcludes        Work on all pages that use a certain template.
                    Argument can also be given as "-transcludes:Title".

-unusedfiles        Work on all description pages of images/media files
                    that are not used anywhere. Argument can be given as
                    "-unusedfiles:n" where n is the maximum number of
                    articles to work on.

-lonelypages        Work on all articles that are not linked from any
                    other article. Argument can be given as
                    "-lonelypages:n" where n is the maximum number of
                    articles to work on.

-unwatched          Work on all articles that are not watched by anyone.
                    Argument can be given as "-unwatched:n" where n is
                    the maximum number of articles to work on.

-property:name      Work on all pages with a given property name from
                    Special:PagesWithProp.

-usercontribs       Work on all articles that were edited by a certain
                    user. (Example: -usercontribs:DumZiBoT)

-weblink            Work on all articles that contain an external link
                    to a given URL; may be given as "-weblink:url".

-withoutinterwiki   Work on all pages that don't have interlanguage
                    links. Argument can be given as
                    "-withoutinterwiki:n" where n is the total number of
                    pages to fetch.

-mysqlquery         Takes a MySQL query string like "SELECT
                    page_namespace, page_title FROM page WHERE
                    page_namespace = 0" and works on the resulting
                    pages. See
                    https://www.mediawiki.org/wiki/Manual:Pywikibot/MySQL
                    for more details.

-sparql             Takes a SPARQL SELECT query string including ?item
                    and works on the resulting pages.

-sparqlendpoint     Specify the SPARQL endpoint URL (optional).
                    (Example: -sparqlendpoint:http://myserver.com/sparql)

-searchitem         Takes a search string and works on Wikibase pages
                    that contain it. Argument can be given as
                    "-searchitem:text", where text is the string to look
                    for, or "-searchitem:lang:text", where lang is the
                    language to search items in.

-wantedpages        Work on pages that are linked, but do not exist;
                    may be given as "-wantedpages:n" where n is the
                    maximum number of articles to work on.

-wantedcategories   Work on categories that are used, but do not exist;
                    may be given as "-wantedcategories:n" where n is the
                    maximum number of categories to work on.

-wantedfiles        Work on files that are used, but do not exist; may
                    be given as "-wantedfiles:n" where n is the maximum
                    number of files to work on.

-wantedtemplates    Work on templates that are used, but do not exist;
                    may be given as "-wantedtemplates:n" where n is the
                    maximum number of templates to work on.

-random             Work on random pages returned by [[Special:Random]].
                    Can also be given as "-random:n" where n is the
                    number of pages to be returned.

-randomredirect     Work on random redirect pages returned by
                    [[Special:RandomRedirect]]. Can also be given as
                    "-randomredirect:n" where n is the number of pages
                    to be returned.

-google             Work on all pages that are found in a Google search.
                    You need a Google Web API license key. Note that
                    Google doesn't give out license keys anymore. See
                    google_key in config.py for instructions. Argument
                    can also be given as "-google:searchstring".

-page               Work on a single page. Argument can also be given as
                    "-page:pagetitle", and supplied multiple times for
                    multiple pages.

-pageid             Work on a single pageid. Argument can also be given
                    as "-pageid:pageid1,pageid2,.." or
                    "-pageid:'pageid1|pageid2|..'", and supplied
                    multiple times for multiple pages.

-linter             Work on pages that contain lint errors. The Linter
                    extension must be available on the site. -linter
                    selects all categories; -linter:high, -linter:medium
                    or -linter:low selects all categories of that
                    priority. Single categories can be selected with
                    commas, as in -linter:cat1,cat2,cat3. Adding '/int'
                    identifies the lint ID to start querying from, e.g.
                    -linter:high/10000. -linter:show just shows the
                    available categories.

-querypage:name     Work on pages provided by a QueryPage-based special
                    page; see https://www.mediawiki.org/wiki/API:Querypage.
                    (Tip: use -limit:n to fetch only n pages.)
                    -querypage shows the available special pages.
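For example, the -cat generator above can drive a replacement over a
whole category of pages (the category name and replacement pair are
hypothetical):

    python pwb.py replace -cat:Typos "recieve" "receive" -namespace:0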
FILTER OPTIONS
==============

-catfilter          Filter the page generator to only yield pages in
                    the specified category. See the -cat generator for
                    the argument format.

-grep               A regular expression that needs to match the
                    article, otherwise the page won't be returned.
                    Multiple -grep:regexpr can be provided, and the page
                    will be returned if its content is matched by any of
                    the regexpr provided. Case insensitive regular
                    expressions will be used, and the dot matches any
                    character, including a newline.

-grepnot            Like -grep, but return the page only if the regular
                    expression does not match.

-intersect          Work on the intersection of all the provided
                    generators.

-limit              When used with any other argument, -limit:n
                    specifies a set of pages; work on no more than n
                    pages in total.

-namespaces         Filter the page generator to only yield pages in
-namespace          the specified namespaces. Separate multiple
-ns                 namespace numbers or names with commas.

                    Examples:

                    -ns:0,2,4
                    -ns:Help,MediaWiki

                    You may use a preceding "not" to exclude the
                    namespace.

                    Examples:

                    -ns:not:2,3
                    -ns:not:Help,File

                    If used with the -newpages, -random, -randomredirect
                    or -linter generators, -namespace/-ns must be
                    provided before -newpages, -random, -randomredirect
                    or -linter. If used with the -recentchanges
                    generator, efficiency is improved if -namespace is
                    provided before -recentchanges. If used with the
                    -start generator, -namespace/-ns shall contain only
                    one value.

-onlyif             A claim the page needs to contain, otherwise the
                    item won't be returned. The format is
                    property=value,qualifier=value. Multiple (or no)
                    qualifiers can be passed, separated by commas.

                    Examples:

                    P1=Q2 (property P1 must contain value Q2)
                    P3=Q4,P5=Q6,P6=Q7 (property P3 with value Q4 and
                        qualifiers: P5 with value Q6 and P6 with value
                        Q7)

                    A value can be a page ID, a coordinate in the format
                    latitude,longitude[,precision] (all values in
                    decimal degrees), a year, or a plain string. The
                    argument can be provided multiple times, and the
                    item page will be returned only if all claims are
                    present. Argument can also be given as
                    "-onlyif:expression".

-onlyifnot          A claim the page must not contain, otherwise the
                    item won't be returned. For usage and examples, see
                    -onlyif above.

-ql                 Filter pages based on page quality. This is only
                    applicable if the content model equals
                    'proofread-page'; otherwise it has no effect. Valid
                    values are in the range 0-4. Multiple values can be
                    comma-separated.

-subpage            -subpage:n filters pages to only those that have
                    depth n, i.e. a depth of 0 filters out all pages
                    that are subpages, and a depth of 1 filters out all
                    pages that are subpages of subpages.

-titleregex         A regular expression that needs to match the
                    article title, otherwise the page won't be returned.
                    Multiple -titleregex:regexpr can be provided, and
                    the page will be returned if its title is matched by
                    any of the regexpr provided. Case insensitive
                    regular expressions will be used, and the dot
                    matches any character.

-titleregexnot      Like -titleregex, but return the page only if the
                    regular expression does not match.
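These filters can be combined with any of the generators above. A
sketch, where the pattern, the limit and the replacement pair are all
hypothetical:

    python pwb.py replace -start:! -grep:"[Ff]oo" -limit:50 "foo" "bar"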
Furthermore, the following command line parameters are supported:

-mysqlquery         Retrieve information from a local database mirror.
                    If no query is specified, the bot searches for pages
                    with the given replacements.

-xml                Retrieve information from a local XML dump
                    (pages-articles or pages-meta-current, see
                    https://dumps.wikimedia.org). Argument can also be
                    given as "-xml:filename".

-regex              Make replacements using regular expressions. If
                    this argument isn't given, the bot will make simple
                    text replacements.

-nocase             Use case insensitive regular expressions.

-dotall             Make the dot match any character at all, including
                    a newline. Without this flag, '.' will match
                    anything except a newline.

-multiline          '^' and '$' will now match the beginning and end of
                    each line.

-xmlstart           (Only works with -xml) Skip all articles in the XML
                    dump before the one specified (may also be given as
                    -xmlstart:Article).

-addcat:cat_name    Adds the "cat_name" category to every altered page.

-excepttitle:XYZ    Skip pages with titles that contain XYZ. If the
                    -regex argument is given, XYZ will be regarded as a
                    regular expression.

-requiretitle:XYZ   Only do pages with titles that contain XYZ. If the
                    -regex argument is given, XYZ will be regarded as a
                    regular expression.

-excepttext:XYZ     Skip pages which contain the text XYZ. If the
                    -regex argument is given, XYZ will be regarded as a
                    regular expression.

-exceptinside:XYZ   Skip occurrences of the to-be-replaced text which
                    lie within XYZ. If the -regex argument is given, XYZ
                    will be regarded as a regular expression.

-exceptinsidetag:XYZ
                    Skip occurrences of the to-be-replaced text which
                    lie within an XYZ tag.

-summary:XYZ        Set the summary message text for the edit to XYZ,
                    bypassing the predefined message texts with original
                    and replacements inserted. Can't be used with
                    -automaticsummary.

-automaticsummary   Uses an automatic summary for all replacements which
                    don't have a summary defined. Can't be used with
                    -summary.

-sleep:123          If you use -fix, multiple regexes may be checked on
                    every page. Checking one regex after another without
                    any pause can waste a lot of CPU. This option makes
                    the bot sleep between one regex and the next, in
                    order not to use up too many resources.

-fix:XYZ            Perform one of the predefined replacement tasks,
                    which are given in the dictionary 'fixes' defined
                    inside the files fixes.py and user-fixes.py.
                    Currently available predefined fixes are:

                    * HTML        - Convert HTML tags to wiki syntax,
                                    and fix XHTML.
                    * isbn        - Fix badly formatted ISBNs.
                    * syntax      - Try to fix bad wiki markup. Do not
                                    run this in automatic mode, as the
                                    bot may make mistakes.
                    * syntax-safe - Like syntax, but less risky, so you
                                    can run this in automatic mode.
                    * case-de     - Fix upper/lower case errors in
                                    German.
                    * grammar-de  - Fix grammar and typography in
                                    German.
                    * vonbis      - Replace a hyphen or dash with "bis"
                                    ("to") in German.
                    * music       - Fix links to disambiguation pages in
                                    German.
                    * datum       - Fix specific date formats in German.
                    * correct-ar  - Typo corrections for Arabic
                                    Wikipedia and any Arabic wiki.
                    * yu-tld      - Fix links to .yu domains, since the
                                    top-level domain has been
                                    deactivated; see:
                                    https://lists.wikimedia.org/pipermail/wikibots-l/2009-February/000290.html
                    * fckeditor   - Try to convert FCKeditor HTML tags
                                    to wiki syntax.

-manualinput        Request manual replacements via command line input,
                    even if replacements are already defined. If this
                    option is set (or no replacements are defined via
                    -fix or the arguments), the bot will ask for
                    additional replacements at start.

-pairsfile          Lines from the given file name(s) will be read as
                    replacement arguments, i.e. with a file containing
                    the lines "a" and "b", used as

                        python pwb.py replace -page:X -pairsfile:file c d

                    the bot will replace 'a' with 'b' and 'c' with 'd'.

-always             Don't prompt for each replacement.

-recursive          Recurse replacement as long as possible. Be careful:
                    this might lead to an infinite loop.

-allowoverlap       When occurrences of the pattern overlap, replace all
                    of them. Be careful: this might lead to an infinite
                    loop.

-fullsummary        Use one large summary for all command line
                    replacements.

other:              The first argument is the old text, the second
                    argument is the new text. If the -regex argument is
                    given, the first argument will be regarded as a
                    regular expression, and the second argument might
                    contain expressions like \1 or \g<name>. It is
                    possible to introduce more than one pair of old text
                    and replacement.
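The skip options above compose with ordinary replacements. For
instance, to avoid touching matches inside nowiki tags and to tag every
altered page with a tracking category (the page title and category name
are hypothetical):

    python pwb.py replace -page:Sandbox "foo" "bar" -exceptinsidetag:nowiki -addcat:Bot_cleanup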
Examples
--------

If you want to change templates from the old syntax, e.g. {{msg:Stub}},
to the new syntax, e.g. {{Stub}}, download an XML dump file
(pages-articles) from https://dumps.wikimedia.org, then use this
command:

    python pwb.py replace -xml -regex "{{msg:(.*?)}}" "{{\1}}"

If you have a dump called foobar.xml and want to fix typos in articles,
e.g. Errror -> Error, use this:

    python pwb.py replace -xml:foobar.xml "Errror" "Error" -namespace:0

If you want to do more than one replacement at a time, use this:

    python pwb.py replace -xml:foobar.xml "Errror" "Error" "Faail" "Fail" \
        -namespace:0

If you have a page called 'John Doe' and want to fix the format of
ISBNs, use:

    python pwb.py replace -page:John_Doe -fix:isbn

This command will change 'referer' to 'referrer', but not in pages
which talk about HTTP, where the typo has become part of the standard:

    python pwb.py replace referer referrer -file:typos.txt -excepttext:HTTP

Please type "python pwb.py replace -help | more" if you can't read the
top of the help.

GLOBAL OPTIONS
==============

For global options use -help:global or run pwb.py -help.
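Before editing for real, any of the commands above can be combined with
the global -simulate option, which disables writing to the server (the
page title and replacement pair are hypothetical):

    python pwb.py replace -page:Sandbox "foo" "bar" -simulate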