Update to beautifulsoup 4.4.0.
Upstream changelog:

Especially important changes:

* Added a warning when you instantiate a BeautifulSoup object without
  explicitly naming a parser. [bug=1398866]

* __repr__ now returns an ASCII bytestring in Python 2, and a Unicode
  string in Python 3, instead of a UTF8-encoded bytestring in both
  versions. In Python 3, __str__ now returns a Unicode string instead
  of a bytestring. [bug=1420131]

* The `text` argument to the find_* methods is now called `string`,
  which is more accurate. `text` still works, but `string` is the
  argument described in the documentation. `text` may eventually
  change its meaning, but not for a very long time. [bug=1366856]

* Changed the way soup objects work under copy.copy(). Copying a
  NavigableString or a Tag will give you a new NavigableString that's
  equal to the old one but not connected to the parse tree. Patch by
  Martijn Peters. [bug=1307490]

* Started using a standard MIT license. [bug=1294662]

* Added a Chinese translation of the documentation by Delong .w.

New features:

* Introduced the select_one() method, which uses a CSS selector but
  only returns the first match, instead of a list of
  matches. [bug=1349367]

* You can now create a Tag object without specifying a
  TreeBuilder. Patch by Martijn Pieters. [bug=1307471]

* You can now create a NavigableString or a subclass just by invoking
  the constructor. [bug=1294315]

* Added an `exclude_encodings` argument to UnicodeDammit and to the
  Beautiful Soup constructor, which lets you prohibit the detection of
  an encoding that you know is wrong. [bug=1469408]

* The select() method now supports selector grouping. Patch by
  Francisco Canas [bug=1191917]

Bug fixes:

* Fixed yet another problem that caused the html5lib tree builder to
  create a disconnected parse tree. [bug=1237763]

* Force object_was_parsed() to keep the tree intact even when an
  element from later in the document is moved into
  place. [bug=1430633]

* Fixed yet another bug that caused a disconnected tree when html5lib
  copied an element from one part of the tree to
  another. [bug=1270611]

* Fixed a bug where Element.extract() could create an infinite loop in
  the remaining tree.

* The select() method can now find tags whose names contain
  dashes. Patch by Francisco Canas. [bug=1276211]

* The select() method can now find tags with attributes whose names
  contain dashes. Patch by Marek Kapolka. [bug=1304007]

* Improved the lxml tree builder's handling of processing
  instructions. [bug=1294645]

* Restored the helpful syntax error that happens when you try to
  import the Python 2 edition of Beautiful Soup under Python
  3. [bug=1213387]

* In Python 3.4 and above, set the new convert_charrefs argument to
  the html.parser constructor to avoid a warning and future
  failures. Patch by Stefano Revera. [bug=1375721]

* The warning when you pass in a filename or URL as markup will now be
  displayed correctly even if the filename or URL is a Unicode
  string. [bug=1268888]

* If the initial <html> tag contains a CDATA list attribute such as
  'class', the html5lib tree builder will now turn its value into a
  list, as it would with any other tag. [bug=1296481]

* Fixed an import error in Python 3.5 caused by the removal of the
  HTMLParseError class. [bug=1420063]

* Improved docstring for encode_contents() and
  decode_contents(). [bug=1441543]

* Fixed a crash in Unicode, Dammit's encoding detector when the name
  of the encoding itself contained invalid bytes. [bug=1360913]

* Improved the exception raised when you call .unwrap() or
  .replace_with() on an element that's not attached to a tree.

* Raise a NotImplementedError whenever an unsupported CSS pseudoclass
  is used in select(). Previously some cases did not result in a
  NotImplementedError.

* It's now possible to pickle a BeautifulSoup object no matter which
  tree builder was used to create it. However, the only tree builder
  that survives the pickling process is the HTMLParserTreeBuilder
  ('html.parser'). If you unpickle a BeautifulSoup object created with
  some other tree builder, soup.builder will be None. [bug=1231545]
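
For reference, a minimal sketch of the 4.4.0 behaviours this update touches: naming the parser explicitly (which avoids the new warning), select_one(), and the `string` argument to find_all(). The MARKUP constant and the parse_bookmarks() helper are hypothetical and only stand in for a Chromium bookmarks export; they are not code from this repository.

```python
import bs4

# Hypothetical markup standing in for a Chromium bookmarks export.
MARKUP = (
    '<dl>'
    '<dt><a href="https://example.com/">Example</a></dt>'
    '<dt><a href="https://example.org/">Other</a></dt>'
    '</dl>'
)


def parse_bookmarks(markup):
    """Return (url, title) pairs for every <a> tag in the markup."""
    # Naming the parser explicitly avoids the warning added in 4.4.0.
    soup = bs4.BeautifulSoup(markup, 'html.parser')
    return [(tag['href'], tag.get_text()) for tag in soup.find_all('a')]


bookmarks = parse_bookmarks(MARKUP)

soup = bs4.BeautifulSoup(MARKUP, 'html.parser')
# select_one() returns the first match (or None) instead of a list.
first_link = soup.select_one('a')
# `string` is the documented name for what used to be the `text` argument.
by_string = soup.find_all('a', string='Other')
```

Passing 'html.parser' keeps the result independent of whichever third-party parsers happen to be installed, which is presumably why it was chosen in the diff below.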
parent b127c7b069
commit d232437105
@@ -51,7 +51,7 @@ def import_chromium(bookmarks_file):
     """Import bookmarks from a HTML file generated by Chromium."""
     import bs4
     with open(bookmarks_file, encoding='utf-8') as f:
-        soup = bs4.BeautifulSoup(f)
+        soup = bs4.BeautifulSoup(f, 'html.parser')
 
     html_tags = soup.findAll('a')
 
--- a/tox.ini
+++ b/tox.ini
@@ -66,7 +66,7 @@ setenv = PYTHONPATH={toxinidir}/scripts/dev
 deps =
     -r{toxinidir}/requirements.txt
     astroid==1.3.6
-    beautifulsoup4==4.3.2
+    beautifulsoup4==4.4.0
     pylint==1.4.4
     logilab-common==1.0.1
     six==1.9.0