HEX
Server: Apache
System: Linux s198.coreserver.jp 5.15.0-151-generic #161-Ubuntu SMP Tue Jul 22 14:25:40 UTC 2025 x86_64
User: nagasaki (10062)
PHP: 7.1.33
Disabled: NONE
Upload Files
File: //usr/local/rvm/gems/default/doc/nokogiri-1.12.5-x86_64-linux/ri/Nokogiri/HTML5/cdesc-HTML5.ri
U:RDoc::NormalModule[iI"
HTML5:ETI"Nokogiri::HTML5;T0o:RDoc::Markup::Document:@parts[
o;;[:
@fileI"ext/nokogiri/nokogiri.c;T:0@omit_headings_from_table_of_contents_below0o;;[SS:RDoc::Markup::Heading:
leveli:	textI"
Usage;To:RDoc::Markup::BlankLineo:RDoc::Markup::Paragraph;[I"Parse an HTML5 document:;T@o:RDoc::Markup::Verbatim;[I""doc = Nokogiri.HTML5(string)
;T:@format0o;;[I"Parse an HTML5 fragment:;T@o;;[I"1fragment = Nokogiri::HTML5.fragment(string)
;T;0S;;i;
I"Parsing options;T@o;;[I"bThe document and fragment parsing methods support options that are different from Nokogiri's.;T@o:RDoc::Markup::List:
@type:BULLET:@items[
o:RDoc::Markup::ListItem:@label0;[o;;[I"K<tt>Nokogiri.HTML5(html, url = nil, encoding = nil, options = {})</tt>;To;;0;[o;;[I"R<tt>Nokogiri::HTML5.parse(html, url = nil, encoding = nil, options = {})</tt>;To;;0;[o;;[I"\<tt>Nokogiri::HTML5::Document.parse(html, url = nil, encoding = nil, options = {})</tt>;To;;0;[o;;[I"J<tt>Nokogiri::HTML5.fragment(html, encoding = nil, options = {})</tt>;To;;0;[o;;[I"Y<tt>Nokogiri::HTML5::DocumentFragment.parse(html, encoding = nil, options = {})</tt>;T@o;;[I"TThe three currently supported options are +:max_errors+, +:max_tree_depth+ and ;TI"(+:max_attributes+, described below.;T@S;;i;
I"Error reporting;T@o;;[I"bNokogiri contains an experimental HTML5 parse error reporting facility. By default, no parse ;TI"[errors are reported but this can be configured by passing the +:max_errors+ option to ;TI"'{HTML5.parse} or {HTML5.fragment}.;T@o;;[I"For example, this script:;T@o;;[	I"Vdoc = Nokogiri::HTML5.parse('<span/>Hi there!</span foo=bar />', max_errors: 10)
;TI"doc.errors.each do |err|
;TI"  puts(err)
;TI"	end
;T;0o;;[I"Emits:;T@o;;[I"*1:1: ERROR: Expected a doctype token
;TI"'<span/>Hi there!</span foo=bar />
;TI"^
;TI"L1:1: ERROR: Start tag of nonvoid HTML element ends with '/>', use '>'.
;TI"'<span/>Hi there!</span foo=bar />
;TI"^
;TI"31:17: ERROR: End tag ends with '/>', use '>'.
;TI"'<span/>Hi there!</span foo=bar />
;TI"                ^
;TI"/1:17: ERROR: End tag contains attributes.
;TI"'<span/>Hi there!</span foo=bar />
;TI"                ^
;T;0o;;[I"[Using <tt>max_errors: -1</tt> results in an unlimited number of errors being returned.;T@o;;[I"cThe errors returned by {HTML5::Document#errors} are instances of {Nokogiri::XML::SyntaxError}.;T@o;;[	I"dThe {https://html.spec.whatwg.org/multipage/parsing.html#parse-errors HTML standard} defines a ;TI"dnumber of standard parse error codes. These error codes only cover the "tokenization" stage of ;TI"dparsing HTML. The parse errors in the "tree construction" stage do not have standardized error ;TI"codes (yet).;T@o;;[I"SAs a convenience to Nokogiri users, the defined error codes are available via ;TI".{Nokogiri::XML::SyntaxError#str1} method.;T@o;;[
I"Vdoc = Nokogiri::HTML5.parse('<span/>Hi there!</span foo=bar />', max_errors: 10)
;TI"doc.errors.each do |err|
;TI"6  puts("#{err.line}:#{err.column}: #{err.str1}")
;TI"	end
;TI"# => 1:1: generic-parser
;TI"E#    1:1: non-void-html-element-start-tag-with-trailing-solidus
;TI".#    1:17: end-tag-with-trailing-solidus
;TI"(#    1:17: end-tag-with-attributes
;T;0o;;[I"dNote that the first error is +generic-parser+ because it's an error from the tree construction ;TI"6stage and doesn't have a standardized error code.;T@o;;[I"cFor the purposes of semantic versioning, the error messages, error locations, and error codes ;TI"dare not part of Nokogiri's public API. That is, these are subject to change without Nokogiri's ;TI"Jmajor version number changing. These may be stabilized in the future.;T@S;;i;
I"Maximum tree depth;T@o;;[I"dThe maximum depth of the DOM tree parsed by the various parsing methods is configurable by the ;TI"Y+:max_tree_depth+ option. If the depth of the tree would exceed this limit, then an ;TI"!{::ArgumentError} is thrown.;T@o;;[I"bThis limit (which defaults to <tt>Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH = 400</tt>) can be ;TI">removed by giving the option <tt>max_tree_depth: -1</tt>.;T@o;;[	I"/html = '<!DOCTYPE html>' + '<div>' * 1000
;TI" doc = Nokogiri.HTML5(html)
;TI"@# raises ArgumentError: Document tree depth limit exceeded
;TI"4doc = Nokogiri.HTML5(html, max_tree_depth: -1)
;T;0S;;i;
I" Attribute limit per element;T@o;;[I"_The maximum number of attributes per DOM element is configurable by the +:max_attributes+ ;TI"]option. If a given element would exceed this limit, then an {::ArgumentError} is thrown.;T@o;;[I"bThis limit (which defaults to <tt>Nokogiri::Gumbo::DEFAULT_MAX_ATTRIBUTES = 400</tt>) can be ;TI">removed by giving the option <tt>max_attributes: -1</tt>.;T@o;;[
I"Whtml = '<!DOCTYPE html><div ' + (1..1000).map { |x| "attr-#{x}" }.join(' ') + '>'
;TI"A# "<!DOCTYPE html><div attr-1 attr-2 attr-3 ... attr-1000>"
;TI" doc = Nokogiri.HTML5(html)
;TI"C# raises ArgumentError: Attributes per element limit exceeded
;TI"4doc = Nokogiri.HTML5(html, max_attributes: -1)
;T;0S;;i;
I"HTML Serialization;T@o;;[
I"cAfter parsing HTML, it may be serialized using any of the {Nokogiri::XML::Node} serialization ;TI"cmethods. In particular, {XML::Node#serialize}, {XML::Node#to_html}, and {XML::Node#to_s} will ;TI"Vserialize a given node and its children. (This is the equivalent of JavaScript's ;TI"d+Element.outerHTML+.) Similarly, {XML::Node#inner_html} will serialize the children of a given ;TI"Hnode. (This is the equivalent of JavaScript's +Element.innerHTML+.);T@o;;[I"Gdoc = Nokogiri::HTML5("<!DOCTYPE html><span>Hello world!</span>")
;TI"puts doc.serialize
;TI"Z# => <!DOCTYPE html><html><head></head><body><span>Hello world!</span></body></html>
;T;0o;;[	I"\Due to quirks in how HTML is parsed and serialized, it's possible for a DOM tree to be ;TI"bserialized and then re-parsed, resulting in a different DOM.  Mostly, this happens with DOMs ;TI"bproduced from invalid HTML. Unfortunately, even valid HTML may not survive serialization and ;TI"re-parsing.;T@o;;[I"fIn particular, a newline at the start of +pre+, +listing+, and +textarea+ elements is ignored by ;TI"the parser.;T@o;;[I"#doc = Nokogiri::HTML5(<<-EOF)
;TI"<!DOCTYPE html>
;TI"<pre>
;TI"Content</pre>
;TI"	EOF
;TI"-puts doc.at('/html/body/pre').serialize
;TI"# => <pre>Content</pre>
;T;0o;;[
I"bIn this case, the original HTML is semantically equivalent to the serialized version. If the ;TI"a+pre+, +listing+, or +textarea+ content starts with two newlines, the first newline will be ;TI"cstripped on the first parse and the second newline will be stripped on the second, leading to ;TI"csemantically different DOMs. Passing the parameter <tt>preserve_newline: true</tt> will cause ;TI"\two or more newlines to be preserved. (A single leading newline will still be removed.);T@o;;[I"#doc = Nokogiri::HTML5(<<-EOF)
;TI"<!DOCTYPE html>
;TI"<listing>
;TI"
;TI"Content</listing>
;TI"	EOF
;TI"Iputs doc.at('/html/body/listing').serialize(preserve_newline: true)
;TI"# => <listing>
;TI"#
;TI"#    Content</listing>
;T;0S;;i;
I"Encodings;T@o;;[I"bNokogiri always parses HTML5 using {https://en.wikipedia.org/wiki/UTF-8 UTF-8}; however, the ;TI"eencoding of the input can be explicitly selected via the optional +encoding+ parameter. This is ;TI"Nmost useful when the input comes not from a string but from an IO object.;T@o;;[	I"eWhen serializing a document or node, the encoding of the output string can be specified via the ;TI"e+:encoding+ options. Characters that cannot be encoded in the selected encoding will be encoded ;TI"eas {https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references HTML numeric ;TI"entities}.;T@o;;[I"Pfrag = Nokogiri::HTML5.fragment('<span>아는 길도 물어가라</span>')
;TI"1html = frag.serialize(encoding: 'US-ASCII')
;TI"puts html
;TI"Z# => <span>&#xc544;&#xb294; &#xae38;&#xb3c4; &#xbb3c;&#xc5b4;&#xac00;&#xb77c;</span>
;TI"+frag = Nokogiri::HTML5.fragment(html)
;TI"puts frag.serialize
;TI"2# => <span>아는 길도 물어가라</span>
;T;0o;;[	I"c(There's a {https://bugs.ruby-lang.org/issues/15033 bug} in all current versions of Ruby that ;TI"ccan cause the entity encoding to fail. Of the mandated supported encodings for HTML, the only ;TI"cencoding I'm aware of that has this bug is <tt>'ISO-2022-JP'</tt>. We recommend avoiding this ;TI"encoding.);T@S;;i;
I"
Notes;T@o;;;;[	o;;0;[o;;[	I"JThe {Nokogiri::HTML5.fragment} function takes a string and parses it ;TI"Mas a HTML5 document.  The +<html>+, +<head>+, and +<body>+ elements are ;TI"Premoved from this document, and any children of these elements that remain ;TI";are returned as a {Nokogiri::HTML5::DocumentFragment}.;T@o;;0;[o;;[I"NThe {Nokogiri::HTML5.parse} function takes a string and passes it to the ;TI"N<code>gumbo_parse_with_options</code> method, using the default options. ;TI"3The resulting Gumbo parse tree is then walked.;T@o;;0;[o;;[I"NInstead of uppercase element names, lowercase element names are produced.;T@o;;0;[o;;[I"NInstead of returning +unknown+ as the element name for unknown tags, the ;TI",original tag name is returned verbatim.;T@o;;[I"@since v1.12.0 ;TI"C@note HTML5 functionality is not available when running JRuby.;T;	I"lib/nokogiri/html5.rb;T;
0o;;[;	I"#lib/nokogiri/html5/document.rb;T;
0o;;[;	I",lib/nokogiri/html5/document_fragment.rb;T;
0o;;[;	I"lib/nokogiri/html5/node.rb;T;
0;	0;
0[[U:RDoc::Constant[iI"HTML_NAMESPACE;TI"$Nokogiri::HTML5::HTML_NAMESPACE;T:public0o;;[o;;[I"#HTML uses the XHTML namespace.;T;	@;
0@@cRDoc::NormalModule0U;[iI"MATHML_NAMESPACE;TI"&Nokogiri::HTML5::MATHML_NAMESPACE;T;0o;;[;	@;
0@@@+0U;[iI"SVG_NAMESPACE;TI"#Nokogiri::HTML5::SVG_NAMESPACE;T;0o;;[;	@;
0@@@+0U;[iI"XLINK_NAMESPACE;TI"%Nokogiri::HTML5::XLINK_NAMESPACE;T;0o;;[;	@;
0@@@+0U;[iI"XML_NAMESPACE;TI"#Nokogiri::HTML5::XML_NAMESPACE;T;0o;;[;	@;
0@@@+0U;[iI"XMLNS_NAMESPACE;TI"%Nokogiri::HTML5::XMLNS_NAMESPACE;T;0o;;[;	@;
0@@@+0[[[I"
class;T[[;[[I"
fragment;TI"lib/nokogiri/html5.rb;T[I"get;T@S[I"
parse;T@S[:protected[[:private[[I"escape_text;T@S[I"
get_impl;T@S[I"prepend_newline?;T@S[I"read_and_encode;T@S[I"
reencode;T@S[I"serialize_node_internal;T@S[I"
instance;T[[;[[;[[;[[[U:RDoc::Context::Section[i0o;;[;	0;
0[
@
@@@@I"
Nokogiri;T@+