metaformats: Difference between revisions

From Microformats Wiki
(editorial: no use discouraging additional implementations when clearly folks are finding it useful to build them, seems like stable features are now possible as well)
(split backcompat metaformats from explicit mf2 metaformats, per https://github.com/microformats/microformats2-parsing/issues/75 request from implementor, and in consideration of reducing chance of an unexpected (possibly worse) new first root microformat in parsed documents)
 
Line 18: Line 18:
For each of the following subsections, apply the changes therein to the steps in the same subsection in the [[microformats2-parsing]] specification.


Previous revision:

=== parse an element for class microformats ===
Before the step: "if none found, parse child elements …", insert these steps at the same level:

* if none found and the element is a <code>head</code> element and there were no root class names found on the <code>html</code> element
** parse <code>meta</code> elements for backcompat properties (defined in [[#meta_backward_compatible_parsing|meta backward compatible parsing]])
** if there is a <code>meta[property=og:type]</code> then
*** get that meta element’s <code>content</code> attribute value
*** if it’s "article" then imply a <code>head</code> element root class name of <code>[[h-entry]]</code>
*** if it’s "profile" then imply a <code>head</code> element root class name of <code>[[h-card]]</code>
*** if it's "music" or "video" then imply a <code>head</code> element root class name of <code>[[h-cite]]</code>
** end if
** if there is no implied <code>head</code> element root class name and there is a <code>meta[name=twitter:card]</code> then
*** get that meta element’s <code>content</code> attribute value
*** if it's "summary" or "summary_large_image" then imply a <code>head</code> element root class name of <code>[[h-entry]]</code>
** end if
** if there is no implied <code>head</code> element root class name and
*** there is a <code>meta[property^=og:]</code> or <code>meta[name^=twitter:]</code>
** then imply a <code>head</code> element root class name of <code>[[h-entry]]</code>
** if there is an implied <code>head</code> element root class name
*** set aside the parsed 'item' from that root and its properties
*** after parsing the rest of the document, append that parsed 'item' from the <code>head</code> element to the end of the top level 'items' array
** end if

Latest revision:

=== parse a document for microformats ===
Change the line "start with an empty JSON…" to:
* start with an empty JSON "items" array and hashes "meta-item", "rels", "rel-urls":

Update the JSON sample data structure to:
<syntaxhighlight lang=json>
{
 "items": [],
 "meta-item": {},
 "rels": {},
 "rel-urls": {}
}
</syntaxhighlight>

Before the line "* return the resulting JSON", insert:
* parse the <code>head</code> element for backcompat metaformats


=== parsing an element for properties ===
Line 54: Line 48:
* else if <code>meta.dt-x[content]</code>, then return the <code>content</code> attribute


Previous revision:

== meta backward compatible parsing ==
The following list of meta elements are to be parsed as the listed [[microformats2]] equivalent properties.
* if <code>meta[property="og:title"]</code>, parse <code>content</code> attribute as <code>p-name</code>
* else <code>meta[name="twitter:title"]</code>, parse <code>content</code> as <code>p-name</code>
* if <code>meta[property="og:description"]</code>, parse <code>content</code> as <code>p-summary</code>
* else if <code>meta[name="twitter:description"]</code>, parse <code>content</code> as <code>p-summary</code>
* if <code>meta[property="og:image"]</code>, parse <code>content</code> as <code>u-photo</code>
* else if <code>meta[name="twitter:image"]</code>, parse <code>content</code> as <code>u-photo</code>
* if <code>meta[property="og:video"]</code>, parse <code>content</code> as <code>u-video</code>
* if <code>meta[property="og:audio"]</code>, parse <code>content</code> as <code>u-audio</code>
* if <code>meta[property="article:published_time"]</code>, parse <code>content</code> as <code>dt-published</code>
* if <code>meta[property="article:modified_time"]</code>, parse <code>content</code> as <code>dt-updated</code>
* if <code>meta[property="article:author"]</code>, parse <code>content</code> as <code>p-author</code>

Latest revision:

=== parse the head element for backcompat metaformats ===
After the section "parse an img element for src and alt", insert a new sibling section:
<pre>=== parse the head element for backcompat metaformats ===</pre>

With the following instructions:

* if there were no root class names found on the <code>html</code> and <code>head</code> elements
** if there is a <code>meta[property=og:type]</code> then
*** get that meta element’s <code>content</code> attribute value
*** if it’s "article" then set the meta-item "type" property to "entry"
*** if it’s "profile" then set the meta-item "type" property to "card"
*** if it's "music" or "video" then set the meta-item "type" property to "cite"
** end if
** if there is no meta-item "type" property value and there is a <code>meta[name=twitter:card]</code> then
*** get that meta element’s <code>content</code> attribute value
*** if it's "summary" or "summary_large_image" then set the meta-item "type" property to "entry"
** end if
** if there is no meta-item "type" property value and
*** there is a <code>meta[property^=og:]</code> or <code>meta[name^=twitter:]</code>
** then set the meta-item "type" property to "entry"
** parse <code>meta</code> elements for backcompat properties as follows and add them to a "properties" hash inside the "meta-item" hash:
*** if <code>meta[property="og:title"]</code>, parse <code>content</code> attribute as <code>p-name</code>
*** else <code>meta[name="twitter:title"]</code>, parse <code>content</code> as <code>p-name</code>
*** if <code>meta[property="og:description"]</code>, parse <code>content</code> as <code>p-summary</code>
*** else if <code>meta[name="twitter:description"]</code>, parse <code>content</code> as <code>p-summary</code>
*** if <code>meta[property="og:image"]</code>, parse <code>content</code> as <code>u-photo</code>
*** else if <code>meta[name="twitter:image"]</code>, parse <code>content</code> as <code>u-photo</code>
*** if <code>meta[property="og:video"]</code>, parse <code>content</code> as <code>u-video</code>
*** if <code>meta[property="og:audio"]</code>, parse <code>content</code> as <code>u-audio</code>
*** if <code>meta[property="article:published_time"]</code>, parse <code>content</code> as <code>dt-published</code>
*** if <code>meta[property="article:modified_time"]</code>, parse <code>content</code> as <code>dt-updated</code>
*** if <code>meta[property="article:author"]</code>, parse <code>content</code> as <code>p-author</code>
** end if /* parse meta elements for backcompat properties */
* end if /* no root class names found on html and head */


== Background ==
Line 77: Line 92:
== See Also ==
* [[microformats2-parsing]]
* [[meta]]

Latest revision as of 00:36, 2 December 2023

metaformats is an extension to the microformats2-parsing specification for parsing invisible data published in HTML meta tags, both with an explicit methodology and for backward compatibility with existing vocabularies that have multiple testable interoperable implementations.

This is a Living Specification that is subject to change as research discovers meta tag vocabularies in wide use and consumed by multiple implementations. This specification has no stable portions. Features are untested unless explicitly labeled stable, draft, or proposed, or placed in stable, draft, or proposed sections. All features are likely to change. Stable features are possible, and thus this living specification may eventually document some. When stable features are documented, substantive changes may be proposed by issues and errata filed in response to implementation experience, requiring consensus among participating implementers as part of an explicit, to-be-defined change control process.
Participate: Open Issues, IRC
Editor: Tantek Çelik
License: Per CC0, to the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work. In addition, as of 2024-04-27, the editors have made this specification available under the Open Web Foundation Agreement Version 1.0.
Initial publication: 2022-04-01

algorithm changes

For each of the following subsections, apply the changes therein to the steps in the same subsection in the microformats2-parsing specification.

parse a document for microformats

Change the line "start with an empty JSON…" to:

  • start with an empty JSON "items" array and hashes "meta-item", "rels", "rel-urls":

Update the JSON sample data structure to:

{
 "items": [],
 "meta-item": {},
 "rels": {},
 "rel-urls": {}
}

Before the line "* return the resulting JSON", insert:

  • parse the head element for backcompat metaformats
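
As a rough illustration of the two changes above, the following Python sketch builds the extended top-level structure and runs the head step just before returning. The names `empty_result`, `parse_document`, and `head_step` are hypothetical and not part of this specification, and the regular microformats2 parsing is deliberately left out.

```python
from typing import Any, Callable, Dict

ParseResult = Dict[str, Any]


def empty_result() -> ParseResult:
    """Top-level structure, including the additional "meta-item" hash."""
    return {"items": [], "meta-item": {}, "rels": {}, "rel-urls": {}}


def parse_document(html: str,
                   head_step: Callable[[str, ParseResult], None]) -> ParseResult:
    """Hypothetical outline of document parsing with the inserted head step."""
    result = empty_result()
    # The regular microformats2 parsing that fills "items", "rels", and
    # "rel-urls" would run here; it is omitted from this sketch.
    head_step(html, result)  # inserted step, run just before returning
    return result
```

Passing the head step in as a callable is only a convenience for the sketch; a real parser would more likely call its own internal routine at that point.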

parsing an element for properties

parsing a p- property

insert before "else return the textContent of the element after …":

  • else if meta.p-x[content], then return the content attribute

parsing a u- property

insert before "else return the textContent of the element after …":

  • else if meta.u-x[content], then get the content attribute

parsing a dt- property

insert before "else return the textContent of the element after …":

  • else if meta.dt-x[content], then return the content attribute
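
The three inserted rules share one shape: a `meta` element that carries a class name with the relevant prefix and has a `content` attribute yields that attribute. A minimal Python sketch of that check follows, assuming the caller has already extracted the tag name, class list, and attributes; `meta_property_value` is a hypothetical helper name. The u- rule says "get" rather than "return", which suggests the main specification processes the value further; that processing is not shown here.

```python
from typing import Mapping, Optional, Sequence


def meta_property_value(tag: str,
                        classes: Sequence[str],
                        attrs: Mapping[str, str],
                        prefix: str) -> Optional[str]:
    """Return the content attribute for meta.<prefix>-x[content], else None.

    None means the rule does not apply and the caller should fall through
    to the next step of the property parsing algorithm.
    """
    if tag != "meta" or "content" not in attrs:
        return None
    if not any(name.startswith(prefix + "-") for name in classes):
        return None
    return attrs["content"]
```

For example, `meta_property_value("meta", ["p-name"], {"content": "Hello"}, "p")` gives `"Hello"`, while the same call with a `span` tag gives `None`.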

parse the head element for backcompat metaformats

After the section "parse an img element for src and alt", insert a new sibling section:

=== parse the head element for backcompat metaformats ===

With the following instructions:

  • if there were no root class names found on the html and head elements
    • if there is a meta[property=og:type] then
      • get that meta element’s content attribute value
      • if it’s "article" then set the meta-item "type" property to "entry"
      • if it’s "profile" then set the meta-item "type" property to "card"
      • if it's "music" or "video" then set the meta-item "type" property to "cite"
    • end if
    • if there is no meta-item "type" property value and there is a meta[name=twitter:card] then
      • get that meta element’s content attribute value
      • if it's "summary" or "summary_large_image" then set the meta-item "type" property to "entry"
    • end if
    • if there is no meta-item "type" property value and
      • there is a meta[property^=og:] or meta[name^=twitter:]
    • then set the meta-item "type" property to "entry"
    • parse meta elements for backcompat properties as follows and add them to a "properties" hash inside the "meta-item" hash:
      • if meta[property="og:title"], parse content attribute as p-name
      • else if meta[name="twitter:title"], parse content as p-name
      • if meta[property="og:description"], parse content as p-summary
      • else if meta[name="twitter:description"], parse content as p-summary
      • if meta[property="og:image"], parse content as u-photo
      • else if meta[name="twitter:image"], parse content as u-photo
      • if meta[property="og:video"], parse content as u-video
      • if meta[property="og:audio"], parse content as u-audio
      • if meta[property="article:published_time"], parse content as dt-published
      • if meta[property="article:modified_time"], parse content as dt-updated
      • if meta[property="article:author"], parse content as p-author
    • end if /* parse meta elements for backcompat properties */
  • end if /* no root class names found on html and head */
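
The steps above can be sketched in Python with the standard library `html.parser` module. This is an illustration under several assumptions, not a reference implementation: root class names are detected only by an `h-` prefix, the "type" value is stored as a plain string, property values are stored as arrays under un-prefixed names (following the usual microformats2 JSON shape), and the "properties" hash is added only when something was found. All function and constant names are hypothetical.

```python
from html.parser import HTMLParser

OG_TYPES = {"article": "entry", "profile": "card", "music": "cite", "video": "cite"}
TWITTER_CARDS = {"summary", "summary_large_image"}

# (property name, [(attribute, key), ...] in order of preference)
PROPERTY_SOURCES = [
    ("name", [("property", "og:title"), ("name", "twitter:title")]),
    ("summary", [("property", "og:description"), ("name", "twitter:description")]),
    ("photo", [("property", "og:image"), ("name", "twitter:image")]),
    ("video", [("property", "og:video")]),
    ("audio", [("property", "og:audio")]),
    ("published", [("property", "article:published_time")]),
    ("updated", [("property", "article:modified_time")]),
    ("author", [("property", "article:author")]),
]


class MetaCollector(HTMLParser):
    """Collects meta elements inside head and notes root classes on html/head."""

    def __init__(self):
        super().__init__()
        self.metas = []            # (attribute, key, content) triples
        self.has_root_class = False
        self.in_head = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("html", "head"):
            if tag == "head":
                self.in_head = True
            # Simplification: any class starting with "h-" counts as a root class.
            if any(c.startswith("h-") for c in (a.get("class") or "").split()):
                self.has_root_class = True
        elif tag == "meta" and self.in_head and a.get("content") is not None:
            for attr in ("property", "name"):
                if a.get(attr):
                    self.metas.append((attr, a[attr], a["content"]))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def first_content(metas, attr, key):
    """Content of the first meta whose attribute equals key, else None."""
    return next((c for a, k, c in metas if a == attr and k == key), None)


def parse_head_for_backcompat_metaformats(html):
    """Return the "meta-item" hash for a document (empty if root classes exist)."""
    collector = MetaCollector()
    collector.feed(html)
    meta_item = {}
    if collector.has_root_class:
        return meta_item
    metas = collector.metas
    og_type = first_content(metas, "property", "og:type")
    if og_type in OG_TYPES:
        meta_item["type"] = OG_TYPES[og_type]
    if "type" not in meta_item and first_content(metas, "name", "twitter:card") in TWITTER_CARDS:
        meta_item["type"] = "entry"
    if "type" not in meta_item and any(
        (a == "property" and k.startswith("og:")) or (a == "name" and k.startswith("twitter:"))
        for a, k, _ in metas
    ):
        meta_item["type"] = "entry"
    properties = {}
    for prop, sources in PROPERTY_SOURCES:
        for attr, key in sources:
            value = first_content(metas, attr, key)
            if value is not None:
                properties[prop] = [value]  # first matching source wins
                break
    if properties:  # assumption: omit the hash when nothing was found
        meta_item["properties"] = properties
    return meta_item
```

The preference table mirrors the "if … else if …" pairs in the list: an Open Graph value, when present, takes precedence over the corresponding Twitter value.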

Background

The following prior work was used to develop this specification:

See Also