xmdp-brainstorming: Difference between revisions

From Microformats Wiki
Line 9: Line 9:


Add your name here if you make significant contributions to this page and wish to take responsibility for them.
=== UNDER CONSTRUCTION ===
NOTE: This page is currently a bit of a mishmash of [[xmdp-faq]], [[xmdp-issues]], and XMDP brainstorming.  I'm going to need to spend some time separating all this out.  - [http://tantek.com/log/ Tantek Çelik]
= XMDP brainstorming =


== Introduction ==


Tantek Çelik, Matt Mullenweb and Eric Meyer have developed the [http://gmpg.org/xmdp/ XMDP] to define extensions to XHTML including rel values, class names, and <meta name> properties and values.  Per the [http://gmpg.org/xmdp/description XMDP spec], a link to a microformat's XMDP in the profile attribute of head element indicates that that microformat's vocabulary is formally defined in the document.  A parser could read the allowed attribute values from the linked XMDP and use their presence in the document to infer that that particular microformat was in use.
Tantek Çelik developed [http://gmpg.org/xmdp/ XMDP] to define extensions to XHTML including rel values, class names, and <meta name> properties and values.  Per the [http://gmpg.org/xmdp/description XMDP spec], a link to a microformat's XMDP in the profile attribute of the head element indicates that the microformat's vocabulary is formally defined in the document.  A parser could read the allowed attribute values from the linked XMDP and thus know explicitly which microformats may be in use, and which class names are meant to convey which meanings.


=== Raised Issues ===
This page is for exploring possible additions / extensions to XMDP.
* Just because a profile value mentioned in a microformat's linked XMDP also appears in the document does not mean that that microformat is in use.  Such co-occurrences could be purely by chance.
** REJECTED. No, this does not make sense.  By definition, an XMDP profile defines certain properties and values.  Any use of such a property or value in the document is thus defined by the definition in the XMDP.
** [[User:Bud|Bud]] 20:01, 13 Jul 2005 (PDT): Actually, this is far from clear.  Reading this excerpt from [http://gmpg.org/xmdp/description the XMDP description]:  "This specification does not define a set of legal meta data properties. The meaning of a property and the set of legal values for that property should be defined in a reference lexicon called a profile. For example, a profile designed to help search engines index documents might define properties such as "author", "copyright", "keywords", etc." seems not to imply exclusivity for the whole document, only for the part covered by the profile.  If we assumed the quoted words implied exclusivity for the whole document, then only defined attribute values could be used '''for the whole document'''.  The current usage suggests that we mean the profile to only cover the part of the document covered by the microformat.  As such, we cannot use occurrence of a value to connote presence of the microformat.  Consider this example, xFolk and hCalendar both use a description class attribute value.  Presence of that value is therefore indeterminate as to which format is being used, even if we accepted your claim here, which seems dubious.
** Bud, that quote you give is XMDP quoting HTML4; please re-read the XMDP spec more carefully.  This is a non-issue.
* Currently, the XMDP can only be linked from the profile attribute of the head element.  In many instances, authors will not have access to the head element.
** ACCEPTED. There are two additional proposed ways to link to XMDP profiles
**# <code>&lt;link rel="profile"&gt;</code>, as introduced in the XMDP poster submitted to WWW2005.
**# <code>&lt;a rel="profile" href&gt;</code>, as similarly discussed.
** Supporting use cases:
*** Providing explicit profile definitions for microformats embedded in RSS and Atom and other feeds / envelope formats that don't necessarily have a &lt;head&gt; element with a 'profile' attribute or something similar.
*** I.e. : A very practical motivation for this question is the process of embedding xFolk in an RSS 2.0 feed and the ability to indicate the microformat is in use and where information about it can be found.
* Documents with user-generated content are hard to parse, and microformats present particular parsing challenges.
** REJECTED. This is a straw man issue.
** [[User:Bud|Bud]] 19:44, 13 Jul 2005 (PDT): Tantek needs to supply some justification for why this is a strawman as every developer I have talked to has raised it.  It may be that the solutions described below are sufficient to solve the issue. More neutral statements to that effect might be more constructive.
** Bud, saying "particular parsing challenges", without stating them is meaningless.  Hence strawman.  I think you may be mistaking questions for issues.


''Feel free to add issues here.  Keep issues in this list in summary form.  Save lengthy discussion and potential solutions for elaboration below.''
See [[xmdp-faq]] and [[xmdp-issues]] for questions and issues.


== Addressing issues ==


These are in no particular order, but an issue should appear in the issues list above if it is addressed here.
== Possible XMDP additions ==


=== Linking to the XMDP ===
Line 51: Line 29:
** As noted by a number of people, this approach has the added benefit of creating a viral marketing opportunity for the microformats used.  For instance, developers could add badges saying they are using microformat xyz as suggested by the example.
** Blog authoring environments allow you to insert links at will, so this squarely <abbr title="avoids">obviates</abbr> the need to access the head element.
It should be noted that none of these linking solutions addresses the issue of when exactly the microformat is being used in the document.  They only indicate that the microformat may be in use.
No, that is false.  Referencing an XMDP introduces its definitions into the document.  Period.  Those definitions then take effect for the properties and values defined therein.
[[User:Bud|Bud]] 20:06, 13 Jul 2005 (PDT): Again, a read of the text I quote above does not support this conclusion.  If it did, you could only use values defined in the XMDP.
Bud, see above, you are confusing a quote for prose in the spec.  It's marked up and displayed and cited as a blockquote.  Please read the spec more carefully.  And where did you get "only" from?


=== Resolving when microformats are actually in use ===
Line 71: Line 41:
=== Parsing microformats ===


Parsing user-generated content is challenging.  Frequently, it does not validate and may not even be well formed.  Therefore, microformat discovery mechanisms that depend on documents having even minimal xml properties like well-formedness will often fail. This is true, in particular, of [http://suda.co.uk/projects/X2V/ Brian Suda's frequently cited X2V hCard and hCalendar discovery and transformation prototypes] which use XSLT.
''Editor's note: should this section be moved to a different document on parsing microformats? This is not specifically about XMDP - [http://tantek.com/log/ Tantek]''


However, most microformats, which tend to be agnostic about things like exact element type used, typically require that the developer resort to tools like XPATH that assume well-formedness.  Mark Pilgrim's example [http://sourceforge.net/projects/feedparser/ universal feed parser] suggests that it may be possible to sanitize user html to an extent that it is suitable for later processing as xml.
Microformat parsing mechanisms that depend on documents having even minimal XML properties like well-formedness may fail when consuming non-well-formed content.  Tidy may be a useful workaround.
In particular, [http://suda.co.uk/projects/X2V/ Brian Suda's frequently cited X2V hCard and hCalendar discovery and transformation prototypes] use XSLT, and "tidy" any non-well-formed input before processing it.


From a pragmatic developer perspective, parsing web pages to discover microformats is likely to be an area of much work.
Most microformats tend to be agnostic about things like exact element type used.


Bud, this section is conflating several questions and issues and needs to be broken down further in order to make sense.
Developers can use tools like XPath, which assume well-formedness, on well-formed content (either as served from the web or after running it through Tidy).  Mark Pilgrim's [http://sourceforge.net/projects/feedparser/ Universal Feed Parser], as an example, suggests that it may be possible to sanitize user HTML to an extent that it is suitable for later processing as XML.

Revision as of 21:30, 25 October 2005

XMDP Brainstorming

Authors

Add your name here if you make significant contributions to this page and wish to take responsibility for them.

Introduction

Tantek Çelik developed XMDP to define extensions to XHTML including rel values, class names, and <meta name> properties and values. Per the XMDP spec, a link to a microformat's XMDP in the profile attribute of the head element indicates that the microformat's vocabulary is formally defined in the document. A parser could read the allowed attribute values from the linked XMDP and thus know explicitly which microformats may be in use, and which class names are meant to convey which meanings.
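
As a rough illustration of how a parser might read the defined values out of a profile, here is a minimal sketch in Python. It assumes the profile follows the nested definition-list layout used by XMDP (an outer <dl> whose <dt id> names a property such as rel or class, and an inner <dl> whose <dt id> names each value). The class XMDPTermParser, the function defined_terms, and the SAMPLE_PROFILE markup are hypothetical names and data made up for this example, not part of any spec.

```python
from html.parser import HTMLParser


class XMDPTermParser(HTMLParser):
    """Collect {property: [values]} from nested <dl>/<dt id="..."> markup."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # current <dl> nesting depth
        self.current = None   # property named by the last top-level <dt>
        self.terms = {}

    def handle_starttag(self, tag, attrs):
        if tag == "dl":
            self.depth += 1
        elif tag == "dt":
            term_id = dict(attrs).get("id")
            if term_id is None:
                return
            if self.depth == 1:
                # Outer list: the <dt> names a property (e.g. rel, class).
                self.current = term_id
                self.terms.setdefault(term_id, [])
            elif self.depth == 2 and self.current is not None:
                # Inner list: the <dt> names one allowed value.
                self.terms[self.current].append(term_id)

    def handle_endtag(self, tag):
        if tag == "dl" and self.depth > 0:
            self.depth -= 1


def defined_terms(profile_html):
    """Return the properties and values a profile document defines."""
    parser = XMDPTermParser()
    parser.feed(profile_html)
    parser.close()
    return parser.terms


# Hypothetical profile, made up for illustration only.
SAMPLE_PROFILE = """
<dl class="profile">
 <dt id="rel">rel</dt>
 <dd>
  <dl>
   <dt id="tag">tag</dt>
   <dd>Marks the link target as a tag for the current page.</dd>
  </dl>
 </dd>
 <dt id="class">class</dt>
 <dd>
  <dl>
   <dt id="xfolkentry">xfolkentry</dt>
   <dd>Container for one bookmark entry.</dd>
   <dt id="description">description</dt>
   <dd>Free-text description of the entry.</dd>
  </dl>
 </dd>
</dl>
"""

print(defined_terms(SAMPLE_PROFILE))
```

A consumer could then check class and rel values found in a page against the returned dictionary.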

This page is for exploring possible additions / extensions to XMDP.

See xmdp-faq and xmdp-issues for questions and issues.


Possible XMDP additions

Linking to the XMDP

Besides the current method of using the profile attribute of the head element, at least two other methods for linking to the XMDP are under discussion:

  • Using <link rel="profile" href="link to XMDP"/>. This method can be used now and will be formalized in XHTML 2.
    • A problem with this method is that it requires access to the head element.
  • Using <a rel="profile" href="link to XMDP">powered by microformat xyz</a> in the body of the document.
    • As noted by a number of people, this approach has the added benefit of creating a viral marketing opportunity for the microformats used. For instance, developers could add badges saying they are using microformat xyz as suggested by the example.
    • Blog authoring environments allow you to insert links at will, so this squarely obviates the need to access the head element.
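
To make the three linking mechanisms concrete, the following Python sketch collects profile URIs from the head element's profile attribute, from <link rel="profile">, and from <a rel="profile">. It is only an illustration: the class ProfileLinkFinder, the function find_profiles, and the example.org URIs are assumptions invented here, and the two rel="profile" forms are proposals under discussion rather than settled behaviour.

```python
from html.parser import HTMLParser


class ProfileLinkFinder(HTMLParser):
    """Record (mechanism, uri) pairs for each profile reference found."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "head" and attributes.get("profile"):
            # The profile attribute may hold several space-separated URIs.
            for uri in attributes["profile"].split():
                self.profiles.append(("head", uri))
        elif tag in ("link", "a") and attributes.get("href"):
            rels = (attributes.get("rel") or "").lower().split()
            if "profile" in rels:
                self.profiles.append((tag, attributes["href"]))


def find_profiles(html):
    finder = ProfileLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.profiles


# Hypothetical page using all three mechanisms at once.
SAMPLE_PAGE = """
<html>
<head profile="http://example.org/profiles/a">
<link rel="profile" href="http://example.org/profiles/b"/>
</head>
<body>
<p><a rel="profile" href="http://example.org/profiles/c">powered by microformat xyz</a></p>
</body>
</html>
"""

for mechanism, uri in find_profiles(SAMPLE_PAGE):
    print(mechanism, uri)
```

Because the <a> form lives in the body, the same function would also pick up a profile link embedded in a blog post or feed item that has no head element of its own.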

Resolving when microformats are actually in use

One way to indicate where a microformat is actually in use is to include the <a rel="profile" href="link to XMDP">powered by microformat xyz</a> within the container element for the microformat. The XMDP spec could then specify that when the <a> element is used in this way, it indicates that the microformat is used by the element containing the <a> element.

There are, however, several clear issues with this proposal:

  • Not every microformat has a container element. Consider reltag, one of the most widely used microformats, which consists of a single <a rel="tag"> link.
  • To some extent, using microformats adds to the cost of writing the document. It's like filling in a form just to write your thoughts. Putting <a> elements with each microformat adds unwanted links on top of that.
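
For discussion purposes, the container-scoping proposal could be sketched as follows in Python. The sketch assumes well-formed input without XML namespaces so that the standard library's ElementTree can be used, and it treats the immediate parent of the <a rel="profile"> element as the container. The function profile_scopes, the sample markup, and the example.org URIs are hypothetical; nothing here is specified by XMDP.

```python
import xml.etree.ElementTree as ET


def profile_scopes(xhtml):
    """Return (profile uri, container tag, container class) per profile link."""
    root = ET.fromstring(xhtml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    scopes = []
    for anchor in root.iter("a"):
        if "profile" in anchor.get("rel", "").split():
            container = parent_of.get(anchor)
            if container is not None:
                scopes.append(
                    (anchor.get("href"), container.tag, container.get("class"))
                )
    return scopes


# Hypothetical fragment: only the first div carries the profile link.
SAMPLE_FRAGMENT = """
<div>
  <div class="xfolkentry">
    <a class="taggedlink" href="http://example.org/post">A post</a>
    <a rel="profile" href="http://example.org/profiles/xfolk">powered by microformat xyz</a>
  </div>
  <p class="description">Not part of any entry.</p>
</div>
"""

print(profile_scopes(SAMPLE_FRAGMENT))
```

Under this reading, the stray description paragraph outside the container would not be attributed to the microformat. The sketch does nothing for microformats that have no container element.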

Parsing microformats

Editor's note: should this section be moved to a different document on parsing microformats? This is not specifically about XMDP - Tantek

Microformat parsing mechanisms that depend on documents having even minimal XML properties like well-formedness may fail when consuming non-well-formed content. Tidy may be a useful workaround. In particular, Brian Suda's frequently cited X2V hCard and hCalendar discovery and transformation prototypes use XSLT, and "tidy" any non-well-formed input before processing it.

Most microformats tend to be agnostic about things like exact element type used.

Developers can use tools like XPath, which assume well-formedness, on well-formed content (either as served from the web or after running it through Tidy). Mark Pilgrim's Universal Feed Parser, as an example, suggests that it may be possible to sanitize user HTML to an extent that it is suitable for later processing as XML.
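
As a small illustration of element-agnostic lookup on messy input, the Python sketch below finds class names regardless of which element carries them. Note the substitution: instead of Tidy plus XPath it uses the tolerant tokenizer in Python's standard html.parser module, which does not require well-formed input. The names ClassNameFinder and find_classes and the MESSY sample are invented for this example.

```python
from html.parser import HTMLParser


class ClassNameFinder(HTMLParser):
    """Record (class name, element name) for every wanted class value."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.hits = []

    def handle_starttag(self, tag, attrs):
        # The class attribute is a space-separated list of names.
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.wanted:
                self.hits.append((name, tag))


def find_classes(html, wanted):
    finder = ClassNameFinder(wanted)
    finder.feed(html)
    finder.close()
    return finder.hits


# Deliberately ill-formed: unquoted attribute, unclosed <p> and <span>.
MESSY = (
    '<div class="xfolkentry"><p class=description>Unclosed paragraph'
    '<span class="description extra">nested</div>'
)

print(find_classes(MESSY, {"description"}))
```

This only tokenizes start tags and does not recover the document tree, so anything that depends on nesting would still need cleaned-up input.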