h2vx

From Microformats Wiki
Revision as of 14:29, 13 October 2015 by Brian (talk | contribs) (Helping debug the code a bit from afar)

<entry-title>H2VX</entry-title>

H2VX is a production deployment of the X2V hCard and hCalendar conversion transforms.

It converts hCard contacts and hCalendar events on web pages to .vcf and .ics respectively for use in desktop and other client software applications.

documentation

To convert hCards to vCards, go to http://h2vx.com/vcf/ and enter the URL of the page containing the hCards.

To convert hCalendar to iCalendar, go to http://h2vx.com/ics/ and enter the URL of the page containing the hCalendar events.

URLs

Links to H2VX.com that convert a URL (like http://microformats.org/wiki/events ) can be constructed as follows. You may omit the leading "http://" from the URL to be converted for a briefer, more readable URL:

download vCards from hCards
  http://h2vx.com/vcf/URL
  e.g. http://h2vx.com/vcf/microformats.org/wiki/events
download iCalendar from hCalendar
  http://h2vx.com/ics/URL
  e.g. http://h2vx.com/ics/microformats.org/wiki/events
subscribe to iCalendar from hCalendar
  webcal://h2vx.com/ics/URL
  e.g. webcal://h2vx.com/ics/microformats.org/wiki/events
  http://h2vx.com/ics/sub/URL for systems that don't support auto-linking of webcal: URLs, e.g. MediaWiki, Twitter.
  e.g. http://h2vx.com/ics/sub/microformats.org/wiki/events
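
The URL patterns above can be generated programmatically. The following is a minimal Python sketch, not part of H2VX itself; the function names `strip_http` and `h2vx_links` are made up for illustration. It only strips a leading "http://", since that is the only shortening documented here.

```python
H2VX_HOST = "h2vx.com"


def strip_http(url: str) -> str:
    """Drop a leading "http://"; any other scheme is left untouched."""
    prefix = "http://"
    return url[len(prefix):] if url.startswith(prefix) else url


def h2vx_links(page_url: str) -> dict:
    """Build the four documented H2VX link forms for one page URL."""
    target = strip_http(page_url)
    return {
        "vcf": f"http://{H2VX_HOST}/vcf/{target}",          # download vCards
        "ics": f"http://{H2VX_HOST}/ics/{target}",          # download iCalendar
        "webcal": f"webcal://{H2VX_HOST}/ics/{target}",     # subscribe
        "ics_sub": f"http://{H2VX_HOST}/ics/sub/{target}",  # subscribe where webcal: is not auto-linked
    }


links = h2vx_links("http://microformats.org/wiki/events")
print(links["webcal"])  # webcal://h2vx.com/ics/microformats.org/wiki/events
```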

user agent strings

H2VX uses two user agent strings, when retrieving hCards and hCalendars respectively:

  • H2VX contacts proxy (http://h2vx.com/vcf/)
  • H2VX events proxy (http://h2vx.com/ics/)

You may see occurrences of these in your web server logs when users of H2VX convert hCards and hCalendar events on your pages.
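
To see how often H2VX fetches your pages, you can count log lines containing either string. This is a rough Python sketch; the function name `count_h2vx_hits` and the sample log lines are hypothetical, and it assumes your server writes the full user agent string into each log line.

```python
H2VX_USER_AGENTS = (
    "H2VX contacts proxy (http://h2vx.com/vcf/)",
    "H2VX events proxy (http://h2vx.com/ics/)",
)


def count_h2vx_hits(log_lines):
    """Count log lines that mention each H2VX user agent string."""
    counts = {agent: 0 for agent in H2VX_USER_AGENTS}
    for line in log_lines:
        for agent in H2VX_USER_AGENTS:
            if agent in line:
                counts[agent] += 1
    return counts


# Hypothetical log lines in a combined-log-like layout, for illustration only.
sample_log = [
    '192.0.2.1 - - [11/Nov/2009:18:30:00 +0000] "GET /about HTTP/1.1" 200 512 "-" "H2VX contacts proxy (http://h2vx.com/vcf/)"',
    '192.0.2.2 - - [11/Nov/2009:18:31:00 +0000] "GET /events HTTP/1.1" 200 2048 "-" "H2VX events proxy (http://h2vx.com/ics/)"',
    '192.0.2.3 - - [11/Nov/2009:18:32:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]
print(count_h2vx_hits(sample_log))
```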

built

H2VX is built and maintained by Tantek with:

  • X2V XSLTs by Brian Suda
  • PHP get-contact.php and get-cal.php, originally written by Brian, updated/factored by Tantek with various improvements.
  • PHP common.php (and JavaScript common.js) by Tantek, which incorporate CASSISv0 open source from http://cassisproject.com/
  • XHTML1+CSS+JS front-end design/interface by Tantek (view source of h2vx.com in your browser for more).

open source

H2VX is available on the microformats GitHub:

setting up your own H2VX

(in progress)

history

Some of the open source history behind H2VX can be found with history of X2V. E.g.

issues

Found a problem with H2VX? Please note it here at the top of this list (consider grouping it under an existing subhead or introduce a new subhead if necessary) and use ~~~~ to sign your name and date your comment. If this grows too big we can move it to h2vx-issues

The contact conversion service at http://h2vx.com/vcf/ fails with the error: Fatal error: Call-time pass-by-reference has been removed in /home/tantek/domains/h2vx.com/public_html/vcf/get-contact.php on line 106

Why does this error occur now? Is it possible to repair it? This link is very useful. Thank you.

The error is reported on line 106 but, looking at the source code, line 106 is a print statement. It might be useful to turn on error reporting ( http://php.net/manual/en/function.error-reporting.php ) to get a better explanation and to find out which line of the code on the server is actually failing. Then we can debug. On line 114, the regular expression smells like it might be the problem: it contains an &, which could be read as pass-by-reference.
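
For context: PHP deprecated call-time pass-by-reference (writing `&$var` in the argument list of a function call) in 5.3 and removed it in 5.4, where it became this fatal error. So a PHP upgrade on the server is a plausible trigger. Since the deployed file may differ from the published source, a quick scan of the actual file can help. The Python sketch below is a line-based heuristic, not a PHP parser; the name `find_call_time_refs` and the sample PHP are invented for illustration, and it will miss multi-line calls.

```python
import re

# An argument that begins with "&$" directly after "(" or ",".
CALL_TIME_REF = re.compile(r"[(,]\s*&\s*\$\w+")
# Lines where "&$var" in parentheses is legal PHP (declarations, array(), closures).
LEGAL_CONTEXT = re.compile(r"\b(function|array|use)\b", re.IGNORECASE)


def find_call_time_refs(php_source: str):
    """Return (line_number, line) pairs that look like call-time pass-by-reference."""
    hits = []
    for number, line in enumerate(php_source.splitlines(), start=1):
        if LEGAL_CONTEXT.search(line):
            continue  # skip constructs that legitimately contain &$var
        if CALL_TIME_REF.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical PHP, not taken from get-contact.php.
sample = """<?php
function load(&$doc) { return $doc; }
$ok = load($doc);
$bad = preg_match($pattern, $text, &$matches);
"""
for number, line in find_call_time_refs(sample):
    print(number, line)
```

Each reported call would be fixed by dropping the `&` at the call site; whether the argument is taken by reference is decided by the function's declaration.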


value class pattern empty span fails

  • This calendar: http://www.ustreetmusichall.com/calendar results in an .ics file that gives Error at line 11: Unparseable date: "T220000" when imported into Google Calendar
    • 2012-154 verified with both h2vx.com and dev.h2vx.com. Page uses value class pattern, in particular, empty span technique which seems valid. Need to make a test case of this to isolate and track down. - Tantek 19:42, 2 June 2012 (UTC)
<span class="start dtstart">
 <span class="value-title" title="2012-06-02T22:00:00-04:00"></span>
 10:00 pm
</span>
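
The reported value "T220000" has a time but no date, which suggests the date part of the value-title was lost during conversion. For comparison, here is a small Python sketch of one valid iCalendar form for the title value in the markup above (UTC with a trailing Z). The helper name `to_ical_utc` is hypothetical, and this is not how X2V does the conversion.

```python
from datetime import datetime, timezone


def to_ical_utc(iso_value: str) -> str:
    """Convert an ISO 8601 datetime with a UTC offset to iCalendar UTC form."""
    moment = datetime.fromisoformat(iso_value)
    if moment.tzinfo is None:
        raise ValueError("expected a UTC offset in the value")
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


print(to_ical_utc("2012-06-02T22:00:00-04:00"))  # 20120603T020000Z
```

A test case for this issue could check that the generated DTSTART carries a full date as well as the time.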

empty document from arjw.ifmw.mobi

(needs investigation)

  • Have been getting an error where it says that it's an "empty document; no HTML can be found" from my site. As with the comment below, there are no issues with using Operator to extract. I used to use this service a good while ago, but since moving to PHP/HTML5 for this page, I'm getting this error. Antoine RJ Wright 04:17:49, 7 January 2010 (UTC)

unable to extract hCards on marinersmexico

(needs investigation)

  • The last few days I've not been able to retrieve vCards from hCards on this site using H2VX yet Operator is nicely extracting them. Comments and insights appreciated. ChipD 05:13, 12 November 2010 (UTC)

hCard ids not working with referrer

(needs investigation)

  • Can't seem to get #hcard-ids working with the /referrer option. Jnpcl 00:01, 14 October 2010 (UTC)
    • Could you provide the URL you are having trouble with? Tantek 18:51, 29 October 2010 (UTC)

hard to find URLs to put on a page

  • As a Web page author I find the H2VX site a bit awkward to use -- it's difficult to find the URLs to use in my Web page. As an end-user it's fine to have the H2VX bookmarklets in my toolbar, but as a page author I can't be sure everyone has the bookmarklets or Operator installed. Bob Jonkman 00:56, 10 November 2009 (UTC)

possible problem with iOS6

from: https://twitter.com/equivalentideas/status/275120059788169216

robots.txt prevents subscription in Google Reader

Google Reader won't subscribe to any h2vx hCalendar files due to robots.txt. TomMorris 15:33, 1 June 2011 (UTC)

Google Calendar also fails because of their robots.txt which disallows robots from fetching and therefore caching the ical files. Jayvdb 22:16, 5 May 2012 (UTC)

Apparently, I was incorrect in thinking Google would likely have a unique user-agent specifically for calendar fetches (see e.g. quote below); they don't. It's not on the UA list, and I tested empirically (`sudo nc -v -l -p 80`) and it is a generic UA. ;( Someone should retest Google Reader to see if it works.

Google has several other user-agents, including Feedfetcher (user-agent Feedfetcher-Google). Since Feedfetcher requests come from explicit action by human users who have added the feeds to their Google home page or to Google Reader, and not from automated crawlers, Feedfetcher does not follow robots.txt guidelines. You can prevent Feedfetcher from crawling your site by configuring your server to serve a 404, 410, or other error status message to user-agent Feedfetcher-Google.

Source: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=182072 --Jeremyb 22:34, 12 June 2012 (UTC)

On second thought, I tested w/ a real webserver (+tcpdump) to see if /robots.txt was fetched with a different agent than the actual feed. no such luck.
From: googlebot(at)googlebot.com
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
--Jeremyb 22:34, 12 June 2012 (UTC)
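
To check how a given robots.txt treats a fetcher that identifies as Googlebot, Python's standard `urllib.robotparser` can evaluate rules offline. In this sketch both robots.txt bodies are hypothetical; the actual contents of h2vx.com/robots.txt are not quoted on this page. The helper name `allowed` is also made up.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate robots.txt text for one user agent and URL, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rule sets, for illustration only.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_OTHER = "User-agent: *\nDisallow: /private/\n"

FEED = "http://h2vx.com/ics/microformats.org/wiki/events"
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

print(allowed(BLOCK_ALL, GOOGLEBOT, FEED))    # False
print(allowed(BLOCK_OTHER, GOOGLEBOT, FEED))  # True
```

Because the calendar fetch uses the generic Googlebot user agent, a blanket Disallow rule would block it along with ordinary crawling.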

Date incorrect when not using abbr element for dtstart

I had a date marked up like so <em class="detail dtstart" title="2010-10-20">Wednesday, October 20, 2010</em>. It was not being parsed correctly until I changed it to use <abbr>, but the element shouldn't really make any difference.

.vcf not formed properly

When the resulting .vcf files are opened in Outlook, non-ASCII characters are not displayed correctly because the returned file is not encoded as UTF-8 without a BOM. This makes the files useless with Outlook, one of the most widely used e-mail clients.

Can we get a UTF-8 file returned without the BOM?

Missing data/Wrong encoding

  • We'd like to use your service for the new version of our location list (Free WiFi Hotspots in Austria) but ran into problems:
    • After importing the vCard, the Mac OS X Address book showed only the phone number (not as work), the URL, zip code and city. No name and no street.
    • The vCard itself is encoded in ISO-8859-1, although it carries “CHARSET=utf-8” declarations. The source page itself is encoded in UTF-8.
    • Here's the HTML code we've been using:
<div class="vcard">
<h2 class="fn org"><img class="photo" src="http://static.freewave.at/logos/testa_rossa_caffe_150.gif" alt="Testa Rossa Caffèbar Logo" />Testa Rossa Caffèbar</h2>
<div class="adr work"><span class="street-address">Mahlerstraße 4 </span><br />
 <span class="postal-code">1010</span> <span class="locality">Wien</span><br />
 <span class="country-name">Österreich</span></div>
<div><span class="tel work">+43 699 161 616 61</span><br />
 <a class="url work" href="http://www.testarossawien.at/">http://www.testarossawien.at/</a><br />
 <a class="email work" href="mailto:"></a></div>
<div class="geo"><span class="latitude">48.20275</span>,<span class="longitude">16.37079</span></div>
</div>

Thanks! --Vividvisions 17:25, 4 May 2010 (UTC)

I got the same type of problem with non ASCII content. Don't know which part is responsible, though. --Jean-Luc Geering 2010-05-10

+1, I was coaching the dudes at http://hagreve.com/ implementing hCalendar and they hit the roadblock of accented chars being wrongly encoded in the .ics. They resorted to other ways of building an ics. :sadface: Thanks. -- andr3
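
The reports above describe output whose bytes are in one encoding while the file claims another. The Python sketch below shows both directions of that mismatch, using the business name from the hCard above. The helper name `is_valid_utf8` is made up; this illustrates the symptom and is not a diagnosis of where H2VX goes wrong.

```python
name = "Testa Rossa Caffèbar"

utf8_bytes = name.encode("utf-8")         # what a CHARSET=utf-8 file should contain
latin1_bytes = name.encode("iso-8859-1")  # what the report says was actually delivered

# UTF-8 bytes read as ISO-8859-1: the two bytes of "è" show up as two characters.
print(utf8_bytes.decode("iso-8859-1"))  # Testa Rossa CaffÃ¨bar


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ISO-8859-1 bytes labelled as UTF-8: a strict reader cannot decode them at all.
print(is_valid_utf8(latin1_bytes))  # False
print(is_valid_utf8(utf8_bytes))    # True
```

Running a check like `is_valid_utf8` on a downloaded .vcf or .ics is a quick way to confirm whether the file matches its CHARSET declaration.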

HTML5 support

  • <meta charset=utf-8> isn't recognized so the output is double encoded. Greut 11:12, 4 January 2010 (UTC)
  • new HTML5 elements (such as header, footer, section) are not supported (this is because they are stripped out by PHP Tidy and thus ignored). Tantek 16:13, 19 January 2010 (UTC)
    possible solutions:

Please give feedback on the http://dev.h2vx.com/ HTML5 support here:

  • Is there a timeout/throttling on requests to dev.h2vx? I've been getting inconsistent returns on ics and webcal requests from the same markup. Don't know what else could be the issue. When it works, it works great though!
    • Any throttling we've been adding manually as necessary. What URL are you trying? Tantek 18:04, 14 July 2011 (UTC)
  • ...

Previously:

  • Possible options
    • 1. Use a proper PHP html5lib (being coded by the HTML5 community, but not available/functional yet AFAIK) - still might do this long term.
    • 2. Add a flag to the H2VX processing URL which says "I'm a crazy XML person and my markup is 100% well formed XML, please don't tidy, please break and fail to process if it's not well formed".
    • In either case, new special HTML5 elements (like time) will require an update to X2V to know to properly handle/parse new semantic attributes (like datetime).
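
On the `<meta charset=utf-8>` item in this section: charset detection needs to accept both the short HTML5 form and the older http-equiv form. The Python sketch below shows one regex that handles both. The name `sniff_charset` is hypothetical, the pattern is a simplification of real encoding sniffing, and it is not the code H2VX uses.

```python
import re
from typing import Optional

# Matches charset=... inside a <meta> tag, quoted or unquoted.
META_CHARSET = re.compile(r"""<meta[^>]+charset\s*=\s*["']?\s*([\w.:-]+)""", re.IGNORECASE)


def sniff_charset(html: str) -> Optional[str]:
    """Return the lower-cased charset declared in a <meta> tag, or None."""
    match = META_CHARSET.search(html)
    return match.group(1).lower() if match else None


html5_form = "<meta charset=utf-8>"
html4_form = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'

print(sniff_charset(html5_form))  # utf-8
print(sniff_charset(html4_form))  # utf-8
```

If the declared charset is missed and a default is assumed instead, already-UTF-8 text can be encoded a second time, which would match the double encoding reported above.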

mouse events

  • The "what are microformats?" style descriptions only appear on mouse-over of the trigger terms (those with class="term"). They do not appear at all when keyboard navigation is used, making them somewhat inaccessible. The trigger elements are the ones that should receive focus but, not being links, they are not in the tabbing order, so the helper text never appears for keyboard users. Norm 10:39, 6 November 2009 (UTC)
    • Quick fix: remove visibility:hidden from .term .info. Andr3

page semantics

  • <i class="term"> should be made into <em>'s for semantic reasons. ;) Andr3
    • Note that the <i> element is used deliberately for "instance" of a term - this is an HTML5 semantic, and is more accurate in this instance than "em"phasis. — Tantek 18:29, 11 November 2009 (UTC)

not possible to use dtstart with timezone in abbr title

  • Adding a timezone to dtstart using the abbr pattern leads to The Shining-style debug output repeating “Object is a string”. I tried adding a time with timezone via the value class pattern, and while the vcard downloads, the time is incorrect. ~~Oli 00:53 15 February 2010 (+09:00)
    • Oli, could you provide a URL to a live example/test case that you were using so we can test with it to try to see exactly what is going on? Thanks! Tantek 17:35, 15 February 2010 (UTC)

resolved

Resolved issues are moved to this section. If this grows too big we can move it to h2vx-issues-resolved

  • ...
  • 2013-06-19 h2vx vcf service fails to parse a vcard if the vcard class is set on an HTML5 <article> element. Will return with a test page URL. Urlyman 15:12, 19 June 2013 (UTC)
    • Try it with dev.h2vx.com - which supports new HTML5 elements. - Tantek 23:08, 28 September 2013 (UTC)
  • 2009-11-11 We were using the Technorati hosted service. Surprised to see it redirected to H2VX, took a minute to realize what was going on. Thanks for picking up the service! Both the hosting provider and the new user agent are blocked by default on our side to prevent scraping. To be more transparent, maybe you could change the UA similar to the old one: from "Technorati contacts proxy (http://technorati.com/contacts/)" to "H2VX contacts proxy (http://h2vx.com/vcf/)" DineMonkey 15:47, 11 November 2009 (UTC)
    • I've updated the user agent strings per your recommendation and documented them above as well. Tantek 18:29, 11 November 2009 (UTC)
      • H2VX contacts proxy (http://h2vx.com/vcf/)
      • H2VX events proxy (http://h2vx.com/ics/)

closed

Once a resolved issue has no further actions (and ideally is verified by the issue reporter), it can be closed and moved to this section. If this grows too big we can move it to h2vx-issues-closed

feedback

Have feedback on H2VX like suggested improvements? Feel free to add to the top of this list and use ~~~~ to sign your name and date your comment. If this grows too big we can move it to h2vx-feedback

  • ...

related

There is at least one related H2V service that uses the same X2V XSLT files as H2VX:

old

Previously Technorati hosted X2V conversion services:

see also