TODO:
    images
    reviews
    comments
    type (for validation of metadata)
    type (for use in the publication)
    serial ("periodical")
    volume/issue number - need an internal name for the issue also
    multiple editions
    recorded on event

================================================================

"feed" containment is not directly represented. rather that is a query against other properties.

also, storage or file system paths are not represented; those two are a result of other properties.

================================================================

our proposed metadata vocabulary

title
    "Title"
    This is the entire "proper title" (as in a book, movie, etc.).
    It might be repeated with multiple distinguished variants (encodings, translations, etc.).
    For an article, the main headline.
    This does not break down the full title; it includes nonsorting words (such as "The") and any subtitles (typically appearing after a colon within the value).
    Mappings:
        Atom: atom:title
        Dublin: DC.title
        HTML: title
        MARC: 21 245 $a and $b
        OpenURL: atitle

subtitle
    "Sub Title"
    DO NOT USE: We do not at this time define a subtitle field.
    SEE ALSO: "deck"
    Mappings:
        MARC: MARC 21 245 $b
        MODS: mods:subTitle
        Atom: atom:subtitle

short_title
    "Short Title"
    Used for a context where a shorter title is needed, such as in a navigational menu.
    In a newspaper, a jumpline might need such a title too.
    Note that the "abbreviated title" (as in MARC 210$a or 773$p) or OpenURL stitle are usually interpreted specifically as including abbreviated words, not just being some shorter alternative.
    Mappings:
        OpenURL: stitle
        NOT: DCTERMS: alternative [too vague]

summary
    "Summary"
    Used in a teaser (or "sky box") as well as in feeds.
    This is not the article content itself, in the case of an article, though the summary might be defaulted from a lead paragraph.
    Mappings:
        Atom: atom:summary
        Dublin: DCTERMS.abstract | DC.description
        HTML: meta description | p[1]

slug_line
    "Slug Line"
    In newspaper lingo, a "slug line" is a nickname or label for an article during production.
    Can be defaulted from the title.
    Has a word separator, often a hyphen.
    In a web application, might be used for a persistent URL or file name.
    Mappings:
        NewsML: SlugLine

resource_id
    opaque string, automatically generated at create only.
    generally not changed in lifetime despite changes to metadata or resource.
    should be globally unique, but need not be a URI.
    Mappings:
        Atom: atom:id
        Dublin: DC.identifier

resource_url
    how to fetch the resource this metadata is about.
    not all resources are electronic.
    should not just be a web page that provides even more information, should be the resource itself.
    if the resource and its metadata are separate, this url is for the resource, not the metadata (see "self_url")
    may change with a move or copy, unlike resource_id.

self_url
    TBD....
    The URL to fetch the latest version of this metadata object (not necessarily the same version).
    Usually automatically maintained.
    and/or ...
    a repository-relative path for where this resource is kept.
    used for create and edit.
    not absolute, because of source control, create vs. publish, etc.

----

byline
    "Byline"
    The principal author's name (sometimes multiple names).
    Mappings:
        NewsML
        TEI: byline

dateline
    "Dateline"
    Where the author was when the story information was gathered (and if not on location, the dateline is often omitted).
    May also include the date the item was (primarily) created (not publication date).
    Mappings:
        TEI: dateline

deck
    "Deck"
    This is a structural markup of an article.
    This is not a substring of title.
    In newspaper lingo, a "deck" goes between the headline and body, usually in a different font style.
    (The term "blurb" is more generic; a "blurb" is sometimes used for a short statement about the author, and is also used as a synonym for "sky box".
    A "sky box" is something short about the article on the front page; a "sky box" may instead have information about the issue as a whole, not just promos for particular articles.
    A "kicker" may go above the headline; sometimes "kicker" is also used informally to refer to the conclusion of an article.)
    Mappings:
        NewsML: SubHeadLine

----

performance_of

----

part_of
    "partof" is to indicate an object which this is a part of.
    Should usually be an intrinsic property -- unlike a "feed", since something might participate in multiple feeds.
    May be nested: an episode might be part of a show, and an episode might be split into two parts.
    Another nested example is an article in an issue in a periodical.

partof.title
    Mappings:
        ID3: TALB
        OpenURL: title

partof.resource_id
    TBD: is this useful?
    Mappings:
        Dublin: DCTERMS.isPartOf
    DCTERMS.isPartOf generally should have a URI value, xsi:type="dcterms:URI"

partof.number
    Within the "partof" object, a distinguishing identifier for which part this resource is, typically (but not always) a sequential integer.
    For example, an "Episode Number" or "Track Number".
    May not be needed if there is a broadcast/publishing date which distinguishes parts.
    [Note that this is an example of an attribute on a relationship....]
    TEI Example: 79th Year Number 2
    OpenURL: volume, issue
    NITF: pubdata/volume pubdata/issue
    ID3: TRCK (track number), TPOS (if two CDs, etc.)
    MARC: 440$n (A number designation of a part/section of a series.) or 440$v (Volume number/sequential designation)

----

source
    "Source"
    another resource which is the source of this one (for example in republishing).
    the publisher of the source might for example be in a Dateline, as in "(Business Wire)", from NewsML/NewsEnvelope/NewsService/@FormalName
    Mappings:
        Dublin: DC.Source

----

category
    Atom has @term, @scheme, @label.
    keyword tags, usually one word, perhaps controlled.
    Mappings:
        Atom: atom:category
        Dublin: DC.subject
        HTML: keywords
        MARC: 21 653 - INDEX TERM--UNCONTROLLED

----

action
    Used to express relationships between resources and agents (humans or corporates).
    In some contexts it is desired to represent a primary creative contributor.... [TBD].
    Note that this differs from "byline" in that sometimes it is desired to track the contributors internally, but not credit any public byline.
    All of these are taken to be actions on the resource, not on the metadata record.
    In the case of a publisher, when it is associated with a collection such as a feed, then it is the publisher of that feed, not necessarily of the items in it.
    Often the publisher of the items in a feed is defaulted to the one at the feed level.
    An agent can have attributes such as:
        full_name
        email
        homepage

action.what
    For a list of MARC relator codes, see http://www.loc.gov/marc/relators/relaterm.html .
    This includes: "Author", "Host", "Composer", "Publisher".
    Because it lists roles for agents, it does not list all actions, such as "submit", "approve", "accept"
    Also, in MARC they always pertain to the resource, not the metadata record (for example PRISM distributor and MARC distributor, distribute the resource).
    Note that in DC, "Contributor" has role refinements, but "Creator" and "Publisher" do not (see http://dublincore.org/usage/documents/relators/ ).

action.where

action.when
    Note that DCTERMS.dateSubmitted is intended for cases such as when a thesis was submitted, etc., but we can use it for the submit action to this publication.
    Note that depending on how distinct the resource definition is, the create date could be after the recording date.
    Modification date may be set automatically upon create, and on any update (of either resource or metadata).
    The publication date is for this publication, not some prior publication.
    Mappings:
        Dublin: DCTERMS.modified DCTERMS.issued DCTERMS.created DCTERMS.dateAccepted DCTERMS.dateSubmitted
        Atom: atom:updated atom:published

action.who.full_name
    Note that DC.Contributor may be a name but is not required to be by dublin core.
    Dublin has nothing for submitter name: http://askdcmi.askvrd.org/default.aspx?id=5228&cat=1730

    what = "Author", who.full_name:
        Mappings:
            Atom: atom:author/atom:name
            Dublin: DC.creator
            HTML: meta author
            TEI: docAuthor

    what = "Publisher", who.full_name:
        Mappings:
            Dublin: DC.publisher
            iTunes: itunes:owner/itunes:name
            ID3: TPUB

    what = "Publisher", who.email:
        Mappings:
            iTunes: itunes:owner/itunes:email

    what = "Publisher", who.homepage:
        Mappings:
            ID3: WPUB

----

image_url
    Mappings:
        Google: Google.image_link
        iTunes: itunes:image
        Atom: atom:logo

image_caption

----

language

mime_type
    comment: text/plain or 'application/pdf', etc.: media type / subtype
    from: DC.format or automatic

resource_class
    comment: refinement of DCMIType: 'Sound.music', 'Text.newsletter' 'Text.poem' 'Event.workshop'

----

changes
    NewsML/NewsItem/Identification/NewsIdentifier/RevisionId

----

draft
    Mappings:
        Atom: atom:control/@draft
        NewsML: NewsML/NewsItem/NewsManagement/Status 'Usable', 'Canceled', or 'Embargoed'
        NewsML/NewsItem/NewsManagement/Instruction has value "Kill" with Status "Canceled"

----

duration_seconds
    duration expressed as a number of seconds (TBD: or ISO8601 duration, with leading P and containing T)

----

rights_statement

copyright_date
    from: DCTERMS.dateCopyrighted

copyright_owner
    from: DCTERMS.rightsHolder

license_link
    from: atom:link rel="license" | DCTERMS.license | html a rel="license" (http://creativecommons.org/audio/publish-website) | ID3 TCOP (http://creativecommons.org/technology/mp3)

----

purchase_info
    from: ID3.COMR

================================================================

Or as hierarchical:

content
    "work", "entity", "resource", "creation"

party (corporate or individual)
    "actor", "agent", "contributor"

Kinds of relationships:
    simple scalar attribute:
        title
        pagecount
        ISBN
        Any scalar can acquire its value via a URI (which may be external or local)
        May or may not have human language variants, or textual variants (character encoding, html/xhtml)
        May have refinement: title from spine, title from cover

    scalars that have a URI value
        homepage
        [note that the value is defined to be of type URI; this differs from using a URI to find the value. It should not be considered of type URI merely because the content is remote.]

    reference to a media object:
        thumbnail image
        main image
        sidebar image
        [note that these qualifiers are of the use of the image, not intrinsic to the image. In other cases the object subclass will distinguish a movie trailer from a full movie, for example.]

    reference to other modeled object:
        category
        author
        organizer
        Any reference can be an inline definition.

    n-ary:
        X is a translation of Y by W
        X is partof series Y (episode of, issue of ...)
        X was published by Y on D
        Split into:
            X is a translation of Y
            X was translated by W
            X was translated on D
            X was translated at P

Any attribute (as well as the entry as a whole) might have its own authorship and history.
Most attributes can appear multiple times (with ordering -- multiple authors, etc.)

resource
    book
    article
    event
    image
    audio
    video
    physical instance
party
    person
    corporate body
place
event
category

================================================================

How to indicate associated images with entries.
idea is primary image, or representative image

http://www.pictureaustralia.org/ uses DC.Identifier.URL.thumbnail.
sixapart has their own:
    xmlns:book="http://sixapart.com/atom/book#"
    http://images.amazon.com/images/P/0375412808.01.THUMBZZZ.jpg
    see http://www.sixapart.com/pronet/docs/typepad_atom_api

google has their own
    http://www.google.com/base/rss_specs.html
    xmlns:g="http://base.google.com/ns/1.0"
    but not caption or other metadata about the image

apple iPhoto "photocasting" uses a prefix of "apple-wallpapers":
    http://lists.apple.com/archives/syndication-dev/2006/Jan/msg00020.html
    apple-wallpapers:image is the uri of the image itself
    apple-wallpapers:thumbnail is a uri for a placeholder while uploading/downloading the full thing

apple iTunes has itunes:image
    xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0dtd"
    at the channel level

================================================================

Bibliography

http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Hunter/
http://www.columbia.edu/cu/libraries/inside/projects/metadata/model/whitepaper.html
www.xml.com/pub/a/2005/01/26/formtax.html
http://staff.library.mun.ca/staff/toolbox/standards.htm
http://www.cdlib.org/inside/projects/rights/gap_analysis.html
Marc XML and OpenURL: http://raymondyee.net/wiki/MarcXmlToOpenUrlCrosswalk
http://www.oclc.org/research/projects/mswitch/1_gem-marc.htm
http://www.inkdroid.org/journal/2005/11/03/a-citation-microformat-when-worlds-collide/
http://www.kcoyle.net/meta_purpose.html
http://dublincore.org/documents/dc-citation-guidelines/
http://www.livejournal.com/users/gnomicutterance/1222.html
http://cavlec.yarinareth.net/archives/2004/12/17/librarians-and-error/
http://microformats.org/wiki/cite-formats
http://microformats.org/wiki/rel-tag

some good newspaper terminology lists:
    http://www.journalism.co.uk/glossary.shtml
    http://www.dailyherald.com/jump/nie/downloads/newsterms.pdf
    http://www.freep.com/legacy/jobspage/high/jargon.htm
    http://highered.mcgraw-hill.com/sites/0072407611/student_view0/glossary.html
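
The "Dublin:" entries in the vocabulary's Mappings lists can be collected into a lookup table and rendered as HTML meta elements. This is only a minimal sketch: it assumes a record is a plain dict keyed by the vocabulary's field names, DUBLIN_NAMES and dublin_meta are hypothetical names, and DCTERMS.abstract is picked arbitrarily where the notes give two candidates for summary.

```python
from html import escape

# Vocabulary field -> Dublin Core name, copied from the "Mappings:" entries.
# Assumption: only fields with a single obvious Dublin mapping are listed here.
DUBLIN_NAMES = {
    "title": "DC.title",
    "summary": "DCTERMS.abstract",   # the notes also allow DC.description
    "resource_id": "DC.identifier",
    "category": "DC.subject",
}


def dublin_meta(record):
    """Return one <meta> line per value for every field that has a Dublin mapping."""
    lines = []
    for field, name in DUBLIN_NAMES.items():
        values = record.get(field, [])
        if isinstance(values, str):
            values = [values]          # most attributes may repeat, so normalise to a list
        for value in values:
            content = escape(value, quote=True)
            lines.append(f'<meta name="{name}" content="{content}" />')
    return lines


if __name__ == "__main__":
    for line in dublin_meta({"title": "Harbor Dredging Begins",
                             "category": ["seafood", "harbor"]}):
        print(line)
```

Fields without a Dublin mapping (slug_line, deck, and so on) are skipped rather than guessed at.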
================================================================

metadata syntax conventions

In theory, could use dotted names, QNames, IRI, or tokens.

HTML meta already has some overlapping with dublin:
    <title>                      DC.title
    <meta name="keywords">       DC.subject
    <meta name="description">    DC.description

encoding Dublin in separate xml:
    http://dublincore.org/documents/dc-xml-guidelines/
    <?xml version="1.0"?>
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title> ...

encoding Dublin in xhtml:
    http://dublincore.org/documents/dcq-html/
    differs from RFC2731, which uses 'DC.Date.modified' rather than 'DCTERMS.modified'.
    <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
    <link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
    <meta name="DC.title" lang="en" content="Expressing Dublin Core in HTML/XHTML meta and link elements" />
    <meta name="DC.subject" lang="en-GB" content="seafood" />
    <meta name="DC.creator" content="Andy Powell, UKOLN, University of Bath" />
    <meta name="DCTERMS.modified" content="2001-07-18" />
    <link rel="DC.relation" href="http://www.example.org/" />
    <link rel="DCTERMS.references" href="http://www.example.org/publications/2002/176459.pdf" />

================================================================

Different metadata systems

Summary comments:

Z39.80
    decent, but only a draft, and hardly used
MARC and MARCXML
    arcane. very detailed, yet still combines page/serial info in one text field.
MODS
    XML, still somewhat ugly, but ok.
    apparently lacks a way of saying "this is a journal article": http://www.scripps.edu/~cdputnam/software/bibutils/mods_intro.html
PRISM
    a lot of overlap with DC, yet not as sophisticated as library systems
Z39.88 (OpenURL)
    the KEV encoding will not support full author info, and no version has an abstract or other fields.
    no great support for web articles yet.
BibTeX
    not purely declarative; really is programming source code.
DocBook
    as with all of docbook, a strange amalgam of apparently redundant and partial fields.
    but certainly covers more than Dublin.
DC/GEM
    from Syracuse university, for one project.
    more extended contact info.
AGLS
ICE
NITF
NewsML
Atom
    http://www.atomenabled.org/developers/syndication/ (or RFC4287)
DC
    http://dublincore.org/documents/dces/
DCTERMS
    http://dublincore.org/documents/dcmi-terms/
MP3 ID3
    http://id3.org/
Ogg Comment
    http://www.xiph.org/vorbis/doc/v-comment.html
PDF XMP
    http://www.adobe.com/products/xmp/main.html (http://creativecommons.org/technology/xmp-help)
JPEG EXIF
MPEG-4/AAC udta
indecs

XML Namespaces:
    Atom            http://www.w3.org/2005/Atom
    DC              http://purl.org/dc/elements/1.1/
    DCTERMS         http://purl.org/dc/terms/
    NITF, NewsML    does not use XML namespaces
    OpenURL         in xml encoding, different namespace for each object, e.g. http://www.niso.org/OpenURL/jarticle/

Detailed notes:

CIS
    Common Information System
    http://cisac.org
    copyright society

DCMS
    Department for Culture, Media and Sport
    UK recording industry

Kendra
    http://kendra.org.uk/develop.php

AACR
    http://www.collectionscanada.ca/jsc/docs.html#logical

Wikicat
    meta.wikimedia.org/wiki/Wikicat

FRBR
    Functional Requirements for Bibliographic Records
    http://www.oclc.org/research/projects/frbr/
    http://www.ifla.org/VII/s13/wgfrbr/related_efforts.htm
    1998 recommendation of the International Federation of Library Associations and Institutions (IFLA)
    3 kinds of entities:
        Group 1 are products
            "A work is realized through one or more expressions each of which is embodied in one or more manifestations each of which is exemplified by one or more items."
            RLG is collapsing these 4 into 2, work and manifestation
        Group 2 are entities responsible (person or corporate body)
        Group 3 are subjects of endeavor (concept, object, event, and place)
    for discussion contrasting with Xobis, see http://inquiringlibrarian.blogspot.com/2005/03/random-thoughts-on-xobis.html
    for interesting history and introduction see http://www.ddb.de/standardisierung/pdf/papers_leboeuf.pdf "Brave New FRBR World"
    an attempt at expressing FRBR in RDF: http://vocab.org/frbr/core#

Wikicat
    http://meta.wikimedia.org/wiki/Wikicat_Technical_Design

MusicBrainz
    http://musicbrainz.org/MM/

CIDOC CRM
    Conceptual Reference Model
    http://cidoc.ics.forth.gr/
    http://www.willpowerinfo.myby.co.uk/cidoc/
    museum/archive
    some work now on harmonization with FRBR (which means both libraries vs. museums and ER vs. OO/ontology)

EPICS/ONIX
    http://xml.coverpages.org/onix.html
    book industry
    apparently influenced by INDECS: http://blogs.talis.com/panlibus/archives/2005/03/when_will_xml_r.html

Harmony ABC
    http://www.ilrt.bris.ac.uk/discovery/harmony/docs/abc/abc_draft.html
    1999-2002: http://metadata.net/harmony/
    usual RDF stuff. not new vocabularies. partial translation to other vocabularies.
    does introduce an "event" model such as translation
    now folding into CIDOC and CIMI?
    a decent paper on interconversion: http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Hunter/ "MetaNet - A Metadata Term Thesaurus to Enable Semantic Interoperability Between Metadata Domains"
    on Harmony vs.
    CIDOC: http://metadata.net/harmony/JODI_Oct2002.pdf
    also notes some problems with FRBR

AGLS
    # xmlns:agls="http://agls.gov.au/agls/1.2"
    # xmlns:AglsAgent="http://agls.gov.au/agent/1.0"
    http://www.agls.gov.au
    http://www.naa.gov.au/recordkeeping/gov_online/agls/metadata_element_set.html
    http://www.naa.gov.au/recordkeeping/gov_online/agls/schemes/AglsAgent1.0.html

DC/GEM
    http://www.thegateway.org/
    http://www.eduref.org/Eric/
    http://www.oclc.org/research/projects/mswitch/1_gem-marc.htm

WSRP
    http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsrp

ICE (IDEAlliance)
    http://www.icestandard.org/specification/
    a protocol, not focused on metadata
    all about getting paid

DISC (IDEAlliance)
    http://www.disc-info.org/specifications/20comment/disc_meta_spec2.pdf
    DISC v1.0 was based on IPTC IIM
    DISC v2.0 has more IPTC and has a PRISM mapping
    oriented around photos

IPTC
    "IPTC headers" = IIM = frozen in 1997
    http://www.iptc.org/IIM/

NewsML - container structure
    http://www.newsml.org
    no XML namespace
    has public id urn:newsml:iptc.org:20021018:NewsMLv1.2:1

NITF - article markup
    http://www.nitf.org/
    mostly html-ish but has more support for workflow and change history

Adobe XMP (Extensible Metadata Platform)
    just RDF?

XOBIS
    unused wiki at: http://www.xobis.info/
    Originally at Stanford Medical Library.
    Trying to be between DC and MARC
    Done by Dick Miller http://www.stanford.edu/~dick/ and Kevin Clarke
    Seems to have slowed down when Kevin Clarke left Lane: http://www.kevinclarke.info/weblog/about-me/
    Active again in late 2005/2006.
    http://xobis.stanford.edu/ and medlane.info are broken.
    See:
        http://elane.stanford.edu/laneauth/IFLA_Berlin.html "Introducing XOBIS to the FRBR Working Group (2003)"
        http://elane.stanford.edu/laneauth/XOBIS_CCQ/XOBIS_CCQ.html
    Has these basic entities:
        Concept
        String
        Language
        Organization
        Event
        Time
        Place
        Being
        Object
        Work
    Organization is distinct from Being
    Being has subclasses: human, specimen, special
    Has an XML vocabulary with RNG schema

PRISM
    http://www.prismstandard.org/
    http://xml.coverpages.org/prism.html
    namespace is http://prismstandard.org/namespaces/basic/1.2 for their extensions
    includes a lot of metadata for non-text (images etc.) in DIM (Digital Image Management)
    also has some rights management
    has an inline markup for content
    the prism namespace has some redundant stuff like
        prism:creationDate
        prism:embargoDate
        prism:expirationDate
        prism:hasPart
        prism:hasFormat
        prism:location
        prism:modificationDate
    and some new ones like
        prism:coverDate
        prism:distributor
        prism:edition
        prism:startingPage
        prism:endingPage
        prism:hasCorrection
        prism:hasAlternative

MarcXML
    http://www.loc.gov/MARC21/slim
    exact mapping of marc21 to xml
    note that 773$g combines year volume issue and page in MARC, yet those are distinct in OpenURL and in MODS
    list of Relators: http://www.loc.gov/marc/relators/relaterm.html

MODS:
    http://www.loc.gov/standards/mods/
    http://www.loc.gov/mods/v3
    http://www.loc.gov/standards/mods/mods-userguide-elements.html
    http://www.scripps.edu/~cdputnam/software/bibutils/mods_intro.html
    MODS can have beauties like:
        <place>
          <placeTerm type="text">[Washington, D.C</placeTerm>
        </place>
        <publisher>Library of Congress</publisher>
        <dateIssued>1998-]</dateIssued>
    but does break up volume info:
        <part>
          <detail type="volume">
            <number>24</number>
          </detail>
          <detail type="issue">
            <number>2</number>
            <caption>no.</caption>
          </detail>
          <extent unit="page">
            <start>361</start>
            <end>378</end>
          </extent>
          <date>2000</date>
        </part>

Z39.80
    http://www.niso.org/standards/resources/drft4rev.html
    http://www.niso.org/standards/resources/z3980-3.pdf
    a plain text tagged format for exchange.
    never got out of draft in 99.
    looks like:
        AU Smith, John
        AU Doe, John
        AU Johns, John
        AF Doe, John; Science Inc, 5555 Science Drive, Science City, MO 44876
        EL Smith, John; john_smith@anyu.edu
        EL Smith, John; http://www.anyu.edu/~jsmith
        AZ Doe, John; Australia

DocBook:
    http://www.docbook.org/tdg/en/html/biblioentry.html

IETF RFC XML:
    http://www.faqs.org/rfcs/rfc2629.html

Z39.88 OpenURL
    rft_val_fmt key (e.g. "info:ofi/fmt:kev:mtx:journal")
        info:ofi/ is the registry
        fmt means format related (vs. nam or enc or tsp)
        kev is the serialization (vs. xml)
        mtx is a constraint language Z39.88-2004 Matrix (vs. xsd)
        journal is a constraint definition
    XML example:
        <rft:journal xmlns:rft="info:ofi/fmt:xml:xsd:journal" xsi:schemaLocation="info:ofi/fmt:xml:xsd:journal http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:journal">
          <rft:authors>
            <rft:author>
              <rft:aulast>Bergelson</rft:aulast>
              <rft:auinit>J</rft:auinit>
            </rft:author>
          </rft:authors>
          <rft:atitle>Isolation of a common receptor for coxsackie B viruses ....
    but note that in KEV markup, only one author can be listed:
        http://alcme.oclc.org/openurl/servlet/OAIHandler/extension?verb=GetMetadata&metadataPrefix=mtx&identifier=info:ofi/fmt:kev:mtx:journal
    available OpenURL formats are:
        http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats
    See http://www.exlibrisgroup.com/sfx_openurl_syntax.htm
    OpenURL parameters not perfect for bibentries since only enough info for reference (only first author).
    also no abstract.
    if online publication then use rft_id as the url, vs.
    some internal rft.artnum

    http://ocoins.info/ COinS
        <span class="Z3988" title="ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.issn=1045-4438"></span>

    See also http://www.dublincore.org/documents/dc-citation-guidelines/
        a means of putting OpenURL into XHTML head instead of a span

http://microformats.org/wiki/cite-examples
    <h1><a name="refs" id="refs">E.</a> References</h1>
    <p><strong>This appendix is informative.</strong></p>
    <dl>
    <dt><a name="ref-css2" id="ref-css2"><strong>[CSS2]</strong></a></dt>
    <dd>
    "<cite><a href="http://www.w3.org/TR/1998/REC-CSS2-19980512">Cascading Style Sheets, level 2 (CSS2) Specification</a></cite>", B. Bos, H. W. Lie, C. Lilley, I. Jacobs, 12 May 1998.<br />
    <a href="http://www.w3.org/TR/REC-CSS2">Latest version</a> available at: http://www.w3.org/TR/REC-CSS2
    </dd>
    <dt><a name="ref-dom" id="ref-dom"><strong>[DOM]</strong></a></dt>
    ...

indecs
    http://www.indecs.org/
    only accidental overlap with FRBR ? see footnote in http://www.ddb.de/standardisierung/pdf/papers_leboeuf.pdf
    led by Godfrey Rust http://www.ontologyx.com/rust.html http://www.rightscom.com/Default.aspx?tabid=1112
    See http://www.dlib.org/dlib/july98/rust/07rust.html "Metadata: The Right Approach", "An Integrated Model for Descriptive and Rights Metadata in E-commerce" July/August 1998
    MPEG RDD considered a follow-on: http://www.rightscom.com/Default.aspx?tabid=1172

MPEG21 RDD
    Rights Data Dictionary
    http://xml.coverpages.org/ni2002-08-26-b.html
    like indecs, tech lead was Godfrey Rust

================================================================
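
Returning to the Z39.88 OpenURL notes: the rft_val_fmt key and the COinS span can be assembled mechanically from a set of referent fields. This is a rough sketch, not a reference implementation. The function name coins_span is hypothetical, and leaving ":" and "/" unescaped is an assumption made only so the output matches the example in these notes.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(fields, fmt="journal"):
    """Build a COinS <span> whose title carries an OpenURL ContextObject in KEV form.

    fields maps referent keys without the "rft." prefix (e.g. "issn", "atitle")
    to string values; fmt is the constraint definition named in rft_val_fmt.
    """
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", f"info:ofi/fmt:kev:mtx:{fmt}"),
    ]
    pairs += [(f"rft.{key}", value) for key, value in fields.items()]
    # Assumption: ":" and "/" stay literal, as in the example in the notes.
    kev = urlencode(pairs, safe=":/")
    # The "&" separators must be written as "&amp;" inside an XHTML attribute.
    return f'<span class="Z3988" title="{escape(kev, quote=True)}"></span>'


if __name__ == "__main__":
    print(coins_span({"issn": "1045-4438"}))
```

The KEV limitation noted above still applies: only one author fits in the key/value form, so a full author list needs the XML encoding or a different vocabulary.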