we partition the world into these:

                 has contents   has metadata                has children
    'link'       no             yes (e.g. lstat vs. stat)   no
    'metadata'   yes            usually not                 no
    'data'       yes            yes                         usually not (though might have URI-addressable
                                                            parts that act that way, such as a tar file);
                                                            can have metadata
    'collection' no             yes                         yes

metadata does not have an independent URI from "data".

"resource", "data", "node", "item"

JSR-170/JSR-283 model:
  repository contains workspaces, each of which is a tree of items.
  all items have names; the root node has name "". names are XML qnames.
  each item can be a "node" or a "property". properties have no children.
  two sibling properties can't have the same name, but a sibling node and property can, and multiple sibling nodes can, in which case the path uses 1-based indexes.
  properties can have multiple values.
  properties are not orderable, but nodes are.
  properties have only scalar values, among a set of fixed types including URI, PATH, NAME, REFERENCE, WEAKREFERENCE
  properties have:
    getType()
    getLength()
    getLengths() [multi-valued case]
    getNode() [follow reference]
  has a notion of residual properties
  every node has a special property "jcr:primaryType"
  not all nodes are referenceable. if they are, they have a UUID. this UUID is shared among multiple workspaces: multiple workspaces can have corresponding nodes with identical UUIDs, possibly created through clone()
  some nodes are versionable, if they have mixin mix:versionable (implies mix:referenceable).
  there is at most one version of a node per workspace.
  properties are not versioned. the versioning of children depends on the onParentVersion of the NodeDefinition: COPY, VERSION, INITIALIZE, COMPUTE, IGNORE, ABORT
  version graph is under /jcr:system/jcr:versionStorage
  a node may have a primary child item. other than that, nodes do not have values. primary child item can be node or property?
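The same-name-sibling rule is the least obvious part of this item model. Below is a minimal Python sketch, assuming a hypothetical in-memory tree (Node, get_node and get_property are illustrative names, not the javax.jcr API): child nodes keep their order and may repeat a name, property names are unique per node, a node and a property may share a name, and a segment such as "para[2]" selects the second "para" child, with a missing index meaning [1].

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical model loosely following the JSR-170/283 rules above.
# A segment is a name, optionally followed by a 1-based index like "[2]".
_SEGMENT = re.compile(r"^(?P<name>[^\[\]/]+)(?:\[(?P<index>[1-9][0-9]*)\])?$")


@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)           # ordered, names may repeat
    properties: dict[str, object] = field(default_factory=dict)  # names are unique

    def add_node(self, name: str) -> Node:
        child = Node(name)
        self.children.append(child)
        return child

    def set_property(self, name: str, value: object) -> None:
        # Setting an existing name replaces it; properties never duplicate.
        self.properties[name] = value


def _parse(segment: str) -> tuple[str, int]:
    m = _SEGMENT.match(segment)
    if m is None:
        raise ValueError(f"bad path segment: {segment!r}")
    return m["name"], int(m["index"] or 1)  # no index means [1]


def get_node(root: Node, path: str) -> Node:
    """Resolve an absolute path such as '/a/b[2]/c' to a node."""
    node = root
    for segment in filter(None, path.split("/")):
        name, index = _parse(segment)
        matches = [c for c in node.children if c.name == name]
        if index > len(matches):
            raise KeyError(path)
        node = matches[index - 1]
    return node


def get_property(root: Node, path: str) -> object:
    """Resolve '/a/b[2]/prop': a node path, then an unindexed property name."""
    parent_path, _, prop = path.rpartition("/")
    return get_node(root, parent_path).properties[prop]
```

Keeping nodes and properties in separate collections is what lets "/doc/para" name both a node and a property; a caller has to say which kind of item it wants.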
  namespaces are available from the persistent registry, but a client session can also specify namespace prefixes
  includes a standardized mapping from names to XML names, e.g. "My Documents" becomes "My_x0020_Documents"
  has both system view and document view XML export, subject to read access control
  has CND (Compact Node Type Notation)
  builtin node types correspond to builtin functionality:
    nt:nodeType, nt:propertyDefinition, nt:childNodeDefinition
    nt:versionHistory, nt:version, nt:versionLabels, nt:frozenNode, nt:versionedChild
    nt:query
    nt:activity
    nt:configuration
    nt:resource
    nt:address
  non-content is kept under /jcr:system:
    jcr:versionStorage
    jcr:nodeTypes
    jcr:activities
    jcr:configurations
    jcr:lost+found
    (but not, oddly enough, access control rules)
  ACL: checkPermission(path, actions)
    actions include: "add_node", "set_property", "remove", "read"
  updates can be to transient storage (allowing temporary invalidity and reducing round trips)
  jcr:path is always available as a pseudo-column to SQL queries
  oddness/weakness:
    no SPI interface.
    no XML export/import (or XPath) for the type manager.
    node type vs. property type
    versions aren't dealt with via XPath (just like types aren't)
    a mandatory base type
    workspace move is distinct from within-workspace move. why should I even know? is this for branch/merge?
    only import has support for what to do about collisions
    session save is distinct from UserTransaction methods (commit, etc.) and from version methods (checkin, checkpoint, etc.)
      and from "activities" (aka a set of multiple checkins for a single issue)
    event types are different from access permissions
  JCR 283 changes, see http://www.infoq.com/news/jcr-update
    Extensions for managing a content repository, including access control, workspace and node administration, and content retention
    Improved interoperability through new standardized node types
    New extensions for content modeling

Examples:
  ReiserFS files acting as directories http://www.namesys.com/v4/v4.html#files_dirs
    has a "..../state" and "..../range" and "..../process" child
    in the reiser4() system call: http://www.namesys.com/v4/v4.html#reiser4_call
  HFS "resource fork"
  Mail.app/
  the "two files approach" to backing up ACLs

use parameters in URI?
  HTTP URL (RFC 2616) defines URI from RFC 2396:
    path_segments = segment *( "/" segment )
    segment       = *pchar *( ";" param )
    param         = *pchar
  [relative to RFC 1808] "Extensive testing of current client applications demonstrated that the majority of deployed systems do not use the ";" character to indicate trailing parameter information, and that the presence of a semicolon in a path segment does not affect the relative parsing of that segment. Therefore, parameters have been removed as a separate component and may now appear in any path segment."

D2R Server http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/
  supports SPARQL http://www.w3.org/TR/rdf-sparql-protocol/
  returns results in RDF

================================================================
API
================================================================

the API supports a transactional service context ("session") that includes:
  base URI (on the server)
  authentication
  transaction
  version spec
  XML namespace prefixes
the session also includes:
  logout
  isLive
  impersonate(Credentials)

distinguish folder behavior vs. content behavior by accepted content type (choose the desired representation)? or by trailing slash?
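If the API carries options such as a content role as URI parameters, the RFC 2396 path-segment grammar quoted above is what a server would have to parse. A minimal Python sketch (split_segments is a hypothetical helper) that splits each segment into its name and its ";" parameters. It uses urllib.parse.urlsplit, which, unlike urlparse, does not peel the parameters off the last path element, so every segment keeps its own.

```python
from __future__ import annotations

from urllib.parse import unquote, urlsplit


def split_segments(url: str) -> list[tuple[str, list[str]]]:
    """Split a URL path into (segment, [params]) pairs, per RFC 2396:
        path_segments = segment *( "/" segment )
        segment       = *pchar *( ";" param )
    """
    path = urlsplit(url).path  # ';' parameters stay inside the path here
    result = []
    for raw in path.lstrip("/").split("/"):
        name, *params = raw.split(";")
        # Percent-decode only after splitting, so an escaped %3B or %2F
        # stays part of the name or parameter instead of acting as a delimiter.
        result.append((unquote(name), [unquote(p) for p in params]))
    return result
```

For example, split_segments("http://example.org/docs;version=2/report.xml;rel=metadata") gives a parameter list for both "docs" and "report.xml"; how those parameter strings map onto API arguments is left open, as it is in the notes.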
special managers:
  type info
  version history (JSR 147, WVCM, is a Java API on top of DeltaV http://www.ietf.org/rfc/rfc3253.txt)
    labels ("configurations"), branches, users, change sets ("activities"), checkouts
  locking
  workspaces
  users
  triggers ("observations", events)
  access control
    action, path
  archiving ("lifecycle", "effectivity", DOD 5015.2, "file plan")
  workflow (approval, drafts, JSR 207)
    http://confluence.atlassian.com/display/DOC/Approval+Workflow
    http://www.theserverside.com/tt/articles/content/Workflow/article.html
  inter-business process (BPEL4WS, JBoss jBPM http://www.jboss.com/products/jbpm
    BPEL http://en.wikipedia.org/wiki/BPEL
    BPMN http://en.wikipedia.org/wiki/Business_Process_Modeling_Notation http://www.ebpml.org/bpmn.htm
    XPDL http://www.ebpml.org/xpdl.htm
  )
not included:
  topic ids
  configuration of search indexing
  form templates
  view limitations
    might be done by user:
      > release_date
      < expire_date
      status = 'approved'

upsert(to_parent_path, to_name, to_position, content_role, content, exists_assert, exists_action)
  to_parent_path - relative URI, relative to a remote repository base URI.
  to_name - a literal name for this within to_parent_path (no '/')
    may be missing in some cases: not if exists_assert='must', not if content_role='metadata', usually not if content_role='collection'.
    if omitted, the server will generate one, perhaps based on information in MIME headers
  to_path - may not be supported by server. where the resulting content will be after the action:
    *[1]                          - first child, of any name
    *[last()]                     - last child, of any name
    following-sibling::*[1]       - following sibling of current, of any name
    following-sibling::foobar[1]  - this will become the first "foobar" child after current
    TBD: just to_position = into | following | preceding
  content_role - one of:
    'metadata' - the content is metadata for the to_name. data must already exist; metadata may or may not.
    'data' - the content is data for the to_name.
    'combined' - metadata and data are together in the content.
      exists_assert applies to data only; exists_action applies to both.
      example might be cpio, or Atom with content
    'collection' - we want to make sure there is a collection by this name ("mkdir")
  content - omitted if content_role is 'collection'. the actual content of the intended resource (with applicable MIME headers)
  exists_assert - one of: 'must', 'cant', 'either'. default 'either'.
  exists_action - only applicable to non-collections (since it applies to content). one of: 'replace', 'merge'. default 'replace'.
  returns: a destination name, whether replaced or created
  NOTE: no batch update of multiple things to a single constant or dependent expression
  NOTE: if to_parent_path is missing or if to_name contains '/', server MAY support automatic creation of missing intermediaries
  NOTE: the server may expose some automatically derived variant (for example, converting Word to HTML)

namechange(to_parent_path, to_name, content_role, from_path, from_action, exists_assert, exists_action)
  to_name - in some cases can be omitted
  content_role -
  from_path - typically relative to remote repository, though a server MAY support external URIs.
    this might actually be the same as a to_name; this enables a change of "rel" in odd cases
  from_action - 'delete', 'leave'
    [TBD: separate move and copy? copy means new UUIDs]

delete(path, rel, children_ok, missing_ok)
  if rel is 'metadata', will delete only that metadata
  if rel is 'data', will delete that data and any metadata
  if rel is 'collection', will delete that collection and any children and any metadata
  NOTE: no facility for deleting the result of an XPath
  returns:
    whether successfully deleted
    whether it existed before

get(resource_path, content_role, accept_mime_type)
  content_role - one of: 'data', 'metadata', 'combined'
  accept_mime_type - like HTTP Accept (but no q-value ratings). e.g. 'application/atom+xml' (atom:entry). default '*/*'.
  returns: whatever was put in via upsert (if '*/*')
  TBD: stream object to read big things

list(resource_path, query_expr, accept_mime_type)
  in these kinds of formats:
    HXDLG http://hdlg.sourceforge.net/ xmlns=http://www.hdlg.info/XML/filesystem
    manifest.xml xmlns=http://openoffice.org/2001/manifest
    atom:feed
    "application/rss+xml revision=http://purl.org/rss/1.0/"
    RMP (builtin)
    Web Collections http://www.w3.org/TR/NOTE-XMLsubmit
    OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/static-repository"
    TODO: RDDL http://www.rddl.org/rddl2 (explain what namespaces mean) and http://www.w3.org/2001/tag/doc/nsDocuments/
  simply lists all metadata objects for all immediate children, in an XML response wrapper
  ...

================================================================
Unlike APP (http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-07.html#pub-control)
  + can create collections
  + listing of available collections is the same as listing the contents of a collection
  + can act without any reliance on HTTP headers or methods
  + can update a part of a resource (as with images referenced by an HTML file)
  + supports merge vs. replace
  + can support "upsert" semantics
  - no paging in list results
  - no "title" (as distinguished metadata for a collection)

Unlike XUpdate (http://xmldb-org.sourceforge.net/xupdate/xupdate-wd.html)
  - has no distinction between 'insert-before', 'insert-after', 'append' (could support 'name' being a child position)
  - has no 'rename' modification

XQuery Update (http://exist.sourceforge.net/update_ext.html http://www.w3.org/TR/xquery-update-requirements/)
FTP
WebDAV http://bitsko.slc.ut.us/blog/atom-webdav.html
WSRP
OAI
GData http://code.google.com/apis/gdata/overview.html

================================================================
Could keep files like:

  2005-12-03:00001/some-slug.xhtml
  2005-12-03:00001/images/contained-image.png
  2005-12-03:00001/some-slug.atom

  some-slug/index.xhtml
  some-slug/contained-image.png
  some-slug/atom.xml

  some-slug.xhtml
  some-slug_files/contained-image.png
  some-slug.atom
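The upsert() rules in the API section (to_name requirements, exists_assert, exists_action, the 'metadata' and 'collection' roles) interact in ways that are easier to check in code. Below is a minimal Python sketch under several assumptions that are not in the notes: a hypothetical in-memory store (Store, Resource and the generated "item-N" names are illustrative only), data and metadata held as dicts, 'merge' meaning a shallow dict update, exists_assert checked against the target resource itself, the parent collection required to exist already (the optional creation of intermediaries is not implemented), and the 'combined' role and to_position left out.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class Resource:
    kind: str                                     # 'data' or 'collection'
    data: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)


class Store:
    """Hypothetical in-memory repository illustrating the upsert() rules."""

    def __init__(self) -> None:
        self.items: dict[str, Resource] = {"": Resource("collection")}  # "" is the root
        self._ids = itertools.count(1)

    def upsert(self, to_parent_path, to_name, content_role, content=None,
               exists_assert="either", exists_action="replace") -> str:
        parent_key = to_parent_path.strip("/")
        parent = self.items.get(parent_key)
        if parent is None or parent.kind != "collection":
            raise LookupError(f"no parent collection {to_parent_path!r}")
        if to_name is None:
            if exists_assert == "must" or content_role == "metadata":
                raise ValueError("to_name is required here")
            to_name = f"item-{next(self._ids)}"   # server-generated name
        path = f"{parent_key}/{to_name}".lstrip("/")
        existing = self.items.get(path)

        if exists_assert == "must" and existing is None:
            raise LookupError(f"{path} does not exist")
        if exists_assert == "cant" and existing is not None:
            raise FileExistsError(path)

        if content_role == "collection":          # "mkdir"; content is ignored
            if existing is None:
                self.items[path] = Resource("collection")
            elif existing.kind != "collection":
                raise FileExistsError(path)
        elif content_role == "metadata":          # data must already exist
            if existing is None:
                raise LookupError(f"no data at {path} for metadata")
            self._apply(existing, "metadata", content, exists_action)
        elif content_role == "data":
            if existing is None:
                existing = self.items[path] = Resource("data")
            elif existing.kind != "data":
                raise FileExistsError(path)
            self._apply(existing, "data", content, exists_action)
        else:
            raise ValueError(f"unsupported content_role {content_role!r}")
        return path                               # the destination name

    @staticmethod
    def _apply(resource: Resource, attr: str, content: dict, exists_action: str) -> None:
        if exists_action == "replace":
            setattr(resource, attr, dict(content))
        elif exists_action == "merge":
            getattr(resource, attr).update(content)  # shallow, key-by-key merge
        else:
            raise ValueError(f"unsupported exists_action {exists_action!r}")
```

Treating 'merge' as a shallow key update is only one possible reading; for XML or Atom content, merge would have to be defined per format.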