19.3 Harvesting nodes


The second top-level hierarchy is harvesting. All nodes added through the web interface are stored here. Each child has node as its key, and its value is geonetwork, webdav, csw or another identifier, depending on the node’s type.

All harvesting nodes share a common settings structure, which the harvesting engine uses to retrieve the common parameters. This implies that any new harvesting type must honor this structure, which is the following:

site : A container for site information.
- name (string) : Node’s name as shown in the harvesting list.
- uuid (string) : A unique identifier assigned by the system when the harvesting node is created.
- useAccount (boolean) : Indicates if the harvester has to authenticate to access the data.
  - username (string) :
  - password (string) :
options :
- every (integer) : Interval (timeout), in minutes, between two consecutive harvesting runs.
- oneRunOnly (boolean) : If true, the harvester will harvest from this node once and then set its status to inactive.
- status (active|inactive) : Indicates whether harvesting from this node is stopped (inactive) or the harvester is waiting for the timeout to elapse (active).
privileges [0..1] : A container for the privileges to assign to each imported metadata record.
- group (integer) [0..n] : Indicates a local group. The node’s value is the group’s local identifier. There can be several group nodes, each with its own set of privileges.
  - operation (integer) [0..n] : Privilege to assign to the group. The node’s value is the numeric id of the operation, such as 0=view, 1=download, 2=edit, etc.
categories [0..1] : A container for the categories to assign to each imported metadata record.
- category (integer) [0..n] : Indicates a local category. The node’s value is the category’s local identifier.
info : A container for information about harvesting from this node.
- lastRun (string) : If not empty, tells when the harvester last harvested from this node. The value is the time of that run in milliseconds since 1 January 1970.
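
As an illustration only, the common structure above can be pictured as a nested mapping. The following Python sketch is not GeoNetwork code: the dictionary layout, the sample values, and the names node, OPERATION_NAMES and last_run_as_datetime are all hypothetical. It also assumes that lastRun holds the millisecond count as a decimal string, and that useAccount is a boolean node carrying username and password as children.

```python
from datetime import datetime, timezone

# Hypothetical in-memory picture of one harvesting node's common settings.
# Key names follow the structure listed above; all values are invented.
node = {
    "site": {
        "name": "Example node",
        "uuid": "00000000-0000-0000-0000-000000000000",
        # Assumption: useAccount is a boolean with two child settings.
        "useAccount": {"value": True, "username": "harvester", "password": "secret"},
    },
    "options": {"every": 90, "oneRunOnly": False, "status": "active"},
    # One entry per group node: local group id plus the operation ids granted.
    "privileges": [{"group": 2, "operations": [0, 1]}],
    # Local category identifiers.
    "categories": [3],
    # Assumption: milliseconds since 1 January 1970, stored as a string.
    "info": {"lastRun": "1199145600000"},
}

# Only the three operation ids named in the text; other ids exist but are
# not listed in this section.
OPERATION_NAMES = {0: "view", 1: "download", 2: "edit"}


def last_run_as_datetime(settings):
    """Return lastRun as a UTC datetime, or None if the node never ran."""
    raw = settings["info"]["lastRun"]
    if not raw:
        return None
    return datetime.fromtimestamp(int(raw) / 1000, tz=timezone.utc)


if __name__ == "__main__":
    print(last_run_as_datetime(node))
    for entry in node["privileges"]:
        names = [OPERATION_NAMES[op] for op in entry["operations"]]
        print("group", entry["group"], "->", names)
```

An empty lastRun is treated as "never harvested", which matches the "if not empty" wording of the description.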

The privileges and categories nodes may or may not be present, depending on the harvesting type. The structures that follow omit this common structure and describe only the extra information specific to each harvesting type.

Nodes of type geonetwork

This is the native harvesting type, supported by geonetwork 2.1 and above.

site : Contains host and account information
- host (string)
- port (integer)
- servlet (string)
search [0..n] : Contains the search parameters. If this element is missing, an unconstrained search will be performed.
- freeText (string)
- title (string)
- abstract (string)
- keywords (string)
- digital (boolean)
- hardcopy (boolean)
- source (string)
groupsCopyPolicy [0..n] : Represents a copy policy for a remote group. It is used to maintain remote privileges on harvested metadata.
- name (string) : Internal name (not localized) of a remote group.
- policy (string) : Copy policy. For the group all, policies are: copy, copyToIntranet. For all other groups, policies are: copy, createAndCopy. The intranet group is not considered.
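
The policy rules above lend themselves to a small validity check. The sketch below is hypothetical and not taken from the GeoNetwork sources: the function name is_valid_copy_policy and the two constant sets are invented for illustration, and it assumes the internal group names are exactly "all" and "intranet".

```python
# Policies allowed for the group "all" and for every other remote group,
# as listed in the description of the policy setting.
ALL_GROUP_POLICIES = {"copy", "copyToIntranet"}
OTHER_GROUP_POLICIES = {"copy", "createAndCopy"}


def is_valid_copy_policy(group_name, policy):
    """Check a (remote group, policy) pair against the documented rules."""
    if group_name == "intranet":
        # The intranet group is not considered, so no policy applies to it.
        return False
    if group_name == "all":
        return policy in ALL_GROUP_POLICIES
    return policy in OTHER_GROUP_POLICIES


if __name__ == "__main__":
    for pair in [("all", "copyToIntranet"), ("all", "createAndCopy"),
                 ("editors", "createAndCopy"), ("intranet", "copy")]:
        print(pair, is_valid_copy_policy(*pair))
```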

Nodes of type geonetwork20

This type allows harvesting from old geonetwork 2.0.x nodes.

site : Contains host and account information
- host (string)
- port (integer)
- servlet (string)
search [0..n] : Contains the search parameters. If this element is missing, no harvesting will be performed, but the host’s parameters will still be used to connect to the remote node.
- freeText (string)
- title (string)
- abstract (string)
- keywords (string)
- digital (boolean)
- hardcopy (boolean)
- siteId (string)

Nodes of type webdav

This harvesting type can connect to a WebDAV-enabled web server.

site : Contains the URL to connect to and account information
- url (string) : URL to connect to. Must be well formed, starting with ’http://’, ’file://’ or a supported protocol.
- icon (string) : This is the icon that will be used as the metadata source’s logo. The image is taken from the images/harvesting folder and copied to the images/logos folder.
options :
- recurse (boolean) : Indicates if the remote folder must be recursively scanned for metadata.
- validate (boolean) : If set, the harvester will validate the metadata against its schema and the metadata will be harvested only if it is valid.
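
The requirement that url be well formed can be approximated with a simple check. This is a minimal sketch using Python's standard urllib.parse, not the validation the harvester itself performs. The function name is_well_formed_url and the SUPPORTED_SCHEMES set are hypothetical, and the set contains only the two protocols named above; which other protocols are supported is not stated here.

```python
from urllib.parse import urlparse

# Assumption: only the two schemes named in the text. Extend this set for
# any other protocol the installation actually supports.
SUPPORTED_SCHEMES = {"http", "file"}


def is_well_formed_url(url):
    """Return True if url has a supported scheme and a usable target."""
    parts = urlparse(url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        return False
    if parts.scheme == "file":
        # file:// URLs carry their target in the path, not in a host name.
        return bool(parts.path)
    # Network URLs need a host part.
    return bool(parts.netloc)


if __name__ == "__main__":
    for candidate in ["http://example.org/dav/", "file:///tmp/metadata",
                      "example.org/dav", "http://"]:
        print(candidate, is_well_formed_url(candidate))
```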

Nodes of type csw

This harvesting type can query a Catalogue Services for the Web (CSW) server and retrieve all the metadata it finds.

site :
- capabUrl (string) : URL of the capabilities file that will be used to retrieve the operations address.
- icon (string) : This is the icon that will be used as the metadata source’s logo. The image is taken from the images/harvesting folder and copied to the images/logos folder.
search [0..n] : Contains search parameters. If this element is missing, an unconstrained search will be performed.
- freeText (string)
- title (string)
- abstract (string)
- subject (string)
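
The node types in this section treat a missing search element differently: geonetwork and csw fall back to one unconstrained search, while geonetwork20 performs no harvesting. The sketch below only restates that rule in Python. The function searches_to_run and the use of an empty dictionary to stand for an unconstrained search are assumptions made for illustration, not the harvesting engine's actual logic.

```python
def searches_to_run(node_type, searches):
    """Return the list of searches a harvester would run for a node.

    searches is a list of dictionaries, one per configured search element.
    An empty dictionary stands for an unconstrained search (an assumption).
    """
    if searches:
        return list(searches)
    if node_type in ("geonetwork", "csw"):
        # Missing search element: a single unconstrained search is performed.
        return [{}]
    if node_type == "geonetwork20":
        # Missing search element: nothing is harvested.
        return []
    raise ValueError("node type has no search settings: " + node_type)


if __name__ == "__main__":
    print(searches_to_run("csw", []))
    print(searches_to_run("geonetwork20", []))
    print(searches_to_run("geonetwork", [{"freeText": "rivers"}]))
```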
