Peregrine Usage

From BioAssist
Revision as of 15:29, 17 March 2010 by Dmitry Katsubo (Talk | contribs)

Peregrine usage examples

Using Peregrine as a library via the Java API

  • Add this Maven repository so that Maven can resolve the Peregrine-related artifacts:
<repositories>
	<repository>
		<id>sara-artifactory-server-id</id>
		<name>SARA Artifactory - Peregrine Maven releases</name>
		<url>http://ws1.grid.sara.nl:21501/artifactory/libs-releases/</url>
	</repository>
</repositories>
  • The following packages are required in your project:
    • peregrine-api (includes ontology-api and common-utils)
    • peregrine-normalizer
    • peregrine-tokenizer
  • Choose an ontology provider. There are several options[1]:
    • File source ontology (ontology-impl-file)
    • DB source ontology (ontology-impl-db)
  • Decide whether you need to disambiguate the text indexing results (usually you do). If so, include the peregrine-disambiguator project. The disambiguator layer works out of the box without special configuration.

  • Choose the Peregrine interface implementation to use. At the time of writing, only one implementation is available: peregrine-impl-hash.

Sample pom.xml configuration

[Figure: Peregrine client dependency graph]
<project
	xmlns="http://maven.apache.org/POM/4.0.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

	<modelVersion>4.0.0</modelVersion>

	<groupId>mygroup</groupId>
	<artifactId>peregrine-client</artifactId>
	<packaging>jar</packaging>
	<version>0.2-SNAPSHOT</version>

	<name>Peregrine Sample Client</name>
	<inceptionYear>2009</inceptionYear>

	<repositories>
		<repository>
			<id>sara-artifactory-server-id</id>
			<url>http://ws1.grid.sara.nl:21501/artifactory/libs-releases</url>
		</repository>
	</repositories>

	<dependencies>
		<dependency>
			<groupId>org.erasmusmc.data-mining.peregrine</groupId>
			<artifactId>peregrine-api</artifactId>
			<version>0.2-SNAPSHOT</version>
		</dependency>
		<dependency>
			<groupId>org.erasmusmc.data-mining.peregrine</groupId>
			<artifactId>peregrine-normalizer</artifactId>
			<version>0.2-SNAPSHOT</version>
			<scope>runtime</scope>
		</dependency>
		<dependency>
			<groupId>org.erasmusmc.data-mining.peregrine</groupId>
			<artifactId>peregrine-tokenizer</artifactId>
			<version>0.2-SNAPSHOT</version>
			<scope>runtime</scope>
		</dependency>
		<dependency>
			<groupId>org.erasmusmc.data-mining.peregrine</groupId>
			<artifactId>peregrine-disambiguator</artifactId>
			<version>0.2-SNAPSHOT</version>
			<scope>runtime</scope>
		</dependency>
		<dependency>
			<groupId>org.erasmusmc.data-mining.peregrine</groupId>
			<artifactId>peregrine-impl-hash</artifactId>
			<version>0.2-SNAPSHOT</version>
			<scope>runtime</scope>
		</dependency>
		<dependency>
			<groupId>org.erasmusmc.data-mining.ontology</groupId>
			<artifactId>ontology-impl-db</artifactId>
			<version>0.2-SNAPSHOT</version>
			<scope>runtime</scope>
		</dependency>
	</dependencies>
</project>
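
The sample above pulls in the DB-backed ontology provider. If the file-based provider is preferred, the last dependency could be swapped for something like the following. This is only a sketch: the artifact name ontology-impl-file comes from the list of providers above, while the groupId and version are assumed to match those of ontology-impl-db.

```xml
<!-- Assumption: ontology-impl-file is published under the same groupId and version as ontology-impl-db -->
<dependency>
	<groupId>org.erasmusmc.data-mining.ontology</groupId>
	<artifactId>ontology-impl-file</artifactId>
	<version>0.2-SNAPSHOT</version>
	<scope>runtime</scope>
</dependency>
```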

Sample ontology-impl-file configuration

<?xml version="1.0"?>
<beans
	xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.springframework.org/schema/beans	http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">

	<bean name="ontology" class="org.erasmusmc.data_mining.ontology.impl.file.SingleFileOntologyImpl">
		<constructor-arg value="file:/home/user/ontology_data.txt" />
	</bean>
</beans>

Sample ontology-impl-db configuration

<?xml version="1.0"?>
<beans
	xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:tx="http://www.springframework.org/schema/tx"
	xsi:schemaLocation="
		http://www.springframework.org/schema/beans	http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
		http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-2.5.xsd">

	<bean id="ontology" class="org.erasmusmc.data_mining.ontology.impl.db.DBOntologyImpl" lazy-init="true">
		<constructor-arg>
			<bean class="org.springframework.jdbc.core.simple.SimpleJdbcTemplate">
				<constructor-arg ref="ontologyDataSource" />
			</bean>
		</constructor-arg>
	</bean>

	<bean id="ontologyDataSource" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close" scope="singleton" lazy-init="true">
		<property name="driverClassName" value="com.mysql.jdbc.Driver" />
		<property name="url" value="jdbc:mysql://myserver:3306/mydatabase?autoReconnect=true" />
		<property name="username" value="dbuser" />
		<property name="password" value="dbpass" />
		<property name="validationQuery" value="select 1" />
	</bean>

	<bean id="txManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
		<property name="dataSource" ref="ontologyDataSource"/>
	</bean>

	<tx:annotation-driven transaction-manager="txManager" />
</beans>

Sample peregrine-impl-hash configuration

<?xml version="1.0"?>
<beans
	xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.springframework.org/schema/beans	http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">

	<bean name="thresholdDisambiguationDecisionMaker" class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.ThresholdDisambiguationDecisionMakerImpl">
		<property name="disambiguationMinimalWeight" value="50" />
		<property name="disambiguationAlwaysAcceptedWeight" value="80" />
	</bean>

	<bean name="ruleDisambiguator" class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.DisambiguatorImpl">
		<constructor-arg>
			<list>
				<ref local="looseDisambiguator" />
				<ref local="strictDisambiguator" />
			</list>
		</constructor-arg>
	</bean>

	<bean id="looseDisambiguator" class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.LooseDisambiguator">
		<constructor-arg>
			<bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.IsHomonymRule" />
		</constructor-arg>
		<constructor-arg>
			<bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.IsPreferredTermRule" />
		</constructor-arg>
		<constructor-arg>
			<bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.HasSynonymRule">
				<property name="maxSynonymDistance" value="40" />
				<property name="minSynonymWeight" value="75" />
				<property name="maxSynonymWeight" value="80" />
			</bean>
		</constructor-arg>
	</bean>

	<bean id="strictDisambiguator" class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.StrictDisambiguator">
		<constructor-arg>
			<bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.IsHomonymRule" />
		</constructor-arg>
		<constructor-arg>
			<bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.IsPreferredTermRule" />
		</constructor-arg>
		<constructor-arg>
			<bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.IsComplexRule">
				<property name="maxTermLength" value="6" />
				<property name="minTermLength" value="3" />
				<property name="minTermNumbers" value="1" />
				<property name="minTermLetters" value="1" />
			</bean>
		</constructor-arg>
		<constructor-arg>
			<bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.HasSynonymRule">
				<property name="maxSynonymDistance" value="40" />
				<property name="minSynonymWeight" value="75" />
				<property name="maxSynonymWeight" value="80" />
			</bean>
		</constructor-arg>
		<constructor-arg>
			<bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.HasKeywordRule">
				<property name="maxKeywordDistance" value="300" />
				<property name="minKeywordWeight" value="75" />
				<property name="maxKeywordWeight" value="80" />
			</bean>
		</constructor-arg>
	</bean>

	<bean name="peregrine" class="org.erasmusmc.data_mining.peregrine.impl.hash.PeregrineImpl">
		<constructor-arg ref="ontology" />
		<constructor-arg>
			<bean class="org.erasmusmc.data_mining.peregrine.tokenizer.impl.SubSentenceTokenizer" />
		</constructor-arg>
		<constructor-arg>
			<bean class="org.erasmusmc.data_mining.peregrine.normalizer.impl.LVGNormalizer" />
		</constructor-arg>
		<constructor-arg ref="ruleDisambiguator" />
		<constructor-arg ref="thresholdDisambiguationDecisionMaker" />
	</bean>
</beans>
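
The peregrine bean above refers to a bean named ontology, which is defined in one of the ontology configurations shown earlier. One possible way to tie the two together is a small top-level Spring context that imports both files. The file names below are hypothetical; <import> is the standard element of the Spring beans schema.

```xml
<?xml version="1.0"?>
<beans
	xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.springframework.org/schema/beans	http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">

	<!-- Hypothetical file name: holds the "ontology" bean (file or DB variant) -->
	<import resource="ontology-context.xml" />

	<!-- Hypothetical file name: holds the "peregrine" bean and the disambiguator beans -->
	<import resource="peregrine-context.xml" />
</beans>
```

Relative import paths are resolved against the location of the importing file, so both files are assumed to sit next to it.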

Using one of the existing Peregrine projects

peregrine-ws is an example of exposing the Peregrine interface as a web service. This is done using JAX-WS, which provides a built-in servlet that serves the registered HTTP endpoints.
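
As an illustration of such a servlet, the JAX-WS reference implementation is typically wired into web.xml roughly as follows. This is a generic sketch and is not taken from peregrine-ws: the servlet name and URL pattern are hypothetical, and the endpoint itself would additionally have to be registered in WEB-INF/sun-jaxws.xml.

```xml
<!-- Namespace and version attributes of web-app are omitted for brevity -->
<web-app>
	<!-- Reads WEB-INF/sun-jaxws.xml at startup and registers the endpoints listed there -->
	<listener>
		<listener-class>com.sun.xml.ws.transport.http.servlet.WSServletContextListener</listener-class>
	</listener>

	<!-- Servlet shipped with the JAX-WS reference implementation; the name is hypothetical -->
	<servlet>
		<servlet-name>peregrine-ws</servlet-name>
		<servlet-class>com.sun.xml.ws.transport.http.servlet.WSServlet</servlet-class>
		<load-on-startup>1</load-on-startup>
	</servlet>

	<!-- Hypothetical URL pattern; it must match the one declared in sun-jaxws.xml -->
	<servlet-mapping>
		<servlet-name>peregrine-ws</servlet-name>
		<url-pattern>/peregrine</url-pattern>
	</servlet-mapping>
</web-app>
```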

peregrine-rmi is an example of how the Peregrine interface can be exposed over RMI using the built-in facilities of the Spring framework. If all the RMI-specific code is removed, what remains is an example of a command-line Peregrine utility that can, for example, take a file as an argument, index it and write the results to another file.
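
For orientation, exporting a bean over RMI with Spring usually comes down to one RmiServiceExporter definition like the one below. This is a sketch under assumptions rather than the actual peregrine-rmi configuration: the service name is hypothetical, and the serviceInterface value is a placeholder that has to be replaced with the real fully qualified name of the Peregrine interface.

```xml
<bean class="org.springframework.remoting.rmi.RmiServiceExporter">
	<!-- Hypothetical name under which clients look the service up -->
	<property name="serviceName" value="PeregrineService" />
	<!-- The "peregrine" bean from the peregrine-impl-hash configuration -->
	<property name="service" ref="peregrine" />
	<!-- Placeholder: replace with the fully qualified name of the Peregrine interface from peregrine-api -->
	<property name="serviceInterface" value="your.package.Peregrine" />
	<!-- 1099 is the default RMI registry port -->
	<property name="registryPort" value="1099" />
</bean>
```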

peregrine-client is an example of using Peregrine as a library in a web application. It provides a simple JSP presentation of Peregrine indexing results. Since a JSP is also a servlet, it can produce XML instead of HTML and so act as a REST service.

Profiling the memory usage of Peregrine

Peregrine has built-in support for memory profiling using the wicket library. Because this library is listed as optional in the Maven dependencies of the org.erasmusmc.data-mining.peregrine.peregrine-impl-hash project, it is not pulled in transitively: the end user must either declare it explicitly as a runtime dependency or add it to WEB-INF/lib manually (if the target deliverable is a WAR application).

<dependency>
	<groupId>wicket</groupId>
	<artifactId>wicket</artifactId>
	<version>1.1</version>
	<scope>runtime</scope>
</dependency>

After that, the memory information is available via the PeregrineImpl.toString() method.

Deploying Peregrine Service

(This page is incomplete; it will be updated once the Peregrine installer is implemented.)

  • Download and run the install script, then deploy the resulting .war file to an application server.

Reference List

  1. For the complete list of ontology backend providers, see ontology backends.