Peregrine Usage
From BioAssist
Revision as of 15:08, 11 March 2010 by Dmitry Katsubo (Talk | contribs)
Peregrine usage examples
Using Peregrine as library via Java API
- Add this repository to your pom.xml so that Maven can resolve the Peregrine-related artifacts:
<repositories>
  <repository>
    <id>sara-artifactory-server-id</id>
    <name>SARA Artifactory - Peregrine Maven releases</name>
    <url>http://ws1.grid.sara.nl:21501/artifactory/libs-releases/</url>
  </repository>
</repositories>
- The following packages are mandatory and must be included in your project:
- peregrine-api (includes ontology-api and common-utils)
- peregrine-normalizer
- peregrine-tokenizer
- Choose an ontology provider. There are several options[1]:
- File source ontology (ontology-impl-file)
- DB source ontology (ontology-impl-db)
- Decide whether you need to disambiguate the text indexing results (usually you do). If so, include the peregrine-disambiguator project. The disambiguator layer works out of the box without special configuration.
- Decide which implementation of the Peregrine interface to use. At the time of writing, only one implementation is available: peregrine-impl-hash
Sample pom.xml configuration
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>mygroup</groupId>
  <artifactId>peregrine-client</artifactId>
  <packaging>jar</packaging>
  <version>0.2-SNAPSHOT</version>
  <name>Peregrine Sample Client</name>
  <inceptionYear>2009</inceptionYear>
  <repositories>
    <repository>
      <id>sara-artifactory-server-id</id>
      <url>http://ws1.grid.sara.nl:21501/artifactory/libs-releases</url>
    </repository>
  </repositories>
  <dependencies>
    <dependency>
      <groupId>org.erasmusmc.data-mining.peregrine</groupId>
      <artifactId>peregrine-api</artifactId>
      <version>0.2-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>org.erasmusmc.data-mining.peregrine</groupId>
      <artifactId>peregrine-normalizer</artifactId>
      <version>0.2-SNAPSHOT</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.erasmusmc.data-mining.peregrine</groupId>
      <artifactId>peregrine-tokenizer</artifactId>
      <version>0.2-SNAPSHOT</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.erasmusmc.data-mining.peregrine</groupId>
      <artifactId>peregrine-disambiguator</artifactId>
      <version>0.2-SNAPSHOT</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.erasmusmc.data-mining.peregrine</groupId>
      <artifactId>peregrine-impl-hash</artifactId>
      <version>0.2-SNAPSHOT</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.erasmusmc.data-mining.ontology</groupId>
      <artifactId>ontology-impl-db</artifactId>
      <version>0.2-SNAPSHOT</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</project>
Sample ontology-impl-file configuration
<?xml version="1.0"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">
  <bean name="ontology" class="org.erasmusmc.data_mining.ontology.impl.file.SingleFileOntologyImpl">
    <constructor-arg value="file:/home/user/ontology_data.txt" />
  </bean>
</beans>
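The file-based configuration above passes the ontology location as a file: URL. Before starting the application it can help to confirm that the URL points at a readable file. The following is a minimal JDK-only sketch; the class name OntologyFileCheck and the method isReadable are hypothetical helpers and are not part of Peregrine.

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Peregrine: checks a file: URL such as
// the one given to SingleFileOntologyImpl in the configuration above.
final class OntologyFileCheck {

    /** Returns true if the file: URL names an existing, readable regular file. */
    static boolean isReadable(String fileUrl) {
        // Paths.get(URI) throws IllegalArgumentException for non-file URIs.
        Path path = Paths.get(URI.create(fileUrl));
        return Files.isRegularFile(path) && Files.isReadable(path);
    }

    public static void main(String[] args) {
        // Defaults to the sample location used in the configuration above.
        String url = args.length > 0 ? args[0] : "file:/home/user/ontology_data.txt";
        System.out.println(url + " readable: " + isReadable(url));
    }
}
```

Pass the same URL that appears in the constructor-arg element as the first command-line argument.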
Sample ontology-impl-db configuration
<?xml version="1.0"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:tx="http://www.springframework.org/schema/tx"
       xsi:schemaLocation="
         http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
         http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-2.5.xsd">
  <bean id="ontology" class="org.erasmusmc.data_mining.ontology.impl.db.DBOntologyImpl" lazy-init="true">
    <constructor-arg>
      <bean class="org.springframework.jdbc.core.simple.SimpleJdbcTemplate">
        <constructor-arg ref="ontologyDataSource" />
      </bean>
    </constructor-arg>
  </bean>
  <bean id="ontologyDataSource" class="org.apache.commons.dbcp.BasicDataSource"
        destroy-method="close" scope="singleton" lazy-init="true">
    <property name="driverClassName" value="com.mysql.jdbc.Driver" />
    <property name="url" value="jdbc:mysql://myserver:3306/mydatabase?autoReconnect=true" />
    <property name="username" value="dbuser" />
    <property name="password" value="dbpass" />
    <property name="validationQuery" value="select 1" />
  </bean>
  <bean id="txManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
    <property name="dataSource" ref="ontologyDataSource"/>
  </bean>
  <tx:annotation-driven transaction-manager="txManager" />
</beans>
Sample peregrine-impl-hash configuration
<?xml version="1.0"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">
  <bean name="thresholdDisambiguationDecisionMaker"
        class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.ThresholdDisambiguationDecisionMakerImpl">
    <property name="disambiguationMinimalWeight" value="50" />
    <property name="disambiguationAlwaysAcceptedWeight" value="80" />
  </bean>
  <bean name="ruleDisambiguator" class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.DisambiguatorImpl">
    <constructor-arg>
      <list>
        <ref local="looseDisambiguator" />
        <ref local="strictDisambiguator" />
      </list>
    </constructor-arg>
  </bean>
  <bean id="looseDisambiguator" class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.LooseDisambiguator">
    <constructor-arg>
      <bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.IsHomonymRule" />
    </constructor-arg>
    <constructor-arg>
      <bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.IsPreferredTermRule" />
    </constructor-arg>
    <constructor-arg>
      <bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.HasSynonymRule">
        <property name="maxSynonymDistance" value="40" />
        <property name="minSynonymWeight" value="75" />
        <property name="maxSynonymWeight" value="80" />
      </bean>
    </constructor-arg>
  </bean>
  <bean id="strictDisambiguator" class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.StrictDisambiguator">
    <constructor-arg>
      <bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.IsHomonymRule" />
    </constructor-arg>
    <constructor-arg>
      <bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.IsPreferredTermRule" />
    </constructor-arg>
    <constructor-arg>
      <bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.IsComplexRule">
        <property name="maxTermLength" value="6" />
        <property name="minTermLength" value="3" />
        <property name="minTermNumbers" value="1" />
        <property name="minTermLetters" value="1" />
      </bean>
    </constructor-arg>
    <constructor-arg>
      <bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.HasSynonymRule">
        <property name="maxSynonymDistance" value="40" />
        <property name="minSynonymWeight" value="75" />
        <property name="maxSynonymWeight" value="80" />
      </bean>
    </constructor-arg>
    <constructor-arg>
      <bean class="org.erasmusmc.data_mining.peregrine.disambiguator.impl.rule.HasKeywordRule">
        <property name="maxKeywordDistance" value="300" />
        <property name="minKeywordWeight" value="75" />
        <property name="maxKeywordWeight" value="80" />
      </bean>
    </constructor-arg>
  </bean>
  <bean name="peregrine" class="org.erasmusmc.data_mining.peregrine.impl.hash.PeregrineImpl">
    <constructor-arg ref="ontology" />
    <constructor-arg>
      <bean class="org.erasmusmc.data_mining.peregrine.tokenizer.impl.SubSentenceTokenizer" />
    </constructor-arg>
    <constructor-arg>
      <bean class="org.erasmusmc.data_mining.peregrine.normalizer.impl.LVGNormalizer" />
    </constructor-arg>
    <constructor-arg ref="ruleDisambiguator" />
    <constructor-arg ref="thresholdDisambiguationDecisionMaker" />
  </bean>
</beans>
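The peregrine bean in this configuration refers to an ontology bean that is defined in a separate file (one of the two ontology configurations shown earlier). The sketch below is a JDK-only sanity check, under stated assumptions, that every bean reference in a set of configuration files resolves to a bean defined in one of them. BeanRefCheck and its methods are hypothetical helpers, not part of Peregrine or Spring. The check is deliberately simple: it looks only at id and name attributes of bean elements, at ref attributes, and at ref elements with a local or bean attribute. It ignores aliases, comma-separated names, and namespace-specific attributes such as transaction-manager.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Peregrine or Spring.
final class BeanRefCheck {

    /** Parses an XML string; wraps any parser failure in an unchecked exception. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration", e);
        }
    }

    /** Returns referenced bean names that are not defined in any of the given documents. */
    static Set<String> unresolved(String... xmlDocs) {
        Set<String> defined = new LinkedHashSet<>();
        Set<String> referenced = new LinkedHashSet<>();
        for (String xml : xmlDocs) {
            NodeList all = parse(xml).getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                String tag = e.getTagName();
                if ("bean".equals(tag)) {
                    // A bean may be registered under its id, its name, or both.
                    if (e.hasAttribute("id")) defined.add(e.getAttribute("id"));
                    if (e.hasAttribute("name")) defined.add(e.getAttribute("name"));
                } else if ("ref".equals(tag)) {
                    // <ref local="..."/> or <ref bean="..."/> inside a list or argument.
                    if (e.hasAttribute("local")) referenced.add(e.getAttribute("local"));
                    if (e.hasAttribute("bean")) referenced.add(e.getAttribute("bean"));
                }
                // ref="..." shortcut on constructor-arg and property elements.
                if (!"ref".equals(tag) && e.hasAttribute("ref")) {
                    referenced.add(e.getAttribute("ref"));
                }
            }
        }
        referenced.removeAll(defined);
        return referenced;
    }

    public static void main(String[] args) throws Exception {
        // Each argument is the path of one Spring configuration file.
        String[] docs = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            docs[i] = new String(Files.readAllBytes(Paths.get(args[i])), StandardCharsets.UTF_8);
        }
        System.out.println("Unresolved bean references: " + unresolved(docs));
    }
}
```

Run against the peregrine-impl-hash configuration alone, the check should report ontology as unresolved; adding one of the ontology configuration files to the arguments should clear it.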
Profiling the memory usage of Peregrine
Peregrine has built-in support for memory profiling using the wicket library. Because this library is listed as optional in the Maven dependencies of the org.erasmusmc.data-mining.peregrine.peregrine-impl-hash project, users of that artifact must explicitly add it as a runtime dependency, or place it in WEB-INF/lib manually (if the target deliverable is a WAR application).
<dependency>
  <groupId>wicket</groupId>
  <artifactId>wicket</artifactId>
  <version>1.1</version>
  <scope>runtime</scope>
</dependency>
After that, the memory information is available via the PeregrineImpl.toString() method.
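Because the report is exposed through toString(), it can be logged from any reference to the engine without compile-time access to PeregrineImpl. The following is a minimal sketch; MemoryReport and describe are hypothetical names, the JVM heap figure is added only for comparison, and the stand-in object in main takes the place of a real engine instance, whose actual output format is not shown here.

```java
// Hypothetical helper, not part of Peregrine.
final class MemoryReport {

    /** Returns the object's toString() output followed by the current JVM heap usage. */
    static String describe(Object peregrine) {
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        return String.valueOf(peregrine) + System.lineSeparator()
                + "JVM heap in use: " + usedMb + " MB";
    }

    public static void main(String[] args) {
        // Stand-in for a real PeregrineImpl instance (assumption: in an
        // application, this would be the configured "peregrine" bean).
        Object peregrine = new Object() {
            @Override
            public String toString() {
                return "<PeregrineImpl memory report would appear here>";
            }
        };
        System.out.println(describe(peregrine));
    }
}
```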
Deploying Peregrine as WebService
(This page is incomplete; it will be updated once the Peregrine installer is implemented.)
- Download the .war file from this location (check for the latest version first) and deploy it to your application server.
or
- Download and run the install script, then deploy the resulting .war file to your application server.
Reference List
- For the complete list of ontology backend providers, see ontology backends