[Gate-users] batch ML API

Jean-Philippe Prost jpprost at gmail.com
Tue Nov 4 18:20:59 CET 2008


Hello,

I'm fairly new to GATE, and I'm trying to use it for information extraction
through the Batch ML PR. I've got as far as running it (corpus annotated
with both labels and features, plus a config file). Well, when I say
"running" I'm being slightly optimistic: it is really troubleshooting time
now, and I'd appreciate some help with a couple of questions (btw, I'm
using SVMLibSvmJava).

- First off, I couldn't find much information on the web about
troubleshooting, but maybe I haven't been looking in the right places? Are
there any documents (online or offline) that I should consult before
bothering this list's users with newbie questions?

- Second, when running the learner (in EVALUATION mode) I get an
ArrayIndexOutOfBoundsException. Not so good, generally... I've been trying
to identify the source of the problem, but no luck so far. The configuration
file is certainly to blame, but I can't find where the problem is, and
I'm running short of ideas.
Below are snippets of the output message from the API and of the log
file. The log reports that no classes were found, so I'm looking in that
direction (the dataset section of my config file does have a <class/>
element in one of its attributes).
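
One way to check the <class/> point would be a small script along the
following lines. This is only a sketch under assumptions: the tag names
ATTRIBUTE and CLASS, the sample layout, and the helper name
attributes_with_class are illustrative guesses at the config structure, not
taken from the actual file, and tag names are compared case-insensitively.

```python
import xml.etree.ElementTree as ET


def attributes_with_class(config_xml):
    """Return the ATTRIBUTE elements that contain a CLASS child element.

    Assumption: the class attribute is marked by an empty <CLASS/> child
    inside one <ATTRIBUTE> element. Tag case is ignored.
    """
    root = ET.fromstring(config_xml)
    hits = []
    for elem in root.iter():
        if elem.tag.upper() == "ATTRIBUTE":
            if any(child.tag.upper() == "CLASS" for child in elem):
                hits.append(elem)
    return hits


# Hypothetical config fragment, for illustration only.
SAMPLE_CONFIG = """
<ML-CONFIG>
  <DATASET>
    <ATTRIBUTE>
      <NAME>Form</NAME>
    </ATTRIBUTE>
    <ATTRIBUTE>
      <NAME>Class</NAME>
      <CLASS/>
    </ATTRIBUTE>
  </DATASET>
</ML-CONFIG>
"""

if __name__ == "__main__":
    found = attributes_with_class(SAMPLE_CONFIG)
    print("attributes marked as class:", len(found))
```

If this reports anything other than exactly one attribute, the class
definition in the config would be the first thing to look at.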

Cheers,
JP

************** Output message *************************
Pre-processing the 50 documents...
Learning starts.
For the information about this learning see the log file
/home/jeanp/workspace/experiments/GATE-MLToy1/savedFiles/logFileForNLPLearning.save
** Evaluation mode:
Kfold k=2, numDoc=50, len=25.
java.lang.ArrayIndexOutOfBoundsException
**************** end output message **********************

************** log file ************************
04-Nov-2008 16:39:09:
*** A new run starts.
04-Nov-2008 16:39:09:
The execution time (pre-processing the first document): Tue Nov 04 16:39:09
GMT 2008
04-Nov-2008 16:39:09: The learning start at Tue Nov 04 16:39:09 GMT 2008
04-Nov-2008 16:39:09: The number of documents in dataset: 50
04-Nov-2008 16:39:09: ** Evaluation mode:
04-Nov-2008 16:39:09: K-fold evaluation: k=2
04-Nov-2008 16:39:09: Kfold k=2, numDoc=50, len=25.
04-Nov-2008 16:39:09:
*** Fold 1
Number of docs for training: 25
1 Subscription_-_Change_Of_Address-412809.txt.xml_00061
2 Subscription_-_Change_Of_Address-412843.txt.xml_00062
(...)
Number of docs for application: 25
1 Subscription_-_Change_Of_Address-412085.txt.xml_00048
2 Subscription_-_Change_Of_Address-411758.txt.xml_00049
(...)
04-Nov-2008 16:39:10:
Filtering starts.
04-Nov-2008 16:39:10: Multi to binary conversion.
04-Nov-2008 16:39:10: The number of classes in dataset: 0
04-Nov-2008 16:39:10: The learners: SVMLibSvmJava
04-Nov-2008 16:39:10: total Number of classes for learning is 0
04-Nov-2008 16:39:10: One against others for multi to binary class
conversion.
04-Nov-2008 16:39:10: One against others for multi to binary class
conversion.
Number of classes in model: 0
04-Nov-2008 16:39:10: Application time for class: 0ms
************** end log file **************************
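
The zero-class symptom in the log above can also be picked out
automatically. The following is a minimal sketch, assuming only the line
format seen in this excerpt ("The number of classes in dataset: N"); the
function names are made up for illustration.

```python
import re

# Matches the class-count line as it appears in the log excerpt, e.g.
# "04-Nov-2008 16:39:10: The number of classes in dataset: 0"
CLASS_COUNT = re.compile(r"The number of classes in dataset:\s*(\d+)")


def class_counts(log_text):
    """Return every class count reported in the log text, in order."""
    return [int(m.group(1)) for m in CLASS_COUNT.finditer(log_text)]


def has_empty_class_set(log_text):
    """True if any run recorded in the log reported zero classes."""
    return any(n == 0 for n in class_counts(log_text))


if __name__ == "__main__":
    sample = (
        "04-Nov-2008 16:39:10: Multi to binary conversion.\n"
        "04-Nov-2008 16:39:10: The number of classes in dataset: 0\n"
        "04-Nov-2008 16:39:10: The learners: SVMLibSvmJava\n"
    )
    print(class_counts(sample))
    print(has_empty_class_set(sample))
```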