evarln | Hello, my problem is the following: I'd like to use a script from an open-source library (LEXenstein). I replaced the parameters with the values I wanted, but it doesn't work. The interpreter doesn't complain, but nothing happens... That is, in the code I changed the line to inp = open("tatata.txt"), etc.
Code:
def produceWordCooccurrenceModel(text_file, window, model_file):
    """
    Creates a co-occurrence model from a text file.
    These models can be used by certain classes in LEXenstein, such as the Yamamoto Ranker and the Biran Selector.
    @param text_file: Text from which to estimate the word co-occurrence model.
    @param window: Number of tokens to the left and right of a word to be included as a co-occurring word.
    @param model_file: Path in which to save the word co-occurrence model.
    """
    inp = open(text_file)
    coocs = {}
    c = 0
    for line in inp:
        c += 1
        print('At line: ' + str(c))
        tokens = line.strip().lower().split(' ')
        for i in range(0, len(tokens)):
            target = tokens[i]
            if target not in coocs.keys():
                coocs[target] = {}
            left = max(0, i-window)
            right = min(len(tokens), i+window+1)
            for j in range(left, right):
                if j!=i:
                    cooc = tokens[j]
                    if cooc not in coocs[target].keys():
                        coocs[target][cooc] = 1
                    else:
                        coocs[target][cooc] += 1
    inp.close()
    targets = sorted(coocs.keys())
    out = open(model_file, 'w')
    for target in targets:
        newline = target + '\t'
        words = sorted(coocs[target].keys())
        for word in words:
            newline += word + ':' + str(coocs[target][word]) + '\t'
        out.write(newline.strip() + '\n')
    out.close()
|
Thanks in advance |
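For reference, here is a minimal sketch of the windowed counting that the docstring describes, followed by an explicit call. It is not LEXenstein's own code: the name `count_cooccurrences`, the in-memory list of lines, and the sample sentence are assumptions made for illustration. It also shows one thing that may be relevant to the "nothing happens" symptom: a `def` only defines a function, and nothing runs until the function is called.

```python
from collections import defaultdict

def count_cooccurrences(lines, window):
    """Hypothetical helper: for each token, count the tokens found
    within `window` positions to its left and right on the same line."""
    coocs = defaultdict(lambda: defaultdict(int))
    for line in lines:
        tokens = line.strip().lower().split(' ')
        for i, target in enumerate(tokens):
            neighbours = coocs[target]  # creates the entry even if no neighbour exists
            left = max(0, i - window)
            right = min(len(tokens), i + window + 1)
            for j in range(left, right):
                if j != i:
                    neighbours[tokens[j]] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {target: dict(counts) for target, counts in coocs.items()}

# Defining the function above does nothing by itself; this call does the work.
model = count_cooccurrences(["the cat sat on the mat"], window=1)
print(model["cat"])
```

If the script really contains only the definition, the same applies to the original function: it would need a call at the bottom of the file, something like `produceWordCooccurrenceModel("tatata.txt", 2, "model.txt")`, where the window size `2` and the output name `model.txt` are placeholder values, not values from the post.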