2016年9月2日金曜日

開発環境

Think Python (Allen B. Downey (著)、 O'Reilly Media)のChapter 13.(Case Study: Data Structure Selection)のExercises 13-9.(No. 3150)を取り組んでみる。

Exercises 13-9.(No. 3150)

コード(Emacs)

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import string
import matplotlib.pyplot as plt
import math


def readFile(filename, prefix_length=2, header_end='*END*'):
    words = {}
    with open(filename) as f:
        for line in f:
            if line.startswith(header_end):
                break
        chars = string.whitespace + string.punctuation
        for line in f:
            for ch in chars:
                line = line.replace(ch, ' ')
            for word in line.split():
                word = word.strip()
                words[word] = words.get(word, 0) + 1
    return words


if __name__ == '__main__':
    filename = 'emma.txt'
    words = readFile(filename)
    words = [(f, w) for w, f in words.items()]
    words = sorted(words, key=lambda t: -t[0])
    print('{0:^15} {1:^20} {2:^20}'.format('word', 'log f', 'log r'))
    fs = []
    rs = []
    j = int(len(words) / 5)
    k = len(words) - 10
    for i, (f, w) in enumerate(words):
        rank = i + 1
        if i < 10 or j < i <= (j + 10) or k <= i:
            print('{0:<15} {1:>20.16f} {2:>20.16f}'.format(
                w, math.log(f), math.log(rank)))
        rs.append(math.log(rank))
        fs.append(math.log(f))

    plt.figure(figsize=(5, 5))
    plt.plot(rs, fs)
    plt.savefig('zipf_law.svg')
    plt.show()

入出力結果(Terminal, IPython)

$ ./sample9.py
     word              log f                log r        
to                8.5537179660986098   0.0000000000000000
the               8.4859089013764706   0.6931471805599453
and               8.4495565427004262   1.0986122886681098
of                8.3624089776153703   1.3862943611198906
I                 8.0684029585696990   1.6094379124341003
a                 8.0080328469693072   1.7917594692280550
her               7.7832240163360371   1.9459101490553132
was               7.7777926263388304   2.0794415416798357
it                7.6634076648934792   2.1972245773362196
in                7.6582275261613519   2.3025850929940459
dull              2.1972245773362196   7.3225104339973939
talent            2.1972245773362196   7.3231707179434693
supply            2.1972245773362196   7.3238305662023171
End               2.1972245773362196   7.3244899793485319
Don               2.1972245773362196   7.3251489579555749
receive           2.1972245773362196   7.3258075025957732
hesitation        2.1972245773362196   7.3264656138403224
forty             2.1972245773362196   7.3271232922592926
collected         2.1972245773362196   7.3277805384216315
resumed           2.1972245773362196   7.3284373528951621
physician         0.0000000000000000   8.9299648747068403
losses            0.0000000000000000   8.9300972286214009
palpable          0.0000000000000000   8.9302295650207206
movements         0.0000000000000000   8.9303618839094341
enjoyable         0.0000000000000000   8.9304941852921775
cheese            0.0000000000000000   8.9306264691735784
facing            0.0000000000000000   8.9307587355582694
liveliest         0.0000000000000000   8.9308909844508779
Astonished        0.0000000000000000   8.9310232158560297
dispensed         0.0000000000000000   8.9311554297783484
$

0 コメント:

コメントを投稿

Comments on Google+: